I would like to write a little post about deleteing duplicates from a table.
Is is a wide issue and there are many different possibilities for doing that.
The most easy one is when there is a primary key defined in the table.
CREATE TABLE [dbo].[TableDuplicatesA] ([ID] [int] IDENTITY(1,1) NOT NULL, [Key] [varchar](50) NOT NULL CONSTRAINT [pk_dbo_TableDuplicatesA] PRIMARY KEY CLUSTERED ([ID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY] INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyA') INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyB') INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyB') INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyC') INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyD') INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyD')
This delete is very simple and clear:
DELETE FROM [dbo].[TableDuplicatesA] WHERE [ID] IN (SELECT max([ID]) FROM [dbo].[TableDuplicatesA] GROUP BY [Key] HAVING count([Key]) > 1)
It gets more tricky where there is no primary key defined.
CREATE TABLE [dbo].[TableDuplicatesB] ([Key] [varchar](50) NOT NULL) ON [PRIMARY] INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyA') INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyB') INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyB') INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyC') INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyD') INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyD')
Here are some samples:
Probably the worst example is this here:
SELECT DISTINCT [Key] INTO #temp FROM [dbo].[TableDuplicatesB] TRUNCATE TABLE [dbo].[TableDuplicatesB] INSERT INTO [dbo].[TableDuplicatesB] SELECT * FROM #temp SELECT * FROM [dbo].[TableDuplicatesA] DROP TABLE #temp
Better and elegant are the following three examples:
DELETE t1 FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY [Key] DESC) [RowID] FROM [dbo].[TableDuplicatesB]) t1 WHERE [RowID] > 1
WITH CTE_Duplicates AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY (SELECT 0)) [RowNumber] FROM [dbo].[TableDuplicatesB]) DELETE FROM CTE_Duplicates WHERE [RowNumber] > 1
DELETE t1 FROM (SELECT ROW_NUMBER() OVER (ORDER BY [Key]) AS [RowID], [Key] FROM [dbo].[TableDuplicatesB]) t1, (SELECT ROW_NUMBER() OVER (ORDER BY [Key]) AS [RowID], [Key] FROM [dbo].[TableDuplicatesB]) t2 WHERE t1.[Key] = t2.[Key] AND t1.[RowID] > t2.[RowID]
Based on the execution plan you can get an idea and decide yourselves which Query fits best for you.
Leave A Comment
You must be logged in to post a comment.