I would like to write a little post about deleteing duplicates from a table.
Is is a wide issue and there are many different possibilities for doing that.
The most easy one is when there is a primary key defined in the table.
CREATE TABLE [dbo].[TableDuplicatesA] ([ID] [int] IDENTITY(1,1) NOT NULL,
[Key] [varchar](50) NOT NULL
CONSTRAINT [pk_dbo_TableDuplicatesA] PRIMARY KEY CLUSTERED
([ID] ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyA')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyC')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyD')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyD')
This delete is very simple and clear:
DELETE FROM [dbo].[TableDuplicatesA]
WHERE [ID] IN (SELECT max([ID])
FROM [dbo].[TableDuplicatesA]
GROUP BY [Key]
HAVING count([Key]) > 1)
It gets more tricky where there is no primary key defined.
CREATE TABLE [dbo].[TableDuplicatesB] ([Key] [varchar](50) NOT NULL) ON [PRIMARY]
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyA')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyC')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyD')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyD')
Here are some samples:
Probably the worst example is this here:
SELECT DISTINCT [Key] INTO #temp FROM [dbo].[TableDuplicatesB] TRUNCATE TABLE [dbo].[TableDuplicatesB] INSERT INTO [dbo].[TableDuplicatesB] SELECT * FROM #temp SELECT * FROM [dbo].[TableDuplicatesA] DROP TABLE #temp
Better and elegant are the following three examples:
DELETE t1
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY [Key] DESC) [RowID]
FROM [dbo].[TableDuplicatesB]) t1
WHERE [RowID] > 1
WITH CTE_Duplicates
AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY (SELECT 0)) [RowNumber]
FROM [dbo].[TableDuplicatesB])
DELETE FROM CTE_Duplicates
WHERE [RowNumber] > 1
DELETE t1
FROM (SELECT ROW_NUMBER() OVER (ORDER BY [Key]) AS [RowID],
[Key]
FROM [dbo].[TableDuplicatesB]) t1,
(SELECT ROW_NUMBER() OVER (ORDER BY [Key]) AS [RowID],
[Key]
FROM [dbo].[TableDuplicatesB]) t2
WHERE t1.[Key] = t2.[Key]
AND t1.[RowID] > t2.[RowID]
Based on the execution plan you can get an idea and decide yourselves which Query fits best for you.



Leave A Comment
You must be logged in to post a comment.