Delete duplicates from table

I would like to write a little post about deleteing duplicates from a table.
Is is a wide issue and there are many different possibilities for doing that.

The most easy one is when there is a primary key defined in the table.

CREATE TABLE [dbo].[TableDuplicatesA] ([ID]  [int]           IDENTITY(1,1) NOT NULL,
                                       [Key] [varchar](50)                 NOT NULL
CONSTRAINT [pk_dbo_TableDuplicatesA] PRIMARY KEY CLUSTERED 
([ID] ASC)
WITH (PAD_INDEX               = OFF,
      STATISTICS_NORECOMPUTE  = OFF,
	  IGNORE_DUP_KEY          = OFF,
	  ALLOW_ROW_LOCKS         = ON,
	  ALLOW_PAGE_LOCKS        = ON) ON [PRIMARY]) ON [PRIMARY]
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyA')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyC')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyD')
INSERT INTO [dbo].[TableDuplicatesA] ([Key]) VALUES ('KeyD')

This delete is very simple and clear:

DELETE FROM [dbo].[TableDuplicatesA]
WHERE [ID] IN (SELECT max([ID])
               FROM [dbo].[TableDuplicatesA]
               GROUP BY [Key]
               HAVING count([Key]) > 1)

It gets more tricky where there is no primary key defined.

CREATE TABLE [dbo].[TableDuplicatesB] ([Key] [varchar](50) NOT NULL) ON [PRIMARY]
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyA')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyB')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyC')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyD')
INSERT INTO [dbo].[TableDuplicatesB] ([Key]) VALUES ('KeyD')

Here are some samples:

Probably the worst example is this here:

SELECT DISTINCT [Key]
INTO #temp
FROM [dbo].[TableDuplicatesB]
TRUNCATE TABLE [dbo].[TableDuplicatesB]
INSERT INTO [dbo].[TableDuplicatesB]
SELECT * FROM #temp
SELECT * FROM [dbo].[TableDuplicatesA]
DROP TABLE #temp

Better and elegant are the following three examples:

DELETE t1
FROM (SELECT *, 
       ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY [Key] DESC) [RowID] 
FROM [dbo].[TableDuplicatesB]) t1
WHERE [RowID] > 1

WITH CTE_Duplicates
AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY (SELECT 0)) [RowNumber]
    FROM [dbo].[TableDuplicatesB])
DELETE FROM CTE_Duplicates
WHERE [RowNumber] > 1

DELETE t1
FROM (SELECT ROW_NUMBER() OVER (ORDER BY [Key]) AS [RowID],
             [Key]
      FROM [dbo].[TableDuplicatesB]) t1,
	 (SELECT ROW_NUMBER() OVER (ORDER BY [Key]) AS [RowID],
	         [Key]
      FROM [dbo].[TableDuplicatesB]) t2
WHERE t1.[Key]   = t2.[Key]
AND   t1.[RowID] > t2.[RowID]

Based on the execution plan you can get an idea and decide yourselves which Query fits best for you.

Wir entwickeln für Sie

Delete duplicates from table

About the Author: Frank Uray

How to JOIN

Leave A Comment

Delete duplicates from table

Share This Story, Choose Your Platform!

About the Author: Frank Uray

Related Posts

How to JOIN

Leave A Comment