Data is everywhere, but?: Ignore Duplicates

Saturday, February 23, 2013

Ignore Duplicates

Today I was asked about situation where entire data insertion is failing due to violation of Unique Index. In this scenario, actually they need to ignore the duplicate data and continue with the other inserts.

  1:  CREATE TABLE Name (ID INT IDENTITY PRIMARY KEY CLUSTERED,
  2:  FirstName VARCHAR(100),
  3:  LastName VARCHAR(100))
  4:  GO
  5: 
  6:  CREATE UNIQUE NONCLUSTERED INDEX [IX_Name_FirstName_LastName]
  7:  ON [dbo].Name
  8:  (
  9:  [FirstName] ASC,
 10:  [LastName] ASC
 11:  ) ON [PRIMARY]
 12:  GO

So above script will create a table and unique index on that table.

Lets insert two rows for this table.

  1:  INSERT INTO Name
  2:  VALUES
  3:  ('John','Samuels'),
  4:  ('Steve','John')

Since the above two records does not violate the unique key constraints you will not find any difficulties with the above query and you will have following records in the table.

Then we will insert this set of rows. Highlighted row is the row which will violate the unique constraint.

  1:   INSERT INTO Name
  2:  VALUES
  3:   ('Steve','Smith'),
  4:  ('John','Samuels'),
  5:  ('Steve','Rodes'),
  6:  ('Phill','Samuels')

As expected this will fail with the following error

Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.Name' with unique index 'IX_Name_FirstName_LastName'.
The duplicate key value is (John, Samuels).
The statement has been terminated.

If you query the table you will see that not only the duplicate row but other rows are not in the table.

So the solution is to use IGNORE DUPLICATE option.This is how you create the constraints. Option is highlighted. By default this option is OFF.

  1:  CREATE UNIQUE NONCLUSTERED INDEX [IX_Name_FirstName_LastName]
  2:  ON [dbo].Name
  3:  (
  4:  [FirstName] ASC,
  5:  [LastName] ASC
  6:  ) WITH (IGNORE_DUP_KEY = ON) ON  [PRIMARY]
  7:  GO

When you inserts same data set again, no errors will be generated but this message.

Duplicate key was ignored.

(3 row(s) affected)

So, duplicate record was ignored and no errors were generated.

9 comments:

UnknownFebruary 23, 2013 at 8:38 PM
Paul White had the post awhile ago.
ReplyDelete
Replies
SQL-HellFebruary 24, 2013 at 12:11 PM
If you have to allow duplicates then why the UNIQUE Index?
ReplyDelete
Replies
Dinesh AsankaFebruary 24, 2013 at 6:04 PM

Hi!
Maybe you can use autoincrement field for Primary Key? It provides automatic unique number generation. If you can't do it( database already filled with data for example) you may use "instead of" trigger to verify unique value against Primary Key values.

By Gregory Mashargin @LinkedIN group SQL Server DBA

ReplyDelete
Replies
Dinesh AsankaFebruary 24, 2013 at 6:11 PM
How about the MERGE (MUPSERT) statement?

Also, Oracle has a great way to handle this in 10g and belyond.

By James Williams
@LinkedIN group SQL Server DBA

ReplyDelete
Replies
Dinesh AsankaFebruary 24, 2013 at 6:12 PM
If you have to allow the duplicates then why do you have the UNIQUE index??

By Ibrahim Shaik
@LinkedIN group SQL Server DBA
ReplyDelete
Replies
UnknownFebruary 26, 2013 at 11:47 AM
An unpleasant way to hide a bug within the database codes. Since this is not throwing any error, we will never able to handle the error in the data flow . so I think this should only use under special circumstances and then should get rid of it ASAP to make sure that we don't accidentally hide bugs. Instead of doing this we can introduce a data cleansing or validation step just before the insertion.
ReplyDelete
Replies

Add comment

Translate

Saturday, February 23, 2013

Ignore Duplicates

9 comments: