Saturday, February 23, 2013

Ignore Duplicates

Today I was asked about situation where entire data insertion is failing due to violation of Unique Index. In this scenario, actually they need to ignore the duplicate data and continue with the other inserts.

  1:  CREATE TABLE Name (ID INT IDENTITY PRIMARY KEY CLUSTERED,
  2:  FirstName VARCHAR(100),
  3:  LastName VARCHAR(100))
  4:  GO
  5: 
  6:  CREATE UNIQUE NONCLUSTERED INDEX [IX_Name_FirstName_LastName]
  7:  ON [dbo].Name
  8:  (
  9:  [FirstName] ASC,
 10:  [LastName] ASC
 11:  ) ON [PRIMARY]
 12:  GO

So above script will create a table and unique index on that table.


Lets insert two rows for this table.

  1:  INSERT INTO Name
  2:  VALUES
  3:  ('John','Samuels'),
  4:  ('Steve','John')


Since the above two records does not violate the unique key constraints you will not find any difficulties with the above query and you will have following records in the table.


image


Then we will insert this set of rows. Highlighted row is the row which will violate the unique constraint. 

  1:   INSERT INTO Name
  2:  VALUES
  3:   ('Steve','Smith'),
  4:  ('John','Samuels'),
  5:  ('Steve','Rodes'),
  6:  ('Phill','Samuels')

As expected this will fail with the following error


Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.Name' with unique index 'IX_Name_FirstName_LastName'.
The duplicate key value is (John, Samuels).
The statement has been terminated.


If you query the table you will see that not only the duplicate row but other rows are not in the table.


So the solution is to use IGNORE DUPLICATE option.This is how you create the constraints. Option is highlighted. By default this option is OFF.

  1:  CREATE UNIQUE NONCLUSTERED INDEX [IX_Name_FirstName_LastName]
  2:  ON [dbo].Name
  3:  (
  4:  [FirstName] ASC,
  5:  [LastName] ASC
  6:  ) WITH (IGNORE_DUP_KEY = ON) ON  [PRIMARY]
  7:  GO

When you inserts same data set again, no errors will be generated but this message.


Duplicate key was ignored.


(3 row(s) affected)


So, duplicate record was ignored and no errors were generated. 

9 comments:

  1. Paul White had the post awhile ago.

    ReplyDelete
    Replies
    1. I didn't know. This was something came during a discussion with one of my friends.

      Delete
  2. If you have to allow duplicates then why the UNIQUE Index?

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. Dear Hell, index not allow duplicate rows, but silently ignored them.


      Delete

  3. Hi!
    Maybe you can use autoincrement field for Primary Key? It provides automatic unique number generation. If you can't do it( database already filled with data for example) you may use "instead of" trigger to verify unique value against Primary Key values.

    By Gregory Mashargin @LinkedIN group SQL Server DBA

    ReplyDelete
  4. How about the MERGE (MUPSERT) statement?

    Also, Oracle has a great way to handle this in 10g and belyond.

    By James Williams
    @LinkedIN group SQL Server DBA

    ReplyDelete
  5. If you have to allow the duplicates then why do you have the UNIQUE index??

    By Ibrahim Shaik
    @LinkedIN group SQL Server DBA

    ReplyDelete
  6. An unpleasant way to hide a bug within the database codes. Since this is not throwing any error, we will never able to handle the error in the data flow . so I think this should only use under special circumstances and then should get rid of it ASAP to make sure that we don't accidentally hide bugs. Instead of doing this we can introduce a data cleansing or validation step just before the insertion.

    ReplyDelete