SQL

Find duplicate table records


Problem

While you hopefully never have to work with bad data, you might still have to find duplicates in a table. Let's say we have a table like this and would like to find duplicate rows.

id name
1 Bob
2 Joe
3 Bob


Solution

The key here is HAVING, an often unkown SQL condition. It behaves a lot like WHERE, but it's executed after GROUP BY has applied while WHERE would be executed before.

The following query would show you one of the duplicate rows:

SELECT id FROM  test GROUP BY name HAVING COUNT(*) > 1

The query above has one problem, if you run MySQL in strict mode, you'll have a problem because it isn't clear which row is returned to show the value of id. By grouping the records, we are combining two into one. MySQL not running in strict mode is able to execute the query above and will simply return a row, but it's hard to predict which one.

Let's use an aggregation to avoid that and always get the smallest duplicate id:

SELECT min(id) FROM  test GROUP BY name HAVING COUNT(*) > 1

There's a nice aggregation function which allows you to see all duplicate id's in a single cell:

SELECT group_concat(id) FROM  test GROUP BY name HAVING COUNT(*) > 1

Comments


Gi Tios

1 year ago

yeah!

Leave a Comment





Please sign-in to post a comment