Problem
While you hopefully never have to work with bad data, you might still have to find duplicates in a table. Let's say we have a table like this and would like to find duplicate rows.
id | name |
---|---|
1 | Bob |
2 | Joe |
3 | Bob |
Solution
The key here is HAVING
, an often unkown SQL condition. It behaves a lot like WHERE
, but it's executed after GROUP BY
has applied while WHERE
would be executed before.
The following query would show you one of the duplicate rows:
SELECT id FROM test GROUP BY name HAVING COUNT(*) > 1
The query above has one problem, if you run MySQL in strict mode, you'll have a problem because it isn't clear which row is returned to show the value of id
. By grouping the records, we are combining two into one. MySQL not running in strict mode is able to execute the query above and will simply return a row, but it's hard to predict which one.
Let's use an aggregation to avoid that and always get the smallest duplicate id:
SELECT min(id) FROM test GROUP BY name HAVING COUNT(*) > 1
There's a nice aggregation function which allows you to see all duplicate id's in a single cell:
SELECT group_concat(id) FROM test GROUP BY name HAVING COUNT(*) > 1
Comments
Gi Tios
yeah!
Leave a Comment