While you hopefully never have to work with bad data, you might still have to find duplicates in a table. Let's say we have a table like this and would like to find duplicate rows.
The key here is
HAVING, an often unkown SQL condition. It behaves a lot like
WHERE, but it's executed after
GROUP BY has applied while
WHERE would be executed before.
The following query would show you one of the duplicate rows:
SELECT id FROM test GROUP BY name HAVING COUNT(*) > 1
The query above has one problem, if you run MySQL in strict mode, you'll have a problem because it isn't clear which row is returned to show the value of
id. By grouping the records, we are combining two into one. MySQL not running in strict mode is able to execute the query above and will simply return a row, but it's hard to predict which one.
Let's use an aggregation to avoid that and always get the smallest duplicate id:
SELECT min(id) FROM test GROUP BY name HAVING COUNT(*) > 1
There's a nice aggregation function which allows you to see all duplicate id's in a single cell:
SELECT group_concat(id) FROM test GROUP BY name HAVING COUNT(*) > 1