MainframeSupports
tip week 14/2003:

Despite all our efforts it sometimes happens that dirty data enters our DB2 tables. On these occations the problem often is that two or more rows has got the same "keyvalue" even though they weren't supposed to. In these cases it can be rather difficult to track down the error that caused the problem and also to identify the duplicates and decide which rows to remove.

You can use the following SQL statement as a model for how to locate rows with the same "keyvalue" and show you the full contens of these rows. You can even control the acceptable number of duplicates for a given "keyvalue". The example will show you all rows in SYSIBM.SYSTABLES where the table name is the same for more than one row.

SELECT tab.*
FROM sysibm.systables tab
   , ( SELECT name
       FROM sysibm.systables
       GROUP BY name
       HAVING COUNT(*) > 1
     ) keyvalue
WHERE tab.name = keyvalue.name
ORDER BY tab.creator, tab.name

The trick is to create a "temporary table" with all the interesting "keyvalues" and then join these "keyvalues" with the table containing the duplicates. In most cases you will have a "keyvalue" consisting of more than one column, but the method is the same. The "temporary table" called keyvalue in the example, must contain all the columns that forms the "keyvalue". The same columns must be present in the GROUP BY clause. You must also remember to join all the columns in the "temporary table" with the columns in the real table.

The HAVING COUNT(*) > 1 clause ensures that only rows with more than one "keyvalue" will be selected from the real table. If your interest are in rows without duplicate "keyvalues" you can use HAVING COUNT(*) = 1 instead. Or maybe other rules governs your problematic rows. Now you have seen an easy way of identifying them.

Previous tip in english        Sidste danske tip        Tip list