MainframeSupport tip about The difference between correlated and noncorrelated subselects

MainframeSupports
tip week 27/2007:

One of the subjects that has been and still is discussed is the difference between correlated and noncorrelated subselects. I have seen a lot of correlated subselects used where a noncorrelated subselect would have been better. I will look into one of these discoveries in this tip.

You will be able to find SQL statements similar to the following SQL statement on almost any DB2 installation:

SELECT columna, columnb FROM tablex x WHERE keycolumn1 = :key AND keycolumn2 = ( SELECT MAX(keycolumn2) FROM tablex WHERE keycolumn1 = x.keycolumn1 )

Keycolumn1 may be a customer number or an account number. Keycolumn2 may then be some kind of versioning maybe a timestamp or some kind of ever increasing number. The subselect in the example is a correlated subselect and therefore it is executed once for every row matching the value of :key in the outer select. If you assume that the outer select matches 10 rows, then DB2 will carry out the subselect 10 times even though keycolumn1 has the same value. Db2 normally caches the result of a correlated subselect, so in this case DB2 will only carry out the data access in the subselect once and the other 9 times DB2 will recognize the same value for keycolumn1 and return the corresponding result. Despite the cache it is possible to optimize the SQL statement considerably using the following small change:

SELECT columna, columnb FROM tablex x WHERE keycolumn1 = :key AND keycolumn2 = ( SELECT MAX(keycolumn2) FROM tablex WHERE keycolumn1 = :key )

This SQL statement will produce exactly the same result as the previous one, but now the subselect is noncorrelated. Firstly DB2 executes the subselect once for the value of :key. Secondly DB2 will execute the outer select using the value of :key and the value returned by the subselect. If there is only one row in tablex that matches the value of :key, the two SQL statements will be equally efficient, but as soon as more than one row matches the value of :key the noncorrelated subselect becomes the most efficient. Normally a composite index on keycolumn1 and keycolumn2 is defined and therefore DB2 for the outer select will be able to use MATCHCOLS=2 in the noncorrelated example and only MATCHCOLS=1 in the correlated example.

Previous tip in english Sidste danske tip Tip list

MainframeSupportstip week 27/2007:

MainframeSupports
tip week 27/2007: