MainframeSupports
tip week 34/2011:

I am not and has never been happy about DB2 when it comes to the lack of control over how the optimizer accesses data. I have in various tips described a few possibilities on how to improve the performance of your SQL, but they are all based on knowledge about how the optimizer works. I have once described OPTIMIZE FOR, which gives you some degree of control over how DB2 should access data.

In this tip I will describe a fully legal method of controlling which way DB2 should perform a join. The method is fully documented, and has been available for several years. As you can read from the link the method is mentioned in connection with index access. It can be used to force DB2 to use another index than the one chosen by the optimizer. How to use the method in connection with joins I will show you using an example which you can carry out on any DB2 system on the mainframe. The following SQL will most likely start by accessing SYSIBM.SYSTABLES and afterwards SYSIBM.SYSCOLUMNS:

SELECT *
FROM SYSIBM.SYSTABLES T
   , SYSIBM.SYSCOLUMNS C
WHERE T.CREATOR = C.TBCREATOR
  AND T.NAME = C.TBNAME

By coding the following SQL statement instead the optimizer is forced to start in SYSIBM.SYSCOLUMNS and afterwards access SYSIBM.SYSTABLES:

SELECT *
FROM SYSIBM.SYSTABLES T
   , SYSIBM.SYSCOLUMNS C
WHERE T.CREATOR = C.TBCREATOR !! ''
  AND T.NAME = C.TBNAME

The first statement will choose SYSTABLES as the first table because it has fewer rows and because there is an index on TBCREATOR in SYSCOLUMNS. The expression C.TBCREATOR !! '' prevents the optimizer from choosing the index on TBCREATOR, but as there is an index on CREATOR in SYSTABLES then DB2 chooses to access SYSCOLUMNS as the first table instead. This example is very simple and just illustrates how the method works. If you are not convinced you can run an EXPLAIN on both statements and see the difference. If the join columns are a numeric datatype you must use NUMCOL + 0 instead to prevent an index from being used.

I have known this trick for years and used it successfully several times. Some of my colleagues has also abused it (after watching my successes) with quite unfortunate results. I must clearly warn you about using the tip without close examination. The optimizer chooses the best access path in most cases. It is only when you know that it will be better to use another access path that you should use the method in this tip. And please remember to run an EXPLAIN to review that the new access path is the one you want.

Let me return to the example. It will be a bad idea to start by accessing SYSCOLUMNS as it contain a lot more rows than SYSTABLES and thereby it will access the same row in SYSTABLES many times. When DB2 starts by accessing SYSTABLES each row will only be read once and the same will happen in SYSCOLUMNS and therefore the first SQL statement is the most efficient. Just to remind you that you must be careful when you start manipulating your joins.

Previous tip in english        Sidste danske tip        Tip list