For many years the syntax of SQL has offered nested table expressions. I have often used this facility to create GROUP BY on expressions, see week 11/2002. Recently I discovered another usage which gives better performance for SQL statements combining GROUP BY with HAVING and joins. I will illustrate the idea using an example. At first I present a statement before optimization:
This statement retrieves all DB2 tables in the DB2 catalog having more than five columns defined as integers and shows you the number of rows in the table and and the number of INTEGER columns. The good things about this SQL statement are that you can actually execute it at your installation and that it is a perfect illustation of what this tip is all about. DB2 will carry out this statement by retrieving all rows in SYSIBM.SYSCOLUMNS that fulfills the WHERE clause and then for each row make a join with SYSIBM.SYSTABLES. Finally all the remaining rows will be processed by the GROUP BY part and selected according to the HAVING clause.
The following SQL statement will produce exactly the same result:
With a high propality DB2 will carry out this SQL statement starting with the SELECT against SYSIBM.SYSCOLUMNS and retrieving all rows containing tables with more than five INTEGER columns. Each of these rows will be joined with SYSIBM.SYSTABLES. The big difference compared to the first example is that the number of joins are dramatically reduced. In the first example a join for every INTEGER column in the DB2 catalog will be carried out while in the last example only one join for each table containing at least five integer columns will be carried out.
The hard part is to locate some SQL statements that looks like the first example. These SQL statements must have at least one join and a GROUP BY with a HAVING clause. Without the HAVING clause this tip has no effect, because you will eliminate the rows in the WHERE clause. The trick is to avoid joins for rows that are eliminated as part of the HAVING clause.