MainframeSupports
tip week 11/2002:

Since version 4 of DB2 it has been possible to make GROUP BY on expressions. I will show you how in the following examples. Let's start with the general approach:

SELECT EXP_RESULT, COUNT(*)
FROM ( SELECT <EXPRESSION> AS EXP_RESULT
       FROM <TABLE>
       WHERE <SOMETHING>
     ) AS TEMP_TABLE
GROUP BY EXP_RESULT

In this SQL statement you replace <EXPRESSION> with the expression, you want your GROUP BY to operate on. Replace <TABLE> with the table or tables holding the data that makes up your expression and replace <SOMETHING> with a WHERE clause suitable for your needs. If you want to use a column function on another column in the result set you must include the column in the inner select, but the column function must only be done in the outer select.

And now to an example from an installation where the system name is a part of the table name, namely the first three characters. On this installation they want to know how many tables there are in each system and they also want to know an estimate of the number of rows of data for each system. They only want data from their own production tables. The result can be retrieved like this:

SELECT TAB.SYSTEM_NAME
     , COUNT(*) AS TABLE_COUNT
     , SUM(TAB.ROW_COUNT) AS ROW_COUNT
FROM ( SELECT SUBSTR(NAME, 1, 3) AS SYSTEM_NAME
            , CARD AS ROW_COUNT
       FROM SYSIBM.SYSTABLES
       WHERE CREATOR = 'PROD'
     ) AS TAB
GROUP BY TAB.SYSTEM_NAME

Before you start using this method for grouping data on expressions on large tables, it is a good idea to run an EXPLAIN to find out what access path DB2 will choose to satisfy the request. It is very likely that DB2 will create a temporary table to hold the result of the inner select, because there is an expression in the select list of the inner select. DB2 will carry out the GROUP BY on this temporary table. If the temporary table must hold many million rows of data it is possible that your SQL statement won't be finished within an acceptable period of time. Notice that in DB2 version 5 and later versions there is a new column called CARDF that replaces the function of CARD. The CARD column is still present and is still maintained by RUNSTATS even though the documentation doesn't state this.

Previous tip in english        Sidste danske tip        Tip list