
MainframeSupports tip week 03/2006:
Today it is almost impossible to execute a SQL SELECT statement without having to
join at least two tables. Actually DB2 is pretty good and fast at joining tables
when the join columns have the same definitions. CHAR and VARCHAR is considered
to be of the same type, INTEGER and SMALLINT are also considered to be of the
same type. You might still experience problems between INTEGER and DECIMAL and
between different types of DECIMAL. There are a lot of improvements in this area
in DB2 version 8, so check it out. Joining numbers with characters will remain a
challenge, so please read on.
Some people (mainly IT architects) will say, that joining columns of different
definitions never happens because the design of the data model or the services/components
will take care of such problems. Real world experience tells a different story. Let us
assume a customer table with a unique key called customerNo defined as DECIMAL(10). Years
have passed and new systems emerge around the customer table and for different reasons
another table called internetOrder is created with a customerNo defined as CHAR(10).
After a while where both tables are used in production someone discovers the need for
joining the internetOrder and the customer table. This join may occur in a program or
in a simple adhoc query. The SQL statement may look like:
SELECT C.CUSTOMERNO, C.ZIPCODE, I.ORDERNO, I.ORDERTYPE
FROM CUSTOMER C, INTERNETORDER I
WHERE C.CUSTOMERNO = I.CUSTOMERNO
AND C.INTERNETCUSTOMER = 'Y'
This simple SQL statement executes for a very long time. Either DB2 selects
a merge scan join (best accesspath in this case) or a nested loop join (the query from
hell). Nested loop join will be carried out as a tablespace scan of the
outer table and a tablespace scan on the inner table for each qualifying row from
the outer table. We can add indexes, we can make nested table expressions or perform
other tricks, but they will not help. This join will suffer almost forever because of
the different definitions of customerNo.
The solution can be to code the join in a program, but you can also use a
global temporary table. In this case I will declare it like:
CREATE GLOBAL TEMPORARY TABLE INTERNETCUSTOMER
(CUSTOMERNO CHAR(10) NOT NULL, ZIPCODE CHAR(10))
The column ZIPCODE is declared using the same definition as in table CUSTOMER. Now you
only need to rephrase the previous SQL SELECT statement into the following two SQL
statements:
INSERT INTO INTERNETCUSTOMER
SELECT DIGITS(CUSTOMERNO), ZIPCODE
FROM CUSTOMER
WHERE INTERNETCUSTOMER = 'Y'
;
SELECT C.CUSTOMERNO, C.ZIPCODE, I.ORDERNO, I.ORDERTYPE
FROM INTERNETCUSTOMER C, INTERNETORDER I
WHERE C.CUSTOMERNO = I.CUSTOMERNO
;
The first of these two SQL statements creates a temporary table with the definition
declared earlier and fills it with rows for all the customers being internet customers.
In the select list we convert from DECIMAL(10) to CHAR(10) using the DIGITS function. If
we had to convert from CHAR to DECIMAL it is possible to use the
DECIMAL function.
The second SQL statement will make the extract. If there is an index for column
customerNo on table internetOrder this SQL statement will finish in a short time. A
satisfactory result will heavily depend on whether the customerNo in table InternetOrder
has leading zeroes or not and contains numeric characters only or not.
The advantages of using a global temporary table in this case is that it can be used over
and over again, you can run explain against it to check the access path, and you do not
have to worry about cleaning up the rows, because they disappear at the next COMMIT or
ROLLBACK. This last detail will cause you a lot of grief in QMF, because QMF always perform
a COMMIT of ROLLBACK after each executed SQL statement, at least as far as I know.
Previous tip in english
Sidste danske tip
Tip list
|