MainframeSupport tip about Join of columns with different definitions

MainframeSupports
tip week 03/2006:

Today it is almost impossible to execute a SQL SELECT statement without having to join at least two tables. Actually DB2 is pretty good and fast at joining tables when the join columns have the same definitions. CHAR and VARCHAR is considered to be of the same type, INTEGER and SMALLINT are also considered to be of the same type. You might still experience problems between INTEGER and DECIMAL and between different types of DECIMAL. There are a lot of improvements in this area in DB2 version 8, so check it out. Joining numbers with characters will remain a challenge, so please read on.

Some people (mainly IT architects) will say, that joining columns of different definitions never happens because the design of the data model or the services/components will take care of such problems. Real world experience tells a different story. Let us assume a customer table with a unique key called customerNo defined as DECIMAL(10). Years have passed and new systems emerge around the customer table and for different reasons another table called internetOrder is created with a customerNo defined as CHAR(10).

After a while where both tables are used in production someone discovers the need for joining the internetOrder and the customer table. This join may occur in a program or in a simple adhoc query. The SQL statement may look like:

SELECT C.CUSTOMERNO, C.ZIPCODE, I.ORDERNO, I.ORDERTYPE FROM CUSTOMER C, INTERNETORDER I WHERE C.CUSTOMERNO = I.CUSTOMERNO AND C.INTERNETCUSTOMER = 'Y'

This simple SQL statement executes for a very long time. Either DB2 selects a merge scan join (best accesspath in this case) or a nested loop join (the query from hell). Nested loop join will be carried out as a tablespace scan of the outer table and a tablespace scan on the inner table for each qualifying row from the outer table. We can add indexes, we can make nested table expressions or perform other tricks, but they will not help. This join will suffer almost forever because of the different definitions of customerNo.

The solution can be to code the join in a program, but you can also use a global temporary table. In this case I will declare it like:

CREATE GLOBAL TEMPORARY TABLE INTERNETCUSTOMER (CUSTOMERNO CHAR(10) NOT NULL, ZIPCODE CHAR(10))

The column ZIPCODE is declared using the same definition as in table CUSTOMER. Now you only need to rephrase the previous SQL SELECT statement into the following two SQL statements:

INSERT INTO INTERNETCUSTOMER SELECT DIGITS(CUSTOMERNO), ZIPCODE FROM CUSTOMER WHERE INTERNETCUSTOMER = 'Y' ; SELECT C.CUSTOMERNO, C.ZIPCODE, I.ORDERNO, I.ORDERTYPE FROM INTERNETCUSTOMER C, INTERNETORDER I WHERE C.CUSTOMERNO = I.CUSTOMERNO ;

The first of these two SQL statements creates a temporary table with the definition declared earlier and fills it with rows for all the customers being internet customers. In the select list we convert from DECIMAL(10) to CHAR(10) using the DIGITS function. If we had to convert from CHAR to DECIMAL it is possible to use the DECIMAL function.

The second SQL statement will make the extract. If there is an index for column customerNo on table internetOrder this SQL statement will finish in a short time. A satisfactory result will heavily depend on whether the customerNo in table InternetOrder has leading zeroes or not and contains numeric characters only or not.

The advantages of using a global temporary table in this case is that it can be used over and over again, you can run explain against it to check the access path, and you do not have to worry about cleaning up the rows, because they disappear at the next COMMIT or ROLLBACK. This last detail will cause you a lot of grief in QMF, because QMF always perform a COMMIT of ROLLBACK after each executed SQL statement, at least as far as I know.

Previous tip in english Sidste danske tip Tip list

MainframeSupportstip week 03/2006:

MainframeSupports
tip week 03/2006: