Way back in 2013 I wrote a tip about joining several rows into one. My idea used left join for the task and was thus limited in the number of rows it could join. In the same tip I noted that a recusive solution may be possible, but I had not used time to find out how to do it. Recently I got a task where I could use the same idea, but this time I decided to make it recursive. I think it is about time to share the recursive solution with you.
I will use the same problem as in the previous tip so you can see the difference. Just to repeat the problem is to fetch one row for each table in the system catalogue where the row contains both the table name and the names of all columns in the primary key. The recursive solution looks like this:
WITH RECURSIVERESULT (CREATOR, TABLE, KEYSEQ, PRIMARY_KEY_COLUMNS) AS ( SELECT T.CREATOR, T.NAME, C.KEYSEQ, CAST(STRIP(C.NAME) AS VARCHAR(254)) FROM SYSIBM.SYSTABLES T, SYSIBM.SYSCOLUMNS C WHERE T.CREATOR = 'SYSIBM' AND T.NAME LIKE 'SYS%' AND T.TYPE = 'T' AND T.KEYCOLUMNS > 0 AND T.CREATOR = C.TBCREATOR AND T.NAME = C.TBNAME AND C.KEYSEQ = 1 UNION ALL SELECT R.CREATOR, R.TABLE, C.KEYSEQ , R.PRIMARY_KEY_COLUMNS || ' ' || STRIP(C.NAME) FROM RECURSIVERESULT R, SYSIBM.SYSCOLUMNS C WHERE R.CREATOR = C.TBCREATOR AND R.TABLE = C.TBNAME AND R.KEYSEQ + 1 = C.KEYSEQ ) SELECT TABLE, PRIMARY_KEY_COLUMNS FROM RECURSIVERESULT R WHERE KEYSEQ = ( SELECT MAX(KEYSEQ) FROM RECURSIVERESULT WHERE R.CREATOR = CREATOR AND R.TABLE = TABLE ) ORDER BY TABLE
Compared to the previous tip the above SQL looks quite complicated. On the other hand it will include all column names no matter how many there are. There is though a limitation of 254 characters in the composite key column names. This number is adjustable by changing VARCHAR(254) to for instance VARCHAR(1000) or something even larger. In the final SELECT from the collective recursive result in RECURSIVERESULT containing all rows from every single recursion, it is only the row containing all primary key column names which is interesting. These rows are identifiable by having the largest value of KEYSEQ. Thus the WHERE clause contains a sub-select in order to fetch only the rows having MAX(KEYSEQ) for each table in the final result.
In the above example it is straightforward to identify the row in RECURSIVERESULT where all rows are joined into the final row. And at the same time it is easy by using KEYSEQ to identify the next recursion. If you have a problem where you also want to join several rows into one, but a column like KEYSEQ is not available you can use ROWNUMBER() OVER(PARTITION BY col1, col2, ...) to generate your personal KEYSEQ. And you also need a work table to hold your generated KEYSEQ column. The above example would look like this if KEYSEQ were not available:
WITH WORKRESULT (CREATOR, TABLE, KEYSEQ, KEYCOL) AS ( SELECT T.CREATOR, T.NAME , ROWNUMBER() OVER(PARTITION BY T.CREATOR, T.NAME ) , CAST(STRIP(C.NAME) AS VARCHAR(254)) FROM SYSIBM.SYSTABLES T, SYSIBM.SYSCOLUMNS C WHERE T.CREATOR = 'SYSIBM' AND T.NAME LIKE 'SYS%' AND T.TYPE = 'T' AND T.KEYCOLUMNS > 0 AND T.CREATOR = C.TBCREATOR AND T.NAME = C.TBNAME AND C.KEYSEQ > 0 ), RECURSIVERESULT (CREATOR, TABLE, KEYSEQ, PRIMARY_KEY_COLUMNS) AS ( SELECT CREATOR, TABLE, KEYSEQ, KEYCOL FROM WORKRESULT WHERE KEYSEQ = 1 UNION ALL SELECT R.CREATOR, R.TABLE, W.KEYSEQ , R.PRIMARY_KEY_COLUMNS || ' ' || W.KEYCOL FROM RECURSIVERESULT R, WORKRESULT W WHERE R.CREATOR = W.CREATOR AND R.TABLE = W.TABLE AND R.KEYSEQ + 1 = W.KEYSEQ ) SELECT TABLE, PRIMARY_KEY_COLUMNS FROM RECURSIVERESULT R WHERE KEYSEQ = ( SELECT MAX(KEYSEQ) FROM RECURSIVERESULT WHERE R.CREATOR = CREATOR AND R.TABLE = TABLE ) ORDER BY TABLE
This SQL statement will yield almost the same result as the first one. Only difference is the order of the columns in PRIMARY_KEY_COLUMNS, which will probably be another as I am not using any column to determine the correct order. It might be the case that in your problem you do not have any information about the correct order of the rows you are joining. In the above example you can get the original order back by appending ORDER BY C.KEYSEQ within the OVER parenthesis after the PARTITION expression.