MainframeSupport tip about Join of datasets

MainframeSupports
tip week 22/2018:

Joining data in a SQL based database system like DB2 is business as usual, but to join data in datasets is a challenge. Those of you who had the need did probably load the data into some DB2 tables and did the join in SQL, but there is actually no need to do so. SORT has for some time (I do not know for how long, but probably for the last 5 to 10 years) provided us with functionality to join data in datasets. And as usual when SORT is involved a join is super fast.

To show how it works it does not take a lot of data so I will present you with at little example which you can copy, paste and run:

//JOINING  EXEC PGM=SORT
//SORTJNF1 DD *
OTHERKEY1 KEY1 OTHERDATA A
OTHERKEY2 KEY1 OTHERDATA B
OTHERKEY3 KEY2 OTHERDATA C
OTHERKEY4 KEY2 OTHERDATA D
OTHERKEY5 KEY2 OTHERDATA E
OTHERKEY6 KEY2 OTHERDATA F
OTHERKEY6 KEY3 OTHERDATA F
//SORTJNF2 DD *
KEY1 SOMEDATA
KEY1 MOREDATA
KEY2 OTHERDATA
KEY4 OTHERDATA
//SORTOUT  DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SYSIN    DD *
    JOINKEYS FILE=F1,FIELDS=(11,4,A)
    JOINKEYS FILE=F2,FIELDS=(1,4,A)
    JOIN UNPAIRED,F1
    REFORMAT FIELDS=(F1:1,27,F2:1,15),FILL=C'-'
    SORT FIELDS=COPY
/*
//

Instead of the usual SORTIN DD name you must use SORTJNF1 for the first dataset and SORTJNF2 for the second one. It is not possible to join more than two datasets in one execution. In return you can use three different SORT commands to perform a join and you must use at least two of them. The JOINKEYS command is used to specify which positions need to match in the two datasets in order to be joined by SORT. You must specify two JOINKEYS commands, one for each SORTJNF* DD name. The positions in the two datasets to be joined does not need to be the same just as in the above example.

The second command to be used is REFORMAT, which specifies which data (given by positions) will be handed over to normal SORT processing from the two joined files. The result of a join in SORT is to be considered as the data you normally will find in the SORTIN DD name. This is why the possibilities of data manipulation are quite limited in REFORMAT as you can use INREC and OUTREC to reformat data. And as you can see from the example you must provide a SORT command. In the example I want positions 1-27 from SORTJNF1 to appear in SORTOUT. Appended to these data I want positions 1-15 from SORTJNF2.

If you only specify JOINKEYS and REFORMAT you will end up with a result corresponding to a inner join in SQL. With the JOIN command you can create outer joins. In the above example UNPAIRED,F1 specifies that records in F1 which does not match any records in F2 will be part of the result. FILL on the REFORMAT command specifies which values to replace the positions coming from F2 as there are no data in F2. The result on SORTOUT after execution of the above step will look like:

OTHERKEY1 KEY1 OTHERDATA A KEY1 SOMEDATA
OTHERKEY1 KEY1 OTHERDATA A KEY1 MOREDATA
OTHERKEY2 KEY1 OTHERDATA B KEY1 SOMEDATA
OTHERKEY2 KEY1 OTHERDATA B KEY1 MOREDATA
OTHERKEY3 KEY2 OTHERDATA C KEY2 OTHERDATA
OTHERKEY4 KEY2 OTHERDATA D KEY2 OTHERDATA
OTHERKEY5 KEY2 OTHERDATA E KEY2 OTHERDATA
OTHERKEY6 KEY2 OTHERDATA F KEY2 OTHERDATA
OTHERKEY6 KEY3 OTHERDATA F ---------------

As you can see the result completely matches the result you would have obtained in SQL. There was a single record on F1 which did not match any record in F2 so the positions normally coming from F2 has been filled with dashes. I must admit I am totally crazy about the join possibilities in SORT. They have made me do things I previously would have to do a lot of coding to achieve. And it runs so extremely fast. I hope you will be just as excited about join in SORT as I am.

Previous tip in english Forrige danske tip Tip list

MainframeSupportstip week 22/2018:

MainframeSupports
tip week 22/2018: