MainframeSupport tip about GROUP BY functionality on datasets

MainframeSupports
tip week 31/2015:

I hope you know and understand how GROUP BY in SQL works. If not it will be a little hard to understand the idea behind this tip. Imagine you want to execute some GROUP BY like functionality on a dataset or member. Of course you can choose to create a DB2 table and load the dataset into this table, but it is very likely to be much faster to use DFSORT for the purpose.

In the following example I have a dataset of RECFM=FB and LRECL=80 (very common). In the first 40 positions I have the data to GROUP BY and I want to know how many records there are in each group. The last 40 positions in the file does not matter in this context. The task can be solved using the following step:

//GROUPBY EXEC PGM=SORT //SORTIN DD DISP=SHR,DSN=MY.ORDINARY.DATASET //SORTOUT DD DISP=(NEW,CATLG),SPACE=(TRK,(99,99),RLSE), // DSN=MY.GROUP.BY //SYSOUT DD SYSOUT=* //SYSIN DD * INREC FIELDS=(1,40,C' 000000001',30X) SORT FIELDS=(1,40,CH,A) SUM FIELDS=(42,9,ZD) /*

By using INREC I build the layout of the data I want to make a sort and a sum on. Each record should only count as one in their corresponding group and thus I concatenate the key by a readable value of one. In order to separate the group value from the sum/count I have inserted a space. DFSORT can be touchy about conservation of the total record length so I add 30 spaces in order to keep the LRECL of 80. The SORT function is carried out on the layout of the record after it has been formatted by INREC. I make the SORT work on the positions that forms the group. It is quite easy to sort on parts of the 40 positions in the group if this should turn out to be interesting. The SUM function creates a sum of all the one-numbers in position 42 to 50 in each group.

The result is available in the dataset MY.GROUP.DATASET. As you may have noticed SORT is executing extremely fast so it will not take many seconds to produce the result even on large amounts of data. An on top of that it is very easy to change the SORT parameter if you want another grouping. It is not as readable as a SQL SELECT statement with a GROUP BY, but it is much faster than first to create a table and load data.

You can of course make sums on any numeric fields already present in the dataset, but you do not need to have any numeric fields to make SUM on. You might just want the unique value of each group without knowing how many records there are in each group. Then you can use a SUM FIELDS=NONE and then you do not need a INREC to create a field to sum on.

Previous tip in english Forrige danske tip Tip list

MainframeSupportstip week 31/2015:

MainframeSupports
tip week 31/2015: