MainframeSupport tip about Dynamic SQL statement CPU consumption and more

MainframeSupports
tip week 25/2013:

A little less than a year ago I wrote a tip about the dynamic statement cache in DB2 and how to extract information from it. In this tip I will tell you more about how you can use it to measure a specific SQL statement. I use this technique to determine if a tuning effort has the expected benefit.

In order to measure the CPU consumption of a SQL statement it is important to remember that the exact same SQL statement most likely will use more CPU on the first execution than the following ones. The first time you execute the statement data will be read into the bufferpool while the data are likely to remain in the bufferpool for the following executions depending on the time used between each execution.

Another very important detail is that if two SQL statement are textually equal to each other, then the statement cache will treat those two statements as the same statement. If you want a new row to occur in the statement cache for your statement, then you can add a constant in front of the select list which you can change each time you want a new row to occur in the statement cache. If your statement is not a SELECT you can put a QUERYNO 999 at the end of the statement and give it a new number when you want to have a new row in the statement cache.

When I want to measure a SELECT statement I typically use this approach:

I insert a constant in front of the SELECT list for instance '1.TIME'.
I execute the SELECT statement.
I change the constant to 'N.TIME'.
I execute the SELECT statement five times as quickly as possible.

Now I have created two new rows in the statement cache, one for the first execution which reads data into the bufferpool and one for the next five executions. I do it five times because many factors beyond our control can influence the CPU consumption. Executing the statement ten times makes the measurement more accurate, but I only do it if the statement is using a very small amount of CPU which increases the uncertainty of the CPU measurement even though it is measured in microseconds.

The next exiciting step is of course to extract information from the statement cache. Please read my previous tip on how to do this. When the statement cache is populated in DSN_STATEMENT_CACHE_TABLE, I will recommend you to use the following SELECT:

SELECT CHAR(STMT_TEXT, 30) STMT, STAT_EXEC EXECS, DECIMAL(STAT_CPU,10,6) CPU , STAT_GPAG GETPAGE, STAT_SYNR SYNCREAD , DECIMAL(STAT_ELAP,10,6) ELAPSED FROM DSN_STATEMENT_CACHE_TABLE WHERE CACHED_TS > CURRENT TIMESTAMP - 15 MINUTES ORDER BY CACHED_TS DESC

As the statement cache may contain many of the old statements I have executed, I have set a time limit so I only see new statements executed within the last 15 minutes. I can always adjust this time limit. The first rows returned is hopefully the row for the statement containing the constant N.TIME and the second row statement 1.TIME. You may see a DELETE FROM DSN_STATEMENT_CACHE as the second row if you have followed the guidelines from my previous tip. The first column contains the first 30 characters of the SQL statement present in the statement cache. You can extract more characters if you like, but the inserted constant value will be present within the first 30 characters. If the 1.TIME or the N.TIME row is missing it is probably because I executed exactly the same statement more than 15 minutes ago.

STMT_TEXT is a LOB column which may cause the expression CHAR(STMT_TEXT, 30) to fail for instance in SPUFI. Instead you can use SUBSTR(STMT_TEXT, 1, 30). In QMF it is the opposite problem where SUBSTR(STMT_TEXT, 1, 30) will cause problems while the CHAR version works fine. This difference has occured because the programmers behind these two products are not the same, and neither of them can handle LOB columns correctly.

The column EXECS (STAT_EXEC) specifies how many times the statement has been executed while it has resided in the statement cache. The contents will be 1 for 1.TIME and 5 for N.TIME if you have used the above guidelines. The column CPU will contain the total CPU consumption for all executions. You need to divide this figure with EXECS if you want to know what one execution costs. The column GETPAGE (STAT_GPAG) contains the total number of getpages for all executions. Getpages is the number of pages in the bufferpool visited by the statement. More visits to the same page counts one for each visit. The column SYNCREAD (STAT_SYNR) shows the number of synchronous reads. This figure should be zero for the N.TIME row otherwise you cannot compare the CPU consumption for one statement with another unless they have carried out the same number of SYNCREAD (which can be very hard to acheive). You perform the first time execution as a separate statement to get a zero in the SYNCREAD column. The column ELAPSED contains the total elapsed time used by DB2 and the application (SPUFI/QMF/whatever) for the execution.

When you have measured different SQL statements you will soon discover that there is a clear correlation between CPU consumption and getpages. More getpages means more CPU consumption. To reduce the number of getpages for a statement is the sure way to reduce its CPU consumption. You will probably also discover that both CPU and ELAPSED increases as soon as SYNCREAD is greater than 0. Here you clearly see the effect of the bufferpool.

Maybe you are wondering whether CPU consumption and other stuff to determine the access path is included in the figures from the statement cache. They are not included. It is only CPU, GETPAGE and so on for the execution after determining the access path that is included.

Previous tip in english Forrige danske tip Tip list

MainframeSupportstip week 25/2013:

MainframeSupports
tip week 25/2013: