The two predominant reasons for changing DB2 Catalog statistics to influence access paths are to try to get DB2 to use an index and to influence DB2 to change the order in which tables are joined. In each case, the tuning methods require that you “play around” with the DB2 Catalog statistics to create a lower filter factor. You should keep in mind five rules when doing so.
Rule 1: As first key cardinality (FIRSTKEYCARDF) increases, the filter factor decreases. As the filter factor decreases, DB2 is more inclined to use an index to satisfy the SQL statement.
Rule 2: As an index becomes more clustered, you increase the probability that DB2 will use it. To enhance the probability of an unclustered index being used, increase its cluster ratio (CLUSTERRATIOF) to a value between 96 and 100, preferably 100.
So understanding these rules, if you wish to influence DB2 to use an index by changing statistics, consider adjusting the COLCARDF,FIRSTKEYCARDF, and FULLKEYCARDFcolumns to an artificially high value. As cardinality increases, the filter factor decreases. As the filter factor decreases, the chance that DB2 will use an available index becomes greater. DB2 assumes that a low filter factor means that only a few rows are being returned, causing indexed access to be more efficient. Adjusting COLCARDF, FIRSTKEYCARDF, and FULLKEYCARDF may also be useful for getting DB2 to choose an unclustered index because DB2 is more reluctant to use an unclustered index with higher filter factors. You also can change the value of CLUSTERRATIOF to 100 to remove DB2’s reluctance to use unclustered indexes from the access path selection puzzle.
Rule 3: In a join, DB2’s choice for inner and outer tables is a delicate trade-off. Because the inner table is accessed many times for each qualifying outer table row, it should be as small as possible to reduce the time needed to scan multiple rows for each outer table row. The more inner table rows, the longer the scan. But the outer table should also be as small as possible to reduce the overhead of opening and closing the internal cursor on the inner table.
It is impossible to choose the smallest table as both the inner table and the outer table. When two tables are joined, one must be chosen as the inner table, and the other must be chosen as the outer table. My experience has shown that as the size of a table grows, the DB2 optimizer favors using it as the outer table in a nested loop join. Therefore, changing the cardinality (CARDF) of the table that you want as the outer table to an artificially high value can influence DB2 to choose that table as the outer table.
Rule 4: As column cardinality (COLCARDF) decreases, DB2 favors the use of the nested loop join over the merge scan join. Lowering the value of COLCARDF can be used to try to favor the nested loop join.
Rule 5:HIGH2KEY and LOW2KEY can be altered to more accurately reflect the overall range of values stored in a column. This is particularly useful for influencing access path selection for data with a skewed distribution.
The combination of HIGH2KEY and LOW2KEYprovides a range of probable values accessed for a particular column. The absolute highest and lowest values are discarded to create a more realistic range. For certain types of predicates, DB2 uses the following formula when calculating filter factor:
Filter Factor = (Value-LOW2KEY) / (HIGH2KEY-LOW2KEY)
Because HIGH2KEY and LOW2KEYcan affect the size of the filter factor, the range of values that they provide can significantly impact access path selection.
For troublesome queries, check whether the distribution of data in the columns accessed is skewed. If you querySYSIBM.SYSCOLDIST, the most frequently occurring values are shown for indexed columns. To be absolutely accurate, however, obtain a count for each column value, not just the top values collected by RUNSTATS using a query such as:
SELECT COL, COUNT(*)
FROM your.table
GROUP BY COLORDER BY COL;
This query produces an ordered listing of column values. You can use this list to determine the distribution of values. If a few values occur much more frequently than the other values, the data is not evenly distributed. In this circumstance, consider using dynamic SQL, hard coding predicate values, or binding with REOPT(ALWAYS). This enables DB2 to use nonuniform distribution statistics when calculating filter factors.
If neither dynamic SQL nor hard-coded predicates are practical, you might try to change HIGH2KEY to a lower value and/or LOW2KEY to a higher value to reduce the range of possible values, thereby lowering the filter factor. Alternatively, or additionally, you can increase COLCARDF, FIRSTKEYCARDF, and FULLKEYCARDF.
Remember that modifying DB2 Catalog statistics is not a trivial exercise. Simply making the changes indicated in this section might be insufficient to resolve your performance problems because of DB2’s knowledge of the DB2 Catalog statistics. Some statistical values have implicit relationships. When one value changes, DB2 assumes that the others have changed also. For example, consider these relationships:
- When you change COLCARDF for a column in an index, be sure to also change the FIRSTKEYCARDF of any index in which the column participates as the first column of the index key, and the FULLKEYCARDF of any index in which the column participates.
- Provide a value to both HIGH2KEY and LOW2KEY when you change cardinality information. When COLCARDF is not –1, DB2 assumes that statistics are available. DB2 factors these high and low key values into its access path selection decision. Failure to provide both a HIGH2KEY and a LOW2KEY can result in the calculation of inaccurate filter factors and the selection of inappropriate access paths.
Before deciding to update DB2 Catalog statistics to force DB2 to choose different access paths, be sure that you never change the DB2 Catalog statistics without documenting the following:
- Why the statistics will be modified
- How the modifications will be made and how frequently the changes must be run
- The current values for each statistic and the values they will be changed to
Additionally, be aware that when you change DB2 Catalog statistics, you are robbing from Peter to pay Paul. In other words, your changes might enhance the performance of one query at the expense of the performance of another query. DB2 maintenance (PTFs, new releases, and new versions) might change the access path selection logic in the DB2 optimizer. As a result of applying maintenance, binding or rebinding static and dynamic SQL operations could result in different access paths, thereby invalidating your hard work. In other words, IBM might get around to correcting the problem in the logic of the optimizer (that you solved using trickery).
Choosing the correct values for the statistics and keeping the statistics accurate can be an intimidating task. Do not undertake this endeavor lightly. Plan to spend many hours changing statistics, rebinding plans, changing statistics again, rebinding again, and so on.
The situation that caused the need to tinker with the statistics in the DB2 Catalog could change. For example, the properties of the data could vary as your application ages. Distribution, table and column cardinality, and the range of values stored could change. If the statistics are not changing because they have been artificially set outside the jurisdiction of RUNSTATS, these newer changes to the data cannot be considered by the DB2 optimizer, and an inefficient access path could be used indefinitely.
When DB2 Catalog statistics have been changed to influence access path selection, it is a good idea to periodically executeRUNSTATS and rebind to determine if the artificial statistics are still required. If they are, simply reissue the DB2 Catalog UPDATE statements. If not, eliminate this artificial constraint from your environment. Failure to implement this strategy eventually results in inefficient access paths in your environment (as DB2 and your applications mature).
This blog post was adapted from material in Craig's best-selling book on DB2 for z/OS, DB2 Developer's Guide. If you are looking for more in-depth tuning, performance, and administration guidelines for your mainframe DB2 environment, be sure to buy yourself a copy!