
Tuesday, November 11, 2025

Don't Forget About Histogram Statistics in Db2 for z/OS

There are a lot of options and parameters to consider when putting together your RUNSTATS jobs to collect statistics on your Db2 data. For those who do not know, RUNSTATS is a Db2 for z/OS utility program that collects information about the characteristics of tables and indexes and stores that data in the Db2 Catalog, where it can be used by the Db2 Optimizer to formulate SQL access paths.

Without reasonable statistics, as collected by RUNSTATS, you will likely experience SQL performance problems because Db2 will be accessing the data based on incorrect or outdated information about your data. As I have discussed before, RUNSTATS is one of The 5 R's of Db2 Performance.

OK, with that introductory bit out of the way, let's look at a capability of RUNSTATS, first introduced in Db2 9 for z/OS, that is somewhat underutilized. As you might guess from the title of this blog post, I am talking about histogram statistics.

But what is a histogram? Well, in mathematical terms, a histogram is a representation of data that shows the distribution of values in a data set. It is often drawn as a bar graph, in which each bar represents the frequency of data points within a specific range. However, unlike a typical bar chart (which displays categorical data), a histogram groups numbers into ranges, making it useful for identifying patterns such as the underlying frequency distribution, outliers, and skew.


So, a histogram is a way of summarizing data that’s measured on an interval scale. A histogram is particularly helpful when you want:

  • to highlight how data is distributed, 
  • to determine if data is symmetrical or skewed, or 
  • to indicate whether or not outliers exist.

The histogram is appropriate only for variables whose values are numerical and measured on an interval scale. To be complete, let’s define interval, which is a set of real numbers between two numbers either including or excluding one or both of them.  


Histograms are generally most useful when dealing with large data sets.


Why Use Histogram Statistics?


Db2 histogram statistics can be quite useful for certain types of data. Unlike frequency statistics, which are collected for only a subset of the values, histogram statistics are collected over all values in a table space, and sometimes Db2 can improve access path selection by estimating predicate selectivity from them.


Histogram statistics can help the Db2 Optimizer estimate filtering when certain data ranges are heavily populated and others are sparsely populated. Think about data that represents customers of a diner. Typically, there will be periods of high activity (breakfast, lunch, dinner) and periods of low activity. Or, on a grander scale, consider data that represents customer activity at a big box store. During big sales (such as Black Friday) and over holidays (Christmas, for example), there will be more sales than during other periods.


Depending on the type and makeup of the data, it can make sense to collect histogram statistics to improve access paths for troublesome queries used with the following predicates: 

  • RANGE
  • LIKE, and 
  • BETWEEN

Histogram statistics also can help in some cases for equality (=), IS NULL, IN list, and COL op COL predicates.
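For example, consider a range query like this sketch (the table and column names are hypothetical). If VISIT_TIME values cluster heavily around meal times, histogram statistics give the Optimizer a far better estimate of how many rows satisfy the BETWEEN predicate than an assumption of uniform distribution would:

SELECT CUST_ID, VISIT_TIME
FROM CUSTOMER_VISITS
WHERE VISIT_TIME BETWEEN '11:00:00' AND '13:30:00';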

 

How to Collect Histogram Statistics

 

OK, so how can you go about collecting histogram statistics? Well, the IBM RUNSTATS utility can collect statistics by quantiles. A quantile is the name for a bucket representing a range of data. For instance, from our diner example discussed earlier, you might want to have quantiles representing pre-breakfast, breakfast, pre-lunch, lunch, pre-dinner, dinner, and late night.


Db2 allows up to 100 quantiles. You can specify how many quantiles Db2 is to use, from 1 to 100. Of course, specifying 1 isn't wise, because a single quantile covers the entire range of values and therefore tells the Optimizer nothing about the distribution.


You do not explicitly define the make-up of quantiles, though. Db2 does that. Db2 creates equal-depth histogram statistics, dividing the whole range of values into quantiles that each contain about the same percentage of the total number of rows. 


You indicate that RUNSTATS should collect histogram statistics by coding the HISTOGRAM keyword in conjunction with the COLGROUP option. This way you can collect histogram statistics for a group of columns. You also must tell Db2 the number of quantiles to collect by specifying the NUMQUANTILES parameter. NUMQUANTILES also can be specified with the INDEX parameter, in which case it indicates that histogram statistics are to be collected for the columns of the index.
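For example, a RUNSTATS specification along the following lines (the database, table space, table, and column names are hypothetical) collects histogram statistics for a two-column column group:

RUNSTATS TABLESPACE DBX.TSX
  TABLE(MYSCHEMA.CUSTOMER_VISITS)
  COLGROUP(VISIT_DATE, VISIT_TIME)
  HISTOGRAM NUMQUANTILES 10
  SHRLEVEL(CHANGE)
  UPDATE ALL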

 

A single value can never be broken into more than one interval. This means the maximum number of intervals is equal to the number of distinct column values. For example, if you have 40 distinct values, you can have no more than 40 quantiles, each consisting of a single value. In other words, be sure that you don’t specify a value for NUMQUANTILES that is greater than the total number of distinct values for the column (or column group) specified. Also, keep in mind that any NULLs will occupy a single interval.

 

So, then, how do you decide on the number of quantiles to collect? If you don’t specify NUMQUANTILES, the default value of 100 will be used; then based on the number of records in the table, Db2 will adjust the number of quantiles to an optimal number. Therefore, unless you have a good understanding of the application or a viable reason to deviate, a reasonable rule of thumb is to simply let NUMQUANTILES default and let Db2 work it out.

 

RUNSTATS will attempt to produce an equal-depth histogram. This means each interval will have about the same number of rows. Please note that this doesn’t mean the same number of values—it’s the same number of rows. In some cases, a highly frequent single value could potentially occupy an interval all by itself.

 

Where Are the Histogram Statistics Stored?

 

The following columns in a histogram statistics table define the interval: 

  • QUANTILENO - a sequence number that identifies the interval
  • HIGHVALUE - a value specifying the upper bound of the interval
  • LOWVALUE - a value specifying the lower bound of the interval

These columns can be found in the following six Db2 Catalog tables: 

  • SYSCOLDIST, 
  • SYSKEYTGTDIST, 
  • SYSCOLDIST_HIST, 
  • SYSCOLDISTSTATS, 
  • SYSKEYTGTDIST_HIST, and 
  • SYSKEYTGTDISTSTATS.
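
You can examine the collected histogram statistics by querying the catalog directly. Here is a sketch (assuming histogram statistics exist for a hypothetical table MYSCHEMA.CUSTOMER_VISITS; in SYSCOLDIST, TYPE = 'H' identifies the histogram rows):

SELECT NAME, QUANTILENO, LOWVALUE, HIGHVALUE,
       FREQUENCYF, CARDF
FROM SYSIBM.SYSCOLDIST
WHERE TBOWNER = 'MYSCHEMA'
  AND TBNAME = 'CUSTOMER_VISITS'
  AND TYPE = 'H'
ORDER BY NAME, QUANTILENO;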

 

Here’s a sample RUNSTATS specification that collects histogram statistics for the key columns of the indexes on tables in the CSMTS02 table space:

 

RUNSTATS TABLESPACE DB.CSMTS02
  INDEX ALL
  HISTOGRAM NUMCOLS 2 NUMQUANTILES 10
  SHRLEVEL(CHANGE)
  UPDATE ALL
  REPORT YES

 

Summary

 

If you have troublesome queries on data that is skewed or disproportionately distributed, you should consider gathering histogram statistics. Keep in mind that running RUNSTATS specifying HISTOGRAM will probably consume more CPU than just gathering frequency distribution statistics, but the performance gains from improved access paths for your queries could more than offset that cost.

Thursday, June 25, 2020

Db2 12 for z/OS Function Level 507

This month, June 2020, IBM introduced a new function level, FL507, for Db2 12 for z/OS. This is the first new function level this year, and the first since October 2019. The function level process was designed to release Db2 functionality using Continuous Delivery (CD) in short, quick bursts. However, it seems that the global COVID-19 pandemic slowed things a bit… and that, of course, is understandable. But now we have some new Db2 for z/OS capabilities to talk about for the first time in a little while!

There are four significant impacts of this new function level:

  • Application granularity for locking limits
  • Deletion of old statistics from the Db2 Catalog when using profiles
  • CREATE OR REPLACE capability for stored procedures
  • Passthrough-only expressions with IBM Db2 Analytics Accelerator (IDAA)

Let’s take a quick look at each of these new things.

The first new capability is the addition of application granularity for locking limits. Up until now, the only way to control locking limits was with the NUMLKUS and NUMLKTS subsystem parameters, which apply to the entire subsystem.

NUMLKTS defines the threshold for the number of page locks that can be concurrently held for any single table space by any single DB2 application (thread). When the threshold is reached, DB2 escalates all page locks for objects defined as LOCKSIZE ANY according to the following rules:

  • All page locks held for data in segmented table spaces are escalated to table locks.
  • All page locks held for data in partitioned table spaces are escalated to table space locks.

NUMLKUS defines the threshold for the total number of page locks across all table spaces that can be concurrently held by a single DB2 application. When any given application attempts to acquire a lock that would cause the application to surpass the NUMLKUS threshold, the application receives a resource unavailable message (SQLCODE of -904).

Well, now we have two new built-in global variables to support application granularity for locking limits. 

The first is SYSIBMADM.MAX_LOCKS_PER_TABLESPACE and it is similar to the NUMLKTS parameter. It can be set to an integer value for the maximum number of page, row, or LOB locks that the application can hold simultaneously in a table space. If the application exceeds the maximum number of locks in a single table space, lock escalation occurs.

The second is SYSIBMADM.MAX_LOCKS_PER_USER and it is similar to the NUMLKUS parameter. You can set it to an integer value that specifies the maximum number of page, row, or LOB locks that a single application can concurrently hold for all table spaces. The limit applies to all table spaces that are defined with the LOCKSIZE PAGE, LOCKSIZE ROW, or LOCKSIZE ANY options. 
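Because these are built-in global variables, an application can set its own limits with simple SET statements; for example (a sketch, with arbitrary limit values):

SET SYSIBMADM.MAX_LOCKS_PER_TABLESPACE = 5000;
SET SYSIBMADM.MAX_LOCKS_PER_USER = 20000;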

The next new capability is the deletion of old statistics when using profiles. When you specify the USE PROFILE option with RUNSTATS, Db2 collects only those statistics that are included in the specified profile. Once function level 507 is activated, Db2 will delete any existing statistics for the object(s) that are not part of the profile. This means that all frequency, key cardinality, and histogram statistics that are not included in the profile are deleted from the Db2 Catalog for the target object. 

This is a welcome new behavior because it makes it easier to remove old and stale distribution statistics. Keep in mind that this new behavior also applies when you use profiles to gather inline statistics with the REORG TABLESPACE and LOAD utilities.
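For example, assuming a statistics profile was previously established for a hypothetical table MYSCHEMA.CUSTOMER_VISITS (via the SET PROFILE option), subsequent collections would look something like this, and with FL507 active any existing statistics not included in the profile are deleted:

RUNSTATS TABLESPACE DBX.TSX
  TABLE(MYSCHEMA.CUSTOMER_VISITS)
  USE PROFILE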

Another great new capability, one that stored procedure users have been waiting on for some time now, is the ability to specify CREATE OR REPLACE for procedures. This means that you do not have to first DROP a procedure if you want to modify it. You can simply specify CREATE OR REPLACE PROCEDURE: if the procedure already exists it will be replaced, and if not, it will be created. This capability has been available in other DBMS products that support stored procedures for a while, and it is good to see it come to Db2 for z/OS!

Additionally, for native SQL procedures, you can use the OR REPLACE clause on a CREATE PROCEDURE statement in combination with a VERSION clause to replace an existing version of the procedure, or to add a new version of the procedure. When you reuse a CREATE statement with the OR REPLACE clause to replace an existing version or to add a new version of a native SQL procedure, the result is similar to using an ALTER PROCEDURE statement with the REPLACE VERSION or ADD VERSION clause. If the OR REPLACE clause is specified on a CREATE statement and a procedure with the specified name does not yet exist, the clause is ignored and a new procedure is still created.
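Here is a sketch of what this looks like for a native SQL procedure (the procedure, table, and column names are hypothetical):

CREATE OR REPLACE PROCEDURE MYSCHEMA.UPDATE_SALARY
    (IN P_EMPNO CHAR(6),
     IN P_RATE  DECIMAL(6,2))
  VERSION V2
  LANGUAGE SQL
BEGIN
  -- Apply the raise; assumes a hypothetical MYSCHEMA.EMP table
  UPDATE MYSCHEMA.EMP
     SET SALARY = SALARY * P_RATE
   WHERE EMPNO = P_EMPNO;
END

If MYSCHEMA.UPDATE_SALARY does not exist, it is created; if it does exist, version V2 is replaced (or added, if V2 is new).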

And finally, we have support for passthrough-only expressions to IDAA. This is needed because you may want to use an expression that exists on IDAA, but not on Db2 12 for z/OS. With a passthrough-only expression, Db2 for z/OS simply verifies that the data types of the parameters are valid for the functions. The expressions get passed over to IDAA, and the accelerator engine does all other function resolution processing and validation. 

What new expressions does FL507 support, you may ask? Well, all of the following built-in functions are now supported as passthrough-only expressions to IDAA:

  • ADD_DAYS
  • BTRIM
  • DAYS_BETWEEN
  • NEXT_MONTH
  • Regression functions 
    • REGR_AVGX
    • REGR_AVGY
    • REGR_COUNT
    • REGR_INTERCEPT
    • REGR_ICPT
    • REGR_R2
    • REGR_SLOPE
    • REGR_SXX
    • REGR_SXY
    • REGR_SYY
  • ROUND_TIMESTAMP (when invoked with a DATE expression)

You can find more details on the regression functions in the IBM Db2 documentation.

Summary

These new capabilities are all nice additions that you should take a look at, especially if you have applications and use cases where they can help.

The enabling APAR for FL507 is PH24371. There are no incompatible changes with FL507. But be sure to read the instructions for activation details and Db2 Catalog impacts for FL507.
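
Once the prerequisites are in place, activation itself is performed with the -ACTIVATE command, along these lines:

-ACTIVATE FUNCTION LEVEL (V12R1M507)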

Wednesday, June 13, 2018

Db2 for z/OS Performance Traces Part 2 - Global, Monitor, Performance, and Statistics

In Part 1 of this series on Db2 performance traces we provided a general overview, as well as a discussion of the Accounting and Audit trace classes. Today, in Part 2, we will discuss the remaining four trace types: Global, Monitor, Performance, and Statistics.

Global Trace


The global trace is one I hope you never have to use. It produces information that is used to service Db2, so you'd only start a global trace at the direction of IBM if you are having some sort of trouble. A global trace records information regarding entries and exits from internal Db2 modules as well as other information about Db2 internals. 

Global trace records are not accessible through normal tools that monitor Db2 performance. Most sites will never need to use the DB2 global trace. You should avoid it unless an IBM representative requests that your shop initiate it.

A global trace can add significant CPU overhead to your Db2 subsystem.

Monitor Trace

Quite a bit of useful performance monitoring information is recorded by the Db2 monitor trace. Most of the information in a monitor trace is also provided by other types of Db2 traces. The primary reason for the existence of the monitor trace type is to enable you to write application programs that provide online monitoring of Db2 performance.

Information provided by the monitor trace includes Db2 statistics and accounting trace information, as well as details of current SQL statements.

The Db2 monitor trace classes are as follows:
  • Class 1: Standard accounting data
  • Class 2: Entry or exit from DB2 events
  • Class 3: DB2 wait for I/O or locks
  • Class 4: Installation-defined monitor trace record
  • Class 5: Time spent processing IFI requests
  • Class 6: Changes to tables created with DATA CAPTURE CHANGES
  • Class 7: Entry or exit from event signaling package accounting
  • Class 8: Wait time for a package
  • Class 9: Statement level accounting
  • Class 10: Package detail
  • Class 11 through 28: Reserved
  • Class 29: Dynamic statement detail
  • Class 30 through 32: Local use

The overhead that results from the monitor trace depends on how it is used at your site. If it is used as recommended (class 1 always active, with classes 2 and 3 started and stopped as required), the overhead will likely be minimal, though it will depend on the activity of the Db2 system and the number of times the other classes are started and stopped. If you make use of the local use classes (30 through 32), or additional classes (as some vendors do), your site will incur additional overhead.

Do not start the monitor trace using DSNZPARMs unless online performance monitors in your shop explicitly require you to do so. It is best to start only monitor trace class 1 and to use a performance monitor that starts and stops the other monitor trace classes as required.
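
For example, starting only monitor trace class 1 might look like this (DEST(OPX) directs the records to the next available online performance monitor buffer; in practice, your monitoring tool usually issues these commands for you):

-START TRACE(MON) CLASS(1) DEST(OPX)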

Some online performance monitoring tools do not use the monitor trace; instead, they read the information directly from the Db2 control blocks. Sampling Db2 control blocks requires less overhead than a monitor trace, but it can be disruptive if the tool encounters bugs.

Performance Trace


The Db2 performance trace records an abundance of information about all types of Db2 events. You should use it only after you have exhausted all other avenues of monitoring and tuning because it consumes a great deal of system resources. When a difficult problem persists, the performance trace can provide valuable information, including the SQL statement text and a complete trace of SQL statement execution, with details of all events, all index accesses, and all data access due to referential constraints.

There are 22 groups of Db2 performance trace classes:
  • Class 1: Background events
  • Class 2: Subsystem events
  • Class 3: SQL events
  • Class 4: Reads to and writes from buffer pools and the EDM pool
  • Class 5: Writes to log or archive log
  • Class 6: Summary lock information
  • Class 7: Detailed lock information
  • Class 8: Data scanning detail
  • Class 9: Sort detail
  • Class 10: Detail on BIND, commands, and utilities
  • Class 11: Execution unit switch and latch contentions
  • Class 12: Storage manager
  • Class 13: Edit and validation exits
  • Class 14: Entry from, and exit to an application
  • Class 15: Installation-defined performance trace record
  • Class 16: Distributed processing
  • Class 17: Claim and drain information
  • Class 18: Event-based console messages
  • Class 19: Data set open and close activity
  • Class 20: Data sharing coherency summary
  • Class 21: Data sharing coherency detail
  • Class 22: Authorization exit parameters
  • Class 23 through 29: Reserved
  • Class 30 through 32: Local use

When all Db2 performance trace classes are active, you will experience significant overhead, perhaps as much as 100% CPU overhead for each program being traced. The actual overhead might be greater (or less) depending on actual system activity. The overhead when using only classes 1, 2, and 3, however, typically ranges between 5% and 30%.

Performance traces must be explicitly started with the -START TRACE command. Starting the performance trace only for the plan (or plans) you want to monitor by using the PLAN parameter of the -START TRACE command is wise. Here’s an example:

-START TRACE(PERFM) CLASS(1,2,3) PLAN(PLANNAME) DEST(GTF)

Failure to start the trace at the plan level can result in the trace being started for all plans, which causes undue overhead on all DB2 plans that execute while the trace is active.

Furthermore, due to the large number of trace records cut by the Db2 performance trace, system-wide (Db2 and non-Db2) performance might suffer because of possible SMF or GTF contention. 


Statistics Trace

The final type of Db2 trace is the statistics trace, which contains information pertaining to the entire Db2 subsystem. This type of information is particularly useful for measuring the activity and response of Db2 as a whole. Information on the utilization and status of the buffer pools, DB2 locking, DB2 logging, and DB2 storage is accumulated by the statistics trace.

There are ten groups of DB2 statistics trace classes:
  • Class 1: Statistics data
  • Class 2: Installation-defined statistics record
  • Class 3: Data on deadlocks, lock escalation, group buffers, data set extension, long-running units of recovery, and active log shortage
  • Class 4: Exceptional conditions
  • Class 5: Data sharing statistics
  • Class 6: Storage usage
  • Class 7: DRDA location statistics
  • Class 8: Data set I/O
  • Class 9 through 29: Reserved
  • Class 30 through 32: Local use

The estimated overhead of the statistics trace is low: approximately 1% to 2% CPU overhead per transaction.

Db2 cuts a statistics trace record periodically based on the setting of the STATIME subsystem parameter (DSNZPARM). STATIME is specified as a time interval, in minutes, and can range from 1 to 60 minutes. It is a good practice to set STATIME to 1, thereby specifying 1,440 statistics intervals per day. The information accumulated by cutting these statistics trace records can provide valuable details for solving complex system problems. By analyzing statistics trends over time, you can sometimes spot the cause of problems that would otherwise be difficult to track down.
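
If the statistics trace is not already active (it is typically started automatically at Db2 startup, based on the SMFSTAT subsystem parameter), it can be started by command; for example:

-START TRACE(STAT) CLASS(1,3,4,5,6) DEST(SMF)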


Even though 1,440 records per day might sound like a lot, in reality the amount of data collected is small when compared to the typical volume of accounting trace data. An additional thousand or so SMF records should not cause any problems, while at the same time offering valuable system information.

Next time...

This concludes the overview of the types of Db2 tracing that are available. In part 3, we will examine where trace records can be written as well as more narrow tracing using IFCIDs.