Tuesday, November 11, 2025

Don't Forget About Histogram Statistics in Db2 for z/OS

There are a lot of options and parameters to consider when putting together your RUNSTATS jobs to collect statistics on your Db2 data. For those who do not know, RUNSTATS is a Db2 for z/OS utility program that collects information about the characteristics of tables and indexes and stores that data in the Db2 Catalog, where it can be used by the Db2 Optimizer to formulate SQL access paths. 

Without reasonable statistics, as collected by RUNSTATS, you will likely experience SQL performance problems because Db2 will be accessing the data based on incorrect or outdated information about your data. As I have discussed before, RUNSTATS is one of The 5 R's of Db2 Performance.

OK, with that introductory bit out of the way, let's look at one capability of RUNSTATS, first introduced in Db2 9 for z/OS, that remains somewhat underutilized. As you might guess from the title of this blog post, I am talking about histogram statistics.

But what is a histogram? Well, in mathematical terms, a histogram is a representation of data that shows the distribution of values in a data set. It is often drawn as a bar graph, where each bar represents the frequency of data points within a specific range. However, unlike a typical bar chart (which displays categorical data), a histogram groups numbers into ranges, making it useful for identifying patterns such as the underlying frequency distribution, outliers, and skew.


So, a histogram is a way of summarizing data that’s measured on an interval scale. A histogram is particularly helpful when you want:

  • to highlight how data is distributed, 
  • to determine if data is symmetrical or skewed, or 
  • to indicate whether or not outliers exist.

The histogram is appropriate only for variables whose values are numerical and measured on an interval scale. To be complete, let's define interval: it is the set of real numbers between two endpoints, which may include or exclude either endpoint.  


Histograms are generally most useful when dealing with large data sets.


Why Use Histogram Statistics?


Db2 histogram statistics can be quite useful for certain types of data. Unlike frequency statistics, which are collected for only a subset of the data, histogram statistics are collected over all values in a table space, and Db2 can sometimes improve access path selection by estimating predicate selectivity from them.


Histogram statistics can help the Db2 Optimizer estimate filtering when certain data ranges are heavily populated and others are sparsely populated. Think about data that represents customers of a diner. Typically, there will be periods of high activity (breakfast, lunch, dinner) and periods of low activity (the rest of the day). Or, on a grander scale, consider data that represents customer activity at a big box store. During big sales (such as Black Friday) and over holidays (Christmas, for example), there will be more sales than during other periods.


Depending on the type and makeup of the data, it can make sense to collect histogram statistics to improve access paths for troublesome queries that use the following predicates (a sample query follows the list): 

  • range (<, <=, >, >=)
  • LIKE, and 
  • BETWEEN
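
For example, a range query of the sort that benefits from histogram statistics might look like the following. The table and column names here are hypothetical, used purely for illustration:

SELECT COUNT(*)
  FROM DINER_SALES
 WHERE SALE_TIME BETWEEN '11:30:00' AND '14:00:00';

If lunchtime rows vastly outnumber mid-afternoon rows, histogram statistics on SALE_TIME give the optimizer a far better filter factor estimate for this BETWEEN predicate than an assumption of uniform distribution would.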

Histogram statistics also can help in some cases for equality (=), IS NULL, IN list, and COL op COL predicates.

 

How to Collect Histogram Statistics

 

OK, so how can you go about collecting histogram statistics? Well, the IBM RUNSTATS utility can collect statistics by quantiles. A quantile is the name for a bucket representing a range of data. For instance, from our diner example discussed earlier, you might want to have quantiles representing pre-breakfast, breakfast, pre-lunch, lunch, pre-dinner, dinner, and late night.


Db2 allows up to 100 quantiles. You can specify how many quantiles Db2 is to use, from 1 up to 100. Of course, specifying 1 isn't wise because a single quantile covers the entire range of values, which tells the optimizer nothing useful about how the data is distributed.


You do not explicitly define the make-up of quantiles, though. Db2 does that. Db2 creates equal-depth histogram statistics, dividing the whole range of values into quantiles that each contain about the same percentage of the total number of rows. 


You indicate that RUNSTATS should collect histogram statistics by coding the HISTOGRAM keyword in conjunction with the COLGROUP option. This way you can collect histogram statistics for a group of columns. You also must tell Db2 the number of quantiles to collect by specifying the NUMQUANTILES parameter. NUMQUANTILES also can be specified with the INDEX parameter, in which case it indicates that histogram statistics are to be collected for the columns of the index.
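
As a simple sketch of what that might look like, consider the following RUNSTATS control statement. The database, table space, table, and column names are hypothetical, so adjust them (and verify the syntax against your Db2 level) before using anything like this:

RUNSTATS TABLESPACE DB.SALESTS
  TABLE(MYSCHEMA.DINER_SALES)
  COLGROUP(SALE_DATE, SALE_TIME)
  HISTOGRAM NUMQUANTILES 10
  SHRLEVEL CHANGE

Here Db2 would divide the (SALE_DATE, SALE_TIME) column group values into roughly 10 equal-depth quantiles and record the interval boundaries in the Db2 Catalog.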

 

A single value can never be broken into more than one interval. This means the maximum number of intervals is equal to the number of distinct column values. For example, if you have 40 values, you can have no more than 40 quantiles, each consisting of a single value. In other words, be sure that you don’t specify a value for NUMQUANTILES that is greater than the total number of distinct values for the column (or column group) specified. Also, keep in mind that any NULLs will occupy a single interval.

 

So, then, how do you decide on the number of quantiles to collect? If you don’t specify NUMQUANTILES, the default value of 100 will be used; then based on the number of records in the table, Db2 will adjust the number of quantiles to an optimal number. Therefore, unless you have a good understanding of the application or a viable reason to deviate, a reasonable rule of thumb is to simply let NUMQUANTILES default and let Db2 work it out.

 

RUNSTATS will attempt to produce an equal-depth histogram. This means each interval will have about the same number of rows. Please note that this doesn’t mean the same number of values—it’s the same number of rows. In some cases, a highly frequent single value could potentially occupy an interval all by itself.

 

Where Are the Histogram Statistics Stored?

 

The following columns in a histogram statistics table define the interval: 

  • QUANTILENO - a sequence number that identifies the interval
  • HIGHVALUE - a value specifying the upper bound for the interval 
  • LOWVALUE - a value specifying the lower bound for the interval

These columns can be found in the following six Db2 Catalog tables: 

  • SYSCOLDIST, 
  • SYSKEYTGTDIST, 
  • SYSCOLDIST_HIST, 
  • SYSCOLDISTSTATS, 
  • SYSKEYTGTDIST_HIST, and 
  • SYSKEYTGTDISTSTATS.
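
If you want to see the histogram intervals that RUNSTATS produced, a query along the following lines can be used. It assumes that histogram rows are identified by TYPE = 'H' in SYSCOLDIST, and the schema and table names are hypothetical:

SELECT NAME, QUANTILENO, LOWVALUE, HIGHVALUE, CARDF, FREQUENCYF
  FROM SYSIBM.SYSCOLDIST
 WHERE TBOWNER = 'MYSCHEMA'
   AND TBNAME  = 'DINER_SALES'
   AND TYPE    = 'H'
 ORDER BY NAME, QUANTILENO;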

 

Here’s a sample RUNSTATS specification that collects histogram statistics for the key columns of the indexes on tables in the CSMTS02 table space:

 

RUNSTATS TABLESPACE DB.CSMTS02
  INDEX(ALL)
  HISTOGRAM NUMCOLS 2 NUMQUANTILES 10
  SHRLEVEL CHANGE
  UPDATE ALL
  REPORT YES

 

Summary

 

If you have troublesome queries on data that is skewed or disproportionately distributed, you should consider gathering histogram statistics. Keep in mind that running RUNSTATS with HISTOGRAM will probably consume more CPU than gathering only frequency distribution statistics, but the performance gains from improved access paths for your queries could more than offset that cost.

Wednesday, October 15, 2025

A Few Details on the TRUNCATE Statement

TRUNCATE is a simple, yet frequently forgotten way to remove data from your Db2 tables.

The TRUNCATE statement is simply a quick way to remove all of the data from a table. The table can be in any type of table space. And the table can be a declared global temporary table. If the table contains LOB or XML columns, the corresponding table spaces and indexes are also truncated.

For clarification, consider the following example:

TRUNCATE TABLE EXAMPLE_TABLE
  REUSE STORAGE
  IGNORE DELETE TRIGGERS
  IMMEDIATE;

But why do we need this statement? Won't a DELETE without a WHERE clause do the same thing? Well, sort of. TRUNCATE will not log any changes, whereas that DELETE would log all of your changes. That means TRUNCATE should operate more efficiently, but it can be harder to undo.

Seems easy enough, doesn’t it? But what are those additional parameters? 

Well, REUSE STORAGE tells Db2 to empty the storage that is allocated but to keep it allocated. The alternative, which is the default, is DROP STORAGE. This option tells Db2 to release the storage that is allocated for the table and to make it available for use by the same table or any other table in the table space.  

The next parameter, which is the default if nothing is specified, is IGNORE DELETE TRIGGERS. This tells Db2 not to fire any DELETE triggers. Alternately, you could specify RESTRICT WHEN DELETE TRIGGERS, which will return an error if there are any delete triggers defined on the table. In other words, TRUNCATE is never going to fire any DELETE triggers no matter what you tell it, but it can return an error to alert you that DELETE triggers exist that you may not have known about. 

Finally, we have the IMMEDIATE option. This causes the TRUNCATE to be executed immediately, and it cannot be undone. If IMMEDIATE is not specified, you can issue a ROLLBACK to undo the TRUNCATE.
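
For instance, a TRUNCATE issued without IMMEDIATE can still be backed out within the same unit of work. A minimal sketch, reusing the hypothetical table from the earlier example:

TRUNCATE TABLE EXAMPLE_TABLE
  DROP STORAGE;   -- IMMEDIATE not specified, so the truncation is not yet permanent
ROLLBACK;         -- the truncation is undone and the rows remain in the table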

Some More Details

So far, so good. But there always seem to be small things that fall through the cracks until you encounter them in the real world. One example is "How does TRUNCATE operate on archive tables?"

There are two considerations to ponder here. First, will the truncated data be moved to the archive like it would with a DELETE statement? The answer is no, TRUNCATE will NOT move the rows it deletes to the archive table. TRUNCATE simply gets rid of the data without archiving it.

The second consideration is this: if you TRUNCATE a table that has an archive table, will TRUNCATE remove the data in both the base table and the archive table? Again, the answer is no. TRUNCATE will NOT remove rows in the archive table. You have to remove those rows separately by issuing a TRUNCATE statement directly against the archive table by name.
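
So if you want both the base table and its archive emptied, you need two statements, something along these lines (the table names are hypothetical):

TRUNCATE TABLE MYSCHEMA.ORDERS;           -- empties the base table only; nothing is archived
TRUNCATE TABLE MYSCHEMA.ORDERS_ARCHIVE;   -- the archive table must be truncated by name
COMMIT;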

Another thing I have seen folks get confused about is the STORAGE option. Some assume that specifying DROP STORAGE will cause the operating system data sets allocated for the object to be removed. That is not the case. The DROP STORAGE option only releases the storage inside the table space. This means that you need to run a REORG (after you run the TRUNCATE with DROP STORAGE) to fully clean up the space. Also, be sure not to specify REUSE when you run the REORG.

Furthermore, specifying DROP STORAGE should remove secondary extents, not the primary allocation. So if the table space is allocated at, say, 100 tracks, you will never go below 100 tracks without either changing the primary allocation or dropping the object. But if there were, say, 5 extents of 10 tracks each, truncating and then reorganizing should get rid of those 50 tracks.
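
A sketch of that follow-up REORG might look like this; the table space name is hypothetical, and note that REUSE is deliberately omitted so the underlying data sets are deleted and redefined, which is what releases the secondary extents:

REORG TABLESPACE DB.EXAMPTS
  SHRLEVEL CHANGE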

What About PBGs?

For Partition-By-Growth (PBG) table spaces, the REORG utility can physically drop data sets associated with partitions that become entirely empty and are at the "end" of the table space. However, this behavior is controlled by the REORG_DROP_PBG_PARTS subsystem parameter (ZPARM) available since Db2 11 for z/OS:

  • REORG_DROP_PBG_PARTS = ENABLE: When this is set, reorganizing the entire PBG tablespace will attempt to consolidate all data into the lowest-numbered partition(s). Any trailing partitions (highest-numbered partitions) that become empty as a result of the REORG will have their associated data sets physically removed (dropped).

  • REORG_DROP_PBG_PARTS = DISABLE (Default): Empty trailing partitions are retained, and their data sets are not removed.

So keep this in mind if you are attempting to reclaim space in PBG table spaces that have decreased data volume.
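
As a sketch, with REORG_DROP_PBG_PARTS set to ENABLE, a REORG of the entire PBG table space (not a part-level REORG) is what allows trailing empty partitions' data sets to be removed. The object name below is hypothetical:

REORG TABLESPACE DB.PBGTS01
  SHRLEVEL CHANGE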

A Useful Little Statement

Overall, TRUNCATE is a useful little tool to keep in your quiver. For certain requirements it can be just what the doctor ordered!

Monday, October 06, 2025

The Many Types of Mainframe Pricing

Whenever somebody starts talking about software pricing on the mainframe, or tuning to save on their monthly mainframe bill, I always wonder just exactly "what" they are talking about. Are they aware of all their company’s licensing agreements and all the intricacies they contain?


Before we move on, let me just state that the mainframe still makes good economic sense. The total cost of ownership (TCO) for the mainframe, when you add up all of the cost components (such as hardware, software licenses, storage, network, labor, power and cooling), is similar to or lower than comparable platforms.

So, don’t get me wrong, the mainframe is a great place to run your enterprise applications. And I am all in favor of tuning your code, infrastructure, and environment. But there are many reasons for doing so. Of course, the first reason is to improve the experience of your end users. The more efficiently tuned your software and the system it runs on, the more optimal and pleasant the experience of your end users will be. So, improving your customer experience is always a valid reason for tuning and optimizing your software.

That said, sometimes the intention of tuning efforts gets stated as to reduce costs. And that is a noble goal. But quite frequently, it seems, people undertaking the assignment to tune for cost optimization do not have enough information on achieving that goal!

What must be understood if you are looking to reduce mainframe cost by tuning? Well, I think the first thing you need to understand is what comprises your mainframe cost.

At a high level there is the cost of the IBM Z hardware that has to be taken into account. Of course, there is little that you can do to reduce hardware cost once you have purchased your mainframe. That said, by tuning your workloads to avoid reaching the peak utilization capacity of your mainframe you can avoid the need to upgrade to a new machine (a costly endeavor, indeed). This can be achieved by tuning activity to reduce CPU requirements or by moving workload around to smooth out the peaks. This is something that capacity planners should always be looking at and taking measures to achieve.

The next thing you need to understand is the mainframe software licenses that you have in place. This is not as easy as it might sound. Let’s start with the pricing metrics offered by IBM for Monthly License Charge (MLC) products. The IBM MLC products include operating systems, middleware, compilers, and other system software offerings. Examples of MLC products include: z/OS, z/TPF, CICS, Db2, IMS, COBOL, and so on. These products have a recurring charge that is applied each month for the right to use the product and also access IBM product support. Fair enough, and so far, not overly confusing.

But you also must know the specific MLC pricing metric used by your organization. And there are multiple, including:

  • Advanced Workload License Charges (AWLC)
  • Country Multiplex License Charges (CMLC)
  • Variable Workload License Charges (VWLC)
  • Flat Workload License Charges (FWLC)
  • Advanced Entry Workload License Charges (AEWLC)
  • Entry Workload License Charges (EWLC)
  • Tiered Workload License Charges (TWLC)
  • System z New Application License Charges (zNALC)
  • Parallel Sysplex License Charges (PSLC)
  • Midrange Workload License Charges (MWLC)
  • zSeries Entry License Charges (zELC)
  • Growth Opportunity License Charges (GOLC)

Each of these metrics has specific requirements and specific conditions for how MSU usage is charged. Some of these metrics, such as AWLC and VWLC, offer sub-capacity licensing. That means that the software charges are based on the utilization capacity of the logical partitions (LPARs) on which the product runs. There are many nuances to how this actually works, but the bottom line is that if you can tune workloads that run during a peak rolling four-hour average (R4HA) period for the month, you can likely reduce your monthly bill.

Of course, there are also full-capacity metrics, where all software charges are determined by the full IBM-rated capacity (MSUs) of the CPC on which a product runs. Examples of full-capacity pricing metrics are PSLC and zELC. 

With the sub-capacity pricing metrics above, your monthly bill can vary, sometimes substantially, from month to month. IBM introduced Tailored Fit Pricing (TFP) for organizations looking for a predictable monthly bill. The general approach of TFP is to provide a predictable, cloud-like pricing option for IBM Z software. At a high level, your overall usage for the past year is reviewed and an estimated growth increase is factored in. Your monthly bill for the upcoming year is then 1/12 of last year's total usage (plus the increase).

Then we have to consider International Program License Agreement (IPLA) products, which have an up-front license fee and an optional annual maintenance charge. IPLA products include tools for managing Db2, CICS, IMS, and others. To further complicate matters, at least when trying to tune for cost reduction, some mainframe IPLA products can be licensed at a sub-capacity level. 

And let’s not forget Container Pricing for IBM Z, where certain defined workloads can scale without impacting other workloads, either within an existing LPAR, separate LPARs, or multiple LPARs.

The Bottom Line

Given this general information, you can see how the choice of pricing metric will impact the efficacy of your tuning efforts to reduce costs. Before undertaking any mainframe cost optimization effort, be sure that you understand the different licenses for IBM Z software that are in effect at your organization. Furthermore, it is essential that you have effective tools for monitoring your CPU utilization, including identifying monthly peaks. Armed with sufficient information and proper tools (including tuning and modernization tools), your chances of achieving cost reduction will increase greatly!

--- 

For additional information on mainframe pricing metrics and how zIIPs can help, check out my book on the topic: IBM Mainframe Specialty Processors: Understanding zIIPs, Licensing, and Cost Savings on the IBM System z.


Thursday, August 28, 2025

Some Under-the-Radar Db2 13 for z/OS Features to Consider

Db2 13 for z/OS has been available for some time now, and soon (December 2025) it will be the only version of Db2 for z/OS supported by IBM. So, we all should either be using Db2 13 already or be well into the process of migrating to it.

With that in mind, here are a few lesser-known but compelling new features in Db2 13 for z/OS. That is, these enhancements have not received as much attention as the headline-grabbing AI and performance features. 

Online Removal of Active Log Datasets 

It is now possible to remove active log data sets while Db2 is up and running. This new capability is available in Function Level 500 and above, and it can be accomplished using the following command: 

-SET LOG REMOVELOG 

Using this new option of the -SET LOG command allows you to safely remove an active log data set from the BSDS without requiring downtime, as long as the data set isn't currently in use.

  • If the data set is in use, it will be placed in a “REMOVAL PENDING” state, making it unavailable going forward until it is explicitly handled.

  • If it's the next log to be written, the command fails with “REMOVAL PROHIBITED.”

  • You can monitor this using -DISPLAY LOG DETAIL and use D GRS,RES=(*,dsname,*) to check log usage.

This feature greatly reduces operational risk and complexity during log maintenance in active environments.
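
As a rough operational sketch, the sequence might look like the following. The data set name is hypothetical, and the exact operand format for REMOVELOG should be verified against the Db2 13 command reference for your function level:

-DISPLAY LOG DETAIL
-SET LOG REMOVELOG(DSNC130.LOGCOPY1.DS03)
-DISPLAY LOG DETAIL

The first DISPLAY confirms which log data sets are active, the SET LOG command removes the named data set from the BSDS (or marks it REMOVAL PENDING if it is in use), and the second DISPLAY verifies the result.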

DDL Lock Management

Db2 13 adds several new controls that help improve availability and reduce contention when performing DDL (data definition) operations:

  • CURRENT LOCK TIMEOUT special register: Lets you override the system-level lock timeout (IRLMRWT) on a per-statement basis (values from 1–32767 seconds), limiting how long transactions queue behind DDL.

  • DEADLOCK_RESOLUTION_PRIORITY global variable: Assigns a numerical priority (0–255) to help determine which process is likely to win in a deadlock. Higher values make a DDL process less likely to be chosen as a deadlock victim.

  • System monitor profiles can now be configured — for both local and remote applications — to automatically set these values and even adjust package release behavior between RELEASE(COMMIT) and RELEASE(DEALLOCATE).

These features provide more granular control over lock management and thereby should help reduce disruptions, improve the responsiveness of DDL, and maintain service levels across transactional workloads.
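
A hedged sketch of how a DDL script might use these controls before issuing its statements follows. The values are arbitrary examples, the object names are hypothetical, and SYSIBMADM is assumed to be the schema of the built-in global variable:

SET CURRENT LOCK TIMEOUT = 30;                     -- wait at most 30 seconds for locks
SET SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY = 200;  -- make this process less likely to be the deadlock victim

ALTER TABLE MYSCHEMA.ORDERS
  ADD COLUMN ORDER_NOTES VARCHAR(256);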

Why These Features Deserve More Spotlight

  • Operational impact without fanfare: While AI functions and accelerator improvements grab headlines, these enhancements quietly deliver high-impact capabilities—especially in high-availability, non-stop environments.

  • Prevents outages during routine tasks: The ability to remove log datasets live and better manage DDL locking improves reliability and uptime for critical systems.

  • Real-world value for DBAs and sysprogs: These are features that seasoned Db2 for z/OS professionals will deeply appreciate—and can use to simplify otherwise risky operations.


Bonus: Other Less-Heralded, but Useful Enhancements

From the 2024 continuous delivery updates (without function-level control), these two new capabilities also seem to be flying under the radar:

  • Database utilities running on zIIP: APAR PH63832 allows portions of the COPY utility to leverage zIIP processing, reducing CPU costs.

  • Targeted statistics deletion: APAR PH63145 lets you delete catalog statistics for a specific partition—without touching the whole object.


Maybe I have missed your favorite under-the-radar Db2 13 enhancement? If so, please share it with the community in a comment below!

Tuesday, August 19, 2025

Mainframe Relevance in an AI-First Era: How Db2 Fits

For decades, the IBM Z mainframe has been the backbone of mission-critical computing. Db2 for z/OS sits at the center of this story, reliably managing the world’s most sensitive and high-value data. Yet in today’s IT landscape, dominated by discussions of artificial intelligence (AI), machine learning, and data-driven transformation, the question inevitably arises: 

Where does Db2 fit in an AI-first world?

The answer is clear: Db2 remains central. In fact, it is uniquely positioned to power and support enterprise AI initiatives.

The Foundation of Trustworthy Data

AI is only as good as the data that feeds it. Models trained on incomplete, inconsistent, or inaccurate data produce unreliable outcomes. This is where Db2 shines. With its proven capabilities for data integrity, security, and availability, Db2 for z/OS provides the foundation of trustworthy, enterprise-grade data that AI depends upon.

Organizations already store their most critical operational data in Db2. Leveraging this data directly—without needing complex ETL processes that move it into less secure environments—offers a significant advantage. AI workloads can run against reliable, current data with governance and compliance controls already in place.

Db2 and Embedded AI Capabilities

IBM has not stood still in bringing AI to Db2 for z/OS. For example, Db2 AI for z/OS (Db2ZAI) uses machine learning models to improve database performance. By analyzing workload patterns, Db2ZAI can recommend optimal buffer pool configurations, predict query performance, and even assist the optimizer in choosing the best access paths. This closes the loop: AI is being applied inside Db2 itself to make database management more intelligent and efficient.

Similarly, SQL Data Insights brings AI-powered analytics directly into Db2 for z/OS, enabling built-in SQL functions to use AI for anomaly detection and data pattern recognition without requiring external AI platforms. These capabilities allow organizations to unlock the hidden value in their Db2 data more quickly and intuitively.
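
For example, SQL Data Insights exposes built-in functions such as AI_SIMILARITY. A hedged sketch of its use is shown below; the table and column names are hypothetical, the column must first be AI-enabled (trained) through SQL Data Insights, and the exact argument rules (such as when USING MODEL COLUMN is required) should be checked in the documentation:

SELECT C.CUSTOMER_ID,
       AI_SIMILARITY(C.CHURN_FLAG, 'YES') AS CHURN_SCORE
  FROM MYSCHEMA.CUSTOMERS C
 ORDER BY CHURN_SCORE DESC
 FETCH FIRST 10 ROWS ONLY;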

Synergy with IBM Z and AI Acceleration

The hardware platform itself reinforces this story. The latest IBM z16 and z17 mainframes incorporate on-chip AI acceleration with the Telum processor and Spyre AI accelerator. This means that inferencing can be performed where the data resides, avoiding latency and risk associated with data movement. For financial institutions detecting fraud, retailers optimizing transactions, or insurers assessing claims, the ability to apply AI in real-time on operational data is transformative.

Db2, running on these systems, is directly positioned to take advantage of this capability—turning the mainframe into not just a system of record, but also a system of insight and decision.

The DBA’s Evolving Role in an AI-First Era

As AI integrates more deeply into Db2, the role of the DBA also evolves. No longer solely the guardian of performance tuning and availability, the modern DBA must understand how AI tools are being embedded in their environment. This includes evaluating AI-driven recommendations, integrating AI queries into business applications, and ensuring that AI workloads are governed and secure.

Rather than diminishing the DBA’s importance, AI amplifies it. Human expertise is needed to validate, interpret, and operationalize AI-driven insights in ways that align with business priorities and regulatory requirements.

Conclusion

The narrative that positions mainframes and Db2 as “legacy” is misguided. In reality, Db2 for z/OS sits at the heart of enterprise AI adoption. With its unmatched reliability, native AI capabilities, and synergy with IBM Z’s AI-accelerated hardware, Db2 is not only relevant but critical in an AI-first world.

For organizations pursuing AI, the best path forward often starts with the data they already trust most—residing in Db2. The mainframe is not being left behind by AI; it is, in fact, helping to lead the way.