The Db2 Portal Blog: SQL

Showing posts with label SQL. Show all posts

Thursday, July 02, 2026

The Most Overlooked Db2 Performance Metric

If I had to identify a single most overlooked Db2 performance metric, it would be GETPAGEs.

Many organizations focus on CPU utilization, elapsed time, or buffer pool hit ratios because those metrics are easy to understand and frequently appear on dashboards. But getpages often reveal performance problems long before those other metrics become alarming.

In Db2 for z/OS, a GETPAGE represents a request by Db2 to access a page of data or an index page. If the page is already in the buffer pool, no physical I/O is required. If not, Db2 must read it from disk. Either way, every GETPAGE consumes CPU resources.

Why GETPAGEs Matter

Because every SQL statement generates GETPAGE requests, the number of GETPAGEs is a fundamental measure of how much work Db2 performs to execute a workload. The more pages Db2 must examine to satisfy a query, the more work it performs. Reducing unnecessary GETPAGEs often leads directly to lower CPU consumption and better application performance.

High GETPAGE counts usually indicate one or more of the following:

Inefficient index usage
Poor clustering
Table scans
Low filter factors
Redundant index probes
Accessing more columns or rows than necessary

Even when all pages are found in the buffer pool, millions of unnecessary GETPAGEs still consume CPU.

The Hidden CPU Consumer

I've worked with organizations that upgraded processors because CPU costs continued to rise. After analyzing the workload, we discovered that many critical SQL statements were generating ten or even one hundred times more GETPAGEs than necessary.

A simple index redesign or SQL rewrite reduced GETPAGE counts dramatically. And CPU usage dropped without any hardware changes.

That's why I often tell clients: "Every unnecessary GETPAGE results in costly unneeded work."

Don't Just Count GETPAGEs

The raw number of GETPAGEs by itself does not tell the whole story. Instead, monitor:

GETPAGEs per transaction
GETPAGEs per SQL statement
GETPAGEs per row returned
Trends over time
The highest GETPAGE-consuming applications

A workload processing twice as many transactions should naturally generate more GETPAGEs. The real warning sign is when GETPAGEs per transaction steadily increase. That usually means SQL or access paths have degraded.

GETPAGEs Point to Root Causes

Unlike CPU utilization, which tells you that work is occurring, GETPAGEs often explain why.

For example:

Symptom	What High GETPAGEs May Indicate
High CPU	Excessive index or table page accesses
Long elapsed time	Inefficient access path
Buffer pool pressure	Poor locality of reference
Lock contention	Long-running scans
Increased zIIP usage	More work being offloaded, but still excessive overall processing

My Rule of Thumb 👍

After more than four decades tuning Db2 systems, one principle has remained remarkably consistent:

Watch the work, not just the time.

Elapsed time can fluctuate because of concurrency. CPU depends on hardware generation. I/O depends on storage technology.

But GETPAGEs measure the amount of work Db2 is performing. If you reduce the work, you almost always improve performance.

Other Frequently Overlooked Metrics

Although GETPAGEs are my top choice, several other metrics deserve more attention:

Synchronous read percentage — Indicates how often Db2 must wait for I/O instead of benefiting from prefetch.
Pages read per GETPAGE — Helps evaluate buffer pool effectiveness.
Lock suspension time — Often more meaningful than simply counting lock waits.
Class 3 suspension time — Shows where Db2 is waiting (I/O, locks, logging, etc.).
RID pool failures — Can reveal access path problems that are otherwise difficult to diagnose.
Sort overflows — Indicate insufficient sort memory or inefficient SQL.
Index leaf page split rates — A useful indicator of index maintenance issues and clustering degradation.

If I were teaching a new Db2 performance analyst, I'd spend less time looking at CPU graphs and more time asking, "Why is Db2 doing so much work?" In many cases, the answer begins with GETPAGEs.

Thursday, May 14, 2026

The Mullins 5‑Step Performance Diagnosis Framework

In the complex ecosystem of enterprise mainframes and hybrid cloud environments, performance tuning often feels like chasing ghosts in the machine. However, achieving peak efficiency in systems like Db2 doesn't require magic. No, it requires a repeatable, disciplined methodology.

Whether you are managing massive digital transactions or high-concurrency analytics, the following 5-Step Performance Diagnosis Framework provides a blueprint for resolving the vast majority of enterprise performance bottlenecks. This framework consists of the following 5 steps...

Analyze workload patterns
Review SQL and access paths
Inspect locking and contention
Evaluate buffer pool efficiency
Check for recent changes

...and it can help to resolve the majority of enterprise database/application performance issues.

Let's examine each step in a little more detail.

1. Analyze Workload Patterns

Before diving into tweaking and turning technical knobs, you first have to understand the nature of the demand. Is the performance degradation systemic or isolated?

Systemic issues are environment-wide or application-wide degradations. They affect a broad range of users and processes simultaneously, regardless of the specific task being performed.
Isolated issues are localized to a specific transaction, query, or user group. While one part of the system is failing to meet service level agreements (SLAs), the rest of the environment may be operating at peak efficiency.

By differentiating between these two types of problems, you can avoid the "shotgun approach" to tuning, such as changing global settings (e.g. DSNZPARMs) to fix a local problem. This often results in unintended consequences for the rest of the enterprise.

When you analyze workload patterns, you can differentiate between a runaway batch job and a steady-state increase in transaction volume. Understanding the "shape" of your data traffic is the first step in identifying if the issue is a sudden spike or a gradual trend toward resource exhaustion.

2. Review SQL and Access Paths

The most significant performance gains are almost always found in the SQL. A single poorly written query or a sub-optimal access path chosen by the optimizer can cripple an application. Reviewing the EXPLAIN output to verify index usage, join methods, and sort requirements is critical. In a world of evolving database engines, ensuring your SQL remains efficient is a primary responsibility of the modern DBA.

Using tools that create a visual diagram and "explanation" of your EXPLAIN output can greatly improve the efficiency of your SQL tuning efforts. The primary tool used by Db2 for z/OS sites these days is IBM Db2 Administration Foundation for z/OS, which is the modern, browser-based replacement for IBM Data Studio. It is built on the Zowe framework (open source) and provides a streamlined interface for mainframe DBAs. It displays the SQL execution plans as a graphical tree, showing table scans, index lookups, and join methods.

Consider reviewing and using my recently published blog containing my SQL Review Workflow.

3. Inspect Locking and Contention

In a high-availability Parallel Sysplex or clustered environment, concurrency is king. However, where there is concurrency, there is the potential for contention. Inspecting lock wait times, deadlocks, and timeout statistics can reveal hidden bottlenecks where processes are fighting over the same resources. Tuning your commit frequency and isolation levels can often alleviate these "soft" performance hits that don't show up in CPU metrics alone.

Indeed, one of the most common problems I find in my consulting engagements is what I call Bachelor Programming Syndrome or "Fear of COMMITing."

Unless you plan for, and issue regular COMMITs in your programs that access Db2 data, you will be causing locking problems. It is important for every programmer to issue COMMIT statements in all application programs where data is modified (INSERT, UPDATE, and DELETE).

Failing to code COMMITs in a data modification program
is what I like to call Bachelor Programming Syndrome...
in other words, Fear of Committing.

The COMMIT externalizes the modifications that occurred in the program since the beginning of the program or the last COMMIT. A COMMIT ensures that all modifications have been physically applied to the database, thereby ensuring data integrity and recoverability. Without COMMITs, data remains locked making it inaccessible to others, eventually causing timeout failures.

4. Evaluate Buffer Pool Efficiency

Data that stays in memory is data that moves fast. Evaluating buffer pool hit ratios and asynchronous I/O rates is essential for ensuring that your most frequently accessed "hot" data isn't being constantly paged out to disk. Efficient memory management reduces the I/O overhead that often serves as the hard ceiling for database performance.

When allocating Db2 buffer pools, keep some basic rules of thumb in mind:

Don't allocate everything to a single buffer pool (e.g., BP0); use a multiple buffer pool strategy.
Explicitly specify a buffer pool for every table space and index.
Isolate the Db2 Catalog in BP0 (and BP8K0 as of V8); put user and application Db2 objects into other buffer pools.
It is a best practice to separate indexes from table spaces with each in their own dedicated buffer pools.
Consider isolating heavily hit data into its own buffer pool to better control performance.
Consider isolating sorts into a single buffer pool and tuning it for mostly sequential access (e.g. BP7).
Consider separating Db2 objects into separate buffer pools that have been configured for sequential vs random access.
Consider pinning small, high usage tables in memory by dedicating a buffer pool to each and using the PGSTEAL(NONE) option.

5. Check for Recent Changes

The most common cause of a performance "event" is change. Whether it’s a recent REORG, a change in Db2 function level (or even maintenance being applied), or a modified application module, you must look at the delta between "yesterday" and "today". Always audit the most recent modifications to the environment to see if a configuration shift has triggered an unintended consequence in the execution plan.

Summary: While AI-driven tools are beginning to assist in database administration, the foundational logic of performance tuning remains rooted in these core pillars. By systematically moving through workload, SQL, locking, memory, and change management, you can stabilize and optimize even the most demanding enterprise environments.

How has your team integrated these diagnostic steps into your current hybrid cloud monitoring workflow?

Wednesday, April 08, 2026

A High-Level Guide for Db2 Database Health Checks

Here’s a practical, field-tested 10-point Db2 database health check you can use to quickly assess the state of your environment and identify risk areas.

1. Buffer Pool Efficiency

Check hit ratios (GETPAGE vs. I/O)
Look for excessive synchronous reads
Ensure critical objects reside in appropriately sized buffer pools

2. SQL Performance & Access Paths

Identify top CPU-consuming and elapsed time SQL
Look for table scans, non-matching index scans, and sort activity
Validate access paths haven’t regressed (especially after RUNSTATS or REBIND)

3. Index Effectiveness

Check for unused or duplicate indexes
Identify missing indexes for high-frequency queries
Review clustering ratios and index levels

4. RUNSTATS Currency & Quality

Ensure statistics are up to date on critical tables and indexes
Verify distribution stats (FREQVAL, HISTOGRAM) where needed
Watch for stale or missing stats causing poor optimizer choices

5. Locking & Concurrency

Monitor lock waits, timeouts, and deadlocks
Identify hot objects or pages
Evaluate isolation levels and commit frequency

6. I/O Performance

Review I/O response times by dataset and volume
Check for hotspots or imbalance across storage groups
Ensure proper striping and dataset placement

7. Logging & Recovery Health

Monitor active log utilization and archive activity
Check for long-running units of work delaying log reuse
Validate backup and recovery procedures (image copies, PIT recovery readiness)

8. Utilities & Maintenance

Ensure REORGs are run where needed (based on RTS, not just schedule)
Validate COPY, RUNSTATS, and REORG cadence
Look for objects in advisory REORG or COPY pending states

9. System Resource Utilization

Track CPU consumption (general purpose vs. zIIP if applicable)
Monitor memory usage and EDM pool (for Db2 for z/OS)
Watch for thread reuse and connection pooling efficiency

10. Configuration & ZPARM Settings

Review key ZPARMs (or Db2 LUW configuration parameters)
Validate they align with workload characteristics
Check for outdated defaults that may no longer be optimal

Bonus Tip: Trend, Don’t Snapshot

A single point-in-time check is useful—but trending over time is where the real insight lives. Look for gradual degradation, not just obvious breakage.

Wednesday, March 25, 2026

The Silent Performance Killers in Db2 for z/OS: What You’re Probably Not Monitoring

If you’ve been working with Db2 for z/OS long enough, you’ve likely experienced this scenario.

Everything looks fine. CPU is within expected ranges. Buffer pools aren’t thrashing. Locking isn’t out of control. The dashboards are green. Nothing is obviously broken.

And yet, users start to complain. Batch jobs run a little longer. Online response times stretch just enough to be noticeable. Not a crisis—but not quite right, either.

So you dig in. At first, nothing stands out. But as you peel back the layers, you discover the truth: performance didn’t fall off a cliff. It eroded. Quietly. Gradually. Almost invisibly.

These are the silent performance killers. And in many Db2 for z/OS environments, they’re not being monitored closely enough.

The Illusion of Stability

One of the strengths of Db2 for z/OS is its stability. With features like plan management, access path stability, and mature instrumentation, it is entirely possible to run a system that appears steady for long periods of time.

But stability can be deceptive.

When access paths remain “stable,” it is easy to assume they remain “optimal.” When performance metrics stay within historical ranges, it is tempting to believe everything is under control.

In reality, small inefficiencies can accumulate. A slightly suboptimal access path here. A bit of extra I/O there. A few more GETPAGEs than last quarter. Individually insignificant. Collectively impactful.

And because the degradation is gradual, it often escapes notice... until it becomes difficult to ignore.

Predicate Processing: Stage 1 vs. Stage 2 Still Matters

It is fashionable in some circles to assume that the old distinctions no longer matter. But predicate processing still plays a critical role in performance.

Stage 2 predicates, non-indexable expressions, and function-wrapped columns can quietly force Db2 to do more work than necessary. Even when an index is present, it may not be used as efficiently as expected.

What makes this particularly insidious is that the SQL hasn’t necessarily changed. But data distribution might have. Or perhaps a new release introduced subtle optimizer behavior differences. Or a developer added a seemingly harmless function to a predicate.

The query still runs. It just runs a little slower. Multiply that across hundreds or thousands of query executions, and the cost can become significant.

Statistics Drift: RUNSTATS Is Not a “Set It and Forget It”

Most Db2 shops run RUNSTATS. That’s not the problem. The problem is assuming that any RUNSTATS is sufficient.

Over time, data changes. Skew increases. New values appear. Old assumptions become invalid. Yet many organizations rely on static RUNSTATS profiles that no longer reflect reality. When statistics drift, the optimizer’s decisions drift with them.

Access paths that were once ideal become merely acceptable. Then marginal. Then problematic. But because RUNSTATS is still running, there is a false sense of confidence.

The real question is not whether RUNSTATS is being executed. It is whether the right statistics are being collected at the right time with the right level of detail.

zIIP Offload: Are You Getting What You Expect?

zIIP utilization is often viewed as a win when the numbers look high. But high utilization does not necessarily mean optimal utilization.

Some workloads that should be offloaded are not. Others are only partially eligible. In some cases, system or application changes inadvertently reduce zIIP eligibility without anyone noticing.

The result is subtle but important: more work shifts back to general-purpose CPUs.

You may not see a dramatic spike. But you may see a steady increase over time—one that is difficult to explain if you are not explicitly tracking zIIP efficiency at a granular level.

Buffer Pool Health: Beyond Hit Ratios

Buffer pool hit ratios have long been a staple metric. But by themselves, they can be misleading.

A high hit ratio does not guarantee efficient memory usage. It is entirely possible to have acceptable hit ratios while still experiencing excessive page churn, suboptimal page residency, or inefficient object placement.

As workloads evolve, the way data is accessed changes. Tables grow. Index usage shifts. New applications introduce different access patterns.

If buffer pools are not periodically re-evaluated and tuned in light of these changes, inefficiencies can creep in unnoticed.

Again, nothing breaks. Things just get a little slower.

The Application Factor

Perhaps the most significant, and most likely least visible, source of silent degradation is the application layer. Modern development practices introduce new challenges. ORMs generate SQL that is technically correct but not always efficient. Microservices increase the number of distinct access patterns. Dynamic SQL proliferates. And often, these changes occur outside the traditional DBA change control process.

The DBA sees the symptoms—more executions, different access paths, increased resource consumption... but not always the cause. Without tighter collaboration between development and database teams, these inefficiencies can persist indefinitely.

Plan Stability: A Double-Edged Sword

Plan stability features are invaluable. They protect against sudden access path regressions and provide a safety net during change. But they can also create complacency.

When a plan is “locked in,” it is easy to stop questioning whether it is still the best plan. Over time, as data and workloads evolve, a once-optimal access path may no longer be ideal.

Yet it persists. Not because it is the best choice, but because it is thought to be the safest.

Periodic review is essential. Stability should not mean stagnation.

What Should You Be Doing?

The answer is not to monitor more metrics indiscriminately. It is to monitor the right things with the right perspective. Look for trends, not just thresholds. Compare performance over time, not just against static baselines. Question assumptions that have gone unchallenged for years.

Examine access paths periodically—even for stable workloads. Revisit RUNSTATS strategies. Validate zIIP expectations. Reassess buffer pool configurations.

And perhaps most importantly, engage with application teams. Understand how the workload is changing, not just how it is performing.

Final Thoughts

The most dangerous performance problems in Db2 for z/OS are not the ones that cause immediate failures. They are the ones that quietly erode efficiency over time.

They do not trigger alarms. They do not demand urgent attention. They simply persist. Until one day, “everything looks fine” is no longer true. And by then, the fix is far more involved than it would have been if the problem had been caught earlier.

The challenge... and the opportunity... for today’s DBA is to look beyond the obvious. To question the comfortable. And to recognize that in a mature, stable system, the biggest risks are often the ones you are not actively watching.

Tuesday, November 11, 2025

Don't Forget About Histogram Statistics in Db2 for z/OS

There are a lot of options and parameters to consider when putting together your RUNSTATS jobs to collect statistics on your Db2 data. For those who do not know, RUNSTATS is a Db2 for z/OS utility program that is used to collect information about characteristics of tables and indexes, and storing that data in the Db2 Catalog where it can be used by the Db2 Optimizer to formulate SQL access paths.

Without reasonable statistics, as collected by RUNSTATS, you will likely experience SQL performance problems because Db2 will be accessing the data based on incorrect or outdated information about your data. As I have discussed before, RUNSTATS is one of The 5 R's of Db2 Performance.

OK, with that introductory bit out of the way, one capability of RUNSTATS, first introduced in Db2 9 for z/OS is somewhat underutilized. As you might guess from the title of this blog post, I am talking about histogram statistics.

But what is a histogram? Well, in mathematical terms, a histogram is a representation of data that shows the distribution of data in a data set. Often represented using a bar graph, each bar represents the frequency of data points within a specific range. However, unlike a typical bar chart (displaying categorical data), a histogram groups numbers into ranges, making it useful for identifying patterns, such as underlying frequency distribution, outliers, and skew.

So, a histogram is a way of summarizing data that’s measured on an interval scale. A histogram is particularly helpful when you want:

to highlight how data is distributed,
to determine if data is symmetrical or skewed, or
to indicate whether or not outliers exist.

The histogram is appropriate only for variables whose values are numerical and measured on an interval scale. To be complete, let’s define interval, which is a set of real numbers between two numbers either including or excluding one or both of them.

Histograms are generally most useful when dealing with large data sets.

Why Use Histogram Statistics?

Db2 historgram statistics can be quite useful for certain types of data. Instead of the frequency statistics, which are collected for only a subset of the data, sometimes Db2 can improve access path selection by estimating predicate selectivity from histogram statistics, which are collected over all values in a table space.

Histogram statistics can help the Db2 Optimizer estimate filtering when certain data ranges are heavily populated and others are sparsely populated. Think about data that represents customers of a diner. Typically, there will be periods of high activity (breakfast, lunch, dinner) and low activity (other periods). Or perhaps on a grander scale, consider data that represents customer activity at a big box store. During big sales (such a Black Friday) and over holidays (Christmas, for example), there will be more sales than during other periods.

Depending on the type and makeup of the data, it can make sense to collect histogram statistics to improve access paths for troublesome queries used with the following predicates:

RANGE,
LIKE, and
BETWEEN.

Histogram statistics also can help in some cases for equality (=), IS NULL, IN list, and COL op COL predicates.

How to Collect Histogram Statistics

OK, so how can you go about collecting histogram statistics? Well, the IBM RUNSTATS utility can collect statistics by quantiles. A quantile is the name for a bucket representing a range of data. For instance, from our diner example discussed earlier, you might want to have quantiles representing pre-breakfast, breakfast, pre-lunch, lunch, pre-dinner, dinner, and late night.

Db2 allows up to 100 quantiles. You can specify how many quantiles Db2 is to use: from one up to 100. Of course, specifying one isn’t wise because it’s the same as collecting for everything and so it will not be very helpful.

You do not explicitly define the make-up of quantiles, though. Db2 does that. Db2 creates equal-depth histogram statistics, dividing the whole range of values into quantiles that each contain about the same percentage of the total number of rows.

You indicate that RUNSTATS should collect histogram statistics by coding the HISTOGRAM keyword in conjunction with the COLGROUP option. This way you can collect histogram statistics for a group of columns. You also must tell Db2 the number of quantiles to collect by specifying the NUMQUANTILES parameter. NUMQUANTILES also can be specified with the INDEX parameter, in which case it indicates that histogram statistics are to be collected for the columns of the index.

A single value can never be broken into more than one interval. This means the maximum number of intervals is equal to the number of distinct column values. For example, if you have 40 values, you can have no more than 40 quantiles, each consisting of a single value. In other words, be sure that you don’t specify a value for NUMQUANTILES that is greater than the total number of distinct values for the column (or column group) specified. Also, keep in mind that any NULLs will occupy a single interval.

So, then, how do you decide on the number of quantiles to collect? If you don’t specify NUMQUANTILES, the default value of 100 will be used; then based on the number of records in the table, Db2 will adjust the number of quantiles to an optimal number. Therefore, unless you have a good understanding of the application or a viable reason to deviate, a reasonable rule of thumb is to simply let NUMQUANTILES default and let Db2 work it out.

RUNSTATS will attempt to produce an equal-depth histogram. This means each interval will have about the same number of rows. Please note that this doesn’t mean the same number of values—it’s the same number of rows. In some cases, a highly frequent single value could potentially occupy an interval all by itself.

Where Are the Histogram Statistics Stored?

The following columns in a histogram statistics table define the interval:

QUANTILENO - a sequence number that identifies the interval
HIGHVALUE - a value specifying the upper bound for the interval.
LOWVALUE - a value specifying the lower bound for the interval

These columns can be found in the following six Db2 Catalog tables:

SYSCOLDIST,
SYSKEYTGTDIST,
SYSCOLDIST_HIST,
SYSCOLDISTSTATS,
SYSKEYTGTDIST_HIST, and
SYSKEYTGTDISTSTATS.

Here’s a sample RUNSTATS specification that collects histogram statistics for the key columns of the indexes on tables in the CSMTS02 table space:

RUNSTATS TABLESPACE DB.CSMTS02

INDEX ALL

HISTOGRAM NUMCOLS 2 NUMQUANTILES 10

SHRLEVEL(CHANGE)

UPDATE ALL

REPORT YES

Summary

If you have troublesome queries on data that is skewed or disproportionately distributed, you should consider gathering histogram statistics. Keep in mind that running RUNSTATS specifying HISTOGRAM will probably consume more CPU than just gathering frequency distribution statistics, the performance gains from improved access paths for your queries could more than offset that cost.

Thursday, July 25, 2024

Coding Db2 Applications for Performance - Expert Videos Series

Today's blog post is to share with my readers that I have partnered with Interskill Learning and produced a series of videos in the Expert Video Series on how to code Db2 applications for performance.

My regular readers know that application performance is a passion of mine. You may also have read my recent book on the topic, A Guide to Db2 Performance for Application Developers. But if you are looking for videos to guide you through the process optimizing your application development for Db2, look no further than the six-part series I recorded for Interskill Learning, Coding Db2 Applications for Performance.

You do not need in-depth pre-existing knowledge of Db2 to gain insight from these video lessons. The outline of the six courses are as follows:

Db2 Coding – Defining Database Performance

Providing a Definition
The Four Components
Diving a Little Deeper

Db2 Coding – Coding Relationally

What is Relational?
Relational vs. Traditional Thinking
What Does It Mean to Code Relationally?
Unlearning Past Coding Practices

Db2 Coding – General SQL and Indexing Guidelines

Types of SQL
SQL Coding Best Practices
Indexes and Performance
Stages and Clustering

Db2 Coding – Coding for Concurrent Access

Introduction to Concurrency
Locking
Locking Duration and Binding
Locking Issues and Strategies
Query Parallelism

Db2 Coding – Understanding and Reviewing Db2 Access Paths

Single Table Access Paths
Multi-table Access Paths
Filter Factors
Access Paths and EXPLAIN

Db2 Coding – SQL Coding Tips and Techniques

Avoid Writing Code
Reusable Db2 Code
Dynamic and Static SQL
SQL Guidelines
Set Operations

So if you are looking for an introduction to Db2 performance or want to brush up on the fundamentals of coding for performance, look no further. Check out this series of videos on Coding Db2 Applications for Performance from Interskill Learning (featuring yours truly)!

Note that Interskill Learning also offers other categories of training in their Expert Video series including systems programming, quantum computing, and pervasive encryption.

Tuesday, March 26, 2024

Mixing Db2 Database Administration with DevOps - Part 5: SQL Performance Testing

Although implementing database schema changes is the most important component when incorporating database structures into your DevOps pipeline, it is not the only thing to consider. It is also important to be able to analyze and optimize SQL performance within your application code.

As anyone who has written SQL knows, it is a very flexible language. There are multiple ways to write SQL queries to achieve the same results. For example, you can combine multiple tables using a join or a subselect and achieve the same results. But each SQL formulation is likely to perform differently, one better than the other. And this is but one example of the various ways you can build SQL statements to perform the same function.

The development mindset is usually to write code that matches the requirements and delivers the expected results, not necessarily to assure the best performance. Therefore, SQL performance testing should be carried out on all programs before they are migrated to a production environment. Failure to do so will likely result in poorly performing applications.

In a DevOps environment, the best approach is to measure, analyze and improve SQL statements at all stages as your code progresses from development to testing to production. The more SQL performance testing that can be accomplished by developers the earlier performance problems will be found and corrected. And that means the cost of delivering high-quality Db2 applications will decline.

However, things are not as simple as just running your program and evaluating its performance metrics. The data that you use in your test environment will not be the same as your production data. Typically, you will have less test data than you do in production. So, if you run the RUNSTATS utility on your test data you will get different statistics than in production, which means you will also get different access paths and performance results.

Setting up the test environment with production statistics and modeling the environment to mimic production is an important aspect of performance testing during development.

With the proper setup and tooling, developers can examine the access paths of their SQL statements to judge their efficiency. Of course, tools that can simplify this process are needed to speed up SQL performance testing. Such tooling should be able to capture Explain information, display it graphically and combine it with pertinent catalog statistics, store a repository of access paths by statement, compare access paths, identify changes, and make recommendations. Ideally, the tool should be integrated into the DevOps toolchain so that information is automatically captured and analyzed each time the program is compiled and bound.

Considerations should also be made for testing specific use cases for performance. For example, consider skewed data. Db2 assumes that data values are mostly uniformly distributed throughout the data. However, not all data is uniformly distributed. RUNSTATS can be used to capture information about non-uniformly distributed and skewed data.

Another performance testing consideration is to always try multiple SQL variations, especially for queries that access a lot of data or have complex access paths. Do not just find one SQL formulation that works and stick with it. Remember that you can code multiple variations of SQL statements that return the same data, but that perform quite differently.

Tools that can help set up testing for various use cases and SQL variations will be needed for integrating SQL performance testing into the DevOps toolchain. There are a wide variety of vendors and solutions for managing Db2 for z/OS SQL performance, but I am not aware of any that have been fully integrated into the DevOps toolchain.

Tuesday, August 08, 2023

Top 10 Db2 Performance Tips - No. 6 Monitoring and Performance Metrics

Monitoring and measuring performance metrics are essential practices for maintaining and optimizing the performance of an IBM Db2 environment. By regularly monitoring and analyzing performance data, DBAs can identify bottlenecks, proactively address performance issues, and make informed decisions to improve overall system efficiency.

It is important to monitor and measure performance metrics in order to gain insights into the behavior of the applications and databases in use at your site. By examining their behavior and identifying areas for improvement, you can improve the overall satisfaction of your customers and end users.

Performance metrics provide valuable information about resource utilization, query response times, disk I/O, CPU usage, and other critical aspects of database performance. By tracking these metrics over time, administrators can detect patterns, identify trends, and pinpoint potential performance bottlenecks.

A Strategy

The first part of any Db2 performance management strategy should be to provide a comprehensive approach to the monitoring of the Db2 subsystems operating at your site. This approach involves monitoring not only the threads accessing Db2 and the SQL they issue, but also the Db2 address spaces. You can accomplish this task in three ways:

Batch reports run against DB2 trace records. While DB2 is running, you can activate traces that accumulate information, which can be used to monitor both the performance of the DB2 subsystem and the applications being run.

Online access to DB2 trace information and DB2 control blocks. This type of monitoring also can provide information on DB2 and its subordinate applications.

Sampling DB2 application programs as they run and analyzing which portions of the code use the most resources.

Using all three approaches can be a reasonable approach if you have analyzed the type of workloads in use and which types of monitoring are most effective for each.
You also need to establish a strategy for your organization's tuning goals. And your tuning goals should be set using the discipline of service level management (SLM). A service level is a measure of operational behavior. SLM ensures applications behave accordingly by applying resources to those applications based on their importance to the organization. Depending on the needs of the organization, SLM can focus on availability, performance, or both. In terms of availability, the service level can be defined as “99.95% up time, during the hours of 9:00 AM to 10:00 PM on weekdays.” Of course, a service level can be more specific, stating “average response time for transactions will be two seconds or less for workloads of strategic external clients.”

For a service level agreement (SLA) to be successful, all of the parties involved must agree upon stated objectives for availability and performance. The end users must be satisfied with the performance of their applications, and the DBAs and technicians must be content with their ability to manage the system to the objectives. Compromise is essential to reach a useful SLA.

Furthermore, the objectives of the SLA must be defined and measurable. For example, in the earlier SLA you must define what a “strategic client” is and differentiate strategic from nonstrategic. Although doing this may be difficult, failing to do so can result in worthless SLAs that cannot be achieved.

In the end, if you do not identify service levels for each transaction, then you will always be managing to an unidentified requirement. Without a predefined and agreed upon SLA, how will the DBA and the end users know whether an application is performing adequately? Without SLAs, business users and DBAs might have different expectations, resulting in unsatisfied business executives and frustrated DBAs. Not a good situation.

Db2 Traces

One of the first types of performance metrics to consider is monitoring based on reading trace information. You can think of a Db2 trace as a window into the performance characteristics of aspects of your Db2 workload. Db2 traces record diagnostic information describing particular events. As Db2 operates, it writes trace information that can be read and analyzed to obtain performance information.

Db2 provides six types of traces, and each describes information about the Db2 environment:

Accounting - Records performance information about the execution of DB2 application programs
Audit - Provides information about DB2 DDL, security, utilities, and data modification
Global - Provides information for the servicing of DB2
Monitor - Records data useful for online monitoring of the DB2 subsystem and DB2 application programs
Performance - Collects detailed data about DB2 events, enabling database and performance analysts to pinpoint the causes of performance problems
Statistics - Records information regarding the DB2 subsystem’s use of resources

You can start Db2 traces in two ways: by specifying the appropriate DSNZPARMs at Db2 startup or by using the -START TRACE command to initiate specific traces when Db2 is already running.

Each trace is broken down further into classes, each of which provides information about aspects of that trace. Additional informatiohn about the classes for each type of trace can be found here, here, and here.

You should understand what type of information is traced and the approximate overhead of each trace class before starting any of the Db2 traces. Some traces are better left off until or unless a performance problem is occurring, at which point, the trace can be started to capture details about the situation. Others are better to have turned on before problems occur. Keep in mind that you will have some trace classes (and IFCIDs) that are always started, and other that are only used in emergencies.

What are IFCIDs?

Each trace class is associated with specific trace events known as Instrumentation Facility Component Identifier (IFCIDs), pronounced “if-kid.” An IFCID defines a record that represents a trace event. IFCIDs are the single smallest unit of tracing that can be invoked
by Db2.

In some cases, it can make sense to avoid activating trace classes altogether and start traces specifying only the IFCIDs needed. This way, you can reduce the overhead associated with tracing by recording only the trace events needed.

There are several hundred different IFCIDs. Most IFCIDs contain data fields that describe events pertinent to the event being traced. Some IFCIDs have no data; instead they merely mark a specific time. Certain trace events of extended durations require a pair of IFCIDs: one for the beginning of the event and another for the end. These pairs enable the computation of elapsed times. Other trace events that are not as lengthy require only a single IFCID.

You can find the IFCIDs associated with each trace class in the IBM Db2Command Reference manual in the section describing the -START TRACE command. But that manual does not describe the purpose of each IFCID. A list describing each IFCID can be found in the data set named SDSNIVPD(DSNWMSGS), which is part of the Db2 installation.

Db2 Performance Monitors

Several popular performance monitoring solutions are available for Db2 for z/OS. IBM’s Omegamon, BMC Software’s MainView, Broadcom’s Sysview, and Rocket Software's TMON are the leading performance monitors. When selecting a performance monitor, be sure to examine the online components as well as the batch reporting capabilities of the monitor.

An online performance monitor is a tool that provides real-time reporting on Db2 performance statistics as Db2 operates. In contrast, a batch performance monitor reads previously generated trace records from an input data set. Most performance monitors today can be used to serve both purposes.

With online DB2 performance monitors, you can perform proactive performance management tasks. In other words, you can set up the monitor such that when it detects a problem it alerts a DBA and possibly takes actions on its own to resolve the problem.

The most common way to provide online performance monitoring capabilities is by online access to DB2 trace information in the MONITOR trace class. Some online DB2 performance monitors also provide direct access to Db2 performance data by reading the control blocks of the Db2 and application address spaces. This type of monitoring provides a “window” to up-to-the-minute performance statistics while Db2 runs. This information is important if a quick reaction to performance problems is required.

Most online Db2 performance monitors provide a menu-driven interface accessible from TSO or VTAM. It enables online performance monitors to start and stop traces as needed based on the menu options chosen by the user. Consequently, you can reduce overhead and diminish the learning curve involved in understanding Db2 traces and their correspondence to performance reports.

Following are some typical uses of online performance monitors. Many online performance monitors can establish effective exception-based monitoring. When specified performance thresholds are reached, triggers can offer notification and take action. For example, you could set a trigger when the number of lock suspensions for the TXN00002 plan is reached; when the trigger is activated, a message is sent to the console, and a batch report is generated to provide accounting detail information for the plan. You can set any number of triggers for many thresholds. Following are some examples of thresholds you might choose to set:

When a buffer pool threshold is reached (PREFETCH DISABLED, DEFERRED WRITE THRESHOLD, or DM CRITICAL THRESHOLD).
For critical transactions, when predefined performance objectives are not met. For example, if TXN00001 requires subsecond response time, set a trigger to notify a DBA when the transaction receives a class 1 accounting elapsed time exceeding 1 second by more than 25%.
Many types of thresholds can be established. Most online monitors support this capability. As such, you can customize the thresholds for the needs of your DB2 environment.

Most online performance monitors can produce real-time EXPLAINs for long-running SQL statements. If an SQL statement takes a significant amount of time to process, an analyst can display the SQL statement as it executes and dynamically issue an EXPLAIN for the statement. Even as the statement executes, an understanding of why it takes so long to run can be achieved. A complete discussion of the EXPLAIN statement is provided in the next chapter.

Some online performance monitors provide historical trending. These monitors track performance statistics and store them in DB2 tables or in VSAM files with a timestamp. They also provide the capability to query these stores of performance data to assist in the following:

Analyzing recent history. Most SQL statements execute quickly, making difficult the job of capturing and displaying information about the SQL statement as it executes. However, you might not want to wait until the SMF data is available to run a batch report. Quick access to recent past-performance data in these external data stores provides a type of online monitoring that is as close to real time as is usually needed.
Determining performance trends, such as a transaction steadily increasing in its CPU consumption or elapsed time.
Performing capacity planning based on a snapshot of the recent performance of DB2 applications.

Some monitors also run when Db2 is down to provide access to the historical data accumulated by the monitor.

A final benefit of online DB2 performance monitors is their capability to interface with other z/OS monitors for IMS/TM, CICS, z/OS, or VTAM. This way, an analyst gets a view of the entire spectrum of system performance. Understanding and analyzing the data from each of these monitors, however, requires a different skill. Quite often, one person cannot master all these monitors

Conclusion

Although this blog entry was brief, and there are many additional aspects to Db2 performance monitoring, monitoring and measuring performance metrics should be viewed as a vital requirement for all organizations using Db2 databases. By collecting and analyzing performance data, DBAs can detect performance bottlenecks, identify areas for improvement, and make informed decisions to enhance overall system efficiency.