Tuesday, June 23, 2026

Db2 for z/OS Lock Escalation: When Fine-Grained Locking Becomes a Problem

One of the primary goals of Db2 for z/OS locking is to maximize concurrency while maintaining data integrity. In a perfect world, every application would acquire only the locks it needs, hold them for the shortest possible duration, and release them promptly. But the real world is not always perfect. Sometimes an application acquires so many locks that Db2 decides it is more efficient to replace those many locks with a single, larger lock. This process is known as lock escalation.


Lock escalation is one of those Db2 behaviors that every DBA should understand because when it occurs unexpectedly, it can have a significant impact on application performance and availability.

What Is Lock Escalation?

Lock escalation occurs when Db2 replaces numerous row, page, or LOB locks with a single table space or partition lock.

Imagine an application updating hundreds of thousands of rows. Instead of managing and tracking an enormous number of individual locks, Db2 may determine that maintaining all of those locks consumes too much storage and processing overhead. Rather than continue managing thousands of granular locks, Db2 escalates them to a larger lock.

For example:

  • 50,000 row locks become one table space lock.
  • Thousands of page locks become one partition lock.
  • Many LOB locks become a higher-level lock.

From Db2's perspective, lock escalation can reduce lock management overhead. From the application's perspective, however, lock escalation reduces concurrency because other applications may now be blocked from accessing a much larger portion of the data.

Why Does Db2 Escalate Locks?

Db2 lock escalation is generally driven by one of two conditions:

LOCKMAX Threshold Reached

The most common cause is the LOCKMAX parameter. LOCKMAX is set at the tablespace level and it defines the maximum number of page, row, or LOB locks that can be held for a table space or partition before Db2 attempts escalation.

The value can be:

  • A specific number (ranging from 0 to 2,147,483,647)
    • 0 (means lock escalation is disabled)
  • SYSTEM (use subsystem default NUMLKTS)

You can find the value of LOCKMAX in the Db2 Catalog by reviewing the MAXROWS column of SYSIBM.SYSTABLESPACE.

When the threshold is exceeded, Db2 attempts escalation. For example, if LOCKMAX is set to 10,000 and an application acquires its 10,001st lock, Db2 attempts to escalate.

Lock Storage Shortage

Db2 also monitors lock storage consumption.

Even if LOCKMAX is not reached, Db2 may escalate locks when lock storage resources become constrained. This protects the subsystem from excessive lock memory consumption.

In these cases, escalation is a defensive measure designed to preserve overall system stability.

What Happens During Escalation?

Suppose an application holds 25,000 row locks and that is also the value of LOCKMAX. When the next row lock is requested while updating a table Db2 attempts to replace all 25,000 + 1 locks with a higher-level lock, typically:

  • Exclusive table space lock for updates
  • Share table space lock for read activity

If Db2 successfully acquires the higher-level lock:

  1. The individual locks are released.
  2. The table space or partition lock is acquired.
  3. Processing continues.

The problem is that other applications may now be blocked from accessing data that previously would have remained available through row-level concurrency. A single poorly designed batch job can suddenly become a bottleneck for dozens or hundreds of online transactions.

Why Lock Escalation Can Be Dangerous

Many DBAs think lock escalation is merely a locking event. In reality, it is often an application design warning signal.

Consider a CICS transaction that normally updates ten rows. No issue. Now consider a batch job (running concurrently with the transactions) that updates five million rows under one unit of work. Without frequent commits, the job accumulates massive numbers of locks. Eventually escalation occurs. The consequences may be dire, including:

  • Increased lock contention
  • Application timeouts
  • Deadlocks
  • Reduced concurrency
  • Unexpected outages for online users

In production environments, lock escalation frequently becomes visible only after users begin reporting delays.

Common Causes

Over the years, I have found that lock escalation is usually symptomatic of one or more underlying issues. And it is usually an application design/coding issue.

Infrequent Commits

Not issuing sufficient (or any COMMITs) is probably the most common cause. Applications that process large volumes of data without committing work accumulate locks continuously. And the locks are not released until a COMMIT is issued (or the program ends).

A batch job committing every 100,000 rows will typically consume far more lock resources than one committing every 1,000 rows.

I have written about Bachelor Programming Syndrome before (check the link), which is just my way of saying don’t fear committing. In general, I recommend that you plan to issue COMMITs in every batch program. You can structure the logic so that the COMMIT processing is contingent on a parameter passed to the program. This approach enables an analyst to modify COMMIT frequency, or even turn off COMMIT processing, as the concurrency needs of the application varies.

Mass Updates and Deletes

Large-scale data modification operations naturally acquire large numbers of locks.

Examples include:

DELETE FROM CUSTOMER_HISTORY
WHERE CREATE_DATE < CURRENT DATE - 5 YEARS;

or

UPDATE ACCOUNT
SET STATUS = 'I'
WHERE LAST_ACTIVITY_DATE < CURRENT DATE - 3 YEARS;

These operations can quickly exceed escalation thresholds.

Poor Access Paths

Inefficient access paths may cause Db2 to examine and lock far more rows or pages than intended. An application expected to update 100 rows might actually scan millions due to a missing or ineffective index.

Excessively Large Units of Work

The larger the unit of work, the greater the lock accumulation. Applications that hold locks for extended periods are prime candidates for escalation. Again, parameterized control of COMMIT frequency makes it easier to manage and optimize concurrency without requiring a program change.

Detecting Lock Escalation

Fortunately, Db2 provides several ways to identify escalation activity. DBAs should monitor:

  • IFCID traces
  • Db2 statistics reports
  • Accounting reports
  • Performance monitor alerts
  • System messages

A sudden increase in lock waits often points directly to escalation activity. When troubleshooting, examine:

  • Which object escalated
  • Which application triggered escalation
  • COMMIT frequency
  • Number of locks acquired
  • Access path efficiency

The goal is not merely to identify that escalation occurred, but to determine why.

Preventing Lock Escalation

The best strategy is usually prevention rather than accommodation.

Improve Commit Frequency

Frequent commits reduce lock accumulation.

This is often the single most effective corrective action.

Tune SQL

Efficient SQL accesses fewer pages and rows, reducing lock requirements.

Better indexing and improved access paths often eliminate escalation problems entirely.

Adjust LOCKMAX

In some situations, increasing LOCKMAX may be appropriate.

However, simply raising thresholds without understanding the underlying workload can mask deeper application issues.

Use Partitioning

Partition-level locking can significantly reduce the scope of lock contention.

An escalated partition lock is generally less disruptive than a full table space lock.

Be Cautious with LOCKMAX 0

Setting LOCKMAX to 0 disables lock escalation. This prevents escalation, but it does not eliminate lock consumption.

If applications accumulate excessive locks, other resource constraints may emerge. Therefore, LOCKMAX 0 should be used only after careful analysis.

The DBA Perspective

One lesson I have learned repeatedly is that lock escalation is rarely the root problem. It is usually a symptom. When escalation occurs, Db2 is telling you something important. Specifically:

"This application is holding more locks than I am comfortable managing efficiently."

The correct response is usually not to disable escalation or simply raise thresholds. Instead, investigate the workload. Examine COMMIT frequency. Review SQL efficiency. Analyze access paths. Understand the business process generating the activity.

In many cases, the real solution lies in better application design rather than lock configuration.

Lock escalation exists to protect Db2. But when it appears regularly in your environment, it is often signaling an opportunity to improve performance, scalability, and concurrency. Wise DBAs treat lock escalation not as a nuisance, but as valuable diagnostic information about the health of their applications and workloads.

 

Thursday, May 21, 2026

Concurrency vs. Throughput in Db2

As Db2 DBAs we often find ourselves chasing two elusive targets: speed and capacity. We want our systems to handle everything all at once, and we want it done yesterday. But as database architectures evolve and workloads grow increasingly complex, DBAs must constantly manage the delicate, often misunderstood relationship between two fundamental metrics: Concurrency and Throughput.

It’s easy to mistake one for the other, or to assume that maximizing one automatically boosts the other. But the two are not the same. Let’s break down what these terms really mean for your Db2 for z/OS subsystems and why balancing them is the key to a healthy production environment.

Defining the Duo

Before we look at how they interact, let’s establish a clear baseline for both concepts.

  • Concurrency is the database's ability to handle multiple interactive sessions, transactions, or users at the exact same time. It’s a measure of simultaneous access. Think of it as the number of lanes on a highway. More lanes typically mean that more cars can be on the road simultaneously.

  • Throughput is the actual amount of work successfully completed by the database per unit of time (e.g., transactions per second, rows processed per minute). If concurrency is the number of lanes on the highway, throughput is the number of cars that actually pass through the toll booth every hour.

In an ideal world, as concurrency increases, throughput rises right along with it. But database systems don't operate in a vacuum. They are bound by physical limitations: CPU, memory, I/O bandwidth, and, most importantly, locking and latching mechanisms.

The Concurrency Curve: When More Becomes Less

If you increase the number of concurrent threads entering your Db2 subsystem, throughput will generally scale linearly, at least up to a point. But eventually you hit a tipping point that is referred to as the "knee of the curve."

Beyond this point, adding more concurrent users doesn't get more work done. Instead, it breeds contention.

When too many transactions fight for the same resources, Db2 spends more time managing the queue than doing actual work. You’ll start to see symptoms like:

  • High internal latch contention.

  • Increased lock wait times and, worst-case scenario, spikes in deadlocks and timeouts.

  • Elongated class 3 suspension times in your accounting reports.

At this stage, you haven't plateaued your throughput... you’ve actually degraded it. You are burning CPU just to manage the traffic jam.

Db2 Mechanisms to Balance the Scale

Maximizing throughput while maintaining healthy concurrency requires an understanding of Db2’s internal mechanics. Here are the core areas where DBAs can tip the scales back in their favor:


1. Lock Avoidance and Isolation Levels

Locks are the primary gatekeepers of concurrency. The best way to improve concurrency is to avoid locking altogether when safe. Ensure your application packages are bound with CURRENTDATA(NO) to allow Db2 to use lock avoidance techniques. Furthermore, when appropriate, consider dirty reads. If an application can tolerate reading uncommitted data, UNCOMMITTED READ (UR) blows the doors wide open for concurrency because it doesn't acquire read locks.

Note: it has been my experience that far too many programs and SQL statements use UR. Although removing locks with UR improves concurrency, it can damage data quality if used inappropriately.

2. Thread Management (MAX REMOTE ACTIVE / MAX ONLINE)

It is not an uncommon misconception that letting every single request hit the engine at once improves throughput. It really doesn’t. Leveraging Db2’s thread pooling and setting intelligent limits on MAX REMOTE ACTIVE (for distributed workloads via DDF) ensures that work is queued efficiently before it can thrash the engine.

3. Page-Level vs. Row-Level Locking

Row-level locking (LOCKSIZE ROW) sounds like a silver bullet for concurrency because it minimizes the footprint of a lock. However, it comes with a steep price tag: you might experience considerable CPU overhead for lock acquisition and management. If your throughput is CPU-bound, using LOCKSIZE PAGE (combined with smart page splitting and small row sizes) might actually increase your overall throughput by freeing up CPU cycles.

The trade-off between row and page locks depends on understanding the nature of the application accessing the data and its current execution profile. If there are contention problems when accessing a table space that is currently set to LOCKSIZE PAGE you might consider altering it to LOCKSIZE ROW and then monitoring the impact on performance, resource consumption, and concurrency.

The Bottom Line

Mainframes are designed to process massive, mind-boggling volumes of concurrent data better than almost anything else on earth. But the laws of database physics still apply.

To achieve maximum throughput, you cannot simply crank the concurrency dial to eleven and hope for the best. You must monitor your buffer pools, optimize your indexing to minimize table scans, design your applications for short commit scopes, and precisely configure your subsystem parameters.

Remember: True database performance isn't about how many transactions you let through the door at once; it's about how fast you can successfully get them out the exit.

Thursday, May 14, 2026

The Mullins 5‑Step Performance Diagnosis Framework

In the complex ecosystem of enterprise mainframes and hybrid cloud environments, performance tuning often feels like chasing ghosts in the machine. However, achieving peak efficiency in systems like Db2 doesn't require magic. No, it requires a repeatable, disciplined methodology.

Whether you are managing massive digital transactions or high-concurrency analytics, the following 5-Step Performance Diagnosis Framework provides a blueprint for resolving the vast majority of enterprise performance bottlenecks. This framework consists of the following 5 steps...

  1. Analyze workload patterns
  2. Review SQL and access paths
  3. Inspect locking and contention
  4. Evaluate buffer pool efficiency
  5. Check for recent changes

...and it can help to resolve the majority of enterprise database/application performance issues.

Let's examine each step in a little more detail.

1. Analyze Workload Patterns

Before diving into tweaking and turning technical knobs, you first have to understand the nature of the demand. Is the performance degradation systemic or isolated? 

  • Systemic issues are environment-wide or application-wide degradations. They affect a broad range of users and processes simultaneously, regardless of the specific task being performed.

  • Isolated issues are localized to a specific transaction, query, or user group. While one part of the system is failing to meet service level agreements (SLAs), the rest of the environment may be operating at peak efficiency.

By differentiating between these two types of problems, you can avoid the "shotgun approach" to tuning, such as changing global settings (e.g. DSNZPARMs) to fix a local problem. This often results in unintended consequences for the rest of the enterprise.

When you analyze workload patterns, you can differentiate between a runaway batch job and a steady-state increase in transaction volume. Understanding the "shape" of your data traffic is the first step in identifying if the issue is a sudden spike or a gradual trend toward resource exhaustion.

2. Review SQL and Access Paths

The most significant performance gains are almost always found in the SQL. A single poorly written query or a sub-optimal access path chosen by the optimizer can cripple an application. Reviewing the EXPLAIN output to verify index usage, join methods, and sort requirements is critical. In a world of evolving database engines, ensuring your SQL remains efficient is a primary responsibility of the modern DBA.

Using tools that create a visual diagram and "explanation" of your EXPLAIN output can greatly improve the efficiency of your SQL tuning efforts. The primary tool used by Db2 for z/OS sites these days is IBM Db2 Administration Foundation for z/OS, which is the modern, browser-based replacement for IBM Data Studio. It is built on the Zowe framework (open source) and provides a streamlined interface for mainframe DBAs. It displays the SQL execution plans as a graphical tree, showing table scans, index lookups, and join methods.

Consider reviewing and using my recently published blog containing my SQL Review Workflow.

3. Inspect Locking and Contention

In a high-availability Parallel Sysplex or clustered environment, concurrency is king. However, where there is concurrency, there is the potential for contention. Inspecting lock wait times, deadlocks, and timeout statistics can reveal hidden bottlenecks where processes are fighting over the same resources. Tuning your commit frequency and isolation levels can often alleviate these "soft" performance hits that don't show up in CPU metrics alone.

Indeed, one of the most common problems I find in my consulting engagements is what I call Bachelor Programming Syndrome or "Fear of COMMITing."

Unless you plan for, and issue regular COMMITs in your programs that access Db2 data, you will be causing locking problems. It is important for every programmer to issue COMMIT statements in all application programs where data is modified (INSERT, UPDATE, and DELETE). 


Failing to code COMMITs in a data modification program
is what I like to call Bachelor Programming Syndrome...
in other words, Fear of Committing.

The COMMIT externalizes the modifications that occurred in the program since the beginning of the program or the last COMMIT. A COMMIT ensures that all modifications have been physically applied to the database, thereby ensuring data integrity and recoverability. Without COMMITs, data remains locked making it inaccessible to others, eventually causing timeout failures. 

 4. Evaluate Buffer Pool Efficiency

Data that stays in memory is data that moves fast. Evaluating buffer pool hit ratios and asynchronous I/O rates is essential for ensuring that your most frequently accessed "hot" data isn't being constantly paged out to disk. Efficient memory management reduces the I/O overhead that often serves as the hard ceiling for database performance.

When allocating Db2 buffer pools, keep some basic rules of thumb in mind:

  • Don't allocate everything to a single buffer pool (e.g., BP0); use a multiple buffer pool strategy.
  • Explicitly specify a buffer pool for every table space and index.
  • Isolate the Db2 Catalog in BP0 (and BP8K0 as of V8); put user and application Db2 objects into other buffer pools.
  • It is a best practice to separate indexes from table spaces with each in their own dedicated buffer pools.
  • Consider isolating heavily hit data into its own buffer pool to better control performance.
  • Consider isolating sorts into a single buffer pool and tuning it for mostly sequential access (e.g. BP7).
  • Consider separating Db2 objects into separate buffer pools that have been configured for sequential vs random access.
  • Consider pinning small, high usage tables in memory by dedicating a buffer pool to each and using the PGSTEAL(NONE) option. 

5. Check for Recent Changes

The most common cause of a performance "event" is change. Whether it’s a recent REORG, a change in Db2 function level (or even maintenance being applied), or a modified application module, you must look at the delta between "yesterday" and "today". Always audit the most recent modifications to the environment to see if a configuration shift has triggered an unintended consequence in the execution plan.


Summary: While AI-driven tools are beginning to assist in database administration, the foundational logic of performance tuning remains rooted in these core pillars. By systematically moving through workload, SQL, locking, memory, and change management, you can stabilize and optimize even the most demanding enterprise environments.

 

How has your team integrated these diagnostic steps into your current hybrid cloud monitoring workflow?

Wednesday, April 29, 2026

A SQL review workflow


It is important to review SQL statements during testing... certainly before they get to production! A good SQL review workflow shouldn’t feel like bureaucracy, instead it should feel like a safety net that consistently catches performance problems before they hit production. The key is to structure the workflow so that it’s repeatable, lightweight enough to follow, and grounded in how databases actually execute queries.

Here’s a practical, end-to-end workflow you can implement.

1. Define the Context First (Before Looking at SQL)

Before reviewing the query itself, establish intent:

  • What is the query supposed to do?
  • Expected result set size (1 row, 100 rows, millions?)
  • Frequency of execution (ad hoc vs. thousands per minute)
  • Performance expectations (sub-second? batch window?)

Without this, you can’t judge whether a query is “fast enough.”

2. Perform a Structural Review

This is a quick, surface-level pass to catch obvious issues:

  • Avoid SELECT * (unless absolutely justified... and it almost never is)
  • Ensure proper filtering (WHERE clauses exist and are meaningful)
  • Check join conditions (no accidental cross joins)
  • Validate correct use of INNER vs OUTER joins
  • Look for unnecessary nested subqueries or overly complex CTE chains
  • Confirm consistent aliasing and readability

This step is less about performance tuning and more about preventing obvious inefficiencies.

3. Validate Data Access Paths

Now move into performance-critical territory:

  • Are indexes available for:
    • Join columns?
    • Filter predicates?
    • Sorting/grouping columns?
  • Are functions being applied to indexed columns (breaking index usage)?
    • For example: WHERE YEAR(order_date) = 2024 (unless you have an index on the expression)
  • Are implicit data type conversions occurring?

If access paths are wrong, everything else becomes irrelevant.

4. Review the Execution Plan

This is probably the most important step. You need to be able to review PLAN_TABLE output and understand what the columns mean. A visual EXPLAIN capability (from any number of tools) is also useful to simplify this step (and to include additional information like statistics, column type/length, etc.)

Look for:

  • Table scans on large tables (red flag unless justified)
  • Index scans vs. index seeks
  • Join methods (nested loop vs. merge scan vs. hybrid)
  • High-cost operators
  • Cardinality estimation issues (expected vs. actual rows)

Questions to ask:

  • Is the optimizer choosing the expected path?
  • Are there warning signs like excessive sorts or large intermediate results?
An Aside: The PLAN_TABLE

A lot of information is contained in the PLAN_TABLE. After the optimizer creates the access paths and populates the PLAN_TABLE with data representing those access paths. Many questions can be answered by analyzing the PLAN_TABLE for the query (or package) being tested, questions like:
  • if we are joining what type of join is used (NLJ, MS, Hybrid),
  • was an index used, and if so how many columns matched,
  • are we doing a scan, and if so what type of scan (full or page range)
  • is prefetch being used, and if so what type (sequential, list)
  • was a hint used
  • was parallelism used, and if so what degree and type (I/O, CPU, Sysplex)
  • was a sort required, and if so why (Join, Unique, Group By, Order By)
  • what type of locking is required
And that just covers the main PLAN_TABLE. Keep in mind that EXPLAIN can also populate multiple additional metadata tables, if they exist. These are documented in the addendum at the end of this blog post.

5. Evaluate Data Volume Impact

Because the size of development test data/tables is usually nowhere near the size of production, be sure to test the query against realistic data volumes (and/or realistically set statistics). Small dev/test volumes frequently hide problems!

  • Check:
    • Execution time
    • I/O (logical/physical reads)
    • Memory usage
  • Consider worst-case scenarios:
    • End-of-month data spikes
    • Historical queries over large ranges
Another Aside: Testing Like Production

Keep in mind that additional differences (other than data volume) between test and production can impact access path selection. For example, different buffer pool sizes or running under different Db2 versions can cause different access paths to be chosen.

6. Check for Scalability Risks

A query that works today may fail tomorrow.

Look for:

  • Stage 2 and non-indexable predicates
  • Repeated correlated subqueries
  • Row-by-row processing 
  • Excessive temp table usage
  • Sorting or grouping on large data sets without support

Ask yourself questions like What happens when data grows 10x?

7. Validate Concurrency and Locking Behavior

Performance isn’t just speed, it’s also how queries behave under load. I frequently say that to optimize performance you want to be decreasing I/O, decreasing CPU usage, and increasing concurrency. Examine your SQL and access paths for the following

  • Does the query:
    • Lock large portions of a table?
    • Escalate locks?
  • Is isolation level appropriate?
  • Are there risks of blocking or deadlocks?

8. Recommend Improvements

At this point, provide actionable feedback:

  • Query rewrites (e.g., replace subqueries with joins or vice versa)
  • Index recommendations
  • Partitioning suggestions (if applicable)
  • Caching or materialization options
  • Pre-aggregation strategies

Keep recommendations prioritized:

  1. High impact / low effort
  2. High impact / high effort
  3. Nice-to-have

9. Re-Test and Benchmark

After changes:

  • Compare before vs. after:
    • Execution time
    • Logical reads
    • CPU usage
  • Validate that improvements hold under realistic conditions

10. Document and Approve

Capture:

  • Final version of the query
  • Key decisions (e.g., why a specific index was added)
  • Known limitations or assumptions

Then approve for deployment.

11. Post-Deployment Monitoring (Often Missed)

After release:

  • Monitor query performance in production
  • Watch for:
    • Plan regressions
    • Data growth effects
    • Parameter sensitivity issues

Optional: Lightweight Checklist Version

If you want something quick and enforceable:

No SELECT *
Proper joins and filters
Indexes support predicates and joins
Execution plan reviewed
No large table scans (unless justified)
Tested with realistic data volume
No obvious scalability risks
Concurrency impact considered

Final Thought

The biggest mistake in SQL reviews is treating them like code reviews. They’re not. SQL performance lives at the intersection of data distribution, indexing, and optimizer behavior. A good workflow forces reviewers to engage with all three—not just the syntax.


Addendum

Additional EXPLAIN Tables


Table Description
DSN_COLDIST_TABLE contains non-uniform column group statistics that are obtained dynamically by Db2 from non-index leaf pages.
DSN_DETCOST_TABLEcontains information about detailed cost estimation of the mini-plans in a query.
DSN_FILTER_TABLE contains information about how predicates are used during query processing
DSN_FUNCTION_TABLE contains descriptions of functions that are used in specified SQL statements
DSN_KEYTGTDIST_TABLE contains non-uniform index expression statistic that are obtained dynamically by the Db2 optimizer
DSN_FUNCTION_TABLE contains descriptions of functions that are used in specified SQL statements
DSN_PGRANGE_TABLE contains information about qualified partitions for all page range scans in a query
DSN_PGROUP_TABLE contains information about about the parallel groups in a query
DSN_PREDICAT_TABLE contains information about all of the predicates in a query; also used as input when you issue a BIND QUERY command to override the selectivity of predicates for matching SQL statements
DSN_PREDICATE_SELECTIVITY contains information about the selectivity of predicates that are used for access path selection; also used as an input table for the BIND QUERY command when selectivity overrides are specified
DSN_PTASK_TABLE contains information about all of the parallel tasks in a query
DSN_QUERYINFO_TABLE contains information about the eligibility of query blocks for automatic query rewrite, information about the MQTs that are considered for eligible query blocks, reasons why ineligible query blocks are not eligible, and information about acceleration of query blocks
DSN_QUERY_TABLE contains information about a SQL statement, and displays the statement before and after query transformation
DSN_SORTKEY_TABLE contains information about sort keys for all of the sorts required by a query
DSN_SORT_TABLE contains information about sort operations required by a query
DSN_STATEMENT_CACHE_TABLE contains information about the SQL statements in the statement cache, information captured as the results of an EXPLAIN STATEMENT CACHE ALL statement
DSN_STATEMENT_TABLE contains information about the estimated cost of specified SQL statements
DSN_STATFEEDBACK contains recommendations for capturing missing or conflicting statistics that are defined during EXPLAIN
DSN_STRUCT_TABLE contains information about all of the query blocks in a query
DSN_VIEWREF_TABLE contains information about all of the views and materialized query tables that are used to process a query

Wednesday, April 08, 2026

A High-Level Guide for Db2 Database Health Checks

 Here’s a practical, field-tested 10-point Db2 database health check you can use to quickly assess the state of your environment and identify risk areas.


1. Buffer Pool Efficiency

  • Check hit ratios (GETPAGE vs. I/O)
  • Look for excessive synchronous reads
  • Ensure critical objects reside in appropriately sized buffer pools

2. SQL Performance & Access Paths

  • Identify top CPU-consuming and elapsed time SQL
  • Look for table scans, non-matching index scans, and sort activity
  • Validate access paths haven’t regressed (especially after RUNSTATS or REBIND)

3. Index Effectiveness

  • Check for unused or duplicate indexes
  • Identify missing indexes for high-frequency queries
  • Review clustering ratios and index levels

4. RUNSTATS Currency & Quality

  • Ensure statistics are up to date on critical tables and indexes
  • Verify distribution stats (FREQVAL, HISTOGRAM) where needed
  • Watch for stale or missing stats causing poor optimizer choices

5. Locking & Concurrency

  • Monitor lock waits, timeouts, and deadlocks
  • Identify hot objects or pages
  • Evaluate isolation levels and commit frequency

6. I/O Performance

  • Review I/O response times by dataset and volume
  • Check for hotspots or imbalance across storage groups
  • Ensure proper striping and dataset placement

7. Logging & Recovery Health

  • Monitor active log utilization and archive activity
  • Check for long-running units of work delaying log reuse
  • Validate backup and recovery procedures (image copies, PIT recovery readiness)

8. Utilities & Maintenance

  • Ensure REORGs are run where needed (based on RTS, not just schedule)
  • Validate COPY, RUNSTATS, and REORG cadence
  • Look for objects in advisory REORG or COPY pending states

9. System Resource Utilization

  • Track CPU consumption (general purpose vs. zIIP if applicable)
  • Monitor memory usage and EDM pool (for Db2 for z/OS)
  • Watch for thread reuse and connection pooling efficiency

10. Configuration & ZPARM Settings

  • Review key ZPARMs (or Db2 LUW configuration parameters)
  • Validate they align with workload characteristics
  • Check for outdated defaults that may no longer be optimal

Bonus Tip: Trend, Don’t Snapshot

A single point-in-time check is useful—but trending over time is where the real insight lives. Look for gradual degradation, not just obvious breakage.