Wednesday, April 29, 2026

A SQL review workflow


It is important to review SQL statements during testing... certainly before they get to production! A good SQL review workflow shouldn’t feel like bureaucracy; instead, it should feel like a safety net that consistently catches performance problems before they hit production. The key is to structure the workflow so that it’s repeatable, lightweight enough to follow, and grounded in how databases actually execute queries.

Here’s a practical, end-to-end workflow you can implement.

1. Define the Context First (Before Looking at SQL)

Before reviewing the query itself, establish intent:

  • What is the query supposed to do?
  • Expected result set size (1 row, 100 rows, millions?)
  • Frequency of execution (ad hoc vs. thousands per minute)
  • Performance expectations (sub-second? batch window?)

Without this, you can’t judge whether a query is “fast enough.”

2. Perform a Structural Review

This is a quick, surface-level pass to catch obvious issues:

  • Avoid SELECT * (unless absolutely justified... and it almost never is)
  • Ensure proper filtering (WHERE clauses exist and are meaningful)
  • Check join conditions (no accidental cross joins)
  • Validate correct use of INNER vs OUTER joins
  • Look for unnecessary nested subqueries or overly complex CTE chains
  • Confirm consistent aliasing and readability

This step is less about performance tuning and more about preventing obvious inefficiencies.
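
For illustration, here is the kind of before-and-after this pass typically produces (the table and column names are made up for the example):

  -- Before: SELECT * and a join written with no join predicate (accidental cross join)
  SELECT *
  FROM orders o, customer c
  WHERE o.order_total > 100;

  -- After: explicit column list, explicit join condition, consistent aliases
  SELECT o.order_id, o.order_date, o.order_total, c.cust_name
  FROM orders o
  INNER JOIN customer c
          ON c.cust_id = o.cust_id
  WHERE o.order_total > 100;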

3. Validate Data Access Paths

Now move into performance-critical territory:

  • Are indexes available for:
    • Join columns?
    • Filter predicates?
    • Sorting/grouping columns?
  • Are functions being applied to indexed columns (breaking index usage)?
    • For example: WHERE YEAR(order_date) = 2024 (unless you have an index on the expression); a rewrite example appears below
  • Are implicit data type conversions occurring?

If access paths are wrong, everything else becomes irrelevant.
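
To illustrate the YEAR(order_date) case from the list above (the table and column names are hypothetical), rewriting the predicate so the column stands alone lets an index on order_date be considered:

  -- Not indexable: the function is applied to the column
  SELECT order_id, order_total
  FROM orders
  WHERE YEAR(order_date) = 2024;

  -- Indexable: the column is compared directly to a date range
  SELECT order_id, order_total
  FROM orders
  WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';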

4. Review the Execution Plan

This is probably the most important step. You need to be able to review PLAN_TABLE output and understand what the columns mean. A visual EXPLAIN capability (from any number of tools) is also useful to simplify this step (and to include additional information like statistics, column type/length, etc.).

Look for:

  • Table scans on large tables (red flag unless justified)
  • Index scans vs. index seeks
  • Join methods (nested loop vs. merge scan vs. hybrid)
  • High-cost operators
  • Cardinality estimation issues (expected vs. actual rows)

Questions to ask:

  • Is the optimizer choosing the expected path?
  • Are there warning signs like excessive sorts or large intermediate results?

An Aside: The PLAN_TABLE

A lot of information is contained in the PLAN_TABLE. After the optimizer chooses the access paths, it populates the PLAN_TABLE with data representing those access paths. Many questions can be answered by analyzing the PLAN_TABLE for the query (or package) being tested, questions like:
  • if we are joining, what type of join is used (NLJ, MS, Hybrid)
  • was an index used, and if so, how many columns matched
  • are we doing a scan, and if so, what type of scan (full or page range)
  • is prefetch being used, and if so, what type (sequential, list)
  • was a hint used
  • was parallelism used, and if so, what degree and type (I/O, CPU, Sysplex)
  • was a sort required, and if so, why (Join, Unique, Group By, Order By)
  • what type of locking is required
And that just covers the main PLAN_TABLE. Keep in mind that EXPLAIN can also populate multiple additional metadata tables, if they exist. These are documented in the addendum at the end of this blog post.
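
As a minimal sketch of how that analysis might start (assuming a PLAN_TABLE exists under your schema; the QUERYNO and the columns selected are just a representative subset):

  -- Populate the PLAN_TABLE for the statement under review
  EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT o.order_id, c.cust_name
  FROM orders o
  INNER JOIN customer c ON c.cust_id = o.cust_id
  WHERE o.order_date >= CURRENT DATE - 30 DAYS;

  -- Read back the access path: join method, access type, matching index columns, sorts
  SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
         ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY,
         PREFETCH, SORTN_JOIN, SORTC_ORDERBY, SORTC_GROUPBY
  FROM PLAN_TABLE
  WHERE QUERYNO = 101
  ORDER BY QBLOCKNO, PLANNO;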

5. Evaluate Data Volume Impact

Because the size of development test data/tables is usually nowhere near the size of production, be sure to test the query against realistic data volumes (and/or realistically set statistics). Small dev/test volumes frequently hide problems!

  • Check:
    • Execution time
    • I/O (logical/physical reads)
    • Memory usage
  • Consider worst-case scenarios:
    • End-of-month data spikes
    • Historical queries over large ranges

Another Aside: Testing Like Production

Keep in mind that additional differences (other than data volume) between test and production can impact access path selection. For example, different buffer pool sizes or running under different Db2 versions can cause different access paths to be chosen.
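
One way to approximate production in a small test system is to copy (or estimate) production statistics into the test catalog and then rebind or re-EXPLAIN. This is a sketch, assuming an ORDERS table and made-up production-sized figures; do it only in test, and only with the appropriate authority:

  -- Make the optimizer believe ORDERS is production-sized
  UPDATE SYSIBM.SYSTABLES
     SET CARDF = 250000000, NPAGESF = 4000000
   WHERE CREATOR = 'TESTSCHM' AND NAME = 'ORDERS';

  UPDATE SYSIBM.SYSCOLUMNS
     SET COLCARDF = 5000000
   WHERE TBCREATOR = 'TESTSCHM' AND TBNAME = 'ORDERS' AND NAME = 'CUST_ID';

  -- Rebind (or re-run EXPLAIN) afterward so the new numbers influence access path selection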

6. Check for Scalability Risks

A query that works today may fail tomorrow.

Look for:

  • Stage 2 and non-indexable predicates
  • Repeated correlated subqueries
  • Row-by-row processing 
  • Excessive temp table usage
  • Sorting or grouping on large data sets without index support

Ask yourself questions like, “What happens when data grows 10x?”
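
As one example of a pattern that may not scale, a correlated subquery that is evaluated per outer row can often be expressed as a join (names are hypothetical; verify with EXPLAIN, since the optimizer may already transform one form into the other):

  -- Correlated subquery: potentially re-evaluated for each CUSTOMER row
  SELECT c.cust_id, c.cust_name
  FROM customer c
  WHERE EXISTS (SELECT 1
                FROM orders o
                WHERE o.cust_id = c.cust_id
                  AND o.order_date >= CURRENT DATE - 30 DAYS);

  -- Equivalent join form (DISTINCT preserves one row per customer)
  SELECT DISTINCT c.cust_id, c.cust_name
  FROM customer c
  INNER JOIN orders o
          ON o.cust_id = c.cust_id
  WHERE o.order_date >= CURRENT DATE - 30 DAYS;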

7. Validate Concurrency and Locking Behavior

Performance isn’t just speed; it’s also how queries behave under load. I frequently say that to optimize performance you want to be decreasing I/O, decreasing CPU usage, and increasing concurrency. Examine your SQL and access paths for the following (a short sketch follows the list below):

  • Does the query:
    • Lock large portions of a table?
    • Escalate locks?
  • Is isolation level appropriate?
  • Are there risks of blocking or deadlocks?
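
Two quick checks along these lines, sketched for Db2 (the QUERYNO refers to a statement EXPLAINed earlier; relax isolation only where stale reads are acceptable):

  -- What lock mode will the statement take on each table space?
  SELECT QUERYNO, TNAME, TSLOCKMODE
  FROM PLAN_TABLE
  WHERE QUERYNO = 101;

  -- For a read that can tolerate uncommitted data, state the isolation explicitly
  SELECT order_id, order_total
  FROM orders
  WHERE order_date >= CURRENT DATE - 30 DAYS
  FOR READ ONLY
  WITH UR;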

8. Recommend Improvements

At this point, provide actionable feedback:

  • Query rewrites (e.g., replace subqueries with joins or vice versa)
  • Index recommendations (see the sketch after this list)
  • Partitioning suggestions (if applicable)
  • Caching or materialization options
  • Pre-aggregation strategies
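
When an index recommendation comes out of the review, spell it out concretely so it can be weighed against insert/update overhead; a sketch with hypothetical names:

  -- Supports the join to CUSTOMER, the date filter, and sorting by order date
  CREATE INDEX ordix_cust_date
      ON orders (cust_id, order_date);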

Keep recommendations prioritized:

  1. High impact / low effort
  2. High impact / high effort
  3. Nice-to-have

9. Re-Test and Benchmark

After changes:

  • Compare before vs. after:
    • Execution time
    • Logical reads
    • CPU usage
  • Validate that improvements hold under realistic conditions
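
One low-effort comparison is to EXPLAIN the original and revised statements under different QUERYNOs and compare the optimizer’s cost estimates alongside the measured elapsed and CPU figures; a sketch, assuming a DSN_STATEMENT_TABLE exists under your schema (estimates supplement the measurements, they never replace them):

  -- QUERYNO 101 = original statement, 102 = revised statement (both EXPLAINed earlier)
  SELECT QUERYNO, PROCMS, PROCSU, COST_CATEGORY
  FROM DSN_STATEMENT_TABLE
  WHERE QUERYNO IN (101, 102)
  ORDER BY QUERYNO;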

10. Document and Approve

Capture:

  • Final version of the query
  • Key decisions (e.g., why a specific index was added)
  • Known limitations or assumptions

Then approve for deployment.

11. Post-Deployment Monitoring (Often Missed)

After release:

  • Monitor query performance in production
  • Watch for:
    • Plan regressions
    • Data growth effects
    • Parameter sensitivity issues
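
If the statement is dynamic SQL, one lightweight production check is the dynamic statement cache; a sketch, assuming a DSN_STATEMENT_CACHE_TABLE exists under your schema and the monitor trace that collects cache statistics is active (the columns shown are a representative subset):

  -- Snapshot runtime statistics for cached dynamic SQL
  EXPLAIN STMTCACHE ALL;

  -- Then look for the heaviest statements by accumulated CPU
  SELECT STMT_ID, STAT_EXEC, STAT_CPU, STAT_ELAP, STAT_GPAG,
         SUBSTR(STMT_TEXT, 1, 100) AS STMT_PREVIEW
  FROM DSN_STATEMENT_CACHE_TABLE
  ORDER BY STAT_CPU DESC
  FETCH FIRST 20 ROWS ONLY;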

Optional: Lightweight Checklist Version

If you want something quick and enforceable:

  • No SELECT *
  • Proper joins and filters
  • Indexes support predicates and joins
  • Execution plan reviewed
  • No large table scans (unless justified)
  • Tested with realistic data volume
  • No obvious scalability risks
  • Concurrency impact considered

Final Thought

The biggest mistake in SQL reviews is treating them like code reviews. They’re not. SQL performance lives at the intersection of data distribution, indexing, and optimizer behavior. A good workflow forces reviewers to engage with all three—not just the syntax.


Addendum

Additional EXPLAIN Tables


DSN_COLDIST_TABLE contains non-uniform column group statistics that are obtained dynamically by Db2 from non-index leaf pages.
DSN_DETCOST_TABLE contains information about detailed cost estimation of the mini-plans in a query.
DSN_FILTER_TABLE contains information about how predicates are used during query processing
DSN_FUNCTION_TABLE contains descriptions of functions that are used in specified SQL statements
DSN_KEYTGTDIST_TABLE contains non-uniform index expression statistics that are obtained dynamically by the Db2 optimizer
DSN_PGRANGE_TABLE contains information about qualified partitions for all page range scans in a query
DSN_PGROUP_TABLE contains information about the parallel groups in a query
DSN_PREDICAT_TABLE contains information about all of the predicates in a query; also used as input when you issue a BIND QUERY command to override the selectivity of predicates for matching SQL statements
DSN_PREDICATE_SELECTIVITY contains information about the selectivity of predicates that are used for access path selection; also used as an input table for the BIND QUERY command when selectivity overrides are specified
DSN_PTASK_TABLE contains information about all of the parallel tasks in a query
DSN_QUERYINFO_TABLE contains information about the eligibility of query blocks for automatic query rewrite, information about the MQTs that are considered for eligible query blocks, reasons why ineligible query blocks are not eligible, and information about acceleration of query blocks
DSN_QUERY_TABLE contains information about a SQL statement, and displays the statement before and after query transformation
DSN_SORTKEY_TABLE contains information about sort keys for all of the sorts required by a query
DSN_SORT_TABLE contains information about sort operations required by a query
DSN_STATEMENT_CACHE_TABLE contains information about the SQL statements in the statement cache, information captured as the results of an EXPLAIN STMTCACHE ALL statement
DSN_STATEMENT_TABLE contains information about the estimated cost of specified SQL statements
DSN_STAT_FEEDBACK contains recommendations for capturing missing or conflicting statistics that are defined during EXPLAIN
DSN_STRUCT_TABLE contains information about all of the query blocks in a query
DSN_VIEWREF_TABLE contains information about all of the views and materialized query tables that are used to process a query
