Wednesday, April 29, 2026

A SQL review workflow


It is important to review SQL statements during testing... certainly before they get to production! A good SQL review workflow shouldn’t feel like bureaucracy; instead, it should feel like a safety net that consistently catches performance problems before they hit production. The key is to structure the workflow so that it’s repeatable, lightweight enough to follow, and grounded in how databases actually execute queries.

Here’s a practical, end-to-end workflow you can implement.

1. Define the Context First (Before Looking at SQL)

Before reviewing the query itself, establish intent:

  • What is the query supposed to do?
  • Expected result set size (1 row, 100 rows, millions?)
  • Frequency of execution (ad hoc vs. thousands per minute)
  • Performance expectations (sub-second? batch window?)

Without this, you can’t judge whether a query is “fast enough.”

2. Perform a Structural Review

This is a quick, surface-level pass to catch obvious issues:

  • Avoid SELECT * (unless absolutely justified... and it almost never is)
  • Ensure proper filtering (WHERE clauses exist and are meaningful)
  • Check join conditions (no accidental cross joins)
  • Validate correct use of INNER vs OUTER joins
  • Look for unnecessary nested subqueries or overly complex CTE chains
  • Confirm consistent aliasing and readability

This step is less about performance tuning and more about preventing obvious inefficiencies.
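
For illustration, here is the kind of before-and-after this pass typically produces (the table and column names are made up for the example):

  -- Before: SELECT * and a join written with no join predicate (accidental cross join)
  SELECT *
  FROM orders o, customer c
  WHERE o.order_total > 100;

  -- After: explicit column list, explicit join condition, consistent aliases
  SELECT o.order_id, o.order_date, o.order_total, c.cust_name
  FROM orders o
  INNER JOIN customer c
          ON c.cust_id = o.cust_id
  WHERE o.order_total > 100;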

3. Validate Data Access Paths

Now move into performance-critical territory:

  • Are indexes available for:
    • Join columns?
    • Filter predicates?
    • Sorting/grouping columns?
  • Are functions being applied to indexed columns (breaking index usage)?
    • For example: WHERE YEAR(order_date) = 2024 (unless you have an index on the expression); a rewrite example appears below
  • Are implicit data type conversions occurring?

If access paths are wrong, everything else becomes irrelevant.
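
To illustrate the YEAR(order_date) case from the list above (the table and column names are hypothetical), rewriting the predicate so the column stands alone lets an index on order_date be considered:

  -- Not indexable: the function is applied to the column
  SELECT order_id, order_total
  FROM orders
  WHERE YEAR(order_date) = 2024;

  -- Indexable: the column is compared directly to a date range
  SELECT order_id, order_total
  FROM orders
  WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';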

4. Review the Execution Plan

This is probably the most important step. You need to be able to review PLAN_TABLE output and understand what the columns mean. A visual EXPLAIN capability (from any number of tools) is also useful to simplify this step (and to include additional information like statistics, column type/length, etc.).

Look for:

  • Table scans on large tables (red flag unless justified)
  • Index scans vs. index seeks
  • Join methods (nested loop vs. merge scan vs. hybrid)
  • High-cost operators
  • Cardinality estimation issues (expected vs. actual rows)

Questions to ask:

  • Is the optimizer choosing the expected path?
  • Are there warning signs like excessive sorts or large intermediate results?

An Aside: The PLAN_TABLE

A lot of information is contained in the PLAN_TABLE. After the optimizer chooses the access paths, it populates the PLAN_TABLE with data representing those access paths. Many questions can be answered by analyzing the PLAN_TABLE for the query (or package) being tested, questions like:
  • if we are joining, what type of join is used (NLJ, MS, Hybrid)
  • was an index used, and if so, how many columns matched
  • are we doing a scan, and if so, what type of scan (full or page range)
  • is prefetch being used, and if so, what type (sequential, list)
  • was a hint used
  • was parallelism used, and if so, what degree and type (I/O, CPU, Sysplex)
  • was a sort required, and if so, why (Join, Unique, Group By, Order By)
  • what type of locking is required
And that just covers the main PLAN_TABLE. Keep in mind that EXPLAIN can also populate multiple additional metadata tables, if they exist. These are documented in the addendum at the end of this blog post.
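
As a minimal sketch of how that analysis might start (assuming a PLAN_TABLE exists under your schema; the QUERYNO and the columns selected are just a representative subset):

  -- Populate the PLAN_TABLE for the statement under review
  EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT o.order_id, c.cust_name
  FROM orders o
  INNER JOIN customer c ON c.cust_id = o.cust_id
  WHERE o.order_date >= CURRENT DATE - 30 DAYS;

  -- Read back the access path: join method, access type, matching index columns, sorts
  SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
         ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY,
         PREFETCH, SORTN_JOIN, SORTC_ORDERBY, SORTC_GROUPBY
  FROM PLAN_TABLE
  WHERE QUERYNO = 101
  ORDER BY QBLOCKNO, PLANNO;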

5. Evaluate Data Volume Impact

Because the size of development test data/tables is usually nowhere near the size of production, be sure to test the query against realistic data volumes (and/or realistically set statistics). Small dev/test volumes frequently hide problems!

  • Check:
    • Execution time
    • I/O (logical/physical reads)
    • Memory usage
  • Consider worst-case scenarios:
    • End-of-month data spikes
    • Historical queries over large ranges

Another Aside: Testing Like Production

Keep in mind that additional differences (other than data volume) between test and production can impact access path selection. For example, different buffer pool sizes or running under different Db2 versions can cause different access paths to be chosen.
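
One way to approximate production in a small test system is to copy (or estimate) production statistics into the test catalog and then rebind or re-EXPLAIN. This is a sketch, assuming an ORDERS table and made-up production-sized figures; do it only in test, and only with the appropriate authority:

  -- Make the optimizer believe ORDERS is production-sized
  UPDATE SYSIBM.SYSTABLES
     SET CARDF = 250000000, NPAGESF = 4000000
   WHERE CREATOR = 'TESTSCHM' AND NAME = 'ORDERS';

  UPDATE SYSIBM.SYSCOLUMNS
     SET COLCARDF = 5000000
   WHERE TBCREATOR = 'TESTSCHM' AND TBNAME = 'ORDERS' AND NAME = 'CUST_ID';

  -- Rebind (or re-run EXPLAIN) afterward so the new numbers influence access path selection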

6. Check for Scalability Risks

A query that works today may fail tomorrow.

Look for:

  • Stage 2 and non-indexable predicates
  • Repeated correlated subqueries
  • Row-by-row processing 
  • Excessive temp table usage
  • Sorting or grouping on large data sets without index support

Ask yourself questions like, “What happens when data grows 10x?”
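
As one example of a pattern that may not scale, a correlated subquery that is evaluated per outer row can often be expressed as a join (names are hypothetical; verify with EXPLAIN, since the optimizer may already transform one form into the other):

  -- Correlated subquery: potentially re-evaluated for each CUSTOMER row
  SELECT c.cust_id, c.cust_name
  FROM customer c
  WHERE EXISTS (SELECT 1
                FROM orders o
                WHERE o.cust_id = c.cust_id
                  AND o.order_date >= CURRENT DATE - 30 DAYS);

  -- Equivalent join form (DISTINCT preserves one row per customer)
  SELECT DISTINCT c.cust_id, c.cust_name
  FROM customer c
  INNER JOIN orders o
          ON o.cust_id = c.cust_id
  WHERE o.order_date >= CURRENT DATE - 30 DAYS;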

7. Validate Concurrency and Locking Behavior

Performance isn’t just speed; it’s also how queries behave under load. I frequently say that to optimize performance you want to be decreasing I/O, decreasing CPU usage, and increasing concurrency. Examine your SQL and access paths for the following (a short sketch follows the list below):

  • Does the query:
    • Lock large portions of a table?
    • Escalate locks?
  • Is isolation level appropriate?
  • Are there risks of blocking or deadlocks?
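
Two quick checks along these lines, sketched for Db2 (the QUERYNO refers to a statement EXPLAINed earlier; relax isolation only where stale reads are acceptable):

  -- What lock mode will the statement take on each table space?
  SELECT QUERYNO, TNAME, TSLOCKMODE
  FROM PLAN_TABLE
  WHERE QUERYNO = 101;

  -- For a read that can tolerate uncommitted data, state the isolation explicitly
  SELECT order_id, order_total
  FROM orders
  WHERE order_date >= CURRENT DATE - 30 DAYS
  FOR READ ONLY
  WITH UR;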

8. Recommend Improvements

At this point, provide actionable feedback:

  • Query rewrites (e.g., replace subqueries with joins or vice versa)
  • Index recommendations (see the sketch after this list)
  • Partitioning suggestions (if applicable)
  • Caching or materialization options
  • Pre-aggregation strategies
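
When an index recommendation comes out of the review, spell it out concretely so it can be weighed against insert/update overhead; a sketch with hypothetical names:

  -- Supports the join to CUSTOMER, the date filter, and sorting by order date
  CREATE INDEX ordix_cust_date
      ON orders (cust_id, order_date);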

Keep recommendations prioritized:

  1. High impact / low effort
  2. High impact / high effort
  3. Nice-to-have

9. Re-Test and Benchmark

After changes:

  • Compare before vs. after:
    • Execution time
    • Logical reads
    • CPU usage
  • Validate that improvements hold under realistic conditions
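
One low-effort comparison is to EXPLAIN the original and revised statements under different QUERYNOs and compare the optimizer’s cost estimates alongside the measured elapsed and CPU figures; a sketch, assuming a DSN_STATEMENT_TABLE exists under your schema (estimates supplement the measurements, they never replace them):

  -- QUERYNO 101 = original statement, 102 = revised statement (both EXPLAINed earlier)
  SELECT QUERYNO, PROCMS, PROCSU, COST_CATEGORY
  FROM DSN_STATEMENT_TABLE
  WHERE QUERYNO IN (101, 102)
  ORDER BY QUERYNO;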

10. Document and Approve

Capture:

  • Final version of the query
  • Key decisions (e.g., why a specific index was added)
  • Known limitations or assumptions

Then approve for deployment.

11. Post-Deployment Monitoring (Often Missed)

After release:

  • Monitor query performance in production
  • Watch for:
    • Plan regressions
    • Data growth effects
    • Parameter sensitivity issues
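
If the statement is dynamic SQL, one lightweight production check is the dynamic statement cache; a sketch, assuming a DSN_STATEMENT_CACHE_TABLE exists under your schema and the monitor trace that collects cache statistics is active (the columns shown are a representative subset):

  -- Snapshot runtime statistics for cached dynamic SQL
  EXPLAIN STMTCACHE ALL;

  -- Then look for the heaviest statements by accumulated CPU
  SELECT STMT_ID, STAT_EXEC, STAT_CPU, STAT_ELAP, STAT_GPAG,
         SUBSTR(STMT_TEXT, 1, 100) AS STMT_PREVIEW
  FROM DSN_STATEMENT_CACHE_TABLE
  ORDER BY STAT_CPU DESC
  FETCH FIRST 20 ROWS ONLY;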

Optional: Lightweight Checklist Version

If you want something quick and enforceable:

  • No SELECT *
  • Proper joins and filters
  • Indexes support predicates and joins
  • Execution plan reviewed
  • No large table scans (unless justified)
  • Tested with realistic data volume
  • No obvious scalability risks
  • Concurrency impact considered

Final Thought

The biggest mistake in SQL reviews is treating them like code reviews. They’re not. SQL performance lives at the intersection of data distribution, indexing, and optimizer behavior. A good workflow forces reviewers to engage with all three—not just the syntax.


Addendum

Additional EXPLAIN Tables


DSN_COLDIST_TABLE contains non-uniform column group statistics that are obtained dynamically by Db2 from non-index leaf pages.
DSN_DETCOST_TABLE contains information about detailed cost estimation of the mini-plans in a query.
DSN_FILTER_TABLE contains information about how predicates are used during query processing
DSN_FUNCTION_TABLE contains descriptions of functions that are used in specified SQL statements
DSN_KEYTGTDIST_TABLE contains non-uniform index expression statistics that are obtained dynamically by the Db2 optimizer
DSN_PGRANGE_TABLE contains information about qualified partitions for all page range scans in a query
DSN_PGROUP_TABLE contains information about the parallel groups in a query
DSN_PREDICAT_TABLE contains information about all of the predicates in a query; also used as input when you issue a BIND QUERY command to override the selectivity of predicates for matching SQL statements
DSN_PREDICATE_SELECTIVITY contains information about the selectivity of predicates that are used for access path selection; also used as an input table for the BIND QUERY command when selectivity overrides are specified
DSN_PTASK_TABLE contains information about all of the parallel tasks in a query
DSN_QUERYINFO_TABLE contains information about the eligibility of query blocks for automatic query rewrite, information about the MQTs that are considered for eligible query blocks, reasons why ineligible query blocks are not eligible, and information about acceleration of query blocks
DSN_QUERY_TABLE contains information about a SQL statement, and displays the statement before and after query transformation
DSN_SORTKEY_TABLE contains information about sort keys for all of the sorts required by a query
DSN_SORT_TABLE contains information about sort operations required by a query
DSN_STATEMENT_CACHE_TABLE contains information about the SQL statements in the statement cache, information captured as the results of an EXPLAIN STMTCACHE ALL statement
DSN_STATEMENT_TABLE contains information about the estimated cost of specified SQL statements
DSN_STAT_FEEDBACK contains recommendations for capturing missing or conflicting statistics that are defined during EXPLAIN
DSN_STRUCT_TABLE contains information about all of the query blocks in a query
DSN_VIEWREF_TABLE contains information about all of the views and materialized query tables that are used to process a query
