It is important to review SQL statements during testing... certainly before they get to production! A good SQL review workflow shouldn’t feel like bureaucracy, instead it should feel like a safety net that consistently catches performance problems before they hit production. The key is to structure the workflow so that it’s repeatable, lightweight enough to follow, and grounded in how databases actually execute queries.
Here’s a practical, end-to-end workflow you can implement.
1. Define the Context First (Before Looking at SQL)
Before reviewing the query itself, establish intent:
- What is the query supposed to do?
- Expected result set size (1 row, 100 rows, millions?)
- Frequency of execution (ad hoc vs. thousands per minute)
- Performance expectations (sub-second? batch window?)
Without this, you can’t judge whether a query is “fast enough.”
2. Perform a Structural Review
This is a quick, surface-level pass to catch obvious issues:
-
Avoid
SELECT *(unless absolutely justified... and it almost never is) -
Ensure proper filtering (
WHEREclauses exist and are meaningful) - Check join conditions (no accidental cross joins)
-
Validate correct use of
INNERvsOUTERjoins - Look for unnecessary nested subqueries or overly complex CTE chains
- Confirm consistent aliasing and readability
This step is less about performance tuning and more about preventing obvious inefficiencies.
3. Validate Data Access Paths
Now move into performance-critical territory:
-
Are indexes available for:
- Join columns?
- Filter predicates?
- Sorting/grouping columns?
-
Are functions being applied to indexed columns (breaking index usage)?
- For example:
WHERE YEAR(order_date) = 2024(unless you have an index on the expression)
- For example:
- Are implicit data type conversions occurring?
If access paths are wrong, everything else becomes irrelevant.
4. Review the Execution Plan
This is probably the most important step. You need to be able to review PLAN_TABLE output and understand what the columns mean. A visual EXPLAIN capability (from any number of tools) is also useful to simplify this step (and to include additional information like statistics, column type/length, etc.)
Look for:
- Table scans on large tables (red flag unless justified)
- Index scans vs. index seeks
- Join methods (nested loop vs. merge scan vs. hybrid)
- High-cost operators
- Cardinality estimation issues (expected vs. actual rows)
Questions to ask:
- Is the optimizer choosing the expected path?
- Are there warning signs like excessive sorts or large intermediate results?
- if we are joining what type of join is used (NLJ, MS, Hybrid),
- was an index used, and if so how many columns matched,
- are we doing a scan, and if so what type of scan (full or page range)
- is prefetch being used, and if so what type (sequential, list)
- was a hint used
- was parallelism used, and if so what degree and type (I/O, CPU, Sysplex)
- was a sort required, and if so why (Join, Unique, Group By, Order By)
- what type of locking is required
5. Evaluate Data Volume Impact
Because the size of development test data/tables is usually nowhere near the size of production, be sure to test the query against realistic data volumes (and/or realistically set statistics). Small dev/test volumes frequently hide problems!
-
Check:
- Execution time
- I/O (logical/physical reads)
- Memory usage
- Consider worst-case scenarios:
- End-of-month data spikes
- Historical queries over large ranges
6. Check for Scalability Risks
A query that works today may fail tomorrow.
Look for:
- Stage 2 and non-indexable predicates
- Repeated correlated subqueries
- Row-by-row processing
- Excessive temp table usage
- Sorting or grouping on large data sets without support
Ask yourself questions like What happens when data grows 10x?
7. Validate Concurrency and Locking Behavior
Performance isn’t just speed, it’s also how queries behave under load. I frequently say that to optimize performance you want to be decreasing I/O, decreasing CPU usage, and increasing concurrency. Examine your SQL and access paths for the following
-
Does the query:
- Lock large portions of a table?
- Escalate locks?
- Is isolation level appropriate?
- Are there risks of blocking or deadlocks?
8. Recommend Improvements
At this point, provide actionable feedback:
- Query rewrites (e.g., replace subqueries with joins or vice versa)
- Index recommendations
- Partitioning suggestions (if applicable)
- Caching or materialization options
- Pre-aggregation strategies
Keep recommendations prioritized:
- High impact / low effort
- High impact / high effort
- Nice-to-have
9. Re-Test and Benchmark
After changes:
-
Compare before vs. after:
- Execution time
- Logical reads
- CPU usage
- Validate that improvements hold under realistic conditions
10. Document and Approve
Capture:
- Final version of the query
- Key decisions (e.g., why a specific index was added)
- Known limitations or assumptions
Then approve for deployment.
11. Post-Deployment Monitoring (Often Missed)
After release:
- Monitor query performance in production
-
Watch for:
- Plan regressions
- Data growth effects
- Parameter sensitivity issues
Optional: Lightweight Checklist Version
If you want something quick and enforceable:
NoSELECT *Proper joins and filters
Indexes support predicates and joins
Execution plan reviewed
No large table scans (unless justified)
Tested with realistic data volume
No obvious scalability risks
Concurrency impact considered
Final Thought
The biggest mistake in SQL reviews is treating them like code reviews. They’re not. SQL performance lives at the intersection of data distribution, indexing, and optimizer behavior. A good workflow forces reviewers to engage with all three—not just the syntax.
Addendum
Additional EXPLAIN Tables
| Table | Description |
|---|---|
| DSN_COLDIST_TABLE | contains non-uniform column group statistics that are obtained dynamically by Db2 from non-index leaf pages. |
| DSN_DETCOST_TABLE | contains information about detailed cost estimation of the mini-plans in a query. |
| DSN_FILTER_TABLE | contains information about how predicates are used during query processing |
| DSN_FUNCTION_TABLE | contains descriptions of functions that are used in specified SQL statements |
| DSN_KEYTGTDIST_TABLE | contains non-uniform index expression statistic that are obtained dynamically by the Db2 optimizer |
| DSN_FUNCTION_TABLE | contains descriptions of functions that are used in specified SQL statements |
| DSN_PGRANGE_TABLE | contains information about qualified partitions for all page range scans in a query |
| DSN_PGROUP_TABLE | contains information about about the parallel groups in a query |
| DSN_PREDICAT_TABLE | contains information about all of the predicates in a query; also used as input when you issue a BIND QUERY command to override the selectivity of predicates for matching SQL statements |
| DSN_PREDICATE_SELECTIVITY | contains information about the selectivity of predicates that are used for access path selection; also used as an input table for the BIND QUERY command when selectivity overrides are specified |
| DSN_PTASK_TABLE | contains information about all of the parallel tasks in a query |
| DSN_QUERYINFO_TABLE | contains information about the eligibility of query blocks for automatic query rewrite, information about the MQTs that are considered for eligible query blocks, reasons why ineligible query blocks are not eligible, and information about acceleration of query blocks |
| DSN_QUERY_TABLE | contains information about a SQL statement, and displays the statement before and after query transformation |
| DSN_SORTKEY_TABLE | contains information about sort keys for all of the sorts required by a query |
| DSN_SORT_TABLE | contains information about sort operations required by a query |
| DSN_STATEMENT_CACHE_TABLE | contains information about the SQL statements in the statement cache, information captured as the results of an EXPLAIN STATEMENT CACHE ALL statement |
| DSN_STATEMENT_TABLE | contains information about the estimated cost of specified SQL statements |
| DSN_STATFEEDBACK | contains recommendations for capturing missing or conflicting statistics that are defined during EXPLAIN |
| DSN_STRUCT_TABLE | contains information about all of the query blocks in a query |
| DSN_VIEWREF_TABLE | contains information about all of the views and materialized query tables that are used to process a query |


