Wednesday, July 08, 2015

Influencing the DB2 Optimizer: Part 2 - Standard Methods

Last week, in Part 1 of this series, we introduced the concept and the 5 high-level methods of influencing the DB2 optimizer's access path choices. In today's post, we will tackle the first approach, which is using standard, DB2-based methods.

Of all the methods for influencing the DB2 optimizer, standard DB2 methods are the only mandatory ones. Try all the standard methods covered in this section before attempting any of the other methods (that we will cover in later installments). There are several reasons for this precaution.

The standard methods place the burden for generating optimal access paths on the shoulders of DB2, which is where it usually belongs. They also use IBM-supported techniques available for every version and release of DB2. Finally, these methods generally provide the greatest gain for the smallest effort.

There are four (4) standard methods for tuning DB2 access paths. The first method is ensuring that accurate statistics are available using the RUNSTATS utility and the BIND or REBIND command. RUNSTATS populates the DB2 Catalog with statistics that indicate the state of your DB2 objects, including the following:
  • Their organization
  • Clustering information
  • The cardinality of table spaces, tables, columns, and indexes
  • The range of values for columns
  • Skew and data distribution details

All of these factors are considered by the optimizer when it chooses what it deems to be the optimal access path for a given SQL statement. We will not delve deeply into all of the options at your disposal when running RUNSTATS in this series. 

A good DB2 tuning strategy is to execute RUNSTATS at least once for every table space, table, column, and index known to your DB2 subsystem. Schedule regular RUNSTATS executions for all DB2 objects that are not read-only. This keeps the DB2 Catalog information current, enabling proper access path selection. If you do not run RUNSTATS whenever your data changes significantly, the DB2 optimizer will not have accurate information upon which to rely when determining optimal access paths. And that means your performance will probably suffer.

The second standard method for tuning DB2 access paths is ensuring that the DB2 objects are properly organized. Disorganized objects, if properly reorganized, might be chosen for an access path. An object is disorganized when data modification statements executed against the object cause data to be stored in a non-optimal fashion, such as non-clustered data or data that exists on a different page than its RID, thereby spanning more than one physical page. To organize these objects more efficiently, run the REORG utility, followed by RUNSTATS and REBIND.

Use the DB2 Catalog and Real Time Statistics (RTS) to determine when your DB2 table spaces and indexes need to be reorganized. A few guidelines you might follow include: 

  • Reorganize a table space when the cluster ratio of its clustering index falls below 95%. (Schedule this so that it does not affect system performance and availability.)
  • Reorganize indexes when there have been a lot of INSERTs and DELETEs since the last time it was reorganized. As data is added to and removed from an index, it can cause disorganization that impacts the performance of queries using the index. The RTS columns
REORGINSERTS and REORGDELETES can be examined to ascertain the number of index entries inserted or deleted since the index was reorganized. A good rule of thumb is to consider reorganizing an index when 25% or more entries have been inserted or deleted.
  • Another index-related statistic to pay particular attention to is R
  • REORGAPPENDINSERT, which contains the number of inserts into an index since the last reorganization for which the index key was higher than any existing key value. If this column consistently grows, you have identified an object where data is inserted using an ascending key sequence. You might consider lowering the free space for such objects, because the free space is wasted space if inserts are always done in ascending key sequence. You should also consider reorganizing the index when 20% or more entries have been appended.
  • In addition, the number of index leaf page splits should also be examined when considering index reorganization. The RTS column that indicates this metric is REORGLEAFAR. Think about reorganizing the index at 10% or more.
  • Consider reorganizing any DB2 table space or index when its data set is in multiple physical extents. Before reorganizing, ensure that space allocations have been modified to cause all data to be stored in a single extent.

  • You may want to reorganize more frequently than indicated here by creating scheduled REORG jobs for heavily accessed or critical DB2 table spaces and indexes. This limits performance problems due to disorganized DB2 objects and reduces the number of reorganizations that must be manually scheduled or submitted by a DBA or performance analyst. 

    Also, keep in mind that online REORGs can be run thereby mitigating the impact ot availability that reorganization can cause. For your largest and most important table spaces you might want to consider advanced REORG utilities such as those offered by third party vendors.

    The third standard method for tuning DB2 access paths is to encourage parallelism. Consider changing simple and segmented table spaces to partitioned table spaces to encourage I/O, CPU, and Sysplex parallelism. Furthermore, it may be advantageous to repartition already partitioned table spaces to better align ranges of values, thereby promoting better parallel access.

    Note: Partitioning is not mandatory for parallel access, but it does aid in parallel access path selection and execution.

    The fourth standard method for tuning DB2 access paths is ensuring that there are proper indexes by creating new indexes or dropping unnecessary and unused indexes. DB2 relies on indexes to achieve optimum performance.

    Analyze the predicates in your SQL statements to determine whether there is an index that DB2 can use. Indexes can be used efficiently by DB2 if the first column of the index key is specified in an indexable predicate in the SQL statement. If no index meets these requirements, consider creating one. As you index more columns referenced in predicates, performance generally increases.

    Dropping unused indexes is another critical part of application tuning. Every table INSERT and DELETE incurs I/O to every index defined for that table. Every UPDATE of indexed columns incurs I/O to every index defined for that column. If an index is not being used, drop it. This reduces the I/O incurred for data modification SQL statements, reduces RUNSTATS resource requirements, and speeds REORG and RECOVER processing.

    If table data is accessed by random reads of small amounts of data, consider building a hash to improve the performance of those reads. 

    You should also know which predicates are Stage 1 and which are Stage 2 and try to favor Stage 1 to improve the performance of your queries.

    That's all for today... but tune in again next week for more ways to influence DB2 access paths...

    No comments: