The Db2 Portal Blog: access paths

Showing posts with label access paths. Show all posts

Wednesday, July 08, 2015

Influencing the DB2 Optimizer: Part 2 - Standard Methods

Last week, in Part 1 of this series, we introduced the concept and the 5 high-level methods of influencing the DB2 optimizer's access path choices. In today's post, we will tackle the first approach, which is using standard, DB2-based methods.

Of all the methods for influencing the DB2 optimizer, standard DB2 methods are the only mandatory ones. Try all the standard methods covered in this section before attempting any of the other methods (that we will cover in later installments). There are several reasons for this precaution.

The standard methods place the burden for generating optimal access paths on the shoulders of DB2, which is where it usually belongs. They also use IBM-supported techniques available for every version and release of DB2. Finally, these methods generally provide the greatest gain for the smallest effort.

There are four (4) standard methods for tuning DB2 access paths. The first method is ensuring that accurate statistics are available using the RUNSTATS utility and the BIND or REBIND command. RUNSTATS populates the DB2 Catalog with statistics that indicate the state of your DB2 objects, including the following:

Their organization
Clustering information
The cardinality of table spaces, tables, columns, and indexes
The range of values for columns
Skew and data distribution details

All of these factors are considered by the optimizer when it chooses what it deems to be the optimal access path for a given SQL statement. We will not delve deeply into all of the options at your disposal when running RUNSTATS in this series.

A good DB2 tuning strategy is to execute RUNSTATS at least once for every table space, table, column, and index known to your DB2 subsystem. Schedule regular RUNSTATS executions for all DB2 objects that are not read-only. This keeps the DB2 Catalog information current, enabling proper access path selection. If you do not run RUNSTATS whenever your data changes significantly, the DB2 optimizer will not have accurate information upon which to rely when determining optimal access paths. And that means your performance will probably suffer.

The second standard method for tuning DB2 access paths is ensuring that the DB2 objects are properly organized. Disorganized objects, if properly reorganized, might be chosen for an access path. An object is disorganized when data modification statements executed against the object cause data to be stored in a non-optimal fashion, such as non-clustered data or data that exists on a different page than its RID, thereby spanning more than one physical page. To organize these objects more efficiently, run the REORG utility, followed by RUNSTATS and REBIND.

Use the DB2 Catalog and Real Time Statistics (RTS) to determine when your DB2 table spaces and indexes need to be reorganized. A few guidelines you might follow include:

Reorganize a table space when the cluster ratio of its clustering index falls below 95%. (Schedule this so that it does not affect system performance and availability.)
Reorganize indexes when there have been a lot of INSERTs and DELETEs since the last time it was reorganized. As data is added to and removed from an index, it can cause disorganization that impacts the performance of queries using the index. The RTS columns

REORGINSERTS and REORGDELETES can be examined to ascertain the number of index entries inserted or deleted since the index was reorganized. A good rule of thumb is to consider reorganizing an index when 25% or more entries have been inserted or deleted.

Another index-related statistic to pay particular attention to is R

REORGAPPENDINSERT, which contains the number of inserts into an index since the last reorganization for which the index key was higher than any existing key value. If this column consistently grows, you have identified an object where data is inserted using an ascending key sequence. You might consider lowering the free space for such objects, because the free space is wasted space if inserts are always done in ascending key sequence. You should also consider reorganizing the index when 20% or more entries have been appended.

In addition, the number of index leaf page splits should also be examined when considering index reorganization. The RTS column that indicates this metric is REORGLEAFAR. Think about reorganizing the index at 10% or more.

Consider reorganizing any DB2 table space or index when its data set is in multiple physical extents. Before reorganizing, ensure that space allocations have been modified to cause all data to be stored in a single extent.

You may want to reorganize more frequently than indicated here by creating scheduled REORG jobs for heavily accessed or critical DB2 table spaces and indexes. This limits performance problems due to disorganized DB2 objects and reduces the number of reorganizations that must be manually scheduled or submitted by a DBA or performance analyst.

Also, keep in mind that online REORGs can be run thereby mitigating the impact ot availability that reorganization can cause. For your largest and most important table spaces you might want to consider advanced REORG utilities such as those offered by third party vendors.

The third standard method for tuning DB2 access paths is to encourage parallelism. Consider changing simple and segmented table spaces to partitioned table spaces to encourage I/O, CPU, and Sysplex parallelism. Furthermore, it may be advantageous to repartition already partitioned table spaces to better align ranges of values, thereby promoting better parallel access.

Note: Partitioning is not mandatory for parallel access, but it does aid in parallel access path selection and execution.

The fourth standard method for tuning DB2 access paths is ensuring that there are proper indexes by creating new indexes or dropping unnecessary and unused indexes. DB2 relies on indexes to achieve optimum performance.

Analyze the predicates in your SQL statements to determine whether there is an index that DB2 can use. Indexes can be used efficiently by DB2 if the first column of the index key is specified in an indexable predicate in the SQL statement. If no index meets these requirements, consider creating one. As you index more columns referenced in predicates, performance generally increases.

Dropping unused indexes is another critical part of application tuning. Every table INSERT and DELETE incurs I/O to every index defined for that table. Every UPDATE of indexed columns incurs I/O to every index defined for that column. If an index is not being used, drop it. This reduces the I/O incurred for data modification SQL statements, reduces RUNSTATS resource requirements, and speeds REORG and RECOVER processing.

If table data is accessed by random reads of small amounts of data, consider building a hash to improve the performance of those reads.

You should also know which predicates are Stage 1 and which are Stage 2 and try to favor Stage 1 to improve the performance of your queries.

That's all for today... but tune in again next week for more ways to influence DB2 access paths...

Wednesday, July 01, 2015

Influencing the DB2 Optimizer: Part 1

Today is July 1, 2015 - mid way through the year and the beginning a fresh, new month. With that in mind, I'm kicking off a new series of blog posts here on the various methods of influencing the DB2 Optimizer's access path decisions. The bulk of the material will be excerpted from my book - DB2 Developer's Guide - which just so happens to be on sale for 40% over the 4ht of July holidays at InformIT if you're interested in picking up a copy.

Before going into the various methods that can be used to alter access paths or influence access path selection, let's first take a moment to laud the DB2 optimizer and the fine minds that built it. The DB2 optimizer is one of the most intricate pieces of software on the market. It does an admirable job of optimizing SQL requests. To achieve this level of success, the optimizer contains a great deal of performance-specific expertise. For example, the optimizer estimates both elapsed times and CPU times when choosing an access path. When a SQL statement is rebound, the optimizer might choose a new access path that increases CPU time but decreases elapsed time.

Now many shops may choose to enhance elapsed time at the expense of additional CPU use because elapsed time has a measurable effect on user productivity. In other words, it can be good to trade off CPU cycles for user satisfaction, and the DB2 optimizer attempts to accomplish this. Of course, if both CPU and elapsed time can be reduced, the optimizer tries to do that, too.

But the DB2 optimizer is not infallible. Sometimes the application analyst or DBA understands the nature of the data better than DB2. And that is the reason that there are ways to influence the optimizer into choosing an access path that you know is a better one but the optimizer thinks is a worse one.

As the functionality and complexity of the optimizer is enhanced from release to release of DB2, the need to trick the optimizer in this way invariably diminishes. But until the day when software is perfect, the need to tweak it will remain (and I'm pretty sure that will be the case throughout my lifetime).

There are five ways to influence the optimizer’s access path decisions:

Updating DB2 Catalog statistics
Standard, DB2-based methods
Tweaking SQL statements
Specifying the OPTIMIZE FOR n ROWS clause
Using OPTHINT to indicate that an access path in the PLAN_TABLE should be chosen

Over the course of the ensuing weeks, we will examine each of these methods, and more. So stay tuned as we delve into the techniques at your disposal to influence the choices made by the DB2 optimizer.

Tuesday, January 20, 2015

Approaches to DB2 Access Path Management

BIND and REBIND are crucially important components as you attempt to assure efficient DB2 applications. Because the BIND/REBIND process determines exactly how your DB2 data is accessed it is important that you develop an appropriate strategy for when and how to REBIND your programs.

There are several common REBIND approaches taken by DB2 users. By far, the best approach is to REBIND your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change.

Other approaches include REBINDing only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to REBIND automatically after a regular period of time (days, weeks, months, etc.). This approach can work if the period of time is wisely chosen based on the application data – but it still can pose administrative issues.

The final approach can be summarized as “if it ain’t broke don’t fix it!” This is the worst of the several approaches discussed here. The biggest problem with this approach is that you are penalizing every program in your subsystem for fear that a program or two may have a degraded access path. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data changes.

Of course, the possibility of degraded performance is real – and that is why this approach has been adopted at some sites. The problem is being able to find which statements have degraded. In an ideal world we would be to be able to review the access path changes beforehand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.

Anyway, let’s go back to the best approach again, and that is to REBIND on a regular basis as your data changes. This approach has become known as the Three Rs. To implement this approach you:

Regularly reorganize the data to ensure that it is optimally structured.
Follow that with RUNSTATS to be sure that the reorganized state of the data is reflected in the DB2 Catalog.
And follow that with a REBIND for all the application programs that access the data structures impacted by the REORG and RUNSTATS.

At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions. DB2 is unlikely to make the same access path choice as your data grows – and as patterns within the data change.

By REBINDing you can generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. Additionally, as DB2 changes are introduced (PTFs, new version/release) optimizer improvements and new access techniques can be incorporated into the access paths. That is, if you never REBIND, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

Adopting the Three R’s approach can pose additional questions. For example, when should you reorganize? In order to properly determine when a REORG is needed you’ll have to look at statistics. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps it should be at least 4 R’s – in other words:

RTS (or RUNSTATS)
REORG
RUNSTATS
REBIND

Now it is true that some folks don’t rely on statistics to schedule a REORG. Instead, they just build the JCL to REORG their database objects when they create the object. So they create a table space then build the REORG job and schedule it to run monthly, or quarterly, or on some regular basis. This is better than no REORG at all, but it is probably not the best approach because you are most likely either reorganizing too soon (in which case you waste the CPU cycles to do the REORG) or you are reorganizing too late (in which case performance is suffering for a period of time before the REORG runs). Better to base your REORGs off of statistics and thresholds using either RUNSTATS or RTS.

Without accurate statistics there is little hope that the optimizer will formulate the best access path to retrieve your data. If the optimizer does not have accurate information on the size, organization, and particulars of your data then it will be creating access paths based on either default or inaccurate statistics. Incorrect statistics will cause bad choices to be made – such as choosing a merge-scan join when a nested loop join would be better, or failure to invoke sequential prefetch, or using the wrong index – or no index at all. And the problem of inaccurate statistics is pervasive. There are shops out there that never, or rarely, run RUNSTATS to gather up-to-date statistics. Make sure yours is not one of those shops!

When should you run RUNSTATS? One answer is “As frequently as possible based on how often your data changes.” To do this you will need to know a thing or two about your data growth patterns: what is its make-up, how is it used, how fast does it grow, and how often does it change? These patterns will differ for every table space in your system.

Next we need to decide when to REBIND? The best answer for this is when statistics have changed significantly enough to change access paths. When we know that data has significantly changed it makes sense to REBIND after the RUNSTATS completes. But the trick is determining exactly when we have a “significant” change in our data. Without an automated method of comparing and contrasting statistics (or even better yet, access paths) coming up with an answer in a manual way can be time-consuming and error-prone – especially if we have thousands of DB2 programs to manage.

As we REBIND, we always must be on alert for rogue access paths. A rogue access path is created when the optimizer formulates a new access path that performs worse than the previous access path. This can happen for a variety of reasons. Of course, number one is that the optimizer, though good, is not perfect. So mistakes can happen. Other factors can cause degraded access paths, too. The access paths for volatile tables depend on when you run the RUNSTATS. Volatile tables are those that start out empty, get rows added to them during processing, and are emptied out at the end of the day. And, of course, if the catalog or statistics are not accurate we can get problems, too.

So adopting the Four Rs approach implies that you will have to develop a methodology for reviewing your access paths and taking care of any “potential” problem access paths. Indeed, the Four Rs becomes The Five Rs as we add a step to review the access paths after REBINDing to make sure that there are no rogue access paths:

Start with Real Time Stats (or RUNSTATS) to determine when to REORG.
Then we Reorganize the needy table spaces (and indexes)
After reorganizing, run RUNSTATS to pick up current statistics,
Follow that with the BEBINDs.
Then we need that fifth R – which is to Review the access paths generated by the REBIND.

The review is of utmost importance because the optimizer can make mistakes. And, of course, so can you. But your users will not call you when performance is better (or the same). They only dial your numbers when performance gets worse. As such, proactive shops will put best practices in place to test REBIND results comparing the before and after impact of the optimizer’s choices.

Tuesday, July 08, 2014

DB2 Application Performance Management

Assuring optimal performance for database applications can be a tricky thing. In today's blog I ruminate on the high-level issues involved in optimizing your DB2 for z/OS applications.

Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked what is the single most important or stressful aspect of their job, DBAs typically respond "assuring optimal performance." Indeed, a Forrester Research survey indicates that performance and troubleshooting tops the list of most challenging DBA tasks.

But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:

· the system (that is, the DBMS itself, the network, and the O/S),

· the database (that is, the DDL and database schema), and

· the application (that is, the SQL and program logic).

Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application. Furthermore, I will discuss this in terms of DB2 for z/OS.

So where do we begin? For DB2, a logical starting point is with BIND Parameters. There are many parameters and values that must be chosen from and specified when you bind a DB2 application program. The vast array of options at our disposal can render the whole process extremely confusing -- especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. You know what I’m talking about, parameters like ACQUIRE, RELEASE, VALIDATE, and DEGREE.

I will not delve into the myriad bind options and give you advice on which to use when. There are many articles and books, as well as the IBM DB2 manuals that you can use to guide you along that path. Suffice it to say, that there are some standard parameters and values that should be chosen most of the time in most situations. As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as: CICS transaction, DB2 batch, and BI/analytical query can be presented to the developer and then, based on which of the various types of programs and environments that are available, the canned script can choose the proper bind options. Doing so can greatly diminish the problems that can be encountered when the "wrong" parameters or values are chosen at Bind time.

Before concluding this short section on Bind parameters I want to give one important piece of advice: In production, always Bind your plans and packages specifying EXPLAIN YES. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what DB2 is doing and it makes application performance tuning much more difficult.

Access Path Management

Bind and Rebind are important components to achieve optimal DB2 application performance. This is so because these commands are what determine the access paths to the data requested by your program. So it is vitally important that you create a strategy for when and how to Rebind your programs. There are several common approaches. The best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment.

Other approaches include Rebinding only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to Rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data â€“ but it still can pose significant administrative issues.

The final approach -- the worst of the bunch -- comes from the if it ain’t broke don’t fix it school of thought. Basically, it boils down to (almost) never rebinding your programs. This approach penalizes every program for fear that a single program (or two) might experience a degraded access path. Oh, the possibility of degraded performance is real and that is why this approach has been adopted by some. And it can be difficult to find which statements may have degraded after a Rebind. The ideal situation would allow us to review the access path changes before hand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.

Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions.

By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to DB2 (new releases/PTFs) optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

To adopt the Three R’s you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need 4 R’s:

RUNSTATS or preferably, RTS
REORG
RUNSTATS
REBIND

But is this enough? Probably not because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another R to Review the access paths generated by the REBIND. As we mentioned, the optimizer can make mistakes. And, of course, so can you. Users don't call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.

So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal DB2 application performance.

Tuning the Code

Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with -- and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:

Let SQL do the work instead of the program. For example, code an SQL join instead of two cursors using program logic to join.
Simpler is generally better, but complex SQL can be very efficient.
Retrieve only the columns required, never more.
Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
When joining tables, always provide join predicates. In other words, avoid Cartesian products.
Favor using Stage 1 and Indexable predicates.
But favor Stage 2 predicates over application logic.
Avoid sorting (if possible) by creating indexes for ORDER BY and GROUP BY operations.
Avoid black boxes -- that is, avoid I/O routines that are called by programs instead of using embedded SQL.
Minimize deadlocks by updating tables in the same sequence in every program.
Issue data modification statements (INSERT, UPDATE, DELETE) as close as possible to the COMMIT statement as possible.
Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.

Even if you follow the guidelines in this bulleted list, there will still be numerous opportunities for you to tune SQL for performance. To tune SQL you must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the plan tables. IBM offers Data Studio (as a free download) with a visual explain capability that can simplify this process. But you will also have to accumulate experience as to which SQL formulations work more efficiently than others. This skill will come with time and on-the-job learning.

Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, Visual Basic or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.

Bottom Line

Although DBAs must understand all three aspects of database performance management concentrating on the application aspects of performance will most likely provide the most bang-for-the-buck. Of course, we have only touched the tip of the DB2 application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your DB2 applications.

Good luck with DB2 for z/OS and happy performance tuning!

Monday, February 17, 2014

Rebinding for Optimal DB2 Access Paths

The access paths formulated by the DB2 optimizer during the BIND and REBIND processes are critical to your application performance. It is these access paths that determine not only how DB2 data is accessed by your program, but how efficiently it is accessed. Whether you are preparing a new program, implementing changes into your existing DB2 applications, upgrading to a new version of DB2, or simply trying to achieve optimum performance for existing applications, an exhaustive and thorough REBIND management policy should be of paramount importance.

However, many organizations are not doing everything possible to keep access paths up-to-date with the current state of their data. So what is the best practice approach for rebinding your DB2 programs? The answer is “The Five R’s.” This methodology of regular rebinding followed by a review of your access paths required the following steps:

Start with an inspection of the RTS (Real Time Statistics) to determine which objects need to be reorganized.
Follow that up by running a REORG on table spaces and indexes as appropriate based on the statistics.
After reorganizing, run RUNSTATS (to ensure the DB2 Catalog is up-to-date).
Follow that with REBINDs of your programs.
Then we need that fifth R – which is to review the access paths generated by the REBIND.

For shops that have avoided rebinding for years this approach represents a significant change. So what new DB2 features are available to help? Well, several releases ago, back in DB2 9 for z/OS, plan stability was added. This feature enables you to save a backup version of your access paths as a precautionary measure. If any of the new access paths after rebinding are less efficient, the DBA can switch back to the backed up access paths. In order to implement this level of stability you can use the PLANMGMT parameter of the REBIND command. There are three options: NONE, BASIC, and EXTENDED. BASIC saves the previous access paths, and EXTENDED saves the previous and an original. You can use REBIND and the SWITCH parameter to revert back to the saved access paths when the new access paths cause degraded performance.

As of DB2 10 for z/OS you can tell DB2 to try to reused previous access paths for SQL statements whenever possible. This is called access path reuse and is implemented using the APREUSE bind option. When invoked, DB2 uses information about the previous access paths to create a hint.

When BIND PACKAGE or REBIND PACKAGE specifies APREUSE(ERROR), DB2 tries to locate the access path information from a package that has a matching identity. If no such package exists, DB2 tries to locate another recent version of the package that has the matching location, collection ID, and name. The APREUSE option only applies to statements that have identical statement text in both packages. Newly added statements and statements with text changes never reuse previous access paths.

Reusing access paths can help to minimize administrative and performance issues involved in rebinding.

Of course, there are products on the market which can be used to implement a proactive approach to rebinding. These products preview the new access paths and then run them through a rules system to determine if the new access will be improved, unchanged, or degraded. With this information we can rebind everything that would improve and avoid rebinding anything else until we can analyze the cause of the degradation. Using such an approach you should not have degraded access paths sneaking into your production environment.

Summary

At any rate, a systematic approach to DB2 binding and rebinding is necessary to assure optimal performance within your DB2 applications. This short blog entry covers some of the basics and recent changes to DB2 in this area.

Be sure to take your time and to plan your statistics-gathering and bind/rebind approach... or be ready to be in constant firefighter mode as you address poorly-performing SQL in your applications!