Tuesday, July 21, 2015
Influencing the DB2 Optimizer: Part 4 - OPTIMIZE FOR n ROWS
Thursday, July 16, 2015
Influencing the DB2 Optimizer: Part 3 - Tweaking SQL Statements
- Equality, in which a column is tested for equivalence to another column, a variable, or a literal
- Ranges, in which a column is tested against a range of values (for example, greater than, less than, or BETWEEN)
- IN, where a column is tested for equivalence against a list of values (the first three types are illustrated in the example after this list)
- Stage 2 predicates
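As an illustration of the first three predicate types, here is a sketch of a query against the EMPLOYEE table used later in this series; the SALARY and WORKDEPT columns are hypothetical, added only for this example:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'               -- equality
  AND SALARY BETWEEN 50000 AND 90000  -- range (SALARY is hypothetical)
  AND WORKDEPT IN ('A00', 'B01')      -- IN-list (WORKDEPT is hypothetical)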
Wednesday, July 08, 2015
Influencing the DB2 Optimizer: Part 2 - Standard Methods
- Their organization
- Clustering information
- The cardinality of table spaces, tables, columns, and indexes
- The range of values for columns
- Skew and data distribution details
- Reorganize a table space when the cluster ratio of its clustering index falls below 95%. (Schedule this so that it does not affect system performance and availability.)
- Reorganize indexes when there have been a lot of INSERTs and DELETEs since the last reorganization. As data is added to and removed from an index, disorganization can build up and impact the performance of queries using the index. The Real Time Statistics (RTS) columns REORGINSERTS and REORGDELETES can be used to track this activity.
Also, keep in mind that online REORGs can be run, thereby mitigating the impact to availability that reorganization can cause. For your largest and most important table spaces, you might want to consider advanced REORG utilities such as those offered by third-party vendors.
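As a rough sketch of how to find reorganization candidates, a query like the following against the DB2 Catalog identifies clustering indexes whose cluster ratio has dropped below the 95% guideline mentioned above (CLUSTERRATIOF stores the ratio as a value between 0 and 1):

SELECT CREATOR, NAME, TBNAME, CLUSTERRATIOF
FROM SYSIBM.SYSINDEXES
WHERE CLUSTERING = 'Y'        -- clustering indexes only
  AND CLUSTERRATIOF < 0.95    -- below the 95% guideline
ORDER BY CLUSTERRATIOF;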
You should also know which predicates are Stage 1 and which are Stage 2 and try to favor Stage 1 to improve the performance of your queries.
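For example, a predicate that applies a function to a column is evaluated at Stage 2, but it can often be rewritten into an equivalent Stage 1 (and indexable) form:

-- Stage 2: the column is buried inside a function
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE SUBSTR(LAST_NAME, 1, 1) = 'M';

-- Stage 1 and indexable: the same logic as a LIKE predicate
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE LAST_NAME LIKE 'M%';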
That's all for today... but tune in again next week for more ways to influence DB2 access paths...
Wednesday, July 01, 2015
Influencing the DB2 Optimizer: Part 1
- Updating DB2 Catalog statistics
- Standard, DB2-based methods
- Tweaking SQL statements
- Specifying the OPTIMIZE FOR n ROWS clause (see the example after this list)
- Using OPTHINT to indicate that an access path in the PLAN_TABLE should be chosen
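As a quick preview of the OPTIMIZE FOR n ROWS technique (covered in detail in Part 4 above), the clause is simply appended to the statement to tell DB2 to optimize for retrieving the first n rows; a minimal sketch using the EMPLOYEE table from later in this series:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'
OPTIMIZE FOR 10 ROWS;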
Tuesday, April 07, 2015
SQL Performance Basics: Part 4, The Order of Predicates
So, before we continue, let's review the order in which DB2 evaluates predicates at execution time. DB2 will evaluate indexable predicates first: matching predicates before non-matching. Then, Stage 1 predicates, and finally Stage 2 predicates. Within each of these four groups, DB2 will evaluate equal predicates, then BETWEEN and NOT NULL predicates, and finally, any other predicates. If more than one predicate exists within a group, then DB2 will evaluate them in the physical order in which they are coded in the SQL statement.
The re-ordering of predicates to take advantage of this situation should be considered only as a last resort. When implemented, the technique will usually shave only a little bit from the query's execution time. It is also important to note that predicate order will not impact a query's access path: it will remain unchanged (as shown in the PLAN_TABLE).
Now, how can we use this to our advantage?
Consider the following query:
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE SEX = 'M'
AND TITLE = 'MANAGER'
For the purposes of this discussion, assume that no index exists for either of the columns coded in the predicates. They are therefore the same type: Stage 1, equal predicates. Furthermore, we know our data: in our organization, there is approximately a 50-50 split between males and females, and 15% of all employees are managers.
To optimize this query then, we can swap the two predicates to achieve better performance. So the query becomes:
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'
AND SEX = 'M'
Why should this query outperform the previous version? Well, assume we have 100,000 employees. If DB2 retrieves 50% of the rows (SEX = 'M') and then retrieves 15% of those 50%, we will have processed 57,500 rows:
( 100000 * 0.5 ) + ( ( 100000 * 0.5 ) * 0.15 ) = 57,500
But, if instead, DB2 were to retrieve 15% of the rows (TITLE = 'MANAGER') and then 50% of those, we will have processed only 22,500 rows:
( 100000 * 0.15 ) + ( ( 100000 * 0.15 ) * 0.5 ) = 22,500
Obviously, it is better for fewer rows to qualify early, thereby reducing the answer set and the number of rows that will have to be subsequently scanned.
Sunday, August 11, 2013
The Top Ten DB2 Development Best Practices
The fewer times you need to make calls across the network, the better your program will perform.
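One way to cut down on those calls, as a sketch, is multi-row FETCH with a rowset cursor, retrieving (say) 100 rows per call instead of one. Here EMPNO is an assumed column and the host variable arrays are hypothetical names:

DECLARE CSR1 CURSOR WITH ROWSET POSITIONING FOR
  SELECT EMPNO, LAST_NAME
  FROM EMPLOYEE;

FETCH NEXT ROWSET FROM CSR1
  FOR 100 ROWS
  INTO :HVA-EMPNO, :HVA-LASTNAME;  -- host variable arrays (hypothetical)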
Try to read each data item only once. The more times you access the same data, the worse your performance will be.
Let SQL do the work... DB2 can optimize SQL better than you can optimize your programming language of choice.
SQL is a set-based language. Each SQL statement can operate on multiple rows of data at once. And joins are more efficient than opening multiple cursors and performing "master file" processing logic on them.
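For example, instead of opening separate cursors on EMPLOYEE and a department table and matching the rows in program logic, let DB2 join them; DEPARTMENT and its columns are hypothetical here:

SELECT E.LAST_NAME, E.FIRST_NAME, D.DEPT_NAME
FROM EMPLOYEE E
     INNER JOIN DEPARTMENT D        -- DEPARTMENT is hypothetical
        ON E.WORKDEPT = D.DEPT_ID   -- as are WORKDEPT, DEPT_ID, DEPT_NAME
WHERE E.TITLE = 'MANAGER';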
Even though DB2 no longer automatically degrades non-matching predicates to Stage 2, it is still a best practice to match the data type and length for columns and host variables that participate in predicates.
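A minimal sketch of the idea, assuming the EMPLOYEE table has an EMPNO column defined as CHAR(6):

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE EMPNO = :HV-EMPNO;
-- declare HV-EMPNO to match EMPNO exactly (e.g., PIC X(6) in COBOL);
-- a mismatched type or length can hinder optimal index matching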
Without proper documentation, application maintenance becomes difficult... especially if you used any SQL tuning tricks or techniques.
If you don't check for an error, you may be processing with bad, incorrect, or missing data.
If you do not examine the access paths formulated by the DB2 optimizer for your SQL, then how do you know how efficient (or not) your code is (or will be)?
Thursday, May 13, 2010
IDUG NA 2010, Days Two and Three
I’ve been running around kinda busy the past couple of days here at IDUG in Tampa, so I got a bit behind in blogging about the conference. So, today I’m combining two days of thoughts into one blog post.
(For a summary of IDUG Day One, click here.)
I started off day two by attending Brent Gross’ presentation on extracting the most value from .NET and ODBC applications. Brent discussed some of the things to be aware of when developing with .NET, an important one being that .NET is designed to work in a disconnected data architecture. So applications will not go through the data a row at a time; instead, the data is sent to the application, which processes it there. As an old mainframe DBA, I heard alarm bells ringing.
I also got the opportunity to hear Dave Beulke discuss Java DB2 developer performance best practices. Dave delivered a lot of quality information, including the importance of developing quality code because Java developers reuse code – and you don’t want bad code being reused everywhere, right?
Dave started out mentioning how Java programmers are usually very young and do not have a lot of database experience. So DBAs need to get some Java knowledge and work closely with Java developers to ensure proper development. He also emphasized the importance of understanding the object-to-relational mapping method.
From a performance perspective Dave noted the importance of understanding the distributed calls (how many, where located, and bandwidth issues), controlling commit scope, and making sure your servers have sufficient memory. He also indicated that it is important to be able to track how many times Java programs connect to the database. He suggested using a server connection pool and to be sure that threads are always timed out after a certain period of time.
And I’d be remiss if I didn’t note that Dave promoted the use of pureQuery, which can be used to turn dynamic JDBC into static requests. Using pureQuery can improve performance (perhaps as much as 25 percent), as well as simplifying debugging & maintenance.
Dave also discussed how Hibernate can cause performance problems. Which brings me to the first session I attended on day three, John Mallonee’s session titled Wake Up to Hibernate. Hibernate is a persistence layer that maps Java objects to relational tables. It provides an abstraction layer between DB2 and your program. And it can also be thought of as a code generator. Hibernate plugs into popular IDEs, such as Eclipse and Rational tools. It is open source, and part of JBoss Enterprise Middleware (JBoss is a division of Red Hat).
John walked attendees through Hibernate, discussing the Java API for persistence, its query capabilities (including HQL, or Hibernate Query Language), and configuration issues. Examples of things that are configurable include JDBC driver, connection URL, user name, DataSource, connection pool settings, SQL controls (logging, log formatting), and the mapping file location.
HQL abstracts SQL. It is supposed to simplify query coding, but from what I saw of it in the session, I am dubious. John warned, too, that when HQL is turned into SQL, the SQL won’t necessarily look the way you are used to seeing it. He recommended setting up the configuration file so that it formats the generated SQL; otherwise it won’t be very readable. John noted that one good thing about HQL is that you cannot easily write code with literals in it; it forces you to use parameter markers.
OK, so why can Hibernate be problematic? John talked about four primary concerns:
- SQL is obscured
- Performance can be bad with generated code
- Hibernate does not immediately support new DB2 features
- The learning curve can be high
But he also noted that as you learn more about these problems, and about how Hibernate works, things tend to improve. Finally (at least with regard to Hibernate), John recommended using HQL for simple queries, native SQL for advanced queries, JDBC for special situations, and native DB2 SQL (e.g., in a stored procedure) to achieve the highest performance.
I also attended two presentations on the DB2 for z/OS optimizer. Terry Purcell gave his usual standout performance on optimization techniques. I particularly enjoyed his advice on what to say when someone asks why the optimizer chose a particular path: “Because it thinks that is the lowest cost access path.” After all, the DB2 optimizer is a cost-based optimizer. So if it didn’t choose the “best” path then chances are you need to provide the optimizer with better statistics.
And Suresh Sane did a nice job in his presentation, discussing the optimization process and walking through several case studies.
All-in-all, it has been a very productive IDUG conference… but then again, I didn’t expect it to be anything else! Tomorrow morning I deliver my presentation titled “The Return of the DB2 Top Ten Lists.” Many of you have seen my original DB2 top ten lists presentation, but this one is a brand new selection of top ten lists… and I’m looking forward to delivering it for the first time at IDUG…