Tuesday, July 21, 2015
Influencing the DB2 Optimizer: Part 4 - OPTIMIZE FOR n ROWS
Thursday, July 16, 2015
Influencing the DB2 Optimizer: Part 3 - Tweaking SQL Statements
- Equality, in which a column is tested for equivalence to another column, a variable, or a literal
- Ranges, in which a column is tested against a range of values (for example, greater than, less than, or BETWEEN)
- IN, where a column is tested for equivalence against a list of values (the first three types are illustrated in the example after this list)
- Stage 2 predicates
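As an illustration of the first three predicate types, here is a sketch of a query against the EMPLOYEE table used later in this series; the SALARY and WORKDEPT columns are hypothetical, added only for this example:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'               -- equality
  AND SALARY BETWEEN 50000 AND 90000  -- range (SALARY is hypothetical)
  AND WORKDEPT IN ('A00', 'B01')      -- IN-list (WORKDEPT is hypothetical)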
Wednesday, July 08, 2015
Influencing the DB2 Optimizer: Part 2 - Standard Methods
- Their organization
- Clustering information
- The cardinality of table spaces, tables, columns, and indexes
- The range of values for columns
- Skew and data distribution details
- Reorganize a table space when the cluster ratio of its clustering index falls below 95%. (Schedule this so that it does not affect system performance and availability.)
- Reorganize indexes when there have been a lot of INSERTs and DELETEs since the last reorganization. As data is added to and removed from an index, disorganization can build up and impact the performance of queries using the index. The Real Time Statistics (RTS) columns REORGINSERTS and REORGDELETES can be used to track this activity.
Also, keep in mind that online REORGs can be run, thereby mitigating the impact to availability that reorganization can cause. For your largest and most important table spaces, you might want to consider advanced REORG utilities such as those offered by third-party vendors.
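As a rough sketch of how to find reorganization candidates, a query like the following against the DB2 Catalog identifies clustering indexes whose cluster ratio has dropped below the 95% guideline mentioned above (CLUSTERRATIOF stores the ratio as a value between 0 and 1):

SELECT CREATOR, NAME, TBNAME, CLUSTERRATIOF
FROM SYSIBM.SYSINDEXES
WHERE CLUSTERING = 'Y'        -- clustering indexes only
  AND CLUSTERRATIOF < 0.95    -- below the 95% guideline
ORDER BY CLUSTERRATIOF;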
You should also know which predicates are Stage 1 and which are Stage 2 and try to favor Stage 1 to improve the performance of your queries.
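For example, a predicate that applies a function to a column is evaluated at Stage 2, but it can often be rewritten into an equivalent Stage 1 (and indexable) form:

-- Stage 2: the column is buried inside a function
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE SUBSTR(LAST_NAME, 1, 1) = 'M';

-- Stage 1 and indexable: the same logic as a LIKE predicate
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE LAST_NAME LIKE 'M%';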
That's all for today... but tune in again next week for more ways to influence DB2 access paths...
Wednesday, July 01, 2015
Influencing the DB2 Optimizer: Part 1
- Updating DB2 Catalog statistics
- Standard, DB2-based methods
- Tweaking SQL statements
- Specifying the OPTIMIZE FOR n ROWS clause (see the example after this list)
- Using OPTHINT to indicate that an access path in the PLAN_TABLE should be chosen
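As a quick preview of the OPTIMIZE FOR n ROWS technique (covered in detail in Part 4 above), the clause is simply appended to the statement to tell DB2 to optimize for retrieving the first n rows; a minimal sketch using the EMPLOYEE table from later in this series:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'
OPTIMIZE FOR 10 ROWS;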
Tuesday, April 07, 2015
SQL Performance Basics: Part 4, The Order of Predicates
So, before we continue, let's review the order in which DB2 evaluates predicates at execution time. DB2 will evaluate indexable predicates first: matching predicates before non-matching. Then, Stage 1 predicates, and finally Stage 2 predicates. Within each of these four groups, DB2 will evaluate equal predicates, then BETWEEN and NOT NULL predicates, and finally, any other predicates. If more than one predicate exists within a group, then DB2 will evaluate them in the physical order in which they are coded in the SQL statement.
The re-ordering of predicates to take advantage of this situation should be considered only as a last resort. When implemented, the technique will usually shave only a little bit from the query's execution time. It is also important to note that predicate order will not impact a query's access path: it will remain unchanged (as shown in the PLAN_TABLE).
Now, how can we use this to our advantage?
Consider the following query:
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE SEX = 'M'
AND TITLE = 'MANAGER'
For the purposes of this discussion, assume that no index exists for either of the columns coded in the predicates. They are therefore the same type: Stage 1, equal predicates. Furthermore, we know our data: in our organization, there is approximately a 50-50 split between males and females, and 15% of all employees are managers.
To optimize this query then, we can swap the two predicates to achieve better performance. So the query becomes:
SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'
AND SEX = 'M'
Why should this query outperform the previous version? Well, assume we have 100,000 employees. If DB2 retrieves 50% of the rows (SEX = 'M') and then retrieves 15% of those 50%, we will have processed 57,500 rows:
( 100000 * 0.5 ) + ( ( 100000 * 0.5 ) * 0.15 ) = 57,500
But, if instead, DB2 were to retrieve 15% of the rows (TITLE = 'MANAGER') and then 50% of those, we will have processed only 22,500 rows:
( 100000 * 0.15 ) + ( ( 100000 * 0.15 ) * 0.5 ) = 22,500
Obviously, it is better for fewer rows to qualify early, thereby reducing the answer set and the number of rows that will have to be subsequently scanned.
Sunday, August 11, 2013
The Top Ten DB2 Development Best Practices
The fewer times you need to make calls across the network, the better your program will perform.
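One way to cut down on those calls, as a sketch, is multi-row FETCH with a rowset cursor, retrieving (say) 100 rows per call instead of one. Here EMPNO is an assumed column and the host variable arrays are hypothetical names:

DECLARE CSR1 CURSOR WITH ROWSET POSITIONING FOR
  SELECT EMPNO, LAST_NAME
  FROM EMPLOYEE;

FETCH NEXT ROWSET FROM CSR1
  FOR 100 ROWS
  INTO :HVA-EMPNO, :HVA-LASTNAME;  -- host variable arrays (hypothetical)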
Try to read each data item only once. The more times you access the same data, the worse your performance will be.
Let SQL do the work... DB2 can optimize SQL better than you can optimize your programming language of choice.
SQL is a set-based language. Each SQL statement can operate on multiple rows of data at once. And joins are more efficient than opening multiple cursors and performing "master file" processing logic on them.
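For example, instead of opening separate cursors on EMPLOYEE and a department table and matching the rows in program logic, let DB2 join them; DEPARTMENT and its columns are hypothetical here:

SELECT E.LAST_NAME, E.FIRST_NAME, D.DEPT_NAME
FROM EMPLOYEE E
     INNER JOIN DEPARTMENT D        -- DEPARTMENT is hypothetical
        ON E.WORKDEPT = D.DEPT_ID   -- as are WORKDEPT, DEPT_ID, DEPT_NAME
WHERE E.TITLE = 'MANAGER';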
Even though DB2 no longer automatically degrades non-matching predicates to Stage 2, it is still a best practice to match the data type and length for columns and host variables that participate in predicates.
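A minimal sketch of the idea, assuming the EMPLOYEE table has an EMPNO column defined as CHAR(6):

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE EMPNO = :HV-EMPNO;
-- declare HV-EMPNO to match EMPNO exactly (e.g., PIC X(6) in COBOL);
-- a mismatched type or length can hinder optimal index matching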
Without proper documentation, application maintenance becomes difficult... especially if you used any SQL tuning tricks or techniques.
If you don't check for an error, you may be processing with bad, incorrect, or missing data.
If you do not examine the access paths formulated by the DB2 optimizer for your SQL, then how do you know how efficient (or not) your code is (or will be)?
Thursday, May 13, 2010
IDUG NA 2010, Days Two and Three
I’ve been running around kinda busy the past couple of days here at IDUG in Tampa, so I got a bit behind in blogging about the conference. So, today I’m combining two days of thoughts into one blog post.
(For a summary of IDUG Day One, click here.)
I started off day two by attending Brent Gross’ presentation on extracting the most value from .NET and ODBC applications. Brent discussed some of the things to be aware of when developing with .NET, an important one being that .NET is designed to work in a disconnected data architecture. So applications will not go through the data a row at a time; instead, the data is sent to the application, which processes it there. As an old mainframe DBA, I heard alarm bells ringing.
I also got the opportunity to hear Dave Beulke discuss Java DB2 developer performance best practices. Dave delivered a lot of quality information, including the importance of developing quality code because Java developers reuse code – and you don’t want bad code being reused everywhere, right?
Dave started out mentioning how Java programmers are usually very young and do not have a lot of database experience. So DBAs need to get some Java knowledge and work closely with Java developers to ensure proper development. He also emphasized the importance of understanding the object-to-relational mapping method.
From a performance perspective Dave noted the importance of understanding the distributed calls (how many, where located, and bandwidth issues), controlling commit scope, and making sure your servers have sufficient memory. He also indicated that it is important to be able to track how many times Java programs connect to the database. He suggested using a server connection pool and to be sure that threads are always timed out after a certain period of time.
And I’d be remiss if I didn’t note that Dave promoted the use of pureQuery, which can be used to turn dynamic JDBC into static requests. Using pureQuery can improve performance (perhaps as much as 25 percent), as well as simplifying debugging & maintenance.
Dave also discussed how Hibernate can cause performance problems. Which brings me to the first session I attended on day three, John Mallonee’s session titled Wake Up to Hibernate. Hibernate is a persistence layer that maps Java objects to relational tables. It provides an abstraction layer between DB2 and your program. And it can also be thought of as a code generator. Hibernate plugs into popular IDEs, such as Eclipse and Rational tools. It is open source, and part of JBoss Enterprise Middleware (JBoss is a division of Red Hat).
John walked attendees through Hibernate, discussing the Java API for persistence, its query capabilities (including HQL, or Hibernate Query Language), and configuration issues. Examples of things that are configurable include JDBC driver, connection URL, user name, DataSource, connection pool settings, SQL controls (logging, log formatting), and the mapping file location.
HQL abstracts SQL. It is supposed to simplify query coding, but from what I saw of it in the session, I am dubious. John warned, too, that when HQL is turned into SQL, the SQL won’t necessarily look the way you are used to seeing it. He recommended setting up the configuration file so that it formats the generated SQL; otherwise it won’t be very readable. John noted that one good thing about HQL is that you cannot easily write code with literals in it; it forces you to use parameter markers.
OK, so why can Hibernate be problematic? John talked about four primary concerns:
- SQL is obscured
- Performance can be bad with generated code
- Hibernate does not immediately support new DB2 features
- The learning curve can be high
But he also noted that as you learn more about these problems, and about how Hibernate works, things tend to improve. Finally (at least with regard to Hibernate), John recommended using HQL for simple queries, native SQL for advanced queries, JDBC for special situations, and native DB2 SQL (e.g., in a stored procedure) to achieve the highest performance.
I also attended two presentations on the DB2 for z/OS optimizer. Terry Purcell gave his usual standout performance on optimization techniques. I particularly enjoyed his advice on what to say when someone asks why the optimizer chose a particular path: “Because it thinks that is the lowest cost access path.” After all, the DB2 optimizer is a cost-based optimizer. So if it didn’t choose the “best” path then chances are you need to provide the optimizer with better statistics.
And Suresh Sane did a nice job in his presentation, discussing the optimization process and walking through several case studies.
All-in-all, it has been a very productive IDUG conference… but then again, I didn’t expect it to be anything else! Tomorrow morning I deliver my presentation titled “The Return of the DB2 Top Ten Lists.” Many of you have seen my original DB2 top ten lists presentation, but this one is a brand new selection of top ten lists… and I’m looking forward to delivering it for the first time at IDUG…