Tuesday, February 02, 2016

The Most Misunderstood Features of DB2 – Part 2: Optimize vs. Limited Fetch

Welcome to Part 2 in my on-going blog series on The Most Misunderstood Features of DB2. In Part 1 of the series we tackled the topic of locking, which IMHO is easily the most misunderstood feature of DB2 (probably of most DBMSes).

Today's topic is a brief one, but one that I've found folks are often confused about. Namely, the difference between the OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY clauses.

The OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY clauses are similar when it comes to influencing the optimizer. Both provide a hint to the optimizer about the scale of the query (that is, the number of rows to expect when optimizing the access path to the data).

But, the two clauses have different purposes. If all you want to do is influence the optimizer then you should use OPTIMIZE FOR n ROWS as its usage does not limit the number of rows that can be fetched when the statement is executed. For example, you might specify OPTIMIZE FOR 1 ROW on a SELECT statement. But if 500 rows qualify then you can FETCH all 500 rows at run time. 

(Note: the specific numbers used in the previous paragraph are not important; it is important only that you know the number of rows specified in the OPTIMIZE clause does not inhibit the number of rows that can later be accessed by the query/program.)
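To make that concrete, here is a minimal sketch of how the clause might be coded (the EMP table, its columns, and the predicate are hypothetical stand-ins):

  SELECT EMPNO, LASTNAME, SALARY
  FROM   EMP
  WHERE  WORKDEPT = 'D11'
  ORDER BY SALARY DESC
  OPTIMIZE FOR 1 ROW;   -- a hint only: favors returning the first row quickly,
                        -- but the program can still fetch every qualifying row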

The FETCH FIRST n ROWS ONLY clause is different. It actually limits the number of rows that can be fetched when the query is run. If you specify the clause FETCH FIRST 5 ROWS ONLY, then only 5 rows can be fetched - even if 100 or 1000 or more actually qualify. And yes, the number of rows you specify for FETCH FIRST will be provided to the optimizer for access path formulation, but it also will change the way the query behaves when it runs. 
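And here is the same hypothetical query using limited fetch instead:

  SELECT EMPNO, LASTNAME, SALARY
  FROM   EMP
  WHERE  WORKDEPT = 'D11'
  ORDER BY SALARY DESC
  FETCH FIRST 5 ROWS ONLY;   -- at most 5 rows are returned, even if many more qualify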

So, when you write a SELECT statement that qualifies a large number of rows, but you only need a few rows at run time, it is a good idea to use FETCH FIRST n ROWS ONLY. On the other hand, if you are simply trying to tweak query performance by getting the optimizer to consider a different number of qualifying rows, then you should consider using OPTIMIZE FOR n ROWS.

The two clauses are quite different and it is important that developers not confuse them!

Tuesday, January 26, 2016

The Most Misunderstood Features of DB2 – Part 1: Locking


Today I am introducing a new series of blog posts here on misunderstood DB2 features and functions. But before I start this blog post I want to emphasize that this is just my opinion. I’m sure many of you have your own ideas of the DB2 features that are most misunderstood. But please, take a moment to consider my thoughts here… and then share your own in the comments section below!

Locking!

One of the most misunderstood features of DB2 is how locking works. This is probably to be expected because database locking is a rather complex topic and it is constantly morphing with each and every new version of DB2.

First of all you have to understand the lock options available and how DB2 locks at each level: table space, table, page and row. You need to understand share and exclusive locks and that the manner in which DB2 takes them depends upon the SQL statements you issue, the program bind options you choose, and even, in some cases, the DDL options you’ve chosen.
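As one small illustration of those DDL options, the LOCKSIZE clause on a table space controls the level at which DB2 takes locks. This is just a sketch with made-up object names, showing only a couple of the available choices:

  CREATE TABLESPACE TSEXAMP IN DBEXAMP
    LOCKSIZE PAGE     -- lock at the page level (ROW, TABLESPACE, and ANY are other options)
    LOCKMAX SYSTEM;   -- allow lock escalation at the system-defined threshold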

And more. What about claims and drains? Do you know the difference between locks and latches? What about the difference between lock suspension, timeouts, and deadlocks?

Then there is the whole issue of lock avoidance and what that means. I visit a lot of shops each year and I still see a lot of ambiguous cursors. Simply adding FOR READ ONLY to all of the cursors in your program where you will not be modifying the data can make a world of performance difference because DB2 can avoid locking.
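For instance, a read-only cursor might be declared along these lines (the table, columns, and host variable here are hypothetical):

  DECLARE CSR1 CURSOR FOR
    SELECT ACCT_NO, BALANCE
    FROM   ACCT
    WHERE  BRANCH_ID = :HV_BRANCH
    FOR READ ONLY;   -- removes the ambiguity, so DB2 can consider lock avoidance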

There is also the frequently-ignored need for a commit strategy. Every batch program should have a commit strategy implemented whereby modifications are committed to the database and locks are released. Failing to do this causes other programs to timeout waiting for locks. I wrote about this in the past and I call it Bachelor Programming Syndrome.
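A rough sketch of what a commit strategy can look like in a batch update program follows; the table, columns, and host variables are hypothetical, and the fetch loop itself lives in the host program:

  DECLARE UPDCSR CURSOR WITH HOLD FOR   -- WITH HOLD keeps cursor position across commits
    SELECT ACCT_NO, BALANCE
    FROM   ACCT
    WHERE  STATUS = 'P'
    FOR UPDATE OF BALANCE;

  -- inside the fetch loop, for each row processed:
  UPDATE ACCT
    SET   BALANCE = :HV_NEW_BALANCE
    WHERE CURRENT OF UPDCSR;

  -- after every n updates (n is set by your commit strategy):
  COMMIT;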

Another locking issue is the "dirty read everywhere" problem. Programmers learn that they can improve performance by adding WITH UR to the end of the SQL statements in their programs. UR, or Uncommitted Read (aka dirty read), causes DB2 to read data without taking locks. But that can cause problems because the program is reading uncommitted data: the data it reads may never actually exist in the database (it could be rolled back), and it may not be consistent because the program is not limited to reading committed units of work. I've visited shops that use WITH UR everywhere and they insist that they have not encountered any problems. To which I always add, "…yet."
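For the record, a dirty read looks as innocuous as this (again with a hypothetical table and columns):

  SELECT ACCT_NO, BALANCE
  FROM   ACCT
  WHERE  BRANCH_ID = :HV_BRANCH
  WITH UR;   -- no read locks taken; uncommitted changes from other transactions can be returned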

Then there are the new features added in the last few releases of DB2, including improved lock efficiencies when accessing the DB2 Catalog, the ability to access currently committed data, optimistic locking, skipping locked data, and more.
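To give one example of those newer options, skipping locked data can be requested right on the statement (hypothetical table and columns; available in DB2 9 for z/OS and later):

  SELECT ACCT_NO, BALANCE
  FROM   ACCT
  WHERE  STATUS = 'P'
  WITH CS SKIP LOCKED DATA;   -- rows locked by other transactions are skipped instead of waited on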

Summary

Yes, it is easy to misunderstand locking, and many do so. Taking the time to bolster your knowledge of how DB2 locks, the locking options available, and how they work can do you and your shop a world of good.


And be sure to check back here for future blogs about misunderstood DB2 features!

Monday, January 11, 2016

The Impact of Data Volume on Operational Databases

Operational databases are growing in size for many reasons, not the least of which is the growing importance of big data and analytics projects. There is the overarching trend of more and more data being generated every year. But also, there is the growing need to store more data for longer periods of time due to regulatory and compliance issues. Some organizations and businesses have encountered the need to store certain types of data for 100 years or more (as this video and this storage project point out).

But I doubt that I really need to convince you that your databases are growing in size. Most DBAs experience the reality of data growth every day.

As data volumes expand, they impact operational databases in two ways:
  1. additional data stresses transaction processing and can cause performance slowdowns; and
  2. database administration tasks are negatively impacted.
In terms of performance, the more data in the operational database, the less efficient transactions running against that database tend to be. Table scans must reference more pages of data to return a result. Indexes grow in size to support larger data volumes, causing access by the index to degrade because there are more levels to traverse to return an answer. Such performance impacts are causing many companies to seek solutions that offload older data to either reference databases or archive data stores.

The other impact, database administration complexity, causes longer processing time and outages to perform such functions as backups, unloads, reorganizations, recoveries, and disaster recoveries.  The larger the underlying data sets for your tables and table spaces, the longer it takes to run administrative utilities for them. In many cases the lengthened outages can become unacceptable, causing companies to again seek ways to lighten up the operational databases... or perhaps acquire next generation utility technology that understands the reality of large DB2 database objects.

But even though we want to keep all of that additional data, there is no reason that it necessarily has to be stored in operational databases that run the business. For many reasons, you probably want to separate active data from historical data. 

Some companies create purge jobs for all (or many) of their tables to remove data from the production databases as it ages. This can be an acceptable approach to reduce the size of your operational databases. But it also means that the data, which you might want to keep for analytical purposes, is lost. Another approach is to archive the data. Archiving data and purging data are two different processes. When data is purged, it is removed from the operational database and discarded. But archived data is removed from the operational database and maintained in an archive data store. The archive might be a flat file, another relational table, or HDFS in a Hadoop environment.
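A bare-bones sketch of the archive-then-purge pattern might look something like this (the ORDERS and ORDER_ARCHIVE tables and the seven-year retention period are hypothetical):

  -- copy aged rows to the archive data store...
  INSERT INTO ORDER_ARCHIVE
    SELECT *
    FROM   ORDERS
    WHERE  ORDER_DATE < CURRENT DATE - 7 YEARS;

  -- ...then remove them from the operational table in the same unit of work
  DELETE FROM ORDERS
    WHERE ORDER_DATE < CURRENT DATE - 7 YEARS;

  COMMIT;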

The bottom line is that it makes sense for us, as DBAs, to keep an eye on the size of our operational databases and take action when production workload is impacted.

Monday, January 04, 2016

A Lot of Extraneous Data Sets?

In a recent blog post here I talked about a quick and dirty method of converting your partitioned table spaces from index-controlled to table-controlled. If you haven't read that post, take a moment to click over and read it here: Easily Convert to Table-Controlled Partitioning.

The reason I bring this up today is that I received an interesting e-mail from a long-time friend and DB2 DBA who read the post and had some information to share with me. He told me about how his organization used one of my tips to drop unused indexes as part of this process.

He told me that during the conversion process they dropped a lot of the clustering indexes because they weren't being used for access paths or for uniqueness. And they were able to release an "astonishing 4,100 data sets" by doing so!

Now I am not suggesting that every shop will be able to experience a similar savings, but if you have indexes that have no purpose other than enforcing index-controlled partitioning, it is time to bite the bullet and drop those indexes as you convert to table-controlled partitioning (and then on to Universal table spaces).
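If you want to hunt for candidates, the real-time statistics in the catalog can help. A simple starting point might be something like this (assuming DB2 9 for z/OS or later, where the LASTUSED column is available):

  SELECT CREATOR, NAME, LASTUSED
  FROM   SYSIBM.SYSINDEXSPACESTATS
  ORDER BY LASTUSED;   -- indexes never used for access retain the default (lowest) LASTUSED value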

And when you convert, please drop a note here on the blog to let us know how your conversion efforts went!

Tuesday, December 15, 2015

Happy Holidays!

Well, it is that time of year again. The days are shorter and the weather is colder... even if it isn't as cold as normal, it is colder than it was in July! And most people are taking the time to celebrate the holiday season.

Here's wishing each and every one of my readers a happy holiday... regardless of your chosen season to celebrate! Whether you celebrate Chanukah, Christmas, Kwanzaa, the Winter Solstice, Saturnalia, or just the end of another year on Planet Earth, I'm with you, and celebrating my good fortune, great family and friends, and you, my regular blog readers. I appreciate and thank you all...

This will be the final post of the year (2015) for this blog, but be sure to join me again next year - 2016 - as we continue to examine all aspects of everybody's favorite DBMS... IBM's DB2...