Tuesday, November 29, 2005

Cross Platform Help

As regular readers of this blog know, this space is devoted to DB2 for z/OS – that’s the mainframe for those of you just visiting. I’d like to take a moment here, though, to point out some interesting posts for cross-platform DB2 DBAs.

Chris Eaton, Senior Product Manager for DB2 Universal Database at IBM, writes a blog for ITtoolbox focusing on DB2 UDB for Linux, Unix, and Windows (LUW). If you use DB2 on that platform, be sure to read his blog regularly.

So why am I mentioning this on a mainframe DB2 blog? Well, Chris has recently posted some very nice entries outlining the similarities and differences between DB2 on the z/OS and LUW platforms. Here are the links to those postings for those of you interested in expanding your understanding. Each is interesting and worth reading:


Cheers!
Craig

Tuesday, November 22, 2005

Mainframes Rock!

It is good to see mainframes getting some positive press again. I'm talking about this November 17, 2005 article published in InfoWorld. It talks about a company that tried to get rid of its mainframe, replacing it first with Windows servers and then, when that didn't work, with Unix servers. When neither could handle the job, the company finally gave in and moved back to the reliable environment provided by mainframe computing.

Basically, it boils down to this: some workloads are just better served by mainframes. This is the parallel I like to draw:

If you are going to plow a field, what animal(s) would you choose to drive your plow: a nice strong, sturdy ox (mainframe) -or- 64 chickens (Unix servers) -or- 128 gerbils (Windows servers)?

Monday, November 21, 2005

To REBIND or Not to REBIND, That is the Question

There are two basic mindsets on when to REBIND your DB2 plans and packages. The first -- which I believe is the best approach -- is to REBIND regularly after running RUNSTATS. Using this approach you will ensure that your access paths have been formulated by the DB2 optimizer using the most up-to-date information available on your data. If you fail to REBIND your static SQL you are failing to give DB2 the chance to achieve the best performance it can for your applications.

(Of course, this raises the question "How frequently should I run RUNSTATS?" to which my answer is "As frequently as possible, based on how often your data changes." But enough of this aside for now.)

Of course, the DB2 optimizer is not perfect, so sometimes rebinding can cause the performance of certain SQL statements to degrade. You will have to be ready to handle these problems, either by using optimization hints (OPTHINT in the PLAN_TABLE) to go back to a satisfactory access path or by tweaking your SQL to achieve a better-performing access path. (Some people may also say "...or change the catalog statistics," but that should only be a last resort and is rarely required these days.)
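
To give a flavor of how a hint works in practice, here is a minimal sketch; the program name, collection, hint name, and query number are hypothetical, not taken from any real application:

    -- Tag the saved PLAN_TABLE rows for the known-good access path with a hint name
    UPDATE PLAN_TABLE
       SET OPTHINT = 'GOODPATH'
     WHERE PROGNAME = 'MYPKG'
       AND QUERYNO  = 100;

    REBIND PACKAGE(MYCOLL.MYPKG) OPTHINT('GOODPATH')

The REBIND tells DB2 to try to use the hinted access path. Keep in mind that optimization hints also have to be enabled at the subsystem level before DB2 will honor them.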

Additionally, we have not considered the impact and need to periodically reorganize your DB2 table spaces using the REORG utility. RUNSTATS populates the DB2 Catalog with the information you need to decide when a REORG is warranted. Of course, you would want to run RUNSTATS again after a REORG to obtain the most up-to-date statistics... and only then would you want to REBIND your plans and packages.
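
For illustration, the whole sequence boils down to something like the following; the database, table space, collection, and plan names are placeholders:

    REORG TABLESPACE MYDB.MYTS
    RUNSTATS TABLESPACE MYDB.MYTS TABLE(ALL) INDEX(ALL)
    REBIND PACKAGE(MYCOLL.*)
    REBIND PLAN(MYPLAN)

The REORG and RUNSTATS statements run as DB2 utilities, while the REBINDs are issued as DSN subcommands (or through whatever scheduling and automation tooling your shop uses).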

The second approach is the "if it ain't broke, don't fix it" approach. In this scenario, you continue to run RUNSTATS regularly, but you do not REBIND your plans and packages until performance degrades. This approach is embraced by shops that do not have the manpower or time to review all access paths after a mass REBIND. The thinking is that, by not running REBIND, performance will continue along as-is until data volumes change so significantly that end users start to complain. Only then will individual plans and packages be rebound, either following the next scheduled RUNSTATS or immediately if the problem is severe enough. This approach will degrade performance, albeit perhaps subtly over time. However, it does save DBA manpower, which might be in short supply.

Examine your shop's approach to the REBIND issue to see which approach is best for you. Although philosophically I agree with the first approach, I understand that the second approach sometimes can be preferable in practice. If you follow the second approach, be sure that you have pre-agreed Service Level Agreements (SLAs) for your DB2 applications. Then, you can reasonably argue that there is no reason to REBIND anything until you are no longer meeting the SLA.

Friday, November 11, 2005

New DB2 DBA Portal from IBM

Just a quick post to alert readers to a new portal from the IBM DeveloperWorks team. The portal is named DBA Central and it bills itself as offering resources for IBM Information Management database administrators.

So, the portal won't be 100% mainframe content, but it should have some interesting nuggets of data for mainframe DB2 DBAs.

Thursday, November 10, 2005

DB2 Compression: z/OS versus LUW

Space compression for non-mainframe DB2 is quite a bit different than it is for DB2 for z/OS. In mainframe DB2, specifying COMPRESS YES on the CREATE TABLESPACE statement will cause DB2 to implement Ziv-Lempel compression for the table space in question. Data is compressed upon entry to the database and decompressed when it is read.
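
For example, turning compression on for a new mainframe table space is a one-keyword affair (the object names here are made up for illustration):

    CREATE TABLESPACE MYTS
        IN MYDB
        USING STOGROUP MYSTOGRP
        COMPRESS YES;

You can also add COMPRESS YES to an existing object with ALTER TABLESPACE, although the existing rows are not compressed until the data is reloaded or the table space is reorganized.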

For DB2 UDB on Linux/Unix/Windows, when creating a table you can use the optional VALUE COMPRESSION clause to specify that the table uses the space-saving row format at the table level, and possibly at the column level. There are two ways in which tables can occupy less space when stored on disk:
  • If the column value is NULL, the defined, fixed amount of space is not set aside.
  • If the column value can be easily known or determined (such as a default value) and is available to the database manager during record formatting and column extraction, it does not need to be stored.
When VALUE COMPRESSION is used, NULLs and zero-length data assigned to variable-length data types (VARCHAR, VARGRAPHIC, LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, and DBCLOB) will not be stored on disk. Only the overhead values associated with these data types will take up disk space.

If VALUE COMPRESSION is used then the optional COMPRESS SYSTEM DEFAULT parameter can also be specified to further reduce disk space usage. Minimal disk space is used if the inserted or updated value is equal to the system default value for the data type of the column. The default value will not be stored on disk. Data types that support COMPRESS SYSTEM DEFAULT include all numerical type columns, fixed-length character, and fixed-length graphic string data types. This means that zeros and blanks can be compressed.
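
Pulling the LUW clauses together, a sketch of the syntax looks something like this (the table and columns are hypothetical):

    CREATE TABLE SALES
       (SALE_ID   INTEGER      NOT NULL,
        QUANTITY  INTEGER      COMPRESS SYSTEM DEFAULT,
        REGION    CHAR(8)      COMPRESS SYSTEM DEFAULT,
        COMMENTS  VARCHAR(200))
       VALUE COMPRESSION;

With this definition, NULLs and zero-length VARCHAR values in COMMENTS take no data space, and QUANTITY or REGION values equal to the system default (zero and blanks, respectively) are not stored on disk.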

The two platforms vary dramatically in how they approach "compression." The mainframe actually applies an algorithm to the data to compress it into another format. Every row that is inserted must first be compressed before storing it; every row that is read must be decompressed. On LUW platforms, DB2 compression is simply a way of avoiding the storage of certain types of data that either can be determined easily, or need not be stored.

So, it is highly probable that you will get completely different results on LUW than you do on a mainframe (OS/390, z/OS). Which one is better will depend on the type of data you are storing and the requirements of your applications.

So, when should you consider using compression? In general, use DB2 for z/OS compression for larger tablespaces where the disk savings can be significant. For very small tables, the amount of space required to store the compression dictionary may exceed the space saved by compressing the data.

What is the compression dictionary? Well, as I mentioned earlier, DB2 for z/OS compression is enabled by specifying COMPRESS YES for the tablespace in your DDL. When compression is specified, DB2 builds a static dictionary to control compression. This will cause from 2 to 17 dictionary pages to be stored in the tablespace. These pages are stored after the header and first space map page.

For partitioned tablespaces, DB2 creates a separate compression dictionary for each partition. Multiple dictionaries tend to yield better overall compression ratios. In addition, partition-level compression dictionaries can be rebuilt more frequently than the single dictionary of a non-partitioned tablespace, and frequent rebuilding of the compression dictionary can lead to a better overall compression ratio.
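
As a simple illustration, the dictionary for a single heavily updated partition can be rebuilt by reorganizing just that partition (the object names and partition number are hypothetical):

    REORG TABLESPACE MYDB.MYTS PART 3

By default, REORG builds a new compression dictionary; specifying KEEPDICTIONARY would retain the existing one instead.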

Avoid compressing table spaces with multiple tables in them because the compression ratio can be impacted by the different types of data in the multiple tables, and DB2 can only have one compression dictionary per table space.

But why compress data at all? Consider an uncompressed table with a large row size, say 800 bytes. Five of this table's rows fit on a 4K page. If the compression routine achieves 30 percent compression on average, the 800-byte row uses only 560 bytes, because 800 x (1 - 0.30) = 560. Now, on average, seven rows fit on a 4K page. Because I/O occurs at the page level, the cost of I/O is reduced: fewer pages must be read for tablespace scans, and the data is more likely to be in the bufferpool because more rows fit on a physical page. This can be a significant I/O improvement.

Consider the following scenarios. A 10,000-row table with 800-byte rows requires 2,000 pages. Using the compression routine outlined previously, the table would require only 1,429 pages. Another table, also with 800-byte rows but with 1 million rows, would require 200,000 pages without compression. Using the compression routine, you would reduce that to 142,858 pages - a reduction of more than 57,000 pages.

Another question I am commonly asked is about overhead. Yes, there is going to be some overhead involved if you turn on compression... CPU is required to apply the Ziv-Lempel algorithm to compress data upon insertion - and to decompress it upon access. Of course, this does NOT mean that overall performance will suffer if you turn on compression. Remember the trade-off: additional CPU in exchange for possibly improved I/O efficiency. You see, when more compressed rows fit onto a single page, fewer I/O operations may be needed to satisfy your query processing needs. If you are performing a lot of sequential access (as opposed to random access), you can get improved performance because fewer I/O operations are required to access the same number of rows.

Of course, there is always the other trade-off to consider, too: disk storage savings in exchange for the CPU cost of compressing and decompressing data. Keep in mind, though, that DB2 can use hardware-assisted compression if you have the right type of hardware. Hardware-assisted compression simply speeds up the compression and decompression of data -- it is not a requirement for the inherent data compression features of DB2. So the overall cost of compression may be minimal with hardware assistance. Indeed, because of the I/O benefits, overall elapsed time for certain I/O-heavy processes may actually decrease when data is compressed.

You can use the DSN1COMP utility to estimate how much disk space will be saved by compressing a tablespace before deciding whether to turn compression on or not. This utility can be run on full image copy data sets, VSAM data sets that contain DB2 table spaces, or sequential data sets that contain DB2 table spaces (such as DSN1COPY output). DSN1COMP does not estimate savings for data sets that contain LOB table spaces or index spaces. Refer to the IBM Utility Guide and Reference for more information on DSN1COMP.
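
For the curious, a bare-bones DSN1COMP job step might look something like the following sketch; the load library and input data set names are placeholders, and the parameters shown are optional:

    //ESTCOMP  EXEC PGM=DSN1COMP,PARM='FULLCOPY,ROWLIMIT(50000)'
    //STEPLIB  DD DSN=DSN810.SDSNLOAD,DISP=SHR
    //SYSPRINT DD SYSOUT=*
    //SYSUT1   DD DSN=MYDB.MYTS.FULLCOPY,DISP=SHR

The SYSPRINT output reports the estimated space required with and without compression, which you can use to decide whether COMPRESS YES is worthwhile for that table space.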

Of course, before you consider compression be sure to examine all of its details -- and be sure to understand all of the nuances of your particular data and applications. But don't be afraid of investigating its use... compression can be a very handy tool in the DBA's arsenal!