Saturday, December 29, 2007

Mainframe Career Still a Good Idea!

Just a quick post today to point you over to an article that ran in eWeek Magazine titled Big Iron Remains Career Option. Those of us who have worked with mainframes for decades know that mainframes still run the bulk of the mission-critical, high transaction workload for the biggest companies. And we know that you can make a good career out of mainframe development, administration, and management.

But it is good news that others are starting to hear this message.

Tuesday, December 18, 2007

Indexing the DB2 Catalog [DB2 9 for z/OS]

First of all, I want to apologize for the lack of new posts here the last couple of weeks. I've been busy traveling and preparing for the holidays -- as have most of you I guess, at least the holiday preparation part! Anyway, this will likely be my last post this year and it will be a short one at that. But be sure to check back again in the new year (2008) as I will begin posting a bit more regularly again (hopefully).

In today's post I want to tout a small but helpful improvement in DB2 9 for z/OS that makes it easier to build all of the indexes you want on your DB2 Catalog tables. As you know, even though the DB2 Catalog ships with a set of pre-defined indexes from IBM, you can add your own for improving the performance of your catalog queries. Some DB2 tools vendors even ship recommendations of indexes for you to build to make their solutions work more efficiently.
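
For example, a user-defined catalog index might look something like this sketch (the index name and column choice are purely hypothetical, and you should review the restrictions on user-defined catalog indexes in the IBM manuals, as well as any vendor recommendations, before creating one):

-- hypothetical index name and column choice
CREATE INDEX SYSADM.XSYSTABLES02
  ON SYSIBM.SYSTABLES
     (CREATEDBY);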

Well, prior to V9 the limit for user-defined indexes on the DB2 Catalog was 100. Over time, some shops have bumped up against this limit and were unable to define any additional indexes.

The good news is that DB2 V9 ups the limit to 500 user-defined catalog indexes.

That's all for today... and I wish all of my readers a very happy holiday season!

Monday, November 26, 2007

UPDATE SCHEMA and CATMAINT [DB2 9 for z/OS]

Welcome back to my blog as I continue our examination of the new features of DB2 9 for z/OS. Today we will look at the new UPDATE SCHEMA capability of the CATMAINT utility.

Have you ever wanted to make a global change to a schema, owner, creator, or VCAT name for your DB2 objects? Well, you can do that with CATMAINT in DB2 9 for z/OS using new UPDATE SCHEMA options.

There are three (3) new options added to CATMAINT, namely:

  • SCHEMA: Owner, creator and schema names can be changed using this option.
  • VCAT: Indexes, table spaces and storage groups can be altered to use a different ICF or VCATNAME using this option.
  • OWNER: Ownership of objects can be changed to a role using this option.

To use any of these options you must be in DB2 9 NFM and have Install SYSADM authority.

How does it work? Well, let’s take a look at a few examples, starting with the SCHEMA option. To rename the owner, creator, and schema of database objects, plans, and packages, we will run CATMAINT specifying the SCHEMA SWITCH option. This process updates every owner, creator, or schema name in the catalog and directory that matches the specified schema_name value. Importantly, all GRANTs that were made by or received by the original owner are changed to the new owner. Ownership of objects is not changed if the owner is a role.

So if we want to change OLDNAME to NEWNAME we can code the following CATMAINT job:

CATMAINT UPDATE
SCHEMA SWITCH(OLDNAME, NEWNAME)

You can change multiple names by repeating the SWITCH keyword, but you are not allowed to code the same name more than once.
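
So, as a quick sketch (with hypothetical names), a single job that switches two schemas might look like this:

-- hypothetical old and new schema names
CATMAINT UPDATE
SCHEMA SWITCH(OLDSCHM1, NEWSCHM1) SWITCH(OLDSCHM2, NEWSCHM2)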

Be aware though, when the schema name of an object is changed, any plans or packages that are dependent on the object are invalidated. If you do not REBIND those plans and packages an automatic REBIND will occur the next time you execute any of those programs.

Here is another example, this time for the VCAT option. To change the VCAT name that is used by storage groups or by index spaces and table spaces, we can run CATMAINT specifying the VCAT SWITCH option. This option is similar to using the ALTER TABLESPACE USING VCAT statement for changing the VCAT name. You need to move the data for the affected indexes or table spaces to the data set on the new catalog in a separate step.

So if we want to change OLDVCAT to NEWVCAT we can code the following CATMAINT job:

CATMAINT UPDATE
VCAT SWITCH(OLDVCAT, NEWVCAT)

You can change multiple VCAT names by repeating the SWITCH keyword, but you cannot specify the same name more than once. There are several restrictions to this option that you should research in the IBM manuals before attempting to switch VCAT names.

The final option is the OWNER option. It is used for changing the ownership of objects from a user to a role. Roles are new in DB2 9 and are associated with a TRUSTED CONTEXT. This will be the subject of a future blog posting here on the DB2portal blog – so keep an eye out for that one soon.

For example, if we want to switch ownership of objects for OWNER1, OWNER2 and OWNER3 to a role, we can run CATMAINT as follows:

CATMAINT UPDATE
OWNER FROM(OWNER1, OWNER2, OWNER3) TO ROLE

You must be running under a trusted context with a role to run this utility. The current role will become the owner. Privileges held on the object will be transferred from the original owner to the role.

A final caveat: be sure to create backups of your DB2 Catalog and DB2 Directory before running this CATMAINT to switch SCHEMA, VCAT, or OWNER.

Monday, November 19, 2007

Free Database Archiving Webinar

Attend a free webinar on database archiving on November 28, 2007 at 10:30 am CST.

The webinar, which I will be co-hosting with Bill Baker, will tackle the issue of skyrocketing data volumes in conjunction with increasing governmental regulations that impact the duration of data retention. These dual trends (among several others) conspire to make it difficult to ensure that your organization is in compliance regarding its operational data. As required data retention periods increase it becomes imperative that organizations develop a solid practice for archiving and managing business data from their online operational databases.

Now you may have read about database archiving either in my blogs (Long-Term Data Retention Drives Database Archiving, Data Management Today blog) or elsewhere, but here is a chance to attend a free presentation on the topic to learn all about the trends, troubles, and nuances involved in successfully archiving database data.

I hope you can find the time to attend!

Wednesday, November 07, 2007

BACKUP and RESTORE SYSTEM [DB2 9 for z/OS]

I am posting today’s blog entry from Athens, Greece as I participate in the European IDUG conference. Good thing I know how to use the Blogger site because when I log in over here in Greece the text on their site is all converted into Greek - and as I'm sure comes as no surprise to anyone, I don't understand Greek!

Anyway, today's post will be about the improvements IBM has made to the BACKUP SYSTEM and RESTORE SYSTEM utilities in DB2 9 for z/OS. And this will be the final entry in this series on Version 9 features discussing utility improvements… it will not be the last in the series on V9 improvements though, just the last one on the utilities.

Also, please keep in mind that these blog posts are meant to deliver a flavor of the new functionality in DB2 9 for z/OS. They will not cover every nuance and detail of what V9 has to offer. With that said, let’s dive into the enhancements to the BACKUP and RESTORE SYSTEM utilities.

Overview

As most of you surely know, BACKUP SYSTEM and RESTORE SYSTEM are relatively new utilities, added to DB2 as of Version 8. They use disk volume FlashCopy backups and the copy pool constructs introduced in z/OS DFSMShsm V1R5 to copy and restore large volumes of DB2 data. In DB2 V9 these utilities are enhanced to use new functions available with z/OS V1R8 DFSMShsm.

Recovery of Individual Database Objects

In V9, backups produced by BACKUP SYSTEM (aka system level backups) can be used to recover individual table spaces or index spaces. This is helpful because previously you had to recover the entire system, and that is not always what is necessary.

When you wish to recover a subset of a system level backup you will use the RECOVER utility instead of RESTORE SYSTEM. Before your RECOVER jobs can use system level backups you must first set the SYSTEM_LEVEL_BACKUPS DSNZPARM option to YES. This can be set from the DSNTIP6 install panel. If you specify YES then your system-level backups will be considered in object level recoveries (along with your other image copy backups).

If you wish to use your system level backups for individual database object recoveries then you need to make sure that you are copying your indexes (specifying COPY YES).

Why would you want to use your system level backups in this way? Well, doing so should enable you to reduce the frequency with which you are taking conventional image copies. If you take a daily system level backup, then the daily image copies you were also taking of individual objects may no longer be required. Of course, you cannot completely forgo all individual image copies because the system level backup timing may not conform to the timing needed for each object based on application requirements, and of course, image copies will still be needed after running utilities like LOAD REPLACE and REORG LOG NO to resolve copy pending situations.

Tape Support for BACKUP SYSTEM

DB2 V9 also delivers the ability for the BACKUP SYSTEM utility to copy the data directly to tape. The new parameters allowing this capability are the DUMP and DUMPONLY options.

The output of the DUMP or DUMPONLY is directed to a DFSMShsm dump class, which specifies the unit type the data will be directed to. Although IBM implemented this change to enable tape support, an SMS dump class is not restricted to tape.
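
Here is a minimal sketch of what such a backup might look like. The dump class name is hypothetical, and I am assuming the DUMPCLASS keyword is how you point to it, so verify the exact syntax in the Utility Guide before trying this at home:

-- DB2DUMP1 is a hypothetical DFSMShsm dump class name
BACKUP SYSTEM FULL
  DUMP DUMPCLASS(DB2DUMP1)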

Keep in mind that directing data to tape will have an impact on the speed of your restore. Restoring from tape will not be as fast as restoring from a FlashCopy made to disk. Of course, having your data on tape can help in terms of storage management, disaster recovery and off-site data storage, and long-term data retention. So be aware of these trade-offs before creating system level backups on tape.

Additionally, recognizing that copying data to tape can be time-consuming, IBM has added a new keyword, FORCE, to enable a new backup to be started even if a previous DUMP has not yet completed. Of course, FORCE should not be used all the time; use it only when it is very critical that a new backup be started.

Incremental FlashCopy

And finally, support for incremental copying has been added to FlashCopy. So now you can take a system level backup and then subsequent incremental system level backups. An incremental FlashCopy will copy only the tracks that have changed on the source volume since the last copy was taken. But unlike a typical incremental image copy, the previous content on the volume(s) will be replaced by the new content. That means there is no merging of incrementals required; essentially, the merge is part of the incremental FlashCopy.

I won’t go into all of the gory details here but this new functionality can greatly minimize I/O activity for system level backups.

Summary

So, to sum things up, the ability to work with system level backups becomes easier in V9 because you can recover individual table spaces and indexes from a system level backup without having to restore the entire backup, you can make system level backups directly to tape, and we get the ability to do incremental system level backups. All in all, some nice new features for BACKUP SYSTEM and RESTORE SYSTEM, wouldn't you say?

Monday, October 29, 2007

Improvements to the REORG Utility [DB2 9 for z/OS]

As we continue to wind our way through the multiple improvements that DB2 V9 brings to the IBM utilities, it is time to address the enhancements made to the REORG utility.

Performance

First up, let’s talk about performance improvements. In DB2 9 for z/OS, the REORG utility can unload and reload partitions in parallel. This should result in a nice reduction in elapsed time when you are reorganizing partitioned table spaces. To enable this improvement you will need to code the NOSYSREC keyword or the UNLDDN keyword with a template. Note that NOSYSREC is always used for SHRLEVEL CHANGE.
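
As a quick sketch, a REORG coded something like this (hypothetical object and sort device names) would be a candidate for parallel unload and reload of its partitions:

-- DBA1DB.CUSTTS is a hypothetical partitioned table space
REORG TABLESPACE DBA1DB.CUSTTS
  SHRLEVEL REFERENCE
  NOSYSREC
  SORTDEVT SYSDA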

Parallelism will not be enabled if any of the following conditions apply:

  • DATAWKnn ddnames are specified in the JCL
  • SORTDEVT keyword is not specified
  • UTPRINT is allocated to anything other than SYSOUT
  • The REBALANCE keyword is used.

Also consider that REORG SHRLEVEL CHANGE can use subtasks during the LOG phase to speed up the processing of log records. Subtasks will be used when unload and reload have been done by partition and there are log records to apply for the partition.

BUILD2 Phase Eliminated in Online REORG

As you probably know, an Online REORG uses shadow data sets. The reorganization is done to the shadows while the data remains available in the primary data sets. When the REORG is done, the shadow data sets are switched to be the primary data sets.

As most DBAs know, reorganization by partition can create a long outage when non-partitioned indexes (NPIs) exist on the table. Why is this so? Well, prior to V9, during an Online REORG, NPIs had to be updated with the new RIDs for the data of the partition being reorganized during the BUILD2 phase. And that is what typically caused the sometimes significant outage.

With DB2 V9, the BUILD2 phase is eliminated. Instead, the entire NPI will be reorganized and then switched in the SWITCH phase along with all of the other data sets. The only outage is during the SWITCH phase, which will be much, much shorter than the BUILD2 phase outage.

As a result of the elimination of the BUILD2 phase, DB2 will need additional temporary storage for the shadow data sets for each NPI. Additionally, the cost of the REORG will increase because building an entire NPI will consume more CPU than the previous method of updating RIDs in the BUILD2 phase. Finally, you will need to modify any REORG jobs against different partitions that ran in parallel because the entire NPI will be built and then switched.

What about running REORG SHRLEVEL REFERENCE by partition? Well, REORG SHRLEVEL REFERENCE by partition will also rebuild any associated NPIs using shadows. So the same type of concerns apply (more disk storage, additional CPU, no separate jobs in parallel).

Reorganization and LOBs

Finally, as I blogged about before in the posting on LOBS, prior to V9 you could not access LOB data during a REORG. And a REORG did not reclaim physical space from the LOB data set because LOBs were moved within the existing LOB table space. V9 fixes these problems. During a REORG (in V9), the original LOB table space is drained of writers. All LOBs are then extracted from the original data set and inserted into a shadow data set. When this operation is complete, all access to the LOB table space is stopped (the readers are drained) while the original data set is switched with the shadow data set. At this point, full access to the new data set is enabled, and an inline copy is taken to ensure recoverability of data.

Wednesday, October 24, 2007

Miscellaneous Utility Enhancements [DB2 9 for z/OS]

Today we will continue along with our coverage of the new features and functionality of DB2 9, specifically within the IBM DB2 utilities. Here we will discuss several miscellaneous changes introduced to the IBM DB2 utilities in Version 9.

One of these small changes comes to us with the DSN1LOGP utility. The utility will now detect possible erroneous recovery information. For example, consider a situation where you specify a range of log records to print but the entire range is no longer recorded in the BSDS. This can happen if your archive logs have rolled off. Prior to DB2 V9, DSN1LOGP would have just merrily printed the records for you. But because this can be confusing and could lead you to recovering improperly, V9 will detect this situation, produce a return code of 4, and issue the DSN1224I error message indicating that the logs could not be found.

Another nice new utility-related enhancement made to DB2 9 for z/OS is that the DB2 Utilities Panel within DB2I adds an option for you to specify the SDSNLOAD library. This is useful when there is more than one DB2 subsystem on an LPAR.

DSN1COPY has been enhanced, too. As of DB2 9 for z/OS, running DSN1COPY RESET to copy a compressed table space from a non-data sharing DB2 subsystem to another non-data sharing subsystem will copy the specified data and reset the PGLOGRBA of each page and the dictionary version. In V7 and V8 the dictionary version was not reset (instead, you had to REORG the copied table space with KEEPDICTIONARY=NO in the target subsystem). This was not an issue for data sharing because the LRSN is used instead of an RBA.
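
Just to give you the flavor of it, a DSN1COPY RESET step might be coded something along these lines. All of the data set names here are hypothetical, and depending on your situation you may need additional parameters (OBIDXLAT and a SYSXLAT data set if the internal identifiers differ between the subsystems, for example), so treat this strictly as a sketch and consult the manual:

//* ALL DATA SET NAMES BELOW ARE HYPOTHETICAL
//COPYTS   EXEC PGM=DSN1COPY,PARM='RESET'
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DISP=SHR,DSN=SRCCAT.DSNDBD.DBA1DB.CUSTTS.I0001.A001
//SYSUT2   DD DISP=OLD,DSN=TGTCAT.DSNDBD.DBA1DB.CUSTTS.I0001.A001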

Keep an eye out for future editions of this blog where we will go over the V9 changes for REORG and the BACKUP and RESTORE SYSTEM utilities.

Wednesday, October 10, 2007

RECOVER Improvements [DB2 9 for z/OS]

Our examination of the new utility functionality in DB2 9 for z/OS continues today as we uncover what’s new with the IBM RECOVER utility.

Point-In-Time Consistency

The most significant enhancement centers on recovering to a point-in-time (PIT) with consistency. As any DBA who has ever been charged with recovering an operational database to a prior PIT knows, it can be challenging to accomplish. Of course, if you have a usable QUIESCE point then point in time recovery is easy. But that means you’d have to have planned ahead of time and taken a QUIESCE, which is not always possible with heavy 24x7 workloads.

Well, DB2 V9 offers some relief because the IBM RECOVER utility is enhanced to automatically identify uncommitted work at the PIT to which you are recovering. After detecting the uncommitted work DB2 will rollback any changes made to the database objects that are being recovered. This means that anything that is being recovered will be in a transactionally consistent state. Of course, you still have to make sure that all of the appropriate objects are participating in the recovery.

This works only for PIT recoveries executed with either the TOLOGPOINT or TORBA keywords. To accomplish this, two new phases are added after the LOGAPPLY phase: LOGCSR and LOGUNDO.
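
So a consistent PIT recovery can be requested with something as simple as the following sketch (the object name and log point are hypothetical, of course):

-- hypothetical table space and log point
RECOVER TABLESPACE DBA1DB.CUSTTS
  TOLOGPOINT X'00000551BE7D'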

The LOGCSR phase runs for each DB2 member with an active unit of recovery (UR). If no UR was active then this phase will be skipped. During the LOGCSR phase, all active and incomplete URs are identified. This includes URs that were:
  • INFLIGHT
  • INABORT
  • INDOUBT
  • POSTPONED ABORT

If any URs were found, the LOGUNDO phase runs. This is where RECOVER backs out the changes made on recovered objects by active URs. In a data sharing environment, this will be done one member at a time.

Additional Information on the Progress of Your Recovery

Another improvement made to the IBM RECOVER utility in V9 is the ability to gather more information about the progress of the recovery. Any DBA who has been tasked with sitting around and monitoring recoveries wants the DISPLAY command to show accurate, up-to-date, and appropriate information about the status of the recovery.

Well, DB2 V9 enhances the -DISPLAY UTILITY command to provide additional information during the LOGAPPLY phase. DISPLAY now will show the range of the log that needs to be applied for the database objects that are being recovered. It will also show the progress up to the last commit, as well as the elapsed time since the start of the log apply phase of the recovery.

This information will assist the DBA in determining how long it will take for the recovery job to successfully complete.

Using an Earlier Image Copy

One final enhancement made to the RECOVER utility in V9 gives the DBA control to get DB2 to recover using an earlier image copy. This is accomplished by specifying a point in the log before which DB2 must select an image copy for the recovery.

This technique can be particularly helpful when the most recent image copy is damaged or somehow unusable. You can use an earlier copy by providing an RBA/LRSN value to the new RESTOREBEFORE parameter that is at a point on the log prior to when the unusable image copy was taken.
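
Here is a hedged sketch of what that might look like; the object name and the RBA/LRSN value are hypothetical:

-- hypothetical table space and RBA/LRSN value
RECOVER TABLESPACE DBA1DB.CUSTTS
  RESTOREBEFORE X'00004DF589C2'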

Monday, October 08, 2007

LOAD and UNLOAD Enhancements [DB2 9 for z/OS]

This week we continue our exploration of the features and functionality of DB2 9 for z/OS with a look at what’s new in the IBM LOAD and UNLOAD utilities. The first question people seem to ask with regard to LOAD is “Has performance been improved?” And the answer is yes, due mostly to improvements in handling indexes. IBM reports that performance can improve up to 24 percent if a table has mostly variable keys.

Handling DECFLOAT Data

In terms of functionality, the first change we’ll discuss is how to deal with DECFLOAT data type in LOAD (and UNLOAD). If you do not recall what the DECFLOAT data type is, refresh your memory here.

The DECFLOAT data type is compatible with SMALLINT, INTEGER, BIGINT, DECIMAL, FLOAT, and DECFLOAT. Of course, you will have to concern yourself with how the data is handled if it is loaded to or from DECFLOAT. You specify how to handle this manipulation using the DECFLOAT_ROUNDMODE parameter, which can be used in both the LOAD and UNLOAD utilities.

The following DECFLOAT_ROUNDMODE specifications are allowed:

  • ROUND_CEILING - Round towards positive infinity.
  • ROUND_DOWN - Round towards zero.
  • ROUND_FLOOR - Round towards negative infinity.
  • ROUND_HALF_DOWN - Round to nearest; if equidistant, round down.
  • ROUND_HALF_EVEN - Round to nearest; if equidistant, round so that the final digit is even.
  • ROUND_HALF_UP - Round to nearest; if equidistant, round up.
  • ROUND_UP - Round away from zero.

If you do not specify DECFLOAT_ROUNDMODE, the LOAD statement uses the DFPDEFDM value in DSNHDECP as the default value.
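
As a minimal sketch (hypothetical table name), specifying the rounding mode on a LOAD would look something like this:

-- hypothetical table name
LOAD DATA RESUME YES
  DECFLOAT_ROUNDMODE ROUND_HALF_EVEN
  INTO TABLE DBA1.EXCHANGE_RATES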

Loading LOBs

Loading and unloading LOBs has been improved. For LOAD, an input field value can contain the name of the file that contains a LOB column value. The LOB column value will then be loaded from that file. For UNLOAD, you can store the value of a LOB column in a file and record the name of the file in the unloaded record in the base table.

LOAD, LOBs, and SORTKEYS NO

When a LOAD SHRLEVEL NONE of a table with a LOB column fails, it can only be restarted with RESTART(CURRENT). This has been the case since LOBs were introduced to DB2.

However, when the SORTKEYS keyword was introduced in V5, a restriction with its use was that a LOAD that abended in the RELOAD, SORT, or BUILD phases could only be restarted with RESTART(PHASE). Restart will convert any RESTART option to RESTART(PHASE) when SORTKEYS is specified. This situation limited our options when LOBs were involved.

In V8 SORTKEYS became the default, so SORTKEYS and LOB restart processing were incompatible. In V9 this is fixed:

  • You can turn SORTKEYS off by specifying SORTKEYS NO.
  • And you can now use RESTART(PHASE) for LOAD REPLACE of a table with a LOB.

If you run LOAD REPLACE SHRLEVEL NONE on a table with a LOB, specifying SORTKEYS NO means that you can restart with either RESTART(CURRENT) or RESTART(PHASE). Without the SORTKEYS NO you can only use RESTART(PHASE).

If you run LOAD RESUME YES SHRLEVEL NONE on a table with a LOB, you must specify SORTKEYS NO if you want the job to be restartable; only RESTART(CURRENT) will be available.

Skipping Locked Rows

Recall from an earlier post here that you can skip locked rows in your transactions using the SKIP LOCKED DATA option within your SQL statements. Well, guess what, you can skip locked rows when running an UNLOAD, too.

If you are running an UNLOAD with SHRLEVEL CHANGE ISOLATION CS, you can also specify SKIP LOCKED DATA. This will cause the UNLOAD utility to skip rows on which incompatible locks are held by other transactions. This option only applies to table spaces with row level or page level locking.
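
For example, a hedged sketch (hypothetical table space name) of such an UNLOAD would be:

-- hypothetical table space name
UNLOAD TABLESPACE DBA1DB.CUSTTS
  SHRLEVEL CHANGE ISOLATION CS
  SKIP LOCKED DATA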

Tuesday, October 02, 2007

More Online Utilities [DB2 9 for z/OS]

Today we continue our investigation into the new functionality in DB2 9 for z/OS... and specifically into the improvements made to IBM utilities for version 9. We will turn our attention to improved availability with IBM DB2 utilities. Under DB2 V9, three more utilities can run online – that is, with SHRLEVEL CHANGE.

First up we have CHECK DATA. This utility is executed when you want to make sure that the data in your tables matches your integrity constraints. Prior to V9, access to tables (or table spaces) was read-only while CHECK DATA was being run. This meant that as long as CHECK DATA was executing you could not modify data. If you have complex referential sets and/or very large databases you likely will have issues with running CHECK DATA.

You can run CHECK DATA online – that means with SHRLEVEL CHANGE – as of DB2 V9. As with other online utilities, an online CHECK DATA will work on a shadow copy of the data and indexes. DB2 makes the shadow copy using DFSMS ADRDSSU. This copy can be super fast if you have data set level FlashCopy V2 enabled. Even if you do not have FlashCopy, availability will improve because the outage is reduced when you use SHRLEVEL CHANGE (just to make the copy) instead of SHRLEVEL REFERENCE (for the entire utility execution).

DB2 will check data integrity using the shadow copies of the data and indexes. When violations are found DB2 will not set the CHECK pending state. And remember, we are checking shadows, so DB2 will not delete rows in violation. Instead, CHECK DATA will create a PUNCHDDN containing REPAIR LOCATE DELETE input that can be run against the active data.
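
So a minimal sketch of an online CHECK DATA (hypothetical table space name) boils down to something like this:

-- hypothetical table space name
CHECK DATA TABLESPACE DBA1DB.CUSTTS
  SCOPE ALL
  SHRLEVEL CHANGE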

The CHECK LOB utility has also been enhanced to enable online execution. DB2 V9 allows SHRLEVEL CHANGE for checking LOBs using a shadow copy of the LOB table space. And just like CHECK DATA, DB2 uses the DFSMS ADRDSSU utility to create the shadow. And similarly, the check is executed against the shadow and REPAIR statements are generated that you subsequently can run against the actual LOB table space.

Finally, you can also run the REPAIR LOCATE utility specifying SHRLEVEL CHANGE. This enhancement allows us to execute the REPAIR utility online with LOCATE against indexes, index spaces, and table spaces.

Monday, September 24, 2007

Histogram Statistics [DB2 9 for z/OS]

Another utility upgrade that has found its way into DB2 9 for z/OS is the ability to gather histogram statistics. This feature is already available in DB2 for Linux, Unix, and Windows… and after you migrate to DB2 V9 it will be available to you on z/OS.

What is it? Well, let’s first define what a histogram is for those of you who are not statistics experts. A histogram is a way of summarizing data that is measured on an interval scale. A histogram is particularly helpful to quickly highlight how data is distributed; to determine if data is symmetrical or skewed; and to indicate whether or not outliers exist.

The histogram is only appropriate for variables whose values are numerical and measured on an interval scale. It is generally used when dealing with large data sets. Histogram statistics can be quite useful to the optimizer for certain types of queries.

Instead of the frequency statistics, which are collected for only a subset of the data, sometimes DB2 can improve access path selection by estimating predicate selectivity from histogram statistics, which are collected over all values in a table space.

Consider collecting histogram statistics to improve access paths for troublesome queries with RANGE, LIKE, and BETWEEN predicates. They can also help in some cases for =, IS NULL, IN LIST and COL op COL predicates.

How to Collect Histogram Statistics

IBM RUNSTATS in DB2 V9 can collect statistics by quantiles. DB2 allows up to 100 quantiles. The user can specify how many quantiles DB2 is to use from 1 to 100. Of course, avoid 1 because it will not help.

You can tell RUNSTATS to collect histogram statistics by coding the HISTOGRAM keyword in conjunction with the COLGROUP option. In this way you can collect histogram statistics for a group of columns. You must also tell DB2 the number of quantiles to collect by specifying the NUMQUANTILES parameter. NUMQUANTILES can also be specified with the INDEX parameter, in which case it indicates that histogram statistics are to be collected for the columns of the index.
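
As a hedged sketch (hypothetical table space, table, and columns), collecting histogram statistics for a column group might look like this; an index-based example appears at the end of this post:

-- hypothetical table space, table, and column group
RUNSTATS TABLESPACE DBA1DB.CUSTTS
  TABLE(DBA1.CUSTOMER)
  COLGROUP(CUST_STATE, CUST_CITY)
  HISTOGRAM NUMQUANTILES 10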

A single value can never be broken into more than one interval. This means that the maximum number of intervals is equal to the number of distinct column values. Therefore, be sure that you do not specify a value for NUMQUANTILES that is greater than the total number of distinct values for the column (or column group) specified. Also, keep in mind that any NULLs will occupy a single interval.

So then, how do you decide on the number of quantiles to collect? If you do not specify NUMQUANTILES, the default value of 100 will be used, and then based on the number of records in the table, the number will be readjusted to an optimal number. Therefore, unless you have a good understanding of the application or a viable reason to deviate, a good rule of thumb is to simply let the NUMQUANTILES default and let DB2 work it out.

RUNSTATS will produce an equal-depth histogram. This means that each interval will have about the same number of rows. Please note that this does not mean the same number of values – it is the same number of rows. This means that in some cases a highly frequent single value could potentially occupy an interval all by itself.

The histogram statistics are collected in three new columns: QUANTILENO, LOWVALUE, and HIGHVALUE. These columns can be found in the following six DB2 Catalog tables:

  • SYSIBM.SYSCOLDIST
  • SYSIBM.SYSKEYTGTDIST
  • SYSIBM.SYSCOLDIST_HIST
  • SYSIBM.SYSCOLDISTSTATS
  • SYSIBM.SYSKEYTGTDIST_HIST
  • SYSIBM.SYSKEYTGTDISTSTATS

Here is an example of a RUNSTATS to gather histogram statistics for the key columns of the indexes:

RUNSTATS TABLESPACE DB07.CSMTS02
INDEX ALL
HISTOGRAM NUMCOLS 2 NUMQUANTILES 10
SHRLEVEL(CHANGE)
UPDATE ALL
REPORT YES

Summary

Histogram statistics is a very powerful new capability of the RUNSTATS utility that can be used to gather distribution statistics across all data values. These statistics can be helpful when you need additional distribution data to enable the optimizer to arrive at a better access path for certain queries/predicates.

Friday, September 07, 2007

COPY Improvements [DB2 9 for z/OS]

Next up in this sub-series on IBM utility improvements in the larger series on DB2 V9 features and functions we will take a look at the new “stuff” in the IBM COPY utility.

Copying Just the Pending Database Objects

The first nice new feature is a new parameter that can be used to make backup copies of only those database objects in a copy pending state. The new parameter is called SCOPE PENDING, and when you specify it the IBM COPY utility will only copy objects in copy pending or informational copy pending state.

The default is SCOPE ALL, which makes COPY act like it always acted; that is, without regard to the pending state.

This option can be used in conjunction with LISTDEF to build very powerful, selective backup jobs.
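
For example, here is a hedged sketch (hypothetical database and template names) of a job that copies only the pending objects in a database:

-- hypothetical database and template names
TEMPLATE ICTMPL DSN &DB..&TS..IC.D&DA..T&TI.
LISTDEF PENDLIST INCLUDE TABLESPACES DATABASE DBA1DB
COPY LIST PENDLIST COPYDDN(ICTMPL)
  SCOPE PENDING SHRLEVEL REFERENCE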

Copying Clone Tables

The IBM COPY utility also has been extended to support copying clone tables. If you are not sure what they are, follow the link to the blog entry in this series that covers that topic.

Anyway, the new CLONE parameter must be coded on the COPY specification in order for clone table and index data to be copied. It is important to understand that the CLONE parameter is basically an on/off toggle. If CLONE is coded, then COPY will copy only clone table (and/or index) data. You can specify both clone and regular database objects in the same COPY specification, but the IBM COPY utility will ignore non-clone objects when the CLONE parameter is specified; and it will ignore clone objects if the CLONE parameter is not specified.

CHECKPAGE

The CHECKPAGE parameter is not new to V9, but it is improved. Specifying CHECKPAGE indicates that the COPY utility should check each page in the table space or index for validity. If it finds an error, COPY issues a message that describes the type of error. Using CHECKPAGE is a good idea because it will identify problems in your page sets.

In previous releases of DB2, using CHECKPAGE with COPY was problematic because of high overhead and availability issues. In the DB2 9 for z/OS Technical Overview redbook, IBM reports that the CPU overhead is about 5 percent for COPY INDEX and 14 percent for COPY TABLESPACE if you are running V8. As for the availability issue, as soon as a validity problem is found on a page, the COPY utility will place the page set into COPY PENDING so it cannot be accessed while COPY chugs merrily along looking for additional invalid pages.

Both problems are addressed in DB2 9 for z/OS: IBM has significantly reduced the CPU overhead and fixed the availability issue. Instead of putting the page set into COPY PENDING immediately when it finds a problem, COPY will produce a message indicating that it found a problem, but it will not set the pending state. Furthermore, it will continue checking pages but not copying anything.

Upon completion, COPY will issue a return code of 8 and it will update the SYSIBM.SYSCOPY catalog table indicating it found a broken page. This information is recorded in a new column, TTYPE, with a value of “B”. This is important to understand because the page set will not be in a COPY PENDING state, so you will have to check for the return code and fix the problem.

Tuesday, September 04, 2007

MODIFY RECOVERY [DB2 9 for z/OS]

Another IBM DB2 utility that has received some attention under Version 9 is MODIFY RECOVERY. For those not sure what this utility is, MODIFY RECOVERY is used to remove records from the SYSIBM.SYSCOPY table in the DB2 Catalog, along with related log records from the SYSIBM.SYSLGRNX directory table and related information from the DBD.

Before the new V9 changes, when you ran MODIFY RECOVERY you had to specify the deletion criteria in one of two ways:

  • Given a specific date, delete all recovery information before that date
  • Or given an age in days, delete all recovery information older than the age

But as of V9 we can alternatively specify what is to be retained instead of what is to be deleted. This way, we tell the utility what we want to keep and it will remove the rest.

So, instead of coding the DELETE parameter (with a DATE or AGE) we would instead code a RETAIN parameter. The RETAIN parameter can specify one of five different options:

LAST (integer) – Using this option informs DB2 to choose a date such that the number of full image copies kept is not less than the integer specified. So, if we specify LAST(5), then we will delete all entries that exist prior to the date on which the fifth oldest copy was taken. If more than one copy was taken on that date the result could be that more than 5 copies are retained.

LOGLIMIT – Specifying this option will delete all copies where DB2 no longer has log records to recover forward from. DB2 uses the BSDS to establish the oldest archive log timestamp.

GDGLIMIT – You can use this option to enable the number of copies to be retained to be linked to the corresponding GDG base limit.

GDGLIMIT LAST (integer) – It is possible also to combine the GDGLIMIT and LAST options. DB2 will then use the GDG base limit if the most recent primary copy is a GDG; if not, it uses the integer specified.

GDGLIMIT LOGLIMIT – And finally, we can combine the GDGLIMIT and LOGLIMIT options, too. If the most recent primary full copy is a GDG then the GDG limit is used. If the last copy is not a GDG then the BSDS is used to determine the date prior to which deletions will occur.

The last two options (the ones that combine two parameters) are particularly useful if you have a mix of GDGs and non-GDGs for your image copies. They are also helpful if you use the LISTDEF option when you make image copy backups.

It is important to understand that for all these options MODIFY RECOVERY will cause DB2 to determine the most recent date that satisfies the retention requirement. After ascertaining the correct date to use, all entries before that date will be deleted. In effect, DB2 chooses the most recent date that would satisfy the retention criteria. So, under some circumstances there may be more than one image copy on the specific date that is established, and therefore DB2 will keep those additional copies, too.

As an example, say we take 7 image copies on a single day. Then we run MODIFY RECOVERY … RETAIN LAST (3). In this case, all 7 will be retained as they were made on the same day, even though we indicated that we wanted to retain only the last 3.
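
The control statement for that example would look something like this (the table space name is hypothetical):

-- hypothetical table space name
MODIFY RECOVERY TABLESPACE DBA1DB.CUSTTS
  RETAIN LAST(3)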

Finally, there is one more significant change made to the MODIFY RECOVERY utility. Prior to V9, information was only deleted when copies were removed from SYSIBM.SYSCOPY. But as of V9, MODIFY RECOVERY will delete SYSIBM.SYSLGRNX entries even if no SYSIBM.SYSCOPY records are deleted.

Why is this important? Well, some shops have migrated over to running BACKUP SYSTEM instead of individual image copies. In this case, there will be no individual image copies to remove in SYSIBM.SYSCOPY, but the SYSIBM.SYSLGRNX records will continue to grow. After migrating to V9 you can use MODIFY RECOVERY to trim the size of SYSIBM.SYSLGRNX which can improve performance for processes that need to access log range information.

Tuesday, August 28, 2007

TEMPLATE Switching [DB2 9 for z/OS]

Next up in this on-going series covering the new features and functionality of DB2 9 for z/OS we will tackle the improvements that have been made to the IBM DB2 utilities.

A nice new feature that impacts multiple utilities is TEMPLATE switching. For those not familiar with templating, TEMPLATE is a utility control statement that enables the allocation of data sets for a LISTDEF (LISTDEF is the way you control the list of database objects for a utility execution). With the TEMPLATE statement you essentially specify data set naming conventions and allocation information without using JCL DD statements.

OK, so what is TEMPLATE switching? What it enables you to do is to specify different characteristics for image copies of varying sizes. So, for example, you can create a different template for low and high volume image copies. Here is an example:

//SYSIN DD *
TEMPLATE low DSN &DB..&TS..IC.D&DA..T&TI.
  UNIT=DASD LIMIT(50 CYL,high)
TEMPLATE high DSN &DB..&TS..IC.D&DA..T&TI.
  UNIT=TAPE
COPY TABLESPACE MY.SMALLTS COPYDDN(low)
COPY TABLESPACE MY.LARGETS COPYDDN(low)

Note the new parameter named LIMIT. This is what controls when the TEMPLATE is switched. In this example we set the LIMIT for low volume at 50 cylinders. When this is reached the TEMPLATE will be switched to the high volume template. Of course, you have the flexibility to control the limit by specifying whatever number is appropriate for your site as well as specifying it in CYL, GB, or MB.

It is important to understand that DB2 can only switch the TEMPLATE once, so you cannot have more than two templates.

Template switching is available for image copies produced by COPY, COPYTOCOPY, MERGECOPY, LOAD, and REORG. And you can set up template switching for both the COPYDDN and RECOVERYDDN.

Wednesday, August 22, 2007

Optimistic Locking [DB2 9 for z/OS]

DB2 Version 9 improves support for coding optimistic locking techniques. What is optimistic locking? Sometimes referred to as optimistic concurrency control, optimistic locking is basically just what it sounds like. We are optimists and think that usually we will be the only ones with interest in the data. In other words, when optimistic locking is implemented you are assuming that most of the time there will be no other programs that are interested in the page of data that you are planning to modify.


Of course, even in the most optimistic world there will be exceptions, so optimistic locking does not assume that there will never be any concurrent processes that need to access your page(s). Basically, with optimistic locking you can improve performance by minimizing locking. So how do we do that?


When an application uses optimistic locking, locks are obtained immediately before a read operation and then released immediately. Update locks are obtained immediately before an update operation and held until the end of the transaction. Optimistic locking uses the RID (Record IDentifier) and a row change timestamp to test whether data has been changed by another transaction since the last read operation.


DB2 knows when a row was changed and so therefore he (I make DB2 masculine) can ensure data integrity even as he minimizes the duration of locks. With optimistic locking, DB2 releases the page (or row) locks immediately after a read operation. And if you are using row locks, DB2 releases the row lock after each FETCH, taking a new lock on a row only for a positioned update or a positioned delete.


Careful readers will have noticed that I talked about a “row change timestamp” but you may not have heard that expression before. DB2 V9 adds support for automatically generated timestamp columns and if you wish to implement optimistic locking you will need to create (or alter) your tables to have a row change timestamp column, defined as follows:


NOT NULL GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

or

NOT NULL GENERATED BY DEFAULT
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

For tables having a row change timestamp column, DB2 will automatically populate and maintain the timestamp values for each row. Notice how the syntax is similar to the syntax used for other automatically generated DB2 values, such as sequences. DB2 will automatically generate the timestamp value for each row when the row is inserted, and modify the timestamp for each row when any column in that row is updated.

When you add a ROW CHANGE TIMESTAMP column to an existing table, the initial value for existing rows will not be immediately populated. Instead, DB2 places the table space in an advisory-REORG pending state. When you reorganize the table space, DB2 will generate the values for the ROW CHANGE TIMESTAMP column for all rows (and, of course, remove the advisory-REORG pending status).

Now that you have this new column you can use it as a condition for making an UPDATE and specify it in your WHERE clause. Let’s walk thru a couple of examples.

First of all, when a table contains a ROW CHANGE TIMESTAMP you can use it to find out when its rows were modified. Let’s use the following table as an example:

CREATE TABLE CUSTOMER
(CUSTNO CHAR(8) NOT NULL,
CUST_INFOCHANGE TIMESTAMP NOT NULL GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP,
CUST_NAME VARCHAR(50),
CUST_ADDRESS VARCHAR(100),
CUST_CITY CHAR(20),
CUST_STATE CHAR(2),
CUST_ZIP CHAR(9),
CUST_PHONE CHAR(10),
PRIMARY KEY (CUSTNO)
);

Now that the table is defined with the ROW CHANGE TIMESTAMP we can use it in our programs and queries to determine change information about the data. For example, if we want to find all of the customer rows that were changed in the past week (i.e., the last 7 days) we could run the following query:


SELECT CUSTNO, CUST_NAME
FROM CUSTOMER
WHERE ROW CHANGE TIMESTAMP FOR CUSTOMER <=
      CURRENT TIMESTAMP AND
      ROW CHANGE TIMESTAMP FOR CUSTOMER >=
      CURRENT TIMESTAMP - 7 DAYS;

But what would happen if you issued a statement like this against a table that was altered to include a ROW CHANGE TIMESTAMP? For example, what if we created the CUSTOMER table as shown but without the CUST_INFOCHANGE column, populated the table with data, and then altered the table to include the CUST_INFOCHANGE column? In this case, DB2 will use the time the page was last modified. So the results will not be exactly correct because the query would return all of the rows on each qualifying page (because at least one row on the page changed). This is why it is important to clear up the advisory REORG pending state as soon as possible after adding the ROW CHANGE TIMESTAMP.

OK, this is all well and good, and you can probably see the value of having this automagically changing timestamp in some of your tables, but where is the optimistic locking part? Well, for programs that use updateable static scrollable cursors DB2 can use optimistic locking as long as the plan/package is bound specifying ISOLATION(CS). If you have this situation, DB2 will deploy optimistic locking to reduce the duration of locks between consecutive FETCH operations and between fetch operations and subsequent positioned UPDATE or DELETE operations.

Remember, though, DB2 cannot use optimistic concurrency control for dynamic scrollable cursors. With dynamic scrollable cursors, the most recently fetched row or page from the base table remains locked to maintain position for a positioned UPDATE or DELETE.

Without optimistic locking, the lock taken at the first FETCH is held until the next FETCH. The lock taken at the last FETCH is held until COMMIT, ROLLBACK, or the end of transaction.

With optimistic locking, the scenario changes significantly. When the application requests a FETCH to position the cursor on a row, DB2 locks that row, executes the FETCH and releases the lock. When the application requests a positioned UPDATE or DELETE on the row, DB2 locks the row and then re-evaluates the predicate to ensure that the row still qualifies for the result table.
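
Applications that do not use scrollable cursors can code the same sort of optimistic check themselves using the row change timestamp. Here is a hedged sketch of that read-then-update pattern against the CUSTOMER table from earlier (host variable names are hypothetical): save the timestamp when you read the row, then require it to be unchanged on the UPDATE. If the UPDATE finds no qualifying row, another transaction got there first and the application can re-read and retry.

-- host variable names are hypothetical
SELECT CUST_NAME,
       ROW CHANGE TIMESTAMP FOR CUSTOMER
  INTO :HV-NAME, :HV-RCT
  FROM CUSTOMER
 WHERE CUSTNO = :HV-CUSTNO;

UPDATE CUSTOMER
   SET CUST_PHONE = :HV-NEWPHONE
 WHERE CUSTNO = :HV-CUSTNO
   AND ROW CHANGE TIMESTAMP FOR CUSTOMER = :HV-RCT;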

So, when you move to DB2 9 for z/OS you should evaluate your applications looking for programs that could take advantage of optimistic locking – and then add the ROW CHANGE TIMESTAMP to the appropriate tables.

Tuesday, August 14, 2007

DB2-L Is Back Up & Running

In case you haven't noticed yet, the DB2 mailing list (DB2-L) is back up and running as of Monday, August 13, 2007.

However, the archives are not yet available, but it looks like they will be back up soon (with no data loss). So, welcome back DB2-L - - we all missed you!

Thursday, August 09, 2007

DB2-L Is Down

Those of you who subscribe to the DB2 mailing list, also known as DB2-L, may have noticed that the usually steady stream of information, questions, and comments that used to pour into your in-box has dried up. Don't worry (well, you can worry a little bit, I s'pose), it is nothing you have done... DB2-L is not functional.

The company that hosts the list for IDUG moved the server two weeks ago and when they did they destroyed the disk drive. IDUG has been struggling to get everything rebuilt and back up since then, but it is taking longer than expected.

If you want to keep abreast of when it will be back online, status updates will be posted on IDUG's website...

Wednesday, August 08, 2007

Altering LONG VARCHAR [DB2 9 for z/OS]

Just a short entry today to discuss a feature that will be useful for organizations having tables containing long text strings which are stored as VARCHARs. It involves altering the data type of a column from LONG VARCHAR to a compatible data type.

In versions of DB2 prior to V9 it was not possible to change the data type of a LONG VARCHAR or LONG VARGRAPHIC column using ALTER. Instead, you would have had to unload the data, drop the table, create the table using the new column definitions, and reload the data back into the new table. Of course, when you drop the table you’d lose any authorizations, indexes, etc. defined on the table so you’d have to save and re-create those as well. Basically, it wasn’t easy unless you had a change management tool that automates the process.

But before going any further, what is a LONG VARCHAR? Well, a VARCHAR(n) is a varying-length column for character string data, where n specifies the maximum length of the string. If n was greater than 254, the column was set up as a long string column – that is, a LONG VARCHAR; the same story applies for VARGRAPHIC columns.

As of DB2 V9, you will not create any more LONG VARCHAR columns – they are all treated as VARCHARs. But the old LONG VARCHAR columns, migrated from earlier releases, still exist. The COLTYPE in the DB2 Catalog (SYSIBM.SYSCOLUMNS) will be LONGVAR for LONG VARCHAR columns and LONGVARG for LONG VARGRAPHIC columns.

The good news is that V9 allows you to ALTER the data type of LONG VARCHAR to VARCHAR and LONG VARGRAPHIC to VARGRAPHIC using ALTER TABLE with the ALTER COLUMN parameter. First, you must look up the LENGTH of the column in SYSIBM.SYSCOLUMNS and use that for the length of the new data type specification. For example, if LONGCOL in table EXAMPLE is a LONG VARCHAR you can ALTER it in DB2 V9 to VARCHAR. Assuming the length of the column was 500, you can issue the following ALTER to change the column to VARCHAR from LONG VARCHAR:

ALTER TABLE EXAMPLE
ALTER COLUMN LONGCOL
SET DATA TYPE VARCHAR(500);

You must specify the exact value stored in the DB2 Catalog for LENGTH in this ALTER TABLE statement. Failure to do so will result in an error message.
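
If you need to find the affected columns, and the LENGTH values to plug into your ALTER statements, a catalog query along these lines should do the trick (a sketch, assuming the COLTYPE values noted above):

-- LONGVAR and LONGVARG are the COLTYPE values for the old long string columns
SELECT TBCREATOR, TBNAME, NAME, COLTYPE, LENGTH
  FROM SYSIBM.SYSCOLUMNS
 WHERE COLTYPE IN ('LONGVAR', 'LONGVARG')
 ORDER BY TBCREATOR, TBNAME, COLNO;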

So, you can convert all of those LONG VARCHAR columns from past versions of DB2 to VARCHAR using the ALTER statement in V9.

Monday, August 06, 2007

Encryption [DB2 9 for z/OS]

DB2 V9 offers some encryption news, but we need to go back a version to start the story. You see, DB2 supports encryption in Version 8 through encryption functions that need to be explicitly coded in order to encrypt and decrypt data.

These functions (ENCRYPT and DECRYPT) allow you to encrypt and decrypt data at the column level. Because you can specify a different password for every row that you insert, you are encrypting data at the “cell” level in your tables. If you use these functions to encrypt your data, be sure to put some mechanism in place to manage the passwords that are used to encrypt the data. Without the password, there is absolutely no way to decrypt the data.

To assist you in remembering the password, you have an option to specify a hint (for the password) at the time you encrypt the data. The following SQL example shows an INSERT that encrypts the SSN (social security number) using a password and a hint:

INSERT INTO EMP (SSN)
VALUES(ENCRYPT('289-46-8832','TARZAN','? AND JANE'));

The password is “TARZAN” and the hint we’ve chosen to provide is “? AND JANE”… so the hint will prompt us to think of Tarzan as the companion of Jane.

In order to retrieve the encrypted data you will need to use the DECRYPT function supplying the correct password. This is shown in the following SELECT statement:

SELECT DECRYPT_BIT(SSN,'TARZAN') AS SSN
FROM EMP;


If we fail to supply a password, or the wrong password, the data is returned in an encrypted format that is unreadable.

The result of encrypting data using the ENCRYPT function is VARCHAR FOR BIT DATA. The encryption algorithm is an internal algorithm. For those who care to know, it uses Triple DES cipher block chaining (CBC) with padding and the 128-bit secret key is derived from the password using an MD5 hash.

When defining columns to contain encrypted data the DBA must be involved because the data storage required is significantly different. The length of the column has to include the length of the non-encrypted data + 24 bytes + the number of bytes to the next 8 byte boundary + 32 bytes for the hint.
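
As a hedged, worked example using that formula: to store an encrypted 11-character SSN such as '289-46-8832' you would need 11 + 24 = 35 bytes, rounded up to the next 8-byte boundary (40), plus 32 bytes for the hint, for a total of 72 bytes. So a table like the EMP table used above might define the column something like this sketch:

-- 11 + 24 = 35, rounded up to 40, plus 32 for the hint = 72
CREATE TABLE EMP
 (EMPNO CHAR(6) NOT NULL,
  SSN VARCHAR(72) FOR BIT DATA);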

OK, that is all V8 stuff and this series of blog postings is supposed to be about V9 functionality, right? So what about version 9? Well, DB2 9 for z/OS offers some nice improvements to encryption support. Firstly, DB2 can take advantage of encryption hardware advances.

CP Assist for Cryptographic Function, aka CPACF, is available on z990 hardware. CPACF can run on all the CPUs, but remember, this feature is available only on z990 and later machines, not the older z900. The z990 also introduces a PCIXCC card which is needed for the IBM Data Encryption Tool, but not for the DB2 encryption functions.

Note: The IBM Data Encryption Tool (available from IBM at an additional price) offers encryption for DB2 tables at the table level, whereas the encryption functions (free with DB2) offer encryption at the column level.

The CP Assist for Cryptographic Function delivers cryptographic support on every CP with Data Encryption Standard (DES), Triple DES (TDES), and Advanced Encryption Standard (AES)-128 bit data encryption/decryption, as well as Secure Hash Algorithm (SHA-1) and SHA-256 hashing. For a more detailed discussion of CPACF, associated technology and functionality, check out the following IBM redbook: IBM eServer zSeries 990 (z990) Cryptography Implementation (SG24-7070).

Basically, the net result is that the cost of encrypting DB2 data under V9 is reduced on the z990 hardware.

Additionally, IBM has added encryption support in the controllers of its storage devices.

Both the IBM TS1120 tape drive and IBM Ultrium 4 tape drives include data encryption capabilities within the drives. This support can allow you to avoid the need for host-based encryption of data or the use of specialized encryption appliances. In addition, IBM claims that the encryption does not significantly impact the performance of the drives so there should be minimal to no impact on the batch processing window when encrypting in this manner.

So far, we’ve been talking about encryption for data at rest. But DB2 9 for z/OS also improves support for encryption of data in transit. DB2 9 supports the Secure Socket Layer (SSL) protocol by implementing the z/OS Communications Server IP Application Transparent Transport Layer Security (AT-TLS) function. The z/OS V1R7 Communications Server for TCP/IP introduces the AT-TLS function in the TCP/IP stack for applications that require secure TCP/IP connections. AT-TLS performs transport layer security on behalf of the application, in this case DB2 for z/OS, by invoking the z/OS system SSL in the TCP layer of the TCP/IP stack. The z/OS system SSL provides support for TLS V1.0, SSL V3.0, and SSL V2.0 protocols.

So encryption of data over the wire is improved in z/OS 1.7. The Communications Server supports AT-TLS, which uses SSL data encryption. Now SSL encryption has been available on z/OS for a long time, but now DB2 9 for z/OS makes use of this facility and offers SSL encryption using a new secure port.

When acting as a requester, DB2 for z/OS can request a connection using the secure port of another DB2 subsystem. When acting as a server, and from within a trusted context (I’ll discuss trusted context in a later DB2portal blog entry), SSL encryption can be required for the connection.

So, little by little, better encryption support is being made available within the world of DB2 for z/OS.

Tuesday, July 24, 2007

Index Compression [DB2 9 for z/OS]

Another useful new feature debuting in V9 is the ability to compress indexes. We’ve been able to compress DB2 data in table spaces for a long time now, either through an exit routine or with the COMPRESS table space parameter (added in DB2 V3). But before V9 we’ve never been able to compress index data.

Why would you want to compress index data? Well, some types of applications require very large indexes on very large tables - - data warehousing applications are one good example. Sometimes, the storage required for indexes to support your data warehouse applications can exceed the storage required for the base table. So it makes sense that you might want to reduce the storage consumed by such indexes.

DB2 V9 introduces the COMPRESS parameter for indexes. You can specify COMPRESS YES (or NO) on your CREATE INDEX and ALTER INDEX statements. Index compression is specified at the index level and cannot be performed on a partition-by-partition basis.

Additionally, DB2 will only compress the data in leaf pages, not in the root page and any non-leaf pages in between. This makes sense because you don’t want to incur the expense of decompressing all of these types of pages in your indexes in order to find the right leaf page range.

Index compression does not require a compression dictionary. As such, DB2 can begin to immediately compress data in the leaf pages of your newly created indexes.

Now think about what we’ve already learned for a minute. The data on the leaf page is compressed, but we will want to access it uncompressed, right? So index pages are stored on disk in a compressed format but will be expanded when read. So those 4K index pages on disk will require more than 4K when expanded. This means that compressed indexes must be defined in a larger buffer pool (8K, 16K, or 32K). Nevertheless, when you compress an index DB2 will always compress the data down into a 4K page size on disk no matter what page size you choose.
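
Putting that together, a compressed index definition might look something like this hedged sketch (index and table names are hypothetical); note the 8K buffer pool:

-- hypothetical index and table names; BP8K0 is an 8K buffer pool
CREATE INDEX DBA1.XCUST02
  ON DBA1.CUSTOMER
     (CUST_NAME)
  BUFFERPOOL BP8K0
  COMPRESS YES;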

Another consideration to keep in mind is that index data is decompressed for index image copies, so a copy of an index will require more storage space than the actual index requires.

So, when you move to DB2 9 in NFM you have an additional compression decision to make: which indexes should be compressed and which should not? But it is a good thing to have more options at our disposal, especially for applications with huge indexing requirements.

Monday, July 16, 2007

CLONE Tables [DB2 9 for z/OS]

This new feature in DB2 V9 might sound like an old monster movie (Invasion of the Clone Tables!!!), but it is actually a nifty new capability for managing DB2 data availability. Cloning is basically a method of entirely refreshing all of the rows of a table while maintaining availability to the table.

OK, so how does it work? Basically, you will create a table with the exact same attributes as a table that already exists at the current server, except that it is empty of data. It is created using the ALTER TABLE SQL statement with the ADD CLONE parameter. This clone table is created in the same table space as the existing table. After creating the clone table you can do whatever you want to do with it. LOAD it, INSERT rows to it, etc.

The clone table will be structurally identical to the base table in every way: same number of columns, column names, and data types, as well as the same indexes, before triggers, LOB objects, and so on. All of this is done auto-magically by DB2 when the clone table is created. Authorizations and views on the base table, however, are not duplicated, or cloned.

So, the clone is created and it can be manipulated without impacting the base table. When the clone is ready to become the base table it is exchanged with the base table. This is accomplished using the new EXCHANGE SQL statement. After running an EXCHANGE the clone becomes the real table and the previously “real” table becomes the clone - - and you can repeat the process.
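In rough terms, using hypothetical table names, the flow looks something like this:

-- create the empty clone alongside the existing base table
ALTER TABLE DAILY_SUMMARY
  ADD CLONE DAILY_SUMMARY_CLONE;

-- ...populate the clone (LOAD, INSERT, etc.) without impacting the base table...

-- swap the clone in as the "real" table
EXCHANGE DATA BETWEEN TABLE DAILY_SUMMARY
  AND DAILY_SUMMARY_CLONE;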

The clone and the base table are kept in different underlying VSAM linear data sets. We all should know that the data sets used by DB2 are named using the following format:
cat.DSNDBD.DBNAME.TSNAME.I0001.A001. Well, the clone uses the same name except for the next to last component. It will be I0002. When the exchange is made, DB2 flips the I0001 with the I0002.

Of course, there are rules and details you will need to know to properly deploy cloning in your organizations, but that is all documented in the manuals. The point of these blog entries is more to make you aware of new functionality and give you the basic flavor for how it works and what it is intended for…

With that in mind, this is intended to allow you to quickly replace one version of DB2 table data with another version, without impacting availability while the new version is built.

Thursday, July 12, 2007

New Data Types [DB2 9 for z/OS]

As we continue our blog series on the new features and functionality of DB2 9 for z/OS, today we examine the four (OK, five) new data types introduced in this version of DB2.

BIGINT

First up, we have the BIGINT data type. A BIGINT is an exact numeric data type capable of representing 63-bit integers. This is the third integer data type now available to DB2 and it offers the ability to store the largest range of values:

  • SMALLINT values can range from -32768 to 32767
  • INTEGER values can range from -2147483648 to 2147483647
  • BIGINT values can range from -9223372036854775808 to 9223372036854775807

So when you have the need to store very large integers you don’t have to use DECIMAL with a zero scale any longer.

BINARY and VARBINARY

Next up, and perhaps more exciting, V9 delivers a true binary data type for the first time in DB2, so we no longer have to use a BLOB or CHAR with FOR BIT DATA. BINARY is a fixed-length binary string up to 255 bytes. DB2 9 also delivers a VARBINARY data type, which is a variable-length binary string up to 32704 bytes.

BINARY and VARBINARY data types extend current support of binary strings (BLOB), and are compatible with the BLOB data type. They are not compatible with character string data types. IBM does make it somewhat easy to migrate existing columns defined as CHAR FOR BIT DATA or VARCHAR FOR BIT DATA over to BINARY or VARBINARY. Be aware, though, that if there is an index defined on a column being altered in this way, the index is placed in rebuild-pending (RBDP) status. And you cannot alter BINARY or VARBINARY data types back to CHAR FOR BIT DATA or VARCHAR FOR BIT DATA.
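For instance, a hypothetical VARCHAR FOR BIT DATA column might be migrated with a simple ALTER (keeping the index impact just mentioned in mind):

-- change an existing FOR BIT DATA column to the new VARBINARY type
ALTER TABLE MYTABLE
  ALTER COLUMN AUDIT_TOKEN
  SET DATA TYPE VARBINARY(128);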

Also, there are some usage considerations to keep in mind. Two binary strings are equal only if their lengths are identical. If two strings are equal up to the length of the shorter string, the shorter string is considered less than the longer string, even when the remaining bytes in the longer string are hex zeros.

DECFLOAT

The next new data type supported by DB2 9 for z/OS is DECFLOAT. V9 takes advantage of new System z9 hardware support delivering the DECFLOAT data type that lets you use decimal floating-point numbers with greater precision. A decimal floating-point value (DECFLOAT) is an IEEE 754r number with a decimal point.

The maximum precision is 34 digits: a decimal floating point number is defined with either 16 or 34 digits of precision.


DECFLOAT(16) values can range from a low of -9.999999999999999×10**384 to a high of 9.999999999999999×10**384.

DECFLOAT(34) values can range from a low of -9.999999999999999999999999999999999×10**6144 to a high of 9.999999999999999999999999999999999×10**6144.



Anyone think they’ll need bigger numbers?

In addition, the DECFLOAT data type is able to represent special values representing “non-number numbers”:

  • Infinity - a value that represents a number whose magnitude is infinitely large.
  • Quiet NaN - a value that represents undefined results and does not cause an invalid number condition (NaN stands for “not a number”).
  • Signaling NaN - a value that represents undefined results which will cause an invalid number condition if used in any numerical operation.
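To pull several of these together, here is a purely hypothetical CREATE TABLE using the new data types:

CREATE TABLE ACCT_ACTIVITY
  (ACTIVITY_ID  BIGINT          NOT NULL,   -- large integer key values
   ACCT_TOKEN   BINARY(16),                 -- fixed-length binary string
   AUDIT_TRAIL  VARBINARY(2000),            -- variable-length binary string
   RATE         DECFLOAT(34)                -- decimal floating point
  );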

XML

And finally, we get pureXML support to store XML as a native data type in DB2. That means you can specify XML as a data type for columns in your DB2 tables in DB2 9 for z/OS. But I’m not really going to elaborate any further on XML here. A few more details can be found in an earlier post here.

Monday, July 09, 2007

New Built-in Functions [DB2 9 for z/OS]

DB2 9 for z/OS introduces a bevy of new built-in functions (BIFs) for programmers to use in their SQL statements. It is important to keep track of the BIFs available in DB2 because BIFs simplify your coding and development. Invoking a function is always easier than trying to write the equivalent functionality in your host language code. And the BIF will work properly, whereas you cannot always be so sure about your own (sometimes buggy) code.

So, what new function functionality do we get with DB2 9 for z/OS? First of all, we get some new ASCII and EBCDIC conversion functions. The ASCII_CHR function returns the character that has the ASCII code value specified by the argument, and the ASCII_STR function returns a string, in the system ASCII CCSID, that is an ASCII version of its argument. Knowing that, I bet you can guess what EBCDIC_CHR and EBCDIC_STR do.

We also get UNICODE and UNICODE_STR functions in DB2 9 for z/OS. The UNICODE function returns the Unicode UTF-16 code value of the leftmost character of the argument as an integer. And the UNICODE_STR function? It returns a string in Unicode UTF-8 or UTF-16 (depending on the specified parameter) representing a Unicode encoding of the input string.
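A quick illustration (the literal results assume the standard ASCII, EBCDIC, and Unicode code points for the letter A):

SELECT ASCII_CHR(65),    -- returns 'A' (65 is the ASCII code for A)
       EBCDIC_CHR(193),  -- returns 'A' (X'C1', or 193, is the EBCDIC code for A)
       UNICODE('A')      -- returns 65 (the UTF-16 code value for A)
FROM SYSIBM.SYSDUMMY1;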

Perhaps more interesting is soundex support for testing whether two strings sound the same as each other. DB2 offers two functions here: SOUNDEX and DIFFERENCE. The SOUNDEX function returns a 4 character code that represents the sound of the words in the argument. The result can be used to compare with the sound of other strings. The data type of the result is CHAR(4). So what does this mean? Consider the following example:




SELECT LASTNAME
FROM DSN910.EMP
WHERE SOUNDEX(LASTNAME) = SOUNDEX('Smith');



This query would return not only employees with a last name of “Smith,” but also anything that sounds like Smith, such as Smythe.

The DIFFERENCE function is related to SOUNDEX. It returns a value from 0 to 4 where the number represents the difference between the sounds of two strings based on applying the SOUNDEX function to the strings. The higher the value, the closer the two strings are to sounding alike. Consider:


SELECT DIFFERENCE('CONSTRAINT','CONSTANT'),
SOUNDEX('CONSTRAINT'),
SOUNDEX('CONSTANT')
FROM SYSIBM.SYSDUMMY1;

This example returns the values 4, C523, and C523. Since the two strings return the same SOUNDEX value, the difference is 4 (the highest value possible). The more different-sounding the two strings are, the smaller the number would be.

Another interesting series of BIFs are focused on date and time data. These functions include EXTRACT, MONTHS_BETWEEN, and various new timestamp-related functions.

EXTRACT returns a portion of a date or timestamp. You can use EXTRACT to slice up a date/time value into its component pieces. Consider:

SELECT BIRTHDATE,
EXTRACT(DAY FROM BIRTHDATE) AS DAY,
EXTRACT(MONTH FROM BIRTHDATE) AS MONTH,
EXTRACT(YEAR FROM BIRTHDATE) AS YEAR
FROM DSN8910.EMP;

This query would return the entire date, along with each component (year, month, day) of the date as a separate column. You can use one function (EXTRACT) to do the job of the already-existing YEAR, MONTH, and DAY functions. Of course, YEAR, MONTH, and DAY are still available for your use. Similar functionality for EXTRACT exists for time components using HOUR, MINUTE, and SECOND.

The next temporal BIF introduced with DB2 9 for z/OS is the MONTHS_BETWEEN function. It returns an estimate of the number of months between two expressions. If the first expression represents a date that is later than the second, the result will be positive; if the opposite is true the result will be negative. The result is calculated based on a 31 day month.

For example, consider this statement:

SELECT MONTHS_BETWEEN ('2007-02-20','2007-01-17')
AS MONTHS_BETWEEN
FROM SYSIBM.SYSDUMMY1;

The result of this query would be 1.096774193548387.

We also get four new timestamp-related BIFs: TIMESTAMPADD, TIMESTAMPDIFF, TIMESTAMP_FORMAT, and TIMESTAMP_ISO. The first two are rather straightforward: TIMESTAMPADD adds an interval to a timestamp and TIMESTAMPDIFF subtracts two timestamps and returns an interval. I will not get into a discussion of date/time arithmetic here, but if you are interested check out my article Q+As on Dates and DB2.

What about the other two timestamp-related BIFs? TIMESTAMP_ISO returns a timestamp value derived from its date, time, or timestamp argument. And TIMESTAMP_FORMAT offers the much-needed ability to work with timestamps in different formats: it converts a character representation of a timestamp, in a format you specify, into an actual timestamp value. The valid formats that can be specified are:
• ‘YYYY-MM-DD’
• ‘YYYY-MM-DD-HH24-MI-SS’
• ‘YYYY-MM-DD-HH24-MI-SS-NNNNNN’

And instead of using the dash ( - ) as the separator, you can also use . / , : ; and blank, in any combination. The companion VARCHAR_FORMAT function goes in the other direction, returning a character representation of a timestamp in a format specified as above. So, consider the following query:

SELECT SUBSTR(NAME,1,8) AS TSNAME,
VARCHAR_FORMAT(CREATEDTS,'YYYY-MM-DD-HH24:MI:SS') AS TSCR
FROM SYSIBM.SYSTABLESPACE
WHERE CREATEDTS >=
TIMESTAMP_FORMAT('2007-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS');

This query will return the names of all table spaces created since the first of the year, along with their creation timestamps in the specified format.

We also get some new string manipulation functions. The LOCATE_IN_STRING function returns the starting position of the first occurrence of one string within another string. This is very similar to the existing LOCATE function, although the source string and search string arguments are specified in the opposite order.
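For example (made-up strings, and assuming the optional start-position argument behaves as it does for LOCATE):

SELECT LOCATE_IN_STRING('ROCK AND ROLL', 'RO'),    -- returns 1
       LOCATE_IN_STRING('ROCK AND ROLL', 'RO', 2)  -- returns 10 (the search begins at position 2)
FROM SYSIBM.SYSDUMMY1;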

We also get LPAD and RPAD functions. The LPAD function returns a string that is padded on the left, with blanks (or a specific character). The LPAD function treats leading or trailing blanks as significant. RPAD, of course, does the same but on the right. So, consider the following example:

SELECT LPAD(LASTNAME, 30, '.' ) AS LAST,
RPAD(FIRSTNME, 30) AS FIRST
FROM DSN910.EMP;

This query will left pad the last name with periods and right pad the first name with blanks.

OVERLAY is yet another new string manipulation function. It allows you to return a string with portions of it overlaid by a specified value. You provide the string, a substring to be overlaid, and its starting point and length, and DB2 does the rest. Learning by example is simpler than trying to explain how it works, so here goes:

SELECT CHAR(OVERLAY('PLATELET','CEMEN',4,4,OCTETS),9),
CHAR(OVERLAY('INSERTING','IS',4,2,OCTETS),10)
FROM SYSIBM.SYSDUMMY1;

The results returned by this query would be:
• 'PLACEMENT ' (starting at position 4 overlay 4 bytes with 5 bytes 'CEMEN')
• 'INSISTING' (starting at position 4 overlay 2 bytes with 'IS')

Additional BIFs include RID, which returns the RID of a row; COLLATION_KEY, which returns a varying-length binary string representing the collation key of an expression in a named collation; DECRYPT_BINARY, which adds the ability to decrypt the new BINARY and VARBINARY data types; and NORMALIZE_STRING, which takes a Unicode string argument and returns a normalized string. And, of course, we also get scalar functions to support the new data types in DB2 9 (BIGINT, BINARY, VARBINARY, and DECFLOAT).

Finally, DB2 9 for z/OS adds three new aggregate functions: CORRELATION, COVARIANCE, and COVARIANCE_SAMP.

The CORRELATION function returns the coefficient of correlation of a set of number pairs. The COVARIANCE function returns the population covariance of a set of number pairs, while COVARIANCE_SAMP returns the sample covariance.
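For example, a hypothetical query against the sample employee table might look like this:

SELECT CORRELATION(SALARY, BONUS)     AS CORR_SAL_BONUS,
       COVARIANCE(SALARY, BONUS)      AS COVAR_SAL_BONUS,
       COVARIANCE_SAMP(SALARY, BONUS) AS COVAR_SAMP_SAL_BONUS
FROM DSN8910.EMP;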

So, as you can see, DB2 9 for z/OS continues adding new and useful functions to simplify our development efforts with DB2.

Friday, June 29, 2007

Implicitly Created Database Objects [DB2 9 for z/OS]

Today we continue our series on new features in DB2 V9 with a quick discussion of implicitly created database objects. To understand what this is, let’s first review the way DB2 works today (pre-V9). If you issue a CREATE TABLE statement and do not specify the database and table space into which the table is to be created, DB2 will automagically create a new table space in the default database (DSNDB04).

Not being content with that, DB2 9 extends this capability with the ability to implicitly create additional types of database objects. By coding your CREATE TABLE statement with the proper options you can implicitly create any or all of the following:
  • Database
  • Table space
  • Index to enforce Primary Key uniqueness
  • Index to enforce unique constraint
  • ROWID index (if the ROWID is defined as GENERATED BY DEFAULT)
  • LOB structures (LOB table space, auxiliary table, auxiliary index)

OK, so how does this happen? Let’s go down the list. If you fail to specify the IN clause on a CREATE TABLE, DB2 works a bit differently than before. In the past, DB2 would simply create an implicit table space in DSNDB04. As of DB2 9, the database is involved as well: DB2 will either create an implicit database or use a previously implicitly created database. The names of these implicitly created databases will range from DSN00001 to DSN60000. The first time, DB2 will create DSN00001, the second time DSN00002, and so on until we reach DSN60000. After that, DB2 will wrap around and start again from the beginning, reusing the existing implicitly created databases. For implicitly created databases, the STOGROUP will be set to SYSDEFLT; buffer pool values are determined via DSNZPARMs.

Next up is the table space. Although DB2 has supported implicitly created table spaces forever, there are some twists in DB2 9. First of all, you cannot create simple table spaces any longer, so all implicitly created table spaces will be segmented. In compatibility mode (CM), an implicitly created table space will be defined with SEGSIZE 4 and LOCKSIZE ROW. After migrating to new function mode (NFM), your implicitly created table spaces will be created as partition-by-growth table spaces. The options used will be SEGSIZE 4, DSSIZE 4G, MAXPARTITIONS 256, LOCKSIZE ROW, and LOCKMAX SYSTEM.

    As for the rest of the objects in the list, these system-required objects will always be implicitly created if the table space is created implicitly. For indexes that support the primary key or unique constraints, the names will be generated using a combination of the table name and randomly generated characters.
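So a bare-bones CREATE TABLE like the following hypothetical one would cause DB2 to implicitly create the database, the table space, and the index that enforces the primary key:

-- no IN clause: DB2 implicitly creates (or reuses) a DSNnnnnn database,
-- creates an implicit table space, and builds the primary key index
CREATE TABLE QUICK_TEST
  (ID    INTEGER      NOT NULL PRIMARY KEY,
   NOTES VARCHAR(100));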

OK, so now that you know about the ability of DB2 9 to implicitly create objects, let me give you some advice. Whenever possible, don’t rely on it. It is much better, if at all feasible, for your DBAs to explicitly create and name all database objects as needed. Yes, it takes more time, but it gives you more control. You can explicitly decide which objects go into which database; you can explicitly set parameters; etc.

So, this is a nice new feature and it can enable DB2 to do some definitional things for you automatically. But most DBAs will want to continue to do things the traditional way, that is, building their DDL themselves without relying on implicit creation.

    Monday, June 18, 2007

    Skipping Locked Rows [DB2 9 for z/OS]

    In DB2 9 it is possible for a transaction to skip over rows that are locked. This can be accomplished by means of the SKIP LOCKED DATA option within your SQL statement(s). SKIP LOCKED DATA can be specified in SELECT, SELECT INTO, and PREPARE, as well as searched UPDATE and DELETE statements. You can also use the SKIP LOCKED DATA option with the UNLOAD utility.

    Of course, if a program skips over locked data then that data is not accessed and the program will not have it available. When this option is used DB2 will just skip over any locked data instead of waiting for it to be unlocked. The benefit, of course, is improved performance because you will not incur any lock wait time. But it comes at the cost of not accessing the locked data at all. This means that you should only utilize this clause when your program can tolerate skipping over some data.

    The SKIP LOCKED DATA option is compatible with cursor stability (CS) isolation and read stability (RS) isolation. But it cannot be used with uncommitted read (UR) or repeatable read (RR) isolation levels. DB2 will simply ignore the SKIP LOCKED DATA clause under UR and RR isolation levels.

    Additionally, SKIP LOCKED DATA works only with row locks and page locks. That means that SKIP LOCKED DATA does not apply to table, partition, LOB, XML, or table space locks.
    Let's look at an example. Suppose we have a table with 5 rows in it that looks like this:

    KEY FNAME LNAME
    --- ------ -------
    1 JOE MAMA
    2 DON KNOTTS
    3 KIM PORTANT
    4 BOB NOBBLE
    5 KIM BIMBO


Assume row-level locking. Next assume that an UPDATE statement is run against the table changing FNAME to JIM WHERE FNAME = 'KIM'. And it is hanging out there without a COMMIT. Next, we run:


    SELECT COUNT (*)
    FROM TABLE
WHERE FNAME >= 'AAA'
    SKIP LOCKED DATA;

    The count returned would be 3 because DB2 skips the two locked rows (rows 3 and 5). And, of course, if the locks are released the count would be 5 again.

    Wednesday, June 13, 2007

    Do You Want to Ignore Clustering? [DB2 9 for z/OS]

    DB2 9 for z/OS offers a new DDL parameter for your tables: APPEND. If you specify APPEND NO, which is the default, DB2 will operate as you are accustomed to it operating. That is, when rows are inserted or loaded DB2 will attempt to sequence them based on the clustering index key.

    If you specify APPEND YES though, DB2 will ignore clustering during inserts and online LOAD processing. Instead of attempting to maintain clustering, DB2 will just append the rows at the end of the table or partition. If the table space is partition-by-growth (new DB2 9 feature) then DB2 can use any partition with space available at the end; for range-partitioned table spaces, obviously DB2 will append the data to the end of the partition corresponding to the range for the value(s) being inserted.

    You might want to choose this option to speed up the addition of new data. Appending data is faster because DB2 does not have to search for the proper place to maintain clustering. And you can always re-cluster the table by running a REORG.
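For example, on a hypothetical logging table the option is specified right in the DDL:

CREATE TABLE AUDIT_LOG
  (LOG_TS   TIMESTAMP    NOT NULL,
   LOG_TEXT VARCHAR(500))
  APPEND YES;

-- or change the behavior of an existing table
ALTER TABLE AUDIT_LOG APPEND NO;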

    The APPEND option cannot be specified on LOB tables, XML tables, or tables in work files.

    To track the state of the APPEND option there is a new column, APPEND, in the DB2 Catalog in SYSTABLES. Its value will be either ‘Y’ or ‘N’.

    Monday, June 04, 2007

    Reordered Row Format [DB2 9 for z/OS]

    If you’ve worked with DB2 for awhile, especially as a DBA, you’ve probably heard the advice to re-arrange the columns of your tables to optimize logging efficiency. Basically, the more data that DB2 has to log, the more overhead your programs will incur, and performance will degrade. DB2 will log data from the first byte changed to the last byte changed – unless the row is variable, in which case DB2 will log from the first byte changed to the end of the row – unless the change does not cause the length of the variable row to change, in which case DB2 goes back to logging from the first byte changed to the last byte changed.

So, the advice goes something like this: put your static columns (those that do not change frequently) at the beginning of the row and your dynamic columns (those that will change more frequently) at the end of the row. And put your variable columns at the end of each of those groupings. This would make your row look something like this:

    [Static fixed-length cols]
    [Static variable cols]
    [Dynamic fixed-length cols]
    [Dynamic variable cols]

    Make sense?

Well, DB2 9 for z/OS takes this advice to heart (sort of). In New Function Mode (NFM), for new table spaces, DB2 will automatically put the variable columns at the end of the row. This is called reordered row format (RRF); the row format we are all familiar with today is now referred to as basic row format (BRF). This all refers to how the row is stored internally – it does not mean that your DDL is changed, nor does it require changes to anything external or to how you access the rows.

    To summarize, this means that a row in RRF will store the fixed-length columns first and the variable columns at the end. Pointers within the row will point to the beginning of the variable columns.

So far so good, right? Well, DB2 will also convert our old table spaces to RRF over time. Once we are in DB2 9 NFM, a REORG or a LOAD REPLACE will cause a change from BRF to RRF. So run a LOAD REPLACE against a table space in NFM and the row format changes. REORG a partition and the row format for that partition changes. And yes, you can have a partitioned table space with some partitions in BRF and some in RRF.

With RRF we can be sure that DB2 is putting our variable columns at the end of the row – where they belong. But it still is not helping us with placing static columns before the dynamic ones. You’ll still have to guide DB2 to do that.

    Tuesday, May 29, 2007

    INSTEAD OF Triggers [DB2 9 for z/OS]

    DB2 9 for z/OS introduces a new type of trigger: the INSTEAD OF trigger. The primary usage of INSTEAD OF triggers is to enable views that would not otherwise be updatable to support updates. Typically, a view that consists of multiple base tables cannot be updated. But with an INSTEAD OF trigger this problem can be surmounted. You can code an INSTEAD OF trigger to direct inserts, updates and deletes to the appropriate underlying tables of the view.


With the INSTEAD OF trigger, your application code does not have to include complex algorithms to specify which operations should be performed against views and which should be performed against base tables. Instead, all actions are performed against the view and the activated trigger determines which underlying base tables are to be impacted.


    So, you might choose to code an INSTEAD OF trigger on a view over a join to allow modifications on the view to go through to the underlying base tables joined in that view. Or you can encode and decode data within a view: for example, the view could contain the decryption functions while the INSTEAD OF triggers use the encryption functions to ensure security.


    Only one INSTEAD OF trigger is allowed for each type of operation on a given subject view. That is, one for inserts, one for updates, and one for deletes. Therefore, you can have a grand total of three INSTEAD OF triggers per view.


DB2 executes the triggered action instead of the insert, update, or delete operation on the subject view. Neither the WHEN clause nor the FOR EACH STATEMENT clause is allowed in INSTEAD OF triggers.


    Furthermore, there are some restrictions on the view in order for an INSTEAD OF trigger to be allowed. First of all, the view must exist at the current server when the INSTEAD OF trigger is created. Additionally, none of the following are permitted for a view to have an INSTEAD OF trigger:

    • the WITH CASCADED CHECK option
    • a view on which a symmetric view has been defined
    • a view that references data encoded with different encoding schemes or CCSID values
    • a view with a ROWID, LOB, or XML column (or a distinct type that is defined as one of these types)
    • a view with a column based on an underlying column defined as an identity column, security label column, or a row change timestamp column
    • a view with a column that is defined (directly or indirectly) as an expression
    • a view with a column that is based on a column of a result table that involves a set operator
    • a view with any columns that have field procedures
    • a view where all of the underlying base tables are DB2 Catalog tables or created global temporary tables
    • a view that has other views dependent on it

    One way to think of INSTEAD OF triggers is that they contain the inverse of the logic in the body of the view. If the view joins tables, the trigger should break the join apart to modify the correct data. If the view decrypts columns, the INSTEAD OF trigger should encrypt the columns. Etc.


    Let’s take a look at an example to better understand the INSTEAD OF trigger. First, we create a view that joins the EMP and DEPT tables:



    CREATE VIEW EMP_DEPT (EMPNO, FIRSTNME, MIDINIT, LASTNAME,
    PHONENO, HIREDATE, DEPTNAME)
    AS SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, PHONENO,
    HIREDATE, DEPTNAME
    FROM EMP, DEPT
    WHERE EMP.WORKDEPT = DEPT.DEPTNO;



    OK, so far, so good. But since this view is a join it is not updateable. Let’s fix this by coding up some INSTEAD OF triggers. First, we’ll take care of inserts:




CREATE TRIGGER E_D_ISRT
INSTEAD OF INSERT ON EMP_DEPT
REFERENCING NEW AS NEWEMP
FOR EACH ROW INSERT INTO EMP
(EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
VALUES
(NEWEMP.EMPNO, NEWEMP.FIRSTNME, NEWEMP.MIDINIT, NEWEMP.LASTNAME,
COALESCE
((SELECT DEPTNO FROM DEPT AS D WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
RAISE_ERROR('70001', 'Unknown dept name')
),
NEWEMP.PHONENO, NEWEMP.HIREDATE);



    An insert against the view would not be inserting a new department, so we will be inserting data into the EMP table. If the department does not exist, we’ll raise an error. Next we’ll consider updates:




    CREATE TRIGGER E_D_UPD
    INSTEAD OF UPDATE ON EMP_DEPT
    REFERENCING NEW AS NEWEMP OLD AS OLDEMP
    FOR EACH ROW
    BEGIN ATOMIC
    VALUES(CASE WHEN NEWEMP.EMPNO = OLDEMP.EMPNO THEN 0
    ELSE RAISE_ERROR('70002', 'Must not change EMPNO') END);
    UPDATE EMP AS E SET
    (FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
    = (NEWEMP.FIRSTNME, NEWEMP.MIDINIT, NEWEMP.LASTNAME,
    COALESCE((SELECT DEPTNO FROM DEPT AS D
    WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
    RAISE_ERROR ('70001', 'Unknown dept name')),
    NEWEMP.PHONENO, NEWEMP.HIREDATE)
    WHERE NEWEMP.EMPNO = E.EMPNO;
    END








    Finally we take care of deletions:




    CREATE TRIGGER E_D_DEL
    INSTEAD OF DELETE ON EMP_DEPT
    REFERENCING OLD AS OLDEMP
    FOR EACH ROW
    DELETE FROM EMP AS E WHERE E.EMPNO = OLDEMP.EMPNO;


    Using an INSTEAD OF trigger, each requested modification operation made against the view is replaced by the trigger logic. The trigger performs the insert, update, or delete on behalf of the view. No application changes are required because the code is in the trigger which resides in the database.


    If you want to read more about INSTEAD OF triggers, I recommend this quite extensive article (albeit for DB2 LUW) out on the IBM Developer Works web site.


    Thursday, May 24, 2007

    LOB Enhancements [DB2 9 for z/OS]

    IBM focused their attention on improving DB2’s ability to store and manage LOB data in Version 9. As anyone who has tried to use LOBs in a previous version of DB2 knows, the usability limitations were troublesome. But with Version 9, IBM chips away at some of the more annoying LOB limitations.

    FETCHing LOBs

    Prior to Version 9, there were two methods you could deploy in your programs to fetch LOB data:

    • Fetching data into a pre-allocated buffer
    • Using a LOB locator to retrieve a handle on the data.

Both methods have their issues. Fetching data into a preallocated buffer can cause virtual storage constraint problems, especially for larger LOBs. On the other hand, programs that use LOB locators but commit infrequently, or that do not explicitly free the locators, can consume considerable amounts of DB2 resources.

    V9 introduces a new clause, WITH CONTINUE, for use on your FETCH statements. By coding your program to use WITH CONTINUE you can retrieve LOB columns in multiple pieces without using a LOB locator, and continue a FETCH operation to retrieve the remaining LOB data when truncation occurs. (Note: this method can be used with XML data, too.) You will have to manage the buffers and reassemble the pieces of data in your application program.

    So, by specifying WITH CONTINUE on your FETCH statement you tell DB2 to allow subsequent FETCH CURRENT CONTINUE operations. These will allow you to access the remaining truncated LOB (or XML) column after the initial FETCH. If truncation occurs, DB2 will remember the truncation position and will not discard the remaining data. DB2 will return the total length that would have been required to hold all of the data of the LOB or XML column. This will either be in the first four bytes of the LOB host variable structure or in the 4 byte area that is pointed to by the SQLDATALEN pointer in the SQLVAR entry of the SQLDA for that host variable.
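Here is a rough sketch of the flow using a hypothetical cursor, table, and host variables; check the V9 SQL Reference for the exact embedded SQL syntax your host language requires:

-- cursor over a hypothetical table containing a CLOB column
DECLARE C1 CURSOR FOR
  SELECT RESUME_TEXT FROM EMP_RESUME WHERE EMPNO = :HV_EMPNO;

OPEN C1;

-- initial fetch; if :HV_RESUME is too small, DB2 remembers where truncation occurred
FETCH WITH CONTINUE C1 INTO :HV_RESUME;

-- retrieve the remaining piece(s) of the truncated LOB value
FETCH CURRENT CONTINUE C1 INTO :HV_RESUME;

CLOSE C1;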

    File Reference Variables

    DB2 V9 adds support for a LOB file reference variable. A file reference variable is a host variable that is defined in a host language to contain the file name that directs file input and output for a large object (LOB).

    Using file reference variables, large LOB values can be inserted from a file or selected into a file rather than a host variable. This means that your application program does not need to acquire storage to contain the LOB value. File reference variables also enable you to move LOB values from the DBMS to a client application or from a client application to a database server without going through the working storage of the client application.

    LOBs and Utilities

    The manner in which DB2 handles LOBs in utility processing has also been improved in DB2 V9.

    Loading and unloading LOBs has been improved. For LOAD, an input field value can contain the name of the file that contains a LOB column value. The LOB column value will then be loaded from that file. For UNLOAD, you can store the value of a LOB column in a file and record the name of the file in the unloaded record in the base table.

    What about REORG? Well, prior to V9 you could not access LOB data during a REORG. And a REORG did not reclaim physical space from the LOB data set because LOBs were moved within the existing LOB table space. V9 fixes these problems. During a REORG (in V9), the original LOB table space is drained of writers. All LOBs are then extracted from the original data set and inserted into a shadow data set. When this operation is complete, all access to the LOB table space is stopped (the readers are drained) while the original data set is switched with the shadow data set. At this point, full access to the new data set is enabled, and an inline copy is taken to ensure recoverability of data.

    Additionally, both CHECK LOB and CHECK DATA have SHRLEVEL REFERENCE and SHRLEVEL CHANGE options. Using these you can minimize downtime. For LOBs, CHECK DATA checks for consistency between a base table space and the corresponding LOB or XML table spaces. The CHECK LOB utility identifies structural defects in the LOB table space and any invalid LOB values. Running CHECK (DATA or LOB) with SHRLEVEL REFERENCE indicates that applications can read from but cannot write to the index, table space, or partition that is to be checked. SHRLEVEL CHANGE means that applications can read from and write to the index, table space, or partition that is to be checked.

    Performance

Finally, DB2 V9 provides several performance improvements related to LOBs. Using file reference variables, or using WITH CONTINUE to read LOBs in chunks, can improve performance over using locators.

    And as we all know, removing locks can improve performance. DB2 V9 eliminates LOB locking for space allocation, as well as for insert, delete, update, and select operations. Additionally, a LOB lock is no longer required to serialize the consistency between the value of the LOB and the column of the base row for an uncommitted read operation.

    Monday, May 21, 2007

    New OLAP Capabilities [DB2 9 for z/OS]

    DB2 9 for z/OS offers several new SQL improvements focused on improving OLAP functionality. The OLAP functions provide the ability to return ranking, row numbering, and existing aggregate function information as a scalar value in the result of a query. You can include OLAP specifications in an expression, in a select-list, or in the ORDER BY clause of a select-statement.


    The result to which the OLAP specification is applied is the result table of the innermost subselect that includes the OLAP specification. OK, at this point you might well be asking, “So, what are these new OLAP things and what can they do for me?” Let’s take them one at a time.


    First up, we have the RANK and DENSE_RANK functions. These functions specify that the ordinal rank of a row within the specified window is computed. Rows that are not distinct with respect to the ordering within the specified window are assigned the same rank. Here is a quick example:


    SELECT
    EMPNO,
    LASTNAME,
    FIRSTNAME,
    SALARY+BONUS+COMM AS TOTAL_COMP,
    RANK() OVER(ORDER BY SALARY+BONUS+COMM DESC) AS RANK_COMP
    FROM EMP
    WHERE SALARY+BONUS+COMM > 30000
    ORDER BY LASTNAME;


    This query will rank employees who have total compensation greater than $30,000, but order the results by last name. This allows you to rank data differently than the order in which it is presented.

    You can define the results of ranking with gaps in the sequential rank numbering by using the RANK specification, or without gaps, by using the DENSE_RANK specification.

    The difference between the two can be a bit difficult to decipher at first. Think of it this way: RANK specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row. So, if two or more rows are not distinct you will get gaps in the ranking. With DENSE_RANK the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering. In this case there will be no gaps in the sequential rank numbering. Consider the following data:


    EMPNO LASTNAME FIRSTNAME SALARY BONUS COMM
    ----- -------- --------- ------ ------ ------
    100 MULLINS CRAIG 500000 100000 400000
    200 SHAW DONNA 25000 10000 0
    300 ABBOTT RANDY 700000 300000 0
    400 WISER BUD 10000 0 0
    500 GREEN RACHEL 40000 2000 5000



    The results of the previous query run against this data would look like this:


300 ABBOTT RANDY 1000000 1
500 GREEN RACHEL 47000 3
100 MULLINS CRAIG 1000000 1
200 SHAW DONNA 35000 4


    Note that both ABBOTT and MULLINS earn the most, but the amount is the same, so they share the number one ranking. As this is not a dense rank, the next rank value is 3, and not 2.


    The next OLAP specification introduced by DB2 9 for z/OS is ROW_NUMBER. ROW_NUMBER specifies that a sequential row number is computed for the row that is defined by the ordering, starting with 1 for the first row. If the ORDER BY clause is not specified in the window, the row numbers are assigned to the rows in an arbitrary order, as the rows are returned. This satisfies an often-requested capability to simply assign a number to the result rows of a query. Row numbers also enable easy formulation of queries for computing histogram statistics and they enable formation of other OLAP specifications (for example, moving sums, moving averages, and so on).


    Here is a sample query using ROW_NUMBER:



    SELECT
    ROW_NUMBER() OVER(ORDER BY WORKDEPT, LASTNAME) AS NUMBER,
    LASTNAME,
    SALARY
    FROM EMP
    ORDER BY WORKDEPT, LASTNAME;


    The result of a RANK, DENSE_RANK, and ROW_NUMBER specification is BIGINT, and the result cannot be null.


    One more thing to consider for the new OLAP features is the ability to partition results for the OLAP specification. This is specified using the PARTITION BY clause with a partitioning-expression, which is an expression used to define the partitioning of the result table. Each column name that is referenced in a partitioning-expression must unambiguously reference a column of the result table of the subselect that contains the OLAP specification. A partitioning-expression cannot include a scalar-fullselect or any function that is not deterministic or has an external action.


    Here is an SQL example using partitioning:



    SELECT
    WORKDEPT,
    EMPNO,
    LASTNAME,
    FIRSTNAME,
    EDLEVEL,
DENSE_RANK() OVER
    (PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL
    FROM EMP
    ORDER BY WORKDEPT, LASTNAME;


This SQL ranks employees within each department according to their education level. And because DENSE_RANK is specified, multiple employees with the same rank in the department will not increase the next ranking.


    Sometimes RANK, DENSE_RANK, and ROW_NUMBER are called window functions. An OLAP specification is not valid in a WHERE, VALUES, GROUP BY, HAVING, or SET clause. An OLAP specification cannot be used as an argument of an aggregate function. When invoking an OLAP specification, a window is specified that defines the rows and order of the rows over which the function is applied.