Wednesday, March 18, 2009

A Short Introduction to Lock Avoidance

Lock avoidance is a mechanism employed by DB2 for z/OS to access data without locking while still maintaining data integrity. It prohibits access to uncommitted data and serializes access to pages. Lock avoidance improves performance by reducing the overall volume of lock requests. After all, let’s face it, the most efficient lock is the one never taken.


Of course, even if it is not taking a lock, DB2 must still maintain the integrity of its data. Instead of taking a lock, DB2 uses a latch. To take advantage of Lock Avoidance, the SQL statement must be Read Only and the plan must be bound with Isolation Level Cursor Stability (CS) and CURRENTDATA(NO).


In general, DB2 avoids locking data pages if it can determine that the data to be accessed is committed and that no semantics are violated by not acquiring the lock. DB2 avoids locks by using log RBA information to verify the committed state of the data.


When determining if lock avoidance techniques will be practical, DB2 first scans the page to be accessed to determine whether any rows qualify. If none qualify, a lock is not required.


For each data page to be accessed, the RBA of the last page update (stored in the data page header) is compared with the log RBA for the oldest active unit of recovery. This RBA is called the Commit Log Sequence Number, or CLSN. If the CLSN is greater than the last page update RBA, the data on the page has been committed and the page lock can be avoided.


Additionally, a bit is stored in the record header for each row on the page. The bit is called the Possibly UNCommitted, or PUNC, bit. The PUNC bit indicates whether update activity has been performed on the row. For each qualifying row on the page, the PUNC bit is checked to see whether it is off. This indicates that the row has not been updated since the last time the bit was turned off. Therefore, locking can be avoided. (Note that there is no external method for DBAs to use to determine whether a row’s PUNC bit is on or off.)


If neither CLSN nor PUNC bit testing indicates that a lock can be avoided, DB2 acquires the requisite lock.


In addition to enhancing performance, lock avoidance improves data availability. Data that would have been locked, and therefore unavailable, without lock avoidance is now accessible.


Lock avoidance is used only for data pages. Further, DB2 Catalog and DB2 Directory access does not use lock avoidance techniques. You can avoid locks under the following circumstances:


  • For any pages accessed by read-only or ambiguous queries bound with ISOLATION(CS) and CURRENTDATA NO
  • For any unqualified rows accessed by queries bound with ISOLATION(CS) or ISOLATION(RS)
  • When DB2 system-managed referential integrity checks for dependent rows caused by either the primary key being modified, or the parent row being deleted and the DELETE RESTRICT rule is in effect
  • For both COPY and RUNSTATS when SHRLEVEL(CHANGE) is specified


To determine the impact of lock avoidance on your system, you can review DB2 trace records. IFCIDs 218 and 223 provide CLSN information, and IFCIDs 226 and 227 provide 'wait for page latch' information.


Avoiding locks can improve the performance of your queries and programs that satisfy the preceding requirements. To encourage DB2 to avoid locks, BIND your plans and packages specifying ISOLATION(CS) and CURRENTDATA NO. Furthermore, avoid ambiguous cursors by specifying FOR READ ONLY for all cursors that are not used for updating.
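
For example, a package bound and a cursor coded along the following lines would qualify for lock avoidance. This is only an illustrative sketch; the collection, program, table, and cursor names (COLL1, PROGA, EMP, CSR1) are hypothetical:

BIND PACKAGE(COLL1) MEMBER(PROGA) ISOLATION(CS) CURRENTDATA(NO)

DECLARE CSR1 CURSOR FOR
  SELECT EMPNO, LASTNAME
  FROM   EMP
  WHERE  WORKDEPT = 'D11'
  FOR READ ONLY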

Friday, March 06, 2009

Attend the 2009 IDUG North American Conference (at a Discount)

Today's blog post is just a friendly reminder to the DB2 community that the North American IDUG conference is fast approaching. This year's event will be held in Denver, CO from Tuesday, May 12, 2009 through Friday, May 15th. And if you act quickly you can attend at a discounted rate using the early bird registration discount coupon (valid through March 27th).

And don't forget those day-long seminars that IDUG holds before the regular conference. Instead of being on a Sunday, the seminars will be on the Monday before the conference this year! The Monday-Friday schedule is a departure from previous IDUG conferences, and was adopted to reduce or even eliminate the need for weekend travel.

IDUG is one of the best places to advance your DB2 knowledge. This year's conference boasts over 120 hours of technical material to be presented by a mix of real-world DB2 users, third-party vendors, DB2 Gold Consultants, IBM Fellows, IBM Distinguished Engineers, IBM Vice Presidents, and dozens of the most sought-after DB2 speakers in the world.

I will be delivering two presentations at this year's IDUG:
  • DB2 9: For Developers Only - Wed, 5/13/09 at 1:30 PM
  • Counting Down the DB2 Performance Top 40 - Fri, 5/15/09 at 9:00 AM
I will also be participating in the Data Privacy, Security and Audit Compliance Special Interest Group (SIG), one of the many SIGs that will be conducted at IDUG.

Attendees will have ample time to meet informally between sessions, or as part of SIGs, discussion panels, or the Thursday night "dine-around" with some of IDUG's most popular presenters. And if you are thinking about getting certified, IDUG is the place to do that! Throughout the conference, IBM will waive the $200 certification test fee for all attendees, with no limit on the number of tests each attendee can take. IBM will offer 40 different certification tests that cover DB2, InfoSphere, U2, Content Management, DataStage, and other IBM Information Management products.

Seriously, you don't want to miss out on all of the wonderful learning and networking opportunities that IDUG offers DB2 professionals. Take the time to check out the IDUG conference details on the web and work on getting your management's approval for this great educational event.

Thursday, February 12, 2009

A Twittering You Will Go?

This week, a thread was started on the DB2-L list server about Twitter, the micro-messaging Web 2.0 social networking tool. Basically, someone wanted to know why more DB2 people did not use Twitter. The consensus seems to be that many organizations block it as a "non-business" web site.

(Surprisingly, LinkedIn seems not to be blocked as often as Twitter, even though LinkedIn is a prime vehicle for job search networking.)

That "non-business" label is disputable, though. If you've tried Twittering you know that it can be addictive, but Twitter is also growing in popularity as a business communication tool. This might seem hard to believe when you first dive into Twittering.

The basic idea of Twitter is simple: provide a platform for users to publish messages of no more than 140 characters at a time. And that can seem limiting... until you've used Twitter for a while. If you subscribe to my Twitter feed you'll find that I send out regular Tweets (that is what a Twitter message is called) for many things, such as:
  • when I post a new blog entry (maybe you got here that way),
  • to share the highlights of interesting sessions when I attend a conference or user group,
  • to notify folks when I've published a new article or column, and
  • just to share some of the "things" going on in my life.
OK, so what are the business uses of Twitter? Well, sharing information (like I do) is absolutely a
business usage. Sharing practical web links is another. Keeping abreast of technology topics, yet
another. Micro-messaging can help you reduce email and eliminate unproductive meetings.

Other DB2 professionals use Twitter to communicate and solve problems. Willie Favero, Troy Coleman, and even some in-the-trenches folks use Twitter. So you know you'll get some good DB2 information if you participate.

"So what?" you may say: "my company has already blocked Twitter so I can't participate." Well, there might be a way around that (I don't know if this will work or not). From your home PC, or some other non-company PC, go to twitter.com, register, and see what it is all about. Then download a Twitter client, like TweetDeck (which is my personal favorite) or Twhirl. Take the download and install it at work... now see if things are still blocked when you use a different client. They might be, but then again, maybe not...

Now (wink-wink) I do not really advocate people trying to get around their company's policies. But if you try this out and it works (or even if it does not) post a comment here to let us all know.

Wednesday, February 11, 2009

Don't Forget DISPLAY as a Part of Your DB2 Tuning Efforts

Although a DB2 performance monitor is probably the best solution for gathering information about your DB2 subsystems and databases, you can gain significant insight into “what is going on out there” using the simple DISPLAY command. The DISPLAY command can be used to return information about the status of DB2 data sharing groups, databases and table spaces, threads, stored procedures, user-defined functions, utilities, and traces; it can also monitor the Resource Limit Facility (RLF) and distributed data locations. Let’s take a quick tour of the useful information provided by the DISPLAY command.

Database Information

There are eight variations of the DISPLAY command that you can utilize, depending on the type of information you are looking for. Probably the most often-used variation of the DISPLAY command is the DATABASE option. By running the DISPLAY DATABASE command, you can gather information on DB2 databases and tablespaces. The output of the basic command will show the status of the objects specified along with any exception states that apply. For example:
-DISPLAY DATABASE(DBNAME)

Issuing this command will display details on the DBNAME database including information about the tablespaces and indexes in that database. So, with a simple command you can easily find all of the tablespaces and indexes within any database — pretty powerful stuff. But the status information for each space is useful, too. When a status other than RO or RW is encountered, the object is in an indeterminate state or is being processed by a DB2 utility. The possible statuses that DB2 can assign to a page set are detailed in the following table.

ARBDP

Index is in Advisory Rebuild Pending status; the index should be rebuilt to improve performance and allow the index to be used for index-only access again.

AREO*

The table space, index, or partition is in Advisory Reorg Pending status; the object should be reorganized to improve performance. This status is new as of DB2 V8.

ACHKP

The Auxiliary Check Pending status has been set for the base table space. An error exists in the LOB column of the base table space.

AREST

The table space, index space, or partition is in Advisory Restart Pending status. If back-out activity against the object is not already underway, either issue the RECOVER POSTPONED command or recycle the DB2 subsystem specifying LBACKOUT=AUTO.

AUXW

Either the base table space or the LOB table space is in the Auxiliary Warning status. This warning status indicates an error in the LOB column of the base table space or an invalid LOB in the LOB table space.

CHKP

The Check Pending status has been set for this table space or partition.

COPY

The Copy Pending flag has been set for this table space or partition.

DEFER

Deferred restart is required for the object.

GRECP

The table space, table space partition, index, index partition, or logical index partition is in the group buffer pool Recover Pending state.

ICOPY

The index is in Informational Copy Pending status.

INDBT

In-doubt processing is required for the object.

LPL

The table space, table space partition, index, index partition, or logical index partition has logical page errors.

LSTOP

The logical partition of a non-partitioning index is stopped.

OPENF

The table space, table space partition, index, index partition, or logical index partition had an open data set failure.

PSRCP

Indicates Page Set Recover Pending state for an index (non-partitioning indexes).

PSRBD

The non-partitioning index space is in a Page Set Rebuild Pending status.

RBDP

The physical or logical index partition is in the Rebuild Pending status.

RBDP*

The logical partition of a non-partitioning index is in the Rebuild Pending status, and the entire index is inaccessible to SQL applications. However, only the logical partition needs to be rebuilt.

RECP

The Recover Pending flag has been set for this table space, table space partition, index, index partition, or logical index partition.

REFP

The table space, index space, or index is in Refresh Pending status.

RELDP

The object has a release dependency.

REORP

The data partition is in a REORG Pending state.

REST

Restart processing has been initiated for the table space, table space partition, index, index partition, or logical index partition.

RESTP

The table space or index is in the Restart Pending status.

RO

The table space, table space partition, index, index partition, or logical index partition has been started for read-only processing.

RW

The table space, table space partition, index, index partition, or logical index partition has been started for read and write processing.

STOP

The table space, table space partition, index, index partition, or logical index partition has been stopped.

STOPE

The table space or index is stopped because of an invalid log RBA or LRSN in one of its pages.

STOPP

A stop is pending for the table space, table space partition, index, index partition, or logical index partition.

UT

The table space, table space partition, index, index partition, or logical index partition has been started for the execution of utilities only.

UTRO

The table space, table space partition, index, index partition, or logical index partition has been started for RW processing, but only RO processing is enabled because a utility is in progress for that object.

UTRW

The table space, table space partition, index, index partition, or logical index partition has been started for RW processing, and a utility is in progress for that object.

UTUT

The table space, table space partition, index, index partition, or logical index partition has been started for RW processing, but only UT processing is enabled because a utility is in progress for that object.

WEPR

Write error page range information.


Of course, there are many additional options that can be used in conjunction with the DISPLAY DATABASE command. The following options can be used to narrow down the amount of information displayed:

  • USE displays what processes are using resources for the page sets in the database
  • CLAIMERS displays the claims on the page sets in the database
  • LOCKS displays the locks held on the page sets in the database
  • LPL displays the logical page list entries
  • WEPR displays the write error page range information.

Additionally, for partitioned page sets, you can specify which partition, or range of partitions, that you wish to display.
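
For example, a command along these lines (DBNAME and TS1 are placeholder names) would display lock information for just partition 3 of a single table space:

-DISPLAY DATABASE(DBNAME) SPACENAM(TS1) PART(3) LOCKS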

The OVERVIEW option can be specified to display each object in the database on its own line. This condenses the output of the command and makes it easier to view. The OVERVIEW keyword cannot be specified with any other keywords except SPACENAM, LIMIT, and AFTER.

Another tactic that can be used to control the amount of output generated by DISPLAY DATABASE is to use the LIMIT parameter. The default number of lines returned by the DISPLAY command is 50, but the LIMIT parameter can be used to set the maximum number of lines returned to any numeric value. For example:



-DISPLAY DATABASE(DBNAME) LIMIT(300)

Using the LIMIT parameter in this manner would increase the limit to 300 lines of output. To indicate no limit, you can replace the numeric limit with an asterisk (*).

Finally, you can choose to display only objects in restricted or advisory status using either the ADVISORY or RESTRICT keyword.
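
For example, a command such as the following lists every object in any restricted state across all databases, with no limit on the output:

-DISPLAY DATABASE(*) SPACENAM(*) RESTRICT LIMIT(*)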

Buffer Pool Information

The DISPLAY BUFFERPOOL command can be issued to display the current status and allocation information for each buffer pool. For example, consider the following:




-DISPLAY BUFFERPOOL (BP0)

DSNB401I ALLOCATED = 2000 TO BE DELETED = 0
IN USE/UPDATED = 12

DSNB403I ALLOCATED = 100000 TO BE DELETED = 0
BACKED BY ES = 91402

DSNB404I VPSEQUENTIAL = 80 HPSEQUENTIAL = 80
DEFERRED WRITE = 50 VERTICAL DEFERRED WRT = 10
IOP SEQUENTIAL = 50

DSNB405I HIPERSPACE NAMES - @001SSOP

DSN9022I DSNB1CMD '-DISPLAY BUFFERPOOL' NORMAL COMPLETION






We can see by reviewing these results that BP0 has been assigned 2,000 pages, all of which have been allocated. Furthermore, we see that it is backed by a hiperpool of 100,000 pages (so this is not a V8 subsystem, because hiperpools are no longer supported as of V8). The output also shows us the current settings for each of the sequential steal and deferred write thresholds.

For additional information on buffer pools you can specify the DETAIL parameter. Using DETAIL(INTERVAL) produces buffer pool usage information since the last execution of DISPLAY BUFFERPOOL. To report on buffer pool usage since the pool was activated, specify DETAIL(*). In each case, DB2 will return detailed information on buffer-pool usage such as the number of GETPAGEs, prefetch usage, and synchronous reads. The detailed data returned after executing this command can be used for rudimentary buffer pool tuning. For example, you can monitor the read efficiency of each buffer pool using the following formula:


(Total GETPAGEs) / [ (SEQUENTIAL PREFETCH) +
(DYNAMIC PREFETCH) +
(SYNCHRONOUS READ)
]

A higher read efficiency value is better than a lower one because it indicates that pages, once read into the buffer pool, are used more frequently. Additionally, if buffer pool I/O is consistently high, you might consider adding pages to the buffer pool to handle more data.
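
As a hypothetical worked example, suppose an interval shows 1,000,000 GETPAGEs, 40,000 sequential prefetch requests, 5,000 dynamic prefetch requests, and 5,000 synchronous reads. The read efficiency would be:

1,000,000 / (40,000 + 5,000 + 5,000) = 20

In other words, each page brought into the pool was touched 20 times, on average, before being stolen.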

Finally, you can gather even more information about your buffer pools using the LIST and LSTATS parameters. The LIST parameter lists the open table spaces and indexes within the specified buffer pools; the LSTATS parameter lists statistics for the table spaces and indexes reported by LIST. Statistical information is reset each time DISPLAY with LSTATS is issued, so the statistics are as of the last time LSTATS was issued.
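
For example, commands such as the following (a sketch; verify the exact syntax for your DB2 version) request detailed statistics for BP0 and then list the open objects in that pool along with their statistics:

-DISPLAY BUFFERPOOL(BP0) DETAIL(*)
-DISPLAY BUFFERPOOL(BP0) LIST(*) LSTATS(*)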

Utility Execution Information

If you are charged with running (IBM) DB2 utilities, another useful command is DISPLAY UTILITY. Issuing a DISPLAY UTILITY command will cause DB2 to display the status of all active, stopped, or terminating utilities.

So, if you are in over the weekend running REORGs, issuing an occasional DISPLAY UTILITY allows you to keep up-to-date on the status of the job. By monitoring the current phase of the utility and matching this information with the utility phase information, you can determine the relative progress of the utility as it processes.
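
For example, the following command reports on every utility currently known to the subsystem:

-DISPLAY UTILITY(*)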

For the IBM COPY, REORG, and RUNSTATS utilities, the DISPLAY UTILITY command also can be used to monitor the progress of particular phases. The COUNT specified for each phase lists the number of pages that have been loaded, unloaded, copied, or read.

You also can check the progress of the CHECK, LOAD, RECOVER, and MERGE utilities using DISPLAY UTILITY. The number of rows, index entries, or pages that have been processed is displayed by this command.

Log Information

You can use the DISPLAY LOG command to display information about the number of logs, their current capacity, and the setting of the LOGLOAD parameter. This information pertains to the active logs. DISPLAY ARCHIVE will show information about your archive logs.
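
Both are simple commands to issue, for example:

-DISPLAY LOG
-DISPLAY ARCHIVE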

Stored Procedure and UDF Information

If your organization uses stored procedures and/or user-defined functions (UDFs), the DISPLAY command once again comes in handy. You can use the DISPLAY PROCEDURE command to monitor stored procedure statistics. This command will return the following information:
  • Whether the named procedure is currently started or stopped
  • How many requests are currently executing
  • The high-water mark for concurrently running requests
  • How many requests are currently queued
  • How many times a request has timed out
  • The WLM environment in which the stored procedure executes

For UDFs, you can use the DISPLAY FUNCTION SPECIFIC command to monitor UDF statistics. This command displays one output line for each function that a DB2 application has accessed. It shows:

  • Whether the named function is currently started or stopped, and why
  • How many requests are currently executing
  • The high-water mark for concurrently running requests
  • How many requests are currently queued
  • How many times a request has timed out
  • The WLM environment in which the function executes
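
For example, the following commands use wildcards to report on all stored procedures and all functions in every schema:

-DISPLAY PROCEDURE(*.*)
-DISPLAY FUNCTION SPECIFIC(*.*)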

When displaying information about stored procedures and UDFs using the DISPLAY PROCEDURE and DISPLAY FUNCTION SPECIFIC commands, a status is returned indicating the state of the procedure or UDF. A procedure or UDF can be in one of four potential states:

  1. STARTED - requests for the function can be processed
  2. STOPQUE - requests are queued
  3. STOPREJ - requests are rejected
  4. STOPABN - requests are rejected because of abnormal termination

Other Useful Information

There is a wealth of additional information that the DISPLAY command can uncover.
  • For distributed environments, use DISPLAY DDF to show DDF configuration and status information, as well as statistical details on distributed connections and threads; use DISPLAY LOCATION to show information about distributed threads.
  • For data sharing, you can use the DISPLAY GROUP command to display information about the data-sharing group (including the version of DB2 for each member); and DISPLAY GROUPBUFFERPOOL can be used to show information about the status of DB2 group buffer pools.
  • If you use the Resource Limit Facility, the DISPLAY RLIMIT command can be used to show the status of the RLF, including the ID of the active Resource Limit Specification Table (RLST).
  • To display active and in-doubt connections to DB2 for a specified connection or all connections, use the DISPLAY THREAD command.
  • And finally, the DISPLAY TRACE command can be used to list your active trace types and classes along with the specified destinations for each.
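
Here is a quick sketch of what invoking a few of these commands might look like:

-DISPLAY DDF DETAIL
-DISPLAY GROUP
-DISPLAY RLIMIT
-DISPLAY THREAD(*) TYPE(ACTIVE)
-DISPLAY TRACE(*)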

Summary

The DB2 DISPLAY command is indeed a powerful, and simple tool that can be used to gather a wide variety of details about your DB2 subsystems and databases. Every DBA should know how to use DISPLAY and its many options to simplify their day-to-day duties and job tasks.

Friday, February 06, 2009

A New DB2 Manual

I'm just now getting around to downloading the recently refreshed IBM DB2 9 for z/OS manuals. IBM updated almost all of the DB2 manuals in December 2008. Indeed, 19 of the 24 manuals listed have a publication date of December 2008.

But wait, I haven't seen one of these manuals before: IRLM Messages and Codes for IMS and DB2 for z/OS. If you take a look at the manual, yes, it is a first edition.

This "new" manual describes messages and codes that are issued by the IRLM (internal resource lock manager) which is used by both IMS and DB2 for z/OS. The information is not necessarily new, though, as it was previously contained in the messages and codes publications for both IMS and DB2. But now, we have a single manual.

Another thing I noticed, but I'm not sure exactly when it happened, is that Directory of Subsystem Parameters has been removed as an Appendix of the DB2 Installation Guide (dsnigk15). Now I know this Appendix was there in this manual when DB2 9 first came out (I still have the PDF)... but it was not in the previous edition (dsnigk14) of the Installation Guide either. Anyone know if it was moved somewhere else (wouldn't make much sense since it refers back to pages in the Installation Guide)? Or if there are plans afoot to make a DSNZPARM manual (I've been requesting and wishing for that for years).

Thursday, February 05, 2009

DB2 Performance Monitoring Overview

In today's blog entry we will discuss the basics of monitoring and DB2 performance monitors.


The most common way to provide online DB2 performance monitoring capabilities is by online access to DB2 trace information in the MONITOR trace class. You generally specify OPX or OPn for the destination of the MONITOR trace. This way, you can place the trace records into a buffer that can be read using the IFI (Instrumentation Facility Interface).
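
For example, a monitor might start such a trace with a command along these lines (the class and destination shown here are only an illustration; your monitor will choose its own):

-START TRACE(MONITOR) CLASS(1) DEST(OPX)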


Some online DB2 performance monitors also provide direct access to DB2 performance data by reading the control blocks of the DB2 and application address spaces. This type of monitoring provides a "window" to up-to-the-minute performance statistics while DB2 is running. Such products can deliver in-depth performance monitoring without the excessive overhead of traces. Of course, they typically use a non-standard API into DB2, which could conceivably cause trouble.


Most online DB2 performance monitors provide a menu-driven interface accessible from TSO or VTAM. This interface enables the monitor to start and stop traces as needed based on the menu options chosen by the user. Consequently, you can reduce overhead and diminish the learning curve involved in understanding DB2 traces and their correspondence to performance reports.


Following are some typical uses of online performance monitors. Many online performance monitors can establish effective exception-based monitoring. When specified performance thresholds are reached, triggers can offer notification and take action. For example, you could set a ‘trigger’ that fires when the number of lock suspensions for TXN2 reaches a predefined threshold; when the ‘trigger’ is activated, a message is sent to the console and a batch report is generated to provide accounting detail information for the plan. You can set any number of ‘triggers’ for many thresholds.


Following are suggestions for setting thresholds:


  • When a buffer pool threshold is reached (PREFETCH DISABLED, DEFERRED WRITE THRESHOLD, or DM CRITICAL THRESHOLD).

  • For critical transactions, when predefined performance objectives are not met. For example, if TXN1 requires subsecond response time, set a trigger to notify a DBA when the transaction receives a class 1 accounting elapsed time exceeding 1 second by some percentage (10 percent, or even 25 percent, for example).

  • Many types of thresholds can be established. Most online monitors support this capability. As such, you can customize the thresholds for the needs of your DB2 environment.


Online performance monitors can produce real-time EXPLAINs for long-running SQL statements. If an SQL statement is taking a significant amount of time to process, an analyst can display the SQL statement as it executes and dynamically issue an EXPLAIN for the statement. Even as the statement executes, an understanding of why it is taking so long to run can be achieved. This can be particularly useful for dynamic SQL because it is not pre-bound, and therefore you won’t have any access path information for it in advance.


Online performance monitors can also reduce the burden of monitoring more than one DB2 subsystem. Multiple DB2 subsystems can be tied to a single online performance monitor to enable monitoring of distributed capabilities, multiple production DB2s, or test and production DB2 subsystems, all from a single session.


Most online performance monitors provide historical trending. These monitors track performance statistics and store them in DB2 tables or in VSAM files with a timestamp. They also provide the capability to query these stores of performance data to assist in the following:


  • Analyzing recent history. Most SQL statements execute quickly, which makes it difficult to capture and display information about a statement as it executes. However, you might not want to wait until the SMF data is available to run a batch report. Quick access to recent past-performance data in these external data stores provides a type of online monitoring that is as close to real time as is usually needed.

  • Determining performance trends, such as a transaction steadily increasing in its CPU consumption or elapsed time.

  • Performing capacity planning based on a snapshot of the recent performance of DB2 applications.


Some monitors also run when DB2 is down to provide access to the historical data accumulated by the monitor.


A final benefit of online DB2 performance monitors is their capability to interface with other z/OS monitors, for example IMS, CICS, MVS, and WebSphere monitors. This way, you can obtain a view of the entire spectrum of system performance.

Monday, February 02, 2009

Congratulations Pittsburgh Steelers!

Today my blog entry will veer away from technology briefly to congratulate the Pittsburgh Steelers on winning a record sixth Super Bowl title. I was born and raised in Pittsburgh and even though I live in Texas now, I'm still a die-hard Steelers fan.

Kudos to the Arizona Cardinals on putting up a great fight... and making the game too close for comfort there at the end!

I'll get back to our regularly scheduled DB2 programming in my next post... promise!

Friday, January 30, 2009

Hey DBAs! Recoverability Trumps Performance

Many DBAs reading this blog will probably think I'm wrong, at least initially. They'll claim that managing performance is the most important thing they do, but they are confusing frequency with importance. Yes, DBAs confront performance issues more often than they build backup plans – and they better be managing performance more frequently than they are actually recovering their databases or their company has big problems!

So why do I say that recoverability is at the pinnacle of the DBA task list? Well, if you cannot recover your databases after a problem then it won’t matter how fast you can access them, will it? Anybody can deliver fast access to the wrong information (or worse yet, no information at all). It is the job of the DBA to keep the information in their company’s databases accurate, secure, and accessible.

So what do we need to do to assure the integrity of our database data? First we need to understand the availability needs of our data in terms of the business. In the event of a failure how rapidly must we be able to recover from that failure? Keep in mind that the failure could be either physical, such as a failed disk drive, or logical, such as applying the wrong input to a process which corrupts the database.

Only after we know the impact to the business can we develop an appropriate backup and recovery plan. We need service level agreements (SLAs) for recovery just like we have SLAs for performance. The recovery SLA needs to be phrased as a recovery time objective (RTO) from an application perspective; for example, “The amount of time to restore application availability after a failure of the order entry system cannot exceed 2 hours (or 10 minutes, or whatever is appropriate for your business).”

To create effective RTOs you will need to be able to answer the question “What is the cost of not having this data available?” When we know the expectations of the business we can work to create a backup and recovery plan that matches the requirements. There are multiple techniques and methods for backing up and recovering databases. Some techniques, while more costly, can enhance availability by recovering data more rapidly.

It is imperative that the DBA team creates an appropriate recovery strategy for each database object. This requires mapping database objects to applications so we can adopt the proper strategy in accordance with the application recovery SLA. Some database objects will participate in multiple applications, and their recovery strategy will therefore be more complex.

Not all data is created equal. Some of your databases and tables contain data that is necessary for the core of your business. Other database objects contain data that is less critical or easily derived from other sources. Armed with this information, DBAs can develop RTOs such that the recovery plan matches the needs of the business.

Establishing a reasonable backup schedule requires you to balance two competing demands: the need to take image copy backups frequently enough to assure reasonable recovery time, and the need to take image copies infrequently enough so as not to interrupt daily business. All the while, keep in mind that if you make fewer image copies you will need to apply more log records during the recovery, and the recovery will take longer. The DBA must balance these competing objectives based on RTOs, usage criteria, and the capabilities of the DBMS.

When was the last time you re-evaluated and tested your backup and recovery plans? Oh, you may have looked at disaster plans, but have you examined your ability to recover locally? Do you know how long it would take to recover your most important primary customer tables, for example, if you took a hit in the middle of the day?

Regular recoverability health checking should be a standard, documented responsibility for the DBA staff; and if you can acquire software to automate the health-check process, all the better.

Wednesday, January 21, 2009

Vote for DB2

I ran across this poll on the web asking about your favorite DBMS, so I thought I'd write a brief blog post about it to boost DB2's standing.

If you get a chance, click on over and vote for DB2!

Tuesday, January 20, 2009

Looking for Education? Try an Online Tutorial or Two.

In today's difficult economic climate it can be hard to get the training you need to ensure optimal job performance. Training budgets are notoriously the first thing that gets slashed when earnings and margins dip. And even if you have a training budget, it can be difficult to get time out of the office.

But as DB2 DBAs, programmers, analysts, and other data professionals, we all need to keep our skills sharp. With that in mind, make sure that you keep up with IBM's developerWorks web site. This site contains a vast arsenal of information and training opportunities to keep you up-to-date on what is going on with IBM's offerings.

For the DB2 professional, keep an eye on the Information Management tutorials offered. IBM's tutorials provide a step-by-step guide written by experts to help you grow your skills on new technologies and IBM products. The site offers over 1,500 tutorials and they have added at least 300 new tutorials each year. If you click on the link above in this paragraph there are over 450 tutorials related to IBM's information management offerings (DB2, Informix, etc.)

So maybe you cannot get off-site for additional training, but there is really no excuse for not getting some training this year. Especially when IBM's developerWorks puts it all at your fingertips, just a couple of clicks away...

Tuesday, January 13, 2009

Counting Down the DB2 Performance Top 40

The title of this blog posting is the title of one of my IDUG NA presentations this year. I'm blogging about it (briefly) today to solicit input and comments. I have my own ideas about the things I'll be covering in this presentation, but if you've got your own favorite performance "thing" that you think should be covered in a Top 40 presentation like this, please share it as a comment here on the blog.

Keep in mind that the presentation is a DB2 for z/OS presentation, so I won't be covering LUW or iSeries stuff.

Monday, January 05, 2009

VOLATILE: A Useful Little Keyword

Just a short blog entry today to remind everyone about the VOLATILE keyword. This keyword was added in DB2 Version 8 and it can be specified on a table using CREATE TABLE and/or ALTER TABLE statements.

OK, so what will VOLATILE do? Basically, this keyword is used to indicate that the volume of data in the table is volatile and is likely to fluctuate. One common scenario where VOLATILE will be helpful is for tables that are emptied nightly and then repopulated the next day, such as an input queue.

When you specify the VOLATILE keyword on a table, BIND will favor using indexed access paths, even if the table was empty when RUNSTATS was run.
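
Here is a rough sketch of both forms of the syntax; the table, column, database, and table space names are made up for illustration, so verify the details against your DB2 version:

CREATE TABLE INPUT_QUEUE
  (MSG_ID    INTEGER       NOT NULL,
   MSG_TEXT  VARCHAR(200)  NOT NULL)
  IN DBNAME.TSNAME
  VOLATILE

ALTER TABLE INPUT_QUEUE VOLATILE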

ERP environments (e.g. SAP, Peoplesoft) with thousands of tables typically have some tables that meet these criteria. Even worse, it is not uncommon for DBAs to have no idea of the actual content or use for many of those thousands of tables generated by the ERP installation. Some are not used based on which modules of the ERP system you implement, but the tables get created anyway. Many DBAs simply maintain all of the tables provided with the ERP system, whether they are used or not, including running image copies and gathering RUNSTATS for them... and many are empty tables.

Collecting statistics on an empty table populates the catalog with stats indicating that the table contains no data. And, of course, when access paths are generated using those statistics, DB2 will probably favor a scan because the table is small (how much smaller can you get than empty?). But some of those tables are volatile, going from empty to perhaps hundreds of thousands of rows during processing.

Of course, if the table is actually empty (or contains only a small amount of data), and VOLATILE is specified, DB2 will use an index if one exists, which can degrade performance a bit. But that is a smaller price to pay than scanning thousands of rows, isn't it?

So the answer is to use the VOLATILE keyword for these type of tables... your users will be glad you did.

Friday, January 02, 2009

Recovery AssuranceExpert for DB2 z/OS: Automating the IT risk management of business availability

Business availability is more than just having a reliable hardware and database platform in place. Even the best high availability environment cannot safeguard itself from logical errors. Since most companies cannot afford downtime, it is important that the enterprise data on which they depend is always available.

Well-planned recovery procedures should be able to assure a complete recovery of enterprise-critical data within a pre-defined time window that provides for minimum disruption of the business. However, within complex environments, it is nearly impossible to perform recovery tests without disrupting the production system. Therefore, even the best-planned recovery scenarios fail because of operational risks resulting from unforeseen and typically immeasurable vulnerabilities.

If you are interested in minimizing the risk associated with DB2 for z/OS availability and recoverability, read this white paper by Brenda Honeycutt to learn about the value of regular, periodic health checks to assure your recovery time objectives for DB2 recoverability.

Sunday, December 07, 2008

Hectic

Just a quick post to apologize for not posting here as often as I would like. I have been traveling quite a lot the past month or so and just haven't had the time to keep up with everything while I've been on the road. But don't give up on me... I'll be back and blogging here again after the New Year.

In the meantime, be sure to check in regularly on my other blog, Data Management Today.

And if you live in Richmond or Detroit, you can see me present at your local user group next week... Tuesday in Richmond (12/9/08) and Thursday in Detroit (12/11/08).

Friday, November 14, 2008

There is Still Time to Attend IDUG Regional Forums

Just a quick entry today to remind everyone that IDUG has started conducting regional events. The first one was held last week in San Ramon, California, and next week two events will be held: in Camp Hill, Pennsylvania, and Kansas City, Missouri (actually Lenexa, KS). The forums offer you an opportunity to obtain DB2 education rather quickly and inexpensively.

Each Forum offers 2 days of education with 2 tracks: one covering DB2 for z/OS and another covering DB2 for LUW. IDUG is offering full two day registrations for $425 and single day registrations for only $225.

Here are the scheduled dates:

* Camp Hill, PA - November 17 and 18
* Kansas City, MO - November 19 and 20

Check out the links above for the full list of sessions in your area.

I'll be delivering my presentation titled "DB2 9: For Developer's Only" at both of these forums. And there will be many other great speakers there, too!

Friday, November 07, 2008

More on DB2 Date and Time Data: Arithmetic Expressions

DB2 allows you to add and subtract DATE, TIME, and TIMESTAMP columns. In addition, you can add date and time durations to, or subtract them from, date and time columns. But use date and time arithmetic with care. If you do not understand the capabilities and features of date and time arithmetic, you will likely encounter some problems implementing it.

Keep the following rules in mind.

When you issue date arithmetic statements using durations, do not try to establish a common conversion factor between durations of different types. For example, the following two date arithmetic statements are not equivalent:

  1997/04/03 - 1 MONTH
  1997/04/03 - 30 DAYS

April has 30 days, so the natural assumption would be that subtracting 30 days is the same as subtracting one month. However, the result of the first statement is 1997/03/03, while the result of the second statement is 1997/03/04. In general, use like durations (for example, use months or use days, but not both) when you issue date arithmetic.

Another consideration: if one operand is a date, the other operand must be a date or a date duration. If one operand is a time, the other operand must be a time or a time duration. In other words, you cannot mix date and time operands in the same arithmetic expression.

If one operand is a timestamp, the other operand can be a time, a date, a time duration, or a date duration. The second operand cannot be a timestamp. You can mix date and time durations with timestamp data types.

Now, what exactly is in that field returned as the result of a date or time calculation? Simply stated, it is a duration. There are three types of durations: date durations, time durations, and labeled durations.

Date durations are expressed as a DECIMAL(8,0) number. The result of subtracting one DATE value from another is a date duration. To be properly interpreted, the number must have the format yyyymmdd, where yyyy represents the number of years, mm the number of months, and dd the number of days.
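
For example, subtracting one date from another returns such a duration (a quick sketch using SYSIBM.SYSDUMMY1):

SELECT DATE('2009-03-18') - DATE('2008-11-07')
FROM   SYSIBM.SYSDUMMY1

The result is 411, a DECIMAL(8,0) value interpreted as 0 years, 4 months, and 11 days.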

Time durations are expressed as a DECIMAL(6,0) number. To be properly interpreted, the number must have the format hhmmss, where hh represents the number of hours, mm the number of minutes, and ss the number of seconds. The result of subtracting one TIME value from another is a time duration.

Labeled durations represent a specific unit of time as expressed by a number followed by one of the seven duration keywords: YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS, or MICROSECONDS. A labeled duration can only be used as an operand of an arithmetic operator, and the other operand must have a data type of DATE, TIME, or TIMESTAMP. For example:

CURRENT DATE + 3 YEARS + 6 MONTHS 

This will add three and a half years to the current date.
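
Labeled durations are also handy in predicates. For example, this hypothetical query (TXN_TABLE and TXN_DATE are made-up names) counts rows stamped within the last 30 days:

SELECT COUNT(*)
FROM   TXN_TABLE
WHERE  TXN_DATE > CURRENT DATE - 30 DAYS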

Thursday, November 06, 2008

Data Driven: A Great New Book on Data Quality Issues

If you are at all involved in assuring the quality of your company’s data you need to know the work of Thomas C. Redman. Dr. Redman has been working on improving data quality for years and he has written numerous articles and books on the subject. His latest book, Data Driven: Profiting From Your Most Important Business Asset is another winner.

Redman offers the basic thesis of the book right there on page one, where he states “…bad data lie at the root of issues of international importance, including the current subprime mortgage meltdown, lost and stolen identities, hospital errors and contested elections.” After laying down the problem, the rest of the book tells us what we need to do to correct the problems.

Data Driven will help you to improve the methods you deploy for the care and feeding of your data and information; in other words, helping you to control and manage data using similar processes and controls that you deploy on your other assets (finances, people, structures, etc.) – a noble goal, indeed!

The writing is concise and snappy – you won’t get bored reading this book. The style is engaging and it is easy to read. For example, instead of just saying what to do and how to do it, which can be boring, Redman discusses many of the arguments people use to say that data quality is impossible, and then debunks them showing that data quality is possible, if approached properly and thoroughly.

There are many good ideas, charts, and graphs in Data Driven, too. One of my favorites is on page 54, where you can find a chart of the ten habits followed by those with the best data. If you buy this book, make a poster-sized photocopy of that page and hang it up on the wall of the break room and in the data folks’ cubicles. Maybe the habits will rub off on everyone as they gaze upon them everywhere.

But the best little gem in this wonderful book is the entirety of the last chapter, which is titled “The Next One Hundred Days.” In this chapter Dr. Redman offers what he calls a hundred-day panorama. It is not a grand plan because most will not have the depth of understanding required to create such a plan and have it succeed. Instead, the panorama strives for breadth, not depth, with a focus on quality. Diligent readers can follow the guidance in this chapter and thereby begin the long-term process of appreciating the importance of data quality on their business practices.

And that alone is worth the price of the book… but, of course, Data Driven offers much more and I recommend it to every IT and business professional whose job relies on accurate data.

Monday, November 03, 2008

On Date Formats, Part 2

Here is a follow-up question and answer based on my previous blog post:


Q: My format does not fit into any of the formats listed in the DB2 manuals. What if I have a DATE stored like YYYYMMDD (with no dashes or slashes) and I want to compare it to a DB2 date?


A: Okay, let's look at one potential solution to your problem (and then I want to briefly talk about the use of proper data types). First of all you indicate that your date column contains dates in the following format: yyyymmdd with no dashes or slashes. You do not indicate whether this field is a numeric or character field - I will assume that it is character. If it is not, you can use the CHAR function to convert it to a character string.


Then, you can use the SUBSTR function to break the character column apart into the separate components, for example SUBSTR(column,1,4) returns the year component, SUBSTR(column,5,2) returns the month, and SUBSTR(column,7,2) returns the day.


Then you can concatenate all of these together into a format that DB2 recognizes, for example, the USA format, which is MM/DD/YYYY. This can be done as follows:


SUBSTR(column,5,2) || '/' || SUBSTR(column,7,2) ||
'/' || SUBSTR(column,1,4)


Then you can use the DATE function to convert this character string into a DATE that DB2 will recognize. This is done as follows:


DATE(SUBSTR(column,5,2) || '/' || SUBSTR(column,7,2) ||
'/' || SUBSTR(column,1,4))


The result of this can be used in date arithmetic with other dates or date durations. Of course, it may not perform extremely well, but it should return the results you desire.


Now, a quick word about using proper data types. I say this all of the time, but there are many applications and implementations "out there" that do not heed the advice: it is wise to use the DATE data type when you store dates in DB2 tables. It simplifies life later on when you want to do things like formatting dates and performing date arithmetic.


Using the appropriate data type also ensures that DB2 will perform the proper integrity checks on the columns when data is entered, instead of requiring application logic to ensure that valid dates are entered.

Wednesday, October 29, 2008

On Date Formats

Regular readers of my blog know that from time to time I use the blog as a forum to answer questions I get via e-mail. Today, we address a popular theme - dealing with DB2 date data...


Q: I have a DATE column in a DB2 table, but I do not want it to display the way DB2 displays it by default. How can I retrieve a date from a column in a DB2 table in the format MM/DD/YYYY?


A: The simplest way to return a date in the format you desire is to use the built-in scalar function CHAR. Using this function you can convert a date column into any number of formats. The specific format you request, MM/DD/YYYY, is the USA date format. So, for example, to return the date in the format you requested for a column named START_DATE you would code the function as follows:


CHAR(START_DATE,USA)

The first argument is the column name and the second argument is the format. Consult the following table for a list of the date formats that are supported by DB2.

Name     Layout                   Example
ISO      yyyy-mm-dd               2002-10-22
USA      mm/dd/yyyy               10/22/2002
EUR      dd.mm.yyyy               22.10.2002
JIS      yyyy-mm-dd               2002-10-22
LOCAL    Locally defined layout   N/A


You may also have an installation-defined date format that would be named LOCAL. For LOCAL, the date exit for ASCII data is DSNXVDTA, the date exit for EBCDIC is DSNXVDTX, and the date exit for Unicode is DSNXVDTU.
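
For example, a single query can return the same date column in several of these formats; the table and column names here are hypothetical:

SELECT CHAR(START_DATE, ISO),
       CHAR(START_DATE, USA),
       CHAR(START_DATE, EUR)
  FROM PROJECT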

Of course, this is a simple date question... I will follow-up with some additional date-related questions and answers in my next couple of blog posts.

Wednesday, October 22, 2008

Bad Standards

Just started a new series on bad standards over on my Data Management Today blog.

Check it out when you get a chance and share your favorite "bad standards" either here or there... or by e-mailing me.

Monday, October 20, 2008

DBA Rules of Thumb

Database administration is a very technical discipline, but it is also a discipline in which the practitioner is very visible politically within the organization. As such, DBAs should be armed with the proper attitude and knowledge before attempting to practice the discipline of database administration.

Just as important as technical acumen, though, is the ability to carry oneself properly and to embrace the job appropriately. With this in mind, I wrote a series of blog entries on DBA Rules of Thumb over at my Data Management Today blog... and I thought the information I wrote there may be helpful to my DB2 and mainframe readership here, so I'm sharing the eight rules of thumb (with links) here on my DB2 Portal blog:
  1. Document Everything!
  2. Automate Intelligently
  3. Share
  4. Don't Panic!
  5. Focus Your Efforts
  6. Invest In Yourself
  7. Diversify
  8. Develop Business Acumen
What do you think? Did I miss anything important?

P.S. Just a reminder that I will be presenting a webinar on assuring DB2 recoverability with my colleague, Michael Figaro, this Thursday, October 23, 2008 at 10:30 Central time. If you are at all interested in the topic, be sure to register today - and attend this Thursday!

Thursday, October 09, 2008

Assuring the Recoverability of Your DB2 Databases

Availability requires much more than just having a reliable hardware and database platform. Most companies cannot afford significant downtime, and some cannot afford any! As such, it is crucial for unplanned outages to be as short as possible. But it is not just a business requirement; in many cases assuring a speedy recovery is also a legal mandate. Regulations such as SOX and Basel II dictate that any outage be resolved within a predefined period of time.

But how many of us can answer, with any degree of certainty, the question “How long will this outage last?” There are many variables that need to be considered when estimating a DB2 recovery time: backups available, quality, point-in-time requirements, amount of log processing, disk speed, tape mounts, and on and on and on...

With these thoughts in mind, Michael Figaro and I will be delivering a webinar titled Assuring the Recoverability of Your DB2 Databases, on Thursday, October 23, 2008 at 10:30 am CDT.

We will tackle issues ranging from regulations, IT complexity, and business continuity, to DSNZPARMs and backup/recovery planning. We’ll also make the case that planning for database recoverability is the most important task among the many tasks of the DBA.

As part of the webinar we will introduce and demonstrate Recovery AssuranceExpert, a new technology to help you ensure that all of your critical DB2 objects are recoverable within your recovery time objectives. Recovery AssuranceExpert is an automated solution to perform daily health checks of data availability and recoverability, as well as provide actual recovery times required for a DB2 object, a complete application, or even a whole DB2 subsystem. Join us on October 23rd to find out how you can ensure that your actual recoverability times fit into your SLAs.

Tuesday, September 30, 2008

A Perfect Storm?

There is something of a perfect storm brewing in the world of data today. The world is becoming more automated, more connected, more wireless, and more complex. The next wave of database administration is intelligent automation. I refer to this as implementing software scrubbing bubbles that “work hard, so you don’t have to.” (Remember that commercial!)

As more of the tasks required of DBAs become more automated, the DBA will be freed to expand into other areas. So one front on this storm is the autonomic computing initiatives that automate DBA tasks. At the same time, IT professionals are being asked to know more about the business instead of just knowing the technology. So DBAs need to understand the business purpose and definition of the data they manage, as well as the technological underpinnings of the DBMS. The driving force here is predominantly regulatory compliance. This second front of the perfect storm will cause DBAs to work more closely with metadata to drive database archiving, data auditing, and security to ensure their organization complies with regulations like Sarbanes-Oxley, HIPAA, and others.

Regarding the wireless aspect of things, pervasive devices (PDA, handhelds, cell phones, etc.) will increasingly interact with database systems. DBAs will need to get involved there to ensure successful data synchronization. And database systems will work with disconnected data seamlessly, such as data generated by RFID tags.

Yet another big database trend is technology "suck." By that I mean the way the DBMS sucks up technologies and functions that previously required you to purchase separate software. Remember when the DBMS had no ETL or OLAP functionality? Those days are gone. This will continue as the DBMS adds capabilities to tackle more and more IT tasks.

Another trend impacting DBAs will be a change in some of their roles as more and more of the recent DBMS features actually start being used in more production systems.

The net result of this perfect storm of changes is that data professionals are absolutely being required to do more... sometimes with less (less time, less money, less staff, etc.)

If you know the technology but are then required to know the business, this is doing more – much more. But the technology, in many cases, is also expanding. For example, DB2 9 incorporates native XML. Most DBAs are not XML savvy, but increasingly they will have to learn more about XML as the DBMS technology expands. And this is just one example.

Additionally, data is growing at an ever-increasing rate. Every year the amount of data under management increases (some analysts peg the compound annual rate of data growth at 125%) and in many cases the number of DBAs to manage that growing data is not increasing, and indeed, could be decreasing.

And, budgetary limitations can cause DBAs to have to do more work, on more data, with fewer resources. When a company reduces budget but demands more work, automation is an absolute necessity. Turning work over to the computer can help (although it is unlikely to solve every administrative issue). Sometimes IT professionals fight against the very thing they excel in – that is, automating work. If you think about it, every computer program is written to automate someone’s work – the writer (word processing), the accountant (financials, payroll, spreadsheets), and so on. This automation did not put the people whose work was automated out of a job; instead it made them more efficient. Yet, for some reason, there is a notion in the IT industry that automating IT tasks will eliminate jobs. You cannot automate a DBA out of existence – but you can make that DBA’s job more effective and efficient with DBA tools and autonomic computing.

And the sad truth of the matter is that there is still a LOT more than can, and should, be done in most companies. We can start with better automation of DBA tasks, but we shouldn't stop there!

Corporate governance is hot – that is, technologies to help companies comply with governmental regulations. Software to enable archiving for long-term data retention, auditing to determine who did what to which piece of data, and security to better protect data are all hot data technologies right now. But database security needs to improve, and technologies for securing and auditing data need to be implemented more pervasively.

Metadata is increasing in importance. As data professionals really begin to meld together technology and business, they find that metadata is imperative. But most organizations do not have a fully populated, up-to-date metadata repository that acts as a lexicon for business data.

And finally, something that isn’t nearly hot enough is data quality and integrity. Tools, processes, and database options that can make data more accurate and reliable are rarely implemented appropriately. So the data stored in our corporate databases is suspect. According to Thomas C. Redman, data quality guru, poor data quality costs the typical company at least ten percent (10%) of revenue. That is a significant cost! Data quality is generally bad in most organizations – and more needs to be done to address that problem.

With all of these thoughts in mind, are you prepared to weather this perfect storm?

Tuesday, September 23, 2008

Who Did What to Which Data When... and How?

As the list of government regulations impacting IT grows, organizations must adapt to understand and comply with new rules. This increasing compliance pressure is particularly intense on data stored in corporate databases. As such, organizations need to be ever more vigilant in the techniques they use to protect their data and monitor access.

Database auditing, sometimes called data access auditing, is one technique growing in popularity as a response to the demands of regulatory compliance. At a high level, database auditing is basically a facility to track the use of database resources and authority. It can be used to help answer questions like “Who accessed or changed data?” and “What was actually changed?” and “When did it change?”
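
In DB2 for z/OS terms, a rough sketch of the native facilities involved might look like the following (the table name is hypothetical, and the audit trace classes shown are illustrative, so check the documentation for the classes appropriate to your needs):

-- Flag the table so DB2 records access to it in the audit trace
ALTER TABLE PAYROLL.EMP_SALARY AUDIT ALL;

-- Then start an audit trace (a DB2 command, not SQL) to capture that activity
-START TRACE(AUDIT) CLASS(4,5) DEST(SMF)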

But how you implement your database auditing, especially in a mainframe environment, will have a significant impact not just on the completeness of what you capture in the audit trail, but also on the performance and availability of your entire environment.

Join me on Wednesday, September 24, 2008 at 10:30 am, Central Daylight Time, for a free webinar where I will discuss the issues and requirements driving database auditing. This presentation can help to serve as a roadmap of sorts for your data access auditing needs.

Friday, September 05, 2008

The Most Important Thing is Recoverability

I know that many readers will question the title of this blog posting. But it is true. Oh, many DBAs think that managing performance is the most important thing they do, but they are confusing frequency with importance. Yes, many are managing performance more often than building backup plans – and they better be managing performance more frequently than they are actually recovering their databases or their company has big problems!

Anyway, why do I place recoverability at the very top of the DBA task list? Well, if you cannot recover your databases after a problem then it won’t matter how fast you can access them, will it? Anybody can deliver fast access to the wrong information. It is the job of the DBA to keep the information in our company’s databases accurate, secure, and accessible.

So what do we need to do to assure the integrity of our database data? First we need to understand the availability needs of our data in terms of the business. In the event of a failure how rapidly must we be able to recover from that failure? Keep in mind that the failure could be either physical, such as a failed disk drive, or logical, such as applying the wrong input to a process which corrupts the database.
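
To make the physical-versus-logical distinction concrete, here is a minimal sketch in DB2 RECOVER utility terms (the object name and log point are hypothetical):

-- Recover to current after a physical failure: restore the image copy, then apply all log
RECOVER TABLESPACE PAYDB.PAYTS

-- Recover to a prior point in time to back out a logical (application) error
RECOVER TABLESPACE PAYDB.PAYTS TOLOGPOINT X'00C1A2B3D4E5'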

Only after we know the impact to the business can we develop an appropriate backup and recovery plan. We need service level agreements (SLAs) for recovery just like we have SLAs for performance. The recovery SLA, or Recovery Time Objective (RTO), needs to be stated from an application perspective, such as “Time to restore application availability after a failure for application X cannot exceed 2 hours (or 10 minutes or …)”

To create effective RTOs you must be able to answer the question “What is the cost of not having this data available?” When we know the expectations of the business we can work to create a backup and recovery plan that matches the requirements. There are multiple techniques and methods for backing up and recovering databases. Some techniques, while more costly, can enhance availability by recovering data more rapidly.

It is imperative that the DBA team creates an appropriate recovery strategy for each database object. This requires mapping database objects to applications so we can adopt the proper strategy in accordance with RTOs. Some database objects will participate in multiple applications, and their recovery strategy will therefore be more complex.

Not all data is created equal. Some of your databases and tables contain data that is necessary for the core of your business. Other database objects contain data that is less critical or easily derived from other sources. Armed with this information -- and our RTOs -- a DBA can create a recovery plan that matches the needs of the business.

Establishing a reasonable backup schedule requires you to balance two competing demands: the need to take image copy backups frequently to assure reasonable recovery time, and the need to take image copies infrequently so as not to interrupt daily business. All the while, keep in mind that if you make fewer image copies you will need to apply more log records during the recovery, and the recovery will take longer. The DBA must balance these competing objectives based on SLAs, usage criteria, and the capabilities of the DBMS.
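
To make that trade-off concrete, here is a minimal sketch of DB2 COPY utility control statements (database and table space names are hypothetical); a periodic full copy supplemented by more frequent incremental copies is one common way to balance copy overhead against recovery time:

-- Weekly full image copy; SHRLEVEL REFERENCE gives a consistent copy but blocks updaters
COPY TABLESPACE PAYDB.PAYTS COPYDDN(SYSCOPY) FULL YES SHRLEVEL REFERENCE

-- Daily incremental copy; SHRLEVEL CHANGE permits concurrent update activity
COPY TABLESPACE PAYDB.PAYTS COPYDDN(SYSCOPY) FULL NO SHRLEVEL CHANGE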

When was the last time you re-evaluated and tested your backup and recovery plans? Oh, you may have looked at disaster plans, but have you examined your ability to recover locally? Do you know how long it would take to recover your most important primary customer tables, for example, if you took a hit in the middle of the day?

Regular recoverability health checking should be a standard documented responsibility for every DBA staff; and if you can acquire software to automate the health-check process, all the better.

SEGUS offers a nice option for checking the recoverability of your DB2 databases called Recovery AssuranceExpert. Using this automated tool you can monitor the recoverability of your DB2 environment, including DB2 settings (such as DB2 logging, buffer pools, DSNZPARMs), recovery prerequisites, recovery service levels, and recovery time objectives for your database objects.

When was the last time you tested recovery? Are you sure you can recover your DB2 databases within a satisfactory timeframe? Wouldn’t you sleep better if you had a methodology and process in place for doing so? I know I would…

Thursday, September 04, 2008

Database Performance and Row Size

Recently I was reading through some posts in a database-related newsgroup or mailing list (actually, right now I can't remember which one it was). The conversation I was reading was in response to a question like "Does the number of columns or size of the row matter in terms of performance?"

Actually, the question asked what kind of a performance impact might be expected if a query was issued against two similar tables. The first table had (say) 20 columns, and the second table had the same 20 columns, as well as 35 additional columns.

Well, most of the basic responses were similar. The consensus was that as long as the query was going against the same columns then performance should be about the same. I disagree. Here is why.

You also need to factor in the I/O requests that are required to return the data. The DBMS will perform I/O at the block (or page) level - this is so whether you return one row or millions of rows. For multi-row results, accessing data from the table with the wider row (more columns) will usually be less efficient. This is so because fewer rows will exist on each page (the 20-column row is smaller than the 55-column row, so more of the smaller rows can reside in a single, pre-sized block/page). The bigger the result set, the more pronounced the performance degradation can be (because more physical I/Os are required to retrieve the data).

Think about it this way. Is it faster to pull smaller peaches out of a basket than bigger peaches? That is about the same type of question and anybody can envision the process. Say you want 100 peaches. Small peaches fit 25 per basket; big peaches fit ten per basket. To get 100 small peaches you'd need to pull 4 baskets from the shelf. To get 100 big peaches you'd need to pull 10 baskets from the shelf. The second task will clearly take more time.
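
If you want to put rough numbers on this for your own tables, the DB2 catalog can help. Here is a minimal sketch (it assumes RUNSTATS statistics are current; the schema and table names are hypothetical):

SELECT NAME,
       CARDF AS TOTAL_ROWS,
       NPAGESF AS DATA_PAGES,
       CARDF / NULLIF(NPAGESF, 0) AS ROWS_PER_PAGE
FROM SYSIBM.SYSTABLES
WHERE CREATOR = 'MYSCHEMA'
  AND NAME IN ('NARROW_TAB', 'WIDE_TAB');

The table that packs more rows per page will require fewer page I/Os to return the same number of rows.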

Of course, the exact performance difference is difficult to calculate - especially over an online forum and without knowledge of the specific DBMS being used. But there will, more than likely, be a performance effect on queries when you add columns to a table.

Wednesday, September 03, 2008

When Not to Index

Answering a question I got via e-mail on indexing...

Every now and then I take the opportunity to blog about a question I get through e-mail. This time the question was:

When does it make more sense not to build an index for a DB2 table?

I'll attempt to answer this question for any SQL DBMS, not just for DB2:

First of all, this is a very open-ended question, so I will give a high-level answer. Let's start by saying that most of the time you will want to build at least one - and probably multiple - indexes on each database table that you create. Indexes are crucial for optimizing performance of SQL access. Without an index, queries must scan every row of the table to come up with a result. And that can be very slow.

OK, that being said, here are some times when it might make sense to have no indexes defined on a table:

  • When all accesses retrieve every row of the table. Because every row will be retrieved every time you use the table, an index (if used) would just add extra I/O and would decrease, not enhance, performance. Though not extremely common, you may indeed come across such tables in your organization.

  • For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 10 or 20 pages) might not need an index because simply reading all of the pages is very efficient already.

  • When performance doesn't matter and the table is accessed only very infrequently. But, hey, when do you ever have that type of requirement in the real world?

Other than under these circumstances, you will most likely want to build one or more indexes on each table, not only to optimize performance, but also to ensure uniqueness, to support referential integrity, and perhaps to drive data clustering.
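
For the far more common case of tables that do need indexes, a minimal sketch of the DDL looks like this (schema, table, and column names are hypothetical):

-- Enforce uniqueness on the key and, optionally, drive the clustering sequence
CREATE UNIQUE INDEX MYSCHEMA.EMP_IX1
    ON MYSCHEMA.EMP (EMPNO)
    CLUSTER;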

Of course, indexes do not come without cost. Indexes take up disk space, and adding a lot of indexes can consume a significant amount of it. For some DBMS products, adding many indexes can increase the working set size and perhaps cause memory problems. Additionally, although indexes speed up queries they degrade inserts and deletes, as well as any modification to indexed columns.

What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!

Friday, August 22, 2008

Upcoming Webinar on Data Breaches and Databases

Anyone who has been paying attention lately knows at least something about the large number of data breaches that have been in the news. Data breaches and the threat of lost or stolen data will continue to plague organizations until comprehensive plans are enacted to combat them. Although many of these breaches have not been at the database level, some have, and more will be unless better data protection policies and procedures are enacted on operational databases.

If you are interested in this topic I will be conducting a free webinar titled Data Breach Protection: From a Database Perspective on Wednesday, August 27, 2008 at 10:30 am CDT. This presentation will provide an overview of the data breach problem, with examples of data breaches, their associated costs, and a series of best practices for protecting your valuable production data.

This webinar offers you the opportunity to:
  • Understand the various laws that have been enacted to combat data breaches and the trends toward increasing legislation
  • Learn how to calculate the cost of a data breach based on industry best practices and research from leading analysts
  • Gain knowledge of several best practices for managing data with the goal of protecting the data from surreptitious or nefarious access (and/or modification)
  • Learn about the available techniques for securing, encrypting, and masking data to minimize exposure of critical data
  • Uncover new data best practices for auditing access to database data and for protecting data stored for long-term retention
Hope to see you on-line next Wednesday!

Monday, July 28, 2008

Selecting Every Other Row

One of the fun things about publishing is getting questions from readers that make you think. A recent question I received went something like this: "Can I get the odd and even number of rows from a DB2 table?"

Well, my first reaction was to think "this guy doesn't understand the way a SQL DBMS like DB2 works." The data in DB2 tables is not ordered, so there is no way to guarantee that the rows are odd or even numbered. While that observation may (or may not) have been true, it didn't help the guy. So I thought about it and came up with a possible work-around solution.

The first thing we have to do is to mimic row numbers in DB2. Until V9, DB2 did not support the row number construct (such as you can find in Oracle), and we'd like this to work for the versions in support today (V8 and V9).

So, to do this we start by using the COUNT(*) function and a table expression. A table expression is when you substitute SQL in place of the table in the FROM clause of another SQL statement. For example, consider this SQL:

SELECT  DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB;

That puts a pseudo-row number on the table that we can access in our SQL predicates. If, say, we only want to return the even results, we could write the following query:

SELECT  DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB
WHERE MOD(ROWNUM,2) = 0
ORDER BY ROWNUM;

The MOD function returns the remainder when the first argument is divided by the second. So, if the remainder is zero, we have an even number, and this query returns every other row to the result set. If you want the odd rows only, change the predicate with the MOD function to this:

WHERE MOD(ROWNUM,2) <> 0

Of course, there is no guarantee that the same exact rows will be even (or odd) for subsequent executions of this query. It all depends on how DB2 optimizes the query for execution. But it does provide a nice way to produce samples of the data (perhaps to populate a test bed of data).
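
For what it's worth, once you are running DB2 9 you can let the OLAP specification do the numbering instead of the correlated COUNT(*) trick. A sketch (using the V9 sample table, with the ordering column chosen arbitrarily):

SELECT DEPTNO, ROWNUM
FROM (SELECT DEPTNO,
             ROW_NUMBER() OVER (ORDER BY DEPTNO) AS ROWNUM
      FROM DSN8910.DEPT) AS TEMP_TAB
WHERE MOD(ROWNUM, 2) = 0
ORDER BY ROWNUM;

The same caveat applies: the rows that land in the even bucket can change between executions unless the ORDER BY column orders the data uniquely and consistently.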

Thursday, July 24, 2008

Free Webinar - Database Auditing for DB2 z/OS - July 29, 2008

Protecting corporate data is a requirement of doing business in today's regulatory and security-minded business environment. Protecting corporate data -- and especially sensitive data -- is a matter of knowing who is accessing data and what they are doing with it. There have been many solutions for addressing this need on distributed databases, but no reasonable solution for protecting mainframe data until now.

Learn all about an exciting new solution for auditing your DB2 for z/OS databases and resources - Guardium for Mainframes - at this free webinar on July 29, 2008.

Guardium for Mainframes provides 100% visibility into mainframe database activities without impacting normal business operations. This webinar will show you how to get better insight into database activity without the performance penalty of typical database trace utilities and without relying on inadequate log file data.

I'll be introducing the webinar and giving a quick overview of the issues, and Bill Baker, a senior software consultant with NEON Enterprise Software, will walk through a demonstration of Guardium for Mainframes in action!


Monday, July 21, 2008

New Data Sharing RedPaper

Just a quick FYI today to let you know about a new RedPaper offering information about exploiting client load balancing and fail over capabilities across a DB2 data sharing group (or a subset of the group members).

A RedPaper is sort of like a tip, only longer... and sort of like a RedBook, only shorter... Anyway, if you are interested in the topic, the RedPaper can be downloaded for free by following this link:

DB2 9 for z/OS Data Sharing: Distributed Load Balancing and Fault Tolerant Configuration

Monday, July 07, 2008

A Video Interview on Long-term Retention

When I spoke at the Techxans event in Houston this past May (2008) I was interviewed beforehand on what my presentation would cover. And lo' and behold, the Techxans folks have put that interview up on YouTube, so I thought I'd share it here with my regular blog readers. Enjoy!

Wednesday, June 25, 2008

No Alphabetic Characters Wanted

Here is a question that was posed to me recently:


Q: We have a CHAR(10) column that cannot contain alphabetic characters. How can we make sure that the letters A thru Z are not allowed?


A: Well, think about the characteristics of alphabetic characters versus the other "things" that can be stored in a CHAR column. One thing that separates an alphabetic letter from numbers, punctuation, etc. is that there are upper and lower case versions (e.g. A, a). So, you could use the following predicate to preclude alphabetic characters from being accepted:

LOWER(:string) = UPPER(:string)

Of course, you will not be able to put this into a CHECK constraint because of restrictions on their content (for example, you cannot use functions in a CHECK constraint). But you could use this in SQL statements and as a check in your programs before allowing data to be inserted or modified in your CHAR(10) column.
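
If you want the database itself to enforce the rule, one option (since the restriction applies only to CHECK constraints) is a BEFORE trigger. Here is a rough sketch; the table, column, and SQLSTATE values are hypothetical:

CREATE TRIGGER NO_ALPHA
  NO CASCADE BEFORE INSERT ON MYSCHEMA.MYTABLE
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN (LOWER(N.CODE_COL) <> UPPER(N.CODE_COL))
  BEGIN ATOMIC
    SIGNAL SQLSTATE '75001' ('Alphabetic characters are not allowed');
  END

A similar trigger would be needed for UPDATE to cover modifications as well as inserts.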

Anyone else have any other ideas?

Monday, June 16, 2008

IBM Rules the Middleware Roost

Have you seen Gartner's latest report on the middleware market?

The Gartner middleware market numbers were reported in a recent article in eWeek. Evidently, the worldwide application infrastructure and middleware software market revenue totaled $14.1 billion in 2007, a 12.9 percent increase from 2006 revenue of $12.5 billion.

Now that is quite healthy growth in what is a somewhat slow market. And right there at the top of the pile is IBM with a 28.9 percent share of what Gartner identifies as the AIM market...BEA Systems came in second with 9.3 percent of the market, followed by Oracle with 8.5 percent. However, Oracle now owns BEA and will benefit from BEA's market share (next year).

Oracle will likely continue its acquisitive ways, but IBM has not been silent on the acquisition front lately either. So I'm guessing that next year IBM will retain its #1 position with Oracle coming in solidly at #2.

For 2007, though, in terms of growth, Microsoft and Software AG posted impressive gains. Among the big enterprise software vendors, Microsoft came in at 41.6 percent revenue growth year over year. And Software AG showed strong growth with a 107 percent increase from 2006.

This is a market segment, like database software, where a small number of big players own most of the market. However, it is not quite as monopolized as the database market where three players (IBM, Oracle, and Microsoft) dwarf the rest of the field. The top five middleware vendors hold over 50 percent of the overall market and Gartner indicates that the big players are slowly eroding market share from the smaller vendors.

Monday, June 09, 2008

On The Road Again

I will be traveling extensively in June this year (2008). Last week I traveled to Phoenix to speak to the American Express Information Summit on the topic of regulatory compliance and its impact on data management and database administration. And I also spoke at the Los Angeles Area DB2 User Group on DB2 performance tuning and database trends.

This week (the second week of June) I will be traveling to Washington, DC to speak to the Baltimore-Washington DB2 User Group (BWDUG) on June 11th to deliver "The Impact of Regulatory Compliance on Database Administration." And then, later in the week, on June 13th, I will be in Tampa to speak on the topic of database auditing to the Tampa Bay Relational User Group (TBRUG).

And the week after that I will be speaking to the Chicago chapter of DAMA on June 18th, on the topic of "Managing Data For Long Retention Periods."

So, if you are in one of the regions where I'll be speaking, I hope you can take the time to attend. And if not, you can always keep track of my speaking schedule on my web site at http://www.craigsmullins.com/speak.htm.

Wednesday, May 28, 2008

Database Archiving Trends and Best Practices

Just a short note to promote my upcoming webinar, this Friday, May 30, 2008 at 10:30 AM CST. The webinar is titled Database Archiving Trends and Best Practices and it will cover a variety of trends and issues that are contributing to the growing requirement within enterprises to archive database data for long-term retention and preservation.

I'll touch on trends such as regulatory compliance issues, e-discovery, operational performance improvement, and retiring legacy applications. After examining the forces driving the need to archive database data, we'll look at the requirements for implementing database archiving appropriately, and walk thru an example using TITAN Archive.

If your databases are bursting at the seams, your organization is experiencing compliance-related troubles and/or lawsuits, or you need to figure out how to sunset an old database application or two, this presentation will provide guidance, advice, and a workable template for you to follow.

I hope you can find the time to attend!

Monday, May 26, 2008

New IBM RedBook on Dimensional Modeling

Just a quick post today (Memorial Day in the USA) to inform you about a new IBM RedBook on dimensional modeling. If you are working with data warehousing applications, writing analytical queries, or in any way dealing with databases and dimensional models, this free RedBook is well worth downloading and reading.

It is titled Dimensional Modeling: In a Business Intelligence Environment. The book is not intended to be an academic treatise, but a practical guide for implementing dimensional models oriented specifically to business intelligence systems.

Particularly interesting are the case studies in Chapters 7 and 8 that walk you through BI implementations.

Download it today... and enjoy it at your leisure.