Tuesday, November 21, 2006

Character Versus Numeric Data Types

Most DBAs have faced the situation where one of their applications requires a four-byte code that is used to identify products, accounts, or some other business object, and all of the codes are numeric and will stay that way. But, for reporting purposes, users or developers wish the codes to print out with leading zeroes. So, the users request that the column be defined as CHAR(4) to ensure that leading zeroes are always shown. But what are the drawbacks, if any, to doing this?

Well, there are drawbacks! Without proper edit checks, INSERTs and UPDATEs could place invalid alphabetic characters into the alphanumeric column. This can be a very valid concern if ad hoc data modifications are permitted. Although it is uncommon to allow ad hoc modifications to production databases, data problems can still occur if all of the proper edit checks are not coded into every program that can modify the data. But let's assume (big assumption) that proper edit checks are coded and will never be bypassed. This removes the data integrity question.

There is another problem, though, that is related to performance and filter factors. Consider the possible number of values that a CHAR(4) column and a SMALLINT column can assume. Even if programmatic edit checks are coded for each, DB2 is not aware of these and assumes that all combinations of characters are permitted. DB2 uses base 37 math when it determines access paths for character columns, under the assumption that 26 alphabetic letters, 10 numeric digits, and a space will be used. This adds up to 37 possible characters. For a four-byte character column there are 3**74 or 1,874,161 possible values.

A SMALLINT column can range from -32,768 to 32,767 producing 65,536 possible small integer values. The drawback here is that negative codes and/or 5 digit codes could be entered. Both do not conform to the 4 digit maximum. However, if we adhere to our proper edit check assumption, the data integrity problems will be avoided here, as well.

DB2 will use the HIGH2KEY and LOW2KEY values to calculate filter factors. For character columns, the range between HIGH2KEY and LOW2KEY is larger than numeric columns because there are more total values. The filter factor will be larger for the numeric data type than for the character data type which may influence DB2 to choose a different access path. For this reason, favor the SMALLINT over the CHAR(4) definition.

The leading zeroes problem should be able to be solved using other methods. It is not necessary to store the data in the same format that users wish to display it. For example, when using QMF, you can ensure that leading zeroes are shown in reports by using the "J" edit code. Other report writes offer similar functionality. And report programs can be coded to display leading zeroes easily enough by moving the host variables to appropriate display fields.

In general, it is wise to choose a data type which is closest to the domain for the column. If the column is to store numeric data, favor choosing a numeric data type: SMALLINT, INTEGER, DECIMAL, or floating point. Same goes for temporal data (that is, choose DATE, TIME, or TIMESTAMP instead of a character or numeric data type). In addition, always be sure to code appropriate edit checks to ensure data integrity - but remember, fewer need to be coded if you choose the correct data type because DB2 automatically prohibits data that does not conform to the data type for each column.

Wednesday, November 08, 2006

DB2 Access Paths and Change Management Procedures

No one working as a DB2 DBA or performance analyst would deny that one of the most important components of assuring efficient DB2 performance is making sure that DB2 access paths are appropriate for your DB2 programs. Binding programs with EXPLAIN(YES) specified is important to ensure that we know what access paths DB2 has chosen for each SQL statement. Without the information that EXPLAIN puts in the PLAN_TABLE we would be "flying blind."

Anyway, programs need to be rebound periodically to ensure that DB2 has forumlated access paths based on up-to-date statistics and to ensure that you are taking advantage of all the latest and greatest DB2 optimizer features. Whether you are implementing changes into your DB2 applications, upgrading to a new version of DB2, or simply trying to achieve optimum performance for existing application plan and packages, an exhaustive and thorough REBIND management process is a necessity.

But not every organization does what is necessary to keep access paths up-to-date with the current state of their data. Oh, there are always reasons given as to why the acknowledged “best practice” of REORG/RUNSTATS/REBIND is not followed religiously. But these reasons are not always reasonable - especially when there are approaches to overcome them.

But let's approach this subject from another perspective: that is, from a change management procedures perspective. On the mainframe, change has traditionally been strictly controlled. But one exception has been DB2 access paths. In a mainframe shop everything we do is tightly controlled. If we make even a minor change to an application program, that program is thoroughly tested before it ever reaches a production environment. The program progresses through unit testing, QA testing, volume testing, and so on. As developers, we do a good job of testing a change to minimize the risk that the change might have unintended consequences.

We do the same type of due diligence with most other changes in the mainframe world. Database changes are planned and thoroughly tested. Many shops use change manager software to ensure that database changes are implemented appropriately and in a controlled manner. System software (e.g. CICS, WebSphere, etc.), including subsystem and DB2 changes, are all subject to strict change control procedures. This is done to minimize disruption to the production work being conducted by our business folks.

But, if you think about it, there is one exception to this tight change control environment: Binds and Rebinds are typically done in the production environments without the benefit of oversight or prior testing. This lack of change control results in unpredictable performance impacts. In most shops, programs are moved to production and bound there. Indeed, we are at the mercy of the DB2 optimizer, which generates access paths on the fly when we Bind or Rebind our programs. Any issues with inefficient access paths are then dealt with in a reactive mode. That is, problems are addressed after the fact.

One of the biggest reasons for not implementing strict change control processes for access paths is the lack of built-in methods for ensuring access path change control discipline. Let’s face it, manually evaluating thousands of packages and tens of thousands of SQL statements can be quite impractical. But there are things that can be done to help alleviate this problem.

Rebinding does not always produce DB2 performance improvements—and in some cases rebinding can cause DB2 performance to degrade. Typically, 75% to 90% of all rebinds are unnecessary. Bind ImpactExpert (from NEON Enterprise Software) manages the bind and rebind processes to assure optimal application performance by checking what access path changes will be before your rebind, and then only rebinding those programs where performance would improve. How does it do this?

Well, Bind ImpactExpert helps to eliminate the unpredictable results of rebinds that DBAs experience daily. By filtering out the rebinds that are likely to have a negative impact on DB2 performance, Bind ImpactExpert guarantees improved or consistent performing DB2 applications.

And one of the most popular features of Bind ImpactExpert is its ability to eliminate the unpredictability of rebinds when moving between DB2 releases. The EarlyPrecheck feature assists you in moving to new DB2 releases. The EarlyPrecheck feature allows you to evaluate and correct access path problems for both dynamic and static SQL prior to an actual DB2 version upgrade. You can perform access path evaluation weeks or months prior to the migration date to allow for in-depth analysis and pre-emptive correction of access path issues.

Bind ImpactExpert can also be stop unnecessary binds. It does this by determining whether a DBRM contains changed SQL and skipping the bind step for those that do not. This process eliminates unnecessary binds, optimizes and accelerates the compile procedure, reduces CPU usage, and reduces locks to the DB2 catalog.

All forward-thinking organizations should adopt a liberal Rebind process to ensure optimal access paths based on up-to-date statistics. Keeping abreast of data changes and making sure that your programs are optimized for the current state of the data is the best approach. This means regular executions of RUNSTATS, REORG, and Rebind.

So, if you are not rebinding your programs on a regular basis because you are afraid of degrading a few access paths, it is time to take a look at Bind ImpactExpert. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself. I mean, why would you want to penalize every DB2 program in your subsystem for fear that a program or two may have a few degraded access paths? Especially when NEON Enterprise Software offers an automated solution to the problem...

The bottom line is this: failing to keep your access paths aligned with your data is a sure recipe for declining DB2 application performance.

Monday, November 06, 2006

Try Out the XML Capability of DB2 9 for Free

Are you aware that there is a version of DB2 that you can use free of charge? It is called DB2 Express-C and it is basically IBM's way of removing price as being the issue in terms of you trying out and using DB2. Think of it as a way to use DB2 just like you would use an open source DBMS (except you don't get the source code).

According to IBM: DB2 Express-C is a version of DB2 Express Edition (DB2 Express) for the community. DB2 Express-C is a no-charge data server for use in development and deployment of applications including: XML, C/C++, Java, .NET, PHP, and more. DB2 Express-C can be run on up to 2 dual-core CPU servers, with up to 4 GB of memory, any storage system setup and with no restrictions on database size or any other artificial restrictions.

So, if you are wondering what it means for DB2 to support pureXML, then you might want to download DB2 Express-C and try it out for yourself. DB2 Express-C can run on AIx, HP-UX, Linux, Solaris, and Windows.

Now why would I be writing about a LUW product on a z/OS blog? Well, DB2 9 for LUW and DB2 9 for z/OS both support pureXML in the same way. So even if you are a DB2 for z/OS user, getting familiar with the XML support in DB2 Express-C can prepare you to help plan for how you might want to use XML in DB2 9 for z/OS when it becomes available.

Monday, October 30, 2006

New DB2 for iSeries Redbooks

I don't usually cover DB2 on the iSeries (you may still be calling it the DB2/400 or DB2 for the AS/400). But this week saw the release of two new redbooks on DB2 for iSeries, so I thought I'd mention them briefly here.

The first new redbook covers stored procedures, triggers, and user-defined functions, and it is called, appropriately enough, Stored Procedures, Triggers, and User-Defined Functions on DB2 Universal Database for iSeries. This redbook covers some of the announced features for stored procedures, triggers, and UDFs in V5R1, V5R2, V5R3, and V5R4. This redbook offers suggestions, guidelines, and practical examples on how to effectively develop DB2 for iSeries stored procedures, triggers, and UDFs.

The Ins and Outs of XML and DB2 for i5/OS is the second of the new DB2 for iSeries redbooks. As you might guess, this redbook focuses on XML. It covers the challenges of representing XML hierarchies in a relational DBMS, as well as provides an in-depth explanation of the most popular approaches to bridge the hierarchy / relational dilemma.

Also, if you are interested in XML and DB2 you might also want to check out the recently-published DB2 9 pureXML Guide, which covers pureXML in DB2 for LUW.

Friday, October 27, 2006

IBM Second in Software Sales

The San Francisco Chronicle reported today that IBM is officially the second largest software provider, bypassing former number two Oracle, but still behind number one Microsoft:

In its most recent quarter, IBM software generated $4.4 billion in revenue, and it's the company's main profit generator, with an 85 percent profit margin. Thirty-seven percent of IBM's profit comes from software. In 2005, it sold $15.8 billion worth of software, topping the $14.4 billion that Oracle posted in fiscal 2006, which ended June 30.

I find this news interesting. Often times IBM is excluded from software company discussions because they also sell hardward and services, so they are not always viewed as an ISV (independent software vendor). But I always thought this was unfair to IBM. After all, Oracle sells services, too. And Microsoft sells hardware, too - I mean, what is an XBOX if not hardware?

So give IBM their due... they are number two!