Monday, March 03, 2008

Q+A: Nulls and DASD

Recently, I received the following question, which I will answer today on this blog posting.

Question: Lets say I have a table A which has 500 columns. Out of those 500 columns only 5 columns have been defined as not nullable and the rest have been defined as NULLS allowed. And out of those 500 columns I have found that 300 columns are unused(empty) totally. My business allows me to remove those 300 columns. My doubt is if I remove those 300 empty columns will I save on DASD space occupied by DB2? Will empty columns occupy DASD space?Would be really helpful if you can guide me on this.

Answer: I'm happy to try to help out. First of all, the short answer to your question is "Yes!" Those 500 columns are all consuming valuable disk space. To determine how much space is being consumed, you will need to examine the data type and length assigned to each column and add them up. And to make matters worse, you must add an additional 1 byte to each of them because the columns are nullable.

In DB2, a NULL is stored using a special one-byte null indicator that is "attached" to every nullable column. If the column is set to NULL, then the indicator field is used to record this. Using NULL will never save space in a DB2 database design - in fact, it will always add an extra byte for every column that can be NULL. The byte is used whether or not the column is actually set to NULL.

So, a column defined as CHAR(5) NOT NULL will required five bytes of storage space - but if it is defined as nullable, then it requires six bytes of storage space - five bytes for the data, and one byte for the null indicator.

Given all of this, it would seem that there is a very viable case to be made for you to remove those columns that are not being used. Of course, this means that you will likely have to make changes to any programs accessing that table. Because the table definition will change (fewer columns) you will need new DCLGENs and those will have to be included and bound into your programs. Be sure to factor this additional workload into your planning before moving forward with this change.

The better question to ask is "How the heck did all of the empty columns get put into the table to begin with and how did that design get past the DBAs?"

If you have an answer for that one, please share it by posting your answer in a comment here!

Wednesday, February 20, 2008

Let's Hear It for COBOL!

I know that this blog is supposed to be primarily about DB2, but I like to sneak in mainframe-related topics from time to time. And I read a great article today in Computerworld that I want to share with you. The article, titled Confession of a COBOL Programmer, talks up the need for COBOL coders. Just like with other mainframe technologies, as the baby boomers retire there is an insufficient supply of newbies available to step in and continue the care and feeding of the COBOL legacy.

I've written briefly about COBOL before, in my Data Management Today blog. COBOL is still all over the place and in no danger of dying off. According to the Computerworld article, 75% of the world's businesses data is still processed in COBOL, and about 90% of all financial transactions are in COBOL.

Yet there is a lingering perception "out there" that COBOL is dead (or at least dying). And as far as graduating seniors and new programmers are concerned, COBOL ain't cool! New programmers don't want to learn it and most universities don't teach it in their computer science or information science curricula. Just like the mainframe (which is alive and well, too), COBOL is ignored and a big problem is developing.

Analysts at Gartner estimate that there are 180 billion lines of COBOL code in existence and about 90,000 COBOL programmers. To convert all of that to something else "each programmer will require 100,000 hours to complete the conversion of 2 million lines. That works out to 12,500 eight-hour workdays. If we figure 250 workdays per year (though it’s unlikely any Cobol programmers are settling for just two weeks of vacation per year), these guys should be done in 50 years."

Who knows, when I retire (sometime in the far-off future) maybe I'll hang up a shingle and offer my services as a COBOL coder... after all, that is what I started out doing right out of college (all those years ago)...

Tuesday, February 05, 2008

Intelligent Automatic Reoptimization? [DB2 9 for z/OS]

Most seasoned SQL programmers know that when you use host variables, DB2 may not always come up with the absolutely most optimal access path at bind time. Without knowing the actual values of the host variables the optimizer has to make some best guesses as to how best to satisfy the SQL request.

Furthermore, we know that we can guide DB2 on how best to approach this situation using the REOPT parameter of the BIND command. Prior to DB2 V9, there were three options for REOPT:

REOPT(NONE) – DB2 will not reoptimize SQL at run time.

REOPT(ALWAYS) – DB2 will prepare SQL statements again at run time when the host variable values are known. This enables the DB2 optimizer to formulate the query execution plan using the actual host variable values, which can result in better performing access paths.

REOPT(ONCE) – DB2 will prepare SQL statements only once, using the first set of host variable values, no matter how many times the statement is executed by the program. The access path is stored in the Dynamic Statement Cache (DSC) and will be used for all subsequent executions of the same SQL statement. REOPT(ONCE) only applies to dynamic SQL statements and is ignored if you use it with static SQL statements. This option was introduced in DB2 V8.

What is New in V9?

DB2 9 for z/OS introduces a new REOPT option: REOPT(AUTO). The ideas behind REOPT(AUTO) is to come up with the optimal access path in the minimum number of prepares.

The basic premise of REOPT(AUTO) is to re-optimize only when host variable values change. Using this option, DB2 will examine the host variable values and will generate new access paths only when host variable values change and DB2 has not already generated an access path for those values.

REOPT(AUTO) only applies to dynamic statements that can be cached.

After migrating to DB2 9, consider re-evaluating programs bound specifying REOPT(ALWAYS) and REOPT(NONE). In many cases, switching to REOPT(AUTO) from REOPT(ALWAYS) can produce performance improvement; and in some cases you can use re-optimization with REOPT(AUTO) for programs bound REOPT(NONE) because of the fear of too frequent re-optimization causing a performance hit.

In particular, consider specifying REOPT(AUTO) for SQL statements that at times can take a relatively long time to execute, depending on the values of parameter markers. In particular, you should especially consider doing this when parameter markers refer to non-uniform data that is joined to other tables.

Friday, January 25, 2008

Get Control of Access Path Changes for Dynamic SQL when Migrating to a New Version of DB2

Are you making plans to migrate to a new version of DB2? Do you know what impact the access changes for dynamic SQL will have on performance? When migrating to a new DB2 version, access path changes for dynamic SQL are unpredictable – as is the impact those changes will have on application performance. Learn how you can use Bind ImpactExpert to eliminate the unwanted surprises in version migration by performing a “precheck” on dynamic SQL access path changes.

This webinar will be presented by myself (Craig Mullins) and Joe Brockert, Sr. Software Consultant for NEON Enterprise Software. We'll discuss the issues associated with dynamic SQL during a DB2 migration and offer a live demo of Bind ImpactExpert. Join us to see the solution that provides predictability in access path changes.

Enroll by clicking on this link.

Wednesday, January 16, 2008

Q+A: Locking

I get questions e-mailed to me all the time. Although I try to read and answer them all, sometimes I don't. I hope those whom I've ignored over the years will forgive me, but I can't always answer everything (not enough time/energy) and sometimes things get lost or drop through the cracks.

Anyway, at times I will take a question I get and blog about it in Q+A format. Today is one of those days!

The question was: I want to perform a retry on an INSERT under DB2 Z/OS when I get a deadlock/timeout. -911 causes a rollback automatically. Is there a ZPARM or other method of turning this off? I am inserting millions of rows and do not want a rollback to the last commit point.

Here is my answer:

Well, first of all, let me recommend that you minimize the size of your unit of work. If you are inserting millions of rows without a COMMIT you are likely causing locking issues in your environment. The pages you have locked while you are waiting for your millions of inserts to finish are all unavailable to any other user of the table (assuming page locking). That means any data on any page that you have locked cannot be read by anyone else until your unit of work is committed. Any other user, running at the same time as you are, trying to get to any page you have modified, would be getting -911 too.

That being said, you can control whether or not the work is rolled back automatically in CICS (on a thread basis) using an RDO parameter (or RCT if on an ancient CICS). The parameter is called ROLBE (RCT) or DROLLBACK (RDO). If it is set to YES a CICS SYNCPOINT ROLLBACK is issued and a -911 SQLCODE is returned to the program. If NO is coded, a CICS SYNCPOINT ROLLBACK is not issued and the SQLCODE is set to -913. You will have to programmatically either specify COMMIT or ROLLBACK for the unit of work.

In a batch environment you will need to code your programs to periodically issue COMMITs after so many modifications (or using some other method like a timer or loop counter). There is no method I am aware of to automatically control this behavior outside of looking into a third party product (for example, Softbase Checkpoint Restart, and others).