Thursday, August 31, 2006

Accessing Partitioned Data

One area that tends to confuse some DB2 developers until they gain experience is how DB2 partitioning works. A common question I get goes something like this: “If a table is in a partitioned table space, say four partitions, for example, then to process the table in batch can I run four instances of the batch program in parallel, one against each partition. What do I code to do this?”

Well, the short and sweet answer to this question is “Yes, you can run four instances of a batch program in parallel if you so desire.” But there is a nuance to this misconception that might be missed here. The question lurking beneath the question is this: “How can I make sure I am accessing only data in one partition in each of the batch programs?”

To do this requires some programming work. The program will need to have a means of identifying which partition it should run against. So, you might code the program to accept an input parameter of 1, 2, 3, or 4. The program would read the parameter and translate it into the key range of values that should be read by the program. This is the LIMITKEY value for the particular partition as found in the partitioning index. You can retrieve this value from the DB2 catalog using the following query:

SELECT PARTITION, LIMITKEY
FROM SYSIBM.SYSINDEXPART
WHERE IXNAME = ?
AND IXCREATOR = ?
ORDER BY PARTITION;

Supply the index name and creator and this query will return the partition number and LIMITKEY for that partition. (If you include this query in the program you probably will want to include the PARTITION column in the WHERE clause and return only a single row.) The LIMITKEY is the high key value for that partition. Using this information you will need to write the queries in the program such that only values from the partition being processes will be retrieved. As long as the program adheres to that key range you should only process data from the one partition that holds that data.

Of course, none of this is necessary to get DB2 to operate in parallel. The best approach uses DB2 query parallelism because it minimizes the amount of programming work and has the potential to maximize performance. To signal DB2 to turn on parallelism you will need to BIND your program specifying the DEGREE(ANY) parameter. Then DB2 will choose the degree of parallelism for the batch program. You will only need to run the program once (instead of 4 times as in our example); DB2 will figure out how many tasks it can run in parallel for each query in the program. And there is no need to modify the program at all! This is far simpler than any alternate approach because DB2 handles chunking up the work into parallel tasks for you.

Tuesday, August 29, 2006

How are Indexes Being Used?

In keeping with my promise to periodically post blog entries based on questions I have received, here we have another question I have been asked:

If I have five indexes on a table, what is the best way to determine if all, some or none of the indexes are being used?

Here is the answer I sent:

The best approach would be to make sure you have run EXPLAIN on all of your production plans and packages. Then examine the PLAN_TABLE output for those indexes. For example:

SELECT *
FROM my.PLAN_TABLE
WHERE ACCESSNAME IN (put your list of indexes here);

Of course, this will not show the dynamic SQL that uses any of these indexes. There are tools on the market that can help you to gather SQL usage statistics and access path information for both static and dynamic SQL statements.

You might consider acquiring one of these tools if you use a lot of dynamic SQL. One such offering is NEON Enterprise Software’s SQL Performance Expert solution.

Thursday, August 24, 2006

VARCHAR versus Compression

A couple of days ago I posted a blurb giving advice on using variable character columns in DB2. After thinking about the topic a little bit more, I decided to post a follow-on topic: namely, comparing the use of VARCHAR to DB2 compression.

Even though these are two entirely different "things," they are each probably done for similar reasons - to save disk storage. VARCHAR does this by adjusting the size of the column to fit the actual length of text being stored; compression does this by sending rows of data through an algorithm to minimize its length. For those interested in the details of compression I refer you to Willie Favero's excellent blog where he has written a several-part series on compression -- here are the links to it: part one, part two, and part three.

So, what advice can I give on comparing the two? Well, you might want to consider forgoing the use of variable columns and instead turn on compression. With variable columns you always add overhead: there is a two-byte prefix for every VARCHAR column to store the length of the VARCHAR. If instead you use CHAR and turn on compression you no longer need the extra two bytes per row per variable column.

Also, keep in mind that compression reduces the size of the entire row. So not only will you be compressing the CHAR column (that used to be VARCHAR), but you will also give DB2 the opportunity to compress every other column in that row.

All in all, that means that comrpession can return better disk storage savings than variable columns, and all without the programmatic overhead of having to calculate and store the two-byte prefix for each previously variable column.

Of course, I don't want to give the impression that this should always be done... (remember the DBA's maxim: Almost never say "always or never.") And there are additional things to consider, such as:
  • Compression adds a compression dictionary to the table space so a compressed table space can actually be larger than a non-compressed table space (if it is very small to begin with).
  • Compression requires additional CPU cycles to compress and de-compress the data as it is inserted, modified, and read (of course, I/O can decrease because smaller rows will fit more on each page, so degraded CPU performance can be offset by improved I/O)

This is just an additional "thing to consider" when you are building your DB2 databases and trying to decide whether you should use VARCHAR or CHAR...

Monday, August 21, 2006

IBM Mainframes - Not Just for Big Shops Any More

Just a quick blog today to point you to an interesting article in the latest issue of IBM Systems Magazine - Mainframe Edition. The article, titled A New System for a New Market, points out that the System z9 Business Class (z9 BC) platform, the latest mainframe in IBM's product line announced in April 2006, is suitable for the small and medium business (SMB) space.

This offering brings high performance and scalability to the SMB market at a very reasonable cost (around $100k). With specialty engines that can be added (IFL, zIIP and zAAP), again at a reasonable cost, it looks like the IBM mainframe will not only remain viable for large shops, but it could expand out into smaller ones, too.

So, as most mainframe afficianados know, the mainframe is not dead. But, it may actually be able to grow with the new features and affordability being built into IBM's new mainframes.

Sunday, August 20, 2006

Advice on Using Variable Character Columns in DB2

One of the long-standing, troubling questions in DB2-land is when to use VARCHAR versus CHAR. The high-level advice for when to use VARCHAR instead of CHAR is for larger columns whose length varies considerably from row-to-row. Basically, VARCHAR should be used to save space in the database when your values are truly variable.

In other words, if you have a 10-byte column, it is probably not a good idea to make it variable... unless, of course, 90% of the values are only one or two bytes, then it might make some sense. Have you gotten the idea here that I'm not going to give any hard and fast rules? Hope so, cause I won't - just high-level guidance.

Another situation: say you have an 80 byte column where values range from 10 bytes to the full 80 bytes... and more than 50% of them are less than 60 bytes. Well, that sounds like a possible candidate for VARCHAR to me.

Of course, there are other considerations. Java programmers tend to prefer variable character columns because Java does not have a native fixed length character data type.

For traditional programming languages though, CHAR is preferred because VARCHAR requires additional programmatic handling (to set the length of each column when inserting or modifying the data).

OK, so what if you are trying to determine whether or not the appropriate decision was made when for VARCHAR columns instead of CHAR? You can use information from the DB2 Catalog to get a handle on the actual sizes of each VARCHAR column.

Using views and SQL it is possible to develop a report showing the lengths of the variable column values. First, determine which VARCHAR column you need information about. For the purposes of this example, let's examine the NAME column of SYSIBM.SYSTABLES. This column is defined as VARCHAR(18). Create a view that returns the length of the NAME column for every row, for example:

CREATE VIEW LENGTH_INFO
(COL_LGTH)
AS
SELECT LENGTH(NAME)
FROM SYSIBM.SYSTABLES;

Then, issue the following query using SPUFI to produce a report detailing the LENGTH and number of occurrences for that length:

SELECT COL_LGTH, COUNT(*)
FROM LENGTH_INFO
GROUP BY COL_LGTH
ORDER BY COL_LGTH;

This query will produce a report listing the lengths (in this case, from 1 to 18, excluding those lengths which do not occur) and the number of times that each length occurs in the table. These results can be analyzed to determine the range of lengths stored within the variable column. If you are not concerned about this level of detail, the following query can be used instead to summarize the space characteristics of the variable column in question:

SELECT 18*COUNT(*),
SUM(2+LENGTH(NAME)),
18*COUNT(*)-SUM(2+LENGTH(NAME)),
18,
AVG(2+LENGTH(NAME)),
18-AVG(2+LENGTH(NAME))
FROM SYSIBM.SYSTABLES;

The constant 18 will need to be changed in the query to indicate the maximum length of the variable column as defined in the DDL. This query will produce a report such as the one shown below:

SPACE SPACE TOTAL AVERAGE AVERAGE AVERAGE
USED AS USED AS SPACE SPACE AS SPACE AS SPACE
CHAR(18) VARCHAR(18) SAVED CHAR(18) VARCHAR(18) SAVED
--------- ----------- ------ -------- ----------- -------
158058 96515 61543 18 10 8



This information can then be analyzed to determine if the appropriate decision was made when VARCHAR was chosen. (Of course, the values returned will differ based on your environment and the column(s) that you choose to analyze.) Also, keep in mind that this report will not include the 2 byte prefix stored by DB2 for variable length columns.

I hope this high-level overview with advice on when to use VARCHAR versus CHAR has been helpful. If you have your own guidelines or queries that you use please feel free to post a comment to this blog and share them with everyone.



NOTE: You could skip the creation of the VIEW in the above query and just use a nested table expression (aka in-line view) instead.

Thursday, August 17, 2006

Greatest Software Ever?

I just stumbled across a very interesting article this afternoon and thought I'd share it with everybody through my blog. The article, published in Information Week is titled What's The Greatest Software Ever Written? And isn't that an intriguing question?

Well, I read through the article and other than a few quibbles here and there I'd have to say that the author did a good job of assembling his list. He spends quite a bit of time talking about the IBM 360 project - and well he should. This was one of the first truly huge software projects and it set the bar for what is expected of an operating system. It also was the catalyst for causing one of the best ever books on software development to be written - The Mythical Man Month. Written by Fred Brooks, the manager in charge of the IBM 360 project, this book outlines many of the truisms about software development that we acknowledge even today - more than 40 years later. If you work in IT and you haven't read The Mythical Man Month you really should buy a copy and read it immediately. Anyway, this blog isn't about that book, so let's move on.

I won't spoil it here and publish the list of greatest software - you will have to click on the link for the article and read it yourself (the actual list doesn't start until page 3 of the article, but don't just click right over to that page, read the whole thing).

Suffice it to say, several IBM projects make the list (I'm kinda partial to what came in at #2 -- it would've been my #1 actually). And I think perhaps that VisiCalc belongs on the list instead of the spreadsheet software that is listed - I mean, Dan Bricklin invented the entire spreadsheet category of software when Software Arts published VisiCalc back in the late 1970s.

But the article is good anyway and I'm sure it is almost impossible to publish a list like this without causing disagreement - and perhaps that is its intent any way. So take a moment and click over to the article and give it a read. And feel free to share your thoughts on it here by posting a comment or two.

Thursday, August 10, 2006

SHARE Travelers Take Heed

With the upcoming SHARE conference in Baltimore next week, there are sure to be many of you out there who will be traveling to the nation's capital region over the weekend. As you prepare to travel, be sure to factor in additional time at the airport due to the latest TSA warning.

Basically, in response to a recently thwarted terrorist plot in the UK, the threat level has been raised to High (or Orange) for all commercial aviation operating in or destined for the United States. That means the lines will be longer and the searches more thorough going through security at the airport.

Additionally, please read the TSA announcement and heed what it is saying. I am referring specifically to this: "Due to the nature of the threat revealed by this investigation, we are prohibiting any liquids, including beverages, hair gels, and lotions from being carried on the airplane." Please, for everyone's sake, leave your liquids at home:
  • You can get a drink after you pass through security.
  • Every hotel provides shampoo, conditioner, and lotion for free, so you don't need to bring them.
  • If you absolutely have to have your favorite brand, or some gel or spray, pack it in your checked bags.
And yes, please check your dang luggage! Although I am sometimes amused by idiots trying to jam a huge bag into the overhead bin, it becomes less amusing after a two hour amble through security. If you have a large bag check it!

And I'll see you all in Baltimore.

Wednesday, August 09, 2006

Where exactly is a DB2 plan stored?

The title of this posting is a question I received awhile ago. As I promised earlier on this blog, I will periodically post the answers I have given to e-mailed questions. So, here goes:

A DB2 "plan" is stored in the DB2 Directory and information about the plan is stored in the DB2 Catalog.

The DB2 Directory table that contains actual plans is SYSIBM.SCT02 (and SYSIBM.SPT01 contains actual packages). The plan is stored as an internal structure called a cursor table; packages are stored as package tables. As DB2 runs application programs, it loads the cursor tables for plans and package tables for packages from the DB2 Directory tables into the EDM Pool. This is where the "access path" code that determines how to get the actual DB2 data resides.

There is also metadata about plans and packages that you might find useful. This information includes data about the state, privileges, isolation level, release specification, and so. The DB2 Catalog contains information about plans in the following tables:

  • SYSIBM.SYSDBRM
  • SYSIBM.SYSPLAN
  • SYSIBM.SYSPLANAUTH
  • SYSIBM.SYSPLANDEP
  • SYSIBM.SYSSTMT

And, the DB2 Catalog contains information about packages in the following tables:

  • SYSIBM.SYSPACKAGE
  • SYSIBM.SYSPACKAUTH
  • SYSIBM.SYSPACKDEP
  • SYSIBM.SYSPACKLIST
  • SYSIBM.SYSPACKSTMT
  • SYSIBM.SYSPKSYSTEM
  • SYSIBM.SYSPLSYSTEM

Tuesday, August 08, 2006

Mainframe Weekly: A new mainframe-focused blog

Mainframe Weekly is a new blog featuring the insights of Trevor Eddolls. Trevor is an editor who has worked for Xephon for some time. Xephon publishes those "Update" journals - you know the ones, DB2 Update, CICS Update, etc. The ones that are full of content and don't accept any ads.

I've had the pleasure of writing for DB2 Update and working with Trevor for a number of years now, and I look forward to regularly reading his new blog. Recent entries there have covered CICS, DB2 and Viper, and storage technology.

Do yourself a favor and be sure to check in on Trevor's blog on a regular basis.

Monday, August 07, 2006

Upcoming SHARE Conference

Just a quick posting to remind everyone that the bi-annual SHARE conference is coming up next week in Baltimore, MD. For those of you who don't know about SHARE: the conference has offered education to IT professionals, improving the value of IT to businesses, and advancing careers since 1955 - the longest history of any IT user group.

SHARE in Baltimore runs from August 13-18, 2006 and, as usual, there will be a plethora of useful and educational sessions on multiple IT topics. According to the folks at SHARE there will be over 800 technical sessions to choose from. Session topics range from implementation of new software releases to in-depth looks at what the latest new technologies can do for your business.

I'll be there, and will be delivering two presentations on DB2. On Monday, August 14, (3:00 - 4:00 PM) I'll give an hour of DB2 for z/OS tips and techniques... and then on Tuesday, August 15, (8:00 - 9:00 AM) I'll be giving an overview of DB2 for z/OS performance tuning.

Hope to see you there!

Wednesday, August 02, 2006

New Mainframe Redbook

IBM published a new mainframe redbook this week that everyone who works with mainframe technology should download and have handy. It is titled Introduction to the New Mainframe: z/OS Basics and it offers a very nice high-level overview of mainframe technology circa 2006.

Of course, if you are a long-time mainframer much of this redbook may be too high level for you. But it will be just the right thing to have at your disposal when someone drops by to ask a basic question. Just e-mail them a copy of the book.

And even long-time experts still might want to at least leaf through it. It is divided into four broad sections, as follows:
  • Part 1. Introduction to z/OS and the mainframe
  • Part 2. Application programming on z/OS
  • Part 3. Online workloads for z/OS
  • Part 4. System programming on z/OS
So maybe you are a crackerjack programmer, but would like to know more about system programming. Maybe your online knowledge is skimpy. Or maybe you just want a nice overview of everything. Then this book is just the ticket!

Here is a brief synopsis out of the Preface of the redbook:

This IBM Redbook provides students of information systems technology with the background knowledge and skills necessary to begin using the basic facilities of a mainframe computer. It is the first in a planned series of textbooks designed to introduce students to mainframe concepts and help prepare them for a career in large systems computing.

This textbook can also be used as a prerequisite for courses in advanced topics or for internships and special studies. It is not intended to be a complete text covering all aspects of mainframe operation, nor is it a reference book that discusses every feature and option of the mainframe facilities.

Others who will benefit from this course include experienced data processing professionals who have worked with non-mainframe platforms, or who are familiar with some aspects of the mainframe but want to become knowledgeable with other facilities and benefits of the mainframe environment.

So download Introduction to the New Mainframe: z/OS Basics today...