Sunday, August 20, 2006

Advice on Using Variable Character Columns in DB2

One of the long-standing, troubling questions in DB2-land is when to use VARCHAR versus CHAR. The high-level advice for when to use VARCHAR instead of CHAR is for larger columns whose length varies considerably from row-to-row. Basically, VARCHAR should be used to save space in the database when your values are truly variable.

In other words, if you have a 10-byte column, it is probably not a good idea to make it variable... unless, of course, 90% of the values are only one or two bytes, then it might make some sense. Have you gotten the idea here that I'm not going to give any hard and fast rules? Hope so, cause I won't - just high-level guidance.

Another situation: say you have an 80 byte column where values range from 10 bytes to the full 80 bytes... and more than 50% of them are less than 60 bytes. Well, that sounds like a possible candidate for VARCHAR to me.

Of course, there are other considerations. Java programmers tend to prefer variable character columns because Java does not have a native fixed length character data type.

For traditional programming languages though, CHAR is preferred because VARCHAR requires additional programmatic handling (to set the length of each column when inserting or modifying the data).

OK, so what if you are trying to determine whether or not the appropriate decision was made when for VARCHAR columns instead of CHAR? You can use information from the DB2 Catalog to get a handle on the actual sizes of each VARCHAR column.

Using views and SQL it is possible to develop a report showing the lengths of the variable column values. First, determine which VARCHAR column you need information about. For the purposes of this example, let's examine the NAME column of SYSIBM.SYSTABLES. This column is defined as VARCHAR(18). Create a view that returns the length of the NAME column for every row, for example:

CREATE VIEW LENGTH_INFO
(COL_LGTH)
AS
SELECT LENGTH(NAME)
FROM SYSIBM.SYSTABLES;

Then, issue the following query using SPUFI to produce a report detailing the LENGTH and number of occurrences for that length:

SELECT COL_LGTH, COUNT(*)
FROM LENGTH_INFO
GROUP BY COL_LGTH
ORDER BY COL_LGTH;

This query will produce a report listing the lengths (in this case, from 1 to 18, excluding those lengths which do not occur) and the number of times that each length occurs in the table. These results can be analyzed to determine the range of lengths stored within the variable column. If you are not concerned about this level of detail, the following query can be used instead to summarize the space characteristics of the variable column in question:

SELECT 18*COUNT(*),
SUM(2+LENGTH(NAME)),
18*COUNT(*)-SUM(2+LENGTH(NAME)),
18,
AVG(2+LENGTH(NAME)),
18-AVG(2+LENGTH(NAME))
FROM SYSIBM.SYSTABLES;

The constant 18 will need to be changed in the query to indicate the maximum length of the variable column as defined in the DDL. This query will produce a report such as the one shown below:

SPACE SPACE TOTAL AVERAGE AVERAGE AVERAGE
USED AS USED AS SPACE SPACE AS SPACE AS SPACE
CHAR(18) VARCHAR(18) SAVED CHAR(18) VARCHAR(18) SAVED
--------- ----------- ------ -------- ----------- -------
158058 96515 61543 18 10 8



This information can then be analyzed to determine if the appropriate decision was made when VARCHAR was chosen. (Of course, the values returned will differ based on your environment and the column(s) that you choose to analyze.) Also, keep in mind that this report will not include the 2 byte prefix stored by DB2 for variable length columns.

I hope this high-level overview with advice on when to use VARCHAR versus CHAR has been helpful. If you have your own guidelines or queries that you use please feel free to post a comment to this blog and share them with everyone.



NOTE: You could skip the creation of the VIEW in the above query and just use a nested table expression (aka in-line view) instead.

Thursday, August 17, 2006

Greatest Software Ever?

I just stumbled across a very interesting article this afternoon and thought I'd share it with everybody through my blog. The article, published in Information Week is titled What's The Greatest Software Ever Written? And isn't that an intriguing question?

Well, I read through the article and other than a few quibbles here and there I'd have to say that the author did a good job of assembling his list. He spends quite a bit of time talking about the IBM 360 project - and well he should. This was one of the first truly huge software projects and it set the bar for what is expected of an operating system. It also was the catalyst for causing one of the best ever books on software development to be written - The Mythical Man Month. Written by Fred Brooks, the manager in charge of the IBM 360 project, this book outlines many of the truisms about software development that we acknowledge even today - more than 40 years later. If you work in IT and you haven't read The Mythical Man Month you really should buy a copy and read it immediately. Anyway, this blog isn't about that book, so let's move on.

I won't spoil it here and publish the list of greatest software - you will have to click on the link for the article and read it yourself (the actual list doesn't start until page 3 of the article, but don't just click right over to that page, read the whole thing).

Suffice it to say, several IBM projects make the list (I'm kinda partial to what came in at #2 -- it would've been my #1 actually). And I think perhaps that VisiCalc belongs on the list instead of the spreadsheet software that is listed - I mean, Dan Bricklin invented the entire spreadsheet category of software when Software Arts published VisiCalc back in the late 1970s.

But the article is good anyway and I'm sure it is almost impossible to publish a list like this without causing disagreement - and perhaps that is its intent any way. So take a moment and click over to the article and give it a read. And feel free to share your thoughts on it here by posting a comment or two.

Thursday, August 10, 2006

SHARE Travelers Take Heed

With the upcoming SHARE conference in Baltimore next week, there are sure to be many of you out there who will be traveling to the nation's capital region over the weekend. As you prepare to travel, be sure to factor in additional time at the airport due to the latest TSA warning.

Basically, in response to a recently thwarted terrorist plot in the UK, the threat level has been raised to High (or Orange) for all commercial aviation operating in or destined for the United States. That means the lines will be longer and the searches more thorough going through security at the airport.

Additionally, please read the TSA announcement and heed what it is saying. I am referring specifically to this: "Due to the nature of the threat revealed by this investigation, we are prohibiting any liquids, including beverages, hair gels, and lotions from being carried on the airplane." Please, for everyone's sake, leave your liquids at home:
  • You can get a drink after you pass through security.
  • Every hotel provides shampoo, conditioner, and lotion for free, so you don't need to bring them.
  • If you absolutely have to have your favorite brand, or some gel or spray, pack it in your checked bags.
And yes, please check your dang luggage! Although I am sometimes amused by idiots trying to jam a huge bag into the overhead bin, it becomes less amusing after a two hour amble through security. If you have a large bag check it!

And I'll see you all in Baltimore.

Wednesday, August 09, 2006

Where exactly is a DB2 plan stored?

The title of this posting is a question I received awhile ago. As I promised earlier on this blog, I will periodically post the answers I have given to e-mailed questions. So, here goes:

A DB2 "plan" is stored in the DB2 Directory and information about the plan is stored in the DB2 Catalog.

The DB2 Directory table that contains actual plans is SYSIBM.SCT02 (and SYSIBM.SPT01 contains actual packages). The plan is stored as an internal structure called a cursor table; packages are stored as package tables. As DB2 runs application programs, it loads the cursor tables for plans and package tables for packages from the DB2 Directory tables into the EDM Pool. This is where the "access path" code that determines how to get the actual DB2 data resides.

There is also metadata about plans and packages that you might find useful. This information includes data about the state, privileges, isolation level, release specification, and so. The DB2 Catalog contains information about plans in the following tables:

  • SYSIBM.SYSDBRM
  • SYSIBM.SYSPLAN
  • SYSIBM.SYSPLANAUTH
  • SYSIBM.SYSPLANDEP
  • SYSIBM.SYSSTMT

And, the DB2 Catalog contains information about packages in the following tables:

  • SYSIBM.SYSPACKAGE
  • SYSIBM.SYSPACKAUTH
  • SYSIBM.SYSPACKDEP
  • SYSIBM.SYSPACKLIST
  • SYSIBM.SYSPACKSTMT
  • SYSIBM.SYSPKSYSTEM
  • SYSIBM.SYSPLSYSTEM

Tuesday, August 08, 2006

Mainframe Weekly: A new mainframe-focused blog

Mainframe Weekly is a new blog featuring the insights of Trevor Eddolls. Trevor is an editor who has worked for Xephon for some time. Xephon publishes those "Update" journals - you know the ones, DB2 Update, CICS Update, etc. The ones that are full of content and don't accept any ads.

I've had the pleasure of writing for DB2 Update and working with Trevor for a number of years now, and I look forward to regularly reading his new blog. Recent entries there have covered CICS, DB2 and Viper, and storage technology.

Do yourself a favor and be sure to check in on Trevor's blog on a regular basis.