Monday, July 25, 2011

Bad Database Standards

Today's blog post is a metapost, of sorts. I am using my DB2 blog to point you to a series of posts I made in my other blog (Data and Technology Today). You see, I write two blogs: this one, which focuses on DB2 and mainframe topics, and another on data and database management topics in general.

One of my more popular series of posts on Data and Technology Today was the one on bad database standards. The general idea of these posts is to dissect and criticize standards that are outdated or simply wrong. The seven-part series ran about a year ago and has generated a bunch of comments. These posts will be useful reading for DB2 folks, too.

So without any further ado, here are links to the seven bad database standards:

  1. Limiting the Number of Indexes

  2. Too Many Columns!

  3. Limiting The Number of Tables in “Online” Joins

  4. Duplication of Data

  5. None Shall Pass!

  6. What’s In A Name?

  7. What Does Support Mean?

Take a moment to click through to these links and peruse the "bad" standards and, if you are so inclined, post a comment (either here on this blog, or over on the other one) to share your "favorite" bad standards.


Tuesday, July 19, 2011

Updating DB2 Developer's Guide (6th edition)

Just a brief blog post today to inform regular readers who might be concerned that I have not been blogging as frequently as in the past.

I am currently working on updating my book, DB2 Developer's Guide, for DB2 Versions 9 and 10. This will be the sixth edition of the book and, if all goes as planned, it should be available from IBM Press before the end of the year (2011).

Be sure to keep an eye on this blog and my web site (http://www.craigsmullins.com) for further details about the next edition of the book.

Sunday, June 19, 2011

The PIECESIZE Clause

One of the more troubling aspects of DB2 database design and creation is the non-partitioning index (NPI). Creating NPIs on tables in a partitioned table space can pose management and performance issues. Partitioned table spaces tend to be large and by their very design will span multiple underlying data sets. Any partitioning indexes will also span multiple data sets. But what happens when you need to define non-partitioning indexes on a table in a partitioned table space?

The PIECESIZE clause of the CREATE INDEX statement can be used during index creation to break an NPI into several data sets (or "pieces"). More accurately, the PIECESIZE clause specifies the largest data set size for a non-partitioned index. PIECESIZE can be specified in kilobytes, megabytes, or gigabytes. For example, the following statement will limit the size of individual data sets for the XACT2 index to 256 megabytes:

CREATE UNIQUE INDEX DSN8910.XACT2
ON DSN8910.ACT (ACTKWD ASC)
USING STOGROUP DSN8G910
PRIQTY 65536
SECQTY 8192
ERASE NO
BUFFERPOOL BP0
CLOSE NO
PIECESIZE 256M;

Basically, PIECESIZE is used to enable NPIs to be created on very large partitioned table spaces. It breaks apart the NPI into separate pieces that can be managed somewhat individually. Without PIECESIZE, NPIs would be quite difficult to manage and administer. Keep in mind, though, that PIECESIZE does not magically partition an NPI based on the partitioning scheme of the table space. This is a misperception held by some. So, if you have a partitioned table space with 4 partitions and then create an NPI with 4 pieces, the data in the NPI pieces will not match up with the data in the 4 partitions.

When using PIECESIZE, more data sets will be created and therefore you can obtain greater control over data set placement. Placing the pieces on separate disk devices can help to reduce I/O contention for SQL operations that access NPIs during read or update processing. The elapsed time improvement may be even greater when multiple tasks are accessing the NPI.

Separating the NPI into pieces allows for better performance of INSERT, UPDATE and DELETE processes by eliminating bottlenecks that can be caused by using only one data set for the index. The use of pieces also improves concurrency and performance of heavy INSERT, UPDATE, and DELETE processing against any size partitioned table space with NPIs.

Keep in mind that PIECESIZE is only a specification of the maximum amount of data that a piece (that is, a data set) can hold, not an actual storage allocation, so PIECESIZE has no effect on primary and secondary space allocation. Each data set will max out at the PIECESIZE value, so specifying PRIQTY greater than PIECESIZE will waste space. But also make sure that you avoid setting PIECESIZE too small. A new data set will be allocated each time the PIECESIZE threshold is reached, with DB2 incrementing the A001 component of the data set name each time. Ideally, your primary and secondary quantities should divide evenly into PIECESIZE to avoid wasting space.

To choose a PIECESIZE value, divide the overall size of the entire NPI by the number of data sets that you wish to have. For example, for an NPI that is 8 megabytes, you can arrive at 4 data sets for the NPI by specifying PIECESIZE 2M. Of course, if your NPI grows over 8 megabytes in total you will get additional data sets. Keep in mind that 32 pieces is the limit if the underlying table space is not defined with DSSIZE 4G or greater. The limit is 254 pieces if the table space is defined as DSSIZE 4G or greater.
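Before settling on a PIECESIZE value, it helps to know how big the NPI actually is. Here is a minimal sketch, assuming real-time statistics are enabled and using the sample index from above; the ALTER shows how the maximum piece size can be changed after the fact (note that altering PIECESIZE places the index in a pending state, requiring it to be rebuilt or reorganized):

-- Check the space currently allocated (in KB) for the index
SELECT DBNAME, INDEXSPACE, SPACE
FROM   SYSIBM.SYSINDEXSPACESTATS
WHERE  CREATOR = 'DSN8910'
AND    NAME    = 'XACT2';

-- Change the maximum data set size for the NPI
ALTER INDEX DSN8910.XACT2 PIECESIZE 512M;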

Wednesday, June 01, 2011

Mainframe Specialty Processors

Anyone who uses an IBM System z mainframe has probably heard about zIIPs and zAAPs and other specialty processors. But maybe you haven't yet done any real investigation into what they are, what they do, and why they exist. So, with that in mind, let's take a brief journey into the world of specialty processors in today's blog entry!

Over the course of the past decade or so, IBM has introduced several different types of specialty processors. The basic idea of a specialty processor is that it sits alongside the main CPUs, and specific types of "special" workload are shuttled to the specialty processor to be run there, instead of on the primary CPU complex. Why is this useful or interesting to mainframe customers? Well, the specialty processor workload is not subject to IBM (and many ISV) licensing charges... and, as any mainframer knows, the cost of software rises as capacity on the mainframe rises. But if capacity can be redirected to a specialty processor, then software license charges do not accrue -- at least for that workload.

And for VWLC customers, shuttling workload to a specialty processor can reduce the rolling four hour average and thereby decrease your monthly IBM software license bill.

Another benefit of specialty processors is that they can be cheaper to acquire than standard CPUs.

But specialty processors can only run certain types of workloads. There are four types of specialty processors:

  • ICF: Internal Coupling Facility - used for redirecting coupling facility cycles in a data sharing environment.
  • IFL: Integrated Facility for Linux - used for processing zLinux workload on an IBM mainframe.
  • zAAP: Application Assist Processor - used for processing Java workload.
  • zIIP: Integrated Information Processor - used for processing certain distributed database workloads.

When you activate any of these processors, some percentage of that type of workload can be redirected off of the main CP onto the specialty processor... but not 100% of the workload. It can be frustrating, particularly with the zIIP, to determine exactly what is redirected, when, and how much. In general, distributed DB2 for z/OS workload and XML processing can be redirected to zIIP processors.

Additionally, to run on a zIIP, the workload must run under an enclave SRB. So, code written to execute under a TCB will usually be unable to execute under an SRB without major changes. If you didn't understand that sentence, don't worry about it too much. Basically, IBM has enabled certain types of (mostly DB2) workload to run on zIIPs, and ISVs have enabled some of their code to run on zIIPs, too. If you are interested, more details about zIIPs can be found at this link.

Another interesting tidbit is that zAAP-eligible workloads can be run on zIIPs with IBM's zAAP on zIIP support. This can be a boon to shops that have only zIIPs and no zAAPs: you can use your zIIP processors for both Java and distributed DB2 workloads. The combined eligible TCB and enclave SRB workloads might make the acquisition of a zIIP cost effective, and Java- and XML-based workloads become eligible to run on existing zIIPs.

To take advantage of zAAP on zIIP, you need to be running z/OS V1.11 (or z/OS V1.9 or V1.10 with the PTFs for APAR OA27495) on a z9, z10, or z196 server.

Keep in mind that the terms for specialty processors do not change. You can have only 1 zAAP and 1 zIIP per general purpose processor. So, even if you have zAAP on zIIP configured, the chip is still a zIIP and you cannot have more than 1 per general purpose processor.

The Bottom Line

The bottom line is that even though it can take some studying and research to understand their benefit and functionality, specialty processors can help to reduce the cost of mainframe computing... and that is a good thing!

What is an Enclave?

If you are a DB2 professional dealing with distributed workload… or if you are enabling zIIP specialty processors… chances are you’ve heard the term “enclave” or “enclave SRB.” But just what is an enclave?

An enclave is a construct that represents a transaction or unit of work. Enclaves are a method of managing mainframe transactions for non-traditional workloads. You can think of an enclave as an anchor point for resource accumulation regardless of where the transaction is executing.

With traditional workloads it is relatively easy to map the resources consumed to the actual transaction doing the consumption. But with non-traditional workloads – web transactions, distributed processing, etc. – it is more difficult because the transaction can span platforms. Enclaves are used to overcome this difficulty by correlating closely to the end user’s view of the transaction.

So even though a non-traditional transaction can comprise multiple “pieces” spanning many server address spaces, and can share those address spaces with other transactions, the enclave gives you more effective control over the non-traditional workload.

If you are interested in more details on enclaves and how they are managed, read through Enclaves – Managing Business Transactions from IBM’s RMF Newsletter.

Wednesday, May 25, 2011

A Quick SQL Trick: Find The Number of Commas

Today's blog post is a short one. I was recently asked how to return a count of specific characters in a text string column. For example, given a text string, return a count of the number of commas in the string.

This can be done using the LENGTH and REPLACE functions as follows:

SELECT LENGTH(TEXT_COLUMN) - LENGTH(REPLACE(TEXT_COLUMN, ',', ''))

The first LENGTH function simply returns the length of the text string. The second iteration of the LENGTH function in the expression returns the length of the text string after replacing the target character (in this case a comma) with a blank.

So, let's use a string literal to show a concrete example:

SELECT LENGTH('A,B,C,D') - LENGTH(REPLACE('A,B,C,D', ',', ''))
FROM SYSIBM.SYSDUMMY1;

This will translate into 7 - 4... or 3. And there are three commas in the string.
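As an aside, the same trick extends to counting occurrences of a multi-character string: just divide the length difference by the length of the search string. A quick sketch (run against SYSIBM.SYSDUMMY1 so it stands alone):

-- Count occurrences of 'AB' in 'ABxABxABx': (9 - 3) / 2 = 3
SELECT (LENGTH('ABxABxABx') - LENGTH(REPLACE('ABxABxABx', 'AB', '')))
       / LENGTH('AB')
FROM SYSIBM.SYSDUMMY1;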

When confronted with a problem like this it is usually a good idea to review the list of built-in SQL functions to see if you can accomplish your quest using SQL alone.


Friday, May 13, 2011

DB2 -- What's in a Name?

Versions of DB2 exist for a large array of platforms, of which the mainframe (z/OS) is only one. Of course, it is my favorite one since I’ve been working on mainframe technology now for decades and have worked with DB2 since Version 1.

It used to be easy: DB2 meant IBM’s mainframe SQL database management system based on the relational model. But you can’t just say the term “DB2” any more and expect people to understand what you mean.

Today there are variations of DB2 that run on the iSeries (AS/400), on Linux, Unix, and Windows (LUW) platforms, and even one that runs on PDAs and smart phones called DB2 Everyplace. Not to mention the mainframe variations that run on z/OS, VM, and VSE.

These products are all collectively referred to by IBM as the DB2 Family. Individually, each DBMS is referred to as DB2, or sometimes DB2 Universal Database Server. There was a period of time when DB2 for LUW was called UDB and DB2 for z/OS was just called DB2. Then IBM tried to rebrand both as DB2 UDB. But that seems to have gone away several versions ago now.

The proper way to refer to any individual offering in the DB2 family is DB2 for (operating system) (for example, DB2 for z/OS or DB2 for Windows).

Different Code Bases

There are four distinct code bases for the products under the DB2 brand. The mainframe has its own code base, as does the iSeries, and VSE/VM. The fourth code base is for Linux, Unix, and Windows (LUW) platforms—and the other DB2 offerings (e.g. DB2 Everyplace) originate from this code base.

Having a separate code base means that each of these DB2 “products” was developed independently from the others. So, for example, the process used by DB2 for z/OS to optimize SQL differs from the process used by DB2 for Linux. Usually, though, the result is similar—an efficient SQL statement.

But keep in mind that there will be some differences between the DB2s.

Some of the Differences

It is obvious that the different DB2 products are not “plug and play” commodities simply because they all share the name DB2. There are some big differences among these products in their current releases. The biggest differences are relatively easy to detect and include the following:
  • Differences imposed due to operating system constraints
    (OS/400 versus z/OS versus AIX)
  • Back-level compatibility issues
  • Workstation orientation differences such as GUI interfaces and drag-and-drop menus
  • Subsystem-centric implementation (z/OS) versus database-centric implementation (workstation)

Most of these differences are minor and easy to handle. Indeed, IBM has slowly but surely been making these disparate implementations of DB2 more and more alike with each new release and version. The interface (or API) by which most people access any of the DB2 Family is SQL, and there is broad compatibility among the SQL implementations of the members of the DB2 Family (though not 100 percent, of course).

A misconception “out there” in DB2-land is that the LUW platform drives new features, but a review of the changes that have been introduced to DB2 over the past several versions and releases does not bear that out. Some features are introduced on the mainframe first; others on the distributed platforms first.

Of the basic differences mentioned earlier, the only one that might not be obvious is the focus of the DBMS implementation. DB2 for LUW is database-centric. This implies that each new database carries its own system catalog with it. Additionally, it is not possible to simply access tables across different databases; distributed access is required.

On z/OS, DB2 is subsystem-centric. A single system catalog spans databases. Each subsystem has a unique identification, and you can create multiple databases within it. Distributed requests are not required to access databases within the same subsystem (or, indeed, across multiple subsystems in a data-sharing environment).

Another concept that is different at the workstation level is that of a directory. The DB2 for z/OS Directory houses DBMS system-related information regarding DBD structure, skeleton plan and skeleton package tables, RBA log ranges, and utility control data. The information cannot be updated by the user but is managed and controlled by DB2.

At the workstation level, a directory is another matter altogether. For example, the directory structure used by DB2 for LUW controls the overall environment.
  • The System Database Directory identifies the databases that can be accessed from the workstation and contains an entry for each local and remote one. Each database entry contains the database name, alias, entry type, and location.
  • One Volume Database Directory is allocated per disk drive that contains a workstation database. Each entry identifies the location of a specific database on the drive.
  • The Workstation Directory is used to make a connection to a remote database server. It is used in conjunction with the Database Connection Services Directory to make a connection to a remote host server.
  • The Database Connection Services Directory is used by DB2 Connect to make a connection to a remote host server.

Not only is it possible for the user to update these directories, it is required. The workstation directories define the environment of DB2 for LUW. Without the proper information recorded in these directories, DB2 might not function in the desired manner. The information in these directories is somewhat analogous to DB2 for z/OS DSNZPARMs and the SYSDDF system catalog tables.

Database Structures

Not all the objects available to DB2 for z/OS users are supported at the workstation level. For example, hardware-specific DB2 objects such as table spaces and storage groups are not available for DB2 on other platforms, at least not in the same way that mainframers are used to dealing with them. Partitioning and segmenting as it is done on z/OS is not done on other platforms.

However, DB2 for LUW does provide a feature known as a segmented table. But this is not the same concept as a DB2 for z/OS segmented table space. DB2 for LUW segmented tables are used to span volumes, enabling DB2 to get around file size limitations.

The file structure used for databases differs from platform to platform. For example, DB2 for z/OS uses VSAM Linear Data Sets (LDS) or Entry Sequenced Data Sets (ESDS). A database deployed on DB2 for LUW uses two files for table data: one for normal data and a second to store long fields. These workstation files are flat files, not VSAM files.

Although tables are basically the same for all of the DB2 environments, not all of the DDL options are provided in all of the environments.

Optimizer Differences

One of the most significant benefits of relational databases is that they provide built-in optimization. The DB2 for z/OS optimizer is well-known to mainframe DB2 users, but how similar are the other DB2 optimizers?

DB2 for LUW uses the latest and greatest optimization technology from IBM -- the Starburst optimizer (which arose from IBM’s Almaden research lab). Starburst is a database optimization research project that has been covered quite extensively in the academic press.

As one example of the difference, consider that the DB2 for LUW optimizer has varying levels of optimization that can be selected by the user. This concept is not implemented in DB2 for z/OS.

Although some Starburst technology will find its way to DB2 for z/OS, the mainframe DB2 optimizer will not be completely replaced by Starburst technology. Doing so would not be wise because the DB2 for z/OS optimizer has been finely tuned for its environment over the course of almost three decades.

Another interesting tidbit is that DB2 for iSeries provides an access method for programmers in which they can bypass the relational engine. This is not encouraged, but it is available.

Other Differences

Other differences exist between the different implementations of DB2. Some of these are caused by the different release cycles IBM has created for the differing platforms. The bottom line is that there are differences between the DB2s on different platforms. Whenever you use a specific implementation of DB2, be aware of the features it supports that other DB2 platforms do not, as well as the features it lacks that other DB2 platforms support.

Packaging and Naming Issues

The actual name of the DB2 edition can be tricky to master on non-mainframe platforms. On the mainframe you just say “I want DB2,” and that is what you get. Well, almost. You also have to decide whether or not you want IBM’s utilities.

But things are more difficult in the LUW world. The following packages are all available for DB2 on Linux, Unix, and Windows:

DB2 Workgroup Server Edition (WSE) is a multi-user, single-host DBMS aimed at the departmental user. It should be deployed for smaller systems with a limited number of users.

DB2 Enterprise Server Edition (ESE) is the top-of-the-line DB2, with intra-partition parallelism support (the database engine can process segments of an SQL statement in parallel) and inter-partition parallelism support (a query can be processed in parallel across all of the nodes). ESE has Partitioning and Clustering options as additional add-on features. So, this is the enterprise DB2.

DB2 Advanced Enterprise Server Edition (AESE) sounds like a step up from ESE, and it is, kind of... but not really in terms of core DBMS technology. The “advanced” means that IBM integrates Optim and InfoSphere technologies into the product.

DB2 Express Edition is targeted at entry level users at a low price point. Small shops, partners, and new users can build applications on top of DB2 Express.

And DB2 Express-C is IBM’s “free” DBMS offering providing all the “core” capabilities of DB2 at no charge. So why use an open source DBMS when you can get a free version of DB2?

A handy comparison of the editions is available on IBM’s web site.

Summary

So you see, saying “DB2” is not enough any more. Which DB2? They’re all great, but it can take some time to wrap your arms around all of this…

Friday, April 29, 2011

I'll Be Tweeting Live From IDUG

For those of you who use Twitter, make sure you are following me next week (http://www.twitter.com/craigmullins) as I will be tweeting my experiences from the IDUG conference in Anaheim.

If you aren't planning to go, you can follow my Tweets to hear what is going on... and if you are attending the show, you can follow my Tweets to hear my perspective on things...

I arrive in Anaheim Tuesday afternoon, so I will miss the kickoff, but I'll be there the rest of the week.

Tuesday, April 26, 2011

100 Years of IBM


If you have anything at all to do with computers or information technology, you have something to thank IBM for. Watch this video to find out what!

Monday, April 04, 2011

What About Surrogate Keys?

As is so often the case with my blog, today's topic came about as the result of an e-mail question I received from a DBA I know. His question was this:

"A great debate rages here about the use of ‘synthetic’ keys. We read all sorts of articles on the wild wild web but none seem to address the database performance impacts of designs using synthetic keys. I wondered if you could point me to any information on this…"

If you've ever Googled the term "surrogate key" you know the hornet's nest of opinions that swirls around "out there" about the topic. For those who haven't heard the term, here is my attempt at a quick summary: a surrogate key is a generated unique value that is used as the primary key of a database table; database designers tend to consider surrogate keys when the natural key consists of many columns, is very long, or may need to change.
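To make the idea concrete, here is a minimal sketch of a surrogate key implemented as a DB2 identity column; the table and column names are hypothetical:

-- CUST_ID is the generated surrogate key; the natural key
-- columns (last name, first name, birth date) remain in the row
CREATE TABLE CUST
  (CUST_ID     INTEGER NOT NULL
               GENERATED ALWAYS AS IDENTITY,
   CUST_LNAME  VARCHAR(40) NOT NULL,
   CUST_FNAME  VARCHAR(40) NOT NULL,
   BIRTH_DATE  DATE NOT NULL,
   PRIMARY KEY (CUST_ID));
-- (on DB2 for z/OS, a unique index on CUST_ID is also needed
--  to enforce the primary key)

-- DB2 generates the key value; you do not supply one on INSERT
INSERT INTO CUST (CUST_LNAME, CUST_FNAME, BIRTH_DATE)
  VALUES ('SMITH', 'PAT', '1970-01-01');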

And here is the response I sent to my e-mail inquisitor:

I doubt that there is any “final word” on this topic. It has been raging on for years and years; some folks pro, others con. This Wikipedia article offers up a nice start: http://en.wikipedia.org/wiki/Surrogate_key

However, when I get to the performance area of this article I don’t think I agree. The article puts a lot of emphasis on there being fewer columns to join and therefore better performance. If you’ve got an index on those multiple columns, how much “worse” will the performance be, really? Sure, the SQL is more difficult to write, but will a join over 4 or 5 indexed columns perform that much worse than a join on one indexed column? I suppose as the number of columns required for the natural key increases the impact could be greater (e.g., 10 columns?).

I guess I can see the argument if you are swapping a variable length key with a surrogate having a fixed length key – that should improve things!

Furthermore, consider this: the natural key columns are still going to be there; after all, they are naturally part of the data, right? So the surrogate (synthetic) key gets added to each row. This will likely reduce the number of rows per page (maybe not, but probably). And that, in turn, will negatively impact the performance of sequential access because more I/O will be required to read the “same” number of rows.

And what about the impact of adding data? If there are a significant number of new rows being added at the same time by different processes there will be locking issues as they all try to put the new data on the same page (unless, of course, your surrogate key is not a sequential number and is, instead, something like the microseconds portion of the current timestamp [that must be tested to avoid duplicates]).

The one thing that usually causes me to tend to favor natural keys is just that – they are natural. If the data is naturally occurring it becomes easier for end users to remember it and use it. If it is a randomly generated surrogate nobody will actually know the data. Yes, this can be masked to a great deal based on the manner in which you build your applications to access the data, but ad hoc access becomes quite difficult.

I guess the bottom line is that “it depends” on a lot of different things! No surprise there, I suppose.

Here are a few other resources with information (not so much on performance though) that you may or may not have reviewed already:

What do you think about natural keys versus surrogate keys? Surely some readers here have an opinion on this topic! If so, post them as comments...

Wednesday, March 09, 2011

DB2 Symposium 2011

Today's blog post is about a great symposium dedicated to the topic of DB2. It is called, appropriately enough, the DB2 Symposium. DB2 Symposium is a three day training event with one day seminars presented by well-known DB2 consultants. I was fortunate enough to be asked to participate this year by the primary organizer of the event, Klaas Brant. For those of you who don't know him, Klaas is a well-respected DB2 consultant based in the Netherlands... and an all around great guy.

Why should I attend the DB2 Symposium, you may ask? Don't IDUG and IOD provide everything I need in the way of events? Well, DB2 Symposium fills the gap between a conference and a multi-day training course. The DB2 Symposium is unique because you can participate for 1, 2, or 3 days, depending on your needs and budget.

Although it has not been held in the USA for the past few years, the DB2 Symposium is a regular, well-known event in Europe! And now, after that period of absence, the DB2 Symposium is back in the USA.

The USA DB2 Symposium is happening soon, so you'll need to act fast if you want to participate. It occurs March 21-23, 2011 in the Dallas, Texas area. More precisely, at the Hilton Arlington (2401 East Lamar Boulevard, Arlington, Texas, USA 76006-7503). Each day the training sessions start at 9.00am and end at around 5.00pm.

But registration on site is not possible; you must pre-register online... so plan ahead!

My session is on March 21st and it is called DB2 Developer's Guide Comes Alive! This one-day session covers tips, techniques, and procedures you need to know in order to excel at administering and using DB2 on the mainframe. The material is based upon DB2 Developer's Guide, the best-selling DB2 for z/OS book on the market. Additionally, the course material will contain references to sections of the book for students to find additional material on each topic after the sessions. Topics to be covered will include:

  • A performance tuning roadmap for managing DB2 application, database and system performance. You will learn SQL coding and tuning techniques, guidance for database optimization and reorganization, coverage of buffer pool settings and parameters for performance.
  • Logical and physical database design recommendations for DB2, so you can build and maintain effective DB2 databases immediately. Includes discussion of standards, logical to physical translation, data types, usage of nulls, and more.
  • Information and guidance on BINDing and REBINDing, along with a discussion of the most important parameters.
  • Along the way we'll look at locking, access paths, statistics, indexing and more.
  • And even though the current edition of the book covers through DB2 V8, this course adds coverage of some of the newer features added to DB2 in versions 9 and 10 that can boost your productivity and performance.

If you own the book already, bring it along and I'll be happy to autograph it for you. And then you can use it along with the course materials... and if you don't own it already, you'll probably want to grab a copy after attending the seminar... you can always find a link to buy my books on the front page of my web site at http://www.craigsmullins.com.

So register for the DB2 Symposium today... and I'll see you in Dallas, pardner!

Monday, February 21, 2011

Not Your Standard Sorting Requirement

Sometimes the requirements of a particular application dictate that data needs to be sorted using some irregular collating sequence. These odd needs sometimes cause developers to sit and scratch their heads for hours, searching for ways to make DB2 do something that seems to be "unnatural." But often you can create an answer just by understanding the problem and applying some creative SQL.

At this point, some of you might be asking "What the heck is he talking about?" Fair enough. Let’s take a look at an example to bring the issue into focus.

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing an abbreviation for the name of the day on which the transaction happened; let’s call this column DAY_NAME. So, for example, the DAY_NAME column would contain MON for Monday data, and so on.

Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, the first step is usually to write the query that first comes to mind, something like this:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY DAY_NAME;

Of course, the results will be sorted improperly. ORDER BY will sort the results alphabetically; in other words: FRI MON SAT SUN THU TUE WED

This is what I mean by an irregular sorting requirement. Here we have an example that occurs commonly enough, but without an obvious immediate solution. Furthermore, many businesses and applications have similar requirements for which the business needs dictate a different sort order than strictly alphabetical or numeric. So what is the solution here?

Of course, one solution would be to design the table with an additional numeric or alphabetic column that would sort properly. By this, I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. But this requires a database design change, and it becomes possible for the DAY_NUM and DAY_NAME to get out of sync if you are not careful.

There is another solution that is both elegant and does not require any change to the database. To implement this solution, all you need is an understanding of SQL and SQL functions -- in this case, the LOCATE function. Consider this SQL:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT');

The trick here is to understand how the LOCATE function works. The LOCATE function returns the starting position of the first occurrence of one string within another string.

So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position. So, if DAY_NAME is WED, the LOCATE function returns 10. (Note: Some other database systems have a similar function called INSTR.) Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value, given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT')/3) + 1;

Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week.
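An alternative, if you would rather avoid the string-position trick, is a simple CASE expression in the ORDER BY clause. A sketch, using the same hypothetical TXN_TABLE:

SELECT DAY_NAME, COL1, COL2
FROM TXN_TABLE
ORDER BY CASE DAY_NAME
           WHEN 'SUN' THEN 1
           WHEN 'MON' THEN 2
           WHEN 'TUE' THEN 3
           WHEN 'WED' THEN 4
           WHEN 'THU' THEN 5
           WHEN 'FRI' THEN 6
           WHEN 'SAT' THEN 7
         END;

The CASE version is wordier, but it is self-documenting and does not depend on every abbreviation being exactly three characters long.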

Summary

With a sound understanding of the features of DB2 SQL and a little imagination many irregular requirements are achievable using nothing more than SQL!

Thursday, February 10, 2011

View Naming Conventions

Naming conventions sometimes instigate conflict within the world of DB2, especially when it comes to views. But, really, it should be very easy. Just always remember that a view is a logical table. It consists of rows and columns, exactly the same as a DB2 table. A DB2 view can (syntactically) be used in SQL SELECT, UPDATE, DELETE, and INSERT statements in the same way that a DB2 table can. Furthermore, a view can be used functionally the same as a DB2 table (with certain limitations on updating as outlined in my article).

Therefore, shouldn't it stand to reason that views should be held to the same naming conventions as are used for tables? (As an aside, the same can be said for DB2 aliases and synonyms).

End users querying views don't need to know whether they are accessing a view or a table. That is the whole purpose of views. Why, then, should we enforce an arbitrary naming standard on views, such as putting a V in the first or last position of a view name?

DBAs and technical analysts, those individuals who have a need to differentiate between tables and views, can utilize the DB2 Catalog to determine which objects are views and which objects are tables.
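For example, a quick query against the DB2 Catalog along these lines will tell you what kind of object you are dealing with (a sketch; the TYPE column of SYSIBM.SYSTABLES distinguishes the object types):

-- TYPE = 'T' for a table, 'V' for a view, 'A' for an alias
SELECT CREATOR, NAME, TYPE
FROM   SYSIBM.SYSTABLES
WHERE  NAME = 'MKT_CONTACT';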

Most users don't care whether they are using a table, view, synonym, or alias. They simply want to access the data. And, in a relational database, tables, views, synonyms, and aliases all logically appear to be identical to the end user: collections of rows and columns.

There are certain operations that cannot be performed on certain types of views, but the end users who need to know this will generally be sophisticated users. For example, very few shops allow end users to update any table they want using a report writer or query tool (e.g. QMF, SPUFI, etc.). Updates, deletions, and insertions (the operations which are not available to some views) are generally coded into application programs and executed in batch or via online transactions. Most end users need to query tables dynamically.

Now you tell me, which name will your typical end user remember more readily when he needs to access his marketing contacts: MKT_CONTACT or VMKTCT01?

Thursday, February 03, 2011

TIMESTAMP versus DATE/TIME

Consider a database design decision point where you need to store both date and time information on a single row in DB2. Is it better to use a single TIMESTAMP column or two columns, one DATE and the other TIME?


Well, of course, the answer is "it depends!" The correct solution will depend on several factors specific to your situation. Consider the following points before making your decision:

  • With DATE and TIME you must use two columns. TIMESTAMP uses one column, thereby simplifying data access and modification.
  • The combination of DATE and TIME columns requires 7 bytes of storage, while a TIMESTAMP column always requires 10 bytes of storage. Using the combination of DATE and TIME columns will save space.
  • TIMESTAMP provides greater time accuracy, down to the microsecond level. TIME provides accuracy only to the second level. If precision is important, use TIMESTAMP. Use TIME if you want to ensure that the actual time is NOT stored down to the microsecond level.
  • A TIMESTAMP can always be broken down into a DATE and a TIME component, after which you can treat the data just like DATE and TIME data.
  • Date and time arithmetic probably will be easier to implement using TIMESTAMP data instead of a combination of DATE and TIME. Subtracting one TIMESTAMP from another results in a TIMESTAMP duration. To calculate a duration using DATE and TIME columns, two subtraction operations must occur: one for the DATE column and one for the TIME column.
  • Formatting may be easier with DATE and TIME data. DB2 provides for the formatting of DATE and TIME columns via local DATE and TIME exits, the CHAR function, and the DATE and TIME precompiler options. If the date and time information is to be extracted and displayed on a report or by an online application, the availability of these DB2-provided facilities for DATE and TIME columns should be considered when making your decision.
  • Prior to DB2 V9, not much help was available for the formatting of TIMESTAMP columns. But DB2 9 for z/OS adds the TIMESTAMP_FORMAT function, which offers three different formats for displaying timestamp data.

Upon reviewing all of these details, and factoring in your usage requirements, you can then make an informed decision about whether to use one TIMESTAMP column, or two columns, one DATE and one TIME.
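To illustrate breaking a TIMESTAMP apart and computing durations, here is a minimal sketch, assuming a hypothetical TXN table with a TIMESTAMP column named TXN_TS:

-- Extract the DATE and TIME components of the timestamp
SELECT DATE(TXN_TS) AS TXN_DATE,
       TIME(TXN_TS) AS TXN_TIME,
       -- format the date portion in European (DD.MM.YYYY) style
       CHAR(DATE(TXN_TS), EUR) AS TXN_DATE_EUR,
       -- subtracting two timestamps yields a timestamp duration
       CURRENT TIMESTAMP - TXN_TS AS AGE
FROM   TXN;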