Monday, February 21, 2011

Not Your Standard Sorting Requirement

Sometimes the requirements of a particular application dictate that data needs to be sorted using some irregular collating sequence. These odd needs sometimes cause developers to sit and scratch their heads for hours, searching for ways to make DB2 do something that seems to be "unnatural." But often you can create an answer just by understanding the problem and applying some creative SQL.

At this point, some of you might be asking "What the heck is he talking about?" Fair enough. Let’s take a look at an example to bring the issue into focus.

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing an abbreviation for the name of the day on which the transaction happened; let’s call this column DAY_NAME. So, for example, the DAY_NAME column would contain MON for Monday data, and so on.

Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, the first step is usually to write the first query that comes to mind, or something like this:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY DAY_NAME;

Of course, the results will be sorted improperly. ORDER BY will sort the results alphabetically; in other words: FRI MON SAT SUN THU TUE WED

This is what I mean by an irregular sorting requirement. Here we have an example that occurs commonly enough, but without an obvious immediate solution. Furthermore, many businesses and applications have similar requirements for which the business needs dictate a different sort order than strictly alphabetical or numeric. So what is the solution here?

Of course, one solution would be to design the table with an additional numeric or alphabetic column that would sort properly. By this, I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. But this requires a database design change, and it becomes possible for the DAY_NUM and DAY_NAME to get out of sync if you are not careful.

There is another solution that is both elegant and does not require any change to the database. To implement this solution, all you need is an understanding of SQL and SQL functions -- in this case, the LOCATE function. Consider this SQL:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME);

The trick here is to understand how the LOCATE function works. The LOCATE function returns the starting position of the first occurrence of one string within another string.

So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position. So, if DAY_NAME is WED, the LOCATE function returns 10. (Note: Some other database systems have a similar function called INSTR.) Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value, given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME)/3) + 1;

Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week.

Summary

With a sound understanding of the features of DB2 SQL and a little imagination many irregular requirements are achievable using nothing more than SQL!

Thursday, February 10, 2011

View Naming Conventions

Naming conventions sometimes instigate conflict within the world of DB2, especially when it comes to views. But, really, it should be very easy. Just always remember, that a view is a logical table. It consists of rows and columns, exactly the same as a DB2 table. A DB2 view can (syntactically) be used in SQL SELECT, UPDATE, DELETE, and INSERT statements in the same way that a DB2 table can. Furthermore, a view can be used functionally the same as a DB2 table (with certain limitations on updating as outlined in my article).

Therefore, shouldn't it stand to reason that views should be held to the same naming conventions as are used for tables? (As an aside, the same can be said for DB2 aliases and synonyms).

End users querying views don't need to know whether they are accessing a view or a table. That is the whole purpose of views. Why then, should we enforce an arbitrary naming standard, such as putting a V in the first or last position of a view name, on views?

DBAs and technical analysts, those individuals who have a need to differentiate between tables and views, can utilize the DB2 Catalog to determine which objects are views and which objects are tables.

Most users don't care whether they are using a table, view, synonym, or alias. They simply want to access the data. And, in a relational database, tables, views, synonyms, and aliases all logically appear to be identical to the end user: collections of rows and columns.

There are certain operations that cannot be performed on certain types of views, but the end users who need to know this will generally be sophisticated users. For example, very few shops allow end users to update any table they want using a report writer or query tool (e.g. QMF, SPUFI, etc.). Updates, deletions, and insertions (the operations which are not available to some views) are generally coded into application programs and executed in batch or via online transactions. Most end users need to query tables dynamically.

Now you tell me, which name will your typical end user remember more readily when he needs to access his marketing contacts: MKT_CONTACT or VMKTCT01?

Thursday, February 03, 2011

TIMESTAMP versus DATE/TIME

Consider a database design decision point where you need to store both date and time information on a single row in DB2. Is it better to use a single TIMESTAMP column or two columns, one DATE and the other TIME?


Well, of course, the answer is "it depends!" The correct solution will depend on several factors specific to your situation. Consider the following points before making your decision:

  • With DATE and TIME you must use two columns. TIMESTAMP uses one column, thereby simplifying data access and modification.
  • The combination of DATE and TIME columns requires 7 bytes of storage, while a TIMESTAMP column always requires 10 bytes of storage. Using the combination of DATE and TIME columns will save space.
  • TIMESTAMP provides greater time accuracy, down to the microsecond level. TIME provides accuracy only to the second level. If precision is important, use TIMESTAMP. Use TIME if you want to ensure that the actual time is NOT stored down to the microsecond level.
  • A TIMESTAMP can always be broken down into a DATE and a TIME component, after which you can treat the data just like DATE and TIME data.
  • Date and time arithmetic probably will be easier to implement using TIMESTAMP data instead of a combination of DATE and TIME. Subtracting one TIMESTAMP from another results in a TIMESTAMP duration. To calculate a duration using DATE and TIME columns, two subtraction operations must occur: one for the DATE column and one for the TIME column.
  • Formatting may be easier with DATE and TIME data. DB2 provides for the formatting of DATE and TIME columns via local DATE and TIME exits, the CHAR function, and the DATE and TIME precompiler options. If the date and time information is to be extracted and displayed on a report or by an online application, the availability of these DB2-provided facilities for DATE and TIME columns should be considered when making your decision.
  • Prior to DB2 V9, not much help was available for the formatting of TIMESTAMP columns.But DB2 9 for z/OS adds the TIMESTAMP_FORMAT function, which offers three different formats for displaying timestamp data.
Upon reviewing all of these details, and factoring in your usage requirements, you can then make an informed decision about whether to use one TIMESTAMP column, or two columns, one DATE and one TIME.