Thursday, July 12, 2007

New Data Types [DB2 9 for z/OS]

As we continue our blog series on the new features and functionality of DB2 9 for z/OS, today we examine the four (OK, five) new data types introduced in this version of DB2.

BIGINT

First up, we have the BIGINT data type. A BIGINT is an exact numeric data type: a 64-bit signed binary integer (63 bits of magnitude plus a sign bit). This is the third integer data type available to DB2, and it offers the largest range of values of the three:

  • SMALLINT values can range from -32768 to 32767
  • INTEGER values can range from -2147483648 to 2147483647
  • BIGINT values can range from -9223372036854775808 to 9223372036854775807

So when you have the need to store very large integers you don’t have to use DECIMAL with a zero scale any longer.
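For example, here is a hedged sketch (the table and column names are hypothetical, not from the DB2 sample database) of a column that could overflow an INTEGER:

CREATE TABLE WEB_METRICS
  (METRIC_DATE DATE NOT NULL,
   PAGE_HITS   BIGINT NOT NULL);

INSERT INTO WEB_METRICS
VALUES (CURRENT DATE, 5000000000);

The inserted value, 5 billion, exceeds the INTEGER maximum but fits comfortably in a BIGINT.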

BINARY and VARBINARY

Next up, and perhaps more exciting, V9 delivers a true binary data type for the first time in DB2, so we no longer have to use a BLOB or CHAR with FOR BIT DATA. BINARY is a fixed-length binary string up to 255 bytes. DB2 9 also delivers a VARBINARY data type, which is a variable-length binary string up to 32704 bytes.

BINARY and VARBINARY data types extend the current support of binary strings (BLOB), and are compatible with the BLOB data type. They are not compatible with character string data types. IBM does make it somewhat easy to migrate existing columns defined as CHAR FOR BIT DATA or VARCHAR FOR BIT DATA over to BINARY or VARBINARY. If there is an index defined on the column, the index is placed in rebuild-pending (RBDP) status. Be aware, though, that the conversion is one way: you cannot alter BINARY or VARBINARY data types back to CHAR FOR BIT DATA or VARCHAR FOR BIT DATA.
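For instance, here is a minimal sketch of such a migration, assuming a hypothetical table T1 with a column C1 defined as CHAR(8) FOR BIT DATA:

ALTER TABLE T1
  ALTER COLUMN C1 SET DATA TYPE BINARY(8);

Remember, after an ALTER like this any index defined on C1 will be placed in RBDP and will need to be rebuilt.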

Also, there are some usage considerations to keep in mind. Two binary strings are equal only if their lengths are identical. If two strings are equal up to the length of the shorter string, the shorter string is considered less than the longer string, even when the remaining bytes in the longer string are hex zeroes.

DECFLOAT

The next new data type supported by DB2 9 for z/OS is DECFLOAT. V9 takes advantage of new System z9 hardware support delivering the DECFLOAT data type that lets you use decimal floating-point numbers with greater precision. A decimal floating-point value (DECFLOAT) is an IEEE 754r number with a decimal point.

A decimal floating-point number has a precision of either 16 or 34 digits, so the maximum precision is 34 digits. The ranges are as follows:


DECFLOAT(16) values can range from a low of -9.999999999999999×10**384 to a high of 9.999999999999999×10**384.

DECFLOAT(34) values can range from a low of -9.999999999999999999999999999999999×10**6144 to a high of 9.999999999999999999999999999999999×10**6144.

Anyone think they’ll need bigger numbers?

In addition, the DECFLOAT data type is able to represent special values representing “non-number numbers”:

  • Infinity - a value that represents a number whose magnitude is infinitely large.
  • Quiet NaN - a value that represents undefined results and that does not cause an invalid number condition. (NaN is an abbreviation for “not a number.”)
  • Signaling NaN - a value that represents undefined results which will cause an invalid number condition if used in any numerical operation.
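To put the new data type to use, here is a minimal sketch; the table is hypothetical, and supplying special values as strings via CAST is my assumption of the typical approach, so verify it against the SQL Reference:

CREATE TABLE MEASUREMENT
  (ID     INTEGER NOT NULL,
   AMOUNT DECFLOAT(34));

INSERT INTO MEASUREMENT VALUES (1, CAST('NaN' AS DECFLOAT(34)));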

XML

And finally, we get pureXML support to store XML as a native data type in DB2. That means you can specify XML as a data type for columns in your DB2 tables in DB2 9 for z/OS. But I’m not really going to elaborate any further on XML here. A few more details can be found in an earlier post here.

Monday, July 09, 2007

New Built-in Functions [DB2 9 for z/OS]

DB2 9 for z/OS introduces a bevy of new built-in functions (BIFs) for programmers to use in their SQL statements. It is important to keep track of the BIFs available in DB2 because BIFs simplify your coding and development. Invoking a function is always easier than trying to write the equivalent functionality in your host language code. And the BIF will work properly, whereas you cannot always be so sure about your own (sometimes buggy) code.

So, what new function functionality do we get with DB2 9 for z/OS? First of all, we get some new ASCII and EBCDIC conversion functions. The ASCII_CHR function returns the character that has the ASCII code value specified by the argument; and the ASCII_STR function returns a string, in the system ASCII CCSID, that is an ASCII version of the argument string. Knowing that, I bet you can guess what EBCDIC_CHR and EBCDIC_STR do.
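For instance, since the ASCII code value for the letter A is 65, this quick sketch of a query should return 'A':

SELECT ASCII_CHR(65)
FROM SYSIBM.SYSDUMMY1;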

We also get UNICODE and UNICODE_STR functions in DB2 9 for z/OS. The UNICODE function returns the Unicode UTF-16 code value of the leftmost character of the argument as an integer. And the UNICODE_STR function? It returns a string in Unicode UTF-8 or UTF-16 (depending on the specified parameter) representing a Unicode encoding of the input string.
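As a quick sketch, because the UTF-16 code value for the letter A is X'0041' (decimal 65), the following should return 65:

SELECT UNICODE('A')
FROM SYSIBM.SYSDUMMY1;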

Perhaps more interesting is soundex support for testing whether two strings sound the same as each other. DB2 offers two functions here: SOUNDEX and DIFFERENCE. The SOUNDEX function returns a 4 character code that represents the sound of the words in the argument. The result can be used to compare with the sound of other strings. The data type of the result is CHAR(4). So what does this mean? Consider the following example:




SELECT LASTNAME
FROM DSN8910.EMP
WHERE SOUNDEX(LASTNAME) = SOUNDEX('Smith');



This query would return not only employees with a last name of “Smith,” but also anything that sounds like Smith, such as Smythe.

The DIFFERENCE function is related to SOUNDEX. It returns a value from 0 to 4 where the number represents the difference between the sounds of two strings based on applying the SOUNDEX function to the strings. The higher the value, the closer the two strings are to sounding alike. Consider:


SELECT DIFFERENCE('CONSTRAINT','CONSTANT'),
       SOUNDEX('CONSTRAINT'),
       SOUNDEX('CONSTANT')
FROM SYSIBM.SYSDUMMY1;

This example returns the values 4, C523, and C523. Since the two strings return the same SOUNDEX value, the difference is 4 (the highest value possible). The more different-sounding the two strings are, the smaller the number would be.

Another interesting series of BIFs are focused on date and time data. These functions include EXTRACT, MONTHS_BETWEEN, and various new timestamp-related functions.

EXTRACT returns a portion of a date or timestamp. You can use EXTRACT to slice up a date/time value into its component pieces. Consider:

SELECT BIRTHDATE,
EXTRACT(DAY FROM BIRTHDATE) AS DAY,
EXTRACT(MONTH FROM BIRTHDATE) AS MONTH,
EXTRACT(YEAR FROM BIRTHDATE) AS YEAR
FROM DSN8910.EMP;

This query would return the entire date, along with each component (year, month, day) of the date as a separate column. You can use one function (EXTRACT) to do the job of the already-existing YEAR, MONTH, and DAY functions. Of course, YEAR, MONTH, and DAY are still available for your use. Similar functionality for EXTRACT exists for time components using HOUR, MINUTE, and SECOND.
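For completeness, here is a minimal sketch of the time variant:

SELECT EXTRACT(HOUR FROM CURRENT TIME) AS HOUR,
       EXTRACT(MINUTE FROM CURRENT TIME) AS MINUTE,
       EXTRACT(SECOND FROM CURRENT TIME) AS SECOND
FROM SYSIBM.SYSDUMMY1;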

The next temporal BIF introduced with DB2 9 for z/OS is the MONTHS_BETWEEN function. It returns an estimate of the number of months between two expressions. If the first expression represents a date that is later than the second, the result will be positive; if the opposite is true, the result will be negative. The result is calculated based on a 31-day month.

For example, consider this statement:

SELECT MONTHS_BETWEEN ('2007-02-20','2007-01-17')
AS MONTHS_BETWEEN
FROM SYSIBM.SYSDUMMY1;

The result of this query would be 1.096774193548387.

We also get four new timestamp-related BIFs: TIMESTAMPADD, TIMESTAMPDIFF, TIMESTAMP_FORMAT, and TIMESTAMP_ISO. The first two are rather straightforward: TIMESTAMPADD adds an interval to a timestamp and TIMESTAMPDIFF subtracts two timestamps and returns an interval. I will not get into a discussion of date/time arithmetic here, but if you are interested check out my article Q+As on Dates and DB2.

What about the other two timestamp-related BIFs? TIMESTAMP_FORMAT offers the much-needed ability to choose different display formats for a timestamp value. The valid formats that can be specified are:
• ‘YYYY-MM-DD’
• ‘YYYY-MM-DD-HH24-MI-SS’
• ‘YYYY-MM-DD-HH24-MI-SS-NNNNNN’

And instead of using the dash ( - ) as the separator, you can also use . / , : ; and blank. These separators can be used in any combination. And the VARCHAR_FORMAT function returns a character representation of a timestamp in a format specified as above. So, consider the following query:

SELECT SUBSTR(NAME,1,8) AS TSNAME,
       VARCHAR_FORMAT(CREATEDTS,'YYYY-MM-DD-HH24:MI:SS') AS TSCR
FROM SYSIBM.SYSTABLESPACE
WHERE CREATEDTS >=
      TIMESTAMP_FORMAT('2007-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS');

This query will return the names of all table spaces created since the first of the year, along with their creation timestamps in the specified format.

We also get some new string manipulation functions. The LOCATE_IN_STRING function returns the starting position of the first occurrence of one string within another string. This is basically the same as the existing LOCATE function, although the arguments are specified in a different order.
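As a sketch of how I understand the argument order (source string first, search string second), this query should return 6, the position at which 'GURU' begins:

SELECT LOCATE_IN_STRING('DBMS GURU', 'GURU')
FROM SYSIBM.SYSDUMMY1;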

We also get LPAD and RPAD functions. The LPAD function returns a string that is padded on the left, with blanks (or a specific character). The LPAD function treats leading or trailing blanks as significant. RPAD, of course, does the same but on the right. So, consider the following example:

SELECT LPAD(LASTNAME, 30, '.') AS LAST,
       RPAD(FIRSTNME, 30) AS FIRST
FROM DSN8910.EMP;

This query will left pad the last name with periods and right pad the first name with blanks.

OVERLAY is yet another new string manipulation function. It allows you to return a string with portions of it overlaid by a specified value. You provide the string, a substring to be overlaid, and its starting point and length, and DB2 does the rest. Learning by example is simpler than trying to explain how it works, so here goes:

SELECT CHAR(OVERLAY('PLATELET','CEMEN',4,4,OCTETS),9),
       CHAR(OVERLAY('INSERTING','IS',4,2,OCTETS),10)
FROM SYSIBM.SYSDUMMY1;

The results returned by this query would be:
• 'PLACEMENT ' (starting at position 4 overlay 4 bytes with 5 bytes 'CEMEN')
• 'INSISTING' (starting at position 4 overlay 2 bytes with 'IS')

Additional BIFs include RID, which returns the RID of a row, COLLATION_KEY which returns a varying-length binary string that represents the collation key of the expression in a named collation, DECRYPT_BINARY which adds the ability to decrypt the new BINARY and VARBINARY data types, and NORMALIZE_STRING which takes a Unicode string argument and returns a normalized string. And, of course, we also get scalar functions to support the new data types in DB2 9 (BIGINT, BINARY, VARBINARY and DECFLOAT).

Finally, DB2 9 for z/OS adds three new aggregate functions: CORRELATION, COVARIANCE, and COVARIANCE_SAMP.

The CORRELATION function returns the coefficient of correlation of a set of number pairs. COVARIANCE returns the population covariance of a set of number pairs, and COVARIANCE_SAMP returns the sample covariance.
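For example, here is a hedged sketch that measures how strongly salary and bonus move together across the sample employees:

SELECT CORRELATION(SALARY, BONUS) AS CORR_SAL_BONUS
FROM DSN8910.EMP;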

So, as you can see, DB2 9 for z/OS continues adding new and useful functions to simplify our development efforts with DB2.

Friday, June 29, 2007

Implicitly Created Database Objects [DB2 9 for z/OS]

Today we continue our series on new features in DB2 V9 with a quick discussion of implicitly created database objects. To understand what this is, let’s first review the way DB2 works today (pre-V9). If you issue a CREATE TABLE statement and do not specify the database and table space into which the table is to be created, DB2 will automagically create a new table space in the default database (DSNDB04).

Not being content with that, DB2 9 extends this capability with the ability to implicitly create additional types of database objects. By coding your CREATE TABLE statement with the proper options you can implicitly create any or all of the following:
  • Database
  • Table space
  • Index to enforce Primary Key uniqueness
  • Index to enforce unique constraint
  • ROWID index (if the ROWID is defined as GENERATED BY DEFAULT)
  • LOB structures (LOB table space, auxiliary table, auxiliary index)

OK, so how does this happen? Let's go down the list. If you fail to specify the IN clause on a CREATE TABLE, DB2 now works a bit differently. In the past, DB2 would simply create an implicit table space in DSNDB04. As of DB2 9, the database is involved as well. DB2 will either create an implicit database or use a previously implicitly created database. The names of these implicitly created databases will range from DSN00001 to DSN60000. The first time, DB2 will create DSN00001, the second time DSN00002, and so on until DSN60000 is reached. Then DB2 will wrap around and start again from the beginning, reusing the existing implicitly created databases. For implicitly created databases, the STOGROUP will be set to SYSDEFLT; buffer pool values are determined via DSNZPARMs.

Next up is the table space. Although DB2 has supported implicitly created table spaces forever, there are some twists in DB2 9. First of all, you cannot create simple table spaces any longer, so all implicitly created table spaces will be segmented. In compatibility mode (CM), an implicitly created table space will be defined with SEGSIZE 4 and LOCKSIZE ROW. After migrating to new function mode (NFM), your implicitly created table spaces will be created as partition-by-growth table spaces. The options used will be SEGSIZE 4, DSSIZE 4G, MAXPARTITIONS 256, LOCKSIZE ROW, and LOCKMAX SYSTEM.

    As for the rest of the objects in the list, these system-required objects will always be implicitly created if the table space is created implicitly. For indexes that support the primary key or unique constraints, the names will be generated using a combination of the table name and randomly generated characters.

OK, so now that you know about the ability of DB2 9 to implicitly create objects, let me give you some advice: whenever possible, don't rely on it. It is much better, if at all feasible, for your DBAs to explicitly create and name all database objects as needed. Yes, it takes more time, but it gives you more control. You can explicitly decide which objects go into which database, you can explicitly set parameters, and so on.

So, this is a nice new feature and it can enable DB2 to do some definitional things for you automatically. But most DBAs will want to continue to do things the traditional way, that is, building their DDL themselves without relying on implicitly created objects.

    Monday, June 18, 2007

    Skipping Locked Rows [DB2 9 for z/OS]

    In DB2 9 it is possible for a transaction to skip over rows that are locked. This can be accomplished by means of the SKIP LOCKED DATA option within your SQL statement(s). SKIP LOCKED DATA can be specified in SELECT, SELECT INTO, and PREPARE, as well as searched UPDATE and DELETE statements. You can also use the SKIP LOCKED DATA option with the UNLOAD utility.

    Of course, if a program skips over locked data then that data is not accessed and the program will not have it available. When this option is used DB2 will just skip over any locked data instead of waiting for it to be unlocked. The benefit, of course, is improved performance because you will not incur any lock wait time. But it comes at the cost of not accessing the locked data at all. This means that you should only utilize this clause when your program can tolerate skipping over some data.

    The SKIP LOCKED DATA option is compatible with cursor stability (CS) isolation and read stability (RS) isolation. But it cannot be used with uncommitted read (UR) or repeatable read (RR) isolation levels. DB2 will simply ignore the SKIP LOCKED DATA clause under UR and RR isolation levels.

    Additionally, SKIP LOCKED DATA works only with row locks and page locks. That means that SKIP LOCKED DATA does not apply to table, partition, LOB, XML, or table space locks.
    Let's look at an example. Suppose we have a table with 5 rows in it that looks like this:

    KEY FNAME LNAME
    --- ------ -------
    1 JOE MAMA
    2 DON KNOTTS
    3 KIM PORTANT
    4 BOB NOBBLE
    5 KIM BIMBO


Assume row-level locking. Next assume that an UPDATE statement is run against the table changing FNAME to JIM WHERE FNAME = 'KIM', and that it is hanging out there without a COMMIT.
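That uncommitted UPDATE would look something like this (a sketch using the placeholder table name from this example):

UPDATE TABLE
SET FNAME = 'JIM'
WHERE FNAME = 'KIM';

Because no COMMIT has been issued, rows 3 and 5 remain locked. Next, we run: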


SELECT COUNT(*)
FROM TABLE
WHERE FNAME >= 'AAA'
SKIP LOCKED DATA;

    The count returned would be 3 because DB2 skips the two locked rows (rows 3 and 5). And, of course, if the locks are released the count would be 5 again.

    Wednesday, June 13, 2007

    Do You Want to Ignore Clustering? [DB2 9 for z/OS]

    DB2 9 for z/OS offers a new DDL parameter for your tables: APPEND. If you specify APPEND NO, which is the default, DB2 will operate as you are accustomed to it operating. That is, when rows are inserted or loaded DB2 will attempt to sequence them based on the clustering index key.

    If you specify APPEND YES though, DB2 will ignore clustering during inserts and online LOAD processing. Instead of attempting to maintain clustering, DB2 will just append the rows at the end of the table or partition. If the table space is partition-by-growth (new DB2 9 feature) then DB2 can use any partition with space available at the end; for range-partitioned table spaces, obviously DB2 will append the data to the end of the partition corresponding to the range for the value(s) being inserted.

    You might want to choose this option to speed up the addition of new data. Appending data is faster because DB2 does not have to search for the proper place to maintain clustering. And you can always re-cluster the table by running a REORG.
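Here is a minimal sketch (the table and table space names are hypothetical) of specifying the option at CREATE time:

CREATE TABLE EVENT_LOG
  (EVENT_TS   TIMESTAMP NOT NULL,
   EVENT_TEXT VARCHAR(200))
  IN DB.TS
  APPEND YES;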

    The APPEND option cannot be specified on LOB tables, XML tables, or tables in work files.

    To track the state of the APPEND option there is a new column, APPEND, in the DB2 Catalog in SYSTABLES. Its value will be either ‘Y’ or ‘N’.

    Monday, June 04, 2007

    Reordered Row Format [DB2 9 for z/OS]

    If you’ve worked with DB2 for awhile, especially as a DBA, you’ve probably heard the advice to re-arrange the columns of your tables to optimize logging efficiency. Basically, the more data that DB2 has to log, the more overhead your programs will incur, and performance will degrade. DB2 will log data from the first byte changed to the last byte changed – unless the row is variable, in which case DB2 will log from the first byte changed to the end of the row – unless the change does not cause the length of the variable row to change, in which case DB2 goes back to logging from the first byte changed to the last byte changed.

So, the advice goes something like this: put your static columns (those that do not change frequently) at the beginning of the row and your dynamic columns (those that change more frequently) at the end of the row. And within each grouping, put the variable-length columns at the end. This would make your row look something like this:

    [Static fixed-length cols]
    [Static variable cols]
    [Dynamic fixed-length cols]
    [Dynamic variable cols]

    Make sense?

    Well, DB2 9 for z/OS takes this advice to heart (sort of). In New Function Mode (NFM), for new table spaces, DB2 will automatically put the variable columns at the end of the row. This is called reordered row format (RRF); the row format we are all familiar with today is now referred to as basic row format (BRF). This is all how the row is stored – it does not mean that your DDL is changed nor does it require changes to anything external or how you access the rows.

    To summarize, this means that a row in RRF will store the fixed-length columns first and the variable columns at the end. Pointers within the row will point to the beginning of the variable columns.

So far so good, right? Well, DB2 will also convert our old table spaces to RRF over time. Once we are in DB2 9 NFM, a REORG or a LOAD REPLACE will cause a change from BRF to RRF. So run a LOAD REPLACE on a table space in NFM and the row format changes. REORG a partition and the row format for that partition changes. And yes, you can have a partitioned table space with some partitions in BRF and some in RRF.

With RRF we can be sure that DB2 is putting our variable columns at the end of the row – where they belong. But it still is not helping us with placing static columns before the dynamic ones. You’ll still have to guide DB2 to do that.

    Tuesday, May 29, 2007

    INSTEAD OF Triggers [DB2 9 for z/OS]

    DB2 9 for z/OS introduces a new type of trigger: the INSTEAD OF trigger. The primary usage of INSTEAD OF triggers is to enable views that would not otherwise be updatable to support updates. Typically, a view that consists of multiple base tables cannot be updated. But with an INSTEAD OF trigger this problem can be surmounted. You can code an INSTEAD OF trigger to direct inserts, updates and deletes to the appropriate underlying tables of the view.


    With the INSTEAD OF trigger, your application code does not have to include complex algorithms to specify which operations should be performed against views and which should be performed against base table. Instead, all actions are performed against the view and the activated trigger determines which underlying base tables are to be impacted.


    So, you might choose to code an INSTEAD OF trigger on a view over a join to allow modifications on the view to go through to the underlying base tables joined in that view. Or you can encode and decode data within a view: for example, the view could contain the decryption functions while the INSTEAD OF triggers use the encryption functions to ensure security.


    Only one INSTEAD OF trigger is allowed for each type of operation on a given subject view. That is, one for inserts, one for updates, and one for deletes. Therefore, you can have a grand total of three INSTEAD OF triggers per view.


DB2 executes the triggered-action instead of the insert, update, or delete operation on the subject view. Neither the WHEN clause nor the FOR EACH STATEMENT clause is allowed in INSTEAD OF triggers.


    Furthermore, there are some restrictions on the view in order for an INSTEAD OF trigger to be allowed. First of all, the view must exist at the current server when the INSTEAD OF trigger is created. Additionally, none of the following are permitted for a view to have an INSTEAD OF trigger:

    • the WITH CASCADED CHECK option
    • a view on which a symmetric view has been defined
    • a view that references data encoded with different encoding schemes or CCSID values
    • a view with a ROWID, LOB, or XML column (or a distinct type that is defined as one of these types)
    • a view with a column based on an underlying column defined as an identity column, security label column, or a row change timestamp column
    • a view with a column that is defined (directly or indirectly) as an expression
    • a view with a column that is based on a column of a result table that involves a set operator
    • a view with any columns that have field procedures
    • a view where all of the underlying base tables are DB2 Catalog tables or created global temporary tables
    • a view that has other views dependent on it

    One way to think of INSTEAD OF triggers is that they contain the inverse of the logic in the body of the view. If the view joins tables, the trigger should break the join apart to modify the correct data. If the view decrypts columns, the INSTEAD OF trigger should encrypt the columns. Etc.


    Let’s take a look at an example to better understand the INSTEAD OF trigger. First, we create a view that joins the EMP and DEPT tables:



    CREATE VIEW EMP_DEPT (EMPNO, FIRSTNME, MIDINIT, LASTNAME,
    PHONENO, HIREDATE, DEPTNAME)
    AS SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, PHONENO,
    HIREDATE, DEPTNAME
    FROM EMP, DEPT
    WHERE EMP.WORKDEPT = DEPT.DEPTNO;



    OK, so far, so good. But since this view is a join it is not updateable. Let’s fix this by coding up some INSTEAD OF triggers. First, we’ll take care of inserts:




CREATE TRIGGER E_D_ISRT
INSTEAD OF INSERT ON EMP_DEPT
REFERENCING NEW AS NEWEMP
FOR EACH ROW INSERT INTO EMP
(EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
VALUES
(EMPNO, FIRSTNME, MIDINIT, LASTNAME,
COALESCE
((SELECT DEPTNO FROM DEPT AS D WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
RAISE_ERROR('70001', 'Unknown dept name')
),
PHONENO, HIREDATE);



    An insert against the view would not be inserting a new department, so we will be inserting data into the EMP table. If the department does not exist, we’ll raise an error. Next we’ll consider updates:




    CREATE TRIGGER E_D_UPD
    INSTEAD OF UPDATE ON EMP_DEPT
    REFERENCING NEW AS NEWEMP OLD AS OLDEMP
    FOR EACH ROW
    BEGIN ATOMIC
    VALUES(CASE WHEN NEWEMP.EMPNO = OLDEMP.EMPNO THEN 0
    ELSE RAISE_ERROR('70002', 'Must not change EMPNO') END);
    UPDATE EMP AS E SET
    (FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
    = (NEWEMP.FIRSTNME, NEWEMP.MIDINIT, NEWEMP.LASTNAME,
    COALESCE((SELECT DEPTNO FROM DEPT AS D
    WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
    RAISE_ERROR ('70001', 'Unknown dept name')),
    NEWEMP.PHONENO, NEWEMP.HIREDATE)
    WHERE NEWEMP.EMPNO = E.EMPNO;
    END








    Finally we take care of deletions:




    CREATE TRIGGER E_D_DEL
    INSTEAD OF DELETE ON EMP_DEPT
    REFERENCING OLD AS OLDEMP
    FOR EACH ROW
    DELETE FROM EMP AS E WHERE E.EMPNO = OLDEMP.EMPNO;


    Using an INSTEAD OF trigger, each requested modification operation made against the view is replaced by the trigger logic. The trigger performs the insert, update, or delete on behalf of the view. No application changes are required because the code is in the trigger which resides in the database.


    If you want to read more about INSTEAD OF triggers, I recommend this quite extensive article (albeit for DB2 LUW) out on the IBM Developer Works web site.


    Thursday, May 24, 2007

    LOB Enhancements [DB2 9 for z/OS]

    IBM focused their attention on improving DB2’s ability to store and manage LOB data in Version 9. As anyone who has tried to use LOBs in a previous version of DB2 knows, the usability limitations were troublesome. But with Version 9, IBM chips away at some of the more annoying LOB limitations.

    FETCHing LOBs

    Prior to Version 9, there were two methods you could deploy in your programs to fetch LOB data:

    • Fetching data into a pre-allocated buffer
    • Using a LOB locator to retrieve a handle on the data.

Both methods have their issues. Fetching data into a preallocated buffer can cause virtual storage constraint problems, especially for larger LOBs. On the other hand, applications that use LOB locators but commit infrequently or do not explicitly free the locators can consume considerable amounts of DB2 resources.

    V9 introduces a new clause, WITH CONTINUE, for use on your FETCH statements. By coding your program to use WITH CONTINUE you can retrieve LOB columns in multiple pieces without using a LOB locator, and continue a FETCH operation to retrieve the remaining LOB data when truncation occurs. (Note: this method can be used with XML data, too.) You will have to manage the buffers and reassemble the pieces of data in your application program.

    So, by specifying WITH CONTINUE on your FETCH statement you tell DB2 to allow subsequent FETCH CURRENT CONTINUE operations. These will allow you to access the remaining truncated LOB (or XML) column after the initial FETCH. If truncation occurs, DB2 will remember the truncation position and will not discard the remaining data. DB2 will return the total length that would have been required to hold all of the data of the LOB or XML column. This will either be in the first four bytes of the LOB host variable structure or in the 4 byte area that is pointed to by the SQLDATALEN pointer in the SQLVAR entry of the SQLDA for that host variable.
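Here is a hedged sketch of the flow in embedded SQL, assuming a CLOB column named RESUME in a table EMP_PHOTO_RESUME and a host variable :CLOB-HV that may be too small to hold the entire value:

DECLARE C1 CURSOR FOR
  SELECT RESUME
  FROM EMP_PHOTO_RESUME;

OPEN C1;

-- initial fetch; if :CLOB-HV is too small the value is truncated,
-- but DB2 remembers the truncation position
FETCH WITH CONTINUE C1 INTO :CLOB-HV;

-- retrieve the remainder of the truncated LOB value
FETCH CURRENT CONTINUE C1 INTO :CLOB-HV;

CLOSE C1;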

    File Reference Variables

    DB2 V9 adds support for a LOB file reference variable. A file reference variable is a host variable that is defined in a host language to contain the file name that directs file input and output for a large object (LOB).

    Using file reference variables, large LOB values can be inserted from a file or selected into a file rather than a host variable. This means that your application program does not need to acquire storage to contain the LOB value. File reference variables also enable you to move LOB values from the DBMS to a client application or from a client application to a database server without going through the working storage of the client application.

    LOBs and Utilities

    The manner in which DB2 handles LOBs in utility processing has also been improved in DB2 V9.

Loading and unloading LOBs has been improved. For LOAD, an input field value can contain the name of a file that contains a LOB column value; the LOB column value will then be loaded from that file. For UNLOAD, you can store the value of a LOB column in a file and record the name of that file in the unloaded record for the base table.

    What about REORG? Well, prior to V9 you could not access LOB data during a REORG. And a REORG did not reclaim physical space from the LOB data set because LOBs were moved within the existing LOB table space. V9 fixes these problems. During a REORG (in V9), the original LOB table space is drained of writers. All LOBs are then extracted from the original data set and inserted into a shadow data set. When this operation is complete, all access to the LOB table space is stopped (the readers are drained) while the original data set is switched with the shadow data set. At this point, full access to the new data set is enabled, and an inline copy is taken to ensure recoverability of data.

    Additionally, both CHECK LOB and CHECK DATA have SHRLEVEL REFERENCE and SHRLEVEL CHANGE options. Using these you can minimize downtime. For LOBs, CHECK DATA checks for consistency between a base table space and the corresponding LOB or XML table spaces. The CHECK LOB utility identifies structural defects in the LOB table space and any invalid LOB values. Running CHECK (DATA or LOB) with SHRLEVEL REFERENCE indicates that applications can read from but cannot write to the index, table space, or partition that is to be checked. SHRLEVEL CHANGE means that applications can read from and write to the index, table space, or partition that is to be checked.

    Performance

    Finally, DB2 V9 provides several performance improvements related to LOBs. Using file reference variables or WITH CONTINUE to “chunk” read LOBs can improve performance over using locators.

    And as we all know, removing locks can improve performance. DB2 V9 eliminates LOB locking for space allocation, as well as for insert, delete, update, and select operations. Additionally, a LOB lock is no longer required to serialize the consistency between the value of the LOB and the column of the base row for an uncommitted read operation.

    Monday, May 21, 2007

    New OLAP Capabilities [DB2 9 for z/OS]

    DB2 9 for z/OS offers several new SQL improvements focused on improving OLAP functionality. The OLAP functions provide the ability to return ranking, row numbering, and existing aggregate function information as a scalar value in the result of a query. You can include OLAP specifications in an expression, in a select-list, or in the ORDER BY clause of a select-statement.


    The result to which the OLAP specification is applied is the result table of the innermost subselect that includes the OLAP specification. OK, at this point you might well be asking, “So, what are these new OLAP things and what can they do for me?” Let’s take them one at a time.


    First up, we have the RANK and DENSE_RANK functions. These functions specify that the ordinal rank of a row within the specified window is computed. Rows that are not distinct with respect to the ordering within the specified window are assigned the same rank. Here is a quick example:


    SELECT
    EMPNO,
    LASTNAME,
    FIRSTNAME,
    SALARY+BONUS+COMM AS TOTAL_COMP,
    RANK() OVER(ORDER BY SALARY+BONUS+COMM DESC) AS RANK_COMP
    FROM EMP
    WHERE SALARY+BONUS+COMM > 30000
    ORDER BY LASTNAME;


    This query will rank employees who have total compensation greater than $30,000, but order the results by last name. This allows you to rank data differently than the order in which it is presented.

    You can define the results of ranking with gaps in the sequential rank numbering by using the RANK specification, or without gaps, by using the DENSE_RANK specification.

    The difference between the two can be a bit difficult to decipher at first. Think of it this way: RANK specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row. So, if two or more rows are not distinct you will get gaps in the ranking. With DENSE_RANK the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering. In this case there will be no gaps in the sequential rank numbering. Consider the following data:


    EMPNO LASTNAME FIRSTNAME SALARY BONUS COMM
    ----- -------- --------- ------ ------ ------
    100 MULLINS CRAIG 500000 100000 400000
    200 SHAW DONNA 25000 10000 0
    300 ABBOTT RANDY 700000 300000 0
    400 WISER BUD 10000 0 0
    500 GREEN RACHEL 40000 2000 5000



    The results of the previous query run against this data would look like this:


    300 ABBOTT RANDY 1000000 1
    500 GREEN RACHEL 47000 3
    100 MULLINS CRAIG 1000000 1
    200 SHAW DONNA 35000 4
    400 WISER BUD 10000 5


    Note that both ABBOTT and MULLINS earn the most, but the amount is the same, so they share the number one ranking. As this is not a dense rank, the next rank value is 3, and not 2.


    The next OLAP specification introduced by DB2 9 for z/OS is ROW_NUMBER. ROW_NUMBER specifies that a sequential row number is computed for the row that is defined by the ordering, starting with 1 for the first row. If the ORDER BY clause is not specified in the window, the row numbers are assigned to the rows in an arbitrary order, as the rows are returned. This satisfies an often-requested capability to simply assign a number to the result rows of a query. Row numbers also enable easy formulation of queries for computing histogram statistics and they enable formation of other OLAP specifications (for example, moving sums, moving averages, and so on).


    Here is a sample query using ROW_NUMBER:



    SELECT
    ROW_NUMBER() OVER(ORDER BY WORKDEPT, LASTNAME) AS NUMBER,
    LASTNAME,
    SALARY
    FROM EMP
    ORDER BY WORKDEPT, LASTNAME;


    The result of a RANK, DENSE_RANK, and ROW_NUMBER specification is BIGINT, and the result cannot be null.


    One more thing to consider for the new OLAP features is the ability to partition results for the OLAP specification. This is specified using the PARTITION BY clause with a partitioning-expression, which is an expression used to define the partitioning of the result table. Each column name that is referenced in a partitioning-expression must unambiguously reference a column of the result table of the subselect that contains the OLAP specification. A partitioning-expression cannot include a scalar-fullselect or any function that is not deterministic or has an external action.


    Here is an SQL example using partitioning:



    SELECT
    WORKDEPT,
    EMPNO,
    LASTNAME,
    FIRSTNAME,
    EDLEVEL,
DENSE_RANK() OVER
(PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL
    FROM EMP
    ORDER BY WORKDEPT, LASTNAME;


This SQL ranks employees within each department according to their education level. And because DENSE_RANK is specified, multiple employees with the same education level in a department will share a rank without creating a gap in the next ranking.


    Sometimes RANK, DENSE_RANK, and ROW_NUMBER are called window functions. An OLAP specification is not valid in a WHERE, VALUES, GROUP BY, HAVING, or SET clause. An OLAP specification cannot be used as an argument of an aggregate function. When invoking an OLAP specification, a window is specified that defines the rows and order of the rows over which the function is applied.

    Tuesday, May 15, 2007

    Implicitly Hidden Columns [DB2 9 for z/OS]

    Another nice new feature deep in the bowels of DB2 9 for z/OS is the ability to hide columns from the SELECT * statement. As far back as anyone can remember the advice has been given to avoid using SELECT * in application programs. But I still see it every now and then.

    Now don't get me wrong, SELECT * is a nice shorthand when you are writing quick & dirty SQL using SPUFI or some other ad hoc SQL tool. But it does not belong in your application programs because a subsequent ALTER to add a column will cause the program to fail because there are now more columns being returned than the programmer coded into the program.

    "OK," you may be asking, "so what? I thought you were writing about DB2 9 here?" Fair enough. DB2 9 for z/OS adds the ability to code IMPLICITLY HIDDEN on the column specification of a CREATE or ALTER TABLE statement. By coding IMPLICITYLY HIDDEN, the column will not be visible in the result of a SQL statement unless you explicitly refer to the column by name. So, SELECT * will not return any implicitly hidden columns to the result set.

    For example, in the following table C2 is implicitly hidden and will not be returned as the result of SELECT *:

    CREATE TABLE T1
    (C1 SMALLINT NOT NULL,
    C2 CHAR(10) IMPLICITLY HIDDEN,
    C3 TIMESTAMP)
    IN DB.TS;
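Given that definition, here is a quick sketch of the behavior you should see:

SELECT * FROM T1;           -- returns only C1 and C3
SELECT C1, C2, C3 FROM T1;  -- returns all three columns, including C2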

This has some obvious benefits. First of all, if you are one of those shops where programmers did not follow the no SELECT * in programs rule, then you can simply add every new column with the IMPLICITLY HIDDEN attribute and those SELECT * statements will keep on running because they won't see the new columns.

    Or, you might want to take a more comprehensive approach, and specify every column (except one, perhaps the key) of every new (or modified) table as IMPLICITLY HIDDEN. If every column except the key is hidden, then a SELECT * won't return anything except the key - you'll have to explicitly specify all of those other columns to get them into your result sets. Of course, this negates the ability to use SELECT * for quick & dirty SPUFI queries because implicitly hidden columns will be hidden there, too.

    There are a few caveats on the usage of IMPLICITLY HIDDEN. You cannot specify IMPLICITLY HIDDEN for a column that is defined as a ROWID, or a distinct type that is based on a ROWID. Additionally, IMPLICITLY HIDDEN must not be specified for all columns of a table.

    Monday, May 14, 2007

    Index on Expressions [DB2 9 for z/OS]

    DB2 9 for z/OS offers, for the first time, the ability to create an index on data that is not technically in the table. At this point, you may well be asking “What does that mean?” and “Why would I do that?” I’ll attempt to answer both of those questions.

    Basically, the neat new feature is an extension to the CREATE INDEX statement that lets you create an index on an expression. That begs the question of just what is an expression… Well, an expression can be as simple as a column reference, or it can be a built-in function invocation or even a more general expression, with certain restrictions.

The expression that is being indexed is referred to as the key-expression. The maximum length of the text string of each key-expression is 4000 bytes after conversion to UTF-8. Each key-expression must contain at least one reference to a column of the table named in the ON clause of the index. Referenced columns cannot be LOB, XML, or DECFLOAT data types, nor can they be a distinct type that is based on one of those data types. Referenced columns cannot include any FIELDPROCs nor can they include a SECURITY LABEL. Furthermore, the key-expression cannot include any of the following:
    • subquery
    • aggregate function
    • not deterministic function
    • function that has an external action
    • user-defined function
    • sequence reference
    • host variable
    • parameter marker
    • special register
    • CASE expression
    • OLAP specification
    Unlike a simple index, the index key of an index on expression is composed by concatenating the result (also known as a key-target) of the expression that is specified in the ON clause. An index that is created on an expression lets a query take advantage of index access (if the index is chosen by the optimizer) and avoid a table space scan.

    Perhaps an example would help to clarify this topic. Consider this sample DDL:

    CREATE INDEX XUPLN
    ON EMP
    (UPPER(LAST_NAME, 'En_US'))
    USING STOGROUP DSN8G910
    PRIQTY 360 SECQTY 36
    ERASE NO
    COPY YES;

    This example would create an index on the LAST_NAME column of the EMP table, but would index on the data after applying the UPPER function. This is useful if you store the data in mixed case but submit queries with parameter markers (or host variables) in upper case only. By indexing the data as upper case the index better matches your queries.
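For example, the optimizer could satisfy the predicate in this sketch of a query (where :LN is a hypothetical host variable containing an upper case search value) using the XUPLN index:

SELECT EMPNO, LAST_NAME
FROM EMP
WHERE UPPER(LAST_NAME, 'En_US') = :LN;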

    Query performance can be enhanced if the optimizer chooses that index. When you use an index on an expression, the results of the expressions are evaluated during insertion time or during an index rebuild and are kept in the index. If the optimizer chooses to use that index, the predicate is evaluated against the values that are stored in the index. As a result, run-time performance overhead is eliminated.

    This is a nice new feature of DB2 9 for z/OS that you should consider if you have queries that search on expressions.

    Monday, April 30, 2007

    Database Definition on Demand [DB2 9 for z/OS]

    As you probably know, online schema evolution (sometimes referred to as “online schema change”) was one of the key new features of DB2 V8. But, as its name implies, its capabilities continue to evolve. With V9, online schema evolution expands to simplify more types of database definition changes. The new term IBM is using for this in V9 is Database Definition On Demand (DDOD).

    One of the nice new components provided by DDOD in V9 is that online table space reorganization is significantly improved. Today, when reorganizing just a couple of partitions in a partitioned table space the BUILD2 phase takes a long time to complete. V8 removed the outage for DPSIs, and now V9 removes the BUILD2 phase for all types of secondary indexes.

    Another new DDOD capability supports replacing one table quickly with another. This is accomplished via cloning and the technique can even avoid the need to REBIND packages. Cloning allows you to generate, in the same table space, a new table with the same structure as the original table. After creating the clone you can do with it what you want – LOAD, DELETE, INSERT, UPDATE, etc. data – and then exchange the clone table name with the current table name. In this way you can keep the existing table operational while you work on the next “generation” of that table in its clone. When the clone is ready, you EXCHANGE it with the existing table. Nice, huh?

    How about being able to rename a column within your table or rename an index? V9 provides the ability to do both! No longer do you have to DROP and re-CREATE in order to rename columns and indexes.

    DB2 V9 also introduces a new type of table space that combines the attributes of segmented and partitioned. It is called a universal table space. Universal table spaces offer improved space management for variable length rows. This is so because it uses the space map page with more free space information like segmented table spaces. Also, like segmented table spaces, universal table spaces deliver improved mass delete performance and you can immediately reuse the table segments after the mass delete.

    There are two types of universal table spaces:

    • Partition-by-growth: this type of universal table space will partition the data as it grows without the need to specify key ranges. This type of universal table space is beneficial for tables that will grow over time and/or need the additional limits afforded by partitioning, but can benefit from the performance of segmented. You can define more than one table in this type of universal table space if you wish.
    • Range-partitioned: this type of universal table space requires a key range for partitioning – and it can contain only a single table. This is basically adding segmentation to the existing partitioned table space.
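As a hedged sketch (the database, table space, and sizing values here are hypothetical), the two flavors might be defined like this:

CREATE TABLESPACE TSPBG IN MYDB
  MAXPARTITIONS 256
  SEGSIZE 4
  LOCKSIZE ROW;

CREATE TABLESPACE TSRANGE IN MYDB
  NUMPARTS 12
  SEGSIZE 32
  LOCKSIZE ROW;

The first is partition-by-growth (MAXPARTITIONS with no key ranges); the second is range-partitioned (NUMPARTS plus SEGSIZE adds segmentation to the classic partitioned scheme).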

Additionally, you can define SMS constructs (MGMTCLAS, DATACLAS, and STORCLAS) on a STOGROUP, and you can ALTER those constructs as well. And table space and index logging parameters can be altered.

    DB2 V9 even adds a new capability to change the DB2 early code without requiring an IPL.

    So, with DB2 9 for z/OS we get more flexibility in modifying our database schemas. And that is a good thing, right?

    Thursday, April 19, 2007

    SELECT from DELETE, UPDATE, and MERGE [DB2 9 for z/OS]

    Another nice new SQL feature provides the ability to SELECT from DELETE, UPDATE, and MERGE statements.

    This capability is similar to the SELECT from INSERT feature that was introduced with DB2 V8. So, before looking at the new V9 feature, let’s review the V8 feature.

    The ability to SELECT from an INSERT statement is an intriguing feature. To understand why this is so, we need to review some background details. In some cases, it is possible to perform actions on an inserted row before it gets saved to disk. For example, a BEFORE TRIGGER might change data before it is even recorded to disk. But the application program will not have any knowledge of this change that is made in the trigger. Identity columns and columns with defaults (especially user-defined defaults) have similar effects. What if the program needs to know the final column values? Prior to V8 this was difficult and inefficient to implement.

    Well, the SELECT FROM INSERT feature introduced in DB2 V8 solves this problem. It allows you to both insert the row and retrieve the values of the columns with a single SQL statement. It performs very well because it performs both the INSERT and the SELECT as a single operation. Consider the following example:


SELECT COL5 INTO :C5-HV
FROM FINAL TABLE
(INSERT INTO SAMPLE_TABLE (COL1, COL2, COL5, COL7)
VALUES('JONES', 'CHARLES', CURRENT DATE, 'HOURLY')
);

    The data is inserted as specified in the VALUES clause, and retrieved as specified in the SELECT. Without the ability to select COL5, the program would have no knowledge of the value supplied to COL5, because it was assigned using CURRENT DATE. With this new syntax the program can retrieve the CURRENT DATE value that was just inserted into COL5 without incurring additional overhead.

    OK, on to V9. In this new version you can retrieve columns from rows that are modified via DELETE, UPDATE, and MERGE statements, thereby replacing multiple SQL calls with one. DB2 V9 allows a searched UPDATE or a searched DELETE statement in the FROM clause of a SELECT statement that is a subselect, or in the SELECT INTO statement. This capability allows a user or program to know which values were updated or deleted.

    And, if you recall from a few blog entries ago, DB2 9 for z/OS also adds the MERGE statement. Well, it also adds the ability to SELECT from the MERGE. This allows you to return all the rows that were either inserted or updated as a result of the MERGE.

    Here is an example of a SELECT from an UPDATE:


SELECT SUM(SALARY) INTO :SAL-HV
FROM FINAL TABLE
(UPDATE EMP
SET SALARY = SALARY * 1.02
WHERE WORKDEPT = 'A01');


Prior to this capability you would have had to run the UPDATE statement and then, only after it finished, run a SELECT to add up the new salary values. Now, instead of multiple statements requiring multiple passes through the data, you can consolidate the work into one statement.

    Nice, huh?

    Tuesday, April 17, 2007

    Free DB2 Access Paths Webinar - April 18, 2007

    Just a quick note to inform my readers that I will be conducting a webinar tomorrow - Wednesday, April 18, 2007 - on the topic of Change Control for DB2 Access Path Selection. The session will start at 10:30 Central time and will last about an hour. To register, click on the link above.

    Here is a short description of what I'll discuss: Most changes are strictly controlled in the mainframe environment. But that is not the case for DB2 access paths. When we BIND or REBIND a program, DB2 formulates access paths “on the fly” and we do not have any control over what DB2 creates for us. This lack of control over changes can cause unpredictable performance.

    NOTE
    This webinar is in the past and can no longer be viewed.

    Tuesday, April 10, 2007

    MERGE and TRUNCATE [DB2 9 for z/OS]

    Two additional new SQL statements available in DB2 Version 9 are the MERGE and TRUNCATE statements.

    MERGE

    The MERGE statement basically takes two “tables” and merges the data into one table. The table that will contain the merged results is referred to as the target; the other participating table is called the source. Rows in the target that match the source are updated and rows that do not exist in the target are inserted from the source to the target.

    If you use other DBMSs you may be somewhat familiar with MERGE functionality. It is sometimes called an UPSERT (taking the UP from update and the SERT from insert). A simplified version of the MERGE syntax follows:


    MERGE INTO table_name
    USING table_name
    ON (condition)
    WHEN MATCHED THEN
    UPDATE SET column1 = value1 [, column2 = value2 ...]
    WHEN NOT MATCHED
    THEN INSERT column1 [, column2 ...]
    VALUES (value1 [, value2 ...]) ;



    The DB2 implementation is a tad different, though. Instead of the USING clause specifying an actual table, it instead specifies a VALUES clause of data for a single row or an array of rows. So the source, if it consists of multiple rows, must be populated into a host variable array.

    So, say we have a customer table, CUST, and we want to accept several customers from a file. If the customer already exists, we want to populate it with the new, updated information; if the customer does not exist, we want to insert the new customer. To accomplish this in DB2 V9, we can write a MERGE statement such as this:


MERGE INTO CUST C
USING (VALUES (:CUSTNO, :CUSTNAME, :CUSTDESC)
FOR :HV_NROWS ROWS) AS N (CUSTNO, NAME, DESC)
ON (C.CUSTNO = N.CUSTNO)
WHEN MATCHED THEN UPDATE
SET (C.NAME, C.DESC) = (N.NAME, N.DESC)
WHEN NOT MATCHED THEN INSERT (CUSTNO, NAME, DESC)
VALUES (N.CUSTNO, N.NAME, N.DESC)
NOT ATOMIC CONTINUE ON SQLEXCEPTION;



    Of course, this is a simple example as there will likely be many other columns in the customer table that would be of interest. But you can easily extrapolate from this example using it as a template of sorts to build a more complex example.

The rows of input data are processed separately. When errors are encountered and NOT ATOMIC CONTINUE ON SQLEXCEPTION is in effect, processing continues, and some of the specified rows will not be processed. Regardless of the failure of any particular source row, the MERGE statement will not undo any changes that are made to the database by the statement. Merge will be attempted for rows that follow the failed row. However, the minimum level of atomicity is at least that of a single source row (in other words, it is not possible for a partial merge to complete).

    If you are using triggers be sure to consult the SQL Reference manual (PDF) to understand how MERGE impacts trigger processing.

    At any rate, you need to know that the MERGE statement lets you combine UPDATE and INSERT across many rows into a table based upon a matching key. You can embed the MERGE statement in an application program or issue it interactively. The statement is executable and can be dynamically prepared. In addition, you can use the SELECT FROM MERGE statement to return all the updated rows and inserted rows, including column values that are generated by DB2.

    TRUNCATE

    OK, so that is MERGE, but the title of this blog entry is MERGE and TRUNCATE, so what is TRUNCATE? Well, that is an easier story to tell. The TRUNCATE statement is simply a quick way to DELETE all of the data from a table. The table can be in any type of table space and it can be either a base table or a declared global temporary table. If the table contains LOB or XML columns, the corresponding table spaces and indexes are also truncated.

    For clarification, consider the following example:



    TRUNCATE TABLE EXAMPLE_TABLE
    REUSE STORAGE
    IGNORE DELETE TRIGGERS
    IMMEDIATE;



    Seems easy enough, doesn’t it? But what are those additional parameters? Well, REUSE STORAGE tells DB2 to empty the storage that is allocated but keeps it allocated. The alternate, which is the default, is DROP STORAGE. This option tells DB2 to release the storage that is allocated for the table and to make it available for use for the same table or any other table in the table space. REUSE STORAGE is ignored for a table in a simple table space and the statement is processed as if DROP STORAGE is specified.

    The next parameter, which is the default if nothing is specified, is IGNORE DELETE TRIGGERS. This tells DB2 to not fire any DELETE triggers. Alternately, you could specify RESTRICT WHEN DELETE TRIGGERS, which will return an error if there are any delete triggers defined on the table.

    Finally, we have the IMMEDIATE option. This causes the TRUNCATE to be immediately executed and it cannot be undone. If IMMEDIATE is not specified you can issue a ROLLBACK to undo the TRUNCATE.

    Synopsis

    So with DB2 9 for z/OS we have two new helpful SQL statements that can simplify our coding efforts. MERGE to combine INSERT and UPDATE processing and TRUNCATE to quickly DELETE all data from a table. Keep them in mind and use them to aid your DB2 application development efforts.

    Monday, April 09, 2007

    New DB2 9 Security Redbook

    Just a quick entry today to alert everyone that there is a new DB2 for z/OS redbook available on the topics of regulatory compliance, security, and audit. The redbook is titled Securing DB2 and Implementing MLS on z/OS and you can download it for free today over the web.

    The redbook is 360 pages (including index) and covers the plethora of new security features in DB2 for z/OS. If you haven't looked at DB2's authorization and security functionality in awhile there is much to learn... and this redbook will be very illuminating.

    That's all for today.

    Thursday, April 05, 2007

    Native XML Support in DB2 Databases [DB2 9 for z/OS]

One of the biggest technological advances in DB2 V9 is the ability to combine the management of structured and unstructured data. Basically, V9 will allow you to store data as native XML. This capability has already been introduced into V9 of DB2 for Linux, Unix, and Windows.

Many of you may well ask “Hey, what’s the big deal here? Can’t we already use the XML Extender and store XML data in DB2 prior to V9?” Yes, but V9 changes the game. You will be able to search and analyze structured data in a relational data repository and unstructured data in an XML repository without the need to reformat it. So your regular “relational” data gets stored as always; and XML data gets stored in its native format without the need to shove it into a CLOB or shred it into “relational” columns. The approach is novel in that DB2 will now support native XML via dual storage engines – the traditional SQL/relational engine and a new XML engine. DB2 9 for z/OS handles XML as a new data type that is stored in a natural hierarchy - different from relational data.

For those of you not familiar with XML, you need to know that there are big differences between XML data and typical DB2 data. Foremost among these differences is that XML data is hierarchical, whereas “relational” DB2 data is basically “flat.” Additionally, XML data is self-describing. XML tags identify and name the data elements in the XML document. This capability concentrates both the data and its structure into a single document. So, in essence, the XML document becomes self-describing. This is important to keep in mind because a single XML document can have many different types of data, whereas “relational” DB2 data is defined in the system catalog by its column definition. And all data in the same column must have the same data type (e.g. you cannot store a name in an integer column). Finally, XML data is ordered, whereas “relational” DB2 data is not. The order in which data items are specified in the XML document is relevant. There is often no other way to specify order within an XML document. For relational data, the order of the rows is not guaranteed unless you specify an ORDER BY clause on one or more columns.

OK, now, just how would you support XML data in DB2 V9 then? Think of XML as just another data type. You would use the XML data type in a CREATE TABLE statement to define a column to be of type XML. Each column of type XML can hold one XML document for every row of the table. Even though the XML documents are logically associated with a row, XML and “relational” columns are stored differently. The “relational” columns are stored in the traditional structures we all know and love. The XML data is stored in hierarchical structures. Don’t let that scare you. IBM has seamlessly integrated XML with relational data to simplify application development while optimizing search performance with highly optimized XML indexes.

Here is a quick example walkthrough from the DB2 XML Guide manual that creates a simple table with an XML column. First, as with triggers, when you create tables with XML in SPUFI be sure to set the SQL terminator to a character other than a semicolon, for example, the pound sign (#). This is done so that your SQL can have embedded semicolons. Also, you’ll probably want to set CAPS OFF in SPUFI to preserve lower case. Then, create a table like this:

CREATE TABLE MYCUSTOMER (CID BIGINT, INFO XML)
#

This creates a two-column table, with the first column a big integer and the second for the XML data.
Next, we’ll build an index over XML data. We will assume that the XML documents to be stored in the INFO column will have a root element named customerinfo with an attribute named Cid. So, here is the DDL for the unique index on the Cid attribute:
    CREATE UNIQUE INDEX MYCUT_CID_XMLIDX ON MYCUSTOMER(INFO)
    GENERATE KEY USING XMLPATTERN
    'declare default element namespace
       "http://posample.org"; /customerinfo/@Cid'
    AS SQL DECFLOAT
    #
    
    The XML pattern defining the index is case-sensitive. The element and attribute names in the XML pattern must match the element and attribute names in the XML documents exactly. Now we can insert a couple of XML documents into the INFO column, such as:
    INSERT INTO MYCUSTOMER (CID, INFO) VALUES (1000,
    '<customerinfo xmlns="http://posample.org" Cid="1000">
      <name>Kathy Smith</name>
      <addr country="Canada">
        <street>5 Rosewood</street>
        <city>Toronto</city>
        <prov-state>Ontario</prov-state>
        <pcode-zip>M6W 1E6</pcode-zip>
      </addr>
      <phone type="work">416-555-1358</phone>
     </customerinfo>')
    #
    
    INSERT INTO MYCUSTOMER (CID, INFO) VALUES (1002,
    '<customerinfo xmlns="http://posample.org" Cid="1002">
      <name>Jim Noodle</name>
      <addr country="Canada">
        <street>25 EastCreek</street>
        <city>Markham</city>
        <prov-state>Ontario</prov-state>
        <pcode-zip>N9C 3T6</pcode-zip>
      </addr>
      <phone type="work">905-555-7258</phone>
      <phone type="cell">905-555-7254</phone>
     </customerinfo>')
    #
    
    Then you can issue a SELECT statement against this table to verify that the XML documents were successfully inserted. For example:

    SELECT CID, INFO FROM MYCUSTOMER
    #

    V9 also supports XPath to query elements within an XML document, as well as catalog extensions to support definitions of XML schemas. Furthermore, the IBM DB2 utilities have been extended such that they can be used to administer XML data, too.

    To my mind, though, one of the problems with XML in DB2 9 for z/OS is the lack of support for XQuery. XQuery is an XML query language capable of traversing XML documents. Just as SQL is the query language for native DB2 data, XQuery is the query language for native XML data. DB2 9 for Linux, Unix, and Windows supports XQuery, but DB2 9 for z/OS does not.

    So, how do you retrieve XML data using DB2 9 for z/OS? You can use SQL to retrieve entire XML documents from XML columns just like you would retrieve any other column. But if you need to retrieve portions of an XML document you will need to specify XPath expressions through SQL with XML extensions. Here is an example of using XPath to identify data within our XML data:

    DELETE FROM MYCUSTOMER
    WHERE XMLEXISTS (
      'declare default element namespace "http://posample.org";
       /customerinfo/phone[@type="cell"]'
      PASSING INFO)
    #

    This should DELETE any XML document that has cell phone information; for the purposes of this example, that would be CID 1002. I do not wish to go into any detailed description of XPath in this blog, but you can use XML functions with XPath expressions to traverse the XML document for data.

    One final note: some of the IBM documentation could be clearer. For example, I take exception to this paragraph lifted directly out of the “What’s New” manual (GC18-9856-00):

    “Support for XML capabilities and functions span the entire DB2 family. Version 8 of DB2 for z/OS and Version 8 of DB2 for Linux, Unix, and Windows provide basic support for storing, retrieving, and querying XML documents. DB2 9 for Linux, UNIX and Windows continues the work by delivering rich support of XML, including an XQuery interface to the data. Now, DB2 V9.1 for z/OS expands on similar support by delivering seamless integration of XML data and relational data in the DB2 database.”

    Anyone reading that paragraph would be completely justified in expecting DB2 V9 for z/OS to include XQuery support. It seems to have been written with intentionally misleading wording to avoid admitting that XQuery is not supported on z/OS. At least, that is what it seems like to me; I could be wrong.

    I’m also interested in how many folks out there in DB2-mainframe-land expect to use the XML capabilities of DB2 for z/OS. Please sign in and leave a comment expressing whether or not you plan to use DB2’s XML support.

    Thanks, and that is all for today. Keep an eye out for future DB2 9 for z/OS related posts as I plan to continue adding to this series on new V9 features over the course of the next month or so (at least). Cheers!

    Wednesday, April 04, 2007

    ORDER BY and FETCH FIRST in Subselects [DB2 9 for z/OS]

    Here is another quick post in my series on new features in DB2 9 for z/OS.

    Today, we will look at the additional flexibility gained in how the ORDER BY and FETCH FIRST n ROWS ONLY clauses can be specified in V9. Prior to V9, the only place you could specify these clauses was at the statement level. Indeed, this has been a source of confusion for many DB2 SQL programmers.

    A discussion of DB2 SELECT should be broken down into three topics:
    1. fullselect,
    2. subselect, and
    3. select-statement.
    The select-statement is the form of a query that can be directly specified in a DECLARE CURSOR statement, or prepared and then referenced in a DECLARE CURSOR statement. It is the thing most people think of when they think of SELECT in all its glory. If so desired, it can be issued interactively using SPUFI. The select-statement consists of a fullselect, and any of the following optional clauses: order-by, fetch-first, update, read-only, optimize-for, isolation and queryno. Well, that was the whole story until V9, which still allows fetch-first and order-by at the statement level, but now also allows them at the fullselect and subselect level!

    A fullselect can be part of a select-statement, a CREATE VIEW statement, or an INSERT statement. Basically, a fullselect specifies a result table. Prior to V9, this sometimes confused folks as they tried to put a FETCH FIRST n ROWS clause or an ORDER BY in a view or as part of an INSERT. That was not allowed!

    Finally, a subselect is a component of the fullselect. A subselect specifies a result table derived from the result of its first FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next.

    This is all a bit confusing. Think of it this way: in a subselect you specify the FROM to get the tables, the WHERE to get the conditions, GROUP BY to get aggregation, HAVING to get the conditions on the aggregated data, and the SELECT clause to get the actual columns. In a fullselect you add in the UNION to combine subselects and other fullselects. Finally, you add on any optional order-by, fetch-first, update, read-only, optimize-for, isolation and queryno clauses to get the select-statement.

    But, of course, as of V9 you can use the order-by and/or the fetch-first at the subselect or fullselect level. This can be useful if you want to limit the results within a subquery or part of a UNION statement. Also, if you specify ORDER BY with FETCH FIRST n ROWS ONLY, the result is ordered before the fetch-first is applied. (That's a good thing.)

    So, that means all of the following are now legal SQL formulations in V9:

    (SELECT COL1 FROM T1
    UNION
    SELECT COL1 FROM T2
    ORDER BY 1)
    UNION
    SELECT COL1 FROM T3
    ORDER BY 1;


    This example shows how ORDER BY can be applied in fullselects with UNION.

    SELECT EMP_ACT.EMPNO, PROJNO
    FROM EMP_ACT
    WHERE EMP_ACT.EMPNO IN (SELECT EMP.EMPNO
    FROM EMP
    ORDER BY SALARY DESC
    FETCH FIRST 10 ROWS ONLY);


    And this example will return the employee number and project number for any projects assigned to employees with one of the top ten salaries.
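
    The new flexibility also applies to the fullselect in an INSERT statement, which could not contain these clauses before V9. As a quick sketch (the TOP_EARNERS table here is hypothetical), something like the following becomes legal:

    INSERT INTO TOP_EARNERS (EMPNO, SALARY)
    SELECT EMPNO, SALARY
    FROM EMP
    ORDER BY SALARY DESC
    FETCH FIRST 10 ROWS ONLY;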

    So, once you move to V9 you will have much more latitude in how you write your SELECTs. If you are interested in more details, see the section of the DB2 9 for z/OS SQL Reference manual on building SQL queries.

    Tuesday, April 03, 2007

    INTERSECT and EXCEPT [DB2 9 for z/OS]

    With this blog entry I am introducing a new series in which I will briefly blog about the new features of DB2 9 for z/OS. Today's entry will cover the new INTERSECT and EXCEPT keywords.

    DB2 for Linux, Unix, and Windows has supported INTERSECT and EXCEPT in SQL SELECT statements for quite some time now, and with V9 the z/OS platform catches up. These two set operations can be used to simplify some SQL statements. Think of them as being similar to the UNION operation.

    INTERSECT is used to match two result sets: a row passes through only if it appears in both. When INTERSECT ALL is specified, the result consists of all rows that are in both result sets, duplicates included. If INTERSECT is specified without the ALL option, duplicates are removed from the result. Keep in mind that rows match only when every selected column is equal, so both sides must select comparable columns. For example, the following SQL will show the names of all customers in the USA who are also employees (with no duplicates):

    SELECT last_name, first_name
    FROM CUST
    WHERE country = 'USA'
    INTERSECT
    SELECT last_name, first_name
    FROM EMP
    WHERE country = 'USA';


    EXCEPT, on the other hand, returns the rows from the first result table that have no matching row in the second result table. Some other DBMS implementations refer to this as the MINUS operation. When EXCEPT ALL is specified, the result consists of all rows from the first result table that do not have a corresponding row in the second, and any duplicate rows are kept. If EXCEPT is specified without the ALL option, duplicates are eliminated. As an example, the following SQL will return only those items from TABLE1 that are not also in TABLE2:

    SELECT item FROM TABLE1
    EXCEPT
    SELECT item FROM TABLE2;
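
    To make the ALL distinction concrete, here is a small worked illustration (the data values are hypothetical):

    -- Suppose TABLE1 holds ITEM values 'widget', 'widget', 'gadget'
    -- and TABLE2 holds a single 'widget'.

    SELECT item FROM TABLE1
    EXCEPT ALL
    SELECT item FROM TABLE2;
    -- Returns 'widget' and 'gadget': one of the two 'widget' rows
    -- survives because only one matching row exists in TABLE2.

    SELECT item FROM TABLE1
    EXCEPT
    SELECT item FROM TABLE2;
    -- Returns only 'gadget': 'widget' appears in both result sets,
    -- so it is eliminated entirely.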


    Both INTERSECT and EXCEPT make it easier to formulate SQL requests...

    Friday, March 30, 2007

    Mimicking ROWNUM in DB2

    I was going through the many random files stored on my PC when I came across this useful nugget. A year or so ago a reader sent me this code. He and his supervisor had come up with a way to simulate the Oracle rownum feature in DB2 and wanted to share it with the world. I think, at the time, I posted it up on TechTarget as I was doing Q+A there on the old search390.com site. But that site is now defunct, so I'm reposting the solution here on my DB2portal blog for all to share.



    SELECT DEPTNO, ROW#
    FROM DEDBADM.DEPT TB1,
         TABLE (SELECT COUNT(*) + 1 AS ROW#
                FROM DEDBADM.DEPT TB2
                WHERE TB2.DEPTNO < TB1.DEPTNO) AS TDEPT_TAB
    WHERE ROW# = 2;



    If you decide to use this code, be careful because there is no guarantee that the row numbers will be the same from run to run of the SQL. As long as you get the same access path the row numbers should be the same from run to run, but if the access path changes -- for example, from a table space scan to indexed access or vice versa -- then the row numbers for the runs will differ. That is, each row may have a different number than it had for a previous execution.

    All in all, though, a nice solution to the problem!
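
    One more thought: since this blog has lately been covering DB2 9 for z/OS, it is worth noting that V9 adds OLAP specifications (RANK, DENSE_RANK, and ROW_NUMBER) that address this need directly. A minimal sketch against the same table, assuming you are running V9:

    SELECT DEPTNO,
           ROW_NUMBER() OVER (ORDER BY DEPTNO) AS ROW#
    FROM DEDBADM.DEPT;

    With an ORDER BY on a unique column inside the OVER clause, the numbering is repeatable from run to run regardless of the access path chosen, which sidesteps the caveat above.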


    Friday, March 23, 2007

    IBM Annual Report: Good News for DB2 DBAs

    I sometimes enjoy reading through the annual reports of companies like IBM (that is, companies whose products I use "on the job"), so I thought maybe readers of my blog might enjoy that, too. With that in mind, IBM's annual report for 2006 is now up and available online.

    What nuggets of information did I find in it? Well, I have to admit, I have not scoured it from front to back (and I probably won't), but I did try to track down the status of DB2 within the company. DB2 falls within IBM's Information Management brand, in their Middleware software segment. The annual report tells me that the Middleware software segment is doing better than the overall software category within IBM:
    • The Middleware segment grew to $13.891 billion in 2006 from $12.552 billion in 2005. And that represents year over year growth of 10.7%.
    • IBM's entire software portfolio posted revenue of $18.161 billion in 2006 versus $16.83 billion in 2005, representing year over year growth of 7.9%.
    OK, but what about DB2? The best I can do is all of Information Management, which includes DB2, as well as IMS and Informix, and all of the data management tools that IBM sells. These products posted a 14% gain in 2006 over 2005. Nice...

    In terms of hardware, though, IBM saw a slide in that market segment. The company posted revenue of $21.97 billion in 2006, down from the $23.857 billion posted in 2005. This is a 7.9% drop in annual revenue for hardware. But, you may well ask: we care about the mainframe more than those other platforms, so how did it do?

    It did well, my friends. Of the four major computer brands sold by IBM, the System z boxes (that is, the mainframes) did the best. System z was up 7.8% over 2005. Its closest competitor was System x (the Intel-based servers), which was up 3.7%. Bad news for the other two computing platforms: System i (that is, the AS/400s) was down 15% and System p (the UNIX servers) was down 1.1%.

    So, if you are my kinda people - that is, DBAs working on DB2 for z/OS - then IBM's annual report contains good news all around. The DBMS business is healthy and so is the mainframe business.

    Monday, March 19, 2007

    DB2 for z/OS V8 Performance Workshops

    Just a quick note this morning to let folks in Zurich, Switzerland and London, UK know about a three-day performance workshop for DB2 V8 coming up in June. The workshop is based on the IBM Redbook DB2 for z/OS Version 8 Performance Topics, SG24-6465. If you are looking for an overview and update of performance-related issues you can tackle with DB2 V8, consider this workshop. According to the redbook web site this workshop helps "you understand the performance implications of migrating from DB2 V7 to DB2 V8, highlights the key performance functions and sets the right expectations. It provides the type of information needed to evaluate the performance impact of DB2 V8 and the capacity planning needs."

    Of course, this workshop information is directed mostly at my European readers... unless, of course, you are a USA reader with some extra travel budget and your manager will allow you to travel overseas!?!? (I know, I know, you can't even get budget to travel to Scranton, let alone Zurich...)

    That's all for today!

    Wednesday, March 14, 2007

    IDUG News

    A lot of new stuff has been going on at the International DB2 User's Group (IDUG) the past few months, so I thought I'd write a quick blog entry to update folks about what's new.

    First of all, this year's North American IDUG conference will be held the week of May 6 thru 10, 2007, in San Jose, CA. As always, the conference offers a fantastic learning and networking opportunity for DB2 and Informix users with hundreds of technical sessions, as well as 16 full-day pre-conference seminars, the opportunity to take certification exams for free, numerous special interest groups, and a great vendor exhibition. Also, this year's event is the first to offer IMS content, too - so if you are an IMS DBA or technician now is the time to add IDUG to your educational opportunities!

    In other IDUG news, the group is expanding its conference coverage to India. The 2007 IDUG India Forum, taking place May 31 – June 2, 2007 in Bangalore, India, is geared towards professional application developers and DBAs. This three-day event will present content that will appeal to new users as well as experienced professionals. So if you are in India, IDUG is coming your way!

    Finally, if you haven't visited the IDUG web site in a while, now is the time to check it out again. IDUG volunteers have put a lot of effort into revamping the site and it looks great.

    That's it for now... check in again soon!

    Tuesday, March 06, 2007

    DB2 9 for z/OS General Availability Announced

    Today, March 6, 2007, IBM announced the general availability of the next version of DB2 for z/OS, DB2 9, for March 16, 2007. The full announcement can be read here.

    You can read a high-level overview of DB2 9 for z/OS in my DB2 Magazine article titled DB2 9 for z/OS Roars to Life.

    Wednesday, February 28, 2007

    QMF vs. SPUFI

    As regular readers of my blog know, I sometimes use the blog to answer questions I get via e-mail. This is one of those times...

    The question I received is this:
    Can you bring out the major differences between QMF and SPUFI?

    Here is my response:
    The biggest difference between QMF and SPUFI is that QMF is a query/reporting environment with the ability to format reports. SPUFI is just a quick and dirty SQL execution engine.

    (Here is a trivia question: what does the acronym SPUFI stand for? The answer is provided at the bottom, so page down if you want to know...)

    If you need to produce nice-looking reports, enable user input to a query, or store your queries and reports for future usage, QMF is a much better technology for doing so. QMF also offers data formatting and translation capabilities that are difficult (sometimes impossible) to accomplish using SPUFI and SQL alone.

    A typical end-user might have difficulty using SPUFI because it requires using data sets to store your SQL statements and pass them to DB2. The results are also delivered to another data set. Most end users are not comfortable managing and manipulating mainframe data sets. QMF, on the other hand, stores its queries in tables and hides this fact from the user with a nice interface for saving and recalling SQL queries (and results).

    Keep in mind, though, that SPUFI comes for free with DB2 whereas QMF is an add-on product and costs money. Not every DB2 customer will have QMF, whereas every DB2 (mainframe) customer will have SPUFI.



    OK, now, what does SPUFI stand for?

    The answer: SQL Processor Using File Input.

    Tuesday, February 27, 2007

    Dynamic SQL and Performance

    The performance of dynamic SQL is one of the most widely debated DB2 issues. Some shops try to avoid it, but in this day and age of ERP/CRM systems, Java/JDBC programming, and web applications, all of which rely heavily on dynamic SQL, its use is becoming ever more common.

    Still, many shops that allow dynamic SQL try to place strict controls on its use. But as new and faster versions of DB2 are released, many of the restrictions on dynamic SQL use can be eliminated.

    I suppose that you can still find some valid reasons for prohibiting dynamic SQL. For example, you should avoid dynamic SQL when the dynamic SQL statements are just a series of static SQL statements in disguise. Consider an application that needs one of two or three different predicates for a SELECT statement that is otherwise unchanged. Coding three static SELECT statements can be more efficient than coding one dynamic SELECT with a changeable predicate. The static SQL takes more time to code but probably less time to execute.

    Another reason for avoiding dynamic SQL is that it can require more overhead to process than equivalent static SQL. Dynamic SQL incurs overhead because the cost of the dynamic bind, or PREPARE, must be added to the processing time of all dynamic SQL programs. But this overhead is not quite as costly as many people seem to think it is.

    To determine the cost of a dynamic bind, consider running some queries using SPUFI with the DB2 Performance trace turned on. Then examine the performance reports or performance monitor output to determine the elapsed and TCB time required to perform the PREPARE. The results should show elapsed times less than 1 second and subsecond TCB times. The actual time required to perform the dynamic prepare will vary with the complexity of the SQL statement. In general, the more complex the statement, the longer DB2 will take to optimize it. So be sure to test SQL statements of varying complexity.

    Of course, the times you get will vary based on your environment, the type of dynamic SQL you use, and the complexity of the statement being prepared. Complex SQL statements with many joins, table expressions, unions, and subqueries take longer to PREPARE than simple queries. However, factors such as the number of columns returned or the size of the table being accessed have little or no effect on the performance of the dynamic bind.

    And prepared dynamic SQL can be cached in the EDM pool so that the same SQL statement can reuse the mini plan for the statement the next time it runs. Of course, the statement has to be exactly the same in order to benefit from the dynamic SQL cache.
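
    To illustrate that point, consider this sketch (it uses the sample EMP table from earlier examples and an assumed LASTNAME column): statements that differ only in a literal value cannot share a cached plan, whereas a parameter marker keeps the statement text identical across executions:

    -- These two statements differ in their literal, so the second
    -- cannot reuse the cached mini plan built for the first:
    SELECT LASTNAME FROM EMP WHERE EMPNO = '000100';
    SELECT LASTNAME FROM EMP WHERE EMPNO = '000340';

    -- Coded with a parameter marker, the statement text is the same
    -- every time, so the cached mini plan can be reused:
    SELECT LASTNAME FROM EMP WHERE EMPNO = ?;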

    Keep in mind, too, that performance is not the only factor when deciding whether or not to use dynamic SQL. For example, if a dynamic SQL statement runs a little longer than a static SQL statement but saves days of programming cost then perhaps dynamic SQL is the better choice. It all depends on what is more important -- the cost of development and maintenance or squeezing out every last bit of performance at any cost.

    Overhead issues notwithstanding, there are valid performance reasons for favoring dynamic SQL, too. For example, dynamic SQL can enable better use of indexes, choosing different indexes for different SQL formulations. Properly coded, dynamic SQL can use the column distribution statistics stored in the DB2 catalog, whereas static SQL is limited in how it can use these statistics. Use of the distribution statistics can cause DB2 to choose different access paths for the same query when different values are supplied to its predicates.

    The REOPT bind parameter can be used to allow static SQL containing host variables or special registers to behave like incremental-bind statements. When dynamic reoptimization is activated, a dynamic bind similar to what is performed for dynamic SQL is performed. This means that these statements get compiled at the time of EXECUTE or OPEN instead of at BIND time. During this compilation, the access plan is chosen, based on the real values of these variables.

    Specifying REOPT ONCE causes the access plan to be cached after the first OPEN or EXECUTE request and it is used for subsequent execution of this statement. With REOPT ALWAYS, the access plan is regenerated for every OPEN and EXECUTE request, and the current set of host variable, parameter marker, and special register values is used to create this plan.

    Additionally, consider that the KEEPDYNAMIC bind option can enhance the performance of dynamic SQL. When a plan or package is bound specifying KEEPDYNAMIC(YES), the prepared statement is maintained across COMMIT points. Contrast that with KEEPDYNAMIC(NO), where only cursors using the WITH HOLD option keep the prepared statement after a COMMIT.
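
    As a sketch of how these options are specified (the collection, package, and DBRM names here are hypothetical; check the Command Reference for the full syntax), the bind subcommands look something like this:

    REBIND PACKAGE(MYCOLL.MYPKG) REOPT(ALWAYS)

    BIND PACKAGE(MYCOLL) MEMBER(MYPGM) KEEPDYNAMIC(YES)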

    Dynamic SQL usually provides the most efficient development techniques for applications with changeable requirements (for example, numerous screen-driven queries). In addition, dynamic SQL generally reduces the number of SQL statements coded in your application program, thereby reducing the size of the plan and increasing the efficient use of system memory.
    So, if you have a compelling reason to use dynamic SQL, then by all means, go ahead and code up your program to use dynamic SQL. I mean, after all, it is no longer the early, dark days of DB2 when dynamic SQL almost certainly meant performance problems. And, as I mentioned in the beginning, dynamic SQL is likely to be foisted on you in your new, more modern applications even if you continue to desperately keep it out of your COBOL programs.

    For more in-depth details on dynamic SQL, consider downloading the following IBM redbook: DB2 for z/OS and OS/390 : Squeezing the Most Out of Dynamic SQL.

    Wednesday, February 14, 2007

    A Report From SHARE

    As regular readers of this blog know, I regularly attend and speak at the SHARE user conferences held twice annually. The Spring 2007 SHARE conference is being held this week, February 12 thru 16, 2007, in Tampa, Florida. It is now Wednesday, mid-way through the week, and time for me to post a quick update of what is going on at SHARE, from a DB2 point of view.

    First of all, I’ve been privileged to deliver two presentations this week. The first is one of the perennial favorites called “A DB2 Performance Tuning Roadmap.” I’ve been delivering this presentation now, in various forms and with various changes, for about 8 years. And every time I think it is time to retire it, I hear from a user group or a conference that is interested in having me present it. And every time I get good feedback. I guess that is not so surprising; after all, performance is always one of those topics that raises interest among DBAs and many developers. This particular presentation works as an overview of the three basic components of database performance management: you need to manage and tune the application/SQL, the database objects, and the DBMS/subsystem. All three are integral to optimizing DB2 performance and the presentation walks through the basics of each of the three areas and offers guidance and suggestions.

    The second presentation I delivered is called “Change Control for DB2 Access Paths” and it is a bit newer. The premise of this presentation is that organizations need to REBIND for performance, but they also need to control change in their production environment. In the presentation I discuss the issues involved in BIND/REBIND management and then introduce how NEON Enterprise Software’s Bind ImpactExpert can provide a solution.

    OK, so much for my presentations, what else is happening DB2-wise at SHARE this week? Well, a lot of the presentations are by IBMers focusing on DB2 9 for z/OS. Even though this new version has not been officially released yet, IBM is getting the word out early so we all can understand the benefits of this new release – and also, so we can understand that it is not the goliath that V8 was (nay, let’s say is).

    Roger Miller has offered a couple of great presentations on migration and performance issues, and Willie Favero spoke about all of the great new SQL features and functionality that will be available in DB2 9. Then there was the update on the DB2 utilities and what is happening to them in DB2 9… And there is a lot of good news – to net it out, more online utilities, more availability, and less CPU.

    If you’re an IMS person, then there has been a lot of coverage of IMS Version 10 here, too. I haven’t personally attended many of the IMS sessions, but I’ve gathered handouts that cover IMS V10 dynamic resource definition, transaction manager enhancements, and database and DBRC enhancements.

    And the zNextGen program, which is spearheaded by NEON Enterprise Software's own Kristine Harper, offered an interesting program this week as well. One of the highlights was a fun presentation by Cheryl Watson, a renowned expert in the mainframe world.

    Basically, there is a lot of education going on in Tampa this week for mainframe database zealots… and I’m glad I’m here to soak it up. If you weren’t able to make it to Tampa, consider trying to get to the Fall 2007 SHARE conference in San Diego, CA this August.

    Monday, February 05, 2007

    See You at SHARE?

    Well, it is that time of the year again... that is, time for the first of two annual SHARE conferences. The "Spring 2007" SHARE will be held next week, in Tampa, FL.

    For those of you who don't know about SHARE, it is an independent, volunteer-run association for IBM customers. It is not only mainframe-focused, but it is one of the last events mainframe folks can attend to hear about mainframe topics from z/OS to JES and from CICS to IMS to DB2.

    I will be presenting on Monday (2/12) and Tuesday (2/13). On Monday I will cover a DB2 Performance Tuning Roadmap - basically presenting a high-level overview of the "things" you need to be aware of, and track, in order to assure an optimally performing DB2 environment. And then on Tuesday I'll be addressing change control for DB2 access paths. As you know, change is strictly controlled on the mainframe, but one exception has been DB2 access paths. Binds and Rebinds are typically done in production environments without the benefit of oversight or prior testing. This lack of change control results in unpredictable performance impacts. This presentation will discuss the issues and problems this causes, as well as how you can use Bind ImpactExpert to analyze and manage access path changes in your DB2 environment.

    And I'll be attending the event all week long, so I'll be sure to post a blog or two about interesting things I learn at the show... so get yourself to Tampa if you can, but if you can't, be sure to tune in during the week next week to hear about some of the highlights (at least my take on the highlights) of SHARE.

    Wednesday, January 31, 2007

    A Quick Intro to DB2 SQL Functions

    The last post I made to this blog used functions to translate data from one form to another. Functions frequently can be used to overcome issues when you are reading data from your DB2 tables. With that in mind, I thought it might make sense to go over DB2 SQL functions at a basic level - as a review for those who might not have used them before and to refresh the memory of those who have.

    Functions can be specified in SQL statements to transform data from one state to another. Two types of functions can be applied to data in a DB2 table using SQL: column functions and scalar functions. Column functions analyze a group of rows and compute a single value for a designated column or expression. For example, the SUM function can be used to add, returning the sum of the values instead of each individual value:

    SELECT SUM(SALARY)
    FROM EMP
    WHERE WORKDEPT = 'A01';

    This query will return the total salary for all employees in the A01 department. This is a better solution than writing a program to read all employee rows in department A01 and then adding the salaries. With the SUM function less data is returned to the program, so it will operate much more efficiently.

    The second type of function is a scalar function. Scalar functions are applied to a column or expression and operate on a single value. For example, the DAYOFWEEK function reads a single date value and returns an integer in the range of 1 to 7, where 1 is Sunday and 7 is Saturday:

    SELECT DAYOFWEEK(HIREDATE)
    FROM EMP
    WHERE EMPNO = '000100';

    This query will return an integer between 1 and 7 that indicates the day of the week when employee number 000100 was hired. Of course, this is just one example of many, many scalar functions supported by DB2. For a complete list of the available functions, refer to Chapter 3 of the IBM DB2 SQL Reference manual. Using scalar functions is generally more efficient than writing your own code to do the same thing.

    Speaking of writing your own code, though, you can write your own user-defined functions (UDFs) in addition to the base, system-defined functions. UDFs give you the ability to write your own customized DB2 functions and standardize on that code. Consider creating UDFs for specific algorithms and processing that need to be executed in multiple applications in your organization. By writing the code once, putting it into a DB2 UDF, and then calling it whenever you need it, you will minimize errors and improve reusability.

    A user-defined function can be specified anywhere a system-defined function can be specified.
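
    For example, here is a minimal sketch of a SQL scalar UDF (the function name, logic, and the WEATHER_READINGS table are hypothetical; see the CREATE FUNCTION statement in the SQL Reference for the full syntax your version requires):

    CREATE FUNCTION CTOF (CELSIUS DECIMAL(5,2))
      RETURNS DECIMAL(5,2)
      LANGUAGE SQL
      RETURN (CELSIUS * 9 / 5) + 32;

    SELECT CTOF(TEMP_C)
    FROM WEATHER_READINGS;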

    Sunday, January 28, 2007

    Sorting Out a Solution in DB2

    Sometimes an application can require different, perhaps even somewhat "odd" data sorting. These needs may cause developers to sit and scratch their head for hours trying to make DB2 do something that seems "unnatural." But often you can conjure up an answer by properly understanding the problem and applying some creative SQL.

    Some of you might be asking "What the heck is he talking about?" That's a fair question, so let’s look at an example to clarify.

    Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing the name of the day on which the transaction happened; let’s call this column DAY_NAME.

    Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

    Well, if we write the first query that comes to mind the results will obviously be sorted improperly:

    SELECT DAY_NAME, COL1, COL2 . . .
    FROM TXN_TABLE
    ORDER BY DAY_NAME;

    The results from this query would be ordered alphabetically; in other words

    FRI
    MON
    SAT
    SUN
    THU
    TUE
    WED

    This is what I mean by an irregular sorting requirement. The example may not be an everyday need, but it is not unrealistic for a business to have this, or a similar, requirement that needs a sort order other than strictly alphabetical or numeric. So what is the solution here?

    Of course, one solution would be to design the table with an additional numeric or alphabetic column that would sort properly. By this I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. But this requires a database design change, and it becomes possible for the DAY_NUM and DAY_NAME to get out of sync.

    There is another solution that is both elegant and does not require any change to the database. To implement this solution all you need is an understanding of SQL and SQL functions – in this case, the LOCATE function. Here is the SQL:

    SELECT DAY_NAME, COL1, COL2 . . .
    FROM TXN_TABLE
    ORDER BY LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT');


    The trick here is to understand how the LOCATE function works: it returns the starting position of the first occurrence of one string within another string.

    So, in our example SQL, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position. So, if DAY_NAME is WED, the LOCATE function in the above SQL statement returns 10. Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

    (Note: Other database systems have a similar function called INSTR.)

    Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function to keep only the integer portion of the result, and then add one:

    INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT')/3) + 1;

    Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week.
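
    Putting it all together in a query against the same table:

    SELECT DAY_NAME,
           INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT')/3) + 1 AS DAY_NUM
    FROM TXN_TABLE;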

    With a sound understanding of the features of DB2 SQL, many requirements that seem "odd" at first glance are achievable using nothing more than SQL and your imagination.