The Db2 Portal Blog

Thursday, May 24, 2007

LOB Enhancements [DB2 9 for z/OS]

IBM focused their attention on improving DB2’s ability to store and manage LOB data in Version 9. As anyone who has tried to use LOBs in a previous version of DB2 knows, the usability limitations were troublesome. But with Version 9, IBM chips away at some of the more annoying LOB limitations.

FETCHing LOBs

Prior to Version 9, there were two methods you could deploy in your programs to fetch LOB data:

Fetching data into a pre-allocated buffer
Using a LOB locator to retrieve a handle on the data.

Both methods have their issues. Fetching data into a preallocated buffer can cause virtual storage constraint problems, especially for larger LOBs. On the other hand, using LOB locators that commit infrequently or do not explicitly free the locators can use considerable amounts of DB2 resources.

V9 introduces a new clause, WITH CONTINUE, for use on your FETCH statements. By coding your program to use WITH CONTINUE you can retrieve LOB columns in multiple pieces without using a LOB locator, and continue a FETCH operation to retrieve the remaining LOB data when truncation occurs. (Note: this method can be used with XML data, too.) You will have to manage the buffers and reassemble the pieces of data in your application program.

So, by specifying WITH CONTINUE on your FETCH statement you tell DB2 to allow subsequent FETCH CURRENT CONTINUE operations. These will allow you to access the remaining truncated LOB (or XML) column after the initial FETCH. If truncation occurs, DB2 will remember the truncation position and will not discard the remaining data. DB2 will return the total length that would have been required to hold all of the data of the LOB or XML column. This will either be in the first four bytes of the LOB host variable structure or in the 4 byte area that is pointed to by the SQLDATALEN pointer in the SQLVAR entry of the SQLDA for that host variable.

File Reference Variables

DB2 V9 adds support for a LOB file reference variable. A file reference variable is a host variable that is defined in a host language to contain the file name that directs file input and output for a large object (LOB).

Using file reference variables, large LOB values can be inserted from a file or selected into a file rather than a host variable. This means that your application program does not need to acquire storage to contain the LOB value. File reference variables also enable you to move LOB values from the DBMS to a client application or from a client application to a database server without going through the working storage of the client application.

LOBs and Utilities

The manner in which DB2 handles LOBs in utility processing has also been improved in DB2 V9.

Loading and unloading LOBs has been improved. For LOAD, an input field value can contain the name of the file that contains a LOB column value. The LOB column value will then be loaded from that file. For UNLOAD, you can store the value of a LOB column in a file and record the name of the file in the unloaded record in the base table.

What about REORG? Well, prior to V9 you could not access LOB data during a REORG. And a REORG did not reclaim physical space from the LOB data set because LOBs were moved within the existing LOB table space. V9 fixes these problems. During a REORG (in V9), the original LOB table space is drained of writers. All LOBs are then extracted from the original data set and inserted into a shadow data set. When this operation is complete, all access to the LOB table space is stopped (the readers are drained) while the original data set is switched with the shadow data set. At this point, full access to the new data set is enabled, and an inline copy is taken to ensure recoverability of data.

Additionally, both CHECK LOB and CHECK DATA have SHRLEVEL REFERENCE and SHRLEVEL CHANGE options. Using these you can minimize downtime. For LOBs, CHECK DATA checks for consistency between a base table space and the corresponding LOB or XML table spaces. The CHECK LOB utility identifies structural defects in the LOB table space and any invalid LOB values. Running CHECK (DATA or LOB) with SHRLEVEL REFERENCE indicates that applications can read from but cannot write to the index, table space, or partition that is to be checked. SHRLEVEL CHANGE means that applications can read from and write to the index, table space, or partition that is to be checked.

Performance

Finally, DB2 V9 provides several performance improvements related to LOBs. Using file reference variables or WITH CONTINUE to “chunk” read LOBs can improve performance over using locators.

And as we all know, removing locks can improve performance. DB2 V9 eliminates LOB locking for space allocation, as well as for insert, delete, update, and select operations. Additionally, a LOB lock is no longer required to serialize the consistency between the value of the LOB and the column of the base row for an uncommitted read operation.

Monday, May 21, 2007

New OLAP Capabilities [DB2 9 for z/OS]

DB2 9 for z/OS offers several new SQL improvements focused on improving OLAP functionality. The OLAP functions provide the ability to return ranking, row numbering, and existing aggregate function information as a scalar value in the result of a query. You can include OLAP specifications in an expression, in a select-list, or in the ORDER BY clause of a select-statement.

The result to which the OLAP specification is applied is the result table of the innermost subselect that includes the OLAP specification. OK, at this point you might well be asking, “So, what are these new OLAP things and what can they do for me?” Let’s take them one at a time.

First up, we have the RANK and DENSE_RANK functions. These functions specify that the ordinal rank of a row within the specified window is computed. Rows that are not distinct with respect to the ordering within the specified window are assigned the same rank. Here is a quick example:


SELECT
   EMPNO,
   LASTNAME,
   FIRSTNAME,
   SALARY+BONUS+COMM AS TOTAL_COMP,
   RANK() OVER(ORDER BY SALARY+BONUS+COMM DESC) AS RANK_COMP
FROM EMP
WHERE SALARY+BONUS+COMM > 30000
ORDER BY LASTNAME;

This query will rank employees who have total compensation greater than $30,000, but order the results by last name. This allows you to rank data differently than the order in which it is presented.

You can define the results of ranking with gaps in the sequential rank numbering by using the RANK specification, or without gaps, by using the DENSE_RANK specification.

The difference between the two can be a bit difficult to decipher at first. Think of it this way: RANK specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row. So, if two or more rows are not distinct you will get gaps in the ranking. With DENSE_RANK the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering. In this case there will be no gaps in the sequential rank numbering. Consider the following data:


EMPNO  LASTNAME   FIRSTNAME  SALARY  BONUS   COMM
-----  --------   ---------  ------  ------ ------
100     MULLINS   CRAIG      500000  100000 400000
200     SHAW      DONNA       25000   10000      0
300     ABBOTT    RANDY      700000  300000      0
400     WISER     BUD         10000       0      0
500     GREEN     RACHEL      40000    2000   5000

The results of the previous query run against this data would look like this:


300    ABBOTT     RANDY     1000000    1
500    GREEN      RACHEL      47000    3
100    MULLINS    CRAIG     1000000    1
200    SHAW       DONNA       35000    4
400    WISER      BUD         10000    5

Note that both ABBOTT and MULLINS earn the most, but the amount is the same, so they share the number one ranking. As this is not a dense rank, the next rank value is 3, and not 2.

The next OLAP specification introduced by DB2 9 for z/OS is ROW_NUMBER. ROW_NUMBER specifies that a sequential row number is computed for the row that is defined by the ordering, starting with 1 for the first row. If the ORDER BY clause is not specified in the window, the row numbers are assigned to the rows in an arbitrary order, as the rows are returned. This satisfies an often-requested capability to simply assign a number to the result rows of a query. Row numbers also enable easy formulation of queries for computing histogram statistics and they enable formation of other OLAP specifications (for example, moving sums, moving averages, and so on).

Here is a sample query using ROW_NUMBER:


SELECT
   ROW_NUMBER() OVER(ORDER BY WORKDEPT, LASTNAME) AS NUMBER,
   LASTNAME,
   SALARY
FROM EMP
ORDER BY WORKDEPT, LASTNAME;

The result of a RANK, DENSE_RANK, and ROW_NUMBER specification is BIGINT, and the result cannot be null.

One more thing to consider for the new OLAP features is the ability to partition results for the OLAP specification. This is specified using the PARTITION BY clause with a partitioning-expression, which is an expression used to define the partitioning of the result table. Each column name that is referenced in a partitioning-expression must unambiguously reference a column of the result table of the subselect that contains the OLAP specification. A partitioning-expression cannot include a scalar-fullselect or any function that is not deterministic or has an external action.

Here is an SQL example using partitioning:


SELECT
   WORKDEPT,
   EMPNO,
   LASTNAME,
   FIRSTNAME,
   EDLEVEL,
   DENSE RANK() OVER
    (PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL
FROM EMP
ORDER BY WORKDEPT, LASTNAME;

This SQL ranks departments according to their education level. And because DENSE_RANK is specified multiple employees with the same rank in the department will not increase the next ranking.

Sometimes RANK, DENSE_RANK, and ROW_NUMBER are called window functions. An OLAP specification is not valid in a WHERE, VALUES, GROUP BY, HAVING, or SET clause. An OLAP specification cannot be used as an argument of an aggregate function. When invoking an OLAP specification, a window is specified that defines the rows and order of the rows over which the function is applied.

Tuesday, May 15, 2007

Implicitly Hidden Columns [DB2 9 for z/OS]

Another nice new feature deep in the bowels of DB2 9 for z/OS is the ability to hide columns from the SELECT * statement. As far back as anyone can remember the advice has been given to avoid using SELECT * in application programs. But I still see it every now and then.

Now don't get me wrong, SELECT * is a nice shorthand when you are writing quick & dirty SQL using SPUFI or some other ad hoc SQL tool. But it does not belong in your application programs because a subsequent ALTER to add a column will cause the program to fail because there are now more columns being returned than the programmer coded into the program.

"OK," you may be asking, "so what? I thought you were writing about DB2 9 here?" Fair enough. DB2 9 for z/OS adds the ability to code IMPLICITLY HIDDEN on the column specification of a CREATE or ALTER TABLE statement. By coding IMPLICITYLY HIDDEN, the column will not be visible in the result of a SQL statement unless you explicitly refer to the column by name. So, SELECT * will not return any implicitly hidden columns to the result set.

For example, in the following table C2 is implicitly hidden and will not be returned as the result of SELECT *:

CREATE TABLE T1
(C1 SMALLINT NOT NULL,
C2 CHAR(10) IMPLICITLY HIDDEN,
C3 TIMESTAMP)
IN DB.TS;

This has some obvious beneifts. First of all, if you are one of those shops where programmers did not follow the no SELECT * in programs rule, then you can simply add every new column with the IMPLICITLY HIDDEN attribute and those SELECT * statements will keep on running because they won't see the new columns.

Or, you might want to take a more comprehensive approach, and specify every column (except one, perhaps the key) of every new (or modified) table as IMPLICITLY HIDDEN. If every column except the key is hidden, then a SELECT * won't return anything except the key - you'll have to explicitly specify all of those other columns to get them into your result sets. Of course, this negates the ability to use SELECT * for quick & dirty SPUFI queries because implicitly hidden columns will be hidden there, too.

There are a few caveats on the usage of IMPLICITLY HIDDEN. You cannot specify IMPLICITLY HIDDEN for a column that is defined as a ROWID, or a distinct type that is based on a ROWID. Additionally, IMPLICITLY HIDDEN must not be specified for all columns of a table.

Monday, May 14, 2007

Index on Expressions [DB2 9 for z/OS]

DB2 9 for z/OS offers, for the first time, the ability to create an index on data that is not technically in the table. At this point, you may well be asking “What does that mean?” and “Why would I do that?” I’ll attempt to answer both of those questions.

Basically, the neat new feature is an extension to the CREATE INDEX statement that lets you create an index on an expression. That begs the question of just what is an expression… Well, an expression can be as simple as a column reference, or it can be a built-in function invocation or even a more general expression, with certain restrictions.

The expression that is being indexed is referred to as the key-expression. The maximum length of the text string of each key-expression is 4000 bytes after conversion to UTF-8. Each key-expression must contain as least one reference to a column of the table named in the ON clause of the index. Referenced columns cannot be LOB, XML, or DECFLOAT data types nor can they be a distinct type that is based on one of those data types. Referenced columns cannot include any FIELDPROCs nor can they include a SECURITY LABEL. Furthermore, the key-expression cannot include any of the following:

subquery
aggregate function
not deterministic function
function that has an external action
user-defined function
sequence reference
host variable
parameter marker
special register
CASE expression
OLAP specification

Unlike a simple index, the index key of an index on expression is composed by concatenating the result (also known as a key-target) of the expression that is specified in the ON clause. An index that is created on an expression lets a query take advantage of index access (if the index is chosen by the optimizer) and avoid a table space scan.

Perhaps an example would help to clarify this topic. Consider this sample DDL:

CREATE INDEX XUPLN
ON EMP
(UPPER(LAST_NAME, 'En_US'))
USING STOGROUP DSN8G910
PRIQTY 360 SECQTY 36
ERASE NO
COPY YES;
This example would create an index on the LAST_NAME column of the EMP table, but would index on the data after applying the UPPER function. This is useful if you store the data in mixed case but submit queries with parameter markers (or host variables) in upper case only. By indexing the data as upper case the index better matches your queries.

Query performance can be enhanced if the optimizer chooses that index. When you use an index on an expression, the results of the expressions are evaluated during insertion time or during an index rebuild and are kept in the index. If the optimizer chooses to use that index, the predicate is evaluated against the values that are stored in the index. As a result, run-time performance overhead is eliminated.

This is a nice new feature of DB2 9 for z/OS that you should consider if you have queries that search on expressions.

Monday, April 30, 2007

Database Definition on Demand [DB2 9 for z/OS]

As you probably know, online schema evolution (sometimes referred to as “online schema change”) was one of the key new features of DB2 V8. But, as its name implies, its capabilities continue to evolve. With V9, online schema evolution expands to simplify more types of database definition changes. The new term IBM is using for this in V9 is Database Definition On Demand (DDOD).

One of the nice new components provided by DDOD in V9 is that online table space reorganization is significantly improved. Today, when reorganizing just a couple of partitions in a partitioned table space the BUILD2 phase takes a long time to complete. V8 removed the outage for DPSIs, and now V9 removes the BUILD2 phase for all types of secondary indexes.

Another new DDOD capability supports replacing one table quickly with another. This is accomplished via cloning and the technique can even avoid the need to REBIND packages. Cloning allows you to generate, in the same table space, a new table with the same structure as the original table. After creating the clone you can do with it what you want – LOAD, DELETE, INSERT, UPDATE, etc. data – and then exchange the clone table name with the current table name. In this way you can keep the existing table operational while you work on the next “generation” of that table in its clone. When the clone is ready, you EXCHANGE it with the existing table. Nice, huh?

How about being able to rename a column within your table or rename an index? V9 provides the ability to do both! No longer do you have to DROP and re-CREATE in order to rename columns and indexes.

DB2 V9 also introduces a new type of table space that combines the attributes of segmented and partitioned. It is called a universal table space. Universal table spaces offer improved space management for variable length rows. This is so because it uses the space map page with more free space information like segmented table spaces. Also, like segmented table spaces, universal table spaces deliver improved mass delete performance and you can immediately reuse the table segments after the mass delete.

There are two types of universal table spaces:

Partition-by-growth: this type of universal table space will partition the data as it grows without the need to specify key ranges. This type of universal table space is beneficial for tables that will grow over time and/or need the additional limits afforded by partitioning, but can benefit from the performance of segmented. You can define more than one table in this type of universal table space if you wish.
Range-partitioned: this type of universal table space requires a key range for partitioning – and it can contain only a single table. This is basically adding segmentation to the existing partitioned table space.

Additionally, you can define SMS constructs (MGMTCLAS, DATACLASS, and STORCLAS) on a STOGROUP and you can ALTER those constructs as well. And table space and index logging parameters can be altered.

DB2 V9 even adds a new capability to change the DB2 early code without requiring an IPL.

So, with DB2 9 for z/OS we get more flexibility in modifying our database schemas. And that is a good thing, right?