Friday, June 29, 2007

Implicitly Created Database Objects [DB2 9 for z/OS]

Today we continue our series on new features in DB2 V9 with a quick discussion of implicitly created database objects. To understand what this is, let’s first review the way DB2 works today (pre-V9). If you issue a CREATE TABLE statement and do not specify the database and table space into which the table is to be created, DB2 will automagically create a new table space in the default database (DSNDB04).

Not being content with that, DB2 9 extends this capability with the ability to implicitly create additional types of database objects. By coding your CREATE TABLE statement with the proper options you can implicitly create any or all of the following:
  • Database
  • Table space
  • Index to enforce Primary Key uniqueness
  • Index to enforce unique constraint
  • ROWID index (if the ROWID is defined as GENERATED BY DEFAULT)
  • LOB structures (LOB table space, auxiliary table, auxiliary index)

  • OK, so how does this happen? Let’s go down the list. If you fail to specify the IN clause on a CREATE TABLE, DB2 works a bit differently. In the past, DB2 would simply create an implicit table space in DSNDB04. As of DB2 9, the database is involved as well. DB2 will either create an implicit database or use a previously implicitly created database. The names of these implicitly created databases will range from DSN00001 to DSN60000. The first time, DB2 will create DSN00001, the second DSN00002, and so on until we reach DSN60000. The next time, DB2 will wrap around and start again from the beginning, using existing implicitly created databases. For the implicitly created databases, the STOGROUP will be set to SYSDEFLT; buffer pool values are determined via DSNZPARMs.

    Next up is the table space. Although DB2 has supported implicitly created table spaces forever, there are some twists in DB2 9. First of all, you cannot create simple table spaces any longer, so all implicitly created table spaces will be segmented. In compatibility mode (CM), a implicitly created table spaces will be defined as SEGSIZE 4 and LOCKSIZE ROW. After migrating to new function mode (NFM) your implicitly created table spaces will be created as partition by growth table spaces. The options uses will be SEGSIZE 4, DSSIZE 4G, MAXPARTITIONS 256, LOCKSIZE ROW, and LOCKMAX SYSTEM.

    As for the rest of the objects in the list, these system-required objects will always be implicitly created if the table space is created implicitly. For indexes that support the primary key or unique constraints, the names will be generated using a combination of the table name and randomly generated characters.

    OK, so now that you know about the ability of DB2 9 to implicitly create objects, let me give you some advice. Whenever possible, don’t rely on it. It is much better, if at all feasible, for your DBAs to explicitly create and name all database objects as needed. Yes, it takes more time, but it gives you more control. You can explicitly decide which objects go into which database; you can explicitly set paramters; etc.

    So, this is a nice new feature and it can enable DB2 to do some definitional things for you automatically. But most DBAs will want to continue to do things the traditional way, that is, building their DDL themselves without relying on implicitly

    Monday, June 18, 2007

    Skipping Locked Rows [DB2 9 for z/OS]

    In DB2 9 it is possible for a transaction to skip over rows that are locked. This can be accomplished by means of the SKIP LOCKED DATA option within your SQL statement(s). SKIP LOCKED DATA can be specified in SELECT, SELECT INTO, and PREPARE, as well as searched UPDATE and DELETE statements. You can also use the SKIP LOCKED DATA option with the UNLOAD utility.

    Of course, if a program skips over locked data then that data is not accessed and the program will not have it available. When this option is used DB2 will just skip over any locked data instead of waiting for it to be unlocked. The benefit, of course, is improved performance because you will not incur any lock wait time. But it comes at the cost of not accessing the locked data at all. This means that you should only utilize this clause when your program can tolerate skipping over some data.

    The SKIP LOCKED DATA option is compatible with cursor stability (CS) isolation and read stability (RS) isolation. But it cannot be used with uncommitted read (UR) or repeatable read (RR) isolation levels. DB2 will simply ignore the SKIP LOCKED DATA clause under UR and RR isolation levels.

    Additionally, SKIP LOCKED DATA works only with row locks and page locks. That means that SKIP LOCKED DATA does not apply to table, partition, LOB, XML, or table space locks.
    Let's look at an example. Suppose we have a table with 5 rows in it that looks like this:

    KEY FNAME LNAME
    --- ------ -------
    1 JOE MAMA
    2 DON KNOTTS
    3 KIM PORTANT
    4 BOB NOBBLE
    5 KIM BIMBO


    Assume row level locking. Next assume that an UPDATE statement is run against the table changing FNAME to JIM WHERE LNAME = 'KIM'. And it is hanging out there without a COMMIT. Next, we run:


    SELECT COUNT (*)
    FROM TABLE
    WHERE FNAME >= ’AAA’
    SKIP LOCKED DATA;

    The count returned would be 3 because DB2 skips the two locked rows (rows 3 and 5). And, of course, if the locks are released the count would be 5 again.

    Wednesday, June 13, 2007

    Do You Want to Ignore Clustering? [DB2 9 for z/OS]

    DB2 9 for z/OS offers a new DDL parameter for your tables: APPEND. If you specify APPEND NO, which is the default, DB2 will operate as you are accustomed to it operating. That is, when rows are inserted or loaded DB2 will attempt to sequence them based on the clustering index key.

    If you specify APPEND YES though, DB2 will ignore clustering during inserts and online LOAD processing. Instead of attempting to maintain clustering, DB2 will just append the rows at the end of the table or partition. If the table space is partition-by-growth (new DB2 9 feature) then DB2 can use any partition with space available at the end; for range-partitioned table spaces, obviously DB2 will append the data to the end of the partition corresponding to the range for the value(s) being inserted.

    You might want to choose this option to speed up the addition of new data. Appending data is faster because DB2 does not have to search for the proper place to maintain clustering. And you can always re-cluster the table by running a REORG.

    The APPEND option cannot be specified on LOB tables, XML tables, or tables in work files.

    To track the state of the APPEND option there is a new column, APPEND, in the DB2 Catalog in SYSTABLES. Its value will be either ‘Y’ or ‘N’.

    Monday, June 04, 2007

    Reordered Row Format [DB2 9 for z/OS]

    If you’ve worked with DB2 for awhile, especially as a DBA, you’ve probably heard the advice to re-arrange the columns of your tables to optimize logging efficiency. Basically, the more data that DB2 has to log, the more overhead your programs will incur, and performance will degrade. DB2 will log data from the first byte changed to the last byte changed – unless the row is variable, in which case DB2 will log from the first byte changed to the end of the row – unless the change does not cause the length of the variable row to change, in which case DB2 goes back to logging from the first byte changed to the last byte changed.

    So, the advice goes something like this: put you static columns (those that do not change frequently) at the beginning of the row and your dynamic columns (those that will change more frequently) at the end of the row. And put your variable columns at the end of each. This would make your row look something like this:

    [Static fixed-length cols]
    [Static variable cols]
    [Dynamic fixed-length cols]
    [Dynamic variable cols]

    Make sense?

    Well, DB2 9 for z/OS takes this advice to heart (sort of). In New Function Mode (NFM), for new table spaces, DB2 will automatically put the variable columns at the end of the row. This is called reordered row format (RRF); the row format we are all familiar with today is now referred to as basic row format (BRF). This is all how the row is stored – it does not mean that your DDL is changed nor does it require changes to anything external or how you access the rows.

    To summarize, this means that a row in RRF will store the fixed-length columns first and the variable columns at the end. Pointers within the row will point to the beginning of the variable columns.

    So far so good, right? Well, we DB2 will also convert our old table spaces to RRF over time. Once we are in DB2 9 NFM, a REORG or a LOAD REPLACE will cause a change from BRF to RRF. So run a LOAD REPLACE a table space in NFM and the row format changes. REORG a partition and the row format for that partition changes. And yes, you can have a partitioned table space with some partitions in BRF and some in RRF.

    With BRF we can be sure that DB2 is putting our variable columns at the end of the row – where they belong. But it still is not helping us with placing static columns before the dynamic ones. You’ll still have to guide DB2 to do that.

    Tuesday, May 29, 2007

    INSTEAD OF Triggers [DB2 9 for z/OS]

    DB2 9 for z/OS introduces a new type of trigger: the INSTEAD OF trigger. The primary usage of INSTEAD OF triggers is to enable views that would not otherwise be updatable to support updates. Typically, a view that consists of multiple base tables cannot be updated. But with an INSTEAD OF trigger this problem can be surmounted. You can code an INSTEAD OF trigger to direct inserts, updates and deletes to the appropriate underlying tables of the view.


    With the INSTEAD OF trigger, your application code does not have to include complex algorithms to specify which operations should be performed against views and which should be performed against base table. Instead, all actions are performed against the view and the activated trigger determines which underlying base tables are to be impacted.


    So, you might choose to code an INSTEAD OF trigger on a view over a join to allow modifications on the view to go through to the underlying base tables joined in that view. Or you can encode and decode data within a view: for example, the view could contain the decryption functions while the INSTEAD OF triggers use the encryption functions to ensure security.


    Only one INSTEAD OF trigger is allowed for each type of operation on a given subject view. That is, one for inserts, one for updates, and one for deletes. Therefore, you can have a grand total of three INSTEAD OF triggers per view.


    DB2 executes the triggered-action instead of the insert, update, or delete operation on the subject view. Neither the WHEN clause nor the FOR EACH STATEMENT clause are allowed in INSTEAD OF triggers.


    Furthermore, there are some restrictions on the view in order for an INSTEAD OF trigger to be allowed. First of all, the view must exist at the current server when the INSTEAD OF trigger is created. Additionally, none of the following are permitted for a view to have an INSTEAD OF trigger:

    • the WITH CASCADED CHECK option
    • a view on which a symmetric view has been defined
    • a view that references data encoded with different encoding schemes or CCSID values
    • a view with a ROWID, LOB, or XML column (or a distinct type that is defined as one of these types)
    • a view with a column based on an underlying column defined as an identity column, security label column, or a row change timestamp column
    • a with a column that is defined (directly or indirectly) as an expression
    • a view with a column that is based on a column of a result table that involves a set operator
    • a view with any columns that have field procedures
    • a view where all of the underlying base tables are DB2 Catalog tables or created global temporary tables
    • a view that has other views dependent on it

    One way to think of INSTEAD OF triggers is that they contain the inverse of the logic in the body of the view. If the view joins tables, the trigger should break the join apart to modify the correct data. If the view decrypts columns, the INSTEAD OF trigger should encrypt the columns. Etc.


    Let’s take a look at an example to better understand the INSTEAD OF trigger. First, we create a view that joins the EMP and DEPT tables:



    CREATE VIEW EMP_DEPT (EMPNO, FIRSTNME, MIDINIT, LASTNAME,
    PHONENO, HIREDATE, DEPTNAME)
    AS SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, PHONENO,
    HIREDATE, DEPTNAME
    FROM EMP, DEPT
    WHERE EMP.WORKDEPT = DEPT.DEPTNO;



    OK, so far, so good. But since this view is a join it is not updateable. Let’s fix this by coding up some INSTEAD OF triggers. First, we’ll take care of inserts:




    CREATE TRIGGER E_D_ISRT
    INSTEAD OF INSERT ON EMP_DEPT
    REFERENCING NEW AS NEWEMP
    FOR EACH ROW INSERT INTO EMPLOYEE
    (EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
    VALUES
    (EMPNO, FIRSTNME, MIDINIT, LASTNAME,
    COALESCE
    ((SELECT DEPTNO FROM DEPT AS D WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
    RAISE_ERROR('70001', 'Unknown dept name')
    ),
    PHONENO, HIREDATE);



    An insert against the view would not be inserting a new department, so we will be inserting data into the EMP table. If the department does not exist, we’ll raise an error. Next we’ll consider updates:




    CREATE TRIGGER E_D_UPD
    INSTEAD OF UPDATE ON EMP_DEPT
    REFERENCING NEW AS NEWEMP OLD AS OLDEMP
    FOR EACH ROW
    BEGIN ATOMIC
    VALUES(CASE WHEN NEWEMP.EMPNO = OLDEMP.EMPNO THEN 0
    ELSE RAISE_ERROR('70002', 'Must not change EMPNO') END);
    UPDATE EMP AS E SET
    (FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
    = (NEWEMP.FIRSTNME, NEWEMP.MIDINIT, NEWEMP.LASTNAME,
    COALESCE((SELECT DEPTNO FROM DEPT AS D
    WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
    RAISE_ERROR ('70001', 'Unknown dept name')),
    NEWEMP.PHONENO, NEWEMP.HIREDATE)
    WHERE NEWEMP.EMPNO = E.EMPNO;
    END








    Finally we take care of deletions:




    CREATE TRIGGER E_D_DEL
    INSTEAD OF DELETE ON EMP_DEPT
    REFERENCING OLD AS OLDEMP
    FOR EACH ROW
    DELETE FROM EMP AS E WHERE E.EMPNO = OLDEMP.EMPNO;


    Using an INSTEAD OF trigger, each requested modification operation made against the view is replaced by the trigger logic. The trigger performs the insert, update, or delete on behalf of the view. No application changes are required because the code is in the trigger which resides in the database.


    If you want to read more about INSTEAD OF triggers, I recommend this quite extensive article (albeit for DB2 LUW) out on the IBM Developer Works web site.


    Thursday, May 24, 2007

    LOB Enhancements [DB2 9 for z/OS]

    IBM focused their attention on improving DB2’s ability to store and manage LOB data in Version 9. As anyone who has tried to use LOBs in a previous version of DB2 knows, the usability limitations were troublesome. But with Version 9, IBM chips away at some of the more annoying LOB limitations.

    FETCHing LOBs

    Prior to Version 9, there were two methods you could deploy in your programs to fetch LOB data:

    • Fetching data into a pre-allocated buffer
    • Using a LOB locator to retrieve a handle on the data.

    Both methods have their issues. Fetching data into a preallocated buffer can cause virtual storage constraint problems, especially for larger LOBs. On the other hand, using LOB locators that commit infrequently or do not explicitly free the locators can use considerable amounts of DB2 resources.

    V9 introduces a new clause, WITH CONTINUE, for use on your FETCH statements. By coding your program to use WITH CONTINUE you can retrieve LOB columns in multiple pieces without using a LOB locator, and continue a FETCH operation to retrieve the remaining LOB data when truncation occurs. (Note: this method can be used with XML data, too.) You will have to manage the buffers and reassemble the pieces of data in your application program.

    So, by specifying WITH CONTINUE on your FETCH statement you tell DB2 to allow subsequent FETCH CURRENT CONTINUE operations. These will allow you to access the remaining truncated LOB (or XML) column after the initial FETCH. If truncation occurs, DB2 will remember the truncation position and will not discard the remaining data. DB2 will return the total length that would have been required to hold all of the data of the LOB or XML column. This will either be in the first four bytes of the LOB host variable structure or in the 4 byte area that is pointed to by the SQLDATALEN pointer in the SQLVAR entry of the SQLDA for that host variable.

    File Reference Variables

    DB2 V9 adds support for a LOB file reference variable. A file reference variable is a host variable that is defined in a host language to contain the file name that directs file input and output for a large object (LOB).

    Using file reference variables, large LOB values can be inserted from a file or selected into a file rather than a host variable. This means that your application program does not need to acquire storage to contain the LOB value. File reference variables also enable you to move LOB values from the DBMS to a client application or from a client application to a database server without going through the working storage of the client application.

    LOBs and Utilities

    The manner in which DB2 handles LOBs in utility processing has also been improved in DB2 V9.

    Loading and unloading LOBs has been improved. For LOAD, an input field value can contain the name of the file that contains a LOB column value. The LOB column value will then be loaded from that file. For UNLOAD, you can store the value of a LOB column in a file and record the name of the file in the unloaded record in the base table.

    What about REORG? Well, prior to V9 you could not access LOB data during a REORG. And a REORG did not reclaim physical space from the LOB data set because LOBs were moved within the existing LOB table space. V9 fixes these problems. During a REORG (in V9), the original LOB table space is drained of writers. All LOBs are then extracted from the original data set and inserted into a shadow data set. When this operation is complete, all access to the LOB table space is stopped (the readers are drained) while the original data set is switched with the shadow data set. At this point, full access to the new data set is enabled, and an inline copy is taken to ensure recoverability of data.

    Additionally, both CHECK LOB and CHECK DATA have SHRLEVEL REFERENCE and SHRLEVEL CHANGE options. Using these you can minimize downtime. For LOBs, CHECK DATA checks for consistency between a base table space and the corresponding LOB or XML table spaces. The CHECK LOB utility identifies structural defects in the LOB table space and any invalid LOB values. Running CHECK (DATA or LOB) with SHRLEVEL REFERENCE indicates that applications can read from but cannot write to the index, table space, or partition that is to be checked. SHRLEVEL CHANGE means that applications can read from and write to the index, table space, or partition that is to be checked.

    Performance

    Finally, DB2 V9 provides several performance improvements related to LOBs. Using file reference variables or WITH CONTINUE to “chunk” read LOBs can improve performance over using locators.

    And as we all know, removing locks can improve performance. DB2 V9 eliminates LOB locking for space allocation, as well as for insert, delete, update, and select operations. Additionally, a LOB lock is no longer required to serialize the consistency between the value of the LOB and the column of the base row for an uncommitted read operation.

    Monday, May 21, 2007

    New OLAP Capabilities [DB2 9 for z/OS]

    DB2 9 for z/OS offers several new SQL improvements focused on improving OLAP functionality. The OLAP functions provide the ability to return ranking, row numbering, and existing aggregate function information as a scalar value in the result of a query. You can include OLAP specifications in an expression, in a select-list, or in the ORDER BY clause of a select-statement.


    The result to which the OLAP specification is applied is the result table of the innermost subselect that includes the OLAP specification. OK, at this point you might well be asking, “So, what are these new OLAP things and what can they do for me?” Let’s take them one at a time.


    First up, we have the RANK and DENSE_RANK functions. These functions specify that the ordinal rank of a row within the specified window is computed. Rows that are not distinct with respect to the ordering within the specified window are assigned the same rank. Here is a quick example:


    SELECT
    EMPNO,
    LASTNAME,
    FIRSTNAME,
    SALARY+BONUS+COMM AS TOTAL_COMP,
    RANK() OVER(ORDER BY SALARY+BONUS+COMM DESC) AS RANK_COMP
    FROM EMP
    WHERE SALARY+BONUS+COMM > 30000
    ORDER BY LASTNAME;


    This query will rank employees who have total compensation greater than $30,000, but order the results by last name. This allows you to rank data differently than the order in which it is presented.

    You can define the results of ranking with gaps in the sequential rank numbering by using the RANK specification, or without gaps, by using the DENSE_RANK specification.

    The difference between the two can be a bit difficult to decipher at first. Think of it this way: RANK specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row. So, if two or more rows are not distinct you will get gaps in the ranking. With DENSE_RANK the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering. In this case there will be no gaps in the sequential rank numbering. Consider the following data:


    EMPNO LASTNAME FIRSTNAME SALARY BONUS COMM
    ----- -------- --------- ------ ------ ------
    100 MULLINS CRAIG 500000 100000 400000
    200 SHAW DONNA 25000 10000 0
    300 ABBOTT RANDY 700000 300000 0
    400 WISER BUD 10000 0 0
    500 GREEN RACHEL 40000 2000 5000



    The results of the previous query run against this data would look like this:


    300 ABBOTT RANDY 1000000 1
    500 GREEN RACHEL 47000 3
    100 MULLINS CRAIG 1000000 1
    200 SHAW DONNA 35000 4
    400 WISER BUD 10000 5


    Note that both ABBOTT and MULLINS earn the most, but the amount is the same, so they share the number one ranking. As this is not a dense rank, the next rank value is 3, and not 2.


    The next OLAP specification introduced by DB2 9 for z/OS is ROW_NUMBER. ROW_NUMBER specifies that a sequential row number is computed for the row that is defined by the ordering, starting with 1 for the first row. If the ORDER BY clause is not specified in the window, the row numbers are assigned to the rows in an arbitrary order, as the rows are returned. This satisfies an often-requested capability to simply assign a number to the result rows of a query. Row numbers also enable easy formulation of queries for computing histogram statistics and they enable formation of other OLAP specifications (for example, moving sums, moving averages, and so on).


    Here is a sample query using ROW_NUMBER:



    SELECT
    ROW_NUMBER() OVER(ORDER BY WORKDEPT, LASTNAME) AS NUMBER,
    LASTNAME,
    SALARY
    FROM EMP
    ORDER BY WORKDEPT, LASTNAME;


    The result of a RANK, DENSE_RANK, and ROW_NUMBER specification is BIGINT, and the result cannot be null.


    One more thing to consider for the new OLAP features is the ability to partition results for the OLAP specification. This is specified using the PARTITION BY clause with a partitioning-expression, which is an expression used to define the partitioning of the result table. Each column name that is referenced in a partitioning-expression must unambiguously reference a column of the result table of the subselect that contains the OLAP specification. A partitioning-expression cannot include a scalar-fullselect or any function that is not deterministic or has an external action.


    Here is an SQL example using partitioning:



    SELECT
    WORKDEPT,
    EMPNO,
    LASTNAME,
    FIRSTNAME,
    EDLEVEL,
    DENSE RANK() OVER
    (PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL
    FROM EMP
    ORDER BY WORKDEPT, LASTNAME;


    This SQL ranks departments according to their education level. And because DENSE_RANK is specified multiple employees with the same rank in the department will not increase the next ranking.


    Sometimes RANK, DENSE_RANK, and ROW_NUMBER are called window functions. An OLAP specification is not valid in a WHERE, VALUES, GROUP BY, HAVING, or SET clause. An OLAP specification cannot be used as an argument of an aggregate function. When invoking an OLAP specification, a window is specified that defines the rows and order of the rows over which the function is applied.

    Tuesday, May 15, 2007

    Implicitly Hidden Columns [DB2 9 for z/OS]

    Another nice new feature deep in the bowels of DB2 9 for z/OS is the ability to hide columns from the SELECT * statement. As far back as anyone can remember the advice has been given to avoid using SELECT * in application programs. But I still see it every now and then.

    Now don't get me wrong, SELECT * is a nice shorthand when you are writing quick & dirty SQL using SPUFI or some other ad hoc SQL tool. But it does not belong in your application programs because a subsequent ALTER to add a column will cause the program to fail because there are now more columns being returned than the programmer coded into the program.

    "OK," you may be asking, "so what? I thought you were writing about DB2 9 here?" Fair enough. DB2 9 for z/OS adds the ability to code IMPLICITLY HIDDEN on the column specification of a CREATE or ALTER TABLE statement. By coding IMPLICITYLY HIDDEN, the column will not be visible in the result of a SQL statement unless you explicitly refer to the column by name. So, SELECT * will not return any implicitly hidden columns to the result set.

    For example, in the following table C2 is implicitly hidden and will not be returned as the result of SELECT *:

    CREATE TABLE T1
    (C1 SMALLINT NOT NULL,
    C2 CHAR(10) IMPLICITLY HIDDEN,
    C3 TIMESTAMP)
    IN DB.TS;

    This has some obvious beneifts. First of all, if you are one of those shops where programmers did not follow the no SELECT * in programs rule, then you can simply add every new column with the IMPLICITLY HIDDEN attribute and those SELECT * statements will keep on running because they won't see the new columns.

    Or, you might want to take a more comprehensive approach, and specify every column (except one, perhaps the key) of every new (or modified) table as IMPLICITLY HIDDEN. If every column except the key is hidden, then a SELECT * won't return anything except the key - you'll have to explicitly specify all of those other columns to get them into your result sets. Of course, this negates the ability to use SELECT * for quick & dirty SPUFI queries because implicitly hidden columns will be hidden there, too.

    There are a few caveats on the usage of IMPLICITLY HIDDEN. You cannot specify IMPLICITLY HIDDEN for a column that is defined as a ROWID, or a distinct type that is based on a ROWID. Additionally, IMPLICITLY HIDDEN must not be specified for all columns of a table.

    Monday, May 14, 2007

    Index on Expressions [DB2 9 for z/OS]

    DB2 9 for z/OS offers, for the first time, the ability to create an index on data that is not technically in the table. At this point, you may well be asking “What does that mean?” and “Why would I do that?” I’ll attempt to answer both of those questions.

    Basically, the neat new feature is an extension to the CREATE INDEX statement that lets you create an index on an expression. That begs the question of just what is an expression… Well, an expression can be as simple as a column reference, or it can be a built-in function invocation or even a more general expression, with certain restrictions.

    The expression that is being indexed is referred to as the key-expression. The maximum length of the text string of each key-expression is 4000 bytes after conversion to UTF-8. Each key-expression must contain as least one reference to a column of the table named in the ON clause of the index. Referenced columns cannot be LOB, XML, or DECFLOAT data types nor can they be a distinct type that is based on one of those data types. Referenced columns cannot include any FIELDPROCs nor can they include a SECURITY LABEL. Furthermore, the key-expression cannot include any of the following:
    • subquery
    • aggregate function
    • not deterministic function
    • function that has an external action
    • user-defined function
    • sequence reference
    • host variable
    • parameter marker
    • special register
    • CASE expression
    • OLAP specification
    Unlike a simple index, the index key of an index on expression is composed by concatenating the result (also known as a key-target) of the expression that is specified in the ON clause. An index that is created on an expression lets a query take advantage of index access (if the index is chosen by the optimizer) and avoid a table space scan.

    Perhaps an example would help to clarify this topic. Consider this sample DDL:

    CREATE INDEX XUPLN
    ON EMP
    (UPPER(LAST_NAME, 'En_US'))
    USING STOGROUP DSN8G910
    PRIQTY 360 SECQTY 36
    ERASE NO
    COPY YES;

    This example would create an index on the LAST_NAME column of the EMP table, but would index on the data after applying the UPPER function. This is useful if you store the data in mixed case but submit queries with parameter markers (or host variables) in upper case only. By indexing the data as upper case the index better matches your queries.

    Query performance can be enhanced if the optimizer chooses that index. When you use an index on an expression, the results of the expressions are evaluated during insertion time or during an index rebuild and are kept in the index. If the optimizer chooses to use that index, the predicate is evaluated against the values that are stored in the index. As a result, run-time performance overhead is eliminated.

    This is a nice new feature of DB2 9 for z/OS that you should consider if you have queries that search on expressions.

    Monday, April 30, 2007

    Database Definition on Demand [DB2 9 for z/OS]

    As you probably know, online schema evolution (sometimes referred to as “online schema change”) was one of the key new features of DB2 V8. But, as its name implies, its capabilities continue to evolve. With V9, online schema evolution expands to simplify more types of database definition changes. The new term IBM is using for this in V9 is Database Definition On Demand (DDOD).

    One of the nice new components provided by DDOD in V9 is that online table space reorganization is significantly improved. Today, when reorganizing just a couple of partitions in a partitioned table space the BUILD2 phase takes a long time to complete. V8 removed the outage for DPSIs, and now V9 removes the BUILD2 phase for all types of secondary indexes.

    Another new DDOD capability supports replacing one table quickly with another. This is accomplished via cloning and the technique can even avoid the need to REBIND packages. Cloning allows you to generate, in the same table space, a new table with the same structure as the original table. After creating the clone you can do with it what you want – LOAD, DELETE, INSERT, UPDATE, etc. data – and then exchange the clone table name with the current table name. In this way you can keep the existing table operational while you work on the next “generation” of that table in its clone. When the clone is ready, you EXCHANGE it with the existing table. Nice, huh?

    How about being able to rename a column within your table or rename an index? V9 provides the ability to do both! No longer do you have to DROP and re-CREATE in order to rename columns and indexes.

    DB2 V9 also introduces a new type of table space that combines the attributes of segmented and partitioned. It is called a universal table space. Universal table spaces offer improved space management for variable length rows. This is so because it uses the space map page with more free space information like segmented table spaces. Also, like segmented table spaces, universal table spaces deliver improved mass delete performance and you can immediately reuse the table segments after the mass delete.

    There are two types of universal table spaces:

    • Partition-by-growth: this type of universal table space will partition the data as it grows without the need to specify key ranges. This type of universal table space is beneficial for tables that will grow over time and/or need the additional limits afforded by partitioning, but can benefit from the performance of segmented. You can define more than one table in this type of universal table space if you wish.
    • Range-partitioned: this type of universal table space requires a key range for partitioning – and it can contain only a single table. This is basically adding segmentation to the existing partitioned table space.

    Additionally, you can define SMS constructs (MGMTCLAS, DATACLASS, and STORCLAS) on a STOGROUP and you can ALTER those constructs as well. And table space and index logging parameters can be altered.

    DB2 V9 even adds a new capability to change the DB2 early code without requiring an IPL.

    So, with DB2 9 for z/OS we get more flexibility in modifying our database schemas. And that is a good thing, right?

    Thursday, April 19, 2007

    SELECT from DELETE, UPDATE, and MERGE [DB2 9 for z/OS]

    Another nice new SQL feature provides the ability to SELECT from DELETE, UPDATE, and MERGE statements.

    This capability is similar to the SELECT from INSERT feature that was introduced with DB2 V8. So, before looking at the new V9 feature, let’s review the V8 feature.

    The ability to SELECT from an INSERT statement is an intriguing feature. To understand why this is so, we need to review some background details. In some cases, it is possible to perform actions on an inserted row before it gets saved to disk. For example, a BEFORE TRIGGER might change data before it is even recorded to disk. But the application program will not have any knowledge of this change that is made in the trigger. Identity columns and columns with defaults (especially user-defined defaults) have similar effects. What if the program needs to know the final column values? Prior to V8 this was difficult and inefficient to implement.

    Well, the SELECT FROM INSERT feature introduced in DB2 V8 solves this problem. It allows you to both insert the row and retrieve the values of the columns with a single SQL statement. It performs very well because it performs both the INSERT and the SELECT as a single operation. Consider the following example:


    SELECT COL5 INTO :C5-HV
    FROM FINAL TABLE
    (INSERT (COL1, COL2, COL5, COL7) INTO SAMPLE_TABLE
    VALUES('JONES', 'CHARLES', CURRENT DATE, 'HOURLY')
    );

    The data is inserted as specified in the VALUES clause, and retrieved as specified in the SELECT. Without the ability to select COL5, the program would have no knowledge of the value supplied to COL5, because it was assigned using CURRENT DATE. With this new syntax the program can retrieve the CURRENT DATE value that was just inserted into COL5 without incurring additional overhead.

    OK, on to V9. In this new version you can retrieve columns from rows that are modified via DELETE, UPDATE, and MERGE statements, thereby replacing multiple SQL calls with one. DB2 V9 allows a searched UPDATE or a searched DELETE statement in the FROM clause of a SELECT statement that is a subselect, or in the SELECT INTO statement. This capability allows a user or program to know which values were updated or deleted.

    And, if you recall from a few blog entries ago, DB2 9 for z/OS also adds the MERGE statement. Well, it also adds the ability to SELECT from the MERGE. This allows you to return all the rows that were either inserted or updated as a result of the MERGE.

    Here is an example of a SELECT from an UPDATE:


    SELECT SUM(salary) INTO :SAL-HV
    FROM FINAL_TABLE
    (UPDATE EMP
    SET SALARY = SALARY * 1.02
    WHERE WORKDEPT = ‘A01’);


    Prior to the capability you would have had to run the UPDATE statement, and then only after it finishes, you would run the SELECT to add up the new salary values. Now, instead of multiple statements requiring multiple passes through the data, you can consolidate it into one.

    Nice, huh?