Tuesday, September 04, 2007

MODIFY RECOVERY [DB2 9 for z/OS]

Another IBM DB2 utility that has received some attention under Version 9 is MODIFY RECOVERY. For those not sure what this utility is, MODIFY RECOVERY is used to remove records from the SYSIBM.SYSCOPY table in the DB2 Catalog, along with related log records from the SYSIBM.SYSLGRNX directory table and related information from the DBD.

Before the new V9 changed, when you ran MODIFY RECOVERY you had to specify the deletion criteria in one of two ways:

  • Given a specific date, delete all recovery information before that data
  • Or given an age in days, delete all recovery information older than the age

But as of V9 we can alternately specify what is to be retained instead of what is to be deleted. This way, we tell the utility what we want to keep and it will remove the rest.

So, instead of coding the DELETE parameter (with a DATE or AGE) we would instead code a RETAIN parameter. The RETAIN parameter can specify one of five different options:

LAST (integer) – Using this option informs DB2 to choose a date such that the number of full image copies kept is not less than the integer specified. So, if we specify LAST(5), then we will delete all entries that exist prior to the date on which the fifth oldest copy was taken. If more than one copy was taken on that date the result could be that more than 5 copies are retained.

LOGLIMIT – Specifying this option will delete all copies where DB2 no longer has log records to recover forward from. DB2 uses the BSDS to establish the oldest archive log timestamp.

GDGLIMIT – You can use this option to enable the number of copies to be retained to be linked to the corresponding GDG base limit.

GDGLIMIT LAST (integer) – It is possible also to combine the GDGLIMIT and LAST options. DB2 will then use the GDG base limit, if the last primary copy is a GDG, if not it uses the integer specified.

GDGLIMIT LOGLIMIT – And finally, we can combine the GDGLIMIT and LOGLIMIT options, too. If the most recent primary full copy is a GDG then the GDG limit is used. If the last copy is not a GDG then the BSDS is used to determine the date prior to which deletions will occur.

The last two options (the ones that combine two parameters) are particularly useful if you have a mix of GDGs and non-GDGs for your image copies. They are also helpful if you use the LISTDEF option when you make image copy backups.

It is important to understand that for all these options MODIFY RECOVERY will cause DB2 to determine the most recent date that satisfies the retention requirement. After ascertaining the correct date to use, all entries before that data will be deleted. In effect DB2 chooses the most recent date that would satisfy the retention criteria. So, under some circumstances there may be more than one image copy on the specific date that is established, and therefore DB2 will keep those additional copies, too.

As an example, say we take 7 image copies on a single day. Then we run MODIFY RECOVERY … RETAIN LAST (3). In this case, all 7 will be retained as they were made on the same day, even though we indicated that we wanted to retain only the last 3.

Finally, there is one more significant change made to the MODIFY RECOVERY utility. Prior to V9, information was only deleted when copies were removed from SYSIBM.SYSCOPY. But as of V9, MODIFY RECOVERY will deletes SYSIBM.SYSLGRNX entries even if no SYSIBM.SYSCOPY records are deleted.

Why is this important? Well, some shops have migrated over to running BACKUP SYSTEM instead of individual image copies. In this case, there will be no individual image copies to remove in SYSIBM.SYSCOPY, but the SYSIBM.SYSLGRNX records will continue to grow. After migrating to V9 you can use MODIFY RECOVERY to trim the size of SYSIBM.SYSLGRNX which can improve performance for processes that need to access log range information.

Tuesday, August 28, 2007

TEMPLATE Switching [DB2 9 for z/OS]

Next up in this on-going series covering the new features and functionality of DB2 9 for z/OS we will tackle the improvements that have been made to the IBM DB2 utilities.

A nice new feature that impacts multiple utilities is TEMPLATE switching. For those not familiar with templating, TEMPLATE is a utility control statement that enables the allocation of data sets for a LISTDEF (LISTDEF is the way you control the list of database objects for a utility execution). With the TEMPLATE statement you essentially specify data set naming conventions and allocation information without using JCL DD statements.

OK, so what is TEMPLATE switching? What it enables you to do is to specify different characteristics for image copies of varying sizes. So, for example, you can create a different template for low and high volume image copies. Here is an example:



//SYSIN DD *
TEMPLATE low DSN &DB..&TS..IC.D&DA..T&TI.
UNIT=DASD LIMIT(50 CYL,high)

TEMPLATE high DSN &DB..&TS..IC.D&DA..T&TI.

UNIT=TAPE

COPY TABLESPACE MY.SMALLTS COPYDDN(low)

COPY TABLESPACE MY.LARGETS COPYDDN(low)



Note the new parameter named LIMIT. This is what controls when the TEMPLATE is switched. In this example we set the LIMIT for low volume at 50 cylinders. When this is reached the TEMPLATE will be switched to the high volume template. Of course, you have the flexibility to control the limit by specifying whatever number is appropriate for your site as well as specifying it in CYL, GB, or MB.

It is important to understand that DB2 can only switch the TEMPLATE once, so you cannot have more than two templates.

Template switching is available for image copies produced by COPY, COPYTOCOPY, MERGECOPY, LOAD, and REORG. And you can set up template switching for both the COPYDDN and RECOVERYDDN.

Wednesday, August 22, 2007

Optimistic Locking [DB2 9 for z/OS]

DB2 Version 9 improves support for coding optimistic locking techniques. What is optimistic locking? Sometimes referred to as optimistic concurrency control, optimistic locking is basically just what it sounds like. We are optimists and think that usually we will be the only ones with interest in the data. In other words, when optimistic locking is implemented you are assuming that most of the time there will be no other programs that are interested in the page of data that you are planning to modify.


Of course, even in the most optimistic world there will be exceptions, so optimistic locking does not assume that there will never be any concurrent processes that need to access your page(s). Basically, with optimistic locking you can improve performance by minimizing locking. So how do we do that?


When an application uses optimistic locking, locks are obtained immediately before a read operation and then released immediately. Update locks are obtained immediately before an update operation and held until the end of the transaction. Optimistic locking uses the RID (Record IDentifier) and a row change timestamp to test whether data has been changed by another transaction since the last read operation.


DB2 knows when a row was changed and so therefore he (I make DB2 masculine) can ensure data integrity even as he minimizes the duration of locks. With optimistic locking, DB2 releases the page (or row) locks immediately after a read operation. And if you are using row locks, DB2 releases the row lock after each FETCH, taking a new lock on a row only for a positioned update or a positioned delete.


Careful readers will have noticed that I talked about a “row change timestamp” but you may not have heard that expression before. DB2 V9 adds support for automatically generated timestamp columns and if you wish to implement optimistic locking you will need to create (or alter) your tables to have a row change timestamp column, defined as follows:


NOT NULL GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

or

NOT NULL GENERATED BY DEFAULT
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

For tables having a row change timestamp column, DB2 will automatically populate and maintain the timestamp values for each row. Notice how the syntax is similar to the syntax used for other automatically generated DB2 values, such as sequences. DB2 will automatically generate the timestamp value for each row when the row is inserted, and modify the timestamp for each row when any column in that row is updated.

When you add a ROW CHANGE TIMESTAMP column to an existing table, the initial value for existing rows will not be immediately populated. Instead, DB2 places the table space in an advisory-REORG pending state. When you reorganize the table space, DB2 will generates the values for the ROW CHANGE TIMESTAMP column for all rows (and, of course, remove the advisory-REORG pending status).

Now that you have this new column you can use it as a condition for making an UPDATE and specify it in your WHERE clause. Let’s walk thru a couple of examples.

First of all, when a table contains a ROW CHANGE TIMESTAMP you can use it to find out when its rows were modified. Let’s use the following table as an example:

CREATE TABLE CUSTOMER
(CUSTNO CHAR(8) NOT NULL,
CUST_INFOCHANGE NOT NULL GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP,
CUST_NAME VARCHAR(50),
CUST_ADDRESS VARCHAR(100),
CUST_CITY CHAR(20),
CUST_STATE CHAR(2),
CUST_ZIP CHAR(9),
CUST_PHONE CHAR(10),

PRIMARY KEY (CUSTNO)
);

Now that the table is defined with the ROW CHANGE TIMESTAMP we can use it in our programs and queries to determine change information about the data. For example, if we want to find all of the customer rows that were changed in the past week (ie. the last 7 days) we could run the following query:


SELECT CUSTNO, CUST_NAME

FROM CUSTOMER
WHERE ROW CHANGE TIMESTAMP FOR CUSTOMER <=
CURRENT TIMESTAMP AND
ROW CHANGE TIMESTAMP FOR CUSTOMER >=
CURRENT TIMESTAMP - 7 DAYS;

But what would happen if you issued a statement like this against a table that was altered to include a ROW CHANGE TIMESTAMP? For example, if we created the CUSTOMER table as shown but without the CUST_INFOCHANGE column, populated the table with data, and then altered the table to include the CUST_INFOCHANGE column? In this case, DB2 will use the time the page was last modified. So the results will not be exactly correct because it would return all the rows on each page that qualifies (because at least one row on the page changed). This is why it is important to clear up the advisory REORG pending as soon as possible after adding the ROW CHANGE TIMESTAMP.

OK, this is all well and good, and you can probably see the value of having this automagically changing timestamp in some of your tables, but where is the optimistic locking part? Well, for programs that use updateable static scrollable cursors DB2 can use optimistic locking as long as the plan/package is bound specifying ISOLATION(CS). If you have this situation, DB2 will deploy optimistic locking to reduce the duration of locks between consecutive FETCH operations and between fetch operations and subsequent positioned UPDATE or DELETE operations.

Remember, though, DB2 cannot use optimistic concurrency control for dynamic scrollable cursors. With dynamic scrollable cursors, the most recently fetched row or page from the base table remains locked to maintain position for a positioned UPDATE or DELETE.

Without optimistic locking, the lock taken at the first FETCH is held until the next FETCH. The lock taken at the last FETCH is held until COMMIT, ROLLBACK, or the end of transaction.

With optimistic locking, the scenario changes significantly. When the application requests a FETCH to position the cursor on a row, DB2 locks that row, executes the FETCH and releases the lock. When the application requests a positioned UPDATE or DELETE on the row, DB2 locks the row and then re-evaluates the predicate to ensure that the row still qualifies for the result table.

So, when you move to DB2 9 for z/OS you should evaluate your applications looking for programs that could take advantage of optimistic locking – and then add the ROW CHANGE TIMESTAMP to the appropriate tables.

Tuesday, August 14, 2007

DB2-L Is Back Up & Running

In case you haven't noticed yet, the DB2 mailing list (DB2-L) is back up and running as of Monday, August 13, 2007.

However, the archives are not yet available, but it looks like they will be back up soon (with no data loss). So, welcome back DB2-L - - we all missed you!

Thursday, August 09, 2007

DB2-L Is Down

Those of you who subscribe to the DB2 mailing list, also known as DB2-L, may have noticed that the usually steady stream of information, questions, and comments that used to pour into your in-box has dried up. Don't worry (well, you can worry a little bit, I s'pose), it is nothing you have done... DB2-L is not functional.

The company that hosts the list for IDUG moved the server two weeks ago and when they did they destroyed the disk drive. IDUG has been struggling to get everything rebuilt and back up since then, but it is taking longer than expected.

If you want to keep abreast of when it will be back online, status updates will be posted on IDUG's website...

Wednesday, August 08, 2007

Altering LONG VARCHAR [DB2 9 for z/OS]

Just a short entry today to discuss a feature that will be useful for organizations having tables containing long text strings which are stored as VARCHARs. It involves altering the data type of a column from LONG VARCHAR to a compatible data type.

In previous versions of DB2 before to V9 it is not possible to change the data type of a LONG VARCHAR or LONG VARGRAPHIC column using ALTER. Instead, you would have had to unload the data, drop the table, create the table using the new column definitions and reload the data back into the new table. Of course, when you drop the table you’d lose any authorizations, indexes, etc. defined on the table so you’d have to save and re-create those as well. Basically, it wasn’t easy unless you have a change management tool that automates the process.

But before going any further, what is a LONG VARCHAR? Well, a VARCHAR is a varying-length column for character string data where n specifies the maximum length of the string. If n was greater than 254, the column was setup as a long string column – that is, a LONG VARCHAR; same story for VARGRAPHIC columns.

As of DB2 V9, you will not create any more LONG VARCHAR columns – they are all treated as VARCHARs. But the old LONG VARCHAR columns, migrated from earlier releases, still exist. The COLTYPE in the DB2 Catalog (SYSIBM.SYSCOLUMNS) will be LONGVAR for LONG VARCHAR columns and LOGVARG for LONG VARGRAPHIC columns.

The good news is that V9 allows you to ALTER the data type of LONG VARCHAR to VARCHAR and LONG VARGRAPHIC to VARGRAPHIC using ALTER TABLE with the ALTER COLUMN parameter. First, you must lookup the LENGTH of the column in the SYSCOLUMNS and use that for the length of the new data type specification. For example, if LONGCOL in table EXAMPLE is a LONG VARCHAR you can ALTER it in DB2 V9 to VARCHAR. Assuming the length of the column was 500, you can issue the following ALTER to change the column to VARCHAR from LONG VARCHAR:

ALTER TABLE EXAMPLE
ALTER COLUMN LONGCOL
SET DATA TYPE VARCHAR(500);

You must specify the exact value stored in the DB2 Catalog for LENGTH in this ALTER TABLE statement. Failure to do so will result in an error message.

So, you can convert all of those LONG VARCHAR columns from past versions of DB2 to VARCHAR using the ALTER statement in V9.

Monday, August 06, 2007

Encryption [DB2 9 for z/OS]

DB2 V9 offers some encryption news, but we need to go back a version to start the story. You see, DB2 supports encryption in Version 8 through encryption functions that need to be explicitly coded in order to encrypt and decrypt data.

These functions (ENCRYPT and DECRYPT) allow you to encrypt and decrypt data at the column level. Because you can specify a different password for every row that you insert, you are encrypting data at the “cell” level in your tables. If you use these functions to encrypt your data, be sure to put some mechanism in place to manage the passwords that are used to encrypt the data. Without the password, there is absolutely no way to decrypt the data.

To assist you in remembering the password, you have an option to specify a hint (for the password) at the time you encrypt the data. The following SQL example shows an INSERT that encrypts the SSN ( social security number ) using a password and a hint:

INSERT INTO EMP (SSN)
VALUES(ENCRYPT('289-46-8832','TARZAN','? AND JANE'));

The password is “TARZAN” and the hint we’ve chosen to provide is “? AND JANE”… so the hint will prompt us to think of Tarzan as the companion of Jane.

In order to retrieve the encrypted data you will need to use the DECRYPT function supplying the correct password. This is shown in the following SELECT statement:

SELECT DECRYPT_BIT(SSN,'TARZAN') AS SSN
FROM EMP;


If we fail to supply a password, or the wrong password, the data is returned in an encrypted format that is unreadable.

The result of encrypting data using the ENCRYPT function is VARCHAR FOR BIT DATA. The encryption algorithm is an internal algorithm. For those who care to know, it uses Triple DES cipher block chaining (CBC) with padding and the 128-bit secret key is derived from the password using an MD5 hash.

When defining columns to contain encrypted data the DBA must be involved because the data storage required is significantly different. The length of the column has to include the length of the non-encrypted data + 24 bytes + the number of bytes to the next 8 byte boundary + 32 bytes for the hint.

OK, that is all V8 stuff and this series of blog postings is supposed to be about V9 functionality, right? So what about version 9? Well, DB2 9 for z/OS offers some nice improvements to encryption support. Firstly, DB2 can take advantage of encryption hardware advances.

CP Assist for Cryptographic Function, aka CPACF, is available on z990 hardware. CPACF can run on all the CPUs, but remember, this feature is available only on z990 and later machines, not the older z900. The z990 also introduces a PCIXCC card which is needed for the IBM Data Encryption Tool, but not for the DB2 encryption functions.

Note: The IBM Data Encryption Tool (available from IBM at an additional price) offers encryption for DB2 tables at the table level, whereas the encryption functions (free with DB2) offer encryption at the column level.

The CP Assist for Cryptographic Function delivers cryptographic support on every CP with Data Encryption Standard (DES), Triple DES (TDES), and Advanced Encryption Standard (AES)-128 bit data encryption/decryption, as well as Secure Hash Algorithm (SHA-1) and SHA-256 hashing. For a more detailed discussion of CPACF, associated technology and functionality, check out the following IBM redbook: IBM eServer zSeries 990 (z990) Cryptography Implementation (SG24-7070).

Basically, the net result is that the cost of encrypting DB2 data under V9 is reduced on the z990 hardware.

Additionally, IBM has added encryption support in the controllers of its storage devices.

Both the IBM TS1120 tape drive and IBM Ultrium 4 tape drives include data encryption capabilities within the drives. This support can allow you to avoid the need for host-based encryption of data or the use of specialized encryption appliances. In addition, IBM claims that the encryption does not significantly impact the performance of the drives so there should be minimal to no impact on the batch processing window when encrypting in this manner.

So far, we’ve been talking about encryption for data at rest. But DB2 9 for z/OS also improves support for encryption of data in transit. DB2 9 supports the Secure Socket Layer (SSL) protocol by implementing the z/OS Communications Server IP Application Transparent Transport Layer Security (AT-TLS) function. The z/OS V1R7 Communications Server for TCP/IP introduces the AT-TLS function in the TCP/IP stack for applications that require secure TCP/IP connections. AT-TLS performs transport layer security on behalf of the application, in this case DB2 for z/OS, by invoking the z/OS system SSL in the TCP layer of the TCP/IP stack. The z/OS system SSL provides support for TLS V1.0, SSL V3.0, and SSL V2.0 protocols.

So encryption of data over the wire is improved in z/OS 1.7. The Communications Server supports AT-TLS, which uses SSL data encryption. Now SSL encryption has been available on z/OS for a long time, but now DB2 9 for z/OS makes use of this facility and offers SSL encryption using a new secure port.

When acting as a requester, DB2 for z/OS can request a connection using the secure port of another DB2 subsystem. When acting as a server, and from within a trusted context (I’ll discuss trusted context in a later DB2portal blog entry), SSL encryption can be required for the connection.

So, little by little, better encryption support is being made available within the world of DB2 for z/OS.

Tuesday, July 24, 2007

Index Compression [DB2 9 for z/OS]

Another useful new feature debuting in V9 is the ability to compress indexes. We’ve been able to compress DB2 data in table spaces for a long time now, either through an exit routine or with the COMPRESS table space parameter (added in DB2 V3). But before V9 we’ve never been able to compress index data.

Why would you want to compress index data? Well, some types of applications require very large indexes on very large tables - - data warehousing applications are one good example. Sometimes, the storage required for indexes to support your data warehouse applications can exceed the storage required for the base table. So it makes sense that you might want to reduce the storage consumed by such indexes.

DB2 V9 introduces the COMPRESS parameter for indexes. You can specify COMPRESS YES (or NO) on your CREATE INDEX and ALTER INDEX statements. Index partitioning is done at the index level and cannot be performed on a partition by partition basis.

Additionally, DB2 will only compress the data in leaf pages, not in the root page and any non-leaf pages in between. This makes sense because you don’t want to incur the expense of decompressing all of these type of pages in your indexes in order to find the right leaf page range.

Index compression does not require a compression dictionary. As such, DB2 can begin to immediately compress data in the leaf pages of your newly created indexes.

Now think about what we’ve already learned for a minute. The data on the leaf page is compressed, but we will want to access it uncompressed, right? So index pages are stored on disk in a compressed format but will be expanded when read. So those 4K index pages on disk will require more than 4K when expanded. This means that compressed indexes must be defined in a larger buffer pool (8K, 16K, or 32K). Nevertheless, when you compress an index DB2 will always compress the data down into a 4K page size on disk no matter what page size you choose.

Another consideration to keep in mind is that index data is decompressed for index image copies, so a copy of an index will require more storage space than the actual index requires.

So, when you move to DB2 9 in NFM you have an additional compression decision to make: which indexes should be compressed and which should not? But it is a good thing to have more options at our disposal, especially for applications with huge indexing requirements.

Monday, July 16, 2007

CLONE Tables [DB2 9 for z/OS]

This new feature in DB2 V9 might sound like an old monster movie (Invasion of the Clone Tables!!!), but it is actually a nifty new capability for managing DB2 data availability. Cloning is basically a method of entirely refreshing all of the rows of a table while maintaining availability to the table.

OK, so how does it work? Basically, you will create a table with the exact same attributes as a table that already exists at the current server, except that it is empty of data. It is created using the ALTER TABLE SQL statement with the ADD CLONE parameter. This clone table is created in the same table space as the existing table. After creating the clone table you can do whatever you want to do with it. LOAD it, INSERT rows to it, etc.

The clone table will be structurally identical to the base table in every way: same number of columns, column names, data types, as well as have the same indexes, before triggers, LOB objects and so on. All of this is done auto-magically by DB2 when it the clone table is created. Authorizations and views on the base table, however, are not duplicated, or cloned.

So, the clone is created and it can be manipulated without impacting the base table. When the clone is ready to become the base table it is exchanged with the base table. This is accomplished using the new EXCHANGE SQL statement. After running an EXCHANGE the clone becomes the real table and the previously “real” table becomes the clone - - and you can repeat the process.

The clone and the base table are kept in different underlying VSAM linear data sets. We all should know that the data sets used by DB2 are named using the following format:
cat.DSNDBD.DBNAME.TSNAME.I0001.A001. Well, the clone uses the same name except for the next to last component. It will be I0002. When the exchange is made, DB2 flips the I0001 with the I0002.

Of course, there are rules and details you will need to know to properly deploy cloning in your organizations, but that is all documented in the manuals. The point of these blog entries is more to make you aware of new functionality and give you the basic flavor for how it works and what it is intended for…

With that in mind, this is intended to allow you to quickly replace one version of DB2 table data with another version, without impacting availability while the new version is built.

Thursday, July 12, 2007

New Data Types [DB2 9 for z/OS]

As we continue our blog series on the new features and functionality of DB2 9 for z/OS, today we examine the four (OK, five) new data types introduced in this version of DB2.

BIGINT

First up, we have the BIGINT data type. A BIGINT is an exact numeric data type capable of representing 63-bit integers. This is the third integer data type now available to DB2 and it offers the ability to store the largest range of values:

  • SMALLINT values can range from -32768 to 32767
  • INTEGER values can range from -2147483648 to 2147483647
  • BIGINT values can range from -9223372036854775808 to 9223372036854775807

So when you have the need to store very large integers you don’t have to use DECIMAL with a zero scale any longer.

BINARY and VARBINARY

Next up, and perhaps more exciting, V9 delivers a true binary data type for the first time in DB2, so we no longer have to use a BLOB or CHAR with FOR BIT DATA. BINARY is a fixed-length binary string up to 255 bytes. DB2 9 also delivers a VARBINARY data type, which is a variable-length binary string up to 32704 bytes.

BINARY and VARBINARY data types extend current support of binary strings (BLOB), and are compatible with the BLOB data type. They are not compatible with character string data types. IBM does make it somewhat easy to migrate existing columns defined as CHAR FOR BIT DATA or VARCHAR FOR BIT DATA over to BINARY or VARBINARY. If there is an index defined on the column, the index is placed in RBDP. You cannot alter BINARY or VARBINARY data types to CHAR FOR BIT DATA or VARCHAR FOR BIT DATA.

Also, there are some usage considerations to keep in mind. Two binary strings are equal only if the lengths are identical. If two strings are equal up to the length of the shorter string length, the shorter string is considered less than the longer string, even when the remaining bytes in the longer string are hex zeros.

DECFLOAT

The next new data type supported by DB2 9 for z/OS is DECFLOAT. V9 takes advantage of new System z9 hardware support delivering the DECFLOAT data type that lets you use decimal floating-point numbers with greater precision. A decimal floating-point value (DECFLOAT) is an IEEE 754r number with a decimal point.

The maximum precision is 34 digits and the range of a decimal floating point number is either 16 or 34 digits of precision.


DECFLOAT(16) values can range from a low of:

-9.999999999999999×10**384


to a high of:

9.999999999999999×10**384



DECFLOAT(34) values can range from a low of:

-9.999999999999999999999999999999999 ×10**6144


to a high of:

9.999999999999999999999999999999999 ×10**6144



Anyone think they’ll need bigger numbers?

In addition, the DECFLOAT data type is able to represent special values representing “non-number numbers”:

  • Infinity - a value that represents a number whose magnitude is infinitely large.
  • Quiet NaN - a value that represents undefined results which does not cause an invalid number condition. NaN is not a number.
  • Signaling NaN - a value that represents undefined results which will cause an invalid number condition if used in any numerical operation.

XML

And finally, we get pureXML support to store XML as a native data type in DB2. That means you can specify XML as a data type for columns in your DB2 tables in DB2 9 for z/OS. But I’m not really going to elaborate any further on XML here. A few more details can be found in an earlier post here.

Monday, July 09, 2007

New Built-in Functions [DB2 9 for z/OS]

DB2 9 for z/OS introduces a bevy of new built-in functions (BIFs) for programmers to use in their SQL statements. It is important to keep track of the BIFs available in DB2 because BIFs simplify your coding and development. Invoking a function is always easier than trying to write the equivalent functionality in your host language code. And the BIF will work properly, whereas you cannot always be so sure about your own (sometimes buggy) code.

So, what new function functionality do we get with DB2 9 for z/OS? First of all, we get some new ASCII and EBCDIC conversion functions. The ASCII_CHR function returns the character that has the ASCII code value that is specified by the argument.; and the ASCII_STR function returns a string, in the system ASCII CCSID that is an ASCII version of the string. Knowing that, I bet you can guess what EBCDIC_CHR and EBCDIC_STR do.

We also get UNICODE and UNICODE_STR functions in DB2 9 for z/OS. The UNICODE function returns the Unicode UTF-16 code value of the leftmost character of the argument as an integer. And the UNICODE_STR function? It returns a string in Unicode UTF-8 or UTF-16 (depending on the specified parameter) representing a Unicode encoding of the input string.

Perhaps more interesting is soundex support for testing whether two strings sound the same as each other. DB2 offers two functions here: SOUNDEX and DIFFERENCE. The SOUNDEX function returns a 4 character code that represents the sound of the words in the argument. The result can be used to compare with the sound of other strings. The data type of the result is CHAR(4). So what does this mean? Consider the following example:




SELECT LASTNAME
FROM DSN910.EMP
WHERE SOUNDEX(LASTNAME) = SOUNDEX(’Smith’);



This query would return not only employees with a last name of “Smith,” but also anything that sounds like Smith, such as Smythe.

The DIFFERENCE function is related to SOUNDEX. It returns a value from 0 to 4 where the number represents the difference between the sounds of two strings based on applying the SOUNDEX function to the strings. The higher the value, the closer the two strings are to sounding alike. Consider:


SELECT DIFFERENCE(’CONSTRAINT’,’CONSTANT’),
SOUNDEX(’CONSTRAINT’),
SOUNDEX(’CONSTANT’)
FROM SYSIBM.SYSDUMMY1;

This example returns the values 4, C523, and C523. Since the two strings return the same SOUNDEX value, the difference is 4 (the highest value possible). The more different-sounding the two strings are, the smaller the number would be.

Another interesting series of BIFs are focused on date and time data. These functions include EXTRACT, MONTHS_BETWEEN, and various new timestamp-related functions.

EXTRACT returns a portion of a date or timestamp. You can use EXTRACT to slice up a date/time value into its component pieces. Consider:

SELECT BIRTHDATE,
EXTRACT(DAY FROM BIRTHDATE) AS DAY,
EXTRACT(MONTH FROM BIRTHDATE) AS MONTH,
EXTRACT(YEAR FROM BIRTHDATE) AS YEAR
FROM DSN8910.EMP;

This query would return the entire date, along with each component (year, month, day) of the date as a separate column. You can use one function (EXTRACT) to do the job of the already-existing YEAR, MONTH, and DAY functions. Of course, YEAR, MONTH, and DAY are still available for your use. Similar functionality for EXTRACT exists for time components using HOUR, MINUTE, and SECOND.

The next temporal BIF introduced with DB2 9 for z/OS is the MONTHS_BETWEEN function. It returns an estimate of the number of months between two expressions. If the first expression represents a date that is later than the second, the result will be positive; if the opposite is true the result will be negative. The result is calculated based on a 31 day month.

For example, consider this statement:

SELECT MONTHS_BETWEEN ('2007-02-20','2007-01-17')
AS MONTHS_BETWEEN
FROM SYSIBM.SYSDUMMY1;

The result of this query would be 1.096774193548387.

We also get four new timestamp-related BIFs: TIMESTAMPADD, TIMESTAMPDIFF, TIMESTAMP_FORMAT, and TIMESTAMP_ISO. The first two are rather straightforward: TIMESTAMPADD adds an interval to a timestamp and TIMESTAMPDIFF subtracts two timestamps and returns an interval. I will not get into a discussion of date/time arithmetic here, but if you are interested check out my article Q+As on Dates and DB2.

What about the other two timestamp-related BIFs? TIMESTAMP_FORMAT offers the much-needed ability to choose different display formats for a timestamp value. The valid formats that can be specified are:
• ‘YYYY-MM-DD’
• ‘YYYY-MM-DD-HH24-MI-SS’
• ‘YYYY-MM-DD-HH24-MI-SS-NNNNNN’

And instead of using the dash ( - ) as the separator, you can also use . / , : ; and blank. These separators can be used in any combination. And the VARCHAR_FORMAT function returns a character representation of a timestamp in a format specified as above. So, consider the following query:

SELECT SUBSTR(NAME,1,8) AS TSNAME,
VARCHAR_FORMAT(CREATEDTS,'YYYY-MM-DD-HH24:MI:SS') AS TSCR
FROM SYSIBM.SYSTABLESPACE;
WHERE
CREATEDTS >=
TIMESTAMP_FORMAT('2007-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS');

This query will return the name of all table spaces created since the first of the year, along with its creation timestamp using the format specified.

We also get some new string manipulation functions. The LOCATE_IN_STRING function returns the starting position of the first occurrence of one string within another string. This is basically the same as the existing LOCATE function.

We also get LPAD and RPAD functions. The LPAD function returns a string that is padded on the left, with blanks (or a specific character). The LPAD function treats leading or trailing blanks as significant. RPAD, of course, does the same but on the right. So, consider the following example:

SELECT LPAD(LASTNAME, 30, ’.’ ) AS LAST,
RPAD(FIRSTNME, 30) AS FIRST
FROM DSN910.EMP;

This query will left pad the last name with periods and right pad the first name with blanks.

OVERLAY is yet another new string manipulation function. It allows you to return a string with portions of it overlaid by a specified value. You provide the string, a substring to be overlaid, and its starting point and length, and DB2 does the rest. Learning by example is simpler than trying to explain how it works, so here goes:

SELECT CHAR(OVERLAY('PLATELET','CEMEN',4,4,OCTETS),9),
CHAR(OVERLAY('INSERTING','IS',4,2,OCTETS),10),
FROM SYSIBM.SYSDUMMY1;

The results returned by this query would be:
• 'PLACEMENT ' (starting at position 4 overlay 4 bytes with 5 bytes 'CEMEN')
• 'INSISTING' (starting at position 4 overlay 2 bytes with 'IS')

Additional BIFs include RID, which returns the RID of a row, COLLATION_KEY which returns a varying-length binary string that represents the collation key of the expression in a named collation, DECRYPT_BINARY which adds the ability to decrypt the new BINARY and VARBINARY data types, and NORMALIZE_STRING which takes a Unicode string argument and returns a normalized string. And, of course, we also get scalar functions to support the new data types in DB2 9 (BIGINT, BINARY, VARBINARY and DECFLOAT).

Finally, DB2 9 for z/OS adds three new aggregate functions: CORRELATION, COVARIANCE, and COVARIANCE_SAMP.

The CORRELATION function returns the coefficient of correlation of a set of number pairs. And the COVARIANCE and COVARIANCE_SAMP functions return the (population) covariance of a set of number pairs.

So, as you can see, DB2 9 for z/OS continues adding new and useful functions to simplify our development efforts with DB2.