Tuesday, July 24, 2007
Why would you want to compress index data? Well, some types of applications require very large indexes on very large tables - - data warehousing applications are one good example. Sometimes, the storage required for indexes to support your data warehouse applications can exceed the storage required for the base table. So it makes sense that you might want to reduce the storage consumed by such indexes.
DB2 V9 introduces the COMPRESS parameter for indexes. You can specify COMPRESS YES (or NO) on your CREATE INDEX and ALTER INDEX statements. Index partitioning is done at the index level and cannot be performed on a partition by partition basis.
Additionally, DB2 will only compress the data in leaf pages, not in the root page and any non-leaf pages in between. This makes sense because you don’t want to incur the expense of decompressing all of these type of pages in your indexes in order to find the right leaf page range.
Index compression does not require a compression dictionary. As such, DB2 can begin to immediately compress data in the leaf pages of your newly created indexes.
Now think about what we’ve already learned for a minute. The data on the leaf page is compressed, but we will want to access it uncompressed, right? So index pages are stored on disk in a compressed format but will be expanded when read. So those 4K index pages on disk will require more than 4K when expanded. This means that compressed indexes must be defined in a larger buffer pool (8K, 16K, or 32K). Nevertheless, when you compress an index DB2 will always compress the data down into a 4K page size on disk no matter what page size you choose.
Another consideration to keep in mind is that index data is decompressed for index image copies, so a copy of an index will require more storage space than the actual index requires.
So, when you move to DB2 9 in NFM you have an additional compression decision to make: which indexes should be compressed and which should not? But it is a good thing to have more options at our disposal, especially for applications with huge indexing requirements.
Monday, July 16, 2007
OK, so how does it work? Basically, you will create a table with the exact same attributes as a table that already exists at the current server, except that it is empty of data. It is created using the ALTER TABLE SQL statement with the ADD CLONE parameter. This clone table is created in the same table space as the existing table. After creating the clone table you can do whatever you want to do with it. LOAD it, INSERT rows to it, etc.
The clone table will be structurally identical to the base table in every way: same number of columns, column names, data types, as well as have the same indexes, before triggers, LOB objects and so on. All of this is done auto-magically by DB2 when it the clone table is created. Authorizations and views on the base table, however, are not duplicated, or cloned.
So, the clone is created and it can be manipulated without impacting the base table. When the clone is ready to become the base table it is exchanged with the base table. This is accomplished using the new EXCHANGE SQL statement. After running an EXCHANGE the clone becomes the real table and the previously “real” table becomes the clone - - and you can repeat the process.
The clone and the base table are kept in different underlying VSAM linear data sets. We all should know that the data sets used by DB2 are named using the following format:
cat.DSNDBD.DBNAME.TSNAME.I0001.A001. Well, the clone uses the same name except for the next to last component. It will be I0002. When the exchange is made, DB2 flips the I0001 with the I0002.
Of course, there are rules and details you will need to know to properly deploy cloning in your organizations, but that is all documented in the manuals. The point of these blog entries is more to make you aware of new functionality and give you the basic flavor for how it works and what it is intended for…
With that in mind, this is intended to allow you to quickly replace one version of DB2 table data with another version, without impacting availability while the new version is built.
Thursday, July 12, 2007
As we continue our blog series on the new features and functionality of DB2 9 for z/OS, today we examine the four (OK, five) new data types introduced in this version of DB2.
First up, we have the BIGINT data type. A BIGINT is an exact numeric data type capable of representing 63-bit integers. This is the third integer data type now available to DB2 and it offers the ability to store the largest range of values:
- SMALLINT values can range from -32768 to 32767
- INTEGER values can range from -2147483648 to 2147483647
- BIGINT values can range from -9223372036854775808 to 9223372036854775807
So when you have the need to store very large integers you don’t have to use DECIMAL with a zero scale any longer.
BINARY and VARBINARY
Next up, and perhaps more exciting, V9 delivers a true binary data type for the first time in DB2, so we no longer have to use a BLOB or CHAR with FOR BIT DATA. BINARY is a fixed-length binary string up to 255 bytes. DB2 9 also delivers a VARBINARY data type, which is a variable-length binary string up to 32704 bytes.
BINARY and VARBINARY data types extend current support of binary strings (BLOB), and are compatible with the BLOB data type. They are not compatible with character string data types. IBM does make it somewhat easy to migrate existing columns defined as CHAR FOR BIT DATA or VARCHAR FOR BIT DATA over to BINARY or VARBINARY. If there is an index defined on the column, the index is placed in RBDP. You cannot alter BINARY or VARBINARY data types to CHAR FOR BIT DATA or VARCHAR FOR BIT DATA.
Also, there are some usage considerations to keep in mind. Two binary strings are equal only if the lengths are identical. If two strings are equal up to the length of the shorter string length, the shorter string is considered less than the longer string, even when the remaining bytes in the longer string are hex zeros.
The next new data type supported by DB2 9 for z/OS is DECFLOAT. V9 takes advantage of new System z9 hardware support delivering the DECFLOAT data type that lets you use decimal floating-point numbers with greater precision. A decimal floating-point value (DECFLOAT) is an IEEE 754r number with a decimal point.
The maximum precision is 34 digits and the range of a decimal floating point number is either 16 or 34 digits of precision.
DECFLOAT(16) values can range from a low of:
to a high of:
DECFLOAT(34) values can range from a low of:
to a high of:
Anyone think they’ll need bigger numbers?
In addition, the DECFLOAT data type is able to represent special values representing “non-number numbers”:
- Infinity - a value that represents a number whose magnitude is infinitely large.
- Quiet NaN - a value that represents undefined results which does not cause an invalid number condition. NaN is not a number.
- Signaling NaN - a value that represents undefined results which will cause an invalid number condition if used in any numerical operation.
And finally, we get pureXML support to store XML as a native data type in DB2. That means you can specify XML as a data type for columns in your DB2 tables in DB2 9 for z/OS. But I’m not really going to elaborate any further on XML here. A few more details can be found in an earlier post here.
Monday, July 09, 2007
So, what new function functionality do we get with DB2 9 for z/OS? First of all, we get some new ASCII and EBCDIC conversion functions. The ASCII_CHR function returns the character that has the ASCII code value that is specified by the argument.; and the ASCII_STR function returns a string, in the system ASCII CCSID that is an ASCII version of the string. Knowing that, I bet you can guess what EBCDIC_CHR and EBCDIC_STR do.
We also get UNICODE and UNICODE_STR functions in DB2 9 for z/OS. The UNICODE function returns the Unicode UTF-16 code value of the leftmost character of the argument as an integer. And the UNICODE_STR function? It returns a string in Unicode UTF-8 or UTF-16 (depending on the specified parameter) representing a Unicode encoding of the input string.
Perhaps more interesting is soundex support for testing whether two strings sound the same as each other. DB2 offers two functions here: SOUNDEX and DIFFERENCE. The SOUNDEX function returns a 4 character code that represents the sound of the words in the argument. The result can be used to compare with the sound of other strings. The data type of the result is CHAR(4). So what does this mean? Consider the following example:
WHERE SOUNDEX(LASTNAME) = SOUNDEX(’Smith’);
This query would return not only employees with a last name of “Smith,” but also anything that sounds like Smith, such as Smythe.
The DIFFERENCE function is related to SOUNDEX. It returns a value from 0 to 4 where the number represents the difference between the sounds of two strings based on applying the SOUNDEX function to the strings. The higher the value, the closer the two strings are to sounding alike. Consider:
This example returns the values 4, C523, and C523. Since the two strings return the same SOUNDEX value, the difference is 4 (the highest value possible). The more different-sounding the two strings are, the smaller the number would be.
Another interesting series of BIFs are focused on date and time data. These functions include EXTRACT, MONTHS_BETWEEN, and various new timestamp-related functions.
EXTRACT returns a portion of a date or timestamp. You can use EXTRACT to slice up a date/time value into its component pieces. Consider:
EXTRACT(DAY FROM BIRTHDATE) AS DAY,
EXTRACT(MONTH FROM BIRTHDATE) AS MONTH,
EXTRACT(YEAR FROM BIRTHDATE) AS YEAR
This query would return the entire date, along with each component (year, month, day) of the date as a separate column. You can use one function (EXTRACT) to do the job of the already-existing YEAR, MONTH, and DAY functions. Of course, YEAR, MONTH, and DAY are still available for your use. Similar functionality for EXTRACT exists for time components using HOUR, MINUTE, and SECOND.
The next temporal BIF introduced with DB2 9 for z/OS is the MONTHS_BETWEEN function. It returns an estimate of the number of months between two expressions. If the first expression represents a date that is later than the second, the result will be positive; if the opposite is true the result will be negative. The result is calculated based on a 31 day month.
For example, consider this statement:
SELECT MONTHS_BETWEEN ('2007-02-20','2007-01-17')
The result of this query would be 1.096774193548387.
We also get four new timestamp-related BIFs: TIMESTAMPADD, TIMESTAMPDIFF, TIMESTAMP_FORMAT, and TIMESTAMP_ISO. The first two are rather straightforward: TIMESTAMPADD adds an interval to a timestamp and TIMESTAMPDIFF subtracts two timestamps and returns an interval. I will not get into a discussion of date/time arithmetic here, but if you are interested check out my article Q+As on Dates and DB2.
What about the other two timestamp-related BIFs? TIMESTAMP_FORMAT offers the much-needed ability to choose different display formats for a timestamp value. The valid formats that can be specified are:
And instead of using the dash ( - ) as the separator, you can also use . / , : ; and blank. These separators can be used in any combination. And the VARCHAR_FORMAT function returns a character representation of a timestamp in a format specified as above. So, consider the following query:
SELECT SUBSTR(NAME,1,8) AS TSNAME,
VARCHAR_FORMAT(CREATEDTS,'YYYY-MM-DD-HH24:MI:SS') AS TSCR
TIMESTAMP_FORMAT('2007-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS');
This query will return the name of all table spaces created since the first of the year, along with its creation timestamp using the format specified.
We also get some new string manipulation functions. The LOCATE_IN_STRING function returns the starting position of the first occurrence of one string within another string. This is basically the same as the existing LOCATE function.
We also get LPAD and RPAD functions. The LPAD function returns a string that is padded on the left, with blanks (or a specific character). The LPAD function treats leading or trailing blanks as significant. RPAD, of course, does the same but on the right. So, consider the following example:
SELECT LPAD(LASTNAME, 30, ’.’ ) AS LAST,
RPAD(FIRSTNME, 30) AS FIRST
This query will left pad the last name with periods and right pad the first name with blanks.
OVERLAY is yet another new string manipulation function. It allows you to return a string with portions of it overlaid by a specified value. You provide the string, a substring to be overlaid, and its starting point and length, and DB2 does the rest. Learning by example is simpler than trying to explain how it works, so here goes:
The results returned by this query would be:
• 'PLACEMENT ' (starting at position 4 overlay 4 bytes with 5 bytes 'CEMEN')
• 'INSISTING' (starting at position 4 overlay 2 bytes with 'IS')
Additional BIFs include RID, which returns the RID of a row, COLLATION_KEY which returns a varying-length binary string that represents the collation key of the expression in a named collation, DECRYPT_BINARY which adds the ability to decrypt the new BINARY and VARBINARY data types, and NORMALIZE_STRING which takes a Unicode string argument and returns a normalized string. And, of course, we also get scalar functions to support the new data types in DB2 9 (BIGINT, BINARY, VARBINARY and DECFLOAT).
Finally, DB2 9 for z/OS adds three new aggregate functions: CORRELATION, COVARIANCE, and COVARIANCE_SAMP.
The CORRELATION function returns the coefficient of correlation of a set of number pairs. And the COVARIANCE and COVARIANCE_SAMP functions return the (population) covariance of a set of number pairs.
So, as you can see, DB2 9 for z/OS continues adding new and useful functions to simplify our development efforts with DB2.