Monday, September 24, 2007

Histogram Statistics [DB2 9 for z/OS]

Another utility upgrade that has found its way into DB2 9 for z/OS is the ability to gather histogram statistics. This feature is already available in DB2 for Linux, Unix, and Windows… and after you migrate to DB2 V9 it will be available to you on z/OS.

What is it? Well, let’s first define what a histogram is for those of you who are not statistics experts. A histogram is a way of summarizing data that is measured on an interval scale. A histogram is particularly helpful for quickly highlighting how data is distributed, determining whether data is symmetrical or skewed, and indicating whether or not outliers exist.

The histogram is only appropriate for variables whose values are numerical and measured on an interval scale. It is generally used when dealing with large data sets. Histogram statistics can be quite useful to the optimizer for certain types of queries.

Unlike frequency statistics, which are collected for only a subset of the data, histogram statistics are collected over all values in a table space. For some queries, DB2 can improve access path selection by estimating predicate selectivity from these histogram statistics.

Consider collecting histogram statistics to improve access paths for troublesome queries with RANGE, LIKE, and BETWEEN predicates. They can also help in some cases for =, IS NULL, IN LIST and COL op COL predicates.

How to Collect Histogram Statistics

IBM RUNSTATS in DB2 V9 can collect statistics by quantiles. DB2 allows up to 100 quantiles, and you can specify anywhere from 1 to 100. Of course, avoid specifying 1 because a single quantile conveys no distribution detail.

You can tell RUNSTATS to collect histogram statistics by coding the HISTOGRAM keyword in conjunction with the COLGROUP option. In this way you can collect histogram statistics for a group of columns. You must also tell DB2 the number of quantiles to collect by specifying the NUMQUANTILES parameter. NUMQUANTILES can also be specified with the INDEX parameter, in which case it indicates that histogram statistics are to be collected for the columns of the index.
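
For example, here is a sketch of collecting histogram statistics for a column group (the table space, table, and column names are hypothetical, for illustration only):

RUNSTATS TABLESPACE DB07.CSMTS01
TABLE(MYUSER.CUST)
COLGROUP(CUST_STATE, CUST_CITY)
HISTOGRAM NUMQUANTILES 25
SHRLEVEL CHANGE
UPDATE ALL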

A single value can never be broken into more than one interval. This means that the maximum number of intervals is equal to the number of distinct column values. Therefore, be sure that you do not specify a value for NUMQUANTILES that is greater than the total number of distinct values for the column (or column group) specified. Also, keep in mind that any NULLs will occupy a single interval.

So then, how do you decide on the number of quantiles to collect? If you do not specify NUMQUANTILES, the default value of 100 will be used, and then based on the number of records in the table, the number will be readjusted to an optimal number. Therefore, unless you have a good understanding of the application or a viable reason to deviate, a good rule of thumb is to simply let the NUMQUANTILES default and let DB2 work it out.

RUNSTATS will produce an equal-depth histogram. This means that each interval will have about the same number of rows. Please note that this does not mean the same number of values – it is the same number of rows. This means that in some cases a highly frequent single value could potentially occupy an interval all by itself.

The histogram statistics are collected in three new columns: QUANTILENO, LOWVALUE, and HIGHVALUE. These columns can be found in the following six DB2 Catalog tables:

  • SYSIBM.SYSCOLDIST
  • SYSIBM.SYSKEYTGTDIST
  • SYSIBM.SYSCOLDIST_HIST
  • SYSIBM.SYSCOLDISTSTATS
  • SYSIBM.SYSKEYTGTDIST_HIST
  • SYSIBM.SYSKEYTGTDISTSTATS

Here is an example of RUNSTATS gathering histogram statistics for the key columns of the indexes:

RUNSTATS TABLESPACE DB07.CSMTS02
INDEX ALL
HISTOGRAM NUMCOLS 2 NUMQUANTILES 10
SHRLEVEL CHANGE
UPDATE ALL
REPORT YES
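
Once collected, you can examine the histogram intervals right in the catalog. Here is a hypothetical query against SYSIBM.SYSCOLDIST (assuming histogram rows are identified by a TYPE value of 'H'; the table owner and name are made up):

SELECT NAME, COLGROUPCOLNO, QUANTILENO,
LOWVALUE, HIGHVALUE, CARDF, FREQUENCYF
FROM SYSIBM.SYSCOLDIST
WHERE TBOWNER = 'MYUSER'
AND TBNAME = 'CUST'
AND TYPE = 'H'
ORDER BY NAME, QUANTILENO;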

Summary

Gathering histogram statistics is a powerful new capability of the RUNSTATS utility that collects distribution statistics across all data values. These statistics can be helpful when you need additional distribution data to enable the optimizer to arrive at a better access path for certain queries/predicates.

Friday, September 07, 2007

COPY Improvements [DB2 9 for z/OS]

Next up in this sub-series on IBM utility improvements in the larger series on DB2 V9 features and functions we will take a look at the new “stuff” in the IBM COPY utility.

Copying Just the Pending Database Objects

The first nice new feature is a parameter that can be used to make backup copies of only those database objects in a copy pending state. The new parameter is called SCOPE PENDING, and when you specify it the IBM COPY utility will copy only objects in copy pending or informational copy pending state.

The default is SCOPE ALL, which makes COPY act like it always acted; that is, without regard to the pending state.

This option can be used in conjunction with LISTDEF to build very powerful, selective backup jobs.
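
Here is a sketch of what such a job might look like (the list name, template, and database are hypothetical):

LISTDEF PENDLIST INCLUDE TABLESPACES DATABASE DB07
TEMPLATE COPYTMPL DSN &DB..&TS..IC.D&DA..T&TI. UNIT=DASD
COPY LIST PENDLIST COPYDDN(COPYTMPL)
SHRLEVEL REFERENCE
SCOPE PENDING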

Copying Clone Tables

The IBM COPY utility also has been extended to support copying clone tables. If you are not sure what they are, follow the link to the blog entry in this series that covers that topic.

Anyway, the new CLONE parameter must be coded on the COPY specification in order for clone table and index data to be copied. It is important to understand that the CLONE parameter is basically an on/off toggle. If CLONE is coded, then COPY will copy only clone table (and/or index) data. You can specify both clone and regular database objects in the same COPY specification, but the IBM COPY utility will ignore non-clone objects when the CLONE parameter is specified; and it will ignore clone objects if the CLONE parameter is not specified.
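
A minimal sketch, assuming the clone objects already exist (the object and template names are hypothetical):

COPY TABLESPACE DB07.CSMTS02
COPYDDN(COPYTMPL)
SHRLEVEL REFERENCE
CLONE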

CHECKPAGE

The CHECKPAGE parameter is not new to V9, but it is improved. Specifying CHECKPAGE indicates that the COPY utility should check each page in the table space or index for validity. If it finds an error, COPY issues a message that describes the type of error. Using CHECKPAGE is a good idea because it will identify problems in your page sets.
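
Coding it is as simple as adding the keyword (object and template names hypothetical):

COPY TABLESPACE DB07.CSMTS02
COPYDDN(COPYTMPL)
SHRLEVEL REFERENCE
CHECKPAGE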

In previous releases of DB2, using CHECKPAGE with COPY was problematic because of high overhead and availability issues. In the DB2 9 for z/OS Technical Overview redbook, IBM reports that the CPU overhead is about 5 percent for COPY INDEX and 14 percent for COPY TABLESPACE if you are running V8. As for the availability issue, as soon as a validity problem is found on a page, the COPY utility will place the page set into COPY PENDING so it cannot be accessed while COPY chugs merrily along looking for additional invalid pages.

Both problems are addressed in DB2 9 for z/OS: IBM has significantly reduced the CPU overhead and fixed the availability issue. Instead of putting the page set into COPY PENDING immediately when it finds a problem, COPY will produce the message that it found a problem, but it will not set the pending state. Furthermore, it will continue checking pages but will not copy anything.

Upon completion, COPY will issue a return code of 8 and it will update the SYSIBM.SYSCOPY catalog table indicating that it found a broken page. This information is recorded in a new column, TTYPE, with a value of “B”. This is important to understand because the page set will not be in a COPY PENDING state, so you will have to check for the return code and fix the problem.
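
For example, after a CHECKPAGE job ends with return code 8, a catalog query along these lines (the predicates are illustrative) can help identify the affected objects:

SELECT DBNAME, TSNAME, ICDATE, ICTIME
FROM SYSIBM.SYSCOPY
WHERE TTYPE = 'B'
ORDER BY ICDATE DESC, ICTIME DESC;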

Tuesday, September 04, 2007

MODIFY RECOVERY [DB2 9 for z/OS]

Another IBM DB2 utility that has received some attention under Version 9 is MODIFY RECOVERY. For those not sure what this utility is, MODIFY RECOVERY is used to remove records from the SYSIBM.SYSCOPY table in the DB2 Catalog, along with related log records from the SYSIBM.SYSLGRNX directory table and related information from the DBD.

Before the V9 changes, when you ran MODIFY RECOVERY you had to specify the deletion criteria in one of two ways:

  • Given a specific date, delete all recovery information before that date
  • Or given an age in days, delete all recovery information older than the age
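
In control statement form, those two deletion styles might look like this (the object name and values are hypothetical; DATE takes a yyyymmdd value):

MODIFY RECOVERY TABLESPACE DB07.CSMTS02 DELETE DATE (20070801)

MODIFY RECOVERY TABLESPACE DB07.CSMTS02 DELETE AGE (30)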

But as of V9 we can alternatively specify what is to be retained instead of what is to be deleted. This way, we tell the utility what we want to keep and it will remove the rest.

So, instead of coding the DELETE parameter (with a DATE or AGE) we would instead code a RETAIN parameter. The RETAIN parameter can specify one of five different options:

LAST (integer) – Using this option informs DB2 to choose a date such that the number of full image copies kept is not less than the integer specified. So, if we specify LAST(5), then we will delete all entries that exist prior to the date on which the fifth oldest copy was taken. If more than one copy was taken on that date the result could be that more than 5 copies are retained.

LOGLIMIT – Specifying this option will delete all copies where DB2 no longer has log records to recover forward from. DB2 uses the BSDS to establish the oldest archive log timestamp.

GDGLIMIT – You can use this option to enable the number of copies to be retained to be linked to the corresponding GDG base limit.

GDGLIMIT LAST (integer) – It is also possible to combine the GDGLIMIT and LAST options. If the most recent primary copy is a GDG, DB2 uses the GDG base limit; if not, it uses the integer specified.

GDGLIMIT LOGLIMIT – And finally, we can combine the GDGLIMIT and LOGLIMIT options, too. If the most recent primary full copy is a GDG then the GDG limit is used. If the last copy is not a GDG then the BSDS is used to determine the date prior to which deletions will occur.

The last two options (the ones that combine two parameters) are particularly useful if you have a mix of GDGs and non-GDGs for your image copies. They are also helpful if you use the LISTDEF option when you make image copy backups.

It is important to understand that for all these options MODIFY RECOVERY will cause DB2 to determine the most recent date that satisfies the retention requirement. After ascertaining the correct date to use, all entries before that date will be deleted. In effect, DB2 chooses the most recent date that would satisfy the retention criteria. So, under some circumstances there may be more than one image copy on the specific date that is established, and therefore DB2 will keep those additional copies, too.

As an example, say we take 7 image copies on a single day. Then we run MODIFY RECOVERY … RETAIN LAST (3). In this case, all 7 will be retained as they were made on the same day, even though we indicated that we wanted to retain only the last 3.
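
The control statement for that scenario might look like this (the table space name is hypothetical):

MODIFY RECOVERY TABLESPACE DB07.CSMTS02 RETAIN LAST (3)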

Finally, there is one more significant change made to the MODIFY RECOVERY utility. Prior to V9, information was only deleted when copies were removed from SYSIBM.SYSCOPY. But as of V9, MODIFY RECOVERY will delete SYSIBM.SYSLGRNX entries even if no SYSIBM.SYSCOPY records are deleted.

Why is this important? Well, some shops have migrated over to running BACKUP SYSTEM instead of individual image copies. In this case, there will be no individual image copies to remove in SYSIBM.SYSCOPY, but the SYSIBM.SYSLGRNX records will continue to grow. After migrating to V9 you can use MODIFY RECOVERY to trim the size of SYSIBM.SYSLGRNX which can improve performance for processes that need to access log range information.

Tuesday, August 28, 2007

TEMPLATE Switching [DB2 9 for z/OS]

Next up in this on-going series covering the new features and functionality of DB2 9 for z/OS we will tackle the improvements that have been made to the IBM DB2 utilities.

A nice new feature that impacts multiple utilities is TEMPLATE switching. For those not familiar with templating, TEMPLATE is a utility control statement that enables the allocation of data sets for a LISTDEF (LISTDEF is the way you control the list of database objects for a utility execution). With the TEMPLATE statement you essentially specify data set naming conventions and allocation information without using JCL DD statements.

OK, so what is TEMPLATE switching? What it enables you to do is to specify different characteristics for image copies of varying sizes. So, for example, you can create a different template for low and high volume image copies. Here is an example:



//SYSIN DD *
TEMPLATE low DSN &DB..&TS..IC.D&DA..T&TI.
UNIT=DASD LIMIT(50 CYL,high)
TEMPLATE high DSN &DB..&TS..IC.D&DA..T&TI.
UNIT=TAPE
COPY TABLESPACE MY.SMALLTS COPYDDN(low)
COPY TABLESPACE MY.LARGETS COPYDDN(low)



Note the new parameter named LIMIT. This is what controls when the TEMPLATE is switched. In this example we set the LIMIT for the low volume template at 50 cylinders. When this limit is reached, the TEMPLATE will be switched to the high volume template. Of course, you have the flexibility to control the limit by specifying whatever number is appropriate for your site, as well as specifying it in CYL, GB, or MB.

It is important to understand that DB2 can only switch the TEMPLATE once, so you cannot have more than two templates.

Template switching is available for image copies produced by COPY, COPYTOCOPY, MERGECOPY, LOAD, and REORG. And you can set up template switching for both the COPYDDN and RECOVERYDDN.
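
For instance, a sketch of switching templates for the recovery-site copies as well might pair the low/high templates above with a second, hypothetical pair (rlow and rhigh) named on RECOVERYDDN:

TEMPLATE rlow DSN &DB..&TS..RC.D&DA..T&TI.
UNIT=DASD LIMIT(50 CYL,rhigh)
TEMPLATE rhigh DSN &DB..&TS..RC.D&DA..T&TI.
UNIT=TAPE
COPY TABLESPACE MY.LARGETS COPYDDN(low) RECOVERYDDN(rlow)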

Wednesday, August 22, 2007

Optimistic Locking [DB2 9 for z/OS]

DB2 Version 9 improves support for coding optimistic locking techniques. What is optimistic locking? Sometimes referred to as optimistic concurrency control, optimistic locking is basically just what it sounds like. We are optimists and think that usually we will be the only ones with interest in the data. In other words, when optimistic locking is implemented you are assuming that most of the time there will be no other programs that are interested in the page of data that you are planning to modify.


Of course, even in the most optimistic world there will be exceptions, so optimistic locking does not assume that there will never be any concurrent processes that need to access your page(s). Basically, with optimistic locking you can improve performance by minimizing locking. So how do we do that?


When an application uses optimistic locking, locks are obtained immediately before a read operation and then released immediately. Update locks are obtained immediately before an update operation and held until the end of the transaction. Optimistic locking uses the RID (Record IDentifier) and a row change timestamp to test whether data has been changed by another transaction since the last read operation.


DB2 knows when a row was changed and therefore he (I make DB2 masculine) can ensure data integrity even as he minimizes the duration of locks. With optimistic locking, DB2 releases the page (or row) locks immediately after a read operation. And if you are using row locks, DB2 releases the row lock after each FETCH, taking a new lock on a row only for a positioned update or a positioned delete.


Careful readers will have noticed that I talked about a “row change timestamp” but you may not have heard that expression before. DB2 V9 adds support for automatically generated timestamp columns and if you wish to implement optimistic locking you will need to create (or alter) your tables to have a row change timestamp column, defined as follows:


NOT NULL GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

or

NOT NULL GENERATED BY DEFAULT
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

For tables having a row change timestamp column, DB2 will automatically populate and maintain the timestamp values for each row. Notice how the syntax is similar to the syntax used for other automatically generated DB2 values, such as sequences. DB2 will automatically generate the timestamp value for each row when the row is inserted, and modify the timestamp for each row when any column in that row is updated.

When you add a ROW CHANGE TIMESTAMP column to an existing table, the initial value for existing rows will not be immediately populated. Instead, DB2 places the table space in an advisory-REORG pending state. When you reorganize the table space, DB2 will generate the values for the ROW CHANGE TIMESTAMP column for all rows (and, of course, remove the advisory-REORG pending status).
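
Here is a sketch of adding such a column to an existing table (using the hypothetical CUSTOMER table that appears below):

ALTER TABLE CUSTOMER
ADD CUST_INFOCHANGE TIMESTAMP NOT NULL
GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP;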

Now that you have this new column you can use it as a condition for making an UPDATE and specify it in your WHERE clause. Let’s walk through a couple of examples.

First of all, when a table contains a ROW CHANGE TIMESTAMP you can use it to find out when its rows were modified. Let’s use the following table as an example:

CREATE TABLE CUSTOMER
(CUSTNO CHAR(8) NOT NULL,
CUST_INFOCHANGE TIMESTAMP NOT NULL
GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP,
CUST_NAME VARCHAR(50),
CUST_ADDRESS VARCHAR(100),
CUST_CITY CHAR(20),
CUST_STATE CHAR(2),
CUST_ZIP CHAR(9),
CUST_PHONE CHAR(10),
PRIMARY KEY (CUSTNO)
);

Now that the table is defined with the ROW CHANGE TIMESTAMP we can use it in our programs and queries to determine change information about the data. For example, if we want to find all of the customer rows that were changed in the past week (i.e., the last 7 days) we could run the following query:


SELECT CUSTNO, CUST_NAME
FROM CUSTOMER
WHERE ROW CHANGE TIMESTAMP FOR CUSTOMER <=
CURRENT TIMESTAMP
AND ROW CHANGE TIMESTAMP FOR CUSTOMER >=
CURRENT TIMESTAMP - 7 DAYS;

But what would happen if you issued a statement like this against a table that was altered to include a ROW CHANGE TIMESTAMP? For example, what if we created the CUSTOMER table as shown but without the CUST_INFOCHANGE column, populated the table with data, and then altered the table to include the CUST_INFOCHANGE column? In this case, DB2 will use the time the page was last modified. So the results will not be exactly correct, because the query would return every row on each qualifying page (because at least one row on the page changed). This is why it is important to clear up the advisory-REORG pending state as soon as possible after adding the ROW CHANGE TIMESTAMP.
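
As a second example, here is how an application might use the column in a WHERE clause to detect intervening changes (the host variable names are made up): the program saves the row change timestamp when it reads the row, then verifies it when it applies the update:

SELECT CUST_NAME,
ROW CHANGE TIMESTAMP FOR CUSTOMER
INTO :HV-NAME, :HV-RCT
FROM CUSTOMER
WHERE CUSTNO = :HV-CUSTNO;

UPDATE CUSTOMER
SET CUST_PHONE = :HV-NEWPHONE
WHERE CUSTNO = :HV-CUSTNO
AND ROW CHANGE TIMESTAMP FOR CUSTOMER = :HV-RCT;

If another transaction changed the row in the interim, the timestamp no longer matches, no row qualifies for the UPDATE, and the program can re-read the row and try again.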

OK, this is all well and good, and you can probably see the value of having this automagically changing timestamp in some of your tables, but where is the optimistic locking part? Well, for programs that use updateable static scrollable cursors DB2 can use optimistic locking as long as the plan/package is bound specifying ISOLATION(CS). If you have this situation, DB2 will deploy optimistic locking to reduce the duration of locks between consecutive FETCH operations and between fetch operations and subsequent positioned UPDATE or DELETE operations.
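
For instance, a cursor along these lines (a hypothetical sketch) would be a candidate, provided the package is bound with ISOLATION(CS):

EXEC SQL
DECLARE CSR1 SENSITIVE STATIC SCROLL CURSOR FOR
SELECT CUSTNO, CUST_PHONE
FROM CUSTOMER
FOR UPDATE OF CUST_PHONE
END-EXEC.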

Remember, though, DB2 cannot use optimistic concurrency control for dynamic scrollable cursors. With dynamic scrollable cursors, the most recently fetched row or page from the base table remains locked to maintain position for a positioned UPDATE or DELETE.

Without optimistic locking, the lock taken at the first FETCH is held until the next FETCH. The lock taken at the last FETCH is held until COMMIT, ROLLBACK, or the end of transaction.

With optimistic locking, the scenario changes significantly. When the application requests a FETCH to position the cursor on a row, DB2 locks that row, executes the FETCH and releases the lock. When the application requests a positioned UPDATE or DELETE on the row, DB2 locks the row and then re-evaluates the predicate to ensure that the row still qualifies for the result table.

So, when you move to DB2 9 for z/OS you should evaluate your applications looking for programs that could take advantage of optimistic locking – and then add the ROW CHANGE TIMESTAMP to the appropriate tables.