Tuesday, July 09, 2013

DB2 Locking, Part 15: Tackling Timeout Troubles

Many shops battle with locking problems, and frequently the cause of poor performance can be traced back to locking; more specifically, to lock timeouts. When you experience a timeout, it means that another process holds a lock on the data that you are trying to modify.
You may experience timeout troubles as the dreaded -911 SQLCODE. Timeouts are different from most performance problems because the job or user receiving the -911 SQLCODE is usually not the source of the problem. Instead, the jobs causing the problem typically continue to run just fine, bringing about problems for everybody else!
The primary factors to examine when looking to resolve timeout issues are:
  • The lock size parameter that was specified at table space creation time.
  • The duration of the locks established by the BIND strategies. Lock duration is determined by a combination of the ACQUIRE and RELEASE strategies for the table space locks and the ISOLATION parameter for the page locks (an example BIND subcommand follows this list).
  • The use of LOCK TABLE statements in any programs (See DB2 Locking Part 14).
  • The scope of the commit in the application code.
  • The manner in which the table space was started (RW or RO).
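For example, the lock duration options from the second bullet are set at BIND time. Here is a minimal sketch of a BIND subcommand specifying them; the plan name is hypothetical and the options shown are merely common choices, not recommendations for every situation:

BIND PLAN(PLANNAME) -
     ACQUIRE(USE) -
     RELEASE(COMMIT) -
     ISOLATION(CS)
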
Most of the time, resource contention and timeouts are due to long-running programs that are not issuing COMMITs…or perhaps are not committing their work frequently enough. Issuing a COMMIT ends a unit of recovery and hardens all database modifications made during that unit of recovery.
So be sure to verify that all of your batch processes -- especially any that run concurrently with other workload (but really all batch processes) -- have a COMMIT strategy. This means that your programs should issue a COMMIT after processing a set number of inserts, updates, and deletes. A COMMIT tells DB2 to make the changes permanent and releases the locks. A good rule of thumb is to strive to hold no lock for more than five seconds.
A reasonable approach is to set a counter that is incremented after every modification. Check the counter after each change and, when it exceeds a predefined threshold -- say 25 or 50 or 100 modifications -- issue a COMMIT. You should make the threshold an input parameter so that you can change it as the workload on your system changes; for example, set it to 25 when concurrent activity is high, but ramp it up to 100 or higher when activity is low. Failure to issue COMMITs will result in timeouts, as well as possibly deadlocks and lock escalation.
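To make the counter approach concrete, here is a minimal sketch of the logic written as a native SQL procedure. Every name here is hypothetical, the same logic is more commonly coded in the host language (COBOL, Java, and so on), and it assumes the procedure is invoked in a context where COMMIT is permitted:

CREATE PROCEDURE PURGE_HIST (IN P_THRESHOLD INTEGER)
LANGUAGE SQL
BEGIN
  DECLARE V_CHANGES INTEGER DEFAULT 0;
  DECLARE V_CUSTNO CHAR(8);
  DECLARE V_DONE INTEGER DEFAULT 0;
  -- WITH HOLD keeps the cursor open across the intermediate COMMITs
  DECLARE C1 CURSOR WITH HOLD FOR
    SELECT CUSTNO FROM CUSTOMER_HIST FOR UPDATE;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET V_DONE = 1;
  OPEN C1;
  FETCH C1 INTO V_CUSTNO;
  WHILE V_DONE = 0 DO
    DELETE FROM CUSTOMER_HIST WHERE CURRENT OF C1;
    SET V_CHANGES = V_CHANGES + 1;
    -- COMMIT after every P_THRESHOLD changes to release locks
    IF V_CHANGES >= P_THRESHOLD THEN
      COMMIT;
      SET V_CHANGES = 0;
    END IF;
    FETCH C1 INTO V_CUSTNO;
  END WHILE;
  CLOSE C1;
  COMMIT;
END
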
For a more in-depth discussion of the importance of issuing COMMITs in your application programs, review DB2 Locking Part 9.
Another practical approach for reducing lock duration is to make data modifications as close to the end of the unit of work as possible. Look at all of your programs, both batch and online, and try to issue the data modification statements as close to the COMMIT as you can. By saving the data modifications until right before you issue a COMMIT, you reduce the overall average lock duration. This should reduce contention and, therefore, the number of timeouts.
Some additional suggestions to maximize concurrency and reduce timeouts include:
  • Use WHERE CURRENT OF CURSOR to perform UPDATEs and DELETEs in your programs.
  • Ensure that you have created your table spaces with the appropriate LOCKSIZE (usually PAGE, but sometimes perhaps ROW); a sample CREATE TABLESPACE follows this list.
  • If you have tables that are static, consider starting their table space as RO instead of RW. Doing so can enable DB2 to utilize table locking instead of page or row locking for those tables.
  • Limit the scope of SQL statements so that a statement locks 150 or fewer pages on a table for each unit of work that is performed.
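As a quick illustration of the LOCKSIZE suggestion, the table space DDL might look like this (the database and table space names are made up):

CREATE TABLESPACE TSCUST IN DBCUST
  LOCKSIZE PAGE;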

Finally, if you want to investigate timeout details, be sure to examine the statistics trace class 3 and IFCID 0196 for timeouts (IFCID 0172 is for deadlocks).

Saturday, July 06, 2013

DB2 Locking, Part 14: Using the LOCK TABLE Statement

We continue the series on DB2 locking today with a look at the LOCK TABLE SQL statement.

You can issue the LOCK TABLE statement in your application programs to raise the lock granularity to the table (actually, table space) level. Doing so means that you will not need to take lower level locks (at the page or row level, whichever is in effect for the table space in question). Issuing a LOCK TABLE in a program can make sense in certain, specific circumstances. Consider using the LOCK TABLE statement to control the efficiency of locking in programs that will issue many page/row lock requests when there are no concurrent requests for the same data. Issuing a LOCK TABLE, at times, can be a reasonable alternative to using an ISOLATION level of RR or RS when a large percentage of a table's rows or pages will be modified.

If your environment can withstand the concurrency hit caused by LOCK TABLE, you can gain performance by reducing locking activity. Taking a larger granularity lock at the table(space) level instead of multiple page or row locks will improve the performance of your program, albeit at the expense of concurrent access to the data. If you wish to avoid modifying the program, you can START the table space in read-only mode (RO) and achieve similar results (no locks are taken because the data cannot be modified). This can be accomplished using a command like so:

-START DATABASE(DBNAME) SPACENAM(SPNAME) ACCESS(RO)

Of course, if you go that route you will have to make sure that you restart the table space for read write activity (RW) after the program finishes. This can be unwieldy to implement.

If, instead, you wish to use the LOCK TABLE approach, there are two types of LOCK TABLE requests. The LOCK TABLE...IN SHARE MODE statement acquires an S-lock on the table specified in the statement. This locking strategy effectively eliminates the possibility of concurrent modification programs running while the LOCK TABLE is in effect. The S-lock is obtained on the table space for tables contained in non-segmented table spaces. This is important to understand, especially if you have multi-table table spaces.

The LOCK TABLE...IN EXCLUSIVE MODE statement acquires an X-lock on the table specified in the statement. All concurrent processing is suspended until the X-lock is released. Again, for non-segmented table spaces, the X-lock is obtained on the table space, not the table.

In both cases, you can specify the PART parameter to indicate that only a specific partition is to be locked. For example, to lock only the third partition of a partitioned table space, you can issue: LOCK TABLE...PART 3 IN EXCLUSIVE MODE.
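Putting it all together, the three forms look like this (using a hypothetical CUSTOMER table that resides in a partitioned table space):

LOCK TABLE CUSTOMER IN SHARE MODE;

LOCK TABLE CUSTOMER IN EXCLUSIVE MODE;

LOCK TABLE CUSTOMER PART 3 IN EXCLUSIVE MODE;
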
The table locks acquired as a result of the LOCK TABLE statement are held until the next COMMIT point unless RELEASE(DEALLOCATE) was specified for the plan issuing the LOCK TABLE statement. In that situation, the lock is held until the program terminates. That means, for RELEASE(COMMIT) programs, you will need to issue the LOCK TABLE again after each COMMIT or processing will revert to page/row locking.

Also, keep in mind that the lock will not take effect until the statement executes, even if ACQUIRE(ALLOCATE) was coded at BIND time. 

Monday, July 01, 2013

DB2 Locking, Part 13: Optimistic Locking

Continuing our series on DB2 locking, let's look into a relatively recent development -- optimistic locking...

IBM added improvements for optimistic locking techniques in DB2 9 for z/OS.  What is optimistic locking? Sometimes referred to as optimistic concurrency control, optimistic locking is basically just what it sounds like. We are optimists and think that usually we will be the only ones with interest in the data. In other words, when optimistic locking is implemented you are assuming that most of the time there will be no other programs that are interested in the page of data that you are planning to modify.

Of course, even in the most optimistic world there will be exceptions, so optimistic locking does not assume that there will never be any concurrent processes that need to access your page(s). Basically, with optimistic locking you can improve performance by minimizing locking. So how do we do that?

When an application uses optimistic locking, locks are obtained immediately before a read operation and then released immediately. Update locks are obtained immediately before an update operation and held until the end of the transaction. Optimistic locking uses the RID (Record IDentifier) and a row change timestamp to test whether data has been changed by another transaction since the last read operation.

DB2 knows when a row was changed and therefore he (I always tend to make DB2 masculine, sorry ladies) can ensure data integrity even as he minimizes the duration of locks. With optimistic locking, DB2 releases the page (or row) locks immediately after a read operation. And if you are using row locks, DB2 releases the row lock after each FETCH, taking a new lock on a row only for a positioned update or a positioned delete.

Careful readers will have noticed that I talked about a “row change timestamp” but you may not have heard that expression before. DB2 V9 added support for automatically generated timestamp columns and if you wish to implement optimistic locking you will need to create (or alter) your tables to have a row change timestamp column, defined as follows:

NOT NULL GENERATED ALWAYS
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

or

NOT NULL GENERATED BY DEFAULT
FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP

For tables having a row change timestamp column, DB2 automatically populates and maintains the timestamp values for each row. Notice how the syntax is similar to the syntax used for other automatically generated DB2 values, such as sequences. DB2 will automatically generate the timestamp value for each row when the row is inserted, and modify the timestamp for each row when any column in that row is updated.

When you add a ROW CHANGE TIMESTAMP column to an existing table, the initial values for existing rows are not immediately populated. Instead, DB2 places the table space in an advisory-REORG pending state. When you reorganize the table space, DB2 generates the values for the ROW CHANGE TIMESTAMP column for all rows (and, of course, removes the advisory-REORG pending status).
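
For example, adding a row change timestamp column to an existing table might look something like this (sketched using the CUSTOMER table and CUST_INFOCHANGE column from the example below):

ALTER TABLE CUSTOMER
  ADD CUST_INFOCHANGE TIMESTAMP NOT NULL
      GENERATED ALWAYS
      FOR EACH ROW ON UPDATE
      AS ROW CHANGE TIMESTAMP;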

OK, but how does this implement optimistic locking? Well, you can use this new column as a condition for making an UPDATE by specifying it in your WHERE clause. Let’s walk through a couple of examples.

First of all, when a table contains a ROW CHANGE TIMESTAMP you can use it to find out when its rows were modified. Let’s use the following table as an example:

CREATE TABLE CUSTOMER
 (CUSTNO           CHAR(8)   NOT NULL,
  CUST_INFOCHANGE  NOT NULL GENERATED ALWAYS
                   FOR EACH ROW ON UPDATE
                   AS ROW CHANGE TIMESTAMP,
  CUST_NAME        VARCHAR(50),
  CUST_ADDRESS     VARCHAR(100),
  CUST_CITY        CHAR(20),
  CUST_STATE       CHAR(2),
  CUST_ZIP         CHAR(9),
  CUST_PHONE       CHAR(10),

  PRIMARY KEY (CUSTNO))

Now that the table is defined with the ROW CHANGE TIMESTAMP, we can use it in our programs and queries to determine change information about the data. For example, if we want to find all of the customer rows that were changed in the past week (that is, the last 7 days), we could run the following query:

SELECT CUSTNO, CUST_NAME
FROM   CUSTOMER
WHERE  ROW CHANGE TIMESTAMP FOR CUSTOMER <=
       CURRENT TIMESTAMP
AND    ROW CHANGE TIMESTAMP FOR CUSTOMER >=
       CURRENT TIMESTAMP - 7 DAYS;

But what would happen if you issued a statement like this against a table that was altered to include a ROW CHANGE TIMESTAMP? For example, suppose we created the CUSTOMER table as shown but without the CUST_INFOCHANGE column, populated the table with data, and then altered the table to include the CUST_INFOCHANGE column. In this case, DB2 uses the time the page was last modified, so the results will not be exactly correct: the query would return all of the rows on each page that qualifies (because at least one row on the page changed). This is why it is important to clear up the advisory-REORG pending state as soon as possible after adding the ROW CHANGE TIMESTAMP.

This is all well and good, and you can probably see the value of having this automagically changing timestamp in some of your tables, but where is the optimistic locking part? Well, for programs that use updateable static scrollable cursors, DB2 can use optimistic locking as long as the program is bound specifying ISOLATION(CS). In this situation, DB2 deploys optimistic locking to reduce the duration of locks between consecutive FETCH operations and between FETCH operations and subsequent positioned UPDATE or DELETE operations.

Without optimistic locking, the lock taken at the first FETCH is held until the next FETCH. The lock taken at the last FETCH is held until COMMIT, ROLLBACK, or the end of transaction.

With optimistic locking, the scenario changes significantly. When the application requests a FETCH to position the cursor on a row, DB2 locks that row, executes the FETCH and releases the lock. When the application requests a positioned UPDATE or DELETE on the row, DB2 locks the row and then re-evaluates the predicate to ensure that the row still qualifies for the result table.

Optimistic locking itself will not happen without some effort on your part. Your application must have a ROW CHANGE TIMESTAMP column, its value must be selected first, and then, during the modification, a predicate testing that value must be added as a condition to tell whether the row has been modified in the interim. Static scrollable cursors use the optimistic locking technique automatically, but DB2 cannot use optimistic concurrency control for dynamic scrollable cursors. With dynamic scrollable cursors, the most recently fetched row or page from the base table remains locked to maintain position for a positioned UPDATE or DELETE.
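
For the predicate technique, the pattern might look like this, sketched using the CUSTOMER table from above (the host variable names are illustrative):

SELECT CUST_NAME,
       ROW CHANGE TIMESTAMP FOR CUSTOMER
  INTO :CUST_NAME, :OLD_RCT
  FROM CUSTOMER
 WHERE CUSTNO = :CUSTNO;

UPDATE CUSTOMER
   SET CUST_NAME = :NEW_NAME
 WHERE CUSTNO = :CUSTNO
   AND ROW CHANGE TIMESTAMP FOR CUSTOMER = :OLD_RCT;

If the UPDATE returns SQLCODE +100, another transaction modified the row between the SELECT and the UPDATE; the application can then re-read the row and retry (or fail gracefully) rather than blindly overwriting the other transaction's change.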


So, if you have not taken a look at which applications might benefit from optimistic locking techniques since your shop migrated to DB2 9 for z/OS, now is the time to examine which applications could take advantage of optimistic locking – and then add the ROW CHANGE TIMESTAMP to the appropriate tables.

Wednesday, June 19, 2013

DB2 Locking, Part 12: Lock Promotion and Escalation

It can be troublesome when DB2 lock promotion and escalation occur, because these situations can greatly reduce the availability of concurrent access to data.

First let's look at lock promotion. When binding a program with an ISOLATION level of RR, the optimizer sometimes decides that table space locks will perform better than page locks. As such, the optimizer promotes the locking level to table space locking, regardless of the LOCKSIZE specified in the DDL. This process is called lock promotion.

When you set the LOCKSIZE parameter to ANY in the table space DDL, DB2 processing begins with page-level locking. As processing continues and locks are acquired, however, DB2 might decide that too many page (or row) locks have been acquired, causing inefficient processing.

In this scenario, DB2 escalates the level of locking from page (or row) locks to table or table space locks—a procedure called lock escalation. The threshold governing when lock escalation occurs is set in one of two ways:

  • The DSNZPARM start-up parameters for DB2 
  • The LOCKMAX parameter of the CREATE or ALTER TABLESPACE statement (which is stored in the LOCKMAX column of SYSIBM.SYSTABLESPACE)

Lock escalation applies to objects defined with LOCKSIZE ANY in the DDL.

There are some limitations on lock escalation, though. A table lock can never be escalated to a table space lock; table space locks are the highest level of locking and, therefore, cannot be escalated. Furthermore, a row lock will not be escalated to a page lock; it will be escalated directly to a table or table space lock.

User lock escalation occurs when a single user accumulates more page locks than are allowed by the DB2 subsystem (as set in the DSNZPARMs); when this happens, the program is informed via a -904 SQLCODE. The program can either issue a ROLLBACK and produce a message indicating that the program should be modified to COMMIT more frequently or, alternatively, escalate the locking strategy itself by explicitly issuing a LOCK TABLE statement within the code.

Prior to implementing the second approach, be sure to understand the ramifications of issuing the LOCK TABLE statement and how it works (see Part 14 of this series).

Locking DSNZPARMs

There are two DSNZPARM parameters that are used to govern DB2 locking and lock escalation: NUMLKTS and NUMLKUS.

NUMLKTS defines the threshold for the number of page locks that can be concurrently held for any one table space by any single DB2 application (thread). When the threshold is reached, DB2 escalates all page locks for objects defined as LOCKSIZE ANY according to the following rules:

  • All page locks held for data in segmented table spaces are escalated to table locks.
  • All page locks held for data in partitioned table spaces are escalated to table space locks.

NUMLKUS defines the threshold for the total number of page locks across all table spaces that can be concurrently held by a single DB2 application. When any given application attempts to acquire a lock that would cause the application to surpass the NUMLKUS threshold, the application receives a resource unavailable message (SQLCODE of -904).

TABLESPACE DDL Locking Parameters

In general, letting DB2 handle the level of locking required can be a fine strategy. Turning over the determination for lock size to DB2 requires setting LOCKSIZE ANY. Of course, you might have a compelling reason to use a specific LOCKSIZE. For example, you might choose to specify LOCKSIZE PAGE to explicitly direct DB2 to lock at the page level. Or, under certain conditions you might choose LOCKSIZE ROW to implement row-level locking.

The LOCKMAX parameter specifies the maximum number of page or row locks that any one process can hold at any one time for the table space. When the threshold is reached, the page or row locks are escalated to a table or table space lock. The LOCKMAX parameter is similar to the NUMLKTS parameter, but for a single table space only.
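
For example, to cap any single process at 1,000 page locks on a particular table space before escalation kicks in (the object names and the threshold are illustrative):

ALTER TABLESPACE DBCUST.TSCUST
  LOCKSIZE ANY
  LOCKMAX 1000;

Specifying LOCKMAX SYSTEM causes the NUMLKTS value to be used for the table space, and LOCKMAX 0 disables lock escalation for the table space altogether.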

Monday, June 17, 2013

DB2 Locking, Part 11: Data Sharing Global Lock Management

Data sharing adds an additional level of complexity to the DB2 locking strategies and techniques we have been discussing in this series. Because data sharing group members can access any object from any member in the group, a global locking mechanism is required. It is handled by the lock structure defined in the coupling facility. The lock structure is charged with managing inter-member locking. Without a global lock management process, data integrity problems could occur when one member attempts to read (or change) data that is in the process of being changed by another member.

Data sharing groups utilize a global locking mechanism to preserve the integrity of the shared data. The global locking mechanism allows locks to be recognized between members.

Global Locking

All members of a data sharing group must be aware of locks that are held or requested by the other members. The DB2 data sharing group utilizes the coupling facility (CF) to establish and administer global locks.

The IRLM performs locking within each member DB2 subsystem. Additionally, the IRLM communicates with the coupling facility to establish global locks. Each member of the data sharing group communicates lock requests to the coupling facility’s lock structure. The manner in which a transaction takes locks during execution does not change. The only difference is that, instead of being local locks, the locks being taken are global in nature.

DB2 data sharing does not use message passing to perform global locking. Instead, the member DB2 subsystems' IRLMs use the coupling facility to do global locking. Contention can be identified quickly, without suspending tasks to send messages to the other DB2 members of the data sharing group. The following list outlines the events that occur when transactions from different DB2 members try to access the same piece of data:

  1. TXN1 requests a lock that is handled by the local IRLM.
  2. The local IRLM passes the request to the coupling facility global lock structures to ensure that no other members have incompatible locks. No incompatible locks are found, so the lock is taken.
  3. TXN2 requests a lock that is handled by its local IRLM. The lock is for the same data held by TXN1 executing in a different DB2 subsystem.
  4. Once again, the local IRLM passes the request to the coupling facility global lock structures to check for lock compatibility. In this case, an incompatible lock is found, so the lock request cannot be granted. The task is suspended.
  5. Eventually, TXN1 executes a COMMIT, which releases all local and global locks.
  6. TXN2 now can acquire the lock and continue processing.

Data Sharing Locking Considerations 

Consider specifying TRACKMOD NO for objects used in a data sharing environment to reduce locking contention. With TRACKMOD NO, DB2 does not track modified pages in the space map pages, which removes a source of contention (at the cost of less efficient incremental image copies). In addition, consider the MEMBER CLUSTER option. The MEMBER CLUSTER option indicates that DB2 should locate table space data based on available space rather than clustering the data by the clustering index. This option can benefit applications when there are many inserts to the same table from multiple members.
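
Both options are table space DDL parameters; a sketch with hypothetical names:

CREATE TABLESPACE TSORDER IN DBORDER
  MEMBER CLUSTER
  TRACKMOD NO
  LOCKSIZE PAGE;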

You might also improve performance by randomizing index key columns to reduce hot spots. This can be accomplished using the RANDOM keyword of CREATE INDEX.
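
For example (the index and column names are hypothetical):

CREATE INDEX XORDER1
  ON ORDER_TAB (ORDER_NO RANDOM);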

The coupling facility level (CFLEVEL) can also be a consideration. DB2 prefetch processing for GBP-dependent page sets and partitions varies depending on the CFLEVEL in which the group buffer pool is allocated.

If the group buffer pool is allocated in a coupling facility with CFLEVEL=0 or 1, DB2 reads and registers one page at a time in the group buffer pool. If the group buffer pool is allocated in a coupling facility with CFLEVEL=2 or higher, DB2 can register the entire list of pages prefetched with one request to the coupling facility.

You can determine the CFLEVEL of your coupling facility using the DISPLAY GROUP command.
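
For example:

-DISPLAY GROUP DETAIL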

Lock Structures

The coupling facility lock structure used for global locking is composed of two parts. The lock lists contain the names of modified resources; this information is used to notify the members of the data sharing group that those resources have been changed.

Additionally, a hash table is used to identify compatible and incompatible lock modes. If the same hash value is used for the same resource name from different systems (with incompatible lock modes), lock contention will occur. If the same hash value is used for different resource names (called a hashing collision), false contention will occur. Any contention requires additional asynchronous processing to occur.

Hierarchical Locking

DB2 data sharing introduces the concept of explicit hierarchical locking to reduce global locking overhead (thereby improving global locking performance). Explicit hierarchical locking allows data sharing to differentiate between global and local locks. When there is no inter-DB2 interest in a resource, the local IRLM can grant locks locally on the resources that are lower in the hierarchy. This feature allows the local DB2 to obtain local locks on pages or rows for that table space without notifying the coupling facility. In a data sharing environment, locks on the top parents are always propagated to the coupling facility lock structures. In addition, the local DB2 propagates locks on child objects depending on the compatibility of the maximum lock held on the table space when other members of the data sharing group are also requesting locks on it.

P-Locks Versus L-Locks

DB2 data sharing introduces two new lock identifiers: P-locks and L-locks.

P-locks preserve inter-DB2 coherency of buffered pages. P-locks are owned by the member DB2 subsystem and are used for physical resources such as page sets. These physical resources can be either data objects or index objects. P-locks are held for the length of time the pages are locally cached in the local buffer pool. As such, data can be cached beyond a transaction commit point.

P-locks are negotiable. If multiple DB2 members hold incompatible P-locks, the IRLMs try to downgrade lock compatibility. P-locks are never timed out. Because P-locks are not owned by transactions, they cannot be deadlocked. The sole job of a P-lock is to ensure inter-DB2 coherency. P-locks notify the data sharing group that a member of that group is performing work on that resource. This way, the coupling facility can become involved and begin treating the resources globally.

L-locks are used for both intra- and inter-DB2 concurrency between transactions. L-locks can either be local or global in scope. L-locks are owned by transactions and are held for COMMIT or allocation duration. L-locks are not negotiable and, as such, must wait for incompatible L-locks held by other DB2 members to be released before they can be taken. Suspended L-locks can be timed out by the IRLM.