Tuesday, July 09, 2013

DB2 Locking, Part 15: Tackling Timeout Troubles

Many shops battle with locking issues and frequently, the cause of performance issues can be traced back to locking issues, more specifically, lock timeout issues. When you experience a timeout, it means that another process holds a lock on the data that you are trying to modify.
You may experience timeout troubles as the dreaded -911 SQLCODE. Timeouts are different than most performance issues because the job or users receiving the -911 SQLCODE is usually not the source of the problem. Instead, the jobs causing the problem typically continue to run just fine, bringing about problems for everybody else!
The primary factors to examine when looking to resolve timeout issues are:
  • The lock size parameter that was specified at table space creation time.
  • The duration of the locks established by the BIND strategies. Lock duratiuon is determined by a combination of the ACQUIRE and RELEASE strategies for the table spaces and the ISOLATION parameters for the page locks.

  • The use of LOCK TABLE statements in any programs (See DB2 Locking Part 14).
  • The scope of the commit in the application code.
  • The manner in which the table space was started (RW or RO).
Most of the time, resource contention and timeouts are due to long-running programs that are not issuing COMMITs…or perhaps are not committing their work frequently enough. Issuing a COMMIT ends a unit of recovery and hardens all database modifications made during that unit of recovery.
So be sure to verify that all of your batch processes -- especially any that run concurrently with other workload (but really all batch process) -- have a COMMIT strategy. This means that your programs should issue a COMMIT after processing "a set number of" inserts, updates, and deletes. A COMMIT will tell DB2 to make the changes permanent and releases locks. A good rule of thumb is to strive for having no lock held for more than five seconds.
A reasonable approach is to set a counter that is incremented after every modification. Then, check it and when it exceeds a predefined threshold -- say 25 or 50 or 100 modifications -- then issue a COMMIT. You should make the threshold an input parameter so that you can change it as the workload in your system changes. For example, make it 25 when concurrent activity is high, but ramp it up to 100 or higher when it is low. Failure to issue COMMITs will result in timeouts, as well as possibly deadlocks and lock escalation.
For a more in-depth discussion on the importance of issuing COMMITs in your application programs review DB2 Locking Part 9.
Another practical approach for reducing lock duration is to make data modifications as close to the end of the unit of work as possible. Look at all of your programs, both batch and online, and try to save the data modification statements to as close to the COMMIT as you can. By saving the data modification until right before you issue a COMMIT, you reduce the overall average lock duration. This should result in reducing contention and therefore, the number of timeouts.
Some additional suggestions to maximize concurrency and reduce timeouts include:
  • Use WHERE CURRENT OF CURSOR to perform UPDATEs and DELETEs in your programs.
  • Ensure that you have created your table spaces with the appropriate LOCKSIZE (usually PAGE, but sometimes perhaps ROW).
  • If you have tables that are static, consider starting their table space as RO instead of RW. Doing so can enable DB2 to utilize table locking instead of page or row locking for those tables.
  • Limit the scope of SQL statements so that a statement locks 150 or fewer pages on a table for each unit of work that is performed.

Finally, if you want to investigate timeout details, be sure to examine the statistics trace class 3 and IFCID 0196 for timeouts (IFCID 0172 is for deadlocks).

No comments: