
Tuesday, July 08, 2014

DB2 Application Performance Management

Assuring optimal performance for database applications can be a tricky thing. In today's blog I ruminate on the high-level issues involved in optimizing your DB2 for z/OS applications.

Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked about the single most important or stressful aspect of their job, DBAs typically respond "assuring optimal performance." Indeed, a Forrester Research survey indicates that performance and troubleshooting tops the list of most challenging DBA tasks.

But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:
  • the system (that is, the DBMS itself, the network, and the O/S),
  • the database (that is, the DDL and database schema), and
  • the application (that is, the SQL and program logic).

Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application. Furthermore, I will discuss this in terms of DB2 for z/OS.

So where do we begin? For DB2, a logical starting point is with BIND parameters. There are many parameters and values that must be chosen and specified when you bind a DB2 application program. The vast array of options at our disposal can render the whole process extremely confusing -- especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. You know what I’m talking about: parameters like ACQUIRE, RELEASE, VALIDATE, and DEGREE.

I will not delve into the myriad bind options and give you advice on which to use when. There are many articles and books, as well as the IBM DB2 manuals, that you can use to guide you along that path. Suffice it to say that there are some standard parameters and values that should be chosen most of the time in most situations. As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as CICS transaction, DB2 batch, and BI/analytical query can be presented to the developer, and the canned script can then choose the proper bind options for that type of program and environment. Doing so can greatly diminish the problems that can be encountered when the "wrong" parameters or values are chosen at Bind time.

Before concluding this short section on Bind parameters I want to give one important piece of advice: In production, always Bind your plans and packages specifying EXPLAIN YES. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what DB2 is doing and it makes application performance tuning much more difficult.
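To make this concrete, here is a rough sketch of the kind of canned bind job a DBA group might supply for batch programs. The subsystem, plan, and collection names are hypothetical, and your shop's standard values may well differ:

  //BINDSTEP EXEC PGM=IKJEFT01
  //STEPLIB  DD DISP=SHR,DSN=DB2P.SDSNLOAD
  //SYSTSPRT DD SYSOUT=*
  //SYSTSIN  DD *
   DSN SYSTEM(DB2P)
   BIND PLAN(BATCHPLN)     -
        PKLIST(BATCHCOL.*) -
        ACQUIRE(USE)       -
        RELEASE(COMMIT)    -
        VALIDATE(BIND)     -
        ISOLATION(CS)      -
        DEGREE(1)          -
        ACTION(REPLACE)    -
        EXPLAIN(YES)
   END
  /*

Note that EXPLAIN(YES) is included, per the advice above.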

Access Path Management

Bind and Rebind are important components in achieving optimal DB2 application performance, because these commands determine the access paths to the data requested by your program. So it is vitally important that you create a strategy for when and how to Rebind your programs. There are several common approaches. The best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment.

Other approaches include Rebinding only when a new version of DB2 is installed, or, perhaps more ambitiously, whenever new PTFs are applied to DB2. Another approach is to Rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data -- but it still can pose significant administrative issues.

The final approach -- the worst of the bunch -- comes from the "if it ain’t broke, don’t fix it" school of thought. Basically, it boils down to (almost) never rebinding your programs. This approach penalizes every program for fear that a single program (or two) might experience a degraded access path. Oh, the possibility of degraded performance is real, and that is why this approach has been adopted by some. And it can be difficult to find which statements may have degraded after a Rebind. The ideal situation would allow us to review the access path changes beforehand to determine whether they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third-party tools that can help you achieve this, though.

Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions.

By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to DB2 (new releases/PTFs), optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

To adopt the three Rs you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need four Rs (an example RTS query follows the list):

  1. RUNSTATS or, preferably, RTS
  2. REORG
  3. RUNSTATS
  4. REBIND
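As an example of using RTS to drive the REORG decision, a rough sketch follows. The threshold is illustrative rather than official IBM guidance, and RTS column availability varies by DB2 version:

  -- Find table spaces that look like REORG candidates based on the
  -- share of inserts placed out of clustering sequence since the
  -- last REORG.
  SELECT DBNAME, NAME, PARTITION,
         TOTALROWS, REORGINSERTS, REORGDELETES, REORGUNCLUSTINS
    FROM SYSIBM.SYSTABLESPACESTATS
   WHERE TOTALROWS > 0
     AND (REORGUNCLUSTINS * 100) / TOTALROWS > 10  -- illustrative threshold
   ORDER BY DBNAME, NAME, PARTITION;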

But is this enough? Probably not, because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another R: Review the access paths generated by the REBIND. As mentioned, the optimizer can make mistakes. And, of course, so can you. Users don't call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.

So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal DB2 application performance.
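For example, assuming your packages are bound with EXPLAIN(YES) and rows accumulate in a PLAN_TABLE under your own qualifier, a sketch of such a before/after review might look like this (the qualifier and program name are hypothetical):

  -- List the access path details captured at each BIND/REBIND.
  -- A change in ACCESSTYPE, MATCHCOLS, or ACCESSNAME between
  -- BIND_TIME values flags a statement worth reviewing.
  SELECT PROGNAME, QUERYNO, BIND_TIME,
         METHOD, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
    FROM MYQUAL.PLAN_TABLE
   WHERE PROGNAME = 'PROGA'
   ORDER BY QUERYNO, BIND_TIME DESC;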

Tuning the Code

Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with -- and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:
  • Let SQL do the work instead of the program. For example, code an SQL join instead of two cursors and program logic to join (see the sketch after this list).
  • Simpler is generally better, but complex SQL can be very efficient.
  • Retrieve only the columns required, never more.
  • Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
  • When joining tables, always provide join predicates. In other words, avoid Cartesian products.
  • Favor using Stage 1 and Indexable predicates.
  • But favor Stage 2 predicates over application logic.
  • Avoid sorting (if possible) by creating indexes for ORDER BY and GROUP BY operations.
  • Avoid black boxes -- that is, avoid generic I/O routines that programs call instead of embedding SQL directly.
  • Minimize deadlocks by updating tables in the same sequence in every program.
  • Issue data modification statements (INSERT, UPDATE, DELETE) as close to the COMMIT statement as possible.
  • Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.
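To illustrate the first bullet, here is a hedged sketch using hypothetical CUSTOMER and ORDER_T tables. One join replaces an outer cursor, an inner cursor, and the program logic to match them up -- and it lets the optimizer pick the join method:

  -- Let DB2 do the join: supply the join predicate and the filter
  -- in SQL, and retrieve only the columns the program actually needs.
  SELECT C.CUSTNO, C.CUSTNAME, O.ORDERNO, O.ORDER_DATE
    FROM CUSTOMER C
         INNER JOIN ORDER_T O
            ON O.CUSTNO = C.CUSTNO
   WHERE C.REGION = 'EAST'
   ORDER BY C.CUSTNO, O.ORDER_DATE;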

Even if you follow the guidelines in this bulleted list, there will still be numerous opportunities for you to tune SQL for performance. To tune SQL you must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the plan tables. IBM offers Data Studio (as a free download) with a visual explain capability that can simplify this process. But you will also have to accumulate experience as to which SQL formulations work more efficiently than others. This skill will come with time and on-the-job learning.
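As a quick sketch of the mechanics (the table and predicate are hypothetical), you can EXPLAIN a single statement and then read the chosen access path back out of the plan table:

  EXPLAIN PLAN SET QUERYNO = 100 FOR
    SELECT CUSTNO, CUSTNAME
      FROM CUSTOMER
     WHERE REGION = 'EAST';

  -- Inspect the access path that DB2 chose for QUERYNO 100.
  SELECT QBLOCKNO, PLANNO, METHOD, ACCESSTYPE, MATCHCOLS,
         ACCESSNAME, INDEXONLY, PREFETCH
    FROM PLAN_TABLE
   WHERE QUERYNO = 100
   ORDER BY QBLOCKNO, PLANNO;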

Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, Visual Basic or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.

Bottom Line

Although DBAs must understand all three aspects of database performance management, concentrating on the application aspects of performance will most likely provide the most bang for the buck. Of course, we have only touched the tip of the DB2 application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your DB2 applications.


Good luck with DB2 for z/OS and happy performance tuning! 

Friday, September 06, 2013

Top Ten Most Pervasive Myths About DB2 for z/OS

Today's blog offers up yet another Top Ten list for DB2 users, perusers, and abusers... This time counting down the most common myths that are perpetuated "out there" regarding DB2 and how it works (or doesn't work)...


1. Use Views to Insulate Programs from Change

              This lie has been told for almost as long as DB2 has been around. I first wrote about this way back in 1991 for Database Programming & Design. Check that article out here if you don't understand why this is a bad idea, in general. 

2. Locking Problems Indicate a Database Problem

              Locking problems are generally caused by bad program design. You should write code to reduce the duration of locks and to COMMIT regularly... and then locking won't be a problem, for the most part.

3. Primary Key is Usually a Good Choice for Clustering

              Actually, the foreign key is likely to be a better choice. When you join PK-->FK, one PK row matches multiple FK rows. Wouldn't it be best if those FK rows were clustered on the same page (or pages)?
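              A hedged sketch with hypothetical names: clustering the child table on its foreign key keeps each parent's child rows together.

  -- Cluster ORDER_T by its foreign key to CUSTOMER so that all of a
  -- given customer's orders tend to reside on the same page(s).
  CREATE INDEX XORD_CUSTNO
      ON ORDER_T (CUSTNO)
      CLUSTER;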

4. Just Using the Defaults Should Work Out Well

              Don't rely on defaults. Many of them are outdated or wrong... and even if they aren't, it is better to review and investigate all options before explicitly specifying the parameter values you want.

5. Programmers Don’t Need to Know How to Tune SQL

              Programming performance-oriented SQL into your programs from the beginning would go a long way toward improving performance overall... and reducing the length of the application development lifecycle. 

6. Black Boxes Work Well for Performance

              No they don't!!!

7. Using NULLs Can Save Space

              No they can't!!! (Every nullable column carries an extra indicator byte, so allowing NULLs actually adds space.)

8. RUNSTATS Aren’t That Important

              If you don't work with up-to-date RUNSTATS then you are hobbling the DB2 Optimizer and almost assuredly getting sub-optimal access paths for your DB2 SQL. 
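              Keeping statistics current is cheap by comparison. A minimal sketch of a utility control statement that refreshes them (the database and table space names are hypothetical):

  RUNSTATS TABLESPACE MYDB.MYTS
           TABLE(ALL)
           INDEX(ALL)
           SHRLEVEL REFERENCE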


9. DB2 is a Hog

              If you don't use it properly, every piece of software can become a resource hog. If you acquire the knowledge on how to work properly with DB2 then it will hum along like a well-oiled machine!

10. It Depends!

              This is the answer that can be used for every DB2 question. But if that answer is not followed up with what "it" depends upon, then it is a useless answer... and whoever gave you that answer is probably just trying to get rid of you instead of helping you with your problems.

Wednesday, May 22, 2013

DB2 Locking Part 7: Lock Avoidance, Related Issues, and Stuff


In today's blog entry, part 7 in our on-going series on DB2 locking, we will take a look at lock avoidance... as well as some other related things.

Lock Avoidance
Lock avoidance is a mechanism employed by DB2 for z/OS to access data without locking but also while maintaining data integrity. It prohibits access to uncommitted data and serializes access to pages. Lock avoidance improves performance by reducing the overall volume of lock requests. Let’s face it, the most efficient lock is the one never taken.

In general, DB2 avoids locking data pages if it can determine that the data to be accessed is committed and that no semantics are violated by not acquiring the lock. DB2 avoids locks by examining the log to verify the committed state of the data.

When determining if lock avoidance techniques will be practical, DB2 first scans the page to be accessed to determine whether any rows qualify. If none qualify, a lock is not required.

For each data page to be accessed, the RBA of the last page update (stored in the data page header) is compared with the log RBA for the oldest active unit of recovery. This RBA is called the Commit Log Sequence Number, or CLSN. If the CLSN is greater than the last page update RBA (or LRSN in a data sharing environment), the data on the page has been committed and the page lock can be avoided.

Additionally, a bit stored in the record header for each row on the page, called the Possibly UNCommitted (PUNC) bit, indicates whether update activity has been performed on the row. If the CLSN test fails, the PUNC bit is checked for each qualifying row on the page. If the bit is off, the row has not been updated since the last time the bit was turned off, and locking can be avoided.

Note
IBM provides no external method for you to determine whether the PUNC bit is on or off for each row. Therefore, you should ensure that any table that can be modified is reorganized on a regularly scheduled basis.

If neither CLSN nor PUNC bit testing indicates that a lock can be avoided, DB2 acquires the requisite lock.

In addition to enhancing performance, lock avoidance increases data availability. Data that in previous releases would have been considered locked, and therefore unavailable, is now considered accessible.

When Lock Avoidance Can Occur

Lock avoidance can be used only for data pages. Further, DB2 Catalog and DB2 Directory access does not use lock avoidance techniques. You can avoid locks under the following circumstances:

  • For any pages accessed by read-only or ambiguous queries bound with ISOLATION(CS) and CURRENTDATA NO
  • For any unqualified rows accessed by queries bound with ISOLATION(CS) or ISOLATION(RS)
  • For DB2 system-managed referential integrity checks for dependent rows, caused either by an update to the primary key or by a delete of the parent row when the DELETE RESTRICT rule is in effect
  • For both COPY and RUNSTATS when SHRLEVEL(CHANGE) is specified

An ambiguous cursor is one where DB2 cannot determine whether there is intent to modify data retrieved using that cursor. A cursor is ambiguous if it is in a plan or package containing either PREPARE or EXECUTE IMMEDIATE SQL statements, and all of the following are true:

  • the cursor is not defined with the FOR READ ONLY clause or the FOR UPDATE OF clause;
  • the cursor is not defined on a read-only result table;
  • the cursor is not the target of a WHERE CURRENT OF clause on an UPDATE or DELETE statement.
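One practical upshot: declare your intent explicitly so DB2 never has to guess. A minimal sketch with a hypothetical table:

  -- Unambiguously read-only: a candidate for lock avoidance.
  DECLARE CSR1 CURSOR FOR
    SELECT CUSTNO, CUSTNAME
      FROM CUSTOMER
     WHERE REGION = 'EAST'
    FOR READ ONLY;

  -- Unambiguously updatable: DB2 knows it must acquire locks.
  DECLARE CSR2 CURSOR FOR
    SELECT CUSTNO, CREDIT_LIMIT
      FROM CUSTOMER
     WHERE REGION = 'EAST'
    FOR UPDATE OF CREDIT_LIMIT;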

Skipping Locked Rows

Although not really a part of lock avoidance, you can indeed avoid locked data using a capability added in DB2 V9: the SKIP LOCKED DATA option, which allows a transaction to skip over rows that another process has locked. SKIP LOCKED DATA can be coded on the following SQL statements:

  • SELECT
  • SELECT INTO
  • PREPARE
  • searched UPDATE
  • searched DELETE
You can also use the SKIP LOCKED DATA option with the UNLOAD utility.

When you tell DB2 to skip locked data, that data is not accessed and your program will not have it available. DB2 just skips over any locked data instead of waiting for it to be unlocked. The benefit, of course, is improved performance because you will not incur any lock wait time; however, it comes at the cost of not accessing the locked data. This means that you should use this clause only when your program can tolerate skipping over some data.

The SKIP LOCKED DATA option is compatible with cursor stability (CS) isolation and read stability (RS) isolation; however, it cannot be used with uncommitted read (UR) or repeatable read (RR) isolation levels. DB2 simply ignores the SKIP LOCKED DATA clause under UR and RR isolation levels.

In addition, SKIP LOCKED DATA works only with row locks and page locks. That means that SKIP LOCKED DATA does not apply to table, partition, LOB, XML, or table space locks. And the bigger the lock size, the more data that will be skipped when a lock is encountered. With row locking, you will be skipping over locked rows; however, with page locking, you will be skipping over all the rows on the locked page.

Use this feature with extreme care, and make sure that you know exactly what you are telling DB2 to do; otherwise, you might be reading less data than you want. Consider using this option in certain test environments, or possibly even in production under the proper conditions. For example, perhaps you have a program that needs to read from a table used like a queue to get the next number. If it is not imperative that the numbers be strictly sequential, using SKIP LOCKED DATA can eliminate bottlenecks by skipping any locked rows/pages to get data off of the queue.
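A hedged sketch of that queue scenario, with a hypothetical MSG_QUEUE table (verify the exact clause placement for your DB2 version):

  -- Grab the next available item, skipping anything another
  -- transaction currently has locked instead of waiting for it.
  SELECT MSG_ID, MSG_BODY
    FROM MSG_QUEUE
   WHERE STATUS = 'READY'
   ORDER BY MSG_ID
   FETCH FIRST 1 ROW ONLY
   WITH CS
   SKIP LOCKED DATA;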

Using Currently Committed Data

DB2 10 for z/OS adds another interesting nuance to locking with the option to use currently committed data.

A common problem encountered by DB2 subsystems with poorly designed applications is frequent lock timeouts (and deadlocks). Under normal locking semantics, if data is locked, an application requesting the same data simply waits until it becomes available, or times out after waiting too long. As of DB2 V10, though, there is an alternative option for applications that cannot tolerate waiting on locks.

Using the CONCURRENTACCESSRESOLUTION parameter of the BIND command, you can direct your application program to use currently committed semantics by specifying USECURRENTLYCOMMITTED. This option applies only when the isolation level in effect is cursor stability (CS) or read stability (RS); otherwise it is simply ignored.
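For example, a sketch of such a bind (the collection and member names are hypothetical):

  BIND PACKAGE(BATCHCOL) MEMBER(PROGA) -
       ISOLATION(CS)                   -
       CONCURRENTACCESSRESOLUTION(USECURRENTLYCOMMITTED)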


The impact of this clause on the program is that DB2 uses the currently committed version of the data for applicable scans when data is in the process of being updated or deleted. Rows that are in the process of being inserted can be skipped. So, only committed data is returned, as was the case previously, but now readers do not wait for writers to release locks. Instead, readers return data based on the currently committed version; that is, data prior to the start of the write operation.

Until next time... stay tuned for the next edition in the DB2 Locking series here at the DB2 Portal.