The Db2 Portal Blog

Wednesday, September 18, 2013

Using the DISPLAY Command, Part 1

Oftentimes, some of the simplest commands and features of DB2 can be used to great effect -- if you just remember that they are there and learn how to use them. The DISPLAY command is one of those features.

Although a DB2 performance monitor is probably the best solution for gathering information about your DB2 subsystems and databases, you can gain significant insight into “what is going on out there” using the simple DISPLAY command.

In today's blog post, and a series of subsequent posts, we will learn about the DISPLAY command and how it can be used to monitor various aspects of DB2.

There are several variations of the DISPLAY command that you can use, depending on the type of information you are looking for. The DISPLAY command can be used to return information about the status of DB2 data sharing groups, databases and table spaces, threads, stored procedures, user-defined functions, utilities, and traces; it can also monitor the Resource Limit Facility (RLF) and distributed data locations.

Let’s start with coverage of what is probably the most often-used variation of the DISPLAY command, looking into database information.

By running the DISPLAY DATABASE command, you can gather information on DB2 databases and table spaces. The output of the basic command will show the status of the objects specified along with any exception states that apply. For example:

-DISPLAY DATABASE(DBNAME)

Issuing this command will display details on the DBNAME database, including information about the table spaces and indexes in that database. So, with a simple command you can easily find all of the table spaces and indexes within any database — pretty powerful stuff. But the status information for each space is useful, too. When a status other than RO or RW is encountered, the object is in an indeterminate state or is being processed by a DB2 utility. The possible statuses that DB2 can assign to a page set are detailed here:

ARBDP	Index is in Advisory Rebuild Pending status; the index should be rebuilt to improve performance and allow the index to be used for index-only access again.
AREO*	The table space, index, or partition is in Advisory Reorg Pending status; the object should be reorganized to improve performance. This status is new as of DB2 V8.
ACHKP	The Auxiliary Check Pending status has been set for the base table space. An error exists in the LOB column of the base table space.
AREST	The table space, index space, or partition is in Advisory Restart Pending status. If back-out activity against the object is not already underway, either issue the RECOVER POSTPONED command or recycle the DB2 subsystem specifying LBACKOUT=AUTO.
AUXW	Either the base table space or the LOB table space is in the Auxiliary Warning status. This warning status indicates an error in the LOB column of the base table space or an invalid LOB in the LOB table space.
CHKP	The Check Pending status has been set for this table space or partition.
COPY	The Copy Pending flag has been set for this table space or partition.
DEFER	Deferred restart is required for the object.
GRECP	The table space, table space partition, index, index partition, or logical index partition is in the group buffer pool Recover Pending state.
ICOPY	The index is in Informational Copy Pending status.
INDBT	In-doubt processing is required for the object.
LPL	The table space, table space partition, index, index partition, or logical index partition has logical page errors.
LSTOP	The logical partition of a non-partitioning index is stopped.
OPENF	The table space, table space partition, index, index partition, or logical index partition had an open data set failure.
PSRCP	Indicates Page Set Recover Pending state for an index (non-partitioning indexes).
PSRBD	The non-partitioning index space is in a Page Set Rebuild Pending status.
RBDP	The physical or logical index partition is in the Rebuild Pending status.
RBDP*	The logical partition of a non-partitioning index is in the Rebuild Pending status, and the entire index is inaccessible to SQL applications. However, only the logical partition needs to be rebuilt.
RECP	The Recover Pending flag has been set for this table space, table space partition, index, index partition, or logical index partition.
REFP	The table space, index space, or index is in Refresh Pending status.
RELDP	The object has a release dependency.
REORP	The data partition is in a REORG Pending state.
REST	Restart processing has been initiated for the table space, table space partition, index, index partition, or logical index partition.
RESTP	The table space or index is in the Restart Pending status.
RO	The table space, table space partition, index, index partition, or logical index partition has been started for read-only processing.
RW	The table space, table space partition, index, index partition, or logical index partition has been started for read and write processing.
STOP	The table space, table space partition, index, index partition, or logical index partition has been stopped.
STOPE	The table space or index is stopped because of an invalid log RBA or LRSN in one of its pages.
STOPP	A stop is pending for the table space, table space partition, index, index partition, or logical index partition.
UT	The table space, table space partition, index, index partition, or logical index partition has been started for the execution of utilities only.
UTRO	The table space, table space partition, index, index partition, or logical index partition has been started for RW processing, but only RO processing is enabled because a utility is in progress for that object.
UTRW	The table space, table space partition, index, index partition, or logical index partition has been started for RW processing, and a utility is in progress for that object.
UTUT	The table space, table space partition, index, index partition, or logical index partition has been started for RW processing, but only UT processing is enabled because a utility is in progress for that object.
WEPR	Write error page range information.

Of course, there are many additional options that can be used in conjunction with the DISPLAY DATABASE command. The following options can be used to narrow down the amount of information displayed:

USE displays what processes are using resources for the page sets in the database
CLAIMERS displays the claims on the page sets in the database
LOCKS displays the locks held on the page sets in the database
LPL displays the logical page list entries
WEPR displays the write error page range information.

Additionally, for partitioned page sets, you can specify which partition, or range of partitions, that you wish to display.

The OVERVIEW option can be specified to display each object in the database on its own line. This condenses the output of the command and makes it easier to view. The OVERVIEW keyword cannot be specified with any other keywords except SPACENAM, LIMIT, and AFTER.

Another tactic that can be used to control the amount of output generated by DISPLAY DATABASE is to use the LIMIT parameter. The default number of lines returned by the DISPLAY command is 50, but the LIMIT parameter can be used to set the maximum number of lines returned to any numeric value. For example:

-DISPLAY DATABASE(DBNAME) LIMIT(300)

Using the LIMIT parameter in this manner would increase the limit to 300 lines of output. To indicate no limit, you can replace the numeric limit with an asterisk (*).

Finally, you can choose to display only objects in restricted or advisory status using either the RESTRICT or ADVISORY keyword.

And be sure to tune in for the next blog post which will look into using DISPLAY to uncover information about your DB2 buffer pools.

Wednesday, September 11, 2013

Answering a Question: Dealing with Different Data Types

I get a lot of e-mail. Some of it is junk, but some of it contains questions on all sorts of issues. I cannot answer everything that comes into my inbox or I wouldn't have time to earn a living. But every now and then I will answer a question here on the blog.

Today's question is:

Dear Mr. Mullins,

I have a question I hope you can help me with. In DB2 I am attempting to compare two columns, however one column is data type "Integer" and the other is data type "Decimal". Can you give me an idea on how to convert the data types so they can be compared?

And here is my answer:

You can cast one data type to the other. For example, to cast the decimal to an integer you would use the INTEGER function, like so:

WHERE INTEGER(dec_col) = int_col

Or you could cast the integer column to a decimal like this:

WHERE DECIMAL(int_col) = dec_col

Alternatively, you can use CAST to cast either column to either data type. For example, CAST(dec_col AS INTEGER)… Hope this helps…
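The direction of the cast matters, because converting a decimal to an integer truncates the fractional part. Here is a minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for DB2. The table t and its rows are made up for illustration; only the column names dec_col and int_col come from the answer above, and SQLite's handling of DECIMAL is looser than DB2's, so treat this as an illustration of the idea rather than of DB2's exact behavior.

```python
import sqlite3

# SQLite stands in for DB2 here; the table and its data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (int_col INTEGER, dec_col DECIMAL(7,2))")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(10, 10.0), (10, 10.5), (11, 12.0)],
)

# Cast the decimal down to an integer: the fractional part is dropped,
# so the row (10, 10.5) is treated as a match.
dec_to_int = conn.execute(
    "SELECT int_col, dec_col FROM t "
    "WHERE CAST(dec_col AS INTEGER) = int_col ORDER BY dec_col"
).fetchall()

# Cast the integer up to a decimal: nothing is lost,
# so only the exactly equal row matches.
int_to_dec = conn.execute(
    "SELECT int_col, dec_col FROM t "
    "WHERE CAST(int_col AS DECIMAL(7,2)) = dec_col ORDER BY dec_col"
).fetchall()

print("decimal cast to integer matches:", dec_to_int)
print("integer cast to decimal matches:", int_to_dec)
```

If an exact comparison is what you want, casting the integer column to a decimal is usually the safer choice.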

Friday, September 06, 2013

Top Ten Most Pervasive Myths About DB2 for z/OS

Today's blog offers up yet another Top Ten list for DB2 users, perusers, and abusers... This time counting down the most common myths that are perpetuated "out there" regarding DB2 and how it works (or doesn't work)...

1. Use Views to Insulate Programs from Change

This lie has been told for almost as long as DB2 has been around. I first wrote about this way back in 1991 for Database Programming & Design. Check that article out here if you don't understand why this is a bad idea, in general.

2. Locking Problems Indicate a Database Problem

Locking problems are generally caused by bad program design. You should write code to reduce the duration of locks and to COMMIT regularly... and then locking won't be a problem, for the most part.

3. Primary Key is Usually a Good Choice for Clustering

Actually, the foreign key is likely to be a better choice. When you join PK-->FK there will be one PK row to multiple FK rows. Wouldn't it be best if the FK rows were clustered on the same page (or pages)?

4. Just Using the Defaults Should Work Out Well

Don't rely on defaults. Many of them are outdated or wrong... and even if they aren't it will be better if you review and investigate all options before explicitly specifying the parameter value you want.

5. Programmers Don’t Need to Know How to Tune SQL

Programming performance-oriented SQL into your programs from the beginning would go a long way toward improving performance overall... and reducing the length of the application development lifecycle.

6. Black Boxes Work Well for Performance

No they don't!!!

7. Using NULLs Can Save Space

No they can't!!!

8. RUNSTATS Aren’t That Important

If you don't work with up-to-date RUNSTATS then you are hobbling the DB2 Optimizer and almost assuredly getting sub-optimal access paths for your DB2 SQL.

9. DB2 is a Hog

If you don't use it properly, every piece of software can become a resource hog. If you acquire the knowledge on how to work properly with DB2 then it will hum along like a well-oiled machine!

10. It Depends!

This is the answer that can be used for every DB2 question. But if that answer is not followed up with what "it" depends upon, then it is a useless answer... and whoever gave you that answer is probably just trying to get rid of you instead of helping you with your problems.

Sunday, September 01, 2013

Top Ten Common SQL Mistakes (with DB2 for z/OS)

There are many different types of development mistakes that can be made when you are coding a complex application system. But probably the most pervasive mistakes being made when developing an application using DB2 for z/OS are SQL mistakes... well, actually, that statement is probably true regardless of the DBMS, but I am going to focus on DB2 for z/OS given that this is a blog that focuses on DB2 and mainframe issues.

1. Assuming an ORDER BY is not necessary for ordered results - if you want your results always to be returned in a specific order, you must include the ORDER BY clause. If you fail to do so, a future rebind can change the access path and also possibly change the order of your result set.
2. Forgetting the NULL indicator - when your query can return a null, be sure to include a null indicator or you will get a -305 SQLCODE (22002 SQLSTATE). And be sure to check the null indicator to see if the data is null or not!
3. Incorrect expectations when using the NOT IN predicate with NULLs - Nulls can be confusing when using the NOT IN predicate. For example, suppose we want to find all Colors that are not assigned to any Product using a query like the one shown below. The problem arises when P.color can be NULL. A NULL in the subquery result causes the NOT IN predicate to evaluate to UNKNOWN for every color that has no match, so the result set is always empty.

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P);

4. Coding predicates appropriately in Outer Joins - Sometimes it can be difficult to figure out how to code outer joins appropriately if you don't code them very often. Terry Purcell has written a nice article on the topic that can be found here.
5. Not coding a cursor for a multi-row result - When more than one row can be returned by your query you must use a cursor (or specify FETCH FIRST 1 ROW ONLY).
6. Recompiling but not binding - if you make changes to the SQL, you have to BIND it again. Otherwise it won't work!
7. Forgetting to use single quotes around strings (instead of double quotes) - DB2 SQL expects single quotes around character strings.
8. Trying to modify a Primary Key column - you cannot modify a primary key. A primary key should be unique within the table and immutable.
9. Forcing dynamic SQL into static SQL (sometimes hundreds of static SQL statements) - you should analyze the type and nature of your database queries to determine whether they should be static or dynamic. Consider using the advice here (static SQL v. dynamic SQL) to guide you.
10. Asking for more data than you need (columns and/or rows) - specify only the columns that you actually need in your SELECT-list... and use WHERE clauses (predicates) to filter the data to just the data that you need before bringing it into the program. The less data that DB2 needs to transfer from the database to your program, the more efficient things will be!
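The NOT IN mistake in the list above is easy to reproduce. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for DB2 (both apply the same three-valued logic to NOT IN). The sample rows and the Products.name column are made up for illustration, and the NOT EXISTS rewrite is one common way around the problem, not necessarily the only one.

```python
import sqlite3

# SQLite stands in for DB2 here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Colors   (color TEXT);
    CREATE TABLE Products (name TEXT, color TEXT);
    INSERT INTO Colors   VALUES ('red'), ('green'), ('blue');
    INSERT INTO Products VALUES ('ball', 'red'), ('kite', NULL);
""")

# The query from the list above: the NULL color on 'kite' makes NOT IN
# evaluate to UNKNOWN for every unmatched color, so no rows qualify.
not_in = conn.execute("""
    SELECT C.color
    FROM Colors AS C
    WHERE C.color NOT IN (SELECT P.color FROM Products AS P)
""").fetchall()

# A NOT EXISTS rewrite compares row by row, so the NULL simply never matches.
not_exists = conn.execute("""
    SELECT C.color
    FROM Colors AS C
    WHERE NOT EXISTS (SELECT 1 FROM Products AS P WHERE P.color = C.color)
    ORDER BY C.color
""").fetchall()

print("NOT IN:", not_in)
print("NOT EXISTS:", not_exists)
```

Another option is to keep NOT IN but add WHERE P.color IS NOT NULL to the subquery.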

Thursday, August 22, 2013

NoSQL Gets Me Thinking About ACID

This week I attended the NoSQL Now Conference in San Jose, California. This conference focused on NoSQL technology and implementations and a LOT of the discussion focused on transactions and whether or not Big Data and NoSQL databases were at a disadvantage when it comes to their lack of support for ACID (mostly).

You can read all about my experience at this conference on my Data Technology Today blog.

At any rate, though, I got to thinking... and those of you who know me understand that that can be a dangerous thing. Basically, ACID is not a topic that relational folks sit around talking about. It is kind of taken for granted. So I thought it might be a good idea to reinforce the definition of ACID and why it is so important in DB2... and the relational world in general.

ACID is an acronym for atomicity, consistency, isolation, and durability. Each of these four qualities is necessary for a transaction to be designed correctly and deliver data integrity when complete:

Atomicity means that a transaction must exhibit “all or nothing” behavior. Either all of the instructions within the transaction happen, or none of them happen. Atomicity preserves the “completeness” of the business process.
Consistency refers to the state of the data both before and after the transaction is executed. A transaction maintains the consistency of the state of the data. In other words, after running a transaction, all data in the database is “correct.”
Isolation means that transactions can run at the same time. Any transactions running in parallel have the illusion that there is no concurrency. In other words, it appears that the system is running only a single transaction at a time. No other concurrent transaction has visibility to the uncommitted database modifications made by any other transactions. To achieve isolation, a locking mechanism is required.
Durability refers to the impact of an outage or failure on a running transaction. A durable transaction will not impact the state of data if the transaction ends abnormally, and once a transaction has committed, its changes will survive any subsequent failures.

Let’s use an example to better understand the importance of ACID transactions to relational database applications. Consider a banking application. Assume that you wish to withdraw $50 from your account with your bank. This business process requires a transaction to be executed. You request the money either in person by handing a slip to a bank teller or by using an ATM. When the bank receives the request, it performs the following tasks, which make up the complete business process. The bank will

Check your account to make sure you have the necessary funds to withdraw the requested amount.
If you do not, deny the request and stop; otherwise continue processing.
Debit the requested amount from your checking account.
Produce a receipt for the transaction.
Deliver the requested amount and the receipt to you.

The transaction that is run to perform the withdrawal must complete all of these steps, or none of these steps, or else one of the parties in the transaction will be dissatisfied. If the bank debits your account but does not give you your money, then you will not be satisfied. If the bank gives you the money but does not debit the account, the bank will be unhappy. Only the completion of every one of these steps results in a “complete business process.” Database developers must understand the requisite business processes and design transactions that ensure ACID properties.
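To make the "all or nothing" behavior concrete, here is a minimal sketch of the withdrawal using Python's built-in sqlite3 module as a stand-in for DB2. The account and receipt tables, the withdraw function, and the fail_before_receipt switch are all hypothetical names invented for this illustration; handing over the cash is outside the database and is not modeled.

```python
import sqlite3

# SQLite stands in for DB2 here; table and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE receipt (account_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 80)")
conn.commit()


def withdraw(conn, account_id, amount, fail_before_receipt=False):
    """Debit the account and record a receipt as one all-or-nothing unit."""
    try:
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        if balance < amount:
            return False  # deny the request; nothing has been changed
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        if fail_before_receipt:
            raise RuntimeError("simulated outage after the debit")
        conn.execute("INSERT INTO receipt VALUES (?, ?)", (account_id, amount))
        conn.commit()  # both changes become permanent together
        return True
    except Exception:
        conn.rollback()  # undo the debit so the data stays consistent
        return False


def balance_of(conn, account_id):
    return conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()[0]


first = withdraw(conn, 1, 50)                             # succeeds
second = withdraw(conn, 1, 50)                            # denied: insufficient funds
third = withdraw(conn, 1, 20, fail_before_receipt=True)   # fails midway, rolled back
receipts = conn.execute("SELECT COUNT(*) FROM receipt").fetchone()[0]
print(first, second, third, balance_of(conn, 1), receipts)
```

The third call is the interesting one: the debit has already been applied when the failure occurs, and the rollback is what keeps the bank and the customer in agreement.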

Unit of work (UOW) is another transaction term that describes a physical transaction. A UOW is a series of instructions and messages that, when executed, guarantees data integrity. So a UOW and a transaction are similar in concept. However, a UOW is not necessarily a complete business process—it can be a subset of the business process, and a group of units of work can constitute a single transaction. Each UOW must possess ACID characteristics. In other words, if the transaction were to fail, the state of the data upon failure must be consistent in terms of the business requirements.

To summarize, a transaction—when executed alone, on a consistent database—will either complete, producing correct results, or terminate, with no effect. In either case the resulting condition of the database will be a consistent state.

Hopefully after this discussion it is simple to see why relational databases—with many related tables—rely on ACID properties of transactions to maintain consistency. Of course, the NoSQL world has different use cases and, arguably, can get by with eventual consistency... that is, without ACID. But that is another topic that is, frankly, beyond the scope of this DB2/mainframe-focused blog.

Hopefully, though, this review of ACID and its importance to data consistency was helpful.

Sunday, August 11, 2013

The Top Ten DB2 Development Best Practices

If you have been reading my blogs lately you know that I have been posting Top Ten lists of various types, and today's post offers up yet another one. This time, the list provides guidance for DB2 developers to keep in mind as they build DB2 applications...

1. Minimize network calls

The fewer times you need to make calls across the network, the better your program will perform.

2. Minimize passes through the data

Try to read each data item only once. The more times you access the same data the worse performance will be.

3. Put the work into the SQL, not the program

Let SQL do the work... DB2 can optimize SQL better than you can optimize your programming language of choice.

4. Unlearn the “flat file” mentality

SQL is a set-based language. Each SQL statement can operate on multiple rows of data at once. And joins are more efficient than opening multiple cursors and performing "master file" processing logic on them.

5. Be sure data type and length match in predicates

Even though DB2 no longer automatically degrades non-matching predicates to Stage 2, it is still a best practice to match the data type and length for columns and host variables that participate in predicates.

6. Know your Stage 1, Stage 2, and Indexable predicates

...and avoid Stages 3 and 4!

7. Document your code

Without proper documentation application maintenance becomes difficult... especially if you used any SQL tuning tricks/techniques.

8. Always check the SQLCODE or SQLSTATE

If you don't check for an error you may be processing with bad, incorrect, or missing data.

9. Analyze your access paths (and tune your SQL in test)

If you do not examine the access paths formulated by the DB2 optimizer for your SQL, then how do you know how efficient (or not) your code is (or will be)?

10. Avoid Bachelor Programming Syndrome

Be sure to code COMMITs in your application programs; in other words, don't be afraid to commit.
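Items 3 and 4 above can be illustrated with a small comparison. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for DB2; the dept and emp tables, their data, and both function names are hypothetical. The first function mimics "master file" processing with one statement per department, and the second lets a single join do the work.

```python
import sqlite3

# SQLite stands in for DB2 here; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, deptno INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'SALES'), (2, 'IT');
    INSERT INTO emp  VALUES (10, 1, 100), (11, 1, 150), (12, 2, 200);
""")


def totals_master_file_style(conn):
    """Read every department, then issue one more statement per department."""
    statements = 1
    totals = {}
    for deptno, name in conn.execute("SELECT deptno, name FROM dept").fetchall():
        statements += 1
        rows = conn.execute(
            "SELECT salary FROM emp WHERE deptno = ?", (deptno,)
        ).fetchall()
        totals[name] = sum(salary for (salary,) in rows)  # work done in the program
    return totals, statements


def totals_set_based(conn):
    """Let one join with GROUP BY do the matching and the arithmetic."""
    rows = conn.execute("""
        SELECT d.name, SUM(e.salary)
        FROM dept AS d
        JOIN emp AS e ON e.deptno = d.deptno
        GROUP BY d.name
    """).fetchall()
    return dict(rows), 1


loop_totals, loop_statements = totals_master_file_style(conn)
set_totals, set_statements = totals_set_based(conn)
print(loop_totals, loop_statements)
print(set_totals, set_statements)
```

Both approaches produce the same totals, but the number of statements in the first grows with the number of departments, and against a remote database each one is a network call (item 1).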

Monday, August 05, 2013

COPYTOCOPY: The Forgotten Little DB2 Utility?

The COPYTOCOPY utility was introduced some time ago now, way back in DB2 for z/OS Version 7. The purpose of the COPYTOCOPY utility is to make additional image copies of currently existing image copy data sets. But many DBAs "out there" keep plugging away, almost unaware that COPYTOCOPY even exists. So let's spend a little time talking about it today.

The primary benefit of COPYTOCOPY is to reduce the amount of time required to run the COPY utility. Remember that the COPY utility can be used to take up to four image copies with a single execution of the utility. But with COPYTOCOPY available, instead of using COPY to make four image copy backups, the DBA can use COPY to make a single image copy, and then run COPYTOCOPY to make additional image copies. The COPY utility will take less time to create a single image copy backup than it will to take multiple image copy backups. And the combination of COPY plus COPYTOCOPY can, in some cases, increase availability.

Individual data and index partitions are treated as distinct target objects by the COPYTOCOPY utility. Any other utilities operating on different partitions of the same table space or index space can be run concurrently with COPYTOCOPY.

The following utilities cannot be run concurrently on the same database object as the COPYTOCOPY utility:

COPY
LOAD
MERGECOPY
MODIFY
RECOVER
REORG INDEX
REORG TABLESPACE

Furthermore, COPYTOCOPY is flexible enough to run against any DB2 image copy data set. This includes inline copies made during the execution of the REORG and LOAD utilities. COPYTOCOPY must start with a primary image copy backup – either the local primary or recovery site primary copy. From that image copy, the COPYTOCOPY utility can make up to three copies of one or more of the following types:

local primary
local backup
recovery site primary
recovery site backup

Copies created by COPYTOCOPY can be used by the RECOVER utility just like regular image copies created using the COPY utility. Both table space and index space copies can be made using the COPYTOCOPY utility. Any DB2 utility process that uses image copy data sets can use the image copy data sets created by COPYTOCOPY. This includes MERGECOPY, UNLOAD, and subsequent runs of COPYTOCOPY. However, keep in mind that image copies created with the CONCURRENT option of the COPY utility are not supported by the COPYTOCOPY utility.

Just like the COPY utility, the COPYTOCOPY utility records information about the image copies that it creates in the SYSIBM.SYSCOPY system catalog table. The COPYTOCOPY utility will insert the values in the DSNAME, GROUP_MEMBER, JOBNAME, AUTHID, DSVOLSER and DEVTYPE columns as appropriate depending on the copies that are being created.

You cannot run COPYTOCOPY to create additional image copies for certain DB2 Catalog (SYSCOPY in DSNDB06) and DB2 Directory (DBD01 and SYSUTILX, both in DSNDB01) objects.

The COPYTOCOPY utility operates in these distinct phases:

UTILINIT – Initialization and setup
CPY2CPY – Copying the image copy
UTILTERM – Cleanup

TERM and Restart Issues

The use of the TERM command to terminate a COPYTOCOPY step that has abended is not recommended. A current restart should be done instead to allow COPYTOCOPY to pick up where it left off. Terminating COPYTOCOPY in such a situation might cause inconsistencies between the ICF catalog and DB2 catalog when GDGs are used.

You cannot use RESTART(PHASE) for a COPYTOCOPY job. It is fine to use RESTART(CURRENT) as long as you avoid using the -TERM UTILITY command to terminate a COPYTOCOPY step. When you use RESTART(CURRENT), COPYTOCOPY will restart from the last commit point with the same image copy data set, so be sure to code a data set disposition of DISP=(MOD,CATLG,CATLG) on your JCL DD statements.

Inline Copy Exception

When using COPYTOCOPY to copy an inline image copy that was made by the REORG utility with the part range option you will need to specify individual DSNUM for the partitions to be copied. The COPYTOCOPY utility does not support part range. COPYTOCOPY will copy only the specified partition data from the input inline image copy data set into the output image copy data set.

COPYTOCOPY Execution

To run the COPYTOCOPY utility it is not necessary to provide the explicit data set name of the image copy being copied. Instead, the input to the COPYTOCOPY utility is the name of the table space, index space, or index for which the original copy was made, and an indication of which image copy in the catalog should be copied. To specify this information COPYTOCOPY provides three options:

FROMLASTCOPY – indicates that the most recent image copy taken for the table space or index space is to be used as input to the COPYTOCOPY utility. The input could be either a full image copy or incremental copy. The utility will retrieve the information from the SYSIBM.SYSCOPY system catalog table.
FROMLASTFULLCOPY – indicates that the most recent full image copy taken for the object is to be used as the input to the COPYTOCOPY job. Once again, this information is obtained by querying the DB2 Catalog.
FROMLASTINCRCOPY – indicates that the most recent incremental image copy taken for the object is to be used as the input to the COPYTOCOPY job. FROMLASTINCRCOPY is not valid for index spaces or indexes. If FROMLASTINCRCOPY is specified for an index space or index, COPYTOCOPY will use the last full image copy that was taken for the index, if one is available. And once again, this information is obtained by querying the DB2 Catalog.

Of course, you may choose instead to specify the data set name for the image copy that is to be copied by the COPYTOCOPY utility. This can be accomplished by using the FROMCOPY clause. But keep in mind that when you are using COPYTOCOPY with a list of objects defined using the LISTDEF statement, the FROMCOPY clause is not valid.

If the FROMCOPY keyword is not used the COPYTOCOPY utility must determine which specific image copy is to be copied. Before COPYTOCOPY can execute it may have to choose between the local site primary copy, local site backup copy, recovery site primary copy, and recovery site backup copy data sets. COPYTOCOPY will search image copies in the following order to determine the input data set to be used:

If you are running COPYTOCOPY at your local site, the search order will be (1) local site primary copy, (2) local site backup copy, (3) recovery site primary copy, (4) recovery site backup copy.
If you are running the utility at your recovery site, the search order will be (1) recovery site primary copy, (2) recovery site backup copy, (3) local site primary copy, then finally (4) local site backup copy.

If the input data set cannot be allocated or opened, the COPYTOCOPY utility will try to use the next image copy data set with the same START_RBA column value in SYSIBM.SYSCOPY, in the search order indicated previously. When the FROMCOPY keyword is used, though, only the explicitly specified data set can be used as the input to COPYTOCOPY.
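The search order and fallback described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how the utility is implemented: the SEARCH_ORDER table, the choose_input_copy function, and the idea of passing in a set of "usable" copy types (those that can be allocated and opened) are all assumptions made for the example.

```python
# Search order by the site where COPYTOCOPY runs, as described in the text.
# The names below are hypothetical labels for illustration only.
SEARCH_ORDER = {
    "local": [
        "local primary",
        "local backup",
        "recovery primary",
        "recovery backup",
    ],
    "recovery": [
        "recovery primary",
        "recovery backup",
        "local primary",
        "local backup",
    ],
}


def choose_input_copy(site, usable_copies):
    """Return the first copy type in the site's search order that is usable.

    usable_copies is the set of copy types whose data sets can be
    allocated and opened; None is returned if no copy qualifies.
    """
    for copy_type in SEARCH_ORDER[site]:
        if copy_type in usable_copies:
            return copy_type
    return None


available = {"local backup", "recovery primary"}
print(choose_input_copy("local", available))
print(choose_input_copy("recovery", available))
```

With the same two copies available, a run at the local site falls back to the local backup copy, while a run at the recovery site picks the recovery site primary copy first.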

An Example of the COPYTOCOPY Utility

Let’s take a quick look at a sample JCL job step to run the COPYTOCOPY utility. The following code can be run to make a backup local image copy of the table space DSN8S71E in the sample DB2 database DSN8D71A. This will be either a full or incremental image copy, whichever was last run for this object:


//STEP1    EXEC DSNUPROC,UID='DBAPCSM.CPY2CPYT',
//         UTPROC='',
//         SYSTEM='DB2T',DB2LEV=DB2A
//COPY2    DD DSN=COPY002F.IFDY01,UNIT=SYSDA,VOL=SER=CPY02I,
//         SPACE=(CYL,(15,1)),DISP=(NEW,CATLG,CATLG)
//SYSIN    DD *
  COPYTOCOPY TABLESPACE DSN8D71A.DSN8S71E COPYDDN(,COPY2)
/*

Bottom Line

The COPYTOCOPY utility provides a useful feature for the toolkit of DB2 DBAs. Using COPYTOCOPY to create additional image copies from existing image copies can enhance availability and assist DBAs in creating an optimal backup and recovery plan for their DB2 applications and databases.

Wednesday, July 31, 2013

Compression Becoming More Important in the Age of Big Data

DBAs and database professionals have been aware of the pros and cons of compressing data for years. The traditional argument goes something like this: with compression you can store more data in less space, but at the cost of incurring CPU to compress the data upon insertion (and modification) and decompress the data upon reading it. Over time, the benefits of compression became greater as compression algorithms became more robust, hardware assist chips became available to augment compression speed, and the distributed model of computing made transmitting data across networks a critical piece of the business transaction (and transmitting compressed data is more efficient than transmitting uncompressed data).

IBM has significantly improved compression in DB2 for z/OS over the years. In the early days of mainframe DB2 no compression capability came with DB2 out-of-the-box -- the only mechanism for compressing data was via an exit routine (EDITPROC). Many software vendors developed and sold compression routines for DB2. Eventually, IBM began shipping a sample compression routine with DB2. And then in DB2 Version 3 (1993) hardware-assisted compression was introduced. Using the hardware assist, the CPU used by DB2 compression is minimal and the cons list gets a little shorter.

Indeed, one piece of advice that I give to most shops when I consult for them is that they probably need to look at compressing more data than they already are. Compressed data can improve performance these days because, in many cases, you can fit more rows per page. And therefore scans and sequential processes can process more data with the same number of I/Os, thereby improving performance. Of course, you should use the DSN1COMP utility to estimate the amount of savings that can accrue via compression before compressing any existing data.
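The rows-per-page argument is simple arithmetic, sketched below in Python. Every number here is an assumption chosen for illustration: a flat 4,000 usable bytes per page, 200-byte rows, and a 50 percent compression ratio. The pages_to_scan function is hypothetical and ignores page overhead and any limit on rows per page, so use DSN1COMP for real estimates.

```python
import math


def pages_to_scan(row_count, row_bytes, usable_page_bytes=4000):
    """Estimate the pages a full scan must read (simplified model)."""
    rows_per_page = usable_page_bytes // row_bytes  # whole rows that fit on a page
    return math.ceil(row_count / rows_per_page)


# Hypothetical table: one million rows of 200 bytes each,
# compressing to half that size.
uncompressed_pages = pages_to_scan(1_000_000, 200)
compressed_pages = pages_to_scan(1_000_000, 100)

print("uncompressed:", uncompressed_pages, "pages")
print("compressed:  ", compressed_pages, "pages")
```

Under these assumptions a scan of the compressed table touches half as many pages, which is where the I/O savings come from.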

Eventually, in DB2 9, we even got index compression capability (of course, using different technology than data compression). At any rate, compressing data on DB2 for z/OS is no longer the “only-if-I-have-to” task that it once was.

Then along comes the Big Data phenomenon where increasingly large data sets need to be stored and analyzed. Big Data is typified by data sets that are so large and complex that traditional tools and database systems are ill-suited to process them. Clearly, compressing such data could be advantageous… but is it possible to process and compress such large volumes of data?

New alternatives to traditional systems are being made available that offer efficient resource usage based on principles of compressed sensing and other techniques. One example of this new technology is IBM’s BLU Acceleration, which is included in DB2 10.5 for Linux, Unix, and Windows. One feature of BLU Acceleration is extended compression, which eliminates the need for indexes and aggregation and operates on compressed data and can thereby eliminate the CPU time that would be required to decompress the data. Advanced encoding maximizes compression while preserving the order of encoding so compressed data can be quickly analyzed without decompressing it. It is an impressive technology as no changes are required to your existing SQL statements.

IBM reports that some clients using DB2 10.5 for LUW with BLU Acceleration have achieved compression rates 10 times greater than uncompressed tables.

Of course, BLU Acceleration is much more than compression (it combines in-memory, columnar and compression technologies), but for the purposes of today’s blog entry we won’t delve deeper into the technology. If you are interested in a little bit more on BLU, read the high-level overview in my coverage of this year’s IDUG DB2 Technical Conference.

So compression is becoming cool… who’d have thought that back in the 1980s when compression was something we only did when we absolutely had to?

Friday, July 26, 2013

Friday, July 19, 2013

DB2 Locking, Part 17: In Conclusion

Today's blog post concludes our multi-part series on DB2 for z/OS locking and concurrency. We have touched on a great many aspects of locking in this series. Such an in-depth, technical topic as DB2 locking can be difficult to master, but doing so can deliver a clearer understanding of how DB2 operates and how your programming and design decisions impact not only your application, but also the entire DB2 subsystem.

A Couple Additional Locking Ideas and Thoughts

Before concluding this series, here are a few more guidelines and thoughts on DB2 locking and concurrency:

It is a good idea to use clustering to encourage DB2 to maintain data that is accessed together on the same page. If you use page locking, fewer locks are required to access multiple rows if the rows are clustered on the same page or pages. You can also use larger page sizes to control the amount of data locked for page locking.
Consider using the free space parameters to influence locking. If you increase free space, fewer rows are stored on a single page. Therefore, fewer rows are locked by a single page lock. This approach can decrease contention. However, it consumes additional disk storage, and it can also decrease the performance of table space scans (because more pages with fewer rows must be read). Additionally, keep in mind that as data is added to the table the free space will decrease (because the new data is using it). As such, locking issues may become more prevalent.

You can also decrease the number of rows per page using the MAXROWS option of the CREATE TABLESPACE statement. The fewer rows per page, the less intrusive page locking will be because fewer rows will be impacted by a page lock. This approach is probably better than the free space approach (described in the previous bullet) because new data will not change the number of rows per page.

Design your application programs with locking considerations in mind. This is the Number One thing to remember to increase concurrency and minimize the impact of locking on DB2 application and system performance. You can minimize the effect of locking through proper application program design. This means:

Limiting the number of rows that are accessed by coding predicates to filter unwanted rows

Requesting only the data (rows and columns) that you actually need to perform your business processes

Performing modifications as close to the end of the unit of work as possible

And remember to avoid bachelor programming syndrome (see Part 9).
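To make the free space versus MAXROWS guideline above a little more concrete, here is a toy Python model of how many rows a single page lock can cover. All of the names and numbers are hypothetical, and the model deliberately simplifies how DB2 places rows; it only shows why free space erodes as data is inserted while a MAXROWS-style cap does not.

```python
def rows_on_page(rows_that_fit, pctfree=0, maxrows=255, inserts_since_reorg=0):
    """Toy model of the number of rows covered by one page lock.

    rows_that_fit:       rows a completely full page could hold (hypothetical)
    pctfree:             percent of the page left free at load or REORG time
    maxrows:             hard cap on rows per page, in the spirit of MAXROWS
    inserts_since_reorg: rows inserted into this page afterwards
    """
    initial = rows_that_fit * (100 - pctfree) // 100
    # Later inserts consume the free space until the page is full.
    grown = min(rows_that_fit, initial + inserts_since_reorg)
    # A row cap applies no matter how much space is left.
    return min(grown, maxrows)

# Free space approach: starts at 20 rows per page, drifts up as rows arrive.
print(rows_on_page(40, pctfree=50))
print(rows_on_page(40, pctfree=50, inserts_since_reorg=15))

# Row cap approach: stays at 20 rows per page regardless of inserts.
print(rows_on_page(40, maxrows=20))
print(rows_on_page(40, maxrows=20, inserts_since_reorg=15))
```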

Summary

Of course, there are probably many more hints, tips, and guidelines for developing DB2 databases and applications with concurrency in mind, but I think a 17-part series is sufficient for my blog. If you want more details on concurrency (or any other aspect of DB2 for z/OS), might I recommend the latest edition of my book -- DB2 Developer's Guide, 6th edition.

And be sure to come back and review this series on locking if you get confused as you work to maximize the concurrency of your DB2 queries, transactions, and programs.

Finally, as a service to my readers, this post includes a directory/index to the 16 separate posts that make up this series.

Index of Blog Posts on DB2 Locking

DB2 Locking, Part 1: An Overview

DB2 Locking, Part 2: Table Space and Table Locks

DB2 Locking, Part 3: Locks Versus Latches

DB2 Locking, Part 4: Page and Row Locks

DB2 Locking, Part 5: Lock Suspensions, Timeouts, and Deadlocks

DB2 Locking, Part 6: Claims, Drains, and Partition Independence

DB2 Locking, Part 7: Lock Avoidance, Related Issues, and Stuff

DB2 Locking, Part 8: LOBs and Locking

DB2 Locking, Part 9: Avoid Bachelor Programming Syndrome

DB2 Locking, Part 10: Know Your ISOLATION Levels

DB2 Locking, Part 11: Data Sharing Global Lock Management

DB2 Locking, Part 12: Lock Promotion and Escalation

DB2 Locking, Part 13: Optimistic Locking

DB2 Locking, Part 14: Using the LOCK TABLE Statement

DB2 Locking, Part 15: Tackling Timeout Troubles

DB2 Locking, Part 16: Skipping Uncommitted Inserts

Monday, July 15, 2013

DB2 Locking, Part 16: Skipping Uncommitted Inserts

DB2 10 for z/OS introduces a new technique for concurrency. As we have discussed previously in this series on DB2 locking, one of the most troublesome problems for DB2 performance analysts is reducing timeouts and lock suspensions.

The CONCURRENTACCESSRESOLUTION parameter can be used to specify a concurrent access resolution option to use for statements in a package when binding your application program.

There are two options. The first, USECURRENTLYCOMMITTED, indicates that DB2 can use the currently committed version of the data for applicable scans when data is in the process of being updated or deleted. Rows that are in the process of being inserted can be skipped. This clause applies only when the isolation level in effect is Cursor Stability (CS) or Read Stability (RS), in which case uncommitted inserts are skipped. It is ignored for other isolation levels.

The second option is WAITFOROUTCOME, which indicates that applicable scans must wait for a COMMIT or ROLLBACK operation to complete when data is in the process of being updated or deleted. Rows that are in the process of being inserted are not skipped.

The default behavior is WAITFOROUTCOME. If you instead choose to specify USECURRENTLYCOMMITTED, DB2 will ignore rows that are in the process of being inserted and use only currently committed rows. This might be desirable for highly concurrent web transactions or to mimic the application concurrency settings or behavior of another DBMS.

Another option at your disposal is to specify that uncommitted inserts are to be skipped at the subsystem level (using option 19 on panel DSNTIPB during DB2 installation or by assembling a new DSNZPARM). The SKIPUNCI subsystem parameter specifies whether statements ignore a row that was inserted by another transaction if the row has not yet been detected as committed. A newly inserted row can be detected as committed only after the lock held on the row has been released.

There are two options for this parameter:

Specifying YES for SKIPUNCI will cause DB2 to behave as though the newly inserted row has not yet arrived and the row is skipped, until the lock held on a newly inserted row is released.
SKIPUNCI(NO) indicates that DB2 will wait for the inserted row to be committed or rolled back. It then processes the row if the insert commits, or it moves on to find another row if the insert is rolled back. If a transaction performs one or more inserts, and then spawns a second transaction, specify NO for SKIP UNCOMM INSERTS if the first transaction needs the second transaction to wait for the outcome of the inserts. This is the default value.

By using either of these two methods to skip uncommitted inserts you can improve concurrent access to data as you tune your application’s locking and concurrency requirements.
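The difference between skipping and waiting can be sketched with a toy Python model. This is not how DB2 is implemented, and every name below is hypothetical; in particular, a real reader is suspended (and may eventually time out) rather than failing immediately, which the model represents with an exception.

```python
from dataclasses import dataclass

@dataclass
class Row:
    key: int
    state: str  # "committed" or "uncommitted_insert"

class WouldWait(Exception):
    """Signals that the scan would have to wait on a lock in this toy model."""

def scan(rows, skip_uncommitted_inserts):
    """Return the keys a reader sees under each setting."""
    result = []
    for row in rows:
        if row.state == "uncommitted_insert":
            if skip_uncommitted_inserts:
                # Behave as though the row has not arrived yet.
                continue
            # Waiting behavior: the reader must wait for COMMIT or ROLLBACK.
            raise WouldWait(f"row {row.key} is held by an in-flight insert")
        result.append(row.key)
    return result

rows = [Row(1, "committed"), Row(2, "uncommitted_insert"), Row(3, "committed")]

print(scan(rows, skip_uncommitted_inserts=True))

try:
    scan(rows, skip_uncommitted_inserts=False)
except WouldWait as exc:
    print("reader waits:", exc)
```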

Tuesday, July 09, 2013

DB2 Locking, Part 15: Tackling Timeout Troubles

Many shops battle with locking issues, and frequently the cause of performance problems can be traced back to locking, more specifically to lock timeouts. When you experience a timeout, it means that another process holds a lock on the data that you are trying to modify.

You may experience timeout troubles as the dreaded -911 SQLCODE. Timeouts are different from most performance issues because the job or user receiving the -911 SQLCODE is usually not the source of the problem. Instead, the jobs causing the problem typically continue to run just fine, bringing about problems for everybody else!

The primary factors to examine when looking to resolve timeout issues are:

The lock size parameter that was specified at table space creation time.
The duration of the locks established by the BIND strategies. Lock duration is determined by a combination of the ACQUIRE and RELEASE strategies for the table spaces and the ISOLATION parameters for the page locks.
The use of LOCK TABLE statements in any programs (See DB2 Locking Part 14).
The scope of the commit in the application code.
The manner in which the table space was started (RW or RO).

Most of the time, resource contention and timeouts are due to long-running programs that are not issuing COMMITs…or perhaps are not committing their work frequently enough. Issuing a COMMIT ends a unit of recovery and hardens all database modifications made during that unit of recovery.

So be sure to verify that all of your batch processes -- especially any that run concurrently with other workload (but really all batch processes) -- have a COMMIT strategy. This means that your programs should issue a COMMIT after processing "a set number of" inserts, updates, and deletes. A COMMIT will tell DB2 to make the changes permanent and release locks. A good rule of thumb is to strive for having no lock held for more than five seconds.

A reasonable approach is to set a counter that is incremented after every modification. Check the counter after each modification and, when it exceeds a predefined threshold -- say 25 or 50 or 100 modifications -- issue a COMMIT and reset it. You should make the threshold an input parameter so that you can change it as the workload in your system changes. For example, make it 25 when concurrent activity is high, but ramp it up to 100 or higher when it is low. Failure to issue COMMITs will result in timeouts, as well as possibly deadlocks and lock escalation.
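The counter approach might look something like the following Python sketch. The FakeConnection class is a hypothetical stand-in so that the example runs on its own; a real program would use its own database interface and its own modification logic, and this sketch commits as soon as the counter reaches the threshold.

```python
class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self):
        self.commits = 0
        self.applied = []

    def execute(self, change):
        # A real program would run an INSERT, UPDATE, or DELETE here.
        self.applied.append(change)

    def commit(self):
        self.commits += 1


def apply_changes(conn, changes, commit_threshold=50):
    """Apply changes, committing after every commit_threshold modifications.

    commit_threshold is meant to be supplied as an input parameter so it
    can be lowered when concurrent activity is high and raised when it is low.
    """
    pending = 0
    for change in changes:
        conn.execute(change)
        pending += 1
        if pending >= commit_threshold:
            conn.commit()
            pending = 0  # reset the counter after each COMMIT
    if pending:
        conn.commit()  # commit whatever is left at the end of the run
    return conn.commits


conn = FakeConnection()
print(apply_changes(conn, range(120), commit_threshold=50))
```

Here 120 modifications with a threshold of 50 produce commits after the 50th and 100th modifications, plus a final commit for the remaining 20.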

For a more in-depth discussion on the importance of issuing COMMITs in your application programs review DB2 Locking Part 9.

Another practical approach for reducing lock duration is to make data modifications as close to the end of the unit of work as possible. Look at all of your programs, both batch and online, and try to place the data modification statements as close to the COMMIT as you can. By saving the data modification until right before you issue a COMMIT, you reduce the overall average lock duration. This should reduce contention and, therefore, the number of timeouts.

Some additional suggestions to maximize concurrency and reduce timeouts include:

Use WHERE CURRENT OF CURSOR to perform UPDATEs and DELETEs in your programs.
Ensure that you have created your table spaces with the appropriate LOCKSIZE (usually PAGE, but sometimes perhaps ROW).
If you have tables that are static, consider starting their table space as RO instead of RW. Doing so can enable DB2 to utilize table locking instead of page or row locking for those tables.
Limit the scope of SQL statements so that a statement locks 150 or fewer pages on a table for each unit of work that is performed.

Finally, if you want to investigate timeout details, be sure to examine the statistics trace class 3 and IFCID 0196 for timeouts (IFCID 0172 is for deadlocks).