Tuesday, July 19, 2011

Updating DB2 Developer's Guide (6th edition)

Just a brief blog post today to inform regular readers who might be concerned that I have not been blogging as frequently as in the past.

I am currently working on updating my book, DB2 Developer's Guide, for DB2 versions 9 and 10. This will be the sixth edition of the book and if all goes as planned, it should be available on IBM Press before the end of the year (2011).

Be sure to keep an eye on this blog and my web site (http://www.craigsmullins.com) for further details about the next edition of the book.

Sunday, June 19, 2011

The PIECESIZE Clause

One of the more troubling aspects of DB2 database design and creation is the non-partitioning index (NPI). Creating NPIs on tables in a partitioned table space can pose management and performance issues. Partitioned table spaces tend to be large and by their very design will span multiple underlying data sets. Any partitioning indexes will also span multiple data sets. But what happens when you need to define non-partitioning indexes on a table in a partitioned table space?

The PIECESIZE clause of the CREATE INDEX statement can be used during index creation to break an NPI into several data sets (or "pieces"). More accurately, the PIECESIZE clause specifies the largest data set size for a non-partitioned index. PIECESIZE can be specified in kilobytes, megabytes, or gigabytes. For example, the following statement will limit the size of individual data sets for the XACT2 index to 256 megabytes:

CREATE UNIQUE INDEX DSN8910.XACT2
ON DSN8910.ACT (ACTKWD ASC)
USING STOGROUP DSN8G910
PRIQTY 65536K
SECQTY 8192K
ERASE NO
BUFFERPOOL BP0
CLOSE NO
PIECESIZE 256M;

Basically, PIECESIZE is used to enable NPIs to be created on very large partitioned table spaces. It breaks apart the NPI into separate pieces that can be somewhat managed individually. Without PIECESIZE, NPIs would be quite difficult to manage and administer. Keep in mind, though, that PIECESIZE does not magically partition an NPI based on the partitioning scheme of the table space. This is a misperception held by some. So, if you have a partitioned table space with 4 partitions and then create an NPI with 4 pieces, the data in the NPI pieces will not match up with the data in the 4 partitions.

When using PIECESIZE, more data sets will be created and therefore you can obtain greater control over data set placement. Placing the pieces on separate disk devices can help to reduce I/O contention for SQL operations that access NPIs during read or update processing. The elapsed time improvement may be even greater when multiple tasks are accessing the NPI.

Separating the NPI into pieces allows for better performance of INSERT, UPDATE and DELETE processes by eliminating bottlenecks that can be caused by using only one data set for the index. The use of pieces also improves concurrency and performance of heavy INSERT, UPDATE, and DELETE processing against any size partitioned table space with NPIs.

Keep in mind that PIECESIZE is only a specification of the maximum amount of data that a piece (that is, a data set) can hold and not the actual allocation of storage, so PIECESIZE has no effect on primary and secondary space allocation. Each data set will max out at the PIECESIZE value, so specifying PRIQTY greater than PIECESIZE will waste space. But also make sure that you avoid setting the PIECESIZE too small. A new data set will be allocated each time the PIECESIZE threshold is reached. DB2 will increment the A001 component of the data set name each time. Ideally, the value of your primary quantity and secondary quantities should be evenly divisible into PIECESIZE to avoid wasting space.

To choose a PIECESIZE value, divide the overall size of the entire NPI by the number of data sets that you wish to have. For example, for an NPI that is 8 megabytes, you can arrive at 4 data sets for the NPI by specifying PIECESIZE 2M. Of course, if your NPI grows over 8 megabytes in total you will get additional data sets. Keep in mind that 32 pieces is the limit if the underlying table space is not defined with DSSIZE 4G or greater. The limit is 254 pieces if the table space is defined as DSSIZE 4G or greater.

Wednesday, June 01, 2011

Mainframe Specialty Processors

Anyone who uses an IBM z Series mainframe has probably heard about zIIPs and zAAPs and other specialty processors. But maybe you haven't yet done any real investigation into what they are, what they do, and why they exist. So, with that in mind, let's take a brief journey into the world of specialty processors in today's blog entry!

Over the course of the past decade or so, IBM has introduced several different types of specialty processors. The basic idea of a specialty processor, is that it sits alongside the main CPUs and specific types of "special" workload is shuttled to the specialty processor to be run there, instead of on the primary CPU complex. Why is this useful or interesting to mainframe customers? Well, the specialty processor workload is not subject to IBM (as well as many ISVs) licensing charges... and, as any mainframer knows, the cost of software rises as capacity on the mainframe rises. But if capacity can be redirected to a specialty processor, then software license charges do not accrue -- at least for that workload.

And for VWLC customers, shuttling workload to a specialty processor can reduce the rolling four hour average and thereby decrease your monthly IBM software license bill.

Another benefit of the specialty processors is that can be cheaper to acquire than standard CPUs.

But specialty processors can only run certain types of workloads. There are four types of specialty processors:

  • ICF: Internal Coupling Facility - used for redirecting coupling facility cycles in a data sharing environment.
  • IFL: Integrated Facility for Linux - used for processing zLinux workload on an IBM mainframe.
  • zAAP: Application Assist Processor - used for Java workload
  • zIIP: Integrated Information Processor - used for processing certain, distributed database workloads.

When you activate any of these processors, some percentage of that type of workload can be redirected off of the main CP onto the specialty processor... but not 100% of the workload. It can be frustrating, particularly with the zIIP, to determine exactly what is redirected exactly when and exactly how much of it. In general, distributed DB2 for z/OS workload and XML processing can be redirected to zIIP processors.

Additionally, to run on a zIIP, the workload must run under an enclave SRB. So, code written to execute under a TCB will usually be unable to execute under an SRB without major changes. If you didn't understand that sentence, don't worry about it too much. Basically, IBM has enabled certain types of (mostly DB2) workload to run on zIIPs, and ISVs have enabled some of their code to run on zIIPs, too. If you are interested, more details about zIIPs can be found at this link.

Another interesting tidbit is that zAAP-eligible workloads can be run on zIIPs with IBM's zAAP on zIIP support. This can be a boon to some shops that only have zIIPs and no zAAPs. Now, with zAAP on zIIP support, you can use zIIP processors for both Java and distributed DB2 workloads. The combined eligible TCB and enclave SRB workloads might make the acquisition of a zIIP cost effective.This capability also provides more value for customers having only zIIP processors by making Java- and XML-based workloads eligible to run on existing zIIPs.

To take advantage of zAAP on zIIP, you need to be running z/OS V11.1 (or z/OS V1.9 or V1.10 with the PTFs for APAR OA27495) on a z9, z10, or z196 server.

Keep in mind, that the terms for specialty processors do not change. You can only have 1 zAAP and 1 zIIP per each general purpose processor. So, even if you have zAAP on zIIP configured, the chip is still a zIIP and you cannot have any more than 1 per general purpose processor.

The Bottom Line

The bottom line is that even though it can take some studying and research to understand their benefit and functionality, specialty processors can help to reduce the cost of mainframe computing... and that is a good thing!

What is an Enclave?

If you are a DB2 professional dealing with distributed workload… or if you are enabling zIIP specialty processors… chances are you’ve heard the term “enclave” or “enclave SRB.” But just what is an enclave?

An enclave is a construct that represents a transaction or unit of work. Enclaves are a method of managing mainframe transactions for non-traditional workloads. You can think of an enclave as an anchor point for resource accumulation regardless of where the transaction is executing.

With traditional workloads it is relatively easy to map the resources consumed to the actual transaction doing the consumption. But with non-traditional workloads – web transactions, distributed processing, etc. – it is more difficult because the transaction can span platforms. Enclaves are used to overcome this difficulty by correlating closely to the end user’s view of the transaction.

So even though a non-traditional transaction can comprise multiple “pieces” spanning many server address spaces, and can share those address spaces with other transactions, the enclave gives you more effective control over the non-traditional workload.

If you are interested in more details on enclaves and how they are managed, read through Enclaves – Managing Business Transactions from IBM’s RMF Newsletter.

Wednesday, May 25, 2011

A Quick SQL Trick: Find The Number of Commas

Today's blog post is a short one. I was recently asked how to return a count of specific characters in a text string column. For example, given a text string, return a count of the number of commas in the string.

This can be done using the LENGTH and REPLACE functions as follows:

SELECT LENGTH(TEXT_COLUMN) - LENGTH(REPLACE(TEXT_COLUMN, ',' ''))

The first LENGTH function simply returns the length of the text string. The second iteration of the LENGTH function in the expression returns the length of the text string after replacing the target character (in this case a comma) with a blank.

So, let's use a string literal to show a concrete example:

SELECT LENGTH('A,B,C,D') - LENGTH(REPLACE('A,B,C,D', ',', ''))

This will translate into 7 - 4... or 3. And there are three commas in the string.

When confronted with a problem like this it is usually a good idea to review the list of built-in SQL functions to see if you can accomplish your quest using SQL alone.