Sunday, June 19, 2011


One of the more troubling aspects of DB2 database design and creation is the non-partitioning index (NPI). Creating NPIs on tables in a partitioned table space can pose management and performance issues. Partitioned table spaces tend to be large and by their very design will span multiple underlying data sets. Any partitioning indexes will also span multiple data sets. But what happens when you need to define non-partitioning indexes on a table in a partitioned table space?

The PIECESIZE clause of the CREATE INDEX statement can be used during index creation to break an NPI into several data sets (or "pieces"). More accurately, the PIECESIZE clause specifies the largest data set size for a non-partitioned index. PIECESIZE can be specified in kilobytes, megabytes, or gigabytes. For example, the following statement will limit the size of individual data sets for the XACT2 index to 256 megabytes:
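
A minimal sketch of such a statement, assuming a hypothetical table ACT_TABLE with a key column ACT_KEY in a partitioned table space (only the index name XACT2 and the 256 megabyte piece size come from the text above):

```sql
-- Hypothetical table and column names; XACT2 and 256M are from the text.
-- The index is not defined as PARTITIONED, so it is an NPI.
CREATE INDEX XACT2
    ON ACT_TABLE (ACT_KEY ASC)
    PIECESIZE 256M;
```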


Basically, PIECESIZE is used to enable NPIs to be created on very large partitioned table spaces. It breaks the NPI apart into separate pieces that can, to some degree, be managed individually. Without PIECESIZE, NPIs would be quite difficult to manage and administer. Keep in mind, though, that PIECESIZE does not magically partition an NPI based on the partitioning scheme of the table space. This is a misconception held by some. So, if you have a partitioned table space with 4 partitions and then create an NPI with 4 pieces, the data in the NPI pieces will not match up with the data in the 4 partitions.

When using PIECESIZE, more data sets will be created and therefore you can obtain greater control over data set placement. Placing the pieces on separate disk devices can help to reduce I/O contention for SQL operations that access NPIs during read or update processing. The elapsed time improvement may be even greater when multiple tasks are accessing the NPI.

Separating the NPI into pieces allows for better performance of INSERT, UPDATE and DELETE processes by eliminating bottlenecks that can be caused by using only one data set for the index. The use of pieces also improves concurrency and performance of heavy INSERT, UPDATE, and DELETE processing against any size partitioned table space with NPIs.

Keep in mind that PIECESIZE is only a specification of the maximum amount of data that a piece (that is, a data set) can hold and not the actual allocation of storage, so PIECESIZE has no effect on primary and secondary space allocation. Each data set will max out at the PIECESIZE value, so specifying PRIQTY greater than PIECESIZE will waste space. But also make sure that you avoid setting the PIECESIZE too small. A new data set will be allocated each time the PIECESIZE threshold is reached. DB2 will increment the A001 component of the data set name each time. Ideally, the value of your primary quantity and secondary quantities should be evenly divisible into PIECESIZE to avoid wasting space.

To choose a PIECESIZE value, divide the overall size of the entire NPI by the number of data sets that you wish to have. For example, for an NPI that is 8 megabytes, you can arrive at 4 data sets for the NPI by specifying PIECESIZE 2M. Of course, if your NPI grows over 8 megabytes in total you will get additional data sets. Keep in mind that 32 pieces is the limit if the underlying table space is not defined with DSSIZE 4G or greater. The limit is 254 pieces if the table space is defined as DSSIZE 4G or greater.

Wednesday, June 01, 2011

Mainframe Specialty Processors

Anyone who uses an IBM z Series mainframe has probably heard about zIIPs and zAAPs and other specialty processors. But maybe you haven't yet done any real investigation into what they are, what they do, and why they exist. So, with that in mind, let's take a brief journey into the world of specialty processors in today's blog entry!

Over the course of the past decade or so, IBM has introduced several different types of specialty processors. The basic idea of a specialty processor is that it sits alongside the main CPUs, and specific types of "special" workload are shuttled to the specialty processor to be run there instead of on the primary CPU complex. Why is this useful or interesting to mainframe customers? Well, the specialty processor workload is not subject to software licensing charges from IBM (or from many ISVs)... and, as any mainframer knows, the cost of software rises as capacity on the mainframe rises. But if workload can be redirected to a specialty processor, then software license charges do not accrue -- at least for that workload.

And for VWLC customers, shuttling workload to a specialty processor can reduce the rolling four hour average and thereby decrease your monthly IBM software license bill.

Another benefit of the specialty processors is that they can be cheaper to acquire than standard CPUs.

But specialty processors can only run certain types of workloads. There are four types of specialty processors:

  • ICF: Internal Coupling Facility - used for redirecting coupling facility cycles in a data sharing environment.
  • IFL: Integrated Facility for Linux - used for processing zLinux workload on an IBM mainframe.
  • zAAP: Application Assist Processor - used for Java workload.
  • zIIP: Integrated Information Processor - used for processing certain distributed database workloads.

When you activate any of these processors, some percentage of that type of workload can be redirected off of the main CP onto the specialty processor... but not 100% of the workload. It can be frustrating, particularly with the zIIP, to determine exactly what is redirected, when it is redirected, and how much of it. In general, distributed DB2 for z/OS workload and XML processing can be redirected to zIIP processors.

Additionally, to run on a zIIP, the workload must run under an enclave SRB. So, code written to execute under a TCB will usually be unable to execute under an SRB without major changes. If you didn't understand that sentence, don't worry about it too much. Basically, IBM has enabled certain types of (mostly DB2) workload to run on zIIPs, and ISVs have enabled some of their code to run on zIIPs, too. If you are interested, more details about zIIPs can be found at this link.

Another interesting tidbit is that zAAP-eligible workloads can be run on zIIPs with IBM's zAAP on zIIP support. This can be a boon to shops that have only zIIPs and no zAAPs. Now, with zAAP on zIIP support, you can use zIIP processors for both Java and distributed DB2 workloads. The combined eligible TCB and enclave SRB workloads might make the acquisition of a zIIP cost effective. This capability also provides more value for customers having only zIIP processors by making Java- and XML-based workloads eligible to run on existing zIIPs.

To take advantage of zAAP on zIIP, you need to be running z/OS V1.11 (or z/OS V1.9 or V1.10 with the PTFs for APAR OA27495) on a z9, z10, or z196 server.

Keep in mind that the terms for specialty processors do not change. You can have only 1 zAAP and 1 zIIP per general purpose processor. So, even if you have zAAP on zIIP configured, the chip is still a zIIP and you cannot have more than 1 per general purpose processor.

The Bottom Line

The bottom line is that even though it can take some studying and research to understand their benefit and functionality, specialty processors can help to reduce the cost of mainframe computing... and that is a good thing!

What is an Enclave?

If you are a DB2 professional dealing with distributed workload… or if you are enabling zIIP specialty processors… chances are you’ve heard the term “enclave” or “enclave SRB.” But just what is an enclave?

An enclave is a construct that represents a transaction or unit of work. Enclaves are a method of managing mainframe transactions for non-traditional workloads. You can think of an enclave as an anchor point for resource accumulation regardless of where the transaction is executing.

With traditional workloads it is relatively easy to map the resources consumed to the actual transaction doing the consumption. But with non-traditional workloads – web transactions, distributed processing, etc. – it is more difficult because the transaction can span platforms. Enclaves are used to overcome this difficulty by correlating closely to the end user’s view of the transaction.

So even though a non-traditional transaction can comprise multiple “pieces” spanning many server address spaces, and can share those address spaces with other transactions, the enclave gives you more effective control over the non-traditional workload.

If you are interested in more details on enclaves and how they are managed, read through Enclaves – Managing Business Transactions from IBM’s RMF Newsletter.

Wednesday, May 25, 2011

A Quick SQL Trick: Find The Number of Commas

Today's blog post is a short one. I was recently asked how to return a count of specific characters in a text string column. For example, given a text string, return a count of the number of commas in the string.

This can be done using the LENGTH and REPLACE functions as follows:
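
A sketch of the expression, assuming a hypothetical table TABLE_NAME with a text column TEXT_COLUMN (both names are placeholders):

```sql
-- Count the commas in each row's TEXT_COLUMN.
-- TABLE_NAME and TEXT_COLUMN are hypothetical names used for illustration.
SELECT LENGTH(TEXT_COLUMN)
       - LENGTH(REPLACE(TEXT_COLUMN, ',', '')) AS COMMA_COUNT
  FROM TABLE_NAME;
```

The third argument to REPLACE is an empty string, so every comma is removed rather than swapped for another character.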


The first LENGTH function simply returns the length of the text string. The second LENGTH function in the expression returns the length of the text string after replacing the target character (in this case a comma) with an empty string, which removes it. The difference between the two lengths is the number of occurrences of the target character.

So, let's use a string literal to show a concrete example:
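
One literal that fits is 'A,B,C,D', which is seven characters long and contains three commas. The literal is an assumption chosen for illustration; SYSIBM.SYSDUMMY1 is DB2's one-row dummy table:

```sql
-- 'A,B,C,D' is an illustrative literal: 7 characters, 3 of them commas.
SELECT LENGTH('A,B,C,D')
       - LENGTH(REPLACE('A,B,C,D', ',', '')) AS COMMA_COUNT
  FROM SYSIBM.SYSDUMMY1;
```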


This will translate into 7 - 4... or 3. And there are three commas in the string.

When confronted with a problem like this it is usually a good idea to review the list of built-in SQL functions to see if you can accomplish your quest using SQL alone.

Friday, May 13, 2011

DB2 -- What's in a Name?

Versions of DB2 exist for a large array of platforms, of which the mainframe (z/OS) is only one. Of course, it is my favorite one since I’ve been working on mainframe technology now for decades and have worked with DB2 since Version 1.

It used to be easy: DB2 meant IBM’s mainframe SQL database management system based on the relational model. But you can’t just say the term “DB2” any more and expect people to understand what you mean.

Today there are variations of DB2 that run on the iSeries (AS/400), on Linux, Unix, and Windows (LUW) platforms, and even one that runs on PDAs and smart phones called DB2 Everyplace. Not to mention the mainframe variations that run on z/OS, VM, and VSE.

These products are all collectively referred to by IBM as the DB2 Family. Individually, each DBMS is referred to as DB2, or sometimes DB2 Universal Database Server. There was a period of time when DB2 for LUW was called UDB and DB2 for z/OS was just called DB2. Then IBM tried to rebrand both as DB2 UDB. But that seems to have gone away several versions ago now.

The proper way to refer to any individual offering in the DB2 family is DB2 for (operating system) (for example, DB2 for z/OS or DB2 for Windows).

Different Code Bases

There are four distinct code bases for the products under the DB2 brand. The mainframe (z/OS) has its own code base, as do the iSeries and VSE/VM. The fourth code base is for Linux, Unix, and Windows (LUW) platforms, and the other DB2 offerings (e.g. DB2 Everyplace) originate from this code base.

Having a separate code base means that each of these DB2 “products” was developed independently from the others. So, for example, the process used by DB2 for z/OS to optimize SQL differs from the process used by DB2 for Linux. Usually, though, the result is similar—an efficient SQL statement.

But keep in mind that there will be some differences between the DB2s.

Some of the Differences

It is obvious that the different DB2 products are not “plug and play” commodities simply because they all share the name DB2. There are some big differences among these products in their current releases. The biggest differences are relatively easy to detect and include the following:

  • Differences imposed due to operating system constraints (OS/400 versus z/OS versus AIX)
  • Back-level compatibility issues
  • Workstation orientation differences such as GUI interfaces and drag-and-drop menus
  • Subsystem-centric implementation (z/OS) versus database-centric implementation (workstation)

Most of these differences are minor and easy to handle. Indeed, IBM has slowly but surely been making these disparate implementations of DB2 more and more alike with each new release and version. The interface (or API) by which most people access any member of the DB2 Family is SQL, and there is broad compatibility among the SQL implementations of the members of the DB2 Family (though not 100 percent, of course).

A misconception “out there” in DB2-land is that the LUW platform drives new features, but a review of the changes that have been introduced to DB2 over the past several versions and releases does not bear that out. Some features are introduced on the mainframe first; others on the distributed platforms first.

Of the basic differences mentioned earlier, the only one that might not be obvious is the focus of the DBMS implementation. DB2 for LUW is database-centric. This implies that each new database carries its own system catalog with it. Additionally, it is not possible to simply access tables across different databases; distributed access is required.

On z/OS, DB2 is subsystem-centric. A single system catalog spans databases. Each subsystem has a unique identification, and you can create multiple databases within it. Distributed requests are not required to access databases within the same subsystem (or, indeed, across multiple subsystems in a data-sharing environment).

Another concept that is different at the workstation level is that of a directory. The DB2 for z/OS Directory houses DBMS system-related information regarding DBD structure, skeleton plan and skeleton package tables, RBA log ranges, and utility control data. The information cannot be updated by the user but is managed and controlled by DB2.

At the workstation level, a directory is another matter altogether. For example, the directory structure used by DB2 for LUW controls the overall environment.

  • The System Database Directory identifies the databases that can be accessed from the workstation and contains an entry for each local and remote one. Each database entry contains the database name, alias, entry type, and location.
  • One Volume Database Directory is allocated per disk drive that contains a workstation database. Each entry identifies the location of a specific database on the drive.
  • The Workstation Directory is used to make a connection to a remote database server. It is used in conjunction with the Database Connection Services Directory to make a connection to a remote host server.
  • The Database Connection Services Directory is used by DB2 Connect to make a connection to a remote host server.

Not only is it possible for the user to update these directories, it is required. The workstation directories define the environment of DB2 for LUW. Without the proper information recorded in these directories, DB2 might not function in the desired manner. The information in these directories is somewhat analogous to DB2 for z/OS DSNZPARMs and the SYSDDF system catalog tables.

Database Structures

Not all the objects available to DB2 for z/OS users are supported at the workstation level. For example, hardware-specific DB2 objects such as table spaces and storage groups are not available for DB2 on other platforms, at least not in the same way that mainframers are used to dealing with them. Partitioning and segmenting as it is done on z/OS is not done on other platforms.

However, DB2 for LUW does provide a feature known as a segmented table. But this is not the same concept as a DB2 for z/OS segmented table space. DB2 for LUW segmented tables are used to span volumes, enabling DB2 to get around file size limitations.

The file structure used for databases differs from platform to platform. For example, DB2 for z/OS uses VSAM Linear Data Sets (LDS) or Entry Sequenced Data Sets (ESDS). A database deployed on DB2 for LUW uses two files for table data: one for normal data and a second to store long fields. These workstation files are flat files, not VSAM files.

Although tables are basically the same for all of the DB2 environments, not all of the DDL options are provided in all of the environments.

Optimizer Differences

One of the most significant benefits of relational databases is that they provide built-in optimization. The DB2 for z/OS optimizer is well-known to mainframe DB2 users, but how similar are the other DB2 optimizers?

DB2 for LUW uses the latest and greatest optimization technology from IBM -- the Starburst optimizer (which arose from IBM’s Almaden research lab). Starburst is a database optimization research project that has been covered quite extensively in the academic press.

As one example of the difference, consider that the DB2 for LUW optimizer has varying levels of optimization that can be selected by the user. This concept is not implemented in DB2 for z/OS.
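
In DB2 for LUW, the level can be set for a session through the CURRENT QUERY OPTIMIZATION special register. A minimal sketch, assuming a hypothetical SALES table; the class values are examples only, and the right class for a given workload should be checked against the documentation for your release:

```sql
-- DB2 for LUW only: use a lower optimization class for a simple query.
SET CURRENT QUERY OPTIMIZATION = 1;

-- Hypothetical query; SALES, REGION, and AMOUNT are placeholder names.
SELECT REGION, SUM(AMOUNT) AS TOTAL_AMOUNT
  FROM SALES
 GROUP BY REGION;

-- Return to class 5, the usual default.
SET CURRENT QUERY OPTIMIZATION = 5;
```

Lower classes spend less time on optimization, which can suit simple, short-running statements; higher classes consider more access path alternatives for complex queries.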

Although some Starburst technology will find its way to DB2 for z/OS, the mainframe DB2 optimizer will not be completely replaced by Starburst technology. Doing so would not be wise because the DB2 for z/OS optimizer has been finely tuned for its environment over the course of almost three decades.

Another interesting tidbit is that DB2 for iSeries provides an access method for programmers in which they can bypass the relational engine. This is not encouraged, but it is available.

Other Differences

Other differences exist between the different implementations of DB2. Some of these are caused by the different release cycles IBM has created for the differing platforms. The bottom line is that you need to be aware that there are differences between the DB2s on different platforms. Whenever you use a specific implementation of DB2, you need to be aware of the features it supports that other DB2 platforms do not, as well as the features it does not support that other DB2 platforms do support.

Packaging and Naming Issues

The actual name of the DB2 edition can be tricky to master on non-mainframe platforms. On the mainframe you just say “I want DB2,” and that is what you get. Well, almost. You also have to decide whether you want IBM’s utilities or not, too.

But things are more difficult in the LUW world. The following packages are all available for DB2 on Linux, Unix, and Windows:

DB2 Workgroup Server Edition (WSE) is a multi-user, single-host DBMS aimed at the departmental level. It should be deployed for smaller systems with a limited number of users.

DB2 Enterprise Server Edition (ESE) is the highest-level DB2 edition, with intra-partition parallelism support (the database engine can process segments of an SQL statement in parallel) and inter-partition parallelism support (a query can be processed in parallel across all of the nodes). ESE offers Partitioning and Clustering options as add-on features. So, this is the enterprise DB2.

DB2 Advanced Enterprise Server Edition (AESE) sounds like a step up from ESE, and it is, kind of... but not really in terms of key DBMS technology. The “advanced” means that IBM integrates Optim and InfoSphere technologies into the product.

DB2 Express Edition is targeted at entry level users at a low price point. Small shops, partners, and new users can build applications on top of DB2 Express.

And DB2 Express-C is IBM’s “free” DBMS offering providing all the “core” capabilities of DB2 at no charge. So why use an open source DBMS when you can get a free version of DB2?

A handy comparison of the editions is available on IBM’s web site.


So you see, saying DB2 is not enough any more. Which DB2? They’re all great, but it can take some time to wrap your arms around all of this…