Wednesday, August 29, 2012

Managing DB2 for z/OS Application Performance

Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked what is the single most important or stressful aspect of their job, DBAs typically respond “assuring optimal performance.” Indeed, a recent Forrester Research survey indicates that “performance and troubleshooting” tops the list of most challenging DBA tasks.

But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:

the system (that is, the DBMS itself, the network, and the O/S),
the database (that is, the DDL and database schema), and
the application (that is, the SQL and program logic).

Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application. Furthermore, I will discuss this in terms of DB2 for z/OS.

So where do we begin? For DB2, a logical starting point is with BIND Parameters. There are many parameters and values that must be chosen from and specified when you bind a DB2 application program. The vast array of options at our disposal can render the whole process extremely confusing – especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. You know what I’m talking about, parameters like ACQUIRE, RELEASE, VALIDATE, and DEGREE.

I will not delve into the myriad bind options as there are many articles and books, as well as the IBM DB2 manuals that you can use to guide you along that path. Suffice it to say, that there are some standard parameters and values that should be chosen “most of the time” in certain situations. As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as: “CICS transaction”, “DB2 batch”, or “analytical query” can be presented to the developer and then, based on which of the various types of programs and environments that are available, the canned script can choose the proper bind options. Doing so can greatly diminish the problems that can be encountered when the “wrong” parameters or values are chosen at bind time.

This same process can be put in place for production binding to ensure that the appropriate parameters and values are chosen. This is especially useful when the binds are not done by a DBA, but are automated in production or done by a less-experienced change control clerk.

Of course, there should always be a method for over-riding the “standard” values for special situations, although these overrides should not be available to anyone other than a well-trained individual (DBA or otherwise).

I want to make one small exception here regarding advice on bind parameters, and that is the EXPLAIN parameter. In production, always bind your plans and packages specifying EXPLAIN YES. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what DB2 is doing and is not advisable.

Access Path Management

Bind and Rebind are important components to achieve optimal DB2 application performance. Bind/Rebind determine the access paths to the data that is accessed by your program. As such, it is vital that you develop an appropriate strategy for when and how to Rebind your programs.

There are several common approaches taken by DB2 users. By far, the best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment.

Other approaches include Rebinding only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data – but it still can pose significant administrative issues.

The final approach is from the “if it ain’t broke don’t fix it” school of thought. This approach is the worst of the several approaches discussed here. This approach penalizes every program for fear that a single program (or two) might experience a degraded access path. Yet, the possibility of degraded performance is real. That is why this approach has been adopted at some sites. The problem is being able to find which statements may be worse. The ideal situation would be to be able to review the access path changes before hand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.
Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions.

By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to DB2 (new releases/PTFs) optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

To adopt the Three R’s you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need at least 4 R’s:

Real Time Statistics (or RUNSTATS)
REORG
RUNSTATS
REBIND

But is this enough? Probably not because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another “R” – to Review the access paths generated by the REBIND. As we mentioned, the optimizer can make mistakes. And, of course, so can you. Users don't call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.

So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal DB2 application performance.

Tuning The Code Itself

Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with – and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:

Let SQL do the work instead of the application program. For example, code an SQL join instead of two cursors and a programmatic join.
Simpler is generally better, but complex SQL can be very efficient.
Retrieve only the columns required, never more.
Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
When joining tables, always provide join predicates. In other words, avoid Cartesian products.
Favor using Stage 1 and Indexable predicates.
Avoid sorting if possible by creating indexes for ORDER BY, GROUP BY, and DISTINCT operations.
Avoid black boxes – that is, avoid I/O routines that are called by programs instead of using embedded SQL.
Avoid deadlocks by updating tables in the same sequence in every program.
Issue data modification statements (INSERT, UPDATE, DELETE) as close as possible to the COMMIT statement as possible.
Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.

To tune SQL the DBA must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the PLAN_TABLEs. IBM and other vendors offer tools to simplify this process, such as IBM's Data Studio.

Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, Visual Basic or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.

The Bottom Line

Although DBAs must understand all three aspects of database performance management concentrating on the application aspects of performance will most likely provide the most bang-for-the-buck. Of course, we have only touched the tip of the DB2 application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your DB2 applications.

Good luck with DB2 for z/OS and happy performance tuning!

Wednesday, August 15, 2012

Don’t Forget About DB2 Session Variables

Session variables, set by DB2 or by the user, offer another way to provide additional information to applications. Session variables are set by DB2, and are accessible to application programs and end users. By accessing session variables, an application program can determine many aspects of its environment and the DB2 features in effect.

There are a plethora of session variables available to the DB2 programmer as outlined in the accompanying table.

Table 1. DB2 Session Variables.

Session Variable	Description
APPLICATION_ENCODING_SCHEME	The application encoding scheme specified for the subsystem.
COBOL_STRING_DELIMITER	The string delimiter. Value will be DEFAULT, “, or ‘.
DATA_SHARING_GROUP_NAME	Name of the data sharing group.
DATE_FORMAT	The date format in use. Value will be ISO, JIS, USA, EUR, or LOCAL.
DATE_LENGTH	The LOCAL DATE LENGTH install parameter. Value is 10-254, or 0 for no exit.
DECIMAL_ARITHMETIC	The DECIMAL ARITHMETIC install parameter. Value is DEC15, DEC31, 15, or 31.
DECIMAL_POINT	The DECIMAL POINT install parameter. Value is ‘.’ or ‘,’.
DEFAULT_DECFLOAT_ROUND_MODE	The DECFLOAT ROUNDING MODE install parameter.
DEFAULT_DEFAULT_SSID	The value of GROUP ATTACH field on the DSNTIPK installation panel or the SUBSYSTEM NAME field on the DSNTIPM installation panel.
DEFAULT_LANGUAGE	The LANGUAGE DEFAULT install parameter. Value is ASM, C, CPP, IBMCOB, FORTRAN, or PL/I.
DEFAULT_LOCALE_LC_CTYPE	The value of LOCALE LC_CTYPE install parameter.
DSNHDECP_NAME	The fully qualified data set name of the data set from which the DSNHDECP or a user-specified application defaults module was loaded.
DYNAMIC_RULES	The USE FOR DYNAMICRULES install parameter. Value is YES or NO.
ENCODING_SCHEME	The DEF ENCODING SCHEME install parameter. Value is EBCDIC, ASCII, or UNICODE.
MIXED_DATA	The MIXED DATA install parameter. Value is YES or NO.
NEWFUN	The INSTALL TYPE parameter. Value is INSTALL, UPDATE, MIGRATE, or ENFM; reflects the setting of the DSNHDECP variable NEWFUN.
PACKAGE_NAME	Name of the package currently in use.
PACKAGE_VERSION	Version of the current package.
PACKAGE_SCHEMA	Schema name of the current package.
PAD_NUL_TERMINATED	The PAD NUL-TERMINATED install parameter. Value is YES or NO.
PLAN_NAME	Name of the plan currently being run.
SECLABEL	The user’s security label (if any); null if not defined.
SQL_STRING_DELIMITER	The SQL STRING DELIMITER install parameter. Value is DEFAULT, “, or ‘.
SSID	DB2 subsystem identifier.
STANDARD_SQL	The STD SQL LANGUAGE install parameter. Value is YES or NO.
SYSTEM_NAME	Name of the system, as defined in field SUBSYSTEM NAME on installation panel DSNTIPM.
SYSTEM_ASCII_CCSID	A comma-delimited string of the ASCII CCSIDs in use on this system.
SYSTEM_EBCDIC_CCSID	A comma-delimited string of the EBCDIC CCSIDs in use on this system.
SYSTEM_UNICODE_CCSID	A comma-delimited string of the UNICODE CCSIDs in use on this system.
TIME_FORMAT	The TIME FORMAT install parameter. Value is ISO, JIS, USA, EUR, or LOCAL.
TIME_LENGTH	The LOCAL TIME LENGTH install parameter. Value is 8-254, or 0 for no exit.
VERSION	Version of the DB2 subsystem. This value is a string, formatted as pppvvrrm where: ppp is a product string set to the value ‘DSN’. vv is a two-digit version identifier such as ‘09’. rr is a two-digit release identifier such as ‘01’. m is a one-digit maintenance level identifier.

Each session variable must be qualified by SYSIBM. A built-in function named GETVARIABLE can retrieve session variable values. So, you could create a view based on a security label, for example:

CREAT VIEW VSECLBL AS

SELECT column-list

FROM table-name

WHERE SECLABEL_COL = GETVARIABLE(SYSIBM.SECLABEL);

The GETVARIABLE built-in function can be used in views, triggers, stored procedures, and constraints to enforce a security policy.

Users can add up to ten session variables by setting the name and value in their connection or sign-on exits. User-created session variables are qualified by SESSION. For example, the customer might have a connection or sign-on exit that examines the SQL user’s IP address, and maps the IP address to the user’s site within the company. This is recorded in a session variable, named say, USER_SITE. This session variable is then accessible using the built-in function, for example:

GETVARIABLE(SESSION.USER_SITE)

Using session variables much more information is available to application programs as they execute, and more control and security is provided, as well. Additionally, session variables can be trusted. They are set by DB2 and an application cannot modify them.

Monday, July 09, 2012

DB2 Hashing and Hash Organized Tables

Up until DB2 10, all DB2 data was retrieved using some form of indexing or scanning. With DB2 Version 10, a new access method called hashing is available. Of course, referring to hashing as “new” is somewhat disingenuous because it is an old, time-tested data processing technique. Indeed, IMS databases are founded upon hashing technology.

A hash, or hash function, is an algorithm that converts a defined set of data elements into a small number, usually a single integer that can serve as an index to an array or a storage location on disk. The values returned by a hash function are called hash values, hash sums, or simply hashes.

Figure 1 depicts basic hash functionality. Key values are processed by a hash algorithm (also known as a hash function, hash routine, or randomizer). The hash algorithm translates the key value into a storage location. When data is inserted the algorithm tells DB2 where to physically store the data; when data is accessed by the key, the algorithm tells DB2 where to find the data.

Figure 1. How hashing operates.

Hashing is used primarily to optimize random I/O, such as for looking up a code table value or accessing a single row based on the value of its primary key. A hash access can outperform indexed access because fewer I/O operations are required to retrieve the data. The hash requires 1 I/O operations (possibly 2 if a hash collision occurs). An index requires at least 2 I/O operations, one to read the index page and one to read the data page. Of course, only the smallest of indexes consists of just a single root page; most consist of a root page, one or more non-leaf pages, and a leaf page, thereby requiring 3 or more I/O operations.

The traditional notion of clustering is not pertinent for hash organized tables. The data will be organized based on the hash algorithm. Index clustering is not permitted for tables that are hash organized.

The Hash Space

A hash space is a defined storage area used for organizing table data for hash access. Sufficient disk space must be allocated to the hash space to accommodate the hashed rows that will be added to the table.

When the hash space is full, new rows will be relocated to the overflow index. As the amount of data in the overflow increases, hash access performance declines. This is so because DB2 must first go to the hashed location and then follow the pointer in the overflow index to find the data, thereby increasing the number of I/O operations required to retrieve the data. Refer to Figure 2 for an example. In this case, a new row for NEELD needs to be added, but there is no room in the hash space. So the hash space points to an overflow area where the new row can be stored.

Figure 2. Hash overflow.

For this reason, a hash organized table might consume more disk space than a traditional table to minimize overflow rows.

Hash spaces can contain only a single table and must be defined in a universal table space — either partitioned by range or partitioned by growth. The hash overflow index for a table in a partition-by-range Universal table space will be a partitioned index. The hash overflow index for a table in a partition-by-growth Universal table space will be a non-partitioned index.

Creating Hash Organized Tables

Hash access can be specified for a table using the organization-clause of the CREATE TABLE statement: ORGANIZE BY HASH UNIQUE (column-names) HASH SPACE (hash-space-value).

The ORGANIZE BY HASH specification tells DB2 to use hashing for organizing the data of this table. The hash key is specified in the list of column names. By specifying UNIQUE, the table cannot contain more than one row with the same hash key value. And the HASH SPACE parameter defines the amount of fixed hash space to be pre-allocated for the table. For tables defined in a partition-by-range UTS, this space is for each partition.

Caution

Exercise care when specifying UNIQUE if the hash key can contain nulls. Although it is generally considered bad design for keys to contain nulls, columns specified as a hash key can be NULL. If the hash key can contain nulls and UNIQUE is also specified, all nulls for a column are considered equal. To clarify, if the hash key is a single nullable column, it can contain only one null.

For example, consider the following SQL that would create the sample DEPT table as a hash-organized table:

CREATE TABLE DSN81010.DEPT

(DEPTNO CHAR(3) NOT NULL,

DEPTNAME VARCHAR(36) NOT NULL,

MGRNO CHAR(6),

ADMRDEPT CHAR(3) NOT NULL,

LOCATION CHAR(16),

PRIMARY KEY (DEPTNO)

)

IN DSN8D10A.DSN8S10D

ORGANIZE BY HASH UNIQUE (DEPTNO)

HASH SPACE 64 M;

Take care to specify a sufficient amount of storage for the hash space in the organization-clause. By doing so, the overflow index should have a small number of entries, or better yet, no entries at all. Hashing works best for tables that are static in size with limited growth (if any).

If you choose to convert an existing table to use hashing, be sure to walk through the following steps:

Analyze each table you are considering for conversion. Look for static tables with unique keys that are frequently used for single fetch queries.
Before replacing any existing indexes be sure that they are not used for range queries. Hashing is not used for range queries, so these indexes should not be dropped. Any index used solely for direct lookup can be dropped.
Estimate the number of rows and average row size and calculate a sufficient hash space size.
Issue the ALTER ADD organization-clause statement.
Reorganize the table space specifying AUTOESTSPACE YES. Doing so allows DB2 to automatically estimate the best size for the hash space using real-time statistics.
REBIND any applications using SQL with equality predicates on the hash key.

After converting, monitor real-time statistics to evaluate the table’s hash usage efficacy. In SYSIBM.SYSTABLESPACESTATS the REORGHASHACCESS column indicates the number of times that hash access was used to access the table. Furthermore, you can compare the REORGHASHACCESS column to the REORGSCANACCESS column to determine the number of time the table was accessed by the hash versus other techniques. You can also review the HASHLASTUSED column, which contains the date when hash access was last used for SELECT, FETCH, searched UPDATE, searched DELETE, or to enforce referential constraints.

In addition, be sure to verify that hash access is used where you expected. To accomplish this, check the ACCESSTYPE column of the PLAN_TABLE looking for the values H, HN, or MH—these indicate that hash access is used.

When to Use Hashing

Consider using hash organization for tables that are stable or predictable in size and where the data is accessed most often by its primary key (that is, the hash key). Hash-organized tables require an overflow index for rows that exceed the specified hash space. Rows in the overflow index are not hashed—DB2 must scan the index to retrieve rows in the overflow.

Table Space Size Options for Hashing

Choose the table space sizing options and parameters wisely for hash organized tables. Care must be taken to ensure that sufficient space is available for the hashing. For PBR Universal table spaces, be sure to specify an appropriate DSSIZE because it will be used to validate the hash space for each partition.

Consider specifying PRIQTY as –1 to allow DB2 to use the default primary space allocation. Furthermore, set DEFINE YES to ensure that the fixed table space is allocated successfully before the first access.

Free space also is an important consideration for hash organized table spaces. The PCTFREE specification is used by DB2 when the table space is reorganized with AUTOSPACE(YES) specification. DB2 uses the value of the DATASIZE column in the SYSIBM.SYSTABLESPACESTATS as the initial size of the hash table space, and increases that value by the value of PCTFREE to calculate the new size when you reorganize the table space.

Note:This blog post was adapted from the newly published 6th edition of Craig's best-selling book, DB2 Developer's Guide.

Saturday, May 12, 2012

A Cookbook for DB2 LUW Developers

I recently came across an interesting new book on DB2 for Linux, Unix, and Windows titled IBMDB2 9.7 Advanced Application Developer Cookbook. The book shakes up the typical technology book with a cookbook metaphor and succeeds in delivering the best of both of those approaches.

The authors, Sanjay Kumar and Mohankumar Saraswatipura, have obviously used the technology and clearly explain to the reader step-by-step tactics for tackling most of the important components of the DB2 application development life cycle. The book is divided into 9 chapters, each delivering a series of practical recipes for using, creating and managing various aspects of DB2 application programming. The recipes are well-written and easy to understand with lots of supporting code to guide you as you work through the recipe. The cover of the book references “over 70 practical recipes” but it sure seems like there are more than that.

The first chapter of the book covers recipes that focus on the application development enhancements made to DB2 9.7. This is an important chapter because sometimes new features get introduced so rapidly that developers do not get the chance to learn them before the next version comes out. Indeed, with DB2 10 being released in early April 2012, it is even more important to learn what DB2 9.7 added to the mix.

Other chapters I found quite useful were Chapter 5, which covers recipes for coding Java applications against DB2 databases; and Chapter 8 and 9, which covers recipes for monitoring and tuning your DB2 applications.

Of course, the trouble with recipes is that you rely on them when you want to make something specific. A book that contained a recipe for everything you ever wanted to do with DB2 is not practical though. You can, however, use many of the recipes as starting points for beginning the “dish” you wish to “cook” and then add to the recipe the additional “flourishes” you need to make the dish your own.

Whether you are a novice or a long-time DB2 coder this book will be helpful as you design, plan, develop, and optimize your DB2 9.7 applications and databases.

Wednesday, April 25, 2012

IDUG DB2 Technical Conference 2012

The time is drawing near for this year’s North American IDUGDB2 Technical Conference . This year’s event is being held in Denver, CO the week of May 14-18, 2012. The IDUG conference planning committee has brewed up a wonderful agenda of educational sessions, presentations, and courses. Check ‘em all out online here. Whether you are a DBA, a developer, a or a systems programmer there are a plethora of learning opportunities. And there are sessions that run the gamut from DB2 for z/OS to DB2 LUW to data warehousing to Big Data to new version information and more! You can even try your hand at certification at the conference.

Even more importantly, IDUG offers a great chance to build and strengthen your DB2 network. You will meet everyone who is anyone at the IDUG conference. You can network with IBMers, consultants, and peers throughout the week at technical networking sessions and the products and services expo. And don’t forget the Dine Around where you can sign up to eat with your favorite DB2 experts.

One of my favorite stops every year is the IDUG book store, where you can browse the latest publications on DB2 and related technologies. I’m super excited about this year’s book store because the latest edition of my book, DB2 Developer’s Guide, covering through DB2 V10 will be available at the book store this year.

There is truly no other place where you can cram so much DB2 stuff into a week. So if you rely on DB2 in your professional life, the IDUGDB2 Technical Conference is the place to be.

Thursday, March 22, 2012

Working With Variable Data

One of the key design issues that is addressed in most DB2 applications is how to implement character data that varies in size from row to row. The basic trade-off is “ease-of-use and performance” versus “storage requirements.” It is possible to save storage space by using variable columns instead of placing small amounts of data in a large fixed space. Each variable column carries a 2-byte overhead, however, for storing the length of the data. Additionally, variable columns tend to increase CPU usage and can cause the update process to become inefficient. When a variable column is updated with a larger value, the row becomes larger; if not enough space is available to store the row, it must be moved to another page. This makes the update and any subsequent retrieval slower.

Follow these rules when defining variable character columns:

Avoid variable columns if a sufficient disk space is available to store the data using fixed columns. The general rule is: disk is cheap! Trading the cost of disk for more efficient development and subsequent performance is often an easy decision to make.
Though this may be obvious, do not define a variable column when the data does not vary from row to row.
Do not define a variable columns if its maximum length is less than 30 bytes. Furthermore, do not define a variable column if its maximum length is within 10 bytes of the average length of the column. Adhering to these two rules will cause you to choose VARCHAR data types only when they can potentially provide enough DASD savings to offset other costs.
Consider redefining variable columns by placing multiple rows of fixed length columns in another table or by shortening the columns and placing the overflow in another table.

If, after following these guidelines, VARCHAR columns need to be implemented, go ahead and do so. However, it is wise to continuously re-assess the decision to use variable character data.

So how can you monitor the effectiveness of variable columns? Well, it is possible to query the DB2 Catalog to determine the effectiveness of using VARCHAR for a column instead of CHAR. Consider, for example, the PROJNAME column of the DSN81010.PROJ table. It is defined as VARCHAR(24).

To gauge whether VARCHAR is appropriate for this particular column, consider issuing the following SQL query:

     SELECT    COL_LGTH, COUNT(*)
     FROM      (SELECT LENGTH(PROJNAME) AS COL_LENGTH
                FROM   DSN81010.PROJ

               )
     GROUP BY  COL_LGTH
     ORDER BY  COL_LGTH;

This query will produce a report listing the lengths (in this case, from 1 to 24, excluding those lengths which do not occur) and the number of times that each length occurs in the table. These results can be analyzed to determine the range of lengths stored within the variable column.

If you are not concerned about this level of detail, the following query can be used instead to summarize the space characteristics of the variable column in question:

    SELECT  24*COUNT(*),
            SUM(2+LENGTH(PROJNAME)),
            24*COUNT(*)-SUM(2+LENGTH(PROJNAME)),
            24,
            AVG(2+LENGTH(PROJNAME)),
            24-AVG(2+LENGTH(PROJNAME))
    FROM    DSN81010.PROJ;

This query produces a report similar to the one shown below:

SPACE      SPACE        TOTAL   AVG.       AVG.         AVG.
USED AS    USED AS      SPACE   SPACE AS   SPACE AS     SPACE
CHAR(24)   VARCHAR(24)  SAVED   CHAR(24)   VARCHAR(24)  SAVED
--------   -----------  ------  ---------  -----------  -----
158058        96515     61543     24          16          8

The following list itemizes the definition for each of the individual columns calculated by this query:

Definition	Calculation
Space Used As CHAR(24)	24COUNT()
Space Used As VARCHAR(24)	SUM(2+LENGTH(PROJNAME))
Total Space Saved Using VARCHAR	24COUNT()-SUM(2+LENGTH(PROJNAME))
Avg. Space Used As CHAR(24)	24
Avg. Space Used As VARCHAR(24)	AVG(2+LENGTH(PROJNAME))
Avg. Space Saved Using VARCHAR	24-AVG(2+LENGTH(PROJNAME))

The query can be modified to be used for any VARCHAR-defined column. The constant 24 can be changed to indicate the maximum length of the variable column as defined in the DDL. Using these tools, you can better judge the actual disk space savings accruing as a result of VARCHAR usage.

Synopsis

There are sound reasons for using variable data types within DB2 databases. Yet, it is important to remember that business conditions change and what may have been a sound reason for using VARCHAR data in the past, may no longer be sound. Use the queries in this article to determine if it is still appropriate to use VARCHAR data. In general, do not use VARCHAR for small columns or for columns whose length does not vary considerably.

Sunday, March 04, 2012

Fetching Multiple Rows

When you need to retrieve multiple rows, consider deploying a multi-row fetch to transfer more than one row using a single FETCH statement. This capability was added as of DB2 Version 8.

A multi-row FETCH retrieves multiple rows at one time into column arrays in your application program. By fetching multiple rows at once, your request can become more efficient, especially for distributed requests. The performance improvement using multi-row FETCH in general depends on several factors, such as whether the request is distributed, the number of rows to be fetched, the complexity of the SELECT statement, and the number of columns being fetched.

Nevertheless, using multi-row FETCH (in local environments) can improve performance with a significant reduction of CPU time possible. Tests conducted by IBM have shown performance gains of between 25% and 40% processing 10 rows per SQL statement for programs processing a considerable number of rows. With such significant gains possible, why hasn’t everyone moved to multi-row FETCH? Well, perhaps because it requires programming changes. A multi-row FETCH requires a cursor defined with rowset positioning. A rowset is a group of rows that are operated on as a set. Such a cursor enables your program to retrieve more than one row using a single FETCH statement. By fetching multiple rows at once, your request might become more efficient, especially for distributed requests.

To use this feature, you must DECLARE your cursor as using the WITH ROWSET POSITIONING parameter. For example

EXEC SQL
  DECLARE CURSOR SAMPCURS
  WITH ROWSET POSITIONING
  FOR
  SELECT DEPTNO
  FROM   DSN81010.DEPT
END-EXEC.

Furthermore, to use a multi-row fetch you must have defined the appropriate structures to receive multi-row data. This means you must have defined an array of host variables into which the fetched rows can be placed. Each column fetched requires its own host variable array into which its values will be placed. Be sure to match the array size to the rowset size. With the appropriate setup coded, FETCH statements can be written to retrieve more than a single row from the result set. For example

FETCH ROWSET FROM SAMPCURS
  FOR 5 ROWS
  INTO HOSTVAR-ARRAY;

As you can see, the multiple-row fetch block is identical to a single-row-fetch block, except that there are two additional clauses—ROWSET and FOR n ROWS. The ROWSET clause specifies that the orientation of this cursor is rowset positioning (instead of single row). The FOR n ROWS clause specifies the size of the rowset to be returned. The maximum rowset size is 32,767.

Rowset cursors are very useful when you need to retrieve many rows or large amounts of data in distributed systems. By retrieving multiple rows with a single FETCH, multiple trips between the application and the database can be eliminated, thereby improving network performance.

To learn more about multi-row FETCH consider attending my upcoming webinar on the topic. This presentation will introduce and define multi-row FETCH, how to use it, and the performance implications of doing so. The presentation will also touch upon multi-row UPDATE. And it will introduce the new SoftBase Attach Facility MRF Feature, which allows you to implement multi-row FETCH without coding changes. To attend, sign up at this link: https://www1.gotomeeting.com/register/754473201

Tuesday, February 28, 2012

Identifying Unused Indexes

Did you know that DB2 V9 added a new column to the Real Time Statistics to help identify unused indexes?

The LASTUSED column in the SYSINDEXSPACESTATS table contains a date indicating the last time the index was used. Any time the index is used to satisfy a SELECT, FETCH, searched UPDATE, searched DELETE, or to enforce a referential constraint, the date is updated.

This helps to solve the problem of determining whether or not an index is being used. Standard operating advice is to DROP or delete anything that is not going to be used. But trying to determine whether something is actually used or not can be difficult at times.

You could always query your PLAN_TABLEs or the plan and package dependency tables in the DB2 Catalog for static SQL. But what about dynamic SQL? That is more difficult. Now, as of DB2 V9, you can simply query the LASTUSED column to see when the index was last used. The LASTUSED date is by partition. So, for a partitioned index, the last used date of each partition in the index should be checked.

Of course, you will have to give it some time because you might have an index supporting a rarely used query. Most shops have queries and programs that run quarterly, or even annually, but nevertheless are very important... and you wouldn't want to drop indexes on those queries even though they do not run frequently because when they do run, they are important...

Examine the LASTUSED column over time to determine which indexes are truly not being used, and then DROP the unused indexes.

Thursday, February 16, 2012

Update on DB2 Developer's Guide, 6th edition

I know a lot of my readers are waiting on the updated edition of my book, DB2 Developer's Guide, so I thought I'd post a short update on the progress. The technical edits are over and production will be starting soon. The book is scheduled now for publication in early May 2012 and is available to be pre-ordered now on amazon com.

The book has been completely updated and is now up-to-date with DB2 10 for z/OS. Just think of the things that have been added to DB2 since the last time the book was updated: Universal table spaces, pureXML, SECADM, hashes, new data types, INSTEAD OF triggers, temporal support, and much, much more.

Consider pre-ordering a copy today so you'll get it as soon as it comes off the presses!