Monday, April 15, 2013

DB2 Locking, Part 1: An Overview


DB2 automatically guarantees the integrity of data by enforcing several locking strategies. These strategies permit multiple users from multiple environments to access and modify data concurrently. Basically, DB2 combines the following strategies to implement an overall locking strategy:
  • Table and table space locking
  • IRLM page and row locking
  • Internal page and row latching
  • Claims and drains to achieve partition independence
  • Checking commit log sequence numbers (CLSN) and PUNC bits to achieve lock avoidance
  • Global locking through the coupling facility in a data sharing environment

But what exactly is locking? How does DB2 utilize these strategies to lock pages and guarantee data integrity? Why does DB2 have to lock data before it can process it? What is the difference between a lock and a latch? How can DB2 provide data integrity while operating on separate partitions concurrently? Finally, how can DB2 avoid locks and still guarantee data integrity?

Today's blog post will offer a high-level introduction to DB2 locking: why it is needed, as well as the benefits and drawbacks it introduces for applications that use DB2 databases.

Anyone accustomed to writing application programs that access a database understands the potential for concurrency problems. When one application program tries to read data that is in the process of being changed by another, the DBMS must forbid access until the modification is complete to ensure data integrity. Most DBMS products, DB2 included, use a locking mechanism for all data items being changed. Therefore, when one task is updating data on a page, another task cannot access data (read or update) on that same page until the data modification is complete and committed.

When multiple users can access and update the same data at the same time, a locking mechanism is required. This mechanism must be capable of differentiating between stable data and uncertain data. Stable data has been successfully committed and is not involved in an update in a current unit of work. Uncertain data is currently involved in an operation that could modify its contents. 

Consider two DB2 application programs. If program #1 updates a piece of data on page 1, you must ensure that program #2 cannot access the data until program #1 commits the unit of work. Otherwise, a loss of integrity could result. Without a locking mechanism, the following sequence of events would be possible:
  1. Program #1 retrieves a row from DSN81010.EMP for EMPNO ‘000010’.
  2. Program #1 issues an update statement to change that employee’s salary to 55000.
  3. Program #2 retrieves the DSN81010.EMP row for EMPNO ‘000010’. Because the change was not committed, the old value for the salary, 52750, is retrieved.
  4. Program #1 commits the change, causing the salary to be 55000.
  5. Program #2 changes a value in a different column, writing the row back using the stale values it originally retrieved, and commits the change.
  6. The value for salary is now back to 52750, negating the change made by program #1.
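To make the scenario concrete, here is a rough SQL sketch of what the two programs might issue; the BONUS column and the specific values are illustrative only, and the sketch assumes program #2 writes the row back using the stale values it originally retrieved:

    -- Program #1
    SELECT SALARY FROM DSN81010.EMP WHERE EMPNO = '000010';        -- returns 52750
    UPDATE DSN81010.EMP SET SALARY = 55000 WHERE EMPNO = '000010';

    -- Program #2 (before program #1 commits)
    SELECT SALARY, BONUS FROM DSN81010.EMP WHERE EMPNO = '000010'; -- still sees 52750

    -- Program #1
    COMMIT;

    -- Program #2 writes the row back from its stale copy, then commits
    UPDATE DSN81010.EMP SET BONUS = 1000, SALARY = 52750 WHERE EMPNO = '000010';
    COMMIT;                                                         -- program #1's update is lost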

The DBMS avoids this situation by using a locking mechanism. DB2 supports locking at four levels, or granularities: table space-, table-, page-, and row-level locking. DB2 also provides LOB locking for large objects (BLOBs, CLOBs, and DBCLOBs).

Locks can be taken at any level in the locking hierarchy without taking a lock at the lower level. However, locks cannot be taken at the lower levels without a compatible higher-level lock also being taken. For example, you can take a table space lock without taking any other lock, but you cannot take a page lock without first securing a table space-level lock (and a table lock as well if the page is part of a table in a segmented table space).

Additionally, a page lock does not have to be taken before a row lock is taken. Your locking strategy requires an “either/or” type of choice by table space: either row locking or page locking. An in-depth discussion of the merits of both is beyond the scope of today's blog post, but will be covered in future posts. 
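For reference, that choice is made with the LOCKSIZE parameter on the table space definition. A minimal sketch, assuming hypothetical object names (LOCKSIZE ANY lets DB2 choose the lock size, and it usually chooses page):

    -- Page-level locking for a new table space
    CREATE TABLESPACE TS1 IN MYDB LOCKSIZE PAGE;

    -- Switch an existing table space to row-level locking
    ALTER TABLESPACE MYDB.TS1 LOCKSIZE ROW;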

Both page locks and row locks can escalate to a table level and then to a table space level for segmented tables or straight to a table space level for partitioned table spaces. A table or table space cannot have both page locks and row locks held against it at the same time.

Many modes of locking are supported by DB2, but they can be divided into two types:
  • Locks to enable the reading of data                                
  • Locks to enable the updating of data

But remember, we are still talking at a very high and somewhat simplistic level. DB2 uses varieties of these two types of locks to indicate the type of locking required. 

Locks Versus Latches

A true lock is handled by DB2 using the IRLM. However, whenever doing so is practical, DB2 tries to lock resources without going to the IRLM. This type of lock is called a latch. Whereas true locks are always set in the IRLM, latches are set internally by DB2, without going to the IRLM.

When a latch is taken instead of a lock, it is handled in the Buffer Manager by internal DB2 code; so the cross-memory service calls to the IRLM are eliminated. Latches are usually held only briefly—for a shorter duration than locks. Also, a latch requires about one-third the number of instructions as a lock. Therefore, latches are more efficient than locks because they avoid the overhead associated with calling an external address space. Latches are used when a resource serialization situation is required for a short time. Both latches and locks guarantee data integrity. In the sections that follow, the term lock, when used generically, refers to both locks and latches.

Lock Duration

Before you learn about the various types of locks that can be acquired by DB2, you should understand lock duration, which refers to the length of time that a lock is maintained. The duration of a lock is based on the BIND options chosen for the program requesting locks. Locks can be acquired either immediately when the program is requested to be run or iteratively as needed during the execution of the program. Locks can be released when the plan is terminated or when they are no longer required for a unit of work.

The BIND parameters affecting DB2 locking are summarized below:


The BIND parameters that impact table space locks are the ACQUIRE and RELEASE parameters. 

  • ACQUIRE(ALLOCATE | USE): The ALLOCATE option specifies that locks will be acquired when the plan is allocated, which normally occurs when the first SQL statement is issued. The USE option indicates that locks will be acquired only as they are required, SQL statement by SQL statement. As of DB2 10, ACQUIRE(ALLOCATE) is no longer supported when binding or rebinding plans.
  • RELEASE(DEALLOCATE | COMMIT): When you specify DEALLOCATE for a program, locks are not released until the plan is terminated. When you specify COMMIT, table space locks are released when a COMMIT is issued.

The BIND parameter affecting page and row locks is the ISOLATION parameter. There are four choices for isolation level:
  • ISOLATION(CS), or Cursor Stability, acquires and releases page locks as pages are read and processed. CS provides the greatest level of concurrency at the expense of potentially different data being returned by the same cursor if it is processed twice during the same unit of work.
  • ISOLATION(RR), or Repeatable Read, holds page and row locks until a COMMIT point; no other program can modify the data. If data is accessed twice during the unit of work, the same exact data will be returned.
  • ISOLATION(RS), or Read Stability, holds page and row locks until a COMMIT point, but other programs can INSERT new data. If data is accessed twice during the unit of work, new rows may be returned, but old rows will not have changed.
  • ISOLATION(UR), or Uncommitted Read, is also known as dirty read processing. UR avoids locking altogether, so data can be read that never actually exists in the database.

Regardless of the ISOLATION level chosen, all page locks are released when a COMMIT is encountered.
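Putting these parameters together, a typical online transaction might be bound with something along the lines of the following sketch; the plan and collection names are hypothetical and your shop standards should dictate the actual values:

    BIND PLAN(PLANXYZ) PKLIST(COLLXYZ.*) -
         ACQUIRE(USE) RELEASE(COMMIT)    -
         ISOLATION(CS) EXPLAIN(YES)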

I think that is enough for this first installment on DB2 locking... come back soon and we will expand on table space locks in the next installment of this blog series on DB2 locking...





Monday, March 18, 2013

DB2 Table Editors


In today's blog post I want to briefly discuss one of the more useful, yet often ignored, DB2 tools available on the market -- Table Editors...

Typically, the only method of updating DB2 data (indeed, any data stored in a relational database) is with the SQL data manipulation language statements DELETE, INSERT, and UPDATE (or with a database load). Because these SQL statements operate on data a set at a time, multiple rows -- or even all of the rows -- can be affected by a single SQL statement. Coding SQL statements for every data modification required during the application development and testing phase can be time-consuming and error-prone.

A table editing tool can reduce the time needed to make simple data modifications by providing full-screen edit capability for database tables. The user specifies the table to edit and is placed into an edit session. The data is presented to the user as a series of rows, with the columns separated by spaces. A header line indicates the column names. The data can be scrolled up and down as well as left and right. To change data, the user simply types over the current data.

This type of tool is ideal for supporting the application development process. A programmer can make quick changes without coding SQL. Also, if properly implemented, a table editor can reduce the number of erroneous data modifications made by beginning SQL users.

When a table editor is used, all columns are available for update. Thus, if a table editor is used to change production data, a simple mis-keying can cause unwanted updates. Native SQL should be used if you must ensure that only certain columns are updated.
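For example, a simple targeted UPDATE (the table and values here are illustrative) touches only the column you intend to change, which a mis-keyed table editor session cannot guarantee:

    UPDATE DSN81010.EMP
       SET WORKDEPT = 'D21'
     WHERE EMPNO = '000220';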

One final note: Tested SQL statements and application programs are characterized by their planned nature. These modification requests are well thought out and tested. This is not true for changes implemented through a table editor, so always exercise caution when using a table editor.

Examples of DB2 Table Editors include:


Thursday, February 14, 2013

The Importance of Database Design


It is impossible to overestimate the importance of database design to the effectiveness and efficiency of application systems. The first step, of course, is to create a logical data model of the business information that must be stored in, and accessed through, the database. This is a non-trivial task, but it is not the focus of today’s blog post, which is implementing optimal and well-designed database schemata.
Oftentimes the database design does not get the attention it deserves. This can occur for numerous reasons such as:
  • Insufficient specifications and/or poor logical data modeling
  • Not enough time in the development schedule
  • Too many changes occurring throughout the development cycle
  • Database design assigned to, or performed by, novices

And even when the database design is being performed by experienced professionals with sufficient time and a solid logical model, it is very easy for design flaws to creep into the database. This is especially the case for larger and more complex databases required to support mission critical applications.
Of course, the first step in constructing a physical database should be transforming the logical design using best practices. The transformation consists of the following:
  • Transforming entities into tables
  • Transforming attributes into columns
  • Transforming domains into data types and constraints
  • Transforming relationships into primary and foreign keys

But a simple transformation will not result in a complete and correct physical database design – it is merely the first step.  And design flaws can be introduced even during such a transformation.
The process of normalizing your data should be conducted during the logical design phase, but sometimes mistakes are made during the logical modeling process. In simple terms, normalization is the process of identifying the one best place where each fact belongs. A normalized design minimizes data redundancy and optimizes data structures by systematically and properly placing data elements into the appropriate groupings.
  • First normal form (1NF) eliminates repeating groups and non-atomic data from an entity.
  • Second normal form (2NF) ensures that every attribute of each entity depends on the entire primary key (no partial dependencies).
  • Third normal form (3NF) ensures that no non-key attribute depends on another non-key attribute (no transitive dependencies).

Although normalization is a logical process and does not necessarily dictate physical database design, it is good practice to implement normalized physical databases – especially with today’s powerful hardware and database systems. A normalized data model will ensure that each entity is well formed and that each attribute is assigned to the proper entity. And denormalized data structures should always be documented.
A database that is not normalized can be difficult to modify (because facts may be strewn about within the database) and prone to data quality issues (again because one fact may be in more than one place). Failing to achieve appropriate normalization is a significant design flaw.
Design flaws can make it difficult to access a database in an efficient manner. Something as simple as creating columns with inconsistent data type and length across tables can introduce inefficiency. For example, perhaps JOB_CODE is defined as CHAR(10) in one table and CHAR(15) in another. If these columns need to be compared or joined, additional work is required by the DBMS to make the columns comparable.  It is easy to see how a database design issue can make applications inefficient no matter how adept the developers are. And this is just one type of design flaw.
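Here is a quick sketch of the kind of mismatch being described, using hypothetical tables; the join at the end forces DB2 to pad the shorter column before the comparison can be made:

    CREATE TABLE EMP_ASSIGNMENT
      (EMPNO     CHAR(6)   NOT NULL,
       JOB_CODE  CHAR(10)  NOT NULL);

    CREATE TABLE JOB_MASTER
      (JOB_CODE  CHAR(15)  NOT NULL,
       JOB_DESC  VARCHAR(40));

    -- The inconsistent JOB_CODE definitions make this join more expensive
    SELECT A.EMPNO, J.JOB_DESC
      FROM EMP_ASSIGNMENT A, JOB_MASTER J
     WHERE A.JOB_CODE = J.JOB_CODE;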
There are numerous other types of design flaws that can negatively impact the usability and efficiency of a database implementation (and the applications that use it). Improperly defined constraints (referential, check, uniqueness) can cause data quality problems. Improper indexing (to support constraints and other physical structures) can cause a database to operate inefficiently or even cease operating altogether. In DB2 for z/OS, failing to explicitly name a clustering index causes DB2 to default cluster sequence to the oldest index. Changing index structures, which sometimes require dropping and recreating the index, can cause the data to be ordered in a completely different physical sequence, thereby impacting performance.
Some database design flaws are more subtle. What happens if you create two incompatible check constraints? For example, consider the following:
     CHECK (empno < 100)
     CHECK (empno >= 101)

No data can be added to the table because no number is less than 100 and greater than or equal to 101. Of course, this is an extreme example to highlight the problem. Another situation can occur if the default value is not one of the values permitted by the check constraint, for example:
    emp_type  CHAR(8)  DEFAULT 'new'
       CHECK (emp_type IN ('temp',
                           'fulltime',
                           'contract')), ...

In this case, any INSERT that relies on the default value will fail, because 'new' violates the check constraint.

Cycles are another potential problem that can be created in a physical database schema. A cycle is a referential path that connects a table to itself. This can occur when multiple tables are related back to each other and it looks like a loop when diagrammed. DB2 forbids a table from being delete-connected to itself in a cycle. A table is delete-connected to another table if it is a dependent of a table specified with a CASCADE delete rule.
Nested triggers can also create problems. A trigger is fired by an INSERT, UPDATE, or DELETE operation. However, a trigger can also contain INSERT, UPDATE, or DELETE logic within itself. Therefore, a trigger is fired by a data modification, but can also cause another data modification, thereby firing yet another trigger. When a trigger contains INSERT, UPDATE, and/or DELETE logic (including MERGE statements), the trigger is said to be a nested trigger. If referential integrity is combined with triggers, additional cascading updates and/or deletes can occur.
DB2 restricts the number of nested triggers that can be executed within a single firing event. If this were not done, it could be quite possible to have triggers firing triggers ad infinitum until all of the data was removed from an entire database. DB2 limits this cascading effect to 16 levels to prevent endless looping. If more than 16 levels of nesting occur, the transaction is aborted. Such nesting may restrict certain types of data modification from happening at all because the number of nested calls will always be exceeded. So nesting levels need to be controlled and managed in the database schema to avoid problems.
And this is by no means a comprehensive list of database design flaws that can cause real problems for application developers and end users. Getting the database design correct is imperative if you wish to have any hope of assuring application performance.
So how can you go about examining the correctness of your database structures? The best approach is an automated one such as that provided by Database Examiner, a product offered by DBE Software that performs a comprehensive validation of a database schema. The in-depth technology incorporated into Database Examiner allows you to review your database schema, identify problems, and remediate the issues to ensure a quality database implementation offering peak performance.
Simply provide Database Examiner with the data model, DDL, or a link to the active database and it will perform a series of comprehensive diagnostics. Database Examiner applies the rules of relational technology to detect flaws, inconsistencies and lack of integrity. The product applies more than 50 diagnostics that can be organized by category or severity levels. And you can customize the diagnostics by selecting those to be executed and assigning each one a severity level.
But Database Examiner doesn’t just highlight the flaws, it also helps to fix them by providing recommendations and corrections for each issue it detects, including the generation of SQL DDL scripts to remedy the flaws. Even better, Database Examiner provides multi-platform DBMS support, including DB2 for z/OS, DB2 LUW, Oracle, Microsoft SQL Server and Sybase.
And DBE Software is currently offering folks the ability to download and use a scaled down “Performance Edition” of Database Examiner, free of charge for a limited time. Be sure to enter the promo code MULL on the download request to get the best offer available.
Hopefully this blog entry has convinced you that database design is important… important enough to take some time to evaluate the quality of your existing database structures. And to take a look at automating the process using Database Examiner (using the promo code MULL ).

Friday, February 01, 2013

A Brief Introduction to the DB2 Catalog


The system catalog, or the DB2 Catalog, offers a wealth of information about DB2. If the DB2 optimizer is the heart and soul of DB2, the DB2 Catalog is its memory. The knowledge base of every object known to DB2 is stored in the DB2 Catalog, along with the DB2 Directory and the BSDS (Bootstrap Data Set).

The tables in the DB2 Catalog collectively describe the objects and resources available to DB2. You can think of the DB2 Catalog as a metadata repository for your DB2 databases. As of Version 10, the DB2 Catalog is composed of 90 table spaces and 137 tables all in a single database named DSNDB06. These numbers have grown considerably since the early days of DB2. The DB2 Catalog consisted of 25 tables in 11 table spaces for the first version of DB2 and as recently as DB2 V8, there were only 21 table spaces and 87 tables. The following table runs down the history:


Over the course of the past couple releases, the DB2 Catalog has undergone many significant changes. For most of its life, the DB2 Catalog contained many multi-table table spaces. As of DB2 10 for z/OS, IBM made an effort to clean that up, and now only a few table spaces are in the DB2 Catalog with more than one table defined. As of V10, most of the table spaces in the DB2 Catalog are now universal table spaces. In addition, the DB2 Catalog now must be SMS-managed.

Even as many new tables have been added to the DB2 Catalog to support new features such as trusted context, XML, and access path management, some tables have been removed. The SYSPROCEDURES table, which was used to register stored procedures in earlier versions of DB2, was removed as of DB2 V9. And the SYSLINKS table was removed in DB2 V10.


The SYSLINKS table was used to record the links (or pointers) that existed in several of the older DB2 Catalog table spaces (SYSDBASE, SYSPLAN, SYSDBAUT, SYSVIEW, SYSGROUP), as well as in the DB2 Directory (DBD01). Links were used to tie tables together hierarchically—not unlike an IMS database—using a special type of relationship. However, links are obsolete in DB2 as of V10.

Each DB2 Catalog table maintains data about an aspect of the DB2 environment. In that respect, the DB2 Catalog functions as a data dictionary for DB2, supporting and maintaining data about the DB2 environment. The DB2 Catalog records all the information required by DB2 for the following functional areas:
  • Database Objects: Storage groups, databases, table spaces, partitions, tables, auxiliary tables, columns, user-defined distinct types, views, synonyms, aliases, sequences, indexes, index keys, foreign keys, relationships, schemas, user-defined functions, stored procedures, triggers, and so on.
  • Programs: Plans, packages, DBRMs, and Java/JAR information
  • XML: XML Schema Repository tables
  • Security: Database privileges, plan privileges, schema privileges, system privileges, table privileges, view privileges, use privileges, trusted contexts, roles, and audit policies
  • Utility: Image copy data sets, REORG executions, LOAD executions, and object organization efficiency information
  • Communication: How DB2 subsystems are connected for communication, data distribution, and DRDA usage
  • Performance: Statistics, profiles, queries, and auto alerts
  • Environmental: Control and administrative information (such as details on image copies and the dummy tables)

How does the DB2 Catalog support data about these areas? For the most part, the tables of the DB2 Catalog cannot be modified using standard SQL data manipulation language statements. You do not use INSERT statements, DELETE statements, or UPDATE statements (with a few exceptions) to modify these tables. Instead, the DB2 Catalog operates as a semi-active, integrated, and non-subvertible data dictionary. The definitions of these three adjectives follow.

First, the DB2 Catalog is semi-active. An active dictionary is built, maintained, and used as the result of the creation of the objects defined to the dictionary. In other words, as the user is utilizing the intrinsic functions of the DBMS, metadata is being accumulated and populated in the active data dictionary.

The DB2 Catalog, therefore, is active in the sense that when standard DB2 SQL is issued, the DB2 Catalog is either updated or accessed. Not all the information in the DB2 Catalog, however, is completely up-to-date, and some of the tables must be proactively populated (such as SYSIBM.IPNAMES and SYSIBM.IPLIST). But, for the most part, the DB2 Catalog operates as an active data dictionary, particularly with regard to SQL. Remember that the three types of SQL are DDL, DCL, and DML. When DDL is issued to create DB2 objects such as databases, table spaces, and tables, the pertinent descriptive information is automatically stored in the DB2 Catalog.

When a CREATE, DROP, or ALTER statement is issued, information is recorded or updated in the DB2 Catalog. For example, upon successfully issuing a CREATE TABLE statement, DB2 populates the metadata for the table into SYSTABLES and SYSCOLUMNS, as well as possibly into SYSSEQUENCES, SYSFIELDS, SYSCHECKS, and SYSCHECKDEP depending upon the exact DDL that was issued.
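As a hedged illustration, after creating a simple table (the names here are hypothetical) you can immediately see its column metadata reflected in the catalog:

    CREATE TABLE MYSCHEMA.DEPT
      (DEPTNO   CHAR(3)     NOT NULL,
       DEPTNAME VARCHAR(36) NOT NULL);

    -- The new columns now appear in the DB2 Catalog
    SELECT NAME, COLTYPE, LENGTH, NULLS
      FROM SYSIBM.SYSCOLUMNS
     WHERE TBCREATOR = 'MYSCHEMA'
       AND TBNAME    = 'DEPT'
     ORDER BY COLNO;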

The same is true for security SQL data control language statements. The GRANT and REVOKE statements cause information to be added or removed from DB2 Catalog tables. For example, if you issue GRANT TABLE, DB2 potentially adds metadata to SYSTABAUTH and SYSCOLAUTH.

Data manipulation language SQL (SELECT, INSERT, UPDATE, MERGE, DELETE) statements use the DB2 Catalog to ensure that the statements accurately reference the DB2 objects being manipulated (such as column names and data types).

Why then is the DB2 Catalog classified as only semi-active rather than completely active? The DB2 Catalog houses important information about the physical organization of DB2 objects. For example, the following information is maintained in the DB2 Catalog:

  • The number of rows in a given DB2 table or a given DB2 table space
  • The number of distinct values in a given DB2 index
  • The physical order of the rows in the table for a set of keys


This information is populated by means of the DB2 RUNSTATS utility. A truly active data dictionary would update this information as data is populated in the application table spaces, tables, and indexes. Some of these statistics are now actively populated in the Real Time Statistics table in the DB2 Catalog, making them active. But because some of the information in the DB2 Catalog is not always completely up-to-date, it is only a semi-active system catalog.

I also described the DB2 Catalog as being integrated. The DB2 Catalog and the DB2 DBMS are inherently bound together, neither having purpose or function without the other. The DB2 Catalog without DB2 defines nothing; DB2 without the DB2 Catalog has nothing defined that it can operate on.

The final adjective used to classify the DB2 Catalog is non-subvertible. This simply means that the DB2 Catalog is continually updated as DB2 is being used; the most important metadata in the DB2 Catalog cannot be updated behind DB2’s back. Suppose that you created a table with 20 columns. You cannot subsequently update the DB2 Catalog to indicate that the table has 15 columns instead of 20 without using standard DB2 data definition language SQL statements to drop and re-create the table.

An Exception to the Rule  

As with most things in life, there are exceptions to the basic rule that the SQL data manipulation language cannot be used to modify DB2 Catalog tables. You can modify columns (used by the DB2 optimizer) that pertain to the physical organization of table data. 
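For example, statistics columns such as CARDF in SYSIBM.SYSTABLES can be updated, a technique sometimes used to model production data volumes in a test subsystem. A hedged sketch, with illustrative names and values (and something to be done only with great care):

    -- Make the optimizer believe the table holds a million rows
    UPDATE SYSIBM.SYSTABLES
       SET CARDF = 1000000
     WHERE CREATOR = 'MYSCHEMA'
       AND NAME    = 'DEPT';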

Querying the DB2 Catalog

Because the DB2 Catalog consists of DB2 tables, you can write SQL queries to easily retrieve the metadata information about your DB2 environment. You can write queries to discover all sorts of interesting and useful information about DB2 across the following broad categories:

  • Navigational queries, which help you to maneuver through the sea of DB2 objects in your DB2 subsystems
  • Physical analysis queries, which depict the physical state of your application table spaces and indexes
  • Queries that aid programmers (and other analysts) in identifying the components of DB2 packages and plans
  • Application efficiency queries, which combine DB2 Catalog statistics with the PLAN_TABLE output from EXPLAIN to identify problem queries quickly
  • Authorization queries, which identify the authority implemented for each type of DB2 security
  • Historical queries, which use the DB2 Catalog HIST tables to identify and monitor changing data patterns
  • Partition statistics queries, which aid the analysis of partitioned table spaces 
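As a simple example of a navigational query, the following sketch lists the tables in a given database along with their table spaces (the database name is illustrative):

    SELECT CREATOR, NAME, TSNAME
      FROM SYSIBM.SYSTABLES
     WHERE DBNAME = 'MYDB'
       AND TYPE   = 'T'
     ORDER BY CREATOR, NAME;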


In addition to aiding development, DB2 Catalog queries can also aid performance tuning and administration of your production environment. An effective strategy for monitoring DB2 objects using catalog queries can help to catch and forestall problems before they affect performance. By monitoring DB2 objects using DB2 Catalog queries, you can more effectively forecast disk needs and other resource usage, making it easier to plan for future capacity needs.

Summary


The DB2 Catalog is a rich source of information about your DB2 subsystem and applications. Be sure to use it to simplify your DB2 development and administrative efforts. 


Note: This blog post was adapted from material in the sixth and latest edition of Craig's book, DB2 Developer's Guide.

Tuesday, January 15, 2013

Upcoming Webinar: Data Security in the Age of Regulatory Compliance


Webinar Title: Data Security in the Age of Regulatory Compliance
Presenter: Craig S. Mullins
Date: Wednesday, January 23
Time: 2pm Eastern / 11am Pacific
Cost: Free
Register Link: https://www1.gotomeeting.com/register/990275648
As governmental regulations expand, organizations need to deploy better controls to ensure quality data and properly protected database systems. Sarbanes-Oxley, HIPAA, BASEL II, PCI DSS and more make the news, but what do they mean in terms of your data? And what steps can be taken to ensure compliance?
Anyone who has been paying attention lately knows at least something about the large number of data breaches in the news… and their impact on business. Data breaches and the threat of lost or stolen data will continue to plague organizations until comprehensive plans are enacted to combat them. Many of these breaches have been at the database level, and more will be unless better data protection policies and procedures are enacted on operational databases.
As a result of expanded regulations and the ever-present specter of data breaches, data security has grown in importance. And that places new burdens on DBAs and data management professionals. If you are interested in learning more about this topic -- and steps you can take to ensure compliance -- be sure to register for my upcoming webinar sponsored by SoftBase Systems --> Data Security in the Age of Regulatory Compliance. This presentation will offer an overview of this new landscape focusing particularly on techniques for improving data and database security.
Topics to be discussed include:
  • An Introduction to Industry and Governmental Regulations
  • The Pervasiveness of Data Breaches with Techniques for Avoidance and Remediation
  • Long-term Data Retention
  • Database Activity Monitoring and Auditing
  • Database Security and  Encryption
  • Test Data Management
  • Data Masking
  • Metadata Management

Sunday, January 13, 2013

Two New Group Privileges in DB2 10 for z/OS


DB2 10 for z/OS delivers two new group level privileges to enable more granular and functional security support for DB2 administrators. The system DBADM authority is for DBAs at shops looking to minimize SYSADM usage, and SQLADM authority is for users who focus predominantly on performance-related issues.

System DBADM authority is a DB2 V10 capability to better support separation of duties. System DBADM authority can be assigned to enable a user to manage all objects within a DB2 subsystem but without necessarily accessing data. This authority can be granted to an authid or role. By using system DBADM authority judiciously, the need for SYSADM authority can be minimized.

So, as of DB2 V10, DBADM security can be granted at the system level, or at a database-by-database level as in all past versions of DB2.

Two granular options can be set when granting system DBADM authority: ACCESSCTRL and DATAACCESS. You can specify whether the system DBADM designation is to be granted with or without either.

Specifying WITH ACCESSCTRL indicates that the ACCESSCTRL authority is granted along with the system DBADM authority. ACCESSCTRL enables the system DBADM to grant all authorities and privileges, except the system DBADM, DATAACCESS, and ACCESSCTRL authorities and privileges on security-related objects. And, of course, WITHOUT ACCESSCTRL specifies that these abilities are not granted to the system DBADM.
Specifying WITH DATAACCESS indicates that the DATAACCESS authority is granted along with the system DBADM authority. DATAACCESS enables the system DBADM to access data in all user tables, views, and materialized query tables in a DB2 subsystem and enables the user to execute plans, packages, functions, and procedures. Specifying WITHOUT DATAACCESS specifies that the capability to access data is not granted to the system DBADM.
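A hedged sketch of granting the new authority, using a hypothetical role name; check the DB2 10 SQL Reference for the exact syntax your shop should use:

    GRANT DBADM WITHOUT ACCESSCTRL WITHOUT DATAACCESS
       ON SYSTEM
       TO ROLE PROD_DBA_ROLE;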

Many security regulations and compliance initiatives favor prohibiting high-level authorities, such as SYSADM and DBADM, being conferred with data access privileges. Keeping administrative and data access separate is another control designed to protect user data. 

DB2 V10 also introduces the ability to grant the SQLADM privilege for DBAs who work as SQL performance specialists. Some organizations delineate job responsibilities into granular roles, such as recovery DBA or SQL performance tuner.

The SQLADM privilege can be granted to authids and roles. An agent with SQLADM authority can perform SQL and SQL performance management-related actions without requiring any additional privileges.
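Granting it follows the same pattern as the other system-level authorities; a brief sketch with a hypothetical authid:

    GRANT SQLADM ON SYSTEM TO PERFTUNER;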

SQLADM authority includes the capability to perform the following:
  •  Issue the DESCRIBE TABLE statement.
  •  Execute the EXPLAIN statement with any of the following options: PLAN, ALL, STMTCACHE ALL, STMTID, STMTTOKEN, and MONITORED STMTS.
  •  Execute the PREPARE statement.
  •  Explain dynamic SQL statements that execute under the special register CURRENT EXPLAIN MODE, when CURRENT EXPLAIN MODE = EXPLAIN.
  •  Issue BINDs specifying EXPLAIN(ONLY) or SQLERROR(CHECK).
  •  Issue START and STOP commands.
  •  Issue the DISPLAY PROFILE command.
  •  Execute the RUNSTATS and MODIFY STATISTICS utilities for any database.
  •  Obtain appropriate IFCID data using the MONITOR2 privilege. 

Thursday, December 20, 2012

Seasons Greetings!

Just a short post today to wish all of my readers a very happy holiday season and to let you know that I will not be posting anything new between now and the end of the year... but be sure to check back again in 2013 as I continue to write about DB2 and mainframe issues that impact us all!


See you all next year!

Monday, November 26, 2012

SQL Coding Guidelines: The Basics


When you are writing your SQL statements to access DB2 data, be sure to follow the subsequent guidelines for coding SQL for performance. These are some very simple, yet important, rules to follow when writing your SQL statements. Of course, SQL performance is a complex topic and understanding every nuance of how SQL performs can take a lifetime. That said, adhering to the following simple rules puts you on the right track to achieving high-performing DB2 applications.


1)  The first rule is to always provide only the exact columns that you need to retrieve in the SELECT-list of each SQL SELECT statement. Another way of stating this is “do not use SELECT *”. The shorthand SELECT * means retrieve all columns from the table(s) being accessed. This is fine for quick and dirty queries but is bad practice for inclusion in application programs because:
  • DB2 tables may need to be changed in the future to include additional columns. SELECT * will retrieve those new columns, too, and your program may not be capable of handling the additional data without requiring time-consuming changes.
  • DB2 will consume additional resources for every column that is requested to be returned. If the program does not need the data, it should not ask for it. Even if the program needs every column, it is better to explicitly ask for each column by name in the SQL statement for clarity and to avoid the previous pitfall.
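In other words, instead of the first statement below, explicitly name the columns your program actually uses (both statements are illustrative, using the sample employee table):

    -- Avoid this in application programs
    SELECT * FROM EMP;

    -- Ask only for what the program needs
    SELECT EMPNO, LASTNAME, SALARY
    FROM   EMP;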

2)  Do not ask for what you already know. This may sound simplistic, but most programmers violate this rule at one time or another. For a typical example, consider what is wrong with the following SQL statement:

    SELECT   EMPNO, LASTNAME, SALARY
    FROM      EMP
    WHERE   EMPNO = '000010';

Give up? The problem is that EMPNO is included in the SELECT-list. You already know that EMPNO will be equal to the value '000010' because that is what the WHERE clause tells DB2 to do. But with EMPNO listed in the SELECT-list, DB2 will dutifully retrieve that column too. This causes additional overhead to be incurred, thereby degrading performance.

3)  Use the WHERE clause to filter data in the SQL instead of bringing it all into your program to filter. This too is a common rookie mistake. It is much better for DB2 to filter the data before returning it to your program. This is so because DB2 uses additional I/O and CPU resources to obtain each row of data. The fewer rows passed to your program, the more efficient your SQL will be. So, the following SQL:

    SELECT   EMPNO, LASTNAME, SALARY
    FROM      EMP
    WHERE   SALARY > 50000.00;

Is better than simply reading all of the data without the WHERE clause and then checking each row to see if the SALARY is greater than 50000.00 in your program.

These rules, though, are not the be-all, end-all of SQL performance tuning – not by a long shot. Additional, in-depth tuning may be required. But following the above rules will ensure that you are not making “rookie” mistakes that can kill application performance. 

Tuesday, November 20, 2012

Happy Thanksgiving

I know it is only Tuesday (right now), but here's wishing all of my readers and all of their families and friends a very happy and restful Thanksgiving holiday. Visit with your relatives... Eat up... Watch football... Relax...


And try not to get trampled on Friday if you're going out shopping for bargains!

Wednesday, October 31, 2012

Eliminate Performance Management Black Holes


Today’s modern applications are not restricted to a single platform. Yet, in many ways, mainframe management and administration professionals are still tackling performance issues and problem resolution using the same methods they did when mainframe applications were all COBOL and never strayed off of the centralized mainframe computing platform. And this causes many problems.
The IT infrastructure of most organizations is multi-tiered, and business transactions flow through multiple systems and services to deliver business value. This means that today’s applications can utilize the most appropriate and useful technology for each component, thereby delivering more functional and easier to use transactions. Development time can be faster, too.
So far, so good… but performance management and problem resolution become difficult challenges in this brave new world. A business application that relies upon multiple differing computing platforms, technologies, and components to deliver service is characterized by its complexity. Consider an application where users deploy Windows workstations running .NET applications that connect to Linux servers running Oracle and stored procedures that connect to the mainframe to access DB2 data. Such an infrastructure consists of a lot of connecting points to stitch together significantly differing hardware and software platforms. And what happens if there is a problem? Tracking down the root cause of the problem can be difficult.
DBAs and system administrators for each platform may have tooling to monitor and diagnose their particular portion of the infrastructure, but a siloed approach is not pragmatic and usually results in a lot of finger pointing instead of problem solving. You know what I’m talking about – “there’s a problem with DB2!”
The mainframe DBAs usually have a performance monitor for DB2 and the systems programmers have a monitor for CICS and z/OS. And the distributed DBAs and system administrators have their monitors, too, for Oracle, SQL Server, Linux, Unix and so on.
So let’s assume that this multi-tier application begins exhibiting poor performance. Where do you start? You can’t fix what you can’t see, so unless the problem is easily monitored within an individual silo, discovering what is wrong and fixing the issue becomes a lengthy, troublesome, and expensive task. The problem could be anywhere, so it’s a major undertaking to pinpoint the root cause.
To the mainframe professionals the application goes into a “black hole” when it leaves the mainframe; to the distributed computing professionals, it is just the other way around with the mainframe being the “black hole.” Even with visibility into some components of the application, at some point the transactions vanish into one of the black holes. A siloed approach to managing performance just does not work when the application spans multiple silos. What is needed is an application performance management solution.
Organizations must be able to track business transactions from end to end; that is, from the time the user makes a request all through the entire infrastructure required to provide an answer and right back to the end user. Such a tracking solution must be able to follow the transaction in real time and report on the resources consumed at each point along the way. By providing real-time monitoring of each transaction flowing end-to-end across distributed applications it becomes possible to see the problem, to identify its root cause, to determine performance issues, and to solve problems more rapidly and less expensively.
Think about it. What are the applications and transactions like at your organization these days? How many can satisfy business needs completely on a single platform? Have you ever tried to resolve an issue or identify the root cause of a problem for an application that spans multiple platforms? When did the trail become cold because the transaction disappeared? And how did you move past all the finger-pointing?
Compuware’s  latest offering, APM for Mainframe, delivers end to end transaction management so that the mainframe is no longer a black hole. Using this solution you can track your distributed applications across every platform, find the root causes of problems and performance issues, and resolve them on the spot.
Without such a solution you just have to keep living with the pain. And that means unresponsive distributed applications, slower time to correct problems, lost revenue, and higher administrative costs.

Friday, October 12, 2012

Book Signing and Discount at IOD Conference

Just a short blog post today to let everybody know that I will be doing a book signing for the new edition of both of my books at the 2012 Information on Demand conference on Tuesday, October 23rd.


You can choose to have me sign the new 6th edition of my DB2 book, DB2 Developer's Guide... or the new 2nd edition of my DBA book, Database Administration: The Complete Guide to DBA Practices and Procedures.

Or you can be my favorite type of person and choose to get them both signed!
See you in Vegas!

Wednesday, October 03, 2012

DB2 11 for z/OS: Coming Soon!

Today, October 3, 2012, IBM announced the Early Support Program for the next version of DB2. Heretofore code-named Sequoia, the DB2 11 ESP is planned for March 8, 2013 availability. So if you are still lagging behind running DB2 9 (or, heaven forbid, DB2 V8) now is the time to get yourself to DB2 10 so you'll be ready for the imminent onslaught of DB2 11 for z/OS.


The announcement (212-364) offers up a bit of information on some of the features that are planned to be available in DB2 11, including:

  • Performance improvements for some types of online transaction processing (OLTP), heavy insert, select query workloads, and when running queries against compressed tables;
  • Improved data sharing performance and efficiency;
  • Improved utility performance and additional zIIP eligible workload (particularly with LOAD and RUNSTATS);
  • Cost-effective archiving of warm and cold data with easy access to both within a single query;
  • Intelligent statistics gathering and advanced optimization technology for efficient query execution in dynamic workloads;
  • Additional online schema changes that simplify management, reduce the need for planned outages, and minimize the need for REORG;
  • Productivity improvements for DBAs, application developers, and system administrators;
  • Efficient real-time scoring within your existing transaction environment;
  • Enhanced analysis, forecasting, reporting, and presentation capabilities, as well as improved storage management, in QMF;
  • Expanded SQL, SQL PL, temporal, and XML function for better application performance;
  • Faster migration with application protection from incompatible SQL and XML changes and simpler catalog migration.
One of the more intriguing tidbits is the new SQL PL ARRAY support. IBM is claiming this feature may be able to offer up to 10 percent CPU savings for OLTP workloads with high read-to-write ratios. 

So get ready for DB2 11 - I'm sure we'll hear more about it at the IOD conference, coming up at the end of the month.

Wednesday, August 29, 2012

Managing DB2 for z/OS Application Performance


Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked what is the single most important or stressful aspect of their job, DBAs typically respond “assuring optimal performance.” Indeed, a recent Forrester Research survey indicates that “performance and troubleshooting” tops the list of most challenging DBA tasks.
But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:
  • the system (that is, the DBMS itself, the network, and the O/S),
  • the database (that is, the DDL and database schema), and
  • the application (that is, the SQL and program logic).
Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application. Furthermore, I will discuss this in terms of DB2 for z/OS.

So where do we begin? For DB2, a logical starting point is with BIND Parameters. There are many parameters and values that must be chosen and specified when you bind a DB2 application program. The vast array of options at our disposal can render the whole process extremely confusing – especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. You know what I’m talking about, parameters like ACQUIRE, RELEASE, VALIDATE, and DEGREE.

I will not delve into the myriad bind options as there are many articles and books, as well as the IBM DB2 manuals, that you can use to guide you along that path. Suffice it to say that there are some standard parameters and values that should be chosen “most of the time” in certain situations. As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as “CICS transaction”, “DB2 batch”, or “analytical query” can be presented to the developer and then, based on the type of program and environment, the canned script can choose the proper bind options. Doing so can greatly diminish the problems that can be encountered when the “wrong” parameters or values are chosen at bind time.

This same process can be put in place for production binding to ensure that the appropriate parameters and values are chosen. This is especially useful when the binds are not done by a DBA, but are automated in production or done by a less-experienced change control clerk.

Of course, there should always be a method for overriding the “standard” values for special situations, although these overrides should not be available to anyone other than a well-trained individual (DBA or otherwise).

I want to make one small exception here regarding advice on bind parameters, and that is the EXPLAIN parameter. In production, always bind your plans and packages specifying EXPLAIN YES. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what DB2 is doing and is not advisable.

Access Path Management

Bind and Rebind are important components to achieve optimal DB2 application performance. Bind/Rebind determine the access paths to the data that is accessed by your program. As such, it is vital that you develop an appropriate strategy for when and how to Rebind your programs.

There are several common approaches taken by DB2 users. By far, the best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment.

Other approaches include Rebinding only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data – but it still can pose significant administrative issues.

The final approach is from the “if it ain’t broke don’t fix it” school of thought. This approach is the worst of the several approaches discussed here. It penalizes every program for fear that a single program (or two) might experience a degraded access path. Yet, the possibility of degraded performance is real, which is why this approach has been adopted at some sites. The problem is being able to find which statements might be worse. The ideal situation would be to review the access path changes beforehand to determine whether they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third-party tools that can help you achieve this though.
Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions.

By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to DB2 (new releases/PTFs) optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

To adopt the Three R’s you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need at least 4 R’s:

  1. Real Time Statistics (or RUNSTATS)
  2. REORG
  3. RUNSTATS
  4. REBIND
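As a sketch of how Real Time Statistics can drive that decision, the following query flags table spaces where more than 10 percent of the rows have changed since the last REORG; the threshold and database name are illustrative, and you should verify the RTS column names for your DB2 version:

    SELECT DBNAME, NAME, TOTALROWS,
           REORGINSERTS, REORGDELETES, REORGUPDATES
      FROM SYSIBM.SYSTABLESPACESTATS
     WHERE DBNAME = 'MYDB'
       AND TOTALROWS > 0
       AND (REORGINSERTS + REORGDELETES + REORGUPDATES) > TOTALROWS * 0.10;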


But is this enough? Probably not because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another “R” – to Review the access paths generated by the REBIND. As we mentioned, the optimizer can make mistakes. And, of course, so can you. Users don't call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.

So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal DB2 application performance.

Tuning The Code Itself

Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with – and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:
  • Let SQL do the work instead of the application program. For example, code an SQL join instead of two cursors and a programmatic join.
  • Simpler is generally better, but complex SQL can be very efficient.
  • Retrieve only the columns required, never more.
  • Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
  • When joining tables, always provide join predicates. In other words, avoid Cartesian products.
  • Favor using Stage 1 and Indexable predicates.
  • Avoid sorting if possible by creating indexes for ORDER BY, GROUP BY, and DISTINCT operations.
  • Avoid black boxes – that is, avoid I/O routines that are called by programs instead of using embedded SQL.
  •  Avoid deadlocks by updating tables in the same sequence in every program.
  • Issue data modification statements (INSERT, UPDATE, DELETE) as close to the COMMIT statement as possible.
  • Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.
To tune SQL the DBA must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the PLAN_TABLEs. IBM and other vendors offer tools to simplify this process, such as IBM's Data Studio.
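A sketch of the basic workflow; the query number is arbitrary and the PLAN_TABLE columns shown are just a small subset of what is available:

    -- Externalize the access path for a statement
    EXPLAIN PLAN SET QUERYNO = 100 FOR
      SELECT EMPNO, LASTNAME, SALARY
      FROM   EMP
      WHERE  SALARY > 50000.00;

    -- Review the chosen access path
    SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
           ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY
      FROM PLAN_TABLE
     WHERE QUERYNO = 100
     ORDER BY QBLOCKNO, PLANNO;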

Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, Visual Basic or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.

The Bottom Line

Although DBAs must understand all three aspects of database performance management, concentrating on the application aspects of performance will most likely provide the most bang for the buck. Of course, we have only touched the tip of the DB2 application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your DB2 applications.

Good luck with DB2 for z/OS and happy performance tuning!