Friday, February 01, 2013

A Brief Introduction to the DB2 Catalog


The system catalog, or the DB2 Catalog, offers a wealth of information about DB2. If the DB2 optimizer is the heart and soul of DB2, the DB2 Catalog is its memory. The knowledge base of every object known to DB2 is stored in the DB2 Catalog, along with the DB2 Directory and the BSDS (Bootstrap Data Set).

The tables in the DB2 Catalog collectively describe the objects and resources available to DB2. You can think of the DB2 Catalog as a metadata repository for your DB2 databases. As of Version 10, the DB2 Catalog is composed of 90 table spaces and 137 tables all in a single database named DSNDB06. These numbers have grown considerably since the early days of DB2. The DB2 Catalog consisted of 25 tables in 11 table spaces for the first version of DB2 and as recently as DB2 V8, there were only 21 table spaces and 87 tables. The following table runs down the history:


Over the course of the past couple releases, the DB2 Catalog has undergone many significant changes. For most of its life, the DB2 Catalog contained many multi-table table spaces. As of DB2 10 for z/OS, IBM made an effort to clean that up, and now only a few table spaces are in the DB2 Catalog with more than one table defined. As of V10, most of the table spaces in the DB2 Catalog are now universal table spaces. In addition, the DB2 Catalog now must be SMS-managed.

Even as many new tables have been added to the DB2 Catalog to support new features such as trusted context, XML, and access path management, some tables have been removed. The SYSPROCEDURES table, which was used to register stored procedures in earlier version of DB2, was removed as of DB2 V9. And the SYSLINKS table was removed for  DB2 V10.


The SYSLINKS table was used to record the links (or pointers) that existed in several of the older DB2 Catalog table spaces (SYSDBASE, SYSPLAN, SYSDBAUT, SYSVIEW, SYSGROUP), as well as in the DB2 Directory (DBD01). Links were used to tie tables together hierarchically—not unlike an IMS database—using a special type of relationship. However, links are obsolete in DB2 as of V10.

Each DB2 Catalog table maintains data about an aspect of the DB2 environment. In that respect, the DB2 Catalog functions as a data dictionary for DB2, supporting and maintaining data about the DB2 environment. The DB2 Catalog records all the information required by DB2 for the following functional areas:
  • Database Objects: Storage groups, databases, table spaces, partitions, tables, auxiliary tables, columns, user-defined distinct types, views, synonyms, aliases, sequences, indexes, index keys, foreign keys, relationships, schemas, user-defined functions, stored procedures, triggers, and so on.
  • Programs: Plans, packages, DBRMs, and Java/JAR information
  • XML: XML Schema Repository tables
  • Security: Database privileges, plan privileges, schema privileges, system privileges, table privileges, view privileges, use privileges, trusted contexts, roles, and audit ­policies
  • Utility: Image copy data sets, REORG executions, LOAD executions, and object organization efficiency information
  • Communication: How DB2 subsystems are connected for communication, data distribution, and DRDA usage
  • Performance: Statistics, profiles, queries, and auto alerts
  • Environmental: Control and administrative information (such as details on image copies and the dummy tables)

How does the DB2 Catalog support data about these areas? For the most part, the tables of the DB2 Catalog cannot be modified using standard SQL data manipulation language statements. You do not use INSERT statements, DELETE statements, or UPDATE statements (with a few exceptions) to modify these tables. Instead, the DB2 Catalog operates as a semi-active, integrated, and non-subvertible data dictionary. The definitions of these three adjectives follow.

First, the DB2 Catalog is semi-active. An active dictionary is built, maintained, and used as the result of the creation of the objects defined to the dictionary. In other words, as the user is utilizing the intrinsic functions of the DBMS, metadata is being accumulated and populated in the active data dictionary.

The DB2 Catalog, therefore, is active in the sense that when standard DB2 SQL is issued, the DB2 Catalog is either updated or accessed. All the information in the DB2 Catalog, however, is not completely up-to-date, and some of the tables must be proactively populated (such as SYSIBM.IPNAMES and SYSIBM.IPLIST). But, for the most part, the DB2 Catalog operates as an active data dictionary, particularly with regard to SQL. Remember that the three types of SQL are DDL, DCL, and DML. When DDL is issued to create DB2 objects such as databases, table spaces, and tables, the pertinent descriptive information is automatically stored in the DB2 Catalog.

When a CREATE, DROP, or ALTER statement is issued, information is recorded or updated in the DB2 Catalog. For example, upon successfully issuing a CREATE TABLE statement, DB2 populates the metadata for the table into SYSTABLES and SYSCOLUMNS, as well as possibly into SYSSEQUENCES, SYSFIELDS, SYSCHECKS, and SYSCHECKDEP depending upon the exact DDL that was issued.

The same is true for security SQL data control language statements. The GRANT and REVOKE statements cause information to be added or removed from DB2 Catalog tables. For example, if you issue GRANT TABLE, DB2 potentially adds metadata to SYSTABAUTH and SYSCOLAUTH.

Data manipulation language SQL (SELECT, INSERT, UPDATE, MERGE, DELETE) statements use the DB2 Catalog to ensure that the statements accurately reference the DB2 objects being manipulated (such as column names and data types).

Why then is the DB2 Catalog classified as only semi-active rather than completely active? The DB2 Catalog houses important information about the physical organization of DB2 objects. For example, the following information is maintained in the DB2 Catalog:

  • The number of rows in a given DB2 table or a given DB2 table space
  • The number of distinct values in a given DB2 index
  • The physical order of the rows in the table for a set of keys


This information is populated by means of the DB2 RUNSTATS utility. A truly active data dictionary would update this information as data is populated in the application table spaces, tables, and indexes. Some of these statistics are now actively populated in the Real Time Statistics table in the DB2 Catalog, making them active. But because some of the information in the DB2 Catalog is not always completely up-to-date, it is only a semi-active system catalog.

I also decsribed the DB2 Catalog as being integrated. The DB2 Catalog and the DB2 DBMS are inherently bound together, neither having purpose or function without the other. The DB2 Catalog without DB2 defines nothing; DB2 without the DB2 Catalog has nothing defined that it can operate on.

The final adjective used to classify the DB2 Catalog is non-subvertible. This simply means that the DB2 Catalog is continually updated as DB2 is being used; the most important metadata in the DB2 Catalog cannot be updated behind DB2’s back. Suppose that you created a table with 20 columns. You cannot subsequently update the DB2 Catalog to indicate that the table has 15 columns instead of 20 without using standard DB2 data definition language SQL statements to drop and re-create the table.

An Exception to the Rule  

As with most things in life, there are exceptions to the basic rule that the SQL data manipulation language cannot be used to modify DB2 Catalog tables. You can modify columns (used by the DB2 optimizer) that pertain to the physical organization of table data. 

Querying the DB2 Catalog

Because the DB2 Catalog consists of DB2 tables, you can write SQL queries to easily retrieve the metadata information about your DB2 environment. You can write queries to discover all sorts of interesting and useful information about DB2 across the following broad categories:

  • Navigational queries, which help you to maneuver through the sea of DB2 objects in your DB2 subsystems
  • Physical analysis queries, which depict the physical state of your application table spaces and indexes
  • Queries that aid programmers (and other analysts) in identifying the components of DB2 packages and plans
  • Application efficiency queries, which combine DB2 Catalog statistics with the PLAN_TABLE output from EXPLAIN to identify problem queries quickly
  • Authorization queries, which identify the authority implemented for each type of DB2 security
  • Historical queries, which use the DB2 Catalog HIST tables to identify and monitor changing data patterns
  • Partition statistics queries, which aid the analysis of partitioned table spaces 


In addition to aiding development, DB2 Catalog queries can also aid performance tuning and administration of your production environment. An effective strategy for monitoring DB2 objects using catalog queries can help to catch and forestall problems before they affect performance. By monitoring DB2 objects using DB2 Catalog queries, you can more effectively forecast disk needs and other resource usage, making it easier to plan for future capacity needs.

Summary


The DB2 Catalog is a rich source of information about your DB2 subsystem and applications. Be sure to use it to simplify your DB2 development and administrative efforts. 


Note: This blog post was adapted from material in the sixth and latest edition of Craig's book, DB2 Developer's Guide.

Tuesday, January 15, 2013

Upcoming Webinar: Data Security in the Age of Regulatory Compliance


Webinar Title: Data Security in the Age of Regulatory Compliance
Presenter: Craig S. Mullins
Date: Wednesday, January 23
Time: 2pm Eastern / 11am Pacific
Cost: Free
Register Link: https://www1.gotomeeting.com/register/990275648
As governmental regulations expand, organizations need to deploy better controls to ensure quality data and properly protected database systems. Sarbanes-Oxley, HIPAA, BASEL II, PCI DSS and more make the news, but what do they mean in terms of your data? And what steps can be taken to ensure compliance?
Anyone who has been paying attention lately knows at least something about the large number of data breaches in the news… and their impact on business. Data breaches and the threat of lost or stolen data will continue to plague organizations until comprehensive plans are enacted to combat them. Many of these breaches have been at the database level, and more will be unless better data protection policies and procedures are enacted on operational databases.
As a result of expanded regulations and the ever-present specter of data breaches, data security has grown in importance. And that places new burdens on DBAs and data management professionals. If you are interested in learning more about this topic -- and steps you can take to ensure compliance -- be sure to register for my upcoming webinar sponsored by SoftBase Systems --> Data Security in the Age of Regulatory Compliance. This presentation will offer an overview of this new landscape focusing particularly on techniques for improving data and database security.
Topics to be discussed include:
  • An Introduction to Industry and Governmental Regulations
  • The Pervasiveness of Data Breaches with Techniques for Avoidance and Remediation
  • Long-term Data Retention
  • Database Activity Monitoring and Auditing
  • Database Security and  Encryption
  • Test Data Management
  • Data Masking
  • Metadata Management

Sunday, January 13, 2013

Two New Group Privileges in DB2 10 for z/OS


DB2 10 for z/OS delivers two new group level privileges to enable more granular and functional security support for DB2 administrators. The system DBADM authority is for DBAs at shops looking to minimize SYSADM usage, and SQLADM authority is for users who focus predominantly on performance-related issues.

System DBADM Authority is a DB2 V10  capability to better support separation of duties. System DBADM authority can be assigned to enable a user to manage all objects within a DB2 subsystem but without necessarily accessing data. This authority can be granted to an authid or role. By using system DBA authority judiciously, the need for SYSADM authority can be minimized.

So, as of DB2 V10, DBADM security can be granted at the system level, or at a database-by-database level as in all past versions of DB2.

Two granular options can be set when granting system DBADM authority: ACCESSCTRL and DATAACCESS. You can specify whether the system DBADM designation is to be granted with or without either.

Specifying WITH ACCESSCTRL indicates that the ACCESSCTRL authority is granted along with the system DBADM authority. ACCESSCTRL enables system DBADM to grant all authorities and privileges, except system DBADM, DATAACCESS, ACCESSCTRL authorities and privileges on security-related objects. And, of course, WITHOUT ACCESSCTRL specifies that these abilities are not granted to the system DBADM.
Specifying WITH DATAACCESS indicates that the DATAACCESS authority is granted along with the system DBADM authority. DATAACCES enables the system DBADM to access data in all user tables, views, and materialized query tables in a DB2 subsystem and enables the user to execute plans, packages, functions, and procedures. Specifying WITHOUT DATAACCESS specifies that the capability to access data is not granted to the system DBADM.

Many security regulations and compliance initiatives favor prohibiting high-level authorities, such as SYSADM and DBADM, being conferred with data access privileges. Keeping administrative and data access separate is another control designed to protect user data. 

DB2 V10 also introduces the ability to grant the SQLADM privilege for DBAs who work as SQL performance specialists. Some organizations delineate job responsibilities into granular roles, such as recovery DBA or SQL performance tuner.

The SQLADM privilege can be granted to authids and roles. An agent with SQLADM authority can perform SQL and SQL performance management-related actions without requiring any additional privileges.

SQLADM authority includes the capability to perform the following:
  •  Issue the DESCRIBE TABLE statement.
  •  Execute the EXPLAIN statement with any of the following options: PLAN, ALL
  • STMTCACHE ALL, STMTID, STMTTOKEN, and MONITORED STMTS.
  •  Execute the PREPARE statement.
  •  Explain dynamic SQL statements that execute under the special register CURRENT EXPLAIN MODE, when CURRENT EXPLAIN MODE = EXPLAIN.
  •  Issue BINDs specifying EXPLAIN(ONLY) or SQLERROR(CHECK).
  •  Issue START and STOP commands.
  •  Issue the DISPLAY PROFILE command.
  •  Execute the RUNSTATS and MODIFY STATISTICS utilities for any database.
  •  Obtain appropriate IFCID data using the MONITOR2 privilege. 

Thursday, December 20, 2012

Seasons Greetings!

Just a short post today to wish all of my readers a very happy holiday season and to let you know that I will not be posting anything new between now and the end of the year... but be sure to check back again in 2013 as I continue to write about DB2 and mainframe issues that impact us all!


See you all next year!

Monday, November 26, 2012

SQL Coding Guidelines: The Basics


When you are writing your SQL statements to access DB2 data be sure to follow the subsequent guidelines for coding SQL for performance. These are certain very simple, yet important rules to follow when writing your SQL statements. Of course, SQL performance is a complex topic and to understand every nuance of how SQL performs can take a lifetime. That said, adhering to the following simple rules puts you on the right track to achieving high-performing DB2 applications.


1)  The first rule is to always provide only the exact columns that you need to retrieve in the SELECT-list of each SQL SELECT statement. Another way of stating this is “do not use SELECT *”. The shorthand SELECT * means retrieve all columns from the table(s) being accessed. This is fine for quick and dirty queries but is bad practice for inclusion in application programs because:
  • DB2 tables may need to be changed in the future to include additional columns. SELECT * will retrieve those new columns, too, and your program may not be capable of handling the additional data without requiring time-consuming changes.
  • DB2 will consume additional resources for every column that requested to be returned. If the program does not need the data, it should not ask for it. Even if the program needs every column, it is better to explicitly ask for each column by name in the SQL statement for clarity and to avoid the previous pitfall.

2)  Do not ask for what you already know. This may sound simplistic, but most programmers violate this rule at one time or another. For a typical example, consider what is wrong with the following SQL statement:

    SELECT   EMPNO, LASTNAME, SALARY
    FROM      EMP
    WHERE   EMPNO = '000010';

Give up? The problem is that EMPNO is included in the SELECT-list. You already know that EMPNO will be equal to the value '000010' because that is what the WHERE clause tells DB2 to do. But with EMPNO listed in the WHERE clause DB2 will dutifully retrieve that column too. This causes additional overhead to be incurred thereby degrading performance.

3)  Use the WHERE clause to filter data in the SQL instead of bringing it all into your program to filter. This too is a common rookie mistake. It is much better for DB2 to filter the data before returning it to your program. This is so because DB2 uses additional I/O and CPU resources to obtain each row of data. The fewer rows passed to your program, the more efficient your SQL will be. So, the following SQL:

    SELECT   EMPNO, LASTNAME, SALARY
    FROM      EMP
    WHERE   SALARY > 50000.00;

Is better than simply reading all of the data without the WHERE clause and then checking each row to see if the SALARY is greater than 50000.00 in your program.

These rules, though, are not the be-all, end-all of SQL performance tuning – not by a long shot. Additional, in-depth tuning may be required. But following the above rules will ensure that you are not making “rookie” mistakes that can kill application performance.