Just a short blog post to let everybody know that I will be in Orlando next week for the IDUG DB2 Tech Conference. I'll be delivering an education seminar (DB2 Developer's Guide Comes Alive!) on Monday and giving two presentations on Thursday...
On Tuesday and Wednesday I'll be roaming around the conference, attending sessions, and looking to say hello to old friends and colleagues.
Hope to see you there!
IDUG NA 2013
Friday, April 26, 2013
Thursday, April 25, 2013
DB2 Locking, Part 4: Page and Row Locks
In the first three installments of this series on DB2 locking we have looked ata broad overview of locking (part 1), table and table space locks (part 2) and the difference between locks and latches (part 3). Today we will move ahead and discuss page and row locking.
Page Locking
The types of page locks that
DB2 can take are outlined in Table 1. S-locks allow data to be read concurrently but not modified. With
an X-lock, data on a page can be
modified (with INSERT, UPDATE, or DELETE), but concurrent access is not allowed. U-locks enable X-locks to be queued, whereas S-locks exist on data that must
be modified.
Table 1. Page Locks
As with table space locks,
concurrent page locks can be acquired but only with compatible page locks. The
compatibility matrix for page locks is shown in Table 2.
Table 2. Page Lock Compatibility Matrix
When are these page locks
taken? Page locks can be acquired only under the following conditions:
- The DDL for the object requesting a lock specifies LOCKSIZE PAGE or LOCKSIZE ANY.
- If LOCKSIZE ANY was specified, the NUMLKTS threshold or the table space LOCKMAX specification must not have been exceeded. You learn more about these topics later in this section.
If all these factors are met,
page locking progresses as outlined in Table 3. The type of processing in
the left column causes the indicated page lock to be acquired for the scope of
pages identified in the right column. DB2 holds each page lock until it is
released as specified in the ISOLATION level of the plan requesting the particular lock. Page locks can be promoted from one type of lock to another based
on the type of processing that is occurring. A program can FETCH a row using a cursor with the FOR UPDATE OF clause, causing a U-lock to be
acquired on that row’s page. Later, the program can modify that row, causing
the U-lock to be promoted to an X-lock.
Table 3. How Page Locks Are Acquired
Row
Locks
The smallest piece of DB2 data
that you can lock is the individual row. The types of row locks that DB2 can
take are similar to the types of page locks that it can take. Refer back to Table 1 and simply replace Page with Row. So row locks act like page locks, only on a smaller granularity (that is, on rows instead of pages).
S-locks allow data to be read
concurrently but not modified. With an X-lock, you can modify data in that row (using INSERT, UPDATE, MERGE, or DELETE), but concurrent access is
not allowed. U-locks enable X-locks to be queued, whereas S-locks exist on data that must be modified.
Once again, concurrent row
locks can be acquired but only with compatible row locks. Table 2 works the same way for row locks as it does for page locks.
When are these row locks
taken? Row locks can be acquired when the DDL for the object requesting a lock
specifies LOCKSIZE ROW. (Although it is theoretically possible for LOCKSIZE ANY to choose row locks, in
practice I have yet to see this happen.) Again, we can use an earlier Table (Table 3) replacing the word page with the word row to see how row locking progresses. The type of processing in the left column causes the indicated row
lock to be acquired for the scope of rows identified in the right column. A row
lock is held until it is released as specified by the ISOLATION level of the plan requesting
the particular lock.
Row locks can be promoted from one type of lock to another based
on the type of processing that is occurring. A program can FETCH a row using a cursor with the FOR UPDATE OF clause, causing a U-lock to be acquired on that
row. Later, the program can modify that row, causing the U-lock to be promoted to an X-lock.
Page
Locks Versus Row Locks
The answer to the question of
whether to use page locks or row locks is, of course, “It depends!” The nature
of your specific data and applications determine whether page or row locks are
most applicable.
The resources required to
acquire, maintain, and release a row lock are just about the same as the
resources required for a page lock. Therefore, the number of rows per page must
be factored into the row-versus-page locking decision. The more rows per page,
the more resources row locking will consume. For example, a table space with a
single table that houses 25 rows per page can consume as much as 25 times more
resources for locking if row locks are chosen over page locks. Of course, this
estimate is very rough, and other factors (such as lock avoidance) can reduce
the number of locks acquired, and thereby reduce the overhead associated with
row locking. However, locking a row-at-a-time instead of a page-at-a-time can
reduce contention. Row locking almost always consumes more resources than page
locking. Likewise, if two applications running concurrently access the same
data in different orders, row locking might actually decrease concurrent data
access.
You must therefore ask these
questions:
- What is the nature of the applications that access the objects in question? Of course, the answer to this question differs not only from organization to organization, but also from application to application within the same organization.
- Which is more important, reducing the resources required to execute an application or increasing data availability? The answer to this question will depend upon the priorities set by your organization and any application teams accessing the data.
As a general rule of thumb,
favor specifying LOCKSIZE PAGE, as page locking is generally the most practical locking strategy
for most applications. If you’re experiencing severe contention problems on a
table space that is currently using LOCKSIZE
PAGE, consider changing to LOCKSIZE ROW and gauging the impact on
performance, resource consumption, and concurrent data access. Alternatively,
you also might choose to specify LOCKSIZE
ANY and let DB2 choose the type of locking to be
performed.
Note
Note: A possible alternative to row locking is to specify MAXROWS 1 for the table space and use LOCKSIZE PAGE (or LOCKSIZE ANY), instead of LOCKSIZE ROW.
Monday, April 22, 2013
DB2 Locking, Part 3: Locks Versus Latches
So far in this series on DB2 locking we have offered up a broad overview of what locking is and then delved into the world of table and table space locks. In this short entry, before we tackles page and row locks, we are going to look at the difference between a lock and a latch.
A true lock is handled by DB2
using the IRLM. The IRLM, or internal resource lock manager, is both a separate subsystem and an integral component of DB2. As its name implies, it manages locks for DB2.
However, whenever doing so is practical, DB2 can lock
resources without going to the IRLM. This type of lock is called a latch. True locks are always set in
the IRLM. Latches, by contrast, are set internally by DB2, without going to the
IRLM.
When a latch is taken instead
of a lock, it is handled in the Buffer Manager by internal DB2 code; so the
cross-memory service calls to the IRLM are eliminated. Latches are usually held
only briefly—for a shorter duration than locks. Also, a latch requires about
one-third the number of instructions as a lock. Therefore, latches are more
efficient than locks because they avoid the overhead associated with calling an
external address space. Latches are used when a resource serialization
situation is required for a short time. Both latches and locks guarantee data
integrity.
In subsequent blog entries, any usage of the term lock
generically, refers to both locks and latches.
Thursday, April 18, 2013
DB2 Locking, Part 2: Table Space and Table Locks
Today's post is the second in our DB2 locking series and it covers the topic of table space and table locks.
Table Space Locks
A table space lock is acquired when a DB2 table or index is accessed. Note that I said accessed, not updated. The table space is locked even when simple read-only access is occurring.
Refer to to Table 1 below for a listing of the types of table space locks that can be acquired during the execution of an SQL statement. Every table space lock implies two types of access: the access acquired by the lock requester and the access allowed to other subsequent, concurrent processes.
Table 1. Table Space Locks
(click on image for larger view)
When an SQL statement is issued and first accesses data, it takes an intent lock on the table space. Later in the process, actual S-, U-, or X-locks are taken. The intent locks (IS, IX, and SIX) enable programs to wait for the required S-, U-, or X-lock that needs to be taken until other processes have released competing locks.
The type of table space lock used by DB2 during processing is contingent on several factors, including the table space LOCKSIZE specified in the DDL, the bind parameters chosen for the plan being run, and the type of processing requested. Table 2 provides a synopsis of the initial table space locks acquired under certain conditions.
Table 2. How Table Space Locks Are Acquired
(click on image for larger view)
A table space U-lock indicates intent to update, but an update has not occurred. This is caused by using a cursor with the FOR UPDATE OF clause. A U-lock is non-exclusive because it can be taken while tasks have S-locks on the same table space. More information on table space lock compatibility follows in Table 3.
An additional consideration is that table space locks are usually taken in combination with table and page locks, but they can be used on their own. When you specify the LOCKSIZE TABLESPACE DDL parameter, table space locks alone are used as the locking mechanism for the data in that table space. This way, concurrent access is limited and concurrent update processing is eliminated.
Similar in function to the LOCKSIZE DDL parameter is the LOCK TABLE statement. The LOCK TABLE statement requests an immediate lock on the specified table. The LOCK TABLE statement has two forms—one to request a share lock and one to request an exclusive lock.
LOCK TABLE table_name IN SHARE MODE;
LOCK TABLE table_name IN EXCLUSIVE MODE;
You also can issue LOCK TABLE against a specific partition, for example:
LOCK TABLE table-name
PARTITION integer IN SHARE MODE;
A locking scheme is not effective unless multiple processes can secure different types of locks on the same resource concurrently. With DB2 locking, some types of table space locks can be acquired concurrently by discrete processes. Two locks that can be acquired concurrently on the same resource are said to be compatible with one another.
Refer to Table 3 for a breakdown of DB2 table space lock compatibility. A Yes in the matrix indicates that the two locks are compatible and can be acquired by distinct processes on the same table space concurrently. A No indicates that the two locks are incompatible. In general, two locks cannot be taken concurrently if they allow concurrent processes to negatively affect the integrity of data in the table space.
Table 3. Table Space Lock Compatability
(click on image for larger view)
Table Locks
When segmented table spaces are involved, DB2 can use table locks . Table locks are always associated with a corresponding table space lock.
The same types of locks are used for table locks as are used for table space locks. S, U, X, IS, IX, and SIX table locks can be acquired by DB2 processes when data in segmented table spaces is accessed. Table 1 describes the options available to DB2 for table locking. The compatibility chart in Table 3 applies to table locks as well as table space locks.
For a table lock to be acquired, an IS-lock must first be acquired on the segmented table space in which the table exists. The type of table lock to be taken depends on the LOCKSIZE specified in the DDL, the bind parameters chosen for the plan being run, and the type of processing requested. Table 4 (below) is a modified version of Table 2 (shown earlier), depicting the initial types of table spaces and table locks acquired given a certain set of conditions. Table locks are not acquired when the LOCKSIZE TABLESPACE parameter is used.
Table 4. How Table Locks Are Acquired
(click on image for larger view)
Summary
And so concludes Part 2 of our multi-part series on DB2 locking. Be sure to keep watching this blog for additional postings in this series!
Monday, April 15, 2013
DB2 Locking, Part 1: An Overview
DB2 automatically guarantees the integrity of data by enforcing
several locking strategies. These strategies permit multiple users from
multiple environments to access and modify data concurrently. Basically, DB2 combines the following
strategies to implement an overall locking strategy:
• Table and table space locking
• IRLM page and row locking
• Internal page and row latching
• Claims and drains to achieve partition independence
• Checking commit log sequence numbers (CLSN) and PUNC bits to achieve
lock avoidance
• Global locking through the coupling facility in a data sharing
environment
But what exactly is locking? How
does DB2 utilize these strategies to lock pages and guarantee data integrity?
Why does DB2 have to lock data before it can process it? What is the difference
between a lock and a latch? How can DB2 provide data integrity while operating
on separate partitions concurrently? Finally, how can DB2 avoid locks and still
guarantee data integrity?
Today's blog post will offer a high level introduction to DB2 locking, why it is needed, as well as the benefits and drawbacks that it introduces to using DB2 databases.
Anyone accustomed to
application programming when access to a database is required understands the
potential for concurrency problems. When one application program tries to read
data that is in the process of being changed by another, the DBMS must forbid
access until the modification is complete to ensure data integrity. Most DBMS
products, DB2 included, use a locking mechanism for all data items being
changed. Therefore, when one task is updating data on a page, another task
cannot access data (read or update) on that same page until the data
modification is complete and committed.
When multiple users can access
and update the same data at the same time, a locking mechanism is required.
This mechanism must be capable of differentiating between stable data and
uncertain data. Stable data has been successfully committed and is not involved in an update
in a current unit of work. Uncertain data is currently involved in an operation that could modify its
contents.
Consider two DB2 application programs. If program #1 updates a piece
of data on page 1, you must ensure that program #2 cannot access the data until
program #1 commits the unit of work. Otherwise, a loss of integrity could
result. Without a locking mechanism, the following sequence of events would be
possible:
- Program #1 retrieves a row from DSN81010.EMP for EMPNO ‘000010’.
- Program #1 issues an update statement to change that employee’s salary to 55000.
- Program #2 retrieves the DSN81010.EMP row for EMPNO ‘000010’. Because the change was not committed, the old value for the salary, 52750, is retrieved.
- Program #1 commits the change, causing the salary to be 55000.
- Program #2 changes a value in a different column and commits the change.
- The value for salary is now back to 52750, negating the change made by program #1.
The DBMS avoids this situation
by using a locking mechanism. DB2 supports locking at four levels, or granularities: table space-, table-, page-, and row-level locking. DB2 also
provides LOB locking for large objects (BLOBs, CLOBs, and DBCLOBs).
Locks can be taken at any
level in the locking hierarchy without taking a lock at the lower level.
However, locks cannot be taken at the lower levels without a compatible
higher-level lock also being taken. For example, you can take a table space
lock without taking any other lock, but you cannot take a page lock without
first securing a table space-level lock (and a table lock as well if the page
is part of a table in a segmented table space).
Additionally, a page lock does not have to be taken before a
row lock is taken. Your locking strategy requires an “either/or” type of choice
by table space: either row locking or page locking. An in-depth discussion on
the merits of both are beyond the scope of today's blog post, but will be covered in future posts.
Both page locks and row locks can escalate to a table level and then to a table space level for segmented tables
or straight to a table space level for partitioned table spaces. A table or
table space cannot have both page locks and row locks held against it at the
same time.
Many modes of locking are
supported by DB2, but they can be divided into two types:
• Locks to enable the reading of data
• Locks to enable the updating of data
But remember, we are still talking at a very high, and somewhat simplistic level. DB2 uses varieties of these two types of locks to indicate the type of
locking required.
Locks
Versus Latches
A true lock is handled by DB2
using the IRLM. However, whenever doing so is practical, DB2 tries to lock
resources without going to the IRLM. This type of lock is called a latch. Whereas true locks are always set in
the IRLM, latches are set internally by DB2, without going to the
IRLM.
When a latch is taken instead
of a lock, it is handled in the Buffer Manager by internal DB2 code; so the
cross-memory service calls to the IRLM are eliminated. Latches are usually held
only briefly—for a shorter duration than locks. Also, a latch requires about
one-third the number of instructions as a lock. Therefore, latches are more
efficient than locks because they avoid the overhead associated with calling an
external address space. Latches are used when a resource serialization
situation is required for a short time. Both latches and locks guarantee data
integrity. In subsequent sections, any usage of the term lock
generically, refers to both locks and latches.
Lock
Duration
Before you learn about the
various types of locks that can be acquired by DB2, you should understand lock
duration, which refers to the length of time that a lock is maintained. The duration of a lock is
based on the BIND options chosen for the program requesting locks. Locks can be
acquired either immediately when the program is requested to be run or
iteratively as needed during the execution of the program. Locks can be
released when the plan is terminated or when they are no longer required for a
unit of work.
The BIND parameters affecting DB2 locking
are summarized
below:
The BIND parameters that impact table space locks are the ACQUIRE and RELEASE parameters.
- ACQUIRE(ALLOCATE | USE):
The ALLOCATE
option specifies that locks will be acquired when the plan is allocated, which
normally occurs when the first SQL statement is issued. The USE option indicates that locks
will be acquired only as they are required, SQL statement by SQL statement. As of DB2 10, ACQUIRE(ALLOCATE) is no longer supported when binding or rebinding plans.
- RELEASE(DEALLOCATE | COMMIT): When you specify DEALLOCATE for a program, locks are not released until the plan is terminated. When you specify COMMIT, table space locks are released when a COMMIT is issued.
The BIND parameter affecting page and row locks is the ISOLATION parameter. There are four choices for isolation level:
- ISOLATION(CS), or Cursor Stability, acquires and releases page locks as pages are read and processed. CS provides the greatest level of concurrency at the expense of potentially different data being returned by the same cursor if it is processed twice during the same unit of work.
- ISOLATION(RR), or Repeatable Read, holds page and row locks until a COMMIT point; no other program can modify the data. If data is accessed twice during the unit of work, the same exact data will be returned.
- ISOLATION(RS), or Read Stability, holds page and row locks until a COMMIT point, but other programs can INSERT new data. If data is accessed twice during the unit of work, new rows may be returned, but old rows will not have changed.
- ISOLATION(UR), or Uncommitted Read, is also known as dirty read processing. UR avoids locking altogether, so data can be read that never actually exists in the database.
Regardless of the ISOLATION level chosen, all page locks
are released when a COMMIT is encountered.
I think that is enough for this first installment on DB2 locking... come back soon and we will expand on table space locks in the next installment of this blog series on DB2 locking...
Monday, March 18, 2013
DB2 Table Editors
In today's blog post I want to briefly discuss one of the more useful, yet often ignored, DB2 tools available on the market -- Table Editors...
Typically, the only method
of updating DB2 data (indeed, any data stored in a relational database) is with SQL data manipulation language statements
DELETE, INSERT, and UPDATE (or with a database load). Because these SQL
statements operate on data a set at a time, multiple rows -- or even all of the
rows -- can be affected by a single SQL statement. Coding SQL statements for every
data modification required during the application development and testing phase
can be time-consuming and error-prone.
A table editing
tool can reduce the time needed to make simple data modifications by providing
full-screen edit capability for database tables. The user specifies the table
to edit and is placed into an edit session. The data is presented to the user as
a series of rows, with the columns separated by spaces. A header line indicates
the column names. The data can be scrolled up and down as well as left and
right. To change data, the user simply types over the current data.
This type of
tool is ideal for supporting the application development process. A programmer
can make quick changes without coding SQL. Also, if properly implemented, a
table editor can reduce the number of erroneous data modifications made by
beginning SQL users.
When a table
editor is used, all columns are available for update. Thus, if a table editor
is used to change production data, a simple mis-keying can cause unwanted
updates. Native SQL should be used if you must ensure that only certain columns
are updated.
One final note:
Tested SQL statements and application programs are characterized by their
planned nature. These modification requests are well thought out and tested.
This is not true for changes implemented through a table editor, so always exercise caution when using a table editor.
Examples of DB2 Table Editors include:
- BMC Catalog Manager (which includes an integrated table editor)
- CA RC/Update (which includes an integrated table editor)
- IBM DB2 Table Editor
- RazorSQL DB2 Table Editor
Thursday, February 14, 2013
The Importance of Database Design
It is impossible to over-estimate the importance of database
design on the effectiveness and efficiency of application systems. The first
step, of course, is to create a logical data model of the business information
that must be stored in, and accessed through, the database. This is a
non-trivial task, but it is not the focus of today’s blog post, which is
implementing optimal and well-designed database schemata.
Oftentimes the database design does not get the attention it
deserves. This can occur for numerous reasons such as:
- Insufficient specifications and/or poor logical data modeling
- Not enough time in the development schedule
- Too many changes occurring throughout the development cycle
- Database design assigned to, or performed by novices
And even when the database design is being performed by
experienced professionals with sufficient time and a solid logical model, it is
very easy for design flaws to creep into the database. This is especially the
case for larger and more complex databases required to support mission critical
applications.
Of course, the first step in constructing a physical database
should be transforming the logical design using best practices. The
transformation consists of the following:
- Transforming entities into tables
- Transforming attributes into columns
- Transforming domains into data types and constraints
- Transforming relationships into primary and foreign keys
But a simple transformation will not result in a complete and
correct physical database design – it is merely the first step. And design flaws can be introduced even
during such a transformation.
The process of normalizing your data should be conducted during
the logical design phase, but sometimes mistakes are made during the logical
modeling process. In simple terms, normalization
is the process of identifying the one best place where each fact belongs. A
normalized design minimizes data redundancy and optimizes data structures by
systematically and properly placing data elements into the appropriate
groupings.
- First normal form (1NF) eliminates repeating groups and non-atomic data from an entity.
- Second normal form (2NF) ensures that all the attributes of each entity are dependent on the primary key.
- Third normal form (3NF) ensures that no relationships between attributes exist within an entity.
Although normalization is a logical process and does not
necessarily dictate physical database design it is good practice to implement
normalized physical databases – especially with today’s powerful hardware and
database systems. A normalized data model will ensure that each entity is well
formed and that each attribute is assigned to the proper entity. And
denormalized data structures should always be documented.
A database that is not normalized can be difficult to modify
(because facts may be strewn about within the database) and prone to data
quality issues (again because one fact may be in more than one place). Failing
to achieve appropriate normalization is a significant design flaw.
Design flaws can make it difficult to access a database in an
efficient manner. Something as simple as creating columns with inconsistent
data type and length across tables can introduce inefficiency. For example,
perhaps JOB_CODE is defined as CHAR(10) in one table and CHAR(15) in another.
If these columns need to be compared or joined, additional work is required by
the DBMS to make the columns comparable.
It is easy to see how a database design issue can make applications
inefficient no matter how adept the developers are. And this is just one type
of design flaw.
There are numerous other types of design flaws that can
negatively impact the usability and efficiency of a database implementation
(and the applications that use it). Improperly defined constraints
(referential, check, uniqueness) can cause data quality problems. Improper
indexing (to support constraints and other physical structures) can cause a
database to operate inefficiently or even cease operating altogether. In DB2
for z/OS, failing to explicitly name a clustering index causes DB2 to default
cluster sequence to the oldest index. Changing index structures, which
sometimes require dropping and recreating the index, can cause the data to be
ordered in a completely different physical sequence, thereby impacting
performance.
Some database design flaws are more subtle. What happens if you
create two incompatible check constraints? For example, consider the following:
CHECK (empno < 100)
CHECK (empno >= 101)
No data can be added to the table because no number is less than
100 and greater than or equal to 101. Of course, this is an extreme example to
highlight the problem. Another situation can occur if the default value is not
one of the values permitted by the check constraint, for example:
emp_type CHAR(8) DEFAULT 'new'
CHECK (emp_type IN ('temp',
'fulltime',
'contract')), ...
'fulltime',
'contract')), ...
Cycles are another potential problem that can be created in a
physical database schema. A cycle is a referential path that connects a table
to itself. This can occur when multiple tables are related back to each other
and it looks like a loop when diagrammed. DB2 forbids a table from being
delete-connected to itself in a cycle. A table is delete-connected to another
table if it is a dependent of a table specified with a CASCADE delete rule.
Nested triggers can also create problems. A trigger is fired by
an INSERT, UPDATE, or DELETE operation. However, a trigger can also contain
INSERT, UPDATE, or DELETE logic within itself. Therefore, a trigger is fired by
a data modification, but can also cause another data modification, thereby
firing yet another trigger. When a trigger contains INSERT, UPDATE, and/or
DELETE logic (including MERGE statements), the trigger is said to be a nested
trigger. If referential integrity is combined with triggers, additional
cascading updates and/or deletes can occur.
DB2 restricts the number of nested triggers that can be executed
within a single firing event. If this were not done, it could be quite possible
to have triggers firing triggers ad infinitum until all of the data was removed
from an entire database. DB2 limits this cascading effect to 16 levels to
prevent endless looping. If more than 16 levels of nesting occur, the
transaction is aborted. Such nesting may restrict certain types of data
modification from happening at all because the number of nested calls will
always be exceeded. So nesting levels need to be controlled and managed in the
database schema to avoid problems.
And this is by no means a comprehensive list of database design
flaws that can cause real problems for application developers and end users.
Getting the database design correct is imperative if you wish to have any hope
of assuring application performance.
So how can you go about examining the correctness of your
database structures? The best approach is an automated one such as that
provided by Database Examiner, a product offered by DBE Software that performs a
comprehensive validation of a database schema. The in-depth technology incorporated
into Database Examiner allows you to review your database schema, identify
problems, and remediate the issues to ensure a quality database implementation
offering peak performance.
Simply provide Database Examiner with the data model, DDL, or a
link to the active database and it will perform a series of comprehensive diagnostics. Database
Examiner applies the rules of relational technology to detect flaws,
inconsistencies and lack of integrity. The product applies more than 50
diagnostics that can be organized by category or severity levels. And you can
customize the diagnostics by selecting those to be executed and assigning each
one a severity level.
But Database Examiner doesn’t just highlight the flaws, it also helps to fix them be providing recommendations
and corrections for each issue it detects, including the generation of SQL DDL
scripts to remedy the flaws. Even better, Database Examiner provides multi-platform DBMS
support, including DB2 for z/OS, DB2 LUW, Oracle, Microsoft SQL Server and
Sybase.
And DBE Software is currently offering folks the ability to download
and use a scaled down “Performance Edition” of Database Examiner, free ofcharge for a limited time. Be sure to enter the promo code MULL on the download
request to get the best offer available.
Hopefully this blog entry has convinced you that database design
is important… important enough to take some time to evaluate the quality of
your existing database structures. And to take a look at automating the process
using Database Examiner (using the promo code MULL ).
Friday, February 01, 2013
A Brief Introduction to the DB2 Catalog
The system catalog, or the DB2 Catalog, offers a wealth of information about DB2. If the DB2 optimizer is the heart and soul of DB2, the DB2 Catalog is its memory. The knowledge base of every object known to DB2 is stored in the DB2 Catalog, along with the DB2 Directory and the BSDS (Bootstrap Data Set).
The tables in the DB2 Catalog collectively describe the objects and resources available to DB2. You can think of the DB2 Catalog as a metadata repository for your DB2 databases. As of Version 10, the DB2 Catalog is composed of 90 table spaces and 137 tables all in a single database named DSNDB06. These numbers have grown considerably since the early days of DB2. The DB2 Catalog consisted of 25 tables in 11 table spaces for the first version of DB2 and as recently as DB2 V8, there were only 21 table spaces and 87 tables. The following table runs down the history:
Over the course of the past couple releases, the DB2 Catalog has undergone many significant changes. For most of its life, the DB2 Catalog contained many multi-table table spaces. As of DB2 10 for z/OS, IBM made an effort to clean that up, and now only a few table spaces are in the DB2 Catalog with more than one table defined. As of V10, most of the table spaces in the DB2 Catalog are now universal table spaces. In addition, the DB2 Catalog now must be SMS-managed.
Even as many new tables have been added to the DB2 Catalog to support new features such as trusted context, XML, and access path management, some tables have been removed. The SYSPROCEDURES table, which was used to register stored procedures in earlier version of DB2, was removed as of DB2 V9. And the SYSLINKS table was removed for DB2 V10.
The SYSLINKS table was used to record the
links (or pointers) that existed in several of the older DB2 Catalog table
spaces (SYSDBASE, SYSPLAN, SYSDBAUT, SYSVIEW, SYSGROUP), as well as in the DB2
Directory (DBD01). Links were used to tie tables together hierarchically—not
unlike an IMS database—using a special type of relationship. However, links are
obsolete in DB2 as of V10.
Each DB2 Catalog table
maintains data about an aspect of the DB2 environment. In that respect, the DB2
Catalog functions as a data dictionary for DB2, supporting and maintaining data
about the DB2 environment. The DB2 Catalog records all the information required by DB2 for the
following functional areas:
- Database Objects: Storage groups, databases, table spaces, partitions, tables, auxiliary tables, columns, user-defined distinct types, views, synonyms, aliases, sequences, indexes, index keys, foreign keys, relationships, schemas, user-defined functions, stored procedures, triggers, and so on.
- Programs: Plans, packages, DBRMs, and Java/JAR information
- XML: XML Schema Repository tables
- Security: Database privileges, plan privileges, schema privileges, system privileges, table privileges, view privileges, use privileges, trusted contexts, roles, and audit policies
- Utility: Image copy data sets, REORG executions, LOAD executions, and object organization efficiency information
- Communication: How DB2 subsystems are connected for communication, data distribution, and DRDA usage
- Performance: Statistics, profiles, queries, and auto alerts
- Environmental: Control and administrative information (such as details on image copies and the dummy tables)
How does the DB2 Catalog
support data about these areas? For the most part, the tables of the DB2
Catalog cannot be modified using standard SQL data manipulation language
statements. You do not use INSERT statements, DELETE statements, or UPDATE statements (with a few exceptions) to modify these tables.
Instead, the DB2 Catalog operates as a semi-active, integrated, and non-subvertible
data dictionary. The definitions of these three adjectives follow.
First, the DB2 Catalog is
semi-active. An active dictionary is built, maintained, and used as the result
of the creation of the objects defined to the dictionary. In other words, as the
user is utilizing the intrinsic functions of the DBMS, metadata is being
accumulated and populated in the active data dictionary.
The DB2 Catalog, therefore, is
active in the sense that when standard DB2 SQL is issued, the DB2 Catalog is
either updated or accessed. All the information in the DB2 Catalog, however, is
not completely up-to-date, and some of the tables must be proactively populated
(such as SYSIBM.IPNAMES and SYSIBM.IPLIST). But, for the most part, the DB2 Catalog operates as an active data
dictionary, particularly with regard to SQL. Remember that the three types of SQL are DDL, DCL, and DML. When
DDL is issued to create DB2 objects such as databases, table spaces, and
tables, the pertinent descriptive information is automatically stored in the DB2 Catalog.
When a CREATE, DROP, or ALTER statement is issued,
information is recorded or updated in the DB2 Catalog. For example, upon
successfully issuing a CREATE TABLE statement, DB2 populates the metadata for the table into SYSTABLES and SYSCOLUMNS, as well as possibly into SYSSEQUENCES, SYSFIELDS, SYSCHECKS, and SYSCHECKDEP depending upon the exact DDL
that was issued.
The same is true for security
SQL data control language statements. The GRANT and REVOKE statements cause information to be added or removed from DB2
Catalog tables. For example, if you issue GRANT
TABLE, DB2 potentially adds metadata to SYSTABAUTH and SYSCOLAUTH.
Data manipulation language SQL
(SELECT, INSERT, UPDATE, MERGE, DELETE) statements use the DB2
Catalog to ensure that the statements accurately reference the DB2 objects being
manipulated (such as column names and data types).
Why then is the DB2 Catalog
classified as only semi-active rather than completely active? The DB2 Catalog
houses important information about the physical organization of DB2 objects.
For example, the following information is maintained in the DB2 Catalog:
- The number of rows in a given DB2 table or a given DB2 table space
- The number of distinct values in a given DB2 index
- The physical order of the rows in the table for a set of keys
This information is populated
by means of the DB2 RUNSTATS utility. A truly active data dictionary would update this
information as data is populated in the application table spaces, tables, and
indexes. Some of these statistics are now actively populated in the Real Time
Statistics table in the DB2 Catalog, making them active. But because some of
the information in the DB2 Catalog is not always completely up-to-date, it is
only a semi-active system catalog.
I also decsribed the DB2 Catalog as being integrated. The DB2 Catalog and the DB2 DBMS are inherently
bound together, neither having purpose or function without the other. The DB2
Catalog without DB2 defines nothing; DB2 without the DB2 Catalog has nothing
defined that it can operate on.
The final adjective used to
classify the DB2 Catalog is non-subvertible. This simply means that the DB2
Catalog is continually updated as DB2 is being used; the most important metadata
in the DB2 Catalog cannot be updated behind DB2’s back. Suppose that you
created a table with 20 columns. You cannot subsequently update the DB2 Catalog
to indicate that the table has 15 columns instead of 20 without using standard
DB2 data definition language SQL statements to drop and re-create the table.
An
Exception to the Rule
As with most things in life,
there are exceptions to the basic rule that the SQL data manipulation language
cannot be used to modify DB2 Catalog tables. You can modify columns (used by
the DB2 optimizer) that pertain to the physical organization of table data.
Querying the DB2 Catalog
Because the DB2 Catalog consists of DB2 tables, you can write SQL queries to easily retrieve the metadata information about your DB2 environment. You can write queries to discover all sorts of interesting and useful information about DB2 across the following broad categories:
- Navigational queries, which
help you to maneuver through the sea of DB2 objects in your DB2 subsystems
- Physical analysis queries, which depict the physical state of your
application table spaces and indexes
- Queries that aid programmers (and other analysts) in identifying the
components of DB2 packages and plans
- Application efficiency queries, which combine DB2 Catalog statistics
with the PLAN_TABLE output from EXPLAIN to identify problem queries quickly
- Authorization queries, which identify the authority implemented for
each type of DB2 security
- Historical queries, which use the DB2 Catalog HIST tables to identify and
monitor changing data patterns
- Partition statistics queries, which aid the analysis of partitioned table spaces
Summary
The DB2 Catalog is a rich source of information about your DB2 subsystem and
applications. Be sure to use it to simplify your DB2 development
and administrative efforts.
Note: This blog post was adapted from material in the sixth and latest edition of Craig's book, DB2 Developer's Guide.
Tuesday, January 15, 2013
Upcoming Webinar: Data Security in the Age of Regulatory Compliance
Webinar Title: Data Security in the Age of Regulatory Compliance
Presenter: Craig S. Mullins
Date: Wednesday, January 23
Time: 2pm Eastern / 11am Pacific
Cost: Free
Register Link: https://www1.gotomeeting.com/register/990275648
Presenter: Craig S. Mullins
Date: Wednesday, January 23
Time: 2pm Eastern / 11am Pacific
Cost: Free
Register Link: https://www1.gotomeeting.com/register/990275648
As governmental regulations expand, organizations need to deploy better controls to ensure quality data and properly protected database systems. Sarbanes-Oxley, HIPAA, BASEL II, PCI DSS and more make the news, but what do they mean in terms of your data? And what steps can be taken to ensure compliance?
Anyone who has been paying attention lately knows at least something about the large number of data breaches in the news… and their impact on business. Data breaches and the threat of lost or stolen data will continue to plague organizations until comprehensive plans are enacted to combat them. Many of these breaches have been at the database level, and more will be unless better data protection policies and procedures are enacted on operational databases.
As a result of expanded regulations and the ever-present specter of data breaches, data security has grown in importance. And that places new burdens on DBAs and data management professionals. If you are interested in learning more about this topic -- and steps you can take to ensure compliance -- be sure to register for my upcoming webinar sponsored by SoftBase Systems --> Data Security in the Age of Regulatory Compliance. This presentation will offer an overview of this new landscape focusing particularly on techniques for improving data and database security.
Topics to be discussed include:
- An Introduction to Industry and Governmental Regulations
- The Pervasiveness of Data Breaches with Techniques for Avoidance and Remediation
- Long-term Data Retention
- Database Activity Monitoring and Auditing
- Database Security and Encryption
- Test Data Management
- Data Masking
- Metadata Management
Sunday, January 13, 2013
Two New Group Privileges in DB2 10 for z/OS
DB2 10 for z/OS delivers two new group
level privileges to enable more granular and functional security support for
DB2 administrators. The system DBADM authority is for DBAs at shops looking to
minimize SYSADM usage, and SQLADM authority is for users who focus
predominantly on performance-related issues.
System
DBADM Authority is a DB2 V10 capability to better support separation of
duties. System DBADM authority can be assigned to enable a user to manage all objects
within a DB2 subsystem but without necessarily accessing data. This authority
can be granted to an authid or role. By using system DBA authority judiciously,
the need for SYSADM authority can be minimized.
So, as of DB2 V10, DBADM security can be granted at
the system level, or at a database-by-database level as in all past versions of
DB2.
Two granular options can be
set when granting system DBADM authority: ACCESSCTRL and DATAACCESS. You can specify whether the system DBADM designation is to be granted
with or without either.
Specifying WITH ACCESSCTRL indicates that the ACCESSCTRL authority is granted along
with the system DBADM authority. ACCESSCTRL enables system DBADM to grant all authorities and privileges, except system DBADM, DATAACCESS, ACCESSCTRL authorities and privileges on
security-related objects. And, of course, WITHOUT
ACCESSCTRL specifies that these
abilities are not granted to the system DBADM.
Specifying WITH DATAACCESS indicates that
the DATAACCESS
authority is granted along with the system DBADM authority. DATAACCES enables the system DBADM to access data in all user tables, views, and materialized query
tables in a DB2 subsystem and enables the user to execute plans, packages,
functions, and procedures. Specifying WITHOUT
DATAACCESS specifies that the capability
to access data is not granted to the system DBADM.
Many security regulations and
compliance initiatives favor prohibiting high-level authorities, such as SYSADM and DBADM, being conferred with data
access privileges. Keeping administrative and data access separate is another
control designed to protect user data.
DB2
V10 also introduces the ability to grant the SQLADM privilege for DBAs who work as SQL performance specialists. Some
organizations delineate job responsibilities into granular roles, such as
recovery DBA or SQL performance tuner.
The SQLADM privilege can be granted to
authids and roles. An agent with SQLADM authority can perform SQL and SQL performance management-related
actions without requiring any additional privileges.
SQLADM authority
includes the capability to perform the following:
- Issue the DESCRIBE TABLE statement.
- Execute the EXPLAIN statement with any of the following options: PLAN, ALL,
- STMTCACHE ALL, STMTID, STMTTOKEN, and MONITORED STMTS.
- Execute the PREPARE statement.
- Explain dynamic SQL statements that execute under the special register CURRENT EXPLAIN MODE, when CURRENT EXPLAIN MODE = EXPLAIN.
- Issue BINDs specifying EXPLAIN(ONLY) or SQLERROR(CHECK).
- Issue START and STOP commands.
- Issue the DISPLAY PROFILE command.
- Execute the RUNSTATS and MODIFY STATISTICS utilities for any database.
- Obtain appropriate IFCID data using the MONITOR2 privilege.
Subscribe to:
Posts (Atom)