
Monday, June 26, 2023

Top 10 Db2 Performance Tips - No. 1: Analyze Query Execution Plans

Query execution plans play a crucial role in identifying performance bottlenecks within an IBM Db2 database. When a query is executed, the database optimizer determines the most efficient way to retrieve the requested data by generating a query execution plan. It does this by parsing SQL requests and turning them into actionable commands for accessing and modifying data. 

So the query execution plan outlines the steps and operations involved in executing the query, including accessing tables, applying filters, joining data, and performing sorting or aggregations.

Understanding query execution plans is important because they reveal the inner workings of how Db2 will process the query. And they also highlight areas where performance can be improved if you understand the various ways that data can be processed by Db2. By analyzing query execution plans, you can identify potential inefficiencies and take proactive steps to optimize queries for better performance.

For example, one high-level performance tuning step that you can identify using a query execution plan is to find table (space) scans that would be better off using an index instead.

Plan Tables

But how can you review and analyze query execution plans? Perhaps the first method used by most folks is to examine the plan tables. The PLAN_TABLE is simply a specific table that the EXPLAIN command populates with information about the choices made by the Db2 Optimizer when it formulates a query execution plan. This information details the access paths chosen, the join strategies employed, and other relevant factors that impact query performance. 

As long as EXPLAIN YES is specified when you bind your program, or you explicitly issue an EXPLAIN PLAN statement, information about the query execution plan will be placed in the PLAN_TABLE.
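
For example, here is a minimal sketch of explaining a single statement and then pulling back a few of the more commonly examined PLAN_TABLE columns. The QUERYNO value and the sample query are arbitrary, and the exact set of PLAN_TABLE columns varies by Db2 version:

EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT EMPNO, LASTNAME
  FROM   DSN8810.EMP
  WHERE  WORKDEPT = 'D11';

SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER BY QBLOCKNO, PLANNO;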

So, after the optimizer creates the access paths and populates the PLAN_TABLE with data representing those access paths, you will need to examine the results to determine if everything is satisfactory. Many questions can be answered by analyzing the results of EXPLAIN – questions like:

  • If we are joining, what type of join is used (NLJ, MS, Hybrid)?
  • Was an index used, and if so, how many columns matched?
  • Are we doing a scan, and if so, what type of scan (full or page range)?
  • Is prefetch being used, and if so, what type (sequential or list)?
  • Was a hint used?
  • Was parallelism used, and if so, what degree and type?
  • Was a sort required, and if so, why (Join, Unique, Group By, Order By)?
  • What type of locking is required?

And that just covers the main PLAN_TABLE. There are actually many other explain tables available that, if created, will be populated by EXPLAIN. But that is beyond the scope of this high-level tip, which is already becoming rather lengthy.

Other Tools

There are also several additional tools and techniques that can be used to analyze Db2 query execution plans. One commonly used tool is Visual Explain, which provides a graphical representation of the query execution plan. Many tuners prefer a visual approach to reviewing and tuning access paths. Visual Explain allows users to visualize the flow of data, understand the order in which operations are performed, and identify potential performance bottlenecks.

Visual Explain is a feature of the free IBM Data Studio tool, as well as the new Db2 Administration Foundation offering. There are other tools, from other vendors, that also provide a visual Explain capability.

Speaking of which, another technique to analyze query execution plans is to use database monitoring tools. Such tools capture real-time performance data, including query execution plans, and provide useful performance metrics that can be reviewed and acted upon as needed. 

By monitoring the execution plans of frequently executed queries, it is possible to detect patterns, identify resource-intensive operations, and make informed decisions about query optimization.

A Bit of Guidance

It is wise to always Bind your production programs (that use static SQL) specifying EXPLAIN YES so that you can see the access paths chosen by Db2. This way, you can either check to make sure that no access paths have changed before proceeding to implement changes in production, or at least you have the information available should problems arise after you move code into production.

It can be difficult to determine if any access paths have changed, because you will need the old execution plans to compare to the new ones. For this reason, it is a good idea to keep several versions of access path information for each program. 

Additionally, sometimes the program has changed between EXPLAIN executions, which can make it challenging to find the exact SQL statements to compare. But you are still better off with older versions than without any historical data.
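
For illustration, assuming EXPLAIN output is retained in the PLAN_TABLE (or copied off to a history table) at each bind, a query along these lines can be used to line up the access path details for a program across bind times; the program name here is hypothetical:

SELECT PROGNAME, QUERYNO, QBLOCKNO, PLANNO, BIND_TIME, METHOD,
       TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM   PLAN_TABLE
WHERE  PROGNAME = 'PAYROLL1'
ORDER BY QUERYNO, BIND_TIME DESC, QBLOCKNO, PLANNO;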

To optimize query execution plans for better performance, it is essential to focus on areas such as index usage, join strategies, and predicate selection. By strategically creating and maintaining appropriate indexes, rewriting complex queries, and refining predicate conditions, it is possible to influence the optimizer's decisions and improve query performance.

In Conclusion 

Analyzing query execution plans is a vital step in optimizing the performance of IBM Db2 applications. By using tools like EXPLAIN and Visual Explain, PLAN_TABLE data, and monitoring tools, it is possible to gain valuable insights into query execution, identify potential bottlenecks, and enhance overall performance. 

By proactively analyzing and optimizing query execution plans, organizations can achieve faster response times, improved scalability, and a more efficient database environment.

Be sure to check back here next week when we will address database design as a performance issue.

Friday, September 04, 2015

Influencing the DB2 Optimizer: Part 7 - Miscellaneous Additional Considerations

In this 7th, and final installment of this series on influencing the DB2 optimizer's access path choices, we will take a look at a couple of additional things to consider as you work toward improving your SQL performance.

Favor Optimization Hints Over Updating the DB2 Catalog  

Optimization hints to influence access paths are less intrusive and easier to implement than changing data in the DB2 Catalog. However, that does not mean that you should use optimization hints all the time! Do not use optimization hints as a crutch to arrive at a specific access path. Optimization hints are best used when an access path changes and you want to go back to a previous, efficient access path.

Limit Ordering to Avoid Scanning  

The optimizer is more likely to choose an index scan when ordering is important (ORDER BY, GROUP BY, or DISTINCT) and the index is clustered by the columns to be sorted.
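
For example, if queries frequently need data ordered by department, clustering an index on that column makes it more likely that Db2 can satisfy the ORDER BY without an extra sort. A minimal sketch, using the sample tables and a made-up index name:

CREATE INDEX XEMP_WORKDEPT
  ON DSN8810.EMP (WORKDEPT)
  CLUSTER;

SELECT EMPNO, LASTNAME, WORKDEPT
FROM   DSN8810.EMP
ORDER BY WORKDEPT;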

Maximize Buffers and Minimize Data Access  

If the inner table fits in 2% of the buffer pool, nested loop join should be favored. Therefore, to increase the chances of nested loop joins, increase the size of the buffer pool (or decrease the size of the inner table, if possible).
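
For example, buffer pool sizing is adjusted with the ALTER BUFFERPOOL command; a sketch follows, with an arbitrary pool and size (the appropriate VPSIZE depends entirely on your environment):

-ALTER BUFFERPOOL(BP2) VPSIZE(40000)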

Consider Deleting Non-uniform Distribution Statistics

Sometimes non-uniform distribution statistics can cause dynamic SQL statements to fluctuate dramatically in terms of how they perform. To decrease these wild fluctuations, consider removing the non-uniform distribution statistics from the DB2 Catalog.

Although dynamic SQL makes the best use of these statistics, the overall performance of some applications that heavily use dynamic SQL can suffer. The optimizer might choose a different access path for the same dynamic SQL statement, depending on the values supplied to the predicates. In theory, this should be the desired goal. In practice, however, the results might be unexpected. For example, consider the following dynamic SQL statement:

SELECT   EMPNO, LASTNAME
FROM     DSN81010.EMP
WHERE    WORKDEPT = ?

The access path might change depending on the value of WORKDEPT because the optimizer calculates different filter factors for each value, based on the distribution statistics. As the number of occurrences of distribution statistics increases, the filter factor decreases. This makes DB2 think that fewer rows will be returned, which increases the chance that an index will be used and affects the choice of inner and outer tables for joins.

These statistics are stored in the SYSIBM.SYSCOLDIST and SYSIBM.SYSCOLDISTSTATS tables and can be removed using SQL DELETE statements.

This suggested guideline does not mean that you should always delete the non-uniform distribution statistics. My advice is quite to the contrary. When using dynamic SQL, allow DB2 the chance to use these statistics. Delete these statistics only when performance is unacceptable. (They can always be repopulated later using RUNSTATS.)
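
For illustration only, here is a hedged sketch of what removing the distribution statistics for a single column might look like; the table and column are from the sample database, and you should verify the catalog table columns for your DB2 version before attempting anything similar:

DELETE FROM SYSIBM.SYSCOLDIST
WHERE  TBOWNER = 'DSN8810'
AND    TBNAME  = 'EMP'
AND    NAME    = 'WORKDEPT';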

Collect More Than Just the Top Ten Non-uniform Distribution Statistics   

If non-uniform distribution impacts more than just the top ten most frequently occurring values, you should use the FREQVAL option of RUNSTATS to capture more than 10 values. Capture only as many as will prove to be useful for optimizing queries against the non-uniformly distributed data.
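
As a sketch, a RUNSTATS invocation that collects the 15 most frequently occurring values for a column group might look something like this; the database, table space, table, and column names are the sample objects, and the utility syntax should be verified for your DB2 release:

RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
  TABLE(DSN8810.EMP)
  COLGROUP(WORKDEPT) FREQVAL COUNT 15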

DB2 Referential Integrity Use

Referential integrity (RI) is the implementation of constraints between tables so that values from one table (the parent) control the values in another (the dependent, or child). A referential constraint between a parent table and a dependent table is defined by a relationship between the columns of the tables. The parent table’s primary key columns control the values permissible in the dependent table’s foreign key columns. For example, in the sample table, DSN8810.EMP, the WORKDEPT column (the foreign key) must reference a valid department as defined in the DSN8810.DEPT table’s DEPTNO column (the primary key).

You have two options for implementing RI at your disposal: declarative and application. Declarative constraints provide DB2-enforced referential integrity and are specified by DDL options. All modifications, whether embedded in an application program or ad hoc, must comply with the referential constraints. Favor using declarative RI as DB2 will then be aware of the relationship and can use that information during access path optimization.

Application-enforced referential integrity is coded into application programs. Every program that can update referentially-constrained tables must contain logic to enforce the referential integrity. This type of RI is not applicable to ad hoc updates.

With DB2-enforced RI, CPU use is reduced because the Data Manager component of DB2 performs DB2-enforced RI checking, whereas the RDS component of DB2 performs application-enforced RI checking. Additionally, rows accessed for RI checking when using application-enforced RI must be passed back to the application from DB2. DB2-enforced RI does not require this passing of data, further reducing CPU time.

In addition, DB2-enforced RI uses an index (if one is available) when enforcing the referential constraint. In application-enforced RI, index use is based on the SQL used by each program to enforce the constraint.

If you must use application RI instead of declarative RI, be sure to also define referential constraints with the NOT ENFORCED keyword. In that case, the constraints will not be enforced by DB2, but will be documented in the DDL. And it gives DB2 additional information that can be used by the Optimizer for query optimization.
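
As a sketch of that approach, an informational (NOT ENFORCED) referential constraint could be added with DDL along these lines; the constraint name is made up, and the tables are the samples referenced above:

ALTER TABLE DSN8810.EMP
  ADD CONSTRAINT EMP_DEPT_FK
      FOREIGN KEY (WORKDEPT)
      REFERENCES DSN8810.DEPT (DEPTNO)
      NOT ENFORCED;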

Summary

Hopefully this 7-part series on influencing DB2 access paths provided you with a nice overview of the options available to you and considerations for their use. If you are interested in learning more about SQL tuning and DB2 performance, consider purchasing the book from which this series was drawn: DB2 Developer's Guide 6th edition.

Happy SQL performance tuning!

Friday, September 06, 2013

Top Ten Most Pervasive Myths About DB2 for z/OS

Today's blog offers up yet another Top Ten list for DB2 users, perusers, and abusers... This time counting down the most common myths that are perpetrated "out there" regarding DB2 and how it works (or doesn't work)...


1. Use Views to Insulate Programs from Change

This lie has been told for almost as long as DB2 has been around. I first wrote about this way back in 1991 for Database Programming & Design. Check that article out here if you don't understand why this is a bad idea, in general.

2. Locking Problems Indicate a Database Problem

Locking problems are generally caused by bad program design. You should write code to reduce the duration of locks and to COMMIT regularly... and then locking won't be a problem, for the most part.

3. Primary Key is Usually a Good Choice for Clustering

Actually, the foreign key is likely to be a better choice. When you join PK-->FK there will be one PK to multiple FKs. Wouldn't it be best if the FKs were clustered on the same page (or pages)?

4. Just Using the Defaults Should Work Out Well

Don't rely on defaults. Many of them are outdated or wrong... and even if they aren't, it is better to review and investigate all options before explicitly specifying the parameter value you want.

5. Programmers Don’t Need to Know How to Tune SQL

Programming performance-oriented SQL into your programs from the beginning would go a long way toward improving performance overall... and reducing the length of the application development lifecycle.

6. Black Boxes Work Well for Performance

No they don't!!!

7. Using NULLs Can Save Space

No they can't!!!

8. RUNSTATS Aren’t That Important

If you don't work with up-to-date RUNSTATS then you are hobbling the DB2 Optimizer and almost assuredly getting sub-optimal access paths for your DB2 SQL.

9. DB2 is a Hog

If you don't use it properly, every piece of software can become a resource hog. If you acquire the knowledge on how to work properly with DB2 then it will hum along like a well-oiled machine!

10. It Depends!

This is the answer that can be used for every DB2 question. But if that answer is not followed up with what "it" depends upon, then it is a useless answer... and whoever gave you that answer is probably just trying to get rid of you instead of helping you with your problems.

Wednesday, March 09, 2011

DB2 Symposium 2011

Today's blog post is about a great symposium dedicated to the topic of DB2. It is called, appropriately enough, the DB2 Symposium. DB2 Symposium is a three day training event with one day seminars presented by well-known DB2 consultants. I was fortunate enough to be asked to participate this year by the primary organizer of the event, Klaas Brant. For those of you who don't know him, Klaas is a well-respected DB2 consultant based in the Netherlands... and an all around great guy.

Why should I attend the DB2 Symposium you may ask? Don't IDUG and IOD provide everything I need in the way of events? Well, DB2 Symposium fills the gap between a conference and a multi-day training course. The DB2 Symposium is unique because you can participate for 1, 2, or 3 days, depending on your needs and budget.

Although it has not been to the USA the past few years, the DB2 Symposium is a regular, well-known event in Europe! And after a period of absence the DB2 Symposium is back in the USA.

The USA DB2 Symposium is happening soon, so you'll need to act fast if you want to participate. It occurs March 21-23, 2011 in the Dallas, Texas area. More precisely, at the Hilton Arlington (2401 East Lamar Boulevard, Arlington, Texas, USA 76006-7503). Each day the training sessions start at 9.00am and end at around 5.00pm.

But registration on site is not possible; you must pre-register online... so plan ahead!

My session is on March 21st and it is called DB2 Developer's Guide Comes Alive! This one-day session covers the tips, techniques, and procedures you need to know in order to excel at administering and using DB2 on the mainframe. The material is based upon DB2 Developer's Guide, the best-selling DB2 for z/OS book on the market. Additionally, the course material will contain references to sections of the book where students can find additional material on each topic after the sessions. Topics to be covered will include:

  • A performance tuning roadmap for managing DB2 application, database and system performance. You will learn SQL coding and tuning techniques, guidance for database optimization and reorganization, coverage of buffer pool settings and parameters for performance.
  • Logical and physical database design recommendations for DB2, so you can build and maintain effective DB2 databases immediately. Includes discussion of standards, logical to physical translation, data types, usage of nulls, and more.
  • Information and guidance on BINDing and REBINDing, along with a discussion of the most important parameters.
  • Along the way we'll look at locking, access paths, statistics, indexing and more.
  • And even though the current edition of the book covers through DB2 V8, this course adds coverage of some of the newer features added to DB2 in versions 9 and 10 that can boost your productivity and performance.

If you own the book already, bring it along and I'll be happy to autograph it for you. And then you can use it along with the course materials... and if you don't own it already, you'll probably want to grab a copy after attending the seminar... you can always find a link to buy my books on the front page of my web site at http://www.craigsmullins.com.

So register for the DB2 Symposium today... and I'll see you in Dallas, pardner!

Thursday, October 07, 2010

Null Follow-up: IS [NOT] DISTINCT FROM

After publishing the last blog post here on the topic of pesky problems that crop up when dealing with nulls, I received a comment lamenting that I did not address the IS [NOT] DISTINCT FROM clause. So today’s blog post will redress that failure.

First of all, IS [NOT] DISTINCT FROM is a relatively new predicate operator, introduced to DB2 for z/OS in Version 8. It is quite convenient to use in situations where you are looking to compare two columns that could contain NULL.

Before diving into the operator, let’s first discuss the problem it helps to solve. Two columns are not equal if both are NULL; that is because NULL is unknown, and a NULL never equals anything else, not even another NULL. But sometimes you might want to treat NULLs as equivalent. In order to do that, you would have to code something like this in your WHERE clause:

WHERE COL1 = COL2
OR (COL1 IS NULL AND COL2 IS NULL)

This coding would cause DB2 to return all the rows where COL1 and COL2 are the same value, as well as all the rows where both COL1 and COL2 are NULL, effectively treating NULLs as equivalent. But this coding, although relatively simple, can be unwieldy and, at least at first blush, unintuitive.

Here comes the IS NOT DISTINCT FROM clause to the rescue. As of DB2 V8, the following clause is logically equivalent to the one above, but perhaps simpler to code and understand:

WHERE COL1 IS NOT DISTINCT FROM COL2

The same goes for checking a column against a host variable. You might try to code a clause specifying WHERE COL1 = :HV :hvind (host variable and indicator variable). But such a search condition would never be true when the value in that host variable is null, even if the host variable contains a null indicator. This is because one null does not equal another null - ever. Instead we’d have to code additional predicates: one to handle the non-null values and two others to ensure that both COL1 and :HV are null. With the introduction of the IS NOT DISTINCT FROM predicate, the search condition could be simplified to just:

WHERE COL1 IS NOT DISTINCT FROM :HV :hvind

Wednesday, October 06, 2010

Null Troubles

A null represents missing or unknown information at the column level. A null is not the same as 0 (zero) or blank. Null means no entry has been made for the column and it implies that the value is either unknown or not applicable.

DB2 supports null, and as such you can use null to distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) and an unknown or inapplicable entry (NULL for both numerical and character columns).

Nulls sometimes are inappropriately referred to as “null values.” Using the term value to describe a null is inaccurate because a null implies the lack of a value. Therefore, simply use the term null or nulls (without appending the term “value” or “values” to it).

DB2 represents null in a special “hidden” column known as an indicator. An indicator is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to the end user, but must be provided for when programming in a host language (such as COBOL or PL/I).

Every column defined to a DB2 table must be designated as either allowing or disallowing nulls. A column is defined as nullable – meaning it can be set to NULL – in the table creation DDL. Null is the default if nothing is specified after the column name. To prohibit the column from being set to NULL you must explicitly specify NOT NULL after the column name. In the following sample table, COL1 and COL3 can be set to null, but not COL2, COL4, or COL5:

CREATE TABLE SAMPLE1
(COL1 INTEGER,
COL2 CHAR(10) NOT NULL,
COL3 CHAR(5),
COL4 DATE NOT NULL WITH DEFAULT,
COL5 TIME NOT NULL);

What Are The Issues with Null?

The way in which nulls are processed usually is not intuitive to folks used to yes/no, on/off thinking. With null data, answers are not true/false, but true/false/unknown. Remember, a null is not known. So when a null participates in a mathematical expression, the result is always null. That means that the answer to each of the following is NULL:
  • 5 + NULL
  • NULL / 501324
  • 102 – NULL
  • 51235 * NULL
  • NULL**3
  • NULL + NULL
  • NULL/0
Yes, even that last one is null, even though the mathematician in us wants to say “error” because of division by zero. So nulls can be tricky to deal with.

Another interesting aspect of nulls is that the AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. To clarify with an example, if the COMM column is nullable, the result of the following query:

SELECT AVG(COMM)
FROM DSN8810.EMP;

is not the same as for this query:

SELECT SUM(COMM)/COUNT(*)
FROM DSN8810.EMP;

But perhaps the more troubling aspect of this treatment of nulls is “What exactly do the results mean?” Shouldn’t a function that processes any NULLs at all return an answer of NULL, or unknown? Does skipping all columns that are NULL return a useful result? I think what is really needed is an option for these functions when they operate on nullable columns. Perhaps a switch that would allow three different modes of operation:
  1. Return a NULL if any columns were null, which would be the default
  2. Operate as it currently does, ignoring NULLs
  3. Treat all NULLs as zeroes
At least that way users would have an option as to how NULLs are treated by functions. But this is not the case, so to avoid confusion, try to avoid allowing nulls in columns that must be processed using these functions whenever possible.
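
No such switch exists today, but you can approximate the third option yourself with the COALESCE (or VALUE) function, which substitutes a value for any nulls before the aggregate is computed. A minimal sketch using the sample COMM column:

SELECT AVG(COALESCE(COMM, 0))
FROM   DSN8810.EMP;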

Here are some additional considerations regarding the rules of operation for nulls:

  • When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped at the high end of the sort order.
  • Nulls are considered to be equal when duplicates are eliminated by SELECT DISTINCT or COUNT (DISTINCT column).
  • A unique index considers nulls to be equivalent and disallows duplicate entries because of the existence of nulls, unless the WHERE NOT NULL clause is specified in the index.
  • For comparison in a SELECT statement, two null columns are not considered equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls that are encountered cause the comparison to evaluate to UNKNOWN.
  • When a nullable column participates in a calculation, the result is null.
  • Columns that participate in a primary key cannot be null.
  • To test for the existence of nulls, use the special predicate IS NULL in the WHERE clause of the SELECT statement. You cannot simply state WHERE column = NULL. You must state WHERE column IS NULL.
  • It is invalid to test if a column is <> NULL, or >= NULL. These are all meaningless because null is the absence of a value.
Examine these rules closely. ORDER BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and handle them accordingly. The SELECT statement, however, deems that the comparison of null columns is not equivalence, but unknown. This inconsistent handling of nulls is an anomaly that you must remember when using nulls.

Here are a couple of other issues to consider when nulls are involved.

Did you know it is possible to write SQL that returns a NULL even if you have no nullable columns in your database? Assume that there are no nullable columns in the EMP table (including SALARY) and then consider the following SQL:

SELECT SUM(SALARY)
FROM EMP
WHERE DEPTNO > 999;

The result of this query will be NULL if no DEPTNO exists that is greater than 999. So it is not feasible to try to design your way out of having to understand nulls!

Another troubling issue with NULLs is that some developers have incorrect expectations when using the NOT IN predicate with NULLs. Consider the following SQL:

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
FROM Products AS P);

If one of the products has its color set to NULL, then the result of the SELECT is the empty set, even if there are colors that are not assigned to any product.
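
A common way to sidestep this surprise, sketched below, is to explicitly exclude the nulls in the subquery (rewriting the query with NOT EXISTS is another option):

SELECT C.color
FROM   Colors AS C
WHERE  C.color NOT IN (SELECT P.color
                       FROM   Products AS P
                       WHERE  P.color IS NOT NULL);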

Summary

Nulls are clearly one of the most misunderstood features of DB2 – indeed, of most SQL database systems. Although nulls can be confusing, you cannot bury your head in the sand and ignore nulls if you choose to use DB2 as your DBMS. Understanding what nulls are, and how best to use them, can help you to create usable DB2 databases and design useful and correct queries in your DB2 applications.

Wednesday, March 18, 2009

A Short Introduction to Lock Avoidance

Lock avoidance is a mechanism employed by DB2 for z/OS to access data without locking while still maintaining data integrity. It prohibits access to uncommitted data and serializes access to pages. Lock avoidance improves performance by reducing the overall volume of lock requests. After all, let’s face it, the most efficient lock is the one never taken.


Of course, even if it is not taking a lock, DB2 must still maintain the integrity of its data. Instead of taking a lock, DB2 uses a latch. To take advantage of Lock Avoidance, the SQL statement must be Read Only and the plan must be bound with Isolation Level Cursor Stability (CS) and CURRENTDATA(NO).


In general, DB2 avoids locking data pages if it can determine that the data to be accessed is committed and that no semantics are violated by not acquiring the lock. DB2 avoids locks by examining the log to verify the committed state of the data.


When determining if lock avoidance techniques will be practical, DB2 first scans the page to be accessed to determine whether any rows qualify. If none qualify, a lock is not required.


For each data page to be accessed, the RBA of the last page update (stored in the data page header) is compared with the log RBA for the oldest active unit of recovery. This RBA is called the Commit Log Sequence Number, or CLSN. If the CLSN is greater than the last page update RBA, the data on the page has been committed and the page lock can be avoided.


Additionally, a bit is stored in the record header for each row on the page. The bit is called the Possibly UNCommitted, or PUNC, bit. The PUNC bit indicates whether update activity has been performed on the row. For each qualifying row on the page, the PUNC bit is checked to see whether it is off. This indicates that the row has not been updated since the last time the bit was turned off. Therefore, locking can be avoided. (Note that there is no external method for DBAs to use to determine whether a row’s PUNC bit is on or off.)


If neither CLSN nor PUNC bit testing indicates that a lock can be avoided, DB2 acquires the requisite lock.


In addition to enhancing performance, lock avoidance improves data availability. Data that would have been considered locked, and therefore unavailable, without lock avoidance can now be accessed.


Lock avoidance is used only for data pages. Further, DB2 Catalog and DB2 Directory access does not use lock avoidance techniques. You can avoid locks under the following circumstances:


  • For any pages accessed by read-only or ambiguous queries bound with ISOLATION(CS) and CURRENTDATA NO
  • For any unqualified rows accessed by queries bound with ISOLATION(CS) or ISOLATION(RS)
  • When DB2 system-managed referential integrity checks for dependent rows caused by either the primary key being modified, or the parent row being deleted and the DELETE RESTRICT rule is in effect
  • For both COPY and RUNSTATS when SHRLEVEL(CHANGE) is specified


To determine the impact of lock avoidance on your system, you can review DB2 trace records. IFCIDs 218 and 223 provide CLSN information, and IFCIDs 226 and 227 provide 'wait for page latch' information.
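
For example, a performance trace for those IFCIDs might be started with a command along these lines; the trace class and destination shown are arbitrary, so verify the command syntax and your authorization before using it:

-START TRACE(PERFM) CLASS(30) IFCID(218,223,226,227) DEST(SMF)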


Avoiding locks can improve the performance of your queries and programs that satisfy the preceding requirements. To encourage DB2 to avoid locks, BIND your plans and packages specifying ISOLATION(CS) and CURRENTDATA NO. Furthermore, avoid ambiguous cursors by specifying FOR READ ONLY for all cursors that are not used for updating.
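
As a quick sketch, the relevant BIND parameters and a read-only cursor declaration might look something like this; the collection, package, cursor, and host variable names are all made up:

BIND PACKAGE(COLL1) MEMBER(PROG1) -
     ISOLATION(CS) CURRENTDATA(NO) EXPLAIN(YES)

EXEC SQL
  DECLARE CSR1 CURSOR FOR
    SELECT EMPNO, LASTNAME
    FROM   DSN8810.EMP
    WHERE  WORKDEPT = :HV-WORKDEPT
    FOR READ ONLY
END-EXEC.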

Wednesday, October 22, 2008

Bad Standards

Just started a new series on bad standards over on my Data Management Today blog.

Check it out when you get a chance and share your favorite "bad standards" either here or there... or by e-mailing me.

Monday, October 20, 2008

DBA Rules of Thumb

Database administration is a very technical discipline, but it is also a discipline in which the practitioner is very visible politically within the organization. As such, DBAs should be armed with the proper attitude and knowledge before attempting to practice the discipline of database administration.

Just as important as technical acumen, though, is the ability to carry oneself properly and to embrace the job appropriately. With this in mind, I wrote a series of blog entries on DBA Rules of Thumb over at my Data Management Today blog... and I thought the information I wrote there may be helpful to my DB2 and mainframe readership here, so I'm sharing the eight rules of thumb (with links) here on my DB2 Portal blog:
  1. Document Everything!
  2. Automate Intelligently
  3. Share
  4. Don't Panic!
  5. Focus Your Efforts
  6. Invest In Yourself
  7. Diversify
  8. Develop Business Acumen
What do you think? Did I miss anything important?

P.S. Just a reminder that I will be presenting a webinar on assuring DB2 recoverability with my colleague, Michael Figaro, this Thursday, October 23, 2008 at 10:30 Central time. If you are at all interested in the topic, be sure to register today - and attend this Thursday!

Thursday, October 09, 2008

Assuring the Recoverability of Your DB2 Databases

Availability requires much more than just having a reliable hardware and database platform. Most companies cannot afford significant downtime, and some cannot afford any! As such, it is crucial for unplanned outages to be as short as possible. But it is not just a business requirement; in many cases assuring a speedy recovery is also a legal mandate. Regulations such as SOX and Basel II dictate that any outage be resolved within a predefined period of time.

But how many of us can answer, with any degree of certainty, the question “How long will this outage last?” There are many variables that need to be considered when estimating a DB2 recovery time: backups available, quality, point-in-time requirements, amount of log processing, disk speed, tape mounts, and on and on and on...

With these thoughts in mind, Michael Figaro and I will be delivering a webinar titled Assuring the Recoverability of Your DB2 Databases, on Thursday, October 23, 2008 at 10:30 am CDT.

We will tackle issues ranging from regulations, IT complexity, and business continuity, to DSNZPARMs and backup/recovery planning. We’ll also make the case that planning for database recoverability is the most important task among the many tasks of the DBA.

As part of the webinar we will introduce and demonstrate Recovery AssuranceExpert, a new technology to help you ensure that all of your critical DB2 objects are recoverable within your recovery time objectives. Recovery AssuranceExpert is an automated solution to perform daily health checks of data availability and recoverability, as well as provide actual recovery times required for a DB2 object, a complete application, or even a whole DB2 subsystem. Join us on October 23rd to find out how you can ensure that your actual recoverability times fit into your SLAs.

Monday, July 28, 2008

Selecting Every Other Row

One of the fun things about publishing is getting questions from readers that make you think. A recent question I received went something like this: "Can I get the odd and even number of rows from a DB2 table?"

Well, my first reaction was to think "this guy doesn't understand the way a SQL DBMS like DB2 works." The data in DB2 tables is not ordered, so there is no way to guarantee that the rows are odd or even numbered. While that observation may (or may not) have been true, it didn't help the guy. So I thought about it and came up with a possible work-around solution.

The first thing we have to do is to mimic row numbers in DB2. Until V9, DB2 did not support the row number construct (such as you can find in Oracle), and we'd like this to work for the versions in support today (V8 and V9).

So, to do this we start by using the COUNT(*) function and a table expression. A table expression is what you get when you substitute SQL in place of the table in the FROM clause of another SQL statement. For example, consider this SQL:

SELECT  DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB;

That puts a pseudo-row number on the table that we can access in our SQL predicates. If, say, we only want to return the even results, we could write the following query:

SELECT  DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB
WHERE MOD(ROWNUM,2) = 0
ORDER BY ROWNUM;

The MOD function returns the remainder of dividing the second argument into the first. So, if the remainder is zero, we have an even number. So, this query returns every other row to the result set. If you want the odd rows only, change the predicate with the MOD function to this:

WHERE MOD(ROWNUM,2) <> 0

Of course, there is no guarantee that the same exact rows will be even (or odd) for subsequent executions of this query. It all depends on how DB2 optimizes the query for execution. But it does provide a nice way to produce samples of the data (perhaps to populate a test bed of data).

Wednesday, June 25, 2008

No Alphabetic Characters Wanted

Here is a question that was posed to me recently:


Q: We have a CHAR(10) column that cannot contain alphabetic characters. How can we make sure that the letters A thru Z are not allowed?


A: Well, think about the characteristics of alphabetic characters versus the other "things" that can be stored in a CHAR column. One thing that separates an alphabetic letter from numbers, punctuation, etc. is that there are upper and lower case versions (e.g. A, a). So, you could use the following predicate to preclude alphabetic characters from being accepted:

LOWER(:string) = UPPER(:string)

Of course, you will not be able to put this into a CHECK constraint because of restrictions on their content (for example, you cannot use functions in a CHECK constraint). But you could use this in SQL statements and as a check in your programs before allowing data to be inserted or modified in your CHAR(10) column.
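
For example, here is a hedged sketch of using the same trick to hunt down rows that already violate the rule; the table and column names are hypothetical:

SELECT KEY_COL, CHAR10_COL
FROM   MY_SCHEMA.MY_TABLE
WHERE  LOWER(CHAR10_COL) <> UPPER(CHAR10_COL);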

Anyone else have any other ideas?

Monday, April 28, 2008

On the Road Again

If you live in or around Birmingham, Alabama or Dallas, Texas or Phoenix, Arizona I will be in your neck of the woods the next two weeks speaking at the local DB2 user groups.

On April 30th, 2008 I'll be speaking at Alabama DB2 User Group on the topic of Managing Data For Long Retention Periods.

Then, on May 2nd, I mosey on over to Dallas to speak on two topics at the DB2 Forum meeting. I'll cover database auditing in one talk and the other will be my "famous" DB2 Top 10 Lists presentation.

The following week, on May 8th, I'll be in Arizona to discuss The Impact of Regulatory Compliance on Database Administration at SWARUG.

And in my last presentation before IDUG, I'll be giving a shortened version of the regulatory compliance presentation in my hometown of Houston, TX at the Techxans: CIO Speaker Forum.

So maybe I'll see you on the road... and, if not, I hope to see you in Dallas for IDUG the week of May 18 thru 22, 2008. I've got a presentation on data breaches from a database perspective (4 PM on Tuesday), and I'll also be leading a Special Interest Group discussion on the changing role of the DBA (9:15 AM on Thursday). You can see the entire agenda here on IDUG's web site.

Thursday, April 10, 2008

Consider Table Expressions to Improve Performance

Table expressions are frequently overlooked as a potential solution for resolving problems using only SQL. Table expressions can be used to force the optimizer to choose a specific processing order.

For example, consider the following query:



SELECT D.DEPTNO,
MIN(D.DEPTNAME) AS DEPT_NAME,
MIN(D.LOCATION) AS DEPT_LOCATION,
SUM(E.SALARY) AS TOTAL_SALARY
FROM DEPT D,
EMP E
WHERE D.DEPTNO = E.WORKDEPT
AND E.BONUS BETWEEN 0.00 AND 1000.00
GROUP BY D.DEPTNO;

In this query, the detail rows that qualify from each table are joined prior to the GROUP BY processing. In general, there will be more EMP rows than DEPT rows because a department comprises multiple employees. Suppose there were 200 DEPT rows joined to 75,000 EMP rows. The join is done and then the GROUP BY is processed.

Instead, you can use table expressions to force the optimizer to process the aggregations on a table-by-table basis:



SELECT D.DEPTNO,
D.DEPTNAME,
D.LOCATION,
E.TOTAL_SALARY
FROM DEPT D,
(SELECT WORKDEPT, SUM(SALARY) AS TOTAL_SALARY
FROM EMP E
WHERE E.BONUS BETWEEN 0.00 and 1000.00
GROUP BY E.WORKDEPT) AS E
WHERE D.DEPTNO = E.WORKDEPT;



This will produce the same results but it should perform better.

In general, consider using table expressions to pre-filter FULL JOIN tables, to pre-filter null supplying tables of LEFT/RIGHT joins, to separate GROUP BY work, and to generate or derive data.

Friday, March 30, 2007

Mimicking ROWNUM in DB2

I was going through the many random files stored on my PC when I came across this useful nugget. A year or so ago a reader sent me this code. He and his supervisor had come up with a way to simulate the Oracle rownum feature in DB2 and wanted to share it with the world. I think, at the time, I posted it up on TechTarget as I was doing Q+A there on the old search390.com site. But that site is now defunct, so I'm reposting the solution here on my DB2portal blog for all to share.



SELECT DEPTNO, ROW#
FROM DEDBADM.DEPT TB1,
TABLE (SELECT COUNT(*) + 1 AS ROW#
FROM DEDBADM.DEPT TB2
WHERE TB2.DEPTNO < TB1.DEPTNO) AS TDEPT_TAB
WHERE ROW# = 2;



If you decide to use this code, be careful because there is no guarantee that the row numbers will be the same from run to run of the SQL. As long as you get the same access path the row numbers should be the same from run to run, but if the access path changes -- for example, from a table space scan to indexed access or vice versa -- then the row numbers for the runs will differ. That is, each row may have a different number than it had for a previous execution.

All in all, though, a nice solution to the problem!