Friday, September 01, 2006

The Two Biggest DB2 Performance Things

DB2 performance is one of those perennial topics that people just can't seem to get enough of. I guess that is because the performance of applications is one of the bigger issues that end users complain about. And DBAs can be heroes if they can resolve performance problems quickly. It alos could be that performance problems are so ubiquitous because people keep on making the same design and coding mistakes...

With all of that in mind, let's take a look at what I think are the two biggest things you need to get control of to keep DB2 and SQL performance in check.

(1) Keep DB2 statistics up-to-date

Without the statistics stored in the DB2' system catalog, the optimizer will have a difficult time optimizing anything. These statistics provide the optimizer with information pertinent to the state of the tables that the SQL statement being optimized will access. The type of statistical information stored in the system catalog include:
  • Information about tables, including the total number of rows, information about compression, and total number of pages;
  • Information about columns, including number of discrete values for the column and the distribution range of values stored in the column;
  • Information about table spaces, including the number of active pages;
  • Current status of the index, including whether an index exists, the organization of the index (number of leaf pages and number of levels), the number of discrete values for the index key, and whether the index is clustered;
  • Information about the table space and index node groups or partitions.
Statistics populate the DB2 system catalog when you execute the RUNSTATS utility. Generally, this utility is invoked in batch jobs. Be sure to work with your DBA to ensure you accumulate statistics at the appropriate time, especially in a production environment.

(2) Build appropriate indexes

Perhaps the most important thing you can do to assure optimal DB2 application performance is create correct indexes for your tables based on the queries your applications use. Of course, this is easier said than done. But we can start with some basics. For example, consider this SQL statement:

SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010'
AND DEPTNO = 'D01' ;


What index or indexes would make sense for this simple query? ""'First, think about all the possible indexes that you could create. Your first short list probably looks something like this:

  • Index1 on EMPNO
  • Index2 on DEPTNO
  • Index3 on EMPNO and DEPTNO
This is a good start, and Index3 is probably the best of the lot. It lets DB2 use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table, you might want to examine the impact of creating yet another index on the table. Factors to consider include:

Modification impact: DB2 will automatically maintain every index you create. This means every INSERT and every DELETE to this table will insert and delete not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, you also update the index. So, indexes speed the process of retrieval but slow down modification.

Columns in the existing indexes: If an index already exists on EMPNO or DEPTNO, it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always, because the order of the columns in the index can make a big difference depending on the query. For example, consider this query:

SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010'
AND DEPTNO > 'D01';

In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second, allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than.

Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO), DB2 can use them both to satisfy this query so creating another index might not be necessary.

Importance of this particular query: The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. Of course, the decision depends on the application's importance to the business-not just on the user's importance.

Index design involves much more than I have covered so far. For example, you might consider index overloading to achieve index-only access. If all the data that a SQL query asks for is contained in the index, DB2 might be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY, given information about EMPNO and DEPTNO. And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well, we never need to access the EMP table because all the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.

Keep in mind that making every query an index-only access is not prudent or even possible. You should save this technique for particularly troublesome or important SQL statements.

Summary

If you are just embarking on your journey into the wonderful world of DB2 performance management, please, start with the two items covered in this short blog entry. I've just scratched the surface of both areas and you can benefit by additional research and education in both statistics gathering and index design. And if you are a long-time DB2 professional, it can't hurt to bone up on these topics either. You might learn about some newer DB2 feature or function that you haven't used yet, or maybe just strengthen what you already know.

So what do you think? Are these the two most important DB2 performance things, or do you think something else is more important? Post your comments below...

No comments: