The Db2 Portal Blog

Friday, September 15, 2006

The Path to an Executable DB2 Program

Here is another Q+A exchange that I thought might be useful to share with everyone here on the blog:

QUESTION: Could you please explain the difference between a package and a plan, the process of precompilation, compilation and running a COBOL DB2 program as well as a COBOL CICS DB2 program?

ANSWER: Well, I'll hit the highlights in response to your question but if you really want an in-depth answer then you should read the DB2 manuals (or a book like DB2 Developer's Guide).

What is a plan?

A plan is an executable module containing the access path logic produced by the DB2 optimizer. It can be composed of one or more DBRMs and packages. Before a DB2 for z/OS program (with static SQL) can be run, it must have a plan associated with it.

Plans are created by the BIND command. The plan is stored in the DB2 directory and accessed when its program is run. Information about the plan is stored in the DB2 catalog.

What is a package?

A package is a single, bound DBRM with optimized access paths. By using packages, the table access logic is "packaged" at a lower level of granularity than a plan -- at the package (or program) level.

To execute a package, you first must include it in the package list of a plan (usually, there are some exceptions, such as for triggers and user-defined functions). In general, packages are not directly executed, they are only indirectly executed when the plan in which they are contained executes -- as discussed previously, UDFs and triggers are exceptions to this rule. A plan can consist of one or more DBRMs, one or more packages or, a combination of packages and DBRMs.
What is program preparation?

A DB2 application program must go through a process known as program preparation before it can run successfully. Program preparation requires a series of code preprocessors that when enacted in the proper sequence, create an executable load module and a DB2 application plan. The combination of the executable load module and the application plan is required before any DB2 program can be run, whether batch or online. CICS programs require an additional preprocessing step.

The first step is precompilation. DB2 programs must be parsed and modified before normal compilation. The DB2 precompiler performs this task. At a high level, the precompiler basically searches for SQL, strips it out of the program and puts it into a DBRM. The output of precompilation is a DBRM with SQL and a modified source program.

The DBRM is bound using the BIND command, which is a type of compiler for SQL statements. In general, BIND reads SQL statements from DBRMs and produces a mechanism to access data as directed by the SQL statements being bound.

The modified source is compiled using the language compiler of your choice. The compiled source then is link-edited to an executable load module. The appropriate DB2 host language interface module also must be included by the link edit step. This interface module is based on the environment (TSO, CICS, or IMS/TM) in which the program will execute.

At the end of this series of steps, if all goes well, you will have an executable DB2 program.

Wednesday, September 06, 2006

Help for SAP Shops Using DB2 for z/OS

Just a quick blog post today to alert folks to a relatively new blog focusing entirely on SAP usage with DB2 for z/OS. According to the blogger, Omer Brandis:

Whether you have already implemented SAP on DB2 for z/OS, or are just seriously considering it, this is the blog for you. This blog will deal with real issues from the point of view of a true real-world professional, discussing the good, bad, and ugly of SAP on DB2 for z/OS.

Recent posts have covered offsite recovery, reorganizing SAP OFFICE, and hidden REORG jobs. If you use SAP and DB2 on the mainframe, be sure to check back in on this blog on a regular basis (no pun intended).

Friday, September 01, 2006

The Two Biggest DB2 Performance Things

DB2 performance is one of those perennial topics that people just can't seem to get enough of. I guess that is because the performance of applications is one of the bigger issues that end users complain about. And DBAs can be heroes if they can resolve performance problems quickly. It alos could be that performance problems are so ubiquitous because people keep on making the same design and coding mistakes...

With all of that in mind, let's take a look at what I think are the two biggest things you need to get control of to keep DB2 and SQL performance in check.

(1) Keep DB2 statistics up-to-date

Without the statistics stored in the DB2' system catalog, the optimizer will have a difficult time optimizing anything. These statistics provide the optimizer with information pertinent to the state of the tables that the SQL statement being optimized will access. The type of statistical information stored in the system catalog include:

Information about tables, including the total number of rows, information about compression, and total number of pages;
Information about columns, including number of discrete values for the column and the distribution range of values stored in the column;
Information about table spaces, including the number of active pages;
Current status of the index, including whether an index exists, the organization of the index (number of leaf pages and number of levels), the number of discrete values for the index key, and whether the index is clustered;
Information about the table space and index node groups or partitions.

Statistics populate the DB2 system catalog when you execute the RUNSTATS utility. Generally, this utility is invoked in batch jobs. Be sure to work with your DBA to ensure you accumulate statistics at the appropriate time, especially in a production environment.

(2) Build appropriate indexes

Perhaps the most important thing you can do to assure optimal DB2 application performance is create correct indexes for your tables based on the queries your applications use. Of course, this is easier said than done. But we can start with some basics. For example, consider this SQL statement:

SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010'
AND DEPTNO = 'D01' ;

What index or indexes would make sense for this simple query? ""'First, think about all the possible indexes that you could create. Your first short list probably looks something like this:

Index1 on EMPNO
Index2 on DEPTNO
Index3 on EMPNO and DEPTNO

This is a good start, and Index3 is probably the best of the lot. It lets DB2 use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table, you might want to examine the impact of creating yet another index on the table. Factors to consider include:

Modification impact: DB2 will automatically maintain every index you create. This means every INSERT and every DELETE to this table will insert and delete not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, you also update the index. So, indexes speed the process of retrieval but slow down modification.

Columns in the existing indexes: If an index already exists on EMPNO or DEPTNO, it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always, because the order of the columns in the index can make a big difference depending on the query. For example, consider this query:

SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010'
AND DEPTNO > 'D01';

In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second, allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than.

Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO), DB2 can use them both to satisfy this query so creating another index might not be necessary.

Importance of this particular query: The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. Of course, the decision depends on the application's importance to the business-not just on the user's importance.

Index design involves much more than I have covered so far. For example, you might consider index overloading to achieve index-only access. If all the data that a SQL query asks for is contained in the index, DB2 might be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY, given information about EMPNO and DEPTNO. And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well, we never need to access the EMP table because all the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.

Keep in mind that making every query an index-only access is not prudent or even possible. You should save this technique for particularly troublesome or important SQL statements.

Summary

If you are just embarking on your journey into the wonderful world of DB2 performance management, please, start with the two items covered in this short blog entry. I've just scratched the surface of both areas and you can benefit by additional research and education in both statistics gathering and index design. And if you are a long-time DB2 professional, it can't hurt to bone up on these topics either. You might learn about some newer DB2 feature or function that you haven't used yet, or maybe just strengthen what you already know.

So what do you think? Are these the two most important DB2 performance things, or do you think something else is more important? Post your comments below...

Thursday, August 31, 2006

Accessing Partitioned Data

One area that tends to confuse some DB2 developers until they gain experience is how DB2 partitioning works. A common question I get goes something like this: “If a table is in a partitioned table space, say four partitions, for example, then to process the table in batch can I run four instances of the batch program in parallel, one against each partition. What do I code to do this?”

Well, the short and sweet answer to this question is “Yes, you can run four instances of a batch program in parallel if you so desire.” But there is a nuance to this misconception that might be missed here. The question lurking beneath the question is this: “How can I make sure I am accessing only data in one partition in each of the batch programs?”

To do this requires some programming work. The program will need to have a means of identifying which partition it should run against. So, you might code the program to accept an input parameter of 1, 2, 3, or 4. The program would read the parameter and translate it into the key range of values that should be read by the program. This is the LIMITKEY value for the particular partition as found in the partitioning index. You can retrieve this value from the DB2 catalog using the following query:

SELECT PARTITION, LIMITKEY
FROM SYSIBM.SYSINDEXPART
WHERE IXNAME = ?
AND IXCREATOR = ?
ORDER BY PARTITION;

Supply the index name and creator and this query will return the partition number and LIMITKEY for that partition. (If you include this query in the program you probably will want to include the PARTITION column in the WHERE clause and return only a single row.) The LIMITKEY is the high key value for that partition. Using this information you will need to write the queries in the program such that only values from the partition being processes will be retrieved. As long as the program adheres to that key range you should only process data from the one partition that holds that data.

Of course, none of this is necessary to get DB2 to operate in parallel. The best approach uses DB2 query parallelism because it minimizes the amount of programming work and has the potential to maximize performance. To signal DB2 to turn on parallelism you will need to BIND your program specifying the DEGREE(ANY) parameter. Then DB2 will choose the degree of parallelism for the batch program. You will only need to run the program once (instead of 4 times as in our example); DB2 will figure out how many tasks it can run in parallel for each query in the program. And there is no need to modify the program at all! This is far simpler than any alternate approach because DB2 handles chunking up the work into parallel tasks for you.

Tuesday, August 29, 2006

How are Indexes Being Used?

In keeping with my promise to periodically post blog entries based on questions I have received, here we have another question I have been asked:

If I have five indexes on a table, what is the best way to determine if all, some or none of the indexes are being used?

Here is the answer I sent:

The best approach would be to make sure you have run EXPLAIN on all of your production plans and packages. Then examine the PLAN_TABLE output for those indexes. For example:

SELECT *
FROM my.PLAN_TABLE
WHERE ACCESSNAME IN (put your list of indexes here);

Of course, this will not show the dynamic SQL that uses any of these indexes. There are tools on the market that can help you to gather SQL usage statistics and access path information for both static and dynamic SQL statements.

You might consider acquiring one of these tools if you use a lot of dynamic SQL. One such offering is NEON Enterprise Software’s SQL Performance Expert solution.