Tuesday, July 21, 2015

Influencing the DB2 Optimizer: Part 4 - OPTIMIZE FOR n ROWS

In the first installments of this series of posts on ways to influence the DB2 optimizer we looked at standard methods like RUNSTATS and SQL tweaking techniques. Today, we will look at yet another method of influencing access path selection: specifying the OPTIMIZE FOR n ROWS for a SELECT statement. 

This clause enables programmers to specify the estimated maximum number of rows that will be retrieved. By indicating that a different number of rows will be returned than DB2 anticipates, you can influence access path selection. 

For example, consider the following statement:

  EXEC SQL
    DECLARE OPT_CUR FOR
      SELECT   WORKDEPT, EMPNO, SALARY
      FROM     DSN81010.EMP
      WHERE    WORKDEPT IN (‘A00’, ‘D11’)
        OPTIMIZE FOR 5 ROWS
  END-EXEC.

In this example, the number of rows to be optimized for has been set to 5. Keep in mind, though, that this this query could return more than 5 rows. DB2 formulates an access path optimized for 5 rows. More rows can be retrieved, but performance could suffer if you greatly exceed the estimated maximum.

Note: The FETCH FIRST n ROWS ONLY clause can be used to limit the number of rows returned regardless of the number that actually qualify.

Using OPTIMIZE FOR generally is preferable to both manually updating the DB2 Catalog statistics (our next topic, so say tuned) and tweaking the SQL statement (our previous topic). It provides more information to DB2’s optimization process, thereby giving DB2 the opportunity to establish a better access path. The crucial point, though, is that DB2 is doing the optimization; no manual updates or artificial SQL constructs are required.

When using the OPTIMIZE FOR n ROWS clause, it is a good idea to make n as accurate as possible. An accurate estimate gives DB2 the best opportunity to achieve optimum performance for the statement and also helps document the purpose of the SQL statement. For example, if you are coding an online transactions in which 25 rows (for example) are displayed on the screen, code OPTIMIZE FOR 25 ROWS.

But this is not really a hard and fast rule. You can tinker with different values to see what impact it may have on the optimization process. 

There is another "trick" that you can deploy using OPTIMIZE FOR n ROWS. By setting n=1 you should be able to disable list prefetch for the query.

One caution to keep in mind: DB2 uses the value of n for the block size of a distributed network request. The smaller the value of n, the fewer rows sent across the network for each block. The only exception is that when n=1, DB2 will set the block size to 16.

Synopsis 


So if you are having trouble with a specific access path, consider using the OPTIMIZE FOR n ROWS clause with various values for n. It just might give you some reasonable results.

No comments: