Showing posts with label FETCH FIRST. Show all posts
Showing posts with label FETCH FIRST. Show all posts

Tuesday, February 02, 2016

The Most Misunderstood Features of DB2 – Part 2: Optimize vs. Limited Fetch

Welcome to Part 2 in my on-going blog series on The Most Misunderstood Features of DB2. In Part 1 of the series we tackled the topic of locking, which IMHO is easily the most misunderstood feature of DB2 (probably of most DBMSes).

Today's topic is a brief one, but one that I've found folks to be confused about. Namely, the difference between the OPTIMIZE FOR and FETCH x ROWS ONLY clauses.

The OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY are similar when it comes to influencing the optimizer. Both will provide help to the optimizer about the scale of the query (that is, the number of rows to expect when optimizing the access path to the data).

But, the two clauses have different purposes. If all you want to do is influence the optimizer then you should use OPTIMIZE FOR n ROWS as its usage does not limit the number of rows that can be fetched when the statement is executed. For example, you might specify OPTIMIZE FOR 1 ROW on a SELECT statement. But if 500 rows qualify then you can FETCH all 500 rows at run time. 

(Note: the specific numbers used in the previous paragraph are not important; it is important only that you know the number of rows specified in the OPTIMIZE clause does not inhibit the number of rows that can later be accessed by the query/program.)

The FETCH FIRST n ROWS ONLY clause is different. It actually limits the number of rows that can be fetched when the query is run. If you specify the clause FETCH FIRST 5 ROWS ONLY, then only 5 rows can be fetched - even if 100 or 1000 or more actually qualify. And yes, the number of rows you specify for FETCH FIRST will be provided to the optimizer for access path formulation, but it also will change the way the query behaves when it runs. 

So, when you write a SELECT statement that qualifies a large number of rows, but you only need a few rows at run time, it is a good idea to use FETCH FIRST n ROWS ONLY. On the other hand, if you are simply trying to tweak query performance by getting the optimizer to consider a different number of qualifying rows, then you should consider using OPTIMIZER FOR n ROWS.

The two statements are quite different and it is important that developers not confuse the two!

Sunday, September 01, 2013

Top Ten Common SQL Mistakes (with DB2 for z/OS)

There are many different types of development mistakes that can be made when you are coding a complex application system. But probably the most pervasive mistakes being made when developing an application using DB2 for z/OS are SQL mistakes... well, actually, that statement is probably true regardless of the DBMS, but I am going to focus on DB2 for z/OS given that this is a blog that focuses on DB2 and mainframe issues.

  1. Assuming an ORDER BY is not necessary for ordered results - if you want your results always to be returned in a specific order, you must include the ORDER BY clause. If you fail to do so, a future rebind can change the access path and also possibly change the order of your results set.
  2. Forgetting the NULL indicator - when your query can return a null, but sure to include a null indicator or you will get a -305 SQLCODE (22002 SQLSTATE). And be sure to check the null indicator to see if the data is null or not!
  3. Incorrect expectations when using the NOT IN predicate with NULLs - Nulls can be confusing when using the IN predicate. For example, what if we want to find all Colors who are not assigned to a particular Product using a query like shown below. THe problem arises when the P.color can be NULL. The NULL causes the predicate to be UNKNOWN so the results set is always empty. 
           SELECT C.color
           
FROM   Colors AS C
           
WHERE  C.color NOT IN (SELECT P.color
                                  
FROM Products AS P);

  1. Coding predicates appropriately in Outer Joins - Sometimes it can be difficult to figure out how to code outer joins appropriately if you don't code them very often. Terry Purcell has written a nice article on the topic that can be found here.
  2. Not coding a cursor for a multi-row result - When more than one row can be returned by your query you must use a cursor (or specify FETCH FIRST 1 ROW ONLY)
  3. Recompiling but not binding - if you make changes to the SQL, you have to BIND it again. Otherwise it won't work! 
  4. Forgetting to use single quotes around strings (instead of double quotes) - DB2 SQL expects single quotes around character strings.
  5. Trying to modify a Primary Key column - you cannot modify a primary key. A primary key should be unique within the table and immutable.
  6. Forcing dynamic SQL into static SQL (sometimes hundreds of static SQL statements) - you should analyze the type and nature of your database queries to determine whether they should be static or dynamic. Consider using the advice here (static SQL v. dynamic SQL) to guide you.
  7. Asking for more data than you need (columns and/or rows) - specify only the columns that you actually need in your SELECT-list... and use WHERE clauses (predicates) to filter the data to just that data that you need before bringing it into the program. The lesser the amount of data that DB2 needs to transfer from the database to your program, the more efficient things will be!

Thursday, September 24, 2009

Limiting the Number of Rows Fetched

Application developers frequently need to retrieve a limited number of qualifying rows from a table. For example, maybe you need to list the top ten best selling items from inventory, or a list of the top five most expensive products (i.e., highest price tag). There are several ways to accomplish this prior to DB2 V7 using SQL, but they are not necessarily efficient.

The first reaction is to simply use the WHERE clause to eliminate non-qualifying rows. But this is simplistic, and often is not sufficient to produce the results desired in an optimal manner. What if the program only requires that the top ten results be returned? This can be a somewhat difficult request to formulate using SQL alone.

Consider, for example, an application that needs to retrieve only the top ten most highly paid employees from the EMP sample table. You could simply issue a SQL request that retrieves all of the employees in order by salary, but only use the first ten retrieved. That is easy; for example:

EMPNO, FIRSTNME, LASTNAME, SALARY
FROM DSN8710.EMP
ORDER BY SALARY DESC;

You must specify the ORDER BY clause with the DESC key word. This sorts the results into descending order, instead of the default, which is ascending. Without the DESC key word, the "top ten" would be at the very end of the results set, not at the beginning.

But that does not really satisfy the requirement - retrieving only the top ten. It merely sorts the results into descending sequence. So the results would still be all employees in the table, but in the correct order so you can view the "top ten" salaries very easily. The ideal solution should return only the top ten employees with the highest salary and not merely a sorted list of all employees.

You can code some "tricky" SQL to support this request for all versions of DB2, such as the following:

SELECT EMPNO, FIRSTNME, LASTNAME, SALARY
FROM DSN8710.EMP A
WHERE 10 > (SELECT COUNT(*)
FROM DSN8710.EMP B
WHERE A.SALARY < B.SALARY)
AND SALARY IS NOT NULL
ORDER BY SALARY DESC;

This SQL is portable from DB2 to other DBMSs, such as Oracle or SQL Server. And, of course, you can change the constant 10 to any number you wish, thereby retrieving the top 20, or top 5, as deemed necessary by the needs of your application.

Since the SALARY column is nullable in the EMP table, you must remove the nulls from the results set. And the ORDER BY is required to sort the results in the right order. If it is removed from the query, the results will still contain the top ten, but they will be in no particular order.

But DB2, as of V7, provides an easier and less complicated way to limit the results of a SELECT statement - the FIRST key word. You can code FETCH FIRST n ROWS which will limit the number of rows that are fetched and returned by a SELECT statement.

Additionally, you can specify a new clause -- FETCH FIRST ROW ONLY clause -- on SELECT INTO statements when the query can return more than one row in the answer set. Doing so informs DB2 to ignore the other rows.

There is one difference between the new V7 formulation and the other SELECT statement we reviewed, and that is the way "ties" are handled. A tie occurs when more than one row contains the same value. The previous query we examined may return more than 10 rows if there are multiple rows with the same value for price within the top ten.

Using the FIRST key word DB2 will limit the number of rows returned to ten, even if there are other rows with the same value for price as the number ten row in the results set. The needs of your application will dictate whether ties are to be ignored or included in the result set. If all "ties" need to be included in the results set, which would mean that more than 10 rows would be needed, the new V7 feature may not prove to be helpful.

And it is also important to note that as of DB2 9, you can include the FETCH FIRST clause in a subselect. ORDER BY is allowed in a subselect, too. The subselect MUST be enclosed in parentheses and the FETCH FIRST (or ORDER BY) cannot be in the outermost fullselect of a view, or in a materialized query table.