Sorting is a very costly operation that you should strive to avoid if at all possible.
Indexes are very useful for this purpose. DB2 can choose to use an available index
to satisfy the sorting requirements of an ORDER BY, GROUP BY, or DISTINCT clause.
Therefore, it can be beneficial to create indexes that correspond to these clauses
for frequently executed SQL statements.
For example, consider this SQL statement:
SELECT LAST_NAME, TITLE, SALARY
FROM EMPLOYEE
ORDER BY LAST_NAME, TITLE;
If an index on LAST_NAME and TITLE exists, DB2 can use it to avoid sorting. By
retrieving the data in order from the index, sorting becomes unnecessary.
You can use this information to tune your SQL further to avoid sorting. When
ordering is required only by one column, sometimes adding another column to
the ORDER BY can enable indexed access to avoid the sort.
Consider this SQL statement:
SELECT DEPT_CODE, MANAGER, DEPT_NAME,
FROM DEPARTMENT
ORDER BY MANAGER, DEPT_CODE;
Say there is a unique index on MANAGER. DB2 will probably use this index to
satisfy the request. However, a sort will be done after retrieving the data to
put the results into DEPT_CODE order within MANAGER. But, since we know
our data, we are able to change this situation. Because MANAGER is unique
we know that the following SQL statement is equivalent to the prior one:
SELECT DEPT_CODE, MANAGER, DEPT_NAME,
FROM DEPARTMENT
ORDER BY MANAGER;
In this case, DB2 can use the index on MANAGER to avoid the sort. The extra
column, DEPT_CODE, is removed from the ORDER BY clause. But, since MANAGER
is unique, there can be at most one DEPT_CODE per MANAGER. Therefore, the
sort order will be equivalent. Because we knew our data we removed a sort from
the process and made our query more efficient!
One final note on sorting: although most of you probably know this, it cannot be
stated too strongly (or frequently) - always code an ORDER BY if the order of the
results of your query is important. The ORDER BY clause is the only foolproof
method of ensuring appropriate sort order of query results. Simply verifying that
access is via an index is not sufficient to ensure the order of the results because:
- The access path may change in the future to non-indexed access.
- Even indexed access does not always return ordered results
(e.g. list prefetch).
UNION VERSUS UNION ALL
Another simple, yet effective, means of enhancing SQL performance is to
understand how UNION, EXCEPT and INTERSECT work. Let's start with UNION
because it has been around in DB2 the longest. UNION takes the results of
multiple SELECT statements and combines them together. It does this, as part
of the UNION operation, by sorting the results and removing any duplicates.
UNION ALL, on the other hand, will not sort and will not remove duplicates.
If you know that the results of the queries being unioned together are distinct
(that is, no duplicates will be encountered), then you can use UNION ALL instead
of UNION, and thereby enhance performance by avoiding the sort. Additionally,
if you do not care whether duplicates are returned, always use UNION ALL.
The same applies to the EXCEPT and INTERSECT operations, which were added
as of DB2 9 for z/OS.