Did you know that the order in which you code your
predicates can
have an impact on query performance? It is usually a minimal impact,
but it may buy you a couple of microseconds for a very performance-critical
query. In order to use predicate ordering to your
advantage however, you need to be armed with some basic
information on how DB2 evaluates predicates as it processes
your SQL.
So,
before we continue, let's review the order in which DB2 evaluates predicates at execution time. DB2 will evaluate indexable predicates
first: matching predicates before non-matching. Then, Stage
1 predicates, and finally Stage 2 predicates. Within each of these
four groups, DB2 will evaluate equal predicates, then BETWEEN
and NOT NULL predicates, and finally, any other predicates.
If more than one predicate exists within a group, then DB2
will evaluate them in the physical order in which they are coded
in the SQL statement.
The
re-ordering of predicates to take advantage of this situation should
be considered only as a last resort. When implemented, the technique
will usually shave only a little bit from the query's
execution time. It is also important to note that predicate order
will not impact a query's access path: it will remain unchanged (as shown in the PLAN_TABLE).
Now,
how can we use this to our advantage?
Consider the following query:
SELECT
LAST_NAME, FIRST_NAME
FROM
EMPLOYEE
WHERE
SEX = 'M'
AND
TITLE = 'MANAGER'
For
the purposes of this discussion, no index exists for either of the
columns coded in the predicates. They are therefore the same type:
stage 1 and equal predicates. Furthermore, we know our data
- in our organization, there is approximately a 50-50 split between
males and females, and 15% of all employees are managers.
To
optimize this query then, we can swap the two predicates to achieve
better performance. So the query becomes:
SELECT
LAST_NAME, FIRST_NAME
FROM
EMPLOYEE
WHERE
TITLE = 'MANAGER'
AND
SEX = 'M'
Why
should this query outperform the previous version? Well, assume
we have 100,000 employees. If DB2 retrieves 50% of the rows
(SEX = 'M') and then retrieves 15% of those 50%, we will have
processed 57,500 rows:
(
100000 * 0.5 ) + ( ( 100000 * 0.5 ) * 0.15 ) = 57,500
But,
if instead, DB2 were to retrieve 15% of the rows (TITLE = 'MANAGER')
and then 50% of those, we will have processed only
22,500 rows:
(
100000 * 0.15) + ( ( 100000 * 0.15 ) * 0.5 ) = 22,500
Obviously,
it is better for fewer rows to qualify early, thereby reducing
the answer set and the number of rows that will have to be
subsequently scanned.
No comments:
Post a Comment