Monday, January 25, 2010

Which is better? "BETWEEN" vs "<=" and >"="

This was a recent topic on the DB2-L mailing list so I thought I'd weigh in with my two cents worth on the topic.

As with most DB2 (and, indeed, IT) issues, the correct answer is "it depends!" Let's dig a bit deeper to explain what I mean.

From a maintainability perspective, BETWEEN is probably better. The BETWEEN predicate is easier to understand and code than the equivalent combination of the less than or equal to predicate (<=) and the greater than or equal to predicate (>=). In past releases, in many cases it was more efficient, too. But today the Optimizer recognizes the two formulations as equivalent and there usually is no performance benefit one way or the other. Performance reasons aside, one BETWEEN predicate is easier to understand and maintain than multiple <= and >= predicates. For this reason, I tend to favor using BETWEEN.

But not always. Consider the scenario of comparing a host variable to two columns. Usually BETWEEN is used to compare one column to two values, here shown using host variables:

WHERE COLUMN1 BETWEEN :HOST-VAR1 AND :HOST-VAR2

However, it is possible to use BETWEEN to compare one value to two columns, as shown:

WHERE :HOST-VAR BETWEEN COLUMN1 AND COLUMN2

This statement should be changed to

WHERE :HOST_VAR >= COLUMN1 and :HOST-VAR <= COLUMN2

The reason for this exception is that a BETWEEN formulation comparing a host variable to two columns is a Stage 2 predicate, whereas the preferred formulation is Stage 1. And we all know that Stage 1 outperforms Stage 2, right?

Remember too, that SQL is flexible and often the same results can be achieved using different SQL formulations. Sometimes one SQL statement will dramatically outperform a functionally equivalent SQL statement just because it is indexable and the other is not. For example, consider this SQL statement

SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT NOT BETWEEN 'A' AND 'G';

It is not indexable because it uses the NOT BETWEEN predicate. However, if we understand the data in the table and the desired results, perhaps we can reformulate the SQL to use indexable predicates, such as

SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT >= 'H';

Or we could code MIDINIT BETWEEN 'H' AND 'Z' in place of MIDINIT >= 'H'. Of course, for either of these solutions to work correctly we would need to know that MIDINIT never contained values that collate lower than the value 'A'.

So, as usual, there is no one size fits all answer to the question!

1 comment:

Bode Museum said...

Hello Craig!!!

Loved your blog.

It was a very interesting topic which gained my attention and after reading it led me comment.

Thanks for sharing.