When writing SQL statements to access DB2 data, be sure to follow the guidelines below for coding SQL for performance. These are simple, yet important, rules to follow when writing your SQL statements. Of course, SQL performance is a complex topic, and understanding every nuance of how SQL performs can take a lifetime. That said, adhering to the following simple rules puts you on the right track to achieving high-performing DB2 applications.
1) The first rule is to
always provide only the exact columns that you need to retrieve
in the SELECT-list of each SQL SELECT statement. Another way of stating this is
“do not use SELECT *”. The shorthand SELECT * means retrieve all columns from
the table(s) being accessed. This is fine for quick and dirty queries but is
bad practice for inclusion in application programs because:
- DB2 tables may need to be changed in the future to include additional columns. SELECT * will retrieve those new columns, too, and your program may not be able to handle the additional data without time-consuming changes.
- DB2 consumes additional resources for every column that is requested to be returned. If the program does not need the data, it should not ask for it. Even if the program needs every column, it is better to ask for each column explicitly by name in the SQL statement, both for clarity and to avoid the previous pitfall. (A quick example follows this list.)
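For example, using the same EMP table that appears in the examples later in this post, the second form is the one to code in your programs:
SELECT *
FROM EMP;
SELECT EMPNO, LASTNAME, SALARY
FROM EMP;
The first statement returns every column (including any added to the table later); the second names only the columns the program actually uses.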
2) Do not ask for what
you already know. This may sound simplistic, but most programmers violate
this rule at one time or another. For a typical example, consider what is wrong
with the following SQL statement:
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010';
Give up? The problem is that EMPNO is included in the
SELECT-list. You already know that EMPNO will be equal to the value '000010'
because that is what the WHERE clause tells DB2 to do. But with EMPNO listed in the SELECT-list, DB2 will dutifully retrieve that column too. This incurs additional overhead, thereby degrading performance.
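The fix is simply to drop EMPNO from the SELECT-list; the program already knows the value, so there is no need for DB2 to return it:
SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010';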
3) Use the WHERE clause
to filter data in the SQL instead of bringing it all into your program to
filter. This too is a common rookie mistake. It is much better for DB2 to
filter the data before returning it to your program. This is so because DB2
uses additional I/O and CPU resources to obtain each row of data. The fewer
rows passed to your program, the more efficient your SQL will be. So, the
following SQL:
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE SALARY > 50000.00;
is better than simply reading all of the data without the WHERE clause and then checking each row in your program to see whether SALARY is greater than 50000.00.
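For contrast, the program-side filtering approach warned against above would look something like this, returning every row so that the program can test SALARY itself:
SELECT EMPNO, LASTNAME, SALARY
FROM EMP;
Every row of EMP comes back to the program, which must then discard the ones with SALARY of 50000.00 or less, wasting the I/O and CPU spent retrieving them.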
These rules, though, are not the be-all, end-all of SQL performance tuning – not by a long shot. Additional, in-depth tuning may be required. But following the above rules will ensure that you are not making “rookie” mistakes that can kill application performance.
3 comments:
For test purposes I very often break rule #2 running SQL in SPUFI. ;)
In quick and dirty situations I think many of us break that rule... and really, that is OK. The problems accrue when such SQL is placed into production though... in that case the SQL gets run over and over with the unneeded columns being returned over and over, thereby consuming resources that need not be consumed.
And avoid the use of cast functions, or any functions, on columns in the WHERE clause, like "DATE(timestamp_value) = CURRENT DATE". This is a common issue I fix almost every day, and removing the function gives a boost by letting DB2 use index access.
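A minimal sketch of that kind of rewrite, using a hypothetical table T with a timestamp column TS (and an index on TS); exact predicate forms can vary by DB2 version:
SELECT ID
FROM T
WHERE DATE(TS) = CURRENT DATE;
SELECT ID
FROM T
WHERE TS >= TIMESTAMP(CURRENT DATE)
  AND TS < TIMESTAMP(CURRENT DATE + 1 DAY);
The first form applies the function to the column, which typically prevents DB2 from matching on an index on TS; the second expresses the same filter as a range on the bare column, which is indexable. (T, TS, and ID are made-up names for illustration.)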