When it comes to assuring optimal performance of Db2
applications, coding properly formulated SQL is an imperative. Most experts agree that poorly coded SQL and application code is the cause of most
performance problems – perhaps as high as 80% of poor relational performance is
caused by “bad” SQL and application code.
But writing efficient SQL statements can be a tricky
proposition. This is especially so for programmers and developers new to a
relational database environment. So, before we delve into the specifics of
coding SQL for performance, it is best that we take a few moments to review SQL
basics.
SQL, an acronym for Structured Query Language, is a powerful
tool for manipulating data. It is the de facto standard query language for
relational database management systems and is used not just by Db2, but also by
the other leading RDBMS products such as Oracle, Sybase, and Microsoft SQL
Server.
SQL is a high-level language that provides a greater degree
of abstraction than do procedural languages. Most programming languages require
that the programmer navigate data structures. This means that program logic
needs to be coded to proceed record-by-record through data elements in an order
determined by the application programmer or systems analyst. This information
is encoded in the program logic and is difficult to change after it has been programmed.
SQL, on the other hand, is fashioned so that the programmer
can specify what data is needed, and not how to
retrieve it. SQL is coded without embedded data-navigational instructions. Db2
analyzes the SQL and formulates data-navigational instructions "behind the
scenes." These data-navigational instructions are called access paths. By
having the DBMS determine the optimal access path to the data, a heavy burden
is removed from the programmer. In addition, the database can have a better
understanding of the state of the data it stores, and thereby can produce a
more efficient and dynamic access path to the data. The result is that SQL,
used properly, can provide for quicker application development.
Another feature of SQL is that it is not merely a query
language. The same language used to query data is used also to define data
structures, control access to the data, and insert, modify, and delete
occurrences of the data. This consolidation of functions into a single language
eases communication between different types of users. DBAs, systems
programmers, application programmers, systems analysts, and end users all speak
a common language: SQL. When all the participants in a project are speaking the
same language, a synergy is created that can reduce overall system-development
time.
Arguably, though, the single most important feature of SQL
that has solidified its success is its capability to retrieve data easily using
English-like syntax. It is much easier to understand the following than it is
to understand pages and pages of program source code.
SELECT LASTNAME
FROM EMP
WHERE
EMPNO = '000010';
Think about it; when accessing data from a file the
programmer would have to code instructions to open the file, start a loop, read
a record, check to see if the EMPNO field equals the proper value, check for
end of file, go back to the beginning of the loop, and so on.
SQL is, by nature, quite flexible. It uses a free-form
structure that gives the user the ability to develop SQL statements in a way
best suited to the given user. Each SQL request is parsed by the DBMS before
execution to check for proper syntax and to optimize the request. Therefore,
SQL statements do not need to start in any given column and can be strung
together on one line or broken apart on several lines. For example, the
following SQL statement is equivalent to the previously listed SQL statement:
SELECT LASTNAME
FROM EMP WHERE EMPNO = '000010';
Another flexible feature of SQL is that a single request can
be formulated in a number of different and functionally equivalent ways. One
example of this SQL capability is that it can join tables or nest queries. A
nested query always can be converted to an equivalent join. Other examples of
this flexibility can be seen in the vast array of functions and predicates.
Examples of features with equivalent functionality are:
· BETWEEN versus <=
/ >=
· IN versus a series of
predicates tied together with AND
· INNER JOIN versus
tables strung together in the FROM clause separated by commas
· OUTER JOIN versus a
simple SELECT, with a UNION, and a correlated subselect
· CASE expressions
versus UNION ALL statements
This flexibility
exhibited by SQL is not always desirable as different but equivalent SQL
formulations can result in extremely differing performance. The ramifications
of this flexibility are discussed later in this paper with guidelines for
developing efficient SQL.
As mentioned, SQL
specifies what data to retrieve or manipulate, but does not specify how you
accomplish these tasks. This keeps SQL intrinsically simple. If you can
remember the set-at-a-time orientation of a relational database, you will begin
to grasp the essence and nature of SQL. A single SQL statement can act upon
multiple rows. The capability to act on a set of data coupled with the lack of
need for establishing how to retrieve and manipulate data defines SQL as a
non-procedural language.
Because SQL is a
non-procedural language a single statement can take the place of a series of
procedures. Again, this is possible because SQL uses set-level processing and
DB2 optimizes the query to determine the data-navigation logic. Sometimes one
or two SQL statements can accomplish tasks that otherwise would require entire
procedural programs to do.
High-Level SQL Coding Guidelines
When you are writing
your SQL statements to access Db2 data be sure to follow the subsequent
guidelines for coding SQL for performance. These are certain very simple, yet
important rules to follow when writing your SQL statements. Of course, SQL
performance is a complex topic and to understand every nuance of how SQL
performs can take a lifetime. That said, adhering to the following simple rules
puts you on the right track to achieving high-performing Db2 applications.
1) The first rule is to
always provide only the exact columns that you need to retrieve
in the SELECT-list of each SQL SELECT statement. Another way of stating this is
“do not use SELECT *”. The shorthand SELECT * means retrieve all columns from
the table(s) being accessed. This is fine for quick and dirty queries but is
bad practice for inclusion in application programs because:
· Db2 tables may need
to be changed in the future to include additional columns. SELECT * will
retrieve those new columns, too, and your program may not be capable of
handling the additional data without requiring time-consuming changes.
· Db2 will consume
additional resources for every column that requested to be returned. If the
program does not need the data, it should not ask for it. Even if the program
needs every column, it is better to explicitly ask for each column by name in
the SQL statement for clarity and to avoid the previous pitfall.
2)
Do not ask for what you already know. This may sound
simplistic, but most programmers violate this rule at one time or another. For
a typical example, consider what is wrong with the following SQL statement:
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE
EMPNO = '000010';
Give up? The problem
is that EMPNO is included in the SELECT-list. You already know that EMPNO will
be equal to the value '000010' because that is what the WHERE clause tells DB2
to do. But with EMPNO listed in the WHERE clause Db2 will dutifully retrieve
that column too. This causes additional overhead to be incurred thereby
degrading performance.
3) Use the WHERE clause
to filter data in the SQL instead of bringing it all into your program to
filter. This too is a common rookie mistake. It is much better for Db2 to
filter the data before returning it to your program. This is so because Db2
uses additional I/O and CPU resources to obtain each row of data. The fewer
rows passed to your program, the more efficient your SQL will be. So, the
following SQL
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE
SALARY > 50000.00;
Is better than simply reading all of the data without the
WHERE clause and then checking each row to see if the SALARY is greater than
50000.00 in your program.
These rules, though,
are not the be-all, end-all of SQL performance tuning – not by a long shot.
Additional, in-depth tuning may be required. But following the above rules will
ensure that you are not making “rookie” mistakes that can kill application performance.
In Closing
This short blog post is just the very beginning of SQL performance for Db2 programmers. Indeed, I wrote a book on the topic called A Guide to Db2 Performance for Application Developers, so check that out if this post has whetted your appetite for more Db2 performance tips... and if you are a more visual learner, I have also partnered with Interskill Learning for a series of videos in their Expert Video series on the topic of Coding Db2 Applications for Performance. So, why wait, dig in to a book, some videos, or both, to help improve the performance of your Db2 applications!