One of
the constant struggles with Db2 development, and indeed developing for any relational DBMS, is designing and creating appropriate indexes. Perhaps the most important thing you can do to assure
optimal application performance when accessing data in a relational/SQL
database is to create correct indexes for your tables based on the queries your
applications use. Of course, this is easier said than done.
But we
can start with some basics. For example, consider this SQL statement:
SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010'
AND DEPTNO = 'D01';
|
What
index (or indexes) would it make sense to build for this simple query? First, think about all
the possible indexes that you could create. Your first short-list probably
looks something like this:
- Index1
on EMPNO
- Index2
on DEPTNO
- Index3
on EMPNO and DEPTNO
This is
a good start, and Index3 is likely to be the best of the lot. It enables the optimizer to use the index to immediately look up the row or rows that satisfy the two
simple predicates in the WHERE clause. Of course, if you already have a lot of
indexes on the EMP table, you might want to examine the impact of creating yet
another index on the table.
With
the caveat that appropriate index creation can be complicated, let’s look at
the Top 10 things you can do to build the right indexes on your tables:
1. Index by workload, not by object
Many
people make the mistake of just guessing at some indexes to create when they
are creating other database objects (like tables and tablespaces). But without an
idea of how the tables are going to be accessed these guesses are
usually wrong – at least some of them.
Indexes
should be built to optimize the access of your SQL queries. To properly create
an optimal set of indexes requires a list of the SQL to be used, an estimate of
the frequency that each SQL statement will be executed, and the importance of
each query. Only then can the delicate balancing act of creating the right
indexes to optimize the right queries most of the time be made.
If you
are doing it any other way, you are doing it wrong.
2. Build indexes based on predicates
3. Index most-heavily used queries
Numbers
2 and 3 can be thought of as corollaries to Number 1… that is, these are the
aspects of application workload that need to be examined to produce appropriate
and effective indexes.
Look at the predicates of the queries in the workload and work to create a set of indexes that match up to most (if not all) of them. When it is not practical to build all of the indexes you have identified, then it makes sense to look at the queries that will be used most often and optimize them first.
4. Index important queries
The
more important the query, the more you might want to tune by index creation. If
you are coding a query that the CIO will run every day, you want to make sure
it delivers optimal performance. So building indexes for that particular query
is important. On the other hand, a query for a clerk might not necessarily be
weighted as high, so that query might have to make do with the indexes that
already exist.
Of course, the decision should depend on the application’s importance
to the business - not just on the user’s importance. For example, if that clerk runs a query hundreds of times a day and the CIO runs his query only once, then perhaps the clerk's query is more important.
5. Index to avoid sorting (GROUP BY, ORDER BY)
In
addition to building indexes to optimize data access, indexes can be used to
avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which
can cause performance slowdowns. By indexing on the columns specified in these
clauses, the optimizer can use an index to avoid a sort, and thereby
potentially improve performance.
6. Create indexes for uniqueness (PK, U)
Some
indexes are required in order to make the database schema valid. Most Database
systems require that unique indexes be created when unique and primary key
constraints exist.
7. Create indexes for foreign keys
Even if
not required, creating indexes for each foreign key can optimize the
performance when accessing and enforcing referential constraints (RI –
referential integrity).
8. Consider adding columns for index-only access
Sometimes
it can be advantageous to include additional columns in an index to increase
the chances of index-only access. With index-only access, all of the data needed
to satisfy the query can be found in the index alone — without having to read
data from the tablespace.
For
example, suppose that there is an index on the DEPTNO column of the DEPT table.
The following query may use this index:
SELECT DEPTNAME FROM DEPT WHERE DEPTNO =
'D01';
|
The
index could be used to access only those columns with a DEPTNO greater
than D00, but then the DBMS would need to access the data in the table space to
return the DEPTNAME. If you added DEPTNAME to the index, that is, create the
index on (DEPTNO, DEPTNAME) then all of the data needed for this query exists
in the index and additional I/O to the table space would not be needed.
This technique is sometimes referred to as index overloading.
Of
course, this is not always a good idea. You have to take into account whether
other queries use the index and how it might negatively impact their
performance.
It is also worth mentioning index include columns, which allows you to define additional (included) columns that are not part of the actual key but are available in the index. So we can create an index like this:
CREATE INDEX IXNAME
ON DEPT ( DEPTNO )
INCLUDE ( DEPTNAME )
The index key is only on the DEPTNO column, but the DEPTNAME is also included in the index, so the earlier query can be satisfied using index-only access with this index.
9. Don’t arbitrarily limit the number of indexes
An
example of a bad standard would be to place an artificial
limit on indexing. There should be no arbitrary limit on
the number of indexes that you can create for any database table. Indexes
are undoubtedly one of the most important factors in creating efficient
queries. Relational optimizers rely on indexes to build fast access paths to
data. Without indexes data must be scanned – and that can be a long,
inefficient means by which to retrieve your data. When a rule such as this
exists, it usually is stated in the standards manual using verbiage something
like “Each table can have at most five indexes created for
it” — or — “Do not create more than three indexes for any
single table in the database.” These are bad standards.
If
you already have three indexes, or five indexes, or even 57 indexes, and
another index will improve performance why would you arbitrarily want to avoid
creating that index?
Anyway,
a good indexing standard, if you choose to have one, should read something like
this: “Create indexes as necessary to support your database queries.
Limitations on creating new indexes should only be entertained when they begin
significantly to impede the efficiency of data modification.”
Which
brings us to…
10. Be aware of I/U/D implications
Db2 must automatically maintain every index you create. This means every
INSERT and every DELETE to an indexed table will insert and delete not just
from the table, but also from its indexes.
Additionally,
when you UPDATE the value of a column that has been defined in an index, the
DBMS must also update the index. So, indexes speed the process of retrieval but
slow down modification.
So the general rule of thumb should be "Index until it hurts... and then back off the least effective index to make it no longer hurt." Sure, that is easier said than done, but it is a valid philosophy to follow.
Summary
Following
these Top 10 index design techniques can go a long way toward improving not
just your index usage, but also the performance of your database applications.