Welcome to our
on-going series of misunderstood issues in the world of DB2 for z/OS. Today’s
topic is indexing, or to be more precise: not indexing. What do I mean by that?
Well, I’ve heard folks
say that you should always create at least one index on every DB2 table. And
while I can empathize with that general notion, I can’t agree completely
because of that word “always,” which almost always makes a sentence wrong.
Sure, there are
several good arguments for creating indexes on DB2 tables. Perhaps the most
important one is clustering. DB2 uses an index to control how data is clustered
on disk. Only one index can be specified as the clustering index (because, of
course, the data on disk can only be stored in one order, right!). If you do
not specify a clustering index then DB2 will use the earliest created (oldest)
index to cluster the data. So it is usually a good idea to create an index to
guide clustering.
Another popular reason
to create an index on a DB2 table is to enforce uniqueness for a UNIQUE
constraint or PRIMARY KEY. The only way to enforce uniqueness on a DB2 column
(or set of columns) is by using a unique index.
Of course, there are a
lot of other good reasons to create indexes, most of them to improve query
performance. But I contend that there are situations when it makes sense not to
create any indexes at all.
So when does it make more sense not to build an index for a DB2 table?
Let's
start by saying that most of the time you will want to build at least one - and
probably multiple - indexes on each table that you create. Indexes are crucial
for optimizing performance of SQL access. Without an index, queries must scan
every row of the table to come up with a result. And that can be very slow.
Having
said that, here are a few situations some times where it can make sense to have
no indexes defined on a table:
When all (or most) accesses retrieve every row
of the table. Because every row will be retrieved every time you want to use
the table, an index (if used) would just add extra I/O and would diminish, not
enhance performance. Though not extremely common, you may indeed come across
such tables in your organization.
For a very small table with only a few pages of data and no
primary key or uniqueness requirements. A very small table (perhaps 20 to 30 or
so pages) might not need an index because simply reading all of the pages is
very efficient already.
When performance doesn't matter and the table is only accessed
very infrequently. But, hey, when do you ever
have that type of requirement in the real world?
Other
than for these circumstances, you will most likely want to build one or more
indexes on each table, not only to optimize performance, but also to ensure
uniqueness, to support referential integrity, and to drive data clustering.
Of
course, indexes do not come without cost. Indexes take up disk space and adding
a lot of indexes will consume disk space. An additional consideration is their
impact in data modification. Although indexes speed up queries they degrade
inserts and deletes, as well as any modification to indexed columns.
What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!
What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!
Hello Craig,
ReplyDeleteIf the table is not large enough, wouldn't the Optimizer by itself chose an Scan instead of Index to avoid the extra IOs?
With the new theme of more memory to enhance performance in V12, perhaps an "in memory" table is another scenario, where you would't need an Index?
Thanks,
Vibin
Yes, this is kind of a modification of the section above stating: "For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 20 to 30 or so pages) might not need an index because simply reading all of the pages is very efficient already."
ReplyDeleteYour point, that even tables that are larger than this are being chosen by the optimizer for scans because of memory availability is a good one. Thanks!
And yes, if you pin a table in-memory then it would not usually need an index either.
ReplyDeleteHi Craig,
ReplyDeleteFor a table with one row without index - I lock the row with read stability(RS) and later UPDATE it. The SELECT portion, lets say 13k records, is taking time (say 40 secs) but the UPDATE takes hardly any - less than a second. Why is this so? When I remove the RS on the SELECT and change it to WITH UR, I see the UPDATE now taking more time than the SELECT.
Is this to do with the RS lock? Is there no way to make the SELECT + UPDATE consume less time?
The program (CICS-DB2) is bound with CS Isolation. I do a SYNCPOINT for every 1 record processed.
The locking of the row is mandatory in my case before UPDATE. I tried the WITH UR on the SELECT just to see the impact...Of course SELECT took less time but UPDATE shot up. But if I do the WITH RS on SELECT, the UPDATE time comes down but SELECT compensated by going upwards.
Well, if you are going to modify the data it must be locked. When you use WITH UR there are no locks taken. That is why SELECT WITH UR is so fast. Are you doing this using a cursor in a program? If so be sure to specify FOR UPDATE OF.
ReplyDelete