Tuesday, March 15, 2016

The Most Misunderstood Features of DB2 – Part 6: Not Indexing

Welcome to our on-going series of misunderstood issues in the world of DB2 for z/OS. Today’s topic is indexing, or to be more precise: not indexing. What do I mean by that?

Well, I’ve heard folks say that you should always create at least one index on every DB2 table. And while I can empathize with that general notion, I can’t agree completely because of that word “always,” which almost always makes a sentence wrong.

Sure, there are several good arguments for creating indexes on DB2 tables. Perhaps the most important one is clustering. DB2 uses an index to control how data is clustered on disk. Only one index can be specified as the clustering index (because, of course, the data on disk can only be stored in one order, right!). If you do not specify a clustering index then DB2 will use the earliest created (oldest) index to cluster the data. So it is usually a good idea to create an index to guide clustering.

Another popular reason to create an index on a DB2 table is to enforce uniqueness for a UNIQUE constraint or PRIMARY KEY. The only way to enforce uniqueness on a DB2 column (or set of columns) is by using a unique index.

Of course, there are a lot of other good reasons to create indexes, most of them to improve query performance. But I contend that there are situations when it makes sense not to create any indexes at all.

So when does it make more sense not to build an index for a DB2 table?

Let's start by saying that most of the time you will want to build at least one - and probably multiple - indexes on each table that you create. Indexes are crucial for optimizing performance of SQL access. Without an index, queries must scan every row of the table to come up with a result. And that can be very slow.

Having said that, here are a few situations some times where it can make sense to have no indexes defined on a table:

When all (or most) accesses retrieve every row of the table. Because every row will be retrieved every time you want to use the table, an index (if used) would just add extra I/O and would diminish, not enhance performance. Though not extremely common, you may indeed come across such tables in your organization.
For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 20 to 30 or so pages) might not need an index because simply reading all of the pages is very efficient already.
When performance doesn't matter and the table is only accessed very infrequently. But, hey, when do you ever have that type of requirement in the real world?

Other than for these circumstances, you will most likely want to build one or more indexes on each table, not only to optimize performance, but also to ensure uniqueness, to support referential integrity, and to drive data clustering.

Of course, indexes do not come without cost. Indexes take up disk space and adding a lot of indexes will consume disk space. An additional consideration is their impact in data modification. Although indexes speed up queries they degrade inserts and deletes, as well as any modification to indexed columns.

What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!


Vibin Varghese said...

Hello Craig,
If the table is not large enough, wouldn't the Optimizer by itself chose an Scan instead of Index to avoid the extra IOs?
With the new theme of more memory to enhance performance in V12, perhaps an "in memory" table is another scenario, where you would't need an Index?


Craig S. Mullins said...

Yes, this is kind of a modification of the section above stating: "For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 20 to 30 or so pages) might not need an index because simply reading all of the pages is very efficient already."

Your point, that even tables that are larger than this are being chosen by the optimizer for scans because of memory availability is a good one. Thanks!

Craig S. Mullins said...

And yes, if you pin a table in-memory then it would not usually need an index either.

Ar said...

Hi Craig,

For a table with one row without index - I lock the row with read stability(RS) and later UPDATE it. The SELECT portion, lets say 13k records, is taking time (say 40 secs) but the UPDATE takes hardly any - less than a second. Why is this so? When I remove the RS on the SELECT and change it to WITH UR, I see the UPDATE now taking more time than the SELECT.
Is this to do with the RS lock? Is there no way to make the SELECT + UPDATE consume less time?
The program (CICS-DB2) is bound with CS Isolation. I do a SYNCPOINT for every 1 record processed.

The locking of the row is mandatory in my case before UPDATE. I tried the WITH UR on the SELECT just to see the impact...Of course SELECT took less time but UPDATE shot up. But if I do the WITH RS on SELECT, the UPDATE time comes down but SELECT compensated by going upwards.

Craig S. Mullins said...

Well, if you are going to modify the data it must be locked. When you use WITH UR there are no locks taken. That is why SELECT WITH UR is so fast. Are you doing this using a cursor in a program? If so be sure to specify FOR UPDATE OF.