BIND
and REBIND are crucially important components as you attempt to assure efficient DB2 applications. Because
the BIND/REBIND process determines exactly how your DB2 data is accessed it is important
that you develop an appropriate strategy for when and how to REBIND your
programs.
There are several common REBIND approaches taken by DB2 users.
By far, the best approach is to REBIND your applications over time as the data
changes. This approach involves some form of regular maintenance that keeps DB2
statistics up to date and formulates new access paths as data volumes and
patterns change.
Other approaches include REBINDing only when a new version
of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied
to DB2. Another approach is to REBIND automatically after a regular period of
time (days, weeks, months, etc.). This approach can work if the period of time
is wisely chosen based on the application data – but it still can pose administrative
issues.
The final approach can be summarized as “if it ain’t broke
don’t fix it!” This is the worst of the several approaches discussed here. The
biggest problem with this approach is that you are penalizing every
program in your subsystem for fear that a program or two may have a degraded
access path. This results in potentially many programs having sub-optimal
performance because the optimizer never gets a chance to create better access
paths as the data changes.
Of course, the possibility of degraded performance is real –
and that is why this approach has been adopted at some sites. The problem is
being able to find which statements have degraded. In an ideal world we would
be to be able to review the access path changes beforehand to determine if they
are better or worse. But DB2 itself does not provide any systematic method of
administering access paths that way. There are third party tools that can help
you achieve this though.
Anyway, let’s go back to the best approach again, and that is
to REBIND on a regular basis as your data changes. This approach has become
known as the Three Rs. To implement this approach you:
- Regularly
reorganize the data to ensure that it is optimally structured.
- Follow
that with RUNSTATS to be sure that the reorganized state of the data is
reflected in the DB2 Catalog.
- And
follow that with a REBIND for all the application programs that access the
data structures impacted by the REORG and RUNSTATS.
At any rate, your goal should be to keep your access paths
up-to-date with the current state of your data. Failing to do this means that
DB2 is accessing data based upon false assumptions. DB2 is unlikely to make the
same access path choice as your data grows – and as patterns within the data
change.
By REBINDing you can generally improve the overall
performance of your applications because the access paths will be better
designed based on an accurate view of the data. Additionally, as DB2 changes
are introduced (PTFs, new version/release) optimizer improvements and new
access techniques can be incorporated into the access paths. That is, if you
never REBIND, not only are you forgoing better access paths due to data changes
but you are also forgoing better access paths due to changes to DB2 itself.
Adopting the Three R’s approach can pose additional questions.
For example, when should you reorganize? In order to properly determine when a
REORG is needed you’ll have to look at statistics. This means looking at either
RUNSTATS or Real-Time Statistics (RTS). So, perhaps it should be at least 4 R’s
– in other words:
- RTS (or RUNSTATS)
- REORG
- RUNSTATS
- REBIND
Now it is true that some folks don’t rely on statistics to
schedule a REORG. Instead, they just build the JCL to REORG their database
objects when they create the object. So they create a table space then build
the REORG job and schedule it to run monthly, or quarterly, or on some regular
basis. This is better than no REORG at all, but it is probably not the best
approach because you are most likely either reorganizing too soon (in which
case you waste the CPU cycles to do the REORG) or you are reorganizing too late
(in which case performance is suffering for a period of time before the REORG
runs). Better to base your REORGs off of statistics and thresholds using either
RUNSTATS or RTS.
Without accurate statistics there is little hope that the
optimizer will formulate the best access path to retrieve your data. If the
optimizer does not have accurate information on the size, organization, and
particulars of your data then it will be creating access paths based on either
default or inaccurate statistics. Incorrect statistics will cause bad choices
to be made – such as choosing a merge-scan join when a nested loop join would
be better, or failure to invoke sequential prefetch, or using the wrong index –
or no index at all. And the problem of inaccurate statistics is pervasive.
There are shops out there that never, or rarely, run RUNSTATS to gather
up-to-date statistics. Make sure yours is not one of those shops!
When should you run RUNSTATS? One answer is “As frequently
as possible based on how often your data changes.” To do this you will need to
know a thing or two about your data growth patterns: what is its make-up, how
is it used, how fast does it grow, and how often does it change? These patterns
will differ for every table space in your system.
Next we need to decide when to REBIND? The best answer for
this is when statistics have changed significantly enough to change access
paths. When we know that data has significantly changed it makes sense to REBIND
after the RUNSTATS completes. But the trick is determining exactly when we have
a “significant” change in our data. Without an automated method of comparing
and contrasting statistics (or even better yet, access paths) coming up with an
answer in a manual way can be time-consuming and error-prone – especially if we
have thousands of DB2 programs to manage.
As we REBIND, we always must be on alert for rogue access
paths. A rogue access path is created when the optimizer formulates a new
access path that performs worse than the previous access path. This can happen
for a variety of reasons. Of course, number one is that the optimizer, though
good, is not perfect. So mistakes can happen. Other factors can cause degraded
access paths, too. The access paths for volatile tables depend on when you run
the RUNSTATS. Volatile tables are those that start out empty, get rows added to
them during processing, and are emptied out at the end of the day. And, of
course, if the catalog or statistics are not accurate we can get problems, too.
So adopting the Four Rs approach implies that you will have
to develop a methodology for reviewing your access paths and taking care of any
“potential” problem access paths. Indeed, the Four Rs becomes The Five Rs as we
add a step to review the access paths after REBINDing to make sure that there
are no rogue access paths:
- Start
with Real Time Stats (or RUNSTATS) to determine when to REORG.
- Then we Reorganize
the needy table spaces (and indexes)
- After reorganizing,
run RUNSTATS to pick up current statistics,
- Follow
that with the BEBINDs.
- Then we
need that fifth R – which is to Review the access paths generated by the
REBIND.
The review is of utmost importance because the optimizer can
make mistakes. And, of course, so can you. But your users will not call you
when performance is better (or the same). They only dial your numbers when performance
gets worse. As such, proactive shops will put best practices in place to test REBIND
results comparing the before and after impact of the optimizer’s choices.