Applications that access databases are only as good as the
performance they achieve.
And every user wants their software to run as fast as possible. As such,
performance tuning and management is one of the biggest demands on the DBA’s
time. When asked what is the single most important or stressful aspect of their
job, DBAs typically respond “assuring optimal performance.” Indeed, a recent
Forrester Research survey indicates that “performance and troubleshooting” tops
the list of most challenging DBA tasks.
But when you are dealing with data in a database management
system, there are multiple interacting components that must be managed and tuned
to achieve optimal performance. That is, every database application, at its core,
requires three components to operate:
- the system (that is, the DBMS itself, the network, and the O/S),
- the database (that is, the DDL and database schema), and
- the application (that is, the SQL and program logic).
Each of these components requires care and attention, but
today I want to focus on the high-level aspects of performance management from
the perspective of the application. Furthermore, I will discuss this in terms
of DB2 for z/OS.
So where do we begin? For DB2, a logical starting point is
with BIND Parameters. There are many parameters and values that must be
chosen and specified when you bind a DB2 application program. The vast
array of options at our disposal can render the whole process extremely
confusing – especially if you don’t bind on a daily basis. And even if you do,
some of the options still might be confusing if you rarely have to change them.
You know what I’m talking about: parameters like ACQUIRE, RELEASE, VALIDATE,
and DEGREE.
I will not delve into the myriad bind options, as there are many articles and books, as well as the
IBM DB2 manuals, that you can use to guide you along that path. Suffice it to
say that there are some standard parameters and values that should be chosen “most
of the time” in certain situations. As such, a wise DBA group will set up
canned routines for the programmers to use for compiling and binding their
applications. Choices such as “CICS transaction”, “DB2 batch”, or “analytical
query” can be presented to the developer and then, based on the type of
program and environment selected, the canned script can choose the proper
bind options. Doing so can greatly diminish the
problems that can be encountered when the “wrong” parameters or values are
chosen at bind time.
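To make this concrete, a canned “CICS transaction” routine might issue a bind along these lines. The collection and member names here are hypothetical, and the specific option values are illustrative defaults for a sketch, not a one-size-fits-all recommendation:

```
BIND PACKAGE(CICSCOLL) MEMBER(TXNPGM1) ACTION(REPLACE) -
     ISOLATION(CS) CURRENTDATA(NO) RELEASE(COMMIT)     -
     VALIDATE(BIND) DEGREE(1) EXPLAIN(YES)
```

A “DB2 batch” or “analytical query” variant of the script would differ in choices such as ISOLATION or DEGREE(ANY) for query parallelism; the point is that the developer picks the category and the script supplies the values.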
This same process can be put in place for production binding
to ensure that the appropriate parameters and values are chosen. This is
especially useful when the binds are not done by a DBA, but are automated in production
or done by a less-experienced change control clerk.
Of course, there should always be a method for overriding
the “standard” values for special situations, although these overrides should
not be available to anyone other than a well-trained individual (DBA or
otherwise).
I want to make one small exception here regarding advice on
bind parameters, and that is the EXPLAIN parameter. In production, always bind
your plans and packages specifying EXPLAIN YES. Failing to do so means that
access paths will be generated, but you will not know what they are. This is
akin to blinding yourself to what DB2 is doing and is not advisable.
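If you want to verify compliance, the DB2 catalog records whether each package was bound with EXPLAIN. A simple check along these lines can flag offenders (exact column availability can vary by DB2 version):

```sql
-- Packages bound without EXPLAIN(YES)
SELECT COLLID, NAME, VERSION
  FROM SYSIBM.SYSPACKAGE
 WHERE EXPLAIN = 'N';
```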
Access Path Management
Bind and Rebind are important components of achieving optimal DB2 application
performance because they determine the access paths to the data accessed by
your program. As such, it is vital that you develop an appropriate
strategy for when and how to Rebind your programs.
There
are several common approaches taken by DB2 users. By far, the best approach is
to Rebind your applications over time as the data changes. This approach
involves some form of regular maintenance that keeps DB2 statistics up to date
and formulates new access paths as data volumes and patterns change. More on
this in a moment.
Other approaches include Rebinding only when a new version of DB2 is installed
or, perhaps more ambitiously, whenever new PTFs are applied to DB2. Another approach
is to rebind automatically after a regular period of time, whether it is days,
weeks, months, or whatever period of time you deem significant. This approach
can work if the period of time is wisely chosen based on the application data –
but it still can pose significant administrative issues.
The final approach comes from the “if it ain’t broke don’t fix it” school of thought.
It is the worst of the approaches discussed here because it penalizes every
program out of fear that a single program (or two) might experience a degraded
access path. Yet the possibility of degraded performance is real, which is why
this approach has been adopted at some sites. The problem is being able to find
which statements may have gotten worse. Ideally, you would review the access
path changes beforehand to determine whether they are better or worse, but DB2
itself does not provide any systematic method of administering access paths
that way. There are third-party tools that can help you achieve this, though.
Anyway, let’s go back to the best approach: Rebind regularly as
your data changes. This involves what is known as the three Rs: REORG,
RUNSTATS, and Rebind. Your goal should be to keep your access
paths up-to-date with the current state of your data. Failing to do this means
that DB2 is accessing data based upon false assumptions.
By
Rebinding you will generally improve the overall performance of your
applications because the access paths will be better designed based on an
accurate view of the data. And as you apply changes to DB2 (new releases/PTFs) optimizer
improvements and new access techniques can be used. If you never Rebind, not
only are you forgoing better access paths due to data changes but you are also
forgoing better access paths due to changes to DB2 itself.
To adopt the three Rs you need to determine when to REORG. This means looking at
either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need at least four Rs:
- Real Time Statistics (or RUNSTATS)
- REORG
- RUNSTATS
- REBIND
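As a sketch of that first R, Real-Time Statistics can be queried to find REORG candidates. The 20 percent change threshold below is an assumption for illustration only, not an IBM recommendation; tune it for your own data:

```sql
-- Table spaces where changes since the last REORG exceed 20% of rows
SELECT DBNAME, NAME, PARTITION,
       REORGINSERTS + REORGDELETES + REORGUPDATES AS CHANGES,
       TOTALROWS
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE TOTALROWS > 0
   AND (REORGINSERTS + REORGDELETES + REORGUPDATES) > 0.2 * TOTALROWS;
```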
But is this enough? Probably not, because we need to review the access paths after rebinding
to make sure that there are no rogue access paths. So, let’s add another “R” –
to Review the access paths generated by the REBIND. As we mentioned, the
optimizer can make mistakes. And, of course, so can you. Users don't call you
when performance is better (or the same). But if performance gets worse, you
can bet on getting a call from irate users.
So we
need to put in place best practices whereby we test Rebind results to compare
the before and after impact of the optimizer’s choices. Only then can we assure
that we are achieving optimal DB2 application performance.
Tuning The Code Itself
Of
course, everything we’ve discussed so far assumes that the code is written
efficiently to begin with – and that is a big assumption. We also need to make
sure that we are implementing efficient application code. The application code
consists of two parts: the SQL code and the host language code in which the SQL
is embedded.
SQL is
simple to learn and easy to start using. But SQL tuning and optimization is an
art that takes years to master. Some general rules of thumb for creating
efficient SQL statements include:
- Let SQL do the work instead of the application program. For
example, code an SQL join instead of two cursors and a programmatic join.
- Simpler is generally better, but complex SQL can be very
efficient.
- Retrieve only the columns required, never more.
- Retrieve the absolute minimum number of rows by coding every
appropriate predicate in the WHERE clause.
- When joining tables, always provide join predicates. In other
words, avoid Cartesian products.
- Favor using Stage 1 and Indexable predicates.
- Avoid sorting if possible by creating indexes for ORDER BY, GROUP
BY, and DISTINCT operations.
- Avoid black boxes – that is, avoid I/O routines that are called by
programs instead of using embedded SQL.
- Avoid deadlocks by updating tables in the same
sequence in every program.
- Issue data modification statements (INSERT, UPDATE, DELETE) as
close to the COMMIT statement as possible.
- Be sure to build a COMMIT strategy into every batch program that
changes data. Failing to COMMIT can cause locking problems.
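As an example of the first rule of thumb, a single SQL join replaces an outer cursor on one table and a programmatic inner loop on the other. The table and column names below are modeled on the familiar DB2 sample tables and are illustrative only:

```sql
-- One SQL join instead of two cursors and a programmatic join;
-- only the required columns are retrieved, and predicates limit the rows.
SELECT D.DEPTNO, D.DEPTNAME, E.EMPNO, E.LASTNAME
  FROM DEPT D
       INNER JOIN EMP E
          ON E.WORKDEPT = D.DEPTNO
 WHERE D.DEPTNO = 'A00';
```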
To tune SQL the DBA must be able to interpret the access path output produced
by EXPLAIN. This information is encoded in the PLAN_TABLEs. IBM and other vendors offer tools to simplify this process, such as IBM’s Data Studio.
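For example, a query along these lines surfaces the basics of each access path after a bind with EXPLAIN(YES). The program name is hypothetical, and the PLAN_TABLE should be qualified with the appropriate owner at your shop:

```sql
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD,
       TNAME, ACCESSTYPE, ACCESSNAME, MATCHCOLS,
       INDEXONLY, PREFETCH
  FROM PLAN_TABLE
 WHERE PROGNAME = 'TXNPGM1'
 ORDER BY QUERYNO, QBLOCKNO, PLANNO;
```

An ACCESSTYPE of 'R', for instance, indicates a table space scan, which may (or may not) be cause for concern depending on the size of the table.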
Finally,
some attention must be paid to the host language code. Host language code
refers to the application programs written in C, COBOL, Java, Visual Basic or
the programming language du jour. SQL statements are usually embedded into host
language code and it is quite possible to have finely tuned SQL inside of
inefficient host language code. And, of course, that would cause a performance
problem.
The Bottom Line
Although DBAs must understand all three aspects of database performance
management, concentrating on the application aspects of performance will most
likely provide the most bang for the buck. Of course, we have only touched the tip of
the DB2 application performance iceberg today. But even this high-level view
into application performance can serve as a nice starting place for tuning your
DB2 applications.
Good
luck with DB2 for z/OS and happy performance tuning!