Testing application programs is a discipline unto itself and there are many
considerations and nuances to be mastered to be able to test software
appropriately. This blog post will not go into a lot of depth regarding testing practices, other than to highlight the important things to keep in mind
to assure optimal performance of your Db2 applications.
Statistics
The data that you use in your test environment will not be
the same as your production data. Typically, you will have less test data than
you do in production. So, if you run the RUNSTATS utility on your test data you will get
different statistics than in production.
Instead of running RUNSTATS, you can test your SQL access
paths by updating the test catalog statistics to be the same as your production
system. Do not attempt to modify test statistics on your own. You should work with your DBA
group to set up a process for updating test statistics. This can be accomplished
in various ways. Your organization may have a tool that makes it easy to copy
statistics from production to test; or your DBA team may use a DDL script of
queries and modification statements to populate test statistics from
production.
If your test table definitions are different in test and
production you will need to take this into account in the script. For example,
things like creator, name, indexes, number of partitions, and even columns can
differ between environments. Furthermore, you may have new tables and columns for which there are no current production statistics, meaning that you will need to create estimated statistics based on your knowledge of the business and application.
Some organizations make the mistake of copying production
statistics to test once, and never (or rarely) populating
test again. This is a mistake because most production databases change over
time, sometimes dramatically. When you run Runstats for your production applications it is a good
idea to also update your test statistics from the new production statistics.
Modeling a Production Environment in Test
Another tactic you can use to improve the accuracy of access
path testing is to model the configuration and settings of your productionenvironment in your test system. Remember that the Db2 optimizer does not just
use statistics, but also information about
your computing environment.
Db2 test systems typically vary from the production system. Application
testing is often accomplished on test systems that have different parameters
and configurations than the production systems that run the applications. Test
systems usually get set up on a less powerful processor (or LPAR), and use less
memory for buffering, sorting, and other system processes. This can result in
different access paths for test and production systems, which can cause
performance problems that only show up after you move to production.
However, it is possible to model the configuration and parameters
of your production environment in your test system. You can specify
configuration details for Db2 to use for access path selection in
your test system using profile tables.
Work with your DBA group to configure proper profile settings for
testing your applications.
Test Cases for Skewed Data
Db2 assumes that data values are mostly uniformly
distributed throughout the data. However, not all data is uniformly
distributed. Db2 RUNSTATS can capture information about non-uniformly
distributed and skewed data.
When data is non-uniformly distributed a subset of the
values occur much more frequently than others. A special case of non-uniformly
distributed data is skewed data. When data is skewed, one
value (or a very small number of values) occurs much more frequently than
others.
Non-uniformly distributed and skewed data presents a performance testing challenge. The
Db2 optimizer can formulate different access paths for non-uniformly
distributed data based on the actual values supplied. This is particularly
important for dynamic SQL applications, but you should be aware of
non-uniformly distributed and skewed data even for static SQL applications.
For non-uniformly distributed data you can examine the Db2
catalog to obtain values for the most commonly occurring values. For Db2 for z/OS this information is in the SYSIBM.SYSCOLDIST table.
Be sure to test with several of the values that are stored
in the Colvalue column of these tables, and some that are not. This will enable
you to test the performance of the most common values and less common values.
The access paths may differ for each and the performance also can vary.
An Example
Let’s discuss an example. Suppose you operate a bar with a cigar
room and you want to keep track of customers with a table of cigar smokers.
We gather information like name, address, phone number, sex, and favorite cigar
in the table. Cigar smokers skew male, so there will likely be many more rows
where Sex is M, than there are where Sex is F. With that background, consider a query like this one:
SELECT name, phoneno, fav_cigar
FROM cigar_smokers
WHERE sex = ?;
FROM cigar_smokers
WHERE sex = ?;
Because the data is skewed, it is possible that Db2 will
choose a different access path for M than for F. If the vast majority of the
rows are men, then a table scan might be chosen for Sex = ‘M’; whereas with
only a few rows for women, an index might be chosen if one exists on the Sex
column.
This is just a simple example. You need to understand your
data and how it skews to make sure that you create and test sample test cases
for all of the pertinent values.
SQL Variations
A final performance testing consideration is to consider
multiple SQL variations, especially for queries that access a lot of data or
have complex access paths. Do not just find one SQL formulation that works and
stick with it. Remember from earlier chapters that you can code multiple
variations of SQL statements that return the same data, but that perform quite
differently.
Note
This blog post was adapted and excerpted from my latest book, A Guide to Db2 Performance for Application Developers. Click the link for more information or to buy a copy (both print and ebook available).