Just a short post today to wish all of my readers a very happy holiday season and to let you know that I will not be posting anything new between now and the end of the year... but be sure to check back again in 2013 as I continue to write about DB2 and mainframe issues that impact us all!
See you all next year!
Thursday, December 20, 2012
Monday, November 26, 2012
SQL Coding Guidelines: The Basics
When writing SQL statements to access DB2 data, be sure to follow these guidelines for coding SQL for performance. They are simple, yet important, rules to follow when writing your SQL statements. Of course, SQL performance is a complex topic and understanding every nuance of how SQL performs can take a lifetime. That said, adhering to the following simple rules puts you on the right track to achieving high-performing DB2 applications.
1) The first rule is to
always provide only the exact columns that you need to retrieve
in the SELECT-list of each SQL SELECT statement. Another way of stating this is
“do not use SELECT *”. The shorthand SELECT * means retrieve all columns from
the table(s) being accessed. This is fine for quick and dirty queries but is
bad practice for inclusion in application programs because:
- DB2 tables may need to be changed in the future to include additional columns. SELECT * will retrieve those new columns, too, and your program may not be capable of handling the additional data without requiring time-consuming changes.
- DB2 will consume additional resources for every column that is requested to be returned. If the program does not need the data, it should not ask for it. Even if the program needs every column, it is better to explicitly ask for each column by name in the SQL statement for clarity and to avoid the previous pitfall.
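For example, using the EMP table referenced later in this post, the program should name only the columns it actually uses:

-- Avoid coding this in an application program
SELECT *
FROM EMP;

-- Instead, request exactly what the program needs
SELECT EMPNO, LASTNAME, SALARY
FROM EMP;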
2) Do not ask for what
you already know. This may sound simplistic, but most programmers violate
this rule at one time or another. For a typical example, consider what is wrong
with the following SQL statement:
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010';
Give up? The problem is that EMPNO is included in the SELECT-list. You already know that EMPNO will be equal to the value '000010' because that is what the WHERE clause tells DB2 to do. But because EMPNO is listed in the SELECT-list, DB2 will dutifully retrieve that column too. This causes additional overhead to be incurred, thereby degrading performance.
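The fix is simply to remove EMPNO from the SELECT-list:

SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010';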
3) Use the WHERE clause
to filter data in the SQL instead of bringing it all into your program to
filter. This too is a common rookie mistake. It is much better for DB2 to
filter the data before returning it to your program. This is so because DB2
uses additional I/O and CPU resources to obtain each row of data. The fewer
rows passed to your program, the more efficient your SQL will be. So, the
following SQL:
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE SALARY > 50000.00;
is better than simply reading all of the data without the WHERE clause and then checking each row in your program to see if the SALARY is greater than 50000.00.
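For contrast, the inefficient alternative would look something like this sketch:

SELECT EMPNO, LASTNAME, SALARY
FROM EMP;
-- ...with the application program then checking every fetched row
-- to see whether SALARY is greater than 50000.00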
These rules, though, are not the be-all, end-all of SQL performance tuning – not by a long shot. Additional, in-depth tuning may be required. But following the above rules will ensure that you are not making “rookie” mistakes that can kill application performance.
Tuesday, November 20, 2012
Happy Thanksgiving
I know it is only Tuesday (right now), but here's wishing all of my readers and all of their families and friends a very happy and restful Thanksgiving holiday. Visit with your relatives... Eat up... Watch football... Relax...
And try not to get trampled on Friday if you're going out shopping for bargains!
Wednesday, October 31, 2012
Eliminate Performance Management Black Holes
Today’s modern applications are not restricted to a single platform. Yet, in many ways, mainframe management and administration professionals are still tackling performance issues and problem resolution using the same methods they did when mainframe applications were all COBOL and never strayed off of the centralized mainframe computing platform. And this causes many problems.
The IT infrastructure of most organizations is multi-tiered, and business transactions flow through multiple systems and services to deliver business value. This means that today’s applications can utilize the most appropriate and useful technology for each component, thereby delivering more functional and easier to use transactions. Development time can be faster, too.
So far, so good… but performance management and problem resolution become difficult challenges in this brave new world. A business application that relies upon multiple differing computing platforms, technologies, and components to deliver service is characterized by its complexity. Consider an application where users deploy Windows workstations running .NET applications that connect to Linux servers running Oracle and stored procedures that connect to the mainframe to access DB2 data. Such an infrastructure consists of a lot of connecting points to stitch together significantly differing hardware and software platforms. And what happens if there is a problem? Tracking down the root cause of the problem can be difficult.
DBAs and system administrators for each platform may have tooling to monitor and diagnose their particular portion of the infrastructure, but a siloed approach is not pragmatic and usually results in a lot of finger pointing instead of problem solving. You know what I’m talking about – “there’s a problem with DB2!”
The mainframe DBAs usually have a performance monitor for DB2 and the systems programmers have a monitor for CICS and z/OS. And the distributed DBAs and system administrators have their monitors, too, for Oracle, SQL Server, Linux, Unix and so on.
So let’s assume that this multi-tier application begins exhibiting poor performance. Where do you start? You can’t fix what you can’t see, so unless the problem is easily monitored within an individual silo, discovering what is wrong and fixing the issue becomes a lengthy, troublesome, and expensive task. The problem could be anywhere, so it’s a major undertaking to pinpoint the root cause.
To the mainframe professionals the application goes into a “black hole” when it leaves the mainframe; to the distributed computing professionals, it is just the other way around with the mainframe being the “black hole.” Even with visibility into some components of the application, at some point the transactions vanish into one of the black holes. A siloed approach to managing performance just does not work when the application spans multiple silos. What is needed is an application performance management solution.
Organizations must be able to track business transactions from end to end; that is, from the time the user makes a request all through the entire infrastructure required to provide an answer and right back to the end user. Such a tracking solution must be able to follow the transaction in real time and report on the resources consumed at each point along the way. By providing real-time monitoring of each transaction flowing end-to-end across distributed applications it becomes possible to see the problem, to identify its root cause, to determine performance issues, and to solve problems more rapidly and less expensively.
Think about it. What are the applications and transactions like at your organization these days? How many can satisfy business needs completely on a single platform? Have you ever tried to resolve an issue or identify the root cause of a problem for an application that spans multiple platforms? When did the trail become cold because the transaction disappeared? And how did you move past all the finger-pointing?
Compuware’s latest offering, APM for Mainframe, delivers end to end transaction management so that the mainframe is no longer a black hole. Using this solution you can track your distributed applications across every platform, find the root causes of problems and performance issues, and resolve them on the spot.
Without such a solution you just have to keep living with the pain. And that means unresponsive distributed applications, slower time to correct problems, lost revenue, and higher administrative costs.
Friday, October 12, 2012
Book Signing and Discount at IOD Conference
Just a short blog post today to let everybody know that I will be doing a book signing for the new edition of both of my books at the 2012 Information on Demand conference on Tuesday, October 23rd.
You can choose to have me sign the new 6th edition of my DB2 book, DB2 Developer's Guide... or the new 2nd edition of my DBA book, Database Administration: The Complete Guide to DBA Practices and Procedures.
Or you can be my favorite type of person and choose to get them both signed!
See you in Vegas!
Wednesday, October 03, 2012
DB2 11 for z/OS: Coming Soon!
Today, October 3, 2012, IBM announced the Early Support Program for the next version of DB2. Heretofore code-named Sequoia, the DB2 11 ESP is planned for March 8, 2013 availability. So if you are still lagging behind running DB2 9 (or, heaven forbid, DB2 V8) now is the time to get yourself to DB2 10 so you'll be ready for the imminent onslaught of DB2 11 for z/OS.
The announcement (212-364) offers up a bit of information on some of the features that are planned to be available in DB2 11, including:
- Performance improvements for some types of online transaction processing (OLTP), heavy insert, select query workloads, and when running queries against compressed tables;
- Improved data sharing performance and efficiency;
- Improved utility performance and additional zIIP eligible workload (particularly with LOAD and RUNSTATS);
- Cost-effective archiving of warm and cold data with easy access to both within a single query;
- Intelligent statistics gathering and advanced optimization technology for efficient query execution in dynamic workloads;
- Additional online schema changes that simplify management, reduce the need for planned outages, and minimize the need for REORG;
- Productivity improvements for DBAs, application developers, and system administrators;
- Efficient real-time scoring within your existing transaction environment;
- Enhanced analysis, forecasting, reporting, and presentation capabilities, as well as improved storage management, in QMF;
- Expanded SQL, SQL PL, temporal, and XML function for better application performance;
- Faster migration with application protection from incompatible SQL and XML changes and simpler catalog migration.
One of the more intriguing tidbits is the new SQL PL ARRAY support. IBM is claiming this feature may be able to offer up to 10 percent CPU savings for OLTP workloads with high read-to-write ratios.
So get ready for DB2 11 - I'm sure we'll hear more about it at the IOD conference, coming up at the end of the month.
Wednesday, August 29, 2012
Managing DB2 for z/OS Application Performance
Applications that access databases are only as good as the
performance they achieve.
And every user wants their software to run as fast as possible. As such,
performance tuning and management is one of the biggest demands on the DBA’s
time. When asked what is the single most important or stressful aspect of their
job, DBAs typically respond “assuring optimal performance.” Indeed, a recent
Forrester Research survey indicates that “performance and troubleshooting” tops
the list of most challenging DBA tasks.
But when you are dealing with data in a database management
system there are multiple interacting components that must be managed and tuned
to achieve optimal performance. That is, every database application, at its core,
requires three components to operate:
- the system (that is, the DBMS itself, the network, and the O/S),
- the database (that is, the DDL and database schema), and
- the application (that is, the SQL and program logic).
Each of these components requires care and attention, but
today I want to focus on the high-level aspects of performance management from
the perspective of the application. Furthermore, I will discuss this in terms
of DB2 for z/OS.
So where do we begin? For DB2, a logical starting point is
with BIND Parameters. There are many parameters and values that must be
chosen from and specified when you bind a DB2 application program. The vast
array of options at our disposal can render the whole process extremely
confusing – especially if you don’t bind on a daily basis. And even if you do,
some of the options still might be confusing if you rarely have to change them.
You know what I’m talking about, parameters like ACQUIRE, RELEASE, VALIDATE,
and DEGREE.
I will not delve into the myriad bind options as there are many articles and books, as well as the IBM DB2 manuals, that you can use to guide you along that path. Suffice it to say that there are some standard parameters and values that should be chosen “most of the time” in certain situations. As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as “CICS transaction,” “DB2 batch,” or “analytical query” can be presented to the developer, and the canned script can then choose the proper bind options based on the type of program and environment. Doing so can greatly diminish the problems that can be encountered when the “wrong” parameters or values are chosen at bind time.
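For example, a canned routine for the “CICS transaction” choice might standardize on bind parameters along these lines. This is only a hedged sketch; the collection and member names are hypothetical and your shop standards may well differ:

BIND PACKAGE(CICSCOLL) MEMBER(PGM0001) -
     ACTION(REPLACE) -
     VALIDATE(BIND) -
     ISOLATION(CS) -
     CURRENTDATA(NO) -
     RELEASE(COMMIT) -
     DEGREE(1) -
     EXPLAIN(YES)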
This same process can be put in place for production binding
to ensure that the appropriate parameters and values are chosen. This is
especially useful when the binds are not done by a DBA, but are automated in production
or done by a less-experienced change control clerk.
Of course, there should always be a method for overriding
the “standard” values for special situations, although these overrides should
not be available to anyone other than a well-trained individual (DBA or
otherwise).
I want to make one small exception here regarding advice on
bind parameters, and that is the EXPLAIN parameter. In production, always bind
your plans and packages specifying EXPLAIN YES. Failing to do so means that
access paths will be generated, but you will not know what they are. This is
akin to blinding yourself to what DB2 is doing and is not advisable.
Bind and Rebind are important components to achieve optimal DB2 application performance. Bind/Rebind determine the access paths to the data that is accessed by your program. As such, it is vital that you develop an appropriate strategy for when and how to Rebind your programs.
There are several common approaches taken by DB2 users. By far, the best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment.
Other approaches include Rebinding only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data – but it still can pose significant administrative issues.
The final approach is from the “if it ain’t broke don’t fix it” school of thought. This approach is the worst of the several approaches discussed here. It penalizes every program for fear that a single program (or two) might experience a degraded access path. Yet, the possibility of degraded performance is real, and that is why this approach has been adopted at some sites. The problem is being able to find which statements may be worse. The ideal situation would be to be able to review the access path changes beforehand to determine whether they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third-party tools that can help you achieve this, though.
Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions.
By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to DB2 (new releases/PTFs) optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.
To adopt the Three R’s you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need at least 4 R’s:
- Real Time Statistics (or RUNSTATS)
- REORG
- RUNSTATS
- REBIND
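For example, the real-time statistics can be queried to help spot table spaces that are candidates for reorganization. The following is only a sketch; the 10 percent threshold is illustrative, not an IBM-prescribed value:

SELECT DBNAME, NAME, PARTITION, TOTALROWS,
       REORGINSERTS, REORGDELETES, REORGUNCLUSTINS
FROM   SYSIBM.SYSTABLESPACESTATS
WHERE  REORGUNCLUSTINS > 0.10 * TOTALROWS;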
But is this enough? Probably not, because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another “R” – to Review the access paths generated by the REBIND. As we mentioned, the optimizer can make mistakes. And, of course, so can you. Users don't call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.
So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal DB2 application performance.
Tuning The Code Itself
Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with – and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.
SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:
- Let SQL do the work instead of the application program. For example, code an SQL join instead of two cursors and a programmatic join (see the sketch following this list).
- Simpler is generally better, but complex SQL can be very
efficient.
- Retrieve only the columns required, never more.
- Retrieve the absolute minimum number of rows by specifying every
WHERE clause that is appropriate.
- When joining tables, always provide join predicates. In other
words, avoid Cartesian products.
- Favor using Stage 1 and Indexable predicates.
- Avoid sorting if possible by creating indexes for ORDER BY, GROUP
BY, and DISTINCT operations.
- Avoid black boxes – that is, avoid I/O routines that are called by
programs instead of using embedded SQL.
- Avoid deadlocks by updating tables in the same sequence in every program.
- Issue data modification statements (INSERT, UPDATE, DELETE) as close to the COMMIT statement as possible.
- Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.
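To illustrate the first rule of thumb, the following sketch joins the sample EMP and DEPT tables in a single SQL statement instead of opening two cursors and matching rows in program logic (the column names come from the DB2 sample database):

SELECT E.EMPNO, E.LASTNAME, D.DEPTNAME
FROM   EMP E
       INNER JOIN DEPT D
          ON E.WORKDEPT = D.DEPTNO
WHERE  D.DEPTNO = 'D11';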
To
tune SQL the DBA must be able to interpret the output of the access paths
produced by EXPLAIN. This information is encoded in the PLAN_TABLEs. IBM and other vendors offer tools to simplify this process, such as IBM's Data Studio.
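As a hedged sketch, a query along these lines can be used to review the access paths recorded for a given program (the program name is hypothetical):

SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM   PLAN_TABLE
WHERE  PROGNAME = 'PGM0001'
ORDER BY QUERYNO, QBLOCKNO, PLANNO;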
Finally,
some attention must be paid to the host language code. Host language code
refers to the application programs written in C, COBOL, Java, Visual Basic or
the programming language du jour. SQL statements are usually embedded into host
language code and it is quite possible to have finely tuned SQL inside of
inefficient host language code. And, of course, that would cause a performance
problem.
The Bottom Line
Although
DBAs must understand all three aspects of database performance management,
concentrating on the application aspects of performance will most likely
provide the most bang-for-the-buck. Of course, we have only touched the tip of
the DB2 application performance iceberg today. But even this high-level view
into application performance can serve as a nice starting place for tuning your
DB2 applications.
Good
luck with DB2 for z/OS and happy performance tuning!
Wednesday, August 15, 2012
Don’t Forget About DB2 Session Variables
Session variables, set by DB2 or by the user, offer another way to provide additional information to applications. Session variables are accessible to application programs and end users. By accessing session variables, an application program can determine many aspects of its environment and the DB2 features in effect.
There is a plethora of session variables available to the DB2 programmer, as outlined in the accompanying table.
Table 1. DB2 Session Variables.
- APPLICATION_ENCODING_SCHEME: The application encoding scheme specified for the subsystem.
- COBOL_STRING_DELIMITER: The string delimiter. Value will be DEFAULT, ", or '.
- DATA_SHARING_GROUP_NAME: Name of the data sharing group.
- DATE_FORMAT: The date format in use. Value will be ISO, JIS, USA, EUR, or LOCAL.
- DATE_LENGTH: The LOCAL DATE LENGTH install parameter. Value is 10-254, or 0 for no exit.
- DECIMAL_ARITHMETIC: The DECIMAL ARITHMETIC install parameter. Value is DEC15, DEC31, 15, or 31.
- DECIMAL_POINT: The DECIMAL POINT install parameter. Value is '.' or ','.
- DECFLOAT_ROUNDING_MODE: The DECFLOAT ROUNDING MODE install parameter.
- DEFAULT_SSID: The value of the GROUP ATTACH field on the DSNTIPK installation panel or the SUBSYSTEM NAME field on the DSNTIPM installation panel.
- DEFAULT_LANGUAGE: The LANGUAGE DEFAULT install parameter. Value is ASM, C, CPP, IBMCOB, FORTRAN, or PL/I.
- DEFAULT_LOCALE_LC_CTYPE: The value of the LOCALE LC_CTYPE install parameter.
- DSNHDECP_NAME: The fully qualified data set name of the data set from which the DSNHDECP or a user-specified application defaults module was loaded.
- DYNAMIC_RULES: The USE FOR DYNAMICRULES install parameter. Value is YES or NO.
- ENCODING_SCHEME: The DEF ENCODING SCHEME install parameter. Value is EBCDIC, ASCII, or UNICODE.
- MIXED_DATA: The MIXED DATA install parameter. Value is YES or NO.
- NEWFUN: The INSTALL TYPE parameter. Value is INSTALL, UPDATE, MIGRATE, or ENFM; reflects the setting of the DSNHDECP variable NEWFUN.
- PACKAGE_NAME: Name of the package currently in use.
- PACKAGE_VERSION: Version of the current package.
- PACKAGE_SCHEMA: Schema name of the current package.
- PAD_NUL_TERMINATED: The PAD NUL-TERMINATED install parameter. Value is YES or NO.
- PLAN_NAME: Name of the plan currently being run.
- SECLABEL: The user's security label (if any); null if not defined.
- SQL_STRING_DELIMITER: The SQL STRING DELIMITER install parameter. Value is DEFAULT, ", or '.
- SSID: DB2 subsystem identifier.
- STANDARD_SQL: The STD SQL LANGUAGE install parameter. Value is YES or NO.
- SYSTEM_NAME: Name of the system, as defined in field SUBSYSTEM NAME on installation panel DSNTIPM.
- SYSTEM_ASCII_CCSID: A comma-delimited string of the ASCII CCSIDs in use on this system.
- SYSTEM_EBCDIC_CCSID: A comma-delimited string of the EBCDIC CCSIDs in use on this system.
- SYSTEM_UNICODE_CCSID: A comma-delimited string of the UNICODE CCSIDs in use on this system.
- TIME_FORMAT: The TIME FORMAT install parameter. Value is ISO, JIS, USA, EUR, or LOCAL.
- TIME_LENGTH: The LOCAL TIME LENGTH install parameter. Value is 8-254, or 0 for no exit.
- VERSION: Version of the DB2 subsystem. This value is a string, formatted as pppvvrrm, where ppp is the product string, vv is the version, rr is the release, and m is the modification level.
Each session variable must be qualified by SYSIBM. A built-in
function named GETVARIABLE can retrieve session variable values. So, you could
create a view based on a security label, for example:
CREATE VIEW VSECLBL AS
  SELECT column-list
  FROM table-name
  WHERE SECLABEL_COL = GETVARIABLE('SYSIBM.SECLABEL');
The GETVARIABLE built-in function can be used in views, triggers,
stored procedures, and constraints to enforce a security policy.
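Session variables can also be retrieved directly in a query. A quick sketch:

SELECT GETVARIABLE('SYSIBM.PLAN_NAME'),
       GETVARIABLE('SYSIBM.PACKAGE_NAME'),
       GETVARIABLE('SYSIBM.VERSION')
FROM SYSIBM.SYSDUMMY1;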
Users can add up to ten session variables by setting the name and
value in their connection or sign-on exits. User-created session variables are
qualified by SESSION. For example, the customer might have a connection or
sign-on exit that examines the SQL user’s IP address, and maps the IP address
to the user’s site within the company. This is recorded in a session variable,
named, say, USER_SITE. This session variable is then accessible using the
built-in function, for example:
GETVARIABLE('SESSION.USER_SITE')
Using session variables, much more information is available to application programs as they execute, and more control and security are provided as well. Additionally, session variables can be trusted: they are set by DB2 and an application cannot modify them.
Monday, July 09, 2012
DB2 Hashing and Hash Organized Tables
Up until DB2 10, all DB2 data
was retrieved using some form of indexing or scanning. With DB2 Version 10, a
new access method called hashing is available. Of course, referring to hashing
as “new” is somewhat disingenuous because it is an old, time-tested data
processing technique. Indeed, IMS databases are founded upon hashing
technology.
A hash,
or hash function, is an algorithm that converts a defined set of data elements
into a small number, usually a single integer that can serve as an index to an
array or a storage location on disk. The values returned by a hash function are
called hash values, hash sums, or simply hashes.
Figure 1 depicts basic hash
functionality. Key values are processed by a hash algorithm (also known as a
hash function, hash routine, or randomizer). The hash algorithm translates the
key value into a storage location. When data is inserted the algorithm tells
DB2 where to physically store the data; when data is accessed by the key, the
algorithm tells DB2 where to find the data.
Figure 1. How
hashing operates.
Hashing is used primarily to
optimize random I/O, such as for looking up a code table value or accessing a
single row based on the value of its primary key. A hash access can outperform
indexed access because fewer I/O operations are required to retrieve the data.
The hash requires 1 I/O operation (possibly 2 if a hash collision occurs). An
index requires at least 2 I/O operations, one to read the index page and one to
read the data page. Of course, only the smallest of indexes consists of just a
single root page; most consist of a root page, one or more non-leaf pages, and
a leaf page, thereby requiring 3 or more I/O operations.
The traditional notion of
clustering is not pertinent for hash organized tables. The data will be
organized based on the hash algorithm. Index clustering is not permitted for
tables that are hash organized.
The Hash Space
A hash space is a defined storage area used for organizing table data for hash
access. Sufficient disk space must be allocated to the hash space to accommodate
the hashed rows that will be added to the table.
When the hash space is full,
new rows will be relocated to the overflow index. As the amount of data in the
overflow increases, hash access performance declines. This is so because DB2
must first go to the hashed location and then follow the pointer in the
overflow index to find the data, thereby increasing the number of I/O
operations required to retrieve the data. Refer to Figure 2 for an example. In
this case, a new row for NEELD needs to be added, but there is no room in the hash space. So the
hash space points to an overflow area where the new row can be stored.
Figure 2. Hash
overflow.
For this reason, a hash
organized table might consume more disk space than a traditional table to
minimize overflow rows.
Hash spaces can contain only a
single table and must be defined in a universal table space — either
partitioned by range or partitioned by growth. The hash overflow index for a
table in a partition-by-range Universal table space will be a partitioned
index. The hash overflow index for a table in a partition-by-growth Universal
table space will be a non-partitioned index.
Creating Hash Organized Tables
Hash access can be specified
for a table using the organization-clause of the CREATE
TABLE statement: ORGANIZE
BY HASH UNIQUE (column-names) HASH SPACE (hash-space-value).
The ORGANIZE BY HASH specification
tells DB2 to use hashing for organizing the data of this table. The hash key is
specified in the list of column names. By specifying UNIQUE, the table cannot contain
more than one row with the same hash key value. And the HASH SPACE parameter defines the amount
of fixed hash space to be pre-allocated for the table. For tables defined in a
partition-by-range UTS, this space is for each partition.
Caution
Exercise care when specifying UNIQUE if the hash key can contain nulls. Although it is generally
considered bad design for keys to contain nulls, columns specified as a hash
key can be NULL. If the hash key can contain nulls and UNIQUE is also specified, all nulls
for a column are considered equal. To clarify, if the hash key is a single
nullable column, it can contain only one null.
For example, consider the
following SQL that would create the sample DEPT table as a hash-organized
table:
CREATE TABLE DSN81010.DEPT
    (DEPTNO    CHAR(3)     NOT NULL,
     DEPTNAME  VARCHAR(36) NOT NULL,
     MGRNO     CHAR(6),
     ADMRDEPT  CHAR(3)     NOT NULL,
     LOCATION  CHAR(16),
     PRIMARY KEY (DEPTNO)
    )
  IN DSN8D10A.DSN8S10D
  ORGANIZE BY HASH UNIQUE (DEPTNO)
  HASH SPACE 64 M;
Take care to specify a
sufficient amount of storage for the hash space in the organization-clause. By doing so, the overflow index should have a small number
of entries, or better yet, no entries at all. Hashing works best for tables
that are static in size with limited growth (if any).
If you choose to convert an
existing table to use hashing, be sure to walk through the following steps:
- Analyze each table you are considering for conversion. Look for static tables with unique keys that are frequently used for single fetch queries.
- Before replacing any existing indexes be sure that they are not used for range queries. Hashing is not used for range queries, so these indexes should not be dropped. Any index used solely for direct lookup can be dropped.
- Estimate the number of rows and average row size and calculate a sufficient hash space size.
- Issue the ALTER ADD organization-clause statement (see the sketch following this list).
- Reorganize the table space specifying AUTOESTSPACE YES. Doing so allows DB2 to automatically estimate the best size for the hash space using real-time statistics.
- REBIND any applications using SQL with equality predicates on the hash key.
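A hedged sketch of steps 4 and 5 for the sample DEPT table follows; the hash space value is illustrative, and the subsequent REORG with AUTOESTSPACE YES lets DB2 refine it:

ALTER TABLE DSN81010.DEPT
  ADD ORGANIZE BY HASH UNIQUE (DEPTNO)
  HASH SPACE 64 M;

REORG TABLESPACE DSN8D10A.DSN8S10D AUTOESTSPACE YES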
After converting, monitor
real-time statistics to evaluate the table’s hash usage efficacy. In SYSIBM.SYSTABLESPACESTATS
the REORGHASHACCESS
column indicates the number of times that hash access was used to access the
table. Furthermore, you can compare the REORGHASHACCESS column to the REORGSCANACCESS column to determine the number of times the table was accessed by
the hash versus other techniques. You can also review the HASHLASTUSED column, which contains the
date when hash access was last used for SELECT, FETCH, searched UPDATE, searched DELETE, or to enforce referential constraints.
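A hedged sketch of such a check against the real-time statistics (the database name is from the sample database and merely illustrative):

SELECT DBNAME, NAME, PARTITION,
       REORGHASHACCESS, REORGSCANACCESS, HASHLASTUSED
FROM   SYSIBM.SYSTABLESPACESTATS
WHERE  DBNAME = 'DSN8D10A';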
In addition, be sure to verify
that hash access is used where you expected. To accomplish this, check the ACCESSTYPE column of the PLAN_TABLE looking for the values H, HN, or MH—these indicate that hash
access is used.
When to Use Hashing
Consider using hash
organization for tables that are stable or predictable in size and where the
data is accessed most often by its primary key (that is, the hash key).
Hash-organized tables require an overflow index for rows that exceed the
specified hash space. Rows in the overflow index are not hashed—DB2 must scan
the index to retrieve rows in the overflow.
Table Space Size Options for Hashing
Choose the table space sizing
options and parameters wisely for hash organized tables. Care must be taken to
ensure that sufficient space is available for the hashing. For PBR Universal
table spaces, be sure to specify an appropriate DSSIZE because it will be used to validate the hash space for each
partition.
Consider specifying PRIQTY as –1 to allow DB2 to use the
default primary space allocation. Furthermore, set DEFINE YES to ensure that the fixed
table space is allocated successfully before the first access.
Free space also is an
important consideration for hash organized table spaces. The PCTFREE specification is used by DB2
when the table space is reorganized with the AUTOESTSPACE(YES) specification. DB2 uses the value of the DATASIZE column in SYSIBM.SYSTABLESPACESTATS as
the initial size of the hash table space, and increases that value by the value
of PCTFREE to calculate
the new size when you reorganize the table space.
Note: This blog post was adapted from the newly published 6th edition of Craig's best-selling book, DB2 Developer's Guide.