The Db2 Portal Blog: 2014

Friday, December 19, 2014

Season's Greetings

And a hearty "Hello!" to everybody out there... just a short post today to wish everybody a very happy holiday season.

This is the time of year when work starts winding down somewhat and people start to spend time with their families. Whether it is Christmas, Hanukkah, Kwanzaa, or just celebrating the end of one year and the beginning of another, may you spend it peacefully, happily and among those you love.

This will be the final post of the year for this blog, but be sure to join me again next year - 2015 - as we continue to examine all aspects of everybody's favorite DBMS... IBM's DB2...

Monday, December 15, 2014

The Wizard of Userville and the DB2 Developer's Guide

Once upon a time there was a kingdom called Userville. The people in the kingdom were impatient and wanted to know everything about everything—they could never get enough information. Life was difficult and the people were unhappy because data was often lost, and even when it was available, it was often inaccurate and not easy to access.

The King decided to purchase DB2, an advanced tool for storing and retrieving data. With DB2 the Users could process their data and turn it into information. “This,” he thought, “should keep the people happy. DB2 will solve all my problems.” But he soon found out that special knowledge was necessary to make DB2 work its wonders. Nobody in Userville knew how to use it properly.

Luckily, a grand Wizard living in a nearby kingdom knew many mystical secrets for retrieving data. These secrets were a form of magic called SQL. The King of Userville summoned the Wizard, offering him many great treasures if only he would help the poor Users in Userville.

The Wizard soon arrived, determined to please. Armed with nothing more than SQL and a smile, the Wizard strode to the terminal and uttered the magic words:

SELECT E.EMPNO, E.FIRSTNME, E.LASTNAME, D.DEPTNO, D.DEPTNAME
FROM   DSN81010.DEPT D,
       DSN81010.EMP E
WHERE  E.WORKDEPT = D.DEPTNO;

A crowd gathered and applauded as the desired information began pumping out of the terminal. “More, more,” shouted the data-starved masses. The Wizard gazed into the screen, and with amazing speed effortlessly produced report after report. The King was overheard to say, “You know, this is just too good to be true!” Everybody was happy. The Users had their share of information, the King had a peaceful kingdom, and the Wizard had his treasures and the respect of the Users.

For many months, the Users were satisfied with the magic of the great Wizard. Then, one day, the Wizard disappeared…in a jet to the West Coast for 150 grand a year—and a bunch of stock options. The people of the kingdom began to worry. “How will we survive without the magic of the Wizard? Will we have to live, once again, without our precious information?” The Wizard’s apprentice tried to silence the crowd by using his magic, but it wasn’t the same. The information was still there, but it wasn’t coming fast enough or as effortlessly. The apprentice was not yet as skilled as the great Wizard who had abandoned the kingdom. But, as luck would have it, one day he stumbled upon the great Wizard’s diary. He quickly absorbed every page and soon was invoking the Wizard’s magic words. And all was well again.

Well, life is not always that simple. Departing Wizards do not often leave behind documentation of their secrets. But...

DB2 Developer's Guide

Many of you who have purchased my book, DB2 Developer's Guide, will recognize the story above because it opens Chapter 1. The idea is that the rest of the book serves as the Wizard's guide to DB2 for z/OS...

If you use DB2 for z/OS for a living and you have never read DB2 Developer's Guide, maybe it is time to treat yourself to an early present for the holidays? The book comprises more than 1500 pages of in-depth DB2 knowledge and information. Over the course of 46 chapters DB2 Developer's Guide covers:

SQL Techniques, Tips, and Tricks
DB2 Application Development
DB2 In-Depth (an under the covers look)
DB2 Performance Monitoring
DB2 Utilities and Commands
DB2 Tools and Organizational Issues
Distributed DB2
and much, much more

The book has been in print for more than 20 years now and has been published in 6 different editions over that span. The current edition is the 6th edition published by IBM Press.

So take the next step toward becoming a DB2 Wizard by getting your own copy today!

Wednesday, December 10, 2014

An Extra DBA Rule of Thumb

Last year on the blog I posted a series of 12 DBA Rules of Thumb. As a quick reminder, these Rules of Thumb (or ROTs) are general rules of the road, collected over the years, that apply to the management discipline of Database Administration. These ROTs are broadly applicable to all DBAs, even though this is a DB2-focused blog.

Please click on the link in the paragraph above if you need a refresher on the DBA ROTs from last year.

The purpose of today's blog post is to suggest an additional Rule of Thumb... and that is to Diversify! A good DBA is a Jack-of-All-Trades.

You can’t just master one thing and be successful in this day-and-age. The DBA maintains production, QA and test environments, monitors application development projects, attends strategy and design meetings, selects and evaluates new products, and connects legacy systems to the Web.

And if all of that is not enough, to add to the chaos, DBAs are expected to know everything about everything. From technical and business jargon to the latest management and technology fads and trends, the DBA is expected to be “in the know.” For example, the DBA must be up on trends like Big Data and Analytics.

And do not expect any private time: A DBA must be prepared for interruptions at any time to answer any type of question… and not just about databases, either.

When application problems occur, the database environment is frequently the first thing blamed. The database is “guilty until proven innocent.” And the DBA is expected to be there to fix things. That means the DBA is often forced to prove that the database is not the source of the problem. The DBA must know enough about all aspects of IT to track down errors and exonerate the DBMS and database structures he has designed. So he must be an expert in database technology, but also have semi-expert knowledge of the IT components with which the DBMS interacts: application programming languages, operating systems, network protocols and products, transaction processors, every type of computer hardware imaginable, and more. The need to understand such diverse elements makes the DBA a very valuable resource. It also makes the job interesting and challenging.

To summarize, the DBA must be a Jack-of-all-Trades... and a master of several!!!

Tuesday, December 02, 2014

DSN1COPY Improvements in DB2 11 for z/OS

There have been some nice data validation improvements made to the IBM DSN1COPY utility in DB2 11 for z/OS. I suppose I should first explain what the DSN1COPY utility is before I talk about how it has been improved, so...

DSN1COPY is also known as the "Offline Copy utility." It has many uses. Of course, the primary use case for DSN1COPY is to copy data sets without DB2 having to be up and running. DSN1COPY can be used to copy VSAM data sets to sequential data sets, and vice versa. It also can copy VSAM data sets to other VSAM data sets and can copy sequential data sets to other sequential data sets. As such, DSN1COPY can be used to:

Create a sequential data set copy of a DB2 table space or index data set.
Create a sequential data set copy of another sequential data set copy produced by DSN1COPY.
Create a sequential data set copy of an image copy data set produced using the DB2 COPY utility, except for segmented table spaces.
Restore a DB2 table space or index using a sequential data set produced by DSN1COPY.
Restore a DB2 table space using a full image copy data set produced using the DB2 COPY utility.
Move DB2 data sets from one disk to another.
Move a DB2 table space or index space from a smaller data set to a larger data set to eliminate extents. Or move a DB2 table space or index space from a larger data set to a smaller data set to eliminate wasted space.

DSN1COPY runs as a batch job, so it can run as an offline utility when the DB2 subsystem is inactive. It can also run when the DB2 subsystem is active, but the objects it operates on should be stopped to ensure that DSN1COPY creates valid output. DSN1COPY does not check whether an object is stopped before carrying out its task, because DSN1COPY does not communicate directly with DB2.

DSN1COPY performs a page-by-page copy. Therefore, you cannot use DSN1COPY to alter the structure of DB2 data sets. For example, you cannot copy a partitioned table space into a segmented table space.

Perhaps the nicest feature of DSN1COPY is its ability to modify the internal object identifier stored in DB2 table space and index data sets, as well as in data sets produced by DSN1COPY and the DB2 COPY utility. When you specify the OBIDXLAT option, DSN1COPY reads a data set specified by the SYSXLAT DD statement. This data set lists source and target DBIDs, PSIDs or ISOBIDs, and OBIDs, thereby enabling you to modify these IDs accordingly (possibly for moving data from one subsystem to another).
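To make the SYSXLAT layout more concrete, here is a minimal sketch that builds the translation records in Python. It assumes the layout described in the DSN1COPY documentation as I understand it: the first record holds the source and target DBIDs, the second the source and target PSIDs (or ISOBIDs for an index), and each remaining record a source/target OBID pair, with the two values separated by a comma. The helper name build_sysxlat, the 80-byte padding (typical of in-stream JCL data), and the ID values are all hypothetical; verify the format against the utility documentation for your release before using it.

```python
from typing import List, Tuple

def build_sysxlat(dbid: Tuple[int, int],
                  psid: Tuple[int, int],
                  obids: List[Tuple[int, int]]) -> List[str]:
    """Return SYSXLAT records as 80-byte strings (hypothetical helper).

    Each tuple is (source_id, target_id). Assumed record order:
    DBID pair, then PSID/ISOBID pair, then one record per OBID pair.
    """
    records = []
    for source, target in [dbid, psid, *obids]:
        # Source and target IDs separated by a comma, padded to a card image.
        records.append(f"{source},{target}".ljust(80))
    return records

# Hypothetical IDs, as you might look them up in the DB2 Catalog
# on the source and target subsystems.
records = build_sysxlat(dbid=(260, 283), psid=(2, 2), obids=[(3, 5)])
for rec in records:
    print(rec.rstrip())
```

The generated lines would then be supplied through the SYSXLAT DD statement of a DSN1COPY job that specifies OBIDXLAT.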

You can also use DSN1COPY to check the validity of table space and index pages.

OK Then, But What's New?

So now that we understand the DSN1COPY utility, let's dig in to learn a little bit about how it has been improved in DB2 11 for z/OS. Basically, DB2 11 brings improved data validation to the DSN1COPY utility.

In DB2 11, the target data set produced by DSN1COPY is automatically validated after it is populated. The first time the target data set is physically opened by an operation other than a utility, DB2 checks for inconsistencies between the data and the DB2 Catalog. The validation performed includes checking:

DBID, PSID, and OBID
SEGSIZE and PAGESIZE
Table space type
Table schema (if the table space contains only one table)

If inconsistencies are found, DB2 throws a -904 SQLCODE and reports the issue. You can then use the REPAIR utility to remediate the reported issues. In past releases, validation did not occur immediately, which could have resulted in data corruption issues, storage overlays, and even ABENDs.
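Conceptually, the new check compares what is recorded in the copied data set against what the DB2 Catalog says the object should look like. The Python sketch below models only that comparison; ObjectMetadata and find_inconsistencies are invented for illustration and are not DB2 structures or APIs, and DB2 performs the real check internally. The fields mirror the list above, with the table schema check left out for brevity.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass(frozen=True)
class ObjectMetadata:
    """Hypothetical summary of the attributes DB2 11 validates after DSN1COPY."""
    dbid: int
    psid: int
    obid: int
    segsize: int
    pagesize: int
    tablespace_type: str

def find_inconsistencies(data_set: ObjectMetadata,
                         catalog: ObjectMetadata) -> List[str]:
    """List every attribute where the copied data set disagrees with the catalog."""
    problems = []
    for f in fields(ObjectMetadata):
        found = getattr(data_set, f.name)
        expected = getattr(catalog, f.name)
        if found != expected:
            problems.append(f"{f.name}: data set has {found}, catalog has {expected}")
    return problems

# Example: the copy still carries the source subsystem's DBID (OBIDXLAT was not used).
copy_meta = ObjectMetadata(260, 2, 3, 32, 4096, "UTS PBG")
catalog_meta = ObjectMetadata(283, 2, 3, 32, 4096, "UTS PBG")
print(find_inconsistencies(copy_meta, catalog_meta))
```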

Summary

So you can rest easier knowing that DSN1COPY data is checked after it is created, thereby removing a lot of the chance for calamity if you ran the utility improperly... and that's a good thing!

Monday, November 24, 2014

Peace, Prosperity and Happy Thanksgiving

Just a quick blog post today to wish all of my readers, wherever they live, peace and prosperity during this holiday season.

And to my readers in the USA, Happy Thanksgiving. You know what that means, right? Relatives, football, and a lot of food, including turkey!

It also means that we are about to "officially" start the holiday shopping season, which begins on Black Friday. Of course, anyone who has been to the mall, or to any store really, knows that the holiday season started right after Halloween. But for most people it begins this week!

So, Happy Thanksgiving to the US folks... and peace, happiness and warm thoughts to all of you everywhere this week.

Friday, November 14, 2014

Database Basics for Beginners

Every now and then I get e-mail from readers of my blogs asking introductory/beginner questions about databases and DBMS. I cannot take the time to answer all of these e-mails in-depth, so I thought I would blog a quick post with some good resources for folks.

I think a good place to start is an article I wrote several years ago now titled What is a Database? This article breaks down the benefits of a database, outlines the difference between a database and a DBMS, and provides some guidance for further reading (suggested books).

Other questions I get asked frequently involve database administration. One is: What Does a DBA Do? Follow the link to find my answer to that question. Another is: how can I become a DBA? I wrote an article titled How to Become a DBA to answer that one. And finally, another frequent topic is: How many DBAs do I need? That is a tricky one, but I propose a framework to help answer that question in an article titled DBA Staffing Considerations.

I also get a lot of DB2 for z/OS questions. And I've written a book on that topic, plus a bunch of DB2 articles, too (all of which can be found here).

So I guess what I am saying here is to take a look at what is already "out there" to see if your questions can be answered on the web. But, please, keep the questions coming. If I do not answer your e-mail, do not be discouraged. I do read most of them (unless they get caught in my spam filter). Even if I do not have the time to respond, I keep track of what is asked and use it as input into my writing process... so you may see an answer pop up online in a blog, article, or column I write... eventually.

Tuesday, November 11, 2014

On Building Appropriate DB2 Indexes

Perhaps the single most important thing that can be done to assure optimal DB2 application performance is creating correct indexes for your tables based on the queries used by your applications. Of course, this is easier said than done. But we can start with some basics. For example, consider the following SQL statement:

SELECT  LASTNAME, SALARY
FROM    EMP
WHERE   EMPNO = '000010'
AND     DEPTNO = 'D01';

What index or indexes would make sense for this simple query? The short answer is “it depends.” But the more important answer is to understand what it depends upon! First, think about all of the possible indexes that could be created. Your first short list probably looks something like this:

Index1 on EMPNO
Index2 on DEPTNO
Index3 on EMPNO and DEPTNO
Index4 on DEPTNO and EMPNO

This is a good start, and either Index3 or Index4 is probably best. Either one allows DB2 to use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table you might want to examine the impact of creating yet another index on the table. Factors to consider include:

Modification impact: DB2 will automatically maintain every index that you create. This means that every INSERT and every DELETE to this table will cause data to be inserted and deleted not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, the index will also be updated. So, indexes speed the process of retrieval but slow down modification.
Importance of this particular query: The more important the query, the more you may want to tune by index creation. For example, if you are coding a query that will be run every day by the CIO, you will want to make sure that it performs optimally. Who wants to risk a call from the CIO complaining about performance? So building indexes for that particular query is very important. On the other hand, a query for a low-level clerk may not necessarily be weighted as high, so that query may have to make do with the indexes that already exist. Of course, the decision should depend on the importance of the application to the business, not just on the importance of the user of the application.
Columns in the existing indexes: If an index already exists on EMPNO or DEPTNO it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always because the order of the columns in the index can make a big difference depending on the query. For example, consider the following query:

SELECT  LASTNAME, SALARY
FROM    EMP
WHERE   EMPNO = '000010'
AND     DEPTNO > 'D01';

In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than. Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO) DB2 potentially can use them both to satisfy this query so creating another index may not be necessary.
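You can watch the column-order effect with any optimizer that reports its access path. The sketch below uses Python's built-in sqlite3 module as a stand-in for DB2, because it runs anywhere; SQLite's EXPLAIN QUERY PLAN is not DB2's EXPLAIN, and the table definition and index name IX_EMP_EMPNO_DEPTNO are assumptions for illustration. In DB2 you would look for ACCESSTYPE = 'I' and MATCHCOLS = 2 in the PLAN_TABLE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE EMP (
    EMPNO    CHAR(6) NOT NULL,
    DEPTNO   CHAR(3) NOT NULL,
    LASTNAME VARCHAR(15),
    SALARY   DECIMAL(9, 2))""")
# Equality column first, range column second.
conn.execute("CREATE INDEX IX_EMP_EMPNO_DEPTNO ON EMP (EMPNO, DEPTNO)")

query = """SELECT LASTNAME, SALARY
           FROM EMP
           WHERE EMPNO = '000010'
           AND DEPTNO > 'D01'"""

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the text detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
```

On recent SQLite versions the detail line should name the index and show both the equality and the range constraint being applied to it; exact wording varies by version.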

One final thought for today: build your indexes based on workload, not object by object. Many people make the mistake of just guessing at some indexes as they create tables for new projects. And sometimes this cannot be avoided because the SQL typically is not known before the database is created. But some of the guesses -- or maybe many of them -- are likely to be suboptimal at best, wrong at worst.

Indexes should be built to optimize access to data via your SQL queries. (Of course, there are indexes required to support RI and uniqueness, but let's leave them out of the discussion for the moment.) To properly create a set of indexes requires a list of the SQL to be used, an estimate of the amount of data in the table (and an estimate of column values if possible), an estimate of the frequency that each SQL statement will be executed, and the relative importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries (most of the time) be attempted. If you are doing it any other way, you are doing it wrong.
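As a toy illustration of weighing candidates against a workload rather than guessing, the sketch below scores each candidate index by execution frequency times relative importance times the number of leading index columns a query's predicates can match (equality predicates keep matching; the first range predicate matches and then stops, as in the EMPNO/DEPTNO example above). This is a hypothetical heuristic, not how the DB2 optimizer costs anything, and it deliberately ignores modification cost and data volumes. The Query fields and the workload numbers are invented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

@dataclass(frozen=True)
class Query:
    name: str
    eq_cols: FrozenSet[str]     # columns with equality predicates
    range_cols: FrozenSet[str]  # columns with range predicates (>, <, BETWEEN)
    runs_per_day: int           # estimated execution frequency
    importance: int             # relative business weight (hypothetical scale)

def matching_columns(index_cols: Tuple[str, ...], q: Query) -> int:
    """Count leading index columns usable for a direct index lookup."""
    matched = 0
    for col in index_cols:
        if col in q.eq_cols:
            matched += 1
        elif col in q.range_cols:
            matched += 1
            break  # a range predicate ends matching on later columns
        else:
            break
    return matched

def score_candidates(candidates: List[Tuple[str, ...]],
                     workload: List[Query]) -> Dict[Tuple[str, ...], int]:
    """Weight each candidate by frequency x importance x matching columns."""
    return {
        idx: sum(q.runs_per_day * q.importance * matching_columns(idx, q)
                 for q in workload)
        for idx in candidates
    }

workload = [
    Query("both equal", frozenset({"EMPNO", "DEPTNO"}), frozenset(), 1000, 1),
    Query("dept range", frozenset({"EMPNO"}), frozenset({"DEPTNO"}), 200, 3),
]
candidates = [("EMPNO",), ("DEPTNO",), ("EMPNO", "DEPTNO"), ("DEPTNO", "EMPNO")]
scores = score_candidates(candidates, workload)
best = max(scores, key=scores.get)
print(best, scores[best])  # ('EMPNO', 'DEPTNO') 3200
```

Both composite indexes fully match the first query, but only (EMPNO, DEPTNO) fully matches the range query, which is why it comes out ahead.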

Of course, there is much more to index design than we have covered so far. For example, you might consider index overloading to achieve index only access. If all of the data that a SQL query asks for is contained in the index, DB2 may be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY given information about EMPNO and DEPTNO. And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well then we never need to access the EMP table because all of the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.

Keep in mind, though, that it is not prudent (or even possible) to make every query an index only access. This technique should be saved for particularly troublesome or important SQL statements.
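Index-only access is easy to see in the same sqlite3 sandbox. In this minimal sketch, again using SQLite as a stand-in for DB2 and a hypothetical index name, the index carries the key columns plus the two columns the query returns, and SQLite reports a "covering index" because it never needs to visit the table. The DB2 counterpart is INDEXONLY = 'Y' in the PLAN_TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE EMP (
    EMPNO CHAR(6) NOT NULL, DEPTNO CHAR(3) NOT NULL,
    LASTNAME VARCHAR(15), SALARY DECIMAL(9, 2))""")
# "Overloaded" index: lookup columns first, then the columns the query returns.
conn.execute("CREATE INDEX IX_EMP_OVERLOAD ON EMP (EMPNO, DEPTNO, LASTNAME, SALARY)")

query = ("SELECT LASTNAME, SALARY FROM EMP "
         "WHERE EMPNO = '000010' AND DEPTNO = 'D01'")
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
```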

Monday, November 03, 2014

Removing Superfluous Spaces

We all can relate to dealing with systems that have data integrity problems. But some data integrity problems can be cleaned up using a touch of SQL. Consider the common data entry problem of extraneous spaces (or blanks) inserted into a name field. Not only is it annoying, sometimes it can cause the system to ignore relationships between data elements because the names do not match. For example, “Craig  Mullins” is not equivalent to “Craig Mullins”; the first one has two spaces between the first and last name whereas the second one has only one.

You can write an SQL UPDATE statement to clean up the problem, if you know how to use the REPLACE function. REPLACE does what it sounds like it would do: it reviews a source string and replaces all occurrences of one string with another. For example, to replace all occurrences of Z with A in the string BZNZNZ you would code:

REPLACE('BZNZNZ', 'Z', 'A')

And the result would be BANANA. So, let’s create a SQL statement using REPLACE to get rid of any unwanted spaces in the NAME column of our EMPLOYEE table:

UPDATE EMPLOYEE
SET NAME = REPLACE(
             REPLACE(
               REPLACE(NAME, SPACE(1), '<>'),
             '><', SPACE(0)),
           '<>', SPACE(1));

"Wait a minute," you might be saying. "What are all of those left and right angle brackets, and why do I need them?"

OK, fair enough. Let’s explain how this works, starting from the inside out. The innermost REPLACE function takes the NAME column and converts every occurrence of a single space into a left/right angle bracket pair (<>). The next REPLACE (working outward) takes the string we just created and removes every occurrence of a right/left pair (><) by replacing it with a zero-length string. The final REPLACE function takes that string and replaces any remaining left/right pair (<>) with a single space. For example, “Craig  Mullins” becomes “Craig<><>Mullins”, then “Craig<>Mullins”, and finally “Craig Mullins”. The reversal of the brackets is the key to removing all spaces except one: remember, we want to retain a single space anywhere there was a single space, as well as anywhere that had multiple spaces. Try it; it works.

Of course, you can use any two characters you like, but the left and right angle brackets work well visually. Just be sure not to choose characters that occur naturally in the string you are acting upon.

Finally, the SPACE function was used for clarity. You could have used strings encased in single quotes, but the SPACE function is easier to read. It simply returns a string of spaces the length of which is specified as the integer argument. So, SPACE(12) would return a string of twelve spaces.
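If you want to try the technique without a DB2 subsystem handy, the sketch below runs the same three nested REPLACE calls through Python's sqlite3 module and also shows the equivalent chain of Python str.replace calls. SQLite has no SPACE function, so plain string literals stand in for SPACE(1) and SPACE(0); the EMPLOYEE rows and the collapse_spaces helper are hypothetical.

```python
import sqlite3

def collapse_spaces(name: str, left: str = "<", right: str = ">") -> str:
    """Same trick in plain Python; left/right must not occur in the data."""
    pair = left + right
    return (name.replace(" ", pair)          # every space becomes <>
                .replace(right + left, "")   # drop every >< between pairs
                .replace(pair, " "))         # one <> per run is left; back to a space

print("BZNZNZ".replace("Z", "A"))  # BANANA

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (NAME TEXT)")
conn.executemany("INSERT INTO EMPLOYEE (NAME) VALUES (?)",
                 [("Craig  Mullins",), ("Ada   Lovelace",), ("Grace Hopper",)])
conn.execute("""
    UPDATE EMPLOYEE
    SET NAME = REPLACE(
                 REPLACE(
                   REPLACE(NAME, ' ', '<>'),
                 '><', ''),
               '<>', ' ')""")
cleaned = [row[0] for row in conn.execute("SELECT NAME FROM EMPLOYEE ORDER BY rowid")]
print(cleaned)  # ['Craig Mullins', 'Ada Lovelace', 'Grace Hopper']
```

Note that a single leading or trailing space survives the UPDATE; trimming those is a separate job for a function such as TRIM.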

Thursday, October 23, 2014

DB2 SQL and Application Performance Tools

So far in this series of blog posts on DB2 performance tools, we have looked at system and database performance solutions. But perhaps the most important solution area involves monitoring and tuning application SQL statements.

Why do I say that? Well, most performance problems are caused by bad SQL and application code. Not every problem, of course, but perhaps as much as 70 to 80 percent of DB2 (and relational) performance issues are likely due to inefficient application code.

Writing SQL statements to access database tables is the responsibility of an application development team. However, the DBA usually gets involved when it comes to the performance of SQL. With SQL’s flexibility, the same request can be made in different ways. Because many of these methods are inefficient, application performance can fluctuate wildly unless the SQL is analyzed and tuned by an expert prior to implementation.

The EXPLAIN statement provides information about the access paths used by SQL queries by parsing SQL in application programs and placing encoded output into a PLAN_TABLE or by producing a standard access path report. To gauge efficiency, a DBA must decode this data and determine if a more efficient access path is available.

SQL code reviews are required to ensure that optimal SQL design techniques are used. An application design walkthrough should be performed for each program before it moves to production. This is done to review all SQL statements, the selected access paths, and the program code in which the SQL is embedded. The review also includes an evaluation of database statistical information to ascertain whether production-level statistics were used at the time of the EXPLAIN.

A line-by-line review of application source code and EXPLAIN output is tedious and prone to error, and can cause application backlogs. SQL analysis tools greatly simplify this process by automating major portions of the code review process. The SQL analysis tool typically:

Analyzes the SQL in an application program, describing the access paths chosen in a graphic format, an English description, or both.
Issues warnings when specific SQL constructs are encountered. For example, each time a sort is requested (by ORDER BY, GROUP BY, or DISTINCT), a message informs the user of the requisite sort.
Suggests alternative SQL solutions based on an “expert system” that reads SQL statements and their corresponding PLAN_TABLE entries and poses alternative SQL options.
Extends the rules used by the “expert system” to capture site-specific rules.
Analyzes at the subsystem, instance, server, application, plan, package, or SQL statement level.
Stores multiple versions of EXPLAIN output, creates performance comparisons, and produces plan history reports.
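The first two items in that list boil down to decoding PLAN_TABLE columns. Here is a minimal sketch of that decoding step in Python. It handles only a small subset of the DB2 for z/OS ACCESSTYPE codes and the SORTN_/SORTC_ sort flags; the describe function and the sample row are hypothetical, and a real tool would read rows with a SELECT against your PLAN_TABLE and cover far more cases.

```python
from typing import Any, Dict, List

# A small subset of PLAN_TABLE ACCESSTYPE codes (not exhaustive).
ACCESS_TYPES = {
    "I": "index access",
    "I1": "one-fetch index access",
    "N": "index access using an IN-list",
    "M": "multiple index access",
    "R": "table space scan",
}
# Sort flags: SORTN_* = sort of the new table, SORTC_* = sort of the composite table.
SORT_FLAGS = {
    "SORTN_ORDERBY": "ORDER BY", "SORTC_ORDERBY": "ORDER BY",
    "SORTN_GROUPBY": "GROUP BY", "SORTC_GROUPBY": "GROUP BY",
    "SORTN_UNIQ": "uniqueness (e.g. DISTINCT)", "SORTC_UNIQ": "uniqueness (e.g. DISTINCT)",
}
SINGLE_INDEX_TYPES = {"I", "I1", "N"}

def describe(row: Dict[str, Any]) -> List[str]:
    """Turn one PLAN_TABLE row into an English description plus sort warnings."""
    code = row["ACCESSTYPE"]
    access = ACCESS_TYPES.get(code, "access type " + code)
    text = f"Query {row['QUERYNO']}: {access} on {row['TNAME']}"
    if code in SINGLE_INDEX_TYPES:
        detail = f"{row['MATCHCOLS']} matching columns"
        if row.get("INDEXONLY") == "Y":
            detail += ", index only"
        text += f" via {row['ACCESSNAME']} ({detail})"
    messages = [text]
    for flag, reason in SORT_FLAGS.items():
        if row.get(flag) == "Y":
            messages.append(f"WARNING: query {row['QUERYNO']} requires a sort for {reason}")
    return messages

# Hypothetical row, shaped like the result of SELECT ... FROM PLAN_TABLE.
row = {"QUERYNO": 100, "TNAME": "EMP", "ACCESSTYPE": "I", "ACCESSNAME": "XEMP1",
       "MATCHCOLS": 2, "INDEXONLY": "N", "SORTC_ORDERBY": "Y"}
for line in describe(row):
    print(line)
```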

Tools that analyze the performance of the application code in which the SQL is embedded are available too. These tools usually capture in-depth information about programs as they are run and provide reports that specify which areas of the code consume the most resources. Unfortunately, these tools do not necessarily interface with SQL analysis tools. Why might this be a problem?

Well, consider an application program that contains a singleton SELECT inside a loop. The singleton SELECT requests a single row based on a WHERE clause, checking for the primary key of that table. For each iteration of the loop, the program changes the primary key value being searched such that the entire table is read from the lowest key value to the highest key value.

SQL analysis tools will probably not target the SQL statement as inefficient because the predicate value is for the primary key, which should invoke indexed access. The application program analysis tool may flag the section of the code that accesses the data as inefficient, but it will not help you to fix it or tell you why it is inefficient.

A knowledgeable performance analyst or DBA would have to use both tools and interpret the output of each to arrive at a satisfactory conclusion. For example, it could be more efficient to code a cursor, without a predicate, to retrieve every row of the table, and then fetch each row one by one. This method would eliminate index I/O, might use parallel access, and therefore should reduce I/O and elapsed time—thereby enhancing performance.
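The difference between the two designs shows up clearly if you count how many SQL statements each one sends to the database. The sketch below uses Python's sqlite3 module as a stand-in (the ACCT table is hypothetical) and a trace callback to count the SELECTs executed. Because SQLite runs in-process, wall-clock timing would say little about DB2; the point is that the loop issues one statement per key, each of which would be a separate call with its own index probe in DB2, while the cursor issues one statement for the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCT (ACCTNO INTEGER PRIMARY KEY, BALANCE INTEGER)")
conn.executemany("INSERT INTO ACCT VALUES (?, ?)", [(n, n * 10) for n in range(1, 1001)])
conn.commit()

selects = []  # every SELECT statement SQLite is asked to run

def count_selects(sql: str) -> None:
    if sql.lstrip().upper().startswith("SELECT"):
        selects.append(sql)

conn.set_trace_callback(count_selects)

# Anti-pattern: a singleton SELECT per key, walking the key range in a loop.
looped = []
for key in range(1, 1001):
    looped.append(conn.execute(
        "SELECT ACCTNO, BALANCE FROM ACCT WHERE ACCTNO = ?", (key,)).fetchone())
loop_statements = len(selects)

# Alternative: one cursor with no predicate, fetching every row in key order.
selects.clear()
cursor_rows = conn.execute("SELECT ACCTNO, BALANCE FROM ACCT ORDER BY ACCTNO").fetchall()
cursor_statements = len(selects)

print(loop_statements, cursor_statements, looped == cursor_rows)
```

Both versions return the same rows, which is exactly why neither an SQL analysis tool nor a program analysis tool on its own would flag the loop as a design problem.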

Only a trained analyst can catch this type of design problem during a code walkthrough. Although a plan analysis tool significantly reduces the effort involved in the code review process, it cannot eliminate it.

So what should you look for in an SQL analysis tool? The first feature required of SQL analysis tools is the ability to read and interpret standard EXPLAIN or SHOW PLAN output. The tool should be able to read the plan table or interface directly with the DBMS command to obtain the output. It then must be able to automatically scan the EXPLAIN or SHOW PLAN data and report on the selected access paths and the predicted performance. Advanced tools will provide recommendations for improving the SQL by adding indexes or modifying the SQL.

Yet another category of tool can evaluate access paths as you REBIND programs and categorize them into changed and unchanged access paths. This helps to identify where SQL tuning may be required. Advanced forms of these tools also apply rules to the changed SQL to indicate if the access path is better or worse than the prior access path. Such tools can be incredibly helpful for performing mass rebinds of your production programs.
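The changed/unchanged split at the heart of such tools can be sketched in a few lines of Python. This minimal version assumes you have captured EXPLAIN output before and after the REBIND and keyed each access path by (PROGNAME, QUERYNO); a real PLAN_TABLE row is also identified by QBLOCKNO and PLANNO, which are left out here for brevity. The program name and paths are invented.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]              # (PROGNAME, QUERYNO)
Path = Tuple[str, str, int, str]   # (ACCESSTYPE, ACCESSNAME, MATCHCOLS, INDEXONLY)

def classify(before: Dict[Key, Path], after: Dict[Key, Path]) -> Dict[str, List[Key]]:
    """Split statements into unchanged, changed, new and dropped after a REBIND."""
    result: Dict[str, List[Key]] = {"unchanged": [], "changed": [], "new": [], "dropped": []}
    for key in sorted(before.keys() | after.keys()):
        if key not in after:
            result["dropped"].append(key)
        elif key not in before:
            result["new"].append(key)
        elif before[key] == after[key]:
            result["unchanged"].append(key)
        else:
            result["changed"].append(key)
    return result

before = {("PAYROLL", 100): ("I", "XEMP1", 2, "N"), ("PAYROLL", 200): ("R", "", 0, "N")}
after = {("PAYROLL", 100): ("R", "", 0, "N"), ("PAYROLL", 200): ("R", "", 0, "N")}
result = classify(before, after)
print(result)
```

A switch from matching index access to a table space scan, as with query 100 here, is the kind of change a rules engine would then examine to decide whether the new path is better or worse.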

SQL Monitors

An SQL monitoring solution can identify running SQL statements, filter the information, and display it in an appropriate order and configuration. For example, you can use an SQL monitor to identify the Top Ten CPU users over the past hour (or the past day, week, etc.).
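The "Top Ten" style of report amounts to a filter on a time window followed by a top-N selection. The sketch below shows that logic in Python over hypothetical per-statement statistics; how a real monitor collects CPU figures from DB2 is outside its scope, and the StatementStats record is invented for the example.

```python
import heapq
from datetime import datetime, timedelta
from typing import List, NamedTuple

class StatementStats(NamedTuple):
    stmt_id: str
    ended_at: datetime
    cpu_seconds: float

def top_cpu(stats: List[StatementStats], now: datetime,
            window: timedelta, n: int = 10) -> List[StatementStats]:
    """Return the n most CPU-intensive statements that ended within the window."""
    recent = (s for s in stats if now - window <= s.ended_at <= now)
    return heapq.nlargest(n, recent, key=lambda s: s.cpu_seconds)

now = datetime(2014, 10, 23, 12, 0)
stats = [
    StatementStats("STMT-A", now - timedelta(minutes=5), 4.2),
    StatementStats("STMT-B", now - timedelta(minutes=50), 9.8),
    StatementStats("STMT-C", now - timedelta(hours=3), 30.0),  # outside a 1-hour window
    StatementStats("STMT-D", now - timedelta(minutes=20), 1.1),
]
top = top_cpu(stats, now, timedelta(hours=1), n=2)
print([s.stmt_id for s in top])  # ['STMT-B', 'STMT-A']
```

Changing the window argument gives the past day or week; the selection logic stays the same.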

Usually there is an online capability, which displays what is happening right now, and a historical capability, which can display details and trends over time.

An SQL monitor is particularly helpful when working to remediate production performance issues where hundreds or thousands (or more) of SQL statements can be running at any one time.

End-to-End Performance Tools

Modern applications require multiple system components and run across multiple networked devices to deliver functionality. When performance problems arise, it can be difficult to determine what, exactly, is causing the problem. Is it on the client or the server? Is it a networking problem? Is it a database issue or a code problem?

End-to-end performance monitoring tools exist that track an application request from initiation to completion. These solutions provide enhanced visibility specifically into application performance—giving organizations the power to understand both when and why performance has degraded, and the information needed to improve matters in a business-prioritized fashion.

By following the workload as it progresses across multiple pieces of hardware and software, problem determination becomes possible.

Workload Testing and Estimation

Another category of SQL performance tool allows you to identify a workload consisting of programs and transactions that are to be run during a specific timeframe. The tools help to identify performance issues that crop up only when the application is running at a production volume.

Data Studio

Finally, no overview of application performance tools for DB2 would be complete without a brief mention of IBM's Data Studio. Data Studio is a free-of-charge tool for basic DB2 administration and development tasks. Data Studio offers an easy-to-use GUI for the following:

Designing data access queries and routines
Building, running, and tuning SQL
Building, testing, and deploying stored procedures (using SQL or Java)
Creating Web services for Service Oriented Architecture (SOA) solutions
Developing DB2 SQLJ applications
Managing database objects and authorizations

You can download Data Studio at IBM’s website. It is available as a stand-alone package geared mostly for DBAs, or as an IDE geared for both DBA and development work.

Of course, IBM sells other DB2 tools for a fee, some of which can integrate and work well with Data Studio. And there are other tools that compete with Data Studio (such as Dell's Toad) that offer a lot more functionality than the basic capabilities provided free of charge by Data Studio.

Summary

These past few posts have taken a broad look at the categories and types of performance tools available for managing the performance of your DB2 for z/OS environment. Many of the same categories of tools are available for DB2 for LUW (as well as other DBMS offerings).

Have I missed any important categories? If so, drop me a line or add a comment here to the blog. I'm always interested in getting feedback.

Thanks... and happy performance tuning!

Friday, October 17, 2014

Performance Tools That Operate on Databases and Database Objects

In our last blog post here, we covered DB2 system performance management tools - that is, tools that look at the performance at a system or subsystem level. Today, we turn our attention to the database objects...

Most DBMSs do not provide an intelligent database analysis capability. Instead, the DBA or performance analyst must use system catalog views and queries, or a system catalog tool, to keep watch over each database and its objects. This is not an optimal solution because it relies on human intervention for efficient database organization, opening up the possibility for human error.

DB2 for z/OS, however, does provide Real Time Statistics that can be used to drive database optimization and maintenance. What are Real Time Statistics (or RTS)?

Well, RTS are similar to traditional database statistics that are accumulated using a utility program (RUNSTATS), but RTS are accumulated by DB2 “on the fly” as the database management system and its applications are running; that is, without having to run a utility program.

RTS are stored in two tables in the DB2 Catalog:

SYSIBM.SYSTABLESPACESTATS: Contains statistics on table spaces and table space partitions
SYSIBM.SYSINDEXSPACESTATS: Contains statistics on index spaces and index space partitions

But since this post is supposed to be talking about database-performance tools, I don’t want to get into a full-blown discussion of RTS… after all, RTS are a built-in component of DB2. That said, the ability of DB2 to generate and store RTS enables database performance tools to make decisions based on actual, up-to-date performance metrics. Of course, DB2 is not the only DBMS with such metrics, but since this is a blog about DB2, I won’t get into any details of the other database systems.

Database Analysis Tools

At any rate, database analysis tools are available that can proactively and automatically monitor your database environment. These database analysis tools typically can:

Collect statistics for tables and indexes: standard statistical information from the DBMS, extended statistics capturing more information (for example, data set extents), or a combination of both.
Read the underlying data sets for the database objects to capture current statistics, read the database statistics from the system catalog, read tables unique to the tool that captured the enhanced statistics, or any combination thereof.
Set thresholds on database statistics that, when exceeded, automatically schedule database reorganization and other maintenance tasks.
Provide a series of canned reports detailing the potential problems for specific database objects.
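The threshold idea in the list above can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's logic: each row is a dict shaped like a row fetched from SYSIBM.SYSTABLESPACESTATS (the query itself is not shown), the database and table space names are invented, and the limits are made-up values you would tune for your shop. IBM's DSNACCOX stored procedure makes REORG recommendations from RTS using its own, more detailed formulas and parameters.

```python
from typing import Any, Dict, List

# Hypothetical limits; tune for your environment.
INSDEL_PCT_LIMIT = 25.0   # inserts + deletes since last REORG, as % of TOTALROWS
UNCLUST_PCT_LIMIT = 10.0  # unclustered inserts since last REORG, as % of TOTALROWS
EXTENT_LIMIT = 50         # data set extents


def reorg_reasons(row: Dict[str, Any]) -> List[str]:
    """Return the reasons (if any) an object should be scheduled for REORG."""
    reasons = []
    total = row["TOTALROWS"]
    if total > 0:
        insdel_pct = 100.0 * (row["REORGINSERTS"] + row["REORGDELETES"]) / total
        if insdel_pct > INSDEL_PCT_LIMIT:
            reasons.append(f"inserts+deletes {insdel_pct:.1f}% > {INSDEL_PCT_LIMIT}%")
        unclust_pct = 100.0 * row["REORGUNCLUSTINS"] / total
        if unclust_pct > UNCLUST_PCT_LIMIT:
            reasons.append(f"unclustered inserts {unclust_pct:.1f}% > {UNCLUST_PCT_LIMIT}%")
    if row["REORGMASSDELETE"] > 0:
        reasons.append("mass delete since last REORG")
    if row["EXTENTS"] > EXTENT_LIMIT:
        reasons.append(f"{row['EXTENTS']} extents > {EXTENT_LIMIT}")
    return reasons


def schedule_reorgs(rows: List[Dict[str, Any]]) -> List[str]:
    """One REORG control statement per object that breaches any threshold."""
    return [f"REORG TABLESPACE {r['DBNAME']}.{r['NAME']}" for r in rows if reorg_reasons(r)]


# Hypothetical RTS rows.
rows = [
    {"DBNAME": "DBPAY", "NAME": "TSEMP", "TOTALROWS": 10_000, "REORGINSERTS": 2_000,
     "REORGDELETES": 1_000, "REORGUNCLUSTINS": 100, "REORGMASSDELETE": 0, "EXTENTS": 3},
    {"DBNAME": "DBPAY", "NAME": "TSDEPT", "TOTALROWS": 500, "REORGINSERTS": 10,
     "REORGDELETES": 5, "REORGUNCLUSTINS": 0, "REORGMASSDELETE": 0, "EXTENTS": 1},
]
print(schedule_reorgs(rows))
```

Only TSEMP is scheduled here: 3,000 inserts and deletes against 10,000 rows is 30%, over the 25% limit, while TSDEPT stays at 3%.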

Database Utilities

Database utilities are another category of performance tool that operates at the database (or database object) level. Usually a number of rudimentary utilities ship for free with the DBMS. These are typically simple, no-frills programs that are notorious for poor performance, especially on very large tables. However, these utilities are required to populate, administer, and organize your databases. The typical utilities provided are LOAD, UNLOAD, REORG, RUNSTATS, BACKUP, and RECOVER, as well as utilities for integrity checking.

Although I suppose it is possible to make an argument, at some level, for any and all of these utilities to have a performance aspect to them, REORG and RUNSTATS are the ones that definitely impact database performance.

RUNSTATS is used to gather statistics on the composition of the database and REORG is used to organize table space data optimally.

Third-party vendors provide support tools that replace the database utilities and deliver the same or more functionality more efficiently. For example, it is not unheard of for a third-party vendor to claim that its utilities execute anywhere from four to ten times faster than the native DBMS utilities. Such claims are believable, but they must be substantiated for the data and applications at your organization. Before committing to any third-party utility, the DBA should be sure that the product provides all of the basic functionality required.

When testing utility tools from different vendors, be sure to conduct fair tests. For example, always reload or recover prior to testing REORG utilities, or you may skew your results due to different levels of table organization. Additionally, always run the tests for each tool on the same object with the same amount of data, and make sure that the data cache is flushed between each test run. Finally, make sure that the workload on the system is the same (or as close as possible) when testing each product because concurrent workload can skew benchmark test results.
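Here is a minimal Python sketch of that test discipline, under stated assumptions: `reset_object`, `flush_cache`, and the tool callables are hypothetical hooks you would wire to your own reload/recover jobs, cache flushes, and utility invocations. The point is the structure. Every tool starts from the same object state and a cold cache, runs the same number of times, and is compared on its median elapsed time.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(
    tools: Dict[str, Callable[[], None]],
    reset_object: Callable[[], None],
    flush_cache: Callable[[], None],
    runs: int = 3,
) -> Dict[str, float]:
    """Median elapsed seconds per tool, with object state and cache reset before every run."""
    results: Dict[str, float] = {}
    for name, run_tool in tools.items():
        timings = []
        for _ in range(runs):
            reset_object()  # e.g. reload or recover so each REORG starts from the same organization
            flush_cache()   # each run starts with a cold data cache
            start = time.perf_counter()
            run_tool()
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


# Dummy hooks that only record the call order (hypothetical stand-ins for real jobs).
calls = []
results = benchmark(
    {"tool_a": lambda: calls.append("a"), "tool_b": lambda: calls.append("b")},
    reset_object=lambda: calls.append("reset"),
    flush_cache=lambda: calls.append("flush"),
    runs=2,
)
print(calls[:3], sorted(results))
```

Keeping the concurrent system workload the same across tools cannot be enforced from code like this; that part remains a matter of scheduling the test runs carefully.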

Yet another category of database-focused tool is the utility management tool. This type of tool provides administrative support for the creation and execution of database utility jobstreams. These utility generation and management tools:

Automatically generate utility parameters, JCL, or command scripts.
Monitor the database utilities as they execute.
Automatically schedule utilities when exceptions are triggered.
Restart utilities with a minimum of intervention. For example, if a utility cannot be restarted, the utility manager should automatically terminate the utility before resubmitting it.
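The generate-and-restart behavior described above can be sketched in Python. Treat this as a hypothetical illustration: it builds REORG or RUNSTATS control statements for a list of objects and applies the restart rule from the last item (restart if possible, otherwise terminate and resubmit). The `can_restart`, `restart`, `terminate`, and `resubmit` hooks are placeholders; in DB2 for z/OS, terminating a utility is done with the -TERM UTILITY command, but nothing here actually talks to DB2, and the object names are invented.

```python
from typing import Callable, List


def build_control_statements(objects: List[str], utility: str) -> List[str]:
    """Generate one control statement per 'database.tablespace' name."""
    if utility == "REORG":
        return [f"REORG TABLESPACE {obj}" for obj in objects]
    if utility == "RUNSTATS":
        return [f"RUNSTATS TABLESPACE {obj} TABLE(ALL) INDEX(ALL)" for obj in objects]
    raise ValueError(f"unsupported utility: {utility}")


def recover_failed_utility(
    utility_id: str,
    can_restart: Callable[[str], bool],
    restart: Callable[[str], None],
    terminate: Callable[[str], None],
    resubmit: Callable[[str], None],
) -> str:
    """Restart a failed utility if possible; otherwise terminate it, then resubmit."""
    if can_restart(utility_id):
        restart(utility_id)
        return "restarted"
    terminate(utility_id)  # hypothetical hook, e.g. issues -TERM UTILITY(utility_id)
    resubmit(utility_id)
    return "terminated and resubmitted"


for stmt in build_control_statements(["DBPAY.TSEMP", "DBPAY.TSDEPT"], "RUNSTATS"):
    print(stmt)
```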

Space Management Tools

Most DBMSs provide basic statistics for space utilization, but the in-depth statistics required for both space management and performance tuning are usually inadequate for heavy-duty administration. For example, most DBMSs lack the ability to monitor the requirements of the underlying files used by the DBMS. When these files go into multiple extents or become fragmented, performance can suffer. Without a space management tool, the only way to monitor this information is with arcane and difficult-to-use operating system commands, which can be a tedious exercise.

Additionally, each DBMS allocates space differently. The manner in which the DBMS allocates this space can result in inefficient disk usage. Sometimes space is allocated, but the database will not use it. A space management tool is the only answer for ferreting out the amount of used space versus the amount of allocated space.
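Once the numbers are in hand, the used-versus-allocated comparison is simple arithmetic; gathering accurate numbers is the hard part a space management tool handles. A minimal Python sketch, assuming you already have allocated kilobytes, used kilobytes, and an extent count per data set (the field names, data set names, and the 70% / 25-extent limits are all hypothetical):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSetSpace:
    name: str
    allocated_kb: int
    used_kb: int
    extents: int


def space_report(items: List[DataSetSpace], min_used_pct: float = 70.0,
                 max_extents: int = 25) -> List[str]:
    """Flag data sets that waste allocated space or have too many extents."""
    findings = []
    for ds in items:
        used_pct = 100.0 * ds.used_kb / ds.allocated_kb if ds.allocated_kb else 0.0
        if used_pct < min_used_pct:
            wasted = ds.allocated_kb - ds.used_kb
            findings.append(f"{ds.name}: {used_pct:.0f}% used, {wasted} KB allocated but unused")
        if ds.extents > max_extents:
            findings.append(f"{ds.name}: {ds.extents} extents")
    return findings


report = space_report([
    DataSetSpace("DBPAY.TSEMP", allocated_kb=72_000, used_kb=18_000, extents=3),
    DataSetSpace("DBPAY.TSDEPT", allocated_kb=1_440, used_kb=1_300, extents=40),
])
for line in report:
    print(line)
```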

Space management tools often interface with other database and systems management tools such as operating system space management tools, database analysis tools, system catalog query and management tools, and database utility generators.

Compression Tools

A standard tool for reducing storage costs is the compression utility. This type of tool applies an algorithm to the data in a table so that the data is encoded in a more compact form. By reducing the amount of space needed to store data, it decreases overall storage costs. Compression tools must compress the data when it is added to the table and whenever it is modified, and then expand the data when it is later retrieved.
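To illustrate the compress-on-write, expand-on-read cycle, here is a Python sketch that uses the standard `zlib` module as a stand-in. DB2's built-in compression uses its own dictionary-based Ziv-Lempel scheme with hardware assistance, not zlib, so treat this only as an illustration of the flow; the in-memory `CompressedTable` is hypothetical.

```python
import zlib
from typing import Dict


class CompressedTable:
    """Hypothetical row store that keeps every row compressed at rest."""

    def __init__(self) -> None:
        self._rows: Dict[int, bytes] = {}

    def put(self, key: int, row: str) -> None:
        # Compress on insert and again on every subsequent update.
        self._rows[key] = zlib.compress(row.encode("utf-8"))

    def get(self, key: int) -> str:
        # Expand on retrieval; callers never see the compressed form.
        return zlib.decompress(self._rows[key]).decode("utf-8")

    def stored_bytes(self) -> int:
        return sum(len(v) for v in self._rows.values())


table = CompressedTable()
row = "SMITH     JOHN      D11 " * 20  # repetitive data compresses well
table.put(1, row)
original = len(row.encode("utf-8"))
print(f"{original} bytes -> {table.stored_bytes()} bytes stored")
```

Compressing each row independently, as this sketch does, gains little on short rows, which is one reason dictionary-based schemes are used for table data.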

In the earlier days of DB2, compression tools that used an exit routine were common. But ever since DB2 Version 3, which introduced the built-in, hardware-assisted compression capability of DB2, compression duties are handled quite efficiently with out-of-the-box DB2 functionality.

Additionally, some tools are available that compress database logs, enabling more log information to be retained on disk before it is offloaded to another medium.

Synopsis

So, there are a number of different categories of performance tools that function at the database or database object level that are worth considering. These differ from system performance tools (covered in the last blog post) and application performance tools (which will be covered in the next blog post).

Thursday, October 09, 2014

Database System Performance Tools

System performance tools examine the database server, its configuration, and usage. The most commonly used system performance tool is the performance monitor. Database performance monitoring and analysis tools support many types of performance-oriented requests in many ways. For example, system performance tools can operate:

In background mode as a batch job that reports on performance statistics written by the DBMS trace facility
In foreground mode as an online monitor that either traps trace information or captures information from the DBMS control blocks as applications execute
By sampling the database kernel and user address spaces as the program runs and by capturing information about the performance of the job, independent of database traces
By capturing database trace information and maintaining it in a history file (or table) for producing historical performance reports and for predicting performance trends
As a capacity planning device that gives statistical information about an application and the environment in which it will operate
As an after-the-fact analysis tool on a workstation that analyzes and graphs all aspects of application performance and system-wide performance

Each database performance monitor supports one or more of these features. The evaluation of database performance monitors is a complex task. Sometimes more than one performance monitor is used at a single site—perhaps one for batch reporting and another for online event monitoring. Maybe an enterprise-wide monitoring solution has been implemented and one component of that solution is a database module that monitors your DBMS, but it lacks the details of a more sophisticated DBMS monitor. So, another performance monitor is purchased for daily DBA usage, while the module of the enterprise-wide monitoring solution is used for integrated monitoring by system administrators.
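The history-file mode from the list of monitoring modes above is what makes trend prediction possible. As a hedged illustration (not how any particular monitor does it), this Python sketch fits a least-squares line to a hypothetical series of daily CPU-seconds figures taken from such a history and projects it forward.

```python
from typing import List, Tuple


def linear_trend(values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of values against day index 0..n-1; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(values: List[float], days_ahead: int) -> float:
    """Project the fitted trend days_ahead past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + days_ahead)


history = [100.0, 104.0, 108.0, 112.0, 116.0]  # hypothetical daily CPU seconds
print(project(history, 10))  # prints 156.0
```

With this perfectly linear sample the fitted slope is 4.0 per day; real history is noisier, which is one reason a production tool may use more elaborate models.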

Modern database performance tools can set performance thresholds that, once reached, alert the DBA, run another task to report on the problem, or actually fix the problem. These tools are typically agent-based. An agent is a piece of independent code that runs on the database server looking for problems. It interacts with, but does not rely on, a console running on another machine that is viewed by the DBA. This agent architecture enables efficient database monitoring because the agent is not tied to a workstation and can act independently. The agent sends information to the DBA only when required.
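The threshold-and-agent pattern can be sketched as a small Python function. Everything here is hypothetical: `read_metrics` stands in for whatever the agent samples on the database server, `notify` for the channel to the DBA's console, `actions` for optional corrective jobs, and the metric names and limits are invented. The design point is the one described above: the agent evaluates thresholds locally and sends something only when a limit is crossed.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical metric names and limits; an alert fires when a value exceeds its limit.
THRESHOLDS: Dict[str, float] = {
    "lock_timeouts_per_min": 5.0,
    "bufferpool_miss_pct": 20.0,
}


def check_once(
    read_metrics: Callable[[], Dict[str, float]],
    notify: Callable[[str], None],
    actions: Optional[Dict[str, Callable[[], None]]] = None,
) -> List[str]:
    """Sample metrics once; alert (and optionally act) only on threshold breaches."""
    breached = []
    for name, value in read_metrics().items():
        limit = THRESHOLDS.get(name)
        if limit is None or value <= limit:
            continue  # nothing to report: the agent stays quiet
        breached.append(name)
        notify(f"{name}={value} exceeds {limit}")
        if actions and name in actions:
            actions[name]()  # e.g. kick off a corrective job
    return breached
```

An agent would call `check_once` on a timer (for example, in a loop with `time.sleep`), so the console only hears from it when something is actually wrong.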

Additionally, some system performance tools are available that focus on a specific component of the DBMS such as the buffer pools (data cache). Such a tool can be used to model the memory requirements for database caching, to capture data cache utilization statistics, and perhaps even to make recommendations for improving the performance of the buffers.
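One way such a tool can model memory requirements is to replay a trace of page references against caches of different sizes and compare hit ratios. The sketch below assumes a simple LRU replacement policy and a hypothetical page-reference trace; DB2 buffer pools use LRU page stealing by default but offer other options, so a real tool models more than this.

```python
from collections import OrderedDict
from typing import Iterable, List, Tuple


def lru_hit_ratio(trace: Iterable[int], buffers: int) -> float:
    """Replay page references against an LRU cache of `buffers` pages; return the hit ratio."""
    cache: OrderedDict = OrderedDict()
    hits = refs = 0
    for page in trace:
        refs += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # most recently used moves to the end
        else:
            cache[page] = None
            if len(cache) > buffers:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / refs if refs else 0.0


def size_curve(trace: List[int], sizes: List[int]) -> List[Tuple[int, float]]:
    """Hit ratio for each candidate buffer pool size."""
    return [(size, lru_hit_ratio(trace, size)) for size in sizes]


trace = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2, 3]  # hypothetical page numbers
for size, ratio in size_curve(trace, [2, 3, 4, 5]):
    print(size, round(ratio, 2))
```

With this trace, growing from 2 to 3 buffers takes the hit ratio from 0.0 to 0.5, a fourth buffer adds nothing, and a fifth raises it to about 0.58. Flat regions like that are the kind of signal a modeling tool looks for before recommending more memory.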

Another type of performance optimization tool enables database configuration parameters to be changed without recycling the DBMS instance, subsystem, or server. These tools are useful when the changes require the DBMS to be stopped and restarted. Such tools can dramatically improve availability, especially if configuration parameters need to be changed frequently and the DBMS does not support dynamic parameter modification.

A few ISVs provide invasive system performance tools that enhance the performance of databases by adding functionality directly to the DBMS and interacting with the database kernel. Typically, these products take advantage of known DBMS shortcomings.

For example, products are available that enhance the performance of reading a database page or block or that optimize data caching by providing additional storage and control over buffers and their processing. Care must be taken when evaluating invasive performance tools. New releases of the DBMS may negate the need for these tools because functionality has been added or known shortcomings have been corrected. However, this does not mean that you should not consider invasive database performance tools. They can pay for themselves after only a short period of time. Discarding the tool when the DBMS supports its functionality is not a problem if the tool has already paid for itself in terms of better performance.

One final caution: Because invasive performance tools can interact very closely with the database kernel, be careful when migrating to a new DBMS release or a new release of the tool. Extra testing should be performed with these tools because of their intrusive nature.

Saturday, October 04, 2014

DB2 Performance Tuning Tools

Well, as I promised a post or two ago, in this and the next couple of posts we will take a look at database performance tools...

Database tools help organizations effectively manage the performance of applications that access database data... and help manage the DBMS itself. Some DBMS vendors provide embedded options and bundled tools to address database performance management. However, these tools are frequently insufficient for large-scale or heavily used database applications. Fortunately, many third-party tools will effectively manage the performance of mission-critical database applications. Tools that enable DBAs to tune databases fall into two major categories: performance management and performance optimization.

Many different types of performance management tools are available.

Performance monitors enable DBAs and performance analysts to gauge the performance of applications accessing databases in one (or more) of three ways: real time, near real time (intervals), or based on historical trends. The more advanced performance monitors are agent-based.
Performance estimation tools provide predictive performance estimation for entire programs and SQL statements based on access paths, operating environment, and a rules or inference engine.
Capacity planning tools enable DBAs to analyze the current environment and database design and perform “what-if” scenarios on both.
SQL analysis and tuning tools provide graphical and/or textual descriptions of query access paths as determined by the relational optimizer. These tools can execute against single SQL statements or entire programs.
Advisory tools augment SQL analysis and tuning tools with a knowledge base of tips on how to reformulate SQL for optimal performance. Advanced tools may automatically change the SQL (on request) based on the coding tips in the knowledge base.
System analysis and tuning tools enable the DBA to view and change database and system parameters using a graphical interface (e.g., cache and/or bufferpool tuning, log sizing).

In the performance optimization category, several tools can be used to tune databases.

Reorganization tools automate the process of rebuilding optimally organized databases. Databases can cause performance problems due to their internal organization (e.g., fragmentation, row ordering, storage allocation).
Caching tools buffer frequently used data in memory, which can be accessed faster than secondary disk storage. These tools can augment the performance of the DBMS cache or, more commonly, integrate with the disk storage subsystem.
Compression tools enable DBAs to minimize the amount of disk storage used by databases, thereby reducing overall disk utilization and, possibly, elapsed query/program execution time, because fewer I/Os may be required. (Caution: Compression tools can also increase CPU consumption due to the overhead of their compress/decompress algorithms.)
Sorting tools can be used to sort data prior to loading databases to ensure that rows will be in a predetermined sequence. Additionally, sorting tools can be used in place of ORDER BY or GROUP BY SQL. Keep in mind, though, that retrieving rows from a relational database is sometimes more efficient using SQL with ORDER BY than using SQL alone followed by a standalone sort of the result set.
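Sorting input into clustering sequence before a load is easy to picture. In this Python sketch the record layout, the sample-style rows, and the choice of (DEPTNO, EMPNO) as the clustering key are all hypothetical; a production sort tool does the same job on far larger volumes, typically without holding everything in memory.

```python
from operator import itemgetter
from typing import List, Tuple

Record = Tuple[str, str, str]  # (DEPTNO, EMPNO, LASTNAME): hypothetical layout


def sort_for_load(records: List[Record]) -> List[Record]:
    """Order input rows by the clustering key so the LOAD writes them in sequence."""
    # Clustering key assumed to be (DEPTNO, EMPNO).
    return sorted(records, key=itemgetter(0, 1))


unsorted = [
    ("D11", "000220", "LUTZ"),
    ("A00", "000110", "LUCCHESI"),
    ("D11", "000060", "STERN"),
    ("A00", "000010", "HAAS"),
]
for rec in sort_for_load(unsorted):
    print(",".join(rec))
```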

The DBA will often need to use these tools in conjunction with one another—integrated and accessible from a central management console. This enables the DBA to perform core performance-oriented and database administration tasks from a single platform.

Many DBMS vendors provide solutions to manage their databases only; for example, Oracle provides Oracle Enterprise Manager, IBM offers Data Studio for DB2, and Microsoft provides SQL Server Management Studio for this purpose. Third-party vendors provide more robust options that act across heterogeneous environments such as multiple different database servers or operating systems. One example is Dell's Toad product family (there are others).

In general, relying on the DBMS vendor's solution as your only management tool is a good idea only if your shop has just a single DBMS. Organizations with multiple DBMS engines running across multiple operating systems should investigate the third-party tool vendors with heterogeneous support (perhaps in addition to the single-DBMS tools).

We will take a closer look at some of these types of tools, with a focus on DB2 for z/OS, in upcoming blog posts.

Monday, September 22, 2014

Rules for an Effective DB2 Monitoring Strategy

DB2, and relational databases in general, have a reputation for being (relatively) easy for users to understand: users specify what data to retrieve, not how to retrieve it. The complexity removed for the users, however, had to go somewhere else: into the code of DB2. That means you sometimes have to dig into the DB2 optimizer or other arcane technical details to uncover performance issues.

DB2 also has a reputation as a large resource consumer, largely because of its complexity. Because DB2 performance analysts must understand and monitor this complexity, they require an array of performance monitoring tools and techniques.

But I do not want to get into all of the potential tools and techniques in today’s short post. I plan to talk about the various types of DB2 performance and monitoring solutions that are available in upcoming posts.

Instead, today’s post just covers the high-level components of what is needed for an effective DB2 performance management strategy... An effective monitoring strategy includes the following:

Scheduled batch performance reports on the recent performance of DB2 applications and the DB2 subsystem; a history of these reports would be useful, too.
An online monitor that executes when DB2 executes to enable quick monitoring of performance problems as they occur.
Online monitors for all teleprocessing environments in which DB2 transactions execute (for example, CICS, IMS/TM, or TSO).
A monitoring solution that can track and report on dynamic distributed traffic.
End-to-end transaction monitoring capability, sometimes called Application Performance Management.
SQL query monitoring and explain analysis.
Regular monitoring of z/OS for memory use and VTAM for network use.
Scheduled reports from the DB2 Catalog and queries run against the RTS tables.
Access to the DB2 DSNMSTR address space to review console messages.
Use of the DB2 -DISPLAY command to view databases, threads, and utility execution.

As I mentioned, I will cover the various types of performance tools and product offerings in upcoming posts. But for now, if you are interested in uncovering more information about third-party performance tools take a look at this link on my web site.

Wednesday, September 17, 2014

Plan to Attend IBM Insight 2014

The IBM Insight conference is just around the corner and if you care about DB2, Big Data, Analytics, data warehousing, or really, anything at all about enterprise data, then Las Vegas is the place to be the week of October 26 thru 30, 2014.

But what is IBM Insight? Well, you may remember it as the IBM Information on Demand conference, or IOD for short. Yes, IBM has renamed the conference yet again. I'm sure a lot of you can remember when there were a bunch of different "Technical Conferences" like the IMS Tech Conference or the DB2 Tech Conference. Those all (and several others) got rolled up into IOD... which is now IBM Insight.

IBM claims to have changed the name because "It's no longer just about information; it's what you can do with the information." And that kinda makes some sense... or at least it does to me!

With that bit of confusion out of the way, why should you attend the IBM Insight 2014 conference? Well, there is something for everybody in a data-related profession. Over 1500 presentations that run the gamut from DB2 to IMS to Cognos to BI to Big Data to analytics to... well, you get the idea. And there are five fast track groups that highlight several of IBM's important initiatives covering:

Watson and Cognitive Computing
Cloud
Security
Infrastructure
Mobile and Social Engagement

And with over 13,000 attendees the opportunity to network with your peers is unmatched! Add to that the over 350 exhibitors at the Expo hall and you will be able to view, review, and examine all kinds of interesting software to help you manage your enterprise data.

Not to mention the fact that I'll be presenting there on the Top 10 DB2 Things You Need to Know (Wednesday, October 29, at 3:00 pm). Well, OK, I guess I mentioned it.

So what are you waiting for? Register today... and there is still time (expires September 19, 2014) to get the early bird discount of $300 off!

Saturday, September 13, 2014

Submit an Abstract for IDUG NA 2015 in Philadelphia

Yes, it is time to start thinking about next year's IDUG DB2 Tech Conference already, especially if you are hoping to deliver a presentation there. The conference will be in the Philadelphia area in 2015, a first for IDUG... well, actually, the conference will be held at the Radisson Hotel Valley Forge in King of Prussia, PA - but that might as well be Philadelphia. I was born and raised in Pittsburgh, and we always thought that entire side of the state might as well be New Jersey, so it is all the same to me!

The conference information can be found at this link and you can either follow the Call for Presentations link at that page, or click here to submit your abstract.

Now why should you consider speaking at IDUG? If you have in the past, I'm sure you are wondering why somebody would even ask such a question. First of all, if you are accepted as a speaker, you get a free conference pass. And everybody can appreciate the benefit of some free education. But by putting together a presentation and preparing to speak in front of your peers you will learn more than you think! Sometimes the "teacher" learns more than the "students"... if you have never done it before, give it a try. Sure, it can be scary at first, but don't let that stop you. Learning how to present and speak in public can, and will, further your career!

Think about it, the number one fear of most people is public speaking... even more than the fear of death! You know what that means? If you are at a funeral, most people would rather be in the coffin than delivering the eulogy. That's just nuts!

And by going to IDUG you'll get a chance to network with IBMers, IBM Gold Consultants, IBM Champions, DBAs, programmers, and more. Trust me... you don't want to miss out on this opportunity.