Monday, November 24, 2014

Peace, Prosperity and Happy Thanksgiving

Just a quick blog post today to wish all of my readers, wherever they live, peace and prosperity during this holiday season.

And to my readers in the USA, Happy Thanksgiving. You know what that means, right? Relatives, football, and a lot of food, including turkey!

It also means that we are about to "officially" start the holiday shopping season, which begins on Black Friday. Of course, anyone who has been out in the mall, or to any store really, knows that the holiday season started right after Halloween. But for most people it begins this week!

So, Happy Thanksgiving to the US folks... and peace, happiness and warm thoughts to all of you everywhere this week.

Friday, November 14, 2014

Database Basics for Beginners

Every now and then I get e-mail from readers of my blogs asking introductory/beginner questions about databases and DBMS. I cannot take the time to answer all of these e-mails in-depth, so I thought I blog a quick post with some good resources for folks.

I think a good place to start is an article I wrote several years ago now titled What is a Database? This article breaks down the benefits of a database, outlines the difference between a database and a DBMS, and provides some guidance for further reading (suggested books).

Other questions I am get asked frequently involve database administration. One is: What Does a DBA Do? Follow the link to find my answer to that question. Another is: how can I become a DBA? I wrote an article titled How to Become a DBA to answer that one. And finally, another frequent topic is: How many DBAs do I need? That is a tricky one, but I propose a framework to help answer that question in an article titled DBA Staffing Considerations.

I also get a lot of DB2 for z/OS questions. And I've written a book on that topic, plus a bunch of DB2 articles, too (all of which can be found here).

So I guess what I am saying here is to take a look at what is already "out there" to see if your questions can be answered on the web. But, please, keep the questions coming. If I do not answer your e-mail do not be discouraged. I do read most of them (unless it gets caught in my spam collector). Even if I do not have the time to respond, I keep track of what is asked and use it as input into my writing process... so you may see an answer pop up online in a blog, article, or column I write... eventually.

Tuesday, November 11, 2014

On Building Appropriate DB2 Indexes

Perhaps the single most important thing that can be done to assure optimal DB2 application performance is creating correct indexes for your tables based on the queries used by your applications. Of course, this is easier said than done. But we can start with some basics. For example, consider the following SQL statement:

  SELECT  LASTNAME, SALARY
  FROM    EMP
  WHERE   EMPNO = '000010'
  AND     DEPTNO =  'D01';

What index or indexes would make sense for this simple query? The short answer is “it depends.” But the more important answer is to understand what it depends upon! First, think about all of the possible indexes that could be created. Your first short list probably looks something like this:

  • Index1 on EMPNO
  • Index2 on DEPTNO
  • Index3 on EMPNO and DEPTNO
  • Index4 on DEPTNO and EMPNO

This is a good start and one of either Index3 or Index4 is probably the best. Either allows DB2 to use the index to immediately lookup the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table you might want to examine the impact of creating yet another index on the table. Factors to consider include:
  •  Modification impact: DB2 will automatically maintain every index that you create. This means that every INSERT and every DELETE to this table will cause data to be inserted and deleted not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, the index will also be updated. So, indexes speed the process of retrieval but slow down modification.
  • Importance of this particular query: The more important the query the more you may want to tune by index creation. For example, if you are coding a query that will be run every day by the CIO, you will want to make sure that it performs optimally. Who wants to risk a call from the CIO complaining about performance? So building indexes for that particular query is very important. On the other hand, a query for a low-level clerk may not necessarily be weighted as high, so that query may have to make due with the indexes that already exist. Of course, the decision depend on the importance of the application to the business – not just on the importance of the user of the application.
  • Columns in the existing indexes: If an index already exists on EMPNO or DEPTNO it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always because the order of the columns in the index can make a big difference depending on the query. For example, consider the following query:

        SELECT   LASTNAME, SALARY
        FROM     EMP
        WHERE    EMPNO = '000010'
        AND      DEPTNO >  'D01';

In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than. Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO) DB2 potentially can use them both to satisfy this query so creating another index may not be necessary.

One final thought for today, and that is to build your indexes based on workload, not object by object. Many people make the mistake of just guessing as some indexes as they create tables for new projects. And sometimes this cannot be avoided because the SQL typically is not known before the database is created. But some of the guesses -- or maybe many of them -- are likely to be suboptimal at best, wrong at worst.

Indexes should be built to optimize access to data via your SQL queries. (Of course, there are indexes required to support RI and uniqueness, but let's leave them out of the discussion for the moment.) To properly create a set of indexes requires a list of the SQL to be used, an estimate of the amount of data in the table (and an estimate of column values if possible), an estimate of the frequency that each SQL statement will be executed, and the relative importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries (most of the time) be attempted. If you are doing it any other way, you are doing it wrong.

Of course, there is much more to index design than we have covered so far. For example, you might consider index overloading to achieve index only access. If all of the data that a SQL query asks for is contained in the index, DB2 may be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY given information about EMPNO and DEPTNO.  And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well then we never need to access the EMP table because all of the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.

Keep in mind, though, that it is not prudent (or even possible) to make every query an index only access. This technique should be saved for particularly troublesome or important SQL statements. 

Monday, November 03, 2014

Removing Superfluous Spaces

We all can relate to dealing with systems that have data integrity problems. But some data integrity problems can be cleaned up using a touch of SQL. Consider the common data entry problem of extraneous spaces (or blanks) inserted into a name field. Not only is it annoying, sometimes it can cause the system to ignore relationships between data elements because the names do not match. For example, “Craig  Mullins” is not equivalent to “Craig Mullins”; the first one has two spaces between the first and last name whereas the second one only has one.

You can write an SQL UPDATE statement to clean up the problem, if you know how to use the REPLACE function. REPLACE does what it sounds like it would do: it reviews a source string and replaces all occurrences of a one string with another. For example, to replace all occurrences of Z with A in the string BZNZNZ you would code:

  REPLACE(‘BZNZNZ’,’Z’,’A’)

And the result would be BANANA. So, let’s create a SQL statement using REPLACE to get rid of any unwanted spaces in the NAME column of our EMPLOYEE table:

  UPDATE EMPLOYEE
     SET NAME = REPLACE(
                 REPLACE(
                  REPLACE(NAME, SPACE(1), '<>')
                  '><', SPACE(0))
                 '<>', SPACE(1));

"Wait-a-minute," you might be saying. "What are all of those left and right carats and why do I need them?"

OK, fair enough. Let’s explain how this works starting from the inside out. The inside REPLACE statement takes the NAME column and converts every occurrence of a single space into a left/right carat. The next REPLACE (working outward), takes the string we just created, and removes every occurrence of a right/left carat combination by replacing it with a zero length string. The final REPLACE function takes that string and replaces any left/right carats with a single space. The reversal of the carats is the key to removing all spaces except one – remember, we want to retain a single space anywhere there was a single space as well as anywhere that had multiple spaces. Try it, it works.

Of course, you can use any two characters you like, but the left and right carat characters work well visually. Be sure that you do not choose to use characters that occur naturally in the string that you are acting upon.


Finally, the SPACE function was used for clarity. You could have used strings encased in single quotes, but the SPACE function is easier to read. It simply returns a string of spaces the length of which is specified as the integer argument. So, SPACE(12) would return a string of twelve spaces.

Thursday, October 23, 2014

DB2 SQL and Application Performance Tools

So far in this series of blog posts on DB2 performance tools, we have looked at system and database performance solutions. But perhaps the most important solution area involves monitoring and tuning application SQL statements.

Why do I say that? Well, the cause of most performance problems is usually due to bad SQL and application code. Not every problem, of course, but maybe as much as 70 to 80 percent of DB2 (and relational) performance issues is likely due to inefficient application code.

Writing SQL statements to access database tables is the responsibility of an application development team. However, the DBA usually gets involved when it comes to the performance of SQL. With SQL’s flexibility, the same request can be made in different ways. Because many of these methods are inefficient, application performance can fluctuate wildly unless the SQL is analyzed and tuned by an expert prior to implementation.

The EXPLAIN command provide information about the access paths used by SQL queries by parsing SQL in application programs and placing encoded output into a PLAN_TABLE or by producing a standard access path report. To gauge efficiency, a DBA must decode this data and determine if a more efficient access path is available.

SQL code reviews are required to ensure that optimal SQL design techniques are used. An application design walkthrough should be performed for each program before it moves to production. This is done to review all SQL statements, the selected access paths, and the program code in which the SQL is embedded. The review also includes an evaluation of database statistical information to ascertain whether production-level statistics were used at the time of the EXPLAIN.

A line-by-line review of application source code and EXPLAIN output is tedious and prone to error, and can cause application backlogs. SQL analysis tools greatly simplify this process by automating major portions of the code review process. The SQL analysis tool typically
  • Analyzes the SQL in an application program, describing the access paths chosen in a graphic format, an English description, or both.
  • Issues warnings when specific SQL constructs are encountered. For example, each time a sort is requested (by ORDER BY, GROUP BY, or DISTINCT), a message informs the user of the requisite sort.
  • Suggests alternative SQL solutions based on an “expert system” that reads SQL statements and their corresponding PLAN_TABLE entries and poses alternative SQL options.
  • Extends the rules used by the “expert system” to capture site-specific rules.
  • Analyzes at the subsystem, instance, server, application, plan, package, or SQL statement level.
  • Stores multiple versions of EXPLAIN output, creates performance comparisons, and plans history reports.

Tools that analyze the performance of the application code in which the SQL is embedded are available too. These tools usually capture in-depth information about programs as they are run and provide reports that specify which areas of the code consume the most resources. Unfortunately, most of these tools do not necessarily interface to SQL analysis tools. Why  might this be a problem?

Well, consider an application program that contains a singleton SELECT inside a loop. The singleton SELECT requests a single row based on a WHERE clause, checking for the primary key of that table. For each iteration of the loop, the program changes the primary key value being searched such that the entire table is read from the lowest key value to the highest key value.

SQL analysis tools will probably not target the SQL statement as inefficient because the predicate value is for the primary key, which should invoke indexed access. The application program analysis tool may flag the section of the code that accesses the data as inefficient, but it will not help you to fix it or tell you why it is inefficient.

A knowledgeable performance analyst or DBA would have to use both tools and interpret the output of each to arrive at a satisfactory conclusion. For example, it could be more efficient to code a cursor, without a predicate, to retrieve every row of the table, and then fetch each row one by one. This method would eliminate index I/O, might use parallel access, and therefore should reduce I/O and elapsed time—thereby enhancing performance.

Only a trained analyst can catch this type of design problem during a code walkthrough. Although a plan analysis tool significantly reduces the effort involved in the code review process, it cannot eliminate it. 

So what should you look for in an SQL analysis tool? The first feature required of SQL analysis tools is the ability to read and interpret standard EXPLAIN or SHOW PLAN output. The tool should be able to read the plan table or interface directly with the DBMS command to obtain the output. It then must be able to automatically scan the EXPLAIN or SHOW PLAN data and report on the selected access paths and the predicted performance. Advanced tools will provide recommendations for improving the SQL by adding indexes or modifying the SQL.

Yet another category of tool can evaluate access paths as you REBIND programs and categorize them into changed and unchanged access paths. This helps to identify where SQL tuning may be required. Advanced forms of these tools also apply rules tot he changed SQL to indicate if the access path is better or worse than the prior access path. Such tools can be incredibly helpful for performing mass rebinds of your production programs.

SQL Monitors

An SQL monitoring solution can identify running SQL statements, filter the information, and display it in an appropriate order and configuration. For example, you can use an SQL monitor to identify the Top Ten CPU users over the past hour (or the past day, week, etc.) 

Usually, there is the on-line capability, that displays what is happening right now, and the historical capability, which can display details and trends over time.

An SQL monitor is particularly helpful when working to remediate production performance issues where hundreds or thousands (or more) of SQL statements can be running at any one time.

End-to-End Performance Tools

Modern applications require multiple system components and run across multiple networked devices to deliver functionality. When performance problems arise, it can be difficult to determine what, exactly, is causing the problem. Is it on the client or the server? Is it a networking problem? Is it a database issue or a code problem?

End-to-end performance monitoring tools exist that track an application request from initiation to completion. These solutions provide enhanced visibility specifically into application performance—giving organizations the power to understand both when and why performance has degraded, and the information needed to improve matters in a business-prioritized fashion.

By following the workload as it progresses across multiple pieces of hardware and software, problem determination becomes possible.

Workload Testing and Estimation

Another category of SQL performance tool allows you to identify a workload consisting of programs and transactions that are to be run during a specific timeframe. The tools help to identify performance issues that crop up only when the application is running at a production volume. 

Data Studio

Finally, no overview of application performance tools for DB2 would be complete without a brief mention of IBM's Data Studio. Data Studio is a free-of-charge tool for basic DB2 administration and development tasks. Data Studio offers an easy to use GUI interface for the following:
  • Designing data access queries and routines
  • Building, running, and tuning SQL
  • Building, testing, and deploying stored procedures (using SQL or Java)
  • Creating Web services in for Service Oriented Architecture (SOA) solutions
  • Developing DB2 SQLJ applications
  • Managing database objects and authorizations

You can download Data Studio at IBM’s website. It is available as a stand-alone package geared mostly for DBAs, or as an IDE geared for both DBA and development work.

Of course, IBM sells other DB2 tools for a fee, some of whichcan integrate and work well with Data Studio. And there are other tools that compete with Data Studio that offer a lot more functionality than the basics provided by the free capabilities of Data Studio (such as Dell's Toad).

Summary

These past few posts have taken a broad overview look at the categories and types of performance tools available for managing the performance of your DB2 for z/OS environment. Many of the same categories of tools are available for DB2 for LUW (as well as other DBMS offerings).

Have I missed any important categories? If so, drop me a line or add a comment here to the blog. I'm always interested in getting feedback.

Thanks... and happy performance tuning!