Thursday, July 03, 2014

Database Versus DBMS

What is a database? I bet most people reading this blog post think that they know the answer to that question. But many of them would be wrong. DB2 is not a database, it is a DBMS, or Database Management System. You can use DB2 to create a database, but DB2, in and of itself, is not a database. Same goes for Oracle (which is a DBMS and a company) and SQL Server (just a DBMS).
So what is a database? A database is an organized store of data wherein the data is accessible by named data elements (for example, fields, records, and files). It does not even have to be computerized to be a database. The phone book is a database (Why do they still send out phone books? Does anyone even use them any more? Now I’m way off topic, so let’s get back on track.)
A DBMS is software that enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving and storing information in a database. DBMSs also are generally responsible for data integrity, data access control, and automated rollback, restart and recovery.
In layman’s terms, you can think of a database as a filing system. You can think of the filing cabinet itself along with the file folders and labels as the DBMS. A DBMS manages databases. You implement and access database instances using the capabilities of the DBMS.
So, DB2 and Oracle and SQL Server and MySQL are database management systems. Your payroll application uses the payroll database, which may be implemented using DB2 or Oracle or…
Why is that important? If we do not use precise terms when we write, speak, and work confusion can result. And confusion leads to over budget projects, improperly developed systems, and lost productivity. So precision must be important to us.

Tuesday, July 01, 2014

Blog Recommendation: Essential SQL

Hello, regular readers... just a short post today with a blog recommendation for anybody who uses SQL or wants to learn how to use SQL.

The name of the blog is Essential SQL by Kris Wenzel.

I happened upon the blog a couple of weeks ago and it offers up some nice, educational content on SQL. It is not specific to DB2, but the material is high-level and easily convertible to a DB2 environment.

The material on the blog starts out very basic with no assumption of any prior SQL knowledge... and builds up over time adding on details. Learn as much, or as little as you'd like.

Hope you find the blog to be useful (either for yourself, or to pass along to others)...

Happy SQL coding!

Monday, June 16, 2014

Don't Forget the Humble DB2 DISPLAY Command

Although robust performance and administration tools are probably the best solution for gathering information about your DB2 subsystems and databases, you can gain significant insight into your DB2 environment simply using the DISPLAY command.  There are multiple variations of the DISPLAY command depending on the type of information you are looking for.

DISPLAY DATABASE is probably the most often-used variation of the DISPLAY command. The output of the basic command shows the status of the database objects specified, along with any exception states. For example, issuing -DISPLAY DATABASE(DBNAME) shows details on the DBNAME database, including information about its tablespaces and indexes. With one simple command you can easily find all of the tablespaces and indexes within any database — pretty powerful stuff. But you also get status information for each space, too. When a status other than RO or RW is encountered, the object is in an indeterminate state or is being processed by a DB2 utility.

There are additional options that can be used with DISPLAY DATABASE. For partitioned page sets, you can specify which partition, or range of partitions, to show. And you can choose to display only objects in restricted or advisory status using either the ADVISORY or RESTRICT key word.

You can control the amount of output generated by DISPLAY DATABASE using the LIMIT parameter. The default number of lines returned by the DISPLAY command is 50, but the LIMIT parameter can be used to set the maximum number of lines returned to any numeric value; or you can use an asterisk (*) to indicate no limit.

Moving on, the DISPLAY BUFFERPOOL command provides the current status and allocation information for each buffer pool. The output includes the number of pages assigned to each pool, whether the pages have been allocated, and the current settings for the sequential steal and deferred write thresholds. For additional information on buffer pools you can specify the DETAIL parameter to return usage information such as number of GETPAGEs, prefetch usage, and synchronous reads. You can use this data for rudimentary buffer pool tuning.

You can gather even more information about your buffer pools using the LIST and LSTATS parameters. The LIST parameter shows open table spaces and indexes within the specified buffer pools; the LSTATS parameter shows statistics for the table spaces and indexes. Statistical information is reset each time DISPLAY with LSTATS is issued, so the statistics are as of the last time LSTATS was issued. 

If you are charged with running (IBM) DB2 utilities, another useful command is DISPLAY UTILITY. Issuing this command causes DB2 to display the status of all active, stopped, or terminating utilities. So, if you are in over the weekend running REORGs, RUNSTATS, or image copies, you can issue occasional DISPLAY UTILITY commands to keep up-to-date on the status of your jobs. By monitoring the current phase of the utility and matching this information with the utility phase information, you can determine the relative progress of the utility as it processes. The COUNT specified for each phase lists the number of pages that have been loaded, unloaded, copied, or read.

You can use the DISPLAY LOG command to display information about the number of active logs, their current capacity, and the setting of the LOGLOAD parameter. For archive logs, use the DISPLAY ARCHIVE command.

DISPLAY is helpful, too, if your organization uses stored procedures or user-defined functions (UDFs). DISPLAY PROCEDURE monitors whether procedures are currently started or stopped, how many requests are currently executing, the high-water mark for requests, how many requests are queued, how many times a request has timed out, and the WLM environment in which the stored procedure executes. And you can use the DISPLAY FUNCTION SPECIFIC command to monitor UDF statistics.

DISPLAY also returns a status indicating the state of each procedure or UDF. A procedure or UDF can be in one of four potential states: STARTED, STOPQUE (requests are queued), STOPREJ (requests are rejected), or STOPABN (requests are rejected because of abnormal termination).
And there remains a wealth of additional information that the DISPLAY command can uncover. For distributed environments, DISPLAY DDF shows configuration and status information, as well as statistical details on distributed connections and threads; DISPLAY LOCATION shows distributed threads details; DISPLAY PROFILE shows whether profiling is active or inactive; DISPLAY GROUP provides details of data-sharing groups (including the version of DB2 for each member) and DISPLAY GROUPBUFFERPOOL shows information about the status of DB2 group buffer pools; DISPLAY RLIMIT provides the status of the resource limit facility; DISPLAY THREAD display active and in-doubt connections to DB2; and DISPLAY TRACE lists your active trace types and classes along with the specified destinations for each.

If you are looking for some additional, more in-depth details on the DISPLAY command, take a look at this series of blog posts I wrote last year:
  • Part 1 of the series focused on using DISPLAY to monitor details about your database objects; 
  • Part 2 focused on using DISPLAY to monitor your DB2 buffer pools;
  • Part 3 covered utility execution and log information;
  • And Part 4 examined using the DISPLAY command to monitor DB2 stored procedures and user-defined functions.

Summary

The DB2 DISPLAY command is indeed a powerful and simple tool that can be used to gather a wide variety of details about your DB2 subsystems and databases. Every DBA should know how to use DISPLAY and its many options to simplify their day-to-day duties and job tasks.

Tuesday, June 10, 2014

ORDER BY an Expression

Sometimes a program requires that the results of a query be returned in a specific sequence. We all know that the ORDER BY clause can be used to sort SQL results into a specific order. For example, to return a sorted list of employee compensation sorted by last name we could write:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS
FROM   EMP
ORDER BY LASTNAME;

But what if we need to sort it by total compensation? There are two approaches that work here: position number and column renaming. Using position number the ORDER BY clause becomes:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS
FROM   EMP
ORDER BY 3;

This will cause DB2 to sort by the third element in the SELECT-list, in this case the total compensation expression. But what if we add another column at the beginning of the SELECT-list? Or what if we need to port the SQL to a different database with different standards? Well, in that case we can use column renaming:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS AS TOTAL_COMP
FROM   EMP
ORDER BY TOTAL_COMP;

This method is preferred for a number of reasons:

  • it will continue to work even if the SQL statement is changed
  • it gives the expression a name making it more self-documenting
  • it should be more portable

Monday, June 02, 2014

Don't Neglect Your DB2 Rebind Strategy

We’re all busy. Frequently it can seem like you just got in to the office and already it is past quitting time! There is so much to do and so little time to do it all. And we all work more than 40 hours a week… these are some of the common complaints of the busy DBA.

And those are valid concerns, but it does not diminish the need to properly address DB2 database administration and performance management... with a special focus on proactive management. 

So please take a little bit of time to read about, and consider your organization's strategy for rebinding DB2 applications.

REBIND Strategy

One of the most important contributors to the on-going efficiency and health of your DB2 environment is proper management of DB2 access path changes. A thorough REBIND management process is a requirement for healthy DB2 applications.

But many shops do not do everything possible to keep access paths up-to-date with the current state of their data. Approaches vary, such as rebinding only when a new version of DB2 is installed, whenever PTFs are applied to DB2, or to rebind automatically after a regular period of time. Although these methods are workable, they are less than optimal. 

The worst approach though is the “if it ain’t broke don’t fix it” mentality. In other words, many DBA groups adopt “never REBIND unless you absolutely have to” as a firm policy. The biggest problem this creates is that it penalizes every program in your subsystem for fear of a few degraded access paths. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data and environment change. Of course, the possibility of degraded performance after a REBIND is real – and that is why many sites avoid regularly rebinding their programs.

Even so, the best approach is to perform regular REBINDs as your data changes. To do so, you should follow the Three R’s. Regularly reorganizing to ensure optimal structure; followed by RUNSTATS to ensure that the reorganized state of the data is reflected in the DB2 Catalog; and finally, rebinding all of programs that access the reorganized structures. This technique can improve application performance because access paths will be better designed based on an accurate view of your data.

Of course, adopting the Three R’s approach raises questions, such as “When should you reorganize?” To properly determine when to reorganize you’ll have to examine statistics. This means looking at either RUNSTATS in the catalog or Real Time Statistics (RTS). So, the Three R’s become the Four 4 R’s – examine the Real Time Stats, REORG database objects as indicated by RTS, RUNSTATS to get the new statistics, then REBIND any impacted application programs.

Some organizations do not rely on statistics to schedule REORGs. Instead, they build REORG  JCL as they create each object – that is, create a table space, build and schedule a REORG job, and run it monthly or quarterly. This is better than no REORG at all, but it is not ideal because you are likely to be reorganizing too soon (wasting CPU cycles) or too late (causing performance degradation until REORG).

It is better to base your REORGs off of thresholds on catalog or real-time statistics. Statistics are the fuel that makes the optimizer function properly. Without accurate statistics the optimizer cannot formulate the best access path to retrieve your data because it does not know how your data is currently structured. So when should you run RUNSTATS? One answer is “as frequently as possible based on how often your data changes.” To succeed you need an understanding of data growth patterns – and these patterns will differ for every table space and index.

The looming question is this: why are we running all of these RUNSTATS and REORGs? To improve performance, right? But only with regular REBINDs will your programs take advantage of the new statistics to build more efficient access paths, at least for static SQL applications.

Without an automated method of comparing and contrasting access paths, DB2 program change management can be time-consuming and error-prone – especially when we deal with thousands of programs. And we always have to be alert for a rogue access path – that is, when the optimizer formulates a new access path that performs worse than the previous access path.

Regular rebinding means that you must regularly review access paths and correct any “potential” problems. Indeed, the Four R’s become the Five R’s because we need to review the access paths after rebinding to make sure that there are no problems. So, we should begin with RTS (or RUNSTATS) to determine when to REORG. After reorganizing we should run RUNSTATS again, followed by a REBIND. Then we need that fifth R – which is to Review the access paths generated by the REBIND.

The review process involves finding which statements might perform worse than before. Ideally, the DBAs would review all access path changes to determine if they are better or worse. But DB2 does not provide any systematic means of doing that. There are tools that can help you achieve this though.

The bottom line is that DB2 shops should implement best practices whereby access paths are tested to compare the before and after impact of the optimizer’s choices. By adopting best practices to periodically REBIND your DB2 programs you can achieve better overall application performance because programs will be using access paths generated from statistics that more accurately represent the data. And by implementing a quality review step there should be less need to reactively tune application performance because there will be fewer access path and SQL-related performance problems.