Tuesday, July 08, 2014

DB2 Application Performance Management

Assuring optimal performance for database applications can be a tricky thing. In today's blog I ruminate on the high-level issues involved in optimizing your DB2 for z/OS applications.

Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked what is the single most important or stressful aspect of their job, DBAs typically respond "assuring optimal performance."  Indeed, a Forrester Research survey indicates that performance and troubleshooting tops the list of most challenging DBA tasks.

But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:
·  the system (that is, the DBMS itself, the network, and the O/S),
·  the database (that is, the DDL and database schema), and
·  the application (that is, the SQL and program logic).

Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application. Furthermore, I will discuss this in terms of DB2 for z/OS.

So where do we begin? For DB2, a logical starting point is with BIND parameters. There are many parameters and values that must be chosen and specified when you bind a DB2 application program. The vast array of options at our disposal can render the whole process extremely confusing -- especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. You know what I’m talking about: parameters like ACQUIRE, RELEASE, VALIDATE, and DEGREE.

I will not delve into the myriad bind options and give you advice on which to use when. There are many articles and books, as well as the IBM DB2 manuals, that you can use to guide you along that path. Suffice it to say that there are some standard parameters and values that should be chosen most of the time in most situations. As such, a wise DBA group will set up canned routines for the programmers to use when compiling and binding their applications. Choices such as CICS transaction, DB2 batch, and BI/analytical query can be presented to the developer, and the canned script can then choose the proper bind options based on the type of program and environment. Doing so can greatly diminish the problems that can be encountered when the "wrong" parameters or values are chosen at Bind time.

Before concluding this short section on Bind parameters I want to give one important piece of advice: In production, always Bind your plans and packages specifying EXPLAIN YES. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what DB2 is doing and it makes application performance tuning much more difficult.
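
For example, a bind for a CICS transaction program might look something like the following DSN subcommand. The collection, member, and qualifier names here are hypothetical placeholders, and your shop standards will dictate the other parameters; the point is the EXPLAIN(YES) at the end:

BIND PACKAGE(CICSCOLL) MEMBER(PROGA) QUALIFIER(PRODQUAL) -
     ACTION(REPLACE) VALIDATE(BIND) ISOLATION(CS)        -
     RELEASE(COMMIT) EXPLAIN(YES)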

Access Path Management

Bind and Rebind are important components to achieve optimal DB2 application performance. This is so because these commands are what determine the access paths to the data requested by your program. So it is vitally important that you create a strategy for when and how to Rebind your programs. There are several common approaches. The best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment.

Other approaches include Rebinding only when a new version of DB2 is installed, or perhaps more ambitiously, whenever new PTFs are applied to DB2. Another approach is to Rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data, but it still can pose significant administrative issues.

The final approach -- the worst of the bunch -- comes from the "if it ain’t broke don’t fix it" school of thought. Basically, it boils down to (almost) never rebinding your programs. This approach penalizes every program for fear that a single program (or two) might experience a degraded access path. Oh, the possibility of degraded performance is real and that is why this approach has been adopted by some. And it can be difficult to find which statements may have degraded after a Rebind. The ideal situation would allow us to review the access path changes beforehand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.

Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions.
By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to DB2 (new releases/PTFs) optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.
To adopt the Three R’s you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we really need four R’s (a sample set of utility statements and commands follows the list):

  1. RUNSTATS or preferably, RTS
  2. REORG
  3. RUNSTATS
  4. REBIND
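
By way of illustration, one cycle of the R’s for a single table space might look something like this. The database, table space, collection, and program names are hypothetical placeholders, and the actual utility options (SHRLEVEL and so on) should follow your shop standards:

Utility control statements:

  REORG TABLESPACE DBNAME1.TSNAME1 SHRLEVEL CHANGE
  RUNSTATS TABLESPACE DBNAME1.TSNAME1 TABLE(ALL) INDEX(ALL)

DSN subcommand:

  REBIND PACKAGE(COLLID1.PROGA) EXPLAIN(YES)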

But is this enough? Probably not, because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another R: Review the access paths generated by the REBIND. As we mentioned, the optimizer can make mistakes. And, of course, so can you. Users don't call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.

So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal DB2 application performance.

Tuning the Code

Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with -- and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:
  • Let SQL do the work instead of the program. For example, code an SQL join instead of two cursors and program logic to join the data (see the example following this list).
  • Simpler is generally better, but complex SQL can be very efficient.
  • Retrieve only the columns required, never more.
  • Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
  • When joining tables, always provide join predicates. In other words, avoid Cartesian products.
  • Favor using Stage 1 and Indexable predicates.
  • But favor Stage 2 predicates over application logic.
  • Avoid sorting (if possible) by creating indexes for ORDER BY and GROUP BY operations.
  • Avoid black boxes -- that is, avoid I/O routines that are called by programs instead of using embedded SQL.
  • Minimize deadlocks by updating tables in the same sequence in every program.
  • Issue data modification statements (INSERT, UPDATE, DELETE) as close to the COMMIT statement as possible.
  • Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.
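
To illustrate the first guideline, assume hypothetical EMP and DEPT tables where EMP.WORKDEPT matches DEPT.DEPTNO. Instead of opening one cursor on DEPT and a second cursor on EMP and matching the rows in program logic, a single join lets DB2 do the work:

SELECT D.DEPTNO,
       D.DEPTNAME,
       E.LASTNAME,
       E.FIRSTNAME
FROM   DEPT D
       INNER JOIN EMP E
       ON E.WORKDEPT = D.DEPTNO
WHERE  D.DEPTNO = 'A00';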

Even if you follow the guidelines in this bulleted list, there will still be numerous opportunities for you to tune SQL for performance. To tune SQL you must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the plan tables. IBM offers Data Studio (as a free download) with a visual explain capability that can simplify this process. But you will also have to accumulate experience as to which SQL formulations work more efficiently than others. This skill will come with time and on-the-job learning.
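
For example, after binding with EXPLAIN(YES) you might review the new access paths with a query along these lines. The PLAN_TABLE qualifier and the collection name are placeholders, and PLAN_TABLE contains many more columns than the handful shown here:

SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD,
       TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME,
       INDEXONLY, PREFETCH
FROM   MYQUAL.PLAN_TABLE
WHERE  COLLID = 'CICSCOLL'
ORDER BY QUERYNO, QBLOCKNO, PLANNO;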

Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, Visual Basic or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.

Bottom Line

Although DBAs must understand all three aspects of database performance management, concentrating on the application aspects of performance will most likely provide the most bang for the buck. Of course, we have only touched the tip of the DB2 application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your DB2 applications.


Good luck with DB2 for z/OS and happy performance tuning! 

Thursday, July 03, 2014

Database Versus DBMS

What is a database? I bet most people reading this blog post think that they know the answer to that question. But many of them would be wrong. DB2 is not a database, it is a DBMS, or Database Management System. You can use DB2 to create a database, but DB2, in and of itself, is not a database. Same goes for Oracle (which is a DBMS and a company) and SQL Server (just a DBMS).
So what is a database? A database is an organized store of data wherein the data is accessible by named data elements (for example, fields, records, and files). It does not even have to be computerized to be a database. The phone book is a database (Why do they still send out phone books? Does anyone even use them any more? Now I’m way off topic, so let’s get back on track.)
A DBMS is software that enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving and storing information in a database. DBMSs also are generally responsible for data integrity, data access control, and automated rollback, restart and recovery.
In layman’s terms, you can think of a database as a filing system. You can think of the filing cabinet itself along with the file folders and labels as the DBMS. A DBMS manages databases. You implement and access database instances using the capabilities of the DBMS.
So, DB2 and Oracle and SQL Server and MySQL are database management systems. Your payroll application uses the payroll database, which may be implemented using DB2 or Oracle or…
Why is that important? If we do not use precise terms when we write, speak, and work, confusion can result. And confusion leads to over-budget projects, improperly developed systems, and lost productivity. So precision must be important to us.

Tuesday, July 01, 2014

Blog Recommendation: Essential SQL

Hello, regular readers... just a short post today with a blog recommendation for anybody who uses SQL or wants to learn how to use SQL.

The name of the blog is Essential SQL by Kris Wenzel.

I happened upon the blog a couple of weeks ago and it offers up some nice, educational content on SQL. It is not specific to DB2, but the material is high-level and easily convertible to a DB2 environment.

The material on the blog starts out very basic with no assumption of any prior SQL knowledge... and builds up over time adding on details. Learn as much, or as little as you'd like.

Hope you find the blog to be useful (either for yourself, or to pass along to others)...

Happy SQL coding!

Monday, June 16, 2014

Don't Forget the Humble DB2 DISPLAY Command

Although robust performance and administration tools are probably the best solution for gathering information about your DB2 subsystems and databases, you can gain significant insight into your DB2 environment simply by using the DISPLAY command. There are multiple variations of the DISPLAY command depending on the type of information you are looking for.

DISPLAY DATABASE is probably the most often-used variation of the DISPLAY command. The output of the basic command shows the status of the database objects specified, along with any exception states. For example, issuing -DISPLAY DATABASE(DBNAME) shows details on the DBNAME database, including information about its tablespaces and indexes. With one simple command you can easily find all of the tablespaces and indexes within any database — pretty powerful stuff. And you get status information for each space, too. When a status other than RO or RW is encountered, the object is in an indeterminate state or is being processed by a DB2 utility.

There are additional options that can be used with DISPLAY DATABASE. For partitioned page sets, you can specify which partition, or range of partitions, to show. And you can choose to display only objects in restricted or advisory status using either the ADVISORY or RESTRICT key word.

You can control the amount of output generated by DISPLAY DATABASE using the LIMIT parameter. The default number of lines returned by the DISPLAY command is 50, but the LIMIT parameter can be used to set the maximum number of lines returned to any numeric value; or you can use an asterisk (*) to indicate no limit.
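
For example, the following variations (using placeholder object names) show all spaces in a database, only the objects in a restricted state, and a range of partitions for one table space; double-check the partition-range syntax against the command reference for your DB2 version:

-DISPLAY DATABASE(DBNAME1) SPACENAM(*) LIMIT(*)
-DISPLAY DATABASE(DBNAME1) SPACENAM(*) RESTRICT
-DISPLAY DATABASE(DBNAME1) SPACENAM(TSNAME1) PART(3:5)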

Moving on, the DISPLAY BUFFERPOOL command provides the current status and allocation information for each buffer pool. The output includes the number of pages assigned to each pool, whether the pages have been allocated, and the current settings for the sequential steal and deferred write thresholds. For additional information on buffer pools you can specify the DETAIL parameter to return usage information such as number of GETPAGEs, prefetch usage, and synchronous reads. You can use this data for rudimentary buffer pool tuning.

You can gather even more information about your buffer pools using the LIST and LSTATS parameters. The LIST parameter shows open table spaces and indexes within the specified buffer pools; the LSTATS parameter shows statistics for the table spaces and indexes. Statistical information is reset each time DISPLAY with LSTATS is issued, so the statistics are as of the last time LSTATS was issued. 
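
A couple of illustrative examples (BP1 and BP2 are just sample buffer pool names):

-DISPLAY BUFFERPOOL(BP1) DETAIL
-DISPLAY BUFFERPOOL(BP2) LIST(*) LSTATS(*)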

If you are charged with running (IBM) DB2 utilities, another useful command is DISPLAY UTILITY. Issuing this command causes DB2 to display the status of all active, stopped, or terminating utilities. So, if you are in over the weekend running REORGs, RUNSTATS, or image copies, you can issue occasional DISPLAY UTILITY commands to keep up-to-date on the status of your jobs. By monitoring the current phase of the utility and matching it against the list of phases for that utility, you can determine its relative progress as it processes. The COUNT specified for each phase lists the number of pages that have been loaded, unloaded, copied, or read.

You can use the DISPLAY LOG command to display information about the number of active logs, their current capacity, and the setting of the LOGLOAD parameter. For archive logs, use the DISPLAY ARCHIVE command.
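
For instance:

-DISPLAY UTILITY(*)
-DISPLAY LOG
-DISPLAY ARCHIVE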

DISPLAY is helpful, too, if your organization uses stored procedures or user-defined functions (UDFs). DISPLAY PROCEDURE monitors whether procedures are currently started or stopped, how many requests are currently executing, the high-water mark for requests, how many requests are queued, how many times a request has timed out, and the WLM environment in which the stored procedure executes. And you can use the DISPLAY FUNCTION SPECIFIC command to monitor UDF statistics.

DISPLAY also returns a status indicating the state of each procedure or UDF. A procedure or UDF can be in one of four potential states: STARTED, STOPQUE (requests are queued), STOPREJ (requests are rejected), or STOPABN (requests are rejected because of abnormal termination).
And there remains a wealth of additional information that the DISPLAY command can uncover. For distributed environments, DISPLAY DDF shows configuration and status information, as well as statistical details on distributed connections and threads; DISPLAY LOCATION shows distributed thread details; DISPLAY PROFILE shows whether profiling is active or inactive; DISPLAY GROUP provides details of data-sharing groups (including the version of DB2 for each member) and DISPLAY GROUPBUFFERPOOL shows information about the status of DB2 group buffer pools; DISPLAY RLIMIT provides the status of the resource limit facility; DISPLAY THREAD displays active and in-doubt connections to DB2; and DISPLAY TRACE lists your active trace types and classes along with the specified destinations for each.
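
A few more illustrative invocations (the schema name is a placeholder):

-DISPLAY PROCEDURE(SCHEMA1.*)
-DISPLAY FUNCTION SPECIFIC(SCHEMA1.*)
-DISPLAY THREAD(*) TYPE(ACTIVE)
-DISPLAY DDF DETAIL
-DISPLAY TRACE(*)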

If you are looking for some additional, more in-depth details on the DISPLAY command, take a look at this series of blog posts I wrote last year:
  • Part 1 of the series focused on using DISPLAY to monitor details about your database objects; 
  • Part 2 focused on using DISPLAY to monitor your DB2 buffer pools;
  • Part 3 covered utility execution and log information;
  • And Part 4 examined using the DISPLAY command to monitor DB2 stored procedures and user-defined functions.

Summary

The DB2 DISPLAY command is indeed a powerful and simple tool that can be used to gather a wide variety of details about your DB2 subsystems and databases. Every DBA should know how to use DISPLAY and its many options to simplify their day-to-day duties and job tasks.

Tuesday, June 10, 2014

ORDER BY an Expression

Sometimes a program requires that the results of a query be returned in a specific sequence. We all know that the ORDER BY clause can be used to sort SQL results into a specific order. For example, to return a sorted list of employee compensation sorted by last name we could write:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS
FROM   EMP
ORDER BY LASTNAME;

But what if we need to sort it by total compensation? There are two approaches that work here: position number and column renaming. Using position number the ORDER BY clause becomes:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS
FROM   EMP
ORDER BY 3;

This will cause DB2 to sort by the third element in the SELECT-list, in this case the total compensation expression. But what if we add another column at the beginning of the SELECT-list? Or what if we need to port the SQL to a different database with different standards? Well, in that case we can use column renaming:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS AS TOTAL_COMP
FROM   EMP
ORDER BY TOTAL_COMP;

This method is preferred for a number of reasons:

  • it will continue to work even if the SQL statement is changed
  • it gives the expression a name making it more self-documenting
  • it should be more portable

Monday, June 02, 2014

Don't Neglect Your DB2 Rebind Strategy

We’re all busy. Frequently it can seem like you just got into the office and already it is past quitting time! There is so much to do and so little time to do it all. And we all work more than 40 hours a week… these are some of the common complaints of the busy DBA.

And those are valid concerns, but it does not diminish the need to properly address DB2 database administration and performance management... with a special focus on proactive management. 

So please take a little bit of time to read about, and consider your organization's strategy for rebinding DB2 applications.

REBIND Strategy

One of the most important contributors to the on-going efficiency and health of your DB2 environment is proper management of DB2 access path changes. A thorough REBIND management process is a requirement for healthy DB2 applications.

But many shops do not do everything possible to keep access paths up-to-date with the current state of their data. Approaches vary, such as rebinding only when a new version of DB2 is installed, whenever PTFs are applied to DB2, or to rebind automatically after a regular period of time. Although these methods are workable, they are less than optimal. 

The worst approach though is the “if it ain’t broke don’t fix it” mentality. In other words, many DBA groups adopt “never REBIND unless you absolutely have to” as a firm policy. The biggest problem this creates is that it penalizes every program in your subsystem for fear of a few degraded access paths. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data and environment change. Of course, the possibility of degraded performance after a REBIND is real – and that is why many sites avoid regularly rebinding their programs.

Even so, the best approach is to perform regular REBINDs as your data changes. To do so, you should follow the Three R’s: regularly reorganizing to ensure optimal structure; followed by RUNSTATS to ensure that the reorganized state of the data is reflected in the DB2 Catalog; and finally, rebinding all of the programs that access the reorganized structures. This technique can improve application performance because access paths will be better designed based on an accurate view of your data.

Of course, adopting the Three R’s approach raises questions, such as “When should you reorganize?” To properly determine when to reorganize you’ll have to examine statistics. This means looking at either RUNSTATS in the catalog or Real Time Statistics (RTS). So, the Three R’s become the Four R’s – examine the Real Time Stats, REORG database objects as indicated by RTS, RUNSTATS to get the new statistics, then REBIND any impacted application programs.
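
For example, a simple query against the real-time statistics in the catalog can help surface REORG candidates. The database name is a placeholder, and the columns and thresholds you use should reflect your own shop's criteria:

SELECT DBNAME, NAME, PARTITION, TOTALROWS,
       REORGINSERTS, REORGDELETES, REORGUNCLUSTINS
FROM   SYSIBM.SYSTABLESPACESTATS
WHERE  DBNAME = 'DBNAME1'
ORDER BY REORGUNCLUSTINS DESC;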

Some organizations do not rely on statistics to schedule REORGs. Instead, they build REORG  JCL as they create each object – that is, create a table space, build and schedule a REORG job, and run it monthly or quarterly. This is better than no REORG at all, but it is not ideal because you are likely to be reorganizing too soon (wasting CPU cycles) or too late (causing performance degradation until REORG).

It is better to base your REORGs off of thresholds on catalog or real-time statistics. Statistics are the fuel that makes the optimizer function properly. Without accurate statistics the optimizer cannot formulate the best access path to retrieve your data because it does not know how your data is currently structured. So when should you run RUNSTATS? One answer is “as frequently as possible based on how often your data changes.” To succeed you need an understanding of data growth patterns – and these patterns will differ for every table space and index.

The looming question is this: why are we running all of these RUNSTATS and REORGs? To improve performance, right? But only with regular REBINDs will your programs take advantage of the new statistics to build more efficient access paths, at least for static SQL applications.

Without an automated method of comparing and contrasting access paths, DB2 program change management can be time-consuming and error-prone – especially when we deal with thousands of programs. And we always have to be alert for a rogue access path – that is, when the optimizer formulates a new access path that performs worse than the previous access path.

Regular rebinding means that you must regularly review access paths and correct any “potential” problems. Indeed, the Four R’s become the Five R’s because we need to review the access paths after rebinding to make sure that there are no problems. So, we should begin with RTS (or RUNSTATS) to determine when to REORG. After reorganizing we should run RUNSTATS again, followed by a REBIND. Then we need that fifth R – which is to Review the access paths generated by the REBIND.
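
As a sketch of what that fifth R can lean on (package and collection names are placeholders), recent versions of DB2 let you keep the prior access paths around as a safety net via plan management; verify that PLANMGMT and SWITCH are supported in your release before relying on them:

REBIND PACKAGE(COLLID1.PROGA) EXPLAIN(YES) PLANMGMT(EXTENDED)

And if the review turns up a degraded access path, you can fall back:

REBIND PACKAGE(COLLID1.PROGA) SWITCH(PREVIOUS)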

The review process involves finding which statements might perform worse than before. Ideally, the DBAs would review all access path changes to determine if they are better or worse. But DB2 does not provide any systematic means of doing that. There are tools that can help you achieve this though.

The bottom line is that DB2 shops should implement best practices whereby access paths are tested to compare the before and after impact of the optimizer’s choices. By adopting best practices to periodically REBIND your DB2 programs you can achieve better overall application performance because programs will be using access paths generated from statistics that more accurately represent the data. And by implementing a quality review step there should be less need to reactively tune application performance because there will be fewer access path and SQL-related performance problems.

Wednesday, May 21, 2014

IBM DB2 11 Tools and Utilities: Delivering timely value to your business

Migrating to any new software release can be a lot easier when you are familiar with new features prior to deployment. So it stands to reason that you should familiarize yourself with new DB2 functionality before you try to migrate your environment to a new version. This training can take the form of:
  • Reading the DB2 manuals, especially the What’s New manual and the Technical Overview redbook that typically comes out with each new version of DB2
  • Attending presentations on the new release, whether online, at user groups, or even at IDUG or IBM Insight (which used to be the IOD conference)
  • Formal training from IBM or other sources

But the bottom line is that you need to educate yourself in advance of migrating to any new version of DB2... Otherwise, you may not be ready to move to the new version on a schedule that fits your business needs.

With IBM DB2 11 for z/OS, you can have a smoother migration process that also enables you to deploy key applications faster. New features and capabilities, both within DB2 11 and in the tools and utilities that support DB2, can make migration easier.

And, of course, DB2 11 for z/OS comes with out-of-the-box cost savings and features that allow you to do more with business-critical analytics and applications. But are your DB2 Tools and DB2 Utilities ready to provide you complete exploitation or support? Do you know the difference?

Join me on June 10, 2014 as I deliver a webcast on DB2 11 for z/OS Tools and Utilities on behalf of IBM. During this informative webinar I will take you through some of the key features in DB2 11 and the importance of timely support for these features by your DB2 tools and utilities. I’ll expose some of the new capabilities of IBM’s tools and utilities for DB2 and I’ll also share ways to make your DB2 11 migration simpler, safer and faster.


And I’ll see you in June!

Monday, May 19, 2014

Another Great IDUG DB2 Tech Conference Concludes

Last week, the annual North American IDUG DB2 Tech Conference was held in Phoenix, AZ... and, as usual, it was a great week with a LOT of educational and networking opportunities.

The week started off with a great selection of day-long seminars conducted by industry luminaries, highlighted by the likes of Bonnie Baker, Susan Lawson, Dan Luksetich, Judy Nall, Roger Sanders and more. There was something for everyone whether you were a newbie or an old hand... using DB2 on LUW or z/OS... wanting to cram for certification or just learn something new.

On Tuesday the day started off with a bang as Dr. Vladimir Bacvanski, Founder of SciSpike, regaled the attendees on the nuances of Big Data and NoSQL. And he didn't use PowerPoint slides! He drew all of his stuff live, free-hand, much to the enjoyment of everybody. Especially enjoyable was his big pile of data (see below)...



After Vladimir's inspiring talk, Leon Katsnelson of IBM came up to talk about the Big Data industry trends driving innovation at IBM. My next session on the first day at IDUG was another highlight of the week -- the DB2 for z/OS spotlight session (Trends and Directions). This session talked about what has already been delivered and what might be coming next (code name: Cypress).

I don't want to get into walking through every presentation I attended during the week, but suffice it to say that there was a lot of informative material shared with IDUG attendees. If you work with DB2 for a living, you really should try to get to this conference every year!

And I delivered two presentations at this year's event. The first was titled A Big Data Roadmap for the DB2 Professional, in which I explained and defined Big Data and NoSQL terminology and use cases... and offered up my opinions on Big Data and how it can/will impact the IT industry and the place of relational DBMS within it.

My second presentation was a VSP for Datavail, a DBA managed services company. In this presentation I offered up a definition of Database Administration as a management discipline and provided some best practices for DB2 DBAs. Datavail will be sponsoring me later this year in a webinar on the same topic, so be sure to keep an eye on my blog for an announcement of when that will be coming!

Other than the traditional hour long presentations, IDUG offers a fantastic opportunity for networking, especially at the vendor exhibition hall. Here I am catching up with Dave Beulke, Judy Nall, and Gerry Hodge:


And I also was able to catch up with friends I rarely see, like Peter Ong (see below), except for at conferences like IDUG:




Another highlight of the conference was the second keynote, delivered on Thursday by Donald Feinberg of Gartner. I was particularly happy to hear his presentation on data trends, especially his take on Big Data, which was remarkably similar to the one I had espoused earlier in the day!

The week also was highlighted by lots of business meetings, lots of casual conversations with friends and colleagues, great food, and I even got the chance to sit down with Dan Luksetich and record a podcast on DB2 auditing with him... I'll be sure to blog about the podcast when it goes live on Dan's site.

So, all in all, IDUG week was a fantastic success... but that is no surprise to those of us who make IDUG a regular, annual event. Hope to see you there next year (in Philadelphia).

Saturday, May 10, 2014

DB2 for z/OS, Distributed Workload, and Enclaves

If you work with DB2 for z/OS and have to manage distributed workload… or if you are enabling zIIP specialty processors… chances are you’ve at least heard the term “enclave” or “enclave SRB” before. But do you understand what an enclave is and why it is important?

An enclave is a construct that represents a transaction or unit of work. Enclaves are a method of managing mainframe transactions for non-traditional workloads. You can think of an enclave as an anchor point for resource accumulation regardless of where the transaction is executing.

With traditional workloads it is relatively easy to map the resources consumed to the actual transaction doing the consumption. But with non-traditional workloads – web transactions, distributed processing, etc. – it is more difficult because the transaction can span platforms. Enclaves are used to overcome this difficulty by correlating closely to the end user’s view of the transaction. 

An enclave can consist of many pieces spread over many server address spaces. So even though a non-traditional transaction can comprise multiple “pieces” spanning many address spaces, and can share those address spaces with other transactions, an enclave gives you control over the non-traditional workload. And WLM can be used to more effectively manage non-traditional workload via the enclave.

If you are interested in more details on enclaves and how they are managed, read through Enclaves – Managing Business Transactions from IBM’s RMF Newsletter.


Friday, May 09, 2014

IDUG DB2 Tech Conference in Phoenix, Arizona

Today's blog post is just a quick one to remind everybody about the 2014 IDUG DB2 Tech Conference coming up next week in Phoenix, Arizona. 

As usual, there will be a plethora of useful information shared at the conference - as well as the opportunity to meet with developers, gold consultants, and DBAs from all over the country... and even the world!

There are complimentary workshop sessions scheduled for Friday on a variety of topics and you will probably also want to sign up for one of the educational seminars on Monday. And don't forget to take advantage of the free IBM certification exams offered up at every IDUG conference.

This year's keynote presentation will be delivered by noted Gartner analyst Donald Feinberg, who is responsible for Gartner's research on database management systems, data warehousing infrastructure and Big Data.

And just check out the impressive agenda of DB2 presentations that will be available at this year's IDUG! 

Add to all of the above an array of Special Interest Groups, the DB2 z/OS and LUW panel sessions (where you can ask questions of the developers), and let's not forget the vendor exhibition where you can check out the latest products and services for DB2... and it all adds up to IDUG being the place to be next week if you have anything to do with DB2. I know I'll be there...

...and I hope to see you there, too.

Monday, May 05, 2014

Using Check Constraints to Simulate Domains

Check constraints are a very useful, though somewhat underutilized feature of DB2. Check constraints enable enhanced data integrity without requiring procedural logic (such as in stored procedures and triggers). Let’s examine the basics of table check constraints.
A constraint is basically a restriction placed upon the data values that can be stored in a column or columns of a table. Of course, most RDBMS products provide several different types of constraints, such as referential constraints (to define primary and foreign keys) and unique constraints (to prohibit duplicates).
Check constraints place specific data value restrictions on the contents of a column through the specification of a Boolean expression. The expression is explicitly defined in the table DDL and is formulated in much the same way that SQL WHERE clauses are formulated. Any attempt to modify the column data (i.e. during INSERT and UPDATE processing) will cause the expression to be evaluated. If the modification conforms to the Boolean expression, the modification is permitted to continue.  If not, the statement will fail with a constraint violation.

This functionality is great for simulating the relational concept of a domain. A domain is basically the set of valid values that a column or data type can take on. Check constraints only simulate domains, though, because there are other features provided by domains that are not provided by check constraints. One such feature is that columns drawn from separate domains must not be compared or operated on by expressions that require the same type of data for both operands. For domains to truly be supported the DBMS should support both check constraints and user-defined data types with strong type checking. This prevents nonsensical operations, such as comparing IQ to shoe size or adding Australian dollars to Euros.
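
For example, user-defined distinct types deliver that strong typing on top of a built-in data type. Depending on your DB2 version the statement is CREATE TYPE or CREATE DISTINCT TYPE, and the type, table, and column names below are purely hypothetical:

CREATE TYPE EURO AS DECIMAL(11,2);
CREATE TYPE AUS_DOLLAR AS DECIMAL(11,2);

CREATE TABLE PRICE_LIST
      (ITEM_ID     INTEGER     NOT NULL,
       PRICE_EUR   EURO,
       PRICE_AUD   AUS_DOLLAR);

With strong typing in place, a predicate such as PRICE_EUR > PRICE_AUD is rejected because the two columns are of different distinct types, which is exactly the kind of nonsensical comparison a true domain would prohibit.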

Forming Check Constraints

Check constraints are written using recognizable SQL syntax. This makes them easy to implement for anyone who has even a passing familiarity with SQL. The check constraint consists of two components: a constraint name and a check condition. 

The constraint name is an SQL identifier and is used to reference or identify the constraint. The same constraint name cannot be specified more than once for the same table. If a constraint name is not explicitly coded, DB2 will create a unique name automatically for the constraint. 

The check condition defines the actual constraint logic.  The check condition can be defined using any of the basic predicates (>, <, =, <>, <=, >=), as well as BETWEEN, IN, LIKE, and NULL.  Furthermore, AND and OR can be used to string conditions together.

There are, however, restrictions on how check constraints are formulated. Some of these restrictions include:
·         Limitations on the entire length of the check condition.
·         Other tables may not be accessed in the check condition.
·         Only a limited subset of SQL operations are permitted (for example subselects and column functions are prohibited in a check constraint).
·         One of the operands (usually the first) of the check constraint must be the name of a column contained in the table for which the constraint is defined.
·         The other operand (usually the second) must be either another column name in the same table or a constant value.
·         If the second operand is a constant, it must be compatible with the data type of the first operand.  If the second operand is a column, it must be the same data type as the first column specified.

Check Constraint Examples

Check constraints enable the DBA or database designer to specify more robust data integrity rules directly into the database.  Consider the following example:

CREATE TABLE EMP
      (EMPNO          INTEGER
            CONSTRAINT CHECK_EMPNO
            CHECK (EMPNO BETWEEN 100 AND 25000),
       EMP_ADDRESS    VARCHAR(70),
       EMP_TYPE       CHAR(8)
            CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),
       EMP_DEPT       CHAR(3)        NOT NULL WITH DEFAULT,
       SALARY         DECIMAL(7,2)   NOT NULL
            CONSTRAINT CHECK_SALARY
            CHECK (SALARY < 50000.00),
       COMMISSION     DECIMAL(7,2),
       BONUS          DECIMAL(7,2)
      );

The CREATE statement for the EMP table contains three different check constraints:

1.      The name of the first check constraint for the EMP table is CHECK_EMPNO.  It is defined on the EMPNO column.  The constraint ensures that the EMPNO column can contain values that range from 100 to 25000 (instead of the domain of all valid integers).
2.      The second check constraint for this table is on the EMP_TYPE column.   This is an example of an unnamed constraint.  Though this is possible, it is not recommended.  It is best to always provide an explicit constraint name in order to ease identification and administration.  This specific constraint restricts the values that can be placed into EMP_TYPE as: 'TEMP', 'FULLTIME', and 'CONTRACT';  no other values would be accepted.
3.      The last check constraint on this table is named CHECK_SALARY.  It effectively ensures that no employee can be entered with a salary of $50,000 or more. (Now who would want to work there?)

Column vs. Table Level Constraints

The first check constraint example we reviewed showed a column-level check constraint. However, check constraints also may be coded at the table-level. A column-level check constraint is defined in the DDL immediately after the column. Appropriately enough, a table-level check constraint is defined after all of the columns of the table have already been defined.

It is quite common for business rules to require access to multiple columns within a single table.  When this situation occurs, it is wise to code the business rule into a check constraint at the table-level, instead of at the column level.  Of course, any column-level check constraint can also be defined at the table-level, as well.  In terms of functionality, there is no difference between an integrity constraint defined at the table-level and the same constraint defined at the column-level.

Let’s augment our sample table DDL to add two table-level check constraints:

CREATE TABLE EMP
      (EMPNO          INTEGER
            CONSTRAINT CHECK_EMPNO
            CHECK (EMPNO BETWEEN 100 AND 25000),
       EMP_ADDRESS    VARCHAR(70),
       EMP_TYPE       CHAR(8)
            CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),
       EMP_DEPT       CHAR(3)        NOT NULL WITH DEFAULT,
       SALARY         DECIMAL(7,2)   NOT NULL
            CONSTRAINT CHECK_SALARY
            CHECK (SALARY < 50000.00),
       COMMISSION     DECIMAL(7,2),
       BONUS          DECIMAL(7,2),
            CONSTRAINT COMM_VS_SALARY
            CHECK (SALARY > COMMISSION),
            CONSTRAINT COMM_BONUS
            CHECK (COMMISSION > 0 OR BONUS > 0)
      );



The CREATE statement for the EMP table has been modified to contain two table-level check constraints having the following ramifications:

1.      The name of the first table-level check constraint for the EMP table is COMM_VS_SALARY.  This constraint will ensure that no employee can earn more commission than salary.
2.      The second table-level check constraint is named COMM_BONUS.   This constraint will ensure that every employee either earns a commission or a bonus (or possibly, both).

Check Constraint Benefits

So what are the benefits of check constraints? The primary benefit is the ability to enforce business rules directly in each database without requiring additional application logic. Once defined, the business rule is physically implemented and cannot be bypassed. Check constraints also provide the following benefits:
·         Because there is no additional programming required, DBAs can implement check constraints without involving the application programming staff. This effectively minimizes the amount of code that must be written by the programming staff.  With the significant application backlog within most organizations, this can be the most crucial reason to utilize check constraints.
·         Check constraints provide better data integrity.  As check constraints are always executed whenever the data in the column upon which they are defined is to be modified, the business rule is not bypassed during ad hoc processing and dynamic SQL. When business rules are enforced using application programming logic instead, the rules cannot be checked during ad hoc processes.
·         Check constraints promote consistency. Because they are implemented once, in the table DDL, each constraint is always enforced. Constraints written in application logic, on the other hand, must be executed within each program that modifies any data to which the constraint applies. This can cause code duplication and inconsistent maintenance resulting in inaccurate business rule support.
·         Typically check constraints coded in DDL will outperform the corresponding application code.

The overall impact of check constraints will be to increase application development productivity while at the same time promoting higher data integrity.

Check Constraints, NULLs, and Defaults

An additional consideration for check constraints is the relational NULL. Any nullable column also defined with a check constraint can be set to null. When the column is set to null, the check constraint evaluates to unknown.  Because null indicates the lack of a value, the presence of a null will not violate the check constraint.

Additionally, DB2 provides the ability to specify defaults for table columns – both system-defined defaults (pre-defined and automatically set by the DBMS) and user-defined defaults. When a row is inserted or loaded into the table and no value is specified for the column, the column will be set to the value that has been identified in the column default specification.  For example, we could define a default for the EMP_TYPE column of our sample EMP table as follows:

        EMP_TYPE       CHAR(8)        DEFAULT 'FULLTIME'
            CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),

If a row is inserted without specifying an EMP_TYPE, the column will default to the value 'FULLTIME'.

A problem can arise when using defaults with check constraints. Most DBMS products do not perform semantic checking on constraints and defaults. The DBMS, therefore, will allow the DBA to define defaults that contradict check constraints.  Furthermore, it is possible to define check constraints that contradict one another. Care must be taken to avoid creating this type of problem. 

Examples of contradictory constraints are depicted below:

CHECK (EMPNO > 10 AND EMPNO < 9)

In this case, no value is both greater than 10 and less than 9, so nothing could ever be inserted.

EMP_TYPE    CHAR(8)  DEFAULT 'NEW'
CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),

In this case, the default value is not one of the permitted EMP_TYPE values according to the defined constraint.  No defaults would ever be inserted.

CHECK (EMPNO > 10)
CHECK (EMPNO >= 11)

In this case, the constraints are redundant.  No logical harm is done, but both constraints will be checked, thereby impacting the performance of applications that modify the table in which the constraints exist.

Other potential semantic problems could occur if a constraint contradicts a referential integrity DELETE or UPDATE rule, if two constraints are defined on the same column with contradictory conditions, or if the constraint requires that the column be NULL, but the column is defined as NOT NULL.

Other Potential Hazards

Take care when using the LOAD utility on a table with check constraints defined to it. By specifying the ENFORCE NO parameter you can permit DB2 to load data that does not conform to the check constraints (as well as the referential constraints). Although this eases the load process by enabling DB2 to bypass constraint checking, it will place the table space into a check pending state. You can run CHECK DATA to clear this state (or force the check pending off by using START with the FORCE option or the REPAIR utility). If you do not run CHECK DATA, constraint violations may occur causing dirty data.
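
For example, a LOAD that bypasses constraint checking followed by the CHECK DATA that cleans up after it might look something like this. The data set, table, database, and table space names are placeholders, and DELETE YES would additionally require exception tables to be set up:

LOAD DATA INDDN SYSREC LOG NO ENFORCE NO
     INTO TABLE EMP

CHECK DATA TABLESPACE DBNAME1.TSNAME1
      SCOPE PENDING DELETE NO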

Summary


Check constraints provide a very powerful vehicle for supporting business rules in the database. They can be used to simulate relational domains. Because check constraints are non-bypassable, they provide better data integrity than corresponding logic programmed into the application. It is a wise course of action to use check constraints in your database designs to support data integrity, domains, and business rules in all of your relational database applications.