Thursday, October 07, 2010

Null Follow-up: IS [NOT] DISTINCT FROM

After publishing the last blog post here on the topic of pesky problems that crop up when dealing with nulls, I received a comment lamenting that I did not address the IS [NOT] DISTINCT FROM clause. So today’s blog post will redress that failure.

First of all, IS [NOT] DISTINCT FROM is a relatively new predicate operator, introduced to DB2 for z/OS in Version 8. It is quite convenient to use in situations where you need to compare two columns that could contain NULL.

Before diving into the operator, let’s first discuss the problem it helps to solve. Two columns are not equal if both are NULL; that is because NULL is unknown, and a NULL never equals anything else, not even another NULL. But sometimes you might want to treat NULLs as equivalent. In order to do that, you would have to code something like this in your WHERE clause:

WHERE COL1 = COL2
OR (COL1 IS NULL AND COL2 IS NULL)

This coding would cause DB2 to return all the rows where COL1 and COL2 are the same value, as well as all the rows where both COL1 and COL2 are NULL, effectively treating NULLs as equivalent. But this coding, although relatively simple, can be unwieldy and, at least at first blush, unintuitive.

Here comes the IS NOT DISTINCT FROM clause to the rescue. As of DB2 V8, the following clause is logically equivalent to the one above, but perhaps simpler to code and understand:

WHERE COL1 IS NOT DISTINCT FROM COL2

The same goes for checking a column against a host variable. You might try to code a clause specifying WHERE COL1 = :HV :hvind (host variable and indicator variable). But such a search condition would never be true when the value in the host variable is null, even if the indicator variable is set to indicate null. This is because one null does not equal another null - ever. Instead we’d have to code additional predicates: one to handle the non-null values and two others to ensure that COL1 and :HV are both null. With the introduction of the IS NOT DISTINCT FROM predicate, the search condition can be simplified to just:

WHERE COL1 IS NOT DISTINCT FROM :HV :hvind
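
For reference, here is a rough sketch of the longhand coding that replaces. It assumes the null indicator can also be referenced on its own as an ordinary halfword host variable (shown here as :hvind, with a negative value signaling null):

WHERE COL1 = :HV                        -- handles the non-null values
   OR (COL1 IS NULL AND :hvind < 0)     -- COL1 and the host variable are both null

IS NOT DISTINCT FROM collapses all of that into a single, readable predicate.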

Wednesday, October 06, 2010

Null Troubles

A null represents missing or unknown information at the column level. A null is not the same as 0 (zero) or blank. Null means no entry has been made for the column and it implies that the value is either unknown or not applicable.

DB2 supports null, and as such you can use null to distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) and an unknown or inapplicable entry (NULL for both numerical and character columns).

Nulls sometimes are inappropriately referred to as “null values.” Using the term value to describe a null is inaccurate because a null implies the lack of a value. Therefore, simply use the term null or nulls (without appending the term “value” or “values” to it).

DB2 represents null in a special “hidden” column known as an indicator. An indicator is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to the end user, but must be provided for when programming in a host language (such as COBOL or PL/I).

Every column defined to a DB2 table must be designated as either allowing or disallowing nulls. A column is defined as nullable – meaning it can be set to NULL – in the table creation DDL. Null is the default if nothing is specified after the column name. To prohibit the column from being set to NULL you must explicitly specify NOT NULL after the column name. In the following sample table, COL1 and COL3 can be set to null, but not COL2, COL4, or COL5:

CREATE TABLE SAMPLE1
(COL1 INTEGER,
COL2 CHAR(10) NOT NULL,
COL3 CHAR(5),
COL4 DATE NOT NULL WITH DEFAULT,
COL5 TIME NOT NULL);
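
As a quick illustration of how these definitions behave, consider a hypothetical INSERT (not part of the original example) that supplies values only for the NOT NULL columns that have no default:

INSERT INTO SAMPLE1 (COL2, COL5)
VALUES ('ABC', CURRENT TIME);

COL1 and COL3 are set to NULL because they are nullable and no values were supplied, and COL4 is set to the current date because it was defined as NOT NULL WITH DEFAULT. Omitting either COL2 or COL5 (NOT NULL with no default) would cause the INSERT to fail.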

What Are The Issues with Null?

The way in which nulls are processed usually is not intuitive to folks used to yes/no, on/off thinking. With null data, answers are not true/false, but true/false/unknown. Remember, a null is not known. So when a null participates in a mathematical expression, the result is always null. That means that the answer to each of the following is NULL:
  • 5 + NULL
  • NULL / 501324
  • 102 – NULL
  • 51235 * NULL
  • NULL**3
  • NULL + NULL
  • NULL/0
Yes, even that last one is null, even though the mathematician in us wants to say “error” because of division by zero. So nulls can be tricky to deal with.
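
If you want to verify this behavior for yourself, a quick way to experiment (a sketch that assumes access to the SYSIBM.SYSDUMMY1 catalog table; the CAST is there only to give the literal NULL a data type) is:

SELECT 5 + CAST(NULL AS INTEGER)
FROM SYSIBM.SYSDUMMY1;

The result is a single NULL - not 5 and not an error.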

Another interesting aspect of nulls is that the AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. To clarify with an example, if the COMM column is nullable, the result of the following query:

SELECT AVG(COMM)
FROM DSN8810.EMP;

is not the same as for this query:

SELECT SUM(COMM)/COUNT(*)
FROM DSN8810.EMP;
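
What AVG(COMM) does equate to (setting aside minor precision and rounding differences in the derived result data type) is the sum of the non-null values divided by the count of the non-null values:

SELECT SUM(COMM)/COUNT(COMM)
FROM DSN8810.EMP;

COUNT(COMM), unlike COUNT(*), skips the rows where COMM is NULL, which is why the two divisions can produce different answers.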

But perhaps the more troubling aspect of this treatment of nulls is “What exactly do the results mean?” Shouldn’t a function that processes any NULLs at all return an answer of NULL, or unknown? Does skipping all columns that are NULL return a useful result? I think what is really needed is an option for these functions when they operate on nullable columns. Perhaps a switch that would allow three different modes of operation:
  1. Return a NULL if any columns were null, which would be the default
  2. Operate as it currently does, ignoring NULLs
  3. Treat all NULLs as zeroes
At least that way users would have an option as to how NULLs are treated by functions. But this is not the case, so to avoid confusion, try to avoid allowing nulls in columns that must be processed using these functions whenever possible.
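
Until such an option exists, the third mode can be approximated in the SQL itself. Here is a sketch using COALESCE to treat NULL commissions as zero before the function is applied:

SELECT AVG(COALESCE(COMM, 0))
FROM DSN8810.EMP;

Every row now participates in the average, with NULL commissions counted as zeroes.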

Here are some additional considerations regarding the rules of operation for nulls:

  • When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped at the high end of the sort order.
  • Nulls are considered to be equal when duplicates are eliminated by SELECT DISTINCT or COUNT (DISTINCT column).
  • A unique index considers nulls to be equivalent, so it disallows more than one entry containing a null in the unique column(s), unless the index is created with the WHERE NOT NULL clause.
  • For comparison in a SELECT statement, two null columns are not considered equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls that are encountered cause the comparison to evaluate to UNKNOWN.
  • When a nullable column participates in a calculation, the result is null.
  • Columns that participate in a primary key cannot be null.
  • To test for the existence of nulls, use the special predicate IS NULL in the WHERE clause of the SELECT statement. You cannot simply state WHERE column = NULL. You must state WHERE column IS NULL (see the brief example following this list).
  • It is invalid to test if a column is <> NULL or >= NULL. Such comparisons are meaningless because null is the absence of a value.
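
For example, to return the employees whose commission is unknown you would code something like:

SELECT EMPNO, COMM
FROM DSN8810.EMP
WHERE COMM IS NULL;

Coding WHERE COMM = NULL instead does not work; you must use IS NULL.
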
Examine these rules closely. ORDER BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and handle them accordingly. Comparisons in a SELECT statement, however, do not treat null columns as equivalent; the result of the comparison is unknown. This inconsistent handling of nulls is an anomaly that you must remember when using nulls.

Here are a couple of other issues to consider when nulls are involved.

Did you know it is possible to write SQL that returns a NULL even if you have no nullable columns in your database? Assume that there are no nullable columns in the EMP table (including SALARY) and then consider the following SQL:

SELECT SUM(SALARY)
FROM EMP
WHERE DEPTNO > 999;

The result of this query will be NULL if no DEPTNO exists that is greater than 999. So it is not feasible to try to design your way out of having to understand nulls!
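
If an empty result like this needs to produce a zero rather than a NULL, one workaround (a sketch, not from the original text) is to wrap the column function in COALESCE:

SELECT COALESCE(SUM(SALARY), 0)
FROM EMP
WHERE DEPTNO > 999;

COALESCE returns its first non-null argument, so the query returns 0 when no qualifying rows are found.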

Another troubling issue with NULLs is that some developers have incorrect expectations when using the NOT IN predicate with NULLs. Consider the following SQL:

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P);

If one of the products has its color set to NULL, then the result of the SELECT is the empty set, even if there are colors that are not assigned to any product.
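
One common way to avoid this trap (a rewrite suggested here, not part of the original example) is to use NOT EXISTS instead of NOT IN, because product rows with a NULL color simply fail the correlated comparison instead of poisoning the entire IN-list:

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS (SELECT 1
                  FROM Products AS P
                  WHERE P.color = C.color);

This version returns the colors that are not assigned to any product, regardless of whether some products have a NULL color.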

Summary

Nulls are clearly one of the most misunderstood features of DB2 – indeed, of most SQL database systems. Although nulls can be confusing, you cannot bury your head in the sand and ignore nulls if you choose to use DB2 as your DBMS. Understanding what nulls are, and how best to use them, can help you to create usable DB2 databases and design useful and correct queries in your DB2 applications.

Friday, September 24, 2010

A Recommended New DB2 Book

Judy Nall has performed a much-needed service for the DB2 for z/OS community by writing her new book, DB2 9 System Administration for z/OS: Certification Study Guide. There are many DB2 for z/OS books (heck, I wrote one myself) that cover programming, performance, and database administration details. But never before has there been one that focused on system administration and system programming.

Of course, the book is targeted at those looking to become an IBM Certified System Administrator for DB2 for z/OS. I have never taken the exams required for that certification, but the material in this book will go a long way toward making you a better system programmer for a mainframe DB2 environment.

Whereas some of the material can be found in greater detail in other books on the market, we must keep in mind the target market for the book. The coverage of DB2 fundamentals and performance is well-written and hits the mark for systems folks. And the chapters on installation and migration, system backup and recovery, and systems operation and troubleshooting offer great systems-level knowledge not found in other DB2 for z/OS books.

So while DB2 9 System Administration for z/OS: Certification Study Guide is not for everyone, the people that it is for (systems programmers and systems DBAs) should enjoy it and benefit from the nice job Judy has done organizing and explaining the details of system administration for DB2 for z/OS.

Thursday, August 26, 2010

Free DB2 Education Webinar Series

Want to learn more about DB2 for z/OS but there is no money in the education budget? Can you spare an hour a week over the course of a month? Well then, you are in luck because SoftwareOnZ is sponsoring a series of DB2 webinars presented by yours truly, Craig S. Mullins.

Each webinar will be focused on a specific DB2 topic so you can pick and choose the ones that are most interesting to you – or attend them all and receive a certificate signed by me indicating that you have completed The DB2 Education Webinar Series.

The schedule and topics for these sessions follow:

September 28, 2010 – DB2 Access Paths: Surviving and Thriving

Binding your DB2 programs creates access paths that dictate how your applications will access DB2 data. But it can be tricky to understand exactly what is going on. There are many options and it can be difficult to select the proper ones… and to control when changes need to be made.

This presentation will clarify the BIND process, enabling you to manage DB2 application performance by controlling your DB2 access paths. And it will introduce a new, GUI-based product for managing when your programs need to be rebound.

October 5, 2010 – Optimizing DB2 Database Administration

DB2 DBAs are tasked with working in a complex technological environment, and as such, the DBA has to know many things about many things. This makes for busy days. How often have you asked yourself, “Where does the time go?”

Well, the more operational duties that can be automated and streamlined, the more effective a DBA can be. This presentation will address issues that every DB2 Database Administrator and/or DB2 Systems Programmer faces on a daily basis. And it will introduce a new tool, DB-Genie, that will reduce the amount of time, effort, and human error involved in maintaining DB2 databases.

October 12, 2010 – DB2 Storage: Don’t Ignore the Details!

For many DB2 professionals, storage management can be an afterthought. What with designing, building, and maintaining databases, assuring recoverability, monitoring performance, and so on, keeping track of where and how your databases are stored is not top of mind. But a storage problem can bring your databases and applications to a grinding halt, so it is not wise to ignore your storage needs.

This presentation will discuss the important storage-related details regarding DB2 for z/OS, including some of the newer storage options at your disposal. And we will also introduce a new web-based tool for monitoring all of your mainframe DB2 storage.

October 19, 2010 – The DB2 Application Developer’s Aid de Camp

Building DB2 application programs is a thankless job. And it can be difficult to ensure that you have an effective and efficient development environment for coding DB2 applications. Can you easily identify which tables are related to which… and what indexes are available so you can code queries the right way the first time? Do you have the right data to test your programs? Can you make quick and dirty changes to just a few tables or rows without having to write yet another program?

This presentation will discuss the issues and difficulties that developers encounter on a daily basis as they build DB2 applications… and it will present a useful programmer-focused toolset for overcoming these difficulties.

Summary

Certainly there will be something of interest for every DB2 professional in at least one, if not all, of these complimentary web-based seminars.

So what’s stopping you? Sign up today!

Thursday, August 05, 2010

DB2 Best Practices

With today's blog entry I'm hoping to encourage some open-ended dialogue on best practices for DB2 database administration. Give the following questions some thought and if you've got something to share, post a comment!

What are the things that you do, or want to do, on a daily basis to manage your database infrastructure?

What things have you found to be most helpful to automate in administering your databases? Yes, I know that all the DBMS vendors are saying that they've created the "on demand" "lights-out" "24/7" database environment, but we all know that ain't so! So what have you done to automate (either using DBMS features, tools, or homegrown scripts) to keep an eye on things?

How have you ensured the recovery of your databases in the case of problems? Application problems? Against improper data entry or bad transactions? Disaster situations? And have you tested your disaster recovery plans? If so, how? And were they successful?

What type of auditing is done on your databases to track who has done what to what data? Do you audit all changes? To all applications, or just certain ones? Do you audit access, as well as modification? If so how?

How do you manage change? Do you use a change management tool or do it all by hand? Are database schema changes integrated with application changes? If so, how? If not, how do you coordinate things to keep the application synchronized with the databases?

What about DB2 storage management? Do you actively monitor disk usage of your DB2 table space and index spaces? Do you have alerts set so that you are notified if any object is nearing its maximum size? How about your VSAM data sets? Do you monitor extents and periodically consolidate? How do you do it... ALTER/REORG? Defrag utilities? Proactive defrag?

Is your performance management set up with triggers and farmed out to DBAs by exception or is it all reactive, with tuning tasks being done based on who complains the loudest?

Do you EXPLAIN every SQL statement before it goes into production? Does someone review the access plans or are they just there to be reviewed in case of production performance problems? Do you rebind your programs periodically (for static SQL programs) as your data volume and statistics change, or do you just leave things alone until (or unless) someone complains?

When do you reorganize your data structures? On a pre-scheduled regular basis or based on database statistics? Or a combination of both? And how do you determine which are done using which method? What about your other DB2 utilities? Have you automated their scheduling or do you still manually build JCL?

How do you handle PTFs? Do you know which have been applied and which have not? And what impact that may be having on your database environment and applications? Do you have a standard for how often PTFs are applied?

How is security managed? Do the DBAs do all of the GRANTs and REVOKEs or is that job shared by security administrators? Are database logons coordinated across different DBMSs? Or could I have an operating system userid that is different from my SQL Server logon, which is different from my Oracle logon -- with no capability of identifying that the user is the same user across the platforms?

How has regulatory compliance (e.g. PCI DSS, SOX, etc.) impacted your database administration activities? Have you had to purchase additional software to ensure compliance? How is compliance policed at your organization?

Just curious... Hope I get some responses!