Sunday, April 23, 2006

Using Nulls in DB2

A null represents missing or unknown information at the column level. If a column “value” can be null, it can mean one of two things: the attribute is not applicable for certain occurrences of the entity, or the attribute applies to all entity occurrences, but the information may not always be known. Of course, it could be a combination of these two situations, too.

A null is not the same as 0 or blank. Null means no entry has been made for the column and it implies that the value is either unknown or not applicable.

Because DB2 supports null you can distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) and an unknown or inapplicable entry (NULL for both numerical and character columns). Null indicates that the user did not explicitly make an entry or has explicitly entered NULL for the column. For example, a null value in the Price column of the ITEM table in some database does not mean that the item is being given away for free; instead it means that the price is not known or has not yet been set.

Nulls sometimes are inappropriately referred to as “null values.” Using the term value to describe a null is inaccurate because a null implies the lack of a value. Therefore, simply use the term null or nulls (without appending the term “value” or “values” to it).

A Few Examples

When are nulls useful? Well, defining a column as nullable provides a placeholder for data you might not yet know. For example, when a new employee is hired and is inserted into the EMP table, what should the employee termination date column be set to? I don’t know about you, but I wouldn’t want any valid date to be set in that column for my employee record. Instead, null can be used to specify that the termination date is currently unknown.

Let’s consider another example. Suppose that we also capture each employee’s hair color at hire time. Consider three potential entity occurrences: a man with black hair, a woman with unknown hair color, and a bald man. The woman with the unknown hair color and the bald man could both be assigned null, but for different reasons. The woman’s hair color would be null meaning presently unknown; the bald man’s hair color could be null too, in this case meaning not applicable.

How could you handle this without using nulls? You would need to create special values for the HairColor column that mean “bald” and “unknown.” This is possible for a CHAR column like HairColor. But what about a DB2 DATE column? All occurrences of a column assigned as a DATE data type are valid dates. It might not be possible to use a special date value to mean “unknown.” This is where using nulls is most practical.

DB2 does not differentiate between nulls that signify unknown data and those that signify inapplicable data. This distinction must be made by the program logic of each application.

Keep in mind, though, that using null to indicate “not applicable” can be an indication of improper database design. By properly modeling and normalizing your data structures you can usually eliminate the need to use nulls to indicate that a column is inapplicable for a specific row. For example, consider the following table:

CREATE TABLE EMP
(EMPNO INTEGER NOT NULL,
LAST_NAME CHAR(20) NOT NULL,
FIRST_NAME CHAR(15) NOT NULL,
STREET_ADDR CHAR(30) NOT NULL WITH DEFAULT,
CITY CHAR(12) NOT NULL WITH DEFAULT,
STATE CHAR(2) NOT NULL WITH DEFAULT,
POSTAL_CODE CHAR(10) NOT NULL WITH DEFAULT,
EMP_TYPE CHAR(1) NOT NULL
CHECK (EMP_TYPE IN ('F', 'C', 'P')),
HIRE_DATE DATE,
SALARY DECIMAL(9,2),
BILLING_RATE DECIMAL(5,2));

In this case, we have a code in the EMP_TYPE column that can contain F (full-time), C (contractor), or P (part-time). We also have a SALARY column that is populated for full-time and part-time employees, but is set to null for contractors; and a BILLING_RATE column that is populated for contractors but set to null for full-time and part-time employees. Additionally, the HIRE_DATE column is set to null for contractors.

Well, here we have three columns that are set to null (or not) based on other values in the table. We can design our way out of this problem by creating separate tables for employees and contractors. If additional columns were needed for full-time employees that did not apply to part-time employees, we might even split the employee table into two: one for full-time and another for part-time. After doing so, there is no more need to use null for inapplicable data.
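
One way the split might look is sketched below. The table names EMPLOYEE and CONTRACTOR, the CONTRACTOR_NO key, and the choice to make every remaining column NOT NULL are assumptions made for illustration; the address columns are left out for brevity.

```sql
-- Hypothetical redesign: all names here are illustrative, not from a real schema.
-- Full-time and part-time employees keep HIRE_DATE and SALARY.
CREATE TABLE EMPLOYEE
  (EMPNO         INTEGER       NOT NULL,
   LAST_NAME     CHAR(20)      NOT NULL,
   FIRST_NAME    CHAR(15)      NOT NULL,
   EMP_TYPE      CHAR(1)       NOT NULL
       CHECK (EMP_TYPE IN ('F', 'P')),
   HIRE_DATE     DATE          NOT NULL,
   SALARY        DECIMAL(9,2)  NOT NULL);

-- Contractors carry only the column that applies to them.
CREATE TABLE CONTRACTOR
  (CONTRACTOR_NO INTEGER       NOT NULL,
   LAST_NAME     CHAR(20)      NOT NULL,
   FIRST_NAME    CHAR(15)      NOT NULL,
   BILLING_RATE  DECIMAL(5,2)  NOT NULL);
```

If a salary or billing rate can still be genuinely unknown at insert time, those columns would stay nullable; the point is only that no column is null because it does not apply.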

Indicator Variables

DB2 represents null in a special “hidden” column known as an indicator variable. An indicator variable is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to an end user, but must be provided for when programming in a host language (such as COBOL or PL/I).

The null indicator is used by DB2 to track whether its associated column is null or not. A positive value or a value of 0 means the column is not null and any actual value stored in the column is valid. If a CHAR column is truncated on retrieval because the host variable is not large enough, the indicator value will contain the original length of the truncated column. A negative value indicates that the column is set to null. If the value is -2 then the column was set to null as the result of a data conversion error.

Let’s take a moment to clear up a common misunderstanding right here: nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.

DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though.

Syntax

Every column defined to a DB2 table must be designated as either allowing or disallowing nulls. A column is defined as nullable – meaning it can be set to NULL – in the table creation DDL. Null is the default if nothing is specified after the column name. To prohibit the column from being set to NULL you must explicitly specify NOT NULL after the column name. In the following sample table, COL1 and COL3 can be set to null, but not COL2, COL4, or COL5:

CREATE TABLE SAMPLE1
(COL1 INTEGER,
COL2 CHAR(10) NOT NULL,
COL3 CHAR(5),
COL4 DATE NOT NULL WITH DEFAULT,
COL5 TIME NOT NULL);

In SELECT statements, testing for null is accomplished differently than testing for other “values.” You cannot specify WHERE COL = NULL, because this does not make any sense. Remember, null is a lack of a value, so the column does not equal anything. Instead, you would have to code WHERE COL IS [NOT] NULL.

In INSERT statements, NULL can be specified in the VALUES clause to indicate that a column is to be set to NULL. In UPDATE statements, you use the equal sign (=) in the SET clause to assign NULL to a column; this is an assignment, not a comparison.

When inserting data, if the user fails to make an entry in a column that allows nulls, DB2 supplies the NULL as a default (unless another default value exists). If an attempt to insert NULL is made against a column defined as NOT NULL, the statement will fail.
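
The statements below are a minimal sketch of these rules against the SAMPLE1 table defined earlier. The literal values are made up for illustration.

```sql
-- COL1 and COL3 are nullable, so NULL is accepted in the VALUES clause.
-- COL4 is omitted from the column list, so its default is used.
INSERT INTO SAMPLE1 (COL1, COL2, COL3, COL5)
VALUES (NULL, 'ABC', NULL, CURRENT TIME);

-- Assign null with = in the SET clause, but test for null with IS NULL.
UPDATE SAMPLE1
SET COL3 = NULL
WHERE COL1 IS NULL;

-- Retrieve only the rows where COL3 has actually been supplied.
SELECT COL2, COL3
FROM SAMPLE1
WHERE COL3 IS NOT NULL;

-- This statement fails because COL2 is defined as NOT NULL.
INSERT INTO SAMPLE1 (COL1, COL2, COL3, COL5)
VALUES (1, NULL, 'X', CURRENT TIME);
```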

Guidance

Now that you have a good understanding of the basics of nulls, let’s review some guidelines for their usage.

Whenever possible, avoid nulls in columns that must participate in arithmetic logic (for example, DECIMAL money values), and especially when functions will be used. The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. To clarify with an example, if the COMM column is nullable, the result of the following query:

SELECT AVG(COMM)
FROM DSN8810.EMP;

is not the same as for this query:

SELECT SUM(COMM)/COUNT(*)
FROM DSN8810.EMP;

So to avoid confusion, avoid nulls in columns involved in math functions whenever possible.
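
To see the difference side by side, a query along the following lines can help. It is only a sketch: the column aliases are invented, and the decision to treat a null commission as zero is an assumption that may or may not suit your data.

```sql
SELECT AVG(COMM)              AS AVG_NON_NULL_ROWS,   -- nulls are left out entirely
       SUM(COMM) / COUNT(*)   AS AVG_OVER_ALL_ROWS,   -- nulls still count as rows
       AVG(COALESCE(COMM, 0)) AS AVG_NULLS_AS_ZERO    -- nulls replaced by 0 first
FROM DSN8810.EMP;
```

The first column divides by the number of rows with a commission; the other two divide by every row in the table, so they are lower whenever some commissions are null.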

When DATE, TIME, and TIMESTAMP columns can be unknown, consider creating them as nullable. DB2 checks to ensure that only valid dates, times, and timestamps are placed in columns defined as such. If the column can be unknown, it must be defined to be nullable because the default for these columns is the current date, current time, and current timestamp (unless explicitly defined otherwise using the DEFAULT clause). Null, therefore, is the only viable option for the recording of missing dates, times, and timestamps (unless you pick a specific valid date that is not used by your applications to indicate unknown).

For every other column, determine whether nullability can be of benefit before allowing nulls. Consider these rules of operation:
  • When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped at the high end of the sort order.
  • Nulls are considered to be equal when duplicates are eliminated by SELECT DISTINCT or COUNT (DISTINCT column).
  • A unique index considers nulls to be equivalent and disallows duplicate entries because of the existence of nulls, unless the WHERE NOT NULL clause is specified in the index.
  • For comparison in a SELECT statement, two null columns are not considered equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls that are encountered cause the comparison to evaluate to UNKNOWN.
  • When a nullable column participates in a calculation, the result is null.
  • Columns that participate in a primary key cannot be null.
  • To test for the existence of nulls, use the special predicate IS NULL in the WHERE clause of the SELECT statement. You cannot simply state WHERE column = NULL. You must state WHERE column IS NULL.
  • It is invalid to test if a column is <> NULL, or >= NULL. These are all meaningless because null is the absence of a value.

Examine these rules closely. ORDER BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and handle them accordingly. The SELECT statement, however, deems that the comparison of null columns is not equivalence, but unknown. This inconsistent handling of nulls is an anomaly that you must remember when using nulls. The following are several sample SQL queries and the effect nulls have on them.

SELECT JOB, SUM(SALARY)
FROM DSN8810.EMP
GROUP BY JOB;

This query returns the total salary for each type of job. All instances in which JOB is null will group at the bottom of the output.

SELECT EMPNO, PROJNO, ACTNO, EMPTIME,
EMSTDATE, EMENDATE
FROM DSN8810.EMPPROJACT
WHERE EMSTDATE = EMENDATE;

This query retrieves all occurrences in which the project start date is equal to the project end date. This information is clearly erroneous, as anyone who has ever worked on a software development project can attest. The query does not return any rows in which either date or both dates are null, for two reasons: (1) two null columns are never equal for purposes of comparison, and (2) when either operand of a comparison operator is null, the result is unknown.
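
If rows where both dates are missing should also be treated as a match, the null case has to be spelled out explicitly. The following is a sketch of one way to do that; whether two unknown dates ought to count as "equal" is a business decision, not something DB2 decides for you.

```sql
SELECT EMPNO, PROJNO, ACTNO, EMPTIME,
       EMSTDATE, EMENDATE
FROM DSN8810.EMPPROJACT
WHERE EMSTDATE = EMENDATE
   -- The equality test above is never true for nulls, so add them by hand.
   OR (EMSTDATE IS NULL AND EMENDATE IS NULL);
```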

UPDATE DSN8810.DEPT
SET MGRNO = NULL
WHERE MGRNO = '000010';

This statement sets the MGRNO column to null wherever MGRNO is currently equal to '000010' in the DEPT table.

When creating tables, treat nullable columns the same as you would any other column. Some DBAs advise you to place nullable columns of the same data type after non-nullable columns. This is supposed to assist in administering the null columns, but it does not really help – and it might hurt. Sequencing nullable columns in this manner provides no clear benefit and should be avoided.

Summary

Nulls are clearly one of the most misunderstood features of DB2 – indeed, of most SQL database systems. Although nulls can be confusing, you cannot bury your head in the sand and ignore nulls if you choose to use DB2 as your DBMS. Understanding what nulls are, and how best to use them, can help you to create usable DB2 databases and design useful and correct queries in your DB2 applications.

For those of you interested in hearing about nulls in more depth, please consider attending my presentation at IDUG 2006 in Tampa, FL. It is titled "Null and Void? Dealing with Nulls in DB2" and it will be held on Thursday, May 11th, at 8:30 AM. For more details on IDUG, including how to register, check out IDUG on the web at http://conferences.idug.org/Default.aspx?alias=conferences.idug.org/na

Sunday, April 16, 2006

Mainframe Alive at Merrill Lynch

A recent article in Baseline Magazine highlights how Merrill Lynch & Company reinvigorated 420 financial programs stored on the mainframe by building Web services that can handle millions of interactions a day.

Merrill was looking to leverage its continuing heavy investment in mainframe applications and hardware by making functions in legacy applications available as Web services. Indeed, many companies with robust mainframe systems should be looking to do this. It can breathe new life into the applications without requiring a complete re-write of still-working programs, while maintaining the high availability, security, and performance of the mainframe.

According to the article, "Merrill has a huge IBM mainframe installation—one of the largest in the world—with 1,200 programmers supporting some 23,000 mainframe programs that process 80 million Customer Information Control System (CICS) transactions per day."

Yes, there is still life in mainframe systems. Wise organizations will continue to utilize their existing and vibrant mainframe systems by building web services on top of them.

Friday, March 31, 2006

DSNTIAD - The Dynamic SQL Update Program

In my last blog entry I posted about DSNTEP2, the batch SQL program. But there is another batch SQL program named DSNTIAD that is less widely known and less frequently used.

DSNTIAD is an assembler application program that can issue the same DB2 dynamic SQL statements as DSNTEP2, with the exception of the SELECT statement. For this reason, applications programmers almost always prefer to use DSNTEP2 rather than DSNTIAD.

DSNTIAD is written in Assembler language. Because DSNTIAD is a sample program, its source code could be modified to accept SELECT statements if you so desired. But this task is complex and should not be undertaken by a beginning programmer.

Here is sample JCL for running DSNTIAD:

//DB2JOBU JOB (UTILITY),'DB2 SAMPLE UPD',MSGCLASS=X,
// CLASS=X,NOTIFY=USER
//*
//**************************************************
//*
//* DSNTIAD - SAMPLE DB2 UPDATE PROGRAM
//*
//**************************************************
//*
//JOBLIB DD DSN=DSN810.DSNLOAD,DISP=SHR
//BATUPSQL EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(DSN)
RUN PROGRAM(DSNTIAD) PLAN(DSNTIAD6) -
LIB('DSN810.RUNLIB.LOAD')
END
/*
//SYSIN DD *
UPDATE DSN8810.DEPT
SET DEPTNAME = 'CHANGED NAME'
WHERE DEPTNO = 'D01' ;

INSERT INTO DSN8810.ACT
VALUES (129, 'XXXXXX', 'SAMPLE ACCT') ;

DELETE FROM DSN8810.EMP
WHERE SALARY < 1000 ;

CREATE DATABASE TESTNAME
BUFFERPOOL BP12
STOGROUP DSN8G810 ;

GRANT DBADM ON TESTNAME TO USERA ;
/*
//


So why would anyone consider using DSNTIAD over DSNTEP2? Well, DSNTIAD supports the LABEL ON statement, whereas DSNTEP2 does not. But unlike DSNTEP2, DSNTIAD does not accept comments embedded in SQL statements.

Also note that DSNTIAD can be a little more efficient than DSNTEP2 because it is written in Assembler.

Friday, March 24, 2006

DSNTEP2 aka Batch SPUFI

DSNTEP2 is an application program that can be used to issue DB2 dynamic SQL statements. It is sometimes referred to as “Batch SPUFI” because it allows you to submit SQL in batch similar to how SPUFI allows online SQL execution.

The following sample JCL demonstrates the capability of DSNTEP2 to issue DCL, DDL, and DML dynamically.

//DB2JOBU JOB (BATCHSQL),'DB2 SAMPLE SQL',MSGCLASS=X,
// CLASS=X,NOTIFY=USER
//*
//***********************************
//*
//* DB2 SAMPLE SQL PROGRAM
//*
//***********************************
//*
//JOBLIB DD DSN=DSN810.DSNLOAD,DISP=SHR
//BATCHSQL EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(DSN)
RUN PROGRAM(DSNTEP2) PLAN(DSNTEP81) -
LIB('DSN810.RUNLIB.LOAD')
END
/*
//SYSIN DD *
SELECT * FROM SYSIBM.SYSTABLES;

UPDATE DSN8810.DEPT
SET DEPTNAME = 'CHANGED NAME'
WHERE DEPTNO = 'D01' ;

INSERT INTO DSN8810.ACT
VALUES (129, 'XXXXXX', 'SAMPLE ACCT');

DELETE FROM DSN8810.EMP
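WHERE SALARY < 1000 ;

CREATE DATABASE TESTNAME
BUFFERPOOL BP12
STOGROUP DSN8G810 ;

GRANT DBADM ON TESTNAME TO USERA ;
/*
//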

The DSNTEP2 program is written in the PL/I programming language. Prior to DB2 V6, you needed to have a PL/I compiler to use DSNTEP2. However, as of V6, IBM provides both the source code and an object code version of DSNTEP2 with DB2. So, you no longer need a PL/I compiler to use DSNTEP2.

Because DSNTEP2 is an application program, and the PL/I source code is provided with DB2, a knowledgeable PL/I programmer can easily modify the code. After doing so, of course, it must be compiled, linked, and bound before it can be used.

DSNTEP2 can process almost every SQL statement that can be executed dynamically. DSNTEP2 accepts
· The GRANT and REVOKE DCL statements,
· The ALTER, COMMENT ON, CREATE, and DROP DDL statements,
· The DELETE, INSERT, SELECT, and UPDATE DML statements, and
· The COMMIT, ROLLBACK, EXEC SQL, EXPLAIN, and LOCK statements.

The only important statement that DSNTEP2 does not support is the LABEL ON DDL statement. Of course, DSNTEP2 can be modified to support this statement if you have PL/I knowledge and a PL/I compiler.

When Does DSNTEP2 Commit?

Well, the simple answer to that question is that the results of the SQL are committed upon completion of all the SQL. A helpful answer is a little longer.

First off, you need to know that DSNTEP2 has an internal parameter named MAXERRORS that controls the number of failing statements that can occur before it stops. A failing statement is one which returns a negative SQLCODE. The value of MAXERRORS is set to 10 inside the program, so DSNTEP2 will allow 9 failing SQL statements but when it hits the 10th failing statement, it will exit, COMMITting all other work.

This is ugly because it can wreak havoc on the integrity of your data. I mean, who wants to figure out what was run, what was impacted, and then try to rebuild a job to fix data and/or restart at the right place? To rerun DSNTEP2, remember that all SQL statements that completed with a 0 SQL code were committed. These statements should not be rerun. All SQL statements completed with a negative SQL code must be corrected and reprocessed.

Certain severe errors cause DSNTEP2 to exit immediately. One severe error is a -101 “SQL statement too long or too complex.”

If any SQL errors occurred during the execution of DSNTEP2, a return code of 8 is returned by the job step.

At any rate, DSNTEP2 never issues an explicit COMMIT or ROLLBACK by itself. A COMMIT occurs at the end unless the program abends.

Specify Your SQL Statements Properly

The SQL to be run by DSNTEP2 is specified in SYSIN. Be sure to code the DSNTEP2 input properly.

DSNTEP2 reads SQL statements from an input data set with 80-byte records. The SQL statements must be coded in the first 72 bytes of each input record. SQL statements can span multiple input records and are terminated by a semicolon (;). Semicolons are not permitted in the text of the SQL statement.

Liberally Comment DSNTEP2 Input

Comments can be passed to DSNTEP2 in the SQL statements using two hyphens in columns 1 and 2 or a single asterisk in column 1. Doing so is good form and helps others to understand what your DSNTEP2 job is attempting to accomplish.
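
As a rough illustration, SYSIN input using both comment styles might look like the following. The wording of the comments is invented; the UPDATE is the one from the sample job above.

```sql
-- Two hyphens in columns 1 and 2 mark this line as a comment.
* A single asterisk in column 1 does the same.
-- Purpose: rename department D01 in the sample DEPT table.
UPDATE DSN8810.DEPT
SET DEPTNAME = 'CHANGED NAME'
WHERE DEPTNO = 'D01' ;
```

Keeping each comment on its own line, ahead of the statement it describes, makes the job output easier to match back to the intent of each statement.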

Bottom Line

DSNTEP2 is especially useful for running one-off SQL statements. Use DSNTEP2 when you have ad hoc SQL to run in a batch environment. DSNTEP2 is easier than writing your own quick and dirty programs to run ad hoc SQL in batch. It is simple to set up and saves time. But be careful if you have multiple SQL modification statements (INSERT, UPDATE, DELETE) because of the above-mentioned possibility of causing data integrity problems if some of the statements fail and others succeed.

Tuesday, March 21, 2006

IBM Announces z/OS V1.8

On February 28, 2006 IBM announced the latest planned upgrade of its stalwart mainframe operating system, z/OS. IBM has announced some interesting new functionality for the new version, V1.8. Planned are improvements in scale, availability, security, and resource optimization. Also, there is an increased focus on simplification to make z/OS easier to use.

The actual IBM announcement can be read here.

Further details can be found in this handy summary of z/OS futures planning put together by Jim Schesvold, a technical editor for IBM Systems Magazine.

IBM plans for z/OS V1.8 to be available in September 2006.