Welcome to Part 3 in my on-going blog series on The Most
Misunderstood Features of DB2. You can find the first two parts here: Part 1
(on locking) and Part 2 (OPTIMIZE FOR v. FETCH FIRST). Today’s topic is one
that confuses many SQL developers, Nulls.
What is a Null?
A null represents missing or unknown information at the column
level. When a column is set as null, it can mean one of two things: the
attribute is not applicable for certain occurrences of the entity, or the
attribute applies to all entity occurrences, but the information may not always
be known. Of course, it could be a combination of these two situations, too.
A null is not the same as 0 or blank. Null means no information has
been specified for the column and it implies that the value is either unknown
or not applicable.
Because DB2 supports null you can distinguish between a
deliberate entry of 0 (for numerical columns) or a blank (for character
columns) and an unknown or inapplicable entry (NULL for both numerical and
character columns). Null indicates that the user did not explicitly make an
entry or has explicitly entered NULL for the column. For example, a null “value”
in the Price column of the ITEM table in some database does not mean that the
item is being given away for free; instead it means that the price is not known
or has not yet been set.
Nulls sometimes are inappropriately referred to as “null
values.” Using the term value to describe a null is inaccurate because a null
implies the lack of a value. Therefore, it is a good practice to simply use the
term null or nulls (without appending the term “value” or “values” to it).
A Few Examples
When are nulls useful? Well, defining a column as NULL provides
a placeholder for data you might not yet know. For example, when a new employee
is hired and is inserted into the EMP table, what should the employee
termination date column be set to? I don’t know about you, but I wouldn’t want
any valid date to be set in that column for my employee record. Instead, null
can be used to specify that the termination date is currently unknown.
Let’s consider another example. Suppose that we also capture
employee’s hair color when they are hired. Consider three potential entity
occurrences: a man with black hair, a woman with unknown hair color, and a bald
man. The woman with the unknown hair color and the bald man both could be
assigned as null, but for different reasons. The woman’s hair color would be
null meaning presently unknown; the bald man’s hair color could be null too, in
this case meaning not applicable. But there are probably better methods of
dealing with the bald man’s hair color.
How could you handle bald without using nulls? You could create
special values for the HairColor column that mean “bald” and “unknown.” This is
possible for a CHAR column like HairColor, but what about a DB2 DATE column?
All occurrences of a column assigned as a DATE data type are valid dates. It
might not be possible to use a special date value to mean “unknown.” This is
where using a null is most practical.
DB2 does not differentiate between nulls that signify unknown
data and those that signify inapplicable data. This distinction must be made by
the program logic of each application.
Keep in mind, though, that using null to indicate “not
applicable” can be an indication of improper database design. By properly modeling
and normalizing your data structures you can usually eliminate the need to use
nulls to indicate that a column is inapplicable for a specific row.
DB2 and Indicator VariablesDB2 represents null in a special “hidden” column known as an
indicator variable. An indicator variable is defined to DB2 for each column
that can accept nulls. The indicator variable is transparent to the end user,
but must be managed when programming in a host language (such as Java, COBOL or
PL/I).
The null indicator is used by DB2 to track whether its
associated column is null or not. A positive value or a value of 0 means the
column is not null and any actual value stored in the column is valid. If a
CHAR column is truncated on retrieval because the host variable is not large
enough, the indicator value will contain the original length of the truncated
column. A negative value indicates that the column is set to null. If the value
is -2 then the column was set to null as the result of a data conversion error.
Let’s take a moment to clear up one common misunderstanding
right here: nulls NEVER save storage space in DB2 for z/OS. Every nullable
column requires one additional byte of storage for the null indicator. So, a
CHAR(10) column that is nullable will require 11 bytes of storage per row – 10
for the data and 1 for the null indicator. This is the case regardless of
whether the column is set to null or not.
DB2 for Linux, Unix, and Windows has a compression option that
allows columns set to null to save space. Using this option causes DB2 to
eliminate the unused space from a row where columns are set to null. This
option is not available on the mainframe, though.
Every column defined to a DB2 table must be designated as either
allowing or disallowing nulls. A column is defined as nullable – meaning it can
be set to NULL – in the table creation DDL. Null is the default if nothing is
specified after the column name. To prohibit the column from being set to NULL
you must explicitly specify NOT NULL after the column name. In the following
sample table, COL1 and COL3 can be set to null, but not COL2, COL4, or COL5:
CREATE TABLE SAMPLE1
(COL1 INTEGER,
COL2 CHAR(10) NOT NULL,
COL3 CHAR(5),
COL4 DATE NOT NULL WITH DEFAULT,
COL5 TIME NOT NULL);
In SELECT statements, testing for null is accomplished
differently than testing for other “values.” You cannot specify WHERE COL =
NULL, because this does not make any sense. Remember, null is a lack of a value
so the column does not equal anything. Instead, you would
have to code WHERE COL IS [NOT] NULL. In INSERT statements NULL can be
specified in the VALUES clause to indicate that a column is to be set to NULL;
but in UPDATE statements you can use the equality predicate (=) to assign a
column to NULL. When inserting data, if the user fails to make an entry in a
column that allows nulls, DB2 supplies the NULL as the default (unless another
default value exists). If an attempt to insert NULL is made against a column
defined as NOT NULL, the statement will fail.
More Confusion
OK, now that we have discussed what nulls are and the basics of
using nulls in DB2, let’s look at how the presence of nulls can confuse SQL
developers.
The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column
occurrences set to null. The COUNT(*) function, however, does not omit columns
set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*)
when the average is being computed for a column that can contain nulls. You
must be aware of which columns can be null as well as which functions operate
on columns versus rows to make sure you code correctly!
It is important to understand, too, that for comparison in a
SELECT statement, two null columns are not considered to be equal. When a
nullable column participates in a predicate in the WHERE or HAVING clause, the
nulls that are encountered cause the comparison to evaluate to UNKNOWN. But ORDER
BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and
handle them accordingly.
Additionally, when a column set to null participates in a
calculation, the result is null. This is true even if the calculation is NULL/0
(which as every math savvy person knows should be an error). But because there
is a null in the calculation, the result will be null!
Let’s look at another statement that confuses a lot of people:
SELECT SUM(SALARY)
FROM EMP
WHERE JOBCODE > 999;
Assume that SALARY is defined as NOT NULL. Furthermore, assume
that the largest JOBCODE is 500 (that is, less than the 999 we are checking
for). What is the results of this SQL?
No rows satisfy the predicate. As such, many people say the
results should be 0. But it is not. The result is null! Yes, you can get a null
result when performing functions on a non-nullable column. The predicate
resolves to the empty set and the sum of the salaries in the empty set is null
because there are no employees, and hence no salaries. Therefore, you better
make sure that you code a null-indicator in your program when you write code
like this.
The Bottom Line
Nulls are confusing, but cannot be completely avoided in DB2.
Take the time to learn how nulls work and how they impact your coding and
development practices… or you will certainly make mistakes.
----------------------------------------------------------------------------------------------------------
Additional resources for understanding nulls and their usage: