One of the key
design issues that is addressed in most DB2 applications is how to implement
character data that varies in size from row to row. The basic trade-off is
“ease-of-use and performance” versus “storage requirements.” It is possible to
save storage space by using variable columns instead of placing small
amounts of data in a large fixed space. Each variable column carries a 2-byte
overhead, however, for storing the length of the data. Additionally, variable
columns tend to increase CPU usage and can cause the update process to become
inefficient. When a variable column is updated with a larger value, the row
becomes larger; if not enough space is available to store the row, it must be
moved to another page. This makes the update and any subsequent retrieval
slower.
Follow these
rules when defining variable character columns:
- Avoid variable columns if a sufficient disk space is available to store the data using fixed columns. The general rule is: disk is cheap! Trading the cost of disk for more efficient development and subsequent performance is often an easy decision to make.
- Though this may be obvious, do not define a variable column when the data does not vary from row to row.
- Do not define a variable columns if its maximum length is less than 30 bytes. Furthermore, do not define a variable column if its maximum length is within 10 bytes of the average length of the column. Adhering to these two rules will cause you to choose VARCHAR data types only when they can potentially provide enough DASD savings to offset other costs.
- Consider redefining variable columns by placing multiple rows of fixed length columns in another table or by shortening the columns and placing the overflow in another table.
If, after following these guidelines, VARCHAR columns need to be implemented, go ahead and do so. However, it is wise to continuously re-assess the decision to use variable character data.
So how can you monitor the effectiveness of variable columns? Well, it is possible to query the DB2 Catalog to determine the effectiveness of
using VARCHAR for a column instead of CHAR. Consider, for example, the PROJNAME
column of the DSN81010.PROJ table. It is defined as VARCHAR(24).
To gauge
whether VARCHAR is appropriate for this particular column, consider issuing the following SQL query:
SELECT COL_LGTH, COUNT(*) FROM (SELECT LENGTH(PROJNAME) AS COL_LENGTH FROM DSN81010.PROJ
) GROUP BY COL_LGTH ORDER BY COL_LGTH;
This query will
produce a report listing the lengths (in this case, from 1 to 24, excluding
those lengths which do not occur) and the number of times that each length
occurs in the table. These results can
be analyzed to determine the range of lengths stored within the variable
column.
If you are not
concerned about this level of detail, the following query can be used instead
to summarize the space characteristics of the variable column in question:
SELECT 24*COUNT(*), SUM(2+LENGTH(PROJNAME)), 24*COUNT(*)-SUM(2+LENGTH(PROJNAME)), 24, AVG(2+LENGTH(PROJNAME)), 24-AVG(2+LENGTH(PROJNAME)) FROM DSN81010.PROJ;
This
query produces a report similar to the one shown below:
SPACE SPACE TOTAL AVG. AVG. AVG. USED AS USED AS SPACE SPACE AS SPACE AS SPACE CHAR(24) VARCHAR(24) SAVED CHAR(24) VARCHAR(24) SAVED -------- ----------- ------ --------- ----------- ----- 158058 96515 61543 24 16 8
The following
list itemizes the definition for each of the individual columns calculated by
this query:
Definition
|
Calculation
|
Space
Used As CHAR(24)
|
24*COUNT(*)
|
Space
Used As VARCHAR(24)
|
SUM(2+LENGTH(PROJNAME))
|
Total
Space Saved Using VARCHAR
|
24*COUNT(*)-SUM(2+LENGTH(PROJNAME))
|
Avg.
Space Used As CHAR(24)
|
24
|
Avg.
Space Used As VARCHAR(24)
|
AVG(2+LENGTH(PROJNAME))
|
Avg.
Space Saved Using VARCHAR
|
24-AVG(2+LENGTH(PROJNAME))
|
The
query can be modified to be used for any VARCHAR-defined column. The constant
24 can be changed to indicate the maximum length of the variable column as
defined in the DDL. Using these tools, you can better judge the actual disk space savings accruing as a result of VARCHAR usage.
Synopsis
There
are sound reasons for using variable data types within DB2 databases. Yet, it
is important to remember that business conditions change and what may have been
a sound reason for using VARCHAR data in the past, may no longer be sound. Use
the queries in this article to determine if it is still appropriate to use
VARCHAR data. In general, do not use VARCHAR for small columns or for columns
whose length does not vary considerably.