It is impossible to overstate the importance of database design to the effectiveness and efficiency of application systems. The first
step, of course, is to create a logical data model of the business information
that must be stored in, and accessed through, the database. This is a
non-trivial task, but it is not the focus of today’s blog post, which is about implementing optimal, well-designed database schemata.
Oftentimes the database design does not get the attention it
deserves. This can occur for numerous reasons such as:
- Insufficient specifications and/or poor logical data modeling
- Not enough time in the development schedule
- Too many changes occurring throughout the development cycle
- Database design assigned to, or performed by, novices
And even when the database design is being performed by
experienced professionals with sufficient time and a solid logical model, it is
very easy for design flaws to creep into the database. This is especially the
case for larger and more complex databases required to support mission critical
applications.
Of course, the first step in constructing a physical database
should be transforming the logical design using best practices. The
transformation consists of the following:
- Transforming entities into tables
- Transforming attributes into columns
- Transforming domains into data types and constraints
- Transforming relationships into primary and foreign keys
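As a simple sketch of this transformation (the table, column, and key names below are hypothetical, invented purely for illustration), a logical EMPLOYEE entity and its relationship to a DEPARTMENT entity might become:

CREATE TABLE DEPARTMENT
  (DEPTNO    CHAR(3)      NOT NULL,
   DEPTNAME  VARCHAR(36)  NOT NULL,
   PRIMARY KEY (DEPTNO));

-- Entity       -> table
-- Attributes   -> columns
-- Domains      -> data types and a check constraint
-- Relationship -> primary key and foreign key
CREATE TABLE EMPLOYEE
  (EMPNO     INTEGER      NOT NULL,
   LASTNAME  VARCHAR(40)  NOT NULL,
   JOB_CODE  CHAR(10)     NOT NULL
       CHECK (JOB_CODE IN ('MGR', 'DEV', 'ADMIN')),
   DEPTNO    CHAR(3)      NOT NULL,
   PRIMARY KEY (EMPNO),
   FOREIGN KEY (DEPTNO) REFERENCES DEPARTMENT (DEPTNO));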
But a simple transformation will not result in a complete and
correct physical database design – it is merely the first step. And design flaws can be introduced even
during such a transformation.
The process of normalizing your data should be conducted during
the logical design phase, but sometimes mistakes are made during the logical
modeling process. In simple terms, normalization
is the process of identifying the one best place where each fact belongs. A
normalized design minimizes data redundancy and optimizes data structures by
systematically and properly placing data elements into the appropriate
groupings.
- First normal form (1NF) eliminates repeating groups and non-atomic data from an entity.
- Second normal form (2NF) ensures that every non-key attribute of an entity depends on the entire primary key (no partial dependencies).
- Third normal form (3NF) ensures that no non-key attribute of an entity depends on another non-key attribute (no transitive dependencies).
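To make the normal forms concrete, consider a hypothetical ORDER_ITEM entity (the names here are invented for illustration) that carries the product description on every order line. The description depends only on PRODUCT_ID, not on the full key, so 2NF is violated; the fix is to move that fact to its one best place:

-- Before: PRODUCT_DESC depends only on PRODUCT_ID, not on the
-- entire (ORDER_ID, PRODUCT_ID) key, so the same description
-- is stored redundantly on every order line.
CREATE TABLE ORDER_ITEM
  (ORDER_ID     INTEGER      NOT NULL,
   PRODUCT_ID   INTEGER      NOT NULL,
   PRODUCT_DESC VARCHAR(60)  NOT NULL,
   QUANTITY     INTEGER      NOT NULL,
   PRIMARY KEY (ORDER_ID, PRODUCT_ID));

-- After: the description is stored once, in PRODUCT, and
-- ORDER_ITEM keeps only attributes that depend on the full key.
CREATE TABLE PRODUCT
  (PRODUCT_ID   INTEGER      NOT NULL PRIMARY KEY,
   PRODUCT_DESC VARCHAR(60)  NOT NULL);

CREATE TABLE ORDER_ITEM
  (ORDER_ID     INTEGER      NOT NULL,
   PRODUCT_ID   INTEGER      NOT NULL REFERENCES PRODUCT,
   QUANTITY     INTEGER      NOT NULL,
   PRIMARY KEY (ORDER_ID, PRODUCT_ID));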
Although normalization is a logical process and does not necessarily dictate physical database design, it is good practice to implement
normalized physical databases – especially with today’s powerful hardware and
database systems. A normalized data model will ensure that each entity is well
formed and that each attribute is assigned to the proper entity. And
denormalized data structures should always be documented.
A database that is not normalized can be difficult to modify
(because facts may be strewn about within the database) and prone to data
quality issues (again because one fact may be in more than one place). Failing
to achieve appropriate normalization is a significant design flaw.
Design flaws can make it difficult to access a database in an
efficient manner. Something as simple as creating columns with inconsistent
data type and length across tables can introduce inefficiency. For example,
perhaps JOB_CODE is defined as CHAR(10) in one table and CHAR(15) in another.
If these columns need to be compared or joined, additional work is required by
the DBMS to make the columns comparable.
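To illustrate (again with hypothetical tables), a schema like this forces the DBMS to pad the shorter column on every comparison:

CREATE TABLE EMP
  (EMPNO    INTEGER NOT NULL,
   JOB_CODE CHAR(10));            -- CHAR(10) here...

CREATE TABLE JOB_HISTORY
  (EMPNO    INTEGER NOT NULL,
   JOB_CODE CHAR(15));            -- ...but CHAR(15) here

-- The join below works, but the DBMS must pad each CHAR(10)
-- value to 15 bytes before it can compare the columns:
SELECT E.EMPNO
  FROM EMP E, JOB_HISTORY J
 WHERE E.JOB_CODE = J.JOB_CODE;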
It is easy to see how a database design issue can make applications
inefficient no matter how adept the developers are. And this is just one type
of design flaw.
There are numerous other types of design flaws that can
negatively impact the usability and efficiency of a database implementation
(and the applications that use it). Improperly defined constraints
(referential, check, uniqueness) can cause data quality problems. Improper
indexing (to support constraints and other physical structures) can cause a
database to operate inefficiently or even cease operating altogether. In DB2
for z/OS, failing to explicitly name a clustering index causes DB2 to default the cluster sequence to the oldest index. Changing index structures, which sometimes requires dropping and recreating the index, can cause the data to be ordered in a completely different physical sequence, thereby impacting performance.
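A minimal example of explicitly designating the clustering index in DB2 for z/OS (hypothetical table and index names) so that the default-to-oldest-index behavior never comes into play:

-- The CLUSTER keyword explicitly names this index as the
-- clustering index, instead of letting DB2 default to the
-- oldest existing index on the table.
CREATE INDEX XEMP_DEPT
    ON EMP (DEPTNO ASC)
    CLUSTER;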
Some database design flaws are more subtle. What happens if you
create two incompatible check constraints? For example, consider the following:
CHECK (empno < 100)
CHECK (empno >= 101)
No data can be added to the table because no number is less than
100 and greater than or equal to 101. Of course, this is an extreme example to
highlight the problem. Another situation can occur if the default value is not
one of the values permitted by the check constraint, for example:
emp_type CHAR(8) DEFAULT 'new'
CHECK (emp_type IN ('temp',
'fulltime',
'contract')), ...
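With such a definition in place (assume the column belongs to a hypothetical EMP table), any statement that falls back on the default is doomed to fail:

-- Fails: the default value 'new' is not one of the values
-- permitted by the check constraint on emp_type.
INSERT INTO EMP (empno, emp_type)
  VALUES (123, DEFAULT);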
Cycles are another potential problem that can be created in a
physical database schema. A cycle is a referential path that connects a table
to itself. This can occur when multiple tables are related back to each other
and it looks like a loop when diagrammed. DB2 forbids a table from being
delete-connected to itself in a cycle. A table is delete-connected to another
table if it is a dependent of a table specified with a CASCADE delete rule.
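The sketch below (hypothetical tables and delete rules) creates such a cycle; with CASCADE delete rules along the entire path, TABLE_A becomes delete-connected to itself, and DB2 will reject the final ALTER:

-- A referential cycle: TABLE_A -> TABLE_B -> TABLE_C -> TABLE_A
CREATE TABLE TABLE_A
  (A_KEY INTEGER NOT NULL PRIMARY KEY,
   C_KEY INTEGER);

CREATE TABLE TABLE_B
  (B_KEY INTEGER NOT NULL PRIMARY KEY,
   A_KEY INTEGER REFERENCES TABLE_A ON DELETE CASCADE);

CREATE TABLE TABLE_C
  (C_KEY INTEGER NOT NULL PRIMARY KEY,
   B_KEY INTEGER REFERENCES TABLE_B ON DELETE CASCADE);

-- Closing the loop with another CASCADE rule would make TABLE_A
-- delete-connected to itself, so DB2 rejects this statement:
ALTER TABLE TABLE_A
  ADD FOREIGN KEY (C_KEY) REFERENCES TABLE_C ON DELETE CASCADE;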
Nested triggers can also create problems. A trigger is fired by
an INSERT, UPDATE, or DELETE operation. However, a trigger can also contain
INSERT, UPDATE, or DELETE logic within itself. Therefore, a trigger is fired by
a data modification, but can also cause another data modification, thereby
firing yet another trigger. When a trigger contains INSERT, UPDATE, and/or
DELETE logic (including MERGE statements), the trigger is said to be a nested
trigger. If referential integrity is combined with triggers, additional
cascading updates and/or deletes can occur.
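As a hypothetical sketch of trigger nesting (the tables and triggers here are invented for illustration), an update to EMP fires TRG1, whose INSERT in turn fires TRG2, one nesting level deeper:

-- Level 1: updating EMP fires TRG1, which inserts into EMP_AUDIT.
CREATE TRIGGER TRG1
  AFTER UPDATE ON EMP
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  INSERT INTO EMP_AUDIT (EMPNO, CHANGED_AT)
    VALUES (N.EMPNO, CURRENT TIMESTAMP);

-- Level 2: that INSERT fires TRG2, so the original UPDATE has
-- now nested two trigger levels deep.
CREATE TRIGGER TRG2
  AFTER INSERT ON EMP_AUDIT
  FOR EACH ROW MODE DB2SQL
  UPDATE AUDIT_STATS SET ROW_COUNT = ROW_COUNT + 1;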
DB2 restricts the number of nested triggers that can be executed
within a single firing event. If this were not done, it would be possible
to have triggers firing triggers ad infinitum until all of the data was removed
from an entire database. DB2 limits this cascading effect to 16 levels to
prevent endless looping. If more than 16 levels of nesting occur, the
transaction is aborted. Such nesting may restrict certain types of data
modification from happening at all because the number of nested calls will
always be exceeded. So nesting levels need to be controlled and managed in the
database schema to avoid problems.
And this is by no means a comprehensive list of database design
flaws that can cause real problems for application developers and end users.
Getting the database design correct is imperative if you wish to have any hope
of assuring application performance.
So how can you go about examining the correctness of your
database structures? The best approach is an automated one such as that
provided by Database Examiner, a product offered by DBE Software that performs a
comprehensive validation of a database schema. The in-depth technology incorporated
into Database Examiner allows you to review your database schema, identify
problems, and remediate the issues to ensure a quality database implementation
offering peak performance.
Simply provide Database Examiner with the data model, DDL, or a
link to the active database and it will perform a series of comprehensive diagnostics. Database
Examiner applies the rules of relational technology to detect flaws,
inconsistencies and lack of integrity. The product applies more than 50
diagnostics that can be organized by category or severity levels. And you can
customize the diagnostics by selecting those to be executed and assigning each
one a severity level.
But Database Examiner doesn’t just highlight the flaws; it also helps to fix them by providing recommendations
and corrections for each issue it detects, including the generation of SQL DDL
scripts to remedy the flaws. Even better, Database Examiner provides multi-platform DBMS
support, including DB2 for z/OS, DB2 LUW, Oracle, Microsoft SQL Server and
Sybase.
And DBE Software is currently offering folks the ability to download
and use a scaled-down “Performance Edition” of Database Examiner, free of charge for a limited time. Be sure to enter the promo code MULL on the download
request to get the best offer available.
Hopefully this blog entry has convinced you that database design
is important… important enough to take some time to evaluate the quality of
your existing database structures. And to take a look at automating the process
using Database Examiner (with promo code MULL).