
Thursday, July 06, 2023

Top 10 Db2 Performance Tips - No. 2: Optimize Database Design

A well-designed database schema forms the foundation of a high-performing and efficient IBM Db2 database... and therefore, also serves as the basic starting point for efficient Db2 applications. The importance of optimizing the database design cannot be overstated, as it directly impacts query performance, data integrity, and overall system efficiency.

The Logical Data Model

The first step toward a proper database design is the creation of a logical data model. Before implementing databases of any sort, it is imperative to first develop a sound model of the data to be used. Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach database design from a programming perspective. Because novices often lack experience with databases and data requirements gathering, they attempt to design databases like the flat files they are accustomed to using. This is a major mistake. Indeed, most developers using this approach quickly discover problems after the databases and applications become operational in a production environment. At a minimum, performance will suffer and data may not be as readily available as required. At worst, data integrity problems and/or performance problems may arise, rendering the entire application unusable.

The goal of a data model is to record the data requirements of a business process. The scope of the data model for each line of business must be comprehensive. A data model serves as a lexicon for the data needs of the business... and as a blueprint for the physical implementation of the structures of the database.

A key component of building a proper data model is to ensure proper normalization. 

Normalization

Normalization reduces data redundancy and inconsistencies by ensuring that the data elements are designed appropriately. A series of normalization rules are applied to the entities and data elements, each of which is called a “normal form.” If the data conforms to the first rule, the data model is said to be in “first normal form,” and so on.

A database design in First Normal Form (1NF) will have no repeating groups and each instance of an entity can be identified by a primary key. For Second Normal Form (2NF), instances of an entity must not depend on anything other than the primary key for that entity. Third Normal Form (3NF) removes data elements that do not depend on the primary key. If the contents of a group of data elements can apply to more than a single entity instance, those data elements belong in a separate entity.
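
To make this more concrete, here is a hedged sketch of the idea (all table and column names are hypothetical, and on Db2 for z/OS the primary keys would also need backing unique indexes). The first table repeats phone number columns and carries department data that depends on the department rather than the employee; the normalized design separates those concerns:

-- Unnormalized: repeating phone columns; DEPT_NAME depends on DEPT_CODE, not EMPNO
CREATE TABLE EMPLOYEE_UNNORM
      (EMPNO      INTEGER      NOT NULL,
       EMP_NAME   VARCHAR(40),
       PHONE1     CHAR(10),
       PHONE2     CHAR(10),
       DEPT_CODE  CHAR(3),
       DEPT_NAME  VARCHAR(36));

-- Normalized (3NF): each non-key column depends on the key of its own table
CREATE TABLE EMPLOYEE
      (EMPNO      INTEGER      NOT NULL PRIMARY KEY,
       EMP_NAME   VARCHAR(40),
       DEPT_CODE  CHAR(3));

CREATE TABLE EMPLOYEE_PHONE
      (EMPNO      INTEGER      NOT NULL,
       PHONE_NO   CHAR(10)     NOT NULL,
       PRIMARY KEY (EMPNO, PHONE_NO));

CREATE TABLE DEPT
      (DEPT_CODE  CHAR(3)      NOT NULL PRIMARY KEY,
       DEPT_NAME  VARCHAR(36));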

This is only a quick and dirty introduction to normalization; there are further normal forms not discussed here in order to keep the discussion moving along. For an introductory discussion of normalization visit http://wdvl.com/Authoring/DB/Normalization.

The bottom line is that normalization reduces data redundancy and improves data integrity by organizing data into logical entities and minimizing data duplication. By carefully analyzing the business requirements and applying normalization principles, database designers can create tables that are lean, efficient, and accurately represent the data model.

Relationships

Optimizing relationships between tables is another critical aspect of database design. Relationships, such as primary key-foreign key associations, define the logical connections between tables. This, too, should be evident in the logical data model, which is frequently depicted as an entity/relationship diagram.

Choosing appropriate indexing strategies, enforcing referential integrity, and carefully considering the cardinality and selectivity of relationships are crucial steps to ensure efficient query processing and join operations.
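
Building on the hypothetical EMPLOYEE and DEPT tables sketched above, the relationship might be implemented and supported along these lines (a sketch, not a prescription; the index names are invented):

-- A unique index backs the DEPT primary key (required on Db2 for z/OS)
CREATE UNIQUE INDEX XDEPT01 ON DEPT (DEPT_CODE);

-- The foreign key implements the relationship and its delete rule
ALTER TABLE EMPLOYEE
  ADD CONSTRAINT FK_EMP_DEPT
      FOREIGN KEY (DEPT_CODE) REFERENCES DEPT (DEPT_CODE)
      ON DELETE RESTRICT;

-- An index on the foreign key is not required, but it typically speeds
-- joins and referential constraint checking
CREATE INDEX XEMP01 ON EMPLOYEE (DEPT_CODE);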

From Logical to Physical

Assuming you have a well-designed logical data model, the next step is to transform that model into an actual physical database. Begin by creating an initial physical data model: transform the logical data model into a physical implementation based on an understanding of the DBMS being used for deployment. To successfully create a physical database design you will need a good working knowledge of the features of the DBMS, including:

  • In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects.

  • Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects.

  • Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used.

  • Knowledge of the DBMS configuration parameters that are in place.

  • Data definition language (DDL) skills to translate the physical design into actual database objects.

Armed with the correct information, you can create an effective and efficient database from a logical data model. The first step in transforming a logical data model into a physical model is to perform a simple translation from logical terms to physical objects. Of course, this simple transformation will not result in a complete and correct physical database design – it is simply the first step. The transformation consists of the following (a brief example follows the list):

  • Transforming entities into tables

  • Transforming attributes into columns

  • Transforming domains into data types and constraints
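
For example, a hypothetical Project entity might translate as follows (the entity, attribute, and table names are invented for illustration):

-- Logical model                       Physical implementation (Db2)
-- Entity:     Project            -->  table PROJECT
-- Attribute:  Project Number     -->  column PROJNO    CHAR(6)
-- Attribute:  Project Name       -->  column PROJNAME  VARCHAR(24)
-- Attribute:  Start/End Dates    -->  columns PRSTDATE, PRENDATE (DATE)
-- Domain:     Valid Departments  -->  CHAR(3) plus a referential constraint

CREATE TABLE PROJECT
      (PROJNO    CHAR(6)      NOT NULL,
       PROJNAME  VARCHAR(24)  NOT NULL,
       DEPTNO    CHAR(3)      NOT NULL,
       PRSTDATE  DATE,
       PRENDATE  DATE,
       PRIMARY KEY (PROJNO));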

Data Types

To support the mapping of attributes to table columns you will need to map each logical domain of the attribute to a physical data type and perhaps additional constraints. In a physical database, each column must be assigned a data type. 

Selecting appropriate data types is vital for optimizing database design. Choosing the right data types can have a significant impact on storage requirements, query performance, and overall system efficiency. Select data types that accurately represent the data and minimize storage overhead: use integer types instead of character types for numeric values, assure that date and time data use the appropriate date/time data types, and choose wisely among the various text and character data types for each column. Doing so helps to improve data integrity, optimize storage utilization, and speed query execution.
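
As a hypothetical illustration of the difference such choices make (table and column names are invented):

-- Inappropriate choices: numeric and date data stored as character strings
CREATE TABLE ORDER_POOR
      (ORDERNO     CHAR(10),        -- numeric value stored as characters
       ORDER_DATE  CHAR(8),         -- invalid "dates" can be stored; no date arithmetic
       AMOUNT      VARCHAR(20));    -- amounts cannot be summed without conversion

-- Better choices: the data types match the data
CREATE TABLE ORDER_BETTER
      (ORDERNO     INTEGER       NOT NULL,  -- compact storage, proper numeric comparison
       ORDER_DATE  DATE          NOT NULL,  -- only valid dates accepted; date arithmetic works
       AMOUNT      DECIMAL(11,2));          -- exact decimal arithmetic for monetary values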

Constraints

You will also need to implement appropriate constraints, such as primary keys, unique constraints, and foreign keys, which enhance both data integrity and query performance. Additional constraints, such as check constraints and nullability, enable the DBMS to enforce data integrity itself instead of leaving it to application code written at a later time. Constraints enforce data consistency rules, ensure referential integrity, and provide the optimizer with valuable information for query optimization.
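
Continuing the hypothetical PROJECT example from above, constraints might be layered on like this (a sketch only; on Db2 for z/OS, unique and primary key constraints also require backing unique indexes):

-- No two projects may share a name
ALTER TABLE PROJECT
  ADD CONSTRAINT UQ_PROJNAME UNIQUE (PROJNAME);

-- Every project must belong to an existing department
ALTER TABLE PROJECT
  ADD CONSTRAINT FK_PROJDEPT FOREIGN KEY (DEPTNO)
      REFERENCES DEPT (DEPT_CODE);

-- A project cannot end before it starts
ALTER TABLE PROJECT
  ADD CONSTRAINT CK_PROJDATES CHECK (PRENDATE >= PRSTDATE);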

An Iterative Process

It is worth mentioning that database design optimization is an iterative process that should consider not only the current requirements but also the future growth and scalability of the system. Regularly reviewing and revisiting the database design as the application evolves can help identify areas for improvement and ensure that the database remains optimized over time.

Finally...

In conclusion, a well-designed database schema is fundamental to achieving optimal performance for your database applications. By focusing on strategies such as normalization, relationship optimization, appropriate data types, and constraints, database designers can create a robust and efficient database environment. Optimizing the database design not only enhances query performance and data integrity but also lays the groundwork for scalability and adaptability as the system evolves.

Monday, May 05, 2014

Using Check Constraints to Simulate Domains

Check constraints are a very useful, though somewhat underutilized feature of DB2. Check constraints enable enhanced data integrity without requiring procedural logic (such as in stored procedures and triggers). Let’s examine the basics of table check constraints.
A constraint is basically a restriction placed upon the data values that can be stored in a column or columns of a table. Of course, most RDBMS products provide several different types of constraints, such as referential constraints (to define primary and foreign keys) and unique constraints (to prohibit duplicates).
Check constraints place specific data value restrictions on the contents of a column through the specification of a Boolean expression. The expression is explicitly defined in the table DDL and is formulated in much the same way that SQL WHERE clauses are formulated. Any attempt to modify the column data (i.e. during INSERT and UPDATE processing) will cause the expression to be evaluated. If the modification conforms to the Boolean expression, the modification is permitted to continue.  If not, the statement will fail with a constraint violation.

This functionality is great for simulating the relational concept of a domain. A domain is basically the set of valid values that a column or data type can take on. Check constraints only simulate domains, though, because there are other features provided by domains that are not provided by check constraints. One such feature is that columns defined on separate domains should not be comparable, or usable together in expressions that require both operands to be of the same type of data. For domains to be truly supported, the DBMS must provide both check constraints and user-defined data types with strong type checking. This prohibits ridiculous operations, such as comparing IQ to shoe size or adding Australian dollars to Euros.
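
DB2 does provide distinct types, which move in this direction. Here is a hedged sketch (the type, table, and column names are hypothetical; the WITH COMPARISONS clause is required on some DB2 platforms and implied on others):

CREATE DISTINCT TYPE AUSSIE_DOLLAR AS DECIMAL(11,2) WITH COMPARISONS;
CREATE DISTINCT TYPE EURO          AS DECIMAL(11,2) WITH COMPARISONS;

CREATE TABLE SALE
      (SALE_ID  INTEGER        NOT NULL,
       AMT_AUD  AUSSIE_DOLLAR,
       AMT_EUR  EURO);

-- The following comparison fails with a data type error because the two
-- distinct types are not directly comparable, even though both are based
-- on DECIMAL(11,2):
--   SELECT SALE_ID FROM SALE WHERE AMT_AUD > AMT_EUR;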

Forming Check Constraints

Check constraints are written using recognizable SQL syntax. This makes them easy to implement for anyone who has even a passing familiarity with SQL. The check constraint consists of two components: a constraint name and a check condition. 

The constraint name is an SQL identifier and is used to reference or identify the constraint. The same constraint name cannot be specified more than once for the same table. If a constraint name is not explicitly coded, DB2 will create a unique name automatically for the constraint. 

The check condition defines the actual constraint logic.  The check condition can be defined using any of the basic predicates (>, <, =, <>, <=, >=), as well as BETWEEN, IN, LIKE, and NULL.  Furthermore, AND and OR can be used to string conditions together.

There are, however, restrictions on how check constraints are formulated. Some of these restrictions include:
·         Limitations on the entire length of the check condition.
·         Other tables may not be accessed in the check condition.
·         Only a limited subset of SQL operations are permitted (for example subselects and column functions are prohibited in a check constraint).
·         One of the operands (usually the first) of the check constraint must be the name of a column contained in the table for which the constraint is defined.
·         The other operand (usually the second) must be either another column name in the same table or a constant value.
·         If the second operand is a constant, it must be compatible with the data type of the first operand.  If the second operand is a column, it must be the same data type as the first column specified.

Check Constraint Examples

Check constraints enable the DBA or database designer to specify more robust data integrity rules directly into the database.  Consider the following example:

CREATE TABLE EMP
      (EMPNO        INTEGER
            CONSTRAINT CHECK_EMPNO
            CHECK (EMPNO BETWEEN 100 AND 25000),
       EMP_ADDRESS  VARCHAR(70),
       EMP_TYPE     CHAR(8)
            CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),
       EMP_DEPT     CHAR(3)       NOT NULL WITH DEFAULT,
       SALARY       DECIMAL(7,2)  NOT NULL
            CONSTRAINT CHECK_SALARY
            CHECK (SALARY < 50000.00),
       COMMISSION   DECIMAL(7,2),
       BONUS        DECIMAL(7,2)
      );

The CREATE statement for the EMP table contains three different check constraints:

1.      The name of the first check constraint for the EMP table is CHECK_EMPNO.  It is defined on the EMPNO column.  The constraint ensures that the EMPNO column can contain values that range from 100 to 25000 (instead of the domain of all valid integers).
2.      The second check constraint for this table is on the EMP_TYPE column.   This is an example of an unnamed constraint.  Though this is possible, it is not recommended.  It is best to always provide an explicit constraint name in order to ease identification and administration.  This specific constraint restricts the values that can be placed into EMP_TYPE as: 'TEMP', 'FULLTIME', and 'CONTRACT';  no other values would be accepted.
3.      The last check constraint on this table is named CHECK_SALARY.  It effectively ensures that no employee can be entered with a salary of $50,000 or more. (Now who would want to work there?)

Column vs. Table Level Constraints

The first check constraint example we reviewed showed a column-level check constraint. However, check constraints also may be coded at the table-level. A column-level check constraint is defined in the DDL immediately after the column. Appropriately enough, a table-level check constraint is defined after all of the columns of the table have already been defined.

It is quite common for business rules to require access to multiple columns within a single table.  When this situation occurs, it is wise to code the business rule into a check constraint at the table-level, instead of at the column level.  Of course, any column-level check constraint can also be defined at the table-level, as well.  In terms of functionality, there is no difference between an integrity constraint defined at the table-level and the same constraint defined at the column-level.

Let’s augment our sample table DDL to add two table-level check constraints:

CREATE TABLE EMP
      (EMPNO        INTEGER
            CONSTRAINT CHECK_EMPNO
            CHECK (EMPNO BETWEEN 100 AND 25000),
       EMP_ADDRESS  VARCHAR(70),
       EMP_TYPE     CHAR(8)
            CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),
       EMP_DEPT     CHAR(3)       NOT NULL WITH DEFAULT,
       SALARY       DECIMAL(7,2)  NOT NULL
            CONSTRAINT CHECK_SALARY
            CHECK (SALARY < 50000.00),
       COMMISSION   DECIMAL(7,2),
       BONUS        DECIMAL(7,2),

            CONSTRAINT COMM_VS_SALARY
            CHECK (SALARY > COMMISSION),

            CONSTRAINT COMM_BONUS
            CHECK (COMMISSION > 0 OR BONUS > 0)
      );

The CREATE statement for the EMP table has been modified to contain two table-level check constraints having the following ramifications:

1.      The name of the first table-level check constraint for the EMP table is COMM_VS_SALARY.  This constraint will ensure that no employee can earn more commission than salary.
2.      The second table-level check constraint is named COMM_BONUS.   This constraint will ensure that every employee either earns a commission or a bonus (or possibly, both).

Check Constraint Benefits

So what are the benefits of check constraints? The primary benefit is the ability to enforce business rules directly in each database without requiring additional application logic. Once defined, the business rule is physically implemented and cannot be bypassed. Check constraints also provide the following benefits:
·         Because there is no additional programming required, DBAs can implement check constraints without involving the application programming staff. This effectively minimizes the amount of code that must be written by the programming staff.  With the significant application backlog within most organizations, this can be the most crucial reason to utilize check constraints.
·         Check constraints provide better data integrity.  As check constraints are always executed whenever the data in the column upon which they are defined is modified, the business rule is not bypassed during ad hoc processing and dynamic SQL. When business rules are enforced using application programming logic instead, the rules cannot be checked during ad hoc processes.
·         Check constraints promote consistency. Because they are implemented once, in the table DDL, each constraint is always enforced. Constraints written in application logic, on the other hand, must be executed within each program that modifies any data to which the constraint applies. This can cause code duplication and inconsistent maintenance resulting in inaccurate business rule support.
·         Typically check constraints coded in DDL will outperform the corresponding application code.

The overall impact of check constraints will be to increase application development productivity while at the same time promoting higher data integrity.

Check Constraints, NULLs, and Defaults

An additional consideration for check constraints is the relational NULL. Any nullable column also defined with a check constraint can be set to null. When the column is set to null, the check constraint evaluates to unknown.  Because null indicates the lack of a value, the presence of a null will not violate the check constraint.
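
For example, using the augmented EMP table from above, the following insert succeeds even though both COMMISSION and BONUS are null; the COMM_BONUS check condition evaluates to unknown rather than false, and only a false result rejects the modification:

INSERT INTO EMP
       (EMPNO, EMP_TYPE, EMP_DEPT, SALARY, COMMISSION, BONUS)
VALUES (500, 'FULLTIME', 'A00', 40000.00, NULL, NULL);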

Additionally, DB2 provides the ability to specify defaults for table columns – both system-defined defaults (pre-defined and automatically set by the DBMS) and user-defined defaults. When a row is inserted or loaded into the table and no value is specified for the column, the column will be set to the value that has been identified in the column default specification.  For example, we could define a default for the EMP_TYPE column of our sample EMP table as follows:

        EMP_TYPE     CHAR(8)      DEFAULT 'FULLTIME'
             CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),

If a row is inserted without specifying an EMP_TYPE, the column will default to the value 'FULLTIME'.
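
For example, the following insert omits EMP_TYPE entirely, so DB2 supplies the default value 'FULLTIME', which also happens to satisfy the check constraint on the column:

INSERT INTO EMP
       (EMPNO, EMP_DEPT, SALARY)
VALUES (750, 'B01', 32000.00);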

A problem can arise when using defaults with check constraints. Most DBMS products do not perform semantic checking on constraints and defaults. The DBMS, therefore, will allow the DBA to define defaults that contradict check constraints.  Furthermore, it is possible to define check constraints that contradict one another. Care must be taken to avoid creating this type of problem. 

Examples of contradictory constraints are depicted below:

CHECK (EMPNO > 10 AND EMPNO < 9)

In this case, no value is both greater than 10 and less than 9, so nothing could ever be inserted.

EMP_TYPE    CHAR(8)  DEFAULT 'NEW'
CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT')),

In this case, the default value is not one of the permitted EMP_TYPE values according to the defined constraint.  No defaults would ever be inserted.

CHECK (EMPNO > 10)
CHECK (EMPNO >= 11)

In this case, the constraints are redundant.  No logical harm is done, but both constraints will be checked, thereby impacting the performance of applications that modify the table in which the constraints exist.

Other potential semantic problems could occur if a constraint contradicts a referential integrity DELETE or UPDATE rule, if two constraints are defined on the same column with contradictory conditions, or if the constraint requires that the column be NULL, but the column is defined as NOT NULL.

Other Potential Hazards

Take care when using the LOAD utility on a table with check constraints defined on it. By specifying the ENFORCE NO parameter you can permit DB2 to load data that does not conform to the check constraints (as well as the referential constraints). Although this eases the load process by enabling DB2 to bypass constraint checking, it will place the table space into a check pending state. You can run CHECK DATA to clear this state (or force the check pending status off by using START with the FORCE option or the REPAIR utility). If you do not run CHECK DATA, rows that violate the constraints may remain in the table, resulting in dirty data.
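
The utility control statements involved look roughly like the following (a simplified sketch; the data set name, table space name, and additional options such as exception tables will vary by site and DB2 version):

LOAD DATA INDDN(SYSREC) RESUME YES
     ENFORCE NO
     INTO TABLE EMP

CHECK DATA TABLESPACE DBNAME.TSNAME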

Summary


Check constraints provide a very powerful vehicle for supporting business rules in the database. They can be used to simulate relational domains. Because check constraints are non-bypassable, they provide better data integrity than corresponding logic programmed into the application. It is a wise course of action to use check constraints in your database designs to support data integrity, domains, and business rules in all of your relational database applications. 

Sunday, September 01, 2013

Top Ten Common SQL Mistakes (with DB2 for z/OS)

There are many different types of development mistakes that can be made when you are coding a complex application system. But probably the most pervasive mistakes being made when developing an application using DB2 for z/OS are SQL mistakes... well, actually, that statement is probably true regardless of the DBMS, but I am going to focus on DB2 for z/OS given that this is a blog that focuses on DB2 and mainframe issues.

  1. Assuming an ORDER BY is not necessary for ordered results - if you want your results always to be returned in a specific order, you must include the ORDER BY clause. If you fail to do so, a future rebind can change the access path and possibly also change the order of your result set.
  2. Forgetting the NULL indicator - when your query can return a null, be sure to include a null indicator or you will get a -305 SQLCODE (22002 SQLSTATE). And be sure to check the null indicator to see if the data is null or not!
  3. Incorrect expectations when using the NOT IN predicate with NULLs - Nulls can be confusing when using the IN predicate. For example, what if we want to find all colors that are not assigned to a particular product, using a query like the one shown below? The problem arises when P.color can be NULL. The NULL causes the predicate to be UNKNOWN, so the result set is always empty. (A NOT EXISTS rewrite that avoids the problem is sketched after this list.)
           SELECT C.color
           FROM   Colors AS C
           WHERE  C.color NOT IN (SELECT P.color
                                  FROM   Products AS P);

  4. Coding predicates appropriately in Outer Joins - Sometimes it can be difficult to figure out how to code outer joins appropriately if you don't code them very often. Terry Purcell has written a nice article on the topic that can be found here.
  5. Not coding a cursor for a multi-row result - When more than one row can be returned by your query you must use a cursor (or specify FETCH FIRST 1 ROW ONLY).
  6. Recompiling but not binding - if you make changes to the SQL, you have to BIND it again. Otherwise it won't work!
  7. Forgetting to use single quotes around strings (instead of double quotes) - DB2 SQL expects single quotes around character strings.
  8. Trying to modify a Primary Key column - you cannot modify a primary key. A primary key should be unique within the table and immutable.
  9. Forcing dynamic SQL into static SQL (sometimes hundreds of static SQL statements) - you should analyze the type and nature of your database queries to determine whether they should be static or dynamic. Consider using the advice here (static SQL v. dynamic SQL) to guide you.
  10. Asking for more data than you need (columns and/or rows) - specify only the columns that you actually need in your SELECT-list... and use WHERE clauses (predicates) to filter the data down to just the rows that you need before bringing it into the program. The less data DB2 needs to transfer from the database to your program, the more efficient things will be!
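
Regarding number 3 above, one way to get the intended result is to rewrite the NOT IN as a NOT EXISTS, which is not tripped up by nulls in P.color (a sketch based on the earlier query):

           SELECT C.color
           FROM   Colors AS C
           WHERE  NOT EXISTS (SELECT 1
                              FROM   Products AS P
                              WHERE  P.color = C.color);

Adding WHERE P.color IS NOT NULL to the original NOT IN subquery is another common fix.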

Tuesday, May 26, 2009

Ensuring Data Integrity is a Tricky Business

The term "data integrity" can mean different things to different people and at different times. But at a high level, there really are two aspects of integrity with respect to databases: database structure integrity and semantic data integrity. Keeping track of database objects and ensuring that each object is created, formatted and maintained properly is the goal of database structure integrity. Each DBMS uses its own internal format and structure to support the databases, table spaces, tables, and indexes under its control. System and application errors at times can cause faults within these internal structures. The DBA must identify and correct such faults before insurmountable problems occur. Semantic data integrity refers to the meaning of data and relationships that need to be maintained between different types of data. The DBMS provides options, controls and procedures to define and assure the semantic integrity of the data stored within its databases.

Structural database integrity and consistency is critical in the ongoing administration of databases. If the structural integrity of the database is not sound, everything else will be suspect, too. There are multiple types of structural problems that can occur. Indexing problems are one. Certain types of database maintenance can cause such problems, and DBAs need to be able to recognize the problem and rebuild the indexes to correct their structural integrity. Indexes are not the only database objects that utilize pointers. Many DBMSs use pointers to store very large objects containing text and image data. These can become corrupted. In today's modern database systems, structural integrity problems are rare -- much rarer than they used to be.

The more difficult and more pervasive problem is assuring the semantic integrity of the data. Getting that right requires proper design, processes that match your business requirements, good communication skills, and constant vigilance.

Perhaps the number one cause of data integrity problems is improperly designed databases. Just getting the data type and length correct for each column can go a long way to making sure the right data is stored. Think about it. If you need to store dates but the column is defined as CHAR(8) how can you enforce that only valid dates are stored? You would need to code program logic to accomplish that. But if the column is defined as DATE then the DBMS would take care of it -- and more of the data would be likely to be correct.

The DBA must also set up data relationships properly in the database. This is done using referential integrity (RI), a method for ensuring the "correctness" of data within a DBMS. People tend to over-simplify RI, stating that it is merely the identification of relationships between tables. It is actually much more than this. Of course, the identification of the primary and foreign keys that constitute a relationship between tables is a component of defining referential integrity. Basically, RI guarantees that an acceptable value is always in the foreign key column, where acceptable means a value that exists in the corresponding primary key column (or, perhaps, null).

The combination of the relationship and the rules attached to that relationship is referred to as a referential constraint. The rules that accompany the RI definition are just as important as the relationship. These rules define how data is to be properly added to the databases and what happens when it is removed.
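
For example, the delete rule attached to a relationship determines what happens to dependent rows when a parent row is deleted. A hypothetical sketch (the table and column names are invented):

ALTER TABLE EMP
  ADD CONSTRAINT EMPDEPT
      FOREIGN KEY (DEPTNO) REFERENCES DEPT
      ON DELETE RESTRICT;

-- ON DELETE RESTRICT rejects deleting a department that still has employees;
-- ON DELETE CASCADE would delete those employees along with the department;
-- ON DELETE SET NULL would keep the employees but set their DEPTNO to null.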

There are other mechanisms in the DBMS that DBAs can use to enforce semantic data integrity. Check constraints and rules that dictate valid values can be applied to columns; the DBMS will reject invalid data that does not conform to the constraints. More complex rules, such as those that reference data in other tables, can be enforced using database triggers.
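
For instance, a trigger can enforce a rule that references data in another table, something a simple check constraint cannot do. A hedged sketch in DB2-style syntax (the tables, columns, and limits are hypothetical):

CREATE TRIGGER SALARY_LIMIT_CHK
   NO CASCADE BEFORE INSERT ON EMP
   REFERENCING NEW AS N
   FOR EACH ROW MODE DB2SQL
   WHEN (N.SALARY > (SELECT MAX_SALARY
                     FROM   SALARY_LIMIT
                     WHERE  EMP_TYPE = N.EMP_TYPE))
      SIGNAL SQLSTATE '75001' ('SALARY EXCEEDS THE LIMIT FOR THIS EMPLOYEE TYPE');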

Every DBA should take advantage of the mechanisms provided by the DBMS to ensure data integrity. When DBMS-provided methods are used, fewer data integrity problems are likely to be found. Fewer data integrity problems mean higher quality databases and more proficient end users. You have to know what integrity rules are proper for the DBMS to enforce. But once defined, many of those rules can be enforced by the DBMS.

And that is very good, indeed!