The Db2 Portal Blog

Thursday, July 13, 2023

Top 10 Db2 Performance Tips - No. 3 Efficient Indexing Strategies

Indexes play a crucial role in improving query performance when accessing data in a Db2 database. By creating indexes on specific columns, the Db2 optimizer can more quickly locate the desired data, reducing the need for full table scans and enabling faster query execution. Understanding and implementing efficient indexing strategies can greatly enhance the overall performance of a database and the applications that access it.

One of the first things you need to do is to understand the type of indexes used by Db2, which are B-tree indexes. B-tree indexes are commonly used and efficient for many situations. The general idea behind B-tree indexing is to improve the process of accessing data by making it easier to search through data faster. A B-tree stores data such that each node contains keys in ascending order. Each of these keys has two references to another two child nodes. The left side child node keys are less than the current keys, and the right side child node keys are more than the current keys.

The first important factor in efficient indexing is carefully selecting the columns to be indexed. Not all columns require an index, and indexing too many columns can lead to unnecessary overhead. It is essential to identify the columns frequently used in WHERE clauses, JOIN conditions, or ORDER BY and create indexes on those columns. This targeted approach ensures that the indexes are focused on the most critical areas and deliver optimal query performance.

Regular index maintenance is vital for sustaining performance gains. Over time, indexes can become fragmented or outdated due to data modifications. Fragmented indexes can hinder query performance, so it is important to periodically monitor and address index fragmentation. DB2 provides utilities to reorganize or rebuild indexes, reducing fragmentation and improving query response times.

Considerations for index fragmentation include setting appropriate free space, which is the percentage of space used on each index page when data is loaded. It is important to strike a balance between minimizing space consumption and reducing index maintenance.

Furthermore, it is crucial to monitor the impact of index usage on overall system performance. Query plans and performance monitoring tools provide insights into index usage and can identify situations where indexes are not effectively utilized. Identifying unused or underutilized indexes allows for their removal or modification, reducing storage overhead and improving overall database performance.

The following list of 10 steps can be used to ensure that you are using an effective indexing strategy for your Db2 databases and applications:

1. Index by workload, not by object

Many people make the mistake of just guessing at some indexes to create when they are creating other database objects (like tables and table spaces). But without an idea of how the tables are going to be accessed these guesses are usually wrong – at least some of them.

Indexes should be built to optimize the access of your SQL queries. To properly create an optimal set of indexes requires a list of the SQL to be used, an estimate of the frequency that each SQL statement will be executed, and the importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries most of the time be made.

If you are doing it any other way, you are doing it wrong.

2. Build indexes based on predicates

3. Index most-heavily used queries

Numbers 2 and 3 can be thought of as corollaries to Number 1… that is, these are the aspects of application workload that need to be examined to produce appropriate and effective indexes.

Look at the predicates of the queries in the workload and work to create a set of indexes that match up to most (if not all) of them. When it is not practical to build all of the indexes you have identified, then it makes sense to look at the queries that will be used most often and optimize them first.

4. Index "important" queries

The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist.

Of course, the decision should depend on the application’s importance to the business - not just on the user’s importance. For example, if that clerk runs a query hundreds of times a day that impacts revenue, but the CIO runs his query infrequently, then perhaps the clerk's query is more important.

5. Index to avoid sorting (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses, the optimizer can use an index to avoid a sort, and thereby potentially improve performance.

6. Create indexes for uniqueness (PK, U)

Some indexes are required to make the database schema valid. For example, Db2 requires that unique indexes be created when unique and primary key constraints exist. These indexes are not optional.

7. Create indexes for foreign keys

Even if not required, creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity).

8. Consider adding columns for index-only access

Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access, all of the data needed to satisfy the query can be found in the index alone — without having to read data from the tablespace.

For example, suppose that there is an index on the DEPTNO column of the DEPT table. The following query may use this index:

   SELECT   DEPTNAME
FROM     DEPT
WHERE    DEPTNO > 'D00';

The index could be used to access only those columns with a DEPTNO greater than D00, but then Db2 would need to access the data in the table space to return the DEPTNAME. If you added DEPTNAME to the index, that is, create the index on (DEPTNO, DEPTNAME) then all of the data needed for this query exists in the index and additional I/O to the table space would not be needed. This technique is sometimes referred to as index overloading.

Of course, this is not always a good idea. You have to take into account whether other queries use the index and how it might negatively impact their performance.

It is also worth mentioning index include columns, which allows you to define additional (included) columns that are not part of the actual key but are available in the index. So we can create an index like this:

   CREATE INDEX IXNAME
     ON DEPT ( DEPTNO )
      INCLUDE ( DEPTNAME )

The index key is only on the DEPTNO column, but the DEPTNAME is also included in the index, so the earlier query can be satisfied using index-only access with this index.

9. Don’t arbitrarily limit the number of indexes

An example of a bad practice would be to place an artificial limit on indexing. There should be no arbitrary limit on the number of indexes that you can create for any database table. Indexes are undoubtedly one of the most important factors in creating efficient queries. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data. When a rule such as this exists, it usually is stated something like this...“Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards.

If you already have three indexes, or five indexes, or even a dozen indexes -- and another index will improve performance -- why would you arbitrarily want to avoid creating that index? Item 10 below discusses one reason, but otherwise, the downside is only that you will need to manage the index.

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.”

Which brings us to…

10. Be aware of I/U/D implications

Db2 must automatically maintain every index you create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when you UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

So the general rule of thumb should be "Index until it hurts... and then back off the least effective index to make it no longer hurt." Sure, that is easier said than done, but it is a valid philosophy to follow.

Summary

In conclusion, efficient indexing strategies are vital for enhancing query performance in IBM Db2. By understanding the role of indexes in query execution and following best practices such as selecting appropriate index types, carefully choosing indexed columns, addressing index fragmentation, and monitoring index usage, database administrators can optimize the database's performance. Effective indexing improves query response times, reduces resource consumption, and contributes to a DB2 environment with high performance!

Thursday, July 06, 2023

Top 10 Db2 Performance Tips - No. 2: Optimize Database Design

A well-designed database schema forms the foundation of a high-performing and efficient IBM Db2 database... and therefore, also serves as the basic starting point for efficient Db2 applications. The importance of optimizing the database design cannot be overstated, as it directly impacts query performance, data integrity, and overall system efficiency.

The Logical Data Model

The first step toward a proper database design is the creation of a logical data model. Before implementing databases of any sort, it is imperative to first develop a sound model of the data to be used. Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach database design from a programming perspective. Because novices often lack experience with databases and data requirements gathering, they attempt to design databases like the flat files they are accustomed to using. This is a major mistake. Indeed, most developers using this approach quickly discover problems after the databases and applications become operational in a production environment. At a minimum, performance will suffer and data may not be as readily available as required. At worst, data integrity problems and/or performance problems may arise, rendering the entire application unusable.

The goal of a data model is to record the data requirements of a business process. The scope of the data model for each line of business must be comprehensive. A data model serves as lexicon for the data needs of the business... and as a blueprint for the physical implementation of the structures of the database.

A key component of building a proper data model is to ensure proper normalization.

Normalization

Normalization reduces data redundancy and inconsistencies by ensuring that the data elements are designed appropriately. A series of normalization rules are applied to the entities and data elements, each of which is called a “normal form.” If the data conforms to the first rule, the data model is said to be in “first normal form,” and so on.

A database design in First Normal Form (1NF) will have no repeating groups and each instance of an entity can be identified by a primary key. For Second Normal Form (2NF), instances of an entity must not depend on anything other than the primary key for that entity. Third Normal Form (3NF) removes data elements that do not depend on the primary key. If the contents of a group of data elements can apply to more than a single entity instance, those data elements belong in a separate entity.

This is a quick and dirty introduction to normalization, and there are further levels of normalization not discussed here in order to keep the discussion moving along. For an introductory discussion of normalization visit http://wdvl.com/Authoring/DB/Normalization.

The bottom line is that normalization reduces data redundancy and improves data integrity by organizing data into logical entities and minimizing data duplication. By carefully analyzing the business requirements and applying normalization principles, database designers can create tables that are lean, efficient, and accurately represent the data model.

Relationships

Optimizing relationships between tables is another critical aspect of database design. Relationships, such as primary key-foreign key associations, define the logical connections between tables. This too, should be evident in the logical data model, which is frequently depicted as an entity/relationship diagram.

Choosing appropriate indexing strategies, enforcing referential integrity, and carefully considering the cardinality and selectivity of relationships are crucial steps to ensure efficient query processing and join operations.

From Logical to Physical

Assuming you have a well-designed logical data model, the first step in moving to a physical database design is the process of transforming that logical data model into an actual physical database. The first step is to create an initial physical data model by transforming the logical data model into a physical implementation based on an understanding of the DBMS being used for deployment. To successfully create a physical database design you will need to have a good working knowledge of the features of the DBMS including:

In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects.
Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects.
Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used.
Knowledge of the DBMS configuration parameters that are in place.
Data definition language (DDL) skills to translate the physical design into actual database objects.

Armed with the correct information, you can create an effective and efficient database from a logical data model. The first step in transforming a logical data model into a physical model is to perform a simple translation from logical terms to physical objects. Of course, this simple transformation will not result in a complete and correct physical database design – it is simply the first step. The transformation consists of the following:

Transforming entities into tables
Transforming attributes into columns
Transforming domains into data types and constraints

Data Types

To support the mapping of attributes to table columns you will need to map each logical domain of the attribute to a physical data type and perhaps additional constraints. In a physical database, each column must be assigned a data type.

Selecting appropriate data types is vital for optimizing database design. Choosing the right data types can have a significant impact on storage requirements, query performance, and overall system efficiency. By selecting data types that accurately represent the data and minimize storage overhead, such as using integer types instead of character types for numeric values, assuring that date and time data use appropriate data/time data types, and choosing wisely between the various text and character data types for each column helps to improve data integrity, optimize storage utilization, and improve query execution speed.

Constraints

Furthermore, you will need to implement appropriate constraints, such as primary keys, unique constraints, and foreign keys. This enhances data integrity and query performance. Furthermore, additional constraints such as check constraints and nullability enable the DBMS to better enforce data integrity, instead of leaving it to application code written at a later time. Constraints enforce data consistency rules, ensure referential integrity, and provide the optimizer with valuable information for query optimization.

An Iterative Process

It is worth mentioning that database design optimization is an iterative process that should consider not only the current requirements but also the future growth and scalability of the system. Regularly reviewing and revisiting the database design as the application evolves can help identify areas for improvement and ensure that the database remains optimized over time.

Finally...

In conclusion, a well-designed database schema is fundamental to achieving optimal performance for your database applications. By focusing on strategies such as normalization, relationship optimization, appropriate data types, and constraints, database designers can create a robust and efficient database environment. Optimizing the database design not only enhances query performance and data integrity but also lays the groundwork for scalability and adaptability as the system evolves.

Monday, June 26, 2023

Top 10 Db2 Performance Tips - No. 1: Analyze Query Execution Plans

Query execution plans play a crucial role in identifying performance bottlenecks within an IBM Db2 database. When a query is executed, the database optimizer determines the most efficient way to retrieve the requested data by generating a query execution plan. It does this by parsing SQL requests and turning them into actionable commands for accessing and modifying data.

So the query execution plan outlines the steps and operations involved in executing the query, including accessing tables, applying filters, joining data, and performing sorting or aggregations.

Understanding query execution plans is important because they reveal the inner workings of how Db2 will process the query. And they also highlight areas where performance can be improved if you understand the various ways that data can be processed by Db2. By analyzing query execution plans, you can identify potential inefficiencies and take proactive steps to optimize queries for better performance.

For example, one high-level performance tuning step that you can identify using a query execution plan is to find table (space) scans that would be better off using an index instead.

Plan Tables

But how can you review and analyze query execution plans? Perhaps the first method used by most folks is to examine the Plan Tables. A lot of information is contained in the PLAN_TABLE. The PLAN_TABLE is simply a specific table used by the EXPLAIN command to populate information about the choices made by the Db2 Optimizer when it formulates a query execution plan. The information in the PLAN_TABLE provides detailed information about the access paths chosen, join strategies employed, and other relevant details that impact query performance.

As long as EXPLAIN YES is specified when you bind your program, or you explicitly issue an EXPLAIN PLAN statement, information about the query execution plan will be placed in the PLAN_TABLE.

So, after the optimizer creates the access paths and populates the PLAN_TABLE with data representing those access paths, you will need to examine the results to determine if everything is satisfactory. Many questions can be answered by analyzing the results of EXPLAIN – questions like:

if we are joining what type of join is used (NLJ, MS, Hybrid),
was an index used, and if so how many columns matched,
are we doing a scan, and if so what type of scan (full or page range)
is prefetch being used, and if so what type (sequential, list)
was a hint used
was parallelism used, and if so what degree and type
was a sort required, and if so why (Join, Unique, Group By, Order By)
what type of locking is required

And that just covers the main PLAN_TABLE. There are actually many other explain tables available that, if created, will be populated by EXPLAIN. But that is beyond the scope of this high-level tip, which is already becoming rather lengthy.

Other Tools

There are also several additional tools and techniques that can be used to analyze Db2 query execution plans. One commonly used tool is Visual Explain, which provides a graphical representation of the query execution plan. Many tuners prefer a visual approach to revieing and tuning access paths. Visual Explain allows users to visualize the flow of data, understand the order in which operations are performed, and identify potential performance bottlenecks.

Visual Explain is a feature of the free IBM Data Studio tool, as well as the new Db2 Administration Foundation offering. There are other tools, from other vendors, that also provide a visual Explain capability.

Speaking of which, another technique to analyze query execution plans is to use database monitoring tools. Such tools capture real-time performance data, including query execution plans, and provide useful performance metrics that can be reviewed and acted upon as needed.

By monitoring the execution plans of frequently executed queries, it is possible to detect patterns, identify resource-intensive operations, and make informed decisions about query optimization.

A Bit of Guidance

It is wise to always Bind your production programs (that use static SQL) specifying EXPLAIN YES so that you can see the access paths chosen by Db2. This way, you can either check to make sure that no access paths have changed before proceeding to implement changes in production, or at least you have the information available should problems arise after you move code into production.

It can be difficult to determine if any access paths have changed, because you will need the old execution plans to compare to the new ones. For this reason, it is a good idea to keep several versions of access path information for each program.

Additionally, sometimes the program has changed between running EXPLAIN, which can make it challenging to find the exact SQL statements to compare. But you are still better off with older versions than without any historical data.

To optimize query execution plans for better performance, it is essential to focus on areas such as index usage, join strategies, and predicate selection. By strategically creating and maintaining appropriate indexes, rewriting complex queries, and refining predicate conditions, it is possible to influence the optimizer's decisions and improve query performance.

In Conclusion

Analyzing query execution plans is a vital step in optimizing the performance of IBM Db2 applications. By using tools like EXPLAIN and Visual Explain, PLAN_TABLE data, and monitoring tools, it is possible to gain valuable insights into query execution, identify potential bottlenecks, and enhance overall performance.

By proactively analyzing and optimizing query execution plans, organizations can achieve faster response times, improved scalability, and a more efficient database environment.

Be sure to check back here next week when we will address database design as a performance issue.

Wednesday, June 21, 2023

Top 10 Performance Tuning Tips for IBM Db2

Performance is a critical aspect of any database management system, and IBM Db2 is no exception. As organizations deal with ever-increasing data volumes and complex workloads, it is crucial that Db2 environments are fine-tuned to ensure optimal performance, responsiveness, and scalability. In this multi-part blog post, I will examine the top ten performance tuning tips for assuring optimal IBM Db2 applications and systems.

Whether you are an experienced database administrator or just starting your journey with Db2, these performance tuning tips will provide some practical strategies and techniques to enhance the efficiency and speed of your Db2 deployments. This series of posts will cover a range of topics, including query optimization, database design, indexing strategies, memory configuration, I/O optimization, and more.

By following these Db2 performance tips, you can significantly improve query response times, optimize resource utilization, and ensure the smooth operation of your Db2 environment. Whether you are dealing with transactional workloads, analytical queries, or a combination of both, the tips presented here will equip you with the knowledge and techniques to tackle performance bottlenecks and fine-tune Db2 to meet your specific needs.

Whether you are looking to optimize existing Db2 deployments or planning for a new implementation, this blog post will serve as a valuable resource to guide your performance tuning efforts. So, let's dive into the top ten performance tuning tips for IBM Db2 and discover how you can unlock the full potential of your database system.

Here's an outline for the upcoming posts in this series on "Top 10 Performance Tuning Tips for IBM Db2":

Tip 1: Analyze Query Execution Plans

Tip 2: Optimize Database Design

Tip 3: Efficient Indexing Strategies

Tip 4: Effective Memory Configuration

Tip 5: Workload Management and Query Optimization

Tip 6: Monitoring and Performance Metrics

Tip 7: Configuration Parameters and Tuning

Tip 8: I/O Optimization

Tip 9: Regular Maintenance and Housekeeping

Tip 10: Stay Up to Date

So be sure to keep tuning in to the Db2 Portal blog as we publish these Top 10 Performance Tips over the course of the next few weeks.

And be sure to share your own performance and tuning thoughts here... or on any of the subsequent posts in this series.

Thursday, June 01, 2023

Manage Your Test Data Before It Becomes a Bottleneck

Test data management is an important aspect of application development. Anybody who has ever been involved in software development knows that creating applications requires test data. Without the data to test your programs against, there is really no way to ensure that the programs are doing what you want them to do.

Image by Freepik

But test data management can quickly consume a lot of time and effort. Supporting multiple programmers, multiple database environments, and making sure you have sufficient test cases is easier said than done. And we are not just talking about developing new applications, but also modifying existing ones!

So, what can be done? Well, I'm going to point you to a couple of resources. First, I am delivering a webinar on this topic on June 8, 2023 titled Manage Your Test Data Before it Becomes a Bottleneck -- the same title as this blog post!

In this webinar, I discuss test data management and the trends impacting application development and testing. Some of the issues I examine include automation, shift-left testing, agile development, and DevOps. And I'll uncover the impact of these trends and how they drive up the speed and complexity of creating and testing applications. I will also define the components and requirements of test data management and examine the pitfalls that can lead test data management to become a bottleneck, slowing down your developers.

Another resource you can take a look at is the article I wrote for ELNION, titled The Test Data Management Struggles of Modern App Development.

In this article, I explore the issues impacting application developers regarding test data management, and look at how they can overcome these challenges to ensure a successful testing process.

So if test data management is a challenge for you and your organization, I urge you to take advantage of these resources by registering for and attending the webinar, and also reading the article.