The Db2 Portal Blog: July 2023

Monday, July 24, 2023

Top 10 Db2 Performance Tips - No. 5 Workload Management and Query Optimization

Managing workloads and optimizing queries are essential tasks for maximizing the performance of Db2 databases and the applications that access them. By employing effective techniques, DBAs and performance analysts can streamline query execution, reduce resource contention, and enhance overall system efficiency.

The application code itself must be designed appropriately and monitored for efficiency. In fact, many experts agree that as much as 70 to 80 percent of performance problems are caused by improperly coded database applications. SQL is the primary culprit. Coding efficient SQL statements can be complicated. Developers need to be taught how to formulate SQL statements properly, and those statements must be constantly monitored and tuned.

Query Rewriting

One technique for workload management and query optimization is query rewriting. Query rewriting involves modifying the structure or logic of a query to achieve better performance. This can include simplifying complex expressions, reordering join operations, or breaking down a single complex query into multiple simpler queries. By understanding the underlying data model and query requirements, you can rewrite queries to optimize execution plans and improve overall performance.

With that in mind, it is important to understand that Db2 itself can perform query rewrite as part of the optimization process. The query compiler can rewrite SQL and XQuery statements into different forms to improve optimization.

Of course, you (as a developer or tuner) can still make changes to SQL to try to influence the optimizer to achieve different (and hopefully better) access paths.

SQL, by its very nature, is quite flexible. It uses a free-form structure that gives the user the ability to develop SQL statements in a way best suited to each user. Each SQL request is parsed by Db2 during compilation and optimization before it is executed to check for proper syntax and to optimize the request.

Therefore, SQL statements do not need to start in any given column and can be strung together on one line or broken apart on several lines. Any SQL request could be formulated in a number of different but functionally equivalent ways. SQL’s flexibility makes it intrinsically simple, but flexibility can complicate performance management because different but equivalent SQL formulations can result in variable performance.

When you are writing your SQL statements to access Db2 data, keep in mind that you should look at various different ways to formulate the same query to see which one performs best. For example, you might change a BETWEEN predicate to two predicates using <= and >=. This is one simple example and many different things can change an access path, so be creative and test different ways of using SQL to request the same data.
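As a minimal sketch of the BETWEEN example, the two statements below are functionally equivalent formulations of the same request. The SALARY column and the range values are hypothetical and used only for illustration; whether either form gets a different access path depends on your Db2 version, indexes, and statistics, so test both.

```sql
-- Formulation 1: a single BETWEEN predicate
-- (SALARY is a hypothetical column, not taken from the examples in this post)
SELECT LASTNAME, FIRST_NAME
FROM   EMP
WHERE  SALARY BETWEEN 50000 AND 60000;

-- Formulation 2: the same range expressed as two predicates
SELECT LASTNAME, FIRST_NAME
FROM   EMP
WHERE  SALARY >= 50000
AND    SALARY <= 60000;
```

Both return the same rows; the point of the exercise is to EXPLAIN each formulation and compare the results.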

Use EXPLAIN to Examine Access Paths

Programmers need to be schooled in the practice of examining SQL access paths. Using EXPLAIN and querying the results allows developers to see how the optimizer will satisfy each query. Will an index be used? In what order will the tables be joined? Will the query be broken up into parallel tasks or not?
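A minimal sketch of that practice, assuming Db2 for z/OS and a PLAN_TABLE that already exists under the current SQLID. The QUERYNO value of 100 is an arbitrary choice, and the query being explained reuses the EMP table from the examples later in this post.

```sql
-- Ask the optimizer to record the access path for one statement
EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT LASTNAME, FIRST_NAME
  FROM   EMP
  WHERE  JOB_CODE = 'A'
  AND    DEPTNO = 'D01';

-- Read back the rows that EXPLAIN wrote to the plan table
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY,
       PARALLELISM_MODE
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER BY QBLOCKNO, PLANNO;
```

ACCESSTYPE and ACCESSNAME indicate whether an index is used and which one, PLANNO and METHOD show the join sequence and join method, and PARALLELISM_MODE indicates whether parallelism was chosen.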

Of course, you may want to use a Visual Explain tool to look at access paths visually instead of querying complex and sometimes arcane data in plan tables. For example, IBM Data Studio offers a Visual Explain capability (as do other tools).

A few guidelines to consider:

Always bind every production program using EXPLAIN YES. This will ensure that the access path details are externalized if/when you need to review them should performance issues arise.

Ensure that application developers understand EXPLAIN and have access to plan table data when testing new code. Catching problematic access paths in test is clearly better than waiting for the problems to surface in production.

Make sure that all Db2 developers have access to tools for reviewing and examining access paths and explain information (e.g. Data Studio).

Fundamental SQL Guidelines

These and many other factors influence the efficiency of SQL. Not all application problems are due to improperly coded SQL. The host language application code in which the SQL has been embedded also can be inefficient, causing database application performance to suffer.

There are, however, three simple but important rules to follow when writing your SQL statements for performance. Of course, SQL performance is a complex topic, and understanding every nuance of how SQL performs can take a lifetime. That said, adhering to the following simple rules puts you on the right track to achieving high-performing Db2 applications.

First, always provide only the exact columns that you need to retrieve in the SELECT-list of each SQL SELECT statement. A common way of stating this is “do not use SELECT *”. The shorthand SELECT * means retrieve all columns from the table(s) being accessed. Using SELECT * may be fine for quick and dirty queries but it is a bad practice for inclusion in application programs because:

Db2 tables may need to be changed in the future to include additional columns. SELECT * in production programs will retrieve those new columns, too, and your program may not be capable of handling the additional data without requiring time-consuming changes.

Db2 consumes additional resources for every column that is requested to be returned. If the program does not need the data, it should not ask for it. Even if the program needs every column, it is better to explicitly ask for each column by name in the SQL statement for clarity and to avoid the previous pitfall.

Secondly, do not ask for what you already know. This may sound simplistic, but most programmers violate this rule at one time or another. For example, consider what is wrong with this simple query:

   SELECT LASTNAME, FIRST_NAME, JOB_CODE, DEPTNO
   FROM   EMP
   WHERE  JOB_CODE = 'A'
   AND    DEPTNO = 'D01';

Look at the SELECT-list. There are four columns specified but only two of them are needed. We know that JOB_CODE will always be A and DEPTNO will always be D01 because we told Db2 to only return those rows using the WHERE clauses. So do not ask Db2 to return that data... you already know it!
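Following that reasoning, a leaner version of the query might look like the sketch below. It assumes the program can supply the known values 'A' and 'D01' itself wherever it previously read them from the result set.

```sql
-- Only the columns whose values are not already known are requested
SELECT LASTNAME, FIRST_NAME
FROM   EMP
WHERE  JOB_CODE = 'A'
AND    DEPTNO = 'D01';
```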

Every column that Db2 must access and return to the program adds overhead. It may be a small amount of overhead, but if this statement runs many times during the day (hundreds, or even thousands, of times), that small overhead adds up to significant overhead.

And thirdly, use the WHERE clause to filter data in the SQL instead of bringing it all into your program to filter. This too is a common rookie mistake. It is much better for Db2 to filter the data before returning it to your program. This is so because Db2 requires additional I/O and CPU resources to obtain each row of data. The fewer rows passed to your program, the more efficient your SQL will be.
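The contrast can be sketched with the same EMP table. The first statement is the pattern to avoid, under the assumption that the program would then discard every row outside department D01; the second lets Db2 do the filtering.

```sql
-- Avoid: returns every row, leaving the program to test DEPTNO itself
SELECT LASTNAME, FIRST_NAME, DEPTNO
FROM   EMP;

-- Prefer: Db2 filters the rows before anything is passed to the program
SELECT LASTNAME, FIRST_NAME
FROM   EMP
WHERE  DEPTNO = 'D01';
```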

Query Hints and Tweaking

The use of query hints is another approach to query optimization. Hints provide directives to the optimizer on how to execute a specific query, influencing the choice of access paths, join strategies, or join orders. By carefully selecting and applying query hints, you can guide the optimizer's decisions and ensure optimal query execution plans. There are three types of hints:

One type of hint is to modify the query in some way to encourage (or force) the optimizer to choose a different access path. This is often called tweaking the SQL. For example, you might choose to append OR 0 = 1 to a predicate to cause Db2 to avoid using an index.
A second type of hint is to give the optimizer guidance as to the number of rows that will be returned using OPTIMIZE FOR n ROWS. In that case, instead of using the database statistics, it will use the guidance you provide.
Another type of hint, which is much better, is to explicitly use the hint capabilities of Db2 to force a particular query to use a specific, pre-determined access path.
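The first two types of hint can be sketched as follows, again using the EMP table from earlier in this post. The value of 10 rows is an arbitrary illustration, and neither statement is guaranteed to change the access path; confirm the effect with EXPLAIN. The third type, Db2's explicit hint facility, requires additional setup and is not shown here.

```sql
-- Tweak: the always-false OR 0 = 1 does not change the result,
-- but it can discourage Db2 from using an index on DEPTNO
SELECT LASTNAME, FIRST_NAME
FROM   EMP
WHERE  (DEPTNO = 'D01' OR 0 = 1);

-- Row-count guidance: tell the optimizer to favor returning
-- the first 10 rows quickly
SELECT LASTNAME, FIRST_NAME
FROM   EMP
WHERE  JOB_CODE = 'A'
ORDER BY LASTNAME
OPTIMIZE FOR 10 ROWS;
```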

In any case, it is important to use hints judiciously, as excessive or unnecessary hints may hinder the optimizer's ability to adapt to changing data or system conditions.

Additional Techniques and Tools

As part of ensuring an optimal SQL environment, it is important that DBAs first set up an effective environment that is properly implemented and administered. This includes establishing standard methods for appropriate indexing, regular statistics collection, and setting database configuration parameters appropriately to optimize query performance.

Perhaps the most important thing you can do to assure optimal performance of your database applications is to create the correct indexes for your tables. Indexing appropriately on frequently queried columns can significantly improve query execution times. Regularly collecting and updating statistics ensures that the optimizer has accurate information to make informed decisions. Optimizing database configuration parameters, such as query parallelism or memory settings, can also have a significant impact on workload management and query performance.

Managing the performance of your database applications requires in-depth monitoring. Be sure to allocate an appropriate budget to acquire performance management tools to ensure the efficiency of your database systems.

Query optimization tools are valuable assets for managing workloads and improving query performance. These tools provide insights into query execution plans, access paths, and performance statistics. They allow database administrators to analyze query performance, identify potential bottlenecks, and make informed decisions for optimization.

Moreover, workload management techniques such as query prioritization, resource allocation, and concurrency control contribute to efficient query execution. Prioritizing critical queries, allocating resources based on workload importance, and managing concurrency effectively help ensure that high-priority queries receive the necessary resources and are processed efficiently.

Summing Things Up

In conclusion, managing workloads and optimizing queries are crucial aspects of maximizing the performance of your Db2 database and applications. Techniques such as applying fundamental query development methods, query rewriting, the use of hints, and leveraging query optimization tools can significantly enhance query performance. Additionally, employing indexing strategies, collecting accurate statistics, and configuring database parameters contribute to efficient workload management. By implementing these techniques, DBAs, developers, and performance analysts can streamline query execution, improve system responsiveness, and achieve optimal performance in their Db2 environments.

Monday, July 17, 2023

Top 10 Db2 Performance Tips - No. 4 Effective Memory Configuration

The memory configuration of an IBM Db2 environment plays a critical role in determining the performance of applications accessing data, as well as the overall efficiency of the subsystem and DBMS. By properly allocating and managing memory resources, DBAs and systems programmers can significantly enhance the system's responsiveness and query execution speed.

One of the primary areas of memory configuration is the allocation of buffer pools, as well as other pools of memory. Db2 for z/OS uses memory for buffer pools, the EDM pool, RID pool and sort pools to cache data and structures in memory. The better memory is allocated to these structures, the better Db2 and applications that access it will perform.

Buffer pools act as a cache for frequently accessed data pages, reducing disk I/O and improving query performance. Allocating an appropriate amount of memory to buffer pools is crucial to ensure that frequently accessed data remains in memory, readily available for query processing. By monitoring workload patterns and adjusting the buffer pool sizes accordingly, database administrators can optimize memory utilization and minimize disk I/O.

When allocating Db2 buffer pools, keep these rules of thumb in mind:

Don't allocate everything to a single buffer pool (e.g., BP0); use a multiple buffer pool strategy.
Explicitly specify a buffer pool for every table space and index. Don't simply let Db2 choose a default buffer pool by failing to specify one.
Isolate the Db2 Catalog in BP0 (and BP8K0 and BP16K0); put user and application Db2 objects into other buffer pools.
Consider separating indexes from table spaces with each in their own dedicated buffer pools.
Consider isolating heavily hit data into its own buffer pool to better control performance.
Consider isolating sorts into a single buffer pool and tuning it for mostly sequential access (e.g. BP7).
Consider separating Db2 objects into separate buffer pools that have been configured for sequential versus random access.
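As a minimal sketch of the rules about explicit assignment and separating indexes from table spaces, the DDL below names a buffer pool for each object. It assumes Db2 for z/OS; the database, table space, and index names (HRDB, EMPTS, XEMPDEPT) and the choice of BP1 and BP2 are hypothetical, and other clauses your shop requires are omitted.

```sql
-- Table space explicitly assigned to BP1 instead of a default pool
CREATE TABLESPACE EMPTS
  IN HRDB
  BUFFERPOOL BP1;

-- Index assigned to its own pool, BP2, separate from the data
CREATE INDEX XEMPDEPT
  ON EMP (DEPTNO)
  BUFFERPOOL BP2;
```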

Forget about trying to follow a cookie-cutter approach to buffer pool management. Every shop must create and optimize a buffer pool strategy for its own data and application mix. Db2 offers several buffer pool tuning "knobs" that can be used to configure virtual buffer pools to the type of processing they support. The following parameters all can be changed using the ALTER BUFFERPOOL command:

DWQT – this value is the deferred write threshold; it is expressed as a percentage of the virtual buffer pool that might be occupied by unavailable pages. When this threshold is reached Db2 will start to schedule write I/Os to externalize data. The default is 30%, which is likely to be too high for most shops.
VDWQT – this value is the vertical deferred write threshold; it is basically the same as DWQT, but for individual data sets. The default is 10%, which once again is quite likely to be too high for many shops.
VPSEQT – this value is the sequential steal threshold; it is expressed as a percentage of the virtual buffer pool that can be occupied by sequentially accessed pages. Tune buffer pools for sequential access (such as scans and sorting) by modifying VPSEQT to a larger value. The default is 80%.
VPPSEQT – this value is the sequential steal threshold for parallel operations; the default value is 50% of VPSEQT.
VPXPSEQT – this value is the assisting parallel sequential threshold; it is basically the VPPSEQT for operations from another Db2 subsystem in the data sharing group.

You can tune these parameters, as well as the size of the buffer pools, to accommodate the usage patterns of your Db2 applications.

Consider modifying the deferred write threshold parameters to enable trickle write; that means lower values that will cause changed data to be written to disk more frequently, instead of waiting for a system checkpoint to occur.

Furthermore, think about modifying the sequential steal thresholds for the type of data being buffered; if that data is mostly sequentially accessed, then increase these thresholds… if the data is mostly randomly accessed, then decrease these thresholds. Of course, these are basic, high-level guidelines that you will need to study before adjusting them at your shop.

The PGSTEAL parameter also can be adjusted to modify the manner in which the buffer pool steals pages when new data arrives and there is no space for it. There are three options:

LRU,
FIFO, and
NONE.

The typical option is LRU, or least recently used. This will cause the oldest pages (in terms of when they were last accessed) to be stolen before newer pages. An alternate approach is FIFO, or first in/first out. With this approach there is no need for Db2 to monitor when the data was last accessed to determine which are the least recently used pages. FIFO can reduce CPU usage (no LRU algorithm needed) and works well if the data is read once and never accessed again. The final option, NONE, is a special case to be used when a buffer pool is large enough to hold all of the data assigned to it so no page stealing is needed. When NONE is specified, Db2 will pre-load the buffer pool when the objects are opened, basically creating an in-memory area for the data.

You can also use the PGFIX parameter to fix buffer pool pages in real storage. Doing so avoids the processing time that Db2 needs to fix and free pages for every I/O operation. This can reduce CPU for buffer pools involved in very I/O-intensive applications.

Other Memory Considerations

In addition to buffer pools, Db2 uses memory for other purposes. The first we will examine is the EDM pool. EDM stands for Environmental Descriptor Manager. The EDM pool is used for caching internal structures used by Db2 programs. This includes DBDs, SKCTs, CTs, SKPTs, and PTs. It also includes the authorization cache for plans and packages, as well as the cache for dynamic SQL mini-plans.

Although it is common to refer to the EDM pool in the singular, Db2 actually breaks the EDM pool into separate pools for DBDs, for the dynamic statement cache, and for the program elements (CTs, SKCTs, PTs, SKPTs). Tuning the size of these structures to facilitate the processing requirements of your applications is crucial to ensuring optimal performance.

As a general rule of thumb, shoot for an 80 percent hit rate with the EDM pools; this means that only one out of every five times should a structure need to be loaded from disk into the EDM pool.

Db2 also uses a pool to help with specific types of access paths called the RID pool. The RID pool is used by Db2 to store RIDs (record identifiers) for List Prefetch, Multiple Index Access, and Hybrid Join access paths. RID pool failures can cause performance degradation as alternate access paths are invoked, such as scans, and the CPU invested up to the point of the failure is wasted. Not to mention that the scan usually will not perform as well as an indexed access!

Another aspect of memory configuration is sort memory. Sort operations are commonly performed during query execution, such as for ORDER BY, GROUP BY, or DISTINCT operations. Allocating sufficient memory for sort operations reduces the need for temporary disk storage, which can significantly impact query performance. It is important to allocate an appropriate amount of memory for sort operations based on the workload requirements, ensuring efficient sorting and minimizing disk I/O. Failure to provide sufficient memory for sorts can degrade performance and dramatically increase elapsed times, and sort failures can terminate a statement.

In addition to specific memory allocations, it is important to consider the overall memory availability and system-wide settings. Ensuring that Db2 has access to an adequate amount of system memory prevents excessive swapping or paging, which can severely degrade performance. Adjusting system-wide memory parameters, such as the maximum memory target, can help fine-tune the overall memory allocation for Db2.

Regular monitoring of memory usage and performance metrics is crucial for effective memory configuration. By analyzing memory-related statistics and performance indicators, DBAs and performance analysts can identify potential bottlenecks or areas where memory resources may be over or underutilized. Proactive monitoring allows for timely adjustments to memory configuration to optimize performance.

Summing It Up

By allocating memory resources efficiently, including buffer pools, EDM pools, sort memory, and so on, you can minimize disk I/O, reduce contention, and enhance query execution speed. Regular monitoring and tuning of memory settings based on workload patterns and system-wide considerations contribute to a well-optimized Db2 environment with improved responsiveness and overall performance.

Finally, remember that tuning the memory structures of Db2 is an in-depth subject that cannot be adequately covered in a blog post such as this. So, study those IBM Db2 manuals - and learn by doing.

Thursday, July 13, 2023

Top 10 Db2 Performance Tips - No. 3 Efficient Indexing Strategies

Indexes play a crucial role in improving query performance when accessing data in a Db2 database. By creating indexes on specific columns, the Db2 optimizer can more quickly locate the desired data, reducing the need for full table scans and enabling faster query execution. Understanding and implementing efficient indexing strategies can greatly enhance the overall performance of a database and the applications that access it.

One of the first things you need to do is to understand the type of indexes used by Db2, which are B-tree indexes. B-tree indexes are commonly used and efficient for many situations. The general idea behind B-tree indexing is to speed up data access by making the data faster to search. A B-tree stores data such that each node contains keys in ascending order. Each key has references to two child nodes: the keys in the left child node are less than the current key, and the keys in the right child node are greater than the current key.

The first important factor in efficient indexing is carefully selecting the columns to be indexed. Not all columns require an index, and indexing too many columns can lead to unnecessary overhead. It is essential to identify the columns frequently used in WHERE clauses, JOIN conditions, or ORDER BY and create indexes on those columns. This targeted approach ensures that the indexes are focused on the most critical areas and deliver optimal query performance.

Regular index maintenance is vital for sustaining performance gains. Over time, indexes can become fragmented or outdated due to data modifications. Fragmented indexes can hinder query performance, so it is important to periodically monitor and address index fragmentation. Db2 provides utilities to reorganize or rebuild indexes, reducing fragmentation and improving query response times.

Considerations for index fragmentation include setting appropriate free space, which is the percentage of each index page left empty when the index is loaded or reorganized. It is important to strike a balance between minimizing space consumption and reducing index maintenance.

Furthermore, it is crucial to monitor the impact of index usage on overall system performance. Query plans and performance monitoring tools provide insights into index usage and can identify situations where indexes are not effectively utilized. Identifying unused or underutilized indexes allows for their removal or modification, reducing storage overhead and improving overall database performance.

The following list of 10 steps can be used to ensure that you are using an effective indexing strategy for your Db2 databases and applications:

1. Index by workload, not by object

Many people make the mistake of just guessing at some indexes to create when they are creating other database objects (like tables and table spaces). But without an idea of how the tables are going to be accessed these guesses are usually wrong – at least some of them.

Indexes should be built to optimize the access of your SQL queries. To properly create an optimal set of indexes requires a list of the SQL to be used, an estimate of the frequency that each SQL statement will be executed, and the importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries most of the time be made.

If you are doing it any other way, you are doing it wrong.

2. Build indexes based on predicates

3. Index most-heavily used queries

Numbers 2 and 3 can be thought of as corollaries to Number 1… that is, these are the aspects of application workload that need to be examined to produce appropriate and effective indexes.

Look at the predicates of the queries in the workload and work to create a set of indexes that match up to most (if not all) of them. When it is not practical to build all of the indexes you have identified, then it makes sense to look at the queries that will be used most often and optimize them first.

4. Index "important" queries

The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist.

Of course, the decision should depend on the application’s importance to the business - not just on the user’s importance. For example, if that clerk runs a query hundreds of times a day that impacts revenue, but the CIO runs his query infrequently, then perhaps the clerk's query is more important.

5. Index to avoid sorting (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses, the optimizer can use an index to avoid a sort, and thereby potentially improve performance.
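A minimal sketch of the idea, using the EMP table from the earlier query examples; the index name XEMPDEPTNAME is hypothetical. Because the query has an equality predicate on DEPTNO, the index entries for that department are already in LASTNAME order, so the optimizer may be able to satisfy the ORDER BY without a sort. Verify with EXPLAIN rather than assuming it.

```sql
-- Index whose key order matches the filter column, then the sort column
CREATE INDEX XEMPDEPTNAME
  ON EMP (DEPTNO, LASTNAME);

-- Rows for one department, requested in LASTNAME order
SELECT LASTNAME, FIRST_NAME
FROM   EMP
WHERE  DEPTNO = 'D01'
ORDER BY LASTNAME;
```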

6. Create indexes for uniqueness (PK, U)

Some indexes are required to make the database schema valid. For example, Db2 requires that unique indexes be created when unique and primary key constraints exist. These indexes are not optional.

7. Create indexes for foreign keys

Even if not required, creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity).
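As an illustration only, assume EMP.DEPTNO is a foreign key referencing DEPT.DEPTNO; the posts here do not define that constraint, and the index name XEMPFKDEPT is hypothetical. An index on the foreign key column might look like this:

```sql
-- Index on the (assumed) foreign key column of the dependent table
CREATE INDEX XEMPFKDEPT
  ON EMP (DEPTNO);

-- With the constraint in place, deleting a parent row requires Db2 to
-- look for dependent EMP rows; the index lets it do so without a scan
DELETE FROM DEPT
WHERE  DEPTNO = 'D01';
```

The same index also helps joins between EMP and DEPT on DEPTNO.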

8. Consider adding columns for index-only access

Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access, all of the data needed to satisfy the query can be found in the index alone — without having to read data from the tablespace.

For example, suppose that there is an index on the DEPTNO column of the DEPT table. The following query may use this index:

   SELECT   DEPTNAME
   FROM     DEPT
   WHERE    DEPTNO > 'D00';

The index could be used to access only those rows with a DEPTNO greater than D00, but then Db2 would need to access the data in the table space to return the DEPTNAME. If you added DEPTNAME to the index, that is, create the index on (DEPTNO, DEPTNAME) then all of the data needed for this query exists in the index and additional I/O to the table space would not be needed. This technique is sometimes referred to as index overloading.

Of course, this is not always a good idea. You have to take into account whether other queries use the index and how it might negatively impact their performance.
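The overloaded index described above could be sketched as follows; the index name XDEPTNONAME is hypothetical.

```sql
-- DEPTNAME is appended to the key so the query never touches the table space
CREATE INDEX XDEPTNONAME
  ON DEPT (DEPTNO, DEPTNAME);

-- Both the predicate column and the selected column are in the index,
-- so index-only access becomes possible
SELECT   DEPTNAME
FROM     DEPT
WHERE    DEPTNO > 'D00';
```

After running EXPLAIN on Db2 for z/OS, a value of Y in the INDEXONLY column of the plan table indicates that index-only access was chosen.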

It is also worth mentioning index include columns, which allow you to define additional (included) columns that are not part of the actual key but are available in the index. Db2 permits INCLUDE only on a unique index, so we can create an index like this:

   CREATE UNIQUE INDEX IXNAME
     ON DEPT ( DEPTNO )
     INCLUDE ( DEPTNAME )

The index key is only on the DEPTNO column, but the DEPTNAME is also included in the index, so the earlier query can be satisfied using index-only access with this index.

9. Don’t arbitrarily limit the number of indexes

An example of a bad practice would be to place an artificial limit on indexing. There should be no arbitrary limit on the number of indexes that you can create for any database table. Indexes are undoubtedly one of the most important factors in creating efficient queries. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data. When a rule such as this exists, it usually is stated something like this...“Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards.

If you already have three indexes, or five indexes, or even a dozen indexes -- and another index will improve performance -- why would you arbitrarily want to avoid creating that index? Item 10 below discusses one reason, but otherwise, the downside is only that you will need to manage the index.

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.”

Which brings us to…

10. Be aware of I/U/D implications

Db2 must automatically maintain every index you create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when you UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

So the general rule of thumb should be "Index until it hurts... and then back off the least effective index to make it no longer hurt." Sure, that is easier said than done, but it is a valid philosophy to follow.

Summary

In conclusion, efficient indexing strategies are vital for enhancing query performance in IBM Db2. By understanding the role of indexes in query execution and following best practices such as selecting appropriate index types, carefully choosing indexed columns, addressing index fragmentation, and monitoring index usage, database administrators can optimize the database's performance. Effective indexing improves query response times, reduces resource consumption, and contributes to a Db2 environment with high performance!