Monday, July 17, 2023

Top 10 Db2 Performance Tips - No. 4 Effective Memory Configuration

The memory configuration of an IBM Db2 environment plays a critical role in determining the performance of applications accessing data, as well as the overall efficiency of the subsystem and DBMS. By properly allocating and managing memory resources, DBAs and systems programmers can significantly enhance the system's responsiveness and query execution speed.

One of the primary areas of memory configuration is the allocation of buffer pools, as well as other pools of memory. Db2 for z/OS uses memory for buffer pools, the EDM pool, RID pool and sort pools to cache data and structures in memory. The better memory is allocated to these structures, the better Db2 and applications that access it will perform. 

Buffer pools act as a cache for frequently accessed data pages, reducing disk I/O and improving query performance. Allocating an appropriate amount of memory to buffer pools is crucial to ensure that frequently accessed data remains in memory, readily available for query processing. By monitoring workload patterns and adjusting the buffer pool sizes accordingly, database administrators can optimize memory utilization and minimize disk I/O.

When allocating Db2 buffer pools, keep these rules of thumb in mind:

  • Don't allocate everything to a single buffer pool (e.g., BP0); use a multiple buffer pool strategy.
  • Explicitly specify a buffer pool for every table space and index. Don't simply let Db2 choose a default buffer pool by failing to specify one.
  • Isolate the Db2 Catalog in BP0 (and BP8K0 and BP16K0 ); put user and application Db2 objects into other buffer pools.
  • Consider separating indexes from table spaces with each in their own dedicated buffer pools.
  • Consider isolating heavily hit data into its own buffer pool to better control performance.
  • Consider isolating sorts into a single buffer pool and tuning it for mostly sequential access (e.g. BP7).
  • Consider separating Db2 objects into separate buffer pools that have been configured for sequential verses random access.

Forget about trying to follow a cookie-cutter approach to buffer pool management. Every shop must create and optimize a buffer pool strategy for its own data and application mix. DB2 offers several buffer pool tuning "knobs" that can be used to configure virtual buffer pools to the type of processing they support. The following parameters all can be changed using the ALTER BUFFERPOOL command:

DWQT – this value is the deferred write threshold; it is expressed as a percentage of the virtual buffer pool that might be occupied by unavailable pages. When this threshold is reached Db2 will start to schedule write I/Os to externalize data. The default is 30%, which is likely to be too high for most shops.

VDWQT – this value is the vertical deferred write threshold; it is basically the same as DWQT, but for individual data sets. The default is 10%, which once again is quite likely to be too high for many shops.

VPSEQT – this value is the sequential steal threshold; it is a expressed as a percentage of the virtual buffer pool that can be occupied by sequentially accessed pages. Tune buffer pools for sequential access (such as scans and sorting) by modifying VPSEQT to a larger value. The default is 80%.

VPPSEQT – this value is the sequential steal threshold for parallel operations; the default value is 50% of VPSEQT.

VPXPSEQT – this value is assisting parallel sequential threshold; it is basically the VPPSEQT for operations from another Db2 subsystem in the data sharing group.

You can tune these parameters, as well as the size of the buffer pools, to accommodate the usage patterns of your Db2 applications. 

Consider modifying the deferred write threshold parameters to enable trickle write; that means lower values that will cause changed data to be written to disk more frequently, instead of waiting for a system checkpoint to occur. 

Furthermore, think about modifying the sequential steal thresholds for the type of data being buffered; if that data is mostly sequentially accessed, then increase these thresholds… if the data is mostly randomly accessed, then decrease these thresholds. Of course, these are basic, high-level guidelines that you will need to study before adjusting them at your shop.

The PGSTEAL parameter also can be adjusted to modify the manner in which the buffer pool steals pages when new data arrives and there is no space for it. There are three options: 

  • LRU, 
  • FIFO, and 
  • NONE. 

The typical option is LRU, or least recently used. This will cause the oldest pages (in terms of when they were last accessed) to be stolen before newer pages. An alternate approach is FIFO, or first in/first out. With this approach there is no need for Db2 to monitor when the data was last accessed to determine which the least-recently pages. FIFO can reduce CPU usage (no LRU algorithm needed) and works well if the data is read once and never accessed again. The final option, NONE, is a special case to be used when a buffer pool is large enough to hold all of the data assigned to it so no page stealing is needed. When NONE is specified, Db2 will pre-load the buffer pool when the objects are opened, basically creating an in-memory area for the data. 

You can also use the PGFIX parameter to fix buffer pool pages in real storage. Doing so avoids the processing time that DB2 needs to fix and free pages for every I/O operation. This can reduce CPU for bufferpools involved in very intensive I/O applications.

Other Memory Considerations

In addition to buffer pools, Db2 uses memory for other purposes. The first we will examine is the EDM pool. EDM stands for Environmental Descriptor Manager. The EDM pool is used for caching internal structures used by Db2 programs. This includes DBDs, SKCTs, CTs, SKPTs, and PTs. It also includes the authorization cache for plans and packages, as well as the cache for dynamic SQL mini-plans.

Although it is common to refer to the EDM pool in the singular, Db2 actually breaks the EDM pool into separate pools for DBDs, for the dynamic statement cache, and for the program elements (CTs, SKCTs, PTs, SKPTs). Tuning the size of these structures to facilitate the processing requirements of your applications is crucial to ensuring optimal performance.

As a general rule of thumb, shoot for an 80 percent hit rate with the EDM pools; this means that only one out every five times should a structure need to be loaded from disk into the EDM pool.

Db2 also uses a pool to help with specific types of access paths called the RID poolThe RID pool is used by Db2 to store RIDs (record identifiers) for List Prefetch, Multiple Index Access, and Hybrid Join access paths.  RID pool failures can cause performance degradation as alternate access paths are invoked, such as scans, and the CPU invested up to the point of the failure is wasted. Not to mention that the scan usually will not perform as well as an indexed access!

Another aspect of memory configuration is sort memory. Sort operations are commonly performed during query execution, such as order by, group by, or distinct operations. Allocating sufficient memory for sort operations reduces the need for temporary disk storage, which can significantly impact query performance. It is important to allocate an appropriate amount of memory for sort operations based on the workload requirements, ensuring efficient sorting and minimizing disk I/O. Failure to provide sufficient memory for sorts can cause performance degradations can impact elapsed times dramatically and sort failures can terminate a statement.

In addition to specific memory allocations, it is important to consider the overall memory availability and system-wide settings. Ensuring that Db2 has access to an adequate amount of system memory prevents excessive swapping or paging, which can severely degrade performance. Adjusting system-wide memory parameters, such as the maximum memory target, can help fine-tune the overall memory allocation for Db2.

Regular monitoring of memory usage and performance metrics is crucial for effective memory configuration. By analyzing memory-related statistics and performance indicators, DBAs and performance analysts can identify potential bottlenecks or areas where memory resources may be over or underutilized. Proactive monitoring allows for timely adjustments to memory configuration to optimize performance.

Summing It Up

By allocating memory resources efficiently, including buffer pools, EDM pools, sort memory, and so on, you can minimize disk I/O, reduce contention, and enhance query execution speed. Regular monitoring and tuning of memory settings based on workload patterns and system-wide considerations contribute to a well-optimized Db2 environment with improved responsiveness and overall performance.

Finally, remember that tuning the memory structures of Db2 is in-depth subject that cannot be adequately covered in-depth in a blog post such as this. So, study those IBM Db2 manuals - and learn by doing. 

 



Thursday, July 13, 2023

Top 10 Db2 Performance Tips - No. 3 Efficient Indexing Strategies

Indexes play a crucial role in improving query performance when accessing data in a Db2 database. By creating indexes on specific columns, the Db2 optimizer can more quickly locate the desired data, reducing the need for full table scans and enabling faster query execution. Understanding and implementing efficient indexing strategies can greatly enhance the overall performance of a database and the applications that access it.

One of the first things you need to do is to understand the type of indexes used by Db2, which are B-tree indexes. B-tree indexes are commonly used and efficient for many situations. The general idea behind B-tree indexing is to improve the process of accessing data by making it easier to search through data faster. A B-tree stores data such that each node contains keys in ascending order. Each of these keys has two references to another two child nodes. The left side child node keys are less than the current keys, and the right side child node keys are more than the current keys. 

The first important factor in efficient indexing is carefully selecting the columns to be indexed. Not all columns require an index, and indexing too many columns can lead to unnecessary overhead. It is essential to identify the columns frequently used in WHERE clauses, JOIN conditions, or ORDER BY and create indexes on those columns. This targeted approach ensures that the indexes are focused on the most critical areas and deliver optimal query performance.

Regular index maintenance is vital for sustaining performance gains. Over time, indexes can become fragmented or outdated due to data modifications. Fragmented indexes can hinder query performance, so it is important to periodically monitor and address index fragmentation. DB2 provides utilities to reorganize or rebuild indexes, reducing fragmentation and improving query response times.

Considerations for index fragmentation include setting appropriate free space, which is the percentage of space used on each index page when data is loaded. It is important to strike a balance between minimizing space consumption and reducing index maintenance. 

Furthermore, it is crucial to monitor the impact of index usage on overall system performance. Query plans and performance monitoring tools provide insights into index usage and can identify situations where indexes are not effectively utilized. Identifying unused or underutilized indexes allows for their removal or modification, reducing storage overhead and improving overall database performance.

The following list of 10 steps can be used to ensure that you are using an effective indexing strategy for your Db2 databases and applications:

1. Index by workload, not by object

Many people make the mistake of just guessing at some indexes to create when they are creating other database objects (like tables and table spaces). But without an idea of how the tables are going to be accessed these guesses are usually wrong – at least some of them.

Indexes should be built to optimize the access of your SQL queries. To properly create an optimal set of indexes requires a list of the SQL to be used, an estimate of the frequency that each SQL statement will be executed, and the importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries most of the time be made.

If you are doing it any other way, you are doing it wrong.

2. Build indexes based on predicates

3. Index most-heavily used queries

Numbers 2 and 3 can be thought of as corollaries to Number 1… that is, these are the aspects of application workload that need to be examined to produce appropriate and effective indexes. 

Look at the predicates of the queries in the workload and work to create a set of indexes that match up to most (if not all) of them. When it is not practical to build all of the indexes you have identified, then it makes sense to look at the queries that will be used most often and optimize them first.

4. Index "important" queries

The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. 

Of course, the decision should depend on the application’s importance to the business - not just on the user’s importance. For example, if that clerk runs a query hundreds of times a day that impacts revenue, but the CIO runs his query infrequently, then perhaps the clerk's query is more important.

5. Index to avoid sorting (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses, the optimizer can use an index to avoid a sort, and thereby potentially improve performance.

6. Create indexes for uniqueness (PK, U)

Some indexes are required to make the database schema valid. For example, Db2 requires that unique indexes be created when unique and primary key constraints exist. These indexes are not optional.

7. Create indexes for foreign keys

Even if not required, creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity).

8. Consider adding columns for index-only access

Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access, all of the data needed to satisfy the query can be found in the index alone — without having to read data from the tablespace.

For example, suppose that there is an index on the DEPTNO column of the DEPT table. The following query may use this index:

    SELECT   DEPTNAME
    FROM     DEPT
    WHERE    DEPTNO >  'D00';

The index could be used to access only those columns with a DEPTNO greater than D00, but then Db2 would need to access the data in the table space to return the DEPTNAME. If you added DEPTNAME to the index, that is, create the index on (DEPTNO, DEPTNAME) then all of the data needed for this query exists in the index and additional I/O to the table space would not be needed. This technique is sometimes referred to as index overloading.

Of course, this is not always a good idea. You have to take into account whether other queries use the index and how it might negatively impact their performance.

It is also worth mentioning index include columns, which allows you to define additional (included) columns that are not part of the actual key but are available in the index. So we can create an index like this:


   CREATE INDEX IXNAME
     ON DEPT ( DEPTNO )
      INCLUDE ( DEPTNAME )
The index key is only on the DEPTNO column, but the DEPTNAME is also included in the index, so the earlier query can be satisfied using index-only access with this index.

9. Don’t arbitrarily limit the number of indexes

An example of a bad practice would be to place an artificial limit on indexing. There should be no arbitrary limit on the number of indexes that you can create for any database table. Indexes are undoubtedly one of the most important factors in creating efficient queries. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data. When a rule such as this exists, it usually is stated something like this...“Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards. 

If you already have three indexes, or five indexes, or even a dozen indexes -- and another index will improve performance -- why would you arbitrarily want to avoid creating that index? Item 10 below discusses one reason, but otherwise, the downside is only that you will need to manage the index.

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.” 

Which brings us to…

10. Be aware of I/U/D implications

Db2 must automatically maintain every index you create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when you UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

So the general rule of thumb should be "Index until it hurts... and then back off the least effective index to make it no longer hurt." Sure, that is easier said than done, but it is a valid philosophy to follow.

Summary

In conclusion, efficient indexing strategies are vital for enhancing query performance in IBM Db2. By understanding the role of indexes in query execution and following best practices such as selecting appropriate index types, carefully choosing indexed columns, addressing index fragmentation, and monitoring index usage, database administrators can optimize the database's performance. Effective indexing improves query response times, reduces resource consumption, and contributes to a DB2 environment with high performance!