Showing posts with label sort. Show all posts
Showing posts with label sort. Show all posts

Monday, July 17, 2023

Top 10 Db2 Performance Tips - No. 4 Effective Memory Configuration

The memory configuration of an IBM Db2 environment plays a critical role in determining the performance of applications accessing data, as well as the overall efficiency of the subsystem and DBMS. By properly allocating and managing memory resources, DBAs and systems programmers can significantly enhance the system's responsiveness and query execution speed.

One of the primary areas of memory configuration is the allocation of buffer pools, as well as other pools of memory. Db2 for z/OS uses memory for buffer pools, the EDM pool, RID pool and sort pools to cache data and structures in memory. The better memory is allocated to these structures, the better Db2 and applications that access it will perform. 

Buffer pools act as a cache for frequently accessed data pages, reducing disk I/O and improving query performance. Allocating an appropriate amount of memory to buffer pools is crucial to ensure that frequently accessed data remains in memory, readily available for query processing. By monitoring workload patterns and adjusting the buffer pool sizes accordingly, database administrators can optimize memory utilization and minimize disk I/O.

When allocating Db2 buffer pools, keep these rules of thumb in mind:

  • Don't allocate everything to a single buffer pool (e.g., BP0); use a multiple buffer pool strategy.
  • Explicitly specify a buffer pool for every table space and index. Don't simply let Db2 choose a default buffer pool by failing to specify one.
  • Isolate the Db2 Catalog in BP0 (and BP8K0 and BP16K0 ); put user and application Db2 objects into other buffer pools.
  • Consider separating indexes from table spaces with each in their own dedicated buffer pools.
  • Consider isolating heavily hit data into its own buffer pool to better control performance.
  • Consider isolating sorts into a single buffer pool and tuning it for mostly sequential access (e.g. BP7).
  • Consider separating Db2 objects into separate buffer pools that have been configured for sequential verses random access.

Forget about trying to follow a cookie-cutter approach to buffer pool management. Every shop must create and optimize a buffer pool strategy for its own data and application mix. DB2 offers several buffer pool tuning "knobs" that can be used to configure virtual buffer pools to the type of processing they support. The following parameters all can be changed using the ALTER BUFFERPOOL command:

DWQT – this value is the deferred write threshold; it is expressed as a percentage of the virtual buffer pool that might be occupied by unavailable pages. When this threshold is reached Db2 will start to schedule write I/Os to externalize data. The default is 30%, which is likely to be too high for most shops.

VDWQT – this value is the vertical deferred write threshold; it is basically the same as DWQT, but for individual data sets. The default is 10%, which once again is quite likely to be too high for many shops.

VPSEQT – this value is the sequential steal threshold; it is a expressed as a percentage of the virtual buffer pool that can be occupied by sequentially accessed pages. Tune buffer pools for sequential access (such as scans and sorting) by modifying VPSEQT to a larger value. The default is 80%.

VPPSEQT – this value is the sequential steal threshold for parallel operations; the default value is 50% of VPSEQT.

VPXPSEQT – this value is assisting parallel sequential threshold; it is basically the VPPSEQT for operations from another Db2 subsystem in the data sharing group.

You can tune these parameters, as well as the size of the buffer pools, to accommodate the usage patterns of your Db2 applications. 

Consider modifying the deferred write threshold parameters to enable trickle write; that means lower values that will cause changed data to be written to disk more frequently, instead of waiting for a system checkpoint to occur. 

Furthermore, think about modifying the sequential steal thresholds for the type of data being buffered; if that data is mostly sequentially accessed, then increase these thresholds… if the data is mostly randomly accessed, then decrease these thresholds. Of course, these are basic, high-level guidelines that you will need to study before adjusting them at your shop.

The PGSTEAL parameter also can be adjusted to modify the manner in which the buffer pool steals pages when new data arrives and there is no space for it. There are three options: 

  • LRU, 
  • FIFO, and 
  • NONE. 

The typical option is LRU, or least recently used. This will cause the oldest pages (in terms of when they were last accessed) to be stolen before newer pages. An alternate approach is FIFO, or first in/first out. With this approach there is no need for Db2 to monitor when the data was last accessed to determine which the least-recently pages. FIFO can reduce CPU usage (no LRU algorithm needed) and works well if the data is read once and never accessed again. The final option, NONE, is a special case to be used when a buffer pool is large enough to hold all of the data assigned to it so no page stealing is needed. When NONE is specified, Db2 will pre-load the buffer pool when the objects are opened, basically creating an in-memory area for the data. 

You can also use the PGFIX parameter to fix buffer pool pages in real storage. Doing so avoids the processing time that DB2 needs to fix and free pages for every I/O operation. This can reduce CPU for bufferpools involved in very intensive I/O applications.

Other Memory Considerations

In addition to buffer pools, Db2 uses memory for other purposes. The first we will examine is the EDM pool. EDM stands for Environmental Descriptor Manager. The EDM pool is used for caching internal structures used by Db2 programs. This includes DBDs, SKCTs, CTs, SKPTs, and PTs. It also includes the authorization cache for plans and packages, as well as the cache for dynamic SQL mini-plans.

Although it is common to refer to the EDM pool in the singular, Db2 actually breaks the EDM pool into separate pools for DBDs, for the dynamic statement cache, and for the program elements (CTs, SKCTs, PTs, SKPTs). Tuning the size of these structures to facilitate the processing requirements of your applications is crucial to ensuring optimal performance.

As a general rule of thumb, shoot for an 80 percent hit rate with the EDM pools; this means that only one out every five times should a structure need to be loaded from disk into the EDM pool.

Db2 also uses a pool to help with specific types of access paths called the RID poolThe RID pool is used by Db2 to store RIDs (record identifiers) for List Prefetch, Multiple Index Access, and Hybrid Join access paths.  RID pool failures can cause performance degradation as alternate access paths are invoked, such as scans, and the CPU invested up to the point of the failure is wasted. Not to mention that the scan usually will not perform as well as an indexed access!

Another aspect of memory configuration is sort memory. Sort operations are commonly performed during query execution, such as order by, group by, or distinct operations. Allocating sufficient memory for sort operations reduces the need for temporary disk storage, which can significantly impact query performance. It is important to allocate an appropriate amount of memory for sort operations based on the workload requirements, ensuring efficient sorting and minimizing disk I/O. Failure to provide sufficient memory for sorts can cause performance degradations can impact elapsed times dramatically and sort failures can terminate a statement.

In addition to specific memory allocations, it is important to consider the overall memory availability and system-wide settings. Ensuring that Db2 has access to an adequate amount of system memory prevents excessive swapping or paging, which can severely degrade performance. Adjusting system-wide memory parameters, such as the maximum memory target, can help fine-tune the overall memory allocation for Db2.

Regular monitoring of memory usage and performance metrics is crucial for effective memory configuration. By analyzing memory-related statistics and performance indicators, DBAs and performance analysts can identify potential bottlenecks or areas where memory resources may be over or underutilized. Proactive monitoring allows for timely adjustments to memory configuration to optimize performance.

Summing It Up

By allocating memory resources efficiently, including buffer pools, EDM pools, sort memory, and so on, you can minimize disk I/O, reduce contention, and enhance query execution speed. Regular monitoring and tuning of memory settings based on workload patterns and system-wide considerations contribute to a well-optimized Db2 environment with improved responsiveness and overall performance.

Finally, remember that tuning the memory structures of Db2 is in-depth subject that cannot be adequately covered in-depth in a blog post such as this. So, study those IBM Db2 manuals - and learn by doing. 

 



Tuesday, June 10, 2014

ORDER BY an Expression

Sometimes a program requires that the results of a query be returned in a specific sequence. We all know that the ORDER BY clause can be used to sort SQL results into a specific order. For example, to return a sorted list of employee compensation sorted by last name we could write:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS
FROM   EMP
ORDER BY LASTNAME;

But what if we need to sort it by total compensation? There are two approaches that work here: position number and column renaming. Using position number the ORDER BY clause becomes:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS
FROM   EMP
ORDER BY 3;

This will cause DB2 to sort by the third element in the SELECT-list, in this case the total compensation expression. But what if we add another column at the beginning of the SELECT-list? Or what if we need to port the SQL to a different database with different standards? Well, in that case we can use column renaming:

SELECT LASTNAME, 
       FIRSTNAME, 
       SALARY+COMM+BONUS AS TOTAL_COMP
FROM   EMP
ORDER BY TOTAL_COMP;

This method is preferred for a number of reasons:

  • it will continue to work even if the SQL statement is changed
  • it gives the expression a name making it more self-documenting
  • it should be more portable

Monday, February 21, 2011

Not Your Standard Sorting Requirement

Sometimes the requirements of a particular application dictate that data needs to be sorted using some irregular collating sequence. These odd needs sometimes cause developers to sit and scratch their heads for hours, searching for ways to make DB2 do something that seems to be "unnatural." But often you can create an answer just by understanding the problem and applying some creative SQL.

At this point, some of you might be asking "What the heck is he talking about?" Fair enough. Let’s take a look at an example to bring the issue into focus.

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing an abbreviation for the name of the day on which the transaction happened; let’s call this column DAY_NAME. So, for example, the DAY_NAME column would contain MON for Monday data, and so on.

Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, the first step is usually to write the first query that comes to mind, or something like this:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY DAY_NAME;

Of course, the results will be sorted improperly. ORDER BY will sort the results alphabetically; in other words: FRI MON SAT SUN THU TUE WED

This is what I mean by an irregular sorting requirement. Here we have an example that occurs commonly enough, but without an obvious immediate solution. Furthermore, many businesses and applications have similar requirements for which the business needs dictate a different sort order than strictly alphabetical or numeric. So what is the solution here?

Of course, one solution would be to design the table with an additional numeric or alphabetic column that would sort properly. By this, I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. But this requires a database design change, and it becomes possible for the DAY_NUM and DAY_NAME to get out of sync if you are not careful.

There is another solution that is both elegant and does not require any change to the database. To implement this solution, all you need is an understanding of SQL and SQL functions -- in this case, the LOCATE function. Consider this SQL:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME);

The trick here is to understand how the LOCATE function works. The LOCATE function returns the starting position of the first occurrence of one string within another string.

So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position. So, if DAY_NAME is WED, the LOCATE function returns 10. (Note: Some other database systems have a similar function called INSTR.) Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value, given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME)/3) + 1;

Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week.

Summary

With a sound understanding of the features of DB2 SQL and a little imagination many irregular requirements are achievable using nothing more than SQL!