
Tuesday, October 17, 2023

Top Ten Db2 Performance Tips – No. 8 I/O Optimization

I/O operations play a critical role in the performance of an IBM Db2 database and the applications that access it. Efficient disk I/O can significantly improve query response times and overall system performance. Let’s explore the importance of I/O optimization and discuss techniques for maximizing disk I/O performance.

I/O performance determines how quickly data can be read from or written to disk. Slow or inefficient I/O can become a performance bottleneck, leading to increased query execution times and degraded system responsiveness.

Data Set Placement

One technique for optimizing disk I/O is strategic data placement. Placing frequently accessed data on faster storage devices, such as solid-state drives (SSDs) or high-performance disk arrays, can significantly improve I/O performance. Identifying hot spots, such as frequently accessed tables or indexes, and ensuring they are stored on optimal storage media can help minimize I/O bottlenecks and accelerate query processing.

When it comes to the placement of Db2 data sets, the most critical considerations are certainly the log data sets and the BSDS. Be sure to place the active log and BSDS data sets on the fastest disk you have at your disposal. The log can quickly become a bottleneck because all changes (with few exceptions) are written to it. Db2 writes first to log buffers and then from the buffers to an active log data set. The log is there to ensure recoverability, so using the highest-speed disk for your logs can help to minimize the potential for pain here.

You also need to take care to make sure that your logs are efficient and effective by running with dual logs. By doing so, you ensure that if one copy of the log is damaged, Db2 can continue operating with the other copy. If no log is available, Db2 operations screech to a halt. To run with dual logging, one set of log data sets should be on volumes separate from the other set. This removes a single disk failure as a potential cause of an outage.
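Dual logging is controlled by Db2 subsystem parameters. As a minimal sketch (the parameter names are real, but check the DSN6LOGP macro in your DSNTIJUZ installation job for the exact syntax in your environment):

    TWOACTV=YES    two copies of each active log data set
    TWOARCH=YES    two copies of each archive log data set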

As the active logs are filled, a process called log offloading is invoked by Db2 to move the log information to archive log data sets. This process prevents the active logs from filling up during Db2 processing, which would stifle processing. The archive logs are needed for recovery situations, such as rolling back transactions and recovering table spaces and indexes.
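You can also force an offload manually with the ARCHIVE LOG command. For illustration (the MODE(QUIESCE) option, which quiesces update activity before truncating the active log, is optional):

    -ARCHIVE LOG
    -ARCHIVE LOG MODE(QUIESCE)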

It is a wise course of action to keep your archive logs on disk and migrate them to tape over time. Think about the trade-off in saving tape versus recovery speed, and your decision should be obvious. Before backing up any logs to tape, it is a good idea to always have at least 24 hours covered by the active logs and at least 48 hours covered by archive logs on disk.

And then there is the BSDS (bootstrap data set), which is a VSAM key-sequenced data set (KSDS). The BSDS contains important information for Db2 operations. One such piece of information is the names of the logs. Db2 uses information in the BSDS for system restarts and for any activity that requires reading the log. So again, keeping the BSDS on high-speed disk further boosts the operational performance of Db2.
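To check what Db2 currently knows about its logs, you can issue the DISPLAY LOG command, which reports the status of the active logs and the offload task; for a full report of the BSDS contents, including the complete log inventory, run the print log map utility (DSNJU004). For example:

    -DIS LOG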

Of course, there are other important Db2 data sets that you will want to carefully consider placing on optimal I/O devices. For example, the Db2 Directory (DSNDB01) and Db2 Catalog (DSNDB06) data sets are frequently accessed and should be placed with care to ensure the performance of the entire Db2 subsystem. And you may have some user database objects that are critically important or frequently accessed that you can place on faster disk than your other user database objects. This, of course, requires an understanding of your business and Db2 applications, as well as the appropriate budget to accomplish it.
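One way to steer critical objects toward your fastest volumes is with storage groups. Here is a minimal sketch, using hypothetical storage group, volume, VCAT, database, and table space names (with SMS in the picture, you would typically specify VOLUMES('*') and let SMS classes handle placement instead of naming explicit volumes):

    CREATE STOGROUP FASTSG
           VOLUMES (VOL001, VOL002)
           VCAT DSNCAT;

    CREATE TABLESPACE TSHOT
           IN DBPROD
           USING STOGROUP FASTSG
           PRIQTY 7200;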

The overall goal of data set placement should be to balance I/O activity between different volumes, control units, and channels. This can minimize I/O elapsed time and I/O queuing.

RAID

RAID (Redundant Array of Independent Disks) configurations also play a significant role in I/O optimization. RAID levels, such as RAID 0, RAID 1, RAID 5, or RAID 10, offer different levels of performance, fault tolerance, and data protection. Configuring RAID appropriately based on workload requirements and system resources can improve I/O performance and provide the desired level of data redundancy. RAID striping, in particular, can distribute data across multiple disks, enabling parallel I/O operations and enhancing throughput.

What About Extents?

Some folks think “With RAID/modern storage devices and the latest Db2 and z/OS features, extents are no longer anything to worry about.” And this is a largely true statement, but there are some things you still should think about with regard to extents.

First, keep in mind that the latest extent management features only work with SMS-managed data sets, so if you are still using user-managed data sets then all of the old rules apply! For SMS-managed data sets you can have up to 123 extents on each of 59 volumes. So, as of z/OS 1.7, the limit is 7,257 extents for a data set instead of the 255 we had been used to for some time. Again, though, enabling this requires DFSMS.

Extent consolidation also requires SMS-managed STOGROUPs. If a new extent is adjacent to an old extent, they will be merged together automatically. This can result in some extents being larger than the PRIQTY or SECQTY specification(s). Note that this feature was introduced in z/OS 1.5.

So, what if everything is SMS-controlled? Even then it is possible for extents to impact performance, albeit probably only slightly. Each extent on a disk file has different control blocks controlling access, which means that elapsed time can increase if there is heavy insert activity. For other types of processing (read and update), the number of extents really does not impact performance.

Another thing to consider is that Db2 can allocate sliding scale secondary extents. This is enabled by setting the MGEXTSZ DSNZPARM to YES. Note that the default is YES as of DB2 9 for z/OS. With sliding scale extents, the sizes of the extents that Db2 allocates gradually increase. Db2 uses a sliding scale for secondary extent allocations of table spaces and indexes when:

  • You do not specify a value for the SECQTY option of a CREATE TABLESPACE or CREATE INDEX statement
  • You specify a value of -1 for the SECQTY option of an ALTER TABLESPACE or ALTER INDEX statement.

Otherwise, Db2 uses an explicitly specified SECQTY value for secondary extent allocations if that value is larger than the value derived from the sliding scale algorithm. If the table space or index space has a SECQTY greater than 0, the primary space allocation of each subsequent data set is the larger of the SECQTY setting and the value derived from the sliding scale algorithm. Without going into all of the gory details, sliding scale extent allocation can help to reduce the number of extents for your Db2 objects as they grow in size over time. And it can help when you do not have a firm understanding of how your data will grow over time.
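So, to let the sliding scale do its job, simply omit SECQTY when creating an object, or set it to -1 for an existing one. A quick sketch with hypothetical names:

    CREATE TABLESPACE TSGROW
           IN DBPROD
           USING STOGROUP FASTSG
           PRIQTY 720;                    -- no SECQTY, so the sliding scale applies

    ALTER TABLESPACE DBPROD.TSGROW
          SECQTY -1;                      -- switch an existing object to the sliding scale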

At any rate, things are not like the olden days where you had to regularly monitor extents and clean them up all the time by reorganizing your table spaces and index spaces.

Additional I/O Considerations

Tuning I/O parallelism settings, such as the number of concurrent I/O operations or the I/O thread configuration, can also enhance I/O performance by allowing simultaneous access to multiple disks or storage devices.
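In Db2 for z/OS terms, query parallelism for dynamic SQL can be controlled through the CURRENT DEGREE special register (the system-wide maximum degree is governed by the PARAMDEG subsystem parameter). For example:

    SET CURRENT DEGREE = 'ANY';    -- let Db2 choose a degree of parallelism
    SET CURRENT DEGREE = '1';      -- disable parallelism for this connection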

Regular monitoring of I/O performance metrics, such as disk read/write rates, queue lengths, and average response times, is essential for identifying potential I/O bottlenecks and fine-tuning the I/O subsystem. Analyzing performance data and workload patterns can help identify areas for improvement and guide decision-making regarding data placement, file system selection, RAID configuration, and other I/O optimization techniques.
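From within Db2, the DISPLAY BUFFERPOOL command offers a quick view of read and write activity; a performance monitor or the Db2 statistics trace provides far more detail. An illustrative command:

    -DIS BUFFERPOOL(ACTIVE) DETAIL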

Conclusion

In conclusion, optimizing disk I/O is vital for improving the performance of Db2 databases and the applications that access them. By strategically placing data, properly configuring your logs and BSDS, configuring RAID appropriately, tuning I/O parallelism settings, and monitoring regularly, you can enhance I/O performance, reduce latency, and accelerate query processing.

Friday, July 08, 2016

Good Old Partitioned Table Space Confusion

Partitioned table spaces have been with us in the world of DB2 forever, but even as we embark on moving to Universal table spaces, there still exists some confusion "out there" in terms of the behavior of partitioning... 

For example, consider this question:

If a table is in a partitioned table space, say four partitions, for example, then to process the table in batch can I run four instances of the batch program in parallel, one against each partition?

This type of question exhibits some confusion about how partitioned table spaces work. First of all, let's assume that we are talking about classic partitioned table spaces and/or Universal range-partitioned table spaces. 

Given that assumption, yes, you can run four instances of a batch program in parallel if you so desire. Of course, that is not necessary to get DB2 to use parallel tasks to read the data. The best approach is to BIND the program (package) specifying the DEGREE(ANY) parameter. Then DB2 will choose the degree of parallelism for the batch program. This is simpler than any alternative approach you might choose to code up because DB2 handles it all for you.
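A sketch of what such a BIND might look like, using a hypothetical collection and package name:

    BIND PACKAGE(BATCHCOLL) MEMBER(BATCHPGM) DEGREE(ANY)

For the run-multiple-instances approach discussed next, you would instead specify DEGREE(1).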

If you still wish to run four instances of the batch program you, of course, can. In that case you would probably want to BIND using DEGREE(1). In order for this to work the way I infer that you intend it to work, however, you may have to modify the program somewhat. I assume that you wish each job to process only against one of the four partitions. To accomplish this, you must provide some way for the program to identify and process only the data from one of the four partitions based on the partitioning key range specified in the partitioning index. 

For example, you might choose to input parameters to the batch run specifying the key range for that program to process. As long as the program adheres to that key range you should only process data from the one partition that holds that data.

Of course, that means that your programmers must be diligent in applying the key range. Perhaps you could set up four views that apply the key range for each partition and then use the appropriate view in each program. But that means you will have multiple versions of the same program. Alternatively, you could specify the range as an input variable and use only one program, but then you must make sure that the proper range is passed in for each run.
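For example, the view for the first partition might look something like this (the table name, column name, and boundary values are hypothetical):

    CREATE VIEW VACCT_PART1 AS
           SELECT *
           FROM   ACCT_TABLE
           WHERE  ACCT_NO BETWEEN '0000000' AND '2499999';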

So just go with the first approach and let DB2 do it for you when it makes sense...