Monitoring and measuring performance metrics are essential practices for maintaining and optimizing the performance of an IBM Db2 environment. By regularly monitoring and analyzing performance data, DBAs can identify bottlenecks, proactively address performance issues, and make informed decisions to improve overall system efficiency.
It is important to monitor and measure performance metrics in order to gain insights into the behavior of the applications and databases in use at your site. By examining their behavior and identifying areas for improvement, you can improve the overall satisfaction of your customers and end users.
Performance metrics provide valuable information about resource utilization, query response times, disk I/O, CPU usage, and other critical aspects of database performance. By tracking these metrics over time, administrators can detect patterns, identify trends, and pinpoint potential performance bottlenecks.
A Strategy
The first part of any Db2 performance management strategy should be
to provide a comprehensive approach to the monitoring of the Db2 subsystems
operating at your site. This approach involves monitoring not only the threads
accessing Db2 and the SQL they issue, but also the Db2 address spaces. You can
accomplish this task in three ways:
- Batch reports run against DB2 trace records. While DB2 is running, you can activate traces that accumulate information, which can be used to monitor both the performance of the DB2 subsystem and the applications being run.
- Online access to DB2 trace information and DB2 control blocks. This type of monitoring also can provide information on DB2 and its subordinate applications.
- Sampling DB2 application programs as they run and analyzing which portions of the code use the most resources.
You also need to establish a strategy for your organization's tuning goals. And your tuning goals should be set using the discipline of service level management (SLM). A service level is a measure of operational behavior. SLM ensures applications behave accordingly by applying resources to those applications based on their importance to the organization. Depending on the needs of the organization, SLM can focus on availability, performance, or both. In terms of availability, the service level can be defined as “99.95% up time, during the hours of 9:00 AM to 10:00 PM on weekdays.” Of course, a service level can be more specific, stating “average response time for transactions will be two seconds or less for workloads of strategic external clients.”
For a service
level agreement (SLA) to be successful, all of the parties involved must agree
upon stated objectives for availability and performance. The end users must be
satisfied with the performance of their applications, and the DBAs and
technicians must be content with their ability to manage the system to the
objectives. Compromise is essential to reach a useful SLA.
Furthermore, the objectives of the SLA must be defined and measurable. For example, in the earlier SLA you must define what a “strategic client” is and differentiate strategic from nonstrategic. Although doing this may be difficult, failing to do so can result in worthless SLAs that cannot be achieved.
In the end, if you do not identify service levels for each transaction, then you will always be managing to an unidentified requirement. Without a predefined and agreed upon SLA, how will the DBA and the end users know whether an application is performing adequately? Without SLAs, business users and DBAs might have different expectations, resulting in unsatisfied business executives and frustrated DBAs. Not a good situation.
Db2 Traces
One of the first types of performance metrics to consider is
monitoring based on reading trace information. You can think of a Db2 trace as
a window into the performance characteristics of aspects of your Db2 workload.
Db2 traces record diagnostic information describing particular events. As Db2
operates, it writes trace information that can be read and analyzed to obtain
performance information.
Db2 provides six types of traces, and each describes information about the Db2 environment:
- Accounting - Records
performance information about the execution of DB2 application programs
- Audit - Provides information about DB2 DDL, security, utilities, and data modification
- Global - Provides information for the
servicing of DB2
- Monitor - Records data useful for online
monitoring of the DB2 subsystem and DB2 application programs
- Performance - Collects detailed data about DB2 events, enabling database and performance analysts to pinpoint the causes of performance problems
- Statistics - Records information regarding the DB2 subsystem’s use of resources
You can start Db2 traces in two ways: by specifying the appropriate DSNZPARMs at Db2 startup
or by using the -START TRACE command to initiate specific traces when Db2 is already running.
Each trace is broken down further into classes, each of which
provides information about aspects of that trace. Additional informatiohn about the classes for each type of trace can be found here, here, and here.
You should understand what type of information is traced and the approximate overhead of each trace class before starting any of the Db2 traces. Some traces are better left off until or unless a performance problem is occurring, at which point, the trace can be started to capture details about the situation. Others are better to have turned on before problems occur. Keep in mind that you will have some trace classes (and IFCIDs) that are always started, and other that are only used in emergencies.
What are IFCIDs?
Each trace class is associated with specific trace events known as
Instrumentation Facility Component Identifier (IFCIDs), pronounced “if-kid.” An
IFCID defines a record that represents a trace event. IFCIDs are the single
smallest unit of tracing that can be invoked
by Db2.
In some cases, it can make sense to avoid activating trace classes
altogether and start traces specifying only the IFCIDs needed. This way, you
can reduce the overhead associated with tracing by recording only the trace
events needed.
There are several hundred different IFCIDs. Most IFCIDs contain data fields that describe events pertinent to the event being traced. Some IFCIDs have no data; instead they merely mark a specific time. Certain trace events of extended durations require a pair of IFCIDs: one for the beginning of the event and another for the end. These pairs enable the computation of elapsed times. Other trace events that are not as lengthy require only a single IFCID.
You can find the IFCIDs associated with each trace class in the IBM Db2Command Reference manual in the section describing the -START TRACE command. But that manual does not describe the purpose of each IFCID. A list describing each IFCID can be found in the data set named SDSNIVPD(DSNWMSGS), which is part of the Db2 installation.
Db2 Performance Monitors
Several popular performance monitoring solutions are available for
Db2 for z/OS. IBM’s Omegamon, BMC Software’s MainView, Broadcom’s Sysview,
and Rocket Software's TMON are the leading performance monitors. When
selecting a performance monitor, be sure to examine the online components as
well as the batch reporting capabilities of the monitor.
An online performance monitor is a tool that provides real-time
reporting on Db2 performance statistics as Db2 operates. In contrast, a batch
performance monitor reads previously generated trace records from an input data
set. Most performance monitors today can be used to serve both purposes.
With online DB2 performance monitors, you can perform proactive
performance management tasks. In other words, you can set up the monitor such
that when it detects a problem it alerts a DBA and possibly takes actions on
its own to resolve the problem.
The most common way to provide online performance monitoring capabilities is by online access to DB2 trace information in the MONITOR trace class. Some online DB2 performance monitors also provide direct access to Db2 performance data by reading the control blocks of the Db2 and application address spaces. This type of monitoring provides a “window” to up-to-the-minute performance statistics while Db2 runs. This information is important if a quick reaction to performance problems is required.
Most online Db2 performance monitors provide a menu-driven
interface accessible from TSO or VTAM. It enables online performance monitors
to start and stop traces as needed based on the menu options chosen by the user.
Consequently, you can reduce overhead and diminish the learning curve involved
in understanding Db2 traces and their correspondence to performance reports.
Following are some typical uses of online performance monitors.
Many online performance monitors can establish effective exception-based
monitoring. When specified performance thresholds are reached, triggers can
offer notification and take action. For example, you could set a trigger when
the number of lock suspensions for the TXN00002 plan is reached; when the
trigger is activated, a message is sent to the console, and a batch report is
generated to provide accounting detail information for the plan. You can set
any number of triggers for many thresholds. Following are some examples of thresholds you might choose to set:
- When a buffer pool threshold is reached (PREFETCH DISABLED, DEFERRED WRITE THRESHOLD, or DM CRITICAL THRESHOLD).
- For critical transactions, when predefined performance objectives are not met. For example, if TXN00001 requires subsecond response time, set a trigger to notify a DBA when the transaction receives a class 1 accounting elapsed time exceeding 1 second by more than 25%.
- Many types of thresholds can be established. Most online monitors support this capability. As such, you can customize the thresholds for the needs of your DB2 environment.
Most online performance monitors can produce real-time EXPLAINs for
long-running SQL statements. If an SQL statement takes a significant amount of
time to process, an analyst can display the SQL statement as it executes and
dynamically issue an EXPLAIN for the statement. Even as the statement executes, an
understanding of why it takes so long to run can be achieved. A complete
discussion of the EXPLAIN statement is provided in the next chapter.
Some online performance monitors provide historical trending. These monitors track performance statistics and store them in DB2 tables or in VSAM files with a timestamp. They also provide the capability to query these stores of performance data to assist in the following:
- Analyzing recent history. Most SQL statements execute quickly, making difficult the job of capturing and displaying information about the SQL statement as it executes. However, you might not want to wait until the SMF data is available to run a batch report. Quick access to recent past-performance data in these external data stores provides a type of online monitoring that is as close to real time as is usually needed.
- Determining performance trends, such as a transaction steadily increasing in its CPU consumption or elapsed time.
- Performing capacity planning based on a snapshot of the recent performance of DB2 applications.
A final benefit of online DB2 performance monitors is their capability to interface with other z/OS monitors for IMS/TM, CICS, z/OS, or VTAM. This way, an analyst gets a view of the entire spectrum of system performance. Understanding and analyzing the data from each of these monitors, however, requires a different skill. Quite often, one person cannot master all these monitors
Although this blog entry was brief, and there are many additional aspects to Db2 performance monitoring, monitoring and measuring performance metrics should be viewed as a vital requirement for all organizations using Db2 databases. By collecting and analyzing performance data, DBAs can detect performance bottlenecks, identify areas for improvement, and make informed decisions to enhance overall system efficiency.