The Db2 Portal Blog: SQL

Showing posts with label SQL. Show all posts

Thursday, July 25, 2024

Coding Db2 Applications for Performance - Expert Videos Series

Today's blog post is to share with my readers that I have partnered with Interskill Learning and produced a series of videos in the Expert Video Series on how to code Db2 applications for performance.

My regular readers know that application performance is a passion of mine. You may also have read my recent book on the topic, A Guide to Db2 Performance for Application Developers. But if you are looking for videos to guide you through the process optimizing your application development for Db2, look no further than the six-part series I recorded for Interskill Learning, Coding Db2 Applications for Performance.

You do not need in-depth pre-existing knowledge of Db2 to gain insight from these video lessons. The outline of the six courses are as follows:

Db2 Coding – Defining Database Performance

Providing a Definition
The Four Components
Diving a Little Deeper

Db2 Coding – Coding Relationally

What is Relational?
Relational vs. Traditional Thinking
What Does It Mean to Code Relationally?
Unlearning Past Coding Practices

Db2 Coding – General SQL and Indexing Guidelines

Types of SQL
SQL Coding Best Practices
Indexes and Performance
Stages and Clustering

Db2 Coding – Coding for Concurrent Access

Introduction to Concurrency
Locking
Locking Duration and Binding
Locking Issues and Strategies
Query Parallelism

Db2 Coding – Understanding and Reviewing Db2 Access Paths

Single Table Access Paths
Multi-table Access Paths
Filter Factors
Access Paths and EXPLAIN

Db2 Coding – SQL Coding Tips and Techniques

Avoid Writing Code
Reusable Db2 Code
Dynamic and Static SQL
SQL Guidelines
Set Operations

So if you are looking for an introduction to Db2 performance or want to brush up on the fundamentals of coding for performance, look no further. Check out this series of videos on Coding Db2 Applications for Performance from Interskill Learning (featuring yours truly)!

Note that Interskill Learning also offers other categories of training in their Expert Video series including systems programming, quantum computing, and pervasive encryption.

Tuesday, March 26, 2024

Mixing Db2 Database Administration with DevOps - Part 5: SQL Performance Testing

Although implementing database schema changes is the most important component when incorporating database structures into your DevOps pipeline, it is not the only thing to consider. It is also important to be able to analyze and optimize SQL performance within your application code.

As anyone who has written SQL knows, it is a very flexible language. There are multiple ways to write SQL queries to achieve the same results. For example, you can combine multiple tables using a join or a subselect and achieve the same results. But each SQL formulation is likely to perform differently, one better than the other. And this is but one example of the various ways you can build SQL statements to perform the same function.

The development mindset is usually to write code that matches the requirements and delivers the expected results, not necessarily to assure the best performance. Therefore, SQL performance testing should be carried out on all programs before they are migrated to a production environment. Failure to do so will likely result in poorly performing applications.

In a DevOps environment, the best approach is to measure, analyze and improve SQL statements at all stages as your code progresses from development to testing to production. The more SQL performance testing that can be accomplished by developers the earlier performance problems will be found and corrected. And that means the cost of delivering high-quality Db2 applications will decline.

However, things are not as simple as just running your program and evaluating its performance metrics. The data that you use in your test environment will not be the same as your production data. Typically, you will have less test data than you do in production. So, if you run the RUNSTATS utility on your test data you will get different statistics than in production, which means you will also get different access paths and performance results.

Setting up the test environment with production statistics and modeling the environment to mimic production is an important aspect of performance testing during development.

With the proper setup and tooling, developers can examine the access paths of their SQL statements to judge their efficiency. Of course, tools that can simplify this process are needed to speed up SQL performance testing. Such tooling should be able to capture Explain information, display it graphically and combine it with pertinent catalog statistics, store a repository of access paths by statement, compare access paths, identify changes, and make recommendations. Ideally, the tool should be integrated into the DevOps toolchain so that information is automatically captured and analyzed each time the program is compiled and bound.

Considerations should also be made for testing specific use cases for performance. For example, consider skewed data. Db2 assumes that data values are mostly uniformly distributed throughout the data. However, not all data is uniformly distributed. RUNSTATS can be used to capture information about non-uniformly distributed and skewed data.

Another performance testing consideration is to always try multiple SQL variations, especially for queries that access a lot of data or have complex access paths. Do not just find one SQL formulation that works and stick with it. Remember that you can code multiple variations of SQL statements that return the same data, but that perform quite differently.

Tools that can help set up testing for various use cases and SQL variations will be needed for integrating SQL performance testing into the DevOps toolchain. There are a wide variety of vendors and solutions for managing Db2 for z/OS SQL performance, but I am not aware of any that have been fully integrated into the DevOps toolchain.

Tuesday, August 08, 2023

Top 10 Db2 Performance Tips - No. 6 Monitoring and Performance Metrics

Monitoring and measuring performance metrics are essential practices for maintaining and optimizing the performance of an IBM Db2 environment. By regularly monitoring and analyzing performance data, DBAs can identify bottlenecks, proactively address performance issues, and make informed decisions to improve overall system efficiency.

It is important to monitor and measure performance metrics in order to gain insights into the behavior of the applications and databases in use at your site. By examining their behavior and identifying areas for improvement, you can improve the overall satisfaction of your customers and end users.

Performance metrics provide valuable information about resource utilization, query response times, disk I/O, CPU usage, and other critical aspects of database performance. By tracking these metrics over time, administrators can detect patterns, identify trends, and pinpoint potential performance bottlenecks.

A Strategy

The first part of any Db2 performance management strategy should be to provide a comprehensive approach to the monitoring of the Db2 subsystems operating at your site. This approach involves monitoring not only the threads accessing Db2 and the SQL they issue, but also the Db2 address spaces. You can accomplish this task in three ways:

Batch reports run against DB2 trace records. While DB2 is running, you can activate traces that accumulate information, which can be used to monitor both the performance of the DB2 subsystem and the applications being run.

Online access to DB2 trace information and DB2 control blocks. This type of monitoring also can provide information on DB2 and its subordinate applications.

Sampling DB2 application programs as they run and analyzing which portions of the code use the most resources.

Using all three approaches can be a reasonable approach if you have analyzed the type of workloads in use and which types of monitoring are most effective for each.
You also need to establish a strategy for your organization's tuning goals. And your tuning goals should be set using the discipline of service level management (SLM). A service level is a measure of operational behavior. SLM ensures applications behave accordingly by applying resources to those applications based on their importance to the organization. Depending on the needs of the organization, SLM can focus on availability, performance, or both. In terms of availability, the service level can be defined as “99.95% up time, during the hours of 9:00 AM to 10:00 PM on weekdays.” Of course, a service level can be more specific, stating “average response time for transactions will be two seconds or less for workloads of strategic external clients.”

For a service level agreement (SLA) to be successful, all of the parties involved must agree upon stated objectives for availability and performance. The end users must be satisfied with the performance of their applications, and the DBAs and technicians must be content with their ability to manage the system to the objectives. Compromise is essential to reach a useful SLA.

Furthermore, the objectives of the SLA must be defined and measurable. For example, in the earlier SLA you must define what a “strategic client” is and differentiate strategic from nonstrategic. Although doing this may be difficult, failing to do so can result in worthless SLAs that cannot be achieved.

In the end, if you do not identify service levels for each transaction, then you will always be managing to an unidentified requirement. Without a predefined and agreed upon SLA, how will the DBA and the end users know whether an application is performing adequately? Without SLAs, business users and DBAs might have different expectations, resulting in unsatisfied business executives and frustrated DBAs. Not a good situation.

Db2 Traces

One of the first types of performance metrics to consider is monitoring based on reading trace information. You can think of a Db2 trace as a window into the performance characteristics of aspects of your Db2 workload. Db2 traces record diagnostic information describing particular events. As Db2 operates, it writes trace information that can be read and analyzed to obtain performance information.

Db2 provides six types of traces, and each describes information about the Db2 environment:

Accounting - Records performance information about the execution of DB2 application programs
Audit - Provides information about DB2 DDL, security, utilities, and data modification
Global - Provides information for the servicing of DB2
Monitor - Records data useful for online monitoring of the DB2 subsystem and DB2 application programs
Performance - Collects detailed data about DB2 events, enabling database and performance analysts to pinpoint the causes of performance problems
Statistics - Records information regarding the DB2 subsystem’s use of resources

You can start Db2 traces in two ways: by specifying the appropriate DSNZPARMs at Db2 startup or by using the -START TRACE command to initiate specific traces when Db2 is already running.

Each trace is broken down further into classes, each of which provides information about aspects of that trace. Additional informatiohn about the classes for each type of trace can be found here, here, and here.

You should understand what type of information is traced and the approximate overhead of each trace class before starting any of the Db2 traces. Some traces are better left off until or unless a performance problem is occurring, at which point, the trace can be started to capture details about the situation. Others are better to have turned on before problems occur. Keep in mind that you will have some trace classes (and IFCIDs) that are always started, and other that are only used in emergencies.

What are IFCIDs?

Each trace class is associated with specific trace events known as Instrumentation Facility Component Identifier (IFCIDs), pronounced “if-kid.” An IFCID defines a record that represents a trace event. IFCIDs are the single smallest unit of tracing that can be invoked
by Db2.

In some cases, it can make sense to avoid activating trace classes altogether and start traces specifying only the IFCIDs needed. This way, you can reduce the overhead associated with tracing by recording only the trace events needed.

There are several hundred different IFCIDs. Most IFCIDs contain data fields that describe events pertinent to the event being traced. Some IFCIDs have no data; instead they merely mark a specific time. Certain trace events of extended durations require a pair of IFCIDs: one for the beginning of the event and another for the end. These pairs enable the computation of elapsed times. Other trace events that are not as lengthy require only a single IFCID.

You can find the IFCIDs associated with each trace class in the IBM Db2Command Reference manual in the section describing the -START TRACE command. But that manual does not describe the purpose of each IFCID. A list describing each IFCID can be found in the data set named SDSNIVPD(DSNWMSGS), which is part of the Db2 installation.

Db2 Performance Monitors

Several popular performance monitoring solutions are available for Db2 for z/OS. IBM’s Omegamon, BMC Software’s MainView, Broadcom’s Sysview, and Rocket Software's TMON are the leading performance monitors. When selecting a performance monitor, be sure to examine the online components as well as the batch reporting capabilities of the monitor.

An online performance monitor is a tool that provides real-time reporting on Db2 performance statistics as Db2 operates. In contrast, a batch performance monitor reads previously generated trace records from an input data set. Most performance monitors today can be used to serve both purposes.

With online DB2 performance monitors, you can perform proactive performance management tasks. In other words, you can set up the monitor such that when it detects a problem it alerts a DBA and possibly takes actions on its own to resolve the problem.

The most common way to provide online performance monitoring capabilities is by online access to DB2 trace information in the MONITOR trace class. Some online DB2 performance monitors also provide direct access to Db2 performance data by reading the control blocks of the Db2 and application address spaces. This type of monitoring provides a “window” to up-to-the-minute performance statistics while Db2 runs. This information is important if a quick reaction to performance problems is required.

Most online Db2 performance monitors provide a menu-driven interface accessible from TSO or VTAM. It enables online performance monitors to start and stop traces as needed based on the menu options chosen by the user. Consequently, you can reduce overhead and diminish the learning curve involved in understanding Db2 traces and their correspondence to performance reports.

Following are some typical uses of online performance monitors. Many online performance monitors can establish effective exception-based monitoring. When specified performance thresholds are reached, triggers can offer notification and take action. For example, you could set a trigger when the number of lock suspensions for the TXN00002 plan is reached; when the trigger is activated, a message is sent to the console, and a batch report is generated to provide accounting detail information for the plan. You can set any number of triggers for many thresholds. Following are some examples of thresholds you might choose to set:

When a buffer pool threshold is reached (PREFETCH DISABLED, DEFERRED WRITE THRESHOLD, or DM CRITICAL THRESHOLD).
For critical transactions, when predefined performance objectives are not met. For example, if TXN00001 requires subsecond response time, set a trigger to notify a DBA when the transaction receives a class 1 accounting elapsed time exceeding 1 second by more than 25%.
Many types of thresholds can be established. Most online monitors support this capability. As such, you can customize the thresholds for the needs of your DB2 environment.

Most online performance monitors can produce real-time EXPLAINs for long-running SQL statements. If an SQL statement takes a significant amount of time to process, an analyst can display the SQL statement as it executes and dynamically issue an EXPLAIN for the statement. Even as the statement executes, an understanding of why it takes so long to run can be achieved. A complete discussion of the EXPLAIN statement is provided in the next chapter.

Some online performance monitors provide historical trending. These monitors track performance statistics and store them in DB2 tables or in VSAM files with a timestamp. They also provide the capability to query these stores of performance data to assist in the following:

Analyzing recent history. Most SQL statements execute quickly, making difficult the job of capturing and displaying information about the SQL statement as it executes. However, you might not want to wait until the SMF data is available to run a batch report. Quick access to recent past-performance data in these external data stores provides a type of online monitoring that is as close to real time as is usually needed.
Determining performance trends, such as a transaction steadily increasing in its CPU consumption or elapsed time.
Performing capacity planning based on a snapshot of the recent performance of DB2 applications.

Some monitors also run when Db2 is down to provide access to the historical data accumulated by the monitor.

A final benefit of online DB2 performance monitors is their capability to interface with other z/OS monitors for IMS/TM, CICS, z/OS, or VTAM. This way, an analyst gets a view of the entire spectrum of system performance. Understanding and analyzing the data from each of these monitors, however, requires a different skill. Quite often, one person cannot master all these monitors

Conclusion

Although this blog entry was brief, and there are many additional aspects to Db2 performance monitoring, monitoring and measuring performance metrics should be viewed as a vital requirement for all organizations using Db2 databases. By collecting and analyzing performance data, DBAs can detect performance bottlenecks, identify areas for improvement, and make informed decisions to enhance overall system efficiency.

Monday, July 24, 2023

Top 10 Db2 Performance Tips - No. 5 Workload Management and Query Optimization

Managing workloads and optimizing queries are essential tasks for maximizing the performance of a Db2 databases and the applications that access them. By employing effective techniques, DBAs and performance analysts can streamline query execution, reduce resource contention, and enhance overall system efficiency.

The application code itself must be designed appropriately and monitored for efficiency. In fact, many experts agree that as much as 70 to 80 percent of performance problems are caused by improperly coded database applications. SQL is the primary culprit. Coding efficient SQL statements can be complicated. Developers need to be taught how to properly formulate SQL statements and SQL statements must be constantly monitored and tuned.

Query Rewriting

One technique for workload management and query optimization is query rewriting. Query rewriting involves modifying the structure or logic of a query to achieve better performance. This can include simplifying complex expressions, reordering join operations, or breaking down a single complex query into multiple simpler queries. By understanding the underlying data model and query requirements, you can rewrite queries to optimize execution plans and improve overall performance.

With that in mind, it is important to understand that Db2 itself can perform query rewrite as part of the optimization process. The query compiler can rewrite SQL and XQuery statements into different forms to improve optimization.

Of course, you (as a developer or tuner) can still make changes to SQL to try to influence the optimizer to achieve different (and hopefully better) access paths.

SQL, by its very nature, is quite flexible. It uses a free-form structure that gives the user the ability to develop SQL statements in a way best suited to each user. Each SQL request is parsed by Db2 during compilation and optimization before it is executed to check for proper syntax and to optimize the request.

Therefore, SQL statements do not need to start in any given column and can be strung together on one line or broken apart on several lines. Any SQL request could beformulated in a number of different but functionally equivalent ways. SQL’s flexibility makes it intrinsically simple, but flexibility can complicate performance management because different but equivalent SQL formulations can result in variable performance.

When you are writing your SQL statements to access Db2 data, keep in mind that you should look at various different ways to formulate the same query to see which one performs best. For example, you might change a BETWEEN predicate to two predicates using <= and >=. This is one simple example and many different things can change an access path, so be creative and test different ways of using SQL to request the same data.

Use EXPLAIN to Examine Access Paths

Programmers need to be schooled in the practice of examining SQL access paths. Using EXPLAIN and querying the resutls allows developers to request information on how the optimizer will satisfy each query. Will an index be used? In what order will the tables be joined? Will the query be broken up into parallel tasks or not?

Of course, you may want to use a Visual Explain tool to look at access paths visually instead of querying complex and sometimes arcane data in plan tables. For example, IBM Data Studio offers a Visual Explain capability (as do other tools).

A few guidelines to consider:

Always bind every production program using EXPLAIN YES. This will ensure that the access path details are externalized if/when you need to review them should performance issues arise.

Ensure that application developers understand EXPLAIN and have access to plan table data when testing new code. Catching problematic access paths in test is clearly better than waiting for the problems to surface in production.

Make sure that all Db2 developers have access to tools for reviewing and examining access paths and explain information (e.g. Data Studio).

Fundamental SQL Guidelines

These and many other factors influence the efficiency of SQL. Not all application problems are due to improperly coded SQL. The host language application code in which the SQL has been embedded also can be inefficient, causing database application performance to suffer.

These are, however, three simple but important rules to follow when writing your SQL statements for performance. Of course, SQL performance is a complex topic and to understand every nuance of how SQL performs can take alifetime. That said, adhering to the following simple rules puts you on the right track to achieving high-performing Db2 applications.

First, always provide only the exact columns that you need to retrieve in the SELECT-list of each SQL SELECT statement. A common way of stating this is “do not use SELECT *”. The shorthand SELECT * means retrieve all columns from the table(s) being accessed. Using SELECT * may be fine for quick and dirty queries but it is a bad practice for inclusion in application programs because:

Db2 tables may need to be changed in the future to include additional columns. SELECT * in production programs will retrieve those new columns, too, and your program may not be capable of handling the additional data without requiring time-consuming changes.

Db2 consumes additional resources for every column that is requested to be returned. If the program does not need the data, it should not ask for it. Even if the program needs every column, it is better to explicitly ask for each column by name in the SQL statement for clarity and to avoid the previous pitfall.

Secondly, do not ask for what you already know. This may sound simplistic, but most programmers violate this rule at one time or another. For example,consider what is wrong with this simple query:

SELECT LASTNAME, FIRST_NAME, JOB_CODE, DEPTNO

FROM EMP

WHERE JOB_CODE = 'A'

AND DEPTNO = 'D01';

Look at the SELECT-list. There are four columns specified but only two of them are needed. We know that JOB_CODE will always be A and DEPTNO will always be D01 because we told Db2 to only return those rows using the WHERE clauses. So do not ask Db2 to return that data... you already know it!

Every column that Db2 must and return to the program adds overhead. It may be a small amount of overhead, but if this statement runs many times during the day (hundreds, or even thousands, of times), that small overhead adds up to significant overhead.

And thirdly, use the WHERE clause to filter data in the SQL instead of bringing it all into your program to filter. This too is a common rookie mistake. It is much better for Db2 to filter the data before returning it to your program. This is so because Db2 requires additional I/O and CPU resources to obtain each row of data. The fewer rows passed to your program, the more efficient your SQL will be.

Query Hints and Tweaking

The use of query hints is another approach to query optimization. Hints provide directives to the optimizer on how to execute a specific query, influencing the choice of access paths, join strategies, or join orders. By carefully selecting and applying query hints, you can guide the optimizer's decisions and ensure optimal query execution plans. There are three types of hints:

One type of hint is to modify the query in some way to encourage (or force) the optimizer to choose a different access path. This is often called tweaking the SQL. For example, you might choose to append OR 0 = 1 to a predicate to cause Db2 to avoid using an index.
A second type of hint is to give the optimizer quidance as to the number of rows that will be returned using OPTIMIZE FOR n ROWS. In that case, instead of using the database statistics it will use the guidance you provide.
Another type of hint, which is much better, is to explicitly use the hint capabilities of Db2 to force a particular query to use a specific, pre-determined access path.

In any case, it is important to use hints judiciously, as excessive or unnecessary hints may hinder the optimizer's ability to adapt to changing data or system conditions.

Additional Techniques and Tools

As part of ensuring an optimal SQL environment it is important that DBAs first setup an effective environment that is properly implemented and administered. This includes establishing standard methods for appropriate indexing, regular statistics collection, and setting database configuration parameters approrpriately to optimize query performance.

Perhaps the most important thing you can do to assure optimal performance of your database applications is to create the correct indexes for your tables. Indexing appropriately on frequently queried columns can significantly improve query execution times. Regularly collecting and updating statistics ensures that the optimizer has accurate information to make informed decisions. Optimizing database configuration parameters, such as query parallelism or memory settings, can also have a significant impact on workload management and query performance.

Managing the performance of your database applications requires in-depth monitoring. Be sure to allocate an appropriate budget to acquire performance management tools to ensure the efficiency of your database systems.

Query optimization tools are valuable assets for managing workloads and improving query performance. These tools provide insights into query execution plans, access paths, and performance statistics. They allow database administrators to analyze query performance, identify potential bottlenecks, and make informed decisions for optimization.

Moreover, workload management techniques such as query prioritization, resource allocation, and concurrency control contribute to efficient query execution. Prioritizing critical queries, allocating resources based on workload importance, and managing concurrency effectively help ensure that high-priority queries receive the necessary resources and are processed efficiently.

Summing Things Up

In conclusion, managing workloads and optimizing queries are crucial aspects of maximizing the performance of your Db2 database and applications. Techniques such as applying fundamentail query development methods, query rewriting, the use of hints, and leveraging query optimization tools can significantly enhance query performance. Additionally, employing indexing strategies, collecting accurate statistics, and configuring database parameters contribute to efficient workload management. By implementing these techniques, DBAs, develoeprs, and performance analysts can streamline query execution, improve system responsiveness, and achieve optimal performance in their Db2 environments.

Monday, June 26, 2023

Top 10 Db2 Performance Tips - No. 1: Analyze Query Execution Plans

Query execution plans play a crucial role in identifying performance bottlenecks within an IBM Db2 database. When a query is executed, the database optimizer determines the most efficient way to retrieve the requested data by generating a query execution plan. It does this by parsing SQL requests and turning them into actionable commands for accessing and modifying data.

So the query execution plan outlines the steps and operations involved in executing the query, including accessing tables, applying filters, joining data, and performing sorting or aggregations.

Understanding query execution plans is important because they reveal the inner workings of how Db2 will process the query. And they also highlight areas where performance can be improved if you understand the various ways that data can be processed by Db2. By analyzing query execution plans, you can identify potential inefficiencies and take proactive steps to optimize queries for better performance.

For example, one high-level performance tuning step that you can identify using a query execution plan is to find table (space) scans that would be better off using an index instead.

Plan Tables

But how can you review and analyze query execution plans? Perhaps the first method used by most folks is to examine the Plan Tables. A lot of information is contained in the PLAN_TABLE. The PLAN_TABLE is simply a specific table used by the EXPLAIN command to populate information about the choices made by the Db2 Optimizer when it formulates a query execution plan. The information in the PLAN_TABLE provides detailed information about the access paths chosen, join strategies employed, and other relevant details that impact query performance.

As long as EXPLAIN YES is specified when you bind your program, or you explicitly issue an EXPLAIN PLAN statement, information about the query execution plan will be placed in the PLAN_TABLE.

So, after the optimizer creates the access paths and populates the PLAN_TABLE with data representing those access paths, you will need to examine the results to determine if everything is satisfactory. Many questions can be answered by analyzing the results of EXPLAIN – questions like:

if we are joining what type of join is used (NLJ, MS, Hybrid),
was an index used, and if so how many columns matched,
are we doing a scan, and if so what type of scan (full or page range)
is prefetch being used, and if so what type (sequential, list)
was a hint used
was parallelism used, and if so what degree and type
was a sort required, and if so why (Join, Unique, Group By, Order By)
what type of locking is required

And that just covers the main PLAN_TABLE. There are actually many other explain tables available that, if created, will be populated by EXPLAIN. But that is beyond the scope of this high-level tip, which is already becoming rather lengthy.

Other Tools

There are also several additional tools and techniques that can be used to analyze Db2 query execution plans. One commonly used tool is Visual Explain, which provides a graphical representation of the query execution plan. Many tuners prefer a visual approach to revieing and tuning access paths. Visual Explain allows users to visualize the flow of data, understand the order in which operations are performed, and identify potential performance bottlenecks.

Visual Explain is a feature of the free IBM Data Studio tool, as well as the new Db2 Administration Foundation offering. There are other tools, from other vendors, that also provide a visual Explain capability.

Speaking of which, another technique to analyze query execution plans is to use database monitoring tools. Such tools capture real-time performance data, including query execution plans, and provide useful performance metrics that can be reviewed and acted upon as needed.

By monitoring the execution plans of frequently executed queries, it is possible to detect patterns, identify resource-intensive operations, and make informed decisions about query optimization.

A Bit of Guidance

It is wise to always Bind your production programs (that use static SQL) specifying EXPLAIN YES so that you can see the access paths chosen by Db2. This way, you can either check to make sure that no access paths have changed before proceeding to implement changes in production, or at least you have the information available should problems arise after you move code into production.

It can be difficult to determine if any access paths have changed, because you will need the old execution plans to compare to the new ones. For this reason, it is a good idea to keep several versions of access path information for each program.

Additionally, sometimes the program has changed between running EXPLAIN, which can make it challenging to find the exact SQL statements to compare. But you are still better off with older versions than without any historical data.

To optimize query execution plans for better performance, it is essential to focus on areas such as index usage, join strategies, and predicate selection. By strategically creating and maintaining appropriate indexes, rewriting complex queries, and refining predicate conditions, it is possible to influence the optimizer's decisions and improve query performance.

In Conclusion

Analyzing query execution plans is a vital step in optimizing the performance of IBM Db2 applications. By using tools like EXPLAIN and Visual Explain, PLAN_TABLE data, and monitoring tools, it is possible to gain valuable insights into query execution, identify potential bottlenecks, and enhance overall performance.

By proactively analyzing and optimizing query execution plans, organizations can achieve faster response times, improved scalability, and a more efficient database environment.

Be sure to check back here next week when we will address database design as a performance issue.

Monday, January 23, 2023

Escaping Single Quotes in Db2 SQL Statements

Every now and then I write about some things that may seem to be basic to old-timers, but can be very confusing to developers the first time they encounter them. Today's post is an example.

How do you handle a text string with a single quote in it? For example, something like this:

Today is Craig's birthday (not really)

Well, the first thing to corroborate is that text in Db2 SQL is delimited by single quotes. So the first attempt people tend to make is something like this:

'Today is Craig's birthday (not really)'

But they also quickly discover that this does not work. There are three single quotes here: one at the beginning, one between the "g" and the "s", and one at the end. So Db2 will choke on it and you will get an error.

The trick is knowing the escape character. An "escape" character is used to tell Db2 that the next character is actually part of the text string, not a delimiter. For Db2, the escape is to double up the single quote (note that other DBMSes may use a different technique). So, if we want to use our sample text phrase in Db2 SQL, we'd code it with the escape character, as follows:

'Today is Craig''s birthday (not really)'

The single quote denoting the possessive for Craig's is doubled. This tells Db2 to treat the single quote as part of the text. So we could write SQL like this:

SELECT 'Today is Craig''s birthday (not really)'
FROM SYSIBM.SYSDUMMY1;

This will return the text that we want:

Today is Craig's birthday (not really)

This works even if there are multiple single quotes within the text. For example, consider if we want to use the following text in SQL:

Today is not Craig's birthday but it is Beth's

We could write the SQL like this:

SELECT 'Today is not Craig''s birthday but it is Beth''s'
FROM SYSIBM.SYSDUMMY1;

And it also works even at the end of the text string, even though it starts to look somewhat confusing. For example, what is we want to use the following text in SQL?

I like the book 'A Confederacy of Dunces'

In that case, we still double up the single quotes, like so:

SELECT 'I like ths book ''A Confederacy of Dunces'''
FROM SYSIBM.SYSDUMMY1;

And that will return our desired text.

I hope this short treatise on how to use the escape for single quotes in Db2 SQL has been helpful!

Monday, November 21, 2022

Db2 Education is a Constant Requirement

Ensuring access to proper Db2 educational materials should be one of the first issues to be addressed after your organization decides to implement Db2. But education sometimes falls through the cracks...

Does your organization understand what Db2 is? How it works? For what purposes it is needed at your shop? How it will be used? Without a sound understanding of Db2, its components, and features, it is unlikely that you will be able to use Db2 to its best advantage. A basic level of Db2 knowledge can be acquired through a short Db2 fundamentals class for the IT personnel charged with making Db2 a success at your organization. But long-term success with Db2 requires ongoing education.

After addressing the basics of Db2 education, you must support a curriculum for on-going Db2 education for your organization. This support falls into four categories:

The first category of training is a standard regimen of SQL and Db2 programming training to be used by application developers. Every programmer should receive basic training on SQL, education on how to embed SQL in the programming languages they will be using, and possibly additional courses on using Db2 with specific infrastructure software like MQ, WebSphere, CICS, and IMS. Also, with the importance of distributed access to Db2 these days, a course should be made available on that topic, with specific portions that address the technology used by your shop to provide distributed Db2 data access. If this basic level of Db2 education is not required for every Db2 programmer, then Db2 application performance will surely suffer as untrained coders write inefficient and incorrect SQL. This type of training can be delivered as instructor-led training by in-house SQL coding experts, as computer-based training, or as external classes. It should also be augmented with reference material such as books, articles, and blog posts.

The second category of education support is external training for special needs. This support includes education for database administrators, technical support personnel, and performance analysts. Additionally, your organization needs to plan for ongoing education to keep appropriate personnel up-to-date on new versions and releases of DB2. Although IBM offers great courses for new Db2 releases, several third-party vendors such as ProTech, Interskill Learning and KBCE regularly offer in-depth training and release-specific Db2 courses and lectures.

The third category of education is in-house, interactive training in the form of videos, computer-based training, and instructor-led courses. These courses should be used to augment and refresh the formal training given to your Db2 professional staff.

The fourth, and final category of support, is reference material—for example, IBM’s Db2 manuals, Db2 books (such as DB2 Developer's Guide and A Guide to Db2 Performance for Application Developers), vendor-supplied white papers, and industry publications and periodicals. The current IBM manuals for Db2 are always available on the web, as well:

IBM Db2 12 for z/OS documentation

IBM Db2 13 for z/OS documentation

Of course, you should consider augmenting the standard IBM Db2 manuals with IBM redbooks. IBM redbooks provide in-depth, detailed coverage of a specific technology topic. IBM publishes redbooks on multiple subjects, including Db2, IMS, CICS, z/OS, and many other topics. IBM redbooks can greatly assist Db2 technicians working to understand a feature or nuance of DB2. You can download IBM red books for free in Adobe Acrobat format over the Web at http://www.redbooks.ibm.com/

IBM also offers many other useful sources of DB2 information on the Web including white papers, articles, and book excerpts from IBM Press publications. As does the International Db2 User Group (IDUG).

Independent Software Vendors (ISVs) are another rich source of Db2 information. The major vendors provide in-depth technical papers on features of Db2 that would be difficult for most shops to research in the same detail. Be sure to watch for and request the great Db2 Catalog posters, reference guides, white papers and eBooks available from BMC Software and Broadcom. There are other great Db2 ISVs who offer educational information, too, such as the newsletters, white papers, and webinars; these include InfoTel, SEG, and UBS-Hainer, among others.

All of these educational components—in-house education, external education, and industry publications—are useful for learning how you can use Db2 effectively. You would be wise to have a mix of material that supports more than one of the categories outlined previously. In this way, you provide a varied learning environment that meets the needs of all students. This varied learning environment allows each student to learn in the most conducive way for him or her. Plan to provide an on-site library of educational material addressing the following subjects:

Introduction to relational databases
Introduction to DB2 and SQL
Advanced SQL
Programming DB2 using (your languages of choice)
Programming DB2 in batch
Programming DB2 using TSO, CICS, and IMS
Programming DB2 and the Web
Creating DB2 stored procedures, triggers, and UDFs
Programming DB2 in a Distributed Environment
Debugging and Problem Analysis

You also might want to have an introductory Db2 database administration course to train new DBAs. In addition to this basic education library, plan to provide advanced education for technical Db2 users, such as DBAs, technical support personnel, and technical programmers and analysts. Advanced DBA topics (such as Sysplex Data Sharing, performance management, and backup/recovery) should be left to instructor-led training courses because of the complex nature of Db2 database administration. Additional advanced topics to consider include system administration (for systems programmers) and disaster recovery. Many vendors, including the ones metntioned above, offer these classes. Searching for smaller consulting firms and local resources is also prudent; these firms sometimes can provide courses tailored to your installation-specific needs.

The advanced education program should include allocating time to attend area user groups meetings, such as the International Db2 User Group (IDUG) conferences held every year in North America, EMEA, and Canada.

The bottom line is simple, though. Plan for ongoing Db2 education for your DBAs, programmers, and analysts... or plan on failing.

Friday, June 03, 2022

Time to Download the Db2 13 for z/OS Manuals!

Yesterday I posted a brief piece about the new version of Db2 for z/OS that is now generally available: Db2 13 for z/OS is Here! It is always an exciting time when a new version of Db2 for z/OS is unleashed on the world and one of the first things I recommend is to start reading the manuals to see what all is there!

I like having the PDF versions of the manuals on my hard drive, so with each new version, I start by downloading them! With that in mind, here is the link to the IBM page that hosts the PDF manuals for Db2 13 for z/OS.

Of course, over time, this documentation will be modified. Therefore, it is a good idea to be aware that subsequent modifications may render the versions on your hard drive inaccurate. You can always replace them periodically if you wish... perhaps when you adopt a new function level make plans to downloads the latest iterations of the manuals.

There is also a great, new redbook on Db2 13 for z/OS that you should download and read. It is titled IBM Db2 13 for z/OS and More, and it, along with the What's New manual, can serve as a nice introduction to this new version.

A new manual that I need to mention is the SQL Data Insights Users Guide. If you read my earlier post (Db2 13 for z/OS is Here!) then you know that SQL Data Insights is a new Db2 13 capability that combines AI/deep learning with Db2 for z/OS and IBM Z technologies to deliver SQL-based semantic queries on tables and views. It is probably one of the most exciting new things in this new version...

So if you work with Db2 for z/OS don't delay... download those new manuals and start learning what new wonders Db2 13 will offer up!

Monday, December 13, 2021

Top 10 Steps to Building the Right Indexes

One of the constant struggles with Db2 development, and indeed developing for any relational DBMS, is designing and creating appropriate indexes. Perhaps the most important thing you can do to assure optimal application performance when accessing data in a relational/SQL database is to create correct indexes for your tables based on the queries your applications use. Of course, this is easier said than done.

But we can start with some basics. For example, consider this SQL statement:

    SELECT   LASTNAME, SALARY
    FROM     EMP
    WHERE    EMPNO = '000010'
    AND      DEPTNO = 'D01';

What index (or indexes) would it make sense to build for this simple query? First, think about all the possible indexes that you could create. Your first short-list probably looks something like this:

Index1 on EMPNO
Index2 on DEPTNO
Index3 on EMPNO and DEPTNO

This is a good start, and Index3 is likely to be the best of the lot. It enables the optimizer to use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table, you might want to examine the impact of creating yet another index on the table.

With the caveat that appropriate index creation can be complicated, let’s look at the Top 10 things you can do to build the right indexes on your tables:

1. Index by workload, not by object

Many people make the mistake of just guessing at some indexes to create when they are creating other database objects (like tables and tablespaces). But without an idea of how the tables are going to be accessed these guesses are usually wrong – at least some of them.

Indexes should be built to optimize the access of your SQL queries. To properly create an optimal set of indexes requires a list of the SQL to be used, an estimate of the frequency that each SQL statement will be executed, and the importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries most of the time be made.

If you are doing it any other way, you are doing it wrong.

2. Build indexes based on predicates

3. Index most-heavily used queries

Numbers 2 and 3 can be thought of as corollaries to Number 1… that is, these are the aspects of application workload that need to be examined to produce appropriate and effective indexes.

Look at the predicates of the queries in the workload and work to create a set of indexes that match up to most (if not all) of them. When it is not practical to build all of the indexes you have identified, then it makes sense to look at the queries that will be used most often and optimize them first.

4. Index important queries

The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist.

Of course, the decision should depend on the application’s importance to the business - not just on the user’s importance. For example, if that clerk runs a query hundreds of times a day and the CIO runs his query only once, then perhaps the clerk's query is more important.

5. Index to avoid sorting (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses, the optimizer can use an index to avoid a sort, and thereby potentially improve performance.

6. Create indexes for uniqueness (PK, U)

Some indexes are required in order to make the database schema valid. Most Database systems require that unique indexes be created when unique and primary key constraints exist.

7. Create indexes for foreign keys

Even if not required, creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity).

8. Consider adding columns for index-only access

Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access, all of the data needed to satisfy the query can be found in the index alone — without having to read data from the tablespace.

For example, suppose that there is an index on the DEPTNO column of the DEPT table. The following query may use this index:

   SELECT   DEPTNAME
FROM     DEPT
WHERE    DEPTNO = 'D01';

The index could be used to access only those columns with a DEPTNO greater than D00, but then the DBMS would need to access the data in the table space to return the DEPTNAME. If you added DEPTNAME to the index, that is, create the index on (DEPTNO, DEPTNAME) then all of the data needed for this query exists in the index and additional I/O to the table space would not be needed. This technique is sometimes referred to as index overloading.

Of course, this is not always a good idea. You have to take into account whether other queries use the index and how it might negatively impact their performance.

It is also worth mentioning index include columns, which allows you to define additional (included) columns that are not part of the actual key but are available in the index. So we can create an index like this:

CREATE INDEX IXNAME
    ON DEPT ( DEPTNO )
     INCLUDE ( DEPTNAME )

The index key is only on the DEPTNO column, but the DEPTNAME is also included in the index, so the earlier query can be satisfied using index-only access with this index.

9. Don’t arbitrarily limit the number of indexes

An example of a bad standard would be to place an artificial limit on indexing. There should be no arbitrary limit on the number of indexes that you can create for any database table. Indexes are undoubtedly one of the most important factors in creating efficient queries. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data. When a rule such as this exists, it usually is stated in the standards manual using verbiage something like “Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards.

If you already have three indexes, or five indexes, or even 57 indexes, and another index will improve performance why would you arbitrarily want to avoid creating that index?

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.”

Which brings us to…

10. Be aware of I/U/D implications

Db2 must automatically maintain every index you create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when you UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

So the general rule of thumb should be "Index until it hurts... and then back off the least effective index to make it no longer hurt." Sure, that is easier said than done, but it is a valid philosophy to follow.

Summary

Following these Top 10 index design techniques can go a long way toward improving not just your index usage, but also the performance of your database applications.