The Db2 Portal Blog: BIND

Showing posts with label BIND. Show all posts

Monday, June 26, 2023

Top 10 Db2 Performance Tips - No. 1: Analyze Query Execution Plans

Query execution plans play a crucial role in identifying performance bottlenecks within an IBM Db2 database. When a query is executed, the database optimizer determines the most efficient way to retrieve the requested data by generating a query execution plan. It does this by parsing SQL requests and turning them into actionable commands for accessing and modifying data.

So the query execution plan outlines the steps and operations involved in executing the query, including accessing tables, applying filters, joining data, and performing sorting or aggregations.

Understanding query execution plans is important because they reveal the inner workings of how Db2 will process the query. And they also highlight areas where performance can be improved if you understand the various ways that data can be processed by Db2. By analyzing query execution plans, you can identify potential inefficiencies and take proactive steps to optimize queries for better performance.

For example, one high-level performance tuning step that you can identify using a query execution plan is to find table (space) scans that would be better off using an index instead.

Plan Tables

But how can you review and analyze query execution plans? Perhaps the first method used by most folks is to examine the Plan Tables. A lot of information is contained in the PLAN_TABLE. The PLAN_TABLE is simply a specific table used by the EXPLAIN command to populate information about the choices made by the Db2 Optimizer when it formulates a query execution plan. The information in the PLAN_TABLE provides detailed information about the access paths chosen, join strategies employed, and other relevant details that impact query performance.

As long as EXPLAIN YES is specified when you bind your program, or you explicitly issue an EXPLAIN PLAN statement, information about the query execution plan will be placed in the PLAN_TABLE.

So, after the optimizer creates the access paths and populates the PLAN_TABLE with data representing those access paths, you will need to examine the results to determine if everything is satisfactory. Many questions can be answered by analyzing the results of EXPLAIN – questions like:

if we are joining what type of join is used (NLJ, MS, Hybrid),
was an index used, and if so how many columns matched,
are we doing a scan, and if so what type of scan (full or page range)
is prefetch being used, and if so what type (sequential, list)
was a hint used
was parallelism used, and if so what degree and type
was a sort required, and if so why (Join, Unique, Group By, Order By)
what type of locking is required

And that just covers the main PLAN_TABLE. There are actually many other explain tables available that, if created, will be populated by EXPLAIN. But that is beyond the scope of this high-level tip, which is already becoming rather lengthy.

Other Tools

There are also several additional tools and techniques that can be used to analyze Db2 query execution plans. One commonly used tool is Visual Explain, which provides a graphical representation of the query execution plan. Many tuners prefer a visual approach to revieing and tuning access paths. Visual Explain allows users to visualize the flow of data, understand the order in which operations are performed, and identify potential performance bottlenecks.

Visual Explain is a feature of the free IBM Data Studio tool, as well as the new Db2 Administration Foundation offering. There are other tools, from other vendors, that also provide a visual Explain capability.

Speaking of which, another technique to analyze query execution plans is to use database monitoring tools. Such tools capture real-time performance data, including query execution plans, and provide useful performance metrics that can be reviewed and acted upon as needed.

By monitoring the execution plans of frequently executed queries, it is possible to detect patterns, identify resource-intensive operations, and make informed decisions about query optimization.

A Bit of Guidance

It is wise to always Bind your production programs (that use static SQL) specifying EXPLAIN YES so that you can see the access paths chosen by Db2. This way, you can either check to make sure that no access paths have changed before proceeding to implement changes in production, or at least you have the information available should problems arise after you move code into production.

It can be difficult to determine if any access paths have changed, because you will need the old execution plans to compare to the new ones. For this reason, it is a good idea to keep several versions of access path information for each program.

Additionally, sometimes the program has changed between running EXPLAIN, which can make it challenging to find the exact SQL statements to compare. But you are still better off with older versions than without any historical data.

To optimize query execution plans for better performance, it is essential to focus on areas such as index usage, join strategies, and predicate selection. By strategically creating and maintaining appropriate indexes, rewriting complex queries, and refining predicate conditions, it is possible to influence the optimizer's decisions and improve query performance.

In Conclusion

Analyzing query execution plans is a vital step in optimizing the performance of IBM Db2 applications. By using tools like EXPLAIN and Visual Explain, PLAN_TABLE data, and monitoring tools, it is possible to gain valuable insights into query execution, identify potential bottlenecks, and enhance overall performance.

By proactively analyzing and optimizing query execution plans, organizations can achieve faster response times, improved scalability, and a more efficient database environment.

Be sure to check back here next week when we will address database design as a performance issue.

Monday, November 15, 2021

Db2, SQL, and Modern Coding Techniques

Mainframe application development has changed dramatically during the lifetime of Db2. Way back in the beginning, and even for a long time thereafter, most Db2 programs were written in COBOL and coded to use static SQL. This meant that the SQL was bound before it was executed and the access paths were etched in stone. And it also meant that the access paths for any SQL in almost any program were readily available to DBAs in PLAN_TABLEs.

Fast forward to the present and static SQL in COBOL is the exception rather than the rule. Oh, sure, those COBOL programs with static SQL are still there, running in CICS or in batch (or even IMS/TM), but most new development is not done this way anymore. Today, programmers use IDEs to build code that accesses mainframe Db2 data using distributed data access over the web or from a GUI.

Most modern, distributed, application development projects typically rely upon application development frameworks. The two most commonly-used frameworks are Microsoft .NET and Java/J2EE. And these frameworks use dynamic SQL, not static.

An additional contributing force is the adoption of commercial off-the-shelf applications for ERP and CRM like SAP, Peoplesoft, and Siebel (the last two of which are now Oracle products). These applications are not tied to a specific DBMS but support by multiple different DBMSs, one of which is (was) DB2 for z/OS. So, these applications use dynamic SQL because that makes it easier to accomplish multi-vendor DBMS support. For the most part though, none of these ERP packages run on the mainframe (or Db2) because two of the are now Oracle products and SAP is promoting its own DBMS (HANA).

The point though is that the manner in which Db2 applications are developed has changed. And that means the manner in which Db2 is managed is changing, too. Instead of relying on access paths already being there, DBAs must develop ways of capturing access paths for dynamic SQL statements. Prepared dynamic SQL can be cached in the dynamic statement cache so that the same SQL statement can reuse the mini plan for the statement the next time it runs. And the BIND command can be used to snap the statements in the cache and find the access paths used. But the dynamic statement cache works like a buffer pool, with least recently used statements being flushed to make room for new statements… so you might not find the dynamic SQL you are looking for, at least not without helpful tools or scripts to stay on top of things.

This change has caused many organizations to experience SQL performance problems. Because dynamic SQL is easier to introduce into the system performance can become problematic. It is difficult to get a handle on what dynamic SQL runs when. And it is difficult to track because the DBAs are not accustomed to monitoring and tuning dynamic SQL… or they just do not have the proper tools to be able to do the job appropriately. So, these black holes of dynamic SQL performance problems open up that are left to run wild.

New SQL Stuff

It is not just the new programming paradigm that causes management problems for modern DB2 usage. The amount of new SQL statements and functions and features continues to grow with each new version (and even function level) of Db2. This can be a mixed blessing though. It is good because expanding SQL functionality makes it easier to program and access Db2 data. But it can be problematic because it can be more confusing and difficult to learn these new features.

Exacerbating the issue is that many organizations do not invest a sufficient amount of time and effort in educating their staff. If the DBAs are not trained in all of the new functionality, then new SQL can cause big issues. How? Imagine a shop that just migrated to a new version or level of Db2, but the only education available was to read the manuals. So, some of the DBAs are not knowledgeable on the new SQL code. Then a developer picks up a manual, and learns how to use a new function or arrays as parameters. Kudos to the developer for the initiative, but if problems arise there may not be anybody who knows how to support the new feature.

And there is a lot of new SQL functionality being added. If we focus on just the past few versions of Db2, here is a representative list of new SQL programming related enhancements that have been introduced: TRUNCATE, new data types (DECFLOAT, VARBINARY), optimistic locking, FETCH CONTINUE, ROLE, MERGE, SELECT from MERGE, pureXML, FETCH FIRST & ORDER BY in subselect and fullselect, INTERSECT, EXCEPT, Indicator Variables, TIMESTAMP precision and time zones, Moving sums and averages, Inline and Non-inline SQL scalar functions, SQL table functions, extended implicit casting, RANK(), ROW_NUMBER(), XQuery, transparent archive query, APPLCOMPAT, IDAA/analytics, grouping sets, ROLLUP, Hadoop access, FTBs, LISTAGG…

That is a lot to learn and this is just a selection of what has been added!

Summary

Things are moving at a fast and furious pace for application developers these days. And the resultant changes can introduce problems that impact your business unless you adapt to align your management and DBA capabilities with the new development methods.

Thursday, September 17, 2020

Convert Your COBOL Db2 Programs to Java Without Rebinding

As most Db2 developers and DBAs know, when you modify a Db2 program you have to prepare the program to enable it to be executed. This program preparation process requires running a series of code preprocessors that—when enacted in the proper sequence—creates an executable load module and a Db2 application package. The combination of the executable load module and the application package is required before any Db2 program can be run, whether batch or online.

But it is not our intent here to walk through and explain all of the steps and nuances involved in Db2 program preparation. Instead, we are taking a look at the impact of converting COBOL programs to Java programs, particularly when it comes to the need to bind as a part of the process.

We all know that issuing the BIND command causes Db2 to formulate access paths for SQL. If enough things (statistics, memory, buffers, etc.) have changed, then access paths can change whenever you BIND or REBIND. And this can be troublesome to manage.

But if the SQL does not change, then it is not technically necessary to bind to create a new package. You can prevent unnecessary BIND operations by comparing the new DBRM from the pre-compile with the previous version. Of course, there is no native capability in Db2 or the BIND command to compare the DBRM. That is why there are third-party tools on the market that can be used for this purpose.

But again, it is not the purpose of today’s post to discuss such tools. Instead, the topic is converting COBOL to Java. I have discussed this previously in the blog in the post Consider Cross-Compiling COBOL to Java to Reduce Costs, so you might want to take a moment to read through that post to acquaint yourself with the general topic.

Converting COBOL to Java and BIND

So, let’s consider a COBOL program with Db2 SQL statements in it. Most COBOL uses static SQL, meaning that the access paths are determined at bind time, not at execution time. If we convert that COBOL program to Java then we are not changing the SQL, just the code around it. Since the SQL does not change, then a bind should not be required, at least in theory, right?

Well, we first need to get into a quick discussion about types of Java programs. You can use either JDBC or SQLJ for accessing Db2 data from a Java program. With JDBC the program will use dynamic SQL whereas SQLJ will deliver static SQL. The Db2 BIND command can be issued using a DBRM (precompiler output) or a SQLJ customized profile.

So, part of the equation to avoid binding is to utilize SQLJ for converted COBOL programs.

CloudFrame, the company and product discussed in the referenced blog post above can be used to convert COBOL programs into modular Java. And it uses SQLJ for the Db2 access. As such, with embedded SQLJ, static SQL will be used and the access paths will be determined at bind time instead of execution time.

But remember, we converted business logic, not SQL. The same SQL statements that were used in the COBOL program can be used in the converted Java. CloudFrame takes advantage of this and re-purposes the existing package from the previous COBOL program to the new Java SQLJ. CloudFrame automates the entire process as part of the conversion from COBOL to Java. This means that the static SQL from the COBOL program is converted and customized into SQLJ in java. This is a built-in capability of CloudFrame that allows you to simply reuse the same package information that was already generated and bound earlier.

This means no bind is required when you use CloudFrame to convert your Db2 COBOL applications to Java… and no access paths will change. And that is a good thing, right? Conversion and migration are already time-consuming processes; eliminating performance problems due to changing access paths means that one less issue to worry about during a COBOL to Java conversion when you use CloudFrame.

Monday, October 12, 2015

OK, So How Does the CURRENTDATA Parm Work?

There seems to be a LOT of confusion out there in DB2-land about the CURRENTDATA parameter, so I thought it might be helpful to blog about the subject.

So first things first: CURRENTDATA is a BIND parameter that can be specified when you bind or rebind your plans and packages. There are two options that can be specified: YES and NO. The default is NO.

But what does it do?

Well, the manuals tell us that CURRENTDATA indicates whether data currency is required for read-only and ambiguous cursors when the ISOLATION(CS) option is used. But that is not very clear... except that CURRENTDATA really only applies with an isolation level of cursor stability.

The general idea here is to enable the developer to exert a level of control by choosing between high performance or up-to-date data. Sure, in a perfect world, everybody would choose both. But in the real world we deal with tradeoffs and the CURRENTDATA parameter allows us to tell DB2 which of these options to prefer.

CURRENTDATA (NO)

The default, CURRENTDATA(NO), is the performance choice, and probably the best choice for most applications. When NO is chosen, you are telling DB2 that completely up-to-date data currency is not required for read-only and ambiguous cursors. That means that DB2 can perform block fetching for distributed, ambiguous cursors.

And that is probably what you want. However, if you are not using pristine coding practices, and your program tries attempts to DELETE WHERE CURRENT OF against an ambiguous cursor, DB2 will return a negative SQLCODE.

What is an Ambiguous Cursor?

At this point you might be asking “What is an ambiguous cursor?” Generally speaking, an ambiguous cursor is one where DB2 cannot tell if your program needs to use the cursor for updating or deleting. You can avoid ambiguous cursors by coding the FOR READ ONLY and FOR UPDATE OF clause on your cursors to inform DB2 of the program’s future intent for the data being accessed.

CURRENTDATA (YES)

What about CURRENTDATA(YES)? So far, we have discussed only the NO option, but you can also specify YES. Doing so indicates that the data should be current for read-only and ambiguous cursors. That means DB2 will acquire page (or row) locks to ensure the currency of the data and block fetching for distributed, ambiguous cursors will be curtailed.

Summary

Hopefully this short blog post helps to clear up a muddy topic. Of course, there is a lot more to DB2 locking and performance and I do not want to make it seem like this is the only thing you need to know in terms of the tradeoffs between data currency and performance.

But at least one parameter might be a little bit clearer to you today than it was yesterday…

Wednesday, February 04, 2015

Helping Out the DB2 Optimizer Using the VOLATILE Keyword

Do you know about the VOLATILE keyword? This keyword was added to DB2 for z/OS back in Version 8. It can be specified on a table using either the CREATE TABLE or ALTER TABLE statement.

By specifying VOLATILE, you are indicating that the volume of data in the table is not stable and is likely to fluctuate. In other words, it is volatile! One common scenario where VOLATILE will be helpful is for tables that are emptied nightly and then repopulated the next day, such as an input queue.

When you specify the VOLATILE keyword on a table, BIND will favor using indexed access paths, even if the table was empty when RUNSTATS was run. It is ideal for single-index tables where you want DB2 to favor using the index.

ERP environments, such as SAP and Peoplesoft, with thousands of tables typically have some tables that meet these criteria. Even worse, it is not uncommon for DBAs to have no idea of the actual content or use for many of those thousands of tables generated by the ERP installation. At times, some of the ERP tables are not in use – depending on which modules of the ERP system you implement… but the tables get created anyway. Many DBAs simply maintain all of the tables provided with the ERP system, whether they are used or not, including running image copies and gathering RUNSTATS for them... and many are empty tables.

Collecting statistics on an empty table populates the catalog with stats indicating that the table contains no data. And, of course, when access paths are generated using those statistics DB2 will probably favor a scan because the table is small (how much smaller can you get than empty?) But some of those tables are volatile, going from empty to perhaps hundreds of thousands of rows during processing.

Of course, if the table is actually empty (or contains only a small amount of data), and VOLATILE is specified, DB2 will favor the use an index if one exists, which can degrade performance a bit. But that is a smaller price to pay than scanning thousands of rows, isn't it?

So one approach is to use the VOLATILE keyword for these types of tables... your users will be glad that you did.

Tuesday, January 20, 2015

Approaches to DB2 Access Path Management

BIND and REBIND are crucially important components as you attempt to assure efficient DB2 applications. Because the BIND/REBIND process determines exactly how your DB2 data is accessed it is important that you develop an appropriate strategy for when and how to REBIND your programs.

There are several common REBIND approaches taken by DB2 users. By far, the best approach is to REBIND your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change.

Other approaches include REBINDing only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to REBIND automatically after a regular period of time (days, weeks, months, etc.). This approach can work if the period of time is wisely chosen based on the application data – but it still can pose administrative issues.

The final approach can be summarized as “if it ain’t broke don’t fix it!” This is the worst of the several approaches discussed here. The biggest problem with this approach is that you are penalizing every program in your subsystem for fear that a program or two may have a degraded access path. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data changes.

Of course, the possibility of degraded performance is real – and that is why this approach has been adopted at some sites. The problem is being able to find which statements have degraded. In an ideal world we would be to be able to review the access path changes beforehand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.

Anyway, let’s go back to the best approach again, and that is to REBIND on a regular basis as your data changes. This approach has become known as the Three Rs. To implement this approach you:

Regularly reorganize the data to ensure that it is optimally structured.
Follow that with RUNSTATS to be sure that the reorganized state of the data is reflected in the DB2 Catalog.
And follow that with a REBIND for all the application programs that access the data structures impacted by the REORG and RUNSTATS.

By REBINDing you can generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. Additionally, as DB2 changes are introduced (PTFs, new version/release) optimizer improvements and new access techniques can be incorporated into the access paths. That is, if you never REBIND, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

Adopting the Three R’s approach can pose additional questions. For example, when should you reorganize? In order to properly determine when a REORG is needed you’ll have to look at statistics. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps it should be at least 4 R’s – in other words:

RTS (or RUNSTATS)
REORG
RUNSTATS
REBIND

Without accurate statistics there is little hope that the optimizer will formulate the best access path to retrieve your data. If the optimizer does not have accurate information on the size, organization, and particulars of your data then it will be creating access paths based on either default or inaccurate statistics. Incorrect statistics will cause bad choices to be made – such as choosing a merge-scan join when a nested loop join would be better, or failure to invoke sequential prefetch, or using the wrong index – or no index at all. And the problem of inaccurate statistics is pervasive. There are shops out there that never, or rarely, run RUNSTATS to gather up-to-date statistics. Make sure yours is not one of those shops!

When should you run RUNSTATS? One answer is “As frequently as possible based on how often your data changes.” To do this you will need to know a thing or two about your data growth patterns: what is its make-up, how is it used, how fast does it grow, and how often does it change? These patterns will differ for every table space in your system.

Next we need to decide when to REBIND? The best answer for this is when statistics have changed significantly enough to change access paths. When we know that data has significantly changed it makes sense to REBIND after the RUNSTATS completes. But the trick is determining exactly when we have a “significant” change in our data. Without an automated method of comparing and contrasting statistics (or even better yet, access paths) coming up with an answer in a manual way can be time-consuming and error-prone – especially if we have thousands of DB2 programs to manage.

As we REBIND, we always must be on alert for rogue access paths. A rogue access path is created when the optimizer formulates a new access path that performs worse than the previous access path. This can happen for a variety of reasons. Of course, number one is that the optimizer, though good, is not perfect. So mistakes can happen. Other factors can cause degraded access paths, too. The access paths for volatile tables depend on when you run the RUNSTATS. Volatile tables are those that start out empty, get rows added to them during processing, and are emptied out at the end of the day. And, of course, if the catalog or statistics are not accurate we can get problems, too.

So adopting the Four Rs approach implies that you will have to develop a methodology for reviewing your access paths and taking care of any “potential” problem access paths. Indeed, the Four Rs becomes The Five Rs as we add a step to review the access paths after REBINDing to make sure that there are no rogue access paths:

Start with Real Time Stats (or RUNSTATS) to determine when to REORG.
Then we Reorganize the needy table spaces (and indexes)
After reorganizing, run RUNSTATS to pick up current statistics,
Follow that with the BEBINDs.
Then we need that fifth R – which is to Review the access paths generated by the REBIND.

The review is of utmost importance because the optimizer can make mistakes. And, of course, so can you. But your users will not call you when performance is better (or the same). They only dial your numbers when performance gets worse. As such, proactive shops will put best practices in place to test REBIND results comparing the before and after impact of the optimizer’s choices.

Tuesday, July 08, 2014

DB2 Application Performance Management

Assuring optimal performance for database applications can be a tricky thing. In today's blog I ruminate on the high-level issues involved in optimizing your DB2 for z/OS applications.

Applications that access databases are only as good as the performance they achieve. And every user wants their software to run as fast as possible. As such, performance tuning and management is one of the biggest demands on the DBA’s time. When asked what is the single most important or stressful aspect of their job, DBAs typically respond "assuring optimal performance." Indeed, a Forrester Research survey indicates that performance and troubleshooting tops the list of most challenging DBA tasks.

But when you are dealing with data in a database management system there are multiple interacting components that must be managed and tuned to achieve optimal performance. That is, every database application, at its core, requires three components to operate:

· the system (that is, the DBMS itself, the network, and the O/S),

· the database (that is, the DDL and database schema), and

· the application (that is, the SQL and program logic).

Each of these components requires care and attention, but today I want to focus on the high-level aspects of performance management from the perspective of the application. Furthermore, I will discuss this in terms of DB2 for z/OS.

So where do we begin? For DB2, a logical starting point is with BIND Parameters. There are many parameters and values that must be chosen from and specified when you bind a DB2 application program. The vast array of options at our disposal can render the whole process extremely confusing -- especially if you don’t bind on a daily basis. And even if you do, some of the options still might be confusing if you rarely have to change them. You know what I’m talking about, parameters like ACQUIRE, RELEASE, VALIDATE, and DEGREE.

I will not delve into the myriad bind options and give you advice on which to use when. There are many articles and books, as well as the IBM DB2 manuals that you can use to guide you along that path. Suffice it to say, that there are some standard parameters and values that should be chosen most of the time in most situations. As such, a wise DBA group will set up canned routines for the programmers to use for compiling and binding their applications. Choices such as: CICS transaction, DB2 batch, and BI/analytical query can be presented to the developer and then, based on which of the various types of programs and environments that are available, the canned script can choose the proper bind options. Doing so can greatly diminish the problems that can be encountered when the "wrong" parameters or values are chosen at Bind time.

Before concluding this short section on Bind parameters I want to give one important piece of advice: In production, always Bind your plans and packages specifying EXPLAIN YES. Failing to do so means that access paths will be generated, but you will not know what they are. This is akin to blinding yourself to what DB2 is doing and it makes application performance tuning much more difficult.

Access Path Management

Bind and Rebind are important components to achieve optimal DB2 application performance. This is so because these commands are what determine the access paths to the data requested by your program. So it is vitally important that you create a strategy for when and how to Rebind your programs. There are several common approaches. The best approach is to Rebind your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change. More on this in a moment.

Other approaches include Rebinding only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to Rebind automatically after a regular period of time, whether it is days, weeks, months, or whatever period of time you deem significant. This approach can work if the period of time is wisely chosen based on the application data â€“ but it still can pose significant administrative issues.

The final approach -- the worst of the bunch -- comes from the if it ain’t broke don’t fix it school of thought. Basically, it boils down to (almost) never rebinding your programs. This approach penalizes every program for fear that a single program (or two) might experience a degraded access path. Oh, the possibility of degraded performance is real and that is why this approach has been adopted by some. And it can be difficult to find which statements may have degraded after a Rebind. The ideal situation would allow us to review the access path changes before hand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.

Anyway, let’s go back to the best approach again, and that is to Rebind regularly as your data changes. This involves what is known as the three Rs: REORG, RUNSTATS, and Rebind. At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions.

By Rebinding you will generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. And as you apply changes to DB2 (new releases/PTFs) optimizer improvements and new access techniques can be used. If you never Rebind, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

To adopt the Three R’s you need to determine when to REORG. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps we need 4 R’s:

RUNSTATS or preferably, RTS
REORG
RUNSTATS
REBIND

But is this enough? Probably not because we need to review the access paths after rebinding to make sure that there are no rogue access paths. So, let’s add another R to Review the access paths generated by the REBIND. As we mentioned, the optimizer can make mistakes. And, of course, so can you. Users don't call you when performance is better (or the same). But if performance gets worse, you can bet on getting a call from irate users.

So we need to put in place best practices whereby we test Rebind results to compare the before and after impact of the optimizer’s choices. Only then can we assure that we are achieving optimal DB2 application performance.

Tuning the Code

Of course, everything we’ve discussed so far assumes that the code is written efficiently to begin with -- and that is a big assumption. We also need to make sure that we are implementing efficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include:

Let SQL do the work instead of the program. For example, code an SQL join instead of two cursors using program logic to join.
Simpler is generally better, but complex SQL can be very efficient.
Retrieve only the columns required, never more.
Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate.
When joining tables, always provide join predicates. In other words, avoid Cartesian products.
Favor using Stage 1 and Indexable predicates.
But favor Stage 2 predicates over application logic.
Avoid sorting (if possible) by creating indexes for ORDER BY and GROUP BY operations.
Avoid black boxes -- that is, avoid I/O routines that are called by programs instead of using embedded SQL.
Minimize deadlocks by updating tables in the same sequence in every program.
Issue data modification statements (INSERT, UPDATE, DELETE) as close as possible to the COMMIT statement as possible.
Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems.

Even if you follow the guidelines in this bulleted list, there will still be numerous opportunities for you to tune SQL for performance. To tune SQL you must be able to interpret the output of the access paths produced by EXPLAIN. This information is encoded in the plan tables. IBM offers Data Studio (as a free download) with a visual explain capability that can simplify this process. But you will also have to accumulate experience as to which SQL formulations work more efficiently than others. This skill will come with time and on-the-job learning.

Finally, some attention must be paid to the host language code. Host language code refers to the application programs written in C, COBOL, Java, Visual Basic or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem.

Bottom Line

Although DBAs must understand all three aspects of database performance management concentrating on the application aspects of performance will most likely provide the most bang-for-the-buck. Of course, we have only touched the tip of the DB2 application performance iceberg today. But even this high-level view into application performance can serve as a nice starting place for tuning your DB2 applications.

Good luck with DB2 for z/OS and happy performance tuning!

Monday, June 02, 2014

Don't Neglect Your DB2 Rebind Strategy

We’re all busy. Frequently it can seem like you just got in to the office and already it is past quitting time! There is so much to do and so little time to do it all. And we all work more than 40 hours a week… these are some of the common complaints of the busy DBA.

And those are valid concerns, but it does not diminish the need to properly address DB2 database administration and performance management... with a special focus on proactive management.

So please take a little bit of time to read about, and consider your organization's strategy for rebinding DB2 applications.

REBIND Strategy

One of the most important contributors to the on-going efficiency and health of your DB2 environment is proper management of DB2 access path changes. A thorough REBIND management process is a requirement for healthy DB2 applications.

But many shops do not do everything possible to keep access paths up-to-date with the current state of their data. Approaches vary, such as rebinding only when a new version of DB2 is installed, whenever PTFs are applied to DB2, or to rebind automatically after a regular period of time. Although these methods are workable, they are less than optimal.

The worst approach though is the “if it ain’t broke don’t fix it” mentality. In other words, many DBA groups adopt “never REBIND unless you absolutely have to” as a firm policy. The biggest problem this creates is that it penalizes every program in your subsystem for fear of a few degraded access paths. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data and environment change. Of course, the possibility of degraded performance after a REBIND is real – and that is why many sites avoid regularly rebinding their programs.

Even so, the best approach is to perform regular REBINDs as your data changes. To do so, you should follow the Three R’s. Regularly reorganizing to ensure optimal structure; followed by RUNSTATS to ensure that the reorganized state of the data is reflected in the DB2 Catalog; and finally, rebinding all of programs that access the reorganized structures. This technique can improve application performance because access paths will be better designed based on an accurate view of your data.

Of course, adopting the Three R’s approach raises questions, such as “When should you reorganize?” To properly determine when to reorganize you’ll have to examine statistics. This means looking at either RUNSTATS in the catalog or Real Time Statistics (RTS). So, the Three R’s become the Four 4 R’s – examine the Real Time Stats, REORG database objects as indicated by RTS, RUNSTATS to get the new statistics, then REBIND any impacted application programs.

Some organizations do not rely on statistics to schedule REORGs. Instead, they build REORG JCL as they create each object – that is, create a table space, build and schedule a REORG job, and run it monthly or quarterly. This is better than no REORG at all, but it is not ideal because you are likely to be reorganizing too soon (wasting CPU cycles) or too late (causing performance degradation until REORG).

It is better to base your REORGs off of thresholds on catalog or real-time statistics. Statistics are the fuel that makes the optimizer function properly. Without accurate statistics the optimizer cannot formulate the best access path to retrieve your data because it does not know how your data is currently structured. So when should you run RUNSTATS? One answer is “as frequently as possible based on how often your data changes.” To succeed you need an understanding of data growth patterns – and these patterns will differ for every table space and index.

The looming question is this: why are we running all of these RUNSTATS and REORGs? To improve performance, right? But only with regular REBINDs will your programs take advantage of the new statistics to build more efficient access paths, at least for static SQL applications.

Without an automated method of comparing and contrasting access paths, DB2 program change management can be time-consuming and error-prone – especially when we deal with thousands of programs. And we always have to be alert for a rogue access path – that is, when the optimizer formulates a new access path that performs worse than the previous access path.

Regular rebinding means that you must regularly review access paths and correct any “potential” problems. Indeed, the Four R’s become the Five R’s because we need to review the access paths after rebinding to make sure that there are no problems. So, we should begin with RTS (or RUNSTATS) to determine when to REORG. After reorganizing we should run RUNSTATS again, followed by a REBIND. Then we need that fifth R – which is to Review the access paths generated by the REBIND.

The review process involves finding which statements might perform worse than before. Ideally, the DBAs would review all access path changes to determine if they are better or worse. But DB2 does not provide any systematic means of doing that. There are tools that can help you achieve this though.

The bottom line is that DB2 shops should implement best practices whereby access paths are tested to compare the before and after impact of the optimizer’s choices. By adopting best practices to periodically REBIND your DB2 programs you can achieve better overall application performance because programs will be using access paths generated from statistics that more accurately represent the data. And by implementing a quality review step there should be less need to reactively tune application performance because there will be fewer access path and SQL-related performance problems.

Sunday, September 01, 2013

Top Ten Common SQL Mistakes (with DB2 for z/OS)

There are many different types of development mistakes that can be made when you are coding a complex application system. But probably the most pervasive mistakes being made when developing an application using DB2 for z/OS are SQL mistakes... well, actually, that statement is probably true regardless of the DBMS, but I am going to focus on DB2 for z/OS given that this is a blog that focuses on DB2 and mainframe issues.

Assuming an ORDER BY is not necessary for ordered results - if you want your results always to be returned in a specific order, you must include the ORDER BY clause. If you fail to do so, a future rebind can change the access path and also possibly change the order of your results set.
Forgetting the NULL indicator - when your query can return a null, but sure to include a null indicator or you will get a -305 SQLCODE (22002 SQLSTATE). And be sure to check the null indicator to see if the data is null or not!
Incorrect expectations when using the NOT IN predicate with NULLs - Nulls can be confusing when using the IN predicate. For example, what if we want to find all Colors who are not assigned to a particular Product using a query like shown below. THe problem arises when the P.color can be NULL. The NULL causes the predicate to be UNKNOWN so the results set is always empty.

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
FROM Products AS P);

Coding predicates appropriately in Outer Joins - Sometimes it can be difficult to figure out how to code outer joins appropriately if you don't code them very often. Terry Purcell has written a nice article on the topic that can be found here.
Not coding a cursor for a multi-row result - When more than one row can be returned by your query you must use a cursor (or specify FETCH FIRST 1 ROW ONLY)
Recompiling but not binding - if you make changes to the SQL, you have to BIND it again. Otherwise it won't work!
Forgetting to use single quotes around strings (instead of double quotes) - DB2 SQL expects single quotes around character strings.
Trying to modify a Primary Key column - you cannot modify a primary key. A primary key should be unique within the table and immutable.
Forcing dynamic SQL into static SQL (sometimes hundreds of static SQL statements) - you should analyze the type and nature of your database queries to determine whether they should be static or dynamic. Consider using the advice here (static SQL v. dynamic SQL) to guide you.
Asking for more data than you need (columns and/or rows) - specify only the columns that you actually need in your SELECT-list... and use WHERE clauses (predicates) to filter the data to just that data that you need before bringing it into the program. The lesser the amount of data that DB2 needs to transfer from the database to your program, the more efficient things will be!

Friday, July 26, 2013

Wednesday, June 19, 2013

DB2 Locking, Part 12: Lock Promotion and Escalation

It can be potentially troublesome when DB2 lock promotion and escalation occur. These situations can greatly reduce the availability of concurrent access to data.

First let's look at lock promotion. When binding a program with an ISOLATION level of RR, the optimizer sometimes decides that table space locks will perform better than page locks. As such, the optimizer promotes the locking level to table space locking, regardless of the LOCKSIZE specified in the DDL. This process is called lock promotion.

When you set the LOCKSIZE bind parameter to ANY, DB2 processing begins with page-level locking. As processing continues and locks are acquired, however, DB2 might decide that too many page (or row) locks have been acquired, causing inefficient processing.

In this scenario, DB2 escalates the level of locking from page (or row) locks to table or table space locks—a procedure called lock escalation. The threshold governing when lock escalation occurs is set in one of two ways:

The DSNZPARM start-up parameters for DB2
The LOCKMAX parameter of the CREATE or ALTER TABLESPACE statement (which is stored in the MAXROWS column of SYSIBM.SYSTABLESPACE)

Lock escalation applies to objects defined with LOCKSIZE ANY in the DDL.

There are some limitations on lock escalation, though. A table lock can never be escalated to a table space lock. Table space locks are the highest level of locking and, therefore, cannot be escalated. Furthermore, a row lock will not be escalated to a page lock. It will be escalated to a table space lock.

User lock escalation occurs if a single user accumulates more page locks than are allowed by the DB2 subsystem (as set in DSNZPARMs), the program is informed via a -904 SQLCODE. The program can either issue a ROLLBACK and produce a message indicating that the program should be modified to COMMIT more frequently or, alternately, escalate the locking strategy itself by explicitly issuing a LOCK TABLE statement within the code.

Prior to implementing the second approach, be sure to understand the ramifications of issuing the LOCK TABLE statement and how it works.

Locking DSNZPARMs

There are two DSNZPARM parameters that are used to govern DB2 locking and lock escalation: NUMLKTS and NUMLKUS.

NUMLKTS defines the threshold for the number of page locks that can be concurrently held for any one table space by any single DB2 application (thread). When the threshold is reached, DB2 escalates all page locks for objects defined as LOCKSIZE ANY according to the following rules:

All page locks held for data in segmented table spaces are escalated to table locks.
All page locks held for data in partitioned table spaces are escalated to table space locks.

NUMLKUS defines the threshold for the total number of page locks across all table spaces that can be concurrently held by a single DB2 application. When any given application attempts to acquire a lock that would cause the application to surpass the NUMLKUS threshold, the application receives a resource unavailable message (SQLCODE of -904).

TABLESPACE DDL Locking Parameters

In general, letting DB2 handle the level of locking required can be a fine strategy. Turning over the determination for lock size to DB2 requires setting LOCKSIZE ANY. Of course, you might have a compelling reason to use a specific LOCKSIZE. For example, you might choose to specify LOCKSIZE PAGE to explicitly direct DB2 to lock at the page level. Or, under certain conditions you might choose LOCKSIZE ROW to implement row-level locking.

The LOCKMAX parameter specifies the maximum number of page or row locks that any one process can hold at any one time for the table space. When the threshold is reached, the page or row locks are escalated to a table or table space lock. The LOCKMAX parameter is similar to the NUMLKTS parameter, but for a single table space only.

Monday, November 23, 2009

Reading Things That Aren't There... and Missing Things That Are!

You can shoot yourself in the foot using DB2 if you are not careful. There are options that you can specify that may cause you to read data that is not really in the database. And, alternately, you can set things up so that you miss reading data that is actually in the database.

How, you might be asking? Well, dirty reads will take care of the first one. Specifying ISOLATION(UR) implements read-through locks, which is sometimes referred to as a dirty read. It applies to read operations only. With this isolation level data may be read that never actually exists in the database, because the transaction can read data that has been changed by another process but is not yet committed.

Read uncommitted isolation provides the highest level availability and concurrency of the isolation levels, but the worst degree of data integrity. It should be used only when data integrity problems can be tolerated. Certain types of applications, such as those using analytical queries, estimates, and averages are likely candidates for read uncommitted locking. A dirty read can cause duplicate rows to be returned where none exist or no rows may be returned when one (or more) actually exists. When choosing read uncommitted isolation the programmer and DBA must ensure that these types of problems are acceptable for the application.

OK, so what about not reading data that is in the database? DB2 V9 provides us an option to do just that. In DB2 9 it is possible for a transaction to skip over rows that are locked. This can be accomplished by means of the SKIP LOCKED DATA option within your SQL statement(s). SKIP LOCKED DATA can be specified in SELECT, SELECT INTO, and PREPARE, as well as searched UPDATE and DELETE statements. You can also use the SKIP LOCKED DATA option with the UNLOAD utility.

When you tell DB2 to skip locked data then that data is not accessed and your program will not have it available. DB2 just skip over any locked data instead of waiting for it to be unlocked. The benefit, of course, is improved performance because you will not incur any lock wait time. But it comes at the cost of not accessing the locked data at all. This means that you should only utilize this clause when your program can tolerate skipping over some data.

The SKIP LOCKED DATA option is compatible with cursor stability (CS) isolation and read stability (RS) isolation. But it cannot be used with uncommitted read (UR) or repeatable read (RR) isolation levels. DB2 will simply ignore the SKIP LOCKED DATA clause under UR and RR isolation levels.

Additionally, SKIP LOCKED DATA works only with row locks and page locks. That means that SKIP LOCKED DATA does not apply to table, partition, LOB, XML, or table space locks. And the bigger the lock size, the more data that will be skipped when a lock is encountered. With row locking you will be skipping over locked rows, but with page locking you will be skipping over all the rows on the locked page.

Use both of these features with extreme care and make sure that you know exactly what you are telling DB2 to do. Otherwise, you might be reading more... or less than you want!

Friday, August 21, 2009

Approaches to Access Path Management

BIND and REBIND are important components in assuring efficient DB2 applications. Because the
BIND/REBIND process determines exactly how your DB2 data is accessed, it is important that you develop an appropriate strategy for when and how to REBIND your programs.

There are several common REBIND approaches taken by DB2 users. By far, the best approach is to REBIND your applications over time as your data and systems change. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change.

Other approaches include REBINDing only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to REBIND automatically after a regular period of time (days, weeks, months, etc.). This approach can work if the period of time is wisely chosen based on the application data – but it still can pose administrative issues.

The final approach can be summarized as “if it ain’t broke don’t fix it!” This is the worst of the several approaches discussed here. The biggest problem with this approach is that you are penalizing every program in your subsystem for fear that a program or two may have a degraded access path. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data changes.

Of course, the possibility of degraded performance is real – and that is why this approach has been adopted at some sites. The problem is being able to find which statements have degraded. In an ideal world we would be to be able to review the access path changes beforehand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.

Anyway, let’s go back to the best approach again, and that is to REBIND on a regular basis as your data changes. This approach has become known as the three Rs. To implement this approach you:

Regularly reorganize the data to ensure that it is optimally structured.
Follow that with RUNSTATS to be sure that the reorganized state of the data is reflected in the DB2 Catalog.
And follow that with a REBIND for all the application programs that access the data structures impacted by the REORG and RUNSTATS.

At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions. DB2 is unlikely to make the same access path choice as your data grows – and as patterns within the data change.

By REBINDing you can generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. Additionally, as DB2 changes are introduced (PTFs, new version/release) optimizer improvements and new access techniques can be incorporated into the access paths. That is, if you never REBIND, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

Adopting the Three R’s approach can pose additional questions. For example, when should you reorganize? In order to properly determine when a REORG is needed you’ll have to look at statistics. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, we need to add another R – in other words:

RUNSTATS or better yet, RTS
REORG
RUNSTATS
REBIND

Now it is true that some folks don’t rely on statistics to schedule a REORG. Instead, they just build the JCL to REORG their database objects when they create the object. So they create a table space then build the REORG job and schedule it to run monthly, or quarterly, or on some regular basis. This is better than no REORG at all, but it is probably not the best approach because you are most likely either reorganizing too soon (in which case you waste the CPU cycles to do the REORG) or you are reorganizing too late (in which case performance is suffering for a period of time before the REORG runs). Better to base your REORGs off of statistics and thresholds using either RUNSTATS or RTS.

Without accurate statistics there is little hope that the optimizer will formulate the best access path to retrieve your data. If the optimizer doesn’t have accurate information on the size, organization, and particulars of your data then it will be creating access paths based on either default or inaccurate statistics. Incorrect statistics will cause bad choices to be made – such as choosing a merge-scan join when a nested loop join would be better, or failure to invoke sequential prefetch, or using the wrong index – or no index at all. And the problem of inaccurate statistics is pervasive.

There are shops out there that never, or rarely, run RUNSTATS to gather up-to-date statistics. Make sure yours is not one of those shops!

When should you run RUNSTATS? One answer is “As frequently as possible based on how often your data changes.” To do this you will need to know a thing or two about your data growth patterns: what is its make-up, how is it used, how fast does it grow, and how often does it change? These patterns will differ for every table space in your system.

Next we need to decide when to REBIND? The best answer for this is when statistics have changed significantly enough to change access paths. When we know that data has significantly changed it makes sense to REBIND after the RUNSTATS completes. But the trick is determining exactly when we have a “significant” change in our data. Without an automated method of comparing and contrasting statistics (or even better yet, access paths) coming up with an answer in a manual way can be time-consuming and error-prone – especially if we have thousands of DB2 programs to manage.

As we REBIND, we always must be on alert for rogue access paths. A rogue access path is created when the optimizer formulates a new access path that performs worse than the previous access path. This can happen for a variety of reasons. Of course, number one is that the optimizer, though good, is not perfect. So mistakes can happen. Other factors can cause degraded access paths, too. The access paths for volatile tables depend on when you run the RUNSTATS. Volatile tables are those that start out empty, get rows added to them during processing, and are emptied out at the end of the day. And, of course, if the catalog or statistics are not accurate we can get problems, too.

So adopting the Four R’s approach implies that you will have to develop a methodology for reviewing your access paths and taking care of any “potential” problem access paths. Indeed, the Four R’s becomes the Five R’s as we add a step to review the access paths after REBINDing to make sure that there are no rogue access paths:

Start RTS (or a RUNSTATS) to determine when to REORG.
Reorganize the table spaces (and/or indexes)
After reorganizing, run RUNSTATS again,
Follow that with the REBINDs.
Then we need that fifth R – which is to Review the access paths generated by the REBIND.

The review is of utmost importance because the optimizer can make mistakes. And, of course, so can you. But your users will not call you when performance is better (or the same). They only ring you up when performance gets worse. As such, proactive shops will put best practices in place to test REBIND results comparing the before and after impact of the optimizer’s choices.