The Db2 Portal Blog

Tuesday, February 27, 2007

Dynamic SQL and Performance

The performance of dynamic SQL is one of the most widely debated DB2 issues. Some shops try to avoid it, but in this day and age of ERP/CRM systems, Java/JDBC programming, and web applications, all of which heavily rely on dynamic SQL, its use is becoming more ubiquitous.

Still, many shops that allow dynamic SQL try to place strict controls on its use. But as new and faster versions of DB2 are released, many of the restrictions on dynamic SQL use can be eliminated.

I suppose that you can still find some valid reasons for prohibiting dynamic SQL. For example, you should avoid dynamic SQL when the dynamic SQL statements are just a series of static SQL statements in disguise. Consider an application that needs two or three predicates for one SELECT statement that is otherwise unchanged. Coding three static SELECT statements can be more efficient than coding one dynamic SELECT with a changeable predicate. The static SQL takes more time to code but probably less time to execute.

Another reason for avoiding dynamic SQL is that it can require more overhead to process than equivalent static SQL. Dynamic SQL incurs overhead because the cost of the dynamic bind, or PREPARE, must be added to the processing time of all dynamic SQL programs. But this overhead is not quite as costly as many people seem to think it is.

To determine the cost of a dynamic bind, consider running some queries using SPUFI with the DB2 Performance trace turned on. Then examine the performance reports or performance monitor output to determine the elapsed and TCB time required to perform the PREPARE. The results should show elapsed times less than 1 second and subsecond TCB times. The actual time required to perform the dynamic prepare will vary with the complexity of the SQL statement. In general, the more complex the statement, the longer DB2 will take to optimize it. So be sure to test SQL statements of varying complexity.

Of course, the times you get will vary based on your environment, the type of dynamic SQL you use, and the complexity of the statement being prepared. Complex SQL statements with many joins, table expressions, unions, and subqueries take longer to PREPARE than simple queries. However, factors such as the number of columns returned or the size of the table being accessed have little or no effect on the performance of the dynamic bind.

And prepared dynamic SQL can be cached in the EDM pool so that the same SQL statement can reuse the mini plan for the statement the next time it runs. Of course, the statement has to be exactly the same in order to benefit from the dynamic SQL cache.

Keep in mind, too, that performance is not the only factor when deciding whether or not to use dynamic SQL. For example, if a dynamic SQL statement runs a little longer than a static SQL statement but saves days of programming cost then perhaps dynamic SQL is the better choice. It all depends on what is more important -- the cost of development and maintenance or squeezing out every last bit of performance at any cost.

Overhead issues notwithstanding, there are valid performance reasons for favoring dynamic SQL, too. For example, dynamic SQL can enable better use of indexes, choosing different indexes for different SQL formulations. Properly coded, dynamic SQL can use the column distribution statistics stored in the DB2 catalog, whereas static SQL is limited in how it can use these statistics. Use of the distribution statistics can cause DB2 to choose different access paths for the same query when different values are supplied to its predicates.

The REOPT bind parameter can be used to allow static SQL containing host variables or special registers to behave like incremental-bind statements. When dynamic reoptimization is activated, a dynamic bind similar to what is performed for dynamic SQL is performed. This means that these statements get compiled at the time of EXECUTE or OPEN instead of at BIND time. During this compilation, the access plan is chosen, based on the real values of these variables.

Specifying REOPT ONCE causes the access plan to be cached after the first OPEN or EXECUTE request and it is used for subsequent execution of this statement. With REOPT ALWAYS, the access plan is regenerated for every OPEN and EXECUTE request, and the current set of host variable, parameter marker, and special register values is used to create this plan.

Additionally, consider that the KEEPDYNAMIC bind option can enhance the performance of dynamic SQL. When a plan or package is bound specifying KEEPDYNAMIC(YES), the prepared statement is maintained across COMMIT points. Contrast that with KEEPDYNAMIC(NO), where only cursors using the WITH HOLD option keep the prepared statement after a COMMIT.

Dynamic SQL usually provides the most efficient development techniques for applications with changeable requirements (for example, numerous screen-driven queries). In addition, dynamic SQL generally reduces the number of SQL statements coded in your application program, thereby reducing the size of the plan and increasing the efficient use of system memory.
So, if you have a compelling reason to use dynamic SQL, then by all means, go ahead and code up your program to use dynamic SQL. I mean, after all, it is no longer the early, dark days of DB2 when dynamic SQL almost certainly meant performance problems. And, as I mentioned in the beginning, dynamic SQL is likely to be foisted on you in your new, more modern applications even if you continue to desperately keep it out of your COBOL programs.

For more in-depth details on dynamic SQL, consider downloading the following IBM redbook: DB2 for z/OS and OS/390 : Squeezing the Most Out of Dynamic SQL.

Wednesday, February 14, 2007

A Report From SHARE

As regular readers of this blog know, I regularly attend and speak at the SHARE user conferences held twice annually. The Spring 2007 SHARE conference is being held this week, February 12 thru 16, 2007, in Tampa, Florida. It is now Wednesday, mid-way through the week, and time for me to post a quick update of what is going on at SHARE, from a DB2 point of view.

First of all, I’ve been privileged to deliver two presentations this week. The first is one of the perennial favorites called “A DB2 Performance Tuning Roadmap.” I’ve been delivering this presentation now, in various forms and with various changes, for about 8 years. And every time I think it is time to retire it, I hear from a user group or a conference that is interested in having me present it. And every time I get good feedback. I guess that is not so surprising, I mean, after all, performance is always one of those topics to raise interest among all DBAs and many developers. This particular presentation works as an overview of the three basic components of database performance management: that is, you need to manage and tune the application/SQL, the database objects, and the DBMS/subsystem. All three are integral to optimizing DB2 performance and the presentation walks through the basics of each of the three areas and offers guidance and suggestions.

The second presentation I delivered is called “Change Control for DB2 Access Paths” and it is a bit newer. The premise of this presentation is that organizations need to REBIND for performance, but they also need to control change in their production environment. In the presentation I discuss the issues involved in BIND/REBIND management and then introduce how NEON Enterprise Software’s Bind ImpactExpert can provide a solution.

OK, so much for my presentations, what else is happening DB2-wise at SHARE this week? Well, a lot of the presentations are by IBMers focusing on DB2 9 for z/OS. Even though this new version has not been officially released yet, IBM is getting the word out early so we all can understand the benefits of this new release – and also, so we can understand that it is not the goliath that V8 was (nay, let’s say is).

Roger Miller has offered a couple of great presentations on migration and performance issues, and Willie Favero spoke about all of the great new SQL features and functionality that will be available in DB2 9. Then there was the update on the DB2 utilities and what is happening to them in DB2 9… And there is a lot of good news – to net it out, more online utilities, more availability, and less CPU.

If you’re an IMS person, then there has been a lot of coverage of IMS Version 10 here, too. I haven’t personally attended many of the IMS sessions, but I’ve gathered handouts that cover IMS V10 dynamic resource definition, transaction manager enhancements, and database and DBRC enhancements.

And the zNextGen program, that is spearheaded by NEON Enterprise Software's own Kristine Harper, offered an interesting program this week as well. One of the highlights was a fun presentation by Cheryl Watson, a well-reknowned expert in the mainframe world.

Basically, there is a lot of education going on in Tampa this week for mainframe database zealots… and I’m glad I’m hear to soak it up. If you weren’t able to make it to Tampa, consider trying to get to the Fall 2007 SHARE conference in San Diego, CA this August.

Monday, February 05, 2007

See You at SHARE?

Well, it is that time of the year again... that is, time for the first of two annual SHARE conferences. The "Spring 2007" SHARE will be held next week, in Tampa, FL.

For those of you who don't know about SHARE, it is an independent, volunteer-run association for IBM customers. It is not only mainframe-focused, but it is one of the last events mainframe folks can attend to hear about mainframe topics from z/OS to JES and from CICS to IMS to DB2.

I will be presenting on Monday (2/12) and Tuesday (2/13). On Monday I will cover a DB2 Performance Tuning Roadmap - basically presenting a ahigh-level overview of the "things" you need to be aware of, and track, in order to assure an optimally performing DB2 environment. And then on Tuesday I'll be addressing change control for DB2 access paths. As you know, change is strictly controlled on the mainframe, but one exception has been DB2 access paths. Binds and Rebinds are typically done in the production environments without the benefit of oversight or prior testing. This lack of change control results in unpredictable performance impacts. This presentation will discuss the issues and problems this causes, as well as how you can use Bind ImpactExpert to analyze and manage access path changes in your DB2 environment.

And I'll be attending the event all week long, so I'll be sure to post a blog or two about interesting things I learn at the show... so get yourself to Tampa if you can, but if you can't, be sure to tune in during the week next week to hear about some of the highlights (at least my take on the highlights) of SHARE.

Wednesday, January 31, 2007

A Quick Intro to DB2 SQL Functions

The last post I made to this blog used functions to translate data from one form to another. Functions frequently can be used to overcome issues when you are reading data from your DB2 tables. With that in mind, I thought it might make sense to go over DB2 SQL functions at a basic level - as a review for those who might not have used them before and to refresh the memory of those who have.

Functions can be specified in SQL statements to transform data from one state to another. Two types of functions can be applied to data in a DB2 table using SQL: column functions and scalar functions. Column functions analyze a group of rows and compute a single value for a designated column or expression. For example, the SUM function can be used to add, returning the sum of the values instead of each individual value:

SELECT SUM(SALARY)
FROM EMP
WHERE WORKDEPT = 'A01';

This query will return the total salary for all employees in the A01 department. This is a better solution than writing a program to read all employee rows in department A01 and then adding the salaries. With the SUM function less data is returned to the program, so it will operate much more efficiently.

The second type of function is a scalar function. Scalar functions are applied to a column or expression and operate on a single value. For example, the DAYOFWEEK function reads a single date value and returns an integer in the range of 1 to 7; where 1 is Sunday and 7 is Saturday:

SELECT DAYOFWEEK(HIREDATE)
FROM EMP
WHERE EMPNO = '000100';

This query will return an integer between 1 and 7 that indicates the day of the week when employee number 000100 was hired. Of course, this is just one example of many, many scalar functions supported by DB2. For a complete list of the available functions, refer to Chapter 3 of the IBM DB2 SQL Reference manual. Using scalar functions is generally more efficient than writing your own code to do the same thing.

Speaking of writing your own code, though, you can write your own user-defined functions (UDFs) in addition to the base, system-defined functions. UDFs give you the ability to write your own customized DB2 functions and standardize on that code. Consider creating UDFs for specific algorithms and processing that need to be executed in multiple applications in your organization. By writing the code once, putting it into a DB2 UDF, and then calling it whenever you need it, you will minimize errors and improve reusability.

A user-defined function can be specified anywhere a system-defined function can be specified.

Sunday, January 28, 2007

Sorting Out a Solution in DB2

Sometimes an application can require different, perhaps even somewhat "odd" data sorting. These needs may cause developers to sit and scratch their head for hours trying to make DB2 do something that seems "unnatural." But often you can conjure up an answer by properly understanding the problem and applying some creative SQL.

Some of you might be asking "What the heck is he talking about?" That's a fair question, so let’s look at an example to clarify.

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing the name of the day on which the transaction happened; let’s call this column DAY_NAME.

Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, if we write the first query that comes to mind the results will obviously be sorted improperly:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY DAY_NAME;

The results from this query would be ordered alphabetically; in other words

FRI
MON
SAT
SUN
THU
TUE
WED

This is what I mean be an irregular sorting requirement. The example may not be an everyday need, but it is not unrealistic for a business to have this, or a similar requirement that needs a different sort order than strictly alphabetical or numeric. So what is the solution here?

Of course, one solution would be to design the table with an additional numeric or alphabetic column that would sort properly. By this I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. But this requires a database design change, and it becomes possible for the DAY_NUM and DAY_NAME to get out of sync.

There is another solution that is both elegant and does not require any change to the database. To implement this solution all you need is an understanding of SQL and SQL functions – in this case, the LOCATION function. Here is the SQL:

SELECT DAY_NAME, COL1, COL2 . . .
FROM TXN_TABLE
ORDER BY LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT');

The trick here is to understand how the LOCATE function works: it returns the starting position of the first occurrence of one string within another string.

So, in our example SQL, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position. So, if DAY_NAME is WED, the LOCATE function in the above SQL statement returns 10. Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

(Note: Other database systems have a similar function called INSTR.)

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT')/3) + 1;

Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week.

With a sound understanding of the features of DB2 SQL, many requirements that seem "odd" at first glance, are achievable using nothing more than SQL and your imagaination.