Monday, November 02, 2015

IBM Insight 2015 Wrap-Up

Last week I attended the IBM Insight conference and blogged about the first few days of the conference here: http://db2portal.blogspot.com/2015/10/an-update-from-ibm-insight-2015.html. I promised to blog about the remainder of the conference, so here is a synopsis of the highlights.


On Wednesday, the focus of the general session was on IBM’s acquisition of The Weather Company’s technology.  The deal calls for IBM to acquire The Weather Company’s B2B, mobile and cloud-based web properties, including WSI, weather.com, Weather Underground and The Weather Company brand. IBM will not be acquiring The Weather Channel television station, which will license weather forecast data and analytics from IBM under a long-term contract. IBM intends to utilize its newly acquired weather data in its Watson platform.

The deal is expected to close in the first quarter of 2016. Terms were not disclosed.

You can read all about the acquisition in this IBM press release.

I spent some of my time at Insight this year learning more about dashDB, and it is a very interesting technology. Marketed as data warehousing in the cloud, IBM touts four use cases for dashDB: as a standalone cloud data warehouse, as a data store for data scientists, as part of a hybrid data warehouse, and for NoSQL analysis and rapid prototyping.
IBM promotes simplicity, performance, analytics on both traditional and NoSQL data, and polyglot language support as the most important highlights of dashDB. And because it has DB2 BLU under the covers, IBM dashDB not only super-compresses data, but it can also operate on that data without necessarily decompressing it.
Additionally, a big theme of the conference was in-memory technology, and dashDB sports CPU cache capabilities. In fact, I heard several folks at the conference say some variation of “RAM is too slow”… meaning that CPU cache is faster and IBM is moving in that direction.
The bottom line for dashDB is that it offers built-in high availability and workload management capabilities, along with being in-memory optimized and scalable. Worth a look for folks needing a powerful data warehousing platform.
For you DB2 for z/OS folks, IDAA was a big theme of this year’s Insight conference. The latest version, V5.1, adds advanced analytics capabilities and in-database transformation, making mainframe queries that can take advantage of the accelerator faster than ever.
Apache Spark was another pervasive topic this year. It was talked about in multiple sessions and I even had the opportunity to play with it in a hands-on lab. The big news for z folks is that IBM is bringing out a version of Spark for the mainframe that will run on z/OS – it is already supported on zLinux.
Of course, I attended a whole slew of DB2 sessions including SQL coding, performance and administration presentations. Some of the highlights include DB2 11 for LUW being announced, several discussions about dark data, and a lot of information about IBM's Big SQL and how it can be used to rapidly and efficiently access Hadoop (and other unstructured) data using SQL.
I live-tweeted a bunch of highlights of those sessions, too. Indeed, there were too many to include here. If you are interested in catching everything I have to say about a conference, keep reading these blog posts, of course, but you should really follow me on Twitter, too, at http://twitter.com/craigmullins
I also had the honor of delivering a presentation at this year's conference on the changes and trends going on in the world of DB2 for z/OS. Thanks to the 70 or so people who attended my session - I hope you all enjoyed it and learned something, too!
As usual, and well-you-know if you've ever attended this conference before, there was also a LOT of walking to be done. From the hotel to the conference center to the expo hall to lunch to the conference center. But at least there were some signs making light of the situation this year! 
There was a lot of fun to be had at the conference, too. The vendor exhibition hall was stocked with many vendors, big and small, and it seems like they all had candy. I guess that’s what you get when the conference is so close to Halloween! The annual Z party at the House of Blues (for which you need a Z pin to get in – this year’s pin was orange) was a blast and the Maroon 5 concert courtesy of Rocket Software was a lot of fun, too.

If you are looking for a week of database, big data, and analytics knowledge transfer, the opportunity to chat and connect with your peers, as well as some night-time entertainment, be sure to plan to attend next year’s IBM Insight conference (October 23 thru 27, 2016 at the Mandalay Bay in Las Vegas).

Monday, October 26, 2015

An Update from IBM Insight 2015

The annual IBM Insight conference is being held this week in Las Vegas, as usual at the Mandalay Bay Resort and Conference Center. And I have the good fortune to be in attendance.

If you follow me on Twitter (http://www.twitter.com/craigmullins) I am live tweeting many of the sessions I am attending at the conference. But for those of you who are not following along on Twitter, or just prefer a summary, here’s a quick overview of Monday’s highlights.

The day kicked off with the general session keynote, which was delivered by a combination of Jake Porway (of DataKind), IBMers, and IBM customers. The theme of the keynote was analytics and cognitive computing. The emphasis of the event has, in my opinion, kind of shifted from the data to what is being done with the data… in other words, the applications. And that is interesting, because the data is there to support the applications, right? That said, I’m a data bigot from way back, so it was a bit app-heavy for me.

That said, there were some insightful moments delivered during the keynote. Bob Picciano, Senior VP of IBM Analytics, kicked things off by informing the audience that true insight comes from exploiting existing data, dark data, and IoT data with agility driven by the cloud. That’s a lot of buzzwords, but it makes sense! And then executives from Whirlpool, Coca-Cola, and Twitter were brought out to talk about how they derive insight from their data and systems.

Perhaps the most interesting portion of the general session was the Watson discussion, led by Mike Rhodin, Senior VP, IBM Watson. We learned more about cognitive systems and how they get more valuable over time as they learn, which makes them unique in the world of computers and IT. IBM shared that there are more than 50 core technologies used by IBM Watson to deliver its cognitive computing capabilities. It was exciting to learn about systems that reason, never stop learning and drive more value over time.
Additional apps were discussed that taught us about the various ways people choose wine, that nobody starts thinking about ice cream early in the week, and that when the weather changes women buy warmer clothes while men buy beer and chips. You kind of had to be there, I guess!

Of course, this was only the first session of a long day. Additional highlights of the day included a high-level overview of the recently announced (but not yet released) DB2 12 for z/OS, the features that should be in a next generation database, and Gartner’s take on the state of big data and analytics. Let’s briefly address each of these one by one.
First up is DB2 12, which will be available in the ESP (early support program) in March 2016. There are a lot of nice new features that will be available in this new version. We’ll see a lot more in-memory capabilities, which will speed up queries and processing. Buffer pools can be up to 16 TB, even though today’s z systems can support only 10 TB – IBM is planning for the future with that one!

And we’ll continue to see the blurring of the lines between static and dynamic SQL. How? Well, we’ll get RLF for static SQL and plan stability for dynamic SQL in DB2 12. IBM claims that we’ll be able to achieve up to 360 million txns/hour with DB2 12 for z/OS using a RESTful web API. Pretty impressive.

And there will be no skip-level migration for DB2 12... You have to migrate thru DB2 11 to get to 12.

OK, what about the features that should be in a next generation database? According to IBM a next gen DBMS should:
  • Deliver advanced in-memory technology
  • Be fast for both transactional and analytic workloads
  • Provide scalable performance
  • Be available, reliable, resilient, and secure
  • Be simple, intelligent and agile
  • And be easy to deploy and cloud-ready

Sounds about right to me!                                                                                        

And finally, let’s briefly take a look at some of the Gartner observations on big data and analytics. The Gartner presentation was delivered by Donald Feinberg, long-time Gartner analyst on the topic of data and database systems. First of all, Feinberg rejects the term “big data” saying there is no such thing. It is all just data. He went on to claim that “big data” is perhaps the most ambiguous term out there, but it is also the most searched term at Gartner!

Feinberg also rejected the term data lake, saying “There is no such thing as a data lake, it is just a place to dump data.” He warned that it will come back to bite organizations in a few years if they do not take the time to manage, transform, secure, etc. the data in the lake, turning it into a data reservoir instead.

He also made the very interesting observation that BI/analytics was the number 1 priority for CIOs on Gartner’s annual CIO priorities survey and that it has been in that slot for 8 years running. But if analytics was really that much of a priority why haven't they gotten it done yet?

Of course, a lot more happened at IBM Insight today – and many more things were discussed. But I don’t want this blog post to become too unwieldy, so that is all I’m going to cover for now.

I’ll try to cover more later in the week as the conference progresses.


Monday, October 12, 2015

OK, So How Does the CURRENTDATA Parm Work?

There seems to be a LOT of confusion out there in DB2-land about the CURRENTDATA parameter, so I thought it might be helpful to blog about the subject.

So first things first: CURRENTDATA is a BIND parameter that can be specified when you bind or rebind your plans and packages. There are two options that can be specified: YES and NO. The default is NO.
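For example, here is a minimal sketch of specifying CURRENTDATA on a package bind (the collection and program names are just placeholders):

BIND PACKAGE(COLL1) MEMBER(PROG1) ISOLATION(CS) CURRENTDATA(NO)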

But what does it do?

Well, the manuals tell us that CURRENTDATA indicates whether data currency is required for read-only and ambiguous cursors when the ISOLATION(CS) option is used. But that is not very clear... except that CURRENTDATA really only applies with an isolation level of cursor stability.

The general idea here is to enable the developer to exert a level of control by choosing between high performance and up-to-date data. Sure, in a perfect world, everybody would choose both. But in the real world we deal with tradeoffs and the CURRENTDATA parameter allows us to tell DB2 which of these options to prefer.

CURRENTDATA (NO)

The default, CURRENTDATA(NO), is the performance choice, and probably the best choice for most applications. When NO is chosen, you are telling DB2 that completely up-to-date data currency is not required for read-only and ambiguous cursors. That means that DB2 can perform block fetching for distributed, ambiguous cursors.

And that is probably what you want. However, if you are not using pristine coding practices, and your program attempts to DELETE WHERE CURRENT OF against an ambiguous cursor, DB2 will return a negative SQLCODE.

What is an Ambiguous Cursor?

At this point you might be asking “What is an ambiguous cursor?” Generally speaking, an ambiguous cursor is one where DB2 cannot tell if your program needs to use the cursor for updating or deleting. You can avoid ambiguous cursors by coding the FOR READ ONLY or FOR UPDATE OF clause on your cursors to inform DB2 of the program’s intent for the data being accessed.
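For example, here are two minimal sketches using the DB2 sample EMP table (the cursor names and predicate are just illustrative). The first cursor is unambiguously read-only and the second is unambiguously updatable:

DECLARE CSR1 CURSOR FOR
  SELECT EMPNO, LASTNAME
  FROM   DSN81010.EMP
  WHERE  WORKDEPT = 'A00'
  FOR READ ONLY;

DECLARE CSR2 CURSOR FOR
  SELECT EMPNO, SALARY
  FROM   DSN81010.EMP
  WHERE  WORKDEPT = 'A00'
  FOR UPDATE OF SALARY;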

CURRENTDATA (YES)

What about CURRENTDATA(YES)? So far, we have discussed only the NO option, but you can also specify YES. Doing so indicates that the data should be current for read-only and ambiguous cursors. That means DB2 will acquire page (or row) locks to ensure the currency of the data and block fetching for distributed, ambiguous cursors will be curtailed.

Summary

Hopefully this short blog post helps to clear up a muddy topic. Of course, there is a lot more to DB2 locking and performance and I do not want to make it seem like this is the only thing you need to know in terms of the tradeoffs between data currency and performance.


But at least one parameter might be a little bit clearer to you today than it was yesterday…

Tuesday, October 06, 2015

Sorting by Day of the Week in DB2

Today's blog will walk you through a sorting "trick" that you can use when you are dealing with days of the week. A relatively common problem involves sorting a column that contains a day of the week not in alphabetical order, but in the order it appears in the week...

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing the name of the day on which the transaction happened; let’s call this column DAY_NAME. Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, if we write the first query that comes to mind, the results will obviously be sorted improperly:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY DAY_NAME;

The results from this query would be ordered alphabetically; in other words
  1. FRI
  2. MON
  3. SAT
  4. SUN
  5. THU
  6. TUE
  7. WED

And it would be rare to want this type of data sorted in that order, right? The more common need would be to sort the data the way it appears on the calendar.



One solution would be to design the table with an additional numeric or alphabetic column that would allow us to sort that data properly. By this I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. Then we could SELECT DAY_NAME but sort by DAY_NUM.

But this is a bad fix. Not only does it require a database design change, this "fix" also introduces the possibility for the DAY_NUM value and DAY_NAME value to get out of sync... unless we are very careful, or perhaps do not allow DAY_NUM to be changed other than via an INSERT trigger that automatically populates the correct number. But requiring a trigger adds even more complexity to this "fix" which really should indicate to us that it is not a very good proposal.

A better solution uses just SQL and requires no change to the database structures. All you need is an understanding of SQL and SQL functions – in this case, the LOCATE function.  Here is the SQL:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT');

The trick here is to understand how the LOCATE function works: it returns the starting position of the first occurrence of its first argument (the search string) within its second argument (the source string). So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position.

So, if DAY_NAME is WED, the LOCATE function in the above SQL statement returns 10... and so on. To summarize, Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

(Note: Some other database management systems have a function similar to LOCATE called INSTR.) 

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT')/3) + 1


Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week. 
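Putting the two pieces together, a quick sketch that returns both the day name and its computed day-of-week number for the same hypothetical TXN_TABLE might look like this:

SELECT   DAY_NAME,
         INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT')/3) + 1 AS DAY_NUM,
         COL1, COL2
FROM     TXN_TABLE
ORDER BY DAY_NUM;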

You can monkey with the code samples here to modify the results to your liking. For example, if you prefer Monday to be the first day of the week and Sunday the last day of the week, simply change the text string in the LOCATE function as follows:

LOCATE(DAY_NAME, 'MONTUEWEDTHUFRISATSUN')

My intent with this blog posting is actually twofold. Firstly, I wanted to share a method of sorting days-of-the-week... but secondly, I hope to also have piqued your interest in using SQL functions to achieve all types of different processing requirements. DB2 offers quite a lot of functions and it seems to me that many programmers fail to use functions to their fullest.

So if you are ever face-to-face with a difficult query request, step back and take a moment to consider how DB2 functions might be able to help!

Thursday, October 01, 2015

Understanding DB2 SELECT Syntax

Most DB2 programmers think they know how to correctly code simple SQL SELECT statements. And they usually are correct, as long as you keep that adjective “simple” in the assertion. When the statement requires more than SELECT...FROM…WHERE though, problems can ensue.

One of the biggest SELECT-related problems encountered by DB2 users is syntax. To paraphrase Mark Twain, sometimes what people think they know just ain’t so.

How can you find the proper syntax for SELECT statements? The DB2 SQL Reference manual contains all of the syntax for DB2 SQL, but query syntax is separated from the rest of the language. Typically, users go to Chapter 5 of the SQL Reference that contains syntax diagrams, semantic descriptions, rules, and examples of the use of DB2 SQL statements. SELECT INTO is there, but SELECT is not in this section. Well, actually, there is a placeholder page that refers the reader back to Chapter 4. Chapter 4 contains the detailed syntax information and usage details for DB2 queries using SELECT. 


Another potentially confusing aspect of DB2 SELECT is the breakdown of SELECT into three sections: fullselect, subselect, and select-statement. This causes many developers to be confused about which query options are available to the SELECT statements they want to code.

Let's take a look at each of these three sections:

First up is the select-statement, which is the form of a query that can be directly specified in a DECLARE CURSOR statement, or prepared and then referenced in a DECLARE CURSOR statement. It is the thing most people think of when they think of SELECT in all its glory. If so desired, it can be issued interactively using SPUFI. The select-statement consists of a fullselect and any of the following optional clauses: WITH (common table expression), update, read-only, optimize-for, isolation, queryno, and SKIP LOCKED DATA.

A CTE, or common table expression, defines a result table with a table-identifier that can be referenced in any FROM clause of the fullselect that follows. Multiple CTEs can be specified following a single WITH keyword. Each specified CTE can also be referenced by name in the FROM clause of subsequent common table expressions.
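For instance, here is a minimal sketch of a select-statement that defines and then references a CTE (it uses the DB2 sample EMP table; the CTE and column names are just illustrative):

WITH DEPT_AVG (WORKDEPT, AVG_SAL) AS
  (SELECT WORKDEPT, AVG(SALARY)
   FROM   DSN81010.EMP
   GROUP BY WORKDEPT)
SELECT E.EMPNO, E.LASTNAME, D.AVG_SAL
FROM   DSN81010.EMP E,
       DEPT_AVG   D
WHERE  E.WORKDEPT = D.WORKDEPT;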

The next component is a fullselect, which can be part of a select-statement, a CREATE VIEW statement, a materialized query table, a temporary table or an INSERT statement. Basically, a fullselect specifies a result table. A fullselect consists of at least a subselect, possibly connected to another subselect via UNION, EXCEPT or INTERSECT. And ever since DB2 version 9 for z/OS, you can apply either or both ORDER BY and FETCH FIRST clauses. Prior to V9, this sometimes confused folks as they tried to put a FETCH FIRST n ROWS clause or an ORDER BY in a view or as part of an INSERT. That was not allowed! But it is now.

However, a fullselect does not allow any of the following clauses: FOR FETCH ONLY, FOR UPDATE OF, OPTIMIZE FOR, WITH, QUERYNO, and SKIP LOCKED DATA. A fullselect specifies a result table – and none of these aforementioned clauses apply.

This sometimes confuses folks. I recently had a conversation with a guy who swore that at one point he created a view using the WITH UR clause and that it worked. It didn’t when we spoke and I’m sure it never did.

Finally, a subselect is a component of the fullselect. A subselect specifies a result table derived from the result of its first FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next.

I know, this can all seem to be a bit confusing. But think of it this way: in a subselect you specify the FROM to get the tables, the WHERE to get the conditions, GROUP BY to get aggregation, HAVING to get the conditions on the aggregated data, and the SELECT clause to get the actual columns. In a fullselect you add in the UNION to combine subselects and other fullselects. Finally, you add on any optional clauses (as specified earlier) to get the select-statement.
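To make that concrete, here is a hedged sketch (again using the sample EMP table, with an arbitrary salary threshold) in which two subselects are combined by UNION ALL into a fullselect, and the optional clauses at the end turn the whole thing into a select-statement:

SELECT   WORKDEPT, COUNT(*) AS EMP_COUNT      -- subselect #1
FROM     DSN81010.EMP
WHERE    SALARY > 50000
GROUP BY WORKDEPT
HAVING   COUNT(*) > 2
UNION ALL                                     -- fullselect combines subselects
SELECT   'ALL', COUNT(*)                      -- subselect #2
FROM     DSN81010.EMP
WHERE    SALARY > 50000
ORDER BY 1                                    -- fullselect-level ordering
OPTIMIZE FOR 10 ROWS                          -- select-statement-level clauses
WITH UR;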

Now what could be any easier?

Actually, it is not super easy. And if you add in some of the newer SQL capabilities, like OLAP functions or temporal time travel query clauses, it gets even more complicated.


I guess the bottom line is that you really should make sure you have the SQL Reference handy (see link above) if you are doing anything other than simple selecting… because you’ll probably need it.

Monday, September 14, 2015

Are You Heading to Las Vegas for IBM Insight?

The IBM Insight conference is coming up and if you work with DB2, Big Data, analytics, data warehousing, or really, anything at all about enterprise data, then Las Vegas is the place to be the week of October 25 thru 29, 2015.

But what is IBM Insight? Well, you may remember it as the IBM Information on Demand conference, or IOD for short. Yes, IBM has renamed the conference yet again. I'm sure a lot of you can remember when there were a bunch of different "Technical Conferences" like the IMS Tech Conference or the DB2 Tech Conference. Those conferences, as well as several others, all got rolled up into IOD... which is now IBM Insight.


So with that bit of potential confusion out of the way, why should you attend this year's IBM Insight conference? Simply put, there is something of interest for everybody in a data-related profession. 

The conference boasts more than 1600 presentations that run the gamut from DB2 to IMS to Cognos to BI to Big Data to analytics to... well, you get the idea. You can experience technical training, hands-on labs, and industry use cases delivered by over 1000 industry experts.

There will be multiple keynote sessions, focusing in on IBM's important data initiatives including:
  • Data Management
  • Advanced Analytics
  • Cloud
  • Hadoop & Spark
  • Content Management
  • Watson
Last year's event was attended by over 13,000 folks, which gives you a great opportunity to network with your peers and IBMers. Add to that the over 350 exhibitors at the Expo hall and you will be able to view, review, and examine all kinds of interesting software to help you manage your enterprise data.

I also want to promote my presentation at this year's conference, called Not Your Daddy's DB2! I'll talk about the changing landscape of the industry and how DB2 for z/OS has changed (and continues to change) to embrace modern IT. This session will be held on Wednesday, October 28, at 4:00 pm in the South Seas J room. If you're going to the conference, I hope to see you at my presentation.

And, as always, there will be plenty of time to kick back and relax after a long day of networking and learning. On Wednesday evening conference attendees will be treated to a free concert from Maroon 5... and, of course, everything that Las Vegas has to offer can be on your agenda, too.

So what are you waiting for? Register for the 2015 IBM Insight conference today... 

And if you register by September 18th you can get a discounted rate ($300 off).

Tuesday, September 08, 2015

Mainframe Cost Optimization



Modern businesses live in an age of financial austerity, cost containment and cutbacks. It is just a fact of life in many organizations that you need to be constantly vigilant for new ways to reduce costs. If your business relies on the mainframe -- and many of the biggest businesses do -- cost containment is of the utmost importance.
But how to cut costs? Many mainframe support groups are running thin in terms of people - so layoffs don't make a lot of sense. And the software that runs the business can't be cut. Management software supports the business systems, so cutting those may cost more than you save!

But there are things you can do. If you're interested in learning more about IBM MLC software costs, pricing/licensing, mainframe cost optimization, and a solution to dynamically manage your system to reduce software costs, be sure to take an hour out of your busy schedule and join me for a free webinar titled Mainframe Cost Optimization: Pricing, Licensing, the R4HA, and More! 

I'll be delivering this webinar on September 10, 2015 at 2pm EDT.

During this session I'll discuss:
  • The new mainframe pricing options including zCAP, CMP and MWP
  • The disparate moving parts of sub-capacity pricing including the R4HA
  • Methods for controlling R4HA intelligently to reduce monthly software costs.

So click here to register for the webinar and join me on September 10th. 

Friday, September 04, 2015

Influencing the DB2 Optimizer: Part 7 - Miscellaneous Additional Considerations

In this 7th and final installment of this series on influencing the DB2 optimizer's access path choices, we will take a look at a couple of additional things to consider as you work toward improving your SQL performance.

Favor Optimization Hints Over Updating the DB2 Catalog  

Optimization hints to influence access paths are less intrusive and easier to implement than changing data in the DB2 Catalog. However, that does not mean that you should use optimization hints all the time! Do not use optimization hints as a crutch to arrive at a specific access path. Optimization hints are best used when an access path changes and you want to go back to a previous, efficient access path.

Limit Ordering to Avoid Scanning  

The optimizer is more likely to choose an index scan when ordering is important (ORDER BY, GROUP BY, or DISTINCT) and the index is clustered by the columns to be sorted.

Maximize Buffers and Minimize Data Access  

If the inner table fits in 2% of the buffer pool, nested loop join should be favored. Therefore, to increase the chances of nested loop joins, increase the size of the buffer pool (or decrease the size of the inner table, if possible).

Consider Deleting Non-uniform Distribution Statistics

Sometimes non-uniform distribution statistics can cause dynamic SQL statements to fluctuate dramatically in terms of how they perform. To decrease these wild fluctuations, consider removing the non-uniform distribution statistics from the DB2 Catalog.

Although dynamic SQL makes the best use of these statistics, the overall performance of some applications that heavily use dynamic SQL can suffer. The optimizer might choose a different access path for the same dynamic SQL statement, depending on the values supplied to the predicates. In theory, this should be the desired goal. In practice, however, the results might be unexpected. For example, consider the following dynamic SQL statement:

SELECT   EMPNO, LASTNAME
FROM     DSN81010.EMP
WHERE    WORKDEPT = ?

The access path might change depending on the value of WORKDEPT because the optimizer calculates different filter factors for each value, based on the distribution statistics. As the number of occurrences of distribution statistics increases, the filter factor decreases. This makes DB2 think that fewer rows will be returned, which increases the chance that an index will be used and affects the choice of inner and outer tables for joins.

These statistics are stored in the SYSIBM.SYSCOLDIST and SYSIBM.SYSCOLDISTSTATS tables and can be removed using SQL DELETE statements.
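For example, something along these lines could be used to remove the distribution statistics collected for the WORKDEPT column of the sample EMP table (a sketch only; verify the catalog column names for your DB2 version before deleting anything):

DELETE FROM SYSIBM.SYSCOLDIST
WHERE  TBOWNER = 'DSN81010'
AND    TBNAME  = 'EMP'
AND    NAME    = 'WORKDEPT';
-- a similar DELETE against SYSIBM.SYSCOLDISTSTATS handles the partition-level statistics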

This suggested guideline does not mean that you should always delete the non-uniform distribution statistics. My advice is quite to the contrary. When using dynamic SQL, allow DB2 the chance to use these statistics. Delete these statistics only when performance is unacceptable. (They can always be repopulated later using RUNSTATS.)

Collect More Than Just the Top Ten Non-uniform Distribution Statistics   

If non-uniform distribution impacts more than just the top ten most frequently occurring values, you should use the FREQVAL option of RUNSTATS to capture more than 10 values. Capture only as many as will prove to be useful for optimizing queries against the non-uniformly distributed data.
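For example, here is a sketch of a RUNSTATS that collects the 25 most frequently occurring values for a single column (the database and table space names are placeholders for wherever the sample EMP table lives on your system):

RUNSTATS TABLESPACE DSN8D10A.DSN8S10E
  TABLE(DSN81010.EMP)
  COLGROUP(WORKDEPT) FREQVAL COUNT 25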

DB2 Referential Integrity Use

Referential integrity (RI) is the implementation of constraints between tables so that values from one table (the parent) control the values in another (the dependent, or child). A referential constraint between a parent table and a dependent table is defined by a relationship between the columns of the tables. The parent table’s primary key columns control the values permissible in the dependent table’s foreign key columns. For example, in the sample table, DSN8810.EMP, the WORKDEPT column (the foreign key) must reference a valid department as defined in the DSN8810.DEPT table’s DEPTNO column (the primary key).

You have two options for implementing RI at your disposal: declarative and application. Declarative constraints provide DB2-enforced referential integrity and are specified by DDL options. All modifications, whether embedded in an application program or ad hoc, must comply with the referential constraints. Favor using declarative RI as DB2 will then be aware of the relationship and can use that information during access path optimization.

Application-enforced referential integrity is coded into application programs. Every program that can update referentially-constrained tables must contain logic to enforce the referential integrity. This type of RI is not applicable to ad hoc updates.

With DB2-enforced RI, CPU use is reduced because the Data Manager component of DB2 performs DB2-enforced RI checking, whereas the RDS component of DB2 performs application-enforced RI checking. Additionally, rows accessed for RI checking when using application-enforced RI must be passed back to the application from DB2. DB2-enforced RI does not require this passing of data, further reducing CPU time.

In addition, DB2-enforced RI uses an index (if one is available) when enforcing the referential constraint. In application-enforced RI, index use is based on the SQL used by each program to enforce the constraint.

If you must use application RI instead of declarative RI, be sure to also define referential constraints with the NOT ENFORCED keyword. In that case, the constraints will not be enforced by DB2, but will be documented in the DDL. And it gives DB2 additional information that can be used by the Optimizer for query optimization.
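For example, here is a hedged sketch of adding an informational (NOT ENFORCED) referential constraint between the sample EMP and DEPT tables mentioned above (the constraint name is arbitrary):

ALTER TABLE DSN8810.EMP
  ADD FOREIGN KEY EMPDEPT (WORKDEPT)
      REFERENCES DSN8810.DEPT (DEPTNO)
      NOT ENFORCED;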

Summary

Hopefully this 7-part series on influencing DB2 access paths provided you with a nice overview of the options available to you and considerations for their use. If you are interested in learning more about SQL tuning and DB2 performance, consider purchasing the book from which this series was drawn: DB2 Developer's Guide 6th edition.

Happy SQL performance tuning!

Tuesday, August 25, 2015

Influencing the DB2 Optimizer: Part 6 - Using Optimization Hints

So far we have discussed several standard methods of influencing DB2’s access path selection -- in the first 4 parts of this series (1, 2, 3, 4) -- as well as updating DB2 Catalog statistics in part 5. But there is another tool in your DBA arsenal that you can use to force/influence access path selection - optimization hints, which will be the focus of today's post.


Using Optimization Hints to Force an Access Path

Optimization hints can be coded to influence the DB2 Optimizer's choice of access path. Actually, though, this method does not “influence” the access path; instead it directs DB2 to use a specific access path rather than determining a new access path using statistics.

The same basic cautions that apply to modifying DB2 Catalog statistics also apply to optimization hints. Only experienced analysts and DBAs should attempt to use optimization hints. However, optimization hints are much easier to apply than updating DB2 Catalog statistics.

There are several methods of implementing optimization hints. First, you can code optimization hints using the PLAN_TABLE. However, before you can use optimization hints, the DB2 DSNZPARM parameter for optimization hints (OPTHINTS) must be set to YES. If it is set to NO, you cannot use optimization hints.



There are two ways to specify an optimization hint to the PLAN_TABLE:
  • Modify PLAN_TABLE data to use an access path that was previously created by the DB2 optimizer, or
  • INSERT rows into the PLAN_TABLE to create a new access path independently.


In general, favor the first method over the second. It is a difficult task to create an accurate access path in the PLAN_TABLE from scratch. If you do not get every nuance of the access path correct, it is possible that DB2 will ignore the optimization hint and calculate an access path at bind time. However, if you use an access path that was originally created by DB2, you can be reasonably sure that the access path will be valid. Of course, sometimes an access path created for an older version of DB2 will not be valid in a newer version of DB2. 

You should consider using optimization hints for the same reasons you would choose to modify DB2 Catalog statistics or tweak SQL. The general reason is to bypass the access path chosen by DB2 and use a different, hopefully more efficient, access path.

In addition to this reason, optimization hints are very useful as you migrate from release to release of DB2. Sometimes, a new release or version of DB2 can cause different access paths to be chosen for queries that were running fine. Or perhaps new statistics were accumulated between binds, causing access paths to change. By saving old access paths in a PLAN_TABLE, you can use optimization hints to direct DB2 to use the old access paths instead of the new, and perhaps undesirable, access paths caused by the new release or statistics. Of course, ever since DB2 9 for z/OS, you can use the plan management feature to save old access paths across rebinds. This is a more effective approach than saving them yourself.

Always test and analyze the results of any query that uses optimization hints to be sure that the desired performance is being achieved.

Defining an Optimization Hint  

To specify that an optimization hint is to be used, you will have to ensure that the PLAN_TABLE has the appropriate columns in it: 

OPTHINT              VARCHAR(128)  NOT NULL WITH DEFAULT
HINT_USED            VARCHAR(128)  NOT NULL WITH DEFAULT
PRIMARY_ACCESSTYPE   CHAR(1)       NOT NULL WITH DEFAULT

To set an optimization hint, you need to first identify (or create) the PLAN_TABLE rows that refer to the desired access path. You will then need to update those rows in the PLAN_TABLE, specifying an identifier for the hint in the OPTHINT column. For example,

UPDATE PLAN_TABLE
   SET OPTHINT = 'SQLHINT'
WHERE  PLANNO = 50
AND    APPLNAME = 'PLANNAME';

Of course, this is just an example. You may need to use other predicates to specifically identify the PLAN_TABLE rows to include in the optimization hint. Some columns that might be useful, depending on your usage of dynamic SQL and packages, include QUERYNO, PROGNAME, VERSION, and COLLID.

Keep in mind, though, that when you change a program that uses static SQL statements, the statement number might change, causing rows in the PLAN_TABLE to be out of sync with the modified application. For this reason, you should probably choose the newer method of specifying optimization hints, which I will discuss in a moment.

You can use the QUERYNO clause in SQL statements to ease correlation of SQL statements in your program with your optimization hints. Statements that use the QUERYNO clause are not dependent on the statement number. To use QUERYNO, you will need to modify the SQL in your application to specify a QUERYNO, as shown in the following:

SELECT MGRNO
FROM   DSN81010.DEPT
WHERE  DEPTNO = 'A00'
QUERYNO 200;

You can then UPDATE the PLAN_TABLE more easily using QUERYNO and be sure that the optimization hint will take effect, as shown in the following:

UPDATE PLAN_TABLE
   SET OPTHINT = 'SQLHINT'
WHERE  QUERYNO = 200
AND    APPLNAME = 'PLANNAME';

When the PLAN_TABLE is correctly updated (as well as possibly the application), you must REBIND the plan or package to determine if the hint is being used by DB2. When rebinding you must specify the OPTHINT parameter:

REBIND PLAN(PLANNAME) . . . OPTHINT('SQLHINT')

Be aware that the optimization hints may not actually be used by DB2. For optimization hints to be used, the hint must be correctly specified, the REBIND must be accurately performed, and the environment must not have changed. For example, DB2 will not use an access path specified using an optimization hint if it relies on an index that has since been dropped.


Use EXPLAIN(YES) to verify whether the hint was actually used. If the hint was used, the HINT_USED column for the new access path will contain the name of the optimization hint (such as SQLHINT in the previous example).


Optimization Hints and DB2 10 for z/OS

As of DB2 Version 10, applying optimization hints is much easier than it was in prior versions of DB2. You can build the access path repository using the BIND QUERY command. The access path repository contains system-level access path hints and optimization options.

These statement-level optimization hints, also known as instance-based statement hints, are enforced across the entire DB2 subsystem based upon the actual text of the SQL statement.

The access path repository contains information about queries, including the query text, access paths, and optimization options. Using the access path repository, you can save multiple copies of access paths and switch back and forth between different copies of access paths for the same query. The access path repository resides in the DB2 Catalog and is composed of the following tables:

  • The primary table in the access path repository is SYSIBM.SYSQUERY. It contains one row for each static or dynamic SQL query.
  • SYSIBM.SYSQUERYPLAN holds access path details for queries in SYSQUERY. A query can have more than one access path for it.
  • SYSIBM.SYSQUERYOPTS stores miscellaneous information about each query.



The access path repository tables are populated when you issue the BIND QUERY command. Before running BIND QUERY, though, you must first populate the DSN_USERQUERY_TABLE with the query text you want to bind. Table 1 below depicts the columns of the DSN_USERQUERY_TABLE.

Table 1. DSN_USERQUERY_TABLE Columns
QUERYNO: An integer value that uniquely identifies the query. It can be used to correlate data in this table with the PLAN_TABLE.

SCHEMA: The default schema name to be used for unqualified database objects used in the query (or blank).

HINT_SCOPE: The scope of the access plan hint (system-level access plan hint or package-level access plan hint).

QUERY_TEXT: The actual text of the SQL statement.

USERFILTER: A filter name that can be used to group a set of queries together (or blank).

OTHER_OPTIONS: IBM use only (or blank).

COLLECTION: The collection name of the package (option for package-level access plan hints).

PACKAGE: The name of the package (option for package-level access plan hints).

VERSION: The version of the package (option for package-level access plan hints); if '*' is specified, DB2 uses only the COLLECTION and PACKAGE values to look up rows in the SYSIBM.SYSPACKAGE and SYSIBM.SYSQUERY catalog tables.

REOPT: The value of the REOPT BIND parameter: A for REOPT(AUTO), 1 for REOPT(ONCE), N for REOPT(NONE), Y for REOPT(ALWAYS), or blank if not specified.

STARJOIN: An indicator specifying whether star join processing was enabled for the query: Y (star join enabled), N (star join disabled), or blank (not specified).

MAX_PAR_DEGREE: The maximum degree of parallelism (or -1 if not specified).

DEF_CURR_DEGREE: Indicates whether parallelism was enabled: ONE (parallelism disabled), ANY (parallelism enabled), or blank (not specified).

SJTABLES: The minimum number of tables to qualify for star join (or -1 if not specified).

QUERYID: Identifies access plan hint information in the SYSIBM.SYSQUERY and SYSIBM.SYSQUERYPLAN tables.

OTHER_PARMS: IBM use only (or blank).


You can populate the DSN_USERQUERY_TABLE using simple INSERT statements. Consider using a subselect from the DB2 Catalog to assist, such as in the following example: 

INSERT INTO DSN_USERQUERY_TABLE          
  (QUERYNO, SCHEMA, HINT_SCOPE, QUERY_TEXT, USERFILTER,
   OTHER_OPTIONS, COLLECTION, PACKAGE, VERSION, REOPT, 
   STARJOIN, MAX_PAR_DEGREE, DEF_CURR_DEGREE, SJTABLES, 
   OTHER_PARMS)                                        
SELECT SPS.STMTNO, 'SCHEMANAME', 1, SPS.STATEMENT, '', 
       '', SPS.COLLID, SPS.NAME, SPS.VERSION, '', 
       '', -1, '', -1, ''
FROM   SYSIBM.SYSPACKSTMT SPS
WHERE  SPS.COLLID = 'COLLID1'
AND    SPS.NAME = 'PKG1'
AND    SPS.VERSION = 'VERSION1'
AND    SPS.STMTNO = 177;

This particular INSERT statement retrieves data from SYSIBM.SYSPACKSTMT for statement number 177 of VERSION1 of the PKG1 package in collection COLLID1.

You still must populate the PLAN_TABLE with hints. Then you need to run the BIND QUERY command to build the data in the access path repository. Running BIND QUERY  LOOKUP(NO) reads the statement text, default schema, and bind options from DSN_USERQUERY_TABLE, as well as the system-level access path hint details from correlated PLAN_TABLE rows, and inserts the data into the access path repository tables.

After you have populated the access path repository, it is a good idea to delete the statement from the DSN_USERQUERY_TABLE. Doing so ensures that hints are not replaced when you issue subsequent BIND QUERY commands. 
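For example, the overall flow might look something like this (the first line is the DSN subcommand; the DELETE assumes the QUERYNO of 177 used in the earlier INSERT example):

BIND QUERY LOOKUP(NO)

DELETE FROM DSN_USERQUERY_TABLE
WHERE  QUERYNO = 177;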

When a statement-level hint is established, DB2 tries to enforce it for that statement. Hints for static SQL statements are validated and applied when you REBIND the package that contains the statements. Hints for dynamic SQL statements are validated and enforced when the statements are prepared.

To remove hints from the access path repository, use the FREE QUERY command.

Runtime Options Hints

You can also use BIND QUERY to set run-time options for a given SQL statement text. This allows you to control how a particular statement should behave without trying to force its actual access path.
To specify a run-time options hint, INSERT a row into DSN_USERQUERY_TABLE indicating the QUERYNO as you would for an optimization hint. Before running BIND QUERY, however, make sure that the PLAN_TABLE does not contain a corresponding access path for the query.

If the PLAN_TABLE contains an access path when you run BIND QUERY LOOKUP(NO), the SYSIBM.SYSQUERYOPTS table won't be populated. If you err and issue the BIND QUERY when data exists in the PLAN_TABLE, simply DELETE the pertinent PLAN_TABLE data and rerun the BIND QUERY command.

The REOPT, STARJOIN, MAX_PAR_DEGREE, DEF_CURR_DEGREE, SJTABLES, and GROUP_MEMBER columns can be used to set the run-time options using DSN_USERQUERY_TABLE.

Summary

Optimization hints provide a strong option for particularly vexing DB2 tuning problems. Consider investigating their usage for troublesome queries that once ran efficiently (and for which you still have the PLAN_TABLE data) but, for whatever reason, are no longer being optimized effectively.