The Db2 Portal Blog

Friday, November 13, 2015

A Quick and Dirty Guide to DB2 Performance

All performance problems are caused by change. Now as regular readers know, normally I avoid all-encompassing terms like all, always, none and never, but in this case, all is the appropriate term. If your DB2 environment is humming along and things are performing well, then the only thing that will disrupt that is a change of some form.

Change can take many forms, not just a program change. Consider the following ways in which change can be introduced to your DB2 world:

Physical changes to the environment, such as a new CPU, additional memory, new disk, or different storage technology (e.g. SSD).
Changes to system software, such as a new release of a product (for example, QMF, CICS, or GDDM), the alteration of a product (for example, the addition of more or fewer CICS regions or an IMS SYSGEN), or a new product (for example, implementation of DFHSM).
And let's not forget the installation of a new release or version of DB2, which can result in changes in access paths and the utilization of features new to DB2.
An updated or new version of the operating system can also cause DB2 performance problems.
Changes to the DB2 engine from maintenance releases, which can change the DB2 optimizer and other facets of DB2's operation.
Changes in system capacity. More or fewer jobs could be executing concurrently when the performance problem occurs.
Environment changes, such as the implementation of dynamic SQL, web-based programs or the adoption of Data Sharing.
Database changes. This involves changes to any DB2 object, and ranges from adding a new column or an index, to changing table space types, to dropping and re-creating an object.
Changes to the application development methodology, such as usage of check constraints instead of application logic or the use of stored procedures.
Changes to application code.

So regardless of how it may seem at times, performance problems are not caused by magic. Something tangible changes, creating a performance problem in the application, database, or subsystem. The challenge of tuning is to find the source of the change -- the root cause if you will, gauge its impact, and formulate a solution.

Although the majority of your problems will be application-oriented, you will frequently need to explore the tuning opportunities outside of SQL and programming changes when application tuning has little effect.

The following is a quick reference of the possible tuning options for each environment:

To tune z/OS

Change WLM parameters to enhance throughput for DB2, its allied agent, and/or programs.
Modify swappability.
Add memory.
Upgrade CPU.
Add zIIPs and tune SQL/program to take advantage of zIIP processors.
Implement data sharing.
Use an active performance monitor (enables tuning on the fly).

To tune the teleprocessing environments

Change the system generation parameters.
Tune the program definition (PSBs and PPT entries).
Modify the Attachment Facility parameters.
Add or modify resource parameters (e.g. RDO)
Use an active performance monitor (enables tuning on the fly).

To tune the DB2 subsystem

Modify DSNZPARMs, for example to increase log buffers, increase or decrease the number of concurrent users, change lock escalation, and so on.
Issue DISPLAY and ALTER BUFFERPOOL commands to review buffer pool usage characteristics, change buffer pool sizes, and increase or decrease buffer pool thresholds.
Tune the DB2 Catalog, including dropping and freeing objects, executing MODIFY, reorganizing DB2 catalog table spaces and indexes, recovering the DB2 catalog indexes, building additional indexes on the DB2 catalog, tuning disk usage, and implementing data set shadowing.
Perform DSNDB07 tuning.

To tune the DB2 database design

Modify the logical and physical model.
Modify and issue DDL.
Execute ALTER statements.
Ensure that proper parameters are specified.
Implement table changes.
Partition data into universal PBG table spaces.
Spread non-partitioned objects over multiple devices using PIECESIZE.
Add indexes.
REORG tablespaces.
REORG or RECOVER indexes.
Consider or reconsider data compression.
Denormalize the database design.
Consider adding redundant tables.

To tune programs

Perform SQL tuning.
Tune the high-level language (such as COBOL or 4GL).
Use a program restructuring tool.
Run RUNSTATS.
Execute EXPLAIN, modify your code, and REBIND.
Examine the REOPT clause and whether it can help your specific code.
Consider using SQL tweaks and/or hints.
Use the OPTIMIZE FOR n ROWS clause.
Consider activating query parallelism.
Change locking strategies.
Change the DB2 catalog statistics and REBIND.
Use a testing tool to provide what if testing and tuning.
Use a tool to sample the application's address space as it executes.

Of course, this is not an exhaustive list and should only be used as a guideline for the

types of tuning options available to you.

Hopefully this high-level overview of DB2 performance and various tuning options at your disposal has given you some food for thought as you tackle the task of keeping your DB2 environment operating smoothly and efficiently...

Monday, November 02, 2015

IBM Insight 2015 Wrap-Up

Last week I attended the IBM Insight conference and blogged about the first few days of the conference here at http://db2portal.blogspot.com/2015/10/an-update-from-ibm-insight-2015.html… and I promised to blog about the remainder of the conference, so here is a synopsis of the highlights.

On Wednesday, the focus of the general session was on IBM’s acquisition of The Weather Company’s technology. The deal calls for IBM to acquire The Weather Company’s B2B, mobile and cloud-based web properties, including WSI, weather.com, Weather Underground and The Weather Company brand. IBM will not be acquiring The Weather Channel television station, which will license weather forecast data and analytics from IBM under a long-term contract. IBM intends to utilize its newly acquired weather data in its Watson platform.

The deal is expected to close in the first quarter of 2016. Terms were not disclosed.

You can read all about the acquisition in this IBM press release.

I spent some of my time at Insight this year learning more about dashDB and it is a very interesting technology. Marketed as data warehousing in the cloud, IBM touts four use cases for dashDB: standalone cloud data warehouse, as a store for data scientists, for those implementing a hybrid data warehouse, and for NoSQL analysis and rapid prototyping.

IBM promotes simplicity, performance, analytics on both traditional and NoSQL, and polyglot language support as the most important highlights of dashDB. And because it has DB2 BLU under the covers IBM dashDB not only super-compresses data, but it can operate on that data without necessarily decompressing it.

Additionally, a big theme of the conference was in-memory technology, and dashDB sports CPU cache capabilities. In fact, I heard several folks at the conference say some variation of “RAM is too slow”… meaning that CPU cache is faster and IBM is moving in that direction.

The bottom line for dashDB is that it offers built-in high availability and workload management capabilities, along with being in-memory optimized and scalable. Worth a look for folks needing a powerful data warehousing platform.

For you DB2 for z/OS folks, IDAA was a big theme of this year’s Insight conference. The latest version, V5.1, adds advanced analytics capabilities and in database transformation, making your mainframe queries that can take advantage of the accelerator faster than ever.

Apache Spark was another pervasive topic this year. It was talked about in multiple sessions and I even had the opportunity to play with it in a hands-on lab. The big news for z folks is that IBM is bringing out a version of Spark for the mainframe that will run on z/OS – it is already supported on zLinux.

Of course, I attended a whole slew of DB2 sessions including SQL coding, performance and administration presentations. Some of the highlights include DB2 11 for LUW being announced, several discussions about dark data, and a lot of information about IBM's Big SQL and how it can be used to rapidly and efficiently access Hadoop (and other unstructured) data using SQL.

I live-tweeted a bunch of highlights of those sessions, too. Indeed, too many to include here, if you are interested in catching everything I have to say about a conference, keep reading these blog posts, of course, but you should really follow me on Twitter, too at http://twitter.com/craigmullins

I also had the honor of delivering a presentation at this year's conference on the changes and trends going on in the world of DB2 for z/OS. Thanks to the 70 or so people who attended my session - I hope you all enjoyed it and learned something, too!

As usual, and well-you-know if you've ever attended this conference before, there was also a LOT of walking to be done. From the hotel to the conference center to the expo hall to lunch to the conference center. But at least there were some signs making light of the situation this year!

There was a lot of fun to be had at the conference, too. The vendor exhibition hall was stocked with many vendors, big and small, and it seems like they all had candy. I guess that’s what you get when the conference is so close to Halloween! The annual Z party at the House of Blues (for which you need a Z pin to get in – this year’s pin was orange) was a blast and the Maroon 5 concert courtesy of Rocket Software was a lot of fun, too.

If you are looking for a week of database, big data, and analytics knowledge transfer, the opportunity to chat and connect with your peers, as well as some night-time entertainment, be sure to plan to attend next year’s IBM Insight conference (October 23 thru 27, 2016 at the Mandalay Bay in Las Vegas).

Monday, October 26, 2015

An Update from IBM Insight 2015

The annual IBM Insight conference is being held this week in Las Vegas, as usual at the Mandalay Bay Resort and Conference Center. And I have the good fortunate to be in attendance.

If you follow me on Twitter (http://www.twitter.com/craigmullins) I am live tweeting many of the sessions I am attending at the conference. But for those of you who are not following along on Twitter, or just prefer a summary, here’s a quick overview of Monday’s highlights.

The day kicked off with the general session keynote which was delivered by a combination of Jake Porway (of DataKind), IBMers and IBM customers. The theme of the keynote was analytics and cognitive computing. The emphasis, in my opinion, of the event has kind of shifted from the data to what is being done with the data… in other words, the applications. And that is interesting, because the data is there to support the applications, right? That said, I’m a data bigot from way back, so it was a bit app-heavy for me.

That said, there were some insightful moments delivered during the keynote. Bob Picciano, Senior VP of IBM Analytics, kicked things off by informing the audience that true insight comes from exploiting existing data, dark data, and IoT data with agility driven by the cloud. That’s a lot of buzzwords, but it makes sense! And then executives from Whirlpool, Coca-Cola, and Twitter were brought out to talk about how they derive insight from their data and systems.

Perhaps the most interesting portion of the general session was the Watson discussion, led by Mike Rhodin, Senior VP, IBM Watson. We learned more about cognitive systems and how they get more valuable over time as they learn, which makes them unique in the world of computers and IT. IBM shared that there are more than 50 core technologies used by IBM Watson to deliver its cognitive computing capabilities. It was exciting to learn about systems that reason, never stop learning and drive more value over time.

Additional apps were discussed that let us learn about the various ways to choose wine, that nobody starts thinking about ice cream early in the week and that when the weather changes women buy warmer clothes; men buy beer and chips. You kind of had to be there, I guess!

Of course, this was only the first session of a long day. Additional highlights of the day included a high-level overview of the recently announced (but not yet released) DB2 12 for z/OS, the features that should be in a next generation database, and Gartner’s take on the state of big data and analytics. Let’s briefly address each of these one by one.

Firstly, DB2 12, which will be available in the ESP (early support program) in March 2016. There are a lot of nice new features that will be available in this new version. We’ll see a lot more in-memory capabilities which will speed up queries and processing. Buffer pools can be up to 16 TBs, even though today’s z systems can support only 10 TBs – IBM is planning for the future with that one!

And we’ll continue to see the blurring of the lines between static and dynamic SQL. How? Well, we’ll get RLF for static SQL and plan stability for dynamic SQL in DB2 12. IBM claims that we’ll be able to achieve up to 360 million txns/hour with DB2 12 for z/OS using a RESTful web API. Pretty impressive.

And there will be no skip-level migration for DB2 12... You have to migrate thru DB2 11 to get to 12.

OK, what about the features that should be in a next generation database? According to IBM a next gen DBMS should:

Deliver advanced in-memory technology
Be fast for both transactional and analytic workloads
Provide scalable performance
Be available, reliable, resilient, and secure
Be simple, intelligent and agile
And be easy to deploy and cloud-ready

Sounds about right to me!

And finally, let’s briefly take a look at the some of the Gartner observations on big data and analytics. The Gartner presentation was delivered by Donald Feinberg, long-time Gartner analyst on the topic of data and database systems. First of all, Feinberg rejects the term “big data” saying there is no such thing. It is all just data. He went on to claim that “big data” is perhaps the most ambiguous term out there, but it is also the most searched term at Gartner!

Feinberg also rejected the term data lake, saying “There is no such thing as a data lake, it is just a place to dump data.” He warned that it will come back to bite organizations in a few years if they do not take the time to manage, transform, secure, etc. the data in the lake, turning it into a data reservoir, instead.

He also made the very interesting observation that BI/analytics was the number 1 priority for CIOs on Gartner’s annual CIO priorities survey and that it has been in that slot for 8 years running. But if analytics was really that much of a priority why haven't they gotten it done yet?

Of course, a lot more happened at IBM Insight today – and many more things were discussed. But I don’t want this blog post to become too unwieldy, so that is all I’m going to cover for now.

I’ll try to cover more later in the week as the conference progresses.

Monday, October 12, 2015

OK, So How Does the CURRENTDATA Parm Work?

There seems to be a LOT of confusion out there in DB2-land about the CURRENTDATA parameter, so I thought it might be helpful to blog about the subject.

So first things first: CURRENTDATA is a BIND parameter that can be specified when you bind or rebind your plans and packages. There are two options that can be specified: YES and NO. The default is NO.

But what does it do?

Well, the manuals tell us that CURRENTDATA indicates whether data currency is required for read-only and ambiguous cursors when the ISOLATION(CS) option is used. But that is not very clear... except that CURRENTDATA really only applies with an isolation level of cursor stability.

The general idea here is to enable the developer to exert a level of control by choosing between high performance or up-to-date data. Sure, in a perfect world, everybody would choose both. But in the real world we deal with tradeoffs and the CURRENTDATA parameter allows us to tell DB2 which of these options to prefer.

CURRENTDATA (NO)

The default, CURRENTDATA(NO), is the performance choice, and probably the best choice for most applications. When NO is chosen, you are telling DB2 that completely up-to-date data currency is not required for read-only and ambiguous cursors. That means that DB2 can perform block fetching for distributed, ambiguous cursors.

And that is probably what you want. However, if you are not using pristine coding practices, and your program tries attempts to DELETE WHERE CURRENT OF against an ambiguous cursor, DB2 will return a negative SQLCODE.

What is an Ambiguous Cursor?

At this point you might be asking “What is an ambiguous cursor?” Generally speaking, an ambiguous cursor is one where DB2 cannot tell if your program needs to use the cursor for updating or deleting. You can avoid ambiguous cursors by coding the FOR READ ONLY and FOR UPDATE OF clause on your cursors to inform DB2 of the program’s future intent for the data being accessed.

CURRENTDATA (YES)

What about CURRENTDATA(YES)? So far, we have discussed only the NO option, but you can also specify YES. Doing so indicates that the data should be current for read-only and ambiguous cursors. That means DB2 will acquire page (or row) locks to ensure the currency of the data and block fetching for distributed, ambiguous cursors will be curtailed.

Summary

Hopefully this short blog post helps to clear up a muddy topic. Of course, there is a lot more to DB2 locking and performance and I do not want to make it seem like this is the only thing you need to know in terms of the tradeoffs between data currency and performance.

But at least one parameter might be a little bit clearer to you today than it was yesterday…

Tuesday, October 06, 2015

Sorting by Day of the Week in DB2

Today's blog will walk you through a sorting "trick" that you can use when you are dealing with days of the week. A relatively common problem involves sorting a column that contains a day-of-the-week not in alphabetical order by in the order it appears in the week...

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing the name of the day on which the transaction happened; let’s call this column DAY_NAME. Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, if we write the first query that comes to mind, the results will obviously be sorted improperly:

SELECT DAY_NAME, COL1, COL2 . . .

FROM TXN_TABLE

ORDER BY DAY_NAME;

The results from this query would be ordered alphabetically; in other words

And it would rare to want this type of data sorted in that order, right? The more common need would be to sort the data the way it appears on the calendar.

One solution would be to design the table with an additional numeric or alphabetic column that would allow us to sort that data properly. By this I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. Then we could SELECT DAY_NAME but sort by DAY_NUM.

But this is a bad fix. Not only does it requires a database design change, this "fix" also introduces the possibility for the DAY_NUM value and DAY_NAME value to get out of sync... unless we are very careful, or perhaps do not allow DAY_NUM to be changed other than via an INSERT trigger that automatically populates the correct number. But requiring a trigger adds even more complexity to this "fix" which really should indicate to us that it is not a very good proposal.

A better solution uses just SQL and requires no change to the database structures. All you need is an understanding of SQL and SQL functions – in this case, the LOCATE function. Here is the SQL:

SELECT DAY_NAME, COL1, COL2 . . .

FROM TXN_TABLE

ORDER BY LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME);

The trick here is to understand how the LOCATE function works: it returns the starting position of the first occurrence of one string within another string. So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position.

So,if DAY_NAME is WED, the LOCATE function in the above SQL statement returns 10... and so on. To summarize, Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

(Note: Some other database management systems have a function similar to LOCATE called INSTR.)

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME)/3) + 1;

Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week.

You can monkey with the code samples here to modify the results to your liking. For example, if you prefer Monday to be the first day of the week and Sunday the last day of the week, simply change the text string in the LOCATE function as follows:

LOCATE('MONTUEWEDTHUFRISATSUN',DAY_NAME)

My intent with this blog posting is actually twofold. Firstly, I wanted to share a method of sorting days-of-the-week... but secondly, I hope to also have piqued your interest in using SQL functions to achieve all types of different processing requirements. DB2 offers quite a lot of functions and it seems to me that many programmers fail to use functions to their fullest.

So if you are ever face-to-face with a difficult query request, step back and take a moment to consider how DB2 functions might be able to help!