Tuesday, February 02, 2016

The Most Misunderstood Features of DB2 – Part 2: Optimize vs. Limited Fetch

Welcome to Part 2 in my on-going blog series on The Most Misunderstood Features of DB2. In Part 1 of the series we tackled the topic of locking, which IMHO is easily the most misunderstood feature of DB2 (probably of most DBMSes).

Today's topic is a brief one, but one that I've found folks to be confused about. Namely, the difference between the OPTIMIZE FOR and FETCH x ROWS ONLY clauses.

The OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY clauses are similar when it comes to influencing the optimizer. Both give the optimizer a hint about the scale of the query (that is, the number of rows to expect when optimizing the access path to the data).

But, the two clauses have different purposes. If all you want to do is influence the optimizer then you should use OPTIMIZE FOR n ROWS as its usage does not limit the number of rows that can be fetched when the statement is executed. For example, you might specify OPTIMIZE FOR 1 ROW on a SELECT statement. But if 500 rows qualify then you can FETCH all 500 rows at run time. 

(Note: the specific numbers used in the previous paragraph are not important; it is important only that you know the number of rows specified in the OPTIMIZE clause does not inhibit the number of rows that can later be accessed by the query/program.)

The FETCH FIRST n ROWS ONLY clause is different. It actually limits the number of rows that can be fetched when the query is run. If you specify the clause FETCH FIRST 5 ROWS ONLY, then only 5 rows can be fetched - even if 100 or 1000 or more actually qualify. And yes, the number of rows you specify for FETCH FIRST will be provided to the optimizer for access path formulation, but it also will change the way the query behaves when it runs. 

So, when you write a SELECT statement that qualifies a large number of rows, but you only need a few rows at run time, it is a good idea to use FETCH FIRST n ROWS ONLY. On the other hand, if you are simply trying to tweak query performance by getting the optimizer to consider a different number of qualifying rows, then you should consider using OPTIMIZE FOR n ROWS.
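
To make the distinction concrete, here is a minimal sketch of both clauses side by side (the EMP table and its columns are hypothetical examples):

-- Hint only: helps the optimizer, but every qualifying row can still be fetched
SELECT   EMPNO, LASTNAME
FROM     EMP
ORDER BY LASTNAME
OPTIMIZE FOR 10 ROWS;

-- Hard limit: at most 10 rows will ever be returned at run time
SELECT   EMPNO, LASTNAME
FROM     EMP
ORDER BY LASTNAME
FETCH FIRST 10 ROWS ONLY;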

The two statements are quite different and it is important that developers not confuse the two!

Tuesday, January 26, 2016

The Most Misunderstood Features of DB2 – Part 1: Locking


Today I am introducing a new series of blog posts here on misunderstood DB2 features and functions. But before I start this blog post I want to emphasize that this is just my opinion. I’m sure many of you have your own ideas of the DB2 features that are most misunderstood. But please, take a moment to consider my thoughts here… and then share your own in the comments section below!

Locking!

One of the most misunderstood features of DB2 is how locking works. This is probably to be expected because database locking is a rather complex topic and it is constantly morphing with each and every new version of DB2.

First of all you have to understand the lock options available and how DB2 locks at each level: table space, table, page and row. You need to understand share and exclusive locks and that the manner in which DB2 takes them depends upon the SQL statements you issue, the program bind options you choose, and even, in some cases, the DDL options you’ve chosen.

And more. What about claims and drains? Do you know the difference between locks and latches? What about the difference between lock suspension, timeouts, and deadlocks?

Then there is the whole issue of lock avoidance and what that means. I visit a lot of shops each year and I still see a lot of ambiguous cursors. Simply adding FOR READ ONLY to all of the cursors in your program where you will not be modifying the data can make a world of performance difference because DB2 can avoid locking.
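
For example, a cursor that will never be used to update or delete data can be declared unambiguously like this (a minimal sketch with hypothetical table and column names):

DECLARE CSR1 CURSOR FOR
  SELECT   DEPTNO, DEPTNAME
  FROM     DEPT
  WHERE    ADMRDEPT = 'A00'
  FOR READ ONLY;   -- removes the ambiguity, so DB2 can apply lock avoidance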

There is also the frequently-ignored need for a commit strategy. Every batch program should have a commit strategy implemented whereby modifications are committed to the database and locks are released. Failing to do this causes other programs to timeout waiting for locks. I wrote about this in the past and I call it Bachelor Programming Syndrome.

Another locking issue is the whole dirty read everywhere problem. Programmers learn that they can improve performance by adding WITH UR to the end of the SQL statements in their programs. UR, or Uncommitted Read (aka dirty read), causes DB2 to not take locks. But that can cause problems because the program is reading uncommitted data: the data it is reading may never actually exist in the database (it could be rolled back) and may not be consistent because it is not reading only committed units of work. I’ve visited shops that use WITH UR everywhere and they insist that they have not encountered any problems. To which I always add, “…yet.”
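
As a quick illustration (with a hypothetical table), a dirty read looks like this:

SELECT ACCT_NO, BALANCE
FROM   ACCT
WHERE  ACCT_STATUS = 'A'
WITH UR;   -- no locks taken; results may include uncommitted changes that later get rolled back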

Then there are the new features added in the last few releases of DB2, including improved lock efficiencies when accessing the DB2 Catalog, the ability to access currently committed data, optimistic locking, skipping locked data, and more.

Summary

Yes, it is easy to misunderstand locking, and many do so. Taking the time to bolster your knowledge of how DB2 locks, the locking options available, and how they work can do you and your shop a world of good.


And be sure to check back here for future blogs about misunderstood DB2 features!

Monday, January 11, 2016

The Impact of Data Volume on Operational Databases

Operational databases are growing in size for many reasons, not the least of which is the growing importance of big data and analytics projects. There is the overarching trend of more and more data being generated every year. But also, there is the growing need to store more data for longer periods of time due to regulatory and compliance issues. Some organizations and businesses have encountered the need to store certain types of data for 100 years or more (as this video and this storage project point out).

But I doubt that I really need to convince you that your databases are growing in size. Most DBAs experience the reality of data growth every day.

As data volumes expand, it impacts operational databases in two ways:
  1. additional data stresses transaction processing and can cause performance slowdowns; and
  2. database administration tasks are negatively impacted.
In terms of performance, the more data in the operational database, the less efficient transactions running against that database tend to be. Table scans must reference more pages of data to return a result. Indexes grow in size to support larger data volumes, causing access by the index to degrade because there are more levels to traverse to return an answer. Such performance impacts are causing many companies to seek solutions that offload older data to either reference databases or to archive data stores.

The other impact, database administration complexity, causes longer processing time and outages to perform such functions as backups, unloads, reorganizations, recoveries, and disaster recoveries.  The larger the underlying data sets for your tables and table spaces, the longer it takes to run administrative utilities for them. In many cases the lengthened outages can become unacceptable, causing companies to again seek ways to lighten up the operational databases... or perhaps acquire next generation utility technology that understands the reality of large DB2 database objects.

But even though we want to keep all of that additional data, there is no reason that it necessarily has to be stored in operational databases that run the business. For many reasons, you probably want to separate active data from historical data. 

Some companies create purge jobs for all (or many) of their tables to remove data from the production databases as it ages. This can be an acceptable approach to reduce the size of your operational databases. But it also means that the data, which you might want to keep for analytical purposes, is lost. Another approach is to archive the data. Archiving data and purging data are two different processes. When data is purged, it is removed from the operational database and discarded. But archived data is removed from the operational database and maintained in an archive data store. The archive might be a flat file, another relational table, or HDFS (Hadoop).
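
A rough sketch of the difference, assuming a hypothetical TXN_TABLE with a TXN_DATE column and a matching TXN_ARCHIVE table:

-- Archive: copy the aging rows to the archive data store, then remove them
INSERT INTO TXN_ARCHIVE
  SELECT * FROM TXN_TABLE
  WHERE  TXN_DATE < CURRENT DATE - 7 YEARS;

DELETE FROM TXN_TABLE
WHERE  TXN_DATE < CURRENT DATE - 7 YEARS;

-- Purge: just the DELETE, and the historical data is gone for good

In practice you would break this up into manageable units of work with commits, but the idea is the same.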

The bottom line is that it makes sense for us, as DBAs, to keep an eye on the size of our operational databases and take action when production workload is impacted.

Monday, January 04, 2016

A Lot of Extraneous Data Sets?

In a recent blog post here I talked about a quick and dirty method of converting your partitioned table spaces from index-controlled to table-controlled. If you haven't read that post, take a moment to click over and read it here: Easily Convert to Table-Controlled Partitioning.

The reason I bring this up today is that I received an interesting e-mail from a long-time friend and DB2 DBA who read the post and had some information to share with me. He told me about how his organization used one of my tips to drop unused indexes as part of this process.

He told me that during the conversion process they dropped a lot of the clustering indexes because they weren't being used for access paths or for uniqueness. And they were able to release an "astonishing 4,100 data sets" by doing so!

Now I am not suggesting that every shop will be able to experience a similar savings, but if you have indexes that have no purpose other than enforcing index-controlled partitioning, it is time to bite the bullet and drop those indexes as you convert to table-controlled partitioning (and then on to Universal table spaces).

And when you convert, please drop a note here on the blog to let us know how your conversion efforts went!

Tuesday, December 15, 2015

Happy Holidays!

Well, it is that time of year again. The days are shorter and the weather is colder... even if it isn't as cold as normal it is colder than it was in July! And most people are taking the time to celebrate the holiday season. 

Here's wishing each and every one of my readers a happy holiday... regardless of your chosen season to celebrate! Whether you celebrate Chanukah, Christmas, Kwanzaa, the Winter Solstice, Saturnalia, or just the end of another year on Planet Earth, I'm with you, and celebrating my good fortune, great family and friends, and you, my regular blog readers. I appreciate and thank you all...

This will be the final post of the year (2015) for this blog, but be sure to join me again next year - 2016 - as we continue to examine all aspects of everybody's favorite DBMS... IBM's DB2...

Tuesday, December 08, 2015

Easily Convert to Table-Controlled Partitioning

Prior to DB2 V8 for z/OS, the only way to control partitioning of DB2 table spaces was by using a clustering index that specified the range of key values for each partition. With V8, though, DB2 added the ability to specify the partitioning criteria in the CREATE TABLE specification. This is known as table-controlled partitioning and it is the preferred method for creating (non-Universal) partitioned table spaces. With table-controlled partitioning you can cluster on a different column (or set of columns) than you partition on. Furthermore, you can make changes such as dropping a partitioning index or creating a table in a partitioned table space without defining any indexes at all.
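
For example, the partitioning scheme can be declared right on the CREATE TABLE, along these lines (a sketch only; all names and boundary values are hypothetical, and it assumes a partitioned table space TXNDB.TXNTS with at least three partitions already exists):

CREATE TABLE TXN_TABLE
  (TXN_ID    INTEGER NOT NULL,
   TXN_DATE  DATE    NOT NULL,
   TXN_AMT   DECIMAL(11,2))
  IN TXNDB.TXNTS
  PARTITION BY RANGE (TXN_DATE)
   (PARTITION 1 ENDING AT ('2014-12-31'),
    PARTITION 2 ENDING AT ('2015-12-31'),
    PARTITION 3 ENDING AT ('2016-12-31'));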


But given the long history of DB2, many existing partitioned table spaces are index-controlled. 

Fortunately, there is a quick-and-dirty technique that you can use to easily convert from index-controlled to table-based partitioning. Simply follow these steps:

  • Identify the index-controlled partitioned table space you wish to convert
  • Convert the clustering index on the table to NOT CLUSTER using ALTER INDEX. (Alternately, you could drop the clustering index, but I wouldn’t recommend that unless you no longer need that index at all.)
  • Convert the index back to CLUSTER, again using ALTER INDEX
Voila! DB2 will have converted your table space to table-controlled partitioning.
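
In SQL terms, the whole conversion is just a pair of ALTERs (the index name here is hypothetical):

ALTER INDEX XTXN01 NOT CLUSTER;   -- this ALTER triggers the conversion to table-controlled partitioning
ALTER INDEX XTXN01 CLUSTER;       -- restore clustering; the partitioning remains table-controlled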



Note: DB2 will also convert from index-controlled to table-controlled partitioning if you use ALTER TABLE to add a new partition, change a partition boundary, or rotate a partition to last on an index-controlled partitioned table space. But these are more intrusive methods than simply altering the index from clustering to non-clustering and back again.

Wednesday, November 25, 2015

Happy Thanksgiving 2015

Every year this week those of us in the USA take time out to give thanks for all that we have. We do this by taking time off of work, gathering with our families, eating turkey (and a lot of other stuff), and watching football.

It is one of my favorite holidays as it offers most of the joys of Christmas without many of the trappings.

So with this in mind, I'd like to wish all of my readers -- whether you reside here in the USA or anywhere in the world -- a very Happy Thanksgiving. Take some time to reflect on your good fortune... consider what you might be able to do to help others achieve success... and relax a bit and enjoy yourself...



We can talk about DB2 and databases again in December!

Wednesday, November 18, 2015

Midwest DB2 User Group (Dec 4, 2015)

Just a short blog post today to promote my upcoming speaking engagement at the Midwest DB2 User Group, in Chicago. If you are in or around the Chicagoland area on December 4th, 2015 I invite you to stop by and participate in the meeting!

The meeting starts at Noon and a free lunch is provided. I will be giving one of the 3 presentations that day. My presentation is titled Database and DB2 Trends circa 2015 - An overview of an industry in transition.... This is an ever-changing presentation that I have delivered on several occasions in the past, but not in exactly the same way. This pitch provides an overview of the transformation of data management over the course of the past few years. I discuss Big Data, analytics, NoSQL, and their impact on the modern data architecture and DB2 for z/OS in particular. 

But that is not the only highlight of this event. Sheryl M. Larsen, now with BMC Software, will regale the group with the results of BMC's Annual Mainframe Research Survey. BMC started their mainframe survey ten years ago as a way to gain insight into the issues and challenges facing mainframe customers. And it always contains a lot of useful information and details for those of us in the business of mainframe computing.

The third speaker is Tim Lenahan, who I've been told will be presenting something a little bit different this time around. And having heard Tim speak in the past, I'm looking forward to what he has to say now!

So if you are going to be near Chicago in early December, register and attend the MWDUG meeting. I'm sure it will be worth your time!

Hope to see you there!

Friday, November 13, 2015

A Quick and Dirty Guide to DB2 Performance

All performance problems are caused by change. Now as regular readers know, normally I avoid all-encompassing terms like all, always, none and never, but in this case, all is the appropriate term. If your DB2 environment is humming along and things are performing well, then the only thing that will disrupt that is a change of some form.

Change can take many forms, not just a program change. Consider the following ways in which change can be introduced to your DB2 world:
  • Physical changes to the environment, such as a new CPU, additional memory, new disk, or different storage technology (e.g. SSD).
  • Changes to system software, such as a new release of a product (for example, QMF, CICS, or GDDM), the alteration of a product (for example, the addition of more or fewer CICS regions or an IMS SYSGEN), or a new product (for example, implementation of DFHSM). 
  • And let's not forget the installation of a new release or version of DB2, which can result in changes in access paths and the utilization of features new to DB2. 
  • An updated or  new version of the operating system can also cause DB2 performance problems.
  • Changes to the DB2 engine from maintenance releases, which can change the DB2 optimizer and other facets of DB2's operation.
  • Changes in system capacity. More or fewer jobs could be executing concurrently when the performance problem occurs.
  • Environment changes, such as the implementation of dynamic SQL, web-based programs or the adoption of Data Sharing.
  • Database changes. This involves changes to any DB2 object, and ranges from adding a new column or an index, to changing table space types, to dropping and re-creating an object.
  • Changes to the application development methodology, such as usage of check constraints instead of application logic or the use of stored procedures.
  • Changes to application code.

So regardless of how it may seem at times, performance problems are not caused by magic. Something tangible changes, creating a performance problem in the application, database, or subsystem. The challenge of tuning is to find the source of the change (the root cause, if you will), gauge its impact, and formulate a solution.

Although the majority of your problems will be application-oriented, you will frequently need to explore the tuning opportunities outside of SQL and programming changes when application tuning has little effect.

The following is a quick reference of the possible tuning options for each environment:

To tune z/OS
  • Change WLM parameters to enhance throughput for DB2, its allied agent, and/or programs.
  • Modify swappability.
  • Add memory.
  • Upgrade CPU.
  • Add zIIPs and tune SQL/program to take advantage of zIIP processors.
  • Implement data sharing.
  • Use an active performance monitor (enables tuning on the fly).

To tune the teleprocessing environments
  • Change the system generation parameters.
  • Tune the program definition (PSBs and PPT entries).
  • Modify the Attachment Facility parameters.
  • Add or modify resource parameters (e.g. RDO)
  • Use an active performance monitor (enables tuning on the fly).

To tune the DB2 subsystem
  • Modify DSNZPARMs, for example to increase log buffers, increase or decrease the number of concurrent users, change lock escalation, and so on.
  • Issue DISPLAY and ALTER BUFFERPOOL commands to review buffer pool usage characteristics, change buffer pool sizes, and increase or decrease buffer pool thresholds (see the example commands after this list).
  • Tune the DB2 Catalog, including dropping and freeing objects, executing MODIFY, reorganizing DB2 catalog table spaces and indexes, recovering the DB2 catalog indexes, building additional indexes on the DB2 catalog, tuning disk usage, and implementing data set shadowing.
  • Perform DSNDB07 tuning.
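
For instance, the buffer pool review and adjustment mentioned above might start with commands like these (the buffer pool name and threshold values are hypothetical, and the leading hyphen assumes the default command prefix):

-DISPLAY BUFFERPOOL(BP1) DETAIL
-ALTER BUFFERPOOL(BP1) VPSIZE(40000) VPSEQT(60) DWQT(30)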

To tune the DB2 database design
  • Modify the logical and physical model.
  • Modify and issue DDL.
  • Execute ALTER statements.
  • Ensure that proper parameters are specified.
  • Implement table changes.
  • Partition data into universal PBG table spaces.
  • Spread non-partitioned objects over multiple devices using PIECESIZE.
  • Add indexes.
  • REORG tablespaces.
  • REORG or RECOVER indexes.
  • Consider or reconsider data compression (see the sketch after this list).
  • Denormalize the database design.
  • Consider adding redundant tables.
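
As one concrete example from the list above, enabling compression is a two-step affair because existing rows are only compressed once a compression dictionary is built (the object names are hypothetical):

ALTER TABLESPACE TXNDB.TXNTS COMPRESS YES;   -- SQL DDL to turn compression on

REORG TABLESPACE TXNDB.TXNTS SHRLEVEL CHANGE

The REORG builds the dictionary and compresses the existing rows.
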
To tune programs
  • Perform SQL tuning.
  • Tune the high-level language (such as COBOL or 4GL).
  • Use a program restructuring tool.
  • Run RUNSTATS.
  • Execute EXPLAIN, modify your code, and REBIND (see the example after this list).
  • Examine the REOPT clause and whether it can help your specific code.
  • Consider using SQL tweaks and/or hints.
  • Use the OPTIMIZE FOR n ROWS clause.
  • Consider activating query parallelism.
  • Change locking strategies.
  • Change the DB2 catalog statistics and REBIND.
  • Use a testing tool to provide what if testing and tuning.
  • Use a tool to sample the application's address space as it executes.
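
For the RUNSTATS and REBIND items above, a minimal sequence might look like this (the table space, collection, and package names are hypothetical):

RUNSTATS TABLESPACE TXNDB.TXNTS TABLE(ALL) INDEX(ALL)

REBIND PACKAGE(COLL1.PROG1) EXPLAIN(YES)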

Of course, this is not an exhaustive list and should only be used as a guideline for the types of tuning options available to you.

Hopefully this high-level overview of DB2 performance and various tuning options at your disposal has given you some food for thought as you tackle the task of keeping your DB2 environment operating smoothly and efficiently... 

Monday, November 02, 2015

IBM Insight 2015 Wrap-Up

Last week I attended the IBM Insight conference and blogged about the first few days of the conference here at http://db2portal.blogspot.com/2015/10/an-update-from-ibm-insight-2015.html… and I promised to blog about the remainder of the conference, so here is a synopsis of the highlights.


On Wednesday, the focus of the general session was on IBM’s acquisition of The Weather Company’s technology.  The deal calls for IBM to acquire The Weather Company’s B2B, mobile and cloud-based web properties, including WSI, weather.com, Weather Underground and The Weather Company brand. IBM will not be acquiring The Weather Channel television station, which will license weather forecast data and analytics from IBM under a long-term contract. IBM intends to utilize its newly acquired weather data in its Watson platform.

The deal is expected to close in the first quarter of 2016. Terms were not disclosed.

You can read all about the acquisition in this IBM press release

I spent some of my time at Insight this year learning more about dashDB and it is a very interesting technology. Marketed as data warehousing in the cloud, IBM touts four use cases for dashDB: standalone cloud data warehouse, as a store for data scientists, for those implementing a hybrid data warehouse, and for NoSQL analysis and rapid prototyping.

IBM promotes simplicity, performance, analytics on both traditional and NoSQL data, and polyglot language support as the most important highlights of dashDB. And because it has DB2 BLU under the covers, IBM dashDB not only super-compresses data but can also operate on that data without necessarily decompressing it.

Additionally, a big theme of the conference was in-memory technology, and dashDB sports CPU cache capabilities. In fact, I heard several folks at the conference say some variation of “RAM is too slow”… meaning that CPU cache is faster and IBM is moving in that direction.

The bottom line for dashDB is that it offers built-in high availability and workload management capabilities, along with being in-memory optimized and scalable. Worth a look for folks needing a powerful data warehousing platform.

For you DB2 for z/OS folks, IDAA was a big theme of this year’s Insight conference. The latest version, V5.1, adds advanced analytics capabilities and in-database transformation, making the mainframe queries that can take advantage of the accelerator faster than ever.

Apache Spark was another pervasive topic this year. It was talked about in multiple sessions and I even had the opportunity to play with it in a hands-on lab. The big news for z folks is that IBM is bringing out a version of Spark for the mainframe that will run on z/OS – it is already supported on zLinux.

Of course, I attended a whole slew of DB2 sessions including SQL coding, performance, and administration presentations. Some of the highlights include DB2 11 for LUW being announced, several discussions about dark data, and a lot of information about IBM's Big SQL and how it can be used to rapidly and efficiently access Hadoop (and other unstructured) data using SQL.

I live-tweeted a bunch of highlights of those sessions, too. Indeed, there were too many to include here. If you are interested in catching everything I have to say about a conference, keep reading these blog posts, of course, but you should also follow me on Twitter at http://twitter.com/craigmullins

I also had the honor of delivering a presentation at this year's conference on the changes and trends going on in the world of DB2 for z/OS. Thanks to the 70 or so people who attended my session - I hope you all enjoyed it and learned something, too!

As usual, and well you know if you've ever attended this conference before, there was also a LOT of walking to be done. From the hotel to the conference center to the expo hall to lunch and back to the conference center. But at least there were some signs making light of the situation this year!

There was a lot of fun to be had at the conference, too. The vendor exhibition hall was stocked with many vendors, big and small, and it seems like they all had candy. I guess that’s what you get when the conference is so close to Halloween! The annual Z party at the House of Blues (for which you need a Z pin to get in – this year’s pin was orange) was a blast, and the Maroon 5 concert courtesy of Rocket Software was a lot of fun, too.

If you are looking for a week of database, big data, and analytics knowledge transfer, the opportunity to chat and connect with your peers, as well as some night-time entertainment, be sure to plan to attend next year’s IBM Insight conference (October 23 thru 27, 2016 at the Mandalay Bay in Las Vegas).

Monday, October 26, 2015

An Update from IBM Insight 2015

The annual IBM Insight conference is being held this week in Las Vegas, as usual at the Mandalay Bay Resort and Conference Center. And I have the good fortune to be in attendance.

If you follow me on Twitter (http://www.twitter.com/craigmullins) I am live tweeting many of the sessions I am attending at the conference. But for those of you who are not following along on Twitter, or just prefer a summary, here’s a quick overview of Monday’s highlights.

The day kicked off with the general session keynote, which was delivered by a combination of Jake Porway (of DataKind), IBMers, and IBM customers. The theme of the keynote was analytics and cognitive computing. The emphasis of the event, in my opinion, has kind of shifted from the data to what is being done with the data… in other words, the applications. And that is interesting, because the data is there to support the applications, right? That said, I’m a data bigot from way back, so it was a bit app-heavy for me.

That said, there were some insightful moments delivered during the keynote. Bob Picciano, Senior VP of IBM Analytics, kicked things off by informing the audience that true insight comes from exploiting existing data, dark data, and IoT data with agility driven by the cloud. That’s a lot of buzzwords, but it makes sense! And then executives from Whirlpool, Coca-Cola, and Twitter were brought out to talk about how they derive insight from their data and systems.

Perhaps the most interesting portion of the general session was the Watson discussion, led by Mike Rhodin, Senior VP, IBM Watson. We learned more about cognitive systems and how they get more valuable over time as they learn, which makes them unique in the world of computers and IT. IBM shared that there are more than 50 core technologies used by IBM Watson to deliver its cognitive computing capabilities. It was exciting to learn about systems that reason, never stop learning and drive more value over time.

Additional apps were discussed that taught us about the various ways to choose wine, that nobody starts thinking about ice cream early in the week, and that when the weather changes women buy warmer clothes while men buy beer and chips. You kind of had to be there, I guess!

Of course, this was only the first session of a long day. Additional highlights of the day included a high-level overview of the recently announced (but not yet released) DB2 12 for z/OS, the features that should be in a next generation database, and Gartner’s take on the state of big data and analytics. Let’s briefly address each of these one by one.

First up is DB2 12, which will be available in the ESP (early support program) in March 2016. There are a lot of nice new features that will be available in this new version. We’ll see a lot more in-memory capabilities, which will speed up queries and processing. Buffer pools can be up to 16 TB, even though today’s z systems can support only 10 TB – IBM is planning for the future with that one!

And we’ll continue to see the blurring of the lines between static and dynamic SQL. How? Well, we’ll get RLF for static SQL and plan stability for dynamic SQL in DB2 12. IBM claims that we’ll be able to achieve up to 360 million txns/hour with DB2 12 for z/OS using a RESTful web API. Pretty impressive.

And there will be no skip-level migration for DB2 12... You have to migrate thru DB2 11 to get to 12.

OK, what about the features that should be in a next generation database? According to IBM a next gen DBMS should:
  • Deliver advanced in-memory technology
  • Be fast for both transactional and analytic workloads
  • Provide scalable performance
  • Be available, reliable, resilient, and secure
  • Be simple, intelligent and agile
  • And be easy to deploy and cloud-ready

Sounds about right to me!                                                                                        

And finally, let’s briefly take a look at some of the Gartner observations on big data and analytics. The Gartner presentation was delivered by Donald Feinberg, a long-time Gartner analyst on the topic of data and database systems. First of all, Feinberg rejects the term “big data” saying there is no such thing. It is all just data. He went on to claim that “big data” is perhaps the most ambiguous term out there, but it is also the most searched term at Gartner!

Feinberg also rejected the term data lake, saying “There is no such thing as a data lake, it is just a place to dump data.” He warned that it will  come back to bite organizations in a few years if they do not take the time to manage, transform, secure, etc. the data in the lake, turning it into a data reservoir, instead.

He also made the very interesting observation that BI/analytics was the number 1 priority for CIOs on Gartner’s annual CIO priorities survey and that it has been in that slot for 8 years running. But if analytics was really that much of a priority, why haven't they gotten it done yet?

Of course, a lot more happened at IBM Insight today – and many more things were discussed. But I don’t want this blog post to become too unwieldy, so that is all I’m going to cover for now.

I’ll try to cover more later in the week as the conference progresses.


Monday, October 12, 2015

OK, So How Does the CURRENTDATA Parm Work?

There seems to be a LOT of confusion out there in DB2-land about the CURRENTDATA parameter, so I thought it might be helpful to blog about the subject.

So first things first: CURRENTDATA is a BIND parameter that can be specified when you bind or rebind your plans and packages. There are two options that can be specified: YES and NO. The default is NO.

But what does it do?

Well, the manuals tell us that CURRENTDATA indicates whether data currency is required for read-only and ambiguous cursors when the ISOLATION(CS) option is used. But that is not very clear... except that CURRENTDATA really only applies with an isolation level of cursor stability.

The general idea here is to enable the developer to exert a level of control by choosing between high performance or up-to-date data. Sure, in a perfect world, everybody would choose both. But in the real world we deal with tradeoffs and the CURRENTDATA parameter allows us to tell DB2 which of these options to prefer.

CURRENTDATA (NO)

The default, CURRENTDATA(NO), is the performance choice, and probably the best choice for most applications. When NO is chosen, you are telling DB2 that completely up-to-date data currency is not required for read-only and ambiguous cursors. That means that DB2 can perform block fetching for distributed, ambiguous cursors.
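
For example, a package might be bound for performance along these lines (the collection and member names are hypothetical):

BIND PACKAGE(COLL1) MEMBER(PROG1) ISOLATION(CS) CURRENTDATA(NO) ACTION(REPLACE)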

And that is probably what you want. However, if you are not using pristine coding practices, and your program attempts to DELETE WHERE CURRENT OF against an ambiguous cursor, DB2 will return a negative SQLCODE.

What is an Ambiguous Cursor?

At this point you might be asking “What is an ambiguous cursor?” Generally speaking, an ambiguous cursor is one where DB2 cannot tell if your program needs to use the cursor for updating or deleting. You can avoid ambiguous cursors by coding the FOR READ ONLY and FOR UPDATE OF clause on your cursors to inform DB2 of the program’s future intent for the data being accessed.

CURRENTDATA (YES)

What about CURRENTDATA(YES)? So far, we have discussed only the NO option, but you can also specify YES. Doing so indicates that the data should be current for read-only and ambiguous cursors. That means DB2 will acquire page (or row) locks to ensure the currency of the data, and block fetching for distributed, ambiguous cursors will be curtailed.

Summary

Hopefully this short blog post helps to clear up a muddy topic. Of course, there is a lot more to DB2 locking and performance and I do not want to make it seem like this is the only thing you need to know in terms of the tradeoffs between data currency and performance.


But at least one parameter might be a little bit clearer to you today than it was yesterday…

Tuesday, October 06, 2015

Sorting by Day of the Week in DB2

Today's blog will walk you through a sorting "trick" that you can use when you are dealing with days of the week. A relatively common problem involves sorting a column that contains a day-of-the-week not in alphabetical order, but in the order it appears in the week...

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing the name of the day on which the transaction happened; let’s call this column DAY_NAME. Now, let’s further assume that we want to write queries against this table that order the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, if we write the first query that comes to mind, the results will obviously be sorted improperly:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY DAY_NAME;

The results from this query would be ordered alphabetically; in other words
  1. FRI
  2. MON
  3. SAT
  4. SUN
  5. THU
  6. TUE
  7. WED

And it would be rare to want this type of data sorted in that order, right? The more common need would be to sort the data the way it appears on the calendar.



One solution would be to design the table with an additional numeric or alphabetic column that would allow us to sort that data properly. By this I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. Then we could SELECT DAY_NAME but sort by DAY_NUM.

But this is a bad fix. Not only does it require a database design change, this "fix" also introduces the possibility for the DAY_NUM value and DAY_NAME value to get out of sync... unless we are very careful, or perhaps do not allow DAY_NUM to be changed other than via an INSERT trigger that automatically populates the correct number. But requiring a trigger adds even more complexity to this "fix" which really should indicate to us that it is not a very good proposal.

A better solution uses just SQL and requires no change to the database structures. All you need is an understanding of SQL and SQL functions – in this case, the LOCATE function.  Here is the SQL:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT');

The trick here is to understand how the LOCATE function works: it returns the starting position of the first occurrence of one string within another string. So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position.

So, if DAY_NAME is WED, the LOCATE function in the above SQL statement returns 10... and so on. To summarize, Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

(Note: Some other database management systems have a function similar to LOCATE called INSTR.) 

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT') / 3) + 1;


Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week. 

You can monkey with the code samples here to modify the results to your liking. For example, if you prefer Monday to be the first day of the week and Sunday the last day of the week, simply change the text string in the LOCATE function as follows:

LOCATE(DAY_NAME, 'MONTUEWEDTHUFRISATSUN')

My intent with this blog posting is actually twofold. Firstly, I wanted to share a method of sorting days-of-the-week... but secondly, I hope to also have piqued your interest in using SQL functions to achieve all types of different processing requirements. DB2 offers quite a lot of functions and it seems to me that many programmers fail to use functions to their fullest.

So if you are ever face-to-face with a difficult query request, step back and take a moment to consider how DB2 functions might be able to help!

Thursday, October 01, 2015

Understanding DB2 SELECT Syntax

Most DB2 programmers think they know how to correctly code simple SQL SELECT statements. And they usually are correct, as long as you keep that adjective “simple” in the assertion. When the statement requires more than SELECT...FROM…WHERE though, problems can ensue.

One of the biggest SELECT-related problems encountered by DB2 users is syntax. To paraphrase Mark Twain, sometimes what people think they know just ain’t so.

How can you find the proper syntax for SELECT statements? The DB2 SQL Reference manual contains all of the syntax for DB2 SQL, but query syntax is separated from the rest of the language. Typically, users go to Chapter 5 of the SQL Reference that contains syntax diagrams, semantic descriptions, rules, and examples of the use of DB2 SQL statements. SELECT INTO is there, but SELECT is not in this section. Well, actually, there is a placeholder page that refers the reader back to Chapter 4. Chapter 4 contains the detailed syntax information and usage details for DB2 queries using SELECT. 


Another potentially confusing aspect of DB2 SELECT is the breakdown of SELECT into three sections: fullselect, subselect, and select-statement. This causes many developers to confuse which query options are available to the SELECT statements they want to code.

Let's take a look at each of these three sections:

First up is the select-statement, which is the form of a query that can be directly specified in a DECLARE CURSOR statement, or prepared and then referenced in a DECLARE CURSOR statement. It is the thing most people think of when they think of SELECT in all its glory. If so desired, it can be issued interactively using SPUFI. The select-statement consists of a fullselect, and any of the following optional clauses: WITH CTE, update, read-only, optimize-for, isolation, queryno and SKIP LOCKED DATA.

A CTE, or common table expression, defines a result table with a table-identifier that can be referenced in any FROM clause of the fullselect that follows. Multiple CTEs can be specified following a single WITH keyword. Each specified CTE can also be referenced by name in the FROM clause of subsequent common table expressions.
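
Here is a quick sketch of a common table expression (the table and column names are hypothetical):

WITH DEPT_TOTAL (DEPTNO, TOTAL_SAL) AS
  (SELECT DEPTNO, SUM(SALARY)
   FROM   EMP
   GROUP BY DEPTNO)
SELECT   DEPTNO, TOTAL_SAL
FROM     DEPT_TOTAL
WHERE    TOTAL_SAL > 1000000
ORDER BY TOTAL_SAL DESC;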

The next component is a fullselect, which can be part of a select-statement, a CREATE VIEW statement, a materialized query table, a temporary table or an INSERT statement. Basically, a fullselect specifies a result table. A fullselect consists of at least a subselect, possibly connected to another subselect via UNION, EXCEPT or INTERSECT. And ever since DB2 version 9 for z/OS, you can apply either or both ORDER BY and FETCH FIRST clauses. Prior to V9, this sometimes confused folks as they tried to put a FETCH FIRST n ROWS clause or an ORDER BY in a view or as part of an INSERT. That was not allowed! But it is now.
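
So, as of V9, something like the following is legal because the fullselect inside the view definition can carry ORDER BY and FETCH FIRST (a sketch with hypothetical names):

CREATE VIEW TOP_EARNERS AS
  SELECT   EMPNO, LASTNAME, SALARY
  FROM     EMP
  ORDER BY SALARY DESC
  FETCH FIRST 10 ROWS ONLY;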

However, a fullselect does not allow any of the following clauses: FOR FETCH ONLY, FOR UPDATE OF, OPTIMIZE FOR, WITH, QUERYNO and SKIP LOCKED DATA. A fullselect specifies a result table – and none of these afore-mentioned clauses apply.

This sometimes confuses folks. I recently had a conversation with a guy who swore that at one point he created a view using the WITH UR clause and that it worked. It didn’t when we spoke and I’m sure it never did.

Finally, a subselect is a component of the fullselect. A subselect specifies a result table derived from the result of its first FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next.

I know, this can all seem to be a bit confusing. But think of it this way: in a subselect you specify the FROM to get the tables, the WHERE to get the conditions, GROUP BY to get aggregation, HAVING to get the conditions on the aggregated data, and the SELECT clause to get the actual columns. In a fullselect you add in the UNION to combine subselects and other fullselects. Finally, you add on any optional clauses (as specified earlier) to get the select-statement.
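
Putting the three levels together in one (hypothetical) statement:

SELECT   WORKDEPT, SUM(SALARY) AS TOTAL_SAL   -- subselect: SELECT, FROM, WHERE, GROUP BY, HAVING
FROM     EMP
GROUP BY WORKDEPT
HAVING   SUM(SALARY) > 500000
UNION ALL                                     -- fullselect: combines subselects; ORDER BY and FETCH FIRST
SELECT   WORKDEPT, SUM(SALARY)
FROM     EMP_HIST
GROUP BY WORKDEPT
ORDER BY 2 DESC
FETCH FIRST 10 ROWS ONLY
WITH UR;                                      -- select-statement: isolation and the other optional clauses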

Now what could be any easier?

Actually, it is not super easy. And if you add in some of the newer SQL capabilities, like OLAP functions or temporal time travel query clauses, it gets even more complicated.


I guess the bottom line is that you really should make sure you have the SQL Reference handy (see link above) if you are doing anything other than simple selecting… because you’ll probably need it.

Monday, September 14, 2015

Are You Heading to Las Vegas for IBM Insight?

The IBM Insight conference is coming up and if you work with DB2, Big Data, analytics, data warehousing, or really, anything at all about enterprise data, then Las Vegas is the place to be the week of October 25 thru 29, 2015.

But what is IBM Insight? Well, you may remember it as the IBM Information on Demand conference, or IOD for short. Yes, IBM has renamed the conference yet again. I'm sure a lot of you can remember when there were a bunch of different "Technical Conferences" like the IMS Tech Conference or the DB2 Tech Conference. Those conferences, as well as several others, all got rolled up into IOD... which is now IBM Insight.


So with that bit of potential confusion out of the way, why should you attend this year's IBM Insight conference? Simply put, there is something of interest for everybody in a data-related profession. 

The conference boasts more than 1600 presentations that run the gamut from DB2 to IMS to Cognos to BI to Big Data to analytics to... well, you get the idea. You can experience technical training, hands-on labs, and industry use cases delivered by over 1000 industry experts.

There will be multiple keynote sessions, focusing in on IBM's important data initiatives including:
  • Data Management
  • Advanced Analytics
  • Cloud
  • Hadoop & Spark
  • Content Management
  • Watson
Last year's event was attended by over 13,000 folks, giving attendees a great opportunity to network with their peers and IBMers. Add to that the over 350 exhibitors at the Expo hall and you will be able to view, review, and examine all kinds of interesting software to help you manage your enterprise data.

I also want to promote my presentation at this year's conference, called Not Your Daddy's DB2! I'll talk about the changing landscape of the industry and how DB2 for z/OS has changed (and continues to change) to embrace modern IT. This session will be held on Wednesday, October 28, at 4:00 pm in the South Seas J room. If you're going to the conference, I hope to see you at my presentation.

And, as always, there will be plenty of time to kick back and relax after a long day of networking and learning. On Wednesday evening conference attendees will be treated to a free concert from Maroon 5... and, of course, everything that Las Vegas has to offer can be on your agenda, too.

So what are you waiting for? Register for the 2015 IBM Insight conference today... 

And if you register by September 18th you can get a discounted rate ($300 off).