Monday, April 22, 2019

Db2 Application Testing Considerations


Testing application programs is a discipline unto itself and there are many considerations and nuances to be mastered to be able to test software appropriately. This blog post will not go into a lot of depth regarding testing practices, other than to highlight the important things to keep in mind to assure optimal performance of your Db2 applications.

Statistics

The data that you use in your test environment will not be the same as your production data. Typically, you will have less test data than you do in production. So, if you run the RUNSTATS utility on your test data you will get different statistics than in production. 

Instead of running RUNSTATS, you can test your SQL access paths by updating the test catalog statistics to be the same as your production system. Do not attempt to modify test statistics on your own. You should work with your DBA group to set up a process for updating test statistics. This can be accomplished in various ways. Your organization may have a tool that makes it easy to copy statistics from production to test; or your DBA team may use a DDL script of queries and modification statements to populate test statistics from production.

If your table definitions differ between test and production, you will need to take this into account in the script. For example, things like creator, name, indexes, number of partitions, and even columns can differ between environments. Furthermore, you may have new tables and columns for which there are no current production statistics, meaning that you will need to create estimated statistics based on your knowledge of the business and application.
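As a rough illustration, the modification statements in such a script might resemble the minimal sketch below. The schema TESTSCHEMA, the table CUSTOMER, the column STATUS, the index XCUST01, and every numeric value are hypothetical placeholders; in practice the values would be read from the production catalog and the names mapped to their test equivalents by your DBA team.

```sql
-- Hypothetical sketch only: all names and values are placeholders.

-- Table-level statistics: row count and number of pages
UPDATE SYSIBM.SYSTABLES
   SET CARDF  = 2500000,
       NPAGES = 125000
 WHERE CREATOR = 'TESTSCHEMA'
   AND NAME    = 'CUSTOMER';

-- Column-level statistics: number of distinct values in the column
UPDATE SYSIBM.SYSCOLUMNS
   SET COLCARDF = 5
 WHERE TBCREATOR = 'TESTSCHEMA'
   AND TBNAME    = 'CUSTOMER'
   AND NAME      = 'STATUS';

-- Index-level statistics: key cardinalities, size, and clustering
UPDATE SYSIBM.SYSINDEXES
   SET FIRSTKEYCARDF = 5,
       FULLKEYCARDF  = 5,
       NLEAF         = 3500,
       NLEVELS       = 3,
       CLUSTERRATIOF = 0.95
 WHERE CREATOR = 'TESTSCHEMA'
   AND NAME    = 'XCUST01';
```

Note that the creator values in the WHERE clauses are the test names, not the production names, which is where the differences between environments have to be handled.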

Some organizations copy production statistics to test once, and never (or rarely) populate test again. This is a mistake because most production databases change over time, sometimes dramatically. When you run RUNSTATS for your production applications it is a good idea to also update your test statistics from the new production statistics.

Modeling a Production Environment in Test

Another tactic you can use to improve the accuracy of access path testing is to model the configuration and settings of your production environment in your test system. Remember that the Db2 optimizer does not just use statistics, but also information about your computing environment.

Db2 test systems typically vary from the production system. Application testing is often accomplished on test systems that have different parameters and configurations than the production systems that run the applications. Test systems usually get set up on a less powerful processor (or LPAR), and use less memory for buffering, sorting, and other system processes. This can result in different access paths for test and production systems, which can cause performance problems that only show up after you move to production.

However, it is possible to model the configuration and parameters of your production environment in your test system. You can specify configuration details for Db2 to use for access path selection in your test system using profile tables.

Work with your DBA group to configure proper profile settings for testing your applications.

Test Cases for Skewed Data

Db2 assumes that data values are mostly uniformly distributed throughout the data. However, not all data is uniformly distributed. Db2 RUNSTATS can capture information about non-uniformly distributed and skewed data.

When data is non-uniformly distributed, a subset of the values occurs much more frequently than others. A special case of non-uniformly distributed data is skewed data. When data is skewed, one value (or a very small number of values) occurs much more frequently than others.

Non-uniformly distributed and skewed data presents a performance testing challenge. The Db2 optimizer can formulate different access paths for non-uniformly distributed data based on the actual values supplied. This is particularly important for dynamic SQL applications, but you should be aware of non-uniformly distributed and skewed data even for static SQL applications.

For non-uniformly distributed data you can examine the Db2 catalog to find the most commonly occurring values. For Db2 for z/OS this information is in the SYSIBM.SYSCOLDIST table.

Be sure to test with several of the values that are stored in the COLVALUE column of this table, and some that are not. This will enable you to test the performance of the most common values and less common values. The access paths may differ for each and the performance also can vary.
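A query along the following lines can list the frequent values that RUNSTATS collected for a column. This is only a sketch: the owner PRODSCHEMA, the table CUSTOMER, and the column STATUS are hypothetical names, and it assumes frequency statistics have actually been gathered for that column.

```sql
-- Hypothetical owner, table, and column names; substitute your own.
SELECT NAME,          -- column name
       COLVALUE,      -- a frequently occurring value
       FREQUENCYF     -- fraction of rows containing that value
  FROM SYSIBM.SYSCOLDIST
 WHERE TBOWNER = 'PRODSCHEMA'
   AND TBNAME  = 'CUSTOMER'
   AND NAME    = 'STATUS'
   AND TYPE    = 'F'  -- frequent-value rows only
 ORDER BY FREQUENCYF DESC;
```

Multiplying FREQUENCYF by 100 gives the percentage of rows holding each value. COLVALUE is stored in an internal format, so depending on the data type you may need to convert it before it is readable.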

An Example

Let’s discuss an example. Suppose you operate a bar with a cigar room and you want to keep track of customers with a table of cigar smokers. The table holds information like name, address, phone number, sex, and favorite cigar. Cigar smokers skew male, so there will likely be many more rows where Sex is M than where Sex is F. With that background, consider a query like this one:

  SELECT name, phoneno, fav_cigar
  FROM   cigar_smokers
  WHERE  sex = ?;

Because the data is skewed, it is possible that Db2 will choose a different access path for M than for F. If the vast majority of the rows are men, then a table scan might be chosen for Sex = ‘M’; whereas with only a few rows for women, an index might be chosen if one exists on the Sex column.
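One way to check this in a test system is to EXPLAIN the query once for each value and compare the results. The sketch below assumes that a PLAN_TABLE exists under your authorization ID and that the query numbers 101 and 102 are free to use; both are assumptions for illustration.

```sql
-- Explain the query with the common value
EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT name, phoneno, fav_cigar
  FROM   cigar_smokers
  WHERE  sex = 'M';

-- Explain the same query with the uncommon value
EXPLAIN PLAN SET QUERYNO = 102 FOR
  SELECT name, phoneno, fav_cigar
  FROM   cigar_smokers
  WHERE  sex = 'F';

-- Compare the access paths chosen for each value
SELECT QUERYNO, TNAME, ACCESSTYPE, ACCESSNAME, MATCHCOLS
FROM   PLAN_TABLE
WHERE  QUERYNO IN (101, 102)
ORDER BY QUERYNO, QBLOCKNO, PLANNO;
```

An ACCESSTYPE of R indicates a table space scan and I indicates index access, with the index name shown in ACCESSNAME. Whether the two rows actually differ depends on the statistics available in your environment.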

This is just a simple example. You need to understand your data and how it skews to make sure that you create and test sample test cases for all of the pertinent values.

SQL Variations

A final performance testing consideration is to try multiple SQL variations, especially for queries that access a lot of data or have complex access paths. Do not just find one SQL formulation that works and stick with it. Remember that you can code multiple variations of SQL statements that return the same data, but that perform quite differently.
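As a small illustration, here are three formulations that return the same rows. The tables customer and orders and their columns are hypothetical, and the equivalence assumes that cust_id is the primary key of customer.

```sql
-- Variation 1: IN with a non-correlated subquery
SELECT c.cust_id, c.name
FROM   customer c
WHERE  c.cust_id IN (SELECT o.cust_id
                     FROM   orders o
                     WHERE  o.order_date >= '2019-01-01');

-- Variation 2: correlated EXISTS
SELECT c.cust_id, c.name
FROM   customer c
WHERE  EXISTS (SELECT 1
               FROM   orders o
               WHERE  o.cust_id    = c.cust_id
               AND    o.order_date >= '2019-01-01');

-- Variation 3: join, with DISTINCT to remove the duplicates
-- produced when a customer has more than one qualifying order
SELECT DISTINCT c.cust_id, c.name
FROM   customer c
JOIN   orders o
  ON   o.cust_id = c.cust_id
WHERE  o.order_date >= '2019-01-01';
```

Db2 may or may not choose the same access path for all three, so it is worth explaining and timing each variation against realistic statistics before settling on one.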


Note
This blog post was adapted and excerpted from my latest book, A Guide to Db2 Performance for Application Developers. Click the link for more information or to buy a copy (both print and ebook available).



Tuesday, March 26, 2019

IBM Releases Db2 12 for z/OS Function Level 504

Early in March 2019, IBM announced the latest and greatest new function level, 504 (FL504), for Db2 12 for z/OS. For those of you not paying attention, starting with Version 12, IBM has moved to a continuous delivery model and has been releasing new functionality regularly. This means that new capabilities are rolled out periodically in what IBM calls function levels.

The first few new function levels were released slowly since the GA of Version 12, but IBM has indicated that it will begin releasing function levels much more quickly now that it has tested and perfected its continuous delivery methodology.

So, what’s new in FL504? Well, several things. Let’s start with better control over deprecated objects. Those of us who work with Db2 know that, over time, IBM deprecates functionality. When functionality is deprecated it means that it is on the list of things that will be removed from Db2 in the future. That is, deprecated features are still supported, but IBM does not recommend that you continue to use the features. Basically, deprecation gives users time to migrate their usage to another feature or capability until such time as support is entirely removed.

OK, so IBM has been deprecating features for a long time now, what is new in FL504? We now get the ability to prevent the creation of new deprecated objects, which is a good idea, right? If the feature has been deprecated by IBM you really shouldn’t be building new systems with that deprecated capability; the idea behind deprecation is to allow existing functionality to work until you can remove or replace it with supported functionality. Click here for a full list of deprecated function in Db2 12.

Once you have enabled function level 504 you can prevent the creation of certain deprecated objects in your Db2 subsystems. Starting in function level 504, SQL statements in packages that are bound with APPLCOMPAT(V12R1M504) or higher, or with comparable SQL options in effect, are prevented from creating the following types of deprecated objects:
  • Synonyms
  • Segmented (non-UTS) or partitioned (non-UTS) table spaces
  • Hash-organized tables
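For new development, the supported alternatives to the first two items might look like the sketch below. All object names (DBCIGAR, TSCIGAR, PRODSCHEMA, and so on) are hypothetical, and the attribute values are arbitrary examples rather than recommendations.

```sql
-- Instead of a segmented (non-UTS) table space, create a
-- partition-by-growth universal table space (MAXPARTITIONS makes it a UTS)
CREATE TABLESPACE TSCIGAR
  IN DBCIGAR
  MAXPARTITIONS 1
  SEGSIZE 32;

-- Instead of a synonym, create an alias
CREATE ALIAS SMOKERS FOR PRODSCHEMA.CIGAR_SMOKERS;
```

Aliases and synonyms do not behave identically in every respect, so check how existing applications reference the name before switching.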

Hash-organized tables are an interesting new deprecation. They were introduced quite recently in Version 10. Evidently, they were not used by enough Db2 sites to justify their continued support.

Another new capability of function level 504 is support for Huffman compression of Db2 data. This uses the IBM Z hardware-based entropy encoding (Huffman) compression with the IBM z14 Compression Coprocessor (CMPSC). A new ZPARM, TS_COMPRESSION_TYPE, has been introduced (once you have implemented FL504) that controls the compression method for the entire Db2 subsystem. 

The TS_COMPRESSION_TYPE subsystem parameter specifies the type of compression algorithm to use when Db2 creates new compressed table spaces, or loads into or reorganizes existing compressed table spaces. Of course, the Db2 subsystem must be running on z14 hardware with Huffman compression enabled for this parameter to be used. Additionally, Huffman compression only applies to universal table spaces. All other table space types will use fixed-length compression regardless of TS_COMPRESSION_TYPE.

The third new capability of FL504 is support for the built-in functions provided by the IBM Db2 Analytics Accelerator (IDAA). The support is pass-through only, meaning that you must have IDAA in order for these BIFs to work. Db2 will recognize that they are supported by IDAA and pass the work to the accelerator. Db2 for z/OS only verifies that the data types of the parameters are valid for the functions. The accelerator engine does all other function resolution processing and validation. Click here for more information on how Db2 determines whether to accelerate eligible queries or not.


Finally, FL504 provides new SQL syntax alternatives to make it easier to port applications to Db2 from other platforms. This capability gives developers more options for specifying certain special registers and NULL predicates.

Table 1 below outlines the new syntax variations that are supported for several existing special registers:
Table 1. New Special Register Syntax Alternatives

Existing Special Register                  New Syntax Alternative
CURRENT CLIENT_ACCTNG                      CLIENT ACCTNG
CURRENT CLIENT_APPLNAME                    CLIENT APPLNAME
CURRENT CLIENT_USERID                      CLIENT USERID
CURRENT CLIENT_WRKSTNNAME                  CLIENT WRKSTNNAME
CURRENT SERVER                             CURRENT_SERVER
CURRENT TIME ZONE or CURRENT TIMEZONE      CURRENT_TIMEZONE

ISNULL and NOTNULL are also now supported as alternatives for the IS NULL and IS NOT NULL predicates. For additional details, see NULL predicates.
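The pairs of statements below sketch how the existing and alternative forms line up. They assume the statements run at application compatibility level V12R1M504 or higher; the table customer and its columns are hypothetical.

```sql
-- Special registers: existing form and new alternative side by side
SELECT CURRENT SERVER, CURRENT_SERVER
FROM   SYSIBM.SYSDUMMY1;

SELECT CURRENT CLIENT_USERID, CLIENT USERID
FROM   SYSIBM.SYSDUMMY1;

-- NULL predicates: each pair is intended to return the same rows
SELECT name FROM customer WHERE phoneno IS NULL;
SELECT name FROM customer WHERE phoneno ISNULL;

SELECT name FROM customer WHERE phoneno IS NOT NULL;
SELECT name FROM customer WHERE phoneno NOTNULL;
```

The existing forms continue to work, so there is no need to change current code; the alternatives mainly help when porting SQL written for other platforms.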

Summary

Take some time to investigate this new functionality to determine whether it makes sense to introduce it to your Db2 environment. If so, be sure to read through the incompatible changes (such as if your Db2 has a UDF that matches one of the new IDAA BIFs, or a variable that uses one of the new syntax alternatives). Only then should you build your plan for activating the new function level.

Also, be aware that if you are not currently running at FL503, moving to FL504 activates all earlier function levels. You can find a list of all the current function levels here.

Monday, February 18, 2019

My Thoughts on Think

Last week I had the great pleasure of attending the IBM Think 2019 conference in San Francisco. There were many great sessions and content covering my interest areas, including AI, machine learning, analytics, data governance, Db2, digital transformation, and more. 

Part of the difficulty of attending such an event is deciding what to attend from a list of competing, interesting topics. And another is trying to piece together everything you hear into a cogent, comprehensive message. 

Instead of writing down all of my thoughts I decided to make a quick video on YouTube summarizing my takeaways from the conference. Let me know what you think...


Thursday, January 17, 2019

Making Plans for IBM Think 2019


I'm looking forward to once again attending IBM Think, IBM's flagship technology conference. I attended the inaugural Think conference and it was one of the highlights of the year (2018). This year IBM Think is being held in San Francisco February 12 through 15 at the Moscone Center and surrounding hotels. San Francisco is a wonderful location because it will give the conference more room to accommodate the large crowds more comfortably than the Las Vegas venue of 2018.

One of the great things about Think is the breadth and scope of pertinent technical content that it covers. So whether you are a developer, a DBA, a data scientist, a manager, or any flavor of IT specialist, there will be a plethora of useful sessions and activities to educate and make you “think.”

Now you all know that my primary background is database administration and Db2, but I also work with and have interest in many other technologies, including data governance, security and data protection, DevOps, machine learning, AI, blockchain, quantum computing, and cloud computing. And the great thing about the IBM Think conference is that it provides in-depth coverage of all of these areas, and more.

A big struggle for such a large event (expected attendance in excess of 30,000) is finding what you need. Well, IBM Think makes it a bit easier because it is broken down into campuses that focus on specific areas. This year’s campuses include:
  • Smarter Business Showcase
  • Data & AI Campus
  • Cloud & Infrastructure Campus
  • Security & Resiliency Campus

There will be more than 2,000 business strategy sessions and technical deep dives over the course of the week, along with professional development opportunities from hundreds of hands-on labs and certification exams.

One of the big highlights of IBM Think is always the great speakers, and this year is no exception. From IBM speakers like CEO Ginni Rometty and Sr. VP Hybrid Cloud Arvind Krishna, to industry speakers like Founder & CEO of Mogul Tiffany Pham and AT&T CEO John Donovan, to researchers like MIT Media Lab and Harvard research specialist Dr. Kate Darling, to entertainers like Super Bowl MVP Joe Montana and skateboarding legend Tony Hawk, there will be a lot of knowledge imparted. I’m particularly looking forward to hearing Paul Cormier, EVP and President of Products and Technologies at Red Hat, talk about how the IBM / Red Hat combination is working.

Another advantage of attending IBM Think is the access to exclusive information about IBM products, technologies, strategies, and services that is sure to be shared during the event. IBM always unveils a ton of great stories and technologies at Think.

I’ll be live-tweeting at IBM Think 2019, so be sure to follow me at twitter.com/craigmullins so you can experience Think right along with me, as it happens. Some of the sessions I plan on attending include topics on governed data science, using machine learning to prioritize business issues, and Db2 on cloud... but those are just the tip of the tech iceberg.

And finally, it is not too late. Click here if you want to attend IBM Think 2019… If you do, maybe I’ll see you there amongst 30,000 of our IT friends!

Monday, December 24, 2018

Happy Holidays 2018

It is hard to believe that yet another year has come and gone (well, almost) and that the holiday season is once again upon us. And that means it is time to reflect on the past year -- including all that we have accomplished and what is yet to be done.

And importantly, it is also time to wind down and relax with friends, family and loved ones.  A time to put down the work that consumes us most of the year and to celebrate and enjoy... 


So whatever holiday tradition you celebrate, be sure to celebrate well, wave goodbye to 2018 and ring in the New Year with happiness and anticipation...

...and I'll see you back here on the blog in the New Year, 2019!