Thursday, August 01, 2019

DevOps is Coming to Db2 for z/OS


Mainframe development teams are relying on DevOps practices more extensively, bringing the need to incorporate Db2 for z/OS database changes into the toolset that is supporting their software development lifecycle (SDLC).

But most mainframe professionals have only heard a little about DevOps and are not really savvy as to what it entails. DevOps is an amalgamation of Development and Operations. The goal of DevOps is to increase collaboration between developers and operational support and management professionals, with the desired outcome of faster, more accurate software delivery.

DevOps typically relies on agile development, coupled with a collaborative approach between development and operations personnel during all stages of the application development lifecycle. The DevOps approach results in small and frequent code changes and it can significantly reduce the lead time for changes, lower the rate of failure, and reduce the mean time to recovery when errors are encountered. These are all desirable qualities, especially as organizations are embracing digital transformation driven by the 24/7 expectations of users and customers to access data and apps at any time from any device.

The need to be able to survive and thrive in the new digital economy has caused organizations to adopt new and faster methods of developing, testing and delivering application software. Moving from a waterfall software development methodology to an agile methodology is one way that organizations are speeding the time-to-delivery of their software development. Incorporating a DevOps approach is another.

Instead of long software development projects that may not deliver value for months, or perhaps even years (common with the waterfall development methodology), an agile DevOps approach delivers value quickly, and then incrementally over time. DevOps enables the continuous delivery of new functionality demanded by customers in the digital economy.

Succeeding with DevOps, however, requires a cultural shift in which all groups within IT work in collaboration with one another, and where management endorses and cultivates this cultural change. Because DevOps relies upon incremental development and rapid software delivery, your IT department can only thrive if there is a culture of accountability, collaboration, and team responsibility for desired business outcomes. Furthermore, it requires solid, integrated automated tooling to facilitate the SDLC from development, through testing, to delivery. Creating such an environment and culture can be challenging.

With DevOps the result will be a constantly repeating cycle of continuous development, continuous integration, and continuous deployment. This is typically depicted graphically as an infinity symbol, such as in Figure 1 (below).

Figure 1 - continuous development, integration and deployment


Note, however, that this particular iteration of the DevOps infinity graphic calls out the participation of both the application and the database. This is an important, though often lacking, detail that should be stressed when adopting DevOps practices.

The Mainframe and DevOps

The adoption of DevOps has, until now, been much slower within mainframe development teams than for distributed and cloud application development. The staid nature of mainframe development and support, coupled with a glass house mentality and a rigid production turnover process, contributes to the delayed adoption of DevOps on the mainframe. This is not surprising, as mainframes are mostly used by large organizations running mission-critical workloads, where change is approached cautiously and risk aversion is the norm.

Additionally, the traditional waterfall development methodology has been used by most mainframe software developers for multiple decades, whereas DevOps is closely aligned with an agile approach, which differs significantly from waterfall.

Notwithstanding all of these barriers to acceptance of DevOps on the mainframe, mainframe developers can, and in some cases already do, successfully utilize a DevOps approach. Technically speaking, the mainframe is just another platform and there is nothing inherent in its design or usage that obviates the ability to participate in a DevOps approach to application development and delivery.

What about Db2 for z/OS?

Integrating database change into the application delivery lifecycle can be a stumbling block on the road to DevOps success. Development teams focus on application code, as they should, and typically view database structure changes as ancillary to their coding efforts. In most application development projects, it is not the programmer’s responsibility to administer the database and modify database structures. But applications rely on the database being designed, implemented, and changed in accordance with the needs of the business and the code.

Many development projects have automated their SDLC tool chain to speed up the delivery of applications. This is the “Dev” portion of DevOps. But the requisite automation and tooling has not been as pervasively implemented to speed up the delivery of database changes. This is the “Ops” portion of DevOps. And this is changing.

A big consideration is that the manner in which change is applied to applications differs from how database changes are applied. That means each must be managed using different techniques and probably different tools. When an application program changes, the code is compiled, and the load module is migrated from test to production. The old load module is saved for posterity in case the change needs to be backed out, but the change is a wholesale replacement of the executable code.

Database changes are different. Each environment contains an entire database configuration, and changes are migrated within it; there is no wholesale replacement of the database structures. Instead, DDL statements are issued to ALTER, DROP, and CREATE the database structures as needed.
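For example, a non-intrusive change might be applied with simple DDL like this (a hypothetical sketch; all object names are made up):

ALTER TABLE PRODAPP.CUSTOMER
  ADD COLUMN LOYALTY_TIER CHAR(1);

CREATE INDEX PRODAPP.XCUST2
  ON PRODAPP.CUSTOMER (LOYALTY_TIER);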

From the perspective of database change, DBAs need the ability to modify all of the database objects supported by Db2 for z/OS. Supporting Db2 for z/OS using DevOps requires tooling that understands both Db2 and the DevOps methodology and toolchain. And the tooling must understand how changes are made, as well as any underlying changes that may be required to effectively implement the database change. Some types of database changes are intrusive, requiring a complicated series of unloads, metadata captures, drops, creates, loads, and additional steps to implement. The tooling must be capable of making any of these changes in an automated way that the DBA trusts.
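By contrast, here is a hedged outline of an intrusive change, such as modifying a column in a way that a simple ALTER does not support (all names hypothetical); this is exactly the kind of multi-step sequence the tooling must automate reliably:

-- 1. Unload the existing data (UNLOAD utility)
-- 2. Capture the DDL for the table and all dependent objects
--    (indexes, views, authorizations, and so on)
DROP TABLESPACE PRODDB.CUSTTS;        -- 3. drop, which removes dependent objects
CREATE TABLESPACE PRODDB.CUSTTS ...;  -- 4. recreate the table space
CREATE TABLE PRODAPP.CUSTOMER (...);  -- 5. recreate the table with the new
                                      --    definition, then indexes, views, GRANTs
-- 6. Reload the data (LOAD utility), run RUNSTATS, and REBIND dependent packages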

Fortunately, for organizations adopting DevOps on the mainframe with Db2, there is a solution for integrating Db2 database change into the DevOps toolchain: BMC AMI DevOps for Db2. BMC AMI DevOps for Db2 integrates with Jenkins, an application development orchestration tool, to automatically research and determine database schema change requirements, to streamline the review and approval process, and to safely implement the database schema changes, making development and operations teams more efficient and agile.

Monday, July 29, 2019

Webinar: DevOps and Database Change Management for Db2 for z/OS - August 13, 2019

DevOps practices are gaining popularity on all development platforms and the mainframe is no exception. DevOps relies heavily on agile development and automated software delivery. However, the ability to integrate and orchestrate database changes has lagged. To learn more about DevOps, change management, and Db2 for z/OS, I am delivering a webinar on this topic along with John Barry of BMC. We will discuss issues including an overview of DevOps, the requirements for database change management, and an introduction to BMC’s new AMI DevOps for Db2 that solves the change management dilemma for Db2 for z/OS development. You can register today to attend the webinar on August 13, 2019 (Noon Central) at https://event.webcasts.com/starthere.jsp?ei=1251892&tp_key=3ff9b7af72.

Tuesday, July 16, 2019

Proud to be an IBM Champion

Just a quick post today about the IBM Champions program which, if you haven't heard of it, is a special program run by IBM to recognize and reward non-IBM thought leaders for their work associated with IBM products and communities.

IBM publishes the list of IBM Champions annually and the title is valid for one year. So, champions must be nominated each year to maintain their status.

I want to thank IBM for running such a wonderful program and for all they have done to help recognize those of us in the trenches using IBM's technology. I have been named an IBM Champion for Data and Analytics again this year... for the 10th time. So IBM bestowed upon me this Acclaim badge:


As an IBM Champion I have had the opportunity to interact with IBM folks and with other IBM Champions at events, webinars, and in person, and it has definitely helped to enrich my professional life.

Although the majority of IBM Champions focus on data and analytics, the program is not just for data people! IBM names champions in each of the following nine categories: 
  • Data & Analytics
  • Cloud 
  • Collaboration Solutions 
  • Power Systems 
  • Storage 
  • IBM Z 
  • Watson IoT 
  • Blockchain 
  • Security 
If you are, or know of, somebody who should be an IBM Champion, you can nominate them here: https://developer.ibm.com/champions/.

Thanks again, IBM... and congratulations to all of this year's IBM Champions.

Wednesday, July 10, 2019

There’s a New Db2 12 for z/OS Function Level (505)


In late June 2019, IBM delivered more great new capabilities with the latest new function level for Db2 12 for z/OS, Function Level 505 (or FL505).

If you do not know what a function level is, then you probably aren’t yet on Version 12, because function levels are how new capabilities are being rolled out for Db2 12 and beyond. It is how IBM has enabled a continuous delivery model for Db2 functionality. You can learn more about function levels here.

Although the first link above goes into all of the gory details of the new functionality, I will take a bit of time to summarize the highlights of this new function level.

The first thing that will appeal to most Db2 users is improved performance. And FL505 delivers improved performance in two areas: HTAP and RUNSTATS.
  • For HTAP, FL505 reduces the latency between Db2 and the IBM Db2 Analytics Accelerator (sometimes called IDAA). Nobody likes latency, and these improvements enable transactional and analytical applications to work with the same, current data.
  • For RUNSTATS, FL505 changes the default RUNSTATS behavior to use page sampling for universal table spaces (unless the RUNSTATS specification explicitly states TABLESAMPLE SYSTEM with a value other than AUTO). This will boost RUNSTATS performance; a sample statement follows this list. (A nice description of this is provided by Peter Hartmann here.)
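For instance, after activating FL505, a RUNSTATS like the following sketch (hypothetical database and table space names) uses page sampling by default; coding TABLESAMPLE SYSTEM AUTO explicitly, as shown here, simply documents the same behavior:

RUNSTATS TABLESPACE MYDB.MYTS
  TABLE(ALL)
  TABLESAMPLE SYSTEM AUTO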

FL505 also delivers REBIND phase-in for executing packages. Waiting for a package to be idle (not running) has long been a deterrent to rebinding. Now, you can REBIND a package while it is running. Db2 makes this happen by creating a new copy of the package. When the REBIND completes, new executions of the package use the newly rebound copy, while threads already running with the old copy continue to do so successfully until completion.
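As a sketch (hypothetical collection and package names), no special syntax is needed to take advantage of phase-in. My understanding is that the package must be managed with PLANMGMT(EXTENDED), which is the default in Db2 12, so verify that detail in the documentation:

REBIND PACKAGE(MYCOLL.MYPKG) PLANMGMT(EXTENDED)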

We also get some new built-in functions (BIFs) in FL505, for encrypting and decrypting data using key labels. You may be aware that Db2 already had functions for encryption and decryption, but those functions, introduced back in V9, were not very capable because they required you to provide and manage a password to decrypt the data. The new functions work with key labels: ENCRYPT_DATAKEY encrypts plain text into a block of encrypted text using a specified algorithm and key label, and the DECRYPT_DATAKEY functions return the block of data decrypted to the specified data type.
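Here is a hedged sketch of how these BIFs might be used. The table, column, and key label names are hypothetical, and as I understand it the decrypt side is a family of type-specific functions (such as DECRYPT_DATAKEY_VARCHAR), so verify the exact signatures in the Db2 SQL Reference:

UPDATE CUST_PAYMENT
   SET CARD_NBR_ENC = ENCRYPT_DATAKEY(CARD_NBR, 'CUST.KEYLABEL1')
 WHERE CUST_ID = 123456;

SELECT DECRYPT_DATAKEY_VARCHAR(CARD_NBR_ENC) AS CARD_NBR
  FROM CUST_PAYMENT
 WHERE CUST_ID = 123456;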

And with FL505 we finally get additional functionality for the DECFLOAT data type. The DECFLOAT data type was introduced in DB2 9 for z/OS, but it is not widely used because of some shortcomings. But first, what is DECFLOAT? Well, DECFLOAT is basically a combination of the DECIMAL and floating-point data types: a decimal floating-point data type. It is specified as DECFLOAT(n), where the value of n can be either 16 or 34, representing the number of significant digits that can be stored. A decimal floating-point value is an IEEE 754r number with a decimal point, and it can be useful for storing and managing very large numbers.

So what is the improvement? Quite simply, it is now possible to specify columns defined as DECFLOAT in an index and as a key in a primary key or a unique key. Unfortunately, there is still no support for DECFLOAT usage in COBOL programs, which will likely continue to hinder its uptake in many shops.
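For illustration, this hypothetical sketch defines a DECFLOAT column as a primary key and builds a unique index on it, neither of which was possible before FL505:

CREATE TABLE ACCT_BALANCE
  (ACCT_ID  DECFLOAT(34) NOT NULL,
   BALANCE  DECFLOAT(16),
   PRIMARY KEY (ACCT_ID));

-- the unique index enforces the primary key
CREATE UNIQUE INDEX XACCT1
  ON ACCT_BALANCE (ACCT_ID);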

And finally, FL505 improves temporal support for triggers. It delivers the capability to reference system temporal tables and archive-enabled tables in the WHEN clause of your triggers.  
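As a hedged sketch, assuming EMP is a system-period temporal table and using hypothetical names throughout, a trigger can now include a time-travel query in its WHEN clause:

CREATE TRIGGER SALARY_CHECK
  BEFORE UPDATE OF SALARY ON EMP
  REFERENCING NEW AS N
  FOR EACH ROW
  WHEN (N.SALARY > 2 *
        (SELECT MAX(E.SALARY)
         FROM EMP FOR SYSTEM_TIME AS OF CURRENT TIMESTAMP - 1 YEAR AS E
         WHERE E.EMPNO = N.EMPNO))
  SIGNAL SQLSTATE '75001' ('Salary increase exceeds limit');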

Summary

IBM is using function levels to deliver significant new capabilities for Db2 12 for z/OS. It is important for you and your organization to keep up-to-date on this new functionality and to determine where and when it makes sense to introduce it into your Db2 databases and applications.

Also, be aware that if you are not currently running at FL504, moving to FL505 activates all earlier function levels. You can find a list of all the current function levels here.



Monday, June 17, 2019

Two Types of Db2 for z/OS Temporary Tables: Declared vs. Created

Db2 for z/OS has provided the ability to create temporary tables since way back in Version 5. But the initial functionality was practical only in certain circumstances due to some inherent limitations. The first type of temporary table supported by Db2 is now known as a created temporary table. 

But IBM’s support of temporary tables has expanded since then (notably in Version 7), and Db2 now offers two different types of temporary tables: created and declared.

Why use Temporary Tables?

Before we delve into these two types of temporary tables, let’s first consider why anybody would want or need to use a temporary table in the first place.

One potential usage of temporary tables is to store intermediate SQL results. Consider, for example, if the results of one query need to be used in a subsequent query. Instead of rerunning the first query (or combining it with the subsequent query), the results of the first query can be stored in a temporary table. Then the temporary table can be joined into the second query without incurring the overhead of rerunning the first query. This is especially useful if the first query is complex or inefficient.

An additional use case is when a query result set needs to be returned more than once during the execution of the same program. Consider this scenario: a complex multi-table join is coded that consumes a lot of resources to run. Furthermore, that join needs to be run three times during the course of the program. Instead of running the join three times you can run it once and populate a temporary table with the results. The next two times you can simply read the temporary table which might be more efficient than re-executing the complex, resource-consuming multi-table join.
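As a sketch, with hypothetical tables and columns, the expensive join is run once to populate a declared temporary table (both flavors of temporary table are described below), which is then read as often as needed:

DECLARE GLOBAL TEMPORARY TABLE SESSION.CUST_TOTALS
  (CUSTNO   CHAR(6)      NOT NULL,
   TOT_AMT  DECIMAL(11,2))
  ON COMMIT PRESERVE ROWS;

INSERT INTO SESSION.CUST_TOTALS
  SELECT C.CUSTNO, SUM(O.ORDER_AMT)
  FROM   CUSTOMER C
         INNER JOIN ORDERS O ON O.CUSTNO = C.CUSTNO
  GROUP BY C.CUSTNO;

-- read the temporary table as many times as needed
SELECT CUSTNO, TOT_AMT
FROM   SESSION.CUST_TOTALS
WHERE  TOT_AMT > 10000;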

Temporary tables also can be useful for enabling non-relational data to be processed using SQL. For example, you can create a global temporary table that is populated with IMS data (or any other non-relational data source) by a program. Then during the course of that program, the temporary table (that contains the heretofore non-relational data) can be accessed by SQL statements and even joined to other Db2 tables. The same could be done for data from a flat file, VSAM, IDMS, or any other non-relational data.

Another reason for temporary tables is to make conversion from other relational products easier.

Now let’s examine the two types of temporary tables supported by Db2.

Created Temporary Tables

A created temporary table exists only as long as the process that uses it. Created temporary tables are defined using the CREATE GLOBAL TEMPORARY TABLE statement. When created, the schema for the table is stored in the Db2 system catalog (SYSIBM.SYSTABLES) just like any other table, but the TYPE column is set to 'G' to indicate a global temporary table. Created temporary tables are sometimes referred to as global temporary tables – but this can be confusing because declared temporary tables are also known as declared global temporary tables.

It is important to remember that a created global temporary table must be created using a DDL CREATE statement before it can be used in any program.
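For example, a minimal (hypothetical) definition looks like ordinary DDL with the GLOBAL TEMPORARY keywords added:

CREATE GLOBAL TEMPORARY TABLE TEMP_TRAN
  (TRANID    CHAR(8)     NOT NULL,
   TRAN_AMT  DECIMAL(9,2));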

A created temporary table is instantiated when it is referenced in an OPEN, SELECT INTO, INSERT, or DELETE statement, not when it is created. Each application process that uses the temporary table creates a new instance of the table for its use. When using a created temporary table, keep the following in mind:
· Because they are not persistent, some typical database operations, including locking, logging, and recovery, do not apply to created temporary tables.
· Indexes cannot be created on created temporary tables, so all access is by a complete table scan.
· Constraints cannot be created on created temporary tables.
· A null is the only default value permitted for columns of a created temporary table.
· Created temporary tables cannot be referenced by Db2 utilities.
· Created temporary tables cannot be specified as the object of an UPDATE statement.
· When deleting from a created temporary table, all rows must be deleted.
· Although views can be created on created temporary tables, the WITH CHECK OPTION cannot be specified.

Work file data sets are used to manage the data of created temporary tables. The work database (DSNDB07) is used as storage for processing SQL statements that require working storage – not just for created temporary tables. So if you are using created temporary tables be sure to examine the Db2 Installation Guide for tactics to estimate the disk storage required for temporary work files.

When a temporary work file result table is populated using an INSERT statement, it uses work file space. No other process can use the same work file space as that temporary work file table until the table goes away. The space is reclaimed when the application process commits or rolls back, or when it is deallocated, depending on which RELEASE option was used when the plan or package was bound. It is a good idea to keep the work files in a separate buffer pool to make monitoring easier.

Declared Temporary Tables

The second type of Db2 temporary table is the declared temporary table. It is different from a created temporary table and overcomes many of its limitations. The first significant difference between declared and created temporary tables is that declared temporary tables are specified using a DECLARE statement in an application program – and not using a DDL CREATE statement. Because they are not persistent, they do not have descriptions in the Db2 Catalog.

Additionally, declared temporary tables offer significant features and functionality not provided by created temporary tables. Consider:
· Declared temporary tables can have indexes and CHECK constraints defined on them.
· You can issue UPDATE statements and positioned DELETE statements against a declared temporary table.
· You can implicitly define the columns of a declared temporary table by using the result table from a SELECT.

To “create” an instance of a declared temporary table you must issue the DECLARE GLOBAL TEMPORARY TABLE statement inside of an application program. That instance of the declared temporary table is known only to the process that issues the DECLARE statement. Multiple concurrent programs can be executing using the same declared temporary table name because each program will have its own copy of the temporary table.

But there is more work required to use a declared temporary table than there is to use a created temporary table. Before you can declare temporary tables you must create a temporary database and table spaces for them to use. This is accomplished by specifying the AS TEMP clause on a CREATE DATABASE statement. Then, you must create segmented table spaces in the temporary database. Only one temporary database for declared temporary tables is permitted per Db2 subsystem.

When a DECLARE GLOBAL TEMPORARY TABLE statement is issued, Db2 will create an empty instance of the temporary table in the temporary table space. INSERT statements are used to populate the temporary table. Once inserted, the data can be accessed, modified, or deleted. When the program completes, Db2 will drop the instance of the temporary table. Also, be aware that users of temporary tables must have been granted USE authority on the temporary table space.

The following example shows a DECLARE statement that can be issued from an application program (assuming the temporary database and table spaces already have been defined):

DECLARE GLOBAL TEMPORARY TABLE TEMP_EMP
  (EMPNO      CHAR(6)     NOT NULL,
   FIRSTNME   VARCHAR(12) NOT NULL,
   MIDINIT    CHAR(1)     NOT NULL,
   LASTNAME   VARCHAR(15) NOT NULL,
   WORKDEPT   CHAR(3),
   PHONENO    CHAR(4)
  );

This creates a declared temporary table named TEMP_EMP. 

Additionally, you can use the LIKE clause to DECLARE a temporary table that uses the same schema definition as another currently existing table. You can use the INCLUDING IDENTITY COLUMN ATTRIBUTES clause to copy the IDENTITY columns as well. For example:

DECLARE GLOBAL TEMPORARY TABLE TEMP_PROJ
 LIKE DSN8C10.PROJ
 ON COMMIT PRESERVE ROWS;

Notice the ON COMMIT PRESERVE ROWS clause in the previous example. The ON COMMIT clause specifies what action Db2 is to take with the data in the declared temporary table when the program issues a COMMIT statement. There are two options: PRESERVE or DELETE rows. 

Specifying PRESERVE ROWS indicates that the rows of the table are to be kept. Beware, though, that the PRESERVE ROWS option impacts thread reuse. You will not be able to reuse threads for any application process that contains, at its most recent COMMIT, an active declared temporary table defined using the PRESERVE ROWS option of the ON COMMIT clause. The other option, which is the default, is ON COMMIT DELETE ROWS. In that case all of the rows of the table are deleted as long as there are no cursors defined using WITH HOLD.

Scrollable Cursors

A scrollable cursor provides the ability to scroll forward and backward through the data once the cursor is open. Using only SQL, the programmer can navigate up and down the cursor results. Although this blog post is not about scrollable cursors, it is important to know that there are two types of scrollable cursors: static and dynamic. Static scrollable cursors use declared temporary tables.

So, keep in mind, even if you do not choose to use temporary tables in your application programs, you may need to implement them to support scrollable cursors.

Declared Temporary Table Storage

Before using declared temporary tables, the temporary database and temporary table spaces must be defined to store the temporary data. For example:

CREATE DATABASE TEMPDB AS TEMP;
CREATE TABLESPACE TEMPTS IN TEMPDB
 SEGSIZE 4
 BUFFERPOOL BP7;

The table space is created as a temporary table space by virtue of it being in the temporary database.

The page size of the temporary table space must be large enough to hold the longest row in the declared temporary table. The size of a row in the declared temporary table might be considerably larger than the size of the row in the table for which the scrollable cursor is used. As with a regular table, the size of the row depends on the number of columns that are stored in the declared temporary table and the size of each column.

An in-depth discussion of calculating the storage requirements for the table spaces used by declared temporary tables is provided in the Db2 Installation Guide. Be sure to refer to that manual before implementing declared temporary tables or any features that rely on them (e.g., static scrollable cursors).

Keep in mind, too, that when there is more than one temporary table space defined to the Db2 subsystem, Db2 will select which temporary table spaces it will use for scrollable cursor processing.

Declare or Create?

With all of the limitations of created temporary tables why would anyone still want to use them instead of declared temporary tables?

Well, there are a few potential problems with declared temporary tables, too. First of all, the SYSPACKDEP catalog table will not show dependencies for declared temporary tables, but it will for created temporary tables. Secondly, some DBAs are leery of allowing database structures to be created by application programmers inside of an application program. With limited DDL and database design knowledge it may not be wise to trust programmers to get the table structure correct. Furthermore, the additional management of the temporary database and table spaces can become an administrative burden.

So, created temporary tables are still useful – in the right situations. They should be considered primarily when no updating of temporary data is needed and access to the temporary data is purely sequential.

Synopsis

Db2 provides two options for handling temporary data in tables: created and declared temporary tables. The wise Db2 professional will understand the capabilities and shortcomings of each type of temporary table – and deploy the correct type for each specific situation.



Friday, May 24, 2019

Time for the IDUG North American Db2 Tech Conference 2019


It is hard to believe that May is almost over already and that means that the 2019 IDUG North American Db2 Tech Conference is right around the corner! This year’s event is being held in beautiful Charlotte, NC, the week of June 2-6, 2019. That is a little bit later than normal, but I think that is a good thing as it should be nice and warm in North Carolina in June!

I hope you’ve already made your plans to be there, but if you haven’t, there’s still time to get your manager’s approval, make your travel plans, and be where all the Db2 folks will be the first week of June.

If you’ve ever attended an IDUG conference before then you know why I’m excited. IDUG offers a world of phenomenal educational opportunities delivered by IBM developers, vendor experts, users, and consultants from all over the world. There will be a slew of informative technical sessions on all of the latest and greatest Db2 technologies and features. The 2019 event offers more educational opportunities and training than ever before, including:
· Five days of educational sessions
· Half and full-day workshops
· Over 100 one-hour technical sessions
· Two expert panels on z/OS and LUW

If you have any doubts whether there will be something worthwhile for you there just take a look at this packed agenda! For me, one of the highlights of the conference is always the Db2 for z/OS Spotlight session. It will be delivered right after the keynote on day one by Jeff Josten, Maureen Townsend, and John Campbell. I always learn something new about what is going on with my favorite DBMS at this spotlight session.

What Am I Up to at IDUG?

As usual, I will be busy at this year’s IDUG. I will be arriving in Charlotte the Saturday before the conference to participate in some pre-conference meetings. On Saturday I’ll be meeting up with the other IBM Gold Consultants to hear from IBM and, of course, to catch up with my consultant friends. Then on Sunday I’ll be participating in Db2 customer meetings that IBM conducts to share upcoming “things” and to get feedback.

The first time you will probably get a chance to see me this year will be at the opening keynote, where I’ll be chatting on a panel about DevOps, its impact on database administration and operations, and its importance moving forward.

I also will be delivering two sessions this year. My conference session is on Tuesday, June 4, at 8:00 AM (Session E05) titled Coding Db2 for Performance: By the Book. This session is based on my latest book and it is aimed at application developers. The general idea is to give an overview of the things that you can do as you design and code your Db2 programs (for z/OS or LUW) with performance in mind. All too often performance is an afterthought – and that can be quite expensive. Nail down the basics by attending this session!

I’m also delivering a vendor-sponsored presentation (or VSP) for Infotel, also on June 4th but later, at 10:40 AM, titled Improving Db2 Application Quality for Optimizing Performance and Controlling Costs. My portion of the presentation focuses on the impact of DevOps on the database; it will be followed by Carlos Almeida of Infotel, who will talk about how their SQL quality assurance solutions can aid the DevOps process for Db2 development.

And let's not forget the exhibit hall (aka Solutions Center) where vendors present and demo their products that can help you manage Db2 more effectively. It is a good place to learn about new technology solutions for Db2, but also to hang out and meet with IBMers, consultants, and your peers.

Additionally, this year I'll be spending some time in the booth with some of my favorite Db2 vendors.
On the first day of the Solutions Center, Monday, I'll be at the BMC Software booth supporting a new DevOps-related Db2 for z/OS offering they are delivering. Be sure to stop by the BMC booth to hear about it and ask me what I think about it… and, of course, talk with BMC folks about your favorite Db2 products and see what’s new with BMC and Db2! 



On the second day of the exhibition hall, Tuesday, I'll be at the Infotel booth, but only for an hour at lunchtime. So if you have any questions we didn’t answer in the VSP, you can ask us at the Infotel booth. Be sure to stop by and say hello, take a look at Infotel’s SQL quality assurance solutions for Db2 for z/OS, and register to win one of the two copies of my Db2 application performance book that will be raffled off. If you win, be sure to get me to sign your copy!


That is a lot for one week, but there is more.

Just Look at All That IDUG Has to Offer!

You can go to full-day education sessions on Sunday, June 2nd (at an additional cost – but only $250). These sessions are typically chock-full of useful information that you can take home and apply to your Db2 environment. And this year there are sessions teaching how to get certified as a DBA (for either z/OS or LUW) or as a SYSADMIN, as well as sessions on SQL PL best practices and Db2 for beginners.

You can also attend one or more of the 8 different Hands-On Labs (with hands-on training led by IBM). There are hands-on labs focused on diagnosing performance problems, Machine Learning for z/OS, DevOps, Mobile, Cloud, and more.

There are also two different half-day workshops on June 6th – you can choose either the Db2 12 for z/OS Migration Planning Workshop or the Hands-on Machine Learning Bootcamp.

And don’t forget to attend one of the Special Interest Groups where you can discuss the latest industry trends and topics with other interested technicians. This year’s SIGs span the spectrum from big data to cloud to continuous delivery to IDAA and even AI.

Be sure to download the mobile app for the conference to help you navigate all the opportunities available to you! Armed with the mobile app you’ll get daily intel on what’s happening at the conference.

Justifying Your Attendance

Finally, if you need any help justifying your attendance at this year’s IDUG event, just use this justification letter as your template to create an iron-clad rationale for your boss.

The Bottom Line

The IDUG Db2 Tech Conference is the place to be to learn all about Db2 from IBMers, gold consultants, IBM champions, end users, ISVs, and more. With all of this great stuff going on this year in Charlotte, why wouldn't you want to be there!?!?

Monday, April 22, 2019

Db2 Application Testing Considerations


Testing application programs is a discipline unto itself and there are many considerations and nuances to be mastered to be able to test software appropriately. This blog post will not go into a lot of depth regarding testing practices, other than to highlight the important things to keep in mind to assure optimal performance of your Db2 applications.

Statistics

The data that you use in your test environment will not be the same as your production data. Typically, you will have less test data than you do in production. So, if you run the RUNSTATS utility on your test data you will get different statistics than in production. 

Instead of running RUNSTATS, you can test your SQL access paths by updating the test catalog statistics to be the same as your production system. Do not attempt to modify test statistics on your own. You should work with your DBA group to set up a process for updating test statistics. This can be accomplished in various ways. Your organization may have a tool that makes it easy to copy statistics from production to test, or your DBA team may use a script of queries and modification statements to populate test statistics from production.
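For example, such a script might contain UPDATE statements along these lines. This is a hedged sketch with hypothetical names and made-up values, and catalog updates should always be coordinated with your DBA team:

UPDATE SYSIBM.SYSTABLES
SET    CARDF = 1.0E+06, NPAGESF = 2.5E+04
WHERE  CREATOR = 'TESTAPP' AND NAME = 'CUSTOMER';

UPDATE SYSIBM.SYSINDEXES
SET    FIRSTKEYCARDF = 5.0E+04, FULLKEYCARDF = 1.0E+06,
       NLEAF = 4000, NLEVELS = 3
WHERE  CREATOR = 'TESTAPP' AND NAME = 'XCUST1';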

If your table definitions differ between test and production, you will need to take this into account in the script. For example, things like creator, name, indexes, number of partitions, and even columns can differ between environments. Furthermore, you may have new tables and columns for which there are no current production statistics, meaning that you will need to create estimated statistics based on your knowledge of the business and application.

Some organizations make the mistake of copying production statistics to test once, and never (or rarely) populating test again. This is a mistake because most production databases change over time, sometimes dramatically. When you run RUNSTATS for your production applications it is a good idea to also update your test statistics from the new production statistics.

Modeling a Production Environment in Test

Another tactic you can use to improve the accuracy of access path testing is to model the configuration and settings of your production environment in your test system. Remember that the Db2 optimizer does not just use statistics, but also information about your computing environment.

Db2 test systems typically vary from the production system. Application testing is often accomplished on test systems that have different parameters and configurations than the production systems that run the applications. Test systems usually get set up on a less powerful processor (or LPAR), and use less memory for buffering, sorting, and other system processes. This can result in different access paths for test and production systems, which can cause performance problems that only show up after you move to production.

However, it is possible to model the configuration and parameters of your production environment in your test system. You can specify configuration details for Db2 to use for access path selection in your test system using profile tables.
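Here is a hedged sketch of what that can look like. The keyword names and values are illustrative assumptions on my part, so check the Db2 documentation for the supported production-modeling keywords and required columns:

INSERT INTO SYSIBM.DSN_PROFILE_TABLE
  (PROFILEID, PROFILE_ENABLED)
VALUES (1, 'Y');

-- tell the optimizer to assume a production-sized sort pool
INSERT INTO SYSIBM.DSN_PROFILE_ATTRIBUTES
  (PROFILEID, KEYWORDS, ATTRIBUTE2)
VALUES (1, 'SORT_POOL_SIZE', 128000000);

-- then activate the profile with the -START PROFILE command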

Work with your DBA group to configure proper profile settings for testing your applications.

Test Cases for Skewed Data

Db2 assumes that data values are mostly uniformly distributed throughout the data. However, not all data is uniformly distributed. Db2 RUNSTATS can capture information about non-uniformly distributed and skewed data.

When data is non-uniformly distributed a subset of the values occur much more frequently than others. A special case of non-uniformly distributed data is skewed data. When data is skewed, one value (or a very small number of values) occurs much more frequently than others.

Non-uniformly distributed and skewed data presents a performance testing challenge. The Db2 optimizer can formulate different access paths for non-uniformly distributed data based on the actual values supplied. This is particularly important for dynamic SQL applications, but you should be aware of non-uniformly distributed and skewed data even for static SQL applications.

For non-uniformly distributed data you can examine the Db2 catalog to obtain values for the most commonly occurring values. For Db2 for z/OS this information is in the SYSIBM.SYSCOLDIST table.

Be sure to test with several of the values that are stored in the COLVALUE column of this table, and some that are not. This will enable you to test the performance of the most common values as well as less common values. The access paths may differ for each, and the performance also can vary.
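For example, a query like this one, with hypothetical qualifiers, returns the most frequently occurring values Db2 has captured for a column:

SELECT COLVALUE, FREQUENCYF
FROM   SYSIBM.SYSCOLDIST
WHERE  TBOWNER = 'BARAPP'
  AND  TBNAME  = 'CIGAR_SMOKERS'
  AND  NAME    = 'SEX'
  AND  TYPE    = 'F'
ORDER BY FREQUENCYF DESC;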

An Example

Let’s discuss an example. Suppose you operate a bar with a cigar room and you want to keep track of customers with a table of cigar smokers. We gather information like name, address, phone number, sex, and favorite cigar in the table. Cigar smokers skew male, so there will likely be many more rows where Sex is M than rows where Sex is F. With that background, consider a query like this one:

  SELECT name, phoneno, fav_cigar
  FROM   cigar_smokers
  WHERE  sex = ?;

Because the data is skewed, it is possible that Db2 will choose a different access path for M than for F. If the vast majority of the rows are men, then a table scan might be chosen for Sex = ‘M’; whereas with only a few rows for women, an index might be chosen if one exists on the Sex column.

This is just a simple example. You need to understand your data and how it skews to make sure that you create and test sample test cases for all of the pertinent values.

SQL Variations

A final performance testing consideration is to try multiple SQL variations, especially for queries that access a lot of data or have complex access paths. Do not just find one SQL formulation that works and stick with it. Remember that you can code multiple variations of SQL statements that return the same data, but that perform quite differently.
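For example, these two hypothetical formulations return the same result (assuming cigar_name is unique in premium_cigars), but the optimizer may treat them quite differently:

  SELECT S.name, S.phoneno
  FROM   cigar_smokers S
  WHERE  S.fav_cigar IN
         (SELECT P.cigar_name
          FROM   premium_cigars P);

  SELECT S.name, S.phoneno
  FROM   cigar_smokers S
         INNER JOIN premium_cigars P
            ON P.cigar_name = S.fav_cigar;

Test each reasonable variation and compare access paths and costs before settling on one.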


Note
This blog post was adapted and excerpted from my latest book, A Guide to Db2 Performance for Application Developers. Click the link for more information or to buy a copy (both print and ebook available).