Tuesday, June 09, 2020

Optimizing Mainframe Data Access


Nobody can deny that the amount of data we store and manage continues to expand at a rapid pace. A recent study by analysts at IDC concludes that the size of what they call the Global Datasphere will reach 175 zettabytes by 2025.


This data growth is being driven by many different industry trends and patterns. Mobile usage continues to grow unabated; we are hooking up ever more data-generating devices to the Internet of Things (IoT); and we are storing more data for analytics and machine learning, creating data lakes, and copying data all over the place.



As such, organizations are looking to process, analyze, and exploit this data accurately and quickly. This is especially the case for mainframe sites, where optimizing data usage and access can result in big returns on decision-making and also big savings…

So how can organizations leverage the best data storage for each type of usage required? Well, it helps to think of the different types and usages of data at a high level. If we consider data along two axes -- volatility and usage -- we can map out where it makes sense to store the data… and which IBM technologies we can bring to bear to optimize that data.

From a volatility perspective, there is a continuum of possibilities from never-to-rarely changing to frequently changing. And from a usage perspective, there is a continuum ranging from mostly analytical processing that supports decision-making to transactional processing that conducts day-to-day business operations.

These continuums are outlined in the chart below. As you review the chart, keep in mind that transaction processing is typified by short queries that get in, do their business, and get out. Analytics processing, on the other hand, is typically going to require longer-running queries. Furthermore, the chart calls out two other types of data: reference data and temporary data.



Reference data is that which defines permissible values to be used by other data elements (columns or fields). Reference data is typically widely-used and referenced by many applications. Additionally, reference data does not change very often (hence its inclusion near the bottom of the volatility continuum on the chart). Temporary data, as the name suggests, exists for a period of time during processing, but is not stored persistently (which is why it is depicted near the top of the volatility continuum on the chart).
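
To make these two categories concrete, here is a minimal sketch in Db2 SQL (object and column names are hypothetical): a small, rarely-changing code table holding reference data, and a declared global temporary table whose contents exist only for the duration of the application process.

    -- Reference data: a small code table referenced by many applications
    CREATE TABLE REF_STATE_CODE
      (STATE_CODE CHAR(2)     NOT NULL PRIMARY KEY,
       STATE_NAME VARCHAR(50) NOT NULL);

    -- Temporary data: held only for the life of the process, never stored persistently
    DECLARE GLOBAL TEMPORARY TABLE SESSION.WORK_ORDERS
      (ORDER_ID  INTEGER       NOT NULL,
       ORDER_AMT DECIMAL(11,2) NOT NULL)
      ON COMMIT PRESERVE ROWS;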

With this framework as our perspective, let’s dig in and look at the options shown in the center of the chart. For the most part, we assume that mainframe data will be stored in IBM Db2 for z/OS, but of course, not all mainframe data will be. For analytical processes, IBM provides the IBM Db2 Analytics Accelerator (aka IDAA).


IBM Db2 Analytics Accelerator

IDAA is a high-performance appliance for analytical processing that is tightly integrated with Db2 for z/OS. The general idea is to enable HTAP (Hybrid Transaction Analytical Processing) from the same database, on Db2 for z/OS. IDAA stores data in a columnar format that is ideal for speeding up complex queries, sometimes by orders of magnitude.

Data that is loaded into the IDAA goes through a hashing process that maps it to multiple disks across different blades. The primary purpose of spreading the data over multiple disks is to enable parallelism: searching is performed across multiple disks (portions of the data) at the same time, resulting in efficiency gains.

Db2 for z/OS automatically decides which queries are appropriate for execution on IDAA. When a query is run on IDAA, it is distributed across multiple blades. Each blade delivers a partial answer to the query, based on the portion of the data on its disks, and combining the blades' partial results produces the final query result.
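
At the session level, you can also influence this routing yourself with the CURRENT QUERY ACCELERATION special register. A minimal sketch:

    -- Route eligible queries to the accelerator for this session;
    -- queries that do not qualify continue to run natively in Db2.
    SET CURRENT QUERY ACCELERATION = ENABLE;

Other settings include NONE, ENABLE WITH FAILBACK, ELIGIBLE, and ALL, and the system-wide default is controlled by the QUERY_ACCELERATION subsystem parameter.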

When you think of the types of queries that IDAA can boost (that is, longer-running ones), think of queries like the following:

SELECT something
     FROM big table
     WHERE suitable filter clause

SELECT something with aggregation
     FROM big table
     WHERE suitable filter clause
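
For instance, a long-running aggregation over a large history table is a natural fit for acceleration. Here is a hypothetical example (table and column names are illustrative only):

    SELECT CUST_REGION,
           YEAR(ORDER_DATE) AS ORDER_YEAR,
           SUM(ORDER_AMT)   AS TOTAL_REVENUE,
           COUNT(*)         AS ORDER_COUNT
      FROM SALES_HISTORY
     WHERE ORDER_DATE BETWEEN '2015-01-01' AND '2019-12-31'
     GROUP BY CUST_REGION, YEAR(ORDER_DATE)
     ORDER BY CUST_REGION, ORDER_YEAR;

Scanning and aggregating years of history this way is exactly the kind of work that benefits from IDAA's columnar storage and parallelism.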

IBM Z Table Accelerator

But IDAA is not designed to help your short-running transactional queries; at least not that much. So we can turn to another IBM offering to help out here: IBM Z Table Accelerator. At a high level, IBM Z Table Accelerator is an in-memory table accelerator for Db2 and/or VSAM tables that can dramatically improve overall Z application performance and reduce operational cost.

The most efficient way to access data is, of course, in-memory access. Disk access is orders of magnitude less efficient than accessing data from memory: memory access is usually measured in microseconds, whereas disk access is measured in milliseconds. (Note that 1 millisecond equals 1,000 microseconds.)
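
To put rough, illustrative numbers on that: a batch process performing one million random data accesses would spend on the order of 1,000 seconds (about 17 minutes) waiting on I/O at 1 millisecond per access, versus on the order of 1 second at 1 microsecond per in-memory access. These are not benchmark figures, but the orders-of-magnitude gap is the point.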

This gap exists not only because disk access is mechanical and memory access is not, but also because there are a lot of actions going on behind the scenes whenever you request an I/O. Take a look at this diagram, which comes from a Marist University white paper on mainframe I/O.




The idea here is to show the complexity of operations that are required in order to request and move data from disk to memory for access, not to explicitly walk through each of these activities. If you are interested in doing that, I refer you to the link shown for the white paper.

So an in-memory table processor like IBM Z Table Accelerator can be used to keep data in memory for program access, eliminating the processing and complexity of disk-based I/O operations. The benefits are many: it can reduce resource consumption, shrink the elapsed time experienced in batch windows, lower operational cost, and improve system capacity.


The concept is simple enough, as shown in this overview graphic (above). The IBM Z Table Accelerator is used to host reference and/or temporary data in memory, instead of on disk, to significantly improve application performance.

So let’s take a look at the difference between an I/O operation (or any fetch of data from Db2) versus accessing the data using IBM Z Table Accelerator:


The top of the diagram shows the code path required by the data request (or fetch) as it makes its way from disk through Db2 and back to the application. The bottom portion of the diagram shows the code path when accessing data using IBM Z Table Accelerator, which is a significant simplification of the process; the comparison should help to clarify how much more efficient in-memory table access can be.

The Bottom Line

If you take the time to analyze the type of data you are using, and how you are using it, you can use complementary acceleration software from IBM to optimize your applications' data access. For analytical, long-running queries, consider using IBM Db2 Analytics Accelerator. And for transactional processing of reference data and temporary data, consider using IBM Z Table Accelerator.

Both technologies are useful for different types of data and processing.





Wednesday, May 20, 2020

IBM Think 2020: Virtual, On Demand, Hybrid Cloud and Z

This year’s IBM Think event was quite different than in past years. Usually, Think is an in-person event that attracts a lot of people, typically more than ten thousand IT executives and practitioners. But as we all know, this year, with the global COVID-19 pandemic, an in-person event was not practical, so IBM held it online. And I have to say, they did a fantastic job of managing multiple threads of content without experiencing bandwidth or access issues, at least none that I encountered.
The theme and focus of the content for the event was different, too. Instead of the usual conference focus on products, announcements, and customer stories, this year’s event was more philanthropic. Oh, sure, you could still hear about IBM’s products and customer successes, but the keynote and featured sessions were at a higher level this year.
In the kickoff session, new IBM CEO Arvind Krishna spoke about the driving forces in IT as being hybrid cloud and AI. And he spoke about these things in the context of moving IBM forward, but also how they can be used to help healthcare workers combat pandemics like we are currently experiencing.
In another keynote, IBM Executive Chairman Ginni Rometty spoke with Will.i.am (of the Black Eyed Peas) about making the digital era inclusive through education, skills development, and the digital workforce.


And then there was Mayim Bialik’s session on women and STEM, which was sincere, heartfelt, and entertaining. 

For those who don’t know who she is, she is the actress who played Blossom (on Blossom) and Amy Farrah Fowler (on The Big Bang Theory)… but she is also a scientist with a doctorate in neuroscience. Bialik’s session focused on putting a positive female face on STEM, something that is definitely needed!

So, what about the technology side of things? Well, you can take a clue from Krishna’s assertion that IBM as a company has to have a “maniacal” focus on hybrid cloud and AI in order to compete. But the company has a rich and deep heritage across the computing spectrum that gives it a key advantage even as it adjusts to embracing hybrid cloud and AI.
The first thing to remember is that IBM uses the term “hybrid multicloud” very specifically and deliberately. Everything is not going to be in the cloud. Large enterprises continue to rely on the infrastructure and applications they have built over many years, many of them on z Systems mainframes. The key to the future is both on-premises and cloud, and IBM understands this with its hybrid cloud approach… as they clearly demonstrated at Think 2020.
My specific area of focus and expertise is the mainframe and Db2 for z/OS, so I sought out some sessions at Think in those areas. Let me tell you a bit about two of them.

First let’s take a quick look at how IBM Cloud Pak for Data can work with data on the Z platform. This information was drawn from IBM Distinguished Engineer Gary Crupi’s session, titled "Drive Actionable, Real-Time Insight from Your High-Value IBM Z Data Using IBM Cloud Pak for Data."

What is Cloud Pak for Data? Well, it is an IBM platform for unifying and simplifying the collection, organization, and analysis of data. Heretofore, it was mostly focused on non-mainframe platforms, but the latest release, version 3.0, is a major upgrade with an enhanced unified experience, expanded ecosystem, and optimized Red Hat integration. And it enables several ways for you to turn your enterprise data on IBM Z into actionable, real-time insight through the integrated cloud-native architecture of IBM Cloud Pak for Data.



Crupi’s session started out with the now familiar (at least to IBM customers and Think attendees) Ladder to AI and how Cloud Pak for Data helps to enable customers’ journeys up the ladder. Data is the foundation for smart business decisions, and AI can unlock the value of this data.

He went on to discuss the continuing importance of the mainframe, providing facts including:
  •  70% of Fortune 500 companies use mainframe for their most critical business functions
  •  72% of customer-facing applications are completely or very dependent on mainframe processing
  •  The mainframe handles 1.1 million transactions per second (compared to roughly 60,000 Google searches per second)
  •  95% of transactions in the banking, insurance, airline and retail industries run on the mainframe

These are all good points, and things that mainframe users like to hear. It is good to see IBM promoting the ubiquity and capabilities of the mainframe.



Now, what about IBM Cloud Pak for Data better exploiting mainframe data? Crupi went back to the AI Ladder to talk about z/OS capabilities for analyzing and collecting data for AI.


Solutions such as Watson Machine Learning for z/OS, Db2 AI for z/OS, and QMF can be used for analyzing data, while Db2 for z/OS and Tools, IDAA, and Data Virtualization Manager can be used for data collection. These things already exist, but using them effectively alongside distributed platform capabilities will be crucial to climbing the ladder to AI.

IBM Cloud Pak for Data will leverage IBM Z technology to bring valuable IBM Z data into a modern analytics/AI platform. It can now exploit IBM Z data and resources where appropriate, enabling you to further benefit from IBM Z technology and data.

A key new component for making the data on IBM Z accessible is IBM Db2 for z/OS Data Gate, a new product announced during Think 2020. Db2 Data Gate can help you reduce the cost and complexity of your data delivery with a simple, easy-to-deploy mechanism that delivers read-only access to Db2 for z/OS data. Instead of building and maintaining costly custom code, you let Db2 Data Gate do the work. Data can be synchronized between Db2 for z/OS data sources and target databases on IBM Cloud Pak for Data.


Instead of accessing data in the IBM Z data source directly, an application accesses a synchronized copy of the Db2 for z/OS data, hosted by a separate system. This target system can be established anywhere Cloud Pak for Data is supported, thus enabling a wide range of target platforms that include public cloud, on-premises, and private cloud deployments.


So IBM is helping you to expand the accessibility of your Z data.

And that brings me to the second session I’d like to briefly mention, Automate Your Mainframe z/OS Processes with Ansible [Session 6760]. 

Although Ansible is not a replacement for your operational mainframe automation tools, it can be used to communicate with and automate z/OS using out-of-the-box SSH access into z/OS UNIX System Services to execute commands and scripts, submit JCL, and copy data. And Ansible has existing modules that can be used to make calls to the RESTful/SOAP APIs that are available in many z/OS products.


Ansible can be beneficial for orchestrating across platforms, including Z systems, and for simplifying configuration and deployment management. But keep in mind that Ansible is a proactive framework for automation; it is not intended to replace automation solutions that monitor and react.

Here is a nice, but by no means exhaustive, list of examples showing how Ansible can be used to interact with popular z/OS products.


The Bottom Line

The IBM Think 2020 conference was a great success considering how rapidly IBM had to move to convert it from an in-person event to an online, virtual one. And the content was informative, entertaining, and had something for everybody. I hope you enjoyed my take on the event… feel free to share your comments below on anything I’ve written here, or on your experiences at the event.


Wednesday, May 13, 2020

Db2 11 for z/OS End of Service Date Extended!

In an earlier blog post, I wrote about Db2 11 for z/OS End of Support Coming This Year (2020)... but that was before the global COVID-19 pandemic swooped in and changed everything!

If you check out that earlier post, you'll even see that I made the comment that the "date appears to be a firm one... don't bet on IBM extending it." Well, that would have been a bad bet! And that is another reason why it is not a good idea to predict the future (even when you hear the prediction from a credible source).

Yes, IBM has extended the end of service (EOS) for Db2 11 for z/OS by 6 months... from September 30, 2020, to next year, March 31, 2021. They furthermore state that they expect it to be a one-time adjustment (but I'm not going to predict the future this time).

You can find the revised EOS terms here.

Regardless of the extension, it still makes sense to start planning your migration to Db2 12 for z/OS now. Actually, with the slowdown in many corporations due to the pandemic, your DBAs and systems programmers might have some time to do this now.

Keep in mind that Db2 11 was made generally available way back on October 25, 2013, nearly 7 years ago. That is an eternity in the world of enterprise software. So it is nice to have more wiggle room, but don't use it to delay further... start your planning now (if you haven't already).

Friday, May 01, 2020

Db2 for z/OS and Managing Database Changes - The Recap

During the month of April 2020 I wrote a series of blog posts on the different types of Db2 for z/OS database change management and the things to remember and consider... 

Today, the first day of May, I just wanted to publish a quick recap and links to all of these posts.

So without further ado...

The first post in this series introduced the types of changes and briefly explained the differences at a very high level. It serves as the introduction to the next three parts.

Part 2 examined simple changes, the easiest of the three types of change to implement. These usually just require issuing a simple ALTER to effect database changes.

In the next installment, Part 3 details medium changes, known in the Db2 world as pending changes. Introduced in Db2 10 for z/OS, these require a little bit more work and can only be performed on database objects in Universal table spaces.

And then in the final post, Part 4 takes a look at complex changes. These are the types of changes to database structures that are only supported by dropping and then re-creating the database structure with your required changes. 
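
To make the distinction between these change types concrete, here is a minimal sketch in Db2 SQL (object names are hypothetical). A simple change takes effect as soon as the ALTER completes, while a pending change is recorded in the catalog but only materialized by a subsequent online REORG of the universal table space:

    -- Simple (immediate) change: effective as soon as the ALTER completes
    ALTER TABLE MYSCHEMA.CUSTOMER
      ADD COLUMN CUST_EMAIL VARCHAR(254);

    -- Medium (pending) change: recorded in the catalog, materialized by the next online REORG
    ALTER TABLESPACE MYDB.MYTS
      DSSIZE 256 G;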

If this quick recap whetted your appetite for more details, please take a moment or two to click through each of the links and read the more detailed posts.

And good luck managing your Db2 for z/OS changes!

Tuesday, April 28, 2020

Db2 for z/OS and Managing Database Changes - Part 4

Today brings the fourth, and final installment of our series examining the different types of changes that can be made to database objects and structures in Db2 for z/OS. Part 1 introduced the three types of changes, part 2 examined simple database changes, and part 3 took a look at medium, or pending changes. 


And that brings us to the final type of Db2 schema change: the complex change. A complex change is essentially one that is unsupported by Db2 other than by dropping and then re-creating the database structure with the desired change. Of course, implementing such changes is not as easy as just dropping and re-creating the object. For example, if you want to add a column to the middle of an existing row, it cannot be done using ALTER, and as such, it is a complex change. This is not the only type of complex change, of course. Any change that is not simple (immediate) or medium (pending) is a complex change, and it requires an in-depth series of tasks that will differ based on the database object being changed and the specific change to implement.

Examples of the types of activities that may need to be scripted to implement a complex database change include:
  • Retrieve the current definition of the database object by querying the appropriate Db2 Catalog tables, which will be different for each type of object.
  • Retrieve the current definition of any dependent objects as well; for example, if you drop a table, then triggers, views, and indexes are also dropped.
  • Capture all referential constraints for all tables involved in the change (either directly or indirectly).
  • Retrieve all security authorizations that have been granted for all database objects that will be dropped either directly or as a result of cascading drops.
  • Obtain a list of all programs that access impacted tables by using the Db2 Catalog, Db2 Directory, and any other program documentation at your disposal.
  • Unload the data from all tables that will be impacted.
  • Drop the database object to be changed, which in turn drops any dependent objects, revokes authorizations, and invalidates any SQL statements against any impacted tables in any application programs.
  • Recreate the database object with the new specifications by using the definition obtained from the Db2 Catalog earlier.
  • Reload the tables, using the unloaded data obtained earlier.
  • Recreate any referential constraints that may have been dropped.
  • Recreate any triggers, views, and indexes for the table.
  • Recreate the security authorizations captured earlier.
  • Examine each application program to determine whether changes are required for it to continue functioning appropriately.
  • Test thoroughly.

The above list is not meant to be an exhaustive list of everything that must be accomplished for every type of complex schema change that you might have to implement. Instead, the list is intended to convey the intricacies involved in making complex changes and how automation can minimize risk and speed up the process!
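
To make a few of the steps in that list concrete, here is a hedged sketch in Db2 SQL of what part of such a script might look like (table, column, and schema names are hypothetical; the unload, reload, and rebind steps would be handled by utilities and are omitted):

    -- 1. Capture the current column definitions from the Db2 Catalog
    SELECT NAME, COLTYPE, LENGTH, SCALE, NULLS, COLNO
      FROM SYSIBM.SYSCOLUMNS
     WHERE TBCREATOR = 'MYSCHEMA' AND TBNAME = 'CUSTOMER'
     ORDER BY COLNO;

    -- 2. Capture table authorizations that will be revoked by the drop
    SELECT GRANTEE, SELECTAUTH, INSERTAUTH, UPDATEAUTH, DELETEAUTH
      FROM SYSIBM.SYSTABAUTH
     WHERE TCREATOR = 'MYSCHEMA' AND TTNAME = 'CUSTOMER';

    -- 3. After unloading the data, drop and re-create the table with the new column order
    DROP TABLE MYSCHEMA.CUSTOMER;
    COMMIT;
    CREATE TABLE MYSCHEMA.CUSTOMER
      (CUST_ID     INTEGER      NOT NULL,
       CUST_REGION CHAR(2)      NOT NULL,   -- the new column, added "in the middle"
       CUST_NAME   VARCHAR(100) NOT NULL,
       PRIMARY KEY (CUST_ID));

    -- 4. Then reload the data, re-create any indexes, views, triggers, and referential
    --    constraints, and re-grant the captured authorizations.

Building and verifying such scripts by hand for every complex change is error-prone, which is exactly where change management tooling earns its keep.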

Furthermore, it should be clear that complex changes will require an outage to complete. When database objects are dropped, applications will no longer be able to access them until the changes are complete and, if tables are involved, until the data has been reloaded.

A well-designed and implemented database schema change solution must be able to understand all of the types of changes covered in this series and implement them appropriately. That means the tool should implement a medium, pending change when possible instead of simply deferring to a complex change. It also means being able to assemble a script of all the appropriate actions required for any type of complex change the DBA may need to perform.

To work in a modern environment, the tool should also understand DevOps and agile development and integrate into any DevOps pipeline/toolchain seamlessly. 

Obviously, such capabilities require built-in intelligence and knowledge of Db2 for z/OS and its many nuances and features.