Friday, July 16, 2021

Keeping Track of Data Movement in Db2 for z/OS

Creating and managing test data for Db2 application development and testing requirements can be a significant challenge. To enable not only the development of new programs, but to be able to maintain existing ones, organizations must ensure that there is an adequate amount of accurate test data always available. Without relevant, useful data, there is no way to test applications to make sure they are operating correctly. 

Although this duty must be a shared one between the application developers and the DBAs, managing and controlling all of the data movement tasks typically falls on the DBAs. And keeping track of what data moved where, when it moved, and why can at times be as much of a challenge as moving the data itself.

Fortunately, there are test data management tools available to not only move the data, but to keep track of it. I’ve written about one of the better Db2 for z/OS data movement tools here in the blog before: Fast and Effective Db2 for z/OS Test Data Management with BCV5. I hope you'll take a moment to click and read that post.

Now BCV5 has been improved with a new reporting feature, to enable users to track the movement of data across their Db2 subsystems. This is a significant new feature that can be used to glean useful information for DBAs, storage administrators, and even by data stewards for data governance.

There are six tables of metadata that BCV5 populates to track the data movement and the copy tasks it performs. These tables are:
  • BCV531.TASK_EXECUTIONS
  • BCV531.TASK_EXECUTIONS_OBJECTS
  • BCV531.TASK_EXECUTIONS_PARAMETERS
  • BCV531.TASK_EXECUTIONS_JOBS
  • BCV531.TASK_EXECUTIONS_RULES
  • BCV531.TASK_EXECUTIONS_MASKING
The information in these tables is updated whenever BCV5 runs a task to copy Db2 data. Users can query these tables just like any other Db2 tables to monitor the details of the BCV5 tasks you have run. This information may be useful for many different IT and business professionals, but let’s take a look at three specific use cases: 
  1. database administration (DBA), 
  2. storage administration, and 
  3. data governance.
DBA tracking
DBAs tasked with moving and refreshing data from one Db2 environment to another are the typical users of BCV5, and therefore they will be one of the primary users of the new reporting tables. Most sites that use BCV5 use it to refresh test data, for example, copying production data to test, or copying unit test data to an integration test set of tables.  

Regardless of the type of data movement that is being undertaken, it is usually being done for multiple tables, tablespaces, and databases. Usually, there will be regularly scheduled processes that copy some of the data, but this is rarely sufficient as there will be on-off requests, special situations, and emergency data refreshes happening all the time. Keeping track of such a hectic morass of copying and refreshing data can be difficult. 

Fortunately, if you are using BCV5 the new Usage Tracker tables can simplify keeping track of data refreshes for DBAs. For example, a DBA looking to find out which BCV5 copy tasks were run during the month of May could code a query like:

 SELECT T.ROWDATE, T.USER, T.TASKNAME 
 FROM BCV531.TASK_EXECUTIONS T 
 WHERE T.ROWDATE BETWEEN ´2021-05-01´ AND ´2021-05-31´ 
 ORDER BY T.ROWDATE ;

This will show all the BCV5 copy tasks that ran during that timeframe, and will look something like this:

ROWDATE      USER     TASKNAME 
---------+---------+---------+---------+--------- 
2021-05-12   USERID1  TSK0001 
2021-05-12   USERID1  TSK0002 
2021-05-20   USERID5  TSKPROD1 
2021-05-21   USERID9  TSKPROD4 
 

The results shown here are just a sample and will likely be a subset of the actual results of running such a query. 

Of course, this is rudimentary information and it is likely that the DBA will want to know more, such as which objects were impacted by these tasks. A query such as the following will come in handy:

SELECT SUBSTR(SRCSCHEMA,1,8) AS SS, 
       SUBSTR(SRCNAME,1,12) AS SN, 
       SUBSTR(TGTSCHEMA,1,8) AS TS, 
       SUBSTR(TGTNAME,1,12) AS TN, 
       SUBSTR(OBJTYPE,1,1), 
       ROWDATE AS DATE_COPIED, 
       SIZEKB 
FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
ORDER BY DATE_COPIED ;

The results here show all the Db2 objects copied by BCV5 showing the source and target names as well as the object type, date copied, and amount of data (in KB) copied:

SS      SN       TS     TN         DATE_COPIED   SIZEKB 
---------+---------+---------+---------+---------+-----
DB500XA TS500X01 DBA001 TS500XA1 S  2021-05-17  1462480 
QUALID  XCL59011 TESTID XCL59011 X  2021-05-17   325040 
QUALID  XCL59012 TESTID XCL59012 X  2021-05-17   125200 
QUALID  XCL59013 TESTID XCL59013 X  2021-05-17   301460 
QUALID  XCL5901C TESTID XCL5901C X  2021-05-17    98400 
QUALID  TEST_TBL TESTID TEST_TBL T  2021-05-17       20 

Again, the results have been truncated as this is intended as an example.

A DBA looking to track down the results of a specific copy task that ran on a specific date might want to run a query like this, to verify which objects were copied. Simply plug in the name of your task and the date it ran:

SELECT T.TASKNAME, O.OBJTYPE, O.SRCSCHEMA, O.SRCNAME, 
       O.TGTSCHEMA, O.TGTNAME, O.PARTITIONS 
FROM   BCV531.TASK_EXECUTIONS         T, 
       BCV531.TASK_EXECUTIONS_OBJECTS O 
WHERE  T.ID = O.EXECID 
AND    T.TASKNAME = ? 
AND    T.ROWDATE = ? ;

Storage Administration Tracking 
Another type of user who might find the Usage Tracker capabilities of BCV5 useful is the storage administrator. Storage administrators are responsible for managing an organization’s disk and tape systems. Additionally, they are also responsible for monitoring storage usage and capacity to ensure that sufficient storage is available for the organization’s IT requirements. 

As such, the storage administrator will likely want to keep an eye on the data movement activities of BCV5. For example, a query such as this one can be used to report on the total amount of data (TS and IX) copied by date:

SELECT ROWDATE AS DATE_COPIED, 
       SUM(SIZEKB) AS TOTAL_KB_COPIED 
FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
WHERE  OBJTYPE IN ('S', 'X') 
GROUP BY ROWDATE ;

Which will return data similar to this:

DATE_COPIED  TOTAL_KB_COPIED 
---------+---------+---------+---------+------ 
2021-06-12         106231270 
2021-06-19         106231270 
2021-06-21        4451457810 
2021-06-26         106231270 
Another potentially useful query, not only for storage administrators and DBAs, but also for application managers, is tracking the amount of actual (tablespace) data copied by date and application. Finding the application name or identifier can be tricky, but if we assume that an application identifier is embedded in the second 2 chars of database name then a query like this can be run:

WITH SIZEBYAPP AS ( 
  SELECT ROWDATE AS DATE_COPIED, 
         SUBSTR(TGTSCHEMA,2,2) AS APPL_NAME, 
         SIZEKB AS SIZE_IN_KB 
  FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
  WHERE OBJTYPE = 'S' 
                  ) 
SELECT DATE_COPIED, APPL_NAME, 
       SUM(SIZE_IN_KB) AS TOTAL_KB_COPIED 
FROM   SIZEBYAPP 
GROUP BY DATE_COPIED, APPL_NAME ;

Which might return a report looking something like this:

DATE_COPIED APPL_NAME TOTAL_KB_COPIED 
---------+------------------+---------+------- 
2021-05-11  EN               46805760 
2021-05-22  BA              242791056 
2021-05-22  BX                4094640 
2021-05-22  CM                1008720 
2021-05-22  DA              270390816 
2021-05-22  OR                  90528 
2021-05-26  PR               55737376 
2021-05-26  XX              537647328 

You can adjust this query if you want to know the amount of index data copied by data and application like so (under the same assumption as above for application identifier):

WITH SIZEBYAPP AS ( 
  SELECT B.ROWDATE AS DATE_COPIED, 
         SUBSTR(T.DBNAME,2,2) AS APPL_NAME, 
         B.SIZEKB AS SIZE_IN_KB 
  FROM   BCV531.TASK_EXECUTIONS_OBJECTS B, 
         SYSIBM.SYSINDEXES              X, 
         SYSIBM.SYSTABLES               T 
 WHERE   B.OBJTYPE = 'X' 
 AND     B.TGTNAME = X.NAME 
 AND     B.TGTSCHEMA = X.CREATOR 
 AND     X.TBNAME = T.NAME 
 AND     X.TBCREATOR = T.CREATOR 
                 ) 
SELECT DATE_COPIED, APPL_NAME, 
       SUM(SIZE_IN_KB) AS TOTAL_KB_COPIED 
FROM   SIZEBYAPP 
GROUP BY DATE_COPIED, APPL_NAME ;

Data Governance Tracking 
Although tracking data movement activities is useful for DBAs, it is also important for data governance reporting. Data governance refers to the processes and standards of ensuring access to high-quality data throughout an organization. Data governance encompasses all aspects of data quality including its accuracy, availability, consistency, integrity, security, and usability. The role of data governance has expanded as data privacy rules and regulations have expanded in response to an increasing number of data breaches and hacker attacks. For example, in early July 2021, Colorado passed the Colorado Privacy Act, meaning that now Colorado, Virginia, and California have passed data privacy legislation that impacts how personal data must be governed. More states are certain to follow their lead… and let’s not forget the European GDPR act!

These type of regulations provide rights for access, deletion, correction, portability, and protection for personally identifiable information, or PII. Note that “portability” is one aspect of data protection covered under the auspices of such regulations… and BCV5 is a mover of data, so you need to be able to track what data moved where, especially when the data that moved contains any PII. 

So, what types of queries can be run using the new BCV5 reporting tables to help satisfy the needs of data governance? 

Well, if you have identified specific tables that have personally identifiable information, and therefore requires specific policies to ensure its privacy and protection, a data steward might want to run a query that shows all of the times that a specific protected table was copied:

SELECT SUBSTR(SRCSCHEMA,1,8) AS SS, 
       SUBSTR(SRCNAME,1,12) AS SN, 
       SUBSTR(TGTSCHEMA,1,8) AS TS, 
       SUBSTR(TGTNAME,1,12) AS TN, 
       ROWDATE AS DATE_COPIED, 
       SIZEKB 
FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
WHERE  OBJTYPE = 'T' 
AND    SRCSCHEMA = ? 
AND    SRCNAME = ? 
ORDER BY DATE_COPIED ;

Simply code the appropriate schema (SRCSCHEMA) and table name (SRCNAME) and this query will show all the times that the particular table (say PROD.CUSTOMERS) was copied. A data governance professional with a list of tables that contains PII could alter this query to accept that list as an IN clause instead of the simple equality clause shown here. 

Additionally, BCV5 can de-identify sensitive data using masking. Whenever a task that requires data to be masked is run, information is captured in the TASK_EXECUTIONS_MASKING table. So, a data governance professional might want to run a query like this one to report on all the masking of sensitive data.

SELECT SUBSTR(TBCREATOR,1,8) AS TBCREATOR,
       SUBSTR(TBNAME,1,12) AS TBNAME, 
       SUBSTR(COLNAME,1,18) AS COLNNAME, 
       METHOD AS MASKING_METHOD, 
       SQLEXPR 
FROM   BCV531.TASK_EXECUTIONS_MASKING 
ORDER BY ROWDATE ;

This can always be modified to join it to the TASK_EXECUTIONS table to obtain the task name if that is important. And with a little manipulation of the query it is possible to look for tables that contain sensitive data that have been copied using BCV5, but have not had masking applied. 

Summary 
BCV5 has offered powerful data movement and masking capabilities for Db2 data for a long time, but now it also offers the ability to track and report on your organization’s Db2 data movement and copy tasks. This new functionality opens a plethora of useful information for BCV5 users.

Tuesday, July 06, 2021

How Many Temporal Tables Does Your Site Have?

Sometimes it can be difficult to remember where information is stored in the Db2 Catalog. Usually, with a little rumination and a little review of Appendix A of the IBM Db2 SQL Reference manual (SC27-8859), you can come up with a solution.

For example, I was talking to some DBAs who were trying to remember if they had ever created any business-time temporal tables. A comment was made that we could surely find that in the Db2 Catalog and the conversation moved along... but then I thought, hmmm, let me see what I can do about coming up with a catalog query.

The first step was to think about where this information might be found, which took me to SYSTABLES. A good first thought, but no, it isn't there. So I thought, how about SYSCOLUMNS? And lo' and behold, there was the answer.

The columns identified as the start and end date/time for the temporal range are documented in SYSCOLUMNS in the PERIOD column. PERIOD is defined as a CHAR(1) column and it contains one of the following values for every column defined for each table:

Value Meaning                                                                                    
   B         Column is the start of period BUSINESS_TIME
   C     Column is the end of period BUSINESS_TIME with
    an exclusive endpoint
    I     Column is the end of period BUSINESS_TIME with
    an inclusive endpoint
   S     Column is the start of period SYSTEM_TIME
   T     Column is the end of period SYSTEM_TIME
blank         Column is not used as either the start or the end of
    a period


So using this information, here is a query that will show information about all of the business-time temporal tables you have created:

SELECT SUBSTR(TBCREATOR,1,8) || '.' || SUBSTR(TBNAME,1,30) 
       AS TABLENAME,
       SUBSTR(NAME,1,40) AS COLUMNNAME, 
       COLNO, 
       PERIOD
FROM   SYSIBM.SYSCOLUMNS
WHERE  PERIOD IN ('B', 'C', 'I')
ORDER BY TABLENAME, PERIOD, COLUMNNAME;

If you want to find the system-time temporal tables, just swap out the WHERE clause with this one:

  WHERE PERIOD IN ('S', 'T')  


By becoming adept at querying the Db2 Catalog tables you can find out just about everything you want to know about the databases and objects defined in your Db2 subsystems!

Tuesday, June 15, 2021

Db2 12 for z/OS Function Level 510

I'm a little late with this Db2 function level update, but better late than never, right?

In April 2021, IBM introduced a new function level, FL510, for Db2 12 for z/OS. If you want to take a look at the announcement for it, you can read it here, but there really isn't a lot to it.

Unlike all the other function levels, FL510 does not add any new features or capabilities, nor does it introduce any new changes to the Db2 Catalog. So what does it do?

This function level is basically there to prepare for the next new release of Db2, which will obviously be coming soon, or IBM would not have created this function level for it!  So it is time to start thinking about Db2 Next and getting ready for a new release/version of our favorite DBMS!

But we really haven't answered what FL510 does, have we? It is a housekeeping type of function level. When you activate FL510 it verifies and enforces several pre-migration conditions that have to be met before you can migrate to the next Db2 release. It will make sure that all Db2 12 function levels are activated and that all catalog updates for Db2 12 have been applied. This means that the Db2 catalog level is at the last catalog level for Version 12 and any future migration can therefore proceed.

Additionally, FL510 will check to make sure that your application packages were rebound recently enough to ensure that they are supported by the next Db2 release.

If any of the previous conditions are not met, then the activation of FL510 will fail. You will have to remediate your system and try to activate FL510 again before you can move forward to the new release.

Also, please be aware that FL510 has nothing to do with the fallback SPE that will have to be applied before moving forward with the eventual, new Db2 release. IBM will deliver the fallback SPE in a subsequent APAR at a point in time.

So I guess that this is a boring function level in that it delivers no new functionality... but it is exciting as it is a pre-req for a new  Db2 release that is on the horizon!

Sunday, May 09, 2021

Thinking About the Mainframe, the Cloud, and IBM Think 2021

A Bit about Think

I am looking forward to attending the IBM Think 2021 conference, IBM's annual flagship technology event. I have attended several in-person Think events, as well as last year’s virtual conference, and I always come away with new knowledge and additional insight into technology and IBM’s vast portfolio of hardware, software, and solutions. The Think conference is always one of the tech highlights of the year for me!

This year’s event, IBM Think 2021, is again being held as a virtual conference, May 11 and 12, 2021. And it is free of charge, which means that you can experience all the great education, presentations, and networking opportunities without having to leave your desk.

My favorite aspect of the Think conference is the breadth and scope of pertinent technical content that it covers. Whether you are a developer, a DBA, a data scientist, a manager, an executive, or any flavor of IT or business specialist, there will be a wealth of useful information presented to educate you and make you “think.”  Be sure to register here.

My Think 2021 Agenda

There are multiple sessions to be delivered at this year’s IBM Think conference that intrigue me because they focus on areas where I specialize.  For example, Dr. Dario Gil, SVP and Director of IBM Research will be delivering a keynote session on IT infrastructure which is sure to be educational. This session, 2081, offers a deep dive into the IBM innovations powering the next generation of hardware, including IBM Z.

Another session I am looking forward to is session 2303 focusing on security “everywhere.” It features IBM luminaries like Tom Rosamilia, Senior Vice President, IBM Systems, and Mary O’Brien, General Manager IBM Security. And Forrester Research Director, Lauren Nelson, will also be lending her industry expertise to the session.

But I think the Think 2021 session I am most looking forward to is The IBM Z roadmap for hybrid cloud and AI (session 1605) featuring Ross Mauri General Manager for IBM Z. Mauri promises to offer a timely discussion on the business value of integrating the IBM Z platform as a full participant into your hybrid cloud. And he’ll speak with Russell Plew, Technology Senior Manager at M&T Bank who will discuss their real-life experiences in doing so!

Why is this session so interesting to me? Well, I’ve worked with the mainframe my entire career, and as anybody who works on the mainframe knows, the IBM Z platform is used to drive mission-critical workloads across all major industry sectors, worldwide. If your organization needs to perform large-scale transaction processing (thousands of transactions per second), support thousands of users and programs concurrently, manage terabytes of information, and handle large-bandwidth communication, chances are you rely on the mainframe to do that because the platform excels at all of those things.

If you’ve ever deposited a check into your bank account, booked a flight on an airline, or used a credit card to purchase something, it is probable that a mainframe was involved in completing that activity!

Ever since Stewart Alsop of InfoWorld predicted the last mainframe would be unplugged on March 15, 1996 there has been a lingering perception that the mainframe would go away at some point. But here we are, 25 years later, and the mainframe is still going strong! At last year’s IBM Think conference IBM presented the following statistics on the mainframe’s ubiquity and power:

      70% of the Fortune 500 use mainframes and 72% of customer-facing applications are dependent on the mainframe for some or all data processing.

      Mainframes are designed to be able to process a trillion web transactions a day with the capability to process 1.1 million transactions per second.

      95% of transactions in the banking, insurance, airline and retail industries are handled by mainframes.


Indeed, the mainframe continues to offer a strong, unparalleled platform for performance, security, and reliability. Of course, the mainframe has changed and grown over its 50+ year lifespan. Today’s IBM z15 is light-years beyond the original IBM System/360 introduced in 1964. Some of the great newer capabilities of the IBM Z include encryptions everywhere with pervasive encryption and Data Privacy Passports, rack-mountable mainframes, Instant Recovery, and cloud-native development. I’m looking forward to hear how IBM’s customers have taken advantage of these, and other capabilities, to integrate the IBM Z into their hybrid cloud architecture.

It only makes sense that businesses relying on the mainframe will continue to do so, even as they embrace cloud computing. This is what the “hybrid” in the term hybrid cloud implies, an IT infrastructure that uses a mix of on-premises and private / public cloud from multiple providers. And this approach makes the most sense because everything can’t shift to the cloud immediately (perhaps ever) because most existing applications were not built with an understanding of the public cloud and it would take a lot of investment to re-engineer them to properly take advantage of a public cloud architecture. And even if you wanted to move everything, cloud service providers (CSPs) can’t build out their infrastructure fast enough to support all the existing data center capacity “out there” to immediately support everything.

So, it will be exciting to watch the IBM continue to innovate on the IBM Z platform as enterprise customers work to integrate Z as a vital component of their hybrid cloud infrastructure. With the large investment enterprises have in their working mainframe applications, large data sets and databases containing crucial data, and high-volume processing requirements they will continue to rely on the mainframe well into the future… and that makes it important to understand how IBM is enabling the IBM Z to participate in your hybrid cloud architecture.

So, join me at Think 2021 for session 1605 to learn how to use your investments in IBM Z and build and modernize applications into container-based workloads using a common DevOps experience. And stick around for other sessions to gain insights on harnessing the full value of IBM hardware, software and services in your organization as you continue to support, manage, and transform traditional business and IT operations.


Wednesday, April 07, 2021

Happy Birthday to the IBM Mainframe

I am older than the mainframe... I turned 58 on April 3rd, and the IBM mainframe officially celebrates its 57th birthday today, April 7th.

The IBM 360 was launched on April 7, 1964 and the world of enterprise computing has never been the same.

Here are a few links and articles to check out as we celebrate the ongoing vitality of mainframe computing:

So, all of you mainframe users out there, today is indeed a day to celebrate... another year has gone by, and mainframes are still here... running the world!