Thursday, September 08, 2022

The Importance of Data Masking for IBM Db2 z/OS

Three trends are conspiring to increase the need for organizations to protect and mask sensitive data stored in Db2 for z/OS databases:

  • data privacy regulations place specific requirements on how data must be protected,
  • data breaches continue to grow, requiring improvements in data protection protocols,
  • and the ongoing requirement to copy production data to test environments in order to conduct realistic application testing.

Protecting your data while managing these three trends requires a systematic way to ensure that your sensitive data is not exposed or surreptitiously accessed. An ideal way to accomplish this is to mask sensitive data using a data masking tool.



If you want to learn all about the requirements for data masking, and how UBS-Hainer's BCV5 Masking Tool can protect your critical Db2 for z/OS data, be sure to register for and attend my upcoming webinar, Protect Your Sensitive Db2 for z/OS Data with the BCV5 Masking Tool, on October 13, 2022 at 11 am EST / 5 pm CEST.

I hope you will join me for this informative webinar.


Tuesday, September 06, 2022

New Series of Short, Low-Price Books

Today's blog post introduces my regular readers to a series of new books that I am writing and selling exclusively on Amazon. The books will be shorter than the ones I've written in the past; each one will be under 100 pages. The primary audience is Kindle eBook readers, but there will also be paperback print editions, too!

The general idea is to discuss a specific topic or idea within the realm of data, database systems, or mainframe computing, and to offer the books at a low price point. The eBook format will generally cost a couple of bucks or so (with the option for Kindle Unlimited readers to read for free)! And the print editions will generally cost around $10 or so. Very low cost with some high-quality information!

As of today, the first three books are available to be ordered on Amazon. They are as follows:

The Tao of Db2: Achieving Balance and Understanding with Db2

This book follows the exploits of a seasoned Db2 DBA training his intern in the ways of Db2 before he can enjoy a well-earned retirement. Learn best practices for managing Db2 properly to achieve harmonious systems and applications that deliver quality and performance using the Tao of Db2.

Based on a presentation I gave at IDUG several years ago now, this book uses Tao as a training mechanism for achieving the proper mindset and qualities to manage and administer Db2 databases and applications. 

Database Design Reviews: Techniques to Prepare Databases and Applications for Operational Implementation

This book provides a framework for a series of design reviews that should be conducted to ensure the delivery of quality database and application code. It defines what a database design review is, and offers guidance on structuring the design review meetings, assigning appropriate participants to attend, and describing the timing and purpose of each meeting and attendee.

You can use this book as a template for establishing a consistent approach to database design reviews in your organization.

Regulatory Compliance for Data & Database Systems: Protect Your Sensitive Data

Being in compliance with governmental and industry regulations is important for organizations of all types and sizes. Indeed, there are numerous regulations impacting your data and database systems, ranging from industry regulations like PCI DSS to more encompassing governmental regulations like GDPR. There are around 100,000 regulations "out there," so it is certain that many of them apply to you and your business. But what do you need to do to protect your sensitive data to comply with all these regulations?

This book delivers an introduction to the regulatory landscape and its impact on data and database systems and management. We will take a high-level look at some of the most impactful regulations that affect how we manage database systems. The primary focus of this book is to examine several techniques that can be used to ensure compliance. We will look at things like database auditing, data masking, and data archiving, and show the tactics and software required to implement compliance within your database systems.
Armed with the information in this book you can confidently implement effective compliance for your corporate databases and systems.


Thursday, September 01, 2022

Know Your Isolation Levels to Develop Correct and Efficient Db2 Programs

Did you know that Db2 provides a method to change the way a program or SQL statement acquires locks? That way is known as the isolation level and it can be set to specify the locking behavior for a transaction or statement. Standard SQL defines four isolation levels that can be set using the SET TRANSACTION ISOLATION LEVEL statement:

  • Serializable
  • Repeatable read
  • Read committed
  • Read uncommitted
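
In standard SQL, a session could request the read committed level like this (a minimal sketch using standard syntax; as described below, Db2 for z/OS sets isolation differently):

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;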

The isolation level determines the mode of page or row locking implemented by the program as it runs.

Db2 supports a variation of the standard isolation levels. Db2 implements page and row locking at the program execution level, which means that all page or row locks are acquired as needed during the program run. Page and row locks are released as the program runs, depending on the isolation level.

In Db2 you can specify the following four isolation levels: cursor stability (CS), repeatable read (RR), read stability (RS), and uncommitted read (UR).

Using the ISOLATION parameter of the BIND command you can set the isolation level of a package or plan. You also can use the WITH parameter on a SELECT statement to set the isolation level of a single SQL statement.
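
For example, here is a minimal sketch of both approaches (COLL1 and PROG1 are hypothetical names; the WITH clause overrides the package isolation for that single statement):

    BIND PACKAGE(COLL1) MEMBER(PROG1) ISOLATION(CS)

    SELECT EMPNO, LASTNAME
    FROM   DSN8B10.EMP
    WITH UR;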

Cursor stability is the Db2 implementation of the SQL standard read committed isolation level. CS is perhaps the most common Db2 isolation level in use in production applications because it offers a good tradeoff between data integrity and concurrency. When CS is specified, the transaction will never read data that is not yet committed; only committed data can be read. Cursor stability is the current default isolation level if none is specified at BIND time.

A higher level of integrity is provided with repeatable read. Under an RR isolation level all page locks are held until they are released by a COMMIT (or ROLLBACK), whereas with CS read-only page locks are released as soon as another page is accessed.

An RR page locking strategy is useful when an application program requires consistency in rows that may be accessed twice in one execution of the program, or when an application program requires data integrity that cannot be achieved with CS.

As an example of a good reason to use RR page locking, consider a reporting program that scans a table to produce a detail report, and then scans it again to produce a summarized managerial report. If the program is bound using CS, the results of the first report might not match the results of the second.

Suppose that you are reporting the estimated completion dates for project activities. The first report lists every project and the estimated completion date. The second, managerial report lists only the projects with a completion date greater than one year.

The first report indicates that two activities are scheduled for more than one year. After the first report but before the second, however, an update occurs. A manager realizes that she underestimated the resources required for a project. She invokes a transaction to change the estimated completion date of one of her project's activities from 8 months to 14 months. The second report is produced by the same program, but reports 3 activities.

If the program used an RR isolation level rather than CS, an UPDATE that occurs after the production of the first report but before the second would not have been allowed. The program would have maintained the locks it held from the generation of the first report and the updater would be locked out until the locks were released.

How about another example? Consider a program that is looking for pertinent information about employees in the information center and software support departments who make more than $30,000 in base salary. In the Db2 sample tables, department 'C01' is the information center and department 'E21' is software support.

The program opens a cursor based on the following SELECT statement:

    SELECT  EMPNO, FIRSTNME, LASTNAME,
            WORKDEPT, SALARY
    FROM    DSN8B10.EMP
    WHERE   WORKDEPT IN ('C01', 'E21')
    AND     SALARY > 30000;

The program then begins to FETCH employee rows. Assume further, as would probably be the case, that the statement uses the XEMP2 index on the WORKDEPT column. An update program that implements employee modifications is running concurrently. The program handles transfers by moving employees from one department to another, and implements raises by increasing the salary.

Assume that Sally Kwan, one of your employees, has just been transferred from the information center to software support. Assume further that another information center employee, Heather Nicholls, received a 10 percent raise. The update program running concurrently with the report program implements both of these modifications.

If the report program were bound with an isolation level of CS, the second program could move Sally from 'C01' to 'E21' after she was reported to be in department 'C01' but before the entire report was finished. Thus, she could be reported twice: once as an information center employee and again as a software support employee. Although this circumstance is rare, it can happen with programs that use cursor stability. If the program were bound instead with RR, this problem could not happen. The update program probably would not be allowed to run concurrently with a reporting program, however, because it would experience too many locking problems.

Now consider Heather's dilemma. The raise increases her salary 10 percent, from $28,420 to $31,262. Her salary now fits the parameters specified in the WHERE condition of the SQL statement. Will she be reported? It depends on whether the update occurs before or after the row has been retrieved by the index scan, which is clearly a tenuous situation. Once again, RR avoids this problem.

You might be wondering, "If CS has the potential to cause so many problems, why is it used so ubiquitously? Why not trade the performance and concurrency gain of CS for the integrity of RR?"

The answer is simple: the types of problems outlined are rare. The expense of using RR, however, can be substantial in terms of concurrency. So trading the efficiency and concurrency of CS for the integrity of RR usually is not a sound decision.

The third isolation level provided by Db2 is read stability (RS). Read stability is similar in functionality to the RR isolation level, but a little less restrictive. A retrieved row or page is locked until the end of the unit of work; no other program can modify the data until the unit of work is complete, but other processes can insert values that might be read by your application if it accesses the row a second time.

Consider using read stability over repeatable read only when your program can handle retrieving a different set of rows each time a cursor or singleton SELECT is issued. If using read stability, be sure your application is not dependent on having the same number of rows returned each time.
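
For example, a program could request read stability for a single cursor using the WITH clause, as in this sketch (the cursor name and predicate are illustrative):

    DECLARE CSR1 CURSOR FOR
        SELECT EMPNO, LASTNAME, SALARY
        FROM   DSN8B10.EMP
        WHERE  WORKDEPT = 'C01'
        WITH RS;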

Finally, we come to the last, and most maligned, isolation level: uncommitted read (UR). The UR isolation level provides read-through locks, also known as dirty read or read uncommitted. Using UR can help to overcome concurrency problems. When you're using an uncommitted read, an application program can read data that has been changed but is not yet committed.

UR can be a performance booster, too, because application programs bound using the UR isolation level will read data without taking locks. This way, the application program can read data contained in the table as it is being manipulated. Consider the following sequence of events:

1. To change a specific value, at 9:00 a.m. a transaction containing the following SQL is executed:

    UPDATE DSN8B10.EMP
       SET FIRSTNME = 'MICHELLE'
    WHERE  EMPNO = 10020;

The transaction is a long-running one and continues to execute without issuing a COMMIT.

2. At 9:01 a.m., a second transaction attempts to SELECT the data that was changed, but not committed.
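
Such a statement might look like this sketch (the column list is illustrative):

    SELECT FIRSTNME, LASTNAME
    FROM   DSN8B10.EMP
    WHERE  EMPNO = 10020
    WITH UR;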

If the UR isolation level were used for the second transaction, it would read the changed data even though it had yet to be committed. Obviously, if the program doesn't need to wait to take a lock and merely reads the data in whatever state it happens to be at that moment, the program will execute faster than if it had to wait for locks to be taken and resources to be freed before processing.

The implications of reading uncommitted data, however, must be carefully examined before being implemented. Several types of problems can occur. Using the previous example, if the long-running transaction rolled back the UPDATE to EMPNO 10020, the program using dirty reads may have picked up the wrong name ("MICHELLE") because it was never committed to the database.

Inaccurate data values are not the only problems that can be caused by using UR. A dirty read can cause duplicate rows to be returned where none exist. Alternatively, a dirty read can cause no rows to be returned when one (or more) actually exists. Additionally, an ORDER BY clause does not guarantee that rows will be returned in order if the UR isolation level is used. Obviously, these problems must be taken into consideration before using the UR isolation level.

Keep in mind, too, that the UR isolation level applies only to read-only operations: SELECT, SELECT INTO, and FETCH from a read-only result table. Any application plan or package bound with an isolation level of UR will use uncommitted read functionality for any read-only SQL. Operations contained in the same plan or package that are not read-only will use an isolation level of CS.

When is it appropriate to use UR isolation? The general rule of thumb is to avoid UR whenever the results must be 100 percent accurate. Following are examples of when this would be true:

  • Calculations that must balance are being performed on the selected data
  • Data is being retrieved from one source to insert to or update another
  • Production, mission-critical work is being performed that cannot contain or cause data integrity problems

In general, most production Db2 applications are not serious candidates for dirty reads. In a few specific situations, however, the dirty read capability will be of major benefit. Consider the following cases in which the UR isolation level could prove to be useful:

  • Access is required to a reference, code, or look-up table that is basically static in nature. Due to the non-volatile nature of the data, a dirty read would be no different than a normal read the majority of the time. In those cases when the code data is being modified, any application reading the data would incur minimal, if any, problems.
  • Statistical processing must be performed on a large amount of data. Your company, for example, might want to determine the average age of female employees within a certain pay range (see the sketch after this list). The impact of an uncommitted read on an average of multiple rows will be minimal because a single changed value will not greatly impact the result.
  • Dirty reads can prove invaluable in a data warehousing environment that uses Db2 as the DBMS. A data warehouse is a time-sensitive, subject-oriented store of business data that is used for online analytical processing. Other than periodic data propagation and/or replication, access to the data warehouse is read-only. Because the data is generally not changing, an uncommitted read is perfect in a read-only environment because it can cause little damage. More data warehouse projects are being implemented in corporations worldwide, and Db2 with dirty read capability is a very wise choice for data warehouse implementation.
  • In those rare cases when a table, or set of tables, is used by a single user only, UR can make a lot of sense. If only one individual can be modifying the data, the application programs can be coded such that all (or most) reads are done using the UR isolation level, and the data will still be accurate.
  • Finally, if the data being accessed already is inconsistent, little harm can be done using a dirty read to access the information.
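
To illustrate the statistical processing case, consider this sketch against the Db2 sample employee table (the pay range is hypothetical): it computes the average age of female employees and can tolerate uncommitted data because one in-flight change barely moves the average.

    SELECT AVG(YEAR(CURRENT DATE - BIRTHDATE))
    FROM   DSN8B10.EMP
    WHERE  SEX = 'F'
    AND    SALARY BETWEEN 20000 AND 40000
    WITH UR;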

Although the dirty read capability can provide relief to concurrency problems and deliver faster performance in specific situations, it also can cause data integrity problems and inaccurate results. Be sure to understand the implications of the UR isolation level and the problems it can cause before diving headlong into implementing it in your production applications.

Summary

It is important for Db2 DBAs and application programmers to know the four isolation levels and their impact on SQL. Using the appropriate isolation level is an effective way to control concurrency and locking for your Db2 applications.

Thursday, August 04, 2022

All About zIIPs

If you work with Db2 then you need to know about the IBM zIIP specialty processor. This is true whether you are a DBA or a developer... and, let's face it, if you work on a mainframe in any capacity, you'll want to know at least something about zIIPs.

With that in mind, today's blog post is sort of a meta-post. You see, I've been writing a series of blog posts for Cloudframe on the topic of zIIPs. So, the goal today is to provide you with information about those posts and links to their content.

The first blog post, Understanding Mainframe Specialty Processors: zIIPs and More, is introductory in nature. It defines the term "specialty processor" and describes the various types of specialty processors available from IBM. And then it offers a bit more information about the zIIP.

The second post in this series is titled Digging Into the zIIP: What Does zIIP-Eligible Mean? The title is kind of self-describing as to what this post offers!

Next (3 of 8), we have Types of Processing That Can Utilize zIIPs & Why You Want to Use zIIPs, which gets into TCBs, SRBs, and enclaves, and discusses how zIIPs can reduce your mainframe software costs.

The fourth post in this series digs into one of the major benefits of zIIPs, namely the ability to run Java workloads on the zIIP. In the article, titled Java and the zIIP: Five Major Benefits, we look at the primary advantages that can be accrued by running Java on zIIPs instead of a general purpose CP.

And because the vast majority of mainframe programs are written in COBOL, the fifth post provides Options for Converting from COBOL to Java.

Next, we take a look at Common zIIP Usage Mistakes and How to Identify Them.

Then we turn our attention to the latest IBM pricing model, Tailored Fit Pricing, and examine how it can impact cost savings when it comes to zIIP usage. This post is titled, appropriately enough, The Impact of Tailored Fit Pricing on zIIPs.

And we close out with Predictions on the Future of zIIP and Specialty Processors (of course, this is my personal viewpoint on the topic with the proviso that nobody knows the future for sure)!

So it is my hope that you will take a moment or two and click through the links to the articles above that look interesting to you. And if you have any comments, or suggestions for future articles, as always, please post them below!

Friday, June 17, 2022

My Speaking Schedule at IDUG Db2 Tech Conference in Boston (2022)

Just a quick note to let everybody who is coming to Boston in July for IDUG know what I will be speaking about and when my presentations are scheduled!

First of all, my regular IDUG session this year is titled "Things Your DBAs Hear... and how to stop making them crazy!" It is based on my decades of experience as a DBA and as a consultant, and it walks you through interactions between developers and DBAs in a light-hearted way. All of them are real-life examples of actual conversations I've been in (or observed). Attend this session to learn what frustrates DBAs and how improving your communication can improve your relationship with your DBAs... and therefore improve your development efforts! This is session E11, and it will be delivered on Wednesday, July 13 at 11:30 AM.

I will also be presenting at two different VSP sessions, one on Tuesday and another on Wednesday (this is the 10:15 AM time slot on both days).

On Tuesday, I will be presenting with InfoTel on the topic "To Protect and Preserve: Treat Your Data Properly or Pay the Consequences." This session will discuss vital data management issues such as data archiving and data protection (my portion), as well as some products that can help you manage your data better (the InfoTel portion).

On Wednesday, I will be presenting "How to Accelerate Db2 SQL Workloads... Without Db2!" for Log-On Software. This session takes a look at in-memory trends and issues, and shines a light on how QuickSelect can improve the performance of SQL queries.

I hope to see you at this year's IDUG North American conference the week of July 11, 2022. If you are there, come see one (or all) of my sessions... and be sure to say "Howdy!"