Tuesday, July 12, 2016

DB2 for z/OS Webinar: Who Did What to Which Data When?

Today's blog post is to let you know about an upcoming webinar (July 21, 2016) that I am conducting with CorreLog on the topic of database auditing for DB2 for z/OS. 




Unless you have been living under a rock these past few years, you will have heard at least something about the many data breaches, hacks, and security issues that have been prevalent in the news. These issues have resulted in numerous industry and governmental regulations that organizations must understand and comply with. Data professionals, in particular, need to be vigilant about the tools and techniques that are available for better protecting their company’s data and tracking those who access it. One of the most important of these techniques is database auditing.

Database auditing provides a facility for tracking the use of database resources and authority. When auditing is enabled, each audited database operation produces an audit trail that includes information such as what database object was impacted, who performed the operation, and when. The comprehensive audit trail of database operations can be maintained over time to allow DBAs and auditors, as well as any authorized personnel, to perform in-depth analysis of access and modification patterns against data in the DBMS.

With this basic information on database auditing in mind, you should be able to readily see how auditing can help your organization answer questions like:

  • “Who accessed or changed critical data?” and 
  • “When was the data actually changed?” and perhaps even
  • “What was the old content prior to the change?” 

Your ability to answer such questions can make or break a compliance audit. Of course, these are just the high-level details. To delve more deeply into the important issues involving database auditing requires time and research... or you can attend our upcoming webinar and get up to speed quickly on the essentials you need to know!

The webinar, entitled “Who Did What to Which Data When?”, will elaborate on database auditing details and issues such as data breach trends and how costly they can be, an overview of pertinent regulations and their impact, and the various types of database auditing methods with their pros and cons... all with a specific focus on mainframe DB2. There will also be an overview and demo of CorreLog's database auditing offering for DB2 for z/OS.

You can learn all of this and more on Thursday, July 21, 2016, at 11:00 am Eastern Daylight Time (EDT).

But you have to register to attend, so be sure to click here to register.

Friday, July 08, 2016

Good Old Partitioned Table Space Confusion

Partitioned table spaces have been with us in the world of DB2 forever, but even as we embark on moving to Universal table spaces, there still exists some confusion "out there" in terms of the behavior of partitioning... 

For example, consider this question:

If a table is in a partitioned table space, say four partitions, for example, then to process the table in batch can I run four instances of the batch program in parallel, one against each partition?

This type of question exhibits some confusion about how partitioned table spaces work. First of all, let's assume that we are talking about classic partitioned table spaces and/or Universal range-partitioned table spaces. 

Given that assumption, yes, you can run four instances of a batch program in parallel if you so desire. Of course, that is not necessary to get DB2 to use parallel tasks to read the data. The best approach is to BIND the program (package) specifying the DEGREE(ANY) parameter. Then DB2 will choose the degree of parallelism for the batch program. This is simpler than any alternative approach you might choose to code up because DB2 handles it all for you.

If you still wish to run four instances of the batch program you, of course, can. In that case you would probably want to BIND using DEGREE(1). In order for this to work the way I infer that you intend it to work, however, you may have to modify the program somewhat. I assume that you wish each job to process only against one of the four partitions. To accomplish this, you must provide some way for the program to identify and process only the data from one of the four partitions based on the partitioning key range specified in the partitioning index. 

For example, you might choose to input parameters to the batch run specifying the key range for that program to process. As long as the program adheres to that key range you should only process data from the one partition that holds that data.

Of course, that means that your programmers must be diligent in applying the key range. Perhaps you could set up 4 views that apply the key range for each partition and then use only those views in each program. But that means you will have multiple versions of the same program. You could also specify the range as an input variable and then use only one program. But that means you have to make sure that you are sending the proper range to the input variables.
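To make the key-range idea concrete, here is a minimal Python sketch of how a job scheduler or driver script might derive one key range per partition and hand it to each program instance. The limit keys, the column name cust_id, and both function names are hypothetical, and the sketch assumes each limit key is the highest value allowed in its partition (an inclusive upper bound).

```python
from typing import List, Optional, Tuple

KeyRange = Tuple[Optional[int], int]


def partition_ranges(limit_keys: List[int]) -> List[KeyRange]:
    """Turn ascending partition limit keys into (exclusive low, inclusive high) ranges.

    Assumes each limit key is the highest key value stored in its partition.
    The first partition has no lower bound, so its low value is None.
    """
    ranges: List[KeyRange] = []
    low: Optional[int] = None
    for high in limit_keys:
        if low is not None and high <= low:
            raise ValueError("limit keys must be strictly ascending")
        ranges.append((low, high))
        low = high
    return ranges


def predicate(column: str, key_range: KeyRange) -> str:
    """Render the range as predicate text, for display only (not for building SQL)."""
    low, high = key_range
    if low is None:
        return f"{column} <= {high}"
    return f"{column} > {low} AND {column} <= {high}"


# Hypothetical limit keys for a four-partition table space.
for number, key_range in enumerate(partition_ranges([2500, 5000, 7500, 9999]), start=1):
    print(f"job {number}: {predicate('cust_id', key_range)}")
```

Each batch instance would receive exactly one of these ranges as its input parameters, so the ranges stay defined in one place rather than in four views or four copies of the program.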

So just go with the first approach and let DB2 do it for you when it makes sense...


Tuesday, June 14, 2016

Four Important Buffer Pool Tuning Knobs in DB2 for z/OS

DB2 has five (well, four that are currently supported) primary adjustable thresholds that can be modified using the ALTER BUFFERPOOL command.

These thresholds are as follows:

The Sequential Steal Threshold, or VPSEQT, is the percentage of the buffer pool that can be occupied by sequentially accessed pages. For example, at the default value of 80, when this threshold is reached, 80% of the buffer pool represents pages for sequential processing. Of course, 80% is just the default; you can modify this value based on your processing needs to any value ranging from 0 to 100. When this threshold is reached, DB2 will steal a sequential page first before stealing a randomly accessed page. So, for data that is accessed mostly sequentially (for example, through scans and prefetching) consider increasing the value of this parameter, and for data that is accessed mostly randomly, consider decreasing the value of this parameter. A VPSEQT value of zero will prevent any sequential pages from lingering in the buffer pool and it will turn off sequential prefetch. A VPSEQT value of 100 allows the entire buffer pool to be monopolized by sequential pages.

The next tunable buffer pool threshold is the Parallel Sequential Threshold, or VPPSEQT. This threshold indicates the amount of the buffer pool that can be consumed by sequentially accessed data for parallel queries. When this threshold is reached, DB2 will cease to steal random pages to store sequential pages accessed by parallel queries. The default value for VPPSEQT is 50%, indicating its size as 50% of the sequential steal threshold (VPSEQT). For example, if the buffer pool is defined as 1000 pages and VPSEQT is set at 80%, a query using I/O parallelism can consume up to 400 sequential pages (that is, 1000 x 80% = 800 for the sequential steal threshold and 800 x 50% = 400 for the parallel sequential threshold).
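The arithmetic in that example is easy to check with a few lines of Python. This is only a sketch of the calculation described above, not anything DB2 itself exposes; the function name is made up for illustration.

```python
from typing import Tuple


def sequential_limits(pool_pages: int, vpseqt_pct: int = 80, vppseqt_pct: int = 50) -> Tuple[int, int]:
    """Return (sequential steal pages, parallel sequential pages).

    VPPSEQT is applied as a percentage of the VPSEQT page count,
    not as a percentage of the whole buffer pool.
    """
    for pct in (vpseqt_pct, vppseqt_pct):
        if not 0 <= pct <= 100:
            raise ValueError("thresholds are percentages between 0 and 100")
    sequential_pages = pool_pages * vpseqt_pct // 100
    parallel_pages = sequential_pages * vppseqt_pct // 100
    return sequential_pages, parallel_pages


print(sequential_limits(1000))  # (800, 400)
```

Note that lowering VPSEQT also lowers the parallel limit, because the second percentage is taken from the first result.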

The third and final sequential threshold is the Assisting Parallel Sequential Threshold (or VPXPSEQT). This threshold is obsolete as of DB2 11 because Sysplex Parallelism is no longer supported. When it was available, VPXPSEQT was used to indicate the portion of the buffer pool that might be used to assist with parallel operations initiated from another DB2 in the data sharing group.

The final two modifiable DB2 buffer pool thresholds are used to indicate when modified data is to be written from the buffer pool to disk. Log data is externalized when a COMMIT is taken, but writing of the actual data itself is controlled by the two deferred write thresholds (and DB2 system checkpoints).

First we have the Deferred Write Threshold (or DWQT). When DWQT is reached, DB2 starts scheduling write I/Os to externalize the data pages to disk. By default, the deferred write threshold is reached when 30% of the buffer pool is allocated to unavailable pages, whether updated or in use. The default is probably too high for most larger buffer pools.

DB2 also provides the Vertical Deferred Write Threshold (VDWQT), which is basically the same as DWQT but for a single page set. By default, VDWQT is reached when 5% of the buffer pool is allocated to one data set. When reached, DB2 will start scheduling write I/Os to externalize the data pages to disk. Once again, this default is most likely too high for most shops.

The VDWQT threshold can be specified as a percentage of the buffer pool, or as an absolute number of buffers. When you want to specify a relatively low threshold for VDWQT, using an absolute number of buffers provides better granularity and control.
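A quick calculation shows why the absolute form gives finer control on a large pool. The Python sketch below assumes percentages are specified as whole numbers and uses a hypothetical pool of 100,000 buffers; the function name is illustrative only.

```python
def threshold_buffers(pool_buffers: int, pct: int) -> int:
    """Number of buffers at which a percentage-based threshold is reached."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return pool_buffers * pct // 100


pool = 100_000  # hypothetical buffer pool size, in buffers

print("DWQT  30%:", threshold_buffers(pool, 30))  # 30000 buffers
print("VDWQT  5%:", threshold_buffers(pool, 5))   # 5000 buffers
print("VDWQT  1%:", threshold_buffers(pool, 1))   # 1000 buffers
```

With whole-number percentages, the smallest non-zero step on this pool is 1,000 buffers, so any lower trigger point for a single data set has to be expressed as an absolute number of buffers.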

In general, consider ratcheting the deferred write thresholds down to smaller percentages (from the defaults) for most of your buffer pools. Doing so enables “trickle” write from the DB2 buffer pools. This means that the data is written asynchronously to disk regularly over time in smaller amounts, instead of storing up a lot of modified data that has to be written all at once when the threshold percentage is reached. Of course, the needs of every shop will vary.

And yes, there are buffer pool tuning options other than these four parameters, such as changing the size of the pool, specifying min/max size for BP expansion/contraction, or altering the page-steal algorithm, but the thresholds above are the ones most often used to fine-tune buffer pool operations.

All of the above thresholds can be changed using the -ALTER BUFFERPOOL command. 

Wednesday, June 01, 2016

Carefully Code Your DB2 LIKE Predicates

The LIKE predicate is a powerful but potentially problem-causing operator that can be used to great effect in your SQL development efforts.

The LIKE predicate searches for strings that match a specified pattern. Here is the definition from the DB2 SQL Guide:

>>-match-expression--+-----+--LIKE--pattern-expression---------->
                     '-NOT-'                             

>--+---------------------------+-------------------------------><
   '-ESCAPE--escape-expression-'   

But what does this mean? Well, the match-expression is the string to be tested for conformity to the pattern specified in pattern-expression. You can use the underscore and the percent sign characters as wildcards in the pattern to indicate exactly one (underscore) or zero or more (percent sign) indeterminate characters.

The ESCAPE clause is used when you want to actually search for one of the wildcard characters.
But I do not really want to get into explaining the basics of how LIKE works here. If you really need more details on LIKE I refer you to the appropriate IBM Knowledge Center details.

The Semantics of LIKE
What I do want to do today is to give some advice on LIKE usage. First of all, be careful in terms of how you use the wildcard characters (underscore and percent sign).  The '_' character requires a matching character and the '%' character does not.  This can produce interesting results. For example, the following two queries are not equivalent, though at first glance you might think they were:

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter_%');

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter%');

Both will match to 'Peterson' and 'Peters', but the first will not match to 'Peter' because of the underscore. 
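If you want to experiment with the difference outside of DB2, the wildcard rules can be imitated in a few lines of Python by translating the pattern into a regular expression. This is a rough simulation for illustration only: it ignores the ESCAPE clause, fixed-length blank padding, and everything else about how DB2 actually evaluates the predicate, and the function names are made up.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern: '%' matches zero or more characters, '_' exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """True when the whole value conforms to the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


for name in ("Peterson", "Peters", "Peter"):
    print(f"{name:<9} 'Peter_%': {like(name, 'Peter_%')!s:<6} 'Peter%': {like(name, 'Peter%')}")
```

Only 'Peter' behaves differently between the two patterns, because the underscore demands one more character after 'Peter' and there is none.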

The LIKE predicate offers a great deal of flexibility and power to your SQL statements. Using LIKE you can quickly retrieve data based on patterns and wildcards. However, some uses of LIKE can be confusing to implement appropriately—especially when LIKE is used with host variables.

Let’s assume that you need to create an application that retrieves customers by last name, but the supplied value for last name can be either the entire name or just the first few bytes of that name. In that case, the following query can suffice:

SELECT custno, firstname, lastname
FROM   cust
WHERE  lastname LIKE :host_variable;

In order for this to work, when you enter the value for host_variable be sure to append percent signs (“%”) to the end of the value. The percent sign specifies that DB2 should accept as a match any number of characters (including 0). This must be done programmatically. So, if the value entered is SM, the host_variable should contain “SM%%%%%%%%” and if the value entered is SMITH, the host_variable should contain “SMITH%%%%%”. Append as many percent signs as required to fill up the entire length of the host variable. Failure to do so will result in DB2 searching for blank spaces. Think about it: if you assign “SMITH%” to a 10-byte fixed-length host variable, the value is padded with blanks, so DB2 will search for “SMITH%    ”, that is, SMITH at the beginning, four blanks at the end, and anything in the middle.

So, for “SMITH%%%%%”, SMITH will be returned, but so will SMITHLY (or any name beginning with SMITH). There is no way to magically determine if what was entered is a complete name or just a portion thereof. If this is not acceptable, then a single query will not likely be feasible. Instead, you would have to ask the user to enter whether a full name or just a portion is being entered.
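The padding step described above is simple to do in whatever language the application is written in. Here it is as a small Python sketch, assuming a 10-byte fixed-length host variable; the function name and the default length are illustrative.

```python
def pad_like_value(value: str, length: int = 10) -> str:
    """Fill the rest of a fixed-length search value with percent signs."""
    if len(value) > length:
        raise ValueError(f"value is longer than {length} characters")
    return value.ljust(length, "%")


print(pad_like_value("SM"))     # SM%%%%%%%%
print(pad_like_value("SMITH"))  # SMITH%%%%%

# What happens without the fix: a single '%' followed by blank padding.
print(repr("SMITH%".ljust(10)))  # 'SMITH%    '
```

One thing to watch: a value that already fills the whole host variable gets no percent sign at all, so it can only match that exact string.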

What About Performance?

Notwithstanding the semantic details, there are performance considerations to understand when using LIKE, too. It is a good practice to avoid using the LIKE predicate when the percent sign (%) or the underscore (_) appears at the beginning of the comparison string, because a leading wildcard prevents DB2 from using a matching index.

The LIKE predicate can produce efficient results, however, when you use the wildcard characters at the end or in the middle of the comparison string, for example:

Inefficient        Can be efficient with index
LIKE '%NAME'       LIKE 'NAME%'
LIKE '_NAME'       LIKE 'NA_ME'


DB2 will not use direct index lookup when a wildcard character is supplied as the first character of a LIKE predicate. At bind time, DB2 cannot determine when a host variable contains a wildcard character as the first character of a LIKE predicate. The optimizer therefore does not assume that an index cannot be used; rather, it indicates that an index might be used. At runtime, DB2 determines whether the index will be used based on the value supplied to the host variable. When a wildcard character is specified for the first character of a LIKE predicate, DB2 uses a non-matching index scan or a table space scan to satisfy the search.

Summary

The LIKE operator brings powerful search capabilities to your DB2 SQL queries. Be sure to understand its capabilities and to use it appropriately in your development efforts.

Tuesday, May 17, 2016

Come See Me at IDUG!

Just a quick blog post today to let all of my readers know my schedule for the IDUG DB2 Technical Conference being held next week in Austin, TX. With this information you can track me down in multiple places if you want to ask me a question, catch up, or just say “Hi!”

First of all, you can find me on Tuesday at the Expo Hall in the CorreLog booth (#300) from 12:00 to 1:00 and from 5:00 to 6:00. CorreLog will also be raffling off copies of my book, DB2 Developer’s Guide, to 4 lucky winners… so be sure to stop by. And chat with CorreLog about their SIEM and auditing solutions for DB2.

You should also make sure to attend my IDUG session titled “It’s Not Your Daddy’s DB2!” on Wednesday at 3:30 PM (session B13). The general idea of the session is that DB2 is changing and you should be changing with it. Over the course of the past few releases of DB2 for z/OS, IBM has added many features and capabilities that are transforming the platform. I’ll take a look at the big changes that have been introduced to DB2 including new SQL, universal table spaces, improved security, and more. The session also offers guidance on how to continue improving your DB2 environment to keep up with industry, technology and DBA trends circa 2016.

But that’s not all. On both Tuesday and Wednesday I will be co-presenting with SEG at their VSP sessions. On Tuesday at 1:00 PM (Session V02) I will be co-presenting with Ulf Heinrich on DB2 audit and compliance issues and how they impact your DB2 data and database assets. In particular, we will focus on database auditing, talking about what is needed, the various methods of auditing, and a new offering from SEG that you can use to run audit analytics against a long-term repository. And on Wednesday at 10:30 AM (Session V08) I will co-present with Roy Boxwell about DB2 release incompatibilities and how they impact your DB2 applications. And I hear that SEG will have some of my books to raffle off, too!

And there’s still more! On Tuesday night (6pm to 9pm) I will be speaking at a DataKinetics event with Colin Oakhill on the topic of SQL quality assurance. Be sure to stop by the DataKinetics booth (#509) to get an invitation to the Tuesday night event where you can hear me and others speak about SQL quality and performance… as well as enjoy a tasty snack or beverage.


So if you’ll be at IDUG and you want to find me, there is really no reason why you shouldn’t be able to track me down at one or more of these places… 

See you in Austin!