The Db2 Portal Blog: 2016

Thursday, December 22, 2016

Happy Holidays 2016

It is that time of year again. When we all take some time to pause and spend time with our family, friends and loved ones. We take time off of work to celebrate and reflect on the past year... and to welcome the new year.

No matter what holiday tradition you celebrate, I wish you an enjoyable holiday season. Happy holidays, that is!

And we'll meet back here again in January 2017 to talk more about DB2 stuff!

Happy Holidays!

Thursday, November 24, 2016

Happy Thanksgiving

Today, in the United States of America, we celebrate Thanksgiving by gathering together with our loved ones and giving thanks for all we have. This includes celebrations with food, traditionally a big turkey dinner with mashed potatoes and gravy, as we watch a parade and football games...

With this in mind, I give thanks for all of the reader's of my blog, wherever you may be, and wish you a very happy day!

Tuesday, July 12, 2016

DB2 for z/OS Webinar: Who Did What to Which Data When?

Today's blog post is to let you know about an upcoming webinar (July 21, 2016) that I am conducting with CorreLog on the topic of database auditing for DB2 for z/OS.

Unless you have been living under a rock these past few years you will have heard at least something about the many data breaches, hacks, and security issues that have been prevalent in the news. These issues have resulted in numerous industry and governmental regulations that organizations must understand and comply with. Data professionals, in particular, need to be vigilant in the tools and techniques that are available for better protecting their company’s data, and tracking those that access it. One of the most important of these techniques is database auditing.

Database auditing provides a facility for tracking the use of database resources and authority. When auditing is enabled, each audited database operation produces an audit trail of information including information such as what database object was impacted, who performed the operation and when. The comprehensive audit trail of
database operations produced can be maintained over time to allow DBAs and auditors, as well as any authorized personnel, to perform in-depth analysis of access and modification patterns against data in the DBMS.

With this basic information on database auditing in mind, you should be able to readily see how auditing can help your organization answer questions like:

“Who accessed or changed critical data?” and
“When was the data actually changed?” and perhaps even
“What was the old content prior to the change?”

Your ability to answer such questions can make or break a compliance audit. Of course, these are just the high-level details. To delve more deeply into the important issues involving database auditing requires time and research... or you can attend our upcoming webinar and get up to speed quickly on the essentials you need to know!

The webinar, entitled Who Did What to Which Data When? will elaborate on database auditing details and issues such as the data breach trends and how costly they can be, an overview of pertinent regulations and their impact, and the various types of database auditing methods with their pros and cons... all with a specific focus on mainframe DB2. And there will also be an overview and demo of CorreLog's database auditing offering for DB2 for z/OS.

You can learn all of this and more on Thursday, July 21, 2016, at 11:00 am Eastern Daylight Time (EDT).

But you have to register to attend, so be sure to click here to register.

Friday, July 08, 2016

Good Old Partitioned Table Space Confusion

Partitioned table spaces have been with us in the world of DB2 forever, but even as we embark on moving to Universal table spaces, there still exists some confusion "out there" in terms of the behavior of partitioning...

For example, consider this question:

If a table is in a partitioned table space, say four partitions, for example, then to process the table in batch can I run four instances of the batch program in parallel, one against each partition?

This type of question exhibits some confusion about how partitioned table spaces work. First of all, let's assume that we are talking about classic partitioned table spaces and/or Universal range-partitioned table spaces.

Given that assumption, yes, you can run four instances of a batch program in parallel if you so desire. Of course, that is not necessary to get DB2 use parallel tasks to read the data. The best approach is to BIND the program (package) specifying the DEGREE(ANY) parameter. Then DB2 will choose the degree of parallelism for the batch program. This is simpler than any alternative approach you might choose to code up because DB2 handles it all for you.

If you still wish to run four instances of the batch program you, of course, can. In that case you would probably want to BIND using DEGREE(1). In order for this to work the way I infer that you intend it to work, however, you may have to modify the program somewhat. I assume that you wish each job to process only against one of the four partitions. To accomplish this, you must provide some way for the program to identify and process only the data from one of the four partitions based on the partitioning key range specified in the partitioning index.

For example, you might choose to input parameters to the batch run specifying the key range for that program to process. As long as the program adheres to that key range you should only process data from the one partition that holds that data.

Of course, that means that your programmers must be diligent in applying the key range. Perhaps you could set up 4 views that apply the key range for each partition and then use only those views in each program. But that means you will have multiple versions of the same program. You could also specify the range as an input variable and then use only one program. But that means you have to make sure that you are sending the proper range to the input variables.

So just go with the first approach and let DB2 do it for you when it makes sense...

Tuesday, June 14, 2016

Four Important Buffer Pool Tuning Knobs in DB2 for z/OS

DB2 has five (well, four current) primary adjustable thresholds that can be modified using the ALTER BUFFERPOOL command.

These thresholds are as follows:

The Sequential Steal Threshold, or VPSEQT, is the percentage of the buffer pool that can be occupied by sequentially accessed pages. For example, at the default value of 80, when this threshold is reached, 80% of the buffer pool represents pages for sequential processing. Of course, 80% is just the default; you can modify this value based on your processing needs to any value ranging from 0 to 100. When this threshold is reached, DB2 will steal a sequential page first before stealing a page from a randomly accessed page. So, for data that is accessed mostly sequentially (for example, through scans and prefetching) consider increasing the value of this parameter, and for data that is accessed most randomly, consider decreasing the value of this parameter. A VPSEQT value of zero will prevent any sequential pages from lingering in the buffer pool and it will turn off sequential prefetch. A VPSEQT value of 100 allows the entire buffer pool to be monopolized by sequential pages.

The next tunable buffer pool threshold is the Parallel Sequential Threshold, or VPPSEQT. This threshold indicates the amount of the buffer pool that can be consumed by sequentially accessed data for parallel queries. When this threshold is reached, DB2 will cease to steal random pages to store sequential pages accessed by parallel queries. The default value for VPPSEQT is 50%, indicating its size as 50% of the sequential steal threshold (VPSEQT). For example, if the buffer pool is defined as 1000 pages and VPSEQT is set at 80%, a query using I/O parallelism can consume up to 400 sequential pages (that is, 1000 x 80% = 800 for the sequential steal threshold and 800 x 50% = 400 for the parallel sequential threshold).

The third, and final sequential threshold is the Assisting Parallel Sequential Threshold (or VPXPSEQT). This threshold is no longer supported as of DB2 11 because Sysplex Parallelism is no longer supported. When it was available, VPXPSEQT was used to indicate the portion of the buffer pool that might be used to assist with parallel operations initiated from another DB2 in the data sharing group.

The final two modifiable DB2 buffer pool thresholds are used to indicate when modified data is to be written from the buffer pool to disk. Log data is externalized when a COMMIT is taken, but writing of the actual data itself is controlled by the two deferred write thresholds (and DB2 system checkpoints).

First we have the Deferred Write Threshold (or DWQT). When DWQT is reached, DB2 starts scheduling write I/Os to externalize the data pages to disk. By default, the deferred write threshold is reached when 30% of the buffer pool is allocated to unavailable pages, whether updated or in use. The default is probably too high for most larger buffer pools.

DB2 also provides the Vertical Deferred Write Threshold (VDWQT), which is basically the same as DWQT but for a single page set. By default, VDWQT is reached when 5% of the buffer pool is allocated to one data set. When reached, DB2 will start scheduling write I/Os to externalize the data pages to disk. Once again, this default is most likely too high for most shops.

The VDWQT threshold can be specified as a percentage of the buffer pool, or as an absolute number of buffers. When you want to specify a relatively low threshold for VDWQT using an absolute number of buffers provides better granularity and control.

In general, consider ratcheting the deferred write thresholds down to smaller percentages (from the defaults) for most of your buffer pools. Doing so enables “trickle” write from the DB2 buffer pools. This means that the data is written asynchronously to disk regularly over time in smaller amounts, instead of storing up a lot of modified data that has to be written all at once when the threshold percentage is reached. Of course, the needs of every shop will vary.

And yes, there are other buffer pool tuning options other than these 5 4 parameters, such as changing the size of the pool, specifying min/max size for BP expansion/contraction, or altering the page most often used to fine tune buffer pool operations.

All of the above thresholds can be changed using the -ALTER BUFFERPOOL command.

Wednesday, June 01, 2016

Carefully Code Your DB2 LIKE Predicates

The LIKE predicate is a powerful but potentially problem-causing operator that can be used to great effect in your SQL development efforts.

The LIKE predicate searches for strings that match a specified pattern. Here is the definition from the DB2 SQL Guide:


>>-match-expression--+-----+--LIKE--pattern-expression---------->
                     '-NOT-'                             

>--+---------------------------+-------------------------------><
   '-ESCAPE--escape-expression-'

But what does this mean? Well, the match-expression is the string to be tested for conformity to the pattern specified in pattern-expression. You can use the underscore and the percent sign characters as wildcards in the pattern to indicate 1 (underscore) or many (percent sign) indeterminate characters.

The ESCAPE clause is used when you want to actually search for one of the wildcard characters.
But I do not really want to get into explaining the basics of how LIKE works here. If you really need more details on LIKE I refer you to the appropriate IBM Knowledge Center details.

The Semantics of LIKE
What I do want to do today is to give some advice on LIKE usage. First of all, be careful in terms of how you use the wildcard characters (underscore and percent sign). The '_' character requires a matching character and the '%' character does not. This can produce interesting results. For example, the following two queries are not equivalent, though at first glance you might think they were:

SELECT *

FROM cust

WHERE (lastname LIKE 'Peter_%');

SELECT *

FROM cust

WHERE (lastname LIKE 'Peter%');

Both will match to 'Peterson' and 'Peters', but the first will not match to 'Peter' because of the underscore.

The LIKE predicate offers a great deal of flexibility and power to your SQL statements. Using LIKE you can quickly retrieve data based on patterns and wildcards. However, some uses of LIKE can be confusing to implement appropriately—especially when LIKE is used with host variables.

Let’s assume that you need to create an application that retrieves customers by last name, but the supplied value for last name can be either the entire name or just the first few bytes of that name. In that case, the following query can suffice:

SELECT custno, firstname, lastname

FROM cust

WHERE lastname LIKE :host_variable;

In order for this to work, when you enter the value for host_variable be sure to append percent signs (“%”) to the end of the value. The percent sign specifies that DB2 should accept as a match any number of characters (including 0). This must be done programmatically. So, if the value entered is SM, the host_variable should contain “SM%%%%%%%%” and if the value entered is SMITH, the host_variable should contain “SMITH%%%%%”. Append as many percent signs as required to fill up the entire length of the host variable. Failure to do so will result in DB2 searching for blank spaces. Think about it—if you assign “SMITH%” to a 10-byte host variable, that host variable will think it should search for “SMITH%”, that is SMITH at the beginning, four blanks at the end, and anything in the middle.

So, for “SMITH%%%%%”, SMITH will be returned, but so will SMITHLY (or any name beginning with SMITH). There is no way to magically determine if what was entered is a complete name or just a portion thereof. If this is not acceptable, then a single query will not likely be feasible. Instead, you would have to ask the user to enter whether a full name or just a portion is being entered.

What About Performance?

Notwithstanding the semantic details, there are performance considerations to understand when using LIKE, too. It is a good practice to avoid using the LIKE predicate when the percentage sign (%) or the underscore (_) appears at the beginning of the comparison string because they prevent DB2 from using a matching index.

The LIKE predicate can produce efficient results, however, when you use the wildcard characters at the end or in the middle of the comparison string, for example:

Inefficient	Can be efficient with index
LIKE '%NAME'	LIKE 'NAME%'
LIKE '_NAME'	LIKE 'NA_ME'

DB2 will not use direct index lookup when a wildcard character is supplied as the first character of a LIKE predicate. At bind time, DB2 cannot determine when a host variable contains a wildcard character as the first character of a LIKE predicate. The optimizer therefore does not assume that an index cannot be used; rather, it indicates that an index might be used. At runtime, DB2 determines whether the index will be used based on the value supplied to the host variable. When a wildcard character is specified for the first character of a LIKE predicate, DB2 uses a non-matching index scan or a table space scan to satisfy the search.

Summary

The LIKE operator brings powerful search capabilities to your DB2 SQL queries. Be sure to understand its capabilities and to use it appropriately in your development efforts.

Tuesday, May 17, 2016

Come See Me at IDUG!

Just a quick blog post today to let all of my readers know my schedule for the IDUG DB2 Technical Conference being held next week in Austin, TX. With this information you can track me down in multiple places if you want to ask me a question, catch up, or just say “Hi!”

First of all, you can find me on Tuesday at the Expo Hall in the CorreLog booth (#300) from 12:00 to 1:00 and from 5:00 to 6:00. CorreLog will also be raffling off copies of my book, DB2 Developer’s Guide, to 4 lucky winners… so be sure to stop by. And chat with CorreLog about their SIEM and auditing solutions for DB2.

You should also make sure to attend my IDUG session titled “It’s Not Your Daddy’s DB2!” on Wednesday at 3:30 PM (session B13). The general idea of the session is that DB2 is changing and you should be changing with it. Over the course of the past few releases of DB2 for z/OS, IBM has added many features and capabilities that are transforming the platform. I’ll take a look at the big changes that have been introduced to DB2 including new SQL, universal table spaces, improved security, and more. The session also offers guidance on how to continue improving your DB2 environment to keep up with industry, technology and DBA trends circa 2016.

But that’s not all. On both Tuesday and Wednesday I will be co-presenting with SEG at their VSP sessions. On Tuesday at 1:00 PM (Session V02) I will be co-presenting with Ulf Heinrich on DB2 audit and compliance issues and how they impact your DB2 data and database assets. In particular, we will focus on database auditing, talking about what is needed, the various methods of auditing, and a new offering from SEG that you can use to run audit analytics against a long-term repository. And on Wednesday at 10:30 AM (Session V08) I will co-present with Roy Boxwell about DB2 release incompatibilities and how they impact your DB2 applications. And I hear that SEG will have some of my books to raffle off, too!

And there’s still more! On Tuesday night (6pm to 9pm) I will be speaking at a DataKinetics event with Colin Oakhill on the topic of SQL quality assurance. Be sure to stop by the DataKinetics booth (#509) to get an invitation to the Tuesday night event where you can hear me and others speak about SQL quality and performance… as well as enjoy a tasty snack or beverage.

So if you’ll be at IDUG and you want to find me, there is really no reason why you shouldn’t be able to track me down at one or more of these places…

See you in Austin!

Thursday, May 05, 2016

I'm Co-Presenting with a Couple of DB2 Experts at IDUG

I have the absolute pleasure of being able to co-present on a couple of great DB2 topics with two very knowledgeable and great speakers at this year's North American IDUG conference in Austin, TX.

The first session is on Tuesday, May 24, 2016 at 1:00 PM (Session V02). In this session I will be co-presenting with Ulf Heinrich, the Director of Solutions Delivery at Software Engineering GmbH. Many of you know Ulf from his many technical DB2 presentation at past IDUG events and regional DB2 user groups.

We will be talking about audit and compliance issues and how they impact your DB2 data and database assets. In particular, we will focus on database auditing, talking about what is needed, the various methods of auditing, and a new offering from SEG that you can use to run audit analytics against a long-term repository. If you need to be able to pinpoint who executed a query, when and from where, across your entire DB2 environment (and who doesn't?) then don't miss this presentation.

But that is not all... I also get the opportunity to co-present with Roy Boxwell. Many of you know Roy, too, from his many IDUG presentations and his valuable contributions to the DB2-L list server. Roy is a Senior Software Architect for DB2 product development at SEGUS Inc. He has been working for more than 30 years in the mainframe world, with 26 of these years strictly focused on the development of DB2 solutions in the realm of installation, migration, performance monitoring, and tuning.

Roy and I will be talking about DB2 release incompatibilities and how they impact your DB2 applications in our presentation on Wednesday, May 25, 2016 entitled Don't Let ICIs Put Your DB2 Application in the ICU!

This session will explain what an incompatible change is, offer assistance in how to identigy them and explain their potential impact on your applications. We'll also offer guidance on how to tackle the whole experience and learn how SEG’s Workload Expert technology makes it easier to manage and control these incompatible changes.

So, if you are going to IDUG this May in Austin, be sure to jot down the dates and times of these sessions so you don't miss them... and we'll see you there!

More details on the sessions can be found here on the SEG web site...

Wednesday, April 20, 2016

IDUG is Coming to the Lone Star State

The weather is improving, Spring has sprung and that means it is, once again, time to start planning your trip to this year's IDUG North American DB2 Tech Conference. This year it is being held in Austin, TX at the Renaissance Austin Hotel. This is a very nice area and the hotel is great – I’ve stayed there numerous times in the past.

If you’ve ever attended an IDUG conference before then you know all about the fantastic educational and entertainment opportunities that IDUG offers. Of course there will be a ton of informative technical sessions on all of the latest and greatest DB2 technologies and features. The presentations vary in length from an hour to a half day to complete full day training sessions.

And if you are looking to learn something completely new, but data-related, this year there is even a special opportunity to learn about Spark technology on Wednesday.

But IDUG is not just a bunch of presentations… it is so much more. There are countless networking opportunities to meet and talk with your peers, IBM developers, and industry consultants. And let’s not forget about the vendor exhibit hall where you can talk to the ISVs and learn all the about software and tools that can help you get the most out of DB2... not to mention picking up a few tchotchkes and mementos.

Additionally, there are Special Interest Groups (where you can discuss the latest industry trends and topics with other interested technicians), Hands-On Labs (delivering working training led by IBM), and the opportunity to take complementary IBM certification exams.

I’ll be there, too. So if you're going to IDUG be sure to find me and say “Hello.” I'll be delivering talking about the changing state of DB2 for z/OS in my session: It’s Not Your Daddy’s DB2! And I’ll also be talking at the VSPs this year… 2 of them. I'll be co-presenting with SEG on DB2 auditing and also on managing DB2 incompatibilities. So be sure to attend these informative sessions!

I am also planning to present at a special evening meeting being held by DKL. I’ll talk about SQL quality assurance and performance… but there will also be libations and snacks, too. So hunt us down on Tuesday night… stop by the DKL booth for an invitation.

The IDUG DB2 Tech Conference is the place to be to learn all about DB2 from IBMers, gold consultants, IBM champions, end users, and more. With all of this great stuff going on in Austin this May, why wouldn't you want to be there!?!?

Monday, April 18, 2016

Don’t let ICIs put your DB2 application in the ICU!

Those of you who have been paying close attention have probably already noticed, or indeed encountered, incompatibility issues with how DB2 for z/OS behaves from version to version. Oh, sure, we all know that there have been deprecated features, and we deal with those over the long periods of time it takes for the features to be completely removed from DB2. Sure, we don't like it, but it is not really a huge problem to manage.

But there are other DB2 incompatibilities that are more troublesome to manage. You see, over the course of the past several releases of DB2 for z/OS, IBM has been making changes that can modify the behavior of your DB2 application programs. Code that worked one way works differently after migration. The number of incompatible changes being introduced by IBM started off slowly in DB2 9 but has grown to a significant number today. So much so that there IBM has introduced ICIs, or incompatible change indicators that can be traced using IFCIDs.

Additionally, there are ways to repress these changes from occurring, but it is not quite as simple as that. What ever is that simple, right?

Well, if you are at all interested in learning more about DB2 incompatibilities and how to manage them, join me and Roy Boxwell for our webinar, Don’t let ICIs put your DB2 application in the ICU!

We will discuss the issues, how they impact your applications, and offer up some guidance on how to get your arms around the problem. And Roy will show us how SEG's Workload Expert technology can be used to make it easier to track these issues, as well as to manage and control their impact on your DB2 applications.

So register today and join us in this informative webinar scheduled for April 28 2016 at 1:00 PM Eastern time.

This session will discuss the incompatible changes, their potential impact on your applications, as well as provide guidance on how to tackle the whole experience. And you’ll also see how SEGUS’s Workload Expert technology makes it easier to manage and control these incompatible changes. With all of this information at your disposal, you can make sure that your DB2 applications do not wind up in the ICU (intensive care unit)!

Monday, April 11, 2016

The Most Misunderstood Features of DB2 - Part 8: Do I have to pick just one?

There are so many misunderstood and misused aspects of DB2 for z/OS that it would be hard to choose only one. This series of blog posts, which concludes today with this one, has covered a wide variety of topics over 7 installments. So it would seem that there is no lack of misunderstanding out there!

So, if I had to pick just one thing as the most misunderstood thing, I think I’m going to have to cheat and say that the biggest problem in DB2-land is an overall pervasive lack of knowledge of what DB2 can do. Who among us can really say that we know everything there is to know about every last feature and function that our favorite DBMS supports?

Now don’t get me wrong? I am not trying to indict the technicians using DB2. Richard Saul Wurman wrote a book called Information Anxiety that details what I think many of us are feeling, which is a general uneasiness at being overwhelmed with a large amount of facts, figures and data that can be hard to master.

This situation has come about because DBMSs are becoming large and unwieldy as they adopt functionality that previously was performed outside of the database environment. This is certainly that case with DB2 for z/OS which has adding a lot of new SQL functionality, new types of table spaces, temporal support, IDAA, and on and on and on in just the past few releases. There is simply too much functionality in DB2 for anyone to be an expert on all of it (well, with this possible exception of some of those IBM developers).

If you’ve missed any of the previous posts in this series, here are links to each of the 7 prior installments in this series:

And now, with all of these misunderstandings out of the way, maybe we can focus on more understanding in upcoming blog posts!

Friday, April 08, 2016

Happy Birthday Mainframe!

Fifty Two Years Ago This Week

On April 7, 1964, IBM announced "a new generation of electronic computing equipment" called the IBM System/360... or as it, and subsequent generations of the machine have become known, the mainframe!

If you'd like to take a walk down memory lane you can read the original press release on the IBM web site.

The System/360, unquestionably, was one of the most significant products in the history of computing. The general, sound concepts of the System/360 are still the at the foundation of modern mainframe. Sure, a lot has changed, but those changes were made to a solid, fundamentally sound base.

Code that ran on the System/360 can still be run on the latest and greatest modern IBM mainframe, the z13. What other platform can say that? I mean, when my desktop was recently force upgraded from Windows 7 to WIndows 10 some of my software stopped running!

So let's take a moment and salute the mainframe on its 52nd birthday... and wish it many more years of productive use.

Happy birthday, mainframe!

Monday, April 04, 2016

The Most Misunderstood Features of DB2 – Part 7: It Depends!

"It depends" is probably the most famous phrase out there when it comes to DB2 performance. Some call it the cardinal rule. You can answer just about any question that anybody asks at any time with it. Try it out. It depends! Always works... that is, it always works if you are trying to avoid answering the question!

Most DBAs and SQL experts resist giving a straight or simple answer to a general question because there is no simple and standard implementation that exists. Every situation is different, and every organization is unique in some way. So answering "it depends" to most questions can make a lot of sense. But "it depends" should never be the end of the answser!

Don’t be discouraged when you ask the local expert which statement will perform better, and the answer is “It depends.” The expert is just doing his or her job. The secret to optimizing DB2 performance is being able to answer the follow-up question to “It depends”—and that is “What does it depend on?”

The key to effective SQL performance tuning is to document each SQL change along with the reason for the change. Follow up by monitoring the effectiveness of every change to your SQL statements before moving them into a production environment. Over time, trends will emerge that will help to clarify which types of SQL formulations perform best.

So the misunderstanding in this case is thinking that "it depends" is a complete answer to any type of question. It isn't... it is just the beginning of most answers. But it takes time, experience, and study to be able to answer what it depends upon. And that is why you might not get that answer unless you press for it... and attempt to find the answer yourself without just leaning on others for the answer all the time.

There is a corollary to the “It depends” rule that also is important. Hard and fast rules that rigidly enforce or forbid usage are not a good idea. In other words, this corollary is simply stated: “Almost never say always or never.” Notice that even the wording of this corollary embraces flexibility.

So be flexible, but embrace answers that help... because "it depends" helps nobody unless you tell them what it depends upon!

Thursday, March 17, 2016

Digital Transformation and DB2 for z/OS: It’s Not Your Daddy’s DB2!

If you are a DBA who has been using DB2 for z/OS for a while you should have noticed that we are not doing things the same way we used to. DB2 is changing and we should be changing with it. If you are still using DB2 the same way you did 10 or 20 years ago, then you are definitely not adhering to industry best practices!

The same trends that are driving the digital explosion are also changing DB2 and the traditional role of the DBA. We are storing more data and different types of data for longer periods of time and in different ways than we have in the past.

And DB2 for z/OS keeps changing to adopt and embrace modern data management requirements and techniques. Whether it is modernizing storage with universal table spaces, embracing unstructured data in LOBs, or expanding the SQL language with new and more functionality, today’s DB2 looks a lot different than it did yesterday. Indeed, it is different – it is not your daddy’s DB2.

I’ve been writing a series of blog posts for BMC about this topic under the title It’s Not Your Daddy’s DB2! You can find the first three blog posts in this series here: 1 2 3

But you can also attend a live webinar that BMC is sponsoring where I will talk about these issues. You can learn about:

· Trends that influence the size and complexity of your DB2 environment and how this impacts data management

· How to adapt to new DB2 data types and structures

· Best practices and technologies for managing DB2 in the digital age

· And BMC will share its next generation technology for managing the new world of DB2 for z/OS.

Learn how digital transformation will change the way your DBAs manage critical business needs. Attend this webinar on March 30, 2016, at 12:00 pm CT.

Tuesday, March 15, 2016

The Most Misunderstood Features of DB2 – Part 6: Not Indexing

Welcome to our on-going series of misunderstood issues in the world of DB2 for z/OS. Today’s topic is indexing, or to be more precise: not indexing. What do I mean by that?

Well, I’ve heard folks say that you should always create at least one index on every DB2 table. And while I can empathize with that general notion, I can’t agree completely because of that word “always,” which almost always makes a sentence wrong.

Sure, there are several good arguments for creating indexes on DB2 tables. Perhaps the most important one is clustering. DB2 uses an index to control how data is clustered on disk. Only one index can be specified as the clustering index (because, of course, the data on disk can only be stored in one order, right!). If you do not specify a clustering index then DB2 will use the earliest created (oldest) index to cluster the data. So it is usually a good idea to create an index to guide clustering.

Another popular reason to create an index on a DB2 table is to enforce uniqueness for a UNIQUE constraint or PRIMARY KEY. The only way to enforce uniqueness on a DB2 column (or set of columns) is by using a unique index.

Of course, there are a lot of other good reasons to create indexes, most of them to improve query performance. But I contend that there are situations when it makes sense not to create any indexes at all.

So when does it make more sense not to build an index for a DB2 table?

Let's start by saying that most of the time you will want to build at least one - and probably multiple - indexes on each table that you create. Indexes are crucial for optimizing performance of SQL access. Without an index, queries must scan every row of the table to come up with a result. And that can be very slow.

Having said that, here are a few situations some times where it can make sense to have no indexes defined on a table:

When all (or most) accesses retrieve every row of the table. Because every row will be retrieved every time you want to use the table, an index (if used) would just add extra I/O and would diminish, not enhance performance. Though not extremely common, you may indeed come across such tables in your organization.

For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 20 to 30 or so pages) might not need an index because simply reading all of the pages is very efficient already.

When performance doesn't matter and the table is only accessed very infrequently. But, hey, when do you ever have that type of requirement in the real world?

Other than for these circumstances, you will most likely want to build one or more indexes on each table, not only to optimize performance, but also to ensure uniqueness, to support referential integrity, and to drive data clustering.

Of course, indexes do not come without cost. Indexes take up disk space and adding a lot of indexes will consume disk space. An additional consideration is their impact in data modification. Although indexes speed up queries they degrade inserts and deletes, as well as any modification to indexed columns.

What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!

Friday, March 04, 2016

The Most Misunderstood Features of DB2 – Part 5: Choosing the Clustering Key

Hello everybody, and welcome to part 5 in my on-going series where we take a look at The Most Misunderstood Features of DB2. You can find the earlier installments in this series here (Part 1 Part 2 Part 3 Part 4).

Today’s topic, on choosing an appropriate clustering index, might be a bit confusing to some of you. So let’s start at the beginning and describe what clustering is.

What is Clustering?

In DB2 for z/OS, you use an index to indicate how DB2 should try to store table space data physically on disk. This is called clustering. A DB2 index is a clustering index if the CLUSTER keyword is specified when the index is created. Clustering causes inserted rows to be stored contiguously in sequence whenever possible. Additionally, when the table space is reorganized the data will be sequenced according to the clustering index. Since there can only be one physical sequence for data on disk, there can only be one clustering index per table.

For tables that do not specify APPEND YES and do not have hashing defined, if you do not specify a clustering index, DB2 chooses to cluster the data using the oldest existing index. Therefore, it is wise to explicitly specify a clustering index instead of letting DB2 decide because you can almost always choose better than the (basically random) choice DB2 makes.

Clustering of data is important because when the data is sequentially accessed, if the data exists on the same page (or range of pages) then I/O is minimized because fewer pages are read than if the data were spread throughout the entire table space on “random” pages. And fewer I/Os means improved performance.

So then, how should a DBA go about deciding how to cluster the data? Well, the best answer is to analyze the various ways in which the data is to be accessed. This means reviewing the actual SQL along with frequency and importance of the execution patterns. When the most important/frequently executed sequential accesses are identified, then you can choose to cluster the data according to that information. By following this approach then the most frequent and/or important sequential accesses will be optimized by clustering.

Of course, frequently the decision on how to cluster is made during the database design phase when few, if any, SQL statements or access patterns are readily available. So the DBA sometimes makes a knee-jerk decision to just cluster based on the primary key (PK) and be done with it. This is usually not a good idea.

Why is Clustering by PK not a very good idea?

Think about what we just said about sequentially accessing data. How often is data sequentially accessed by PK? Sometimes this might be the case, maybe if you are producing a report in order by PK, but even that example is not a very good one. Think about it. If you are producing a customer report you probably do not create it in CUSTNO order but probably in order by customer name (or something more interesting to humans than a key number).

So the best option is to figure out the most common and important sequential access patterns and cluster accordingly. But what if this information is not available? My assertion is that clustering by Foreign Key (FK), instead of by PK, makes a lot more sense. Usually a FK is on the many side of a one-to-many relationship. Think about what that means. When you join PK to FK there will be one PK row being joined to multiple FK rows. If those multiple FK rows are all on the same page (or on contiguous pages) due to clustering, then we can impact I/O.

The Bottom Line

The bottom line here is that you should think about (and even re-think) your general assumptions and why you make any database design decision. Sometimes your immediate gut reaction won’t be the right one. And it is always better to think with our heads instead of our guts, right?

Wednesday, February 17, 2016

The Most Misunderstood Features of DB2 – Part 4: Base Table Views

Welcome to my continuing series focusing on The Most Misunderstood Features of DB2. Today’s topic is an oldie but a goodie: base table views.

What is a Base Table View?

I am going to assume that regular readers of this blog know what a view is (if not, go here). So what is a base table view? It is a view that contains all of the columns and all of the rows of the base table and nothing else. In other words, it is like a SELECT * against the base table with no WHERE clauses.

Now why would somebody create such a beast? This gets to the heart of the misunderstanding here. There is a continuing notion “out there” that it is a good idea to create a base table view for every table and to give programmers access to the base table view only, not the base table itself.

The reason given for doing so is to insulate programs from change. But this is a weak argument at best. I admit that this particular misunderstanding is less prevalent than it was in the earlier years of DB2. But I do still find shops adhering to this archaic, and poorly founded, idea.

Reasons Given for Base Table Views

One reason given by base table view proponents is that when you add a column to a table you do not have to change the program. But this is the case, too, if you code your programs appropriately, avoiding SELECT * and coding only the specific columns you need to access. Most shops do this even when using base table views.

Other reasons given revolve around removing columns or splitting tables. If you are interested in all of the reasons and rebuttals I urge you to read my article (PDF) on this topic from 1991 titled One View Per Base Table? Don’tDo It! (Wow! This is a quarter of a Century old now!)

Another View Fallacy

I recall yet another misunderstanding about views from many years ago that I have not heard in a long time. Basically, the argument was that accessing a base table view would out-perform accessing the base table.

This was never proven to me – and I do not believe it was ever true. How could it be? Static SQL against a base table view would use view merge so it would be the same as going against the base table. And dynamic SQL would have the added step of having to resolve the view to the base table. Sooo…

Summary

Base table views are not worth the time and effort because the bring no added value of any type whatsoever. Do not use them.