Thursday, March 17, 2016

Digital Transformation and DB2 for z/OS: It’s Not Your Daddy’s DB2!

If you are a DBA who has been using DB2 for z/OS for a while you should have noticed that we are not doing things the same way we used to. DB2 is changing and we should be changing with it. If you are still using DB2 the same way you did 10 or 20 years ago, then you are definitely not adhering to industry best practices!
The same trends that are driving the digital explosion are also changing DB2 and the traditional role of the DBA. We are storing more data and different types of data for longer periods of time and in different ways than we have in the past.
And DB2 for z/OS keeps changing to adopt and embrace modern data management requirements and techniques. Whether it is modernizing storage with universal table spaces, embracing unstructured data in LOBs, or expanding the SQL language with new and more functionality, today’s DB2 looks a lot different than it did yesterday. Indeed, it is different – it is not your daddy’s DB2.
I’ve been writing a series of blog posts for BMC about this topic under the title It’s Not Your Daddy’s DB2!  You can find the first three blog posts in this series here: 1 2 3
But you can also attend a live webinar that BMC is sponsoring where I will talk about these issues. You can learn about:
·        Trends that influence the size and complexity of your DB2 environment and how this impacts data management
·        How to adapt to new DB2 data types and structures
·        Best practices and technologies for managing DB2 in the digital age
·        And BMC will share its next generation technology for managing the new world of DB2 for z/OS.

Learn how digital transformation will change the way your DBAs manage critical business needs. Attend this webinar on March 30, 2016, at 12:00 pm CT.

Tuesday, March 15, 2016

The Most Misunderstood Features of DB2 – Part 6: Not Indexing

Welcome to our on-going series of misunderstood issues in the world of DB2 for z/OS. Today’s topic is indexing, or to be more precise: not indexing. What do I mean by that?

Well, I’ve heard folks say that you should always create at least one index on every DB2 table. And while I can empathize with that general notion, I can’t agree completely because of that word “always,” which almost always makes a sentence wrong.

Sure, there are several good arguments for creating indexes on DB2 tables. Perhaps the most important one is clustering. DB2 uses an index to control how data is clustered on disk. Only one index can be specified as the clustering index (because, of course, the data on disk can only be stored in one order, right!). If you do not specify a clustering index then DB2 will use the earliest created (oldest) index to cluster the data. So it is usually a good idea to create an index to guide clustering.

Another popular reason to create an index on a DB2 table is to enforce uniqueness for a UNIQUE constraint or PRIMARY KEY. The only way to enforce uniqueness on a DB2 column (or set of columns) is by using a unique index.

Of course, there are a lot of other good reasons to create indexes, most of them to improve query performance. But I contend that there are situations when it makes sense not to create any indexes at all.

So when does it make more sense not to build an index for a DB2 table?

Let's start by saying that most of the time you will want to build at least one - and probably multiple - indexes on each table that you create. Indexes are crucial for optimizing performance of SQL access. Without an index, queries must scan every row of the table to come up with a result. And that can be very slow.

Having said that, here are a few situations some times where it can make sense to have no indexes defined on a table:

When all (or most) accesses retrieve every row of the table. Because every row will be retrieved every time you want to use the table, an index (if used) would just add extra I/O and would diminish, not enhance performance. Though not extremely common, you may indeed come across such tables in your organization.
For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 20 to 30 or so pages) might not need an index because simply reading all of the pages is very efficient already.
When performance doesn't matter and the table is only accessed very infrequently. But, hey, when do you ever have that type of requirement in the real world?

Other than for these circumstances, you will most likely want to build one or more indexes on each table, not only to optimize performance, but also to ensure uniqueness, to support referential integrity, and to drive data clustering.


Of course, indexes do not come without cost. Indexes take up disk space and adding a lot of indexes will consume disk space. An additional consideration is their impact in data modification. Although indexes speed up queries they degrade inserts and deletes, as well as any modification to indexed columns.

What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!

Friday, March 04, 2016

The Most Misunderstood Features of DB2 – Part 5: Choosing the Clustering Key

Hello everybody, and welcome to part 5 in my on-going series where we take a look at The Most Misunderstood Features of DB2. You can find the earlier installments in this series here (Part 1 Part 2 Part 3 Part 4).

Today’s topic, on choosing an appropriate clustering index, might be a bit confusing to some of you. So let’s start at the beginning and describe what clustering is.

What is Clustering?

In DB2 for z/OS, you use an index to indicate how DB2 should try to store table space data physically on disk. This is called clustering. A DB2 index is a clustering index if the CLUSTER keyword is specified when the index is created. Clustering causes inserted rows to be stored contiguously in sequence whenever possible. Additionally, when the table space is reorganized the data will be sequenced according to the clustering index. Since there can only be one physical sequence for data on disk, there can only be one clustering index per table.

For tables that do not specify APPEND YES and do not have hashing defined, if you do not specify a clustering index, DB2 chooses to cluster the data using the oldest existing index. Therefore, it is wise to explicitly specify a clustering index instead of letting DB2 decide because you can almost always choose better than the (basically random) choice DB2 makes.

Clustering of data is important because when the data is sequentially accessed, if the data exists on the same page (or range of pages) then I/O is minimized because fewer pages are read than if the data were spread throughout the entire table space on “random” pages. And fewer I/Os means improved performance.

So then, how should a DBA go about deciding how to cluster the data? Well, the best answer is to analyze the various ways in which the data is to be accessed. This means reviewing the actual SQL along with frequency and importance of the execution patterns. When the most important/frequently executed sequential accesses are identified, then you can choose to cluster the data according to that information. By following this approach then the most frequent and/or important sequential accesses will be optimized by clustering.

Of course, frequently the decision on how to cluster is made during the database design phase when few, if any, SQL statements or access patterns are readily available. So the DBA sometimes makes a knee-jerk decision to just cluster based on the primary key (PK) and be done with it. This is usually not a good idea.

Why is Clustering by PK not a very good idea?

Think about what we just said about sequentially accessing data. How often is data sequentially accessed by PK? Sometimes this might be the case, maybe if you are producing a report in order by PK, but even that example is not a very good one. Think about it. If you are producing a customer report you probably do not create it in CUSTNO order but probably in order by customer name (or something more interesting to humans than a key number).

So the best option is to figure out the most common and important sequential access patterns and cluster accordingly. But what if this information is not available? My assertion is that clustering by Foreign Key (FK), instead of by PK, makes a lot more sense. Usually a FK is on the many side of a one-to-many relationship. Think about what that means. When you join PK to FK there will be one PK row being joined to multiple FK rows. If those multiple FK rows are all on the same page (or on contiguous pages) due to clustering, then we can impact I/O.

The Bottom Line


The bottom line here is that you should think about (and even re-think) your general assumptions and why you make any database design decision. Sometimes your immediate gut reaction won’t be the right one. And it is always better to think with our heads instead of our guts, right?