Showing posts with label LOBs. Show all posts
Showing posts with label LOBs. Show all posts

Thursday, March 17, 2016

Digital Transformation and DB2 for z/OS: It’s Not Your Daddy’s DB2!

If you are a DBA who has been using DB2 for z/OS for a while you should have noticed that we are not doing things the same way we used to. DB2 is changing and we should be changing with it. If you are still using DB2 the same way you did 10 or 20 years ago, then you are definitely not adhering to industry best practices!
The same trends that are driving the digital explosion are also changing DB2 and the traditional role of the DBA. We are storing more data and different types of data for longer periods of time and in different ways than we have in the past.
And DB2 for z/OS keeps changing to adopt and embrace modern data management requirements and techniques. Whether it is modernizing storage with universal table spaces, embracing unstructured data in LOBs, or expanding the SQL language with new and more functionality, today’s DB2 looks a lot different than it did yesterday. Indeed, it is different – it is not your daddy’s DB2.
I’ve been writing a series of blog posts for BMC about this topic under the title It’s Not Your Daddy’s DB2!  You can find the first three blog posts in this series here: 1 2 3
But you can also attend a live webinar that BMC is sponsoring where I will talk about these issues. You can learn about:
·        Trends that influence the size and complexity of your DB2 environment and how this impacts data management
·        How to adapt to new DB2 data types and structures
·        Best practices and technologies for managing DB2 in the digital age
·        And BMC will share its next generation technology for managing the new world of DB2 for z/OS.

Learn how digital transformation will change the way your DBAs manage critical business needs. Attend this webinar on March 30, 2016, at 12:00 pm CT.

Monday, April 21, 2014

A Little Bit About LOBs

In today's blog post we will take a brief look at LOBs, or Large OBjects, in DB2. I have been prepping for my webinar later this week, titled Bringing Big Data to DB2 for z/OS with LOBs: Understanding, Using, and Managing DB2 LOBsBe sure to click on the link for that and join me on April 24, 2014 for that webinar!

But back to the topic du jour... LOBs. Let's start with a bit of history. LOBs were added to relational products like DB2, back in the 1990s, ostensibly to compete more capably against the Object-Oriented databases of the time. Remember them? Back then it seemed that everybody thought OO DBMS products would supplant relational technology. Seems somewhat quaint now, doesn't it?

At any rate, the idea was to enable relational tables to house unstructured data like images and audio and video data. DB2 for z/OS users were slow to adopt LOBs in their mainframe databases. I think that is due to several different reasons, not the least of which is that it took several new versions of DB2 for LOBs to mature into capable, usable things for enterprise applications. Early implementations of LOBs in DB2 were somewhat kludgy and difficult to administer and use. But IBM has corrected many of the deficiencies over time and there are now tools that can help us to effectively manage DB2 LOBs, too.

The other new force driving LOB usage is the whole Big Data movement. Big Data is a force that is driving organizations to accumulate and analyze more data, and more varied types of data, to gain business insight. The most common definition of Big Data was coined by Forrester Research defining big data in terms of “The 4 V’s” -- volume, velocity, variety, variability. But that is somewhat limiting. However, I do not want to turn this posting into a definition of big data, so... let's just think of Big Data as MORE DATA, MORE TYPES OF DATA, and FASTER GENERATION OF DATA. One example of Big Data driving the usage of LOBs in DB2 is the JSON support that has been added to DB2. JSON objects are stored in DB2 as BLOBs. 

So LOBs have been around for awhile now, but it has taken some time for them to gain significant levels of usage "out there!" 

Why LOBs, you may ask? Why not just store the data in regular old DB2 data types like VARCHAR or VARGRAPHIC? The basic answer is that DB2's data types were not large enough to hold this amount of data, because of their limit of 32 KB. Some multimedia data can get VERY large. For example, a high resolution video requires about 3 GB for each hour. And high-def TV video requires 720 GB/hour!

But an additional part of the answer is that you do not always want to access the LOB data with the traditional data. For example, if you are querying employee information you probably don't always want their photo or their entire resume... so the data can be stored in a LOB, external from the rest of the data (so as not to impede performance) but connected to the data (when you need to access it).

There are three types of LOBs supported by DB2:
1. BLOB – Binary Large Object – for binary data
2. CLOB – Character Large Object – for text data
3. DBCLOB – Double Byte Character Large Object – for graphic character data

Furthermore, there are two options for specifying LOBs in DB2 tables:

  1. The traditional way to define LOBs where the LOB data is stored separately from the rest of the data. This requires defining additional database objects.
  2. A newer method, for smaller LOBs, introduced with DB2 10, called Inline LOBs.

Using the traditional method, LOBs are defined such that they are stored external to the base table. The column is specified as a BLOB, CLOB or DBCLOB in the base table, but DB2 stores the actual LOB data in an auxiliary table in a LOB table space. You must define a LOB table space per LOB per partition. So, let’s say we are creating a table with 2 LOBs in a table space with ten (10) partitions. That means we will need to define 20 LOB table spaces – 10 for the first LOB (one per partition) and 10 for the second LOB (one for each partition). An auxiliary table is defined for each LOB table space to hold the LOB data. And an auxiliary index is required on each auxiliary table. 

Now back to the base table. Remember that we have defined the LOB columns in the base table. But we also need to include a ROWID column in the base table. Only one ROWID column is needed per base table no matter how many LOBs you have defined in the table.


Don’t confuse the ROWID with other concepts that may seem to be similar. A ROWID is not an identity column nor is it a SEQUENCE. The ROWID is used by DB2 behind the scenes to connect the base table row to the auxiliary table row that contains the LOB data. You generally will not access the ROWID, although it is possible to do so just like any other column using SQL. With the ROWID you can perform direct row access. This is so because the row ID value implicitly contains the location of the row. 

Finally, each LOB column also has a 2 byte version number associated with it. This is used by DB2 when LOB data is modified. You cannot access the version number.

Now as of DB2 10 for z/OS, you can create inline LOBs in your DB2 tables. But what is an inline LOB? Well, at a high level, the name is somewhat self-explanatory. The LOB data for inline LOBs is stored with the rest of the base table.

But it is really a bit more nuanced that that. You can store part of the LOB inline with the rest of the data and the rest of the LOB externally, if you so choose. So an inline LOB can have all of the LOB data stored inline with the rest of the data (if it is small enough) or it can store only a portion with the rest of the data. This can be a great technique to use if you have some applications that require perhaps only the first 500 bytes of a larger LOB. Those 500 can be stored inline – with the rest of the data – while the rest is stored externally and accessed only when needed. You can create an inline LOB by specifying the INLINE LENGTH clause on your CREATE TABLE statement.

Inline LOBs can improve performance depending upon the type of access required. Consider the example where RESUME data is stored in a CLOB. The first 200 bytes of the resume are accessed most of the time, with the entire resume being accessed only during rare instances (e.g. interview, performance reviews, etc.) By storing those first 200 bytes inline with the rest of the data we can eliminate the I/O to the LOB in the auxiliary table, thereby improving performance. External LOB data is not buffered, but inline LOB data is – this too can impact performance.

This blog entry is getting a little longer than I was expecting, so I am going to cut it off here. We've reviewed what LOBs are, a little but of their history, and discussed a but about both traditional and in-line LOBs. To learn more, be sure to join me for the webinar on April 24th (or watch it later - it will be recorded).

Thursday, May 24, 2007

LOB Enhancements [DB2 9 for z/OS]

IBM focused their attention on improving DB2’s ability to store and manage LOB data in Version 9. As anyone who has tried to use LOBs in a previous version of DB2 knows, the usability limitations were troublesome. But with Version 9, IBM chips away at some of the more annoying LOB limitations.

FETCHing LOBs

Prior to Version 9, there were two methods you could deploy in your programs to fetch LOB data:

  • Fetching data into a pre-allocated buffer
  • Using a LOB locator to retrieve a handle on the data.

Both methods have their issues. Fetching data into a preallocated buffer can cause virtual storage constraint problems, especially for larger LOBs. On the other hand, using LOB locators that commit infrequently or do not explicitly free the locators can use considerable amounts of DB2 resources.

V9 introduces a new clause, WITH CONTINUE, for use on your FETCH statements. By coding your program to use WITH CONTINUE you can retrieve LOB columns in multiple pieces without using a LOB locator, and continue a FETCH operation to retrieve the remaining LOB data when truncation occurs. (Note: this method can be used with XML data, too.) You will have to manage the buffers and reassemble the pieces of data in your application program.

So, by specifying WITH CONTINUE on your FETCH statement you tell DB2 to allow subsequent FETCH CURRENT CONTINUE operations. These will allow you to access the remaining truncated LOB (or XML) column after the initial FETCH. If truncation occurs, DB2 will remember the truncation position and will not discard the remaining data. DB2 will return the total length that would have been required to hold all of the data of the LOB or XML column. This will either be in the first four bytes of the LOB host variable structure or in the 4 byte area that is pointed to by the SQLDATALEN pointer in the SQLVAR entry of the SQLDA for that host variable.

File Reference Variables

DB2 V9 adds support for a LOB file reference variable. A file reference variable is a host variable that is defined in a host language to contain the file name that directs file input and output for a large object (LOB).

Using file reference variables, large LOB values can be inserted from a file or selected into a file rather than a host variable. This means that your application program does not need to acquire storage to contain the LOB value. File reference variables also enable you to move LOB values from the DBMS to a client application or from a client application to a database server without going through the working storage of the client application.

LOBs and Utilities

The manner in which DB2 handles LOBs in utility processing has also been improved in DB2 V9.

Loading and unloading LOBs has been improved. For LOAD, an input field value can contain the name of the file that contains a LOB column value. The LOB column value will then be loaded from that file. For UNLOAD, you can store the value of a LOB column in a file and record the name of the file in the unloaded record in the base table.

What about REORG? Well, prior to V9 you could not access LOB data during a REORG. And a REORG did not reclaim physical space from the LOB data set because LOBs were moved within the existing LOB table space. V9 fixes these problems. During a REORG (in V9), the original LOB table space is drained of writers. All LOBs are then extracted from the original data set and inserted into a shadow data set. When this operation is complete, all access to the LOB table space is stopped (the readers are drained) while the original data set is switched with the shadow data set. At this point, full access to the new data set is enabled, and an inline copy is taken to ensure recoverability of data.

Additionally, both CHECK LOB and CHECK DATA have SHRLEVEL REFERENCE and SHRLEVEL CHANGE options. Using these you can minimize downtime. For LOBs, CHECK DATA checks for consistency between a base table space and the corresponding LOB or XML table spaces. The CHECK LOB utility identifies structural defects in the LOB table space and any invalid LOB values. Running CHECK (DATA or LOB) with SHRLEVEL REFERENCE indicates that applications can read from but cannot write to the index, table space, or partition that is to be checked. SHRLEVEL CHANGE means that applications can read from and write to the index, table space, or partition that is to be checked.

Performance

Finally, DB2 V9 provides several performance improvements related to LOBs. Using file reference variables or WITH CONTINUE to “chunk” read LOBs can improve performance over using locators.

And as we all know, removing locks can improve performance. DB2 V9 eliminates LOB locking for space allocation, as well as for insert, delete, update, and select operations. Additionally, a LOB lock is no longer required to serialize the consistency between the value of the LOB and the column of the base row for an uncommitted read operation.