But back to the topic du jour... LOBs. Let's start with a bit of history. LOBs were added to
relational products like DB2, back in the 1990s, ostensibly to compete more
capably against the Object-Oriented databases of the time. Remember them? Back then it seemed that everybody thought OO DBMS products would supplant relational technology. Seems somewhat quaint now, doesn't it?
At any rate, the idea was to enable relational tables to house unstructured
data like images and audio and video data. DB2 for z/OS users were slow to
adopt LOBs in their mainframe databases. I think that is due to several
different reasons, not the least of which is that it took several new versions
of DB2 for LOBs to mature into capable, usable things for enterprise
applications. Early implementations of LOBs in DB2 were somewhat kludgy
and difficult to administer and use. But IBM has corrected many of the
deficiencies over time and there are now tools that can help us to effectively
manage DB2 LOBs, too.
The other new force driving LOB usage is
the whole Big Data movement. Big Data is a force that is driving organizations to accumulate and analyze
more data, and more varied types of data, to gain business insight. The
most common definition of Big Data was coined by Forrester Research defining
big data in terms of “The 4 V’s” -- volume, velocity, variety,
variability. But that is somewhat limiting. However, I do not want to turn this posting into a definition of big data, so... let's just think of Big Data as MORE DATA, MORE TYPES OF
DATA, and FASTER GENERATION OF DATA. One example of Big Data driving
the usage of LOBs in DB2 is the JSON support that has been added to DB2. JSON
objects are stored in DB2 as BLOBs.
So LOBs have been around for awhile now, but it has taken some time for them to gain significant levels of usage "out there!"
Why LOBs, you may ask? Why not just store the data in regular old DB2 data types like VARCHAR or VARGRAPHIC? The basic answer is that DB2's data types were not large enough to hold this amount of data,
because of their limit of 32 KB. Some multimedia data can get VERY large. For example, a high resolution video requires about 3 GB for each hour. And high-def TV video requires 720 GB/hour!
But an additional part of the answer is that you do not always want to access the LOB data with the traditional data. For example, if you are querying employee information you probably don't always want their photo or their entire resume... so the data can be stored in a LOB, external from the rest of the data (so as not to impede performance) but connected to the data (when you need to access it).
are three types of LOBs supported by DB2:
– Binary Large Object – for binary data
– Character Large Object – for text data
– Double Byte Character Large Object – for graphic character data
Furthermore, there are two options for specifying LOBs in DB2
- The traditional way to define LOBs where the LOB data is stored separately from the rest of the data. This requires defining additional database objects.
- A newer method, for smaller LOBs, introduced with DB2 10, called Inline LOBs.
Using the traditional method, LOBs are
defined such that they are stored external to the base table. The column is specified as a BLOB, CLOB or
DBCLOB in the base table, but DB2 stores the actual LOB data in an auxiliary
table in a LOB table space. You must define a LOB table space per LOB per
partition. So, let’s say we are creating a table with 2 LOBs in a table space
with ten (10) partitions. That means we will need to define 20 LOB table spaces
– 10 for the first LOB (one per partition) and 10 for the second LOB (one for
each partition). An auxiliary table is defined for each LOB table space to hold
the LOB data. And an auxiliary index is required on each auxiliary table.
back to the base table. Remember that we have defined the LOB columns in the
base table. But we also need to include a ROWID column in the base table. Only
one ROWID column is needed per base table no matter how many LOBs you have
defined in the table.
Don’t confuse the ROWID with other concepts that may seem to be similar. A
ROWID is not an identity column nor is it a SEQUENCE. The ROWID is used by DB2
behind the scenes to connect the base table row to the auxiliary table row that
contains the LOB data. You generally will not access the ROWID, although it is
possible to do so just like any other column using SQL. With the ROWID you can
perform direct row access. This
is so because the row ID value implicitly contains the location of the row.
each LOB column also has a 2 byte version number associated with it. This
is used by DB2 when LOB data is modified. You cannot access the version number.
Now as of
DB2 10 for z/OS, you can create inline LOBs in your DB2 tables. But what is an
inline LOB? Well, at a high level, the name is somewhat self-explanatory. The
LOB data for inline LOBs is stored with the rest of the base table.
it is really a bit more nuanced that that. You can store part of the LOB inline
with the rest of the data and the rest of the LOB externally, if you so choose.
So an inline LOB can have all of the LOB data stored inline with the rest of
the data (if it is small enough) or it can store only a portion with the rest
of the data. This can be a great technique to use if you have some applications
that require perhaps only the first 500 bytes of a larger LOB. Those 500 can be
stored inline – with the rest of the data – while the rest is stored externally
and accessed only when needed. You can create an inline LOB by
specifying the INLINE LENGTH clause on your CREATE TABLE statement.
Inline LOBs can improve performance
depending upon the type of access required. Consider the example where RESUME data is stored
in a CLOB. The first 200 bytes of the resume are accessed most of the time,
with the entire resume being accessed only during rare instances (e.g.
interview, performance reviews, etc.) By storing those first 200 bytes inline
with the rest of the data we can eliminate the I/O to the LOB in the auxiliary
table, thereby improving performance. External LOB data is not buffered,
but inline LOB data is – this too can impact performance.
This blog entry is getting a little longer than I was expecting, so I am going to cut it off here. We've reviewed what LOBs are, a little but of their history, and discussed a but about both traditional and in-line LOBs. To learn more, be sure to join me for the webinar on April 24th (or watch it later - it will be recorded).
Labels: Big Data, LOBs