DBAs and database professionals
have been aware of the pros and cons of compressing data for years. The
traditional argument goes something like this: with compression you can store
more data in less space, but at the cost of the CPU overhead required to compress the data upon insertion (and modification) and to decompress it upon reading. Over
time, the benefits of compression became greater as compression algorithms
became more robust, hardware assist chips became available to augment
compression speed, and the distributed model of computing made transmitting
data across networks a critical piece of the business transaction (and
transmitting compressed data is more efficient than transmitting uncompressed
data).
IBM has significantly improved
compression in DB2 for z/OS over the years. In the early days of mainframe DB2
no compression capability came with DB2 out-of-the-box -- the only mechanism
for compressing data was via an exit routine (EDITPROC). Many software vendors
developed and sold compression routines for DB2. Eventually, IBM began shipping
a sample compression routine with DB2. And then in DB2 Version 3 (1993)
hardware-assisted compression was introduced. With the hardware assist, the CPU consumed by DB2 compression is minimal, and the list of cons gets a little shorter.
Indeed, one piece of advice that
I give to most shops when I consult for them is that they probably need to look
at compressing more data than they already are. These days, compressed data can actually improve performance because, in many cases, more rows fit on each page, so scans and other sequential processes can read more data with the same number of I/Os. Of course, you should use the DSN1COMP utility to estimate the amount of savings that can accrue via compression before compressing any existing data; an example of the general approach follows.
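To make that concrete, here is a minimal sketch of enabling compression on a table space once DSN1COMP has convinced you the savings are worthwhile. The database and table space names are hypothetical placeholders, and your shop's standards will dictate the details:

-- Hypothetical example: turn on data compression for an existing table space
-- (run the DSN1COMP stand-alone utility against it first to estimate savings)
ALTER TABLESPACE MYDB.MYTS COMPRESS YES;

-- The compression dictionary is not built until the data is reorganized
-- (or reloaded), so follow the ALTER with a REORG of the table space.

DSN1COMP itself runs as a batch job against the table space's underlying data set (or an image copy) and reports the estimated percentage of space that compression would save.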
Eventually, in DB2 9, we even got index compression capability (which, of course, uses different technology than data compression). At any rate, compressing data on DB2 for z/OS is no longer the “only-if-I-have-to” task that it once was.
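As an aside, turning on index compression in DB2 9 is also just a DDL change. Here is a minimal sketch with a hypothetical index name; check the details for your release before relying on it:

-- Hypothetical example: compress an existing index in DB2 9 for z/OS.
-- The index must use a buffer pool with a page size greater than 4K
-- (8K, 16K, or 32K) before compression can be enabled.
ALTER INDEX MYSCHEMA.MYIX1 COMPRESS YES;

-- The ALTER leaves the index in a pending state, so rebuild it
-- afterward to compress the existing leaf pages.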
Then along comes the Big Data
phenomenon where increasingly large data sets need to be stored and analyzed.
Big Data is typified by data sets that are so large and complex that
traditional tools and database systems are ill-suited to process them. Clearly,
compressing such data could be advantageous… but is it possible to process and
compress such large volumes of data?
New alternatives to traditional
systems are being made available that offer efficient resource usage based on
principles of compressed sensing and other techniques. One example of this new
technology is IBM’s BLU Acceleration, which is included in DB2 10.5 for Linux,
Unix, and Windows. One feature of BLU Acceleration is extended compression, which eliminates the need for indexes and aggregates and operates directly on compressed data, thereby avoiding the CPU time that would otherwise be required to decompress it. Advanced encoding maximizes compression while preserving the order of the encoded values, so compressed data can be analyzed quickly without decompressing it. It is an impressive technology, and no changes are required to your existing SQL statements.
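To give a flavor of what that looks like in practice, here is a minimal sketch of a column-organized (BLU) table in DB2 10.5 for LUW. The table, columns, and query are hypothetical; setting the DB2_WORKLOAD registry variable to ANALYTICS before creating the database makes column organization the default, but you can also request it explicitly:

-- Hypothetical example: a column-organized table in DB2 10.5 for LUW
CREATE TABLE SALES_FACT
      (SALE_DATE  DATE,
       STORE_ID   INTEGER,
       AMOUNT     DECIMAL(11,2))
  ORGANIZE BY COLUMN;

-- Existing SQL runs unchanged; the columnar storage, encoding, and
-- compression are handled under the covers, and no indexes are defined.
SELECT STORE_ID, SUM(AMOUNT) AS TOTAL_AMOUNT
  FROM SALES_FACT
 GROUP BY STORE_ID;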
IBM reports that some clients using DB2 10.5 for LUW with BLU Acceleration have achieved compression ratios of 10 to 1 compared to uncompressed tables.
Of course, BLU Acceleration is
much more than compression (it combines in-memory, columnar and compression
technologies), but for the purposes of today’s blog entry we won’t delve deeper
into the technology. If you are interested in a little bit more on BLU, read my high-level overview in my coverage of this year’s IDUG DB2 Technical Conference.
So compression is becoming cool…
who’d have thought that back in the 1980s when compression was something we
only did when we absolutely had to?