Fifty-Two Years Ago This Week
On April 7, 1964, IBM announced "a new generation of electronic computing equipment" called the IBM System/360... or as it, and subsequent generations of the machine, have become known: the mainframe!
If you'd like to take a walk down memory lane you can read the original press release on the IBM web site.
The System/360, unquestionably, was one of the most significant products in the history of computing. The sound, general concepts of the System/360 are still at the foundation of the modern mainframe. Sure, a lot has changed, but those changes were made to a solid, fundamentally sound base.
Code that ran on the System/360 can still be run on the latest and greatest modern IBM mainframe, the z13. What other platform can say that? I mean, when my desktop was recently force-upgraded from Windows 7 to Windows 10, some of my software stopped running!
So let's take a moment and salute the mainframe on its 52nd birthday... and wish it many more years of productive use.
Happy birthday, mainframe!
Friday, April 08, 2016
Monday, April 04, 2016
The Most Misunderstood Features of DB2 – Part 7: It Depends!
"It depends" is probably the most famous phrase out there when it comes to DB2 performance. Some call it the cardinal rule. You can answer just about any question that anybody asks at any time with it. Try it out. It depends! Always works... that is, it always works if you are trying to avoid answering the question!
Most DBAs and SQL experts resist giving a straight or simple answer to a general question because there is no simple, standard implementation that exists. Every situation is different, and every organization is unique in some way. So answering "it depends" to most questions can make a lot of sense. But "it depends" should never be the end of the answer!
Don’t be discouraged when you ask the local expert which statement will perform better, and the answer is “It depends.” The expert is just doing his or her job. The secret to optimizing DB2 performance is being able to answer the follow-up question to “It depends,” and that is: “What does it depend on?”
The key to effective SQL performance tuning is to document each SQL change along with the reason for the change. Follow up by monitoring the effectiveness of every change to your SQL statements before moving them into a production environment. Over time, trends will emerge that will help to clarify which types of SQL formulations perform best.
So the misunderstanding in this case is thinking that "it depends" is a complete answer to any type of question. It isn't... it is just the beginning of most answers. But it takes time, experience, and study to be able to answer what it depends upon. And that is why you might not get that answer unless you press for it... and attempt to find the answer yourself without just leaning on others for the answer all the time.
There is a corollary to the “It depends” rule that also is important. Hard and fast rules that rigidly enforce or forbid usage are not a good idea. In other words, this corollary is simply stated: “Almost never say always or never.” Notice that even the wording of this corollary embraces flexibility.
So be flexible, but embrace answers that help... because "it depends" helps nobody unless you tell them what it depends upon!
Thursday, March 17, 2016
Digital Transformation and DB2 for z/OS: It’s Not Your Daddy’s DB2!
If you are a DBA who has been using DB2 for z/OS for a while you should have noticed that we are not doing things the same way we used to. DB2 is changing and we should be changing with it. If you are still using DB2 the same way you did 10 or 20 years ago, then you are definitely not adhering to industry best practices!
The same trends that are driving the digital explosion are also changing DB2 and the traditional role of the DBA. We are storing more data and different types of data for longer periods of time and in different ways than we have in the past.
And DB2 for z/OS keeps changing to adopt and embrace modern data management requirements and techniques. Whether it is modernizing storage with universal table spaces, embracing unstructured data in LOBs, or expanding the SQL language with new and more functionality, today’s DB2 looks a lot different than it did yesterday. Indeed, it is different – it is not your daddy’s DB2.
I’ve been writing a series of blog posts for BMC about this topic under the title It’s Not Your Daddy’s DB2! You can find the first three blog posts in this series here: 1 2 3
But you can also attend a live webinar that BMC is sponsoring where I will talk about these issues. You can learn about:
- Trends that influence the size and complexity of your DB2 environment and how this impacts data management
- How to adapt to new DB2 data types and structures
- Best practices and technologies for managing DB2 in the digital age
- And BMC will share its next generation technology for managing the new world of DB2 for z/OS.
Learn how digital transformation will change the way your DBAs manage critical business needs. Attend this webinar on March 30, 2016, at 12:00 pm CT.
Tuesday, March 15, 2016
The Most Misunderstood Features of DB2 – Part 6: Not Indexing
Welcome to our on-going series of misunderstood issues in the world of DB2 for z/OS. Today’s topic is indexing, or to be more precise: not indexing. What do I mean by that?
Well, I’ve heard folks say that you should always create at least one index on every DB2 table. And while I can empathize with that general notion, I can’t agree completely because of that word “always,” which almost always makes a sentence wrong.
Sure, there are several good arguments for creating indexes on DB2 tables. Perhaps the most important one is clustering. DB2 uses an index to control how data is clustered on disk. Only one index can be specified as the clustering index (because, of course, the data on disk can only be stored in one order, right?). If you do not specify a clustering index then DB2 will use the earliest created (oldest) index to cluster the data. So it is usually a good idea to create an index to guide clustering.
Another popular reason to create an index on a DB2 table is to enforce uniqueness for a UNIQUE constraint or PRIMARY KEY. The only way to enforce uniqueness on a DB2 column (or set of columns) is by using a unique index.
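By way of illustration, here is a minimal sketch of the DDL for both of these cases (the CUST table, its columns, and the index names are hypothetical, used only to show the syntax):

CREATE INDEX XCUST2
  ON CUST (CUST_NAME ASC)
  CLUSTER;

CREATE UNIQUE INDEX XCUST1
  ON CUST (CUSTNO ASC);

The first index asks DB2 to keep the data in CUST_NAME sequence; the second is the kind of unique index needed to back a PRIMARY KEY or UNIQUE constraint on CUSTNO.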
Of course, there are a lot of other good reasons to create indexes, most of them to improve query performance. But I contend that there are situations when it makes sense not to create any indexes at all.
So when does it make more sense not to build an index for a DB2 table?
Let's start by saying that most of the time you will want to build at least one - and probably multiple - indexes on each table that you create. Indexes are crucial for optimizing the performance of SQL access. Without an index, queries must scan every row of the table to come up with a result. And that can be very slow.
Having said that, here are a few situations where it can make sense to have no indexes defined on a table:
- When all (or most) accesses retrieve every row of the table. Because every row will be retrieved every time you want to use the table, an index (if used) would just add extra I/O and would diminish, not enhance, performance. Though not extremely common, you may indeed come across such tables in your organization.
- For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 20 to 30 or so pages) might not need an index because simply reading all of the pages is very efficient already.
- When performance doesn't matter and the table is only accessed very infrequently. But, hey, when do you ever have that type of requirement in the real world?
Other than for these circumstances, you will most likely want to build one or more indexes on each table, not only to optimize performance, but also to ensure uniqueness, to support referential integrity, and to drive data clustering.
Of course, indexes do not come without cost. Indexes take up disk space, and adding a lot of indexes can consume a significant amount of it. An additional consideration is their impact on data modification. Although indexes speed up queries, they degrade inserts and deletes, as well as any modification to indexed columns.
What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!
Friday, March 04, 2016
The Most Misunderstood Features of DB2 – Part 5: Choosing the Clustering Key
Hello everybody, and welcome to part 5 in my on-going series where we take a look at The Most Misunderstood Features of DB2. You can find the earlier installments in this series here (Part 1 Part 2 Part 3 Part 4).
Today’s topic, on choosing an appropriate clustering index, might be a bit confusing to some of you. So let’s start at the beginning and describe what clustering is.
What is Clustering?
In DB2 for z/OS, you use an index to indicate how DB2 should try to store table space data physically on disk. This is called clustering. A DB2 index is a clustering index if the CLUSTER keyword is specified when the index is created. Clustering causes inserted rows to be stored contiguously in sequence whenever possible. Additionally, when the table space is reorganized the data will be sequenced according to the clustering index. Since there can only be one physical sequence for data on disk, there can only be one clustering index per table.
For tables that do not specify APPEND YES and do not have hashing defined, if you do not specify a clustering index, DB2 chooses to cluster the data using the oldest existing index. Therefore, it is wise to explicitly specify a clustering index instead of letting DB2 decide because you can almost always choose better than the (basically random) choice DB2 makes.
Clustering of data is important because when the data is sequentially accessed, if the data exists on the same page (or range of pages) then I/O is minimized because fewer pages are read than if the data were spread throughout the entire table space on “random” pages. And fewer I/Os means improved performance.
So then, how should a DBA go about deciding how to cluster the data? Well, the best answer is to analyze the various ways in which the data is to be accessed. This means reviewing the actual SQL along with the frequency and importance of the execution patterns. When the most important/frequently executed sequential accesses are identified, you can choose to cluster the data according to that information. By following this approach, the most frequent and/or important sequential accesses will be optimized by clustering.
Of course, frequently the decision on how to cluster is made during the database design phase when few, if any, SQL statements or access patterns are readily available. So the DBA sometimes makes a knee-jerk decision to just cluster based on the primary key (PK) and be done with it. This is usually not a good idea.
Why is Clustering by PK not a very good idea?
Think about what we just said about sequentially accessing data. How often is data sequentially accessed by PK? Sometimes this might be the case, maybe if you are producing a report in order by PK, but even that example is not a very good one. Think about it. If you are producing a customer report you probably do not create it in CUSTNO order but probably in order by customer name (or something more interesting to humans than a key number).
So the best option is to figure out the most common and important sequential access patterns and cluster accordingly. But what if this information is not available? My assertion is that clustering by Foreign Key (FK), instead of by PK, makes a lot more sense. Usually a FK is on the many side of a one-to-many relationship. Think about what that means. When you join PK to FK there will be one PK row being joined to multiple FK rows. If those multiple FK rows are all on the same page (or on contiguous pages) due to clustering, then we can reduce I/O.
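To make this concrete, here is a hedged sketch using hypothetical CUST and ORDER_TBL tables, where CUSTNO is the PK of CUST and a FK in ORDER_TBL:

CREATE INDEX XORD1
  ON ORDER_TBL (CUSTNO ASC)
  CLUSTER;

SELECT C.CUSTNO, C.CUST_NAME, O.ORDERNO, O.ORDER_DATE
FROM   CUST      C,
       ORDER_TBL O
WHERE  C.CUSTNO = O.CUSTNO;

Because the order rows are clustered by CUSTNO, the many orders for each customer tend to sit on the same (or contiguous) pages, so the join reads fewer pages than it would if the orders were scattered in, say, ORDERNO sequence.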
The Bottom Line
The bottom line here is that you should think about (and even re-think) your general assumptions and why you make any database design decision. Sometimes your immediate gut reaction won’t be the right one. And it is always better to think with our heads instead of our guts, right?
Wednesday, February 17, 2016
The Most Misunderstood Features of DB2 – Part 4: Base Table Views
Welcome to my continuing series focusing on The Most Misunderstood Features of DB2. Today’s topic is an oldie but a goodie: base table views.
What is a Base Table View?
I am going to assume that regular readers of this blog know what a view is (if not, go here). So what is a base table view? It is a view that contains all of the columns and all of the rows of the base table and nothing else. In other words, it is like a SELECT * against the base table with no WHERE clause.
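For illustration only (remember, I am arguing against the practice), such a view might look like this for a hypothetical EMP table:

CREATE VIEW EMP_VIEW
  AS SELECT *
     FROM EMP;

Programs would then be granted access to EMP_VIEW rather than EMP.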
Now why would somebody create such a beast? This gets to the heart of the misunderstanding here. There is a continuing notion “out there” that it is a good idea to create a base table view for every table and to give programmers access to the base table view only, not the base table itself.
The reason given for doing so is to insulate programs from change. But this is a weak argument at best. I admit that this particular misunderstanding is less prevalent than it was in the earlier years of DB2. But I do still find shops adhering to this archaic, and poorly founded, idea.
Reasons Given for Base Table Views
One reason given by base table view proponents is that when you add a column to a table you do not have to change the program. But this is the case, too, if you code your programs appropriately, avoiding SELECT * and coding only the specific columns you need to access. Most shops do this even when using base table views.
Other reasons given revolve around removing columns or splitting tables. If you are interested in all of the reasons and rebuttals I urge you to read my article (PDF) on this topic from 1991 titled One View Per Base Table? Don’t Do It! (Wow! This is a quarter of a century old now!)
Another View Fallacy
I recall yet another misunderstanding about views from many years ago that I have not heard in a long time. Basically, the argument was that accessing a base table view would out-perform accessing the base table.
This was never proven to me – and I do not believe it was ever true. How could it be? Static SQL against a base table view would use view merge so it would be the same as going against the base table. And dynamic SQL would have the added step of having to resolve the view to the base table. Sooo…
Summary
Base table views are not worth the time and effort because they bring no added value of any type whatsoever. Do not use them.
Wednesday, February 10, 2016
The Most Misunderstood Features of DB2 – Part 3: Nulls
Welcome to Part 3 in my on-going blog series on The Most Misunderstood Features of DB2. You can find the first two parts here: Part 1 (on locking) and Part 2 (OPTIMIZE FOR v. FETCH FIRST). Today’s topic is one that confuses many SQL developers: nulls.
What is a Null?
A null represents missing or unknown information at the column level. When a column is set to null, it can mean one of two things: the attribute is not applicable for certain occurrences of the entity, or the attribute applies to all entity occurrences, but the information may not always be known. Of course, it could be a combination of these two situations, too.
A null is not the same as 0 or blank. Null means no information has been specified for the column and it implies that the value is either unknown or not applicable.
Because DB2 supports null you can distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) and an unknown or inapplicable entry (NULL for both numerical and character columns). Null indicates that the user did not explicitly make an entry or has explicitly entered NULL for the column. For example, a null “value” in the Price column of the ITEM table in some database does not mean that the item is being given away for free; instead it means that the price is not known or has not yet been set.
Nulls sometimes are inappropriately referred to as “null values.” Using the term value to describe a null is inaccurate because a null implies the lack of a value. Therefore, it is a good practice to simply use the term null or nulls (without appending the term “value” or “values” to it).
A Few Examples
When are nulls useful? Well, defining a column as NULL provides a placeholder for data you might not yet know. For example, when a new employee is hired and is inserted into the EMP table, what should the employee termination date column be set to? I don’t know about you, but I wouldn’t want any valid date to be set in that column for my employee record. Instead, null can be used to specify that the termination date is currently unknown.
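A minimal sketch, assuming a hypothetical EMP table with a nullable TERM_DATE column (the column names here are made up for illustration):

INSERT INTO EMP
  (EMPNO, LASTNAME, HIRE_DATE, TERM_DATE)
VALUES
  ('000340', 'JACKSON', CURRENT DATE, NULL);

The termination date is explicitly set to NULL because it is unknown at hire time; it can be updated to an actual date if and when the employee leaves.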
Let’s consider another example. Suppose that we also capture employees’ hair color when they are hired. Consider three potential entity occurrences: a man with black hair, a woman with unknown hair color, and a bald man. The woman with the unknown hair color and the bald man both could be assigned as null, but for different reasons. The woman’s hair color would be null meaning presently unknown; the bald man’s hair color could be null too, in this case meaning not applicable. But there are probably better methods of dealing with the bald man’s hair color.
How could you handle bald without using nulls? You could create special values for the HairColor column that mean “bald” and “unknown.” This is possible for a CHAR column like HairColor, but what about a DB2 DATE column? All occurrences of a column assigned as a DATE data type are valid dates. It might not be possible to use a special date value to mean “unknown.” This is where using a null is most practical.
DB2 does not differentiate between nulls that signify unknown data and those that signify inapplicable data. This distinction must be made by the program logic of each application.
Keep in mind, though, that using null to indicate “not applicable” can be an indication of improper database design. By properly modeling and normalizing your data structures you can usually eliminate the need to use nulls to indicate that a column is inapplicable for a specific row.
DB2 and Indicator Variables
DB2 represents null in a special “hidden” column known as an indicator variable. An indicator variable is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to the end user, but must be managed when programming in a host language (such as Java, COBOL or PL/I).
The null indicator is used by DB2 to track whether its associated column is null or not. A positive value or a value of 0 means the column is not null and any actual value stored in the column is valid. If a CHAR column is truncated on retrieval because the host variable is not large enough, the indicator value will contain the original length of the truncated column. A negative value indicates that the column is set to null. If the value is -2 then the column was set to null as the result of a data conversion error.
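Here is a hedged, COBOL-flavored sketch of how a program might retrieve a nullable column along with its indicator (the host variable and indicator names are made up for illustration):

EXEC SQL
  SELECT PRICE
  INTO   :HV-PRICE :HV-PRICE-IND
  FROM   ITEM
  WHERE  ITEMNO = :HV-ITEMNO
END-EXEC.

After the statement runs, the program checks HV-PRICE-IND: a value of zero or greater means HV-PRICE contains a valid price, while a negative value means the column is null and HV-PRICE should not be used.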
Let’s take a moment to clear up one common misunderstanding right here: nulls NEVER save storage space in DB2 for z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.
DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though.
Every column defined to a DB2 table must be designated as either allowing or disallowing nulls. A column is defined as nullable – meaning it can be set to NULL – in the table creation DDL. Null is the default if nothing is specified after the column name. To prohibit the column from being set to NULL you must explicitly specify NOT NULL after the column name. In the following sample table, COL1 and COL3 can be set to null, but not COL2, COL4, or COL5:
CREATE TABLE SAMPLE1
(COL1 INTEGER,
COL2 CHAR(10) NOT NULL,
COL3 CHAR(5),
COL4 DATE NOT NULL WITH DEFAULT,
COL5 TIME NOT NULL);
In SELECT statements, testing for null is accomplished differently than testing for other “values.” You cannot specify WHERE COL = NULL, because this does not make any sense. Remember, null is a lack of a value, so the column does not equal anything. Instead, you have to code WHERE COL IS [NOT] NULL. In INSERT statements NULL can be specified in the VALUES clause to indicate that a column is to be set to NULL; and in UPDATE statements you can use the equals sign (=) to assign a column to NULL. When inserting data, if the user fails to make an entry in a column that allows nulls, DB2 supplies the NULL as the default (unless another default value exists). If an attempt to insert NULL is made against a column defined as NOT NULL, the statement will fail.
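A few quick examples against the SAMPLE1 table above (just sketches to show the syntax):

SELECT COL2, COL3
FROM   SAMPLE1
WHERE  COL1 IS NULL;

INSERT INTO SAMPLE1
  (COL1, COL2, COL3, COL4, COL5)
VALUES
  (NULL, 'ABC', NULL, CURRENT DATE, CURRENT TIME);

UPDATE SAMPLE1
SET    COL3 = NULL
WHERE  COL2 = 'ABC';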
More Confusion
OK, now that we have discussed what nulls are and the basics of using nulls in DB2, let’s look at how the presence of nulls can confuse SQL developers.
The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit rows containing columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. You must be aware of which columns can be null as well as which functions operate on columns versus rows to make sure you code correctly!
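For example, assuming a hypothetical EMP table where BONUS is nullable:

SELECT AVG(BONUS)            AS AVG_BONUS,
       SUM(BONUS) / COUNT(*) AS SUM_OVER_ROWS
FROM   EMP;

If the table holds 100 employees but only 80 of them have a non-null BONUS, AVG(BONUS) divides by 80 while SUM(BONUS)/COUNT(*) divides by 100, so the two results will differ.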
It is important to understand, too, that for comparisons in a SELECT statement, two null columns are not considered to be equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls that are encountered cause the comparison to evaluate to UNKNOWN. But ORDER BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and handle them accordingly.
Additionally, when a column set to null participates in a calculation, the result is null. This is true even if the calculation is NULL/0 (which, as every math-savvy person knows, should be an error). But because there is a null in the calculation, the result will be null!
Let’s look at another statement that confuses a lot of people:
SELECT SUM(SALARY)
FROM EMP
WHERE JOBCODE > 999;
Assume that SALARY is defined as NOT NULL. Furthermore, assume that the largest JOBCODE is 500 (that is, less than the 999 we are checking for). What is the result of this SQL?
No rows satisfy the predicate. As such, many people say the result should be 0. But it is not. The result is null! Yes, you can get a null result when performing functions on a non-nullable column. The predicate resolves to the empty set and the sum of the salaries in the empty set is null because there are no employees, and hence no salaries. Therefore, you had better make sure that you code a null indicator in your program when you write code like this.
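If you do not want to deal with a null indicator for a case like this, one hedged alternative is to wrap the function in COALESCE (or VALUE) so that a substitute value is returned instead:

SELECT COALESCE(SUM(SALARY), 0)
FROM   EMP
WHERE  JOBCODE > 999;

Now the statement returns 0 rather than null when no rows qualify.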
The Bottom Line
Nulls are confusing, but cannot be completely avoided in DB2. Take the time to learn how nulls work and how they impact your coding and development practices… or you will certainly make mistakes.
----------------------------------------------------------------------------------------------------------
Additional resources for understanding nulls and their usage:
Tuesday, February 02, 2016
The Most Misunderstood Features of DB2 – Part 2: Optimize vs. Limited Fetch
Welcome to Part 2 in my on-going blog series on The Most Misunderstood Features of DB2. In Part 1 of the series we tackled the topic of locking, which IMHO is easily the most misunderstood feature of DB2 (probably of most DBMSes).
Today's topic is a brief one, but one that I've found folks to be confused about. Namely, the difference between the OPTIMIZE FOR and FETCH x ROWS ONLY clauses.
The OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY clauses are similar when it comes to influencing the optimizer. Both provide help to the optimizer about the scale of the query (that is, the number of rows to expect when optimizing the access path to the data).
But, the two clauses have different purposes. If all you want to do is influence the optimizer then you should use OPTIMIZE FOR n ROWS as its usage does not limit the number of rows that can be fetched when the statement is executed. For example, you might specify OPTIMIZE FOR 1 ROW on a SELECT statement. But if 500 rows qualify then you can FETCH all 500 rows at run time.
(Note: the specific numbers used in the previous paragraph are not important; it is important only that you know the number of rows specified in the OPTIMIZE clause does not inhibit the number of rows that can later be accessed by the query/program.)
The FETCH FIRST n ROWS ONLY clause is different. It actually limits the number of rows that can be fetched when the query is run. If you specify the clause FETCH FIRST 5 ROWS ONLY, then only 5 rows can be fetched - even if 100 or 1000 or more actually qualify. And yes, the number of rows you specify for FETCH FIRST will be provided to the optimizer for access path formulation, but it also will change the way the query behaves when it runs.
So, when you write a SELECT statement that qualifies a large number of rows, but you only need a few rows at run time, it is a good idea to use FETCH FIRST n ROWS ONLY. On the other hand, if you are simply trying to tweak query performance by getting the optimizer to consider a different number of qualifying rows, then you should consider using OPTIMIZE FOR n ROWS.
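A quick, hedged side-by-side (using a hypothetical EMP table) to make the difference concrete:

SELECT EMPNO, LASTNAME
FROM   EMP
ORDER BY LASTNAME
OPTIMIZE FOR 1 ROW;

SELECT EMPNO, LASTNAME
FROM   EMP
ORDER BY LASTNAME
FETCH FIRST 5 ROWS ONLY;

The first statement only nudges the optimizer toward an access path that returns the first row quickly; the program can still fetch every qualifying row. The second statement actually stops after five rows, no matter how many qualify.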
The two statements are quite different and it is important that developers not confuse the two!
Tuesday, January 26, 2016
The Most Misunderstood Features of DB2 – Part 1: Locking
Today I am introducing a new series of blog posts here on misunderstood DB2 features and functions. But before I start this blog post I want to emphasize that this is just my opinion. I’m sure many of you have your own ideas of the DB2 features that are most misunderstood. But please, take a moment to consider my thoughts here… and then share your own in the comments section below!
Locking!
One of the most misunderstood features of DB2 is how locking works. This is probably to be expected because database locking is a rather complex topic and it is constantly morphing with each and every new version of DB2.
First of all, you have to understand the lock options available and how DB2 locks at each level: table space, table, page and row. You need to understand share and exclusive locks and that the manner in which DB2 takes them depends upon the SQL statements you issue, the program bind options you choose, and even, in some cases, the DDL options you’ve chosen.
And more. What about claims and drains? Do you know the difference between locks and latches? What about the difference between lock suspension, timeouts, and deadlocks?
Then there is the whole issue of lock avoidance and what that means. I visit a lot of shops each year and I still see a lot of ambiguous cursors. Simply adding FOR READ ONLY to all of the cursors in your program where you will not be modifying the data can make a world of performance difference because DB2 can avoid locking.
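For example, here is a hedged sketch of a read-only cursor declaration (the table and host variable names are hypothetical):

EXEC SQL
  DECLARE CSR1 CURSOR FOR
    SELECT CUSTNO, CUST_NAME
    FROM   CUST
    WHERE  REGION = :HV-REGION
    FOR READ ONLY
END-EXEC.

Without the FOR READ ONLY (or FOR FETCH ONLY) clause, the cursor is ambiguous and DB2 has to assume the program might update through it, which limits its ability to use lock avoidance.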
There is also the frequently-ignored need for a commit strategy. Every batch program should have a commit strategy implemented whereby modifications are committed to the database and locks are released. Failing to do this causes other programs to time out waiting for locks. I wrote about this in the past and I call it Bachelor Programming Syndrome.
Another locking issue is the whole dirty read everywhere problem. Programmers learn that they can improve performance by adding WITH UR to the end of the SQL statements in their programs. UR, Uncommitted Read – aka dirty read – causes DB2 to not take locks. But that can cause problems because the program is reading uncommitted data: the data it is reading may never actually exist in the database (it could be rolled back) and may not be consistent because the program is not reading only committed units of work. I’ve visited shops that use WITH UR everywhere and they insist that they have not encountered any problems. To which I always add, “…yet.”
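The clause itself is trivial to code; it is the consequences that require thought. A hedged example (the ACCT table and its columns are hypothetical):

SELECT COUNT(*)
FROM   ACCT
WHERE  STATUS = 'PENDING'
WITH UR;

This count can include rows from in-flight units of work that are later rolled back. That may be fine for a rough dashboard number, but it is not fine for anything that has to balance to the penny.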
Then there are the new features added in the last few releases of DB2, including improved lock efficiencies when accessing the DB2 Catalog, the ability to access currently committed data, optimistic locking, skipping locked data, and more.
Summary
Yes, it is easy to misunderstand locking, and many do so. Taking the time to bolster your knowledge of how DB2 locks, the locking options available, and how they work can do you and your shop a world of good.
And be sure to check back here for future blogs about misunderstood DB2 features!
Monday, January 11, 2016
The Impact of Data Volume on Operational Databases
Operational databases are growing in size for many reasons, not the least of which is the growing importance of big data and analytics projects. There is the overarching trend of more and more data being generated every year. But also, there is the growing need to store more data for longer periods of time due to regulatory and compliance issues. Some organizations and businesses have encountered the need to store certain types of data for 100 years or more (as this video and this storage project point out).
But I doubt that I really need to convince you that your databases are growing in size. Most DBAs experience the reality of data growth every day.
As data volumes expand, it impacts operational databases in two ways:
- additional data stresses transaction processing and can cause performance slowdowns, and
- database administration tasks are negatively impacted.
In terms of performance, the more data in the operational database, the less efficient transactions running against that database tend to be. Table scans must reference more pages of data to return a result. Indexes grow in size to support larger data volumes, causing access by the index to degrade because there are more levels to traverse to return an answer. Such performance impacts are causing many companies to seek solutions that offload older data to either reference databases or to archive data stores.
The other impact, database administration complexity, causes longer processing times and outages to perform such functions as backups, unloads, reorganizations, recoveries, and disaster recoveries. The larger the underlying data sets for your tables and table spaces, the longer it takes to run administrative utilities for them. In many cases the lengthened outages can become unacceptable, causing companies to again seek ways to lighten up the operational databases... or perhaps acquire next generation utility technology that understands the reality of large DB2 database objects.
But even though we want to keep all of that additional data, there is no reason that it necessarily has to be stored in operational databases that run the business. For many reasons, you probably want to separate active data from historical data.
Some companies create purge jobs for all (or many) of their tables to remove data from the production databases as it ages. This can be an acceptable approach to reduce the size of your operational databases. But it also means that the data, which you might want to keep for analytical purposes, is lost. Another approach is to archive the data. Archiving data and purging data are two different processes. When data is purged, it is removed from the operational database and discarded. But archived data is removed from the operational database and maintained in an archive data store. The archive might be a flat file, another relational table, or HDFS (using Hadoop).
The bottom line is that it makes sense for us, as DBAs, to keep an eye on the size of our operational databases and take action when production workload is impacted.
Monday, January 04, 2016
A Lot of Extraneous Data Sets?
In a recent blog post here I talked about a quick and dirty method of converting your partitioned table spaces from index-controlled to table-controlled. If you haven't read that post, take a moment to click over and read it here: Easily Convert to Table-Controlled Partitioning.
The reason I bring this up today is that I received an interesting e-mail from a long-time friend and DB2 DBA who read the post and had some information to share with me. He told me about how his organization used one of my tips to drop unused indexes as part of this process.
He told me that during the conversion process they dropped a lot of the clustering indexes because they weren't being used for access paths or for uniqueness. And they were able to release an "astonishing 4,100 data sets" by doing so!
Now I am not suggesting that every shop will be able to experience a similar savings, but if you have indexes that have no purpose other than enforcing index-controlled partitioning, it is time to bite the bullet and drop those indexes as you convert to table-controlled partitioning (and then on to Universal table spaces).
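If you want to hunt for candidates, one approach (hedged, because real-time statistics column names and behavior vary by DB2 version) is to query the LASTUSED column of SYSIBM.SYSINDEXSPACESTATS for indexes that have not been used in a long time:

SELECT CREATOR, NAME, LASTUSED
FROM   SYSIBM.SYSINDEXSPACESTATS
WHERE  LASTUSED < CURRENT DATE - 1 YEAR
ORDER BY LASTUSED;

Treat the result as a list of candidates to investigate, not a drop list; an index that has not been used in a year might still be needed by an annual job or for uniqueness.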
And when you convert, please drop a note here on the blog to let us know how your conversion efforts went!