Thursday, November 06, 2008

Data Driven: A Great New Book on Data Quality Issues

If you are at all involved in assuring the quality of your company’s data you need to know the work of Thomas C. Redman. Dr. Redman has been working on improving data quality for years and he has written numerous articles and books on the subject. His latest book, Data Driven: Profiting From Your Most Important Business Asset is another winner.

Redman offers the basic thesis of the book right there on page one, where he states “…bad data lie at the root of issues of international importance, including the current subprime mortgage meltdown, lost and stolen identities, hospital errors and contested elections.” After laying down the problem, the rest of the book tells us what we need to do to correct the problems.

Data Driven will help you to improve the methods you deploy for the care and feeding of your data and information; in other words, helping you to control and manage data using similar processes and controls that you deploy on your other assets (finances, people, structures, etc.) – a noble goal, indeed!

The writing is concise and snappy – you won’t get bored reading this book. The style is engaging and it is easy to read. For example, instead of just saying what to do and how to do it, which can be boring, Redman discusses many of the arguments people use to say that data quality is impossible, and then debunks them showing that data quality is possible, if approached properly and thoroughly.

There are many good ideas, charts, and graphs in Data Driven, too. One of my favorites is on page 54, where you can find a chart of the ten habits followed by those with the best data. If you buy this book, make a poster-sized photocopy of that page and hang it up on the wall of the break room and in the data folks’ cubicles. Maybe the habits will rub off on everyone as they gaze upon them everywhere.

But the best little gem in this wonderful book is the entirety of the last chapter, which is titled “The Next One Hundred Days.” In this chapter Dr. Redman offers what he calls a hundred-day panorama. It is not a grand plan because most will not have the depth of understanding required to create such a plan and have it succeed. Instead, the panorama strives for breadth, not depth, with a focus on quality. Diligent readers can follow the guidance in this chapter and thereby begin the long-term process of appreciating the importance of data quality on their business practices.

And that alone is worth the price of the book… but, of course, Data Driven offers much more and I recommend it to every IT and business professional whose job relies on accurate data.

Monday, November 03, 2008

On Date Formats, Part 2

Here is a follow-up question and answer based on my previous blog post:


Q: My format does not fit into any of the formats listed in the DB2 manuals. What if I have a DATE stored like YYYYMMDD (with no dashes or slashes) and I want to compare it to a DB2 date?


A: Okay, let's look at one potential solution to your problem (and then I want to briefly talk about the use of proper data types). First of all you indicate that your date column contains dates in the following format: yyyymmdd with no dashes or slashes. You do not indicate whether this field is a numeric or character field - I will assume that it is character. If it is not, you can use the CHAR function to convert it to a character string.


Then, you can use the SUBSTR function to break the character column apart into the separate components, for example SUBSTR(column,1,4) returns the year component, SUBSTR(column,5,2) returns the month, and SUBSTR(column,7,2) returns the day.


Then you can concatenate all of these together into a format that DB2 recognizes, for example, the USA format which is mm/DD/yyyy. This can be done as follows:


SUBSTR(column,5,2) || "/" || SUBSTR(column,7,2) || 
"/" || SUBSTR(column,1,4)


Then you can use the DATE function to convert this character string into a DATE that DB2 will recognize. This is done as follows:


DATE(SUBSTR(column,5,2) || "/" || SUBSTR(column,7,2) || 
"/" || SUBSTR(column,1,4))


The result of this can be used in date arithmetic with other dates or date durations. Of course, it may not perform extremely well, but it should return the results you desire.


Now, a quick word about using proper data types. I say this all of the time, but there are many applications and implementations "out there" that do not heed the advice: it is wise to use the DATE data type when you store dates in DB2 tables. It simplifies life later on when you want to do things like formatting dates and performing date arithmetic.


Using the appropriate data type also ensures that DB2 will perform the proper integrity checks on the columns when data is entered, instead of requiring application logic to ensure that valid dates are entered.

Wednesday, October 29, 2008

On Date Formats

Regular readers of my blog know that from time to time I use the blog as a forum to answer questions I get via e-mail. Today, we address a popular theme - dealing with DB2 date data...


Q:I have a DATE column in a DB2 table, but I do not want it to display the way DB2 displays it by default. How can I get a date format retrieved from a column in a table from DB2 database in the format MM/DD/YYYY?


A:The simplest way to return a date in the format you desire is to use the built-in column function CHAR. Using this function you can convert a date column into any number of formats. The specific format you request, MM/DD/YYYY, is the USA date format. So, for example, to return the date in the format you requested for a column named START_DATE you would code the function as follows:


CHAR(START_DATE,USA)

The first argument is the column name and the second argument is the format. Consult the following table for a list of the date formats that are supported by DB2.

Name

Layout

Example

ISO

yyyy-mm-dd

2002-10-22

USA

mm/dd/yyyy

10/22/2002

EUR

dd.mm.yyyy

22.10.2002

JIS

yyyy-mm-dd

2002-10-22

LOCAL

Locally defined layout

N/A


You may also have an installation-defined date format that would be named LOCAL. For LOCAL, the date exit for ASCII data is DSNXVDTA, the date exit for EBCDIC is DSNXVDTX, and the date exit for Unicode is DSNXVDTU.

Of course, this is a simple date question... I will follow-up with some additional date-related questions and answers in my next couple of blog posts.

Wednesday, October 22, 2008

Bad Standards

Just started a new series on bad standards over on my Data Management Today blog.

Check it out when you get a chance and share your favorite "bad standards" either here or there... or by e-mailing me.

Monday, October 20, 2008

DBA Rules of Thumb

Database administration is a very technical discipline, but it is also a discipline in which the practitioner is very visible politically within the organization. As such, DBAs should be armed with the proper attitude and knowledge before attempting to practice the discipline of database administration.

Just as important as technical acumen, though, is the ability to carry oneself properly and to embrace the job appropriately. With this in mind, I wrote a series of blog entries on DBA Rules of Thumb over at my Data Management Today blog... and I thought the information I wrote there may be helpful to my DB2 and mainframe readership here, so I'm sharing the eight rules of thumb (with links) here on my DB2 Portal blog:
  1. Document Everything!
  2. Automate Ingelligently
  3. Share
  4. Don't Panic!
  5. Focus Your Efforts
  6. Invest In Yourself
  7. Diversify
  8. Develop Business Acumen
What do you think? Did I miss anything important?

P.S. Just a reminder that I will be presenting a webinar on assuring DB2 recoverability with my colleague, Michael Figaro, this Thursday, October 23, 2008 at 10:30 Central time. If you are at all interested in the topic, be sure to register today - and attend this Thursday!