The Db2 Portal Blog

Friday, October 30, 2009

IOD2009 Day Three – Malcolm Gladwell

Today’s blog entry is a little late seeing as how this is Friday and IOD is over, but I’m writing about Wednesday morning’s keynote session highlighted by Malcolm Gladwell.

For those who do not know him, Malcolm Gladwell is a Canadian journalist and author best known as the author of the books The Tipping Point (2000), Blink (2005), and Outliers (2008). I’ve read all three of them and I highly recommend that you do, too. He also has a new book, What The Dog Saw, that I bought at the airport on the way home from IOD. I hope it is as good as the other three!

Gladwell’s books deal with the unexpected implications of research in the social sciences. He spoke about some of these during his keynote session in a very entertaining and informative way. It was especially rewarding to hear him tie his messaging into the conference them of information-led transformation… and to hear him call all of the attendees mavens (read his books to understand that term).

Gladwell began his talk by noting the irony of hosting a conference dealing with information analytics in Las Vegas, of all places. You would think that the casinos might have an interest in that topic!

The major idea conveyed by Gladwell during his talk focused on change, and how it never occurs the way you think it will. He explained how radical changes happen much more quickly than we imagine. And he used a story about how the broadcast of a major prize fight transformed radio from a niche product to a transformative one.

He also talked about reframing as being necessary to elicit major changes. Prior to broadcasting the boxing match, radio was used to deliver news and classical music. But reframing it as a product for the delivery of real-time sports coverage – reframing its use – caused major transformation. Gladwell also highlighted Apple and the iPod. The iPod was not the first portable MP3 product, but it is the most successful. Because Apple simplified the interface and promoted it simply… that is, they reframed the issue!

I also enjoyed Gladwell’s story about how he purchased a laptop at CompUSA. Upon entering the store he was confronted with tables and tables of similar looking laptops and had no way to differentiate them. He tried the sales people but as anyone who ever went to CompUSA knows, they either couldn’t be found when you needed them or knew nothing about technology. So he called up his brother, who works in IT. He is not a well-known expert, but he is Malcolm’s maven – the guy he turns to for help. And he told him which laptop to purchase. Gladwell then went back to the CompUSA and he said he stood there pointing to the laptop his brother told him about. And about a half hour later the crack CompUSA salesman came to help him.

This story highlighted the concept of the maven. He also mentioned that if the executives were watching how he confidently pointed to the machine he wanted that it might have led them to erroneously believe that CompUSA needed less experienced… and, indeed, fewer sales people. Which might have led to their demise.

All in all, Gladwell was entertaining and informative – a very powerful combination.

The other highlight of the keynote session, for me at least, was that I was mentioned by name as the most prolific Twitter-er at the IOD conference. Now does that mean I was the most helpful or the most annoying… I’ll leave that to you all to decide.

Wednesday, October 28, 2009

IOD2009 Day Two

Day two of the IBM Information on Demand conference was just as informative and exciting as day one. The day kicked off with a general session titled "A New Kind of Intelligence for a Smarter Planet." The idea presented is that the world is changing. It is becoming more instrumented, interconnected, and intelligent. Basically, as Steve Mills of IBM clarified, the ability to embed intelligence into millions of things will lead the transformation to an information led smarter planet. And that this information-led transformation will create opportunities for organizations to strategically gain control of information and create a new kind of intelligence.

Expanded intelligence begins with sensors and metering, which is doable today because price points have become reasonable now. There are 1 billion transistors per human, today. And an estimated 2 billion people will soon be on the Internet. At the same time, we are moving toward one trillion connected objects (video cameras, GPS devices, healthcare instruments, and even livestock). And there are hundreds of satellites orbiting the Earth, generating terabytes of data each and every day.

As we begin to intelligently interconnect these devices and analyze their streaming data, a transformative opportunity can results...

IBM brought up three customers to talk about how their organization were helping to transform to a smarter planet. The customers came from Cardinal Health, Statoil Hydro ASA (Norway), and the Food and Drug Administration. Highlights of the panel discussion:

Security in the supply chain for food is absolutely critical and food safety is one of the most complex systems to deal with.
The US imports 50pct of its food - from over 150 different countries.
Every food supplier to US (domestic or foreign) must be registered with the FDA.
And each supplier must complete forms as to the safety of its food. This can be difficult since suppliers range from large agri-businesses to small farms many of whom do not have a computer at all. So some records are probably kept on paper in shoe boxes.
Supply chain security is very important in the health care (drugs) and oil industries too! In fact, it was discussed how each of these seemingly disparate industries face many similar challenges.
Preventing problems requires understanding risk. And complexity requires collaboration in order to succeed becase nobody has enough resources to do it all alone.
In some areas (such as remote parts of Norway) instrumentation is essential to even get the data because nobody wants to go there.
Legacy systems often are rich sources of information (hey, that probably means mainframes!)
Analytics are required for prevention of problems, but also aid in reaction to problems that actually occur too. No one prevents 100 percent of their problems.

After the panel IBM came back and wrapped it up. They mentioned how IBM was awarded the National Medal of Honor for their Blue Gene supercomputer for DNA sequencing. It was a very informative and entertaining general session.

I then attended the DB2 9.7 versus Oracle 11g Smackdown presentation. It was chock full of statistics on why IBM's DB2 is superior to Oracle in terms of cost. The presenter explained how DB2 outperforms Oracle on TPC-C benchmarks for the same test, on the same machine, at the same point in time. He cautioned folks to to read the small print details on all benchmark results... for example, if you are examining the cost of ownership double check to see whether the benchmark uses a term or full purchase license. Also, the cost of the database license depends a great deal on your ability to negotiate a discount (if the vendor will discount). And you also need to be aware of how the products are licensed. Some features are separately licensed for both DB2 and Oracle. The bottom line is that licensing can cause a more than 30 percentt swing in price performance results

But do people even believe these benchmarks any more? I don't think very many people put much stock in benchmark tests today.

The author frequently cited an independent study by ITG that compares the value proposition of DB2 9.7 versus Oracle Database 11g. You can read the study yourself at this link.

(Note to my regular z/OS readers: the previous discussion was all about DB2 LUW and not DB2 for z/OS).

I also got to attend a special briefing for bloggers on IBM's new stream computing solution, which I blogged about on my Data Management Today blog if you are interested.

And finally, I'll plug my Twitter feed again. You can follow my tweets at IOD this week (well, thru Wednesday) by following me on Twitter at http://www.twitter.com/craigmullins.

Monday, October 26, 2009

IBM IOD2009 Day One

It is Monday, October 26, 2009, and the annual IBM Information on Demand (IOD2009) conference is officially underway. Well, actually it kicked off with a bang on Sunday. The exhibition hall opened at 6:00 pm and the early goers traipsed through the vendor hall sharing stories, checking out the vnedor's wares, and looking for the latest tschotskes (the favorites seem to be a mini-book light being given out by SPSS and the nifty DB2 t-shirts being given out by SEGUS, Inc.).

But the event really does not get started (at least as far as I'm concerned) until the big Monday morning kickoff, which this year was emceed by Terry Fator (the ventriloquist impersonator who won America's Got Talent) last year. He did a very nice job, and his mimicry and ventriloquism skills are great... but I think ventriloquism works better in a more intimate setting.

Before Terry came on an IBMer shared several tidbits of data from research and from polls they have been taking at IOD. For example, did you know that 15 petabytes of data are created every day? Regarding the even, they shared that there are more overall attendees here than there were at last year's event (6,000+) and that the average distance folks traveled to attend #IOD2009 is 2500 miles. You can actually participate in the daily IOD polls at http://iodpoll.com.

In between the entertainment respites, we got to hear from several IBM folks, including Ambuj Goyal, Arvind Krishna, and Frank Kern. According to Mr. Goyal, IBM is building an information on demand software stack to deliver trusted information. This can solve a very real problem that organizations are experiencing today. You see, it seems that one in three business leaders frequently make critical decisions based on inaccurate or non-existing information. Business executives don't just need data, but trusted information that can be relied upon for analysis to make accurate business decisions.

Again, according to Goyal, he sees IBM's Information Agenda doing for trusted information what ERP did for enterprise resource processes. IBM will help move its customers from information projects to information-based architecture and operations, moving organizations from information on demand to information transformation.

Then Frank Kern hosted three IBM customers as he asked each of them to explain how they benefited from IBM's Information Management solutions. The representative from BlueCross BlueShield talked about their data warehouse with 54 millions records, which happens to be the largest healthcare data warehouse in the world. Their analytical data is made available thru a portal. Offering integrated healthcare details helps improve healthcare. And the Chevron representative spoke about how IBM has helped Chevron improve its supply chain operations using integrated information.

And Arvind Krishna, GM of the Information Management group at IBM, told about IBM's on-going investments in research and development for IOD. So far, IBM has invested over $12 billion in R+D and acquisitions.

IBM also unveiled some announcements today at IOD including several new offerings to provide organizations with analytics capabilities to make better use of their archived information and improve business processes. These offerings are part of IBM's unified archiving strategy called IBM Smart Archive. Most intriguing to me is the preview of a SaaS offering called IBM Information Archive Cloud Services. The entire press release can be read here, if you're interested in the topic.

I also attended a presentation on DB2 X which was quite interesting, too. Delivered by Jeff Josten, there was much good news for mainframe DB2 folks including improved performance, temporal support, improved utility operations, hashed data and access paths, and more. I tweeted about this quite extensively on Twitter.

In fact, if you want to follow my tweets all week (well, thru Wednesday) be sure to follow me on Twitter at http://www.twitter.com/craigmullins... even when I'm not at a conference like IOD, I regularly tweet on DB2, database, and information management topics.

Monday, October 19, 2009

Hear Ye, Hear Ye, Learn All About DB2 X for z/OS

IBM is hosting a free webinar on November 3, 2009, offering a technical preview of the next version of DB2 for z/OS, currently known as DB2 X (but we all know, err, ah, think it will be called DB2 10).

If you work with DB2 on the mainframe you will want to set aside some time to attend this informative DB2 X webinar. It will have information for DBAs, as well as for Managers, Application Architects, Developers, System Administrators, System Programmers, and System Architects... and that is just about anyone who works with DB2.

The speaker will be IBM Distinguished Engineer, Jeff Josten.

How do you participate, you may be asking? Well, that is easy. Simply click on over to DB2 X for z/OS Technical Preview and register yourself.

The webinar will be held on November 3, 2009 at 11:00 a.m. Eastern Standard Time, 4:00 p.m. UTC.

Monday, October 12, 2009

On The Importance of Choosing the Correct Data Type

Most DBAs have grappled with the pros and cons of choosing one data type versus another. Sometimes the decision is easy, whereas sometimes the decision is a little bit more difficult. In today's blog entry, we'll discuss both situations.

Data type and length are the most fundamental integrity constraints applied to data in a database. Simply by specifying the data type for each column when a table is created, DB2 automatically ensures that only the correct type of data is stored in that column. Processes that attempt to insert or update the data to a non-conforming value will be rejected. Furthermore, a maximum length is assigned to the column to prohibit larger values from being stored in the table.

The DBA must choose the data type and length of each column wisely. The general rule of thumb for choosing a data type for the columns in your tables is to choose the data type that most closely matches the domain of correct values for the column. That means you should try to adhere to the following rules:

If the data is numeric, favor SMALLINT, INTEGER, BIGINT, or DECIMAL data types. DECFLOAT and FLOAT are also options for very large numbers.
If the data is character, use CHAR or VARCHAR data types.
If the data is date and time, use DATE, TIME, and TIMESTAMP data types.
If the data is multimedia, use GRAPHIC, VARGRAPHIC, BLOB, CLOB, or DBCLOB data types.

But, of course, there are always exceptions. For example, what about social security number? That is a numeric column, but you won't be performing calculations using it. And, to format it correctly requires hyphens, which cannot be stored in a numeric data type. So, perhaps, SSN is an exception.

Leading Zeroes

Let's consider another typical situation: the desire to display leading zeroes. Maybe the application requires a four-byte code that is used to identify products, accounts, or some other business object, and all of the codes are numeric and will stay that way. But, for reporting purposes, the users want the codes to print out with leading zeroes. So, the users request that the column be defined as CHAR(4) to ensure that leading zeroes are always shown. But what are the drawbacks of doing this?

Well, there are drawbacks! Without proper edit checks, inserts and updates could place invalid alphabetic characters into the product code. This can be a very valid concern if ad hoc data modifications are permitted. This is rare in production databases, but data problems can still occur if the proper edit checks are not coded into every program that can modify the data. But let's assume that proper edit checks are coded and will never be bypassed. This removes the data integrity question (yeah, right!).

There is another problem that is related to performance and filter factors. Consider the possible number of values that a CHAR(4) column and a SMALLINT column can assume. Even if edit checks are coded for each, DB2 is not aware of these and assumes that all combinations of characters are permitted. DB2 uses base 37 math when it determines access paths for character columns, under the assumption that 26 alphabetic letters, 10 numeric digits, and a space will be used. This adds up to 37 possible characters. For a four-byte character column there are 374 or 1,874,161 possible values.

A SMALLINT column can range from -32,768 to 32,767 producing 65,536 possible small integer values. The drawback here is that negative or 5 digit product codes could be entered. However, if we adhere to our proper edit check assumption, the data integrity problems will be avoided here, as well. And we can always code a simple CHECK constraint to ensure that the code is always greater than zero.

DB2 will use the HIGH2KEY and LOW2KEY values to calculate filter factors. For character columns, the range between HIGH2KEY and LOW2KEY is larger than numeric columns because there are more total values. The filter factor will be larger for the numeric data type than for the character data type which may influence DB2 to choose a different access path. For this reason, favor the SMALLINT over the CHAR(4) definition.

The leading zeroes problem might be able to be solved using other methods. If you are using a report writer, most of them have quick methods of displaying leading zeroes. When using QMF, you can ensure that leading zeroes are shown by using the "J" edit code. Report programs can be coded to display leading zeroes easily enough by moving the host variables to appropriate display fields. Ad hoc access through other reporting tools typically provide a parameter that can enable leading zeroes to be displayed.

In general, it is wise to choose a data type which is closest to the domain for the column. IF the column is to store numeric data, favor choosing a numeric data type: SMALLINT, INTEGER, DECIMAL, or floating point. In addition, always be sure to code appropriate edit checks to ensure data integrity.

A DATE with DB2

What about dates? In my opinion, you should always use a DATE data type for date data. Why anyone would ever store a date or time in a DB2 table any other format than DATE, TIME, or TIMESTAMP is beyond me. But, oh, they do it all the time. And it causes all sorts of headaches. The benefits of using the proper DB2 data type are many, including:

Ensuring data integrity because DB2 will ensure that only valid date and time values are stored
The ability to use date and time arithmetic
A vast array of built-in functions to operate on and transform date and time values
Multiple formatting choices

Additionally, you have multiple built-in functions at your disposal for manipulating date and time data, but only when the data is stored as DATE, TIME, and TIMESTAMP data types. The time-related DB2 functions include CHAR, DATE, DAY, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, DAYS, HOUR, JULIAN_DAY, MICROSECOND, MIDNIGHT_SECONDS, MINUTE, MONTH, QUARTER, SECOND, TIME, TIMESTAMP, WEEK, WEEK_ISO, and YEAR.

I get questions all the time from folks who've stored dates using character data types; they ask all kinds of integrity and manipulation questions and I always, always, always start my reply with "You shouldn't have done that!"... and then I try to help them out, if I can. Don't fall into the trap of storing dates using anything but a DATE data type... you'll thank me later.

Summary

There is a lot more that can be said about choosing appropriate data types, but I think we've covered enough for today. The bottom line on data types is to use them to protect the integrity of your DB2 data and to simplify your job by taking advantage of DB2’s built-in capabilities. By choosing that data type that most closely matches your data you will be doing yourself, your systems, and your users a big favor.