Tuesday, February 09, 2010

IBM Announces DB2 10 for z/OS Beta Program

IBM announced the beta program for the next version of DB2 today, now "officially" known as DB2 10 (no more DB2 X). It is a closed beta program that will begin on March 12, 2010. That means you have to be selected by IBM to participate.

The announcement highlighted some of the areas of improvement to be delivered by DB2 10 for z/OS, and at the top of that list, to no one's surprise, is performance. DB2 10 promises out-of-the-box CPU savings through improved operational efficiency: 5% to 10% for traditional workloads and up to 20% for nontraditional workloads.

Other areas called out by IBM in the announcement include:
  • Improved business resiliency through scalability improvements and fewer outages (planned or unplanned).
  • Improved availability through schema evolution (data definition on demand) and query performance manageability enhancements.
  • New features such as hash access, index include columns, inline large objects, parallel index updates, faster single row retrievals, work file in-memory, index list prefetch, 64-bit memory enhancements, use of the 1 MB page size of the System z10, buffer pools in memory, access path enhancements, member clustering for universal table spaces, efficient caching of dynamic SQL statements with literals, improved large object streaming, and SQL procedure language performance.
  • Rapid application and warehouse deployment for business growth including improved concurrency for data access, data management, and data definition.
  • The ability to avoid an outage by adding active log data sets to a subsystem.
  • Improved application and data warehousing support including temporal data, a 64-bit ODBC driver, currently committed locking, implicit casting (loose typing), timestamp with time zone, variable timestamp precision, moving sum, and moving average.
  • Improvements to DB2's XML support including expanded pureXML, customer-driven performance and usability requirements, schema validation in the engine, a binary XML exchange format, multiversioning, easy update of subparts of an XML document, XML support in stored procedures, user-defined functions and triggers, XML index matching with date/timestamp, and a CHECK XML utility.
  • Enhanced query and reporting facilities, including QMF V10 with over 140 new analytical functions, support for HTML, PDF, and Flash reports, and more.
So it would seem there is a lot of new functionality for us to become acquainted with. As IBM rolls out more details, and as customers begin to use the new version of DB2, we will examine some of these new features in more depth here on the DB2 Portal blog.
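To give a flavor of what is coming, here is a rough sketch of what the announced temporal data (system time) support might look like. Keep in mind that this is purely my own speculative illustration based on the announcement: the table and column names are invented, and the syntax could well change before the product ships.

CREATE TABLE POLICY
  (POLICY_NO    CHAR(10)      NOT NULL,
   COVERAGE_AMT DECIMAL(11,2) NOT NULL,
   SYS_START    TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
   SYS_END      TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
   TRANS_ID     TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
   PERIOD SYSTEM_TIME (SYS_START, SYS_END));

CREATE TABLE POLICY_HIST LIKE POLICY;

ALTER TABLE POLICY ADD VERSIONING USE HISTORY TABLE POLICY_HIST;

-- Retrieve the coverage amount as it existed at a past point in time
SELECT COVERAGE_AMT
FROM   POLICY FOR SYSTEM_TIME AS OF TIMESTAMP('2010-01-01-00.00.00')
WHERE  POLICY_NO = 'A123456789';

If it works as described, DB2 itself would maintain the history table as rows are updated or deleted, so no application code would be required to preserve the old versions of the data.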

If you are interested in the beta program, the prerequisite for DB2 10 is z/OS V1.10 (5694-A01) or later running in 64-bit mode. More information about the DB2 10 beta program is available on IBM's web site.

No GA date for DB2 10 has been announced.

Wednesday, February 03, 2010

IBM Manages the Data Lifecycle

Data lifecycle is a somewhat new-ish term, at least in terms of what I plan to talk about in this blog posting. The data lifecycle – and data lifecycle management – deals with tracking, managing, and understanding data and metadata as it flows through organizations. From its inception…whether entered by a clerk or read via a feed or loaded from an external source, etc…through its various usages…whether to conduct business, analyze trends and patterns, and so on…tracked from system to system, application to application, and user to user…and finally through its end of life.

Not many companies today can track all of their important data and what happens to it throughout its entire lifecycle. But doing so is important. Having such a capability enables organizations to adapt and react, gaining a competitive advantage. Much can go awry as data moves throughout an organization. Schemas change, policies change, regulations adapt, programs change, formats change, and so on. Any of these things can cause data quality issues, which should be brought to the attention of the business analyst using the data. But how often is this done? Knowing the history of data and its related metadata can improve business processes. But it is a major task – both for businesses and for the IT vendors hoping to offer solutions.

Which brings me to today’s (February 3, 2010) announcements from IBM. Big Blue announced new data protection software and a line of consulting services and resources, and previewed information monitoring software to help organizations expand their use of trusted information to improve decision making. These moves further bolster IBM’s already formidable arsenal of data lifecycle management solutions.

The data protection announcement was for Optim Data Redaction. This solution, engineered for unstructured data like Word documents and PDF files, automatically recognizes and removes sensitive content from documents and forms. For example, a customer’s credit scores in a loan document could be hidden from an office clerk, while still being visible to a loan officer. In today’s atmosphere of more and more stringent regulations, a data redaction solution is becoming a requirement. For example, PCI DSS industry standards dictate specific rules regarding the display of debit and credit card information on receipts and reports.
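Optim Data Redaction works on unstructured documents, but the underlying idea will be familiar to anyone who has masked structured data. As a purely illustrative sketch (the table and column names here are invented, and this is not how Optim itself is implemented), a PCI-style masking rule applied in a report query might look something like this:

-- Display only the last 4 digits of a 16-digit card number
SELECT CUST_ID,
       '****-****-****-' || SUBSTR(CARD_NO, 13, 4) AS MASKED_CARD_NO
FROM   CARD_ACCOUNT;

The value of a redaction product is doing this sort of thing automatically, driven by policy and by the role of the person viewing the document, instead of hand-coding it into every report.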

Optim Data Redaction is planned for general availability in March 2010.

The information monitoring announcement was for InfoSphere Business Monitor. This technology is based on a combination of work from IBM’s research group and technology gained when IBM acquired Guardium, a database activity monitoring (or auditing) solution provider. InfoSphere Business Monitor tracks the quality and flow of an organization’s information and provides real-time alerts of potential flaws. For example, if a health insurance company were analyzing profit margins across different product lines (individual, group, HMO, Medicare, etc.), decision makers would be alerted immediately when a data feed from a specific geography was not successfully integrated.

InfoSphere Business Monitor is available as a technology preview; it is not generally available and no GA date was announced.

At the same time, IBM announced its intention to acquire Initiate Systems, a provider of data integrity software for information sharing among healthcare and government organizations. Initiate's software helps healthcare clients work more intelligently and efficiently with timely access to patient and clinical data. It also enables governments to share information across multiple agencies to better serve citizens. IBM plans to continue to support and enhance Initiate's technologies while helping clients take advantage of the broader IBM portfolio, specifically Cognos and InfoSphere solutions for BI and analytics. This acquisition bolsters IBM’s data lifecycle management offering along these verticals.

And all of today’s announcements serve to clarify IBM’s ascent to the throne within the realm of information and data lifecycle management.

Tuesday, February 02, 2010

Larry Sure Knows How to Get Press

Going under the assumption (I assume) that no press is bad press, Oracle CEO Larry Ellison has attacked IBM's DB2... but he made several factual errors in his rant.

Here are some of the highlights (?) of the claims Ellison made about DB2 during a webcast last week.

Regarding TPC-C benchmarks, Ellison claims to have "(blown) the doors off of IBM. We crushed them." He went on to elaborate, saying "In a machine that took up less than 10% of the floor space of IBM's record-setting computer. We ran faster, we ran a lot faster: using a tiny fraction of the floor space, a tiny fraction of the power, cost less."

First of all, technicians working in the trenches know that benchmarks are not indicative of real-life performance. That aside, it is true that Oracle currently holds the leading TPC-C benchmark result. Until late in 2009, DB2 enjoyed a massive 49% lead over Oracle; Oracle's most recent results give them a 25% lead (achieved using more than six times as many CPU cores).
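Run the per-core arithmetic on those numbers: a result 25% higher achieved on more than six times the cores works out to roughly 1.25 / 6, or about a fifth of IBM's throughput per core. That is an odd definition of "crushed."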

Regarding the claim of using less space and power, this is due to Oracle using flash memory and comparing it with an IBM benchmark using conventional disk technology. If Oracle compared its benchmark to an IBM system using flash memory, these claims would not stand.

Later, Ellison claimed that "SAP chooses the Oracle Database to run under SAP in almost all their large accounts." As anyone who follows the computer industry knows, this claim is rather absurd. SAP's customers choose which DBMS to run, not SAP. And if SAP had anything to say about it, it would not recommend Oracle, its biggest competitor in the commercial business applications space. Furthermore, SAP favors DB2 for its own systems: SAP operates more than a thousand SAP systems, and all of those systems run on DB2.

Perhaps the silliest of Ellison's comments is this: "The Oracle Database scales out, IBM DB2 for Unix does not. Let me see, how many servers can IBM put together for an OLTP application? Let's see, how many can they group together? Um, one. They can have up to one server attacking really big jobs. When they need more capacity, they make that server bigger. And then they take the old server out, put a bigger one in. And when you've got the biggest server, that's it. That's all they can do for OLTP." Ellison also claimed that IBM "can't scale out, they can't do cloud, they can't do clusters, they can't do any of this."

I bet this surprised a lot of DB2 users already doing these things with DB2! DB2 Parallel Edition was released in 1995, with the capability to scale to a system of over 100 Unix servers. DB2 LUW scalability is proven in many of the world's largest OLTP environments. Consider this press release describing how DB2 LUW powers one of the largest OLTP systems in the world.

And what about that clustering claim? Evidently Mr. Ellison slept through 2009. IBM DB2 pureScale, released last year, offers powerful, efficient database clustering. For a cluster of 64 nodes, DB2 pureScale maintains 95% efficiency; at 128 nodes, it maintains 84% efficiency. This matters because, as you grow a cluster to handle bigger workloads, you want your hardware doing productive work, not handling system overhead. Oracle RAC, on the other hand, has a 100-server limit...
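To put those efficiency numbers in perspective, 84% efficiency means a 128-node pureScale cluster delivers the useful work of roughly 107 nodes (128 × 0.84), with only the remainder lost to coordination overhead.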

Ellison also made other far-out claims about IBM like "They're so far behind, I don't think they have any chance at all. I'm serious." Ellison also said "They are not competitive in the database business, except on the mainframe."

If this were true, why would Ellison spend any time thinking or talking about IBM? He must be worried, IMHO. Anyone with even a cursory knowledge of the computer industry has to admire IBM. IBM has led the industry in patents for the last 17 years. In 2009, IBM produced 4,914 patents, while Oracle did not even place in the top 50 patent leaders. And a search of the US Patent Office database reveals 1,588 IBM patents with "database" in the patent description, versus only 184 for Oracle.

Hyperbole is one thing, but gross inaccuracy is another. In his latest tirade, Ellison is guilty of both. Oracle makes a good DBMS... pity that its CEO doesn't seem to think he can sell it on its own merits without making up stuff about the competition.

Monday, February 01, 2010

Some New Year's Resolutions for DBAs

This is sort of a re-blogging (to coin a term). I first published this last month in the Data Management Today blog I wrote for NEON. Well, I no longer work for NEON and I'm not sure how long that blog will remain active, so I thought it might make sense to re-blog some of the pertinent entries here... so here goes with my New Year's Resolutions for DBAs blog entry...

At the beginning of every year, many of us take the time to cobble together some resolutions for the coming year. We plan to lose weight, save money, stop smoking, and so on. Usually, it doesn’t take long before we’ve abandoned these resolutions. Perhaps we’d be wiser to make some business-related resolutions. With that in mind, here are some thoughts on the New Year’s resolutions you might be wise to make as a DBA in 2010.

Are you insatiably curious? A good DBA must become a jack-of-all-trades. DBAs are expected to know everything about everything -- at least in terms of how it works with databases. From technical and business jargon to the latest management and technology fads, the DBA is expected to be "in the know." So perhaps “be more curious” would be a useful DBA resolution.

Most DBAs know that private time is a luxury we cannot afford. A DBA must be prepared for interruptions at any time to answer any type of question -- and not just about databases, either. With that in mind, how are your people skills? The DBA is usually respected as a database guru, but also frequently criticized as a curmudgeon with limited people skills. Just about every database programmer has his or her favorite DBA story. You know, those anecdotes that begin with "I had a problem..." and end with "and then he told me to stop bothering him and read the manual." DBAs simply do not have a "warm and fuzzy" image. However, this perception probably has more to do with the nature and scope of the job than with anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization. As such, you will interact with many different people and take on many different roles. To be successful, you will need an easy-going and somewhat amiable manner. So another good New Year’s resolution might be to “improve your people skills.” Take a Dale Carnegie course or start by reading Carnegie’s seminal book, How to Win Friends and Influence People.

How adaptable are you? A day in the life of a DBA is usually quite hectic. The DBA maintains production and test environments, monitors active application development projects, attends strategy and design meetings, selects and evaluates new products, and connects legacy systems to the Web. And, of course, Joe in Accounting just resubmitted that query from hell that's bringing the system to a halt. Can you do something about that? All of this can occur within a single workday. You must be able to embrace the chaos to succeed as a DBA. So a third resolution might be to “roll with the punches” better – and without complaining!

Of course, you need to be organized and capable of succinct planning, too. Being able to plan for changes and implement new functionality is a key component of database administration. And although this may seem to clash with the need to be flexible and adaptable, it doesn't really. Not once you get used to it. You just need to prepare yourself to be adaptable and organized, so you can incorporate change more rapidly than others. So my final suggestion for a 2010 New Year’s resolution is to adopt a planning methodology and stick to it. Buy a planner – either electronic or not – and use it this year. You might even consider taking a time management class.

If you keep all of these resolutions, just imagine how productive you will be in 2010. And then you can use 2011 to lose weight and save money and…

Monday, January 25, 2010

Which is better? "BETWEEN" vs. "<=" and ">="

This was a recent topic on the DB2-L mailing list so I thought I'd weigh in with my two cents worth on the topic.

As with most DB2 (and, indeed, IT) issues, the correct answer is "it depends!" Let's dig a bit deeper to explain what I mean.

From a maintainability perspective, BETWEEN is probably better. The BETWEEN predicate is easier to understand and code than the equivalent combination of the less-than-or-equal-to (<=) and greater-than-or-equal-to (>=) predicates. In past releases it was often more efficient, too, but today the optimizer recognizes the two formulations as equivalent and there is usually no performance benefit one way or the other. Performance aside, one BETWEEN predicate is easier to understand and maintain than multiple <= and >= predicates. For this reason, I tend to favor using BETWEEN.
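For example, using the sample EMP table (with salary values chosen arbitrarily for illustration), these two queries are logically equivalent and should be optimized the same way:

SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  SALARY BETWEEN 50000.00 AND 60000.00;

SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  SALARY >= 50000.00
AND    SALARY <= 60000.00;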

But not always. Consider the scenario of comparing a host variable to two columns. Usually BETWEEN is used to compare one column to two values, here shown using host variables:

WHERE COLUMN1 BETWEEN :HOST-VAR1 AND :HOST-VAR2

However, it is possible to use BETWEEN to compare one value to two columns, as shown:

WHERE :HOST-VAR BETWEEN COLUMN1 AND COLUMN2

This statement should be changed to:

WHERE :HOST-VAR >= COLUMN1 AND :HOST-VAR <= COLUMN2

The reason for this exception is that a BETWEEN formulation comparing a host variable to two columns is a Stage 2 predicate, whereas the preferred formulation is Stage 1. And we all know that Stage 1 outperforms Stage 2, right?

Remember, too, that SQL is flexible and the same results often can be achieved using different SQL formulations. Sometimes one SQL statement will dramatically outperform a functionally equivalent SQL statement simply because it is indexable and the other is not. For example, consider this SQL statement:

SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT NOT BETWEEN 'A' AND 'G';

It is not indexable because it uses the NOT BETWEEN predicate. However, if we understand the data in the table and the desired results, perhaps we can reformulate the SQL to use indexable predicates, such as:

SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT >= 'H';

Or we could code MIDINIT BETWEEN 'H' AND 'Z' in place of MIDINIT >= 'H'. Of course, for either of these solutions to work correctly we would need to know that MIDINIT never contains values that collate lower than 'A' (and, for the BETWEEN 'H' AND 'Z' version, none that collate higher than 'Z').
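A quick sanity check (just a simple sketch) can confirm those assumptions before rewriting the predicate:

SELECT MIN(MIDINIT), MAX(MIDINIT)
FROM   EMP;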

So, as usual, there is no one size fits all answer to the question!