Wednesday, February 03, 2010
IBM Manages the Data Lifecycle
Not many companies today can track all of their important data and what happens to it throughout its entire lifecycle. But doing so is important. Having such a capability enables organizations to adapt and react, gaining a competitive advantage. Much can go awry as data moves throughout an organization. Schemas change, policies change, regulations evolve, programs change, formats change, and so on. Any of these things can cause data quality issues, which should be brought to the attention of the business analysts using the data. But how often is that done? Knowing the history of data and its related metadata can improve business processes. But it is a major task – both for businesses and for the IT vendors hoping to offer solutions.
Which brings me to today’s (February 3, 2010) announcements from IBM. Big Blue announced new data protection software and a line of consulting services and resources, and previewed information monitoring software to help organizations expand their use of trusted information to improve decision making. These moves further bolster IBM’s already formidable arsenal of data lifecycle management solutions.
The data protection announcement was for Optim Data Redaction. This solution, engineered for unstructured data such as Word documents and PDF files, automatically recognizes and removes sensitive content from documents and forms. For example, a customer’s credit score in a loan document could be hidden from an office clerk while still being visible to a loan officer. In today’s atmosphere of increasingly stringent regulations, a data redaction solution is becoming a requirement. For example, the PCI DSS industry standard dictates specific rules regarding the display of debit and credit card information on receipts and reports.
Optim Data Redaction is planned for general availability in March 2010.
The information monitoring announcement was for InfoSphere Business Monitor. This technology is based on a combination of work from IBM’s research group and technology gained when IBM acquired Guardium. Guardium is a database activity monitoring (or auditing) solution. InfoSphere Business Monitor tracks the quality and flow of an organization’s information and provides real-time alerts of potential flaws. For example, if a health insurance company was analyzing profit margins across different product lines (individual, group, HMO, Medicare, etc.), decision makers would immediately be alerted when a data feed from a specific geography was not successfully integrated.
InfoSphere Business Monitor is available as a technology preview; it is not generally available and no GA date was announced.
At the same time, IBM announced its intention to acquire Initiate Systems, a provider of data integrity software for information sharing among healthcare and government organizations. Initiate's software helps healthcare clients work more intelligently and efficiently by providing timely access to patient and clinical data. It also enables governments to share information across multiple agencies to better serve citizens. IBM plans to continue to support and enhance Initiate's technologies while helping clients take advantage of the broader IBM portfolio, specifically the Cognos and InfoSphere solutions for BI and analytics. This acquisition bolsters IBM’s data lifecycle management offerings in these verticals.
And all of today’s announcements serve to solidify IBM’s claim to the throne in the realm of information and data lifecycle management.
Tuesday, February 02, 2010
Larry Sure Knows How to Get Press
Here are some of the highlights (?) of the claims Ellison made about DB2 during a webcast last week.
Regarding TPC-C benchmarks, Ellison claims to have "(blown) the doors off of IBM. We crushed them." He went on to elaborate, saying "In a machine that took up less than 10% the floor space, of IBM's record setting computer. We ran faster, we ran a lot faster: using a tiny fraction of the floor space, a tiny fraction of the power, cost less."
First of all, technicians working in the trenches know that benchmarks are not indicative of real-life performance. That aside, it is true that Oracle currently has the leading TPC-C benchmark result. Until late in 2009, DB2 enjoyed a massive 49% lead over Oracle. Oracle's most recent result gives it a 25% lead (using more than six times as many CPU cores to do it).
Regarding the claim of using less space and power: that advantage comes from Oracle using flash memory in its benchmark configuration while comparing against an IBM benchmark that used conventional disk technology. If Oracle compared its result to an IBM system using flash memory, these claims would not stand.
Later, Ellison claimed that "SAP chooses the Oracle Database to run under SAP in almost all their large accounts." As anyone who follows the computer industry knows, this claim is rather absurd. SAP's customers choose the DBMS to run, not SAP. And if SAP had anything to say about it, it would not recommend Oracle, its biggest competitor in the commercial business applications space. Furthermore, SAP favors DB2 for its own systems: SAP itself operates more than a thousand SAP systems, and all of them run on DB2.
Perhaps the silliest of Ellison's comments is this: "The Oracle Database scales out, IBM DB2 for Unix does not. Let me see, how many servers can IBM put together for an OLTP application? Let's see, how many can they group together? Um, one. They can have up to one server attacking really big jobs. When they need more capacity, they make that server bigger. And then they take the old server out, put a bigger one in. And when you've got the biggest server, that's it. That's all they can do for OLTP." Ellison also claimed that IBM "can't scale out, they can't do cloud, they can't do clusters, they can't do any of this."
I bet this surprised a lot of DB2 users doing these very things with DB2! DB2 Parallel Edition was released in 1995, with the capability to scale to a system of over 100 Unix servers. DB2 LUW scalability is proven in many of the world's largest OLTP environments. Consider this press release talking about how DB2 LUW powers one of the largest OLTP systems in the world.
And what about that clustering claim? Evidently Mr. Ellison slept through 2009. IBM DB2 pureScale, released last year, offers powerful, efficient database clustering. For a cluster of 64 nodes, DB2 pureScale maintains 95% efficiency; at 128 nodes, it maintains 84% efficiency. Put another way, 128 nodes at 84% efficiency deliver roughly the throughput of 107 standalone nodes, with only a modest share of capacity lost to clustering overhead. This is important because if you are growing a cluster to handle bigger workloads, you want your hardware to be doing productive work, not handling system overhead. On the other hand, Oracle RAC has a 100 server limit...
Ellison made other far-out claims about IBM, such as "They're so far behind, I don't think they have any chance at all. I'm serious." He also said "They are not competitive in the database business, except on the mainframe."
If this were true, why would Ellison spend any time thinking or talking about IBM? He must be worried, IMHO. Anyone with even a cursory knowledge of the computer industry has to admire IBM. The company has led the industry in patents for the last 17 years. In 2009, IBM was awarded 4,914 patents while Oracle did not even place in the top 50 patent leaders. A search of the US Patent Office database reveals 1,588 IBM patents with "database" in the patent description, versus only 184 for Oracle.
Hyperbole is one thing, but gross inaccuracy is another. In his latest tirade, Ellison is guilty of both. Oracle makes a good DBMS... pity its CEO doesn't think it can sell it on its own merits without making up stuff about the competition.
Monday, February 01, 2010
Some New Year's Resolutions for DBAs
This is sort of a re-blogging (to coin a term). I first published this last month in the Data Management Today blog I wrote for NEON. Well, I no longer work for NEON and I'm not sure how long that blog will remain active, so I thought it might make sense to re-blog some of the pertinent entries here... so here goes with my New Year's Resolutions for DBAs blog entry...
At the beginning of every year many of us take the time to cobble together some resolutions for the coming year. We plan to lose weight, save money, stop smoking, and so on. Usually, it doesn’t take long before we’ve abandoned these resolutions. Perhaps we’d be wiser to make some business-related resolutions. With that in mind, here are some thoughts on the New Year’s resolutions you might be wise to make as a DBA in 2010.
Are you insatiably curious? A good DBA must become a jack-of-all-trades. DBAs are expected to know everything about everything -- at least in terms of how it works with databases. From technical and business jargon to the latest management and technology fads, the DBA is expected to be "in the know." So perhaps “be more curious” would be a useful DBA resolution.
Most DBAs know that private time is a luxury we cannot afford. A DBA must be prepared for interruptions at any time to answer any type of question -- and not just about databases, either. With that in mind, how are your people skills? DBAs are usually respected as database gurus, but they are also frequently criticized as curmudgeons with limited people skills. Just about every database programmer has his or her favorite DBA story. You know, those anecdotes that begin with "I had a problem..." and end with "and then he told me to stop bothering him and read the manual." DBAs simply do not have a "warm and fuzzy" image. However, this perception probably has more to do with the nature and scope of the job than with anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization. As such, you will interact with many different people and take on many different roles. To be successful, you will need an easy-going and somewhat amiable manner. So another good New Year’s resolution might be to “improve your people skills.” Take a Dale Carnegie course or start by reading Carnegie’s seminal book, How to Win Friends and Influence People.
How adaptable are you? A day in the life of a DBA is usually quite hectic. The DBA maintains production and test environments, monitors active application development projects, attends strategy and design meetings, selects and evaluates new products, and connects legacy systems to the Web. And, of course, Joe in Accounting just resubmitted that query from hell that's bringing the system to a halt. Can you do something about that? All of this can occur within a single workday. You must be able to embrace the chaos to succeed as a DBA. So a third resolution might be to “roll with the punches” better – and without complaining!
Of course, you need to be organized and capable of careful planning, too. Being able to plan for changes and implement new functionality is a key component of database administration. And although this may seem to clash with the need to be flexible and adaptable, it doesn't really. Not once you get used to it. You just need to prepare yourself to be adaptable and organize yourself to incorporate change more rapidly than others. So my final suggestion for a 2010 New Year’s resolution is to adopt a planning methodology and stick to it. Buy a planner – either electronic or paper – and use it this year. You might even consider taking a time management class.
If you keep all of these resolutions, just imagine how productive you will be in 2010. And then you can use 2011 to lose weight and save money and…
Monday, January 25, 2010
Which is better? "BETWEEN" vs. "<=" and ">="
As with most DB2 (and, indeed, IT) issues, the correct answer is "it depends!" Let's dig a bit deeper to explain what I mean.
From a maintainability perspective, BETWEEN is probably better. The BETWEEN predicate is easier to understand and code than the equivalent combination of the less than or equal to predicate (<=) and the greater than or equal to predicate (>=). In past releases, in many cases it was more efficient, too. But today the Optimizer recognizes the two formulations as equivalent and there usually is no performance benefit one way or the other. Performance reasons aside, one BETWEEN predicate is easier to understand and maintain than multiple <= and >= predicates. For this reason, I tend to favor using BETWEEN.
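For example, assuming we are querying the familiar sample EMP table and it contains a SALARY column (the table, column, and values here are purely illustrative), the following two queries are logically equivalent and today's optimizer should treat them the same:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE SALARY BETWEEN 50000.00 AND 60000.00;

SELECT EMPNO, LASTNAME
FROM EMP
WHERE SALARY >= 50000.00
AND SALARY <= 60000.00;

The BETWEEN version states the intent -- a range check on a single column -- in one predicate, which is why I find it easier to read and maintain.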
But not always. Consider the scenario of comparing a host variable to two columns. Usually BETWEEN is used to compare one column to two values, here shown using host variables:
WHERE COLUMN1 BETWEEN :HOST-VAR1 AND :HOST-VAR2
However, it is possible to use BETWEEN to compare one value to two columns, as shown:
WHERE :HOST-VAR BETWEEN COLUMN1 AND COLUMN2
This statement should be changed to:
WHERE :HOST-VAR >= COLUMN1 AND :HOST-VAR <= COLUMN2
The reason for this exception is that a BETWEEN formulation comparing a host variable to two columns is a Stage 2 predicate, whereas the preferred formulation is Stage 1. Stage 1 predicates are evaluated earlier in processing, so they filter out rows sooner and more cheaply. And we all know that Stage 1 outperforms Stage 2, right?
Remember, too, that SQL is flexible, and often the same results can be achieved using different SQL formulations. Sometimes one SQL statement will dramatically outperform a functionally equivalent SQL statement just because it is indexable and the other is not. For example, consider this SQL statement:
SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT NOT BETWEEN 'A' AND 'G';
It is not indexable because it uses the NOT BETWEEN predicate. However, if we understand the data in the table and the desired results, perhaps we can reformulate the SQL to use indexable predicates, such as:
SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT >= 'H';
Or we could code MIDINIT BETWEEN 'H' AND 'Z' in place of MIDINIT >= 'H'. Of course, for either of these solutions to work correctly we would need to know that MIDINIT never contains values that collate lower than 'A' (and, for the BETWEEN 'H' AND 'Z' version, none that collate higher than 'Z').
So, as usual, there is no one size fits all answer to the question!
Wednesday, January 20, 2010
On Specialty Processors
As many of my readers know, I write two blogs: this one, which is predominantly about DB2 for z/OS, and the Data Management Today blog, which focuses on data-related issues, database management, and industry trends.
Well, I've written a series of entries on my other blog about specialty processors (zIIPs, zAAPs, etc.) and related issues. Since DB2 for z/OS folks should find that information useful and relevant, I thought I'd write an entry pointing my readers here to the pertinent specialty processor entries at my Data Management Today blog, so here goes...
The first entry I'd like to call your attention to is titled Specialty Processors on the Mainframe. This piece is basically an introduction to the different types of specialty processors (zIIP, zAAP, IFL, ICF) and what they can be used for. It is a good place to start if you are new to specialty processors or are looking for an update.
The second entry worth a peek is titled simply zAAP on zIIP. As you may or may not know, IBM delivered the capability for zAAP work to run on a zIIP (with certain conditions). This blog entry provides a brief synopsis of the August 18, 2009 z/OS V1.11 announcement that introduced the new capability to enable zAAP-eligible workloads to run on zIIPs.
Next up is What is an Enclave? If you are working with zIIPs you have probably heard the term Enclave SRB. And if you are doing any type of distributed workload you've probably heard about enclaves, too. This blog entry is for those who are new to the term, or are confused about it. It offers an explanatory definition of the term "enclave" and points you to additional reference material if you are interested in digging deeper.
My most recent post over there (January 19, 2010), titled What is Generosity Factor?, has been a popular one. This blog entry delves into the generosity factor imposed on zIIP workloads, including definitions of generosity factor, qualified and eligible work, and a discussion of what it implies for ISV products.
Hope you find this material worthwhile... and thanks for your continued support of my blogs.