Thursday, April 22, 2010

The Ever-Changing Role of the DBA

Defining the job of DBA is getting to be increasingly difficult. Oh, most people know the rudimentary aspects of the job, namely keeping your organization's databases and applications running up to par. The DBA has to be the resident DBMS expert (whether that is DB2, Oracle, or SQL Server, or most likely a combination of those). He or she has to be able to solve thorny performance problems, ensure backups are taken, recover and restore data when problems occur, make operational changes to database structures and, really, be able to tackle any data-related issue that arises.

The technical duties of the DBA are numerous. These duties span the realm of IT disciplines from database design to physical implementation and consistent, on-going monitoring of the database environment.

DBAs must possess the abilities to create, interpret, and communicate a logical data model and to create an efficient physical database design from a logical data model and application specifications. There are many subtle nuances involved that make these tasks more difficult than they sound. And this is only the very beginning. DBAs also need to be able to collect, store, manage, and query data about the data (metadata) in the database and disseminate it to developers that need the information to create effective application systems. This may involve repository management and administration duties, too.
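
At its simplest, even the DB2 catalog can serve as a rudimentary metadata source. Here is a quick sketch of a query that documents the columns of a table (shown against the DB2 9 sample EMP table; substitute your own creator and table names):

-- List the columns, data types, and nullability for a given table
SELECT NAME, COLNO, COLTYPE, LENGTH, SCALE, NULLS, REMARKS
FROM   SYSIBM.SYSCOLUMNS
WHERE  TBCREATOR = 'DSN8910'
  AND  TBNAME    = 'EMP'
ORDER BY COLNO;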

After a physical database has been created from the data model, the DBA must be able to manage that database once it has been implemented. One major aspect of this management involves performance management. A proactive database monitoring approach is essential to ensure efficient database access. The DBA must be able to utilize the monitoring environment, interpret its statistics, and make changes to data structures, SQL, application logic, and the DBMS subsystem to optimize performance. And systems are not static; they can change quite dramatically over time. So the DBA must be able to predict growth based on application and data usage patterns and implement the necessary database changes to accommodate the growth.

And performance management is not just managing the DBMS and the system. The DBA must understand the SQL used to access relational databases. Furthermore, the DBA must be able to review SQL and host language programs and to recommend changes for optimization. As databases are implemented with triggers, stored procedures, and user-defined functions, the DBA must be able to design, debug, implement, and maintain these code-based database objects as well.
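
For instance, here is a minimal sketch of one such code-based object, a trigger that enforces a business rule by rejecting overly large salary increases (the table, column, and trigger names are hypothetical):

-- Reject any salary increase of more than 20 percent (hypothetical rule)
CREATE TRIGGER SAL_ADJ
  NO CASCADE BEFORE UPDATE OF SALARY ON PROD.EMP
  REFERENCING OLD AS O NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN (N.SALARY > O.SALARY * 1.20)
  BEGIN ATOMIC
    SIGNAL SQLSTATE '75001' ('Salary increase exceeds 20 percent');
  END

(If you create a trigger like this through SPUFI or DSNTEP2, remember to change the SQL terminator because of the semicolon embedded in the trigger body.)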

Additionally, data in the database must be protected from hardware, software, system, and human failures. DBAs must be able to implement an appropriate database backup and recovery strategy based on data volatility and application availability requirements. Backup and recovery is only a portion of the data protection story, though. DBAs must be able to design a database so that only accurate and appropriate data is entered and maintained - this involves creating and managing database constraints in the form of check constraints, rules, triggers, unique constraints, and referential integrity.
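
As a simple illustration, the kinds of constraints mentioned above are declared right in the DDL. This is only a sketch with hypothetical table, column, and constraint names:

-- Entity integrity, a domain rule, and referential integrity declared in the DDL
CREATE TABLE PROD.EMP
  (EMPNO    CHAR(6)       NOT NULL,
   DEPTNO   CHAR(3)       NOT NULL,
   SALARY   DECIMAL(9,2)  NOT NULL,
   CONSTRAINT EMP_PK   PRIMARY KEY (EMPNO),
   CONSTRAINT SAL_CHK  CHECK (SALARY >= 0),
   CONSTRAINT DEPT_FK  FOREIGN KEY (DEPTNO)
        REFERENCES PROD.DEPT (DEPTNO) ON DELETE RESTRICT);

Depending on how the table is created, a unique index on EMPNO may also be needed to enforce the primary key.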

DBAs also are required to implement rigorous security schemes for production and test databases to ensure that only authorized users have access to data. As industry and governmental regulations multiply, the need to audit who did what to which data when is also a requirement for sensitive data in production systems – and the DBA must be involved in ensuring data auditability without impacting availability or performance.
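
To make that concrete, here is a minimal sketch of the kind of SQL involved; the table name and authorization ID are hypothetical:

-- Remove blanket access, then grant only the privileges that are needed
REVOKE ALL ON TABLE PROD.CUSTOMER FROM PUBLIC;
GRANT SELECT, UPDATE ON TABLE PROD.CUSTOMER TO CLAIMSUSR;

-- Ask DB2 to record changes to this sensitive table when the audit trace is active
ALTER TABLE PROD.CUSTOMER AUDIT CHANGES;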

And there is more! The DBA must possess knowledge of the rules of relational database management and the implementation of many different DBMS products. Also important is the ability to accurately communicate this knowledge to others. This is not a trivial task because each DBMS differs from the others and many organizations have multiple DBMS products (e.g., DB2, Oracle, SQL Server).

Remember, too, that the database does not exist in a vacuum. It must interact with other components of the IT infrastructure. As such, the DBA must be able to integrate database administration requirements and tasks with general systems management requirements and tasks such as network management, production control and scheduling, and problem resolution, to name just a few systems management disciplines. The capabilities of the DBA must extend to the applications that use databases, too. This is particularly important for complex ERP systems that interface differently with the DBMS. The DBA must be able to understand the requirements of the application users and to administer their databases to avoid interruption of business. This includes understanding how any ERP packages impact the business and how the databases used by those packages differ from traditional relational databases.

But Things Are Changing

So at a high level, DBAs are tasked with managing and assuring the integrity and efficiency of database systems. But keep in mind, too, that there are actually many different DBAs. Some focus on logical design; others focus on physical design; some DBAs specialize in building systems and others specialize in maintaining and tuning systems; and there are specialty DBAs and general-purpose DBAs. Truly, the job of DBA encompasses many roles.

Some organizations choose to split DBA responsibilities into separate jobs. Of course, this occurs most frequently in larger organizations, because smaller organizations often cannot afford the luxury of having multiple, specialty DBAs.

Still other companies simply hire DBAs to perform all of the tasks required to design, create, document, tune, and maintain the organization’s data, databases, and database management systems.

But no matter what "type" of DBA you happen to be, chances are that your role is changing and adapting to new types of computing and data requirements. Indeed, one of the biggest challenges for DBAs these days is the ongoing redefinition of the job roles and responsibilities.

The primary role of database "custodian," of course, continues to be the main emphasis of the job. But that is no longer sufficient for most organizations. The DBA is expected to take on numerous additional -- mostly technical -- roles. These can include writing application code, managing the application server, enterprise application integration, managing Web services, network administration and more.

If you compare the job description of DBAs across several organizations, it is likely that no two of them would match exactly. This is both good and bad. It is good because it continually challenges the technically-minded employees who tend to become DBAs. But it can be bad, too; because the job differs so much from company to company, it becomes more difficult to replace a DBA who leaves or retires. And no one can deny that database administration is a full-time, stressful job all on its own. But the stress level just keeps increasing as additional duties get tacked onto the DBA's "to do" list.

Summary

There are many jobs that DBAs perform and it can be confusing when you try to match a job title up against the responsibilities of the job. Don't let your job title keep you from expanding into other, related disciplines. The more you know and the more you can do, the more employable you become... and that is important in this day and age!

Thursday, April 01, 2010

Nominate Someone for the CA IDUG Award for Outstanding Work in DB2

As many of you know, each year CA sponsors an award at IDUG to honor outstanding work with DB2. The only requirement for the award is that you and your company have used DB2, either on the mainframe or on a distributed system, in a novel, ground-breaking, or cutting-edge manner.

There is no requirement that an entrant organization be based in North America. There is also no requirement that an entrant organization license or use specific CA or other vendor database management tools, or that nominees attend the IDUG conference in person. All nominations will be judged by a panel of independent DB2 consultants in conjunction with CA executives.

Nominations are now open through April 26, 2010. To learn more, visit ca.com/awards/db2.

The winner will be announced at the IDUG North America conference, May 10-14, 2010.

Each winner will receive:

  • One complimentary full IDUG conference pass for any IDUG Europe or IDUG North America conference held prior to December 2011
  • Half-day consulting engagements by each of the consultant judges within 12 months of award presentation
  • Expense reimbursement of up to $1,500 for either (a) the winner’s travel to an IDUG conference held prior to December 2011 or (b) travel by one of the DB2 consultants for on-site provision of the complimentary consulting services described above
  • Recognition plaque
Submit your project today and get recognized for your outstanding work. Visit ca.com/awards/db2.

Wednesday, March 17, 2010

What is Production Data?

I received an interesting e-mail recently that made me stop and think a bit... so I thought I'd blog about it. Basically, the e-mail posed the question in the title of this blog entry – “What is production data?”

The e-mail read as follows:


I'm looking for a one paragraph definition of "production data". What do you think of this: "Production data is data recorded for the purpose of controlling/managing/reporting/researching events, processes or states."

I'm trying to get around the belief that data recorded by a development team to manage its projects and resources is somehow less than production data. To me it should be regarded as the development team's "production data" and so I'm looking for a definition that satisfactorily encompasses that belief, as well as encompassing regular business production data.


You know, I do not recall ever seeing an actual definition of the term “production data.” The above definition is a good starting point, but I do not think it is complete. The author of the e-mail makes a good point about different types of production data. The data used by an application development team to conduct their business (writing computer programs to support business processes) is definitely production data… to the application development team.

Here is my take on a definition:

  • Production data is information that is persistently stored and used by professionals to conduct business processes. It must be accurate, documented, and managed on an on-going basis to ensure its value to the organization.

I say information instead of data because the data must be defined and in context in order to be useful for production work. And I say persistent because even though there may be many forms of transitory data used by production processes, it is the data that is stored over periods of time that needs to be managed.

I think this definition should serve the needs of the e-mailer... and more. What do you think?

Did I miss anything?

Friday, March 05, 2010

Mainframes: The Safe IT Career Choice

A recent Computerworld article (Bank of America touts mainframe work as a safe career) positions the mainframe as a safe haven for those considering a career in IT. This is an interesting article because the usual spiel you hear in industry trade rags is that the mainframe is dying and only a fool would work on such a platform. It is good to hear an alternate opinion on the matter in a journal as respected as Computerworld. (Of course, the fact that I agree with this opinion might have a little something to do with my cheer upon reading the article.)

One of the highlights of this particular article is the discussion of available mainframe jobs at sites such as Monster (764 jobs over 30 days) and Dice.com (1,200 ads over 30 days). These are significant numbers of jobs, especially in a down economy.

Another interesting tidbit from this piece is that "IBM says its mainframe revenue has grown in eight of the last 13 quarters." This is impressive, considering the difficult server market coupled with the impression that the platform is dying.

Speaking of the death of the mainframe, don't you believe it for a minute. People have been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM's PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash... the list is endless.

Some may counter that they recall reading about companies that were going to eliminate their mainframe. Well, yes, I'm sure you do remember those; I do, too. But do you recall reading many articles about companies that SUCCESSFULLY eliminated their mainframes? Many tried, few succeeded. Indeed, the re-Boot Hill web site provides examples of companies that tried to eliminate the mainframe but could not (hence, they had to re-boot). If you follow the link to the re-Boot Hill site, click on the little tombstones to read the stories of failure.

So, the mainframe is a rock-solid platform, continues to grow, and is producing a significant number of job opportunities... what is not to like?

Tuesday, February 09, 2010

IBM Announces DB2 10 for z/OS Beta Program

IBM announced the beta program for the next version of DB2 today, now "officially" known as DB2 10 (no more DB2 X). It is a closed beta program that will begin on March 12, 2010. That means you have to be selected by IBM to participate.

The announcement highlighted some of the areas of improvement to be delivered by DB2 10 for z/OS, and at the top of that list, to no one's surprise, is performance. DB2 10 promises to deliver out-of-the-box CPU savings of 5% to 10% for traditional workloads and up to 20% for nontraditional workloads.

Other areas called out by IBM in the announcement include:
  • Improved business resiliency through scalability improvements and fewer outages (planned or unplanned).
  • Improved availability through schema evolution (data definition on demand) and query performance manageability enhancements.
  • New features such as hash access, index include columns, inline large objects, parallel index updates, faster single row retrievals, work file in-memory, index list prefetch, 64-bit memory enhancements, use of the 1 MB page size of the System z10, buffer pools in memory, access path enhancements, member clustering for universal table spaces, efficient caching of dynamic SQL statements with literals, improved large object streaming, and SQL procedure language performance.
  • Rapid application and warehouse deployment for business growth including improved concurrency for data access, data management, and data definition.
  • The ability to avoid an outage by adding active log data sets to a subsystem.
  • Improved application and data warehousing support including temporal data, a 64-bit ODBC driver, currently committed locking, implicit casting or loose typing, timestamp with time zone, variable timestamp precision, moving sum, and moving average.
  • Improvements to DB2's XML support including expanded pureXML, customer-driven performance and usability requirements, schema validation in the engine, binary XML exchange format, multiversioning, easy update of subparts of XML document, stored procedures, user-defined functions and triggers, XML index matching with date/timestamp, and a CHECK XML utility.
  • Enhanced query and reporting facilities, including QMF V10 with over 140 new analytical functions, support for HTML, PDF, and Flash reports, and more.
So it would seem that there is a lot of new functionality for us to become acquainted with. As IBM rolls out more details, and customers begin to use the new version of DB2, we will examine some of these new features in more depth here on the DB2 Portal blog.

If you are interested in the beta program, the prerequisite for DB2 10 is z/OS V1.10 (5694-A01) or later running in 64-bit mode. More information about the DB2 10 beta program is available on IBM's web site.

No GA date for DB2 10 has been announced.

Wednesday, February 03, 2010

IBM Manages the Data Lifecycle

Data lifecycle is a somewhat new-ish term, at least in terms of what I plan to talk about in this blog posting. The data lifecycle – and data lifecycle management – deals with tracking, managing, and understanding data and metadata as it flows through organizations. From its inception…whether entered by a clerk or read via a feed or loaded from an external source, etc…through its various usages…whether to conduct business, analyze trends and patterns, and so on…tracked from system to system, application to application, and user to user…and finally through its end of life.

Not many companies today can track all of their important data and what happens to it throughout its entire lifecycle. But doing so is important. Having such a capability enables organizations to adapt and react, gaining a competitive advantage. Much can go awry as data moves throughout an organization. Schemas change, policies change, regulations adapt, programs change, formats change, and so on. Any of these things can cause data quality issues, which should be brought to the attention of the business analyst using the data. But how often is this done? Knowing the history of data and its related metadata can improve business processes. But it is a major task – both for businesses and IT vendors hoping to offer solutions.

Which brings me to today’s (February 3, 2010) announcements from IBM. Big Blue announced new data protection software and a line of consulting services and resources, and previewed information monitoring software to help organizations expand their use of trusted information to improve decision making. These moves further bolster IBM’s already formidable arsenal of data lifecycle management solutions.

The data protection announcement was for Optim Data Redaction. This solution, engineered for unstructured data like Word documents and PDF files, automatically recognizes and removes sensitive content from documents and forms. For example, a customer’s credit scores in a loan document could be hidden from an office clerk, while still being visible to a loan officer. In today’s atmosphere of more and more stringent regulations, a data redaction solution is becoming a requirement. For example, PCI DSS industry standards dictate specific rules regarding the display of debit and credit card information on receipts and reports.

Optim Data Redaction is planned for general availability in March 2010.

The information monitoring announcement was for InfoSphere Business Monitor. This technology is based on a combination of work from IBM’s research group and technology gained when IBM acquired Guardium. Guardium is a database activity monitoring (or auditing) solution. InfoSphere Business Monitor tracks the quality and flow of an organization’s information and provides real-time alerts of potential flaws. For example, if a health insurance company was analyzing profit margins across different product lines (individual, group, HMO, Medicare, etc.), decision makers would immediately be alerted when a data feed from a specific geography was not successfully integrated.

InfoSphere Business Monitor is available as a technology preview; it is not generally available and no GA date was announced.

At the same time, IBM announced its intention to acquire Initiate Systems, a provider of data integrity software for information sharing among healthcare and government organizations. Initiate's software helps healthcare clients work more intelligently and efficiently with timely access to patient and clinical data. It also enables governments to share information across multiple agencies to better serve citizens. IBM plans to continue to support and enhance Initiate's technologies while helping clients take advantage of the broader IBM portfolio, specifically Cognos and InfoSphere solutions for BI and analytics. This acquisition bolsters IBM’s data lifecycle management offering along these verticals.

And all of today’s announcements serve to clarify IBM’s ascent to the throne within the realm of information and data lifecycle management.

Tuesday, February 02, 2010

Larry Sure Knows How to Get Press

Going under the assumption (I assume) that no press is bad press, Oracle CEO Larry Ellison has attacked IBM's DB2... but he made several factual errors in his rant.

Here are some of the highlights (?) of the claims Ellison made about DB2 during a webcast last week.

Regarding TPC-C benchmarks, Ellison claims to have "(blown) the doors off of IBM. We crushed them." He went on to elaborate saying "In a machine that took up less than 10% the floor space, of IBM's record setting computer. We ran faster, we ran a lot faster: using a tiny fraction of the floor space, a tiny fraction of the power, cost less."

First of all, technicians working in trenches know that benchmarks are not indicative of real life performance. That aside, it is true that Oracle currently has the leading TPC-C benchmark result. Until late in 2009, DB2 enjoyed a massive 49% lead over Oracle. Oracle's most recent results give them a 25% lead (using more than six times as many CPU cores to do it).

Regarding the claim of using less space and power, this is due to Oracle using flash memory and comparing it with an IBM benchmark using conventional disk technology. If Oracle compared its benchmark to an IBM system using flash memory, these claims would not stand.

Later, Ellison claimed that "SAP chooses the Oracle Database to run under SAP in almost all their large accounts." As anyone who follows the computer industry knows, this claim is rather absurd. SAP's customers choose the DBMS to run, not SAP. And if SAP had anything to say about it, they would not recommend Oracle, their biggest competitor in the commercial business applications space. Furthermore, SAP favors DB2 for their own systems. They operate more than a thousand SAP systems, and all of those systems run on DB2.

Perhaps the silliest of Ellison's comments is this: "The Oracle Database scales out, IBM DB2 for Unix does not. Let me see, how many servers can IBM put together for an OLTP application? Let's see, how many can they group together? Um, one. They can have up to one server attacking really big jobs. When they need more capacity, they make that server bigger. And then they take the old server out, put a bigger one in. And when you've got the biggest server, that's it. That's all they can do for OLTP." Ellison also claimed that IBM "can't scale out, they can't do cloud, they can't do clusters, they can't do any of this."

I bet this surprised a lot of DB2 users doing these things with DB2! DB2 Parallel Edition was released in 1995, along with the capability to scale to a system of over 100 Unix servers. DB2 LUW scalability is proven in many of the world's largest OLTP environments. Consider this press release talking about how DB2 LUW powers one of the largest OLTP systems in the world.

And what about that clustering claim? Evidently Mr. Ellison slept through 2009. IBM DB2 pureScale, released last year, offers powerful, efficient database clustering. For a cluster of 64 nodes, DB2 pureScale maintains 95% efficiency. At 128 nodes, DB2 pureScale maintains 84% efficiency. This is important because if you are growing a cluster to handle bigger workloads, you want your hardware to be doing productive work, not handling system overhead. On the other hand, Oracle RAC has a 100 server limit...

Ellison also made other far-out claims about IBM like "They're so far behind, I don't think they have any chance at all. I'm serious." Ellison also said "They are not competitive in the database business, except on the mainframe."

If this were true, why would Ellison spend any time thinking or talking about IBM? He must be worried, IMHO. Anyone with even a cursory knowledge of the computer industry has to admire IBM. They have led the industry in developing patents for the last 17 years. In 2009, IBM produced 4,914 patents while Oracle did not even place in the top 50 patent leaders. A search of the US Patent Office database reveals 1,588 IBM patents with "database" in the patent description, while Oracle produced only 184 such patents.

Hyperbole is one thing, but gross inaccuracy is another. In his latest tirade, Ellison is guilty of both. Oracle makes a good DBMS... pity its CEO doesn't think he can sell it on its own merits without making up stuff about the competition.

Monday, February 01, 2010

Some New Year's Resolutions for DBAs

This is sort of a re-blogging (to coin a term). I first published this last month in the Data Management Today blog I wrote for NEON. Well, I no longer work for NEON and I'm not sure how long that blog will remain active, so I thought it might make sense to re-blog some of the pertinent entries here... so here goes with my New Year's Resolutions for DBAs blog entry...

At the beginning of every year many of us take the time to cobble together some resolutions for the coming year. We plan to lose weight, save money, stop smoking, and so on. Usually, it doesn’t take long before we’ve abandoned these resolutions. Perhaps we’d be wiser to make some business related resolutions. With that in mind, here are some thoughts on the New Year’s resolutions you might be wise to make as a DBA in 2010.

Are you insatiably curious? A good DBA must become a jack-of-all-trades. DBAs are expected to know everything about everything -- at least in terms of how it works with databases. From technical and business jargon to the latest management and technology fads, the DBA is expected to be "in the know." So perhaps “be more curious” would be a useful DBA resolution.

Most DBAs know that private time is a luxury we cannot afford. A DBA must be prepared for interruptions at any time to answer any type of question -- and not just about databases, either. With that in mind, how are your people skills? DBAs are usually respected as database gurus, but they are also frequently criticized as curmudgeons with limited people skills. Just about every database programmer has his or her favorite DBA story. You know, those anecdotes that begin with "I had a problem..." and end with "and then he told me to stop bothering him and read the manual." DBAs simply do not have a "warm and fuzzy" image. However, this perception probably has more to do with the nature and scope of the job than with anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization. As such, you will interact with many different people and take on many different roles. To be successful, you will need an easy-going and somewhat amiable manner. So another good New Year’s resolution might be to “improve your people skills.” Take a Dale Carnegie course or start by reading Carnegie’s seminal book, How to Win Friends and Influence People.

How adaptable are you? A day in the life of a DBA is usually quite hectic. The DBA maintains production and test environments, monitors active application development projects, attends strategy and design meetings, selects and evaluates new products and connects legacy systems to the Web. And, of course: Joe in Accounting just resubmitted that query from hell that's bringing the system to a halt. Can you do something about that? All of this can occur within a single workday. You must be able to embrace the chaos to succeed as a DBA. So a third resolution might be to “roll with the punches” better – and without complaining!

Of course, you need to be organized and capable of succinct planning, too. Being able to plan for changes and implement new functionality is a key component of database administration. And although this may seem to clash with the need to be flexible and adaptable, it doesn't really. Not once you get used to it. You just need to prepare yourself to be adaptable and organize yourself to incorporate change more rapidly than others. So my final suggestion for a 2010 New Year’s resolution is to adopt a planning methodology and stick to it. Buy a planner – either electronic or not – and use it this year. You might even consider taking a time management class.

If you keep all of these resolutions, just imagine how productive you will be in 2010. And then you can use 2011 to lose weight and save money and…

Monday, January 25, 2010

Which is better? "BETWEEN" vs "<=" and ">="

This was a recent topic on the DB2-L mailing list so I thought I'd weigh in with my two cents worth on the topic.

As with most DB2 (and, indeed, IT) issues, the correct answer is "it depends!" Let's dig a bit deeper to explain what I mean.

From a maintainability perspective, BETWEEN is probably better. The BETWEEN predicate is easier to understand and code than the equivalent combination of the less than or equal to predicate (<=) and the greater than or equal to predicate (>=). In past releases, in many cases it was more efficient, too. But today the Optimizer recognizes the two formulations as equivalent and there usually is no performance benefit one way or the other. Performance reasons aside, one BETWEEN predicate is easier to understand and maintain than multiple <= and >= predicates. For this reason, I tend to favor using BETWEEN.
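
For instance, given a simple numeric column (hypothetical name), the following two predicates return the same result, and the optimizer treats them the same way:

WHERE COL1 BETWEEN 10 AND 20

WHERE COL1 >= 10 AND COL1 <= 20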

But not always. Consider the scenario of comparing a host variable to two columns. Usually BETWEEN is used to compare one column to two values, here shown using host variables:

WHERE COLUMN1 BETWEEN :HOST-VAR1 AND :HOST-VAR2

However, it is possible to use BETWEEN to compare one value to two columns, as shown:

WHERE :HOST-VAR BETWEEN COLUMN1 AND COLUMN2

This statement should be changed to

WHERE :HOST-VAR >= COLUMN1 AND :HOST-VAR <= COLUMN2

The reason for this exception is that a BETWEEN formulation comparing a host variable to two columns is a Stage 2 predicate, whereas the preferred formulation is Stage 1. And we all know that Stage 1 outperforms Stage 2, right?

Remember too, that SQL is flexible and often the same results can be achieved using different SQL formulations. Sometimes one SQL statement will dramatically outperform a functionally equivalent SQL statement just because it is indexable and the other is not. For example, consider this SQL statement

SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT NOT BETWEEN 'A' AND 'G';

It is not indexable because it uses the NOT BETWEEN predicate. However, if we understand the data in the table and the desired results, perhaps we can reformulate the SQL to use indexable predicates, such as

SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
FROM EMP
WHERE MIDINIT >= 'H';

Or we could code MIDINIT BETWEEN 'H' AND 'Z' in place of MIDINIT >= 'H'. Of course, for either of these solutions to work correctly we would need to know that MIDINIT never contained values that collate lower than the value 'A'.

So, as usual, there is no one size fits all answer to the question!

Wednesday, January 20, 2010

On Specialty Processors

Today's blog entry is kind of a meta entry.

As many of my readers know, I write two blogs: this one, which is predominantly about DB2 for z/OS, and the Data Management Today blog, which focuses on data-related issues, database management, and industry trends.

Well, I've written a series of entries on my other blog about specialty processors (zIIPs, zAAPs, etc.) and related issues. Since DB2 for z/OS folks should find that information useful and relevant, I thought I'd write an entry pointing my readers here to the pertinent specialty processor entries at my Data Management Today blog, so here goes...

The first entry I'd like to call your attention to is titled Specialty Processors on the Mainframe. This piece is basically an introduction to the different types of specialty processors (zIIP, zAAP, IFL, ICF) and what they can be used for. It is a good place to start if you are new to specialty processors or are looking for an update.

The second entry worth a peek is titled simply zAAP on zIIP. As you may or may not know, IBM delivered the capability for zAAP work to run on a zIIP (with certain conditions). This blog entry provides a brief synopsis of the August 18, 2009 z/OS V1.11 announcement that introduced the new capability to enable zAAP-eligible workloads to run on zIIPs.

Next up is What is an Enclave? If you are working with zIIPs you have probably heard the term Enclave SRB. And if you are doing any type of distributed workload you've probably heard about enclaves, too. This blog entry is for those who are new to the term, or are confused about it. It offers an explanatory definition of the term "enclave" and points you on to additional reference material for those interested.

My most recent post over there (January 19, 2010), titled What is Generosity Factor?, has been a popular one. This blog entry delves into the generosity factor imposed upon zIIP workloads, including definitions of generosity factor, qualified and eligible work, and a discussion of what it implies for ISV products.

Hope you find this material worthwhile... and thanks for your continued support of my blogs.

Monday, January 11, 2010

Evaluating a DBA Job Offer

Today's blog entry is for DBAs. Although this blog is focused on mainframe DB2, the advice here applies to any DBA job... so pass the link on to your Oracle and SQL Server friends...


As a DBA, it is almost inevitable that you will change jobs several times during your career. When making a job change, you will obviously consider requirements such as salary, bonus, benefits, frequency of reviews, and amount of vacation time. However, you also should consider how the company treats their DBAs. Different organizations place different value on the DBA job. It is imperative to your career development that you scout for progressive organizations that understand the complexity and ongoing learning requirements for the position.

Here are some useful questions to ask:

Does the company offer regular training for its DBAs to learn new DBMS features and functionality?

As a DBA you need to be well-versed on the latest and greatest features of the DBMSs you manage. And, on average, there will be a new version to contend with every 18 to 24 months. It is possible to learn the basics by reading the WHAT'S NEW manual and skimming through the voluminous, additional manuals, but some formal training is warranted to get the most out of a new version of the DBMS.


What about training for related technologies such as programming, networking, e-business, transaction management, message queueing, and the like?

DBAs are also called upon to administer more than just databases these days. A top notch employer will allow its DBAs to be trained in new technologies... as well as time for independent learning through reading books and articles.


Does the company allow DBAs to regularly attend local user groups? What about annual user groups at remote locations?

User groups are essential for networking with others who perform the same, or similar, job duties. By attending local user group meetings you can not only get inexpensive training by watching the presentations, but you can also exchange ideas with your peers.


Are there backup DBAs, or will you be the only one on call 24/7?

Nobody wants to be the only DBA on call, every night, all the time, on weekends, holidays, etc. And if there is no backup what happens if you take a vacation? Is it really a vacation if you have to carry a company cell phone everywhere you go?


Are there data administration and system administration organizations, or are the DBAs expected to perform all of these duties, too?

DBA is a full-time job but some organizations expect the DBA staff to handle data administration and system administration duties, too. Depending on the volume of work this might not be a deal breaker, but be cautious.


Does the DBA group view its relationship with application development groups as a partnership? Or is the relationship more antagonistic?

A partnership is essential in order to produce optimally performing database applications. And if you do not have applications that perform well, then the DBA job will be burdensome.


Are DBAs included in design reviews, budgeting discussions, and other high-level IT committees and functions?

The more involved the DBA team is in the overall IT strategy the better prepared the company's databases will be to support the required work... and the easier your job will be as a DBA.


The more YES answers you get to these questions, the more progressive the DBA environment is. Be sure to ask these questions during your interview. It will show that you have experience and that you care about your career. Be sure to research the answers later, too. Ask about the company among those who used to work there and anyone you know (remember those user groups) who currently works there. Sometimes the answers given by the workers will not exactly match those given by the interviewer.

Keep in mind, too, that these are not the ONLY questions you need to ask.

And good luck with your DBA career!

Monday, December 21, 2009

Happy Holidays


Just a brief posting today to wish everyone a very happy holiday season.

I am taking some down time through the end of the year to visit family "up North."

So, until next year, may your databases run without a glitch and here's hoping you all have an enjoyable holiday!

Wednesday, December 16, 2009

Quick Thoughts on DB2 Performance

Database performance problems are not caused by magic. Indeed, all performance problems are always caused by change. That statement flies in the face of what I normally say, which is “Almost never say always or never”… but in this case, it is true.

Think about it for a moment. If everything remains stable and unchanging in your environment, then why would performance vary? That’s right, it wouldn’t.

Something tangible must change before a performance problem can be experienced. The challenge of performance tuning is to find the source of the change, gauge its impact, and formulate a solution.

Change can take many forms, including the following:
  • Physical changes to the environment, such as a new CPU, new disk devices, or different tape drives.
  • Changes to system software, such as a new release of a product (for example, WebSphere, CICS, or even z/OS), the alteration of a product (for example, the addition of more or fewer CICS regions or an IMS SYSGEN), or a new product (for example, implementation of DFHSM). Also included is the installation of a new release or version of DB2, which can result in changes in access paths as well as utilization of new features.
  • Changes to the DB2 engine from maintenance releases and PTFs, which can change the optimizer (and sometimes introduce other new functionality).
  • Changes in system capacity. More or fewer jobs could be executing concurrently when the performance problem occurs. Or additional users may be banging away at your transactions.
  • Environmental changes, such as the implementation of client/server programs, the adoption of SOA, or other new technologies.
  • Database changes. This involves changes to any DB2 object, and ranges from adding a new column or an index to dropping and re-creating an object.
  • Changes to the application development methodology, such as usage of check constraints instead of application logic or the use of stored procedures.
  • Changes to application code, both SQL and host language code (COBOL, C, Java, etc.).

Although the majority of your performance problems are likely to be application-oriented, you must be prepared to explore any and all of these other areas when application tuning has little effect.

My advice is to be sure that you institute strict change control tracking across all areas of your IT infrastructure. That way, whenever you experience a performance problem, you will be able to track what has changed recently, along with who changed it and why. This is important because every DBA knows what the answer to the question “What changed?” will be… right?

It is always “nothing!”

And that cannot be true. Oh, it does not mean that the person answering is lying. He or she may not have changed anything. And it is not necessarily reasonable to expect an application developer to know what all could have changed…especially when what can impact DB2 performance spans so many areas of the IT infrastructure.


So do yourself… and your company a favor: be sure that you meticulously track each and every change to any aspect of your systems. Then – and this is where many shops break down – make sure that you have methods of tying all of the change information together in such a way that it can be queried and examined in the face of a performance problem.
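
If no change management tooling is in place, even a simple, shared change-log table that every team records into is a start. This is just a hypothetical sketch, not a complete change management solution:

-- A simple, centralized log of changes across the IT infrastructure
CREATE TABLE ADMIN.CHANGE_LOG
  (CHANGE_TS    TIMESTAMP     NOT NULL WITH DEFAULT,
   CHANGED_BY   CHAR(8)       NOT NULL,
   AREA         VARCHAR(20)   NOT NULL,      -- e.g. DDL, PTF, APP CODE, CAPACITY
   OBJECT_NAME  VARCHAR(128),
   DESCRIPTION  VARCHAR(254)  NOT NULL);

-- What changed in the week before the performance problem appeared?
SELECT CHANGE_TS, CHANGED_BY, AREA, OBJECT_NAME, DESCRIPTION
FROM   ADMIN.CHANGE_LOG
WHERE  CHANGE_TS > CURRENT TIMESTAMP - 7 DAYS
ORDER BY CHANGE_TS DESC;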


Only then can you reasonably expect your DBAs to be able to rapidly track down and remedy DB2 performance problems… because only then will they have the pertinent information at their disposal.

Friday, December 11, 2009

A Short History of DB2 for z/OS – Part 2

Today’s blog entry is a continuation of yesterday’s post in which we began a brief review of the history of DB2 for z/OS. That post covered Versions 1 through 3; so today we pick up our historical review with Version 4.

Version 4 was a very significant milestone in the history of DB2. It was highlighted by the introduction of Type 2 indexes, which removed the need to lock index pages (or subpages, now obsolete). Prior to V4, index locking was a particularly thorny performance problem that vexed many shops. And, of course, I’d be remiss if I did not discuss data sharing, which made its debut in V4. With data sharing, DB2 achieved new heights of scalability and availability unmatched within the realm of DBMS; it afforded users the ability to upgrade without an outage and to add new subsystems to a group “on the fly.” The new capabilities did not stop there; V4 also introduced stored procedures, CP parallelism, performance improvements, and more. DB2 V4 was, indeed, a major milestone in the history of mainframe DB2.

In June 1997 DB2 Version 5 became generally available. It was the first DB2 version to be referred to as DB2 for OS/390 (previously it was DB2 for MVS). V5 was not as significant as V4; here we see the trend of even-numbered versions being bigger and more significant than odd-numbered ones (of course, this is just my opinion). V5 was touted by IBM as the e-business and BI version. It included Sysplex parallelism, prepared statement caching, reoptimization, online REORG, and conformance to the SQL-92 standard.

Version 6 brings us to 1999 and the introduction of the Universal Database term to the DB2 moniker. The “official” name of the product is now DB2 Universal Database for OS/390. And the Release Guide swelled to over 600 pages! Six categories of improvements were introduced with V6 spanning:
  • Object-relational extensions and active data
  • Network computing
  • Performance and availability
  • Capacity improvements
  • Data sharing enhancements
  • User productivity
The biggest of the new features were SQLJ, inline statistics, triggers, large objects (LOBs), user-defined functions, and distinct types.

Version 6 is also somewhat unique in that there was this “thing” typically referred to as the V6 refresh. It added functionality to DB2 without there being a new release or version. The new functionality in the refresh included SAVEPOINTs, identity columns, declared temporary tables, and performance enhancements (including star join).

March 2001 brings us to DB2 Version 7, another “smaller” version of DB2. Developed and released around the time of the Year 2000 hubbub, it offered much improved utilities and some nice new SQL functionality including scrollable cursors, limited FETCH, and row expressions. Unicode support was also introduced in DB2 V7. For a more detailed overview of V7 (and the V6 refresh) consult An Introduction to DB2 for OS/390 Version 7.

DB2 Version 8 followed, but not immediately. IBM took advantage of Y2K and the general desire of shops to avoid change during this period to take its time and deliver the most significant and feature-laden version of DB2 ever. V8 had more new lines of code than DB2 V1R1 had total lines of code!

I don’t want to get bogged down in recent history here outlining the features and functionality of DB2 releases that should be fresh in our memory (V8 and V9). If you really want details on those versions, refer to these links:

An Overview of DB2 for z/OS Version 8


DB2 9 for z/OS Features



Which brings us to today. Most shops should be either running Version 9 in production or planning their migration from V8 to V9. And we are all waiting with bated breath for DB2 X… which hopefully should be announced sometime next year.

Thursday, December 10, 2009

A Short History of DB2 for z/OS – Part 1

Let's go back in time... almost three decades ago... back to the wild and woolly 1980s! And watch our favorite DBMS, DB2, grow up over time.

Version 1 Release 1 was announced on June 7, 1983. And it became generally available on Tuesday, April 2, 1985. I wonder if it was ready on April 1st but not released because of April Fool’s Day? Any old-time IBMer out there care to comment?

Initial DB2 development focused on the basics of making a relational DBMS work. Early releases of DB2 were viewed by many as an “information center” DBMS, not for production work like IMS.

Version 1 Release 2 was announced on February 4, 1986 and was released for general availability a month later on March 7, 1986. Wow! Can you imagine waiting only a month for a new release of DB2 these days? But that is how it happened back then. Same thing for Version 1 Release 3, which was announced on May 19, 1987 and became GA on June 26, 1987. DB2 V1R3 saw the introduction of date data types.

You might notice that IBM delivered “releases” of DB2 back in the 1980s, whereas today (and ever since V3) there have only been versions. Versions are major; releases are not quite as significant.

Version 2 of DB2 became a reality in 1988. Version 2 Release 1 was announced in April 1988 and delivered in September 1988. Here we start to see the gap widening again between announcement and delivery. V2R1 was a very significant release in the history of DB2. Some mark it as the bellwether for when DB2 began to be viewed as a DBMS capable of supporting mission critical, transaction processing workloads. Not only did V2R1 provide many performance enhancements but it also signaled the introduction of declarative Referential Integrity (RI) constraints. RI was important for the acceptance of DB2 because it helps to assure data integrity within the DBMS.

No sooner had V2R1 become GA than IBM announced Version 2 Release 2 on October 4, 1988. But it was not until nearly a year later that it became generally available, on September 23, 1989. DB2 V2R2 again bolstered performance in many ways. It also saw the introduction of distributed database support (private protocol) across MVS systems.

Version 2 Release 3 was announced on September 5, 1990 and became generally available on October 25, 1991. Two very significant features were added in V2R3: segmented table spaces and packages. Segmented table spaces have become the de facto standard for most DB2 data and packages made DB2 application programs easier to support. DB2 V2R3 is also the version that beefed up distributed support with Distributed Relational Database Architecture (DRDA). Remote unit of work distribution was not available in the initial GA version, but IBM came out with RUOW support for DB2 V2R3 in March 1992.

And along comes DB2 Version 3 announced in November 1993 and GA in December 1993. Now it may look like things sped up again here, but not really. This is when the QPP program for early support of DB2 started. QPP was announced in March 1993 and delivered to customers in June 1993. Still though, fairly rapid turnaround by today’s standards, right?

V3 greatly expanded the number of bufferpool options available (from 5 pools to 80). There were many advances made in DB2 V3 to take better advantage of the System/390 environment: V3 introduced support for hardware-assisted compression and hiperpools. It was also V3 that introduced I/O parallelism for the first time.

We’ll stop here for today and continue our short history of DB2 in my next DB2Portal blog posting. See you soon...

Wednesday, December 02, 2009

Wordle of my DB2 Portal Blog

The "jumble" of words shown here is a Wordle, which is a "word cloud" of text. I fed my blog location into the Wordle generator and it created this pretty picture based on the words I most commonly use here in this blog.

The cloud gives greater prominence to words that appear more frequently in the source text. No surprise that "DB2" and "data" dominate the other words!

Monday, November 23, 2009

Reading Things That Aren't There... and Missing Things That Are!

You can shoot yourself in the foot using DB2 if you are not careful. There are options that you can specify that may cause you to read data that is not really in the database. And, alternately, you can set things up so that you miss reading data that is actually in the database.

How, you might be asking? Well, dirty reads will take care of the first one. Specifying ISOLATION(UR) implements read-through locks, which is sometimes referred to as a dirty read. It applies to read operations only. With this isolation level, data may be read that never actually exists in the database, because the transaction can read data that has been changed by another process but is not yet committed.

Read uncommitted isolation provides the highest level of availability and concurrency of the isolation levels, but the worst degree of data integrity. It should be used only when data integrity problems can be tolerated. Certain types of applications, such as those using analytical queries, estimates, and averages, are likely candidates for read uncommitted locking. A dirty read can cause duplicate rows to be returned where none exist, or no rows may be returned when one (or more) actually exists. When choosing read uncommitted isolation the programmer and DBA must ensure that these types of problems are acceptable for the application.
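
For example, here is a minimal sketch (with hypothetical table and column names) of an analytical query where approximate results are acceptable, requesting dirty reads at the statement level:

-- Averages can tolerate uncommitted data; request UR for this statement only
SELECT AVG(ORDER_AMT)
FROM   PROD.ORDERS
WHERE  ORDER_DATE > CURRENT DATE - 30 DAYS
WITH UR;

The same effect can be achieved for an entire program by binding the plan or package with ISOLATION(UR).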

OK, so what about not reading data that is in the database? DB2 V9 provides us an option to do just that. In DB2 9 it is possible for a transaction to skip over rows that are locked. This can be accomplished by means of the SKIP LOCKED DATA option within your SQL statement(s). SKIP LOCKED DATA can be specified in SELECT, SELECT INTO, and PREPARE, as well as searched UPDATE and DELETE statements. You can also use the SKIP LOCKED DATA option with the UNLOAD utility.

When you tell DB2 to skip locked data, that data is not accessed and your program will not have it available. DB2 just skips over any locked data instead of waiting for it to be unlocked. The benefit, of course, is improved performance because you will not incur any lock wait time. But it comes at the cost of not accessing the locked data at all. This means that you should only utilize this clause when your program can tolerate skipping over some data.

The SKIP LOCKED DATA option is compatible with cursor stability (CS) isolation and read stability (RS) isolation. But it cannot be used with uncommitted read (UR) or repeatable read (RR) isolation levels. DB2 will simply ignore the SKIP LOCKED DATA clause under UR and RR isolation levels.

Additionally, SKIP LOCKED DATA works only with row locks and page locks. That means that SKIP LOCKED DATA does not apply to table, partition, LOB, XML, or table space locks. And the bigger the lock size, the more data that will be skipped when a lock is encountered. With row locking you will be skipping over locked rows, but with page locking you will be skipping over all the rows on the locked page.
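
Putting that together, here is a minimal sketch (hypothetical names) of a query that can tolerate skipping locked rows:

-- Return whatever qualifying rows are not currently locked; do not wait for locks
SELECT ACCT_NO, BALANCE
FROM   PROD.ACCOUNT
WHERE  BRANCH_ID = '001'
WITH CS
SKIP LOCKED DATA;

Any row (or page, depending on the lock size) that is locked by another process at the time is simply left out of the result set.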

Use both of these features with extreme care and make sure that you know exactly what you are telling DB2 to do. Otherwise, you might be reading more... or less than you want!

Tuesday, November 17, 2009

Replacing UNION with CASE

When a UNION is required to put together data from multiple queries, you might be able to use a CASE statement instead. This is very useful, particularly when the data for each of the queries in the UNION come from the same table. The CASE statement can potentially enhance performance by minimizing the number of times the data is read.

Let’s look at an example to clarify why:

SELECT CREATOR, NAME, 'TABLE'
FROM SYSIBM.SYSTABLES
WHERE TYPE = 'T'
UNION
SELECT CREATOR, NAME, 'VIEW '
FROM SYSIBM.SYSTABLES
WHERE TYPE = 'V'
UNION
SELECT CREATOR, NAME, 'ALIAS'
FROM SYSIBM.SYSTABLES
WHERE TYPE = 'A'
ORDER BY NAME;

This simple SQL statement uses UNION to put together the results of three queries against the SYSTABLES table. The report shows all of the DB2 table-like objects that exist in the DB2 subsystem: tables, views, and aliases.

To do this, DB2 must scan through the table three times – once for each query (as there is no index on the TYPE column). But, you can use CASE and code an equivalent, but more efficient query, as follows:

SELECT CREATOR, NAME,
CASE TYPE
WHEN 'T' THEN 'TABLE'
WHEN 'V' THEN 'VIEW '
WHEN 'A' THEN 'ALIAS'
END
FROM SYSIBM.SYSTABLES
ORDER BY NAME;

This new query will need to scan SYSTABLES only once. The CASE statement will translate the code in the TYPE column into the text that we desire.

CASE statements are very powerful and you should use them when you can to create elegant and optimal SQL in your DB2 applications.

Tuesday, November 03, 2009

Deprecated Features Planned for DB2 X for z/OS

Everyone is always interested in the latest and greatest features of their favorite DBMS, in this case DB2. But sometimes features get removed from the DBMS when a new version is released. According to the IBM teleconference on DB2 X for z/OS today, there are several features planned to be deprecated (i.e. removed). Let's briefly take a look at them.

The first feature that will be removed is private protocol distribution. This should come as no surprise to anybody since IBM has been indicating that private protocol distribution was on its way out for a number of releases now. And it really is not that difficult to convert to DRDA (and there will be some new help in DSNTP2DP).

Perhaps more troublesome to some shops will be the removal of support for plans containing DBRMs. You will have to convert the DBRMs to packages when you get to DB2 X. You would think that, since packages have been available as far back as V2.3, most shops would have converted to them already. But I do know that there are some shops out there who still have DBRMs bound directly into their plans. So start planning to convert ASAP! IBM will offer some support in DB2 X to track down affected plans.

While on the topic of plans and packages, you will have to REBIND any that have not been rebound since V5 or before. But it is a good rule to REBIND everything when you go to a new version of DB2 anyway, and most (but not all) shops do that.

From a documentation perspective, BookManager will no longer be supported. Instead we get the PDF versions and the Info Center online.

And the DB2 Management Clients (DB2 Administration Server, Control Center, and Development Center) are deprecated. IBM's new management client direction for DB2 is Data Studio.

Other deprecated items include:
  • ACQUIRE(ALLOCATE) [change to ACQUIRE(USE)]
  • Workload capture through profile monitor
  • XML Extender [change to use pureXML data type]
  • DB2 MQ XML user-defined functions and stored procedures [change to XML functions]
  • msys for Setup DB2 Customization Center [change to install panels]
At this point, DB2 X is on track to become generally available at the end of next year, 2010. IBM has announced that migration will be supported from DB2 9 NFM (and has hinted that they are considering supporting migration to DB2 X directly from DB2 V8 NFM).

Monday, November 02, 2009

New DB2 Twitter List

Just a very quick post this morning to let all you DB2 Twitter folks out there know that I created a list of the DB2 tweeters I know about at http://twitter.com/craigmullins/db2-folks.

If you are a DB2 professional and I left you off the list please leave a comment here or drop me an e-mail and I'll be happy to add you.

Friday, October 30, 2009

IOD2009 Day Three – Malcolm Gladwell

Today’s blog entry is a little late seeing as how this is Friday and IOD is over, but I’m writing about Wednesday morning’s keynote session highlighted by Malcolm Gladwell.

For those who do not know him, Malcolm Gladwell is a Canadian journalist and author best known as the author of the books The Tipping Point (2000), Blink (2005), and Outliers (2008). I’ve read all three of them and I highly recommend that you do, too. He also has a new book, What The Dog Saw, that I bought at the airport on the way home from IOD. I hope it is as good as the other three!

Gladwell’s books deal with the unexpected implications of research in the social sciences. He spoke about some of these during his keynote session in a very entertaining and informative way. It was especially rewarding to hear him tie his messaging into the conference theme of information-led transformation… and to hear him call all of the attendees mavens (read his books to understand that term).

Gladwell began his talk by noting the irony of hosting a conference dealing with information analytics in Las Vegas, of all places. You would think that the casinos might have an interest in that topic!

The major idea conveyed by Gladwell during his talk focused on change, and how it never occurs the way you think it will. He explained how radical changes happen much more quickly than we imagine. And he used a story about how the broadcast of a major prize fight transformed radio from a niche product to a transformative one.

He also talked about reframing as being necessary to elicit major changes. Prior to broadcasting the boxing match, radio was used to deliver news and classical music. But reframing it as a product for the delivery of real-time sports coverage – reframing its use – caused major transformation. Gladwell also highlighted Apple and the iPod. The iPod was not the first portable MP3 product, but it is the most successful. Because Apple simplified the interface and promoted it simply… that is, they reframed the issue!

I also enjoyed Gladwell’s story about how he purchased a laptop at CompUSA. Upon entering the store he was confronted with tables and tables of similar-looking laptops and had no way to differentiate them. He tried the sales people, but as anyone who ever went to CompUSA knows, they either couldn’t be found when you needed them or knew nothing about technology. So he called up his brother, who works in IT. He is not a well-known expert, but he is Malcolm’s maven – the guy he turns to for help. And he told him which laptop to purchase. Gladwell then went back to CompUSA and, he said, stood there pointing to the laptop his brother had recommended. And about a half hour later the crack CompUSA salesman came to help him.

This story highlighted the concept of the maven. He also mentioned that if CompUSA executives had been watching how confidently he pointed to the machine he wanted, it might have led them to believe, erroneously, that CompUSA needed less experienced… and, indeed, fewer sales people. Which might have led to their demise.

All in all, Gladwell was entertaining and informative – a very powerful combination.

The other highlight of the keynote session, for me at least, was that I was mentioned by name as the most prolific Twitter-er at the IOD conference. Now does that mean I was the most helpful or the most annoying… I’ll leave that to you all to decide.

Wednesday, October 28, 2009

IOD2009 Day Two

Day two of the IBM Information on Demand conference was just as informative and exciting as day one. The day kicked off with a general session titled "A New Kind of Intelligence for a Smarter Planet." The idea presented is that the world is changing: it is becoming more instrumented, interconnected, and intelligent. Basically, as Steve Mills of IBM clarified, the ability to embed intelligence into millions of things will lead the transformation to an information-led, smarter planet. And this information-led transformation will create opportunities for organizations to strategically gain control of information and create a new kind of intelligence.

Expanded intelligence begins with sensors and metering, which is doable today because price points have become reasonable now. There are 1 billion transistors per human, today. And an estimated 2 billion people will soon be on the Internet. At the same time, we are moving toward one trillion connected objects (video cameras, GPS devices, healthcare instruments, and even livestock). And there are hundreds of satellites orbiting the Earth, generating terabytes of data each and every day.

As we begin to intelligently interconnect these devices and analyze their streaming data, a transformative opportunity can result...

IBM brought up three customers to talk about how their organizations were helping to transform to a smarter planet. The customers came from Cardinal Health, Statoil Hydro ASA (Norway), and the Food and Drug Administration. Highlights of the panel discussion:
  1. Security in the supply chain for food is absolutely critical and food safety is one of the most complex systems to deal with.
  2. The US imports 50 percent of its food - from over 150 different countries.
  3. Every food supplier to US (domestic or foreign) must be registered with the FDA.
  4. And each supplier must complete forms as to the safety of its food. This can be difficult since suppliers range from large agri-businesses to small farms many of whom do not have a computer at all. So some records are probably kept on paper in shoe boxes.
  5. Supply chain security is very important in the health care (drugs) and oil industries too! In fact, it was discussed how each of these seemingly disparate industries face many similar challenges.
  6. Preventing problems requires understanding risk. And complexity requires collaboration in order to succeed because nobody has enough resources to do it all alone.
  7. In some areas (such as remote parts of Norway) instrumentation is essential to even get the data because nobody wants to go there.
  8. Legacy systems often are rich sources of information (hey, that probably means mainframes!)
  9. Analytics are required for prevention of problems, but also aid in reaction to problems that actually occur too. No one prevents 100 percent of their problems.
After the panel, IBM came back and wrapped things up. They mentioned how IBM was awarded the National Medal of Technology and Innovation for its Blue Gene supercomputer, which has been used for DNA sequencing. It was a very informative and entertaining general session.

I then attended the DB2 9.7 versus Oracle 11g Smackdown presentation. It was chock full of statistics on why IBM's DB2 is superior to Oracle in terms of cost. The presenter explained how DB2 outperforms Oracle on TPC-C benchmarks for the same test, on the same machine, at the same point in time. He cautioned folks to read the small print details on all benchmark results... for example, if you are examining the cost of ownership, double check to see whether the benchmark uses a term or full purchase license. Also, the cost of the database license depends a great deal on your ability to negotiate a discount (if the vendor will discount). And you also need to be aware of how the products are licensed. Some features are separately licensed for both DB2 and Oracle. The bottom line is that licensing can cause a more than 30 percent swing in price/performance results.

But do people even believe these benchmarks any more? I don't think very many people put much stock in benchmark tests today.

The presenter frequently cited an independent study by ITG that compares the value proposition of DB2 9.7 versus Oracle Database 11g. You can read the study yourself at this link.

(Note to my regular z/OS readers: the previous discussion was all about DB2 LUW and not DB2 for z/OS).

I also got to attend a special briefing for bloggers on IBM's new stream computing solution, which I blogged about on my Data Management Today blog if you are interested.

And finally, I'll plug my Twitter feed again. You can follow my tweets at IOD this week (well, thru Wednesday) by following me on Twitter at http://www.twitter.com/craigmullins.

Monday, October 26, 2009

IBM IOD2009 Day One

It is Monday, October 26, 2009, and the annual IBM Information on Demand (IOD2009) conference is officially underway. Well, actually it kicked off with a bang on Sunday. The exhibition hall opened at 6:00 pm and the early goers traipsed through the vendor hall sharing stories, checking out the vendors' wares, and looking for the latest tchotchkes (the favorites seem to be a mini-book light being given out by SPSS and the nifty DB2 t-shirts being given out by SEGUS, Inc.).

But the event really does not get started (at least as far as I'm concerned) until the big Monday morning kickoff, which this year was emceed by Terry Fator (the ventriloquist impersonator who won America's Got Talent). He did a very nice job, and his mimicry and ventriloquism skills are great... but I think ventriloquism works better in a more intimate setting.

Before Terry came on, an IBMer shared several tidbits of data from research and from polls they have been taking at IOD. For example, did you know that 15 petabytes of data are created every day? Regarding the event, they shared that there are more overall attendees here than there were at last year's event (6,000+) and that the average distance folks traveled to attend #IOD2009 is 2,500 miles. You can actually participate in the daily IOD polls at http://iodpoll.com.

In between the entertainment respites, we got to hear from several IBM folks, including Ambuj Goyal, Arvind Krishna, and Frank Kern. According to Mr. Goyal, IBM is building an information on demand software stack to deliver trusted information. This can solve a very real problem that organizations are experiencing today. You see, it seems that one in three business leaders frequently make critical decisions based on inaccurate or nonexistent information. Business executives don't just need data, but trusted information that can be relied upon for analysis to make accurate business decisions.

Again, according to Goyal, he sees IBM's Information Agenda doing for trusted information what ERP did for enterprise resource processes. IBM will help move its customers from information projects to information-based architecture and operations, moving organizations from information on demand to information transformation.

Then Frank Kern hosted three IBM customers as he asked each of them to explain how they benefited from IBM's Information Management solutions. The representative from BlueCross BlueShield talked about their data warehouse with 54 million records, which happens to be the largest healthcare data warehouse in the world. Their analytical data is made available thru a portal. Offering integrated healthcare details helps improve healthcare. And the Chevron representative spoke about how IBM has helped Chevron improve its supply chain operations using integrated information.

And Arvind Krishna, GM of the Information Management group at IBM, talked about IBM's on-going investments in research and development for IOD. So far, IBM has invested over $12 billion in R&D and acquisitions.

IBM also unveiled some announcements today at IOD including several new offerings to provide organizations with analytics capabilities to make better use of their archived information and improve business processes. These offerings are part of IBM's unified archiving strategy called IBM Smart Archive. Most intriguing to me is the preview of a SaaS offering called IBM Information Archive Cloud Services. The entire press release can be read here, if you're interested in the topic.

I also attended a presentation on DB2 X which was quite interesting, too. Delivered by Jeff Josten, there was much good news for mainframe DB2 folks including improved performance, temporal support, improved utility operations, hashed data and access paths, and more. I tweeted about this quite extensively on Twitter.

In fact, if you want to follow my tweets all week (well, thru Wednesday) be sure to follow me on Twitter at http://www.twitter.com/craigmullins... even when I'm not at a conference like IOD, I regularly tweet on DB2, database, and information management topics.

Monday, October 19, 2009

Hear Ye, Hear Ye, Learn All About DB2 X for z/OS

IBM is hosting a free webinar on November 3, 2009, offering a technical preview of the next version of DB2 for z/OS, currently known as DB2 X (but we all know, err, ah, think it will be called DB2 10).

If you work with DB2 on the mainframe you will want to set aside some time to attend this informative DB2 X webinar. It will have information for DBAs, as well as for Managers, Application Architects, Developers, System Administrators, System Programmers, and System Architects... and that is just about anyone who works with DB2.

The speaker will be IBM Distinguished Engineer, Jeff Josten.

How do you participate, you may be asking? Well, that is easy. Simply click on over to DB2 X for z/OS Technical Preview and register yourself.

The webinar will be held on November 3, 2009 at 11:00 a.m. Eastern Standard Time, 4:00 p.m. UTC.

Monday, October 12, 2009

On The Importance of Choosing the Correct Data Type

Most DBAs have grappled with the pros and cons of choosing one data type versus another. Sometimes the decision is easy, whereas sometimes the decision is a little bit more difficult. In today's blog entry, we'll discuss both situations.

Data type and length are the most fundamental integrity constraints applied to data in a database. Simply by specifying the data type for each column when a table is created, DB2 automatically ensures that only the correct type of data is stored in that column. Processes that attempt to insert or update the data to a non-conforming value will be rejected. Furthermore, a maximum length is assigned to the column to prohibit larger values from being stored in the table.

The DBA must choose the data type and length of each column wisely. The general rule of thumb for choosing a data type for the columns in your tables is to choose the data type that most closely matches the domain of correct values for the column. That means you should try to adhere to the following rules (a brief DDL sketch follows the list):
  • If the data is numeric, favor SMALLINT, INTEGER, BIGINT, or DECIMAL data types. DECFLOAT and FLOAT are also options for very large numbers.
  • If the data is character, use CHAR or VARCHAR data types.
  • If the data is date and time, use DATE, TIME, and TIMESTAMP data types.
  • If the data is multimedia, use GRAPHIC, VARGRAPHIC, BLOB, CLOB, or DBCLOB data types.
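To make those rules more concrete, here is a minimal CREATE TABLE sketch; the table and column names are invented purely for illustration:

CREATE TABLE PRODUCT
(PRODUCT_ID INTEGER NOT NULL,
PRODUCT_NAME VARCHAR(50) NOT NULL,
UNIT_PRICE DECIMAL(9,2) NOT NULL,
INTRO_DATE DATE NOT NULL,
LAST_CHANGED TIMESTAMP NOT NULL);
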
But, of course, there are always exceptions. For example, what about a Social Security number? That is a numeric value, but you won't be performing calculations using it. And formatting it correctly requires hyphens, which cannot be stored in a numeric data type. So, perhaps, SSN is an exception.

Leading Zeroes

Let's consider another typical situation: the desire to display leading zeroes. Maybe the application requires a four-byte code that is used to identify products, accounts, or some other business object, and all of the codes are numeric and will stay that way. But, for reporting purposes, the users want the codes to print out with leading zeroes. So, the users request that the column be defined as CHAR(4) to ensure that leading zeroes are always shown. But what are the drawbacks of doing this?

Well, there are drawbacks! Without proper edit checks, inserts and updates could place invalid alphabetic characters into the product code. This can be a very valid concern if ad hoc data modifications are permitted. This is rare in production databases, but data problems can still occur if the proper edit checks are not coded into every program that can modify the data. But let's assume that proper edit checks are coded and will never be bypassed. This removes the data integrity question (yeah, right!).

There is another problem that is related to performance and filter factors. Consider the possible number of values that a CHAR(4) column and a SMALLINT column can assume. Even if edit checks are coded for each, DB2 is not aware of these and assumes that all combinations of characters are permitted. DB2 uses base 37 math when it determines access paths for character columns, under the assumption that 26 alphabetic letters, 10 numeric digits, and a space will be used. This adds up to 37 possible characters. For a four-byte character column there are 37 to the 4th power, or 1,874,161, possible values.

A SMALLINT column can range from -32,768 to 32,767, producing 65,536 possible small integer values. The drawback here is that negative or five-digit product codes could be entered. However, if we adhere to our proper edit check assumption, the data integrity problems will be avoided here, as well. And we can always code a simple CHECK constraint to ensure that the code is always greater than zero.
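Here is a minimal sketch of that approach; the table, column, and constraint names are hypothetical, and the BETWEEN predicate keeps the codes positive and limits them to four digits:

CREATE TABLE PRODUCT_CODES
(PROD_CODE SMALLINT NOT NULL,
PROD_DESC VARCHAR(40) NOT NULL,
CONSTRAINT CODE_RANGE CHECK (PROD_CODE BETWEEN 1 AND 9999));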

DB2 will use the HIGH2KEY and LOW2KEY values to calculate filter factors. For character columns, the range between HIGH2KEY and LOW2KEY is larger than it is for numeric columns because there are more total values. The filter factor will be larger for the numeric data type than for the character data type, which may influence DB2 to choose a different access path. For this reason, favor the SMALLINT over the CHAR(4) definition.

The leading zeroes problem can usually be solved using other methods. If you are using a report writer, most of them have quick ways of displaying leading zeroes. When using QMF, you can ensure that leading zeroes are shown by using the "J" edit code. Report programs can be coded to display leading zeroes easily enough by moving the host variables to appropriate display fields. And ad hoc access through other reporting tools typically provides a parameter that can enable leading zeroes to be displayed.
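And if you must produce the leading zeroes in SQL alone, the DIGITS built-in function can help. It returns a fixed-length character representation padded with leading zeroes (CHAR(5) for a SMALLINT), which you can then trim to the width you need. A sketch, using the hypothetical PROD_CODE column from the earlier example:

SELECT PROD_CODE,
DIGITS(PROD_CODE) AS CODE5,
SUBSTR(DIGITS(PROD_CODE), 2, 4) AS CODE4
FROM PRODUCT_CODES;

The SUBSTR simply trims off the first of the five characters, which is safe only because the CHECK constraint guarantees the codes never exceed four digits.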

In general, it is wise to choose a data type which is closest to the domain for the column. If the column is to store numeric data, favor choosing a numeric data type: SMALLINT, INTEGER, DECIMAL, or floating point. In addition, always be sure to code appropriate edit checks to ensure data integrity.

A DATE with DB2

What about dates? In my opinion, you should always use a DATE data type for date data. Why anyone would ever store a date or time in a DB2 table in any format other than DATE, TIME, or TIMESTAMP is beyond me. But, oh, they do it all the time. And it causes all sorts of headaches. The benefits of using the proper DB2 data type are many, including:
  • Ensuring data integrity because DB2 will ensure that only valid date and time values are stored
  • The ability to use date and time arithmetic
  • A vast array of built-in functions to operate on and transform date and time values
  • Multiple formatting choices
Additionally, you have multiple built-in functions at your disposal for manipulating date and time data, but only when the data is stored as DATE, TIME, and TIMESTAMP data types. The time-related DB2 functions include CHAR, DATE, DAY, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, DAYS, HOUR, JULIAN_DAY, MICROSECOND, MIDNIGHT_SECONDS, MINUTE, MONTH, QUARTER, SECOND, TIME, TIMESTAMP, WEEK, WEEK_ISO, and YEAR.
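To give a flavor of what this buys you, here is a small sketch of date arithmetic and a few of those functions run against the DSN8710.EMP sample table (the 20-year threshold is arbitrary):

SELECT EMPNO, HIREDATE,
YEAR(CURRENT DATE - HIREDATE) AS YEARS_OF_SERVICE,
DAYOFWEEK(HIREDATE) AS HIRE_DAY_OF_WEEK,
HIREDATE + 10 YEARS AS TEN_YEAR_ANNIVERSARY
FROM DSN8710.EMP
WHERE HIREDATE < CURRENT DATE - 20 YEARS;

None of this is possible, at least not without a lot of custom code, when the date is buried in a CHAR column.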

I get questions all the time from folks who've stored dates using character data types; they ask all kinds of integrity and manipulation questions and I always, always, always start my reply with "You shouldn't have done that!"... and then I try to help them out, if I can. Don't fall into the trap of storing dates using anything but a DATE data type... you'll thank me later.

Summary

There is a lot more that can be said about choosing appropriate data types, but I think we've covered enough for today. The bottom line on data types is to use them to protect the integrity of your DB2 data and to simplify your job by taking advantage of DB2’s built-in capabilities. By choosing the data type that most closely matches your data you will be doing yourself, your systems, and your users a big favor.

Friday, October 02, 2009

IDUG Europe is Right Around the Corner

Just a quick post today to remind everybody that the annual European IDUG conference will be held next week (the week of October 5, 2009) in Rome, Italy. And it is not too late to ensure that you will be there to hear the latest and greatest news, tips, tricks, and guidelines on our favorite DBMS - IBM's DB2!

For those of you not lucky enough to be there, keep an eye on my DB2portal blog here, where I will attempt to summarize the key events of the week.

And if you are a Twitter aficionado, be sure to follow me on Twitter as I will try to make regular Tweets about the event (as long as my Blackberry works in Rome).

Thursday, September 24, 2009

Limiting the Number of Rows Fetched

Application developers frequently need to retrieve a limited number of qualifying rows from a table. For example, maybe you need to list the top ten best selling items from inventory, or a list of the top five most expensive products (i.e., highest price tag). There are several ways to accomplish this prior to DB2 V7 using SQL, but they are not necessarily efficient.

The first reaction is to simply use the WHERE clause to eliminate non-qualifying rows. But this is simplistic, and often is not sufficient to produce the results desired in an optimal manner. What if the program only requires that the top ten results be returned? This can be a somewhat difficult request to formulate using SQL alone.

Consider, for example, an application that needs to retrieve only the top ten most highly paid employees from the EMP sample table. You could simply issue a SQL request that retrieves all of the employees in order by salary, but only use the first ten retrieved. That is easy; for example:

SELECT EMPNO, FIRSTNME, LASTNAME, SALARY
FROM DSN8710.EMP
ORDER BY SALARY DESC;

You must specify the ORDER BY clause with the DESC key word. This sorts the results into descending order, instead of the default, which is ascending. Without the DESC key word, the "top ten" would be at the very end of the results set, not at the beginning.

But that does not really satisfy the requirement - retrieving only the top ten. It merely sorts the results into descending sequence. So the results would still be all employees in the table, but in the correct order so you can view the "top ten" salaries very easily. The ideal solution should return only the top ten employees with the highest salary and not merely a sorted list of all employees.

You can code some "tricky" SQL to support this request for all versions of DB2, such as the following:

SELECT EMPNO, FIRSTNME, LASTNAME, SALARY
FROM DSN8710.EMP A
WHERE 10 > (SELECT COUNT(*)
FROM DSN8710.EMP B
WHERE A.SALARY < B.SALARY)
AND SALARY IS NOT NULL
ORDER BY SALARY DESC;

This SQL is portable from DB2 to other DBMSs, such as Oracle or SQL Server. And, of course, you can change the constant 10 to any number you wish, thereby retrieving the top 20, or top 5, as deemed necessary by the needs of your application.

Since the SALARY column is nullable in the EMP table, you must remove the nulls from the results set. And the ORDER BY is required to sort the results in the right order. If it is removed from the query, the results will still contain the top ten, but they will be in no particular order.

But DB2, as of V7, provides an easier and less complicated way to limit the results of a SELECT statement: the FETCH FIRST clause. You can code FETCH FIRST n ROWS ONLY, which will limit the number of rows that are fetched and returned by a SELECT statement.
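Using this clause, our top ten salary request becomes much simpler. A quick sketch:

SELECT EMPNO, FIRSTNME, LASTNAME, SALARY
FROM DSN8710.EMP
WHERE SALARY IS NOT NULL
ORDER BY SALARY DESC
FETCH FIRST 10 ROWS ONLY;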

Additionally, you can specify a new clause -- FETCH FIRST ROW ONLY -- on SELECT INTO statements when the query can return more than one row in the answer set. Doing so informs DB2 to ignore the other rows.

There is one difference between the new V7 formulation and the other SELECT statement we reviewed, and that is the way "ties" are handled. A tie occurs when more than one row contains the same value. The previous query we examined may return more than 10 rows if there are multiple rows with the same value for salary within the top ten.

Using FETCH FIRST, DB2 will limit the number of rows returned to ten, even if there are other rows with the same salary as the number ten row in the results set. The needs of your application will dictate whether ties are to be ignored or included in the result set. If all "ties" need to be included in the results set, which would mean that more than 10 rows would be needed, the new V7 feature may not prove to be helpful.

And it is also important to note that as of DB2 9, you can include the FETCH FIRST clause in a subselect. ORDER BY is allowed in a subselect, too. The subselect MUST be enclosed in parentheses and the FETCH FIRST (or ORDER BY) cannot be in the outermost fullselect of a view, or in a materialized query table.
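For example, here is a sketch (DB2 9 or later) that uses a parenthesized subselect with ORDER BY and FETCH FIRST to feed the ten highest salaries into an outer aggregate:

SELECT AVG(T.SALARY) AS AVG_OF_TOP_10
FROM (SELECT SALARY
FROM DSN8710.EMP
WHERE SALARY IS NOT NULL
ORDER BY SALARY DESC
FETCH FIRST 10 ROWS ONLY) AS T;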

Friday, September 04, 2009

Dynamic SQL Causing Lock Escalation?

Lock escalation is the promotion of a lock from a row, page or LOB lock to a table space lock because the number of page locks that are concurrently held on a given resource exceeds a preset limit.

But lock escalation can cause problems. Yes, when fewer locks are taken, CPU cost and memory usage can be reduced. On the other hand, escalating the size of a lock causes more resources to be locked... and that impacts concurrency with the most likely result being applications experiencing lock timeouts or deadlocks.
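The threshold that triggers escalation is set at the table space level with the LOCKMAX parameter (LOCKMAX SYSTEM defers to the NUMLKTS subsystem parameter). A hedged sketch, using a made-up database and table space name:

ALTER TABLESPACE MYDB.MYTS LOCKMAX 5000;  -- escalate only after 5000 page/row locks
ALTER TABLESPACE MYDB.MYTS LOCKMAX 0;     -- or turn escalation off for this table space

Keep in mind that turning escalation off just trades one problem for another: without it, a runaway process can accumulate enough locks to hit the per-user limit (NUMLKUS) and fail anyway.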

When lock escalation occurs, DB2 writes messages to the system log. So when complaints inevitably arrive about failing transactions you can look for the appropriate lock escalation message indicating that it could be the culprit.

But such simple steps are typically not sufficient to track down the root cause and take corrective actions. You need to be able to determine what SQL statement in what program and under what circumstances is causing the escalation. Doing so involves tracing and using performance monitoring tools.

For those interested in this topic (indeed, the primary purpose of this blog post), you should seek out the IBM TechDoc titled Identifying the dynamic SQL statement which is causing a lock escalation in DB2 for z/OS. For those who don't know what a TechDoc is, don't worry, it isn't that complicated. It is basically a published piece of technical documentation that isn't part of a larger manual or IBM RedBook. They can be short papers, product flashes, presentations, etc.

This particular TechDoc details a methodology for identifying dynamic SQL statements involved in a lock escalation. It describes which traces to start, and which jobs to run to analyze the collected traces. It also explains how to analyze the trace report to find the problematic SQL statement.

You can search all of IBM's TechDocs by following this link.

Happy reading!

Friday, August 21, 2009

Approaches to Access Path Management

BIND and REBIND are important components in assuring efficient DB2 applications. Because the BIND/REBIND process determines exactly how your DB2 data is accessed, it is important that you develop an appropriate strategy for when and how to REBIND your programs.

There are several common REBIND approaches taken by DB2 users. By far, the best approach is to REBIND your applications over time as your data and systems change. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change.

Other approaches include REBINDing only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to REBIND automatically after a regular period of time (days, weeks, months, etc.). This approach can work if the period of time is wisely chosen based on the application data – but it still can pose administrative issues.

The final approach can be summarized as “if it ain’t broke don’t fix it!” This is the worst of the several approaches discussed here. The biggest problem with this approach is that you are penalizing every program in your subsystem for fear that a program or two may have a degraded access path. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data changes.

Of course, the possibility of degraded performance is real – and that is why this approach has been adopted at some sites. The problem is being able to find which statements have degraded. In an ideal world we would be able to review the access path changes beforehand to determine whether they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this, though.

Anyway, let’s go back to the best approach again, and that is to REBIND on a regular basis as your data changes. This approach has become known as the three Rs. To implement this approach you:
  1. Regularly reorganize the data to ensure that it is optimally structured.
  2. Follow that with RUNSTATS to be sure that the reorganized state of the data is reflected in the DB2 Catalog.
  3. And follow that with a REBIND for all the application programs that access the data structures impacted by the REORG and RUNSTATS (a minimal sketch of this flow follows below).
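Here is a minimal sketch of what that flow looks like. The database, table space, and collection names are placeholders; the first two statements would run as utility job steps, while the REBIND runs as a DSN subcommand in a TSO batch step:

REORG TABLESPACE MYDB.MYTS
RUNSTATS TABLESPACE MYDB.MYTS TABLE(ALL) INDEX(ALL)
REBIND PACKAGE(MYCOLL.*)
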
At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions. DB2 is unlikely to make the same access path choice as your data grows – and as patterns within the data change.

By REBINDing you can generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. Additionally, as DB2 changes are introduced (PTFs, new version/release) optimizer improvements and new access techniques can be incorporated into the access paths. That is, if you never REBIND, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

Adopting the Three R’s approach can pose additional questions. For example, when should you reorganize? In order to properly determine when a REORG is needed you’ll have to look at statistics. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, we need to add another R – in other words:
  1. RUNSTATS or better yet, RTS
  2. REORG
  3. RUNSTATS
  4. REBIND
Now it is true that some folks don’t rely on statistics to schedule a REORG. Instead, they just build the JCL to REORG their database objects when they create the object. So they create a table space then build the REORG job and schedule it to run monthly, or quarterly, or on some regular basis. This is better than no REORG at all, but it is probably not the best approach because you are most likely either reorganizing too soon (in which case you waste the CPU cycles to do the REORG) or you are reorganizing too late (in which case performance is suffering for a period of time before the REORG runs). Better to base your REORGs off of statistics and thresholds using either RUNSTATS or RTS.
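As an illustration, here is a hedged sketch of the kind of real-time statistics query that can drive that decision. The 20 percent churn threshold is arbitrary, and you should verify the RTS table and column names (shown here as they appear in the SYSIBM.SYSTABLESPACESTATS catalog table) for your DB2 version:

SELECT DBNAME, NAME, PARTITION,
TOTALROWS, REORGINSERTS, REORGDELETES, REORGUPDATES
FROM SYSIBM.SYSTABLESPACESTATS
WHERE TOTALROWS > 0
AND (REORGINSERTS + REORGDELETES + REORGUPDATES) > TOTALROWS * 0.20
ORDER BY DBNAME, NAME, PARTITION;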

Without accurate statistics there is little hope that the optimizer will formulate the best access path to retrieve your data. If the optimizer doesn’t have accurate information on the size, organization, and particulars of your data then it will be creating access paths based on either default or inaccurate statistics. Incorrect statistics will cause bad choices to be made – such as choosing a merge-scan join when a nested loop join would be better, or failure to invoke sequential prefetch, or using the wrong index – or no index at all. And the problem of inaccurate statistics is pervasive.

There are shops out there that never, or rarely, run RUNSTATS to gather up-to-date statistics. Make sure yours is not one of those shops!

When should you run RUNSTATS? One answer is “As frequently as possible based on how often your data changes.” To do this you will need to know a thing or two about your data growth patterns: what is its make-up, how is it used, how fast does it grow, and how often does it change? These patterns will differ for every table space in your system.

Next we need to decide when to REBIND. The best answer is to REBIND when statistics have changed significantly enough to change access paths. When we know that data has significantly changed, it makes sense to REBIND after the RUNSTATS completes. But the trick is determining exactly when we have a “significant” change in our data. Without an automated method of comparing and contrasting statistics (or, even better, access paths), coming up with an answer in a manual way can be time-consuming and error-prone – especially if we have thousands of DB2 programs to manage.

As we REBIND, we always must be on alert for rogue access paths. A rogue access path is created when the optimizer formulates a new access path that performs worse than the previous access path. This can happen for a variety of reasons. Of course, number one is that the optimizer, though good, is not perfect. So mistakes can happen. Other factors can cause degraded access paths, too. The access paths for volatile tables depend on when you run the RUNSTATS. Volatile tables are those that start out empty, get rows added to them during processing, and are emptied out at the end of the day. And, of course, if the catalog or statistics are not accurate we can get problems, too.

So adopting the Four R’s approach implies that you will have to develop a methodology for reviewing your access paths and taking care of any “potential” problem access paths. Indeed, the Four R’s becomes the Five R’s as we add a step to review the access paths after REBINDing to make sure that there are no rogue access paths:

  1. Start RTS (or a RUNSTATS) to determine when to REORG.
  2. Reorganize the table spaces (and/or indexes).
  3. After reorganizing, run RUNSTATS again.
  4. Follow that with the REBINDs.
  5. Then we need that fifth R – which is to Review the access paths generated by the REBIND.
The review is of utmost importance because the optimizer can make mistakes. And, of course, so can you. But your users will not call you when performance is better (or the same). They only ring you up when performance gets worse. As such, proactive shops will put best practices in place to test REBIND results comparing the before and after impact of the optimizer’s choices.

Thursday, July 30, 2009

Discount on IBM Information on Demand Conference

The IBM Information on Demand (IOD, for short) conference is rapidly approaching. The conference will be held in Las Vegas, Nevada the week of October 25 – 30, 2009 at the Mandalay Bay casino and hotel.

The IOD conference is IBM’s signature event for their data and information management product lines. By attending IOD 2009 you can gain unique perspectives from IBM experts, technical leaders and visionaries as well as peers in your industry. Many of the developers of IBM’s offerings, such as DB2, Informix, and IMS, will be delivering educational sessions at IOD. And over 200 customers will share real-world experiences detailing how they have unlocked the value of their information and realized tangible and immediate return on investment.

OK, so what about that discount? IBM has been kind enough to provide me with a discount code that I can share with my readers. You will need to provide the code, G09MULL, when you register for the conference and you will get a $100 discount off of your registration!

Here’s how to register on-line:

  • An IBM userid and Password are required to register. If you have an existing IBM.com account ID, please select 'Register for the Conference' and sign on. If you do not have an IBM.com account ID, please follow the link 'MY IBM ID' to obtain one. Complete the form and click 'SUBMIT'. You will then be allowed to continue as a registered IBM user. Click 'Continue' and select 'Register for the Conference' and sign on with your new userid and password.
  • Select registration type of 'Customer' and click 'Continue'
  • Complete the appropriate fields on the enrollment form
  • Under the 'Promotion Code Information' section, enter PROMOTION CODE G09MULL. You must enter this code to receive your $100 discount.
  • Complete the form with attendee, arrival and payment information clicking 'Continue' at the bottom of each section.
  • Click 'Submit Registration'

Note: Credit card information is required to guarantee your registration regardless of your method of payment and/or discount amount.

That’s all there is to it. And you can use the promotion code multiple times for all of the folks at your company who will be attending the Information On Demand conference in 2009.

Cheers!