The Db2 Portal Blog

Friday, November 04, 2005

No Black Boxes!

I've written about this subject before, but I think it is important enough to merit another go-round. First of all, before I go any further, let's first define what I mean by a “black box.” If I plan to recommend that you prohibit them we better both understand what it is we are talking about proscribing.

Simply put, a black box is a database access program that sits in between your application programs and DB2. It is designed so that all application programs call the black box for data instead of writing SQL statements that are embedded into a program. The general idea behind such a contraption is that it will simplify DB2 development because programmers will not need to know how to write SQL. Instead, the programmer just calls the black box program to request whatever data is required. SQL statements become calls – and every programmer knows how to code a call, right?

This approach is commonly referred to as a “black box” approach because the data access interface shields the developers from the “complexities” of SQL. The SQL is contained in that black box and programmers do not need to know how the SQL works – just how to call the black box for data. Black boxes usually are introduced into an organization when management gets the notion that it would be quicker and easier for programmers to request data from a central routine than to teach them all SQL.

But I think there are a very good reasons why this approach is not sound. Let’s examine them.

Ignorance (of SQL) is not a Virtue

The basic premise of implementing black box technology is that it is better for programmers to be ignorant of SQL. This means that your company will be creating DB2 applications using developers with little to no understanding of how SQL works. So what may seem like simple requests to a non-educated programmer may actually involve very complex and inefficient SQL “behind the scenes” running in the black box. So innocuous requests for data can perform quite poorly.

When programmers are knowledgeable about SQL they can at least understand the complexity of their data requests and formulate them to perform better. For example, SQL programmers will understand when data must be joined and thereby can form their data requests in such a way as to join efficiently (and perhaps to minimize joining in certain circumstances). With no knowledge of SQL the programmer will have no knowledge of joining – and more importantly, no true means at his or her disposal to optimize their data requests.

Industry experts agree ("kind of") that about 80 percent of database performance problems are due to inefficient application code - mostly SQL. Whereas basic SQL is simple to learn and easy to start using, SQL tuning and optimization is an art that can take years to master.

Be sure to train your application development staff in the proper usage of SQL – and let them write the SQL requests in their programs. Develop and publish SQL guidelines in a readily accessible place (such as your corporate intranet or portal). These guidelines should outline the basics elements of style for DB2 SQL programming. I won't go into what these rules are here...

Now let’s face it, even when using the "black box" technique some technicians in your organization will still have to understand SQL – namely the writer(s) of the black box code. Because all of the SQL is coded in the black box program (or programs) someone has to be capable of writing efficient and effective SQL inside of the black box program. Which brings us to our next consideration.

Shortcuts Make for Poor Performance

The SQL programmers in charge of writing the black box code will inevitably introduce problems into the mix. This is so because of simple human nature – and because of most technicians’ desire to find shortcuts. But SQL shortcuts can lead to poor performance.
The black box inevitably will deviate from the standards and procedure of good SQL development. For example, let’s assume that there are three application programs and each one of them needs to retrieve customer information by area code. Program 1 needs the customer name and address, program 2 requires customer ID, name, and phone number, and program 3 requires customer ID, name and type. This is properly coded as three different SQL requests (each one in its own program). For program 1 we would write:

SELECT FIRST_NAME, LAST_NAME, ADDRESS, CITY,
STATE, ZIP
FROM CUSTOMERWHERE AREA_CODE = :HV-AC;

For program 2 we would write:

SELECT CUST_ID, FIRST_NAME, LAST_NAME, PHONE_NUM
FROM CUSTOMER
WHERE AREA_CODE = :HV-AC;

And for program 3 we would write:

SELECT CUST_ID, FIRST_NAME, LAST_NAME, CUST_TYPE
FROM CUSTOMER
WHERE AREA_CODE = :HV-AC;

Of course, all of these SQL statements are remarkably similar, aren’t they? If we were in charge of writing the black box for these requests we would likely think about consolidating these three SQL statements into one statement like this:

SELECT FIRST_NAME, LAST_NAME, ADDRESS, CITY,
STATE, ZIP, PHONE_NUM, CUST_TYPE
FROM CUSTOMER
WHERE AREA_CODE = :HV-AC;

Of course, this revised query will work for all three of these requests. When program 1 calls the black box we execute the query and return just the customer name and address; for program 2 we return just customer ID, name, and phone number; and for program 3 the black box returns only customer ID, name and type. If we do this, we’ve just coded a shortcut in our black box.

“So what?” you may ask. "Isn't it good to cut down on the amount of code that must be written?" Well, this is bad program design because we are violating one of the cardinal SQL coding guidelines, namely: SQL statements should retrieve only those columns required; never more. This is so because additional work is required by DB2 to send additional columns to your programs. Minimizing the number of columns in your SELECT statements will improve application performance.

By coding shortcuts such as these into the black box you are designing poor performance into your DB2 applications. And a black box will use shortcuts. The example given here is a simple one, but even more complex shortcuts are possible in which WHERE clauses are coded so that they can be bypassed with proper host variables. For example, perhaps sometimes we need to query by area code and other time by area code and customer type. Well, we could code the CUST_TYPE predicate as a range something like this:

WHERE CUST_TYPE >= :HV1 and CUST_TYPE =< :HV2;

When we want to query for CUST_TYPE we simply provide the same value to both HV1 and HV2; when we do not want to query for CUST_TYPE we choose a larger value for HV1 than for HV2 (for example, 1 and 0). This effectively blocks out the CUST_TYPE predicate. Using tricks like this it is possible to cram a lot of different SQL statements into one – with the results usually being worse performance than if they were separate SQL statements.

Imagine the further performance difficulties that can ensue if instead of just returning extra columns, we code SQL that returns extra rows -- but pass back only what is needed. Of course, this violates an even more important basic SQL rule, namely: return only those rows required by the program, never more. Applications and SQL performance suffers greatly when programs access and/or return rows that are not needed.

Extra Code Means Extra Work

Additionally, when you code a black box your application will require more lines of code to be executed than without the black box. It is elementary when you think about it. The call statement in the calling program is extra and the code surrounding the statements in the black box that ties them together is extra. None of this is required if you just plug your SQL statements right into your application programs.

This extra code must be compiled and executed. When extra host language code is required – no matter how little or efficient it may be – extra CPU will be expended to run the application. More code means more work. And that means degraded performance.

SQL is Already an Access Method

The final argument I will present here is a bit of a philosophical one. When you code a black box you are basically creating a data access method for your programs. To access data each program must call the black box. But SQL is already an access method – so why create another one?
Not only is SQL an access method but it is a very flexible and comprehensive access method at that. You will not be able to create an access method in your black box that is as elegant as SQL – so why try?

Summary

I assert that you should not implement data access interfaces that are called by application programs instead of coding SQL requests as needed in each program or stored procedure. When a black box is used, the tendency is that short cuts are taken. The black box inevitably deviates from proper SQL development guidelines, requires additional work and additional code, and is just another access method that is not required. Do not get lost in the black box – instead, train your programmers to code efficient SQL statements right in their application programs. Your applications will thank you for it!

Friday, October 28, 2005

A Forced Tour of Duty

Well, as promised, here is my first real post to my DB2portal blog. I want to use this post to rant a little bit about mainframes and their robust management environment.

Mainframe developers are well aware of the security, scalability, and reliability of mainframe computer systems and applications. Unfortunately, though, the bulk of new programmers and IT personnel are not mainframe-literate. This should change. But maybe not for the reasons you are thinking.

Yes, I am a mainframe bigot. I readily admit that. In my humble opinion there is no finer platform for mission critical software development than the good ol’ mainframe. And that is why every new programmer should have to work a tour of duty on mainframe systems and applications as soon as they graduate from college.

Why would I recommend such a thing? Well, due to the robust system management processes and procedures which are in place and working at every mainframe shop in the world. This is simply not the case for Windows, Unix, and other platforms. Of course, I don't want to overly disparage non-mainframe systems. Indeed, much of the credit for the mainframe's superior management lies in its long legacy. Decades of experience helped mainframers build up the systems management capabilities of the mainframe.

But by working on mainframe systems, newbies will finally begin to learn the correct IT discipline for managing mission critical software. The freed0m that is allowed on non-mainframe systems helps folks to learn - but it is not conducive to the creation of hardened, manageable systems.

No longer is it okay to just insert a CD and install new software willy-nilly onto a production machine. Mainframe systems have well-documented and enforced change management procedures that need to be followed before any software is installed into a production environment.

No longer is it okay to just flip the switch and reboot the server. Mainframe systems have safeguards against such practices. Months, sometimes years, can go by without having to power down and re-IPL the mainframe.

And don’t even think about trying to get around security protocols. In mainframe shops there is an entire group of people in the operations department responsible for protecting and securing mainframe systems, applications, and data.

Ever wonder why there are no mainframe viruses? A properly secured operating system and environment make such a beast extremely unlikely.

Project planning, configuration management, capacity planning, job scheduling and automation, storage management, database administration, operations management, and so on – all are managed and required in every mainframe site I’ve ever been involved in. When no mainframe is involved many of these things are afterthoughts, if they’re even thought of at all.

Growing up in a PC world is a big part of the problem. Although there may be many things to snark about with regard to personal computers, one of the biggest is that they were never designed to be used the way that mainframes are used. Yet we call a sufficiently “pumped-up” PC a server – and then try to treat it like we treat mainframes. Oh, we may turn it on its side and tape a piece of paper on it bearing a phrase like “Do Not Shut Off – This is the Production Server”… but that is a far cry from the glass house that we’ve built to nourish and feed the mainframe environment.

The bottom line is that today’s distributed systems do not deliver the stability, availability, security, or performance of mainframe systems. A forced tour of duty supporting or developing applications for a mainframe would do every IT professional a whole world of good!

Thursday, October 27, 2005

Welcome to my DB2 Blog

Hi everyone.

This is just a short post to introduce my new blog here on DB2portal.com. I write a regular blog on DBMS, data, and database management over at dbazine.com - you can check it out at here if you'd like.

This blog will be a little different than that one in that I will keep my comments focused on DB2 for z/OS and mainframe "stuff" only. Well, I might post something totally "off-topic" every now and then, too. Maybe if a news item touches a personal nerve - or I happen upon something too funny to not comment on.

I'll try to post something new here weekly (sometimes a little more, sometimes a little less). Look for my first content-laden posting tomorrow!

And in the meantime, feel free to post your own comments.