Monday, December 19, 2005
One topic that comes up a lot is how to deal with locking problems. Usually it is raised by someone who is experiencing timeouts in an online environment and wants to know how to minimize them. Here is some guidance.
When you experience a timeout, it means that another process holds a lock on the data that you are trying to modify. So, it stands to reason that you should try to minimize the duration of locks that are being held in your system. There are some approaches you can take to achieve this, but for the most part, they require programming changes.
First of all, make sure that all of your batch processes -- especially any that run during the same timeframe, but really all batch processes -- have a COMMIT strategy. This means that your programs should issue a COMMIT after processing a set number of inserts, updates, and deletes. A COMMIT tells DB2 to make the changes permanent and release locks.
A good approach is to set a counter that is incremented after every modification. Check the counter after each increment and, when it exceeds a predefined threshold -- say 25 or 50 or 100 modifications -- issue a COMMIT and reset the counter. You should make the threshold an input parameter so that you can change it as the workload in your system changes. For example, make it 25 when concurrent activity is high, but ramp it up to 100 or higher when it is low. Failure to issue COMMITs will result in timeouts, as well as possibly deadlocks and lock escalation.
(Also, please note that these are just sample numbers and not necessarily the correct numbers to start with at your shop.)
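Here is a minimal sketch of the counter-and-threshold idea. It is written in Python against an in-memory SQLite database purely so that it can be run anywhere; it is not DB2 code, and the table name (account), the function names, and the sample thresholds are all made up for illustration. The point is only the shape of the loop: count each modification, COMMIT when the threshold is reached, reset the counter, and take the threshold as an input parameter.

```python
import sqlite3


def run_batch(conn, updates, commit_threshold=50):
    """Apply (id, amount) updates, committing every commit_threshold changes.

    commit_threshold is a parameter so it can be tuned per run
    (lower when concurrency is high, higher when it is low).
    Returns the number of COMMITs issued.
    """
    modifications = 0  # changes made since the last COMMIT
    commits = 0
    cur = conn.cursor()
    for acct_id, amount in updates:
        cur.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, acct_id),
        )
        modifications += 1
        if modifications >= commit_threshold:
            conn.commit()  # make the changes permanent and release locks
            commits += 1
            modifications = 0
    if modifications:
        conn.commit()  # commit whatever is left at end of input
        commits += 1
    return commits


def demo(commit_threshold):
    """Run a 120-row batch against a throwaway table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, 0)", [(i,) for i in range(120)])
    conn.commit()
    commits = run_batch(conn, [(i, 10) for i in range(120)], commit_threshold)
    total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
    conn.close()
    return commits, total
```

Calling demo(25) and demo(100) applies the same 120 updates but with a different number of commit points, which is the knob you would turn as concurrent activity changes. A real batch program would also need restart logic so that it can resume after the last commit point; that is left out of this sketch.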
A good article to read regarding COMMIT strategies is "The Woes of Commitment" by Bonnie Baker.
Next, look at all of your programs, batch and online, and try to place the data modification statements as close to the COMMIT as possible. By deferring the data modification until right before you issue a COMMIT, you hold the resulting locks for less time and so reduce the overall average lock duration. This should reduce contention and, therefore, the number of timeouts.
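As a rough illustration of keeping modifications close to the COMMIT, the sketch below (Python with SQLite, not DB2; the product table and function names are hypothetical) does all of its reading and calculating first and only then issues the updates, immediately followed by the COMMIT. The locking behavior of SQLite differs from DB2, so treat this as a picture of statement ordering rather than a measurement of lock duration.

```python
import sqlite3


def make_product_table():
    """Create a throwaway table with two sample rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price_cents INTEGER)")
    conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, 1000), (2, 250)])
    conn.commit()
    return conn


def apply_price_increase(conn, pct_increase):
    """Raise every price by pct_increase percent; returns rows updated."""
    # Step 1: read and compute. Nothing has been modified yet, so this
    # unit of work is not yet holding locks for changed rows.
    rows = conn.execute("SELECT id, price_cents FROM product").fetchall()
    new_prices = [
        (price * (100 + pct_increase) // 100, product_id)
        for product_id, price in rows
    ]
    # Step 2: modify as late as possible, then COMMIT right away.
    conn.executemany("UPDATE product SET price_cents = ? WHERE id = ?", new_prices)
    conn.commit()
    return len(new_prices)
```

The alternative ordering (update each row as soon as it is read, keep calculating, commit at the end) produces the same data but leaves the changed rows locked for the whole calculation.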
If you want to investigate the timeout details, be sure to examine the statistics trace class 3 and IFCID 0196 for timeouts (IFCID 0172 is for deadlocks).
Monday, December 12, 2005
Fortunately, IBM has come up with some help for those of us working with DB2 for z/OS. IBM has published – and made available for free download – a great little manual titled The Business Value of DB2 for z/OS. In this book you can learn all about the ROI that mainframe DB2 can offer your organization. Instead of all of those technical details that DBAs tend to savor, this document will assist you in analyzing and communicating the many business benefits provided by the combination of DB2 running on z/OS.
The book covers topics such as up-to-date details on how DB2 can integrate with, and enable sharing of information across multiple platforms. It discusses how the mainframe scales to manage large volumes of data. And the manual also describes how IBM’s on-going “on demand” initiative has improved the manageability of DB2.
If you use DB2 for z/OS, you should download a copy of this redbook. It is free, after all. And the next time that someone challenges the mainframe as an on-going, viable platform for business computing you will have a powerful, unemotional tool to help battle that misconception.
Tuesday, December 06, 2005
Now don't get me wrong. I'm sure there are some shops that have removed their mainframe. But I'm also sure that there are many more that thought about it but couldn't do it -- as well as those who wouldn't even consider it.
A bigger problem for the mainframe than the misguided notion that it is more costly than other computing platforms is the aging of the mainframe workforce. This is a reality. If you don't believe me, go to a SHARE conference and fix your eyeballs on some of the dinosaurs attending mainframe sessions there (myself included).
Basically, the problem is that mainframe experts are getting older and slowly retiring. And who will replace them? Most young IT professionals do not choose to work on mainframe systems, instead choosing to concentrate on the latest technology bandwagons -- things like Windows and Linux, open source and so on. Put one of these newbies in front of a terminal and introduce them to the joys of JCL, ISPF and COBOL, then watch them scream out the door yelling "I want my Java!" (And who can blame them?)
But this is actually an inaccurate perception. You see, mainframe no longer means ugly old green screens. Today's mainframe environment is quite different from the mainframe of yesteryear. That hulking, water-cooled beast you may remember has been replaced with chip-based, CMOS, air-cooled systems. Today's mainframes are easier to hook together using Parallel Sysplex technology. And all of the "modern" technology used on Windows and Linux platforms works on the mainframe, too. Yes, that means XML, TCP/IP, Java and so on all run on the mainframe.
Nowadays, the biggest mainframe "problems" are training and PR. Let's focus on training first. Mainframe technology is not taught by most universities these days; this really needs to change. What is needed is a comprehensive educational program delivered through major universities, as well as IT-focused institutions like DeVry and NorthFace universities. The program should be sponsored by major mainframe vendors, which could provide hardware and software, as well as a conduit for hiring graduates. Actually, IBM is doing something just like this nowadays. An ongoing mainframe program in the universities will help to further promote and extend the mainframe. And that is goodness.
And why would universities be interested in such a program? Employability of their graduates! As the current crop of mainframe experts retire, companies will have to replace them. I'd venture to guess that 10 years or so down the line, it will be easier for an IMS DBA, for example, to get a job offer than an Oracle DBA. The demand will be greater for the IMS talent because the supply is so low.
The publicity component is a bit more difficult. So much has been written and implied about the mainframe being dead that a lot of folks believe it. But the mainframe continues to be a robust, viable component of today's IT infrastructure. Organizations continue to add more MIPS, deploy more applications and run their most important, mission-critical applications on mainframe computers. Until this aspect of the mainframe is publicized more, the existing perception is likely to linger.
Or maybe we should just give the mainframe a new name and pretend that it is a new technology with better availability, scalability and performance than the existing platforms - how about a name like the "AlwaysAvailable"?
This posting is a slightly revised version of a piece I wrote for Search390.com.
If you'd like to read the original, click here (registration required).
Tuesday, November 29, 2005
As regular readers of this blog know, this space is devoted to DB2 for z/OS – that’s the mainframe for those of you just visiting. I’d like to take a moment here, though, to point out some interesting posts for cross-platform DB2 DBAs.
Chris Eaton, Senior Product Manager for DB2 Universal Database at IBM, writes a blog for ITtoolbox focusing on DB2 UDB for Linux, Unix, and Windows (LUW). If you use DB2 on that platform, be sure to read his blog regularly.
So why am I mentioning this on a mainframe DB2 blog? Well, Chris has recently posted some very nice entries outlining the similarities and differences between DB2 on the z/OS and LUW platforms. Here are the links to those postings for those of you interested in expanding your understanding. Each is interesting and worth reading:
- DB2 on Linux, UNIX, Windows for a z/OS DBA
- Tablespaces z/OS vs. LUW
- Where are my Storage Groups?
- BSDS and Bufferpools
- The Rosetta Stone for DB2 Address Spaces
Tuesday, November 22, 2005
Basically, it boils down to this. There are some workloads that just are better off being served by mainframes. This is the parallel I like to draw:
If you are going to plow a field, what animal(s) would you choose to drive your plow: a nice strong, sturdy ox (mainframe) -or- 64 chickens (Unix servers) -or- 128 gerbils (Windows servers)?
Monday, November 21, 2005
(Of course, this raises the question: "How frequently should I run RUNSTATS?" to which my answer is "As frequently as possible based on how often your data changes," but enough of this aside for now.)
Of course, the DB2 optimizer is not perfect so sometimes rebinding can cause the performance of certain SQL statements to degrade. You will have to be ready to handle these problems by using optimization hints (OPTHINT in the PLAN_TABLE) to go back to a satisfactory access path or by tweaking your SQL to achieve a better performing access path (and some people also may say "...or change the catalog statistics," but that should only be a last resort and is rarely required these days).
Additionally, we have not considered the impact and need to periodically reorganize your DB2 table spaces using the REORG utility. RUNSTATS populates the DB2 Catalog with the information you need to decide when a REORG is warranted. Of course, you would want to run RUNSTATS again after a REORG to obtain the most up-to-date statistics... and only then would you want to REBIND your plans and packages.
The second approach is the "if it ain't broke, don't fix it" approach. In this scenario, you will continue to run RUNSTATS regularly but you will not REBIND your plans and packages until performance degrades. This approach is embraced by shops that do not have the manpower or time to review all access paths after a mass REBIND. By not running REBIND the thought is that performance will continue along as is until data volumes change so significantly that end users start to complain. Only then will individual plans and packages be rebound following the next scheduled RUNSTATS or immediately if the problem is large enough. This approach will degrade performance, albeit possibly subtly over time. However, it does save DBA manpower, which might be in short supply.
Examine your shop's approach to the REBIND issue to see which approach is best for you. Although philosophically I agree with the first approach, I understand that the second approach sometimes can be preferable in practice. If you follow the second approach, be sure that you have pre-agreed Service Level Agreements (SLAs) for your DB2 applications. Then, you can reasonably argue that there is no reason to REBIND anything until you are no longer meeting the SLA.
Friday, November 11, 2005
So, the portal won't be 100% mainframe content, but it should have some interesting nuggets of data for mainframe DB2 DBAs.
Thursday, November 10, 2005
For DB2 UDB on Linux/Unix/Windows, when creating a table, you can use the optional VALUE COMPRESSION clause to specify that the table is using the space saving row format at the table level and possibly at the column level. There are two ways in which tables can occupy less space when stored on disk:
- If the column value is NULL, do not set aside the defined, fixed amount of space.
- If the column value can be easily known or determined (like default values) and the value is available to the database manager during record formatting and column extraction, do not store the value at all.
If VALUE COMPRESSION is used then the optional COMPRESS SYSTEM DEFAULT parameter can also be specified to further reduce disk space usage. Minimal disk space is used if the inserted or updated value is equal to the system default value for the data type of the column. The default value will not be stored on disk. Data types that support COMPRESS SYSTEM DEFAULT include all numerical type columns, fixed-length character, and fixed-length graphic string data types. This means that zeros and blanks can be compressed.
The two platforms vary dramatically in how they approach "compression." The mainframe actually applies an algorithm to the data to compress it into another format. Every row that is inserted must first be compressed before storing it; every row that is read must be decompressed. On LUW platforms, DB2 compression is simply a way of avoiding the storage of certain types of data that either can be determined easily, or need not be stored.
So, it is highly probable that you will get completely different results on LUW than you do on a mainframe (OS/390, z/OS). Which one is better will depend on the type of data you are storing based on the requirements of your applications.
So, when should you consider using compression? In general, use DB2 for z/OS compression for larger tablespaces where the disk savings can be significant. For very small tables, the amount of space required to store the compression dictionary may exceed the space saved by compressing the data.
What is the compression dictionary? Well, as I mentioned earlier, DB2 for z/OS compression is enabled by specifying COMPRESS YES for the tablespace in your DDL. When compression is specified, DB2 builds a static dictionary to control compression. This will cause from 2 to 17 dictionary pages to be stored in the tablespace. These pages are stored after the header and first space map page.
For partitioned tablespaces, DB2 will create a separate compression dictionary for each tablespace partition. Multiple dictionaries tend to cause better overall compression ratios. In addition, it is more likely that the partition-level compression dictionaries can be rebuilt more frequently than non-partitioned dictionaries. Frequent rebuilding of the compression dictionary can lead to a better overall compression ratio.
Avoid compressing table spaces with multiple tables in them because the compression ratio can be impacted by the different types of data in the multiple tables, and DB2 can only have one compression dictionary per table space.
But why compress data at all? Consider an uncompressed table with a large row size, say 800 bytes. Therefore, five of this table's rows fit on a 4K page. If the compression routine achieves 30 percent compression, on average, the 800-byte row uses only 560 bytes, because 800 * (1 - 0.3) = 560. Now, on average, seven rows fit on a 4K page. Because I/O occurs at the page level, the cost of I/O is reduced because fewer pages must be read for tablespace scans, and the data is more likely to be in the bufferpool because more rows fit on a physical page. This can be a significant I/O improvement. Consider the following scenarios. A 10,000-row table with 800-byte rows requires 2,000 pages. Using a compression routine as outlined previously, the table would require only 1,429 pages. Another table also with 800-byte rows but now having 1 million rows would require 200,000 pages without a compression routine. Using the compression routine, you would reduce the pages to 142,858 - a reduction of more than 50,000 pages.
Another question I am commonly asked is about overhead. Yes, there is going to be some overhead involved if you turn on compression... CPU is required to apply the Ziv-Lempel algorithm to compress upon insertion - and to de-compress upon access. Of course, this does NOT mean that overall performance will suffer if you turn on compression. Remember the trade-off: additional CPU in exchange for possibly improved I/O efficiency. You see, when more compressed rows fit onto a single page, fewer I/O operations may be needed to satisfy your query processing needs. If you are performing a lot of sequential access (as opposed to random access) you can get improved performance because fewer I/O operations are required to access the same number of rows.
Of course, there is always the other trade-off to consider, too: disk storage savings in exchange for the CPU cost of compressing and decompressing data. Keep in mind, though, that DB2 can use hardware-assisted compression if you have the right type of hardware. Hardware-assisted compression simply speeds up the compression and decompression of data -- it is not a requirement for the inherent data compression features of DB2. So, the overall cost of compression may be minimal with hardware-assisted compression. Indeed, due to I/O issues, overall elapsed time for certain I/O heavy processes may decrease when data is compressed.
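The page arithmetic above is easy to check with a few lines of code. The following Python sketch uses the same simplifying assumptions as the example: a 4K page is treated as 4,096 usable bytes (page header and other overhead are ignored), rows do not span pages, and the compression percentage is an average. The function names are mine, chosen for illustration.

```python
PAGE_SIZE = 4096  # a 4K page; header and overhead bytes are ignored here


def compressed_row_bytes(row_bytes, compression_pct):
    """Average row length after saving compression_pct percent."""
    return row_bytes * (100 - compression_pct) // 100


def rows_per_page(row_bytes, page_size=PAGE_SIZE):
    """Whole rows that fit on one page (rows do not span pages)."""
    return page_size // row_bytes


def pages_needed(num_rows, row_bytes, page_size=PAGE_SIZE):
    """Pages required to hold num_rows, rounding up to a whole page."""
    per_page = rows_per_page(row_bytes, page_size)
    return -(-num_rows // per_page)  # ceiling division on integers


if __name__ == "__main__":
    small = compressed_row_bytes(800, 30)
    for rows in (10_000, 1_000_000):
        before = pages_needed(rows, 800)
        after = pages_needed(rows, small)
        print(rows, "rows:", before, "pages ->", after, "pages, saving", before - after)
```

Plugging in your own row length and an estimated compression percentage gives a quick first approximation before you run a proper estimate against real data.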
You can use the DSN1COMP utility to estimate how much disk space will be saved by compressing a tablespace before deciding whether to turn compression on or not. This utility can be run on full image copy data sets, VSAM data sets that contain DB2 table spaces, or sequential data sets that contain DB2 table spaces (such as DSN1COPY output). DSN1COMP does not estimate savings for data sets that contain LOB table spaces or index spaces. Refer to the IBM Utility Guide and Reference for more information on DSN1COMP.
Of course, before you consider compression be sure to examine all of its details -- and be sure to understand all of the nuances of your particular data and applications. But don't be afraid of investigating its use... compression can be a very handy tool in the DBA's arsenal!
Friday, November 04, 2005
Simply put, a black box is a database access program that sits in between your application programs and DB2. It is designed so that all application programs call the black box for data instead of writing SQL statements that are embedded into a program. The general idea behind such a contraption is that it will simplify DB2 development because programmers will not need to know how to write SQL. Instead, the programmer just calls the black box program to request whatever data is required. SQL statements become calls – and every programmer knows how to code a call, right?
This approach is commonly referred to as a “black box” approach because the data access interface shields the developers from the “complexities” of SQL. The SQL is contained in that black box and programmers do not need to know how the SQL works – just how to call the black box for data. Black boxes usually are introduced into an organization when management gets the notion that it would be quicker and easier for programmers to request data from a central routine than to teach them all SQL.
But I think there are some very good reasons why this approach is not sound. Let’s examine them.
Ignorance (of SQL) is not a Virtue
The basic premise of implementing black box technology is that it is better for programmers to be ignorant of SQL. This means that your company will be creating DB2 applications using developers with little to no understanding of how SQL works. So what may seem like simple requests to a non-educated programmer may actually involve very complex and inefficient SQL “behind the scenes” running in the black box. So innocuous requests for data can perform quite poorly.
When programmers are knowledgeable about SQL they can at least understand the complexity of their data requests and formulate them to perform better. For example, SQL programmers will understand when data must be joined and thereby can form their data requests in such a way as to join efficiently (and perhaps to minimize joining in certain circumstances). With no knowledge of SQL the programmer will have no knowledge of joining – and more importantly, no true means at his or her disposal to optimize their data requests.
Industry experts agree ("kind of") that about 80 percent of database performance problems are due to inefficient application code - mostly SQL. Whereas basic SQL is simple to learn and easy to start using, SQL tuning and optimization is an art that can take years to master.
Be sure to train your application development staff in the proper usage of SQL – and let them write the SQL requests in their programs. Develop and publish SQL guidelines in a readily accessible place (such as your corporate intranet or portal). These guidelines should outline the basic elements of style for DB2 SQL programming. I won't go into what these rules are here...
Now let’s face it, even when using the "black box" technique some technicians in your organization will still have to understand SQL – namely the writer(s) of the black box code. Because all of the SQL is coded in the black box program (or programs) someone has to be capable of writing efficient and effective SQL inside of the black box program. Which brings us to our next consideration.
Shortcuts Make for Poor Performance
The SQL programmers in charge of writing the black box code will inevitably introduce problems into the mix. This is so because of simple human nature – and because of most technicians’ desire to find shortcuts. But SQL shortcuts can lead to poor performance.
The black box inevitably will deviate from the standards and procedures of good SQL development. For example, let’s assume that there are three application programs and each one of them needs to retrieve customer information by area code. Program 1 needs the customer name and address, program 2 requires customer ID, name, and phone number, and program 3 requires customer ID, name and type. This is properly coded as three different SQL requests (each one in its own program). For program 1 we would write:
SELECT FIRST_NAME, LAST_NAME, ADDRESS, CITY,
STATE, ZIP
FROM CUSTOMER
WHERE AREA_CODE = :HV-AC;
For program 2 we would write:
SELECT CUST_ID, FIRST_NAME, LAST_NAME, PHONE_NUM
FROM CUSTOMER
WHERE AREA_CODE = :HV-AC;
And for program 3 we would write:
SELECT CUST_ID, FIRST_NAME, LAST_NAME, CUST_TYPE
FROM CUSTOMER
WHERE AREA_CODE = :HV-AC;
Of course, all of these SQL statements are remarkably similar, aren’t they? If we were in charge of writing the black box for these requests we would likely think about consolidating these three SQL statements into one statement like this:
SELECT CUST_ID, FIRST_NAME, LAST_NAME, ADDRESS, CITY,
STATE, ZIP, PHONE_NUM, CUST_TYPE
FROM CUSTOMER
WHERE AREA_CODE = :HV-AC;
Of course, this revised query will work for all three of these requests. When program 1 calls the black box we execute the query and return just the customer name and address; for program 2 we return just customer ID, name, and phone number; and for program 3 the black box returns only customer ID, name and type. If we do this, we’ve just coded a shortcut in our black box.
“So what?” you may ask. "Isn't it good to cut down on the amount of code that must be written?" Well, this is bad program design because we are violating one of the cardinal SQL coding guidelines, namely: SQL statements should retrieve only those columns required; never more. This is so because additional work is required by DB2 to send additional columns to your programs. Minimizing the number of columns in your SELECT statements will improve application performance.
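To make the shortcut concrete, here is a small sketch in Python using SQLite as a stand-in for DB2. Everything in it is hypothetical: the CUSTOMER layout is guessed from the columns the three programs ask for, and the function names are invented. It contrasts a black box routine that always runs the consolidated query and trims the result in host code with the query program 2 would code for itself.

```python
import sqlite3

# Every column the consolidated "black box" query retrieves (assumed layout).
ALL_COLUMNS = [
    "CUST_ID", "FIRST_NAME", "LAST_NAME", "ADDRESS", "CITY",
    "STATE", "ZIP", "PHONE_NUM", "CUST_TYPE",
]


def make_customer_table():
    """Build a throwaway CUSTOMER table with sample rows (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, "
        "LAST_NAME TEXT, ADDRESS TEXT, CITY TEXT, STATE TEXT, ZIP TEXT, "
        "PHONE_NUM TEXT, CUST_TYPE INTEGER, AREA_CODE TEXT)"
    )
    conn.executemany(
        "INSERT INTO CUSTOMER VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "Ann", "Lee", "1 Elm St", "Chicago", "IL", "60601", "555-0101", 1, "312"),
            (2, "Bob", "Ray", "2 Oak St", "Chicago", "IL", "60602", "555-0102", 2, "312"),
            (3, "Cy", "Fox", "3 Ash St", "Joliet", "IL", "60431", "555-0103", 1, "708"),
        ],
    )
    conn.commit()
    return conn


def black_box_fetch(conn, area_code, wanted):
    """The shortcut: fetch all nine columns, then discard the unwanted ones."""
    sql = ("SELECT " + ", ".join(ALL_COLUMNS) +
           " FROM CUSTOMER WHERE AREA_CODE = ? ORDER BY CUST_ID")
    rows = conn.execute(sql, (area_code,)).fetchall()
    keep = [ALL_COLUMNS.index(col) for col in wanted]
    trimmed = [tuple(row[i] for i in keep) for row in rows]
    return trimmed, len(ALL_COLUMNS)  # result plus columns actually fetched


def program_2_fetch(conn, area_code):
    """What program 2 would code itself: only the four columns it needs."""
    rows = conn.execute(
        "SELECT CUST_ID, FIRST_NAME, LAST_NAME, PHONE_NUM "
        "FROM CUSTOMER WHERE AREA_CODE = ? ORDER BY CUST_ID",
        (area_code,),
    ).fetchall()
    return rows, 4
```

Both routines hand the caller identical data, but the black box version asks the database for nine columns per row in order to deliver four. That wasted work is invisible to the calling programmer, which is exactly the problem.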
By coding shortcuts such as these into the black box you are designing poor performance into your DB2 applications. And a black box will use shortcuts. The example given here is a simple one, but even more complex shortcuts are possible in which WHERE clauses are coded so that they can be bypassed with proper host variables. For example, perhaps sometimes we need to query by area code and other times by area code and customer type. Well, we could code the CUST_TYPE predicate as a range something like this:
WHERE CUST_TYPE >= :HV1 AND CUST_TYPE <= :HV2;
When we want to query for CUST_TYPE we simply provide the same value to both HV1 and HV2; when we do not want to query for CUST_TYPE we provide the lowest possible value for HV1 and the highest possible value for HV2, so that every CUST_TYPE falls inside the range. This effectively blocks out the CUST_TYPE predicate. Using tricks like this it is possible to cram a lot of different SQL statements into one – with the results usually being worse performance than if they were separate SQL statements.
Imagine the further performance difficulties that can ensue if instead of just returning extra columns, we code SQL that returns extra rows -- but pass back only what is needed. Of course, this violates an even more important basic SQL rule, namely: return only those rows required by the program, never more. Application and SQL performance suffer greatly when programs access and/or return rows that are not needed.
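Here is a runnable sketch of the range-predicate trick, again in Python with SQLite standing in for DB2. The table contents and the assumed domain of CUST_TYPE (0 through 9) are invented for the example. To switch the predicate off, the sketch passes the lowest and highest values in that domain so that every row qualifies; note that passing a larger value for the low end than for the high end (say 1 and 0) would not disable the predicate but would instead return no rows at all.

```python
import sqlite3

# One statement serving two purposes, as a black box might code it.
QUERY = (
    "SELECT CUST_ID FROM CUSTOMER "
    "WHERE AREA_CODE = ? AND CUST_TYPE >= ? AND CUST_TYPE <= ? "
    "ORDER BY CUST_ID"
)

# Assumed domain of CUST_TYPE; hypothetical, used to "switch off" the range.
TYPE_MIN, TYPE_MAX = 0, 9


def make_db():
    """Build a throwaway CUSTOMER table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, "
        "AREA_CODE TEXT, CUST_TYPE INTEGER)"
    )
    conn.executemany(
        "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
        [(1, "312", 1), (2, "312", 2), (3, "312", 2), (4, "708", 1)],
    )
    conn.commit()
    return conn


def find_customers(conn, area_code, cust_type=None):
    """Query by area code, optionally also by customer type."""
    if cust_type is None:
        low, high = TYPE_MIN, TYPE_MAX    # range covers every type
    else:
        low, high = cust_type, cust_type  # range collapses to one value
    return [row[0] for row in conn.execute(QUERY, (area_code, low, high))]
```

The trick works, but the database still has to evaluate a range predicate on every call, even when the caller never wanted to filter on CUST_TYPE, and the optimizer has to pick one access path for both uses. Two separate statements let each request be optimized for what it actually needs.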
Extra Code Means Extra Work
Additionally, when you code a black box your application will require more lines of code to be executed than without the black box. It is elementary when you think about it. The call statement in the calling program is extra and the code surrounding the statements in the black box that ties them together is extra. None of this is required if you just plug your SQL statements right into your application programs.
This extra code must be compiled and executed. When extra host language code is required – no matter how little or efficient it may be – extra CPU will be expended to run the application. More code means more work. And that means degraded performance.
SQL is Already an Access Method
The final argument I will present here is a bit of a philosophical one. When you code a black box you are basically creating a data access method for your programs. To access data each program must call the black box. But SQL is already an access method – so why create another one?
Not only is SQL an access method but it is a very flexible and comprehensive access method at that. You will not be able to create an access method in your black box that is as elegant as SQL – so why try?
I assert that you should not implement data access interfaces that are called by application programs instead of coding SQL requests as needed in each program or stored procedure. When a black box is used, the tendency is for shortcuts to be taken. The black box inevitably deviates from proper SQL development guidelines, requires additional work and additional code, and is just another access method that is not required. Do not get lost in the black box – instead, train your programmers to code efficient SQL statements right in their application programs. Your applications will thank you for it!
Friday, October 28, 2005
Mainframe developers are well aware of the security, scalability, and reliability of mainframe computer systems and applications. Unfortunately, though, the bulk of new programmers and IT personnel are not mainframe-literate. This should change. But maybe not for the reasons you are thinking.
Yes, I am a mainframe bigot. I readily admit that. In my humble opinion there is no finer platform for mission critical software development than the good ol’ mainframe. And that is why every new programmer should have to work a tour of duty on mainframe systems and applications as soon as they graduate from college.
Why would I recommend such a thing? Well, due to the robust system management processes and procedures which are in place and working at every mainframe shop in the world. This is simply not the case for Windows, Unix, and other platforms. Of course, I don't want to overly disparage non-mainframe systems. Indeed, much of the credit for the mainframe's superior management lies in its long legacy. Decades of experience helped mainframers build up the systems management capabilities of the mainframe.
But by working on mainframe systems, newbies will finally begin to learn the correct IT discipline for managing mission critical software. The freedom that is allowed on non-mainframe systems helps folks to learn - but it is not conducive to the creation of hardened, manageable systems.
No longer is it okay to just insert a CD and install new software willy-nilly onto a production machine. Mainframe systems have well-documented and enforced change management procedures that need to be followed before any software is installed into a production environment.
No longer is it okay to just flip the switch and reboot the server. Mainframe systems have safeguards against such practices. Months, sometimes years, can go by without having to power down and re-IPL the mainframe.
And don’t even think about trying to get around security protocols. In mainframe shops there is an entire group of people in the operations department responsible for protecting and securing mainframe systems, applications, and data.
Ever wonder why there are no mainframe viruses? A properly secured operating system and environment make such a beast extremely unlikely.
Project planning, configuration management, capacity planning, job scheduling and automation, storage management, database administration, operations management, and so on – all are managed and required in every mainframe site I’ve ever been involved in. When no mainframe is involved many of these things are afterthoughts, if they’re even thought of at all.
Growing up in a PC world is a big part of the problem. Although there may be many things to snark about with regard to personal computers, one of the biggest is that they were never designed to be used the way that mainframes are used. Yet we call a sufficiently “pumped-up” PC a server – and then try to treat it like we treat mainframes. Oh, we may turn it on its side and tape a piece of paper on it bearing a phrase like “Do Not Shut Off – This is the Production Server”… but that is a far cry from the glass house that we’ve built to nourish and feed the mainframe environment.
The bottom line is that today’s distributed systems do not deliver the stability, availability, security, or performance of mainframe systems. A forced tour of duty supporting or developing applications for a mainframe would do every IT professional a whole world of good!
Thursday, October 27, 2005
This is just a short post to introduce my new blog here on DB2portal.com. I write a regular blog on DBMS, data, and database management over at dbazine.com - you can check it out here if you'd like.
This blog will be a little different than that one in that I will keep my comments focused on DB2 for z/OS and mainframe "stuff" only. Well, I might post something totally "off-topic" every now and then, too. Maybe if a news item touches a personal nerve - or I happen upon something too funny to not comment on.
I'll try to post something new here weekly (sometimes a little more, sometimes a little less). Look for my first content-laden posting tomorrow!
And in the meantime, feel free to post your own comments.