The Db2 Portal Blog

Friday, February 27, 2015

A Few DB2 Connect Guidelines

Here are a few guidelines for you to consider as you work to implement and optimize your usage of IBM's DB2 Connect gateway...

Consider DB2 Connect PE for Single Workstation Solutions DB2 Connect PE can be a good choice for two-tier client/server applications running on desktop workstations that need to access data from mainframe DB2 databases. However, the personal edition is not suitable for deployment on multi-user systems and application servers.

If you have two people who share a workstation, DB2 Connect PE is a viable solution, because each person uses the workstation individually. That is, they cannot both use the workstation at the same time. Only one connection is required at any time for both of these users so only one license of DB2 Connect PE is required.

On the other hand, if you set up five workstations as dumb terminals making connections to a DB2 for z/OS server, you would need ten licenses of DB2 Connect PE. Of course, you could use DB2 Connect EE instead, too. Or setting up a client with a Type 4 driver can be sufficient.

Consider DB2 Connect EE for Multi-User Solutions DB2 Connect EE is the required solution when a middle tier connectivity server is needed. DB2 Connect EE is suitable for environments where the number of concurrent users can be easily determined. However, DB2 Connect UE and ASE are preferable to EE for web-based applications and multi-tier client/server applications where the number of concurrent users is difficult, or impossible, to determine. This is so mostly due to the cost of licensing.

Again, using Type 4 drivers without a DB2 Connect gateway can be a sufficient solution for many organizations.

Setting Up the DDF for z/OS You must set up the DDF as an application server to connect distributed applications to DB2 for z/OS. There are two ways of doing this:

INSERT the LUNAME of the remote system into the SYSIBM.LUNAMES table.
INSERT the LUNAME, SYSMODENAME, USERSECURITY, ENCRYPTPSWDS, MODESELECT, and USERNAMES values into the SYSIBM.LUNAMES table.

Be sure to perform a DDF update to the Boot Strap Data (BSDS) after making one of these changed. Because DDF will try to connect to VTAM you must ensure that VTAM is active when the DDF starts.

Configure Distributed Threads Use the DSNZPARM CMTSTAT to control the behavior of distributed threads. CMTSTAT specifies whether a DDF thread is made active or inactive after it successfully commits or rolls back and holds no cursors. Consider setting CMTSTAT to INACTIVE because inactive connections consume less storage. A thread can become inactive only if it holds no cursors, has no temporary tables defined, and executes no statements from the dynamic statement cache.

If you specify ACTIVE, the thread remains active. Although this can improve performance it consumes system resources. If your installation supports a large number of connections, specify INACTIVE.

DB2 supports two different types of inactive threads:

An inactive DBAT, previously called a type 1 inactive thread, has the same characteristics as inactive threads prior to DB2 V8. This type of thread remains associated with the connections, but DB2 minimizes the thread’s storage use as much as possible.
An inactive connection, previously called a type 2 inactive thread, uses less storage than an inactive DBAT. In this case, the connections are disassociated from the thread. The thread can be pooled and reused for other connections, new or inactive. This provides better resource utilization because there are typically a small number of threads that can be used to service a large number of connections.

Although inactive connections are preferable to inactive DBATs, not every thread can become an inactive connection. If a thread is to become inactive, DB2 tries to make it an inactive connection. If DB2 cannot make it an inactive connection, it tries to make it an inactive DBAT. If neither attempt is successful, the thread remains active.

Increasing Data Transfer Rates In addition to blocking of rows for a query result set, DB2 can also return multiple query blocks in response to an OPEN or FETCH request to a remote client (such as DB2 Connect). Instead of repeatedly sending requests to DB2 for z/OS requesting one block of data at a time, DB2 Connect can optionally request that the server send back some number of extra query blocks. These extra query blocks allow the client to minimize network traffic, thereby improving performance. DB2 Connect can be set up to request extra query blocks from a mainframe DB2 server by default.

Use the EXTRA BLOCKS SRV parameter on the DB2 DDF installation panel to enable support for extra query blocks. The minimum value for this parameter is 0 and the maximum is 100. Favor keeping the parameter set to the default value, which is 100.

You can also set up the EXTRA BLOCK REQ installation parameter to limit the number of extra DRDA query blocks that DB2 is to request from a remote DRDA server. This parameter does not limit the size of the SQL query answer set. It simply controls the total amount of data that can be transmitted on a network exchange. The minimum value for this parameter is 0 and the maximum is 100, which is also the default.

On the client side, you activate support on a cursor or statement basis. It is controlled by a query rowset size for a cursor, or the value of n in the OPTIMIZE FOR n ROWS clause or FETCH FIRST n ROWS ONLY clause.

Specify Character Data Types with Care When character data is accessed using DB2 Connect, the choice of CHAR versus VARCHAR will impact performance. VARCHAR is more efficient if the length varies significantly. If the size of data does not vary much, CHAR is more efficient because each VARCHAR column requires two bytes of length information to be transmitted along with the data.

Of course, DB2 Connect transmission performance is but one consideration when choosing a data type. Refer to Chapter 5 for additional considerations.

BIND the DB2 Connect Utilities DB2 Connect comes with several utilities that use embedded SQL, and therefore their programs must be bound to a database server before they can be used with that system. The list of bind files required by these utilities is contained in the following ddcsmvs.lst file for DB2 for z/OS. If you do not use the DB2 Connect utilities, you do not have to BIND their programs to each of your DB2 for z/OS database servers.

If DB2 Connect Enterprise Edition is installed, the DB2 Connect utilities must be bound to each database server once from each type of client platform, before they can be used with that system. For example, if you have 5 Windows clients and 4 Linux clients connecting to DB2 for z/OS using DB2 Connect EE, then BIND the programs in ddcsmvs.lst from one of the Windows clients, as well as from one of the Linux clients, and then from the DB2 Connect server. If all the clients are not at the same service level, you might need to BIND from each client for each particular service level.

Beware of SQLCODE and SQLSTATE Differences Different IBM relational DBMSs will not always produce the same SQLCODE values for the same or similar errors. This can be troublesome in distributed applications. There are two ways of handling this situation:

Use SQLSTATE instead of SQLCODE. The SQLSTATE values have the same basic meaning across the IBM’s DBMS products.
Use SQLCODE mapping.

DB2 Connect can map SQLCODEs and tokens from each IBM mainframe or iSeries server to your appropriate DB2 UDB system. You can specify your own SQLCODE mapping file if you want to override the default mapping or you are using a non-IBM database server that does not have SQLCODE mapping.

SQLCODE mapping is enabled by default. If you want to turn off SQLCODE mapping, specify NOMAP in the parameter string of the DCS directory or the DCE routing information object.

Assign Authids per Application Consider assigning each distributed application a dedicated authid. This approach is superior to simply using the same authid for all applications because:

Each authid can be assigned only the permissions it needs.
You can use WLM to assign each authid different priorities, as needed.
For troubleshooting, it is easier to identify the offending application.
You can measure application resource usage by authid.

Assure Appropriate Distributed Logon Authority Be sure that DB2 administrators have the required authority on the distributed platforms that interact with DB2 for z/OS. For example, a UNIX logon should be available for the performance analyst or DBA to view db2diag.log and web application logs.

In addition, the logon should have sufficient authority to execute appropriate commands as needed. For example, iostat and vmstat are useful commands that may need to be issued.

Similar considerations should be made for Windows servers.

Monday, February 16, 2015

Choosing Between DECIMAL and FLOAT Data Types

DB2 can use both DECIMAL and FLOAT data types to store non-integer numeric data. But the two are not equivalent. In general, use DECIMAL instead of FLOAT whenever you can. The main problem with floating point numbers is that they are not precise. DECIMAL values are precise. In other words, a FLOAT value will be an approximate value whereas a DECIMAL value will be an exact value.

At times, if 100 percent precision is not required, you might want to use floating point numbers to save on storage. DB2 provides a parameter to size the floating point column; n where the data type specification is FLOAT(n). If n is between 1 and 21, this is a single precision floating point number and the column will require 4 bytes of storage; if n is between 22 and 53, this is a double precision floating point number and it will require 8 bytes to store it. A single precision floating-point number is a short (32 bits) floating-point number. A double precision floating-point number is a long (64 bits) floating-point number.

For DECIMAL columns, the byte count is calculated as INTEGER(p/2)+1; where p is the precision of the DECIMAL column. So, a DECIMAL(10,2) column will require (10/2)+1 bytes = 6 bytes. An approximation of the same number could be stored in a FLOAT(21) column that would require only 4 bytes of storage.

For very large or very small numbers, though, you will have to use FLOAT columns. This is so because there is a limit of 31 on decimal precision. The following outlines the largest and smallest values that can be supported using DECIMAL and FLOAT data types:

Smallest FLOAT value is about -7.2**75
Largest FLOAT value is about 7.2**75
Smallest positive FLOAT value is about 5.4**-79
Largest negative FLOAT value is about -5.4**-79
Smallest DECIMAL value is 1 – 10**31
Largest DECIMAL value is 10**31 - 1

Note that the values for floating point numbers are approximations. The ** is used to indicate “raised to the power of”.

If you are moving the between platforms, there is an additional concern when using FLOAT. Mainframes use an IBM standard whereas other platforms use different standards. Since floating point numbers are imprecise to begin with this may not be a problem. However, if you want to make sure that a particular column will be exactly the same value regardless of platform, then floating point is not the way to go.

There is a third, newer option called DECFLOAT. Introduced in DB2 9 for z/OS, DECFLOAT is a combination of the two data types discussed here, or a decimal floating-point data type. Specified as DECFLOAT(n), where the value of n can be either 16 or or 34, representing the number of significant digits that can be stored. If the n is not specified, then the DECFLOAT column can represent 34 significant digits.

A decimal floating-point value is an IEEE 754r number with a decimal point. The maximum precision is 34 digits and the range of a DECFLOAT number is as follows:

A DECFLOAT(16) value can range from a low of:

-9.999999999999999×10**384

to a high of:

9.999999999999999×10**384

And a DECFLOAT(34) value can range from a low of:

-9.999999999999999999999999999999999 ×10**6144

to a high of:

9.999999999999999999999999999999999 ×10**6144

In addition, the DECFLOAT data type can be used to represent several special values that represent "non-number numbers," as follows:

Infinity - a value that represents a number whose magnitude is infinitely large.
Quiet NaN - a value that represents undefined results which does not cause an invalid number condition. NaN is not a number.
Signaling NaN - a value that represents undefined results which will cause an invalid number condition if used in any numerical operation.

So decimal-floating point values can be more flexible and precise with the ability to range lower and higher than floating point values (or decimal values). However, before using DECFLOAT be careful and plan ahead. If you use COBOL programs to operate on your DB2 for z/OS data because there is no way to specify the SQL DECFLOAT data type in COBOL.

Monday, February 09, 2015

Nulls Cannot Be Ignored!

NULLs are one of the more controversial things that a DB2 professional has to deal with. And I do mean HAS to deal with.

Because of all the controversy about their usefulness, implementation, and confusing nature, some DB2 folks just decide to hide their head in the sand and ignore NULLs. The thinking goes something like this: “If I do not create any nullable columns in any of my DB2 tables, then I can blissfully ignore the whole NULL mess and be happy!”

Well, that is simply not true. You can follow this approach and still write a query that will return NULL. Don’t believe it? Run this query then:

SELECT AVG(PRSTAFF)

FROM DSN8B10.PROJ

WHERE DEPTNO = ˈXOXˈ;

PRSTAFF is defined as a DECIMAL(5,2) column and we are trying to find the average staffing for a specific department. The PRSTAFF column can contain nulls, but even if it could not, the result here would be the same. This query returns a NULL (unless someone inserted a row with the value of ‘XOX’ for DEPTNO at your site).

Why? There is no department ‘XOX’ in the sample databases (provided with DB2). So, the query is attempting to find an average for the empty set. This is NULL. Some folks think this query should return 0, but it won't! The sum of all PRSTAFF for the department ‘XOX’ is not zero, but is unknown…

So, take the time to understand how NULLs work in DB2, because they cannot be ignored!

Wednesday, February 04, 2015

Helping Out the DB2 Optimizer Using the VOLATILE Keyword

Do you know about the VOLATILE keyword? This keyword was added to DB2 for z/OS back in Version 8. It can be specified on a table using either the CREATE TABLE or ALTER TABLE statement.

By specifying VOLATILE, you are indicating that the volume of data in the table is not stable and is likely to fluctuate. In other words, it is volatile! One common scenario where VOLATILE will be helpful is for tables that are emptied nightly and then repopulated the next day, such as an input queue.

When you specify the VOLATILE keyword on a table, BIND will favor using indexed access paths, even if the table was empty when RUNSTATS was run. It is ideal for single-index tables where you want DB2 to favor using the index.

ERP environments, such as SAP and Peoplesoft, with thousands of tables typically have some tables that meet these criteria. Even worse, it is not uncommon for DBAs to have no idea of the actual content or use for many of those thousands of tables generated by the ERP installation. At times, some of the ERP tables are not in use – depending on which modules of the ERP system you implement… but the tables get created anyway. Many DBAs simply maintain all of the tables provided with the ERP system, whether they are used or not, including running image copies and gathering RUNSTATS for them... and many are empty tables.

Collecting statistics on an empty table populates the catalog with stats indicating that the table contains no data. And, of course, when access paths are generated using those statistics DB2 will probably favor a scan because the table is small (how much smaller can you get than empty?) But some of those tables are volatile, going from empty to perhaps hundreds of thousands of rows during processing.

Of course, if the table is actually empty (or contains only a small amount of data), and VOLATILE is specified, DB2 will favor the use an index if one exists, which can degrade performance a bit. But that is a smaller price to pay than scanning thousands of rows, isn't it?

So one approach is to use the VOLATILE keyword for these types of tables... your users will be glad that you did.

Tuesday, January 20, 2015

Approaches to DB2 Access Path Management

BIND and REBIND are crucially important components as you attempt to assure efficient DB2 applications. Because the BIND/REBIND process determines exactly how your DB2 data is accessed it is important that you develop an appropriate strategy for when and how to REBIND your programs.

There are several common REBIND approaches taken by DB2 users. By far, the best approach is to REBIND your applications over time as the data changes. This approach involves some form of regular maintenance that keeps DB2 statistics up to date and formulates new access paths as data volumes and patterns change.

Other approaches include REBINDing only when a new version of DB2 is installed, or perhaps more ambitious, whenever new PTFs are applied to DB2. Another approach is to REBIND automatically after a regular period of time (days, weeks, months, etc.). This approach can work if the period of time is wisely chosen based on the application data – but it still can pose administrative issues.

The final approach can be summarized as “if it ain’t broke don’t fix it!” This is the worst of the several approaches discussed here. The biggest problem with this approach is that you are penalizing every program in your subsystem for fear that a program or two may have a degraded access path. This results in potentially many programs having sub-optimal performance because the optimizer never gets a chance to create better access paths as the data changes.

Of course, the possibility of degraded performance is real – and that is why this approach has been adopted at some sites. The problem is being able to find which statements have degraded. In an ideal world we would be to be able to review the access path changes beforehand to determine if they are better or worse. But DB2 itself does not provide any systematic method of administering access paths that way. There are third party tools that can help you achieve this though.

Anyway, let’s go back to the best approach again, and that is to REBIND on a regular basis as your data changes. This approach has become known as the Three Rs. To implement this approach you:

Regularly reorganize the data to ensure that it is optimally structured.
Follow that with RUNSTATS to be sure that the reorganized state of the data is reflected in the DB2 Catalog.
And follow that with a REBIND for all the application programs that access the data structures impacted by the REORG and RUNSTATS.

At any rate, your goal should be to keep your access paths up-to-date with the current state of your data. Failing to do this means that DB2 is accessing data based upon false assumptions. DB2 is unlikely to make the same access path choice as your data grows – and as patterns within the data change.

By REBINDing you can generally improve the overall performance of your applications because the access paths will be better designed based on an accurate view of the data. Additionally, as DB2 changes are introduced (PTFs, new version/release) optimizer improvements and new access techniques can be incorporated into the access paths. That is, if you never REBIND, not only are you forgoing better access paths due to data changes but you are also forgoing better access paths due to changes to DB2 itself.

Adopting the Three R’s approach can pose additional questions. For example, when should you reorganize? In order to properly determine when a REORG is needed you’ll have to look at statistics. This means looking at either RUNSTATS or Real-Time Statistics (RTS). So, perhaps it should be at least 4 R’s – in other words:

RTS (or RUNSTATS)
REORG
RUNSTATS
REBIND

Now it is true that some folks don’t rely on statistics to schedule a REORG. Instead, they just build the JCL to REORG their database objects when they create the object. So they create a table space then build the REORG job and schedule it to run monthly, or quarterly, or on some regular basis. This is better than no REORG at all, but it is probably not the best approach because you are most likely either reorganizing too soon (in which case you waste the CPU cycles to do the REORG) or you are reorganizing too late (in which case performance is suffering for a period of time before the REORG runs). Better to base your REORGs off of statistics and thresholds using either RUNSTATS or RTS.

Without accurate statistics there is little hope that the optimizer will formulate the best access path to retrieve your data. If the optimizer does not have accurate information on the size, organization, and particulars of your data then it will be creating access paths based on either default or inaccurate statistics. Incorrect statistics will cause bad choices to be made – such as choosing a merge-scan join when a nested loop join would be better, or failure to invoke sequential prefetch, or using the wrong index – or no index at all. And the problem of inaccurate statistics is pervasive. There are shops out there that never, or rarely, run RUNSTATS to gather up-to-date statistics. Make sure yours is not one of those shops!

When should you run RUNSTATS? One answer is “As frequently as possible based on how often your data changes.” To do this you will need to know a thing or two about your data growth patterns: what is its make-up, how is it used, how fast does it grow, and how often does it change? These patterns will differ for every table space in your system.

Next we need to decide when to REBIND? The best answer for this is when statistics have changed significantly enough to change access paths. When we know that data has significantly changed it makes sense to REBIND after the RUNSTATS completes. But the trick is determining exactly when we have a “significant” change in our data. Without an automated method of comparing and contrasting statistics (or even better yet, access paths) coming up with an answer in a manual way can be time-consuming and error-prone – especially if we have thousands of DB2 programs to manage.

As we REBIND, we always must be on alert for rogue access paths. A rogue access path is created when the optimizer formulates a new access path that performs worse than the previous access path. This can happen for a variety of reasons. Of course, number one is that the optimizer, though good, is not perfect. So mistakes can happen. Other factors can cause degraded access paths, too. The access paths for volatile tables depend on when you run the RUNSTATS. Volatile tables are those that start out empty, get rows added to them during processing, and are emptied out at the end of the day. And, of course, if the catalog or statistics are not accurate we can get problems, too.

So adopting the Four Rs approach implies that you will have to develop a methodology for reviewing your access paths and taking care of any “potential” problem access paths. Indeed, the Four Rs becomes The Five Rs as we add a step to review the access paths after REBINDing to make sure that there are no rogue access paths:

Start with Real Time Stats (or RUNSTATS) to determine when to REORG.
Then we Reorganize the needy table spaces (and indexes)
After reorganizing, run RUNSTATS to pick up current statistics,
Follow that with the BEBINDs.
Then we need that fifth R – which is to Review the access paths generated by the REBIND.

The review is of utmost importance because the optimizer can make mistakes. And, of course, so can you. But your users will not call you when performance is better (or the same). They only dial your numbers when performance gets worse. As such, proactive shops will put best practices in place to test REBIND results comparing the before and after impact of the optimizer’s choices.

Wednesday, January 14, 2015

IBM Announces New Mainframes

IBM today announced their latest model mainframe, the IBM z13.

Everybody knows -- or should know -- that the mainframe is the work horse of the IT industry, but the new z13 boasts some amazing capabilities. The z13 can process 2.5 billion transactions a day. And those transactions are persistent, protected and auditable end-to-end! Furthermore, the z13 delivers practical real-time encryption based on 500 new patents, including advanced cryptographic encryption technologies. And the z13 comes with embedded analytics capabilities.

“Every time a consumer makes a purchase or hits refresh on a smart phone, it can create a cascade of events on the back end of the computing environment. The z13 is designed to handle billions of transactions for the mobile economy. Only the IBM mainframe can put the power of the world's most secure data centers in the palm of your hand," said Tom Rosamilia, senior vice president, IBM Systems.

To put this amount of power into context, IBM notes that a single z13 mainframe can porcess more than 30,000 transactions per second, which is roughly the same as 100 Cyber Mondays every day!

IBM has designed the z13 to integrate real-time scoring and guarantees this capability as a feature of the system. This scoring can be used for fraud detection on 100 percent of a client's business transactions. To take advantage of this capability customers must be running the latest versions of DB2 and SPSS Scoring Adapter on z13 machine, as well as following IBM's capacity planning implementation instructions and processes.

For the long-time mainframe zealots out there, of which I am one, it always gives me a warm feeling inside when IBM comes out with a new line of mainframes. The mainframe always leads the IT industry in terms of processing power and capabilities... and it looks like it will continue to do so well into the future...

Friday, January 09, 2015

New Skills Required of Internet DBAs

When DBAs transition from supporting internal applications only, to Internet-exposed applications, there are a lot of additional skills required to ensure success. Some of these skills are obvious, like the need to understand the protocols of the Internet including HTTP and FTP and how data is transferred across the Internet. Others, are more business-focused, such as the impact of clients accessing your data over the Web from anywhere at any time. Still others are bit more esoteric, like knowledge of the basic tools used for web development including CGI, SSL (Secure Sockets Layer), and how URLs are structured. You should also bone up on Web-development methods and technologies like JSON (JavaScript Object Notation), XML, and the like.

If Java is used, the DBA must develop a working knowledge of the language and how databases are accessed using SQL with Java. This is needed for design, development, and tuning of Java-based web-enabled database applications.

There are several additional skills that will make you a better Internet-using-DBA that should top the list of things to learn before your try to support Web-enabled database applications as a DBA. Compliance and security should top that list. Learn about SQL injection attacks and how to prevent them. Learn about protecting your data using all of the capabilities of your DBMS including trusted context, data masking, row permissions, views, and label-based access control. Learn about the encryption capabilities of your DBMS, O/S and hardware -- and what encrypting data might mean regarding efficient data access. Also, knowledge of Internet security technologies like SSL, firewalls, and network/OS security will prove to be useful.

You must also obtain knowledge on the connections from DB2 to the Internet. This requires knowing how connections are configured and how TCP/IP is setup, configured, and administered.

Another good idea is to gain a working knowledge of the operating system for the server on which your Web server is running. This facilitates easier monitoring and tuning for performance problems. If you are a z/OS DB2 DBA, this will likely require you to get out of your comfort zone and dig into Linux or some variant of Unix.

It can also be worthwhile to develop a better understanding of highly available RAID storage technologies in use for most 24x7 Internet applications.

Of course, having a clear understanding of the business functionality that is being achieved via the web-enabled databases in your organization is also vitally important. Only by understanding the business impact of database downtime can the appropriate administrative techniques be deployed to maintain constant availability.

I don't imagine that this is a 100 percent complete list of thing-you-need-to-know, but it is probably a good place to start your learning adventure. Let me know what I might have missed by adding your comments below!

And if you already support Internet access to your databases and you don't feel up-to-date on these items, then it is definitely time to get cracking!

Friday, December 19, 2014

Season's Greetings

Seasons Greetings

And a hearty "Hello!" to everybody out there... just a short post today to wish everybody a very happy holiday season.

This is the time of year when work starts winding down somewhat and people start to spend time with their families. Whether it is Christmas, Hannukah, Kwanza, or just celebrating the end of one year and the beginning of another, may you spend it peacefully, happily and among those you love.

This will be the final post of the year for this blog, but be sure to join me again next year - 2015 - as we continue to examine all aspects of everybody's favorite DBMS... IBM's DB2...

Monday, December 15, 2014

The Wizard of Userville and the DB2 Developer's Guide

Once upon a time there was a kingdom called Userville. The people in the kingdom were impatient and wanted to know everything about everything—they could never get enough information. Life was difficult and the people were unhappy because data was often lost, and even when it was available, it was often inaccurate and not easy to access.

The King decided to purchase DB2, an advanced tool for storing and retrieving data. With DB2 the Users could process their data and turn it into information. “This,” he thought, “should keep the people happy. DB2 will solve all my problems.” But he soon found out that special knowledge was necessary to make DB2 work its wonders. Nobody in Userville knew how to use it properly.

Luckily, a grand Wizard living in a nearby kingdom knew many mystical secrets for retrieving data. These secrets were a form of magic called SQL. The King of Userville summoned the Wizard, offering him many great treasures if only he would help the poor Users in Userville.

The Wizard soon arrived, determined to please. Armed with nothing more than SQL and a smile, the Wizard strode to the terminal and uttered the magic words:

SELECT E.EMPNO, E.FIRSTNME, E.LASTNAME, D.DEPTNO, D.DEPTNAME

FROM DSN81010.DEPT D,

DSN81010.EMP E

WHERE E.WORKDEPT = D.DEPTNO;

A crowd gathered and applauded as the desired information began pumping out of the terminal. “More, more,” shouted the data-starved masses. The Wizard gazed into the screen, and with amazing speed effortlessly produced report after report. The King was overheard to say, “You know, this is just too good to be true!” Everybody was happy. The Users had their share of information, the King had a peaceful kingdom, and the Wizard had his treasures and the respect of the Users.

For many months, the Users were satisfied with the magic of the great Wizard. Then, one day, the Wizard disappeared…in a jet to the West Coast for 150 grand a year—and a bunch of stock options. The people of the kingdom began to worry. “How will we survive without the magic of the Wizard? Will we have to live, once again, without our precious information?” The Wizard’s apprentice tried to silence the crowd by using his magic, but it wasn’t the same. The information was still there, but it wasn’t coming fast enough or as effortlessly. The apprentice was not yet as skilled as the great Wizard who had abandoned the kingdom. But, as luck would have it, one day he stumbled upon the great Wizard’s diary. He quickly absorbed every page and soon was invoking the Wizard’s magic words. And all was well again.

Well, life is not always that simple. Departing Wizards do not often leave behind documentation of their secrets. But...

DB2 Developer's Guide

Many of you who have purchased my book, DB2 Developer's Guide, will recognize the story recounted above because it starts the book off in Chapter 1. The idea being that the rest of the book is the Wizard's guide to DB2 for z/OS...

If you use DB2 for z/OS for a living and you have never read DB2 Developer's Guide, maybe it is time to treat yourself to an early present for the holidays? The book comprises more than 1500 pages of in-depth DB2 knowledge and information. Over the course of 46 chapters DB2 Developer's Guide covers:

SQL Techniques, Tips, and Tricks
DB2 Application Development
DB2 In-Depth (an under the covers look)
DB2 Perfromance Monitoring
DB2 Utilities and Commands
DB2 Tools and Organizational Issues
Distributed DB2
and much, much more

The book has been in print for more than 20 years now and has been published in 6 different editions over that span. The current edition is the 6th edition published by IBM Press.

So continue and take the next step toward becoming a DB2 Wizard by getting your own copy today!

Wednesday, December 10, 2014

An Extra DBA Rule of Thumb

Last year on the blog I posted a series of 12 DBA Rules of Thumb. As a quick reminder, these Rules of Thumb - or ROTS, are some general rules of the road that apply to the management discipline of Database Administration that I have collected over the years. These ROTs are broadly applicable to all DBAs, even though this is a DB2-focused blog.

Please click on the link in the paragraph above if you need a refresher on the DBA ROTs from last year.

The purpose of today's blog post is to suggest an additional Rule of Thumb... and that is to Diversify! A good DBA is a Jack-of-All-Trades.

You can’t just master one thing and be successful in this day-and-age. The DBA maintains production, QA and test environments, monitors application development projects, attends strategy and design meetings, selects and evaluates new products, and connects legacy systems to the Web.

And if all of that is not enough, to add to the chaos, DBAs are expected to know everything about everything. From technical and business jargon to the latest management and technology fads and trends, the DBA is expected to be “in the know.” For example, the DBA must be up on trends like Big Data and Analytics.

And do not expect any private time: A DBA must be prepared for interruptions at any time to answer any type of question… and not just about databases, either.

When application problems occur, the database environment is frequently the first thing blamed. The database is “guilty until proven innocent.” And the DBA is expected to be there to fix things. That means the DBA is often forced to prove that the database is not the source of the problem. The DBA must know enough about all aspects of IT to track down errors and exonerate the DBMS and database structures he has designed. So he must be an expert in database technology, but also have semi-expert knowledge of the IT components with which the DBMS interacts: application programming languages, operating systems, network protocols and products, transaction processors, every type of computer hardware imaginable, and more. The need to understand such diverse elements makes the DBA a very valuable resource. It also makes the job interesting and challenging.

To summarize, the DBA must be a Jack-of-all-Trades... and a master of several!!!

Tuesday, December 02, 2014

DSN1COPY Improvements in DB2 11 for z/OS

There have been some nice data validation improvements made to the IBM DSN1COPY utility in DB2 11 for z/OS. I suppose I should first explain what the DSN1COPY utility is before I talk about how it has been improved, so...

DSN1COPY is also known as the "Offline Copy utility." It has many uses. Of course, the primary use case for DSN1COPY is to copy data sets without DB2 having to be up and running. DSN1COPY can be used to copy VSAM data sets to sequential data sets, and vice versa. It also can copy VSAM data sets to other VSAM data sets and can copy sequential data sets to other sequential data sets. As such, DSN1COPY can be used to:

Create a sequential data set copy of a DB2 table space or index data set.
Create a sequential data set copy of another sequential data set copy produced by DSN1COPY.
Create a sequential data set copy of an image copy data set produced using the DB2 COPY utility, except for segmented table spaces.
Restore a DB2 table space or index using a sequential data set produced by DSN1COPY.
Restore a DB2 table space using a full image copy data set produced using the DB2 COPY utility.
Move DB2 data sets from one disk to another.
Move a DB2 table space or index space from a smaller data set to a larger data set to eliminate extents. Or move a DB2 table space or index space from a larger data set to a smaller data set to eliminate wasted space.

DSN1COPY runs as a batch job, so it can run as an offline utility when the DB2 subsystem is inactive. It can run also when the DB2 subsystem is active, but the objects it operates on should be stopped to ensure that DSN1COPY creates valid output. DSN1COPY does not check to see whether an object is stopped before carrying out its task. DSN1COPY does not directly communicate with DB2.

DSN1COPY performs a page-by-page copy. Therefore, you cannot use DSN1COPY to alter the structure of DB2 data sets. For example, you cannot copy a partitioned table space into a segmented table space.

Perhaps the nicest feature of DSN1COPY is its ability to modify the internal object identifier stored in DB2 table space and index data sets, as well as in data sets produced by DSN1COPY and the DB2 COPY utility. When you specify the OBIDXLAT option, DSN1COPY reads a data set specified by the SYSXLAT DD statement. This data set lists source and target DBIDs, PSIDs or ISOBIDs, and OBIDs, thereby enabling you to modify these IDs accordingly (possibly for moving data from one subsystem to another).

You can also use DSN1COPY to check the validity of table space and index pages.

OK Then, But What's New?

So now that we understand the DSN1COPY utility, let's dig in to learn a little bit about how it has been improved in DB2 11 for z/OS. Basically, DB2 11 bring improved data validation to the DSN1COPY utility.

In DB2 11, the target data set produce by DSN1COPY is automatically validated after it is populated. The first time that the target data set is physically opened by an operation other than a utility, DB2 checks for inconsistencies in the data and the DB2 Catalog. The validation performed includes checking:

DBID, PSID, and OBID
SEGSIZE and PAGESIZE
Table space type
Table schema (if the table space contains only one table)

If inconsistencies are found, DB2 throws a -904 SQLCODE and reports the issue. You can then use the REPAIR utility to remediate the reported issues. In past releases, validation did not occur immediately, which could have resulted in data corruption issues, storage overlays, and even ABENDs.

Summary

So you can rest easier knowing that DSN1COPY data is checked after it is created, thereby removing a lot of the chance for calamity if you ran the utility improperly... and that's a good thing!

Monday, November 24, 2014

Peace, Prosperity and Happy Thanksgiving

Just a quick blog post today to wish all of my readers, wherever they live, peace and prosperity during this holiday season.

And to my readers in the USA, Happy Thanksgiving. You know what that means, right? Relatives, football, and a lot of food, including turkey!

It also means that we are about to "officially" start the holiday shopping season, which begins on Black Friday. Of course, anyone who has been out in the mall, or to any store really, knows that the holiday season started right after Halloween. But for most people it begins this week!

So, Happy Thanksgiving to the US folks... and peace, happiness and warm thoughts to all of you everywhere this week.

Friday, November 14, 2014

Database Basics for Beginners

Every now and then I get e-mail from readers of my blogs asking introductory/beginner questions about databases and DBMS. I cannot take the time to answer all of these e-mails in-depth, so I thought I blog a quick post with some good resources for folks.

I think a good place to start is an article I wrote several years ago now titled What is a Database? This article breaks down the benefits of a database, outlines the difference between a database and a DBMS, and provides some guidance for further reading (suggested books).

Other questions I am get asked frequently involve database administration. One is: What Does a DBA Do? Follow the link to find my answer to that question. Another is: how can I become a DBA? I wrote an article titled How to Become a DBA to answer that one. And finally, another frequent topic is: How many DBAs do I need? That is a tricky one, but I propose a framework to help answer that question in an article titled DBA Staffing Considerations.

I also get a lot of DB2 for z/OS questions. And I've written a book on that topic, plus a bunch of DB2 articles, too (all of which can be found here).

So I guess what I am saying here is to take a look at what is already "out there" to see if your questions can be answered on the web. But, please, keep the questions coming. If I do not answer your e-mail do not be discouraged. I do read most of them (unless it gets caught in my spam collector). Even if I do not have the time to respond, I keep track of what is asked and use it as input into my writing process... so you may see an answer pop up online in a blog, article, or column I write... eventually.

Tuesday, November 11, 2014

On Building Appropriate DB2 Indexes

Perhaps the single most important thing that can be done to assure optimal DB2 application performance is creating correct indexes for your tables based on the queries used by your applications. Of course, this is easier said than done. But we can start with some basics. For example, consider the following SQL statement:

  SELECT  LASTNAME, SALARY
FROM EMP
WHERE   EMPNO = '000010'
AND     DEPTNO = 'D01';

What index or indexes would make sense for this simple query? The short answer is “it depends.” But the more important answer is to understand what it depends upon! First, think about all of the possible indexes that could be created. Your first short list probably looks something like this:

Index1 on EMPNO
Index2 on DEPTNO
Index3 on EMPNO and DEPTNO
Index4 on DEPTNO and EMPNO

This is a good start and one of either Index3 or Index4 is probably the best. Either allows DB2 to use the index to immediately lookup the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table you might want to examine the impact of creating yet another index on the table. Factors to consider include:

Modification impact: DB2 will automatically maintain every index that you create. This means that every INSERT and every DELETE to this table will cause data to be inserted and deleted not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, the index will also be updated. So, indexes speed the process of retrieval but slow down modification.
Importance of this particular query: The more important the query the more you may want to tune by index creation. For example, if you are coding a query that will be run every day by the CIO, you will want to make sure that it performs optimally. Who wants to risk a call from the CIO complaining about performance? So building indexes for that particular query is very important. On the other hand, a query for a low-level clerk may not necessarily be weighted as high, so that query may have to make due with the indexes that already exist. Of course, the decision depend on the importance of the application to the business – not just on the importance of the user of the application.
Columns in the existing indexes: If an index already exists on EMPNO or DEPTNO it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always because the order of the columns in the index can make a big difference depending on the query. For example, consider the following query:

SELECT LASTNAME, SALARY

FROM EMP

WHERE EMPNO = '000010'

AND DEPTNO > 'D01';

In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than. Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO) DB2 potentially can use them both to satisfy this query so creating another index may not be necessary.

One final thought for today, and that is to build your indexes based on workload, not object by object. Many people make the mistake of just guessing as some indexes as they create tables for new projects. And sometimes this cannot be avoided because the SQL typically is not known before the database is created. But some of the guesses -- or maybe many of them -- are likely to be suboptimal at best, wrong at worst.

Indexes should be built to optimize access to data via your SQL queries. (Of course, there are indexes required to support RI and uniqueness, but let's leave them out of the discussion for the moment.) To properly create a set of indexes requires a list of the SQL to be used, an estimate of the amount of data in the table (and an estimate of column values if possible), an estimate of the frequency that each SQL statement will be executed, and the relative importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries (most of the time) be attempted. If you are doing it any other way, you are doing it wrong.

Of course, there is much more to index design than we have covered so far. For example, you might consider index overloading to achieve index only access. If all of the data that a SQL query asks for is contained in the index, DB2 may be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY given information about EMPNO and DEPTNO. And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well then we never need to access the EMP table because all of the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.

Keep in mind, though, that it is not prudent (or even possible) to make every query an index only access. This technique should be saved for particularly troublesome or important SQL statements.

Monday, November 03, 2014

Removing Superfluous Spaces

We all can relate to dealing with systems that have data integrity problems. But some data integrity problems can be cleaned up using a touch of SQL. Consider the common data entry problem of extraneous spaces (or blanks) inserted into a name field. Not only is it annoying, sometimes it can cause the system to ignore relationships between data elements because the names do not match. For example, “Craig Mullins” is not equivalent to “Craig Mullins”; the first one has two spaces between the first and last name whereas the second one only has one.

You can write an SQL UPDATE statement to clean up the problem, if you know how to use the REPLACE function. REPLACE does what it sounds like it would do: it reviews a source string and replaces all occurrences of a one string with another. For example, to replace all occurrences of Z with A in the string BZNZNZ you would code:

REPLACE(‘BZNZNZ’,’Z’,’A’)

And the result would be BANANA. So, let’s create a SQL statement using REPLACE to get rid of any unwanted spaces in the NAME column of our EMPLOYEE table:

UPDATE EMPLOYEE

SET NAME = REPLACE(

REPLACE(

REPLACE(NAME, SPACE(1), '<>')

'><', SPACE(0))

'<>', SPACE(1));

"Wait-a-minute," you might be saying. "What are all of those left and right carats and why do I need them?"

OK, fair enough. Let’s explain how this works starting from the inside out. The inside REPLACE statement takes the NAME column and converts every occurrence of a single space into a left/right carat. The next REPLACE (working outward), takes the string we just created, and removes every occurrence of a right/left carat combination by replacing it with a zero length string. The final REPLACE function takes that string and replaces any left/right carats with a single space. The reversal of the carats is the key to removing all spaces except one – remember, we want to retain a single space anywhere there was a single space as well as anywhere that had multiple spaces. Try it, it works.

Of course, you can use any two characters you like, but the left and right carat characters work well visually. Be sure that you do not choose to use characters that occur naturally in the string that you are acting upon.

Finally, the SPACE function was used for clarity. You could have used strings encased in single quotes, but the SPACE function is easier to read. It simply returns a string of spaces the length of which is specified as the integer argument. So, SPACE(12) would return a string of twelve spaces.

Thursday, October 23, 2014

DB2 SQL and Application Performance Tools

So far in this series of blog posts on DB2 performance tools, we have looked at system and database performance solutions. But perhaps the most important solution area involves monitoring and tuning application SQL statements.

Why do I say that? Well, the cause of most performance problems is usually due to bad SQL and application code. Not every problem, of course, but maybe as much as 70 to 80 percent of DB2 (and relational) performance issues is likely due to inefficient application code.

Writing SQL statements to access database tables is the responsibility of an application development team. However, the DBA usually gets involved when it comes to the performance of SQL. With SQL’s flexibility, the same request can be made in different ways. Because many of these methods are inefficient, application performance can fluctuate wildly unless the SQL is analyzed and tuned by an expert prior to implementation.

The EXPLAIN command provide information about the access paths used by SQL queries by parsing SQL in application programs and placing encoded output into a PLAN_TABLE or by producing a standard access path report. To gauge efficiency, a DBA must decode this data and determine if a more efficient access path is available.

SQL code reviews are required to ensure that optimal SQL design techniques are used. An application design walkthrough should be performed for each program before it moves to production. This is done to review all SQL statements, the selected access paths, and the program code in which the SQL is embedded. The review also includes an evaluation of database statistical information to ascertain whether production-level statistics were used at the time of the EXPLAIN.

A line-by-line review of application source code and EXPLAIN output is tedious and prone to error, and can cause application backlogs. SQL analysis tools greatly simplify this process by automating major portions of the code review process. The SQL analysis tool typically

Analyzes the SQL in an application program, describing the access paths chosen in a graphic format, an English description, or both.
Issues warnings when specific SQL constructs are encountered. For example, each time a sort is requested (by ORDER BY, GROUP BY, or DISTINCT), a message informs the user of the requisite sort.
Suggests alternative SQL solutions based on an “expert system” that reads SQL statements and their corresponding PLAN_TABLE entries and poses alternative SQL options.
Extends the rules used by the “expert system” to capture site-specific rules.
Analyzes at the subsystem, instance, server, application, plan, package, or SQL statement level.
Stores multiple versions of EXPLAIN output, creates performance comparisons, and plans history reports.

Tools that analyze the performance of the application code in which the SQL is embedded are available too. These tools usually capture in-depth information about programs as they are run and provide reports that specify which areas of the code consume the most resources. Unfortunately, most of these tools do not necessarily interface to SQL analysis tools. Why might this be a problem?

Well, consider an application program that contains a singleton SELECT inside a loop. The singleton SELECT requests a single row based on a WHERE clause, checking for the primary key of that table. For each iteration of the loop, the program changes the primary key value being searched such that the entire table is read from the lowest key value to the highest key value.

SQL analysis tools will probably not target the SQL statement as inefficient because the predicate value is for the primary key, which should invoke indexed access. The application program analysis tool may flag the section of the code that accesses the data as inefficient, but it will not help you to fix it or tell you why it is inefficient.

A knowledgeable performance analyst or DBA would have to use both tools and interpret the output of each to arrive at a satisfactory conclusion. For example, it could be more efficient to code a cursor, without a predicate, to retrieve every row of the table, and then fetch each row one by one. This method would eliminate index I/O, might use parallel access, and therefore should reduce I/O and elapsed time—thereby enhancing performance.

Only a trained analyst can catch this type of design problem during a code walkthrough. Although a plan analysis tool significantly reduces the effort involved in the code review process, it cannot eliminate it.

So what should you look for in an SQL analysis tool? The first feature required of SQL analysis tools is the ability to read and interpret standard EXPLAIN or SHOW PLAN output. The tool should be able to read the plan table or interface directly with the DBMS command to obtain the output. It then must be able to automatically scan the EXPLAIN or SHOW PLAN data and report on the selected access paths and the predicted performance. Advanced tools will provide recommendations for improving the SQL by adding indexes or modifying the SQL.

Yet another category of tool can evaluate access paths as you REBIND programs and categorize them into changed and unchanged access paths. This helps to identify where SQL tuning may be required. Advanced forms of these tools also apply rules tot he changed SQL to indicate if the access path is better or worse than the prior access path. Such tools can be incredibly helpful for performing mass rebinds of your production programs.

SQL Monitors

An SQL monitoring solution can identify running SQL statements, filter the information, and display it in an appropriate order and configuration. For example, you can use an SQL monitor to identify the Top Ten CPU users over the past hour (or the past day, week, etc.)

Usually, there is the on-line capability, that displays what is happening right now, and the historical capability, which can display details and trends over time.

An SQL monitor is particularly helpful when working to remediate production performance issues where hundreds or thousands (or more) of SQL statements can be running at any one time.

End-to-End Performance Tools

Modern applications require multiple system components and run across multiple networked devices to deliver functionality. When performance problems arise, it can be difficult to determine what, exactly, is causing the problem. Is it on the client or the server? Is it a networking problem? Is it a database issue or a code problem?

End-to-end performance monitoring tools exist that track an application request from initiation to completion. These solutions provide enhanced visibility specifically into application performance—giving organizations the power to understand both when and why performance has degraded, and the information needed to improve matters in a business-prioritized fashion.

By following the workload as it progresses across multiple pieces of hardware and software, problem determination becomes possible.

Workload Testing and Estimation

Another category of SQL performance tool allows you to identify a workload consisting of programs and transactions that are to be run during a specific timeframe. The tools help to identify performance issues that crop up only when the application is running at a production volume.

Data Studio

Finally, no overview of application performance tools for DB2 would be complete without a brief mention of IBM's Data Studio. Data Studio is a free-of-charge tool for basic DB2 administration and development tasks. Data Studio offers an easy to use GUI interface for the following:

Designing data access queries and routines
Building, running, and tuning SQL
Building, testing, and deploying stored procedures (using SQL or Java)
Creating Web services in for Service Oriented Architecture (SOA) solutions
Developing DB2 SQLJ applications
Managing database objects and authorizations

You can download Data Studio at IBM’s website. It is available as a stand-alone package geared mostly for DBAs, or as an IDE geared for both DBA and development work.

Of course, IBM sells other DB2 tools for a fee, some of whichcan integrate and work well with Data Studio. And there are other tools that compete with Data Studio that offer a lot more functionality than the basics provided by the free capabilities of Data Studio (such as Dell's Toad).

Summary

These past few posts have taken a broad overview look at the categories and types of performance tools available for managing the performance of your DB2 for z/OS environment. Many of the same categories of tools are available for DB2 for LUW (as well as other DBMS offerings).

Have I missed any important categories? If so, drop me a line or add a comment here to the blog. I'm always interested in getting feedback.

Thanks... and happy performance tuning!