Tuesday, May 12, 2015

A Trip Report from the 2015 IDUG DB2 Tech Conference

Last week I attended, and spoke at, the annual North American IDUG DB2 Tech Conference in Philadelphia, PA. As usual, the event was busy and chock full of useful and interesting DB2 information.

My week started on Sunday with the IBM gold consultant briefing, whereby IBM talks to the gold consultants about their upcoming plans and solicits feedback from us. I can’t really share with you what happened there, but as is almost always the case, it was time well spent.
The conference kicked off in earnest on Monday with the keynote from IBMers Tim Vincent and Namik Hrle titled “Big Data, Analytics and the Future of Data Management.” Tim and Namik discussed how the growth of data is fueling innovation causing a big change in the way value is created. Some of the key takeaways from the keynote, for me at least, were:
  • The predominant source of Big Data for most projects is still structured transactions
  • Primary focus of most data projects is usually on understanding customers
  • There is a trend toward self-service
  • Clearly there is value in Hadoop but you can't replace all your DBMSes with it!

Perhaps the most salient tidbit shared at the keynote address was this: “People have forgotten, or never truly understood, how complex data integration actually is.” I think this is true; all too often people underestimate how difficult the integration of data will be. And I agree, too, with the sentiment of the presenters, who declared that “We need to realize that data is never going to be in one place.”

The bottom line of the keynote: All of our existing DB2 skills will still be relevant but must co-exist with skills in newer technologies like NoSQL and Hadoop.

Good stuff!

Some of the other highlights of the week:
  • Attendance seemed to be higher than at the past few IDUG conferences. And I see growth in the number of DB2 for LUW attendees. IDUG, at least historically, was mostly attended by DB2 for z/OS folks. And it is probably still true that more z/OS folks attend than LUW folks, but the LUW attendance continues to grow and may surpass z/OS at some point! Of course, this is all based on my unscientific eyeballing of attendees at presentations.
  • My session on DB2 performance had a full room and nobody seemed to nod off or run for the exits while I spoke. I also delivered a VSP for Dell to a room full of DBAs and developers, as well as a couple of presentations for DataKinetics during an evening session. So that was all good!
  • I attended a couple of sessions on newer features of DB2 and how some of them are under-utilized. I think this speaks to a few trends hitting the world of DB2 these days. Number one: training is drying up. If you don’t get training on new features it is hard to use them appropriately. Number two: people fail to take advantage of free learning opportunities like webinars and freely-available PDF manuals.
  • The vendor exhibit hall was active and well-attended throughout the week. All the usual suspects were there like BMC, CA, dbi, Dell, and IBM, but there were some newer (at least to IDUG) companies, too, like DataKinetics and Imperva.
  • The educational seminars were integrated into the conference this year. And they did not cost extra to attend. That means that more people attended ed seminars this year (at least I think they did) and the conference offered more value to attendees looking for more in-depth education than can be gained by an hour session.


All in all, the 2015 North American IDUG Tech Conference was a big success. And it is not too early to start planning for next year’s conference, which will be in Austin, TX. Finally, I’ll be able to drive to an IDUG… see you there in 2016!

Tuesday, April 28, 2015

Controlling Mainframe Software Costs

Although this blog usually focuses on DB2 issues, I sometimes use it to focus on other IT issues, usually mainframe-related. The primary purpose of today's blog post is to promote a webinar I'm conducting this Thursday called Managing Your z/OS Software Bill. The webinar is sponsored by Data Kinetics, the North American distributor of a product called AutoSoftCapping (or ASC for short), which can be used to help control the rolling four hour average and thereby reduce monthly software bills.

Cost containment is of critical importance for IT departments in this day and age of financial austerity... especially so in the mainframe world.  Every decision regarding your computer resources is weighed based on not only the value that they can deliver to your organization, but upon their cost to procure, implement, and maintain. And, in most cases, if a positive return on investment cannot be calculated, the software won’t be adopted, or the hardware won’t be upgraded.

An important opportunity for mainframe cost containment is to better manage the peak monthly capacity of your mainframe on an LPAR (logical partition) by LPAR basis. The pricing model for most mainframe software is based on the capacity of the machine on which the software will run. Note that this pricing model reflects the potential usage based on the capacity of the machine, not the actual usage. Some vendors offer usage-based pricing. You should actively discuss this with your current ISVs as it is becoming more common, more accurately represents fair usage, and can save you money.
IBM offers several subcapacity pricing models for many of its popular software offerings, including products such as z/OS, DB2, IMS, CICS, MQSeries and COBOL. Some of the benefits of subcapacity pricing include the ability to:
  • Grow hardware capacity without necessarily increasing your software charges
  • Pay for key software at LPAR-level granularity
  • Experience a low cost of incremental growth
  • Manage software cost by managing workload utilization
By tracking MSU usage by LPAR you can be charged based on the maximum rolling four hour (R4H) average MSU usage, instead of full capacity. Most organizations with mainframes have shifted to some form of subcapacity pricing model, but not all of them understand how all of the "moving parts" work together. Thursday's webinar will help to clear that all up!
Managing mainframe software costs by adopting subcapacity pricing, soft capping techniques, and software like Data Kinetics' AutoSoftCapping can help your company to assure a cost-effective IT organization. In today’s cost-cutting, ROI-focused environment, doing anything less than that is short-sighted.

Tuesday, April 21, 2015

SQL Performance Basics: Part 7, Embed the SQL Directly Into Your Programs

You may recall that this is a subject I've written about before, but I think it is important enough to warrant covering briefly in this series on SQL performance basics... and that is, you should avoid black boxes if you want to optimize your applications.

So what is a black box? Simply stated, a black box is a database access routine that sits in between your application programs and DB2. Instead of coding embedded SQL in your application code, you make a call to the access routine (or black box).

The general idea here is that development is easier because programmers do not need to know how to write SQL. Instead, programmers call the black box to request data. SQL statements become calls – and every programmer knows how to code a call, right?

This approach is commonly referred to as a “black box” approach because the data access interface shields the developers from the “complexities” of SQL. The SQL is contained in that black box and programmers do not need to know how the SQL works – just how to call the black box for data. But there are a number of reasons why this approach is not sound:
  1. It is unwise to have uninformed and untrained developers writing DB2 applications without knowing the SQL behind their data access requests. What may seem like a simple request to a non-educated programmer may actually involve very complex and inefficient SQL “behind the scenes” in the black box.
  2. The black box coders will take shortcuts, like combining multiple types of SQL requests into one that will cause more data to be returned than is needed... but then send only the needed data back to the requester. This violates Part 1 of this series.
  3. The black box is an access routine to DB2 data, but SQL is already an access method to DB2 data -- another one is not needed, and is therefore superfluous.
  4. The black box will add CPU cycles to your application because it consists of extra code that is not needed if the SQL is embedded right into your application programs.
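
To make reason 2 concrete, here is a hypothetical sketch (the table and column names are illustrative only, borrowed from examples later in this series): a generic black box routine tends to run a catch-all query and let callers discard what they do not need, whereas embedded SQL can ask for exactly the data the program requires.

    -- Catch-all query buried in a black box routine
    SELECT   *
    FROM     EMP;

    -- Embedded SQL tailored to the actual request
    SELECT   LASTNAME, SALARY
    FROM     EMP
    WHERE    EMPNO = '000010';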

Suffice it to say, avoid implementing SQL black boxes... they are just not worth the effort!


If you are interested in more details on this subject, I point you to this article I wrote on the subject in my DBA Corner column for Database Trends & Applications.

Thursday, April 16, 2015

SQL Performance Basics: Part 6, Code Relationally... or Avoid the Flat-File Mentality

Another pervasive problem permeating the DB2 development community -- and indeed the development community for most relational DBMSes -- is the “flat file” development mentality. What I mean by this is that programmers try to access data in a relational database the same way that they would access data from a flat file.

DB2 is ‘relational’ in nature and, as such, operates on data a set-at-a-time, instead of the record-at-a-time processing used against flat files. In order to do justice to DB2, you need to change the way you think about accessing data.

To accomplish this, all users of DB2 need at least an overview education of relational database theory and a moderate to extensive amount of training in SQL. Without such a commitment your programmers are sure to develop ugly and inefficient database access code – and who can blame them? Programmers are used to working with files so they are just doing what comes naturally to them.

SQL is designed so that programmers specify what data is needed but they cannot specify how to retrieve it. SQL is coded without embedded data-navigational instructions. The DBMS analyzes SQL and formulates data-navigational instructions "behind the scenes.” This is foreign to the programmer who has never accessed data using SQL.

Every SQL manipulation statement operates on a table and results in another table. All operations native to SQL, therefore, are performed at a set level. One retrieval statement can return multiple rows; one modification statement can modify multiple rows. This feature of relational databases is called relational closure.

When accessing data, a programmer needs to think about what the end result should be and then code everything possible into the SQL. This means using the native features of SQL – joins and subselects and functions, etc. – instead of coding procedural host language code (whether in COBOL, C, Java or whatever) that, for example, opens up a cursor, fetches a row, and then uses a fetched value to open up another cursor. This is processing DB2 like a set of flat files... better to join the data!
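
For example, instead of opening a cursor against one table and then, for each row fetched, opening a second cursor against another table, a single join usually does the job in one statement (the table and column names here are illustrative only):

    SELECT   D.DEPT_NAME, E.LAST_NAME, E.SALARY
    FROM     DEPARTMENT D
             INNER JOIN EMPLOYEE E
                ON E.DEPT_CODE = D.DEPT_CODE
    WHERE    D.DEPT_CODE = 'D01';

One statement, one set-level request, and DB2 gets to pick the most efficient way to match the rows.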

Educating programmers how to use SQL properly -- at a high level -- is probably the single most important thing you can do to optimize performance of your DB2 applications.

Saturday, April 11, 2015

SQL Performance Basics: Part 5, Avoid Sorting

Sorting is a very costly operation that you should strive to avoid if at all possible. Indexes are very useful for this purpose. DB2 can choose to use an available index to satisfy the sorting requirements of an ORDER BY, GROUP BY, or DISTINCT clause. Therefore, it can be beneficial to create indexes that correspond to these clauses for frequently executed SQL statements.
For example, consider this SQL statement:
      SELECT    LAST_NAME, TITLE, SALARY
      FROM      EMPLOYEE
      ORDER BY  LAST_NAME, TITLE;
 
If an index on LAST_NAME and TITLE exists, DB2 can use it to avoid sorting. By retrieving the data in order from the index, sorting becomes unnecessary.
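
A minimal sketch of such an index follows (the index name is arbitrary; the key columns simply mirror the ORDER BY):

      CREATE INDEX XEMP_NAME_TITLE
          ON EMPLOYEE (LAST_NAME, TITLE);
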
You can use this information to tune your SQL further to avoid sorting. When ordering is required by only one column, sometimes adding another column to the ORDER BY can enable indexed access to avoid the sort.
Consider this SQL statement:
      SELECT    DEPT_CODE, MANAGER, DEPT_NAME
      FROM      DEPARTMENT
      ORDER BY  MANAGER, DEPT_CODE;
 
Say there is a unique index on MANAGER. DB2 will probably use this index to satisfy the request. However, a sort will be done after retrieving the data to put the results into DEPT_CODE order within MANAGER. But, since we know our data, we are able to change this situation. Because MANAGER is unique, we know that the following SQL statement is equivalent to the prior one:
      SELECT    DEPT_CODE, MANAGER, DEPT_NAME
      FROM      DEPARTMENT
      ORDER BY  MANAGER;
 
In this case, DB2 can use the index on MANAGER to avoid the sort. The extra column, DEPT_CODE, is removed from the ORDER BY clause. But, since MANAGER is unique, there can be at most one DEPT_CODE per MANAGER. Therefore, the sort order will be equivalent. Because we knew our data, we removed a sort from the process and made our query more efficient!
One final note on sorting: although most of you probably know this, it cannot be stated too strongly (or frequently) - always code an ORDER BY if the order of the results of your query is important. The ORDER BY clause is the only foolproof method of ensuring appropriate sort order of query results. Simply verifying that access is via an index is not sufficient to ensure the order of the results because:
  • The access path may change in the future to non-indexed access.
  • Even indexed access does not always return ordered results (e.g. list prefetch).
UNION VERSUS UNION ALL
Another simple, yet effective, means of enhancing SQL performance is to understand how UNION, EXCEPT and INTERSECT work. Let's start with UNION because it has been around in DB2 the longest. UNION takes the results of multiple SELECT statements and combines them together. It does this, as part of the UNION operation, by sorting the results and removing any duplicates. UNION ALL, on the other hand, will not sort and will not remove duplicates.

If you know that the results of the queries being unioned together are distinct (that is, no duplicates will be encountered), then you can use UNION ALL instead of UNION, and thereby enhance performance by avoiding the sort. Additionally, if you do not care whether duplicates are returned, always use UNION ALL.
The same applies to the EXCEPT and INTERSECT operations, which were added as of DB2 9 for z/OS.
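
For instance, if the two SELECTs below are known never to return the same row (the table names are purely hypothetical), UNION ALL produces the same result as UNION but skips the duplicate-removal sort:

      SELECT    EMP_ID, LAST_NAME
      FROM      CURRENT_EMPLOYEE
      UNION ALL
      SELECT    EMP_ID, LAST_NAME
      FROM      RETIRED_EMPLOYEE;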

Tuesday, April 07, 2015

SQL Performance Basics: Part 4, The Order of Predicates

Did you know that the order in which you code your predicates can have an impact on query performance? It is usually a minimal impact, but it may buy you a couple of microseconds for a very performance-critical query. In order to use predicate ordering to your advantage, however, you need to be armed with some basic information on how DB2 evaluates predicates as it processes your SQL.

So, before we continue, let's review the order in which DB2 evaluates predicates at execution time. DB2 will evaluate indexable predicates first: matching predicates before non-matching. Then, Stage 1 predicates, and finally Stage 2 predicates. Within each of these four groups, DB2 will evaluate equal predicates, then BETWEEN and NOT NULL predicates, and finally, any other predicates. If more than one predicate exists within a group, then DB2 will evaluate them in the physical order in which they are coded in the SQL statement. 

The re-ordering of predicates to take advantage of this situation should be considered only as a last resort. When implemented, the technique will usually shave only a little bit from the query's execution time. It is also important to note that predicate order will not impact a query's access path: it will remain unchanged (as shown in the PLAN_TABLE).

Now, how can we use this to our advantage?

Consider the following query:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE SEX = 'M'
AND TITLE = 'MANAGER'


For the purposes of this discussion, no index exists for either of the columns coded in the predicates. They are therefore the same type: stage 1 and equal predicates. Furthermore, we know our data - in our organization, there is approximately a 50-50 split between males and females, and 15% of all employees are managers.

To optimize this query then, we can swap the two predicates to achieve better performance. So the query becomes:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'
AND SEX = 'M'


Why should this query outperform the previous version? Well, assume we have 100,000 employees. If DB2 retrieves 50% of the rows (SEX = 'M') and then retrieves 15% of those 50%, we will have processed 57,500 rows:

( 100000 * 0.5 ) + ( ( 100000 * 0.5 ) * 0.15 ) = 57,500

But, if instead, DB2 were to retrieve 15% of the rows (TITLE = 'MANAGER') and then 50% of those, we will have processed only 22,500 rows:

( 100000 * 0.15) + ( ( 100000 * 0.15 ) * 0.5 ) = 22,500

Obviously, it is better for fewer rows to qualify early, thereby reducing the answer set and the number of rows that will have to be subsequently scanned.

Wednesday, April 01, 2015

SQL Performance Basics: Part 3, Eliminating Predicates?

It is technically possible to learn how to write SQL statements without having an in-depth knowledge of the data. However, the better you know your data, the better your application performance will be. Let's look at a simple example.

By reducing the number of predicates on your SQL statements you may be able to achieve better performance by:
  1. Reducing BIND (and REBIND) time because fewer options will probably need to be examined by the DB2 Optimizer.
  2. Reducing execution time due to a smaller path length caused by the removal of search criteria from the optimized access path. DB2 will always make sure that it processes each predicate coded for the SQL statement. Removing predicates removes work -- and less work equals less time to process the SQL.
Of course, you have to make sure that you can actually remove predicates without impacting the result set of your query, right? But sometimes - if you know your data - there are cases where you can eliminate predicates.

Consider the following SQL statement:

 SELECT FIRST_NAME, LAST_NAME, GRADE_LEVEL
 FROM   EMPLOYEE
 WHERE  TITLE = 'VP'
 AND    GRADE_LEVEL >= 10;


This statement retrieves all rows for vice presidents who are at a grade level of 10 or above. But, what if we know more about our data? Say, for example, that the starting grade level for vice presidents in our organization is 10. Therefore, it is impossible for anyone with a lower grade level to achieve the title of VP. That makes the second predicate redundant in this case. If we remove this predicate it will not logically change the results, but with less checking of the data required (DB2 won't have to check for GRADE_LEVEL >= 10) performance may be improved.

It is important though that you truly do "know your data." For example, it is not sufficient to merely note that for current rows in the EMPLOYEE table, no vice presidents are at a grade level below 10. This may just be a coincidence. Do not base your knowledge of your data on the current state of the data. You must truly know your business criteria to determine that a correlation between two columns (such as between GRADE_LEVEL and TITLE) actually exists. And only then should you modify your SQL. Failure to do so can result in incorrect results being returned.

Also, if the predicate was already there and you are removing it, comment out the predicate instead and be sure to document exactly why you are doing so in the code... that way, when somebody else takes a look at it later they'll know what happened and why.
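
A minimal sketch of what that might look like (the comment wording is just an example):

 SELECT FIRST_NAME, LAST_NAME, GRADE_LEVEL
 FROM   EMPLOYEE
 WHERE  TITLE = 'VP';
 --  AND    GRADE_LEVEL >= 10
 --  (predicate removed: VPs always start at grade level 10, so the check is redundant)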

Tuesday, March 24, 2015

Time to Start Your IDUG in Philadelphia Planning

Spring is in the air... well, at least it is South of the Mason-Dixon line... and that means it is time to plan your trip to this year's IDUG North American DB2 Tech Conference.

Anybody who has ever attended an IDUG conference knows about all of the good things you can expect to encounter, experience and learn at the event. That includes technical sessions on all of the latest and greatest DB2 technologies and features, networking opportunities to meet IBM developers and industry consultants, and the vendor exhibit hall where you can learn about software and tools to help you get the most out of DB2... not to mention picking up a few tchotchkes and mementos.

Additionally, there are Special Interest Groups (where you can discuss the latest industry trends and topics with other interested technicians), Hands-On Labs (delivering hands-on training led by IBM), and the opportunity to take free IBM certification exams.

But there's even more... not only is this year's conference in Philadelphia, PA, a new venue for IDUG, but the half- and full-day educational seminars (that used to require an additional fee) are included in your registration fee.

I'll be presenting my DB2 Performance Roadmap presentation at this year's IDUG, so be sure to stop in to say "Hi" and chat about DB2, big data, or your favorite topic du jour!

The IDUG DB2 Tech Conference is the place to be to learn all about DB2 from IBMers, gold consultants, IBM champions, end users, and more. With all of this great stuff going on in Philadelphia this May, why wouldn't you want to be there!?!?


Download the IDUG brochure for more details.

Monday, March 16, 2015

SQL Performance Basics: Part 2, Rely on Indexes

Perhaps the single most important aspect of SQL tuning is indexing. All developers should know all of the indexes that exist on any table upon which they write SQL statements. When an index exists on a column (or columns), DB2 can use the index to reduce I/O instead of scanning the entire table to satisfy a predicate. 

For critical queries, where no useful indexes exist, it usually makes sense to create an index to improve query performance. Of course, developers should enlist the assistance of a DBA to ensure the creation of appropriate indexes.

Let's start with some basics. For example, consider this SQL statement:
    SELECT   LASTNAME, SALARY 
    FROM     EMP 
    WHERE    EMPNO = '000010' 
    AND      DEPTNO =  'D01'
What index or indexes would make sense for this simple query? First, think about all the possible indexes that you could create. Your first short list probably looks something like this:
  • Index1 on EMPNO
  • Index2 on DEPTNO
  • Index3 on EMPNO and DEPTNO
This is a good start, and Index3 is probably the best of the lot. It lets DB2 use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table, you might want to examine the impact of creating yet another index on the table. There are several factors to consider. 
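
A minimal sketch of Index3 might look like this (the index name is arbitrary):

    CREATE INDEX XEMP_EMPNO_DEPTNO
        ON EMP (EMPNO, DEPTNO);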

Indexing Factors to Consider
For starters, you need to weigh the impact of data modification. DB2 will automatically maintain every index you create. This means every INSERT and every DELETE to this table will insert and delete not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, you also update the index. So, indexes speed the process of retrieval but slow down modification.
You should also consider the impact to any existing indexes and applications before creating a new index. If an index already exists on EMPNO or DEPTNO, it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always, because the order of the columns in the index can make a big difference depending on the query. For example, consider this query:
SELECT   LASTNAME, SALARY 
FROM     EMP 
WHERE    EMPNO = '000010' 
AND      DEPTNO >  'D01';
In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second, allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than.
Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO), DB2 can use them both to satisfy this query so creating another index might not be necessary.
Finally, you should consider the importance of the query you are attempting to tune. The more important the query, the more you might want to tune by index creation. Of course, the term "importance" is not always easy to quantify. If you are coding a query that the CEO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. Of course, the decision depends on the application's importance to the business--not just on the user's importance. If the clerk runs business critical transactions and the CEO is simply printing off a report for later consumption, then the importance varies... right?

Index Overloading
Index design involves much more than I have covered so far. For example, you might consider index overloading to achieve index-only access. If all the data that a SQL query asks for is contained in the index, DB2 might be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY, given information about EMPNO and DEPTNO. And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well, we never need to access the EMP table because all the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.
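
A hedged sketch of such an overloaded index for the earlier query (again, the index name is arbitrary; the extra columns are simply appended to the index key):

    CREATE INDEX XEMP_OVERLOAD
        ON EMP (EMPNO, DEPTNO, LASTNAME, SALARY);
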
Keep in mind that making every query an index-only access is not prudent or even possible. You should save this technique for particularly troublesome or important SQL statements.

The Bottom Line
A solid understanding of the indexes that exist -- and how additional indexes can help -- is vital to the performance of your DB2 applications. Take the time to understand the indexes that exist for your applications and you can become a better developer, making yourself more valuable to your organization and earning the respect of your peers!

Tuesday, March 10, 2015

SQL Performance Basics: Ask For Only What You Need

As you write SQL statements to access DB2 data, there are certain very simple, yet important rules to follow to encourage efficient SQL. Of course, SQL performance is a complex topic and to understand every nuance of how SQL performs can take a lifetime to master. That being said, adhering to the following simple rules puts you on the right track to achieving high-performing DB2 applications.

The first rule is to always provide only the exact columns that you need to retrieve in the SELECT-list of each SQL SELECT statement. Sometimes this is shortened to “do not use SELECT *”. The shorthand SELECT * means retrieve all columns from the table(s) being accessed. This is fine for quick and dirty queries but is bad practice for inclusion in application programs because DB2 tables may need to be changed in the future to include additional columns. SELECT * will retrieve those new columns, too, and your program may not be capable of handling the additional data without requiring time-consuming changes. 

Even if the program needs every column, it is better to explicitly ask for each column by name in the SQL statement for clarity and to avoid accessing new and unwanted columns.
DB2 consumes additional resources for every column you specify to be returned. If the program does not need the data, it should not ask for it.

The second part of asking only for what you need is using the WHERE clause to filter data in the SQL instead of bringing it all into your program. This is a common rookie mistake. It is much better for DB2 to filter the data before returning it to your program. This is true because DB2 uses additional I/O and CPU resources to obtain each row of data. The fewer rows passed to your program, the more efficient your SQL will be. So, the following SQL

    SELECT  EMPNO, LASTNAME, SALARY
    FROM    EMP
    WHERE   SALARY > 50000.00;


is better than simply reading all of the data without the WHERE clause and then checking each row to see if the SALARY is greater than 50000.00 in your program.

Friday, February 27, 2015

A Few DB2 Connect Guidelines


Here are a few guidelines for you to consider as you work to implement and optimize your usage of IBM's DB2 Connect gateway...

Consider DB2 Connect PE for Single Workstation Solutions  DB2 Connect PE can be a good choice for two-tier client/server applications running on desktop workstations that need to access data from mainframe DB2 databases. However, the personal edition is not suitable for deployment on multi-user systems and application servers.

If you have two people who share a workstation, DB2 Connect PE is a viable solution, because each person uses the workstation individually. That is, they cannot both use the workstation at the same time. Only one connection is required at any time for both of these users so only one license of DB2 Connect PE is required.
On the other hand, if you set up five workstations as dumb terminals making connections to a DB2 for z/OS server, you would need five licenses of DB2 Connect PE (one per workstation). Of course, you could use DB2 Connect EE instead, too. Or setting up a client with a Type 4 driver can be sufficient.


Consider DB2 Connect EE for Multi-User Solutions  DB2 Connect EE is the required solution when a middle tier connectivity server is needed. DB2 Connect EE is suitable for environments where the number of concurrent users can be easily determined. However, DB2 Connect UE and ASE are preferable to EE for web-based applications and multi-tier client/server applications where the number of concurrent users is difficult, or impossible, to determine. This is so mostly due to the cost of licensing.

Again, using Type 4 drivers without a DB2 Connect gateway can be a sufficient solution for many organizations.
Setting Up the DDF for z/OS  You must set up the DDF as an application server to connect distributed applications to DB2 for z/OS. There are two ways of doing this:
  • INSERT the LUNAME of the remote system into the SYSIBM.LUNAMES table.
  • INSERT the LUNAME, SYSMODENAME, USERSECURITY, ENCRYPTPSWDS, MODESELECT, and USERNAMES values into the SYSIBM.LUNAMES table.

Be sure to perform a DDF update to the bootstrap data set (BSDS) after making one of these changes. Because DDF will try to connect to VTAM, you must ensure that VTAM is active when the DDF starts.
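
As a hedged illustration of the first approach above (the LU name shown is just a placeholder; the additional columns listed in the second approach can be supplied on the same INSERT when needed):

    INSERT INTO SYSIBM.LUNAMES (LUNAME)
    VALUES ('LUXYZ');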


Configure Distributed Threads  Use the DSNZPARM CMTSTAT to control the behavior of distributed threads. CMTSTAT specifies whether a DDF thread is made active or inactive after it successfully commits or rolls back and holds no cursors. Consider setting CMTSTAT to INACTIVE because inactive connections consume less storage. A thread can become inactive only if it holds no cursors, has no temporary tables defined, and executes no statements from the dynamic statement cache.

If you specify ACTIVE, the thread remains active. Although this can improve performance it consumes system resources. If your installation supports a large number of connections, specify INACTIVE.

DB2 supports two different types of inactive threads:
  1. An inactive DBAT, previously called a type 1 inactive thread, has the same characteristics as inactive threads prior to DB2 V8. This type of thread remains associated with the connections, but DB2 minimizes the thread’s storage use as much as possible.
  2. An inactive connection, previously called a type 2 inactive thread, uses less storage than an inactive DBAT. In this case, the connections are disassociated from the thread. The thread can be pooled and reused for other connections, new or inactive. This provides better resource utilization because there are typically a small number of threads that can be used to service a large number of connections.

    Although inactive connections are preferable to inactive DBATs, not every thread can become an inactive connection. If a thread is to become inactive, DB2 tries to make it an inactive connection. If DB2 cannot make it an inactive connection, it tries to make it an inactive DBAT. If neither attempt is successful, the thread remains active.


Increasing Data Transfer Rates  In addition to blocking of rows for a query result set, DB2 can also return multiple query blocks in response to an OPEN or FETCH request to a remote client (such as DB2 Connect). Instead of repeatedly sending requests to DB2 for z/OS requesting one block of data at a time, DB2 Connect can optionally request that the server send back some number of extra query blocks. These extra query blocks allow the client to minimize network traffic, thereby improving performance. DB2 Connect can be set up to request extra query blocks from a mainframe DB2 server by default.

Use the EXTRA BLOCKS SRV parameter on the DB2 DDF installation panel to enable support for extra query blocks. The minimum value for this parameter is 0 and the maximum is 100. Favor keeping the parameter set to the default value, which is 100.

You can also set up the EXTRA BLOCK REQ installation parameter to limit the number of extra DRDA query blocks that DB2 is to request from a remote DRDA server. This parameter does not limit the size of the SQL query answer set. It simply controls the total amount of data that can be transmitted on a network exchange. The minimum value for this parameter is 0 and the maximum is 100, which is also the default.

On the client side, you activate support on a cursor or statement basis. It is controlled by a query rowset size for a cursor, or the value of n in the OPTIMIZE FOR n ROWS clause or FETCH FIRST n ROWS ONLY clause.
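
For example, coding a query with the OPTIMIZE FOR n ROWS clause is one way to influence this behavior on a statement basis (a minimal sketch; the table is illustrative only):

    SELECT   EMPNO, LASTNAME, SALARY
    FROM     EMP
    WHERE    SALARY > 50000.00
    OPTIMIZE FOR 100 ROWS;
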
Specify Character Data Types with Care  When character data is accessed using DB2 Connect, the choice of CHAR versus VARCHAR will impact performance. VARCHAR is more efficient if the length varies significantly. If the size of data does not vary much, CHAR is more efficient because each VARCHAR column requires two bytes of length information to be transmitted along with the data.

Of course, DB2 Connect transmission performance is but one consideration when choosing a data type.


BIND the DB2 Connect Utilities  DB2 Connect comes with several utilities that use embedded SQL, and therefore their programs must be bound to a database server before they can be used with that system. The list of bind files required by these utilities is contained in the ddcsmvs.lst file for DB2 for z/OS. If you do not use the DB2 Connect utilities, you do not have to BIND their programs to each of your DB2 for z/OS database servers.

If DB2 Connect Enterprise Edition is installed, the DB2 Connect utilities must be bound to each database server once from each type of client platform, before they can be used with that system. For example, if you have 5 Windows clients and 4 Linux clients connecting to DB2 for z/OS using DB2 Connect EE, then BIND the programs in ddcsmvs.lst from one of the Windows clients, as well as from one of the Linux clients, and then from the DB2 Connect server. If all the clients are not at the same service level, you might need to BIND from each client for each particular service level.


Beware of SQLCODE and SQLSTATE Differences  Different IBM relational DBMSs will not always produce the same SQLCODE values for the same or similar errors. This can be troublesome in distributed applications. There are two ways of handling this situation:
  • Use SQLSTATE instead of SQLCODE. The SQLSTATE values have the same basic meaning across IBM’s DBMS products.
  • Use SQLCODE mapping.

DB2 Connect can map SQLCODEs and tokens from each IBM mainframe or iSeries server to your appropriate DB2 UDB system. You can specify your own SQLCODE mapping file if you want to override the default mapping or you are using a non-IBM database server that does not have SQLCODE mapping.

SQLCODE mapping is enabled by default. If you want to turn off SQLCODE mapping, specify NOMAP in the parameter string of the DCS directory or the DCE routing information object.


Assign Authids per Application  Consider assigning each distributed application a dedicated authid. This approach is superior to simply using the same authid for all applications because:
  •  Each authid can be assigned only the permissions it needs.
  •  You can use WLM to assign each authid different priorities, as needed.
  •  For troubleshooting, it is easier to identify the offending application.
  •  You can measure application resource usage by authid. 

Assure Appropriate Distributed Logon Authority  Be sure that DB2 administrators have the required authority on the distributed platforms that interact with DB2 for z/OS. For example, a UNIX logon should be available for the performance analyst or DBA to view db2diag.log and web application logs.

In addition, the logon should have sufficient authority to execute appropriate commands as needed. For example, iostat and vmstat are useful commands that may need to be issued.

Similar considerations should be made for Windows servers.

Monday, February 16, 2015

Choosing Between DECIMAL and FLOAT Data Types

DB2 can use both DECIMAL and FLOAT data types to store non-integer numeric data. But the two are not equivalent. In general, use DECIMAL instead of FLOAT whenever you can. The main problem with floating point numbers is that they are not precise. DECIMAL values are precise. In other words, a FLOAT value will be an approximate value whereas a DECIMAL value will be an exact value.

At times, if 100 percent precision is not required, you might want to use floating point numbers to save on storage. DB2 provides a parameter to size the floating point column: n, where the data type specification is FLOAT(n). If n is between 1 and 21, this is a single precision floating point number and the column will require 4 bytes of storage; if n is between 22 and 53, this is a double precision floating point number and it will require 8 bytes to store it. A single precision floating-point number is a short (32 bits) floating-point number. A double precision floating-point number is a long (64 bits) floating-point number.

For DECIMAL columns, the byte count is calculated as INTEGER(p/2)+1; where p is the precision of the DECIMAL column. So, a DECIMAL(10,2) column will require (10/2)+1 bytes = 6 bytes. An approximation of the same number could be stored in a FLOAT(21) column that would require only 4 bytes of storage.
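
A quick sketch of that trade-off in DDL terms (the table and column names are illustrative only):

    CREATE TABLE MEASUREMENT
      (EXACT_AMT   DECIMAL(10,2),  -- exact value, stored in 6 bytes
       APPROX_AMT  FLOAT(21)       -- single precision approximation, stored in 4 bytes
      );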

For very large or very small numbers, though, you will have to use FLOAT columns. This is so because there is a limit of 31 on decimal precision. The following outlines the largest and smallest values that can be supported using DECIMAL and FLOAT data types:

  •   Smallest FLOAT value is about -7.2 x 10**75
  •   Largest FLOAT value is about 7.2 x 10**75
  •   Smallest positive FLOAT value is about 5.4 x 10**-79
  •   Largest negative FLOAT value is about -5.4 x 10**-79
  •   Smallest DECIMAL value is 1 - 10**31
  •   Largest DECIMAL value is 10**31 - 1

Note that the values for floating point numbers are approximations. The ** is used to indicate “raised to the power of”.


If you are moving data between platforms, there is an additional concern when using FLOAT. Mainframes traditionally use IBM's hexadecimal floating-point format whereas other platforms use the IEEE 754 standard. Since floating point numbers are imprecise to begin with this may not be a problem. However, if you want to make sure that a particular column will be exactly the same value regardless of platform, then floating point is not the way to go.

There is a third, newer option called DECFLOAT. Introduced in DB2 9 for z/OS, DECFLOAT is a combination of the two data types discussed here, or a decimal floating-point data type. It is specified as DECFLOAT(n), where the value of n can be either 16 or 34, representing the number of significant digits that can be stored. If n is not specified, then the DECFLOAT column can represent 34 significant digits.

A decimal floating-point value is an IEEE 754r number with a decimal point. The maximum precision is 34 digits and the range of a DECFLOAT number is as follows:

DECFLOAT(16) value can range from a low of: 
-9.999999999999999×10**384

to a high of:
9.999999999999999×10**384

And a DECFLOAT(34) value can range from a low of:
-9.999999999999999999999999999999999 ×10**6144

to a high of:
9.999999999999999999999999999999999 ×10**6144


In addition, the DECFLOAT data type can be used to represent several special values that represent "non-number numbers," as follows:

  • Infinity - a value that represents a number whose magnitude is infinitely large.
  • Quiet NaN - a value that represents undefined results which does not cause an invalid number condition. NaN is not a number.
  • Signaling NaN - a value that represents undefined results which will cause an invalid number condition if used in any numerical operation.


So decimal floating-point values can be more flexible and precise, with the ability to range lower and higher than floating point (or decimal) values. However, before using DECFLOAT be careful and plan ahead, especially if you use COBOL programs to operate on your DB2 for z/OS data, because there is no way to specify the SQL DECFLOAT data type in COBOL.


Monday, February 09, 2015

Nulls Cannot Be Ignored!

NULLs are one of the more controversial things that a DB2 professional has to deal with. And I do mean HAS to deal with. 


Because of all the controversy about their usefulness, implementation, and confusing nature, some DB2 folks just decide to hide their head in the sand and ignore NULLs. The thinking goes something like this: “If I do not create any nullable columns in any of my DB2 tables, then I can blissfully ignore the whole NULL mess and be happy!”

Well, that is simply not true. You can follow this approach and still write a query that will return NULL. Don’t believe it? Run this query then:

SELECT AVG(PRSTAFF)
FROM   DSN8B10.PROJ
WHERE  DEPTNO = 'XOX';

PRSTAFF is defined as a DECIMAL(5,2) column and we are trying to find the average staffing for a specific department. The PRSTAFF column can contain nulls, but even if it could not, the result here would be the same. This query returns a NULL (unless someone inserted a row with the value of ‘XOX’ for DEPTNO at your site).

Why? There is no department ‘XOX’ in the sample databases (provided with DB2). So, the query is attempting to find an average for the empty set. This is NULL. Some folks think this query should return 0, but it won't! The sum of all PRSTAFF for the department ‘XOX’ is not zero, but is unknown…
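
If your application truly needs a zero back for an empty set, one option (a minimal sketch) is to convert the NULL explicitly:

SELECT COALESCE(AVG(PRSTAFF), 0)
FROM   DSN8B10.PROJ
WHERE  DEPTNO = 'XOX';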


So, take the time to understand how NULLs work in DB2, because they cannot be ignored!

Wednesday, February 04, 2015

Helping Out the DB2 Optimizer Using the VOLATILE Keyword

Do you know about the VOLATILE keyword? This keyword was added to DB2 for z/OS back in Version 8. It can be specified on a table using either the CREATE TABLE or ALTER TABLE statement.

By specifying VOLATILE, you are indicating that the volume of data in the table is not stable and is likely to fluctuate. In other words, it is volatile! One common scenario where VOLATILE will be helpful is for tables that are emptied nightly and then repopulated the next day, such as an input queue. 

When you specify the VOLATILE keyword on a table, BIND will favor using indexed access paths, even if the table was empty when RUNSTATS was run. It is ideal for single-index tables where you want DB2 to favor using the index.
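
A minimal sketch of how an input-queue type of table might be declared as volatile (the table and column names are illustrative only; the same keyword can also be added later with ALTER TABLE):

    CREATE TABLE INPUT_QUEUE
      (MSG_ID    INTEGER       NOT NULL,
       MSG_TEXT  VARCHAR(200)  NOT NULL)
      VOLATILE;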

ERP environments, such as SAP and PeopleSoft, with thousands of tables typically have some tables that meet these criteria. Even worse, it is not uncommon for DBAs to have no idea of the actual content or use for many of those thousands of tables generated by the ERP installation. At times, some of the ERP tables are not in use – depending on which modules of the ERP system you implement… but the tables get created anyway. Many DBAs simply maintain all of the tables provided with the ERP system, whether they are used or not, including running image copies and gathering RUNSTATS for them... and many are empty tables.

Collecting statistics on an empty table populates the catalog with stats indicating that the table contains no data. And, of course, when access paths are generated using those statistics, DB2 will probably favor a scan because the table is small (how much smaller can you get than empty?). But some of those tables are volatile, going from empty to perhaps hundreds of thousands of rows during processing.

Of course, if the table is actually empty (or contains only a small amount of data), and VOLATILE is specified, DB2 will favor the use of an index if one exists, which can degrade performance a bit. But that is a smaller price to pay than scanning thousands of rows, isn't it?


So one approach is to use the VOLATILE keyword for these types of tables... your users will be glad that you did.