The Db2 Portal Blog: April 2015

Tuesday, April 28, 2015

Controlling Mainframe Software Costs

Although this blog usually focuses on DB2 issues, I sometimes use it to focus on other IT issues, usually mainframe-related. The primary purpose of today's blog post is to promote a webinar I'm conducting this Thursday called, Managing Your z/OS Software Bill. The webinar is sponsored by Data Kinetics, the North American distributor of a product called AutoSoftCapping (or ASC for short), that can be used to help control the rolling four hour average and thereby reduce monthly software bills.

Cost containment is of critical importance for IT departments in this day and age of financial austerity... especially so in the mainframe world. Every decision regarding your computer resources is weighed based on not only the value that they can deliver to your organization, but upon their cost to procure, implement, and maintain. And, in most cases, if a positive return on investment cannot be calculated, the software won’t be adopted, or the hardware won’t be upgraded.

An important opportunity for mainframe cost containment is to better manage the peak monthly capacity of your mainframe on an LPAR (logical partition) by LPAR basis. The pricing model for most mainframe software is based on the capacity of the machine on which the software will run. Note that this pricing model reflects the potential usage based on the capacity of the machine, not the actual usage. Some vendors offer usage-based pricing. You should actively discuss this with your current ISVs as it is becoming more common, more accurately represents fair usage, and can save you money.

IBM offers several subcapacity pricing models for many of its popular software offerings, including products such as z/OS, DB2, IMS, CICS, MQSeries and COBOL. Some of the benefits of subcapcity pricing include the ability to:

Grow hardware capacity without necessarily increasing your software charges
Pay for key software at LPAR-level granularity
Experience a low cost of incremental growth
Manage software cost by managing workload utilization

By tracking MSU usage by LPAR you can be charged based on the maximum rolling four hour (R4H) average MSU usage, instead of full capacity. Most organizations with mainframes have shifted to some form of subcapacity pricing model, but not all of them understand how all of the "moving parts" work together. Thursday's webinar will help to clear that all up!
Managing mainframe software costs by adopting subcapacity pricing, soft capping techniques, and software like Data Kinetics' AutoSoftCapping can help your company to assure a cost-effective IT organization. In today’s cost-cutting, ROI-focused environment, doing anything less than that is short-sighted.

Tuesday, April 21, 2015

SQL Performance Basics: Part 7, Embed the SQL Directly Into Your Programs

You may recall that this is a subject I've written about before, but I think it is important enough to warrant covering briefly in this series on SQL performance basics... and that is, you should avoid black boxes if you want to optimize your applications.

So what is a black box? Simply stated, a black box is a database access routine that sits in between your application programs and DB2. Instead of coding embedded SQL in your application code, you make a call to the access routine (or black box).

The general idea here is that development is easier because programmers do not need to know how to write SQL. Instead, programmers call the black box to request data. SQL statements become calls – and every programmer knows how to code a call, right?

This approach is commonly referred to as a “black box” approach because the data access interface shields the developers from the “complexities” of SQL. The SQL is contained in that black box and programmers do not need to know how the SQL works – just how to call the black box for data. But there are a number of reasons why this approach is not sound:

It is unwise to have uninformed and untrained developers writing DB2 applications without knowing the SQL behind their data access requests. What may seem like a simple request to a non-educated programmer may actually involve very complex and inefficient SQL “behind the scenes” in the black box.
The black box coders will take shortcuts, like combining multiple types of SQL requests into one that will cause more data to be returned than is needed... but then send only the needed data back to the requester. This violates Part 1 of this series.
The black box is an access routine to DB2 data, but SQL is already an access method to DB2 data -- another one is not needed, and is therefore superfluous.
The black box will add CPU cycles to your application because it consists of extra code that is not needed if the SQL is embedded right into your application programs.

Suffice it to say, avoid implementing SQL black boxes... they are just not worth the effort!

If you are interested in more details on this subject, I point you to this article I wrote on the subject in my DBA Corner column for Database Trends & Applications.

Thursday, April 16, 2015

SQL Performance Basics: Part 6, Code Relationally... or Avoid the Flat-File Mentality

Another pervasive problem permeating the DB2 development community -- and indeed for most relational DBMSes -- is the “flat file” development mentality. What I mean by this is, when a programmer tries to access data in a relational database the same way that he would access data from a flat file.

DB2 is ‘relational’ in nature and, as such, operates on data a set-at-a-time, instead of the record-at-a-time processing used against flat files. In order to do justice to DB2, you need to change the way you think about accessing data.

To accomplish this, all users of DB2 need at least an overview education of relational database theory and a moderate to extensive amount of training in SQL. Without such a commitment your programmers are sure to develop ugly and inefficient database access code – and who can blame them? Programmers are used to working with files so they are just doing what comes naturally to them.

SQL is designed so that programmers specify what data is needed but they cannot specify how to retrieve it. SQL is coded without embedded data-navigational instructions. The DBMS analyzes SQL and formulates data-navigational instructions "behind the scenes.” This is foreign to the programmer who has never accessed data using SQL.

Every SQL manipulation statement operates on a table and results in another table. All operations native to SQL, therefore, are performed at a set level. One retrieval statement can return multiple rows; one modification statement can modify multiple rows. This feature of relational databases is called relational closure.

When accessing data, a programmer needs to think about what the end result should be and then code everything possible into the SQL. This means using the native features of SQL – joins and subselects and functions, etc. – instead of coding procedural host language code (whether in COBOL, C, Java or whatever) that, for example, opens up a cursor, fetches a row, and then uses a fetched value to open up another cursor. This is processing DB2 like a set of flat files... better to join the data!

Educating programmers how to use SQL properly -- at a high level -- is probably the single most important thing you can do to optimize performance of your DB2 applications.

Saturday, April 11, 2015

SQL Performance Basics: Part 5, Avoid Sorting

Sorting is a very costly operation that you should strive to avoid if at all possible. 
Indexes are very useful for this purpose. DB2 can choose to use an available index 
to satisfy the sorting requirements of an ORDER BY, GROUP BY, or DISTINCT clause. 
Therefore, it can be beneficial to create indexes that correspond to these clauses 
for frequently executed SQL statements.

For example, consider this SQL statement:

      SELECT    LAST_NAME, TITLE, SALARY
      FROM      EMPLOYEE
      ORDER BY  LAST_NAME, TITLE;

If an index on LAST_NAME and TITLE exists, DB2 can use it to avoid sorting. By 
retrieving the data in order from the index, sorting becomes unnecessary.

You can use this information to tune your SQL further to avoid sorting. When 
ordering is required only by one column, sometimes adding another column to 
the ORDER BY can enable indexed access to avoid the sort.

Consider this SQL statement:

      SELECT    DEPT_CODE, MANAGER, DEPT_NAME,
      FROM      DEPARTMENT
      ORDER BY  MANAGER, DEPT_CODE;

Say there is a unique index on MANAGER. DB2 will probably use this index to 
satisfy the request. However, a sort will be done after retrieving the data to 
put the results into DEPT_CODE order within MANAGER. But, since we know 
our data, we are able to change this situation. Because MANAGER is unique 
we know that the following SQL statement is equivalent to the prior one:

      SELECT    DEPT_CODE, MANAGER, DEPT_NAME,
      FROM      DEPARTMENT
      ORDER BY  MANAGER;

In this case, DB2 can use the index on MANAGER to avoid the sort. The extra 
column, DEPT_CODE, is removed from the ORDER BY clause. But, since MANAGER 
is unique, there can be at most one DEPT_CODE per MANAGER. Therefore, the 
sort order will be equivalent. Because we knew our data we removed a sort from 
the process and made our query more efficient!

One final note on sorting: although most of you probably know this, it cannot be 
stated too strongly (or frequently) - always code an ORDER BY if the order of the 
results of your query is important. The ORDER BY clause is the only foolproof 
method of ensuring appropriate sort order of query results. Simply verifying that 
access is via an index is not sufficient to ensure the order of the results because:


The access path may change in the future to non-indexed access.
Even indexed access does not always return ordered results 
(e.g. list prefetch).

UNION VERSUS UNION ALL

Another simple, yet effective, means of enhancing SQL performance is to 
understand how UNION, EXCEPT and INTERSECT work. Let's start with UNION 
because it has been around in DB2 the longest. UNION takes the results of 
multiple SELECT statements and combines them together. It does this, as part 
of the UNION operation, by sorting the results and removing any duplicates. 
UNION ALL, on the other hand, will not sort and will not remove duplicates.

If you know that the results of the queries being unioned together are distinct

(that is, no duplicates will be encountered), then you can use UNION ALL instead 
of UNION, and thereby enhance performance by avoiding the sort. Additionally, 
if you do not care whether duplicates are returned, always use UNION ALL.

The same applies to the EXCEPT and INTERSECT operations, which were added

as of DB2 9 for z/OS.

Tuesday, April 07, 2015

SQL Performance Basics: Part 4, The Order of Predicates

Did you know that the order in which you code your predicates can have an impact on query performance? It is usually a minimal impact, but it may buy you a couple of microseconds for a very performance-critical query. In order to use predicate ordering to your advantage however, you need to be armed with some basic information on how DB2 evaluates predicates as it processes your SQL.

So, before we continue, let's review the order in which DB2 evaluates predicates at execution time. DB2 will evaluate indexable predicates first: matching predicates before non-matching. Then, Stage 1 predicates, and finally Stage 2 predicates. Within each of these four groups, DB2 will evaluate equal predicates, then BETWEEN and NOT NULL predicates, and finally, any other predicates. If more than one predicate exists within a group, then DB2 will evaluate them in the physical order in which they are coded in the SQL statement.

The re-ordering of predicates to take advantage of this situation should be considered only as a last resort. When implemented, the technique will usually shave only a little bit from the query's execution time. It is also important to note that predicate order will not impact a query's access path: it will remain unchanged (as shown in the PLAN_TABLE).

Now, how can we use this to our advantage?

Consider the following query:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE SEX = 'M'
AND TITLE = 'MANAGER'

For the purposes of this discussion, no index exists for either of the columns coded in the predicates. They are therefore the same type: stage 1 and equal predicates. Furthermore, we know our data - in our organization, there is approximately a 50-50 split between males and females, and 15% of all employees are managers.

To optimize this query then, we can swap the two predicates to achieve better performance. So the query becomes:

SELECT LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE TITLE = 'MANAGER'
AND SEX = 'M'

Why should this query outperform the previous version? Well, assume we have 100,000 employees. If DB2 retrieves 50% of the rows (SEX = 'M') and then retrieves 15% of those 50%, we will have processed 57,500 rows:

( 100000 * 0.5 ) + ( ( 100000 * 0.5 ) * 0.15 ) = 57,500

But, if instead, DB2 were to retrieve 15% of the rows (TITLE = 'MANAGER') and then 50% of those, we will have processed only 22,500 rows:

( 100000 * 0.15) + ( ( 100000 * 0.15 ) * 0.5 ) = 22,500

Obviously, it is better for fewer rows to qualify early, thereby reducing the answer set and the number of rows that will have to be subsequently scanned.

Wednesday, April 01, 2015

SQL Performance Basics: Part 3, Eliminating Predicates?

It is technically possible to learn how to write SQL statements without having an in-depth knowledge of the data. However, the better you know your data, the better your application performance will be. Let's look at a simple example.

By reducing the number of predicates on your SQL statements you may be able to achieve better performance by:

Reducing BIND (and REBIND) time because fewer options will probably need to be examined by the DB2 Optimizer.
Reducing execution time due to a smaller path length caused by the removal of search criteria from the optimized access path. DB2 will always make sure that it processes each predicate coded for the SQL statement. Removing predicates removes work -- and less work equals less time to process the SQL.

Of course, you have to make sure that you can actually remove predicates without impacting the result set of your query, right? But sometimes - if you know your data - there are cases where you can eliminate predicates.

Consider the following SQL statement:

SELECT FIRST_NAME, LAST_NAME, GRADE_LEVEL
FROM EMPLOYEE
WHERE TITLE = 'VP'
AND GRADE_LEVEL >= 10;

This statement retrieve all rows for vice presidents who are at a grade level of 10 or above. But, what if we know more about our data? Say, for example, that the starting grade level for vice presidents in our organization is 10. Therefore, it is impossible for anyone with a lower grade level to achieve the title of VP. That makes the second predicate redundant in this case. If we remove this predicate it will not logically change the results, but with less checking of the data required (DB2 won't have to check for GRADE_LEVEL >= 10) performance may be improved.

It is important though that you truly do "know your data." For example, it is not sufficient to merely note that for current rows in the EMPLOYEE table, no vice presidents are at a grade level below 10. This may just be a coincidence. Do not base your knowledge of your data on the current state of the data. You must truly know your business criteria to determine that a correlation between two columns (such as between GRADE_LEVEL and TITLE) actually exists. And only then should you modify your SQL. Failure to do so can result in incorrect results being returned.

Also, if the predicate was already there and you are removing it, comment out the predicate instead and be sure to document exactly why you are doing so in the code... that way, when somebody else takes a look at it later they'll know what happened and why.