Tuesday, May 12, 2015

A Trip Report from the 2015 IDUG DB2 Tech Conference

Last week I attended, and spoke at, the annual North American IDUG DB2 Tech Conference in Philadelphia, PA. As usual, the event was busy and chock full of useful and interesting DB2 information.

My week started on Sunday with the IBM gold consultant briefing, whereby IBM talks to the gold consultants about their upcoming plans and solicits feedback from us. I can’t really share with you what happened there, but as is almost always the case, it was time well spent.

The conference kicked off in earnest on Monday with the keynote from IBMers Tim Vincent and Namik Hrle titled “Big Data, Analytics and the Future of Data Management.” Tim and Namik discussed how the growth of data is fueling innovation and causing a big change in the way value is created. Some of the key takeaways from the keynote, for me at least, were:
  • The predominant source of Big Data for most projects is still structured transactions
  • Primary focus of most data projects is usually on understanding customers
  • There is a trend toward self-service
  • Clearly there is value in Hadoop but you can't replace all your DBMSes with it!

Perhaps the most salient tidbit shared at the keynote address was this: “People have forgotten, or never truly understood, how complex data integration actually is.” I think this is true; all too often people underestimate how difficult the integration of data will be. And I agree, too, with the sentiment of the presenters who declared that “We need to realize that data is never going to be in one place.”

The bottom line of the keynote: All of our existing DB2 skills will still be relevant but must co-exist with skills in newer technologies like NoSQL and Hadoop.

Good stuff!

Some of the other highlights of the week:
  • Attendance seemed to be higher than at the past few IDUG conferences. And I see growth in the number of DB2 for LUW attendees. IDUG, at least historically, was mostly attended by DB2 for z/OS folks. And it is probably still true that more z/OS folks attend than LUW folks, but the LUW attendance continues to grow and may surpass z/OS at some point! Of course, this is all based on my unscientific eyeballing of attendees at presentations.
  • My session on DB2 performance had a full room and nobody seemed to nod off or run for the exits while I spoke. I also delivered a VSP for Dell to a room full of DBAs and developers, as well as a couple of presentations for DataKinetics during an evening session. So that was all good!
  • I attended a couple of sessions on newer features of DB2 and how some of them are under-utilized. I think this speaks to a few trends hitting the world of DB2 these days. Number one: training is drying up. If you don’t get training on new features it is hard to use them appropriately. Number two: people fail to take advantage of free learning opportunities like webinars and freely-available PDF manuals.
  • The vendor exhibit hall was active and well-attended throughout the week. All the usual suspects were there like BMC, CA, dbi, Dell, and IBM, but there were some newer (at least to IDUG) companies, too, like DataKinetics and Imperva.
  • The educational seminars were integrated into the conference this year. And they did not cost extra to attend. That means that more people attended ed seminars this year (at least I think they did) and the conference offered more value to attendees looking for more in-depth education than can be gained in a one-hour session.


All in all, the 2015 North American IDUG Tech Conference was a big success. And it is not too early to start planning for next year’s conference, which will be in Austin, TX. Finally, I’ll be able to drive to an IDUG… see you there in 2016!

Tuesday, April 28, 2015

Controlling Mainframe Software Costs

Although this blog usually focuses on DB2 issues, I sometimes use it to focus on other IT issues, usually mainframe-related. The primary purpose of today's blog post is to promote a webinar I'm conducting this Thursday called Managing Your z/OS Software Bill. The webinar is sponsored by Data Kinetics, the North American distributor of a product called AutoSoftCapping (or ASC for short), which can be used to help control the rolling four-hour average and thereby reduce monthly software bills.

Cost containment is of critical importance for IT departments in this day and age of financial austerity... especially so in the mainframe world. Every decision regarding your computing resources is weighed not only on the value they can deliver to your organization, but also on their cost to procure, implement, and maintain. And, in most cases, if a positive return on investment cannot be calculated, the software won’t be adopted, or the hardware won’t be upgraded.

An important opportunity for mainframe cost containment is to better manage the peak monthly capacity of your mainframe on an LPAR (logical partition) by LPAR basis. The pricing model for most mainframe software is based on the capacity of the machine on which the software will run. Note that this pricing model reflects the potential usage based on the capacity of the machine, not the actual usage. Some vendors offer usage-based pricing. You should actively discuss this with your current ISVs as it is becoming more common, more accurately represents fair usage, and can save you money.
IBM offers several subcapacity pricing models for many of its popular software offerings, including products such as z/OS, DB2, IMS, CICS, MQSeries and COBOL. Some of the benefits of subcapacity pricing include the ability to:
  • Grow hardware capacity without necessarily increasing your software charges
  • Pay for key software at LPAR-level granularity
  • Experience a low cost of incremental growth
  • Manage software cost by managing workload utilization
By tracking MSU usage by LPAR you can be charged based on the maximum rolling four hour (R4H) average MSU usage, instead of full capacity. Most organizations with mainframes have shifted to some form of subcapacity pricing model, but not all of them understand how all of the "moving parts" work together. Thursday's webinar will help to clear that all up!
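In the meantime, here is a minimal sketch of the arithmetic behind the R4H average. It assumes a hypothetical table MSU_HIST(LPAR_NAME, INTERVAL_TS, MSUS) holding one row per LPAR for every 5-minute recording interval (so 48 rows span four hours), and a DB2 level that supports OLAP window frame specifications:

      -- Peak rolling four hour average MSU usage per LPAR;
      -- the peak R4H value is what drives the monthly software bill
      SELECT    LPAR_NAME,
                MAX(R4H_AVG) AS PEAK_R4H_MSUS
      FROM     (SELECT LPAR_NAME,
                       INTERVAL_TS,
                       AVG(MSUS) OVER
                          (PARTITION BY LPAR_NAME
                           ORDER BY INTERVAL_TS
                           ROWS BETWEEN 47 PRECEDING
                                    AND CURRENT ROW) AS R4H_AVG
                FROM   MSU_HIST) AS T
      GROUP BY  LPAR_NAME;
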
Managing mainframe software costs by adopting subcapacity pricing, soft capping techniques, and software like Data Kinetics' AutoSoftCapping can help your company ensure a cost-effective IT organization. In today’s cost-cutting, ROI-focused environment, doing anything less than that is short-sighted.

Tuesday, April 21, 2015

SQL Performance Basics: Part 7, Embed the SQL Directly Into Your Programs

You may recall that this is a subject I've written about before, but I think it is important enough to warrant covering briefly in this series on SQL performance basics: you should avoid black boxes if you want to optimize your applications.

So what is a black box? Simply stated, a black box is a database access routine that sits in between your application programs and DB2. Instead of coding embedded SQL in your application code, you make a call to the access routine (or black box).

The general idea here is that development is easier because programmers do not need to know how to write SQL. Instead, programmers call the black box to request data. SQL statements become calls – and every programmer knows how to code a call, right?

This approach is commonly referred to as a “black box” approach because the data access interface shields the developers from the “complexities” of SQL. The SQL is contained in that black box and programmers do not need to know how the SQL works – just how to call the black box for data. But there are a number of reasons why this approach is not sound:
  1. It is unwise to have uninformed and untrained developers writing DB2 applications without knowing the SQL behind their data access requests. What may seem like a simple request to a non-educated programmer may actually involve very complex and inefficient SQL “behind the scenes” in the black box.
  2. The black box coders will take shortcuts, such as combining multiple types of SQL requests into one generic query that returns more data than any single caller needs, and then sending only the needed data back to the requester. This violates Part 1 of this series (see the sketch after this list).
  3. The black box is an access routine to DB2 data, but SQL is already an access method to DB2 data -- another one is not needed, and is therefore superfluous.
  4. The black box will add CPU cycles to your application because it consists of extra code that is not needed if the SQL is embedded right into your application programs.
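
To illustrate point 2, consider the difference between a generic, catch-all query and a purpose-built one. The column names here are made up for illustration (EMPLOYEE with an assumed DEPT_CODE column), but the pattern is typical:

      -- A generic black box routine tends to run one catch-all query
      -- and filter rows and columns in program code:
      SELECT    *
      FROM      EMPLOYEE
      WHERE     DEPT_CODE = ?;

      -- Embedded SQL written for the task at hand asks DB2 for exactly
      -- what the program needs, and nothing more:
      SELECT    LAST_NAME, TITLE, SALARY
      FROM      EMPLOYEE
      WHERE     DEPT_CODE = ?
      AND       SALARY > ?;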

Suffice it to say, avoid implementing SQL black boxes... they are just not worth the effort!


If you are interested in more details, I point you to this article I wrote on the subject in my DBA Corner column for Database Trends & Applications.

Thursday, April 16, 2015

SQL Performance Basics: Part 6, Code Relationally... or Avoid the Flat-File Mentality

Another pervasive problem permeating the DB2 development community -- and indeed the development community for most relational DBMSes -- is the “flat file” development mentality: programmers trying to access data in a relational database the same way they would access data in a flat file.

DB2 is ‘relational’ in nature and, as such, operates on data a set-at-a-time, instead of the record-at-a-time processing used against flat files. In order to do justice to DB2, you need to change the way you think about accessing data.

To accomplish this, all users of DB2 need at least an overview education of relational database theory and a moderate to extensive amount of training in SQL. Without such a commitment your programmers are sure to develop ugly and inefficient database access code – and who can blame them? Programmers are used to working with files so they are just doing what comes naturally to them.

SQL is designed so that programmers specify what data is needed but they cannot specify how to retrieve it. SQL is coded without embedded data-navigational instructions. The DBMS analyzes SQL and formulates data-navigational instructions "behind the scenes.” This is foreign to the programmer who has never accessed data using SQL.
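
One way to demystify those behind-the-scenes instructions is to look at them after the fact. DB2's EXPLAIN facility externalizes the chosen access path into PLAN_TABLE; here is a minimal sketch, keeping in mind that exact PLAN_TABLE columns vary by DB2 version and platform:

      -- Externalize the access path DB2 formulates for a query
      EXPLAIN PLAN SET QUERYNO = 100 FOR
        SELECT    LAST_NAME, TITLE, SALARY
        FROM      EMPLOYEE
        ORDER BY  LAST_NAME, TITLE;

      -- Was an index used, and was a sort performed for the ORDER BY?
      SELECT    QUERYNO, ACCESSTYPE, ACCESSNAME, SORTN_ORDERBY
      FROM      PLAN_TABLE
      WHERE     QUERYNO = 100;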

Every SQL manipulation statement operates on a table and results in another table. All operations native to SQL, therefore, are performed at a set level. One retrieval statement can return multiple rows; one modification statement can modify multiple rows. This feature of relational databases is called relational closure.
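
For example, a single set-level statement modifies however many rows qualify; the 10 percent raise and the DEPT_CODE column below are hypothetical:

      -- One statement, many rows: a 10 percent raise for every
      -- employee in department 'D01'
      UPDATE    EMPLOYEE
      SET       SALARY = SALARY * 1.10
      WHERE     DEPT_CODE = 'D01';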

When accessing data, a programmer needs to think about what the end result should be and then code everything possible into the SQL. This means using the native features of SQL – joins and subselects and functions, etc. – instead of coding procedural host language code (whether in COBOL, C, Java or whatever) that, for example, opens up a cursor, fetches a row, and then uses a fetched value to open up another cursor. This is processing DB2 like a set of flat files... better to join the data!
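
As a sketch of the difference, instead of fetching each DEPARTMENT row and then opening a second cursor against EMPLOYEE with the fetched key, one join does the work in a single statement (this assumes a DEPT_CODE foreign key on EMPLOYEE, which is made up for illustration):

      -- Let DB2 navigate the data: one join instead of nested cursors
      SELECT    D.DEPT_NAME, E.LAST_NAME, E.TITLE, E.SALARY
      FROM      DEPARTMENT D
                INNER JOIN EMPLOYEE E
                   ON E.DEPT_CODE = D.DEPT_CODE
      WHERE     D.MANAGER = ?;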

Educating programmers on how to use SQL properly -- at a high level -- is probably the single most important thing you can do to optimize performance of your DB2 applications.

Saturday, April 11, 2015

SQL Performance Basics: Part 5, Avoid Sorting

Sorting is a very costly operation that you should strive to avoid if at all possible. 
Indexes are very useful for this purpose. DB2 can choose to use an available index 
to satisfy the sorting requirements of an ORDER BY, GROUP BY, or DISTINCT clause. 
Therefore, it can be beneficial to create indexes that correspond to these clauses 
for frequently executed SQL statements. 
For example, consider this SQL statement:
      SELECT    LAST_NAME, TITLE, SALARY
      FROM      EMPLOYEE
      ORDER BY  LAST_NAME, TITLE;
 
If an index on LAST_NAME and TITLE exists, DB2 can use it to avoid sorting. By 
retrieving the data in order from the index, sorting becomes unnecessary.
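For instance, an index along these lines (the index name is made up) would allow 
DB2 to return the rows in LAST_NAME, TITLE sequence directly from the index:
      CREATE INDEX XEMP_NAME_TITLE
          ON EMPLOYEE (LAST_NAME, TITLE);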
You can use this information to tune your SQL further to avoid sorting. When 
ordering is required on multiple columns, sometimes removing a column from 
the ORDER BY can enable indexed access to avoid the sort. 
Consider this SQL statement:
      SELECT    DEPT_CODE, MANAGER, DEPT_NAME
      FROM      DEPARTMENT
      ORDER BY  MANAGER, DEPT_CODE;
 
Say there is a unique index on MANAGER. DB2 will probably use this index to 
satisfy the request. However, a sort will be done after retrieving the data to 
put the results into DEPT_CODE order within MANAGER. But, since we know 
our data, we are able to change this situation. Because MANAGER is unique 
we know that the following SQL statement is equivalent to the prior one: 
      SELECT    DEPT_CODE, MANAGER, DEPT_NAME
      FROM      DEPARTMENT
      ORDER BY  MANAGER;
 
In this case, DB2 can use the index on MANAGER to avoid the sort. The extra 
column, DEPT_CODE, is removed from the ORDER BY clause. But, since MANAGER 
is unique, there can be at most one DEPT_CODE per MANAGER. Therefore, the 
sort order will be equivalent. Because we knew our data we removed a sort from 
the process and made our query more efficient!
One final note on sorting: although most of you probably know this, it cannot be 
stated too strongly (or frequently) - always code an ORDER BY if the order of the 
results of your query is important. The ORDER BY clause is the only foolproof 
method of ensuring appropriate sort order of query results. Simply verifying that 
access is via an index is not sufficient to ensure the order of the results because:
  • The access path may change in the future to non-indexed access.
  • Even indexed access does not always return ordered results (e.g. list prefetch).
UNION VERSUS UNION ALL
Another simple, yet effective, means of enhancing SQL performance is to 
understand how UNION, EXCEPT and INTERSECT work. Let's start with UNION 
because it has been around in DB2 the longest. UNION takes the results of 
multiple SELECT statements and combines them together. It does this, as part 
of the UNION operation, by sorting the results and removing any duplicates. 
UNION ALL, on the other hand, will not sort and will not remove duplicates.
If you know that the results of the queries being unioned together are distinct 
(that is, no duplicates will be encountered), then you can use UNION ALL instead 
of UNION, and thereby enhance performance by avoiding the sort. Additionally, 
if you do not care whether duplicates are returned, always use UNION ALL. 
The same applies to the EXCEPT and INTERSECT operations, which were added 
as of DB2 9 for z/OS.
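As a quick illustration, suppose current and archived orders can never contain 
the same row (the table names here are hypothetical). UNION ALL returns the same 
result as UNION would, without the duplicate-removal sort:
      SELECT    ORDER_NO, ORDER_DATE
      FROM      CURRENT_ORDERS
      UNION ALL
      SELECT    ORDER_NO, ORDER_DATE
      FROM      ARCHIVED_ORDERS;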