Tuesday, July 21, 2015

Influencing the DB2 Optimizer: Part 4 - OPTIMIZE FOR n ROWS

In the first installments of this series of posts on ways to influence the DB2 optimizer we looked at standard methods like RUNSTATS and SQL tweaking techniques. Today, we will look at yet another method of influencing access path selection: specifying the OPTIMIZE FOR n ROWS for a SELECT statement. 

This clause enables programmers to specify the estimated maximum number of rows that will be retrieved. By indicating that a different number of rows will be returned than DB2 anticipates, you can influence access path selection. 

For example, consider the following statement:

  EXEC SQL
    DECLARE OPT_CUR CURSOR FOR
      SELECT   WORKDEPT, EMPNO, SALARY
      FROM     DSN81010.EMP
      WHERE    WORKDEPT IN ('A00', 'D11')
      OPTIMIZE FOR 5 ROWS
  END-EXEC.

In this example, the number of rows to be optimized for has been set to 5. Keep in mind, though, that this query could return more than 5 rows. DB2 formulates an access path optimized for 5 rows. More rows can be retrieved, but performance could suffer if you greatly exceed the estimated maximum.

Note: The FETCH FIRST n ROWS ONLY clause can be used to limit the number of rows returned regardless of the number that actually qualify.
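
For example, the following query (a quick sketch against the sample DSN81010.EMP table) will never return more than 5 rows, no matter how many actually qualify:

    SELECT   WORKDEPT, EMPNO, SALARY
    FROM     DSN81010.EMP
    WHERE    WORKDEPT IN ('A00', 'D11')
    FETCH FIRST 5 ROWS ONLY;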

Using OPTIMIZE FOR generally is preferable to both manually updating the DB2 Catalog statistics (our next topic, so stay tuned) and tweaking the SQL statement (our previous topic). It provides more information to DB2’s optimization process, thereby giving DB2 the opportunity to establish a better access path. The crucial point, though, is that DB2 is doing the optimization; no manual updates or artificial SQL constructs are required.

When using the OPTIMIZE FOR n ROWS clause, it is a good idea to make n as accurate as possible. An accurate estimate gives DB2 the best opportunity to achieve optimum performance for the statement and also helps document the purpose of the SQL statement. For example, if you are coding an online transaction in which 25 rows are displayed on the screen, code OPTIMIZE FOR 25 ROWS.

But this is not really a hard and fast rule. You can tinker with different values to see what impact it may have on the optimization process. 

There is another "trick" that you can deploy using OPTIMIZE FOR n ROWS. By setting n=1 you should be able to disable list prefetch for the query.
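
For example, a sketch of the technique (again using the sample EMP table):

    SELECT   EMPNO, LASTNAME
    FROM     DSN81010.EMP
    WHERE    WORKDEPT = 'D11'
    OPTIMIZE FOR 1 ROW;

As always, use EXPLAIN to verify that the access path actually changed the way you expected.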

One caution to keep in mind: DB2 uses the value of n for the block size of a distributed network request. The smaller the value of n, the fewer rows sent across the network for each block. The only exception is that when n=1, DB2 will set the block size to 16.

Synopsis 


So if you are having trouble with a specific access path, consider using the OPTIMIZE FOR n ROWS clause with various values for n. It just might give you some reasonable results.

Thursday, July 16, 2015

Influencing the DB2 Optimizer: Part 3 - Tweaking SQL Statements

In Part 2 of this series we took a look at standard methods of influencing the DB2 optimizer, essentially boiling down to running RUNSTATS, reorganizing, encouraging parallelism and indexing.

But sometimes you may not want to change the DB2 Catalog statistics (or use hints, which we will cover later in this series). In such situations, you might consider tweaking the offending SQL statement. Tweaking is the process of changing a statement in a non-intuitive fashion, without altering its functionality. Let's take a look at some of these methods...

At times, you may need to disable a specific index from being considered by the optimizer. One method of achieving this is to append OR 0 = 1 to the predicate. For example, consider a query against the EMP table on which two indexes exist: one on EMPNO and one on WORKDEPT. Appending OR 0 = 1 (as shown next) to the WORKDEPT predicate will cause DB2 to avoid using an index on WORKDEPT.

    SELECT  EMPNO, WORKDEPT, EDLEVEL, SALARY
    FROM    EMP
    WHERE   EMPNO BETWEEN '000020' AND '000350'
    AND     (WORKDEPT > 'A01' OR 0 = 1);


The OR 0 = 1 clause does not change the results of the query, but it can change the access path chosen. 

There are other tweaks that can be used to dissuade the DB2 optimizer from choosing a specific index, such as multiplying or dividing by 1, adding or subtracting 0, or appending an empty string to a character column. 
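
For example, the following sketch shows two of these tweaks; the modified predicates return the same results, but DB2 will not consider an index to evaluate them:

    SELECT  EMPNO, WORKDEPT, SALARY
    FROM    DSN81010.EMP
    WHERE   SALARY + 0 > 10000.00         -- adding 0 disqualifies a SALARY index
    AND     WORKDEPT CONCAT '' = 'A00';   -- empty concat disqualifies a WORKDEPT index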


Another method of tweaking SQL to influence DB2’s access path selection is to code redundant predicates. When DB2 calculates the filter factor for a SQL statement, it multiplies the filter factors for all predicates connected with AND.

You can lower the filter factor of a query by adding redundant predicates. For example, consider the following statement:

    SELECT LASTNAME
    FROM   DSN81010.EMP
    WHERE  WORKDEPT = :VAR;

You can attempt to lower the filter factor for the predicate by adding redundant predicates, for example:

    SELECT LASTNAME
    FROM   DSN81010.EMP
    WHERE  WORKDEPT = :VAR
    AND    WORKDEPT = :VAR
    AND    WORKDEPT = :VAR;

The two predicates added to the end are redundant and do not change the functionality of the SQL statement. However, DB2 calculates a lower filter factor, which increases the possibility that an index on the WORKDEPT column will be chosen. The lower filter factor also increases the possibility that the table will be chosen as the outer table, if the redundant predicates are used for a join.

When redundant predicates are added to enhance performance, be sure to document the reasons for the extra predicates. Failure to do so may cause a maintenance programmer to assume that the redundant predicates are an error and thus remove them. 

Another option for getting a small amount of performance out of an SQL statement is to change the physical order of the predicates in your SQL code. DB2 evaluates predicates first by predicate type, then according to the order in which it encounters the predicates. The four types of SQL predicates are listed in the order that DB2 processes them:
  • Equality, in which a column is tested for equivalence to another column, a variable, or a literal
  • Ranges, in which a column is tested against a range of values (for example, greater than, less than, or BETWEEN)
  • IN, where a column is tested for equivalence against a list of values
  • Stage 2 predicates

Understanding this hierarchy, you can place the most restrictive predicates at the beginning of your predicate list. For example, consider the following query:

    SELECT   LASTNAME
    FROM     DSN81010.EMP
    WHERE    WORKDEPT = 'A00'
    AND      SEX = 'M';

The first predicate has a lower filter factor than the second because there are fewer workers in department A00 than there are males in the entire company. This probably will not increase performance by much, but it can shave a little off a query’s processing time.

Before deciding to tweak SQL statements to achieve different access paths, remember that you are changing SQL code in a non-intuitive fashion. For each modification you make to increase performance, document the reasons in the program, the data dictionary or repository (if one exists), and the system documentation. Otherwise, the tweaked SQL could be maintained after it is no longer required, or modified away when it still is required for performance.

Also remember that the changes could enhance performance for one release of DB2 but result in no gain or decreased efficiency in subsequent releases. Re-examine your SQL for each new version and release of DB2.

Wednesday, July 08, 2015

Influencing the DB2 Optimizer: Part 2 - Standard Methods

Last week, in Part 1 of this series, we introduced the concept and the 5 high-level methods of influencing the DB2 optimizer's access path choices. In today's post, we will tackle the first approach, which is using standard, DB2-based methods.

Of all the methods for influencing the DB2 optimizer, standard DB2 methods are the only mandatory ones. Try all the standard methods covered in this section before attempting any of the other methods (that we will cover in later installments). There are several reasons for this precaution.

The standard methods place the burden for generating optimal access paths on the shoulders of DB2, which is where it usually belongs. They also use IBM-supported techniques available for every version and release of DB2. Finally, these methods generally provide the greatest gain for the smallest effort.

There are four (4) standard methods for tuning DB2 access paths. The first method is ensuring that accurate statistics are available using the RUNSTATS utility and the BIND or REBIND command. RUNSTATS populates the DB2 Catalog with statistics that indicate the state of your DB2 objects, including the following:
  • Their organization
  • Clustering information
  • The cardinality of table spaces, tables, columns, and indexes
  • The range of values for columns
  • Skew and data distribution details

All of these factors are considered by the optimizer when it chooses what it deems to be the optimal access path for a given SQL statement. We will not delve deeply into all of the options at your disposal when running RUNSTATS in this series. 

A good DB2 tuning strategy is to execute RUNSTATS at least once for every table space, table, column, and index known to your DB2 subsystem. Schedule regular RUNSTATS executions for all DB2 objects that are not read-only. This keeps the DB2 Catalog information current, enabling proper access path selection. If you do not run RUNSTATS whenever your data changes significantly, the DB2 optimizer will not have accurate information upon which to rely when determining optimal access paths. And that means your performance will probably suffer.
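
For example, a basic RUNSTATS control statement might look something like this (the database and table space names are the DB2 sample objects; substitute your own and adjust the options to your shop standards):

    RUNSTATS TABLESPACE DSN8D10A.DSN8S10E
             TABLE(ALL)
             INDEX(ALL)
             SHRLEVEL REFERENCE
             UPDATE ALL

After the statistics have been refreshed, REBIND the affected plans and packages so that the optimizer can act on the new information.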

The second standard method for tuning DB2 access paths is ensuring that the DB2 objects are properly organized. A disorganized object that the optimizer bypasses today might be chosen for an access path after it has been reorganized. An object is disorganized when data modification statements executed against the object cause data to be stored in a non-optimal fashion, such as non-clustered data or data that exists on a different page than its RID, thereby spanning more than one physical page. To organize these objects more efficiently, run the REORG utility, followed by RUNSTATS and REBIND.

Use the DB2 Catalog and Real Time Statistics (RTS) to determine when your DB2 table spaces and indexes need to be reorganized. A few guidelines you might follow include: 

  • Reorganize a table space when the cluster ratio of its clustering index falls below 95%. (Schedule this so that it does not affect system performance and availability.)
  • Reorganize indexes when there have been a lot of INSERTs and DELETEs since the last time they were reorganized. As data is added to and removed from an index, it can cause disorganization that impacts the performance of queries using the index. The RTS columns REORGINSERTS and REORGDELETES can be examined to ascertain the number of index entries inserted or deleted since the index was reorganized. A good rule of thumb is to consider reorganizing an index when 25% or more entries have been inserted or deleted.
  • Another index-related statistic to pay particular attention to is REORGAPPENDINSERT, which contains the number of inserts into an index since the last reorganization for which the index key was higher than any existing key value. If this column consistently grows, you have identified an object where data is inserted using an ascending key sequence. You might consider lowering the free space for such objects, because the free space is wasted space if inserts are always done in ascending key sequence. You should also consider reorganizing the index when 20% or more entries have been appended.
  • In addition, the number of index leaf page splits should be examined when considering index reorganization. The RTS column that indicates this metric is REORGLEAFFAR. Consider reorganizing the index when it reaches 10% or more.
  • Consider reorganizing any DB2 table space or index when its data set is in multiple physical extents. Before reorganizing, ensure that space allocations have been modified to cause all data to be stored in a single extent.


  • You may want to reorganize more frequently than indicated here by creating scheduled REORG jobs for heavily accessed or critical DB2 table spaces and indexes. This limits performance problems due to disorganized DB2 objects and reduces the number of reorganizations that must be manually scheduled or submitted by a DBA or performance analyst. 

    Also, keep in mind that online REORGs can be run, thereby mitigating the impact to availability that reorganization can cause. For your largest and most important table spaces you might want to consider advanced REORG utilities such as those offered by third-party vendors.
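
    For example, a query along these lines (a sketch; verify the RTS column names for your version of DB2, and substitute your own database name for the sample DSN8D10A) can be used to review the index reorganization indicators discussed above:

        SELECT DBNAME, INDEXSPACE, TOTALENTRIES,
               REORGINSERTS, REORGDELETES,
               REORGAPPENDINSERT, REORGLEAFFAR
        FROM   SYSIBM.SYSINDEXSPACESTATS
        WHERE  DBNAME = 'DSN8D10A';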


    The third standard method for tuning DB2 access paths is to encourage parallelism. Consider changing simple and segmented table spaces to partitioned table spaces to encourage I/O, CPU, and Sysplex parallelism. Furthermore, it may be advantageous to repartition already partitioned table spaces to better align ranges of values, thereby promoting better parallel access.

    Note: Partitioning is not mandatory for parallel access, but it does aid in parallel access path selection and execution.
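
    For example, for dynamic SQL you can enable parallelism by setting the CURRENT DEGREE special register (for static SQL, the equivalent is the DEGREE(ANY) BIND parameter):

        SET CURRENT DEGREE = 'ANY';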


    The fourth standard method for tuning DB2 access paths is ensuring that there are proper indexes by creating new indexes or dropping unnecessary and unused indexes. DB2 relies on indexes to achieve optimum performance.

    Analyze the predicates in your SQL statements to determine whether there is an index that DB2 can use. Indexes can be used efficiently by DB2 if the first column of the index key is specified in an indexable predicate in the SQL statement. If no index meets these requirements, consider creating one. As you index more columns referenced in predicates, performance generally increases.
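
    For example, for queries that filter on WORKDEPT and SEX, an index such as the following (a sketch with a hypothetical index name) gives DB2 the opportunity for matching index access:

        CREATE INDEX XEMP_WD_SEX
          ON DSN81010.EMP (WORKDEPT, SEX);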

    Dropping unused indexes is another critical part of application tuning. Every table INSERT and DELETE incurs I/O to every index defined for that table. Every UPDATE of indexed columns incurs I/O to every index defined on those columns. If an index is not being used, drop it. This reduces the I/O incurred for data modification SQL statements, reduces RUNSTATS resource requirements, and speeds REORG and RECOVER processing.

    If table data is accessed by random reads of small amounts of data, consider building a hash to improve the performance of those reads. 


    You should also know which predicates are Stage 1 and which are Stage 2 and try to favor Stage 1 to improve the performance of your queries.

    That's all for today... but tune in again next week for more ways to influence DB2 access paths...

    Wednesday, July 01, 2015

    Influencing the DB2 Optimizer: Part 1

    Today is July 1, 2015 - mid way through the year and the beginning of a fresh, new month. With that in mind, I'm kicking off a new series of blog posts here on the various methods of influencing the DB2 Optimizer's access path decisions. The bulk of the material will be excerpted from my book - DB2 Developer's Guide - which just so happens to be on sale for 40% off over the 4th of July holidays at InformIT if you're interested in picking up a copy.

    Before going into the various methods that can be used to alter access paths or influence access path selection, let's first take a moment to laud the DB2 optimizer and the fine minds that built it. The DB2 optimizer is one of the most intricate pieces of software on the market. It does an admirable job of optimizing SQL requests. To achieve this level of success, the optimizer contains a great deal of performance-specific expertise. For example, the optimizer estimates both elapsed times and CPU times when choosing an access path. When a SQL statement is rebound, the optimizer might choose a new access path that increases CPU time but decreases elapsed time. 

    Now many shops may choose to enhance elapsed time at the expense of additional CPU use because elapsed time has a measurable effect on user productivity. In other words, it can be good to trade off CPU cycles for user satisfaction, and the DB2 optimizer attempts to accomplish this. Of course, if both CPU and elapsed time can be reduced, the optimizer tries to do that, too.

    But the DB2 optimizer is not infallible. Sometimes the application analyst or DBA understands the nature of the data better than DB2. And that is the reason that there are ways to influence the optimizer into choosing an access path that you know is a better one but the optimizer thinks is a worse one. 

    As the functionality and complexity of the optimizer are enhanced from release to release of DB2, the need to trick the optimizer in this way invariably diminishes. But until the day when software is perfect, the need to tweak it will remain (and I'm pretty sure that will be the case throughout my lifetime).

    There are five ways to influence the optimizer’s access path decisions:
    • Updating DB2 Catalog statistics
    • Standard, DB2-based methods
    • Tweaking SQL statements
    • Specifying the OPTIMIZE FOR n ROWS clause
    • Using OPTHINT to indicate that an access path in the PLAN_TABLE should be chosen

    Over the course of the ensuing weeks, we will examine each of these methods, and more. So stay tuned as we delve into the techniques at your disposal to influence the choices made by the DB2 optimizer.  

    Monday, June 22, 2015

    The DBMS Market Circa 2015

    Today's blog post is to call attention to a series of articles and product overviews I have been writing for the TechTarget SearchDataManagement portal on Database Management Systems (DBMS).

    Firstly, I wrote a 7 part series of articles reviewing the DBMS technology circa 2015. This series spans relational, NoSQL and in-memory database technology and here are the links to each of the seven articles:


    Now you may be asking, why would I provide links to these articles on a DB2 blog? Good question. The answer is that it behooves you to keep up to date on the latest breakthroughs and offerings in the world of data management. Sure, we all can agree that DB2 is great and should be used by everybody! But let's face it, our organizations are going to have data-related projects where DB2 is not the primary DBMS... so read through those articles and get up to speed on the new NoSQL and in-memory database offerings out there.


    I have also been writing a series of DBMS product overview documents that briefly review and highlight the features and capabilities of many popular DBMSes. I won't share all of them with you here today (if you are interested, they will all be linked to, over time, on my web site at http://mullinsconsulting.com/articles.html). I will, though, share the link for the TechTarget product overview I wrote of DB2: IBM DB2 relational DBMS overview.

    That's all for today... thanks for reading!

    Tuesday, June 16, 2015

    Planning DB2 Educational Resources

    Ensuring access to proper DB2 educational materials should be one of the first issues to be addressed after your organization decides to implement DB2. But education sometimes falls through the cracks... 

    Does your organization understand what DB2 is? How it works? For what purposes it is needed at your shop? How it will be used? Without a sound understanding of DB2, its components, and features, it is unlikely that you will be able to use DB2 to its best advantage. A basic level of DB2 knowledge can be acquired through a short DB2 fundamentals class for the IT personnel charged with making DB2 a success at your organization. But long-term success with DB2 requires ongoing education.

    After addressing the basics of DB2 education, you must support continuing DB2 education for your co-workers/employees. This support falls into four categories. 

    The first category of training is a standard regimen of SQL and DB2 programming training to be used by application developers. Every programmer should receive basic training on SQL, education on how to embed SQL in the programming languages they will be using, and possibly additional courses on using DB2 with specific infrastructure software like MQ, WebSphere, CICS, and IMS. Also, with the importance of distributed access to DB2 these days, a course should be made available on that topic, with specific portions that address the technology used by your shop to provide distributed DB2 data access. If this basic level of DB2 education is not required for every DB2 programmer, then DB2 application performance will surely suffer as untrained coders write inefficient and incorrect SQL.

    The second category of education support is external training for special needs. This support includes education for database administrators, technical support personnel, and performance analysts. Additionally, your organization needs to plan for ongoing education to keep appropriate personnel up-to-date on new versions and releases of DB2. Although IBM typically offers great courses for new DB2 releases, several third-party vendors such as KBCE and Themis regularly offer in-depth training and release-specific DB2 courses and lectures.

    The third category of education is in-house, interactive training in the form of videos, computer-based training, and instructor-led courses. These courses should be used to augment and refresh the formal training given to your DB2 professional staff. 

    The fourth, and final category of support, is reference material—for example, IBM’s DB2 manuals, DB2 books (such as DB2 Developer's Guide), vendor-supplied white papers, and industry publications and periodicals. Some organizations have third-party software for accessing manuals online, but the Web offers most everything needed these days.

    IBM offers the free, web-based Information Center, as well as PDF versions of all DB2 manuals, freely available for download over the Web.

    Of course, you should consider augmenting the standard IBM DB2 manuals with IBM redbooks. IBM redbooks provide in-depth, detailed coverage of a specific technology topic. IBM publishes redbooks on multiple subjects, including DB2, IMS, CICS, z/OS, and many other topics. IBM redbooks can greatly assist DB2 technicians working to understand a feature or nuance of DB2. You can download IBM redbooks for free in Adobe Acrobat format over the Web at http://www.redbooks.ibm.com/.

    IBM also offers many other useful sources of DB2 information on the Web including white papers, articles, and book excerpts from IBM Press publications. Another useful source for DB2 information is the IBM Developerworks website. Developerworks offers technical details about development using IBM software and frequently offers DB2-related articles. 

    Independent Software Vendors (ISVs) are another rich source of DB2 information. The major vendors provide in-depth technical papers on features of DB2 that would be difficult for most shops to research in the same detail. BMC Software, CA Technologies, Compuware Corporation, and others are good sources for DB2-related white papers and content.

    All of these educational components—in-house education, external education, and industry publications—are useful for learning how you can use DB2 effectively. You would be wise to have a mix of material that supports more than one of the categories outlined previously. In this way, you provide a varied learning environment that meets the needs of all students. This varied learning environment allows each student to learn in the most conducive way for him or her. Plan to provide an on-site library of educational material addressing the following subjects:
    • Introduction to relational databases
    • Introduction to DB2 and SQL
    • Advanced SQL
    • Programming DB2 using (your languages of choice)
    • Programming DB2 in batch
    • Programming DB2 using TSO, CICS, and IMS
    • Programming DB2 and the Web
    • Creating DB2 stored procedures, triggers, and UDFs
    • Programming DB2 in a Distributed Environment
    • Debugging and Problem Analysis


    You also might want to have an introductory DB2 database administration course to train new DBAs. In addition to this basic education library, plan to provide advanced education for technical DB2 users, such as DBAs, technical support personnel, and technical programmers and analysts. Advanced DBA topics (such as Sysplex Data Sharing, performance management, and backup/recovery) should be left to instructor-led training courses because of the complex nature of DB2 database administration. Additional advanced topics to consider include system administration (for systems programmers) and disaster recovery. Many vendors, including IBM and Themis, offer these classes. Searching for smaller consulting firms and local resources is also prudent; these firms sometimes can provide courses tailored to your installation needs.


    The advanced education program should include allocating time to attend area user group meetings, the annual IBM Insight Conference, and the International DB2 Users Group (IDUG) conferences. The DB2 Symposium events also are useful resources for DB2 education. When DB2 users get together to share experiences at such forums, they uncover undocumented solutions and ideas that would be difficult to arrive at independently.

    The bottom line is simple, though. Plan for ongoing DB2 education for your DBAs, programmers, and analysts... or plan on failing.

    Tuesday, June 09, 2015

    New England DB2 User Group

    Just a brief blog post today to promote my upcoming speaking engagement at the New England DB2 User Group, in Sturbridge, MA (just outside of Boston). If you are in or around the area on June 18, 2015 be sure to stop by and participate in the meeting!

    I will be giving two presentations (in the afternoon) that day. First up is my Big Data introductory presentation titled A Big Data Roadmap for the DB2 Professional. This was voted one of the Top Ten presentations at IDUG North America in 2014. My second presentation is a little bit different than what I normally present. It is titled Reduce Costs by Tuning DB2 to Reduce Your Rolling 4 Hour Average and in it I will walk through the components of subcapacity pricing and variable workload license charge... and how to use that knowledge to tune and reduce your monthly mainframe costs.

    There are two other speakers that day, including a pitch from IBM on migrating to DB2 11 for z/OS and another presentation from the State of CT on IBM's PureData system.

    So if you are going to be in the New England area be sure to put the NEDB2UG meeting on your agenda.

    Hope to see you there!

    Monday, June 01, 2015

    DB2 User-Defined Functions

    In my last post here, I talked about the importance of built-in DB2 functions and how they can be used to simplify development. But DB2 also allows us to create our own, user-defined functions to programmatically extend the functionality of DB2.

    A user-defined function, or UDF for short, is procedural functionality that you code up and add to DB2. The UDF, after coded and implemented, extends the functionality of DB2 SQL by enabling users to specify the UDF in SQL statements just like built-in SQL functions.

    User-defined functions are ideal for organizations wanting to utilize DB2 and SQL to perform specialized routines with their own rules, business logic, and data transformation procedures.

    UDFs are similar in functionality to application subroutines, but user-defined functions are different because they can be used inside SQL statements. The only way that user-defined functions can be executed is from within an SQL statement. This gives them great power. A user-defined function is not a substitute for an application subroutine, subprogram, or procedure. Instead, user-defined functions extend the functionality of the SQL language.

    There are two basic ways to create a user-defined function: You can code your own program from scratch (using either a host programming language or SQL) or you can base the function on another, existing function. Once coded, the CREATE FUNCTION statement registers the user-defined function to DB2 for use.

    Types of User-Defined Functions (UDFs)

    DB2 supports five different types of UDFs depending upon the manner in which they are coded and the type of functionality they support.
    • The first type is the external scalar UDF. Recall, from the prior post, that scalar functions are applied to a column or expression and operate on a single value. Furthermore, external functions are referred to as external because they are written and developed outside of (or external to) DB2. External UDFs must be written in a host programming language. DB2 user-defined functions can be written in Assembler, C, C++, COBOL, Java, or PL/I.
    • The next type of UDF is the external table UDF. Remember that table functions, when called, return an entire table. A table function is specified in the FROM clause of a SELECT statement, taking the place of a table, view, synonym, or alias.
    • A third type of user-defined function can be created from another existing function. This is a sourced function. A sourced function is based on a function that already exists; it can be based on a built-in function (scalar or column) or another user-defined function that has already been created. External functions are coded from scratch, whereas sourced functions are created based on a pre-existing function and inherit the attributes of the underlying source function.
    • DB2 also supports the creation of SQL scalar user-defined functions. SQL scalar functions return a scalar value and are written entirely using SQL. Fairly extensive functionality can be delivered via SQL scalar functions using SQL procedural language.
    • The fifth, and final type of UDF supported by DB2 was introduced as of DB2 10 for z/OS: the SQL table UDF. SQL table functions are written exclusively in SQL but return a table as a result.
    User-defined functions can be used to handle non-traditional data requirements, as well as to build DB2 databases that are customized to your business requirements.
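
    As a simple illustration, here is a sketch of a SQL scalar UDF (the name and logic are hypothetical, and the exact options available vary by DB2 version):

        CREATE FUNCTION MILES_TO_KM(MILES DECIMAL(9,2))
          RETURNS DECIMAL(9,2)
          LANGUAGE SQL
          DETERMINISTIC
          NO EXTERNAL ACTION
          RETURN MILES * 1.609344;

    Once registered, the function can be invoked anywhere a built-in scalar function could be used, for example: SELECT MILES_TO_KM(TRIP_MILES) FROM TRIP_TABLE (where TRIP_MILES and TRIP_TABLE are, again, hypothetical).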

    There are a lot of details involved in coding and using UDFs, such as schema management, configuring WLM for execution, SQL restrictions, and more. But I will not be digging into those details in today’s post. I invite you to research the topic and learn how to extend the functionality of your DB2 environment using UDFs. 

    Tuesday, May 19, 2015

    The Importance of Using DB2 Functions

    There are three basic types of built-in functions available to DB2 users: 
    1. aggregate functions, 
    2. scalar functions, and 
    3. table functions. 

    It is important that you understand what functions are available – as well as what they do – in order to simplify program development and data access. Functions are invoked simply by specifying the function name and any required operands. A built-in function can be used any place an expression can be used (with some exceptions).

    The first type of function is called an aggregate function, or sometimes a column function. Aggregate functions can be used to compute, from a group of rows, a single value for a designated column or expression. This provides the capability to aggregate data, thereby enabling you to perform statistical calculations across many rows with one SQL statement.

    An aggregate function can be specified in a WHERE clause only if that clause is part of a subquery of a HAVING clause. In addition, every column name specified in the expression of the aggregate function must be a correlated reference to the same group.

    The aggregate functions are AVG, CORRELATION, COUNT, COUNT_BIG, COVARIANCE, COVARIANCE_SAMP, MAX, MIN, STDDEV, SUM, VARIANCE, and XMLAGG.
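
    For example, the following statement (a sketch using the sample DSN81010.EMP table) returns one row per department rather than one row per employee:

        SELECT   WORKDEPT, AVG(SALARY) AS AVG_SAL, COUNT(*) AS EMP_CNT
        FROM     DSN81010.EMP
        GROUP BY WORKDEPT;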

    I won’t be explaining what each of these functions does in today’s blog entry; instead, I suggest that you review their functionality in the SQL Reference Guide (SC19-4066 for DB2 11 for z/OS).

    The second type of function is the scalar function, which is probably the most commonly used type of function. However, there are so many scalar functions that it can be difficult to keep track of them all. A scalar function is applied to a column or expression and operates on a single value. Contrast this with the aggregate functions, which are applied to a set of data and return only a single result.

    The result of a scalar function is a transformed version of the column or expression being operated on. The transformation of the value is based on the scalar function being applied and the value itself. For example, you might use the CHAR function to convert a DB2 date, time, timestamp, ROWID, floating point, integer, or decimal value to a character value… or you might use the DIGITS function to convert a number to a character string of digits.
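
    For example, a quick sketch using the sample EMP table:

        SELECT EMPNO,
               CHAR(HIREDATE, USA) AS HIRE_DATE,
               DIGITS(EDLEVEL)     AS EDLEVEL_DIGITS
        FROM   DSN81010.EMP;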

    Of course, these are just a few simple examples of the many functions available in DB2 for z/OS. There are well over 100 scalar functions, each of which may be useful to minimize the amount of code you write by making DB2 do it, instead!

    With so many functions available it would not make sense to cover them all here. Suffice it to say, by acquiring a sound understanding of DB2’s functions, many requirements that seem difficult at first glance are achievable using nothing more than SQL (check out this example to clarify what I mean: Sorting Out a Solution in DB2).

    The third type of function supported by DB2 is the table function, which can be specified only in the FROM clause of a query. This is because table functions return columns and resemble a table. There are three basic table functions:
    • ADMIN_TASK_LIST: Returns a table with one row for each defined task in the administrative scheduler task list. The administrative scheduler is covered in Part VI, “DB2 Utilities and Commands.”
    • ADMIN_TASK_OUTPUT: For an execution of a stored procedure, returns the output parameter values and result sets, if available.
    • ADMIN_TASK_STATUS: Returns a table with one row for each defined task in the administrative scheduler task list, showing the status of the task the last time it was run.

    Additional Useful Function Information 

    DB2 also offers some other types of built-in functions including:
    • MQSeries Built-In Functions - scalar and table functions for use with IBM’s message queuing software, MQSeries. These functions enable MQSeries messages to be received and sent.
    • XML Built-In Functions - scalar functions to allow applications to efficiently generate and manipulate XML data with DB2.

    The final built-in function to be aware of is the RAISE_ERROR function. It is used to raise an error condition in the SQLCA. The user supplies the SQLSTATE and error description for the error to be raised. The error will be raised with the specified SQLSTATE and a SQLCODE of –438. The RAISE_ERROR function can be used to signal application program and data problems.
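
    For example, a sketch (with a user-chosen SQLSTATE and message text) showing RAISE_ERROR inside a CASE expression:

        SELECT CASE
                 WHEN EDLEVEL < 0
                   THEN RAISE_ERROR('70001', 'EDLEVEL cannot be negative')
                 ELSE EDLEVEL
               END
        FROM   DSN81010.EMP;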

    What About User-Defined Functions?

    DB2 also supports the creation of additional functions (and data types) to supplement the built-in function (and data types) supplied out-of-the-box. User-defined functions (and types) give users the ability to effectively customize DB2 to their shop requirements. The ability to customize is potentially very powerful. It also can be quite complex and requires detailed knowledge, additional application development skills, and administrative dexterity.


    We will take a look at UDFs in the next blog post, so check in again next week for that!

    Tuesday, May 12, 2015

    A Trip Report from the 2015 IDUG DB2 Tech Conference

    Last week I attended, and spoke at, the annual North American IDUG DB2 Tech Conference in Philadelphia, PA. As usual, the event was busy and chock full of useful and interesting DB2 information.

    My week started on Sunday with the IBM gold consultant briefing, whereby IBM talks to the gold consultants about their upcoming plans and solicits feedback from us. I can’t really share with you what happened there, but as is almost always the case, it was time well spent.

    The conference kicked off in earnest on Monday with the keynote from IBMers Tim Vincent and Namik Hrle titled “Big Data, Analytics and the Future of Data Management.” Tim and Namik discussed how the growth of data is fueling innovation, causing a big change in the way value is created. Some of the key takeaways from the keynote, for me at least, were:
    • The predominant source of Big Data for most projects is still structured transactions
    • Primary focus of most data projects is usually on understanding customers
    • There is a trend toward self-service
    • Clearly there is value in Hadoop but you can't replace all your DBMSes with it!

    Perhaps the most salient tidbit shared at the keynote address was this: “People have forgotten, or never truly understood, how complex data integration actually is.” I think this is true; all too often people underestimate how difficult the integration of data will be. And I agree, too, with sentiment of the presenters who declared that “We need to realize that data is never going to be in one place.”

    The bottom line of the keynote: All of our existing DB2 skills will still be relevant but must co-exist with skills in newer technologies like NoSQL and Hadoop.

    Good stuff!

    Some of the other highlights of the week:
    • Attendance seemed to be higher than at the past few IDUG conferences. And I see growth in the number of DB2 for LUW attendees. IDUG, at least historically, was mostly attended by DB2 for z/OS folks. And it is probably still true that more z/OS folks attend than LUW folks, but the LUW attendance continues to grow and may surpass z/OS at some point! Of course, this is all based on my unscientific eyeballing of attendees at presentations.
    • My session on DB2 performance had a full room and nobody seemed to nod off or run for the exits while I spoke. I also delivered a VSP for Dell to a room full of DBAs and developers, as well as a couple of presentations for DataKinetics during an evening session. So that was all good!
    • I attended a couple of sessions on newer features of DB2 and how some of them are under-utilized. I think this speaks to a few trends hitting the world of DB2 these days. Number one: training is drying up. If you don’t get training on new features it is hard to use them appropriately. Number two: failure to take advantage of free learning opportunities like webinars and freely-available PDF manuals.
    • The vendor exhibit hall was active and well-attended throughout the week. All the usual suspects were there like BMC, CA, dbi, Dell, and IBM, but there were some newer (at least to IDUG) companies, too, like DataKinetics and Imperva.
    • The educational seminars were integrated into the conference this year. And they did not cost extra to attend. That means that more people attended ed seminars this year (at least I think they did) and the conference offered more value to attendees looking for more in-depth education than can be gained by an hour session.


    All in all, the 2015 North American IDUG Tech Conference was a big success. And it is not too early to start planning for next year’s conference, which will be in Austin, TX. Finally, I’ll be able to drive to an IDUG… see you there in 2016!

    Tuesday, April 28, 2015

    Controlling Mainframe Software Costs

    Although this blog usually focuses on DB2 issues, I sometimes use it to focus on other IT issues, usually mainframe-related. The primary purpose of today's blog post is to promote a webinar I'm conducting this Thursday called Managing Your z/OS Software Bill. The webinar is sponsored by Data Kinetics, the North American distributor of a product called AutoSoftCapping (or ASC for short), that can be used to help control the rolling four hour average and thereby reduce monthly software bills.

    Cost containment is of critical importance for IT departments in this day and age of financial austerity... especially so in the mainframe world.  Every decision regarding your computer resources is weighed based on not only the value that they can deliver to your organization, but upon their cost to procure, implement, and maintain. And, in most cases, if a positive return on investment cannot be calculated, the software won’t be adopted, or the hardware won’t be upgraded.

    An important opportunity for mainframe cost containment is to better manage the peak monthly capacity of your mainframe on an LPAR (logical partition) by LPAR basis. The pricing model for most mainframe software is based on the capacity of the machine on which the software will run. Note that this pricing model reflects the potential usage based on the capacity of the machine, not the actual usage. Some vendors offer usage-based pricing. You should actively discuss this with your current ISVs as it is becoming more common, more accurately represents fair usage, and can save you money.

    IBM offers several subcapacity pricing models for many of its popular software offerings, including products such as z/OS, DB2, IMS, CICS, MQSeries and COBOL. Some of the benefits of subcapacity pricing include the ability to:
    • Grow hardware capacity without necessarily increasing your software charges
    • Pay for key software at LPAR-level granularity
    • Experience a low cost of incremental growth
    • Manage software cost by managing workload utilization

    By tracking MSU usage by LPAR you can be charged based on the maximum rolling four hour (R4H) average MSU usage, instead of full capacity. Most organizations with mainframes have shifted to some form of subcapacity pricing model, but not all of them understand how all of the "moving parts" work together. Thursday's webinar will help to clear that all up!

    Managing mainframe software costs by adopting subcapacity pricing, soft capping techniques, and software like Data Kinetics' AutoSoftCapping can help your company to assure a cost-effective IT organization. In today’s cost-cutting, ROI-focused environment, doing anything less than that is short-sighted.

    Tuesday, April 21, 2015

    SQL Performance Basics: Part 7, Embed the SQL Directly Into Your Programs

    You may recall that this is a subject I've written about before, but I think it is important enough to warrant covering briefly in this series on SQL performance basics... and that is, you should avoid black boxes if you want to optimize your applications.

    So what is a black box? Simply stated, a black box is a database access routine that sits in between your application programs and DB2. Instead of coding embedded SQL in your application code, you make a call to the access routine (or black box).

    The general idea here is that development is easier because programmers do not need to know how to write SQL. Instead, programmers call the black box to request data. SQL statements become calls – and every programmer knows how to code a call, right?

    This approach is commonly referred to as a “black box” approach because the data access interface shields the developers from the “complexities” of SQL. The SQL is contained in that black box and programmers do not need to know how the SQL works – just how to call the black box for data. But there are a number of reasons why this approach is not sound:
    1. It is unwise to have uninformed and untrained developers writing DB2 applications without knowing the SQL behind their data access requests. What may seem like a simple request to a non-educated programmer may actually involve very complex and inefficient SQL “behind the scenes” in the black box.
    2. The black box coders will take shortcuts, like combining multiple types of SQL requests into one that will cause more data to be returned than is needed... but then send only the needed data back to the requester. This violates Part 1 of this series.
    3. The black box is an access routine to DB2 data, but SQL is already an access method to DB2 data -- another one is not needed, and is therefore superfluous.
    4. The black box will add CPU cycles to your application because it consists of extra code that is not needed if the SQL is embedded right into your application programs.

    Suffice it to say, avoid implementing SQL black boxes... they are just not worth the effort!


    If you are interested in more details on this subject, I point you to this article I wrote on the subject in my DBA Corner column for Database Trends & Applications.

    Thursday, April 16, 2015

    SQL Performance Basics: Part 6, Code Relationally... or Avoid the Flat-File Mentality

    Another pervasive problem permeating the DB2 development community -- and indeed the communities of most relational DBMSes -- is the “flat file” development mentality. What I mean by this is that programmers try to access data in a relational database the same way that they would access data from a flat file.

    DB2 is ‘relational’ in nature and, as such, operates on data a set-at-a-time, instead of the record-at-a-time processing used against flat files. In order to do justice to DB2, you need to change the way you think about accessing data.

    To accomplish this, all users of DB2 need at least an overview education of relational database theory and a moderate to extensive amount of training in SQL. Without such a commitment your programmers are sure to develop ugly and inefficient database access code – and who can blame them? Programmers are used to working with files so they are just doing what comes naturally to them.

    SQL is designed so that programmers specify what data is needed but they cannot specify how to retrieve it. SQL is coded without embedded data-navigational instructions. The DBMS analyzes SQL and formulates data-navigational instructions "behind the scenes.” This is foreign to the programmer who has never accessed data using SQL.

    Every SQL manipulation statement operates on a table and results in another table. All operations native to SQL, therefore, are performed at a set level. One retrieval statement can return multiple rows; one modification statement can modify multiple rows. This feature of relational databases is called relational closure.

    When accessing data, a programmer needs to think about what the end result should be and then code everything possible into the SQL. This means using the native features of SQL – joins and subselects and functions, etc. – instead of coding procedural host language code (whether in COBOL, C, Java or whatever) that, for example, opens up a cursor, fetches a row, and then uses a fetched value to open up another cursor. This is processing DB2 like a set of flat files... better to join the data!
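
    For example, instead of opening a cursor on the DEPT table and, for each department row fetched, opening a second cursor against EMP, simply join the two sample tables (a sketch):

        SELECT D.DEPTNO, D.DEPTNAME, E.EMPNO, E.LASTNAME
        FROM   DSN81010.DEPT D
               INNER JOIN DSN81010.EMP E
                  ON E.WORKDEPT = D.DEPTNO;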

    Educating programmers how to use SQL properly -- at a high level -- is probably the single most important thing you can do to optimize performance of your DB2 applications.

    Saturday, April 11, 2015

    SQL Performance Basics: Part 5, Avoid Sorting

    Sorting is a very costly operation that you should strive to avoid if at all possible. Indexes are very useful for this purpose. DB2 can choose to use an available index to satisfy the sorting requirements of an ORDER BY, GROUP BY, or DISTINCT clause. Therefore, it can be beneficial to create indexes that correspond to these clauses for frequently executed SQL statements. For example, consider this SQL statement:
          SELECT    LAST_NAME, TITLE, SALARY
          FROM      EMPLOYEE
          ORDER BY  LAST_NAME, TITLE;
     
    If an index on LAST_NAME and TITLE exists, DB2 can use it to avoid sorting. By retrieving the data in order from the index, sorting becomes unnecessary.

    You can use this information to tune your SQL further to avoid sorting. When the ordering requirement includes a column that is unique, sometimes removing the other columns from the ORDER BY clause can enable indexed access to avoid the sort. Consider this SQL statement:
          SELECT    DEPT_CODE, MANAGER, DEPT_NAME
          FROM      DEPARTMENT
          ORDER BY  MANAGER, DEPT_CODE;
     
    Say there is a unique index on MANAGER. DB2 will probably use this index to satisfy the request. However, a sort will be done after retrieving the data to put the results into DEPT_CODE order within MANAGER. But, because we know our data, we can change this situation. Because MANAGER is unique, we know that the following SQL statement is equivalent to the prior one:
          SELECT    DEPT_CODE, MANAGER, DEPT_NAME
          FROM      DEPARTMENT
          ORDER BY  MANAGER;
     
    In this case, DB2 can use the index on MANAGER to avoid the sort. The extra column, DEPT_CODE, is removed from the ORDER BY clause. But, because MANAGER is unique, there can be at most one DEPT_CODE per MANAGER, so the sort order will be equivalent. Because we knew our data, we removed a sort from the process and made our query more efficient!

    One final note on sorting: although most of you probably know this, it cannot be stated too strongly (or frequently) - always code an ORDER BY if the order of the results of your query is important. The ORDER BY clause is the only foolproof method of ensuring appropriate sort order of query results. Simply verifying that access is via an index is not sufficient to ensure the order of the results because:
    • The access path may change in the future to non-indexed access.
    • Even indexed access does not always return ordered results (e.g. list prefetch).

    UNION VERSUS UNION ALL

    Another simple, yet effective, means of enhancing SQL performance is to understand how UNION, EXCEPT, and INTERSECT work. Let's start with UNION because it has been around in DB2 the longest. UNION takes the results of multiple SELECT statements and combines them together. It does this, as part of the UNION operation, by sorting the results and removing any duplicates. UNION ALL, on the other hand, will not sort and will not remove duplicates.
    
    If you know that the results of the queries being unioned together are distinct (that is, no duplicates will be encountered), then you can use UNION ALL instead of UNION, thereby enhancing performance by avoiding the sort. Additionally, if you do not care whether duplicates are returned, always use UNION ALL. The same applies to the EXCEPT and INTERSECT operations, which were added as of DB2 9 for z/OS.
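
    For example, because the two SELECTs below use disjoint TITLE predicates (a sketch against the EMPLOYEE table from the earlier examples), no duplicates are possible and UNION ALL returns the same results as UNION while avoiding the sort:

          SELECT LAST_NAME, TITLE
          FROM   EMPLOYEE
          WHERE  TITLE = 'MANAGER'
          UNION ALL
          SELECT LAST_NAME, TITLE
          FROM   EMPLOYEE
          WHERE  TITLE = 'ANALYST';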