Thursday, October 01, 2015

Understanding DB2 SELECT Syntax

Most DB2 programmers think they know how to correctly code simple SQL SELECT statements. And they usually are correct, as long as you keep that adjective “simple” in the assertion. When the statement requires more than SELECT...FROM…WHERE though, problems can ensue.

One of the biggest SELECT type of problem encountered by DB2 users is related to syntax. To paraphrase Mark Twain, sometimes what people think they know, just ain’t so.

How can you find the proper syntax for SELECT statements? The DB2 SQL Reference manual contains all of the syntax for DB2 SQL, but query syntax is separated from the rest of the language. Typically, users go to Chapter 5 of the SQL Reference that contains syntax diagrams, semantic descriptions, rules, and examples of the use of DB2 SQL statements. SELECT INTO is there, but SELECT is not in this section. Well, actually, there is a placeholder page that refers the reader back to Chapter 4. Chapter 4 contains the detailed syntax information and usage details for DB2 queries using SELECT. 


Another potentially confusing aspect of DB2 SELECT is the breakdown of SELECT into three sections: fullselect, subselect, and select-statement. This causes many developers to confuse which query options are available to the SELECT statements they want to code.

Let's take a look at each of these three sections:

First up is the select-statement, which is the form of a query that can be directly specified in a DECLARE CURSOR statement, or prepared and then referenced in a DECLARE CURSOR statement. It is the thing most people think of when they think of SELECT in all its glory. If so desired, it can be issued interactively using SPUFI. The select-statement consists of a fullselect, and any of the following optional clauses: WITH CTE, update, read-only, optimize-for, isolation, queryno and SKIP LOCKED DATA.

A CTE, or common table expression, defines a result table with a table-identifier that can be referenced in any FROM clause of the fullselect that follows. Multiple CTEs can be specified following a single WITH keyword. Each specified CTE can also be referenced by name in the FROM clause of subsequent common table expressions.

The next component is a fullselect, which can be part of a select-statement, a CREATE VIEW statement, a materialized query table, a temporary table or an INSERT statement. Basically, a fullselect specifies a result table. A fullselect consists of at least a subselect, possibly connected to another subselect via UNION, EXCEPT or INTERSECT. And ever since DB2 version 9 for z/OS, you can apply either or both ORDER BY and FETCH FIRST clauses. Prior to V9, this sometimes confused folks as they tried to put a FETCH FIRST n ROWS clause or an ORDER BY in a view or as part of an INSERT. That was not allowed! But it is now.

However, a fullselect does not allow any of the following clauses: FOR FETCH ONLY, FOR UPDATE OF, OPTIMIZE FOR, WITH, QUERYNO and SKIP LOCKED DATA. A fullselect specifies a result table – and none of these afore-mentioned clauses apply.

This sometimes confuses folks. I recently had a conversation with a guy who swore that at one point he created a view using the WITH UR clause and that it worked. It didn’t when we spoke and I’m sure it never did.

Finally, a subselect is a component of the fullselect. A subselect specifies a result table derived from the result of its first FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next.

I know, this can all seem to be a bit confusing. But think of it this way: in a subselect you specify the FROM to get the tables, the WHERE to get the conditions, GROUP BY to get aggregation, HAVING to get the conditions on the aggregated data, and the SELECT clause to get the actual columns. In a fullselect you add in the UNION to combine subselects and other fullselects. Finally, you add on any optional clauses (as specified earlier) to get the select-statement.

Now what could be any easier?

Actually, it is not super easy. And if you add in some of the newer SQL capabilities, like OLAP functions or temporal time travel query clauses, it gets even more complicated.


I guess the bottom line is that you really should make sure you have the SQL Reference handy (see link above) if you are doing anything other than simple selecting… because you’ll probably need it.

Monday, September 14, 2015

Are You Heading to Las Vegas for IBM Insight?

The IBM Insight conference is coming up and if you work with DB2, Big Data, analytics, data warehousing, or really, anything at all about enterprise data, then Las Vegas is the place to be the week of October 25 thru 29, 2015.

But what is IBM Insight? Well, you may remember it as the IBM Information on Demand conference, or IOD for short, Yes, IBM has renamed the conference yet again. I'm sure a lot of you can remember when there were a bunch of different "Technical Conferences" like the IMS Tech Conference or the DB2 Tech Conference. Those conferences, as well as several others, all got rolled up into IOD... which is now IBM Insight.


So with that bit of potential confusion out of the way, why should you attend this year's IBM Insight conference? Simply put, there is something of interest for everybody in a data-related profession. 

The conference boasts more than 1600 presentations that run the gamut from DB2 to IMS to Cognos to BI to Big Data to analytics to... well, you get the idea. You can experiences technical training, hands-on labs, and industry use cases delivered by over 1000 industry experts.

There will be multiple keynote sessions, focusing in on IBM's important data initiatives including:
  • Data Management
  • Advanced Analytics
  • Cloud
  • Hadoop & Spark
  • Content Management
  • Watson
Last year's event was attended by over 13,000 folks, which gives attendees a great opportunity to network with your peers and IBMers. Add to that the over 350 exhibitors at the Expo hall and you will be able to view, review, and examine all kinds of interesting software to help you manage your enterprise data.

I also want to promote my presentation at this year's conference, called Not Your Daddy's DB2! I'll talk about the changing landscape of the industry and how DB2 for z/OS has changed (and continues to change) to embrace modern IT. This session will be held on Wednesday, October 28, at 4:00 pm in the South Seas J room). If you're going to the conference, I hope to see you at my presentation.

And, as always, there will be plenty of time to kick back and relax after a long day of networking and learning. On Wednesday evening conference attendees will be treated to a free concert from Maroon 5... and, of course, everything that Las Vegas has to offer can be on your agenda, too.

So what are you waiting for? Register for the 2015 IBM Insight conference today... 

And if you register by September 18th you can get a discounted rate ($300 off).

Tuesday, September 08, 2015

Mainframe Cost Optimization



Modern businesses live in an age of financial austerity, cost containment and cutbacks. It is just a fact of life in many organizations that you need to be constantly vigilant for new ways to reduce costs. If your business relies on the mainframe -- and many of the biggest businesses do -- cost containment is of the utmost importance.
But how to cut costs? Many mainframe support groups are running thin in terms of people - so layoffs don't make a lot of sense. And the software that runs the business can't be cut. Management software supports the business systems, so cutting those may cost more than you save!

But there are things you can do. If you're interested in learning more about IBM MLC software costs, pricing/licensing, mainframe cost optimization and a solution to dynamically manage your system to reduce software costs be sure to take an hour out of your busy schedule and join me for a free webinar titled Mainframe Cost Optimization: Pricing, Licensing, the R4HA, and More! 

I'll be delivering this webinar on September 10, 2015 at 2pm EDT.

During this session I'll discuss:
  • The new mainframe pricing options including zCAP, CMP and MWP
  • The disparate moving parts of sub-capacity pricing including the R4HA
  • Methods for controlling R4HA intelligently to reduce monthly software costs.

So click here to register for the webinar and join me on September 10th. 

Friday, September 04, 2015

Influencing the DB2 Optimizer: Part 7 - Miscellaneous Additional Considerations

In this 7th, and final installment of this series on influencing the DB2 optimizer's access path choices, we will take a look at a couple of additional things to consider as you work toward improving your SQL performance.

Favor Optimization Hints Over Updating the DB2 Catalog  

Optimization hints to influence access paths are less intrusive and easier to implement than changing data in the DB2 Catalog. However, that does not mean that you should use optimization hints all the time! Do not use optimization hints as a crutch to arrive at a specific access path. Optimization hints are best used when an access path changes and you want to go back to a previous, efficient access path.

Limit Ordering to Avoid Scanning  

The optimizer is more likely to choose an index scan when ordering is important (ORDER BY, GROUP BY, or DISTINCT) and the index is clustered by the columns to be sorted.

Maximize Buffers and Minimize Data Access  

If the inner table fits in 2% of the buffer pool, nested loop join should be favored. Therefore, to increase the chances of nested loop joins, increase the size of the buffer pool (or decrease the size of the inner table, if ­possible).

Consider Deleting Non-uniform Distribution Statistics

Sometimes non-uniform distribution statistics can cause dynamic SQL statements to fluctuate dramatically in terms of how they perform. To decrease these wild fluctuations, consider removing the non-uniform distribution statistics from the DB2 Catalog.

Although dynamic SQL makes the best use of these statistics, the overall performance of some applications that heavily use dynamic SQL can suffer. The optimizer might choose a different access path for the same dynamic SQL statement, depending on the values supplied to the predicates. In theory, this should be the desired goal. In practice, however, the results might be unexpected. For example, consider the following dynamic SQL statement:

SELECT   EMPNO, LASTNAME
FROM     DSN81010.EMP
WHERE    WORKDEPT = ?

The access path might change depending on the value of WORKDEPT because the optimizer calculates different filter factors for each value, based on the distribution statistics. As the number of occurrences of distribution statistics increases, the filter factor decreases. This makes DB2 think that fewer rows will be returned, which increases the chance that an index will be used and affects the choice of inner and outer tables for joins.

These statistics are stored in the SYSIBM.SYSCOLDIST and SYSIBM.SYSCOLDISTSTATS tables and can be removed using SQL DELETE statements.

This suggested guideline does not mean that you should always delete the non-uniform distribution statistics. My advice is quite to the contrary. When using dynamic SQL, allow DB2 the chance to use these statistics. Delete these statistics only when performance is unacceptable. (They can always be repopulated later using RUNSTATS.)

Collect More Than Just the Top Ten Non-uniform Distribution Statistics   

If non-uniform distribution impacts more than just the top ten most frequently occurring values, you should use the FREQVAL option of RUNSTATS to capture more than 10 values. Capture only as many as will prove to be useful for optimizing queries against the non-uniformly distributed data.

DB2 Referential Integrity Use

Referential integrity (RI) is the implementation of constraints between tables so that values from one table (the parent) control the values in another (the dependent, or child). A referential constraint between a parent table and a dependent table is defined by a relationship between the columns of the tables. The parent table’s primary key columns control the values permissible in the dependent table’s foreign key columns. For example, in the sample table, DSN8810.EMP, the WORKDEPT column (the foreign key) must reference a valid department as defined in the DSN8810.DEPT table’s DEPTNO column (the primary key).

You have two options for implementing RI at your disposal: declarative and application. Declarative constraints provide DB2-enforced referential integrity and are specified by DDL options. All modifications, whether embedded in an application program or ad hoc, must comply with the referential constraints. Favor using declarative RI as DB2 will then be aware of the relationship and can use that information during access path optimization.

Application-enforced referential integrity is coded into application programs. Every program that can update referentially-constrained tables must contain logic to enforce the referential integrity. This type of RI is not applicable to ad hoc updates.

With DB2-enforced RI, CPU use is reduced because the Data Manager component of DB2 performs DB2-enforced RI checking, whereas the RDS component of DB2 performs ­application-enforced RI checking. Additionally, rows accessed for RI checking when using application-enforced RI must be passed back to the application from DB2. DB2-enforced RI does not require this passing of data, further reducing CPU time.

In addition, DB2-enforced RI uses an index (if one is available) when enforcing the referential constraint. In application-enforced RI, index use is based on the SQL used by each program to enforce the constraint.

If you must use application RI instead of declarative RI, be sure to also define referential constraints with the NOT ENFORCED keyword. In that case, the constraints will not be enforced by DB2, but will be documented in the DDL. And it gives DB2 additional information that can be used by the Optimizer for query optimization.

Summary

Hopefully this 7-part series on influencing DB2 access paths provided you with a nice overview of the options available to you and considerations for their use. If you are interested in learning more about SQL tuning and DB2 performance, consider purchasing the book from which this series was drawn: DB2 Developer's Guide 6th edition.

Happy SQL performance tuning!

Tuesday, August 25, 2015

Influencing the DB2 Optimizer: Part 6 - Using Optimization Hints

So far we have discussed several standard methods of influencing DB2’s access path selection -- in the first 4 parts of this series (1, 2, 3, 4) -- as well as updating DB2 Catalog statistics in part 5. But there is another tool in your DBA arsenal that you can use to force/influence access path selection - optimization hints, which will be the focus of today's post.


Using Optimization Hints to Force an Access Path

Optimization hints can be coded to influence the DB2 Optimizer's choice of access path. Actually, though, this method does not “influence” the access path; instead it directs DB2 to use a specific access path instead of determining a new access path using statistics.

The same basic cautions that apply to modifying DB2 Catalog statistics also apply to optimization hints. Only experienced analysts and DBAs should attempt to use optimization hints. However, optimization hints are much easier to apply than updating DB2 Catalog statistics.

There are several methods of implementing optimization hints. First, you can code optimization hints using the PLAN_TABLE. However, before you can use optimization hints, the DB2 DSNZPARM parameter for optimization hints (OPTHINTS) must be set to YES. If it is set to NO, you cannot use optimization hints.



There are two ways to specify an optimization hint to the PLAN_TABLE:
  • Modify PLAN_TABLE data to use an access path that was previously created by the DB2 optimizer, or;
  • INSERT rows to the PLAN_TABLE to create a new access path independently.


In general, favor the first method over the second . It is a difficult task to create from scratch an accurate access path in the PLAN_TABLE. If you do not get every nuance of the access path correct, it is possible that DB2 will ignore the optimization hint and calculate an access path at bind time. However, if you use an access path that was originally created by DB2, you can be reasonably sure that the access path will be valid. Of course, sometimes an access path created for an older version of DB2 will not be valid in a newer version of DB2. 

You should consider using optimization hints for the same reasons you would choose to modify DB2 Catalog statistics or tweak SQL. The general reason is to bypass the access path chosen by DB2 and use a different, hopefully more efficient, access path.

In addition to this reason, optimization hints are very useful as you migrate from release to release of DB2. Sometimes, a new release or version of DB2 can cause different access paths to be chosen for queries that were running fine. Or perhaps new statistics were accumulated between binds causing access paths to change. By saving old access paths in a PLAN_TABLE, you can use optimization hints to direct DB2 to use the old access paths instead of the new, and perhaps undesirable, access paths due to the new release or statistics.Of course, ever since DB2 9 for z/OS, you can use the plan management feature to save old access paths across rebinds. This is a more effective approach than saving them yourself.

Always test and analyze the results of any query that uses optimization hints to be sure that the desired performance is being achieved.

Defining an Optimization Hint  

To specify that an optimization hint is to be used, you will have to ensure that the PLAN_TABLE has the appropriate columns in it: 

OPTHINT              VARCHAR(128)  NOT NULL WITH DEFAULT
HINT_USED            VARCHAR(128)  NOT NULL WITH DEFAULT
PRIMARY_ACCESSTYPE   CHAR(1)       NOT NULL WITH DEFAULT

To set an optimization hint, you need to first identify (or create) the PLAN_TABLE rows that refer to the desired access path. You will then need to update those rows in the PLAN_TABLE, specifying an identifier for the hint in the OPTHINT column. For example,

UPDATE PLAN_TABLE
   SET OPTHINT = ‘SQLHINT’
WHERE  PLANNO = 50
AND    APPLNAME = ‘PLANNAME’;

Of course, this is just an example. You may need to use other predicates to specifically identify the PLAN_TABLE rows to include in the optimization hint. Some columns that might be useful, depending on your usage of dynamic SQL and packages, include QUERYNO, PROGNAME, VERSION, and COLLID.

Keep in mind, though, that when you change a program that uses static SQL statements, the statement number might change, causing rows in the PLAN_TABLE to be out of sync with the modified application. For this reason, you should probably choose the newer method of specifying optimization hints, which I will discuss in a moment.

You can use the QUERYNO clause in SQL statements to ease correlation of SQL statements in your program with your optimization hints. Statements that use the QUERYNO clause are not dependent on the statement number. To use QUERYNO, you will need to modify the SQL in your application to specify a QUERYNO, as shown in the following:

SELECT MGRNO
FROM   DSN81010.DEPT
WHERE  DEPTNO = ‘A00’
QUERYNO 200;

You can then UPDATE the PLAN_TABLE more easily using QUERYNO and be sure that the optimization hint will take effect, as shown in the following:

UPDATE PLAN_TABLE
   SET OPTHINT = ‘SQLHINT’
WHERE  QUERYNO = 200 
AND    APPLNAME = ‘PLANNAME’;

When the PLAN_TABLE is correctly updated (as well as possibly the application), you must REBIND the plan or package to determine if the hint is being used by DB2. When rebinding you must specify the OPTHINT parameter:

REBIND PLAN PLANNAME . . . OPTHINT(SQLHINT)

Be aware that the optimization hints may not actually be used by DB2. For optimization hints to be used, the hint must be correctly specified, the REBIND must be accurately performed, and the environment must not have changed. For example, DB2 will not use an access path specified using an optimization hint if it relies on an index that has since been dropped.


Use EXPLAIN(YES) to verify whether the hint was actually used. If the hint was used, the HINT_USED column for the new access path will contain the name of the optimization hint (such as SQLHINT in the previous example).


Optimization Hints and DB2 10 for z/OS

As of DB2 Version 10, applying optimization hints is much easier than it was in prior versions of DB2. You can build the access path repository using the BIND QUERY command. The access path repository contains system-level access path hints and optimization options.

These statement-level optimization hints, also known as instance-based statement hints, are enforced across the entire DB2 subsystem based upon the actual text of the SQL statement.

The access path repository contains information about queries including the text of the text, access paths, and optimization options. Using the access path repository, you can save multiple copies of access paths and switch back and forth between different copies of access paths for the same query. The access path repository resides in the DB2 Catalog and is composed of the following tables:

  • The primary table in the access path repository is SYSIBM.SYSQUERY. It contains one row for each static or dynamic SQL query.
  • SYSIBM.SYSQUERYPLAN holds access path details for queries in SYSQUERY. A query can have more than one access path for it.
  • SYSIBM.SYSQUERYOPTS stores miscellaneous information about each query.



The access path repository tables are populated when you issue the BIND QUERY command. Before running BIND QUERY, though, you must first populate the DSN_USERQUERY_TABLE with the query text you want to bind. Table 1 below depicts the columns of the DSN_USERQUERY_TABLE.

Table 1. DSN_USERQUERY_TABLE Columns
Column
Description
QUERYNO
An integer value that uniquely identifies the query. It can be used to correlate data in this table with the PLAN_TABLE.

SCHEMA
The default schema name to be used for unqualified database objects used in the query (or blank).

HINT_SCOPE
The scope of the access plan hint:
System-level access plan hint
Package-level access plan hint

QUERY_TEXT
The actual text of the SQL statement.

USERFILTER
A filter name that can be used to group a set of queries together (or blank).

OTHER_OPTIONS
IBM use only (or blank).

COLLECTION
The collection name of the package (option for package-level access plan hints).

PACKAGE
The name of the package (option for package-level access plan hints).

VERSION
The version of the package (option for package-level access plan hints); if '*' is specified, DB2 uses only COLLECTION and PACKAGE values to look up rows in the SYSIBM.SYSPACKAGE and SYSIBM.SYSQUERY catalog tables.

REOPT
The value of the REOPT BIND parameter:
A: REOPT(AUTO)
1: REOPT(ONCE)
N: REOPT(NONE)
Y: REOPT(ALWAYS)
blank: Not specified

STARJOIN
Contains an indicator specifying whether star join processing was enabled for the query:
Y: Yes, star join enabled
N: No, star join disabled
blank: Not specified.

MAX_PAR__DEGREE
The maximum degree of parallelism (or -1 if not specified).

DEF_PAR_DEGREE
Indicates whether parallelism was enabled:
ONE: Parallelism disabled
ANY: Parallelism enabled
Blank: Not specified

SJTABLES
Contains the minimum number of tables to qualify for star join (or [nd]1 if not specified).

QUERYID
Identifies access plan hint information in the SYSIBM.SYSQUERY and SYSIBM.SYSQUERYPLAN tables.

OTHER_PARMS
IBM use only (or blank).


You can populate the DSN_USERQUERY_TABLE using simple INSERT statements. Consider using a subselect from the DB2 Catalog to assist, such as in the following example: 

INSERT INTO DSN_USERQUERY_TABLE          
  (QUERYNO, SCHEMA, HINT_SCOPE, QUERY_TEXT, USERFILTER,
   OTHER_OPTIONS, COLLECTION, PACKAGE, VERSION, REOPT, 
   STARJOIN, MAX_PAR_DEGREE, DEF_CURR_DEGREE, SJTABLES, 
   OTHER_PARMS)                                        
SELECT SPS.STMTNO, 'SCHEMANAME', 1, SPS.STATEMENT, '', 
       '', SPS.COLLID, SPS.NAME, SPS.VERSION, '', 
       '', -1, '', -1, ''
FROM   SYSIBM.SYSPACKSTMT SPS
WHERE  SPS.COLLID = 'COLLID1'
AND    SPS.NAME = 'PKG1'
AND    SPS.VERSION = 'VERSION1'
AND    SPS.STMTNO = 177;

This particular INSERT statement retrieves data from SYSIBM.SYSPACKSTMT for statement number 177 of VERSION1 of the PKG1 package in collection COLLID1.

You still must populate the PLAN_TABLE with hints. Then you need to run the BIND QUERY command to build the data in the access path repository. Running BIND QUERY  LOOKUP(NO) reads the statement text, default schema, and bind options from DSN_USERQUERY_TABLE, as well as the system-level access path hint details from correlated PLAN_TABLE rows, and inserts the data into the access path repository tables.

After you have populated the access path repository, it is a good idea to delete the statement from the DSN_USERQUERY_TABLE. Doing so ensures that hints are not replaced when you issue subsequent BIND QUERY commands. 

When a statement-level hint is established, DB2 tries to enforce it for that statement. Hints for static SQL statements are validated and applied when you REBIND the package that contains the statements. Hints for dynamic SQL statements are validated and enforced when the statements are prepared.

To remove hints from the access path repository, use the FREE QUERY command.

Runtime Options Hints

You can also use BIND QUERY to set run-time options for a given SQL statement text. This allows you to control how a particular statement should behave without trying to force its actual access path.
To specify a run-time options hint, INSERT a row into DSN_USERQUERY_TABLE indicating the QUERYNO as you would for an optimization hint. Before running BIND QUERY, however, make sure that the PLAN_TABLE does not contain a corresponding access path for the query.

If the PLAN_TABLE contains an access path when you run BIND QUERY LOOKUP(NO), the SYSIBM.SYSQUERYOPTS table won't be populated. If you err and issue the BIND QUERY when data exists in the PLAN_TABLE, simply DELETE the pertinent PLAN_TABLE data and rerun the BIND QUERY command.

The REOPT, STARJOIN, MAX_PAR_DEGREE, DEF_CURR_DEGREE, SJTABLES, and GROUP_MEMBER columns can be used to set the run-time options using DSN_USERQUERY_TABLE.

Summary

Optimization hints provide a strong option for particularly vexing DB2 tuning problems. Consider investigating their usage for troublesome queries that once ran efficiently (and you still have the PLAN_TABLE data) but for whatever reasons, are no longer being optimized effectively.