Thursday, October 01, 2015

Understanding DB2 SELECT Syntax

Most DB2 programmers think they know how to correctly code simple SQL SELECT statements. And they usually are correct, as long as you keep that adjective “simple” in the assertion. When the statement requires more than SELECT...FROM…WHERE though, problems can ensue.

One of the biggest SELECT type of problem encountered by DB2 users is related to syntax. To paraphrase Mark Twain, sometimes what people think they know, just ain’t so.

How can you find the proper syntax for SELECT statements? The DB2 SQL Reference manual contains all of the syntax for DB2 SQL, but query syntax is separated from the rest of the language. Typically, users go to Chapter 5 of the SQL Reference that contains syntax diagrams, semantic descriptions, rules, and examples of the use of DB2 SQL statements. SELECT INTO is there, but SELECT is not in this section. Well, actually, there is a placeholder page that refers the reader back to Chapter 4. Chapter 4 contains the detailed syntax information and usage details for DB2 queries using SELECT. 

Another potentially confusing aspect of DB2 SELECT is the breakdown of SELECT into three sections: fullselect, subselect, and select-statement. This causes many developers to confuse which query options are available to the SELECT statements they want to code.

Let's take a look at each of these three sections:

First up is the select-statement, which is the form of a query that can be directly specified in a DECLARE CURSOR statement, or prepared and then referenced in a DECLARE CURSOR statement. It is the thing most people think of when they think of SELECT in all its glory. If so desired, it can be issued interactively using SPUFI. The select-statement consists of a fullselect, and any of the following optional clauses: WITH CTE, update, read-only, optimize-for, isolation, queryno and SKIP LOCKED DATA.

A CTE, or common table expression, defines a result table with a table-identifier that can be referenced in any FROM clause of the fullselect that follows. Multiple CTEs can be specified following a single WITH keyword. Each specified CTE can also be referenced by name in the FROM clause of subsequent common table expressions.

The next component is a fullselect, which can be part of a select-statement, a CREATE VIEW statement, a materialized query table, a temporary table or an INSERT statement. Basically, a fullselect specifies a result table. A fullselect consists of at least a subselect, possibly connected to another subselect via UNION, EXCEPT or INTERSECT. And ever since DB2 version 9 for z/OS, you can apply either or both ORDER BY and FETCH FIRST clauses. Prior to V9, this sometimes confused folks as they tried to put a FETCH FIRST n ROWS clause or an ORDER BY in a view or as part of an INSERT. That was not allowed! But it is now.

However, a fullselect does not allow any of the following clauses: FOR FETCH ONLY, FOR UPDATE OF, OPTIMIZE FOR, WITH, QUERYNO and SKIP LOCKED DATA. A fullselect specifies a result table – and none of these afore-mentioned clauses apply.

This sometimes confuses folks. I recently had a conversation with a guy who swore that at one point he created a view using the WITH UR clause and that it worked. It didn’t when we spoke and I’m sure it never did.

Finally, a subselect is a component of the fullselect. A subselect specifies a result table derived from the result of its first FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next.

I know, this can all seem to be a bit confusing. But think of it this way: in a subselect you specify the FROM to get the tables, the WHERE to get the conditions, GROUP BY to get aggregation, HAVING to get the conditions on the aggregated data, and the SELECT clause to get the actual columns. In a fullselect you add in the UNION to combine subselects and other fullselects. Finally, you add on any optional clauses (as specified earlier) to get the select-statement.

Now what could be any easier?

Actually, it is not super easy. And if you add in some of the newer SQL capabilities, like OLAP functions or temporal time travel query clauses, it gets even more complicated.

I guess the bottom line is that you really should make sure you have the SQL Reference handy (see link above) if you are doing anything other than simple selecting… because you’ll probably need it.

No comments: