Most
DB2 programmers think they know how to correctly code simple SQL SELECT
statements. And they usually are correct, as long as you keep that adjective
“simple” in the assertion. When the statement requires more than
SELECT...FROM…WHERE though, problems can ensue.
One
of the biggest SELECT type of problem encountered by DB2 users is related to
syntax. To paraphrase Mark Twain, sometimes what people think they know, just
ain’t so.
How
can you find the proper syntax for SELECT statements? The DB2 SQL Reference
manual contains all of the syntax for DB2 SQL, but query syntax is separated
from the rest of the language. Typically, users go to Chapter 5 of the SQL
Reference that contains syntax diagrams, semantic descriptions, rules, and
examples of the use of DB2 SQL statements. SELECT INTO is there, but SELECT is
not in this section. Well, actually, there is a placeholder page that refers
the reader back to Chapter 4. Chapter 4 contains the detailed syntax
information and usage details for DB2 queries using SELECT.
Another potentially confusing aspect of DB2 SELECT is the breakdown of SELECT into three
sections: fullselect, subselect, and select-statement. This causes many
developers to confuse which query options are available to the SELECT
statements they want to code.
Let's take a look at each of these three sections:
First up is the select-statement, which is
the form of a query that can be directly specified in a DECLARE CURSOR
statement, or prepared and then referenced in a DECLARE CURSOR statement. It is
the thing most people think of when they think of SELECT in all its glory. If
so desired, it can be issued interactively using SPUFI. The select-statement
consists of a fullselect, and any of the following optional clauses: WITH CTE, update,
read-only, optimize-for, isolation, queryno and SKIP LOCKED DATA.
A CTE, or common table expression, defines a result table with a table-identifier that can be referenced in any FROM clause of the fullselect that follows. Multiple CTEs can be specified following a single WITH keyword. Each specified CTE can also be referenced by name in the FROM clause of subsequent common table expressions.
The
next component is a fullselect,
which can be part of a select-statement, a CREATE VIEW statement,
a materialized query table, a temporary table or an INSERT statement. Basically,
a fullselect specifies a result table. A fullselect consists of at least a
subselect, possibly connected to another subselect via UNION, EXCEPT or
INTERSECT. And ever since DB2 version 9 for z/OS, you can apply either or both ORDER
BY and FETCH FIRST clauses. Prior to V9, this sometimes confused folks as they
tried to put a FETCH FIRST n ROWS clause or an ORDER BY in a view or as part of
an INSERT. That was not allowed! But it is now.
However,
a fullselect does not allow any of the following
clauses: FOR FETCH ONLY, FOR UPDATE OF, OPTIMIZE FOR, WITH, QUERYNO and SKIP
LOCKED DATA. A fullselect specifies a result table – and none of these
afore-mentioned clauses apply.
This
sometimes confuses folks. I recently had a conversation with a guy who swore
that at one point he created a view using the WITH UR clause and that it
worked. It didn’t when we spoke and I’m sure it never did.
Finally,
a subselect is a component of the fullselect. A subselect
specifies a result table derived from the result of its first FROM clause. The
derivation can be described as a sequence of operations in which the result of
each operation is input for the next.
I know, this can all seem to be a bit confusing. But think of it this way: in a subselect you specify the
FROM to get the tables, the WHERE to get the conditions, GROUP BY to get
aggregation, HAVING to get the conditions on the aggregated data, and the
SELECT clause to get the actual columns. In a fullselect you add in the UNION
to combine subselects and other fullselects. Finally, you add on any optional clauses
(as specified earlier) to get the select-statement.
Now
what could be any easier?
Actually,
it is not super easy. And if you add in some of the newer SQL capabilities,
like OLAP functions or temporal time travel query clauses, it gets even more
complicated.
I
guess the bottom line is that you really should make sure you have the SQL
Reference handy (see link above) if you are doing anything other than simple selecting… because
you’ll probably need it.