New OLAP Capabilities [DB2 9 for z/OS]
DB2 9 for z/OS offers several new SQL improvements focused on improving OLAP functionality. The OLAP functions provide the ability to return ranking, row numbering, and existing aggregate function information as a scalar value in the result of a query. You can include OLAP specifications in an expression, in a select-list, or in the ORDER BY clause of a select-statement.
The result to which the OLAP specification is applied is the result table of the innermost subselect that includes the OLAP specification. OK, at this point you might well be asking, “So, what are these new OLAP things and what can they do for me?” Let’s take them one at a time.
First up, we have the RANK and DENSE_RANK functions. These functions specify that the ordinal rank of a row within the specified window is computed. Rows that are not distinct with respect to the ordering within the specified window are assigned the same rank. Here is a quick example:
SALARY+BONUS+COMM AS TOTAL_COMP,
RANK() OVER(ORDER BY SALARY+BONUS+COMM DESC) AS RANK_COMP
WHERE SALARY+BONUS+COMM > 30000
ORDER BY LASTNAME;
This query will rank employees who have total compensation greater than $30,000, but order the results by last name. This allows you to rank data differently than the order in which it is presented.
You can define the results of ranking with gaps in the sequential rank numbering by using the RANK specification, or without gaps, by using the DENSE_RANK specification.
The difference between the two can be a bit difficult to decipher at first. Think of it this way: RANK specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row. So, if two or more rows are not distinct you will get gaps in the ranking. With DENSE_RANK the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering. In this case there will be no gaps in the sequential rank numbering. Consider the following data:
EMPNO LASTNAME FIRSTNAME SALARY BONUS COMM
----- -------- --------- ------ ------ ------
100 MULLINS CRAIG 500000 100000 400000
200 SHAW DONNA 25000 10000 0
300 ABBOTT RANDY 700000 300000 0
400 WISER BUD 10000 0 0
500 GREEN RACHEL 40000 2000 5000
The results of the previous query run against this data would look like this:
300 ABBOTT RANDY 1000000 1
500 GREEN RACHEL 47000 3
100 MULLINS CRAIG 1000000 1
200 SHAW DONNA 35000 4
400 WISER BUD 10000 5
Note that both ABBOTT and MULLINS earn the most, but the amount is the same, so they share the number one ranking. As this is not a dense rank, the next rank value is 3, and not 2.
The next OLAP specification introduced by DB2 9 for z/OS is ROW_NUMBER. ROW_NUMBER specifies that a sequential row number is computed for the row that is defined by the ordering, starting with 1 for the first row. If the ORDER BY clause is not specified in the window, the row numbers are assigned to the rows in an arbitrary order, as the rows are returned. This satisfies an often-requested capability to simply assign a number to the result rows of a query. Row numbers also enable easy formulation of queries for computing histogram statistics and they enable formation of other OLAP specifications (for example, moving sums, moving averages, and so on).
Here is a sample query using ROW_NUMBER:
ROW_NUMBER() OVER(ORDER BY WORKDEPT, LASTNAME) AS NUMBER,
ORDER BY WORKDEPT, LASTNAME;
The result of a RANK, DENSE_RANK, and ROW_NUMBER specification is BIGINT, and the result cannot be null.
One more thing to consider for the new OLAP features is the ability to partition results for the OLAP specification. This is specified using the PARTITION BY clause with a partitioning-expression, which is an expression used to define the partitioning of the result table. Each column name that is referenced in a partitioning-expression must unambiguously reference a column of the result table of the subselect that contains the OLAP specification. A partitioning-expression cannot include a scalar-fullselect or any function that is not deterministic or has an external action.
Here is an SQL example using partitioning:
DENSE RANK() OVER
(PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL
ORDER BY WORKDEPT, LASTNAME;
This SQL ranks departments according to their education level. And because DENSE_RANK is specified multiple employees with the same rank in the department will not increase the next ranking.
Sometimes RANK, DENSE_RANK, and ROW_NUMBER are called window functions. An OLAP specification is not valid in a WHERE, VALUES, GROUP BY, HAVING, or SET clause. An OLAP specification cannot be used as an argument of an aggregate function. When invoking an OLAP specification, a window is specified that defines the rows and order of the rows over which the function is applied.