Monday, May 21, 2007

New OLAP Capabilities [DB2 9 for z/OS]

DB2 9 for z/OS offers several new SQL improvements focused on improving OLAP functionality. The OLAP functions provide the ability to return ranking, row numbering, and existing aggregate function information as a scalar value in the result of a query. You can include OLAP specifications in an expression, in a select-list, or in the ORDER BY clause of a select-statement.


The result to which the OLAP specification is applied is the result table of the innermost subselect that includes the OLAP specification. OK, at this point you might well be asking, “So, what are these new OLAP things and what can they do for me?” Let’s take them one at a time.


First up, we have the RANK and DENSE_RANK functions. These functions specify that the ordinal rank of a row within the specified window is computed. Rows that are not distinct with respect to the ordering within the specified window are assigned the same rank. Here is a quick example:


SELECT
EMPNO,
LASTNAME,
FIRSTNAME,
SALARY+BONUS+COMM AS TOTAL_COMP,
RANK() OVER(ORDER BY SALARY+BONUS+COMM DESC) AS RANK_COMP
FROM EMP
WHERE SALARY+BONUS+COMM > 30000
ORDER BY LASTNAME;


This query will rank employees who have total compensation greater than $30,000, but order the results by last name. This allows you to rank data differently than the order in which it is presented.

You can define the results of ranking with gaps in the sequential rank numbering by using the RANK specification, or without gaps, by using the DENSE_RANK specification.

The difference between the two can be a bit difficult to decipher at first. Think of it this way: RANK specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row. So, if two or more rows are not distinct you will get gaps in the ranking. With DENSE_RANK the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering. In this case there will be no gaps in the sequential rank numbering. Consider the following data:


EMPNO LASTNAME FIRSTNAME SALARY BONUS COMM
----- -------- --------- ------ ------ ------
100 MULLINS CRAIG 500000 100000 400000
200 SHAW DONNA 25000 10000 0
300 ABBOTT RANDY 700000 300000 0
400 WISER BUD 10000 0 0
500 GREEN RACHEL 40000 2000 5000



The results of the previous query run against this data would look like this:


300 ABBOTT RANDY 1000000 1
500 GREEN RACHEL 47000 3
100 MULLINS CRAIG 1000000 1
200 SHAW DONNA 35000 4
400 WISER BUD 10000 5


Note that both ABBOTT and MULLINS earn the most, but the amount is the same, so they share the number one ranking. As this is not a dense rank, the next rank value is 3, and not 2.


The next OLAP specification introduced by DB2 9 for z/OS is ROW_NUMBER. ROW_NUMBER specifies that a sequential row number is computed for the row that is defined by the ordering, starting with 1 for the first row. If the ORDER BY clause is not specified in the window, the row numbers are assigned to the rows in an arbitrary order, as the rows are returned. This satisfies an often-requested capability to simply assign a number to the result rows of a query. Row numbers also enable easy formulation of queries for computing histogram statistics and they enable formation of other OLAP specifications (for example, moving sums, moving averages, and so on).


Here is a sample query using ROW_NUMBER:



SELECT
ROW_NUMBER() OVER(ORDER BY WORKDEPT, LASTNAME) AS NUMBER,
LASTNAME,
SALARY
FROM EMP
ORDER BY WORKDEPT, LASTNAME;


The result of a RANK, DENSE_RANK, and ROW_NUMBER specification is BIGINT, and the result cannot be null.


One more thing to consider for the new OLAP features is the ability to partition results for the OLAP specification. This is specified using the PARTITION BY clause with a partitioning-expression, which is an expression used to define the partitioning of the result table. Each column name that is referenced in a partitioning-expression must unambiguously reference a column of the result table of the subselect that contains the OLAP specification. A partitioning-expression cannot include a scalar-fullselect or any function that is not deterministic or has an external action.


Here is an SQL example using partitioning:



SELECT
WORKDEPT,
EMPNO,
LASTNAME,
FIRSTNAME,
EDLEVEL,
DENSE RANK() OVER
(PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL
FROM EMP
ORDER BY WORKDEPT, LASTNAME;


This SQL ranks departments according to their education level. And because DENSE_RANK is specified multiple employees with the same rank in the department will not increase the next ranking.


Sometimes RANK, DENSE_RANK, and ROW_NUMBER are called window functions. An OLAP specification is not valid in a WHERE, VALUES, GROUP BY, HAVING, or SET clause. An OLAP specification cannot be used as an argument of an aggregate function. When invoking an OLAP specification, a window is specified that defines the rows and order of the rows over which the function is applied.

5 comments:

Manya said...

Hi Craig,

I am working on OZ390 and getting the Error: SQL0104N An unexpected token "(" was found following "". Expected tokens may include: ", FROM INTO". SQLSTATE=42601 while using the RANK funtion in the query. Still this function cant be used in OZ390?
Pls provide your inputs, it is being quite urgent.

Query is:


SELECT deptid, date, managerid RANK() OVER(PARTITION BY deptid order by date DESC) AS RANK
FROM dept_table WHERE status = 'A' and date <= current date AND RANK =1 ORDER BY effdt

Loads of Thnx in advance!

Manya

Random thoughts said...

Hi Craig, I'm also getting the same error as Manya when trying to use it in our code. Can you please help us with this.
Regards,
Vineela

punitha said...

hi

How to calculate range and its avg
Input

col1
40
30
25
5

output
col2:range=row1-row2
col3:avg(range)
col1 col2(range) col3(avg(range))
40 null or 0 8.75
30 10 8.75
25 5 8.75
5 20 8.75


How to achieve in DB2

John said...

It should be DENSERANK() as one word, not DENSE RANK() as two words. Also see http://www-07.ibm.com/tw/imc/seminar/download/07d3_1.pdf

Craig S. Mullins said...

Actually, it is DENSE_RANK. The underscore is there in the SQL, but unfortunately the combination of font, size and HTML rendering is obscuring the underscore.