Thursday, April 16, 2009

Stages 3 and 4

All good DB2 developers and DBAs know about Stage 1 and Stage 2 predicates, right? But have you ever heard of Stage 3 and Stage 4 predicates? Well, you’re about to!


First of all, let’s do a quick review to catch those readers who don’t know what Stage 1 and 2 are. You may have heard about sargable and nonsargable, and if so, Stage 1 is sargable and Stage 2 is nonsargable. If not, don’t worry about those terms, they are obsolete.


A predicate that can be evaluated in the Data Manager (DM) component of DB2, that is at the earliest stage of query execution, is called a Stage 1 predcicate. Stage 2 predicates need to be passed up to the Relational Data System (RDS) to process. So Stage 1 predicates are more efficient than Stage 2 predicates because the Data Manager component of DB2 is at a level closer to the data than the Relational Data System. Stage 1 predicates, being evaluated earlier in the data retrieval process, avoid the overhead of passing data from component to component of DB2. For this reason, developers are encourage to use Stage 1 predicates rather than Stage 2 predicates to optimize performance.


What makes a predicate Stage 2 instead of Stage 1? Well, it is all in the type of predicate you code and how you write your SQL. There is a list of Stage 1 and Stage 2 predicates in Chapter 12 of the DB2 Performance and Tuning manual. (The same chart also tells you whether a predicates is indexable or not.) Whenever you move from one release of DB2 to another one of the first things you should do is consult this manual to see if any predicates have changed from Stage 2 to Stage 1… and you should make sure all of your developers have a copy of that chart taped to their cubicle wall!


OK, so what is all of this about Stage 3 and Stage 4, then? Well, it is a way of thinking about some bad SQL practices. Instead of coding a SQL predicate some programmers choose to bring all (or most) of the data into their program and then filter it using IF-THEN or CASE statements. You can think of these as Stage 3 predicates because it is one more place that the data must be passed to before it can be determined whether the data is needed.


Stage 4? That is when you use a black box (see the link for an explanation if you don't know what a black box is)... Instead of filtering the data in the DM or the RDS or even in your program, you have to work with another program altogether – the black box – to return the right data.


So just remember 1… 2… 3… 4… and that is the order of efficiency for those types of predicates. 1 is better than 2 is better than 3 is better than 4…

No comments: