Partitioned table spaces have been with us in the world of DB2 forever, but even as we embark on moving to Universal table spaces, there still exists some confusion "out there" in terms of the behavior of partitioning...
For example, consider this question:
If a table is in a partitioned table space, say four partitions, for example, then to process the table in batch can I run four instances of the batch program in parallel, one against each partition?
This type of question exhibits some confusion about how partitioned table spaces work. First of all, let's assume that we are talking about classic partitioned table spaces and/or Universal range-partitioned table spaces.
Given that assumption, yes, you can run
four instances of a batch program in parallel if you so desire. Of course, that
is not necessary to get DB2 use parallel tasks to read the data. The best approach is to BIND the program (package) specifying
the DEGREE(ANY) parameter. Then DB2
will choose the degree of parallelism for the batch program. This is simpler
than any alternative approach you might choose to code up because DB2
handles it all for you.
If you still wish to
run four instances of the batch program you, of course, can. In that case you
would probably want to BIND using
DEGREE(1). In order for this to work the way I infer that you intend it to
work, however, you may have to modify the program
somewhat. I assume that you wish each job to process only against one of the four
partitions. To accomplish this, you must provide some way for the program to identify and process only the
data from one of the four partitions
based on the partitioning key range specified in the partitioning index.
For
example, you might choose to input parameters to
the batch run specifying the key range for that program to process. As long as
the program adheres to that key range
you should only process data from the one partition that holds that data.
Of course, that means that your programmers must be diligent in applying the key range. Perhaps you could set up 4 views that apply the key range for each partition and then use only those views in each program. But that means you will have multiple versions of the same program. You could also specify the range as an input variable and then use only one program. But that means you have to make sure that you are sending the proper range to the input variables.
So just go with the first approach and let DB2 do it for you when it makes sense...
No comments:
Post a Comment