The Db2 Portal Blog

Monday, April 17, 2017

The DB2 12 for z/OS Blog Series - Part 9: Piece-wise Deletion

Adding the FETCH FIRST clause to the DELETE statement at first appears to be a very simple enhancement, but upon closer examination it is really quite powerful.

Prior to DB2 12 for z/OS, the FETCH FIRST n ROWS ONLY clause could be specified on a SELECT statement. The clause had two impacts:

the number specified for n is used by the optimizer to formulate an access path
the result set of the SELECT is limited to no more than n rows

With the advent of FETCH FIRST being allowed in a DELETE statement, the number n limits the number of rows that will be deleted in a single DELETE statement. So let's assume that there are 1000 rows for employees in department 200. When we issue this DELETE statement

DELETE FROM EMP

WHERE EMPDEPT = '200';

All 1000 rows will be deleted. However, if we issue this version of the same statement

DELETE FROM EMP

WHERE EMPDEPT = '200'

FETCH FIRST 500 ROWS ONLY;

Only 500 rows will be deleted... you could then run it again to delete the remaining 500 rows.

So why would you want to do this? Well, our little example here is not really a good case for using FETCH FIRST on DELETE . Instead, it is primarily designed for situations where a large number of rows would be impacted. For example, assume that instead of 1000 rows there were 2 millions rows. Using FETCH FIRST to DELETE the rows in batches, instead of 2 million all at once, can make an impossible task possible. The lock management when deleting 2 million rows can render a big, bulk deletion unwieldy as it impacts concurrent access of the data on the same pages where rows are being deleted.

So keep FETCH FIRST in your arsenal of DB2 12 SQL tools that can help when you need to DELETE a large number of rows.

Tuesday, April 04, 2017

The DB2 12 for z/OS Blog Series – Part 8: Index FTBs

IBM has delivered many new in-memory processing capabilities in DB2 12 for z/OS, so much so that Gartner has dubbed DB2 12 for z/OS an in-memory DBMS. This is good news for those of us in the mainframe world looking to utilize memory to improve the performance of our database applications.

Perhaps the most interesting of the new in-memory features is the new Fast Traversal Blocks, or FTBs. An FTB is an in-memory structure that can be used with unique indexes. DB2 detects which indexes are frequently used for traversals, and when a threshold is hit DB2 will build an FTB for the index in a storage area outside the buffer pool. This causes the top levels of the index to be cached thereby making it efficient to perform very fast traversals through the cached levels of the index.

FTBs are either on or off for the entire DB2 subsystem. This is managed using the new DSNZPARM named INDEX_MEMORY_CONTROL. Setting this zparm to AUTO, which is the default, indicates that 500 MB or 20 percent of the buffer pool will be used for FTBs (whichever is larger). Alternatively, you can set the upper limit to a number between 10MB and 200 GB, or you can DISABLE the feature altogether.

It may be confusing to specify a percentage of the buffer pool for caching FTBs, especially so because FTBs are stored outside of DB2’s buffer pools – that means you will not be consuming valuable buffer pool space with FTBs because the FTBs are stored in their own area of memory.

FTBs are most likely to be used by DB2 shops that run many applications performing frequent lookups where the unique index is used predominantly for reads. In those scenarios FTBs may be able to deliver a significant performance improvement.

There are two new DB2 IFCID trace records that report on index FTB usage in DB2 12 for z/OS: IFCID 389 and 477. IFCID 389 traces indexes with FTB structures and IFCID 477 traces allocation and deallocation of FTB structures.

The type of information tracked by these ICFIDs includes the number of indexes with FTBs along with number of levels in the FTB and the size of the structure. Such details will be important for DBAs looking to manage and support index FTBs in DB2 12.

Monday, March 20, 2017

The DB2 12 for z/OS Blog Series – Part 7: Relative Page Number Table Spaces

One of the most significant new features for supporting big data in a DB2 12 environment is relative page numbering (or RPN) for range-partitioned table spaces. You can either create a new RPN range-partitioned table space, or an existing range-partitioned table space can be changed to RPN via an ALTER TABLESPACE with PAGENUM RELATIVE, followed by an online REORG of the entire table space.

But why would you want RPN table spaces instead of the already-existing table spaces types in DB2? The simple answer is the ability to grow the amount of data you store. RPN table spaces enable you to store large amounts of data. The DSSIZE can grow up to 1 TB for a partition. And the maximum table size increases to 4 PB with up to 256 trillion rows per table. That is a lot of data that can be stored! Think about it this way: if you were to insert 1000 rows per second it would take more than 8000 years to fill the table to capacity!

With RPN table spaces you get the ability to create larger partition sizes. The maximum partition size is now 1 TB (it used to be 256 GB). So if you are reaching the capacity size of existing table space options, RPN table spaces will allow you to store a LOT more data. Of course, this requires an expanded RID, which increases from 5 bytes to 7 bytes. So that will impact the DDL for the mapping table for your online REORG utilities.

But size is not the only issue. RPN table spaces improve availability, too. You can specify DSSIZE at the partition level for RPN table spaces. So each partition can have its own, different DSSIZE specification. Furthermore, the allowable DSSIZE value is no longer dependent on the page size and number of table space partitions. The DSSIZE change can be an immediate change (no online REORG required to take effect) as long as the change does not decrease the DSSIZE value.

As you move your DB2 subsystems to Version 12, consider migrating your larger range-partitioned table spaces to RPN to take advantage of these new capabilities.

Thursday, March 02, 2017

The DB2 12 for z/OS Blog Series – Part 6: Transferring Ownership of Database Objects

When a database object is created it is given a qualified two-part name. This applies to tables, indexes, table spaces, distinct types, functions, stored procedures, and triggers. The first part is the schema name (or the qualifier), which is either implicitly or explicitly specified. The default schema is the authorization ID of the owner of the plan or package. The second part is the name of the object.

But things can get confusing. When an object is created, an authorization ID is assigned as the owner of the object. This may, or may not, be used as the schema qualifier for the object. The object owner implicitly inherits privileges to reference, maintain and grant privileges to the object.

Changing the owner of a database object used to be a difficult process. But DB2 12 for z/OS delivers a simple method of transferring the ownership of your database objects: the TRANSFER OWNERSHIP SQL statement.

The primary purpose for this new feature is to make it easier to manage database objects that are owned by an employee who no longer works for your company. You can use TRANSFER OWNERSHIP to simply switch the ownership of the database objects to another employee. The new owner can be an authorization ID or a role. You can issue the statement interactively, or embed it in an application program.

The TRANSFER OWNERSHIP statement does not change the schema of the transferred database object.

In order to transfer ownership of a database object, you must either be the owner of the object or have SECADM authority. The basic syntax of the statement is as follows:

TRANSFER OWNERSHIP OF object-name
TO {USER authorization-name |
SESSION_USER |
ROLE role-name}
REVOKE PRIVILEGES

Be careful if a package depends on the current owner's privileges. The dependent package will be invalidated unless the current owner is already explicitly granted those privileges from another source prior to the object ownership transfer. For example, after the ownership of a table is transferred and if a dependent package requires the SELECT privilege on that table by the current owner, the dependent package is invalidated unless the current owner has already been explicitly granted the SELECT privilege for that table before its ownership transfer.

Here is a quick example transferring the ownership of a specific index to a different user, in this case, JOHNDOE.

TRANSFER OWNERSHIP OF INDEX TR_P.XHIST02
TO USER JOHNDOE
REVOKE PRIVILEGES;

Friday, February 24, 2017

The DB2 12 for z/OS Blog Series - Part 5: Enhancements for Managing Dynamic and Static SQL

Most modern application development is done using dynamic SQL. But some features work only with static SQL and others only with dynamic SQL. DB2 12 for z/OS delivers functionality that minimizes the difference between static and dynamic SQL.

Dynamic plan stability brings the plan stability feature of static SQL to dynamic SQL. With plan stability for static SQL, you can use the PLANMGMT parameter of REBIND to save off old access paths that can be switched back to active if the new access paths are inefficient for any reason.

DB2 12 introduces dynamic plan stability, which is a little different but should prove to be quite useful. You can choose to stabilize dynamic SQL access paths by storing them in the DB2 system catalog. DB2 will not just automatically start to capture all dynamic access paths. There are options for selecting which queries to stabilize, so you do not have to store all dynamic SQL. This is controlled by the command:

-START DYNQUERYCAPTURE

When a dynamic statement is run, DB2 will look in the dynamic statement cache first. If it is not found there, DB2 will look in the system catalog before resorting to a full prepare. This is particularly useful when statements are flushed from the DSC and a prepare would otherwise be required.

You can label a set of stabilized SQL statements into a group known as a stabilization group. This makes it easier for DBAs to track and manage stabilized queries.

Click here for more details on Dynamic Plan Stability.

So dynamic plan stability can make your dynamic SQL more static. But there is another new DB2 12 capability that can make your static SQL more dynamic: static Resource Limit Facility (RLF). The RLF, heretofore, could only be used to govern dynamic SQL statements. RLF tables, manipulated by DBAs, contain limits that make sure that dynamic SQL statements do not consume too many resources such as CPU, I/O, locks, etc. This enables improperly tested SQL or poor resource planning from disrupting performance.

But dynamic SQL is not the only type of SQL that could cause performance issues; static SQL transactions can benefit from the same controls. DB2 12 extends the RLF to support static SQL statements thereby improving the ability to avoid problem applications from dominating your system’s resource consumption.

Specifically, you can set up reactive governing for static SQL statements by adding new rows in resource limit facility tables. Static SQL statements will be governed by rows where RLFFUNC='A' (for DSNRLSTxx tables) and RLFFUNC='B' (for DSNRLMTxx tables).

You can control whether resource limits apply to dynamic only, static only or all SQL statements using the DSNZPARM RLFENABLE. Acceptable values are DYNAMIC, STATIC, or ALL and the default is DYNAMIC. Furthermore, you can specify the default resource limit actions for static SQL statements by setting two DSNZPARMS:

RLFERRSTC for local statements
RLFERRDSTC for remote statements

In each case, the acceptable values are NOLIMIT, NORUN, or a number between 1 and 500000. You use NOLIMIT to let any static SQL statement that does not correspond to a row in the resource limit table run freely. Alternately, NORUN indicates that any static SQL statement that does not correspond to a row in the resource limit table is prevented from running. Finally, if you specify a number between 1 and 500000, that will be the number of service units to use as the default resource limit. If the limit is exceeded, the SQL statement is terminated.

If you need more details on setting up resource limits in general, you can find that information in the IBM manuals and online here.

But the bottom line is that DB2 12 gives users more and better options for managing both their dynamic and static SQL performance. And that is definitely a good thing!

Monday, February 13, 2017

The DB2 12 for z/OS Blog Series - Part 4: Real Time Statistics History

Real Time Statistics (RTS) were introduced in DB2 for z/OS as part of the maintenance stream of Version 7 and IBM has been improving their functionality every release since.

For those uninitiated to what RTS are, they are a series of statistics that are automatically maintained by DB2 and can be used by DBAs (and tools) to schedule maintenance tasks like REORG and COPY.

Prior to the introduction of RTS, the only way to gather statistics about DB2 database structures was by running the RUNSTATS utility. RUNSTATS collects statistical information about DB2 database objects and stores this data in the DB2 Catalog. There are two types of statistics collected by RUNSTATS: data used by the optimizer to formulate efficient access paths and data used by DBAs to determine when maintenance tasks should be run. The optimization statistics are still required, and therefore, so is RUNSTATS. The maintenance statistics, on the other hand, can be ignored and replaced by RTS.

RTS runs in the background and automatically updates statistics in two system catalog tables (one for table space data and one for index data) as your DB2 data is modified. This differs from RUNSTATS, which is a hands-on administrative process. RTS is hands-off.

So what is new with RTS in DB2 12 for z/OS?

The most significant enhancement is the ability to maintain RTS history using DB2’s system-time temporal capabilities. When history is enabled, both SYSIBM.SYSINDEXSPACESTATS and SYSIBM.SYSTABLESPACESTATS will have a table with "_H" on the end that have the same columns, with the same data types. But the temporal relationship is not automatically enabled, you have to choose to do so.

To enable the temporal relationship between a history table and its associated catalog table you must issue an ALTER statement to “hook up” the relationship. Here are the DDL statement for each of the RTS tables:

ALTER TABLE SYSIBM.SYSINDEXSPACESTATS

ADD VERSIONING

USE HISTORY TABLE SYSIBM.SYSIXSPACESTATS_H;

ALTER TABLE SYSIBM.SYSTABLESPACESTATS

ADD VERSIONING

USE HISTORY TABLE SYSIBM.SYSTABSPACESTATS_H;

Before turning on RTS history you should take care to build a process for maintaining, consolidating and purging historical information. Remember, that each time the STATSINT interval is reached, new RTS data will be written, causing the old information to be written to the system time temporal table. That history table will grow over time. You will want to have a plan for how to manage that growth. Perhaps you will want to aggregate the data daily, or weekly, or monthly. Or maybe you will just want to purge the data after a period of time. At any rate, your strategy for managing this data should be worked out before you decide to start collecting RTS history.

Why would you want to record historical RTS information? Well, such details can be useful for analyzing activity and predicting future requirements. With historical RTS you can see trends and might be able to improve performance, maintenance tasks, and service to your end users. And, of course, this means that you can write SQL time travel queries against the RTS system catalog tables.

You can always turn off historical RTS collection by severing the temporal relationship. This is accomplished by issuing ALTER TABLE specifying the DROP VERSIONING clause on the appropriate table. For example:

ALTER TABLE SYSIBM.SYSTABLESPACESTATS

DROP VERSIONING;

At any rate, the ability to take advantage of DB2’s temporal capabilities to capture RTS history is a powerful new feature of DB2 12 for z/OS that DBAs and performance analysts will want to take advantage of.

One other DB2 12 change to RTS worth noting, is the addition of the GETPAGES column to both table space and index space stats. This column is used to record the number of GETPAGE requests for the table space since the last time REORG was run (or since the object was created).

Tuesday, February 07, 2017

The DB2 12 for z/OS Blog Series - Part 3: Temporal Improvements

Temporal tables and time-travel query support was added in DB2 10 for z/OS. With temporal tables a time period is attached to the data to indicate when it was valid or changed in the database. A traditional database stores data implied to be valid at the current point-in-time; it does not track the past or future states of the data. Temporal support makes it possible to store different database states and to query the data as of those different states. That means different DDL to support temporal data, as well as different SQL syntax to query it.

Using built-in DB2 temporal capabilities is much easier than coding for it yourself, but as with most new features there were some additional tweaks that customers asked for. DB2 12 for z/OS delivers on making it easier for organizations to adopt and use temporal DB2 data and queries by supporting:

temporal referential constraints,
more flexible time periods, and
logical transactions.

Let’s examine each of these areas in a bit more depth. First up, let’s take a look at temporal referential constraints. Prior to DB2 12, temporal referential integrity required triggers or stored procedures to implement. But with DB2 12, you can define a temporal referential constraint for a temporal table that contains a BUSINESS_TIME period by specifying the PERIOD BUSINESS_TIME clause in the definition of the constraint. This should improve data integrity for temporal tables and increase temporal uptake for DB2 customers.

The second new temporal feature is improved flexibility for defining the application period in temporal tables. The application period is defined with two date/time columns, one specifying the beginning of the period and the other specifying the end of the period. Prior to DB2 12, the beginning value of a period had to be inclusive, and the end value had to be exclusive. So for a period beginning at 2017-01-01 and ending 2019-10-01, 2017-01-01 is part of the period but 2019-10-01 is not part of the period. This type of period is called an inclusive-exclusive period.

In DB2 12, you can create an application-period temporal table with a BUSINESS_TIME period that is inclusive-inclusive. That means that the end value is considered to be part of the period, instead of outside the period definition.

Generally speaking, the inclusive-exclusive definition is preferred (and is the default) because it complies with the SQL standard. However, if your data already exists as inclusive-inclusive, or it makes more sense to your business users, then DB2 12 support for inclusive-inclusive will make it easier for your organization to use DB2 temporal tables and time travel queries.

Finally, DB2 12 offers temporal logical transactions on system temporal tables (that is, those using SYSTEM_TIME periods). With this new feature, DB2 supports logical units of temporal work that are not determined by COMMIT and ROLLBACK. That means that values for row-begin and row-end columns are determined by applications based on a built-in global variable that you can set.

So overall, DB2 12 for z/OS makes using temporal tables easier and more flexible with these three compelling new features.

Wednesday, February 01, 2017

The DB2 12 for z/OS Blog Series - Part 2: Advanced Triggers

As we continue our series on new functionality in DB2 12 for z/OS, today we take a look at the improvements delivered for triggers.

Before we dive into the new capabilities, let's briefly review what triggers are. Triggers are event-driven specialized procedures that are stored in, and executed by, the DBMS. Each trigger is attached to a single, specified table. Triggers can be thought of as an advanced form of "rule" or "constraint" written using procedural logic. A trigger can not be directly called or executed; it is automatically executed (or "fired") by the DBMS as the result of a modification being made to the associated table. Once a trigger is created it is always executed when its "firing" event occurs (update, insert, delete, etc.). Triggers are generally used to promote data integrity and make the database self-maintaining.

So far, so good. But what does DB2 12 provide us that we did not have in prior releases?

Well, now we have a new type of trigger called an advanced trigger. All this really means is that the trigger supports new capabilities that I will outline momentarily. Triggers created before you have moved to function level 500 (this is new terminology for DB2 12 that roughly equate to NFM in earlier releases) will be referred to as basic triggers.

So what makes a trigger advanced? Simply put, support for additional development and usability capabilities. The following features are supported only with advanced triggers:

You can define and reference SQL variables in the trigger
More types of SQL statements can be used, including dynamic SQL statements
SQL PL control statements can be used in the trigger
It is possible to reference global variables, and to assign values to global variables
Finally, you can explicitly specify options, including BIND options
Comments are supported in SQL statements
You can include definitions for multiple versions of the trigger

These are all good things. But there are some additional changes that you need to be aware of with advanced triggers. All transition variables are nullable in advanced triggers. And debugging options are available. Perhaps the most significant change though is the support for a more robust ALTER TRIGGER statement that can be used to change trigger options or regenerate the trigger body. Only a very rudimentary ALTER capability is available for basic triggers.

Another nice capability for advanced triggers it that CREATE TRIGGER now provides the OR REPLACE clause. This lets the developer use one CREATE statement to define a trigger or update an existing trigger depending on whether the trigger already exists. The OR
REPLACE option can also be used with a CREATE TRIGGER statement to define a
new version of a trigger, or to replace an existing version of a trigger.

A nice chart of the behavioral differences between basic and advanced triggers can be found here, if you are interested in additional details.

The bottom line is that for shop's that rely on triggers, or that are looking to use more triggers, the set of new functionality offered in DB2 12 for z/OS greatly expands the usefulness and usability for triggers.

Tuesday, January 17, 2017

The DB2 12 for z/OS Blog Series - Part 1: SQL Pagination

Today’s blog post kicks off a new series of posts that examine new features and functionality delivered with the latest and greatest new version of DB2 foir z/OS – Version 12.

We’ll begin with a nice new feature that should make it easier for programmers writing queries where the program needs to be able to deal with a subset of results. For example, to show the first 25 rows, then the next, etc. This is a frequent requirement for mobile and web applications that are common these days.

This separating of rows into piece can now be accomplished quite easily in DB2 12 with the new OFFSET syntax. OFFSET is specified as a clause of the SELECT statement and it is used to specify the number of rows in the result table that will be skipped before

any rows are retrieved.

The offset clause is simple to code. You just need to determine the number of rows that you want to skip and code that in the clause. For example, to skip 5 rows you would code OFFSET 5 ROWS. Simple, right? Well, conceptually, yes, but in practice you have to be careful.

First of all, you must know where/when you can code an OFFSET clause. OFFSET can be specified on a subselect or fullselect that is outermost fullselect in a prepared SQL statement or a DECLARE CURSOR statement. OFFSET is also permissible in a SELECT INTO statement. However, you cannot code an OFFSET clause in a view definition, an MQT, the RETURN statement of SQL table functions, in row permissions or column masks, or in the outermost fullselect for a sensitive dynamic cursor. A further restriction is that the statements that includes the OFFSET clause cannot contain an expression that is not deterministic or that has external action.

The second, and I think more interesting aspect to consider when using the OFFSET clause is that your result set must be predictable to assure that you are retrieving useful data. As we all should know, the access path chosen by the optimizer can alter the order in which rows are returned. If you want to access rows in sets of 25, for example, then you want to make sure that each successive set does not contain any rows from previous sets, and does not omit any rows in between sets.

So how do we do this? The best approach is to code up an ORDER BY clause that specifies columns that uniquely identify each row in the result table. If there are duplicates, then there is no way to verify that you are getting the data that you want. IN other words, the order of the rows is not deterministic. The data being skipped will not be predictable and it is highly likely that you will not be accessing all of the data that you want (or perhaps even accessing the same data twice).

It is also important to understand that if the OFFSET number of rows is greater than the number of rows in the intermediate result table, you will get an empty result.

So let’s look at a quick example. Suppose we are looking for employees earning more than $50000. Further, we want to skip the first 10 rows of the EMP table, and grab the next 10. This SQL should do the trick:

SELECT *

FROM EMP

WHERE SALARY > 50000.00

ORDER BY EMPNO

OFFSET 10 ROWS

FETCH FIRST 10 ROWS ONLY;

The OFFSET 10 ROWS will cause the first 10 qualifying rows to be skipped. The FETCH clause gets the next 10 qualifying rows (after skipping the first 10).

But OFFSET is not the only new feature to help developers with SQL pagination. Also available with DB2 12 is data-dependent pagination, which uses row value

expressions in a basic predicate. That sounds more complicated than it really is, so let’s look at an example:

We can code a WHERE clause like this to search more rows with a name greater than mine:

WHERE (LASTNAME, FIRSTNAME) > (′MULLINS′, ′CRAIG′)

This is easier than what we previously had to do (before DB2 12), which was breaking the WHERE up as follows:

WHERE (LASTNAME = ′MULLINS′ AND FIRSTNAME > ′CRAIG′)

OR (LASTNAME > ′MULLINS′)

The bottom line is that application coding for pagination becomes a lot easier in DB2 12 for z/OS…

Thursday, December 22, 2016

Happy Holidays 2016

It is that time of year again. When we all take some time to pause and spend time with our family, friends and loved ones. We take time off of work to celebrate and reflect on the past year... and to welcome the new year.

No matter what holiday tradition you celebrate, I wish you an enjoyable holiday season. Happy holidays, that is!

And we'll meet back here again in January 2017 to talk more about DB2 stuff!

Happy Holidays!

Thursday, November 24, 2016

Happy Thanksgiving

Today, in the United States of America, we celebrate Thanksgiving by gathering together with our loved ones and giving thanks for all we have. This includes celebrations with food, traditionally a big turkey dinner with mashed potatoes and gravy, as we watch a parade and football games...

With this in mind, I give thanks for all of the reader's of my blog, wherever you may be, and wish you a very happy day!

Tuesday, July 12, 2016

DB2 for z/OS Webinar: Who Did What to Which Data When?

Today's blog post is to let you know about an upcoming webinar (July 21, 2016) that I am conducting with CorreLog on the topic of database auditing for DB2 for z/OS.

Unless you have been living under a rock these past few years you will have heard at least something about the many data breaches, hacks, and security issues that have been prevalent in the news. These issues have resulted in numerous industry and governmental regulations that organizations must understand and comply with. Data professionals, in particular, need to be vigilant in the tools and techniques that are available for better protecting their company’s data, and tracking those that access it. One of the most important of these techniques is database auditing.

Database auditing provides a facility for tracking the use of database resources and authority. When auditing is enabled, each audited database operation produces an audit trail of information including information such as what database object was impacted, who performed the operation and when. The comprehensive audit trail of
database operations produced can be maintained over time to allow DBAs and auditors, as well as any authorized personnel, to perform in-depth analysis of access and modification patterns against data in the DBMS.

With this basic information on database auditing in mind, you should be able to readily see how auditing can help your organization answer questions like:

“Who accessed or changed critical data?” and
“When was the data actually changed?” and perhaps even
“What was the old content prior to the change?”

Your ability to answer such questions can make or break a compliance audit. Of course, these are just the high-level details. To delve more deeply into the important issues involving database auditing requires time and research... or you can attend our upcoming webinar and get up to speed quickly on the essentials you need to know!

The webinar, entitled Who Did What to Which Data When? will elaborate on database auditing details and issues such as the data breach trends and how costly they can be, an overview of pertinent regulations and their impact, and the various types of database auditing methods with their pros and cons... all with a specific focus on mainframe DB2. And there will also be an overview and demo of CorreLog's database auditing offering for DB2 for z/OS.

You can learn all of this and more on Thursday, July 21, 2016, at 11:00 am Eastern Daylight Time (EDT).

But you have to register to attend, so be sure to click here to register.

Friday, July 08, 2016

Good Old Partitioned Table Space Confusion

Partitioned table spaces have been with us in the world of DB2 forever, but even as we embark on moving to Universal table spaces, there still exists some confusion "out there" in terms of the behavior of partitioning...

For example, consider this question:

If a table is in a partitioned table space, say four partitions, for example, then to process the table in batch can I run four instances of the batch program in parallel, one against each partition?

This type of question exhibits some confusion about how partitioned table spaces work. First of all, let's assume that we are talking about classic partitioned table spaces and/or Universal range-partitioned table spaces.

Given that assumption, yes, you can run four instances of a batch program in parallel if you so desire. Of course, that is not necessary to get DB2 use parallel tasks to read the data. The best approach is to BIND the program (package) specifying the DEGREE(ANY) parameter. Then DB2 will choose the degree of parallelism for the batch program. This is simpler than any alternative approach you might choose to code up because DB2 handles it all for you.

If you still wish to run four instances of the batch program you, of course, can. In that case you would probably want to BIND using DEGREE(1). In order for this to work the way I infer that you intend it to work, however, you may have to modify the program somewhat. I assume that you wish each job to process only against one of the four partitions. To accomplish this, you must provide some way for the program to identify and process only the data from one of the four partitions based on the partitioning key range specified in the partitioning index.

For example, you might choose to input parameters to the batch run specifying the key range for that program to process. As long as the program adheres to that key range you should only process data from the one partition that holds that data.

Of course, that means that your programmers must be diligent in applying the key range. Perhaps you could set up 4 views that apply the key range for each partition and then use only those views in each program. But that means you will have multiple versions of the same program. You could also specify the range as an input variable and then use only one program. But that means you have to make sure that you are sending the proper range to the input variables.

So just go with the first approach and let DB2 do it for you when it makes sense...

Tuesday, June 14, 2016

Four Important Buffer Pool Tuning Knobs in DB2 for z/OS

DB2 has five (well, four current) primary adjustable thresholds that can be modified using the ALTER BUFFERPOOL command.

These thresholds are as follows:

The Sequential Steal Threshold, or VPSEQT, is the percentage of the buffer pool that can be occupied by sequentially accessed pages. For example, at the default value of 80, when this threshold is reached, 80% of the buffer pool represents pages for sequential processing. Of course, 80% is just the default; you can modify this value based on your processing needs to any value ranging from 0 to 100. When this threshold is reached, DB2 will steal a sequential page first before stealing a page from a randomly accessed page. So, for data that is accessed mostly sequentially (for example, through scans and prefetching) consider increasing the value of this parameter, and for data that is accessed most randomly, consider decreasing the value of this parameter. A VPSEQT value of zero will prevent any sequential pages from lingering in the buffer pool and it will turn off sequential prefetch. A VPSEQT value of 100 allows the entire buffer pool to be monopolized by sequential pages.

The next tunable buffer pool threshold is the Parallel Sequential Threshold, or VPPSEQT. This threshold indicates the amount of the buffer pool that can be consumed by sequentially accessed data for parallel queries. When this threshold is reached, DB2 will cease to steal random pages to store sequential pages accessed by parallel queries. The default value for VPPSEQT is 50%, indicating its size as 50% of the sequential steal threshold (VPSEQT). For example, if the buffer pool is defined as 1000 pages and VPSEQT is set at 80%, a query using I/O parallelism can consume up to 400 sequential pages (that is, 1000 x 80% = 800 for the sequential steal threshold and 800 x 50% = 400 for the parallel sequential threshold).

The third, and final sequential threshold is the Assisting Parallel Sequential Threshold (or VPXPSEQT). This threshold is no longer supported as of DB2 11 because Sysplex Parallelism is no longer supported. When it was available, VPXPSEQT was used to indicate the portion of the buffer pool that might be used to assist with parallel operations initiated from another DB2 in the data sharing group.

The final two modifiable DB2 buffer pool thresholds are used to indicate when modified data is to be written from the buffer pool to disk. Log data is externalized when a COMMIT is taken, but writing of the actual data itself is controlled by the two deferred write thresholds (and DB2 system checkpoints).

First we have the Deferred Write Threshold (or DWQT). When DWQT is reached, DB2 starts scheduling write I/Os to externalize the data pages to disk. By default, the deferred write threshold is reached when 30% of the buffer pool is allocated to unavailable pages, whether updated or in use. The default is probably too high for most larger buffer pools.

DB2 also provides the Vertical Deferred Write Threshold (VDWQT), which is basically the same as DWQT but for a single page set. By default, VDWQT is reached when 5% of the buffer pool is allocated to one data set. When reached, DB2 will start scheduling write I/Os to externalize the data pages to disk. Once again, this default is most likely too high for most shops.

The VDWQT threshold can be specified as a percentage of the buffer pool, or as an absolute number of buffers. When you want to specify a relatively low threshold for VDWQT using an absolute number of buffers provides better granularity and control.

In general, consider ratcheting the deferred write thresholds down to smaller percentages (from the defaults) for most of your buffer pools. Doing so enables “trickle” write from the DB2 buffer pools. This means that the data is written asynchronously to disk regularly over time in smaller amounts, instead of storing up a lot of modified data that has to be written all at once when the threshold percentage is reached. Of course, the needs of every shop will vary.

And yes, there are other buffer pool tuning options other than these 5 4 parameters, such as changing the size of the pool, specifying min/max size for BP expansion/contraction, or altering the page most often used to fine tune buffer pool operations.

All of the above thresholds can be changed using the -ALTER BUFFERPOOL command.

Wednesday, June 01, 2016

Carefully Code Your DB2 LIKE Predicates

The LIKE predicate is a powerful but potentially problem-causing operator that can be used to great effect in your SQL development efforts.

The LIKE predicate searches for strings that match a specified pattern. Here is the definition from the DB2 SQL Guide:


>>-match-expression--+-----+--LIKE--pattern-expression---------->
                     '-NOT-'                             

>--+---------------------------+-------------------------------><
   '-ESCAPE--escape-expression-'

But what does this mean? Well, the match-expression is the string to be tested for conformity to the pattern specified in pattern-expression. You can use the underscore and the percent sign characters as wildcards in the pattern to indicate 1 (underscore) or many (percent sign) indeterminate characters.

The ESCAPE clause is used when you want to actually search for one of the wildcard characters.
But I do not really want to get into explaining the basics of how LIKE works here. If you really need more details on LIKE I refer you to the appropriate IBM Knowledge Center details.

The Semantics of LIKE
What I do want to do today is to give some advice on LIKE usage. First of all, be careful in terms of how you use the wildcard characters (underscore and percent sign). The '_' character requires a matching character and the '%' character does not. This can produce interesting results. For example, the following two queries are not equivalent, though at first glance you might think they were:

SELECT *

FROM cust

WHERE (lastname LIKE 'Peter_%');

SELECT *

FROM cust

WHERE (lastname LIKE 'Peter%');

Both will match to 'Peterson' and 'Peters', but the first will not match to 'Peter' because of the underscore.

The LIKE predicate offers a great deal of flexibility and power to your SQL statements. Using LIKE you can quickly retrieve data based on patterns and wildcards. However, some uses of LIKE can be confusing to implement appropriately—especially when LIKE is used with host variables.

Let’s assume that you need to create an application that retrieves customers by last name, but the supplied value for last name can be either the entire name or just the first few bytes of that name. In that case, the following query can suffice:

SELECT custno, firstname, lastname

FROM cust

WHERE lastname LIKE :host_variable;

In order for this to work, when you enter the value for host_variable be sure to append percent signs (“%”) to the end of the value. The percent sign specifies that DB2 should accept as a match any number of characters (including 0). This must be done programmatically. So, if the value entered is SM, the host_variable should contain “SM%%%%%%%%” and if the value entered is SMITH, the host_variable should contain “SMITH%%%%%”. Append as many percent signs as required to fill up the entire length of the host variable. Failure to do so will result in DB2 searching for blank spaces. Think about it—if you assign “SMITH%” to a 10-byte host variable, that host variable will think it should search for “SMITH%”, that is SMITH at the beginning, four blanks at the end, and anything in the middle.

So, for “SMITH%%%%%”, SMITH will be returned, but so will SMITHLY (or any name beginning with SMITH). There is no way to magically determine if what was entered is a complete name or just a portion thereof. If this is not acceptable, then a single query will not likely be feasible. Instead, you would have to ask the user to enter whether a full name or just a portion is being entered.

What About Performance?

Notwithstanding the semantic details, there are performance considerations to understand when using LIKE, too. It is a good practice to avoid using the LIKE predicate when the percentage sign (%) or the underscore (_) appears at the beginning of the comparison string because they prevent DB2 from using a matching index.

The LIKE predicate can produce efficient results, however, when you use the wildcard characters at the end or in the middle of the comparison string, for example:

Inefficient	Can be efficient with index
LIKE '%NAME'	LIKE 'NAME%'
LIKE '_NAME'	LIKE 'NA_ME'

DB2 will not use direct index lookup when a wildcard character is supplied as the first character of a LIKE predicate. At bind time, DB2 cannot determine when a host variable contains a wildcard character as the first character of a LIKE predicate. The optimizer therefore does not assume that an index cannot be used; rather, it indicates that an index might be used. At runtime, DB2 determines whether the index will be used based on the value supplied to the host variable. When a wildcard character is specified for the first character of a LIKE predicate, DB2 uses a non-matching index scan or a table space scan to satisfy the search.

Summary

The LIKE operator brings powerful search capabilities to your DB2 SQL queries. Be sure to understand its capabilities and to use it appropriately in your development efforts.

Tuesday, May 17, 2016

Come See Me at IDUG!

Just a quick blog post today to let all of my readers know my schedule for the IDUG DB2 Technical Conference being held next week in Austin, TX. With this information you can track me down in multiple places if you want to ask me a question, catch up, or just say “Hi!”

First of all, you can find me on Tuesday at the Expo Hall in the CorreLog booth (#300) from 12:00 to 1:00 and from 5:00 to 6:00. CorreLog will also be raffling off copies of my book, DB2 Developer’s Guide, to 4 lucky winners… so be sure to stop by. And chat with CorreLog about their SIEM and auditing solutions for DB2.

You should also make sure to attend my IDUG session titled “It’s Not Your Daddy’s DB2!” on Wednesday at 3:30 PM (session B13). The general idea of the session is that DB2 is changing and you should be changing with it. Over the course of the past few releases of DB2 for z/OS, IBM has added many features and capabilities that are transforming the platform. I’ll take a look at the big changes that have been introduced to DB2 including new SQL, universal table spaces, improved security, and more. The session also offers guidance on how to continue improving your DB2 environment to keep up with industry, technology and DBA trends circa 2016.

But that’s not all. On both Tuesday and Wednesday I will be co-presenting with SEG at their VSP sessions. On Tuesday at 1:00 PM (Session V02) I will be co-presenting with Ulf Heinrich on DB2 audit and compliance issues and how they impact your DB2 data and database assets. In particular, we will focus on database auditing, talking about what is needed, the various methods of auditing, and a new offering from SEG that you can use to run audit analytics against a long-term repository. And on Wednesday at 10:30 AM (Session V08) I will co-present with Roy Boxwell about DB2 release incompatibilities and how they impact your DB2 applications. And I hear that SEG will have some of my books to raffle off, too!

And there’s still more! On Tuesday night (6pm to 9pm) I will be speaking at a DataKinetics event with Colin Oakhill on the topic of SQL quality assurance. Be sure to stop by the DataKinetics booth (#509) to get an invitation to the Tuesday night event where you can hear me and others speak about SQL quality and performance… as well as enjoy a tasty snack or beverage.

So if you’ll be at IDUG and you want to find me, there is really no reason why you shouldn’t be able to track me down at one or more of these places…

See you in Austin!