Monday, January 24, 2022

Db2 DATE and TIME Arithmetic

After turning the last page on the 2021 calendar to embark on another year, I thought it might be a good time to reflect on date and time data... and how it can be used in Db2 for z/OS.


One of the most important database design decisions you will make – for Db2 or, indeed, any relational DBMS – is to use the proper data types for your columns when building tables. The data type that you choose for each column should be the one that most closely matches the domain of values that the column can be used to store.

Perhaps the most important design choice is to use the date and time data types that are available to you; for Db2 this is DATE, TIME, and TIMESTAMP. The ability to store dates as a native Db2 data type is a great advantage. If you need to store date information in your Db2 tables you should favor using the DATE data type instead of forcing a date into some other data type (such as CHAR or DECIMAL). Many a database design has been ruined (in my opinion) because somebody decided to store date data in a CHAR column.

When Db2 knows that the data should be a DATE or a TIME it can force data integrity such that no non-date/time value could ever be stored in the column. This is a big advantage, but it is not the only one. Db2 also provides numerous display formats so date and time values can be displayed in many different ways without having to store them in specific display formats.

Another big reason is that Db2 allows users to perform date/time arithmetic. So, you can easily use date columns to calculate durations or past and future dates based on a number of days, months, and years. The same type of arithmetic can be used for time and timestamp data. Just think about the application code you would have to write to manipulate and manage date/time values!

How Does Date/Time Arithmetic Work?

Db2 enables you to add and subtract DATE, TIME, and TIMESTAMP columns. In addition, you can add date and time durations to or subtract them from, such columns.

Arithmetic on date and time data is fairly straightforward. The plus (+) and minus (-) operations can be used on date and time values and durations. A duration is a number used to represent an interval of time. Db2 recognizes four types of durations.

1.       A labeled duration explicitly specifies the type of duration. An example of a labeled duration is 15 MINUTES. Labeled durations can specify the duration in years, months, days, hours, minutes, seconds, or microseconds.

2.       A DATE duration is a DECIMAL(8,0) number that has the format YYYYMMDD. The YYYY represents the number of years in the duration, MM the number of months, and DD the number of days. When you subtract one date from another, the result is a date duration in this format.

3.       A TIME duration is a DECIMAL(6,0) number with the format HHMMSS. The HH represents the number of hours, MM the number of minutes, and SS the number of seconds. When you subtract one time from another, the result is a time duration in this format.

4.       A TIMESTAMP duration is more complex than date and time durations. The
TIMESTAMP duration is a DECIMAL(20,6) number having the format YYYYXXDDHHMMSSZZZZZZ. The duration represents YYYY years, XX months, DD days, HH hours, MM minutes, SS seconds, and ZZZZZZ microseconds. When you subtract a TIMESTAMP from a TIMESTAMP, you get a TIMESTAMP duration.

So, if you want to add one hour to a TIME column you can simply specify TIME_COL + 1 HOUR. Or subtract a day from a date column easily, such as DATE_COL – 1 DAY. Simple, right?

Well, the rules for date and time arithmetic are somewhat complex. Remember that only addition and subtraction can be performed on date and time data (no division or multiplication). In addition, one of the two operands must be a duration. This stands to reason. For example, two dates cannot be added together, but a duration can be added to a date. The same goes for two times.

Use date and time arithmetic with care. If you understand the capabilities and features of date and time arithmetic, you should have few problems implementing it. Keep the following rules in mind:

                  When you issue date arithmetic statements using durations, do not try to establish a common conversion factor between durations of different types. For example, the date arithmetic statement:
                  DATE(‘2014/04/03’) - 1 MONTH

              is not equivalent to the statement:
                                DATE(‘2014/04/03’) - 30 DAYS

April has 30 days, so the normal response would be to subtract 30 days to subtract one month. The result of the first statement is 2014/03/03, but the result of the second statement is 2014/03/04. In general, use like durations (for example, use months or use days, but not both) when you issue date arithmetic.

                  If one operand is a date, the other operand must be a date or a date duration. If one operand is a time, the other operand must be a time or a time duration. You cannot mix durations and data types with date and time arithmetic.

                  If one operand is a timestamp, the other operand can be a time, a date, a time duration, or a date duration. The second operand cannot be a timestamp. You can mix date and time durations with timestamp data types.

Using Date/Time Functions

Db2 also provides numerous date/time functions that can be used to simply manipulate and modify date/time values. Let’s take a look at an example.

Suppose you want to express the duration resulting from date subtraction as a total-number-of-days (exact total, and not an approximate total)? Consider this query:

     SELECT DATE ('03/01/2004') - '12/01/2003'

It returns a duration of 00000300 (that is, 3 months). And those 3 months encompass a 29-day February plus a 31-day January plus a 31-day December (total 91 days). So the answer that we want to return is 91.

The answer to this conundrum lies in using the DAYS function. The following will return the result as a number of days:

     SELECT DAYS('03/01/2004') - DAYS('12/01/2003')

This query will return to you the exact number of days between the two dates. The DAYS function converts a Db2 date or timestamp into an integer value representing one more than the number of days since January 1, 0001. So converting both dates using the DAYS function and subtracting yields the desired result.

The functions provided by Db2 that can be applied to DATE, TIME, and TIMESTAMP columns can greatly assist your development efforts when using date/time data. Chances are you will find a function to help you with your task at hand, depending on your specific needs. Consider the functions listed in the following table.

DB2 Date/Time Functions

CHAR

Converts a Db2  date, time, timestamp, ROWID, floating point, integer, or decimal value to a character value.

DATE

Converts a value representing a date to a Db2 date. The value to be converted can be a Db2 timestamp, a Db2 date, a positive integer, or a character string.

DAY

Returns the day portion of a Db2 date or timestamp.

DAYOFMONTH

Similar to DAY except DAYOFMONTH cannot accept a date duration or time duration as an argument.

DAYOFWEEK

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the day of the week. The value 1 represents Sunday, 2 Monday, 3 Tuesday, 4 Wednesday, 5 Thursday, and so on.

DAYOFYEAR

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the day within the year. The value 1 represents January 1st, 2 January 2nd, and so on.

DAYS

Converts a Db2 date or timestamp into an integer value representing one more than the number of days since January 1, 0001.

EXTRACT

Returns a portion of a date or timestamp; you can use EXTRACT to slice up a date/time value into its component pieces.

HOUR

Returns the hour portion of a time, a timestamp, or a duration.

JULIAN_DAY

Converts a Db2 date or timestamp, or character representation of a date or timestamp, into an integer value representing the number of days from January 1, 4712 BC to the date specified in the argument.

MICROSECOND

Returns the microsecond component of a timestamp or the character representation of a timestamp.

MIDNIGHT_SECONDS

Returns the number of seconds since midnight for the specified argument which must be a time, timestamp, or character representation of a time or timestamp.

MINUTE

Returns the minute portion of a time, a timestamp, a character representation of a time or timestamp, or a duration.

MONTH

Returns the month portion of a date, a timestamp, a character representation of a date or timestamp, or a duration.

MONTHS_BETWEEN

Returns an estimate of the number of months between two expressions.

QUARTER

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the quarter within the year. The value 1 represents first quarter, 2 second quarter, 3 third quarter, and 4 fourth quarter.

SECOND

Returns the seconds portion of a time, a timestamp, a character representation of a time or timestamp, or a duration.

TIME

Converts a value representing a valid time to a Db2 time. The value to be converted can be a Db2 timestamp, a Db2 time, or a character string.

TIMESTAMP

Obtains a timestamp from another timestamp, a valid character-string representation of a timestamp, or a combination of date and time values (without a time zone).

TIMESTAMPADD

Adds an interval to a timestamp.

TIMESTAMPDIFF

Subtracts two timestamps and returns an interval.

TIMESTAMP_FORMAT

Changes the display format for a timestamp value.

TIMESTAMP_TZ

Returns a timestamp with time zone value.

WEEK

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the week within the year (with Sunday as the first day of the week). The value 1 represents the first week of the year, 2 the second week, and so on.

WEEK_ISO

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the week within the year (with Monday as the first day of the week).

YEAR

Returns the year portion of a date, a timestamp, or a duration.

A Caveat

Keep in mind that this post addresses date and time data in Db2 for z/OS. Yes, other RDBMS products also support date and time data, but there will be differences and nuances that you will need to understand for each. Here is a nice blog post that discusses working with dates on Oracle, SQL Server, PostgreSQL, and MariaDB if you are interested.

Summary

Using date and time data correctly can be a bit confusing, but the rewards of learning how to use Db2 to help you properly use date and time data are numerous. Do not continue to operate in the void. The wise Db2 professional will learn proper Db2 date and time usage and the vast support built into Db2 for manipulating date and time values.

And the sooner, the better!

That way you can let Db2 perform date and time formatting, integrity checking, and arithmetic… so you won’t have to!

Monday, January 03, 2022

Simplify and Speed Up Your Db2 for z/OS Data Copying and Movement

Register and attend a webinar on efficient Db2 data movement as presented by Craig S. Mullins on February 24, 2022 | 11:00 AM CST


Just about everybody is aware of the phenomenal data growth that organizations are experiencing these days. More data is being generated and stored than ever before. But not all of that data is unique or even production data. An underlying phenomenon contributing to data growth that is sometimes ignored is the vast amount of data that is copied and moved throughout every organization.

Data is constantly on the move. I wrote about this phenomenon in my recent column at Database Trends & Applications. From production systems to test systems, from one test system to another, and even from user to user. All of that data is required for multiple purposes, such as supporting transactional systems, mirroring production, reporting, performing analytics, for testing purposes, training, and more. And data is being refreshed and copied more frequently than ever before to support business requirements and changing technology needs such as agile development and DevOps.

How can your company support this myriad of data movement requirements without overworking your DBAs? What can be done to combat the sometimes excruciatingly long runtimes required to unload and load large tables using Db2 utilities? And let’s not forget that indexes and statistics need to be copied, too! Furthermore, nothing gets put on hold so that data can be copied, right? So how can test data management and data movement be automated to ensure that data is smoothly moved and refreshed even as application and database structure changes are made?

Attend Craig's webinar titled Simplify and Speed Up Your Db2 for z/OS Data Copying and Movement to learn more about these issues and to hear Craig’s insight on how to use BCV5 to automate, streamline, and speed up Db2 data movement based on his experiences using the tool in the field.

Thursday, December 23, 2021

Happy Holidays 2021-2022

 Just a quick blog post to wish everybody a very happy holiday season. Whether you celebrate Christmas, Hanukkah, Kwanza, or some other holiday season, all the best to you from me!


And a very Happy New Year to one and all. The past couple of years have been difficult and stressful for many, so here's a wish for a better 2022 for everybody.


I plan to take some time off over the last week of 2021 through New Year's Day, but I'll see you all again on the other side of the holiday season.


Monday, December 13, 2021

Top 10 Steps to Building the Right Indexes

One of the constant struggles with Db2 development, and indeed developing for any relational DBMS, is designing and creating appropriate indexes. Perhaps the most important thing you can do to assure optimal application performance when accessing data in a relational/SQL database is to create correct indexes for your tables based on the queries your applications use. Of course, this is easier said than done.

But we can start with some basics. For example, consider this SQL statement:

    SELECT   LASTNAME, SALARY
    FROM     EMP
    WHERE    EMPNO = '000010'
    AND      DEPTNO =  'D01';

What index (or indexes) would it make sense to build for this simple query? First, think about all the possible indexes that you could create. Your first short-list probably looks something like this:

  • Index1 on EMPNO
  • Index2 on DEPTNO
  • Index3 on EMPNO and DEPTNO

This is a good start, and Index3 is likely to be the best of the lot. It enables the optimizer to use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table, you might want to examine the impact of creating yet another index on the table.

With the caveat that appropriate index creation can be complicated, let’s look at the Top 10 things you can do to build the right indexes on your tables:

1. Index by workload, not by object

Many people make the mistake of just guessing at some indexes to create when they are creating other database objects (like tables and tablespaces). But without an idea of how the tables are going to be accessed these guesses are usually wrong – at least some of them.

Indexes should be built to optimize the access of your SQL queries. To properly create an optimal set of indexes requires a list of the SQL to be used, an estimate of the frequency that each SQL statement will be executed, and the importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries most of the time be made.

If you are doing it any other way, you are doing it wrong.

2. Build indexes based on predicates

3. Index most-heavily used queries

Numbers 2 and 3 can be thought of as corollaries to Number 1… that is, these are the aspects of application workload that need to be examined to produce appropriate and effective indexes. 

Look at the predicates of the queries in the workload and work to create a set of indexes that match up to most (if not all) of them. When it is not practical to build all of the indexes you have identified, then it makes sense to look at the queries that will be used most often and optimize them first.

4. Index important queries

The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. 

Of course, the decision should depend on the application’s importance to the business - not just on the user’s importance. For example, if that clerk runs a query hundreds of times a day and the CIO runs his query only once, then perhaps the clerk's query is more important.

5. Index to avoid sorting (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses, the optimizer can use an index to avoid a sort, and thereby potentially improve performance.

6. Create indexes for uniqueness (PK, U)

Some indexes are required in order to make the database schema valid. Most Database systems require that unique indexes be created when unique and primary key constraints exist.

7. Create indexes for foreign keys

Even if not required, creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity).

8. Consider adding columns for index-only access

Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access, all of the data needed to satisfy the query can be found in the index alone — without having to read data from the tablespace.

For example, suppose that there is an index on the DEPTNO column of the DEPT table. The following query may use this index:

    SELECT   DEPTNAME
    FROM     DEPT
    WHERE    DEPTNO =  'D01';

The index could be used to access only those columns with a DEPTNO greater than D00, but then the DBMS would need to access the data in the table space to return the DEPTNAME. If you added DEPTNAME to the index, that is, create the index on (DEPTNO, DEPTNAME) then all of the data needed for this query exists in the index and additional I/O to the table space would not be needed. This technique is sometimes referred to as index overloading.

Of course, this is not always a good idea. You have to take into account whether other queries use the index and how it might negatively impact their performance.

It is also worth mentioning index include columns, which allows you to define additional (included) columns that are not part of the actual key but are available in the index. So we can create an index like this:


CREATE INDEX IXNAME
    ON DEPT ( DEPTNO )
     INCLUDE ( DEPTNAME )
The index key is only on the DEPTNO column, but the DEPTNAME is also included in the index, so the earlier query can be satisfied using index-only access with this index.

9. Don’t arbitrarily limit the number of indexes

An example of a bad standard would be to place an artificial limit on indexing. There should be no arbitrary limit on the number of indexes that you can create for any database table. Indexes are undoubtedly one of the most important factors in creating efficient queries. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data. When a rule such as this exists, it usually is stated in the standards manual using verbiage something like “Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards. 

If you already have three indexes, or five indexes, or even 57 indexes, and another index will improve performance why would you arbitrarily want to avoid creating that index?

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.” 

Which brings us to…

10. Be aware of I/U/D implications

Db2 must automatically maintain every index you create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when you UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

So the general rule of thumb should be "Index until it hurts... and then back off the least effective index to make it no longer hurt." Sure, that is easier said than done, but it is a valid philosophy to follow.

Summary

Following these Top 10 index design techniques can go a long way toward improving not just your index usage, but also the performance of your database applications.

Monday, December 06, 2021

A Systematic Approach to View Creation

Historically, one of the most fertile grounds for disagreement between database professionals was the appropriate usage of views. Some analysts promote the liberal creation and usage of views, whereas others preach a more conservative approach.

When properly implemented and managed, views can be fantastic tools that help to ease data access and simplify development. Although views are simple to create and implement, few organizations take a systematic and logical approach to view creation. And therein lies the problem. A strategic and reasonable policy guiding the creation and maintenance of views is required to avoid a muddled and confused mish-mash of view usage. Basically, views are very useful when implemented wisely, but can be an administrative burden if implemented without planning.

Let's back up a minute and define what a view is.  First, remember that all SQL access to relational tables results in another table. This is called relational closure and it is a requirement of the relational model. A view can be considered to be a logical table. That is, a view is a "logical" representation of data that is "physically" stored in other tables (and perhaps other views as well). Views are defined using SQL and are represented internally to the DBMS by SELECT statements, not by stored data (unless it is a materialized view, which is another thing altogether and beyond the scope of this particular post). Almost any SQL that can be issued natively can be coded into a view.

View Implementation Rules

After you understand the basics of views, you should develop guidelines for view creation in order to limit administrative burden. The following rules can be used to ensure that views are created in a responsible and useful manner at your shop. These rules were developed over a number of years as a result of reviewing implementations and working with views in many different environments. There are three basic view implementation rules:

1. The View Usage Rule

2. The Proliferation Avoidance Rule

3. The View Synchronization Rule

The View Usage Rule: The first rule is simple: your view creation strategy should be goal-oriented. Views should be created only when they satisfy a specific application or business requirement. That requirement should be documented somewhere, preferably in a dictionary or repository. Although this rule may seem to be obvious, views are implemented at some shops without much thought as to how they will be used. This can cause the number of views that must be supported and maintained to continually expand until so many views exist that it is impossible to categorize their uses. And the time needed to maintain and administer the system increases as the number of views increases.

There are five basic uses for which views excel. These are: 

1) to provide row and column level security, 

2) to ensure efficient access paths, 

3) to mask complexity from the user, 

4) to ensure proper data derivation, and 

5) to rename tables and/or columns.

The second rule is the Proliferation Avoidance Rule. This rule is simple and to the point: do not needlessly create views (or, indeed, any database objects) that are not absolutely required. Whenever a database object is created additional entries are placed in the system catalog. Creating needless views causes "catalog clutter"…that is, entries in the catalog for objects which are not needed and/or are not used.

The proliferation avoidance rule is based on common sense. Why create something that is not needed? It just takes up space that could be used for something that is needed.

The third, and final view implementation rule is the View Synchronization Rule. The basic intention of this rule is to ensure that views are kept in sync with the base tables upon which they are based.

Whenever a change is made to a base table, all views that are dependent upon that base table should be analyzed to determine if the change impacts them. All views should remain logically pure. The view was created for a specific reason (if we followed the View Usage Rule above). The view should therefore remain useful for that specific reason. This can only be accomplished by ensuring that all subsequent changes that are pertinent to a specified usage are made to all views that satisfy that usage.

For example, say a view was created to satisfy an access usage, such as a join between the employee table and the department table. If a column is added to the employee table specifying the employee’s location, it should also be added to the EMP_DEPT view if it is pertinent to that view’s specific use. Of course, the column can be added to the table immediately and to the view at the earliest convenience of the development team.

The synchronization rule requires that strict change impact analysis procedures be in place. Every change to a base table should trigger the usage of these procedures. Simple SQL queries can be created to assist in the change impact analysis. These queries should pinpoint any SQL in QMF queries, application packages, dynamic packages, and so on. Policies for informing users of the views to be impacted also need to be established before view changes can be implemented.

View synchronization is needed to support the view usage rule. By keeping views in sync with table changes the original purpose of the view is maintained.

Synopsis

Views are practical and helpful when implemented in a systematic and thoughtful manner. Hopefully, this post has provided you with some food for thought pertaining to how views are implemented at your shop. 


And if you follow the guidelines contained in this article, in the end, all that will remain is a beautiful view!