Tuesday, February 01, 2022

Auditing Database Access and Change: A Necessity More Than a Nicety

The increasing burden of complying with government and industry regulations imposes significant, time-consuming requirements on IT projects and applications. And nowhere is the pressure to comply with regulations greater than on data stored in corporate databases.

Organizations must be hyper-vigilant as they implement controls to protect and monitor their data. One of the more useful techniques for protecting your company’s database data is data access auditing, also known simply as database auditing. Database auditing is the process of monitoring access to, and modification of, selected database objects and resources within operational databases, and retaining a detailed record of that access so that it can be retrieved and analyzed as needed.

A data access auditing capability enables companies to produce an audit trail of information with regard to their database data. This audit trail should contain information such as what database objects were impacted, who performed the operations, and when the activity occurred. A comprehensive audit trail of database operations, coupled with an analysis engine to review and analyze the audit trail, allows data and security professionals as well as IT auditors to perform an in-depth analysis of access and modification patterns against data in your database systems. Only when armed with such details is it possible to comply with regulations, pass security audits, and drill down into the details to review potential vulnerabilities for effective issue resolution.

A Look at the Regulations and Requirements

A fine-grained audit trail is necessary to comply with many regulations that apply to organizations of all types.

Many of the PCI Data Security Standard requirements emphasize the importance of real-time monitoring and tracking of access to cardholder data, as well as continuous assessment of database security health status.

HIPAA, the Health Insurance Portability and Accountability Act, directs health care providers to protect individuals’ health care information, going so far as to state that the provider must be able to deliver a list of everyone who even so much as looked at a patient’s information. Could you produce a list of everyone who looked at a specific row or set of rows in any database you manage?

And then there is the Sarbanes-Oxley Act (SOX), which has the goal of reducing fraud and conflicts of interest, as well as improving disclosure and financial reporting. Section 404 of SOX specifies that the CFO must guarantee the accuracy of the processes used to add up the numbers; processes that access and manipulate data in a database system. As such, it is important to be able to track who changed database schemata and database data for SOX compliance.

And these are only a few of the pertinent national, international, regional, and industry regulations that must be understood and complied with.

Database Access Auditing Techniques

So now that we understand why database access auditing is important, let’s take a look at how it can be accomplished. There are several popular techniques that can be deployed to audit your database structures.

The first technique is trace-based auditing, which is typically built directly into the native capabilities of the DBMS. An example is the audit trace feature of IBM Db2 for z/OS. When an audit trace is started, the DBMS begins to cut trace records when activity occurs against audited objects (selected by DDL option). However, Db2 only captures the first read or write per unit of work (UOW), which will clearly miss activities because most UOWs encompass more than one read or write. Alternatively, Db2 audit policies can be created for named tables to capture all activity, which improves the data captured, but can create an excess of audit records that need to be stored in SMF data sets.

So, there are problems with this technique including a high potential for performance degradation when audit tracing is enabled, a high probability that the database schema will need to be modified, and insufficient granularity of audit control, especially for reads.

Another technique is to scan and parse transaction logs. Every DBMS uses transaction logs to capture every database modification for recovery purposes. If you can read the log and interpret the data (which can be challenging as the data is not simple) it is possible to identify what data was changed and by which users. The biggest drawback to this technique is that database reads are not captured on transaction logs.

Additional issues with relying on log analysis for auditing data access include: it is possible to disable logging such that modification information will not be on the log and therefore not captured; performance issues scanning volumes and volumes of log files looking for only specific information to audit; and the difficulty of retaining logs over long periods for auditing when they were designed for short-term retention for database recovery.

And that brings us to the third, and preferred, method of database auditing for organizations that are serious about regulatory compliance: professional software that proactively monitors and intercepts all SQL requests as they are executed by the DBMS. It is important that all SQL access is audited by monitoring for SQL at the database level, not just by sniffing network calls. This is important because not every SQL request goes over the network, especially for the mainframe platform where much of the activity is centralized and many important business transactions never venture over an IP network (e.g., a CICS or IMS transaction accessing Db2).

Proactive intercept-based database audit monitoring does not require transaction logs, does not require database schema modification, should be highly granular in terms of specifying what to audit, and should incur only minimal overhead.

One such product that implements intercept-based auditing for Db2 database access is DBARS, which stands for “Db2 Access Recording Services,” available from ESAI Group.

DBARS

Using DBARS for your Db2 database auditing needs makes sense as it offers a high-speed method for intercepting all Db2 database activities, not just modifications but also all reads. Furthermore, DBARS does not rely on Db2 tracing; instead, it uses a proprietary interface to intercept all Db2 SQL statements, regardless of origin. Therefore, you can use DBARS to create audit reports of all Db2 activity. Furthermore, DBARS provides the ability to block suspicious SQL activity, giving you the ability to prevent fraudulent access attempts based on specific parameters and patterns.

As you look into your mainframe Db2 database auditing needs and compare functionality against the advice proffered in this article, you will see that DBARS offers all of the functions needed for auditing access to sensitive data in Db2 tables.

Important Features for a Database Auditing Solution

As you investigate the database access auditing requirements for your organization, you should make sure that the solutions you examine support your DBMS using intercept-based auditing, instead of the other methods.

You should also compile a list of the types of questions that you want your solution to be able to answer. A good database access auditing solution should be able to provide answers to at least the following questions:

  • Who accessed the data?
  • At what date and time was the data accessed?
  • What program or client software was used to access the data?
  • For batch mainframe users, what was the z/OS job name?
  • From what location was the request issued?
  • For distributed Db2 access, what were the names of the external server, application, and workstation?
  • What SQL was issued to access the data?
  • Was the request successful; and if so, how many rows of data were accessed or modified?
  • If the request was a modification, what data was changed? (A before and after image of the change should be accessible.)

Of course, there are numerous details that must be investigated for each of these questions. You will want to be able to review recent activities, but you will also want to be able to review actions that happened in the past, so a robust database access auditing solution should provide an independent mechanism for the long-term storage and access of audit details. It should be easy to query the audit trail, perhaps even offering canned queries for the most common types of queries. Nonetheless, the audit information should be accessible using industry-standard query tools to make it easier for auditors to customize queries as necessary.
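The questions listed above map fairly directly onto the fields of an audit record. The following is a minimal Python sketch of such a record and one simple query against a trail of them. The class, the field names, and the sample values are all hypothetical illustrations; they are not the record layout of DBARS or of any other product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One audited SQL request; every field name here is illustrative."""
    user_id: str               # who accessed the data
    accessed_at: datetime      # date and time of the access
    program: str               # program or client software used
    job_name: Optional[str]    # z/OS job name, for batch access
    location: str              # where the request was issued from
    sql_text: str              # the SQL that was issued
    succeeded: bool            # whether the request was successful
    rows_affected: int         # rows accessed or modified


def accesses_by_user(trail: List[AuditRecord], user_id: str) -> List[AuditRecord]:
    """Return the successful requests issued by one user, oldest first."""
    hits = [r for r in trail if r.user_id == user_id and r.succeeded]
    return sorted(hits, key=lambda r: r.accessed_at)


# Hypothetical sample trail.
TRAIL = [
    AuditRecord("DBA01", datetime(2022, 1, 31, 9, 15), "SPUFI", None,
                "LOCAL", "SELECT * FROM EMP", True, 42),
    AuditRecord("BATCH7", datetime(2022, 1, 30, 2, 0), "PAYROLL", "PAYJOB01",
                "LOCAL", "UPDATE EMP SET SALARY = SALARY * 1.03", True, 500),
    AuditRecord("DBA01", datetime(2022, 1, 29, 23, 5), "SPUFI", None,
                "LOCAL", "DELETE FROM EMP", False, 0),
]

for record in accesses_by_user(TRAIL, "DBA01"):
    print(record.accessed_at.isoformat(), record.sql_text, record.rows_affected)
```

A real audit repository would hold far more detail (before and after images, for example), but even this small shape shows why a queryable store is more useful than raw trace output.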

An alerting capability is also desirable, such that when certain SQL activity is intercepted an alert is triggered to take further actions, such as recording an exception, sending information to a log, or pinging a DBA or security admin.

Advanced auditing solutions also provide the ability to proactively block suspect access to the database. For example, you may want to stop any attempted access outside of normal, scheduled programs over the weekend. At any rate, it is desirable for an auditing solution to be able to block activities based on parameters such as user name, program name, IP address, execution time, type of access, and the like. Such a capability is important because preventing fraudulent access is preferable to allowing it and reporting that it happened!
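To make the weekend example concrete, here is a small Python sketch of a blocking rule. It assumes a hypothetical allow-list of scheduled program names and looks only at the program name and the request time; a real product would evaluate many more parameters, and nothing here reflects how any particular tool is configured.

```python
from datetime import datetime
from typing import FrozenSet

# Hypothetical allow-list of scheduled programs permitted to run on weekends.
SCHEDULED_PROGRAMS: FrozenSet[str] = frozenset({"PAYROLL", "NIGHTLYLOAD"})


def should_block(program: str, requested_at: datetime) -> bool:
    """Block weekend access unless it comes from a scheduled program."""
    # weekday() numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
    is_weekend = requested_at.weekday() >= 5
    return is_weekend and program not in SCHEDULED_PROGRAMS


print(should_block("SPUFI", datetime(2022, 1, 29, 14, 0)))    # Saturday, ad hoc tool: True
print(should_block("PAYROLL", datetime(2022, 1, 29, 14, 0)))  # Saturday, scheduled job: False
print(should_block("SPUFI", datetime(2022, 1, 31, 14, 0)))    # Monday, ad hoc tool: False
```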

It is also important for a comprehensive database auditing solution to provide a mechanism to audit privileged users, such as DBAs and SYSADMs. Many privileged users have blanket access to all corporate data. Although they can access and modify it at their discretion, they should not be accessing and modifying production data without due cause. A database auditing solution enables organizations to implement a “trust, but verify” policy with their privileged users. This allows the administrators to retain the authority they need to be able to do their jobs, while at the same time giving the organization the peace of mind that everything the privileged users are doing is tracked for security and compliance purposes. Without a database auditing solution in place, privileged users are a potential compliance problem lurking within every database implementation.

The Benefits of a Professional Database Auditing Solution

The bottom line is that database auditing should be a crucial component of your organization’s data protection strategy. Auditing database activity is a core requirement of compliance with many government and industry regulations, but auditing is also an essential component of securing and protecting the important production data in your database systems.

Be sure to study the auditing and compliance requirements of your organization and to augment your DBMS with the appropriate tools to bolster the auditability of your databases.

Monday, January 24, 2022

Db2 DATE and TIME Arithmetic

After turning the last page on the 2021 calendar to embark on another year, I thought it might be a good time to reflect on date and time data... and how it can be used in Db2 for z/OS.


One of the most important database design decisions you will make – for Db2 or, indeed, any relational DBMS – is to use the proper data types for your columns when building tables. The data type that you choose for each column should be the one that most closely matches the domain of values that the column can be used to store.

Perhaps the most important design choice is to use the date and time data types that are available to you; for Db2 this is DATE, TIME, and TIMESTAMP. The ability to store dates as a native Db2 data type is a great advantage. If you need to store date information in your Db2 tables you should favor using the DATE data type instead of forcing a date into some other data type (such as CHAR or DECIMAL). Many a database design has been ruined (in my opinion) because somebody decided to store date data in a CHAR column.

When Db2 knows that the data should be a DATE or a TIME it can force data integrity such that no non-date/time value could ever be stored in the column. This is a big advantage, but it is not the only one. Db2 also provides numerous display formats so date and time values can be displayed in many different ways without having to store them in specific display formats.

Another big reason is that Db2 allows users to perform date/time arithmetic. So, you can easily use date columns to calculate durations or past and future dates based on a number of days, months, and years. The same type of arithmetic can be used for time and timestamp data. Just think about the application code you would have to write to manipulate and manage date/time values!

How Does Date/Time Arithmetic Work?

Db2 enables you to add and subtract DATE, TIME, and TIMESTAMP columns. In addition, you can add date and time durations to, or subtract them from, such columns.

Arithmetic on date and time data is fairly straightforward. The plus (+) and minus (-) operations can be used on date and time values and durations. A duration is a number used to represent an interval of time. Db2 recognizes four types of durations.

1.       A labeled duration explicitly specifies the type of duration. An example of a labeled duration is 15 MINUTES. Labeled durations can specify the duration in years, months, days, hours, minutes, seconds, or microseconds.

2.       A DATE duration is a DECIMAL(8,0) number that has the format YYYYMMDD. The YYYY represents the number of years in the duration, MM the number of months, and DD the number of days. When you subtract one date from another, the result is a date duration in this format.

3.       A TIME duration is a DECIMAL(6,0) number with the format HHMMSS. The HH represents the number of hours, MM the number of minutes, and SS the number of seconds. When you subtract one time from another, the result is a time duration in this format.

4.       A TIMESTAMP duration is more complex than date and time durations. The TIMESTAMP duration is a DECIMAL(20,6) number having the format YYYYXXDDHHMMSS.ZZZZZZ. The duration represents YYYY years, XX months, DD days, HH hours, MM minutes, SS seconds, and ZZZZZZ microseconds. When you subtract a TIMESTAMP from a TIMESTAMP, you get a TIMESTAMP duration.
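Because date and time durations are just decimal numbers with positional meaning, it can help to see one taken apart. The following Python sketch splits a date duration and a time duration into their components; the function names are my own, and for simplicity the sketch ignores the sign of a negative duration.

```python
from typing import Tuple


def split_date_duration(duration: int) -> Tuple[int, int, int]:
    """Split a YYYYMMDD date duration into (years, months, days)."""
    years, rest = divmod(abs(duration), 10_000)
    months, days = divmod(rest, 100)
    return years, months, days


def split_time_duration(duration: int) -> Tuple[int, int, int]:
    """Split an HHMMSS time duration into (hours, minutes, seconds)."""
    hours, rest = divmod(abs(duration), 10_000)
    minutes, seconds = divmod(rest, 100)
    return hours, minutes, seconds


print(split_date_duration(300))      # (0, 3, 0): three months
print(split_date_duration(10215))    # (1, 2, 15): 1 year, 2 months, 15 days
print(split_time_duration(13015))    # (1, 30, 15): 1 hour, 30 minutes, 15 seconds
```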

So, if you want to add one hour to a TIME column you can simply specify TIME_COL + 1 HOUR. Or you can subtract a day from a date column just as easily, such as DATE_COL - 1 DAY. Simple, right?

Well, the rules for date and time arithmetic are somewhat complex. Remember that only addition and subtraction can be performed on date and time data (no division or multiplication). In addition, when adding, one of the two operands must be a duration. This stands to reason: two dates cannot be added together, but a duration can be added to a date. The same goes for two times. Subtraction is more flexible, because one date (or time) can be subtracted from another to yield a duration.

Use date and time arithmetic with care. If you understand the capabilities and features of date and time arithmetic, you should have few problems implementing it. Keep the following rules in mind:

  • When you issue date arithmetic statements using durations, do not try to establish a common conversion factor between durations of different types. For example, the date arithmetic statement:

     DATE('2014-04-03') - 1 MONTH

    is not equivalent to the statement:

     DATE('2014-04-03') - 30 DAYS

    It is tempting to treat one month as 30 days, but the month being crossed here is March, which has 31 days. The result of the first statement is 2014-03-03, but the result of the second statement is 2014-03-04. In general, use like durations (for example, use months or use days, but not both) when you issue date arithmetic.

  • If one operand is a date, the other operand must be a date or a date duration (including a labeled duration of years, months, or days). If one operand is a time, the other operand must be a time or a time duration (including a labeled duration of hours, minutes, or seconds). You cannot mix date values with time durations, or time values with date durations.

  • If one operand is a timestamp, the other operand can be any type of duration: a date duration, a time duration, a timestamp duration, or a labeled duration. Two timestamps cannot be added together, but one timestamp can be subtracted from another to produce a timestamp duration.
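The first rule is easy to check outside of Db2. The Python sketch below compares subtracting one month with subtracting 30 days from the same starting date. Python has no built-in month arithmetic, so `add_months` is a helper of my own; it assumes that a day that does not exist in the target month is clamped to the last day of that month.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the end of the month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


start = date(2014, 4, 3)
print(add_months(start, -1))         # 2014-03-03
print(start - timedelta(days=30))    # 2014-03-04
```

The one-day difference comes entirely from March having 31 days, which is why mixing month and day durations as if they were interchangeable leads to surprises.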

Using Date/Time Functions

Db2 also provides numerous date/time functions that can be used to simply manipulate and modify date/time values. Let’s take a look at an example.

Suppose you want to express the duration resulting from date subtraction as a total number of days (an exact total, not an approximate one). Consider this query:

     SELECT DATE('03/01/2004') - '12/01/2003'
     FROM   SYSIBM.SYSDUMMY1;

It returns a duration of 00000300 (that is, 3 months). And those 3 months encompass a 29-day February plus a 31-day January plus a 31-day December (total 91 days). So the answer that we want to return is 91.

The answer to this conundrum lies in using the DAYS function. The following will return the result as a number of days:

     SELECT DAYS('03/01/2004') - DAYS('12/01/2003')
     FROM   SYSIBM.SYSDUMMY1;

This query will return to you the exact number of days between the two dates. The DAYS function converts a Db2 date or timestamp into an integer value representing one more than the number of days since January 1, 0001. So converting both dates using the DAYS function and subtracting yields the desired result.
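The two results can be reproduced in Python as a sanity check. In the sketch below, `days` relies on the fact that Python's `date.toordinal()` also numbers January 1, 0001 as day 1, and `date_duration` follows the borrow-days-then-months approach to date subtraction as I understand it from the Db2 documentation. Both function names are my own, and `date_duration` assumes the first date is not earlier than the second.

```python
import calendar
from datetime import date


def days(d: date) -> int:
    """Mimic DAYS: one more than the number of days since January 1, 0001."""
    return d.toordinal()


def date_duration(later: date, earlier: date) -> int:
    """Subtract two dates and return a YYYYMMDD duration (later >= earlier)."""
    year2, month2 = earlier.year, earlier.month
    if earlier.day <= later.day:
        day_part = later.day - earlier.day
    else:
        # Borrow the number of days in the earlier date's month.
        borrowed = calendar.monthrange(earlier.year, earlier.month)[1]
        day_part = borrowed + later.day - earlier.day
        month2 += 1
    if month2 <= later.month:
        month_part = later.month - month2
    else:
        # Borrow twelve months from the year.
        month_part = 12 + later.month - month2
        year2 += 1
    return (later.year - year2) * 10_000 + month_part * 100 + day_part


d1, d2 = date(2004, 3, 1), date(2003, 12, 1)
print(date_duration(d1, d2))   # 300
print(days(d1) - days(d2))     # 91
```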

The functions provided by Db2 that can be applied to DATE, TIME, and TIMESTAMP columns can greatly assist your development efforts when using date/time data. Chances are you will find a function to help you with your task at hand, depending on your specific needs. Consider the functions listed in the following table.

Db2 Date/Time Functions

CHAR

Converts a Db2 date, time, timestamp, ROWID, floating point, integer, or decimal value to a character value.

DATE

Converts a value representing a date to a Db2 date. The value to be converted can be a Db2 timestamp, a Db2 date, a positive integer, or a character string.

DAY

Returns the day portion of a Db2 date or timestamp.

DAYOFMONTH

Similar to DAY except DAYOFMONTH cannot accept a date duration or time duration as an argument.

DAYOFWEEK

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the day of the week. The value 1 represents Sunday, 2 Monday, 3 Tuesday, 4 Wednesday, 5 Thursday, and so on.

DAYOFYEAR

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the day within the year. The value 1 represents January 1st, 2 January 2nd, and so on.

DAYS

Converts a Db2 date or timestamp into an integer value representing one more than the number of days since January 1, 0001.

EXTRACT

Returns a portion of a date or timestamp; you can use EXTRACT to slice up a date/time value into its component pieces.

HOUR

Returns the hour portion of a time, a timestamp, or a duration.

JULIAN_DAY

Converts a Db2 date or timestamp, or character representation of a date or timestamp, into an integer value representing the number of days from January 1, 4712 BC to the date specified in the argument.

MICROSECOND

Returns the microsecond component of a timestamp or the character representation of a timestamp.

MIDNIGHT_SECONDS

Returns the number of seconds since midnight for the specified argument which must be a time, timestamp, or character representation of a time or timestamp.

MINUTE

Returns the minute portion of a time, a timestamp, a character representation of a time or timestamp, or a duration.

MONTH

Returns the month portion of a date, a timestamp, a character representation of a date or timestamp, or a duration.

MONTHS_BETWEEN

Returns an estimate of the number of months between two expressions.

QUARTER

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the quarter within the year. The value 1 represents first quarter, 2 second quarter, 3 third quarter, and 4 fourth quarter.

SECOND

Returns the seconds portion of a time, a timestamp, a character representation of a time or timestamp, or a duration.

TIME

Converts a value representing a valid time to a Db2 time. The value to be converted can be a Db2 timestamp, a Db2 time, or a character string.

TIMESTAMP

Obtains a timestamp from another timestamp, a valid character-string representation of a timestamp, or a combination of date and time values (without a time zone).

TIMESTAMPADD

Adds an interval to a timestamp.

TIMESTAMPDIFF

Returns an estimated number of intervals of a specified type (such as days or months) based on the difference between two timestamps.

TIMESTAMP_FORMAT

Returns a timestamp from a character string that is interpreted using a specified format template.

TIMESTAMP_TZ

Returns a timestamp with time zone value.

WEEK

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the week within the year (with Sunday as the first day of the week). The value 1 represents the first week of the year, 2 the second week, and so on.

WEEK_ISO

Converts a date, timestamp, or string representation of a date or timestamp into an integer that represents the week within the year (with Monday as the first day of the week).

YEAR

Returns the year portion of a date, a timestamp, or a duration.
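If you want to sanity-check a few of these functions away from Db2, several have close Python equivalents. The sketch below mirrors the descriptions in the table for DAYOFWEEK, DAYOFYEAR, QUARTER, WEEK_ISO, and MIDNIGHT_SECONDS; the Python function names are my own, and edge-case behavior in Db2 should always be confirmed against the Db2 documentation.

```python
from datetime import date, datetime


def dayofweek(d: date) -> int:
    """1 = Sunday through 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def dayofyear(d: date) -> int:
    """1 = January 1st."""
    return d.timetuple().tm_yday


def quarter(d: date) -> int:
    """1 through 4."""
    return (d.month - 1) // 3 + 1


def week_iso(d: date) -> int:
    """ISO week of the year, with Monday as the first day of the week."""
    return d.isocalendar()[1]


def midnight_seconds(ts: datetime) -> int:
    """Seconds elapsed since midnight."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


monday = date(2022, 1, 24)
print(dayofweek(monday), dayofyear(monday), quarter(monday), week_iso(monday))  # 2 24 1 4
print(midnight_seconds(datetime(2022, 1, 24, 1, 30, 15)))                       # 5415
```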

A Caveat

Keep in mind that this post addresses date and time data in Db2 for z/OS. Yes, other RDBMS products also support date and time data, but there will be differences and nuances that you will need to understand for each. Here is a nice blog post that discusses working with dates on Oracle, SQL Server, PostgreSQL, and MariaDB if you are interested.

Summary

Using date and time data correctly can be a bit confusing, but the rewards of learning how to use Db2 to help you properly use date and time data are numerous. Do not continue to operate in the void. The wise Db2 professional will learn proper Db2 date and time usage and the vast support built into Db2 for manipulating date and time values.

And the sooner, the better!

That way you can let Db2 perform date and time formatting, integrity checking, and arithmetic… so you won’t have to!

Monday, January 03, 2022

Simplify and Speed Up Your Db2 for z/OS Data Copying and Movement

Register and attend a webinar on efficient Db2 data movement as presented by Craig S. Mullins on February 24, 2022 | 11:00 AM CST


Just about everybody is aware of the phenomenal data growth that organizations are experiencing these days. More data is being generated and stored than ever before. But not all of that data is unique or even production data. An underlying phenomenon contributing to data growth that is sometimes ignored is the vast amount of data that is copied and moved throughout every organization.

Data is constantly on the move: from production systems to test systems, from one test system to another, and even from user to user. I wrote about this phenomenon in my recent column at Database Trends & Applications. All of that data is required for multiple purposes, such as supporting transactional systems, mirroring production, reporting, performing analytics, testing, training, and more. And data is being refreshed and copied more frequently than ever before to support business requirements and changing technology needs such as agile development and DevOps.

How can your company support this myriad of data movement requirements without overworking your DBAs? What can be done to combat the sometimes excruciatingly long runtimes required to unload and load large tables using Db2 utilities? And let’s not forget that indexes and statistics need to be copied, too! Furthermore, nothing gets put on hold so that data can be copied, right? So how can test data management and data movement be automated to ensure that data is smoothly moved and refreshed even as application and database structure changes are made?

Attend Craig's webinar titled Simplify and Speed Up Your Db2 for z/OS Data Copying and Movement to learn more about these issues and to hear Craig’s insight on how to use BCV5 to automate, streamline, and speed up Db2 data movement based on his experiences using the tool in the field.

Thursday, December 23, 2021

Happy Holidays 2021-2022

Just a quick blog post to wish everybody a very happy holiday season. Whether you celebrate Christmas, Hanukkah, Kwanzaa, or some other holiday, all the best to you from me!


And a very Happy New Year to one and all. The past couple of years have been difficult and stressful for many, so here's a wish for a better 2022 for everybody.


I plan to take some time off over the last week of 2021 through New Year's Day, but I'll see you all again on the other side of the holiday season.


Monday, December 13, 2021

Top 10 Steps to Building the Right Indexes

One of the constant struggles with Db2 development, and indeed developing for any relational DBMS, is designing and creating appropriate indexes. Perhaps the most important thing you can do to assure optimal application performance when accessing data in a relational/SQL database is to create correct indexes for your tables based on the queries your applications use. Of course, this is easier said than done.

But we can start with some basics. For example, consider this SQL statement:

    SELECT   LASTNAME, SALARY
    FROM     EMP
    WHERE    EMPNO = '000010'
    AND      DEPTNO =  'D01';

What index (or indexes) would it make sense to build for this simple query? First, think about all the possible indexes that you could create. Your first short-list probably looks something like this:

  • Index1 on EMPNO
  • Index2 on DEPTNO
  • Index3 on EMPNO and DEPTNO

This is a good start, and Index3 is likely to be the best of the lot. It enables the optimizer to use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table, you might want to examine the impact of creating yet another index on the table.
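You can watch an optimizer make use of an index like Index3 without a Db2 subsystem at hand. The sketch below uses SQLite, through Python's built-in sqlite3 module, purely as a stand-in: its optimizer and its EXPLAIN QUERY PLAN output differ from Db2 and Db2's EXPLAIN, and the index name and table definition are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO TEXT, DEPTNO TEXT, LASTNAME TEXT, SALARY REAL)")
conn.execute("CREATE INDEX IX3_EMPNO_DEPTNO ON EMP (EMPNO, DEPTNO)")

QUERY = """
    SELECT LASTNAME, SALARY
    FROM   EMP
    WHERE  EMPNO = '000010'
    AND    DEPTNO = 'D01'
"""


def access_path(connection: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's description of how it will run the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)  # the fourth column holds the detail text


PLAN = access_path(conn, QUERY)
print(PLAN)
```

The printed plan names the two-column index, showing that both predicates are matched by a single index lookup.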

With the caveat that appropriate index creation can be complicated, let’s look at the Top 10 things you can do to build the right indexes on your tables:

1. Index by workload, not by object

Many people make the mistake of just guessing at some indexes to create when they are creating other database objects (like tables and tablespaces). But without an idea of how the tables are going to be accessed these guesses are usually wrong – at least some of them.

Indexes should be built to optimize the access of your SQL queries. Properly creating an optimal set of indexes requires a list of the SQL to be used, an estimate of how frequently each SQL statement will be executed, and the importance of each query. Only then can you perform the delicate balancing act of creating the right indexes to optimize the right queries most of the time.

If you are doing it any other way, you are doing it wrong.

2. Build indexes based on predicates

3. Index most-heavily used queries

Numbers 2 and 3 can be thought of as corollaries to Number 1… that is, these are the aspects of application workload that need to be examined to produce appropriate and effective indexes. 

Look at the predicates of the queries in the workload and work to create a set of indexes that match up to most (if not all) of them. When it is not practical to build all of the indexes you have identified, then it makes sense to look at the queries that will be used most often and optimize them first.

4. Index important queries

The more important the query, the more you might want to tune by index creation. If you are coding a query that the CIO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. 

Of course, the decision should depend on the application’s importance to the business - not just on the user’s importance. For example, if that clerk runs a query hundreds of times a day and the CIO runs their query only once, then perhaps the clerk's query is more important.

5. Index to avoid sorting (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses, the optimizer can use an index to avoid a sort, and thereby potentially improve performance.
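The effect is easy to observe in a small experiment. The following sketch again uses SQLite via Python's sqlite3 module as a stand-in for Db2, so the plan text is SQLite's and not what Db2 EXPLAIN would show; the table and index names are my own. It captures the access path for an ORDER BY query before and after an index is created on the sort column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO TEXT, LASTNAME TEXT)")

QUERY = "SELECT LASTNAME FROM EMP ORDER BY LASTNAME"


def access_path(connection: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's description of how it will run the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)  # the fourth column holds the detail text


before = access_path(conn, QUERY)
conn.execute("CREATE INDEX IX_LASTNAME ON EMP (LASTNAME)")
after = access_path(conn, QUERY)

print("before:", before)
print("after: ", after)
```

Before the index exists, the plan includes a temporary B-tree built to do the sort; afterward, the rows are simply read in index order.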

6. Create indexes for uniqueness (PK, U)

Some indexes are required in order to make the database schema valid. Most database systems require that unique indexes be created when unique and primary key constraints exist.

7. Create indexes for foreign keys

Even if not required, creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity).

8. Consider adding columns for index-only access

Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access, all of the data needed to satisfy the query can be found in the index alone — without having to read data from the tablespace.

For example, suppose that there is an index on the DEPTNO column of the DEPT table. The following query may use this index:

    SELECT   DEPTNAME
    FROM     DEPT
    WHERE    DEPTNO =  'D01';

The index could be used to access only those rows with a DEPTNO equal to D01, but then the DBMS would need to access the data in the table space to return the DEPTNAME. If you added DEPTNAME to the index, that is, created the index on (DEPTNO, DEPTNAME), then all of the data needed for this query would exist in the index and additional I/O to the table space would not be needed. This technique is sometimes referred to as index overloading.

Of course, this is not always a good idea. You have to take into account whether other queries use the index and how it might negatively impact their performance.

It is also worth mentioning index include columns, which allow you to define additional (included) columns that are not part of the actual key but are available in the index. In Db2, the INCLUDE clause can be specified only for a unique index. So we can create an index like this:

    CREATE UNIQUE INDEX IXNAME
        ON DEPT ( DEPTNO )
        INCLUDE ( DEPTNAME );

The index key is only on the DEPTNO column, but the DEPTNAME is also included in the index, so the earlier query can be satisfied using index-only access with this index.
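Index-only access can also be demonstrated with a quick experiment. SQLite, used here through Python's sqlite3 module as a stand-in for Db2, has no INCLUDE clause, so this sketch uses the overloading approach of putting DEPTNAME into the key. The table definition and index names are my own, and SQLite labels index-only access as a "covering index" in its plan output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEPT (DEPTNO TEXT, DEPTNAME TEXT, MGRNO TEXT)")

QUERY = "SELECT DEPTNAME FROM DEPT WHERE DEPTNO = 'D01'"


def access_path(connection: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's description of how it will run the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)  # the fourth column holds the detail text


# Index on the key column only: the table must still be read for DEPTNAME.
conn.execute("CREATE INDEX IX_DEPTNO ON DEPT (DEPTNO)")
key_only = access_path(conn, QUERY)

# Overloaded index: every column the query needs is in the index.
conn.execute("DROP INDEX IX_DEPTNO")
conn.execute("CREATE INDEX IX_DEPTNO_NAME ON DEPT (DEPTNO, DEPTNAME)")
overloaded = access_path(conn, QUERY)

print("key only:  ", key_only)
print("overloaded:", overloaded)
```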

9. Don’t arbitrarily limit the number of indexes

An example of a bad standard would be to place an artificial limit on indexing. There should be no arbitrary limit on the number of indexes that you can create for any database table. Indexes are undoubtedly one of the most important factors in creating efficient queries. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data. When a rule such as this exists, it usually is stated in the standards manual using verbiage something like “Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards. 

If you already have three indexes, or five indexes, or even 57 indexes, and another index will improve performance why would you arbitrarily want to avoid creating that index?

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.” 

Which brings us to…

10. Be aware of I/U/D implications

Db2 must automatically maintain every index you create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when you UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

So the general rule of thumb should be "Index until it hurts... and then back off the least effective index to make it no longer hurt." Sure, that is easier said than done, but it is a valid philosophy to follow.

Summary

Following these Top 10 index design techniques can go a long way toward improving not just your index usage, but also the performance of your database applications.