
Tuesday, February 01, 2022

Auditing Database Access and Change: A Necessity More Than a Nicety

The increasing burden of complying with government and industry regulations imposes significant, time-consuming requirements on IT projects and applications. And nowhere is the pressure to comply with regulations greater than on data stored in corporate databases.

Organizations must be hyper-vigilant as they implement controls to protect and monitor their data. One of the more useful techniques for protecting your company’s database data is data access auditing, also known simply as database auditing. Database auditing is the process of monitoring access to, and modification of, selected database objects and resources within operational databases, and retaining a detailed record of that access so it can be retrieved and analyzed as needed.

A data access auditing capability enables companies to produce an audit trail of information about their database data. This audit trail should contain information such as what database objects were impacted, who performed the operations, and when the activity occurred. A comprehensive audit trail of database operations, coupled with an analysis engine to review and analyze it, allows data and security professionals, as well as IT auditors, to perform an in-depth analysis of access and modification patterns against the data in your database systems. Only when armed with such details is it possible to comply with regulations, pass security audits, and drill down into the details to review potential vulnerabilities for effective issue resolution.

A Look at the Regulations and Requirements

A fine-grained audit trail is necessary to comply with many regulations that apply to organizations of all types.

Many of the PCI Data Security Standard requirements emphasize the importance of real-time monitoring and tracking of access to cardholder data, as well as continuous assessment of database security health status.

HIPAA, the Health Insurance Portability and Accountability Act, directs health care providers to protect individuals’ health care information, going so far as to state that a provider must be able to deliver a list of everyone who so much as looked at a patient’s information. Could you produce a list of everyone who looked at a specific row or set of rows in any database you manage?

And then there is the Sarbanes-Oxley Act (SOX), which has the goal of reducing fraud and conflicts of interest, as well as improving disclosure and financial reporting. Section 404 of SOX specifies that the CFO must guarantee the accuracy of the processes used to add up the numbers; processes that access and manipulate data in a database system. As such, it is important for SOX compliance to be able to track who changed database schemata and database data.

And these are only a few of the pertinent national, international, regional, and industry regulations that must be understood and complied with.

Database Access Auditing Techniques

So now that we understand why database access auditing is important, let’s take a look at how it can be accomplished. There are several popular techniques that can be deployed to audit your database structures.

The first technique is trace-based auditing, which is typically built directly into the native capabilities of the DBMS; for example, the audit trace feature of IBM Db2 for z/OS. When an audit trace is started, the DBMS begins to cut trace records when activity occurs against audited objects (those selected using the AUDIT clause in DDL). However, Db2 captures only the first read or write per unit of work, which will clearly miss activity because most UOWs encompass more than one read or write. Alternatively, Db2 audit policies can be created for named tables to capture all activity, which improves the data captured but can create an excess of audit records that must be stored in SMF data sets.
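
To make this more concrete, here is a rough sketch of both approaches; the table name and audit policy name are made up for illustration, and the exact options should be confirmed against your Db2 documentation:

-- Classic audit trace: flag the table as auditable, then start the trace
ALTER TABLE PROD.CUSTOMERS AUDIT ALL;
-START TRACE (AUDIT) CLASS(4,5) DEST(SMF)

-- Audit policy: insert a policy row, then start a trace that references it
-- (EXECUTE 'A' requests auditing of all access, not just changes)
INSERT INTO SYSIBM.SYSAUDITPOLICIES
       (AUDITPOLICYNAME, OBJECTSCHEMA, OBJECTNAME, OBJECTTYPE, EXECUTE)
VALUES ('AUDITCUST', 'PROD', 'CUSTOMERS', 'T', 'A');
-START TRACE (AUDIT) DEST(SMF) AUDTPLCY(AUDITCUST)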

So there are problems with this technique, including a high potential for performance degradation when audit tracing is enabled, a high probability that the database schema will need to be modified, and insufficient granularity of audit control, especially for reads.

Another technique is to scan and parse transaction logs. Every DBMS uses transaction logs to capture every database modification for recovery purposes. If you can read the log and interpret the data (which can be challenging as the data is not simple) it is possible to identify what data was changed and by which users. The biggest drawback to this technique is that database reads are not captured on transaction logs.

Additional issues with relying on log analysis for auditing data access include: it is possible to disable logging such that modification information will not be on the log and therefore not captured; performance issues scanning volumes and volumes of log files looking for only specific information to audit; and the difficulty of retaining logs over long periods for auditing when they were designed for short-term retention for database recovery.

And that brings us to the third, and preferred, method of database auditing for organizations that are serious about regulatory compliance: professional software that proactively monitors and intercepts all SQL requests as they are executed by the DBMS. All SQL access should be audited by monitoring for SQL at the database level, not just by sniffing network calls, because not every SQL request goes over the network. This is especially true on the mainframe platform, where much of the activity is centralized and many important business transactions never venture over an IP network (e.g., a CICS or IMS transaction accessing Db2).

Proactive intercept-based database audit monitoring does not require transaction logs, does not require database schema modification, should be highly granular in terms of specifying what to audit, and should incur only minimal overhead.

One such product that implements intercept-based auditing for Db2 database access is DBARS, which stands for “Db2 Access Recording Services,” available from ESAI Group.

DBARS

Using DBARS for your Db2 database auditing needs makes sense as it offers a high-speed method for intercepting all Db2 database activities, not just modifications but also all reads. Furthermore, DBARS does not rely on Db2 tracing; instead, it uses a proprietary interface to intercept all Db2 SQL statements, regardless of origin. Therefore, you can use DBARS to create audit reports of all Db2 activity. DBARS also provides the ability to block suspicious SQL activity, so you can prevent fraudulent access attempts based on specific parameters and patterns.

As you look into your mainframe Db2 database auditing needs and compare functionality against the advice proffered in this article, you will see that DBARS offers all of the functions needed for auditing access to sensitive data in Db2 tables.

Important Features for a Database Auditing Solution

As you investigate the database access auditing requirements for your organization, you should make sure that the solutions you examine support your DBMS using intercept-based auditing, instead of the other methods.

You should also compile a list of the types of questions that you want your solution to be able to answer. A good database access auditing solution should be able to provide answers to at least the following questions:

  • Who accessed the data?
  • At what date and time was the data accessed?
  • What program or client software was used to access the data?
  • For batch mainframe users, what was the z/OS job name?
  • From what location was the request issued?
  • For distributed Db2 access, what were the names of the external server, application, and workstation?
  • What SQL was issued to access the data?
  • Was the request successful; and if so, how many rows of data were accessed or modified?
  • If the request was a modification, what data was changed? (A before and after image of the change should be accessible.)
Of course, there are numerous details that must be investigated for each of these questions. You will want to be able to review recent activities, but you will also want to be able to review actions that happened further in the past, so a robust database access auditing solution should provide an independent mechanism for the long-term storage and access of audit details. It should be easy to query the audit trail, perhaps with canned queries provided for the most common questions. Nonetheless, the audit information should be accessible using industry-standard query tools to make it easier for auditors to customize queries as necessary.
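
For illustration only, a query against a hypothetical audit trail repository might look something like the following; the AUDIT_TRAIL table and all of its column names are invented for this example, as every auditing product exposes its own repository schema:

SELECT AUTH_ID,        -- who accessed the data
       ACCESS_TS,      -- the date and time of the access
       PROGRAM_NAME,   -- the program or client software used
       JOB_NAME,       -- the z/OS job name for batch work
       CLIENT_IP,      -- where the request was issued from
       SQL_TEXT,       -- the SQL statement that was issued
       ROWS_AFFECTED   -- how many rows were accessed or modified
FROM   AUDIT_TRAIL
WHERE  OBJECT_SCHEMA = 'PROD'
AND    OBJECT_NAME   = 'CUSTOMERS'
AND    DATE(ACCESS_TS) BETWEEN '2022-01-01' AND '2022-01-31'
ORDER BY ACCESS_TS ;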

An alerting capability is also desirable, such that when certain SQL activity is intercepted an alert is triggered to take further actions, such as recording an exception, sending information to a log, or pinging a DBA or security admin.

Advanced auditing solutions also provide the ability to proactively block suspect access to the database. For example, you may want to stop any attempted access outside of normal, scheduled programs over the weekend. At any rate, it is desirable for an auditing solution to be able to block activities based on parameters such as user name, program name, IP address, execution time, type of access, and the like. Such a capability is important because preventing fraudulent access is preferable to allowing it and reporting that it happened!

It is also important for a comprehensive database auditing solution to provide a mechanism to audit privileged users, such as DBAs and SYSADMs. Many privileged users have blanket access to all corporate data. Although they can access and modify it at their discretion, they should not be accessing and modifying production data without due cause. A database auditing solution enables organizations to implement a “trust, but verify” policy with their privileged users. This allows the administrators to retain the authority they need to be able to do their jobs, while at the same time giving the organization the peace of mind that everything the privileged users are doing is tracked for security and compliance purposes. Without a database auditing solution in place, privileged users are a potential compliance problem lurking within every database implementation.

The Benefits of a Professional Database Auditing Solution

The bottom line is that database auditing should be a crucial component of your organization’s data protection strategy. Auditing database activity is a core requirement of compliance with many government and industry regulations, but auditing is also an essential component of securing and protecting the important production data in your database systems.

Be sure to study the auditing and compliance requirements of your organization and to augment your DBMS with the appropriate tools to bolster the auditability of your databases.

Friday, July 16, 2021

Keeping Track of Data Movement in Db2 for z/OS

Creating and managing test data for Db2 application development and testing requirements can be a significant challenge. To enable not only the development of new programs but also the maintenance of existing ones, organizations must ensure that an adequate amount of accurate test data is always available. Without relevant, useful data, there is no way to test applications to make sure they are operating correctly.

Although this duty must be a shared one between the application developers and the DBAs, managing and controlling all of the data movement tasks typically falls on the DBAs. And keeping track of what data moved where, when it moved, and why can at times be as much of a challenge as moving the data itself.

Fortunately, there are test data management tools available to not only move the data, but to keep track of it. I’ve written about one of the better Db2 for z/OS data movement tools here in the blog before: Fast and Effective Db2 for z/OS Test Data Management with BCV5. I hope you'll take a moment to click and read that post.

Now BCV5 has been improved with a new reporting feature that enables users to track the movement of data across their Db2 subsystems. This is a significant new capability that can be used to glean useful information for DBAs, storage administrators, and even data stewards performing data governance.

There are six tables of metadata that BCV5 populates to track the data movement and the copy tasks it performs. These tables are:
  • BCV531.TASK_EXECUTIONS
  • BCV531.TASK_EXECUTIONS_OBJECTS
  • BCV531.TASK_EXECUTIONS_PARAMETERS
  • BCV531.TASK_EXECUTIONS_JOBS
  • BCV531.TASK_EXECUTIONS_RULES
  • BCV531.TASK_EXECUTIONS_MASKING
The information in these tables is updated whenever BCV5 runs a task to copy Db2 data. Users can query these tables just like any other Db2 tables to monitor the details of the BCV5 tasks that have been run. This information may be useful for many different IT and business professionals, but let’s take a look at three specific use cases: 
  1. database administration (DBA), 
  2. storage administration, and 
  3. data governance.
DBA Tracking
DBAs tasked with moving and refreshing data from one Db2 environment to another are the typical users of BCV5, and therefore they will be one of the primary users of the new reporting tables. Most sites that use BCV5 use it to refresh test data, for example, copying production data to test, or copying unit test data to an integration test set of tables.  

Regardless of the type of data movement that is being undertaken, it is usually being done for multiple tables, tablespaces, and databases. Usually, there will be regularly scheduled processes that copy some of the data, but this is rarely sufficient as there will be one-off requests, special situations, and emergency data refreshes happening all the time. Keeping track of such a hectic morass of copying and refreshing data can be difficult.

Fortunately, if you are using BCV5 the new Usage Tracker tables can simplify keeping track of data refreshes for DBAs. For example, a DBA looking to find out which BCV5 copy tasks were run during the month of May could code a query like:

 SELECT T.ROWDATE, T.USER, T.TASKNAME 
 FROM BCV531.TASK_EXECUTIONS T 
 WHERE T.ROWDATE BETWEEN '2021-05-01' AND '2021-05-31' 
 ORDER BY T.ROWDATE ;

This will show all the BCV5 copy tasks that ran during that timeframe, and will look something like this:

ROWDATE      USER     TASKNAME 
---------+---------+---------+---------+--------- 
2021-05-12   USERID1  TSK0001 
2021-05-12   USERID1  TSK0002 
2021-05-20   USERID5  TSKPROD1 
2021-05-21   USERID9  TSKPROD4 
 

The results shown here are just a sample and will likely be a subset of the actual results of running such a query. 

Of course, this is rudimentary information and it is likely that the DBA will want to know more, such as which objects were impacted by these tasks. A query such as the following will come in handy:

SELECT SUBSTR(SRCSCHEMA,1,8) AS SS, 
       SUBSTR(SRCNAME,1,12) AS SN, 
       SUBSTR(TGTSCHEMA,1,8) AS TS, 
       SUBSTR(TGTNAME,1,12) AS TN, 
       SUBSTR(OBJTYPE,1,1) AS OT, 
       ROWDATE AS DATE_COPIED, 
       SIZEKB 
FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
ORDER BY DATE_COPIED ;

The results here list all the Db2 objects copied by BCV5, showing the source and target names as well as the object type, the date copied, and the amount of data (in KB) copied:

SS      SN       TS     TN       OT DATE_COPIED   SIZEKB 
---------+---------+---------+---------+---------+-----
DB500XA TS500X01 DBA001 TS500XA1 S  2021-05-17  1462480 
QUALID  XCL59011 TESTID XCL59011 X  2021-05-17   325040 
QUALID  XCL59012 TESTID XCL59012 X  2021-05-17   125200 
QUALID  XCL59013 TESTID XCL59013 X  2021-05-17   301460 
QUALID  XCL5901C TESTID XCL5901C X  2021-05-17    98400 
QUALID  TEST_TBL TESTID TEST_TBL T  2021-05-17       20 

Again, the results have been truncated as this is intended as an example.

A DBA looking to track down the results of a specific copy task that ran on a specific date might want to run a query like this, to verify which objects were copied. Simply plug in the name of your task and the date it ran:

SELECT T.TASKNAME, O.OBJTYPE, O.SRCSCHEMA, O.SRCNAME, 
       O.TGTSCHEMA, O.TGTNAME, O.PARTITIONS 
FROM   BCV531.TASK_EXECUTIONS         T, 
       BCV531.TASK_EXECUTIONS_OBJECTS O 
WHERE  T.ID = O.EXECID 
AND    T.TASKNAME = ? 
AND    T.ROWDATE = ? ;
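
For example, using values taken from the earlier sample report (task TSKPROD1, which ran on 2021-05-20), the parameter markers are simply replaced with literals:

SELECT T.TASKNAME, O.OBJTYPE, O.SRCSCHEMA, O.SRCNAME, 
       O.TGTSCHEMA, O.TGTNAME, O.PARTITIONS 
FROM   BCV531.TASK_EXECUTIONS         T, 
       BCV531.TASK_EXECUTIONS_OBJECTS O 
WHERE  T.ID = O.EXECID 
AND    T.TASKNAME = 'TSKPROD1' 
AND    T.ROWDATE = '2021-05-20' ;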

Storage Administration Tracking 
Another type of user who might find the Usage Tracker capabilities of BCV5 useful is the storage administrator. Storage administrators are responsible for managing an organization’s disk and tape systems. They are also responsible for monitoring storage usage and capacity to ensure that sufficient storage is available for the organization’s IT requirements.

As such, the storage administrator will likely want to keep an eye on the data movement activities of BCV5. For example, a query such as this one can be used to report on the total amount of data (TS and IX) copied by date:

SELECT ROWDATE AS DATE_COPIED, 
       SUM(SIZEKB) AS TOTAL_KB_COPIED 
FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
WHERE  OBJTYPE IN ('S', 'X') 
GROUP BY ROWDATE ;

Which will return data similar to this:

DATE_COPIED  TOTAL_KB_COPIED 
---------+---------+---------+---------+------ 
2021-06-12         106231270 
2021-06-19         106231270 
2021-06-21        4451457810 
2021-06-26         106231270 
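
For capacity planning purposes, the same information can also be rolled up by month. Here is one way to do it, assuming ROWDATE is a DATE column (which the date-range query shown earlier suggests):

WITH MONTHLY AS ( 
  SELECT YEAR(ROWDATE) AS COPY_YEAR, 
         MONTH(ROWDATE) AS COPY_MONTH, 
         SIZEKB 
  FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
  WHERE  OBJTYPE IN ('S', 'X') 
                ) 
SELECT COPY_YEAR, COPY_MONTH, 
       SUM(SIZEKB) AS TOTAL_KB_COPIED 
FROM   MONTHLY 
GROUP BY COPY_YEAR, COPY_MONTH 
ORDER BY COPY_YEAR, COPY_MONTH ;
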
Another potentially useful query, not only for storage administrators and DBAs, but also for application managers, is tracking the amount of actual (tablespace) data copied by date and application. Finding the application name or identifier can be tricky, but if we assume that an application identifier is embedded in the second and third characters of the database name, then a query like this can be run:

WITH SIZEBYAPP AS ( 
  SELECT ROWDATE AS DATE_COPIED, 
         SUBSTR(TGTSCHEMA,2,2) AS APPL_NAME, 
         SIZEKB AS SIZE_IN_KB 
  FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
  WHERE OBJTYPE = 'S' 
                  ) 
SELECT DATE_COPIED, APPL_NAME, 
       SUM(SIZE_IN_KB) AS TOTAL_KB_COPIED 
FROM   SIZEBYAPP 
GROUP BY DATE_COPIED, APPL_NAME ;

Which might return a report looking something like this:

DATE_COPIED APPL_NAME TOTAL_KB_COPIED 
---------+------------------+---------+------- 
2021-05-11  EN               46805760 
2021-05-22  BA              242791056 
2021-05-22  BX                4094640 
2021-05-22  CM                1008720 
2021-05-22  DA              270390816 
2021-05-22  OR                  90528 
2021-05-26  PR               55737376 
2021-05-26  XX              537647328 

You can adjust this query if you want to know the amount of index data copied by date and application, like so (under the same assumption as above for the application identifier):

WITH SIZEBYAPP AS ( 
  SELECT B.ROWDATE AS DATE_COPIED, 
         SUBSTR(T.DBNAME,2,2) AS APPL_NAME, 
         B.SIZEKB AS SIZE_IN_KB 
  FROM   BCV531.TASK_EXECUTIONS_OBJECTS B, 
         SYSIBM.SYSINDEXES              X, 
         SYSIBM.SYSTABLES               T 
 WHERE   B.OBJTYPE = 'X' 
 AND     B.TGTNAME = X.NAME 
 AND     B.TGTSCHEMA = X.CREATOR 
 AND     X.TBNAME = T.NAME 
 AND     X.TBCREATOR = T.CREATOR 
                 ) 
SELECT DATE_COPIED, APPL_NAME, 
       SUM(SIZE_IN_KB) AS TOTAL_KB_COPIED 
FROM   SIZEBYAPP 
GROUP BY DATE_COPIED, APPL_NAME ;

Data Governance Tracking 
Although tracking data movement activities is useful for DBAs, it is also important for data governance reporting. Data governance refers to the processes and standards of ensuring access to high-quality data throughout an organization. Data governance encompasses all aspects of data quality including its accuracy, availability, consistency, integrity, security, and usability. The role of data governance has grown as data privacy rules and regulations have expanded in response to an increasing number of data breaches and hacker attacks. For example, in early July 2021, Colorado passed the Colorado Privacy Act, meaning that Colorado, Virginia, and California have now all passed data privacy legislation that impacts how personal data must be governed. More states are certain to follow their lead… and let’s not forget the European Union’s GDPR!

These types of regulations provide rights for access, deletion, correction, portability, and protection for personally identifiable information, or PII. Note that “portability” is one aspect of data protection covered under the auspices of such regulations… and BCV5 is a mover of data, so you need to be able to track what data moved where, especially when the data that moved contains any PII.

So, what types of queries can be run using the new BCV5 reporting tables to help satisfy the needs of data governance? 

Well, if you have identified specific tables that contain personally identifiable information, and therefore require specific policies to ensure their privacy and protection, a data steward might want to run a query that shows all of the times that a specific protected table was copied:

SELECT SUBSTR(SRCSCHEMA,1,8) AS SS, 
       SUBSTR(SRCNAME,1,12) AS SN, 
       SUBSTR(TGTSCHEMA,1,8) AS TS, 
       SUBSTR(TGTNAME,1,12) AS TN, 
       ROWDATE AS DATE_COPIED, 
       SIZEKB 
FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
WHERE  OBJTYPE = 'T' 
AND    SRCSCHEMA = ? 
AND    SRCNAME = ? 
ORDER BY DATE_COPIED ;

Simply code the appropriate schema (SRCSCHEMA) and table name (SRCNAME) and this query will show all the times that the particular table (say PROD.CUSTOMERS) was copied. A data governance professional with a list of tables that contain PII could alter this query to accept that list as an IN clause instead of the simple equality clause, as shown in the sketch below.
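
For instance, assuming the protected tables all reside in the PROD schema (the table names used here are purely illustrative):

SELECT SUBSTR(SRCSCHEMA,1,8) AS SS, 
       SUBSTR(SRCNAME,1,12) AS SN, 
       SUBSTR(TGTSCHEMA,1,8) AS TS, 
       SUBSTR(TGTNAME,1,12) AS TN, 
       ROWDATE AS DATE_COPIED, 
       SIZEKB 
FROM   BCV531.TASK_EXECUTIONS_OBJECTS 
WHERE  OBJTYPE = 'T' 
AND    SRCSCHEMA = 'PROD' 
AND    SRCNAME IN ('CUSTOMERS', 'ACCOUNTS', 'PATIENTS') 
ORDER BY DATE_COPIED ;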

Additionally, BCV5 can de-identify sensitive data using masking. Whenever a task that requires data to be masked is run, information is captured in the TASK_EXECUTIONS_MASKING table. So, a data governance professional might want to run a query like this one to report on all the masking of sensitive data.

SELECT SUBSTR(TBCREATOR,1,8) AS TBCREATOR,
       SUBSTR(TBNAME,1,12) AS TBNAME, 
       SUBSTR(COLNAME,1,18) AS COLNAME, 
       METHOD AS MASKING_METHOD, 
       SQLEXPR 
FROM   BCV531.TASK_EXECUTIONS_MASKING 
ORDER BY ROWDATE ;

This can always be modified to join it to the TASK_EXECUTIONS table to obtain the task name if that is important; a sketch of such a join is shown below. And with a little manipulation of the query it is possible to look for tables that contain sensitive data that have been copied using BCV5, but have not had masking applied. 
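
The sketch assumes the masking table carries the same EXECID link back to TASK_EXECUTIONS that the objects table uses, which is worth verifying against your BCV5 installation:

SELECT T.TASKNAME, 
       SUBSTR(M.TBCREATOR,1,8) AS TBCREATOR, 
       SUBSTR(M.TBNAME,1,12) AS TBNAME, 
       SUBSTR(M.COLNAME,1,18) AS COLNAME, 
       M.METHOD AS MASKING_METHOD 
FROM   BCV531.TASK_EXECUTIONS         T, 
       BCV531.TASK_EXECUTIONS_MASKING M 
WHERE  T.ID = M.EXECID 
ORDER BY T.ROWDATE ;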

Summary 
BCV5 has offered powerful data movement and masking capabilities for Db2 data for a long time, but now it also offers the ability to track and report on your organization’s Db2 data movement and copy tasks. This new functionality opens up a wealth of useful information for BCV5 users.