Thursday, November 09, 2023

Top Ten Db2 Performance Tips – No. 9 Regular Maintenance and Housekeeping

When thinking about the performance characteristics of your Db2 databases and applications, keep in mind the routine maintenance that can help or hinder performance. Regular maintenance and housekeeping tasks are essential for ensuring the optimal performance and stability of your Db2 environment.

By performing routine maintenance activities, database administrators can proactively address performance degradation, optimize query execution, and maintain a healthy database environment. In today’s post, we will highlight the importance of regular maintenance tasks and discuss key activities that contribute to database performance.

Index Maintenance

Indexes play a crucial role in query performance, as they facilitate quick data retrieval. Over time, indexes can become fragmented, leading to increased disk I/O and decreased query performance. Regularly reorganizing indexes helps eliminate fragmentation and enhances query execution efficiency. By scheduling index reorganization tasks based on the fragmentation level and database activity, administrators can maintain optimal index performance and minimize the impact of fragmentation on query response times.

You should also monitor index usage and consider removing any unused indexes. You can identify unused indexes relatively easily using the LASTUSED information in the RTS SYSINDEXSPACESTATS table in the Db2 Catalog.
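
To illustrate, a query along these lines can surface indexes that have not been used recently. This is a hedged sketch: the 180-day cutoff is arbitrary, and you should verify the column names against the catalog level you are running.

    -- List indexes with no recorded use in the last 180 days (illustrative threshold)
    SELECT DBNAME, CREATOR, NAME, LASTUSED
    FROM   SYSIBM.SYSINDEXSPACESTATS
    WHERE  LASTUSED < CURRENT DATE - 180 DAYS
    ORDER  BY LASTUSED;

Of course, an index that has not been used recently is only a candidate for removal; confirm that it is not needed for infrequent (for example, year-end) processing before dropping it.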

Tablespace Reorganization

As your database grows and data is modified or deleted, storage space can become fragmented, leading to suboptimal performance. Regularly reorganizing database structures helps to consolidate data, reclaim unused space, and optimize storage allocation. Reorganizing your tablespaces can improve I/O performance, reduce disk fragmentation, and enhance overall system efficiency. By performing regular database reorganizations based on data growth patterns and workload characteristics, administrators can maintain a well-organized and performant database environment.
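
As a simple illustration, an online reorganization might be requested with a control statement along these lines. The database and table space names are placeholders, and depending on your Db2 version additional keywords (such as a mapping table for SHRLEVEL CHANGE) may be required.

    -- Hedged sketch: online REORG that also gathers inline statistics
    REORG TABLESPACE MYDB.MYTS
          SHRLEVEL CHANGE
          STATISTICS TABLE(ALL) INDEX(ALL)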

Up-to-Date Statistics

Accurate and up-to-date statistics are essential for the Db2 optimizer to make informed decisions on query execution plans. As data in the database changes, statistics need to be updated to reflect the current distribution of data. Regularly updating statistics (using the RUNSTATS utility) ensures that the optimizer has the most accurate information to generate optimal execution plans. By analyzing data distribution patterns and scheduling statistics updates accordingly, administrators can improve query performance and avoid suboptimal query plans caused by outdated statistics.

For packages using static SQL, taking advantage of updated statistics requires rebinding. However, you may not want to rebind every time you run RUNSTATS unless application performance is suffering.
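
A minimal sketch of the statistics refresh, with placeholder object names, might look like this:

    -- Refresh statistics for the table space and its indexes
    RUNSTATS TABLESPACE MYDB.MYTS
             TABLE(ALL) INDEX(ALL)
             SHRLEVEL REFERENCE

A subsequent REBIND PACKAGE(MYCOLL.MYPKG) command (again, placeholder names) issued through the DSN command processor would then let the static package pick up access paths based on the refreshed statistics.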

Routine Backups

Regular backups are vital for data protection and disaster recovery. Performing routine database backups not only safeguards the integrity of the database but also contributes to performance optimization. In the event of a failure or data loss, having a recent backup minimizes the recovery time and ensures business continuity. DBAs should establish a backup schedule based on the criticality of the data, recovery time objectives (RTOs), and workload requirements. 

And do not forget to regularly test your recovery plans and capabilities. Too often DBAs focus on backups at the expense of recovery… and backups are needed primarily to enable recovery, right?
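
For instance, a routine full image copy might be taken with a control statement like the following. The object name and output DD name are placeholders; SHRLEVEL CHANGE permits concurrent updates while the copy runs.

    -- Hedged sketch: full image copy allowing concurrent update activity
    COPY TABLESPACE MYDB.MYTS
         COPYDDN(SYSCOPY)
         FULL YES
         SHRLEVEL CHANGE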

Transaction Log Management

And let’s not forget the transaction logs! Logs play a critical role in ensuring data consistency and recoverability. Regularly monitoring and managing the transaction log space helps prevent log-related performance issues and ensures uninterrupted database operations. Activities such as transaction log backups, log file sizing, optimizing log offloading, and log file utilization monitoring are crucial for maintaining optimal transaction log performance and managing log space efficiently.
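
A quick way to keep an eye on things is the -DISPLAY LOG command, which typically reports the status of the logging environment, including the active log data sets, the checkpoint frequency in effect, and offload activity:

    -DISPLAY LOG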

Buffer Pool Review

Regularly monitoring the efficiency of your Db2 buffer pools is important to ensure that you are achieving the expected hit ratios and performance. As new applications are added, data volumes grow, and access patterns change, it will be necessary to adjust buffer pool sizes and parameters to optimize performance.
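
For example, commands like these can be used to review buffer pool activity in detail and, when warranted, to resize a pool (BP1 and the VPSIZE value shown are placeholders, not recommendations):

    -DISPLAY BUFFERPOOL(BP1) DETAIL
    -ALTER BUFFERPOOL(BP1) VPSIZE(50000)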

System Maintenance

Be sure to keep your Db2 subsystem updated with recent maintenance. You can use the -DIS GROUP command, even if you are not running data sharing, to display the current status of your Db2 software.

This command returns message DSN7100I, which is documented at https://www.ibm.com/docs/en/db2-for-zos/13?topic=messages-dsn7100i. It shows the current version and function level, the current code level, and the highest function level you can activate for your Db2 environment.

Be sure, too, to follow the IBM-recommended preventive maintenance strategy and apply the appropriate maintenance using the IBM-supplied RSU (Recommended Service Upgrade).

Summary

By incorporating these regular maintenance tasks into your database administration and management routine, DBAs and systems programmers can optimize performance, prevent performance degradation, and mitigate potential issues. Automation tools and scripts can streamline these maintenance activities and ensure consistency and timeliness in execution.

Monday, July 13, 2020

Take Advantage of the Wealth of Information in Your Db2 Logs

The Db2 for z/OS log, sometimes referred to as the transaction log, is a fundamental component of Db2 that is central to all data activity in the database management system. Every change to application data (with a few exceptions) is recorded serially in the log as the change is made. The log is a key resource for ensuring data integrity and recoverability of data. Using the logged information, Db2 can track which transaction made which changes to the database.

The log contains units of recovery, checkpoint data, control records, and other pieces of information needed to ensure that your data is successfully managed and changed appropriately. Log data is crucial for rolling back unwanted changes, recovering database objects, and resetting the database back to a particular point-in-time. 

During normal database application processing, SQL inserts, updates, and deletes are executed to modify data in the database. As these database modifications are made, they are recorded in the log. The Db2 transaction log is a write-ahead log. This means that changes are made to the transaction log before they are actually made to the data in the database itself. When the database modification has been fully recorded on the log, recovery of the transaction is guaranteed.

Periodically, a system checkpoint is taken by Db2 to guarantee that all log records and all modified database pages are written safely to disk. The frequency of database system checkpoints can be set up by the DBA using database configuration parameters – usually, checkpoint frequency is set either as a predetermined time interval or as a preset number of log records written. 

Generally, the following type of information is recorded on the database log: 
  • the beginning and ending time of each transaction
  • the actual changes made to the data and enough information to undo the modifications made during each transaction (accomplished using before and after images of the data)
  • the allocation and deallocation of database pages
  • the actual commit or rollback of each transaction
Using this data the DBMS can accomplish data integrity operations to ensure consistent data is maintained in the database. Of course, there are other “things” stored in the log, but I don’t want to get into an in-depth discussion of that. At this point, it is time to start thinking about all of the great information stored on the log, and how we can take advantage of that information to accomplish many different tasks.

Using the Db2 Log

There are many worthwhile uses for the data on the Db2 log other than the operational necessities as required by Db2 for z/OS itself. Because the log records data changes, it can aid in the delivery of data propagation, database auditing, surgically repairing changed data using undo and redo SQL, and undropping database objects. You can also use the Db2 log to report on all changes and identify erroneous changes as well as when they were made.

Of course, to use the log you need to understand the structure (schema) of the log data and how it is configured. Log records are not laid out simply, and it can take a long time to digest and understand the data. For this reason, many organizations look to acquire a product that delivers visibility into, and practical use of, log data.

And that brings us to UBS-Hainer’s ULT4Db2™.

ULT4Db2 is a powerful log analysis product that delivers multiple capabilities for DBAs to manage, control, and analyze Db2 logs. ULT4Db2 simplifies most tasks associated with the Db2 log with a user-friendly ISPF interface and comprehensive automation features. No need to understand the Byzantine layout and structure of the Db2 log because ULT4Db2 does most of the heavy lifting for you.

You can keep Db2 tables synchronized using the ULT4Db2 data propagation feature. It can be used to directly execute the same INSERT, UPDATE, and DELETE statements against different target tables. Or you can direct ULT4Db2 to write those statements into external data sets. If your target tables are in a different database system or a platform like Oracle, Microsoft SQL Server, or other DBMS, then you can change the syntax of the generated statements to suit your needs.

You can keep track of changes to sensitive information using the ULT4Db2 audit capability. Use it to track who made what change to Db2 tables, when the change was made, and what exactly was changed. You can analyze all the changes over a given period of time and filter by user name, plan name, column contents, or any other criteria.

The repair capability of ULT4Db2 makes it simple to undo a single change -- or a single transaction -- that affected one or more tables. Instead of backing out or recovering an entire database object, ULT4Db2 can create SQL statements that revert a specific change that happened at a given point-in-time. This is sometimes referred to as undo SQL. To generate undo SQL, the database log is read to find the data modifications that were applied during a given timeframe and
  • INSERTs are turned into DELETEs.
  • DELETEs are turned into INSERTs.
  • UPDATEs are turned around to UPDATE to the prior value.
This technique is also called transaction recovery. A traditional recovery specifies a database object, lays down a backup copy of that object, and then reapplies log entries to recover the object to a specific, desired point-in-time. Transaction recovery, by contrast, enables a user to recover a specific portion of data based on user-defined criteria, so only a portion of the data is affected. With ULT4Db2, these undo/redo statements can be written to an external data set so that DBAs can review them before running them, just as they would any other SQL statements.
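
As a simple illustration (the table and column names here are hypothetical, and this is not actual ULT4Db2 output), suppose the logged change was an update to a customer’s credit limit; the generated undo SQL simply restores the before image of the changed column:

    -- Original (logged) change:
    UPDATE CUST.CUSTOMER
       SET CREDIT_LIMIT = 10000
     WHERE CUST_ID = 12345;

    -- Generated undo SQL, restoring the before-image value:
    UPDATE CUST.CUSTOMER
       SET CREDIT_LIMIT = 5000
     WHERE CUST_ID = 12345;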

And the ULT4Db2 undrop feature makes it easy to restore an object that was accidentally dropped. Restoring an object to the state before the drop operation is usually tedious and error-prone, requiring a lot of resources and extra work for DBAs. But ULT4Db2 can bring back objects that have been accidentally dropped using information from the Db2 log and existing image copy data sets. The entire process is automated and does not require manual intervention. ULT4Db2 is able to undrop databases, table spaces, tables, and indexes. All foreign keys, check constraints, and table privileges are automatically recreated as well.

ULT4Db2 can generate a variety of reports that help you keep an overview of how your tables are used. It can summarize the INSERT, UPDATE, and DELETE activity for your tables by criteria such as unit-of-recovery, user name, or plan name. You can also produce a detailed report that contains each row as it was before and after each update. These are just a few of the many reports that are available using ULT4Db2.

Finally, you can automate your ULT4Db2 log analysis processes using the ISPF interface.

The Bottom Line

The Db2 for z/OS logs contain a plethora of useful data that can be exploited to better manage your Db2 environment and expose useful information for business purposes. Consider looking into a log analysis tool like ULT4Db2 to help you better control and access the large variety of useful business information embedded in your Db2 logs.

Thursday, September 04, 2014

The Importance of SLAs and RTOs

Assuring optimal performance is one of the most frequently occurring tasks for DB2 DBAs. Being able to assess the effectiveness and performance of various and sundry aspects of your DB2 systems and applications is one of the most important things that a DBA must be able to do. This can include online transaction response time evaluation, sizing of the batch window and determining whether it is sufficient for the workload, end-to-end response time management of distributed workload, and so on. 

But in order to accurately gauge the effectiveness of your current environment and setup, Service Level Agreements, or SLAs, are needed. SLAs are derived out of the practice of Service-level management (SLM), which is the “disciplined, proactive methodology and procedures used to ensure that adequate levels of service are delivered to all IT users in accordance with business priorities and at acceptable cost.”

In order to effectively manage service levels, a business must prioritize its applications and identify the amount of time, effort, and capital that can be expended to deliver service for those applications.

A service level is a measure of operational behavior. SLM ensures that applications behave accordingly by applying resources to those applications based on their importance to the organization. Depending on the needs of the organization, SLM can focus on availability, performance, or both. In terms of availability, the service level might be defined as “99.95 percent uptime from 9:00 a.m. to 10:00 p.m. on weekdays.” Of course, a service level can be more specific, stating that “average response time for transactions will be 2 seconds or less for workloads of 500 or fewer users.”

For an SLA to be successful, all parties involved must agree on stated objectives for availability and performance. The end users must be satisfied with the performance of their applications, and the DBAs and technicians must be content with their ability to manage the system to the objectives. Compromise is essential to reach a useful SLA.

In practice, though, many organizations do not institutionalize SLM. When new applications are delivered, there may be vague requirements and promises of subsecond response time, but the prioritization and budgeting required to assure such service levels are rarely tackled (unless, perhaps, the IT function is outsourced). It never ceases to amaze me how often SLAs simply do not exist. I always ask for them whenever I am asked to help track down performance issues or to assess the performance of a DB2 environment.

Let's face it, if you do not have an established agreement for how something should perform, and what the organization is willing to pay to achieve that performance, then how can you know whether or not things are operating efficiently enough? The simple answer is: you cannot.

It may be possible for a system assessment to offer up general advice on areas where performance gains can be achieved. But in such cases -- where SLAs are non-existent -- you cannot really deliver guidance on whether the effort to remediate the "problem areas" is worthwhile. Without SLAs in place you simply do not know if current levels of performance are meeting agreed-upon service levels, because there are no agreed-upon service levels (and, no, "subsecond response time" is NOT a service level!). Additionally, you cannot know what level of spend is appropriate for any additional effort needed to achieve the potential performance, because no budget has been agreed upon.

Another potential problem is the context of the service being discussed. Most IT professionals view service levels on an element-by-element basis. In other words, the DBA views performance based on the DBMS, the SysAdmin views performance based on the operating system or the transaction processing system, and so on. SLM properly views service for an entire application. However, it can be difficult to assign responsibility within the typical IT structure. IT usually operates as a group of silos that do not work together very well. Frequently, the application teams operate independently from the DBAs, who operate independently from the SAs, and so on.

To achieve end-to-end SLM, these silos need to be broken down. The various departments within the IT infrastructure need to communicate effectively and cooperate with one another. Failing this, end-to-end SLM will be difficult to implement.

The bottom line is that the development of SLAs for your batch windows, your transactions and business processes is a best practice that should be implemented at every DB2 shop (indeed, you can remove DB2 from that last sentence and it is still true).

Without SLAs, how will the DBA and the end users know whether an application is performing adequately? Not every application can, or needs to, deliver subsecond response time. Without an SLA, business users and DBAs may have different expectations, resulting in unsatisfied business executives and frustrated DBAs—not a good situation.

With SLAs in place, DBAs can adjust resources by applying them to the most mission-critical applications as defined in the SLA. Costs will be controlled and capital will be expended on the portions of the business that matter most. Without SLAs in place, an acceptable performance environment will be ever elusive. Think about it: without an SLA in place, if the end user calls up and complains to the DBA about poor performance, there is no way to measure the veracity of the claim or to gauge the possibility of improvement within the allotted budget.

Recovery Time Objectives (RTOs)

Additionally, the effectiveness of backup and recovery should be a concern to all DB2 DBAs. This requires that RTOs (Recovery Time Objectives) be established. An RTO is basically an SLA for the recovery of your database objects. Without RTOs, it is difficult (if not impossible) to gauge the state of recoverability and the efficacy of image copies being taken. 

Each database object should have an RTO assigned to it. The RTO needs to take into account the same type of things that an SLA considers. In other words, the business must prioritize its applications, DBAs must map database objects to the applications, and together they must identify the amount of time, effort, and capital that can be expended to assure the minimization of downtime for those applications.

Again, we are measuring operational behavior. The RTO ensures that, when problems occur requiring database recovery, the application outage is limited to what has been defined as tolerable for the business (in terms of uptime and cost to provide that uptime).

Again, as with an SLA, for the RTO to be successful, all parties involved must agree on stated objectives for downtime and time to recovery. The end users must be satisfied with the potential duration of their application’s downtime, and the DBAs and technicians must be content with their ability to recover the system to the objectives. And again, cost is a contributing factor. The RTO cannot simply be “I need my application up in 5 minutes and I can’t spend any more money to do that,” because that is not reasonable (or possible).

Without written RTOs, DBAs can provide due diligence to make sure that database objects are backed up and recoverable, but cannot really provide any guarantee in terms of how quickly the data can be recovered (or perhaps, to what point in time) when an outage occurs. Of course, the DBA can create and review backup policies and procedures to encourage a recoverable environment. But there won't be any way to ensure with any consistency that the backup plan can deliver the time-to-recovery needed by the business.

So why don't organizations create SLAs and RTOs as a regular course of business? 

And if your organization does create SLAs and RTOs, please share with us how doing so became a standard at your shop...


Friday, August 01, 2014

DB2 Health Checks - Part Two

In the first part of this series on DB2 health checks, DB2 Health Checks - Part One, I discussed the general concept of a health check and its basic importance in terms of maintaining a smooth-running DB2 environment.

Today, I want to briefly look at how DB2 health checks are usually done... if they are done at all.

The Scope of a DB2 Health Check

Some people mistakenly view a DB2 Health Check as being performance-focused only. Yes, performance is an important aspect of a health check -- and I admit that performance is generally the area that causes an organization to undergo the health check process. But the overall health of the DB2 environment needs to be addressed by the health check. In addition to performance-related issues (system, database and application), this can include:


  • availability
  • fault tolerance
  • recoverability
  • use of automation
  • process review
  • documentation
  • people skills (DBA, sysprog, development, etc.)
Considerations Before Undergoing a DB2 Health Check

DB2 health checks are important and crucial to the on-going stability of your systems, but there are issues:
  • Health checks can be costly (consulting engagements)
  • When a consulting company conducts a health check the analysis usually is done off-site, so your DBAs do not learn the techniques used by the consultants as they massage and analyze the data
  • Health checks generally are valid for a specific point-in-time and can become obsolete quickly

Conducting DB2 Health Checks

DB2 health checks typically are conducted by IBM personnel, a DB2 consultant, or a larger services firm. The engagement begins with experts/consultants interviewing the DBAs, submitting questionnaires as needed and collecting data from DB2. After collecting the data the consulting team goes off site and analyzes the reams of collected data. There may be intermittent communication between the consulting team and the on-site DBAs to clear up any lingering questions or to clarify things during the analysis phase. After some time (usually a week or more), a report on the health of your DB2 environment, perhaps with some recommendations to implement, is delivered.

What happens next is all up to you. After reading the report you can ignore it, implement some or all of the recommendations, conduct further in-house investigation for the feasibility of implementing the recommendations, or send it along to management for their perusal. But there is a deadline involved. After all, your systems are not static. So the health check report is only as good as the point-in-time for which it was delivered. Time, as it always does, will creep up on you. If you wait too long, the recommendations become stale and you might not be doing the proper thing for your environment by implementing changes based on old information.

Of course, when too much time has gone by after the health check, you could always engage with the services company and consultants again, requiring additional spending.

Is there another way? 

Stay tuned, as we'll look at some other options in upcoming installments of this blog series on DB2 health checking...

Thursday, July 17, 2014

DB2 Health Checks - Part 1

Left to their own devices, DB2 databases and applications will accumulate problems over time. Things that used to work, stop working. This can happen for various reasons including the addition of more data, a reduction in some aspect of business data, different types of data, more users, changes in busy periods, business shifts, software changes, hardware changes… you get the idea.

And there is always the possibility of remnants from the past causing issues with your DB2 environment. Some things may have been implemented sub-optimally from the start, perhaps many years ago… or perhaps more recently. Furthermore, DB2 is not a static piece of software; it changes over time with new versions, features and functionality. As new capabilities are introduced, older means of performing similar functionality become suboptimal, and in some cases, even obsolete. Identifying these artifacts can be troublesome and is not likely to be something that a DBA will do on a daily basis.

Nonetheless, the performance and availability of your DB2 environment – and therefore the business systems that rely on DB2 – can suffer if you do not pay attention to the health and welfare of your DB2 databases and applications.

Health Checking Your DB2
The general notion of a health check is well known in the IT world, especially within the realm of DB2 for z/OS. The purpose of a DB2 health check is to assess the stability, performance, and availability of your DB2 environment. Health checks are conducted by gathering together all of the pertinent details about your DB2-based systems and reviewing them to ascertain their appropriateness and effectiveness. You may narrow down a health check to focus on specific aspects of your infrastructure, for example, concentrating on just availability and performance, or on other aspects such as recoverability, security, and so on.

At any rate, scheduling regular independent reviews of your DB2 environment is an important aspect of assuring the viability and robustness of your implementation. Simply migrating DB2 applications to production and then neglecting to review them until or unless there are complaints from the end users is not a best practice for delivering good service to your business. Just like a car requires regular maintenance, so too does your DB2 environment. Regular analysis and health checks, with the overall goal of identifying weaknesses and targeting inefficiencies, can save your organization time and money, as well as reduce the daily effort involved in implementing and maintaining your DB2 applications.

Think about the health of your DB2 system the same way you think about your health. A regular health check helps to identify and eliminate problems. And it helps you to perform the daily operational tasks on your DB2 databases and applications with the peace of mind that only regular, in-depth, knowledgeable analysis can deliver.

Check Back Soon
Later in this series we'll uncover more aspects of health checking and look at some software that might be able to assist. So stay tuned...

Monday, March 17, 2014

Types of DB2 Tools

As a user of DB2, which I'm guessing you are since you are reading this blog, you should always be on the lookout for useful tools that will help you achieve business value from your investment in DB2. There are several categories of tools that can help you to achieve this value.

Database Administration and Change Management tools simplify and automate tasks such as creating database objects, examining existing structures, loading and unloading data, and making changes to databases. Without an administration tool these tasks require intricate, complex scripts to be developed and run. One of the most important administration tools is the database change manager. Without a robust, time-tested product that is designed to effect database changes, database changes can be quite time-consuming and error prone. A database change manager automates the creation and execution of scripts designed to implement required changes – and will ensure that data integrity is not lost.

One of the more important categories of DB2 tools offers Performance Management capabilities. Performance tools help to gauge the responsiveness and efficiency of SQL queries, database structures, and system parameters. Performance management tools should be able to examine and improve each of the three components of a database application: the DB2 subsystem, the database structures, and the application programs. Advanced performance tools can take proactive measures to correct problems as they happen.

Backup and Recovery tools simplify the process of creating backups and recovering from those backup copies. By automating complex processes, simulating recovery, and implementing disaster recovery procedures these tools can be used to assure business resiliency, with no data being lost when the inevitable problems arise.

Another important category of DB2 tool is Utilities and Utility Management. A utility is a single purpose tool for moving and/or verifying database pages; examples include LOAD, UNLOAD, REORG, CHECK, COPY, and RECOVER. Tools that implement and optimize utility processing, as well as those that automate and standardize the execution of DB2 utilities, can greatly improve the availability of your DB2 applications. You might also want to consider augmenting your utilities with a database archiving solution that moves data back and forth between your database and offline storage.

Governance and Compliance tools deliver the ability to protect your data and to assure compliance with industry and governmental regulations, such as HIPAA, Sarbanes-Oxley, and PCI DSS. In many cases business executives have to vouch for the accuracy of their company’s data and that the proper controls are in place to comply with required regulations. Governance and compliance tools can answer questions like “who did what to which data when?” that are nearly impossible to otherwise answer.

And finally, Application Management tools help developers improve application performance and speed time-to-market. Such tools can improve database and program design, facilitate application testing including the creation and management of test data, and streamline application data management efforts.

Tools from each of these categories can go a long way toward helping your organization excel at managing and accessing data in your DB2 databases and applications...

Monday, November 18, 2013

DBA Rules of Thumb - Part 2 (Automate)

Why should you do it by hand if you can automate DBA processes? Anything you can do probably can be done better by the computer – if it is programmed to do it properly. And once it is automated you save yourself valuable time. And that time can be spent tackling other problems, learning about new features and functionality, or training others.


Furthermore, don’t reinvent the wheel. Someone, somewhere, at some time may have already solved the problem you currently are attempting to solve. Look to the web for sites that allow you to download and share scripts. Or, if you have budget money, look to purchase DBA tools from ISVs. There are a lot of good tools out there, available from multiple vendors, that can greatly simplify the task of database administration. Automating performance management, change management, backup and recovery, and other tasks can help to reduce the amount of time, effort, and human error involved in managing database systems.

Of course, you can take the automation idea too far. There has been a lot of talk and vendor hype lately about self-managing database systems. For years now, pundits and polls have been asking when automation will make the DBA job obsolete. The correct answer is "never" - or, at least, not any time soon.

There are many reasons why DBAs are not on the fast path to extinction. Self-managing database systems are indeed a laudable goal, but we are very far away from a “lights-out” DBMS environment. Yes, little-by-little and step-by-step, database maintenance and performance management is being improved, simplified, and automated. But you know what? DBAs will not be automated out of existence in my lifetime – and probably not in your children’s lifetime either.

Many of the self-managing features require using the built-in tools from the DBMS vendor, but many organizations prefer to use heterogeneous solutions that can administer databases from multiple vendors (Oracle, DB2, SQL Server, MySQL, etc.) all from a single console. Most of these tools have had self-managing features for years, and yet they did not make the DBA obsolete.

And let’s face it, a lot of the DBMS vendors’ claims are more hyperbole than fact. Some self-managing features are announced years before they become generally available in the DBMS. All vendor claims to the contrary, no database today is truly 100% self-managing. Every database needs some amount of DBA management – even when today’s best self-management features are being used.

What about the future? Well, things will get better – and probably more costly. You don’t think the DBMS vendors are building this self-management technology for free, do you? But let’s remove cost from the equation for a moment. What can a self-managing database actually manage?

Most performance management solutions allow you to set performance thresholds. A threshold allows you to set up a trigger that says something like “When x% of a table’s pages contain chained rows or fragmentation, schedule a reorganization.” But these thresholds are only as good as the variables you set and the actions you define to be taken upon tripping the threshold. Some software is bordering on intelligent; that is, it “knows” what to do and when to do it. Furthermore, it may be able to learn from past actions and results. The more intelligence that can be built into a self-managing system, the better the results typically will be.

But who among us currently trusts software to work like a grizzled veteran DBA? The management software should be configurable such that it alerts the DBA as to what action it wants to take. The DBA can review the action and give a “thumbs up” or “thumbs down” before the corrective measure is applied. In this way, the software can earn the DBA’s respect and trust. When the DBA trusts the software, he can turn it on so that it self-manages “on the fly” without DBA intervention. But today, in most cases, a DBA is required to set up the thresholds, as well as to ensure their on-going viability.
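
To make the idea concrete in Db2 terms, here is a hedged sketch of such a threshold check (the 10 percent figure is arbitrary, and you should verify the real-time statistics column names for your Db2 level): a monitoring job could flag table spaces where the proportion of rows inserted out of clustering sequence since the last REORG exceeds the threshold, and then either alert the DBA or schedule a reorganization.

    -- Hypothetical REORG-candidate threshold using real-time statistics
    SELECT DBNAME, NAME, REORGUNCLUSTINS, TOTALROWS
    FROM   SYSIBM.SYSTABLESPACESTATS
    WHERE  TOTALROWS > 0
      AND  (REORGUNCLUSTINS * 100.0) / TOTALROWS > 10
    ORDER  BY DBNAME, NAME;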

Of course, not all DBA duties can be self-managed by software. Most self-management claims are made for performance management, but what about change management? The DBMS cannot somehow read the mind of its user and add a new column or index, or change a data type or length. This non-trivial activity requires a skilled DBA to analyze the database structures, develop the modifications, and deploy the proper scripts or tools to implement the change. Of course, software can help simplify the process, but software cannot replace the DBA.

Furthermore, database backup and recovery will need to be guided by the trained eye of a DBA. Perhaps the DBMS can become savvy enough to schedule a backup when a system process occurs that requires it. Maybe the DBMS of the future will automatically schedule a backup when enough data changes. But sometimes backups are made for other reasons: to propagate changes from one system to another, to build test beds, as part of program testing, and so on. A skilled professional is needed to build the proper backup scripts, run them appropriately, and test the backup files for accuracy. And what about recovery? How can a damaged database know it needs to be recovered? Because the database is damaged any self-managed recovery it might attempt is automatically rendered suspect. Here again, we need the wisdom and knowledge of the DBA.

And there are many other DBA duties that cannot be completely automated. Because each company is different, the DBMS must be customized using configuration parameters. Of course, you can opt to use the DBMS “as is,” right out-of-the-box. But a knowledgeable DBA can configure the DBMS so that it runs appropriately for their organization. Problem diagnosis is another tricky subject. Not every problem is readily solvable by developers using just the Messages and Codes manual and a help desk. What happens with particularly thorny problems if the DBA is not around to help?

Of course, the pure, heads-down systems DBA may (no, let's say should) become a thing of the past. Instead, the modern DBA will need to understand multiple DBMS products, not just one. DBAs furthermore must have knowledge of the business impact of each database under their care (more details here). And DBAs will need better knowledge of logical database design and data modeling – because it will advance their understanding of the meaning of the data in their databases.

Finally, keep in mind that we didn't automate people out of existence when we automated HR or finance. Finance and HR professionals are doing their jobs more efficiently and effectively, and they have the ability to deliver a better product in their field. That's the goal of automation. So, as we automate portions of the DBA’s job, we'll have more efficient data professionals managing data more proficiently.


This blog entry started out as a call to automate, but I guess it kinda veered off into an extended dialogue on what can, and cannot, be accomplished with automation. I guess the bottom line is this... Automation is key to successful, smooth-running databases and applications... but don't get too carried away by the concept.


I hope you found the ideas here to be useful... and feel free to add your own thoughts and comments below! 



Friday, January 30, 2009

Hey DBAs! Recoverability Trumps Performance

Many DBAs reading this blog will probably think I'm wrong, at least initially. They'll claim that managing performance is the most important thing they do, but they are confusing frequency with importance. Yes, DBAs confront performance issues more often than they build backup plans – and they better be managing performance more frequently than they are actually recovering their databases or their company has big problems!

So why do I say that recoverability is at the pinnacle of the DBA task list? Well, if you cannot recover your databases after a problem then it won’t matter how fast you can access them, will it? Anybody can deliver fast access to the wrong information (or worse yet, no information at all). It is the job of the DBA to keep the information in their company’s databases accurate, secure, and accessible.

So what do we need to do to assure the integrity of our database data? First, we need to understand the availability needs of our data in terms of the business. In the event of a failure, how rapidly must we be able to recover from that failure? Keep in mind that the failure could be either physical, such as a failed disk drive, or logical, such as applying the wrong input to a process, which corrupts the database.

Only after we know the impact to the business can we develop an appropriate backup and recovery plan. We need service level agreements (SLAs) for recovery just like we have SLAs for performance. The recovery SLA needs to be phrased as a recovery time objective (RTO) from an application perspective; for example, “The amount of time to restore application availability after a failure of the order entry system cannot exceed 2 hours (or 10 minutes, or whatever is appropriate for your business).”

To create effective RTOs you will need to be able to answer the question “What is the cost of not having this data available?” When we know the expectations of the business we can work to create a backup and recovery plan that matches the requirements. There are multiple techniques and methods for backing up and recovering databases. Some techniques, while more costly, can enhance availability by recovering data more rapidly.

It is imperative that the DBA team creates an appropriate recovery strategy for each database object. This requires mapping database objects to applications so we can adopt the proper strategy in accordance with the application recovery SLA. Some database objects will participate in multiple applications, and their recovery strategy will therefore be more complex.

Not all data is created equal. Some of your databases and tables contain data that is necessary for the core of your business. Other database objects contain data that is less critical or easily derived from other sources. Armed with this information, DBAs can develop RTOs such that the recovery plan matches the needs of the business.

Establishing a reasonable backup schedule requires you to balance two competing demands: the need to take image copy backups frequently enough to assure reasonable recovery time, and the need to take image copies infrequently enough so as not to interrupt daily business. Keep in mind that if you make fewer image copies, you will need to apply more log records during the recovery, and the recovery will take longer. The DBA must balance these competing objectives based on RTOs, usage criteria, and the capabilities of the DBMS.

When was the last time you re-evaluated and tested your backup and recovery plans? Oh, you may have looked at disaster plans, but have you examined your ability to recover locally? Do you know how long it would take to recover your most important primary customer tables, for example, if you took a hit in the middle of the day?

Regular recoverability health checking should be a standard, documented responsibility for the DBA staff; and if you can acquire software to automate the health-check process, all the better.

Friday, January 02, 2009

Recovery AssuranceExpert for DB2 z/OS: Automating the IT risk management of business availability

Business availability is more than just having a reliable hardware and database platform in place. Even the best high availability environment cannot safeguard itself from logical errors. Since most companies cannot afford downtime, it is important that the enterprise data on which they depend is always available.

Well-planned recovery procedures should be able to assure a complete recovery of enterprise-critical data within a pre-defined time window that provides for minimum disruption of the business. However, within complex environments, it is nearly impossible to perform recovery tests without disrupting the production system. Therefore, even the best-planned recovery scenarios can fail because of operational risks resulting from unforeseen and typically immeasurable vulnerabilities.

If you are interested in minimizing the risk associated with DB2 for z/OS availability and recoverability, read this white paper by Brenda Honeycutt to learn about the value of regular, periodic health checks to assure your recovery time objectives for DB2 recoverability.

Monday, October 20, 2008

DBA Rules of Thumb

Database administration is a very technical discipline, but it is also a discipline in which the practitioner is very visible politically within the organization. As such, DBAs should be armed with the proper attitude and knowledge before attempting to practice the discipline of database administration.

Just as important as technical acumen, though, is the ability to carry oneself properly and to embrace the job appropriately. With this in mind, I wrote a series of blog entries on DBA Rules of Thumb over at my Data Management Today blog... and I thought the information I wrote there may be helpful to my DB2 and mainframe readership here, so I'm sharing the eight rules of thumb (with links) here on my DB2 Portal blog:
  1. Document Everything!
  2. Automate Intelligently
  3. Share
  4. Don't Panic!
  5. Focus Your Efforts
  6. Invest In Yourself
  7. Diversify
  8. Develop Business Acumen
What do you think? Did I miss anything important?

P.S. Just a reminder that I will be presenting a webinar on assuring DB2 recoverability with my colleague, Michael Figaro, this Thursday, October 23, 2008 at 10:30 Central time. If you are at all interested in the topic, be sure to register today - and attend this Thursday!

Thursday, October 09, 2008

Assuring the Recoverability of Your DB2 Databases

Availability requires much more than just having a reliable hardware and database platform. Most companies cannot afford significant downtime, and some cannot afford any! As such, it is crucial for unplanned outages to be as short as possible. But it is not just a business requirement; in many cases assuring a speedy recovery is also a legal mandate. Regulations such as SOX and Basel II dictate that any outage be resolved within a predefined period of time.

But how many of us can answer, with any degree of certainty, the question “How long will this outage last?” There are many variables that need to be considered when estimating a DB2 recovery time: backups available, quality, point-in-time requirements, amount of log processing, disk speed, tape mounts, and on and on and on...

With these thoughts in mind, Michael Figaro and I will be delivering a webinar titled Assuring the Recoverability of Your DB2 Databases, on Thursday, October 23, 2008 at 10:30 am CDT.

We will tackle issues ranging from regulations, IT complexity, and business continuity, to DSNZPARMs and backup/recovery planning. We’ll also make the case that planning for database recoverability is the most important task among the many tasks of the DBA.

As part of the webinar we will introduce and demonstrate Recovery AssuranceExpert, a new technology to help you ensure that all of your critical DB2 objects are recoverable within your recovery time objectives. Recovery AssuranceExpert is an automated solution to perform daily health checks of data availability and recoverability, as well as provide actual recovery times required for a DB2 object, a complete application, or even a whole DB2 subsystem. Join us on October 23rd to find out how you can ensure that your actual recoverability times fit into your SLAs.

Tuesday, September 04, 2007

MODIFY RECOVERY [DB2 9 for z/OS]

Another IBM DB2 utility that has received some attention under Version 9 is MODIFY RECOVERY. For those not sure what this utility is, MODIFY RECOVERY is used to remove records from the SYSIBM.SYSCOPY table in the DB2 Catalog, along with related log records from the SYSIBM.SYSLGRNX directory table and related information from the DBD.

Prior to the V9 changes, when you ran MODIFY RECOVERY you had to specify the deletion criteria in one of two ways:

  • Given a specific date, delete all recovery information before that date
  • Or given an age in days, delete all recovery information older than the age

But as of V9 we can alternately specify what is to be retained instead of what is to be deleted. This way, we tell the utility what we want to keep and it will remove the rest.

So, instead of coding the DELETE parameter (with a DATE or AGE) we would instead code a RETAIN parameter. The RETAIN parameter can specify one of five different options:

LAST (integer) – Using this option informs DB2 to choose a date such that the number of full image copies kept is not less than the integer specified. So, if we specify LAST(5), then we will delete all entries that exist prior to the date on which the fifth most recent copy was taken. If more than one copy was taken on that date, the result could be that more than 5 copies are retained.

LOGLIMIT – Specifying this option will delete all copies where DB2 no longer has log records to recover forward from. DB2 uses the BSDS to establish the oldest archive log timestamp.

GDGLIMIT – You can use this option to enable the number of copies to be retained to be linked to the corresponding GDG base limit.

GDGLIMIT LAST (integer) – It is also possible to combine the GDGLIMIT and LAST options. DB2 will then use the GDG base limit if the last primary copy is a GDG; if not, it uses the integer specified.

GDGLIMIT LOGLIMIT – And finally, we can combine the GDGLIMIT and LOGLIMIT options, too. If the most recent primary full copy is a GDG then the GDG limit is used. If the last copy is not a GDG then the BSDS is used to determine the date prior to which deletions will occur.

The last two options (the ones that combine two parameters) are particularly useful if you have a mix of GDGs and non-GDGs for your image copies. They are also helpful if you use the LISTDEF option when you make image copy backups.
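
To make the difference concrete, here are hedged sample control statements (the object name and values are placeholders): the first uses the pre-V9 DELETE style, and the second uses the V9 RETAIN style to keep at least the three most recent full image copies.

    -- Pre-V9 style: delete recovery records older than 90 days
    MODIFY RECOVERY TABLESPACE MYDB.MYTS
           DELETE AGE(90)

    -- V9 style: retain at least the 3 most recent full image copies
    MODIFY RECOVERY TABLESPACE MYDB.MYTS
           RETAIN LAST(3)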

It is important to understand that for all of these options MODIFY RECOVERY will cause DB2 to determine the most recent date that satisfies the retention requirement. After ascertaining the correct date to use, all entries before that date will be deleted. In effect, DB2 chooses the most recent date that would satisfy the retention criteria. So, under some circumstances there may be more than one image copy on the specific date that is established, and therefore DB2 will keep those additional copies, too.

As an example, say we take 7 image copies on a single day. Then we run MODIFY RECOVERY … RETAIN LAST (3). In this case, all 7 will be retained as they were made on the same day, even though we indicated that we wanted to retain only the last 3.

Finally, there is one more significant change made to the MODIFY RECOVERY utility. Prior to V9, information was only deleted when copies were removed from SYSIBM.SYSCOPY. But as of V9, MODIFY RECOVERY will delete SYSIBM.SYSLGRNX entries even if no SYSIBM.SYSCOPY records are deleted.

Why is this important? Well, some shops have migrated over to running BACKUP SYSTEM instead of individual image copies. In this case, there will be no individual image copies to remove in SYSIBM.SYSCOPY, but the SYSIBM.SYSLGRNX records will continue to grow. After migrating to V9 you can use MODIFY RECOVERY to trim the size of SYSIBM.SYSLGRNX which can improve performance for processes that need to access log range information.