Let's talk about dirty reads (with apologies to the AC/DC pun in the title of this blog post).
Application
programmers must understand how concurrency problems impact the access and modification of Db2 data. When one program attempts to read data that’s in the
process of being changed by another, the DBMS must forbid access until the
modification is complete to ensure data integrity. Most DBMS products,
including Db2, use a locking mechanism for all data items being changed.
Therefore, when one task is updating data on a page, another task can’t access
data (i.e., read or update) on that same page until the data modification is
complete and committed.
If you are interested, I wrote a 17-part series of blog post on Db2 locking back in 2013... that last part, found here, contains an index to all 17 posts. But back to today's topic... the dirty read.
Before
discussing what a “dirty read” is, we should first talk a bit about
transactions and the importance of ACID. With the advent of NoSQL database systems
that do not always support ACID, it is important that developers and DBAs
understand what ACID is and why it is important to the integrity of your data.
Transactions and ACID
A
transaction is an atomic unit of work
with respect to recovery and consistency. A logical transaction performs a
complete business process typically on behalf of an online user. It may consist
of several steps and may comprise more than one physical transaction. The
results of running a transaction will record the effects of a business
process—a complete business process. The data in the database must be correct
and proper after the transaction executes.
When
all the steps that make up a specific transaction have been accomplished, a
COMMIT is issued. The COMMIT signals that all work since the last COMMIT is
correct and should be externalized to the database. At any point within the
transaction, the decision can be made to stop and roll back the effects of all
changes since the last COMMIT. When a transaction is rolled back, the data in
the database will be restored to the original state before the transaction was
started. The DBMS maintains a transaction log (or journal) to track database
changes.
In
other words, transactions exhibit ACID properties. ACID is an acronym for atomicity, consistency, isolation, and durability. Each of these four qualities
is necessary for a transaction to be designed correctly.
- · Atomicity means that a transaction must exhibit “all or nothing” behavior. Either all of the instructions within the transaction happen, or none of them happen. Atomicity preserves the “completeness” of the business process.
- · Consistency refers to the state of the data both before and after the transaction is executed. A transaction maintains the consistency of the state of the data. In other words, after running a transaction, all data in the database is “correct.”
- · Isolation means that transactions can run at the same time. Any transactions running in parallel have the illusion that there is no concurrency. In other words, it appears that the system is running only a single transaction at a time. No other concurrent transaction has visibility to the uncommitted database modifications made by any other transactions. To achieve isolation, a locking mechanism is required.
- · Durability refers to the impact of an outage or failure on a running transaction. A durable transaction will not impact the state of data if the transaction ends abnormally. The data will survive any failures.
Let’s use an
example to better understand the importance of transactions to database
applications. Consider a banking application. Assume that you wish to withdraw
$50 from your account with Mega Bank. This “business process” requires a
transaction to be executed. You request the money either in person by handing a
slip to a bank teller or by using an ATM (Automated Teller Machine). When the
bank receives the request, it performs the following tasks, which make up the
complete business process. The bank will:
- Check your account to make sure you have the necessary funds to withdraw the requested amount.
- If you do not, deny the request and stop; otherwise continue processing.
- Debit the requested amount from your checking account.
- Produce a receipt for the transaction.
- Deliver the requested amount and the receipt to you.
The
transaction performing the withdrawal must complete all of these
steps, or none of these steps, or else one of the parties in the transaction
will be dissatisfied. If the bank debits your account but does not give you
your money, then you will not be satisfied. If the bank gives you the money but
does not debit the account, the bank will be unhappy. Only the completion of
every one of these steps results in a “complete business process.” Database
developers must understand the requisite business processes and design
transactions that ensure ACID properties.
To
summarize, a transaction—when executed alone, on a consistent database—will
either complete, producing correct results, or terminate, with no effect. In
either case the resulting condition of the database will be a consistent state.
Now Let’s Get Back to Dirty Reads
Programs
that read Db2 data typically access numerous rows during their execution and
are susceptible to concurrency problems. But when writing your application programs you
can use read-through locks, also known as “dirty read” or “uncommitted read,”
to help overcome concurrency problems. When using uncommitted reads, an
application program can read data that has been changed, but not yet committed.
Dirty
read capability is implemented using the UR isolation level (for uncommitted
read). If the application program is using the UR isolation level, it will read
data without taking locks. This lets the application program read data
contained in the table as it’s being manipulated. Consider the following
sequence of events:
1.
At
9 a.m., a transaction containing the following SQL to change a specific value
is executed:
UPDATE EMP
SET
FIRST_NAME = “MICHELLE”
WHERE
EMPNO = 10020;
2.
The
transaction is long-running and continues to execute without issuing a COMMIT.
3.
At
9:01 a.m., a second transaction attempts to SELECT the data that was changed,
but not committed.
If
the UR isolation level was specified for the second transaction, it would read
the changed data even though it had yet to be committed. Because the program
simply reads the data in whatever state it happens to be at that moment, it can
execute faster than if it had to wait for locks to be taken and resources to be
freed before processing.
However,
the implications of reading uncommitted data must be carefully examined before
being implemented, as several problems can occur. A dirty read can cause
duplicate rows to be returned where none exist. Alternately, a dirty read can
cause no rows to be returned when one (or more) actually exists.
Some Practical Advice
So,
when is it a good idea to implement dirty reads using the UR isolation level? If the data is read only, a dirty read is fine because there are no changes being made to the data. In "real life," though, true read only data is rare.
A general rule of thumb is to avoid dirty reads whenever the results of your queries must be 100
percent accurate. For example, avoid UR if calculations must balance, data is
being retrieved from one source to modify another, or for any production,
mission-critical work that can’t tolerate data integrity problems.
In other words: If my bank deployed dirty reads on its core banking
applications I would definitely find myself another bank!
One
of the more concerning things that I’ve witnessed as a Db2 consultant out “in
the real world” is a tendency for dirty read to be used as a quick and dirty
way to improve performance. By appending a WITH UR to a statement a developer
can remove the overhead of locking and improve performance. But often this is done
without a thorough investigation of the possible implications. Even worse, some
organizations have implemented a standard that says SELECT statements should always be
coded using WITH UR. That can wreak havoc on data integrity... and it goes against my core mantra - almost never say always or never.
Most
Db2 applications aren’t viable candidates for dirty reads, but there are a few
situations where dirty reads can be beneficial. Examples include access to a
reference, code, or look-up table (where the data is non-volatile), statistical
processing on large amounts of data, analytical queries in data warehousing and
Business Intelligence (BI) applications, or when a table (or set of tables) is
used by a single user only (which is rare). Additionally, if the data being
accessed is already questionable, little harm can be done using a dirty read to
access the information.
Because
of the data integrity issues associated with dirty reads, DBAs should keep
track of the programs that specify an isolation level of UR. This information
can be found in the Db2 Catalog. The following two queries can be used to find
the applications using uncommitted reads.
Issue
the following SQL for a listing of plans that were bound with ISOLATION(UR) or
contain at least one statement specifying the WITH UR clause:
SELECT DISTINCT
S.PLNAME
FROM SYSIBM.SYSPLAN
P,
SYSIBM.SYSSTMT S
WHERE (P.NAME =
S.PLNAME AND
P.ISOLATION = ˈUˈ
)
OR
S.ISOLATION = ˈUˈ
ORDER BY S.PLNAME;
Issue
the following SQL for a listing of packages that were bound with ISOLATION(UR)
or contain at least one statement specifying the WITH UR clause:
SELECT DISTINCT
P.COLLID, P.NAME, P.VERSION
FROM
SYSIBM.SYSPACKAGE P,
SYSIBM.SYSPACKSTMT S
WHERE (P.LOCATION
= S.LOCATION AND
P.LOCATION
= ˈ ˈ AND
P.COLLID =
S.COLLID AND
P.NAME =
S.NAME AND
P.VERSION
= S.VERSION AND
P.ISOLATION = ˈUˈ
)
OR
S.ISOLATION = ˈUˈ
ORDER BY S.COLLID, S.NAME, S.VERSION;
The
dirty read capability can provide relief to concurrency problems and deliver
faster performance in specific situations. Understand the implications of the
UR isolation level and the “problems” it can cause before diving headlong into
implementing it in your production applications.