Sunday, December 07, 2008
In the meantime, be sure to check in regularly on my other blog, Data Management Today.
And if you live in Richmond or Detroit, you can see me present at your local user group next week... Tuesday in Richmond (12/9/08) and Thursday in Detroit (12/11/08).
Friday, November 14, 2008
Each Forum offers 2 days of education with 2 tracks: one covering DB2 for z/OS and another covering DB2 for LUW. IDUG is offering full two-day registrations for $425 and single-day registrations for only $225.
Here are the scheduled dates:
* Camp Hill, PA - November 17 and 18
* Kansas City, MO - November 19 and 20
Check out the links above for the full list of sessions in your area.
I'll be delivering my presentation titled "DB2 9: For Developers Only" at both of these forums. And there will be many other great speakers there, too!
Friday, November 07, 2008
Keep the following rules in mind.
When you issue date arithmetic statements using durations, do not try to establish a common conversion factor between durations of different types. For example, the following two date arithmetic statements are not equivalent:
1997/04/03 - 1 MONTH
1997/04/03 - 30 DAYS
It is tempting to treat one month as 30 days, but the calendar does not cooperate: the month preceding April 3 is March, which has 31 days. The result of the first statement is 1997/03/03, but the result of the second statement is 1997/03/04. In general, use like durations (for example, use months or use days, but not both) when you issue date arithmetic.
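If you want to see the difference for yourself, a quick query along these lines should do it. This is only a minimal sketch: it assumes DB2 for z/OS, where SYSIBM.SYSDUMMY1 is the one-row dummy table, and the column aliases are names I made up for illustration.

```sql
-- Compare subtracting one month with subtracting thirty days.
-- SYSIBM.SYSDUMMY1 is the one-row dummy table; the aliases are illustrative only.
SELECT DATE('1997-04-03') - 1 MONTH AS MINUS_ONE_MONTH,  -- 1997-03-03
       DATE('1997-04-03') - 30 DAYS AS MINUS_30_DAYS     -- 1997-03-04
FROM   SYSIBM.SYSDUMMY1;
```

The way the two results are displayed depends on the date format in effect at your site, but the one-day difference between them will be there either way.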
Another consideration: if one operand is a date, the other operand must be a date or a date duration. If one operand is a time, the other operand must be a time or a time duration. You cannot mix a date with a time duration, or a time with a date duration.
If one operand is a timestamp, the other operand can be a time, a date, a time duration, or a date duration. The second operand cannot be a timestamp. You can mix date and time durations with timestamp data types.
Now, what exactly is in that field returned as the result of a date or time calculation? Simply stated, it is a duration. There are three types of durations: date durations, time durations, and labeled durations.
Date durations are expressed as a DECIMAL(8,0) number. The result of subtracting one DATE value from another is a date duration. To be properly interpreted, the number must have the format yyyymmdd, where yyyy represents the number of years, mm the number of months, and dd the number of days.
Time durations are expressed as a DECIMAL(6,0) number. To be properly interpreted, the number must have the format hhmmss, where hh represents the number of hours, mm the number of minutes, and ss the number of seconds. The result of subtracting one TIME value from another is a time duration.
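To make the yyyymmdd and hhmmss layouts concrete, here is a small sketch that subtracts one date from another and one time from another. The specific dates and times are arbitrary values I picked for illustration, and SYSIBM.SYSDUMMY1 is again assumed to be available as the one-row dummy table.

```sql
-- Date minus date yields a date duration, DECIMAL(8,0), read as yyyymmdd.
-- Time minus time yields a time duration, DECIMAL(6,0), read as hhmmss.
SELECT DATE('2008-11-07') - DATE('1997-04-03') AS DATE_DURATION,  -- 110704: 11 years, 7 months, 4 days
       TIME('14:30:00')   - TIME('09:15:30')   AS TIME_DURATION   -- 51430: 5 hours, 14 minutes, 30 seconds
FROM   SYSIBM.SYSDUMMY1;
```

Notice that 110704 is not a count of days; each pair of digits has to be read in its own position, which is why the text stresses the format.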
Labeled durations represent a specific unit of time as expressed by a number followed by one of the seven duration keywords: YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS, or MICROSECONDS. A labeled duration can only be used as an operand of an arithmetic operator, and the other operand must have a data type of DATE, TIME, or TIMESTAMP. For example:
CURRENT DATE + 3 YEARS + 6 MONTHS
This will add three and a half years to the current date.
Thursday, November 06, 2008
Redman offers the basic thesis of the book right there on page one, where he states “…bad data lie at the root of issues of international importance, including the current subprime mortgage meltdown, lost and stolen identities, hospital errors and contested elections.” After laying down the problem, the rest of the book tells us what we need to do to correct the problems.
Data Driven will help you to improve the methods you deploy for the care and feeding of your data and information; in other words, helping you to control and manage data using similar processes and controls that you deploy on your other assets (finances, people, structures, etc.) – a noble goal, indeed!
The writing is concise and snappy – you won’t get bored reading this book. The style is engaging and it is easy to read. For example, instead of just saying what to do and how to do it, which can be boring, Redman discusses many of the arguments people use to say that data quality is impossible, and then debunks them showing that data quality is possible, if approached properly and thoroughly.
There are many good ideas, charts, and graphs in Data Driven, too. One of my favorites is on page 54, where you can find a chart of the ten habits followed by those with the best data. If you buy this book, make a poster-sized photocopy of that page and hang it up on the wall of the break room and in the data folks’ cubicles. Maybe the habits will rub off on everyone as they gaze upon them everywhere.
But the best little gem in this wonderful book is the entirety of the last chapter, which is titled “The Next One Hundred Days.” In this chapter Dr. Redman offers what he calls a hundred-day panorama. It is not a grand plan because most will not have the depth of understanding required to create such a plan and have it succeed. Instead, the panorama strives for breadth, not depth, with a focus on quality. Diligent readers can follow the guidance in this chapter and thereby begin the long-term process of appreciating the importance of data quality on their business practices.
And that alone is worth the price of the book… but, of course, Data Driven offers much more and I recommend it to every IT and business professional whose job relies on accurate data.
Monday, November 03, 2008
Q: My format does not fit into any of the formats listed in the DB2 manuals. What if I have a DATE stored like YYYYMMDD (with no dashes or slashes) and I want to compare it to a DB2 date?
A: Okay, let's look at one potential solution to your problem (and then I want to briefly talk about the use of proper data types). First of all you indicate that your date column contains dates in the following format: yyyymmdd with no dashes or slashes. You do not indicate whether this field is a numeric or character field - I will assume that it is character. If it is not, you can use the CHAR function to convert it to a character string.
Then, you can use the SUBSTR function to break the character column apart into the separate components, for example SUBSTR(column,1,4) returns the year component, SUBSTR(column,5,2) returns the month, and SUBSTR(column,7,2) returns the day.
Then you can concatenate all of these together into a format that DB2 recognizes, for example, the USA format which is mm/dd/yyyy. This can be done as follows:
SUBSTR(column,5,2) || '/' || SUBSTR(column,7,2) ||
'/' || SUBSTR(column,1,4)
Then you can use the DATE function to convert this character string into a DATE that DB2 will recognize. This is done as follows:
DATE(SUBSTR(column,5,2) || '/' || SUBSTR(column,7,2) ||
'/' || SUBSTR(column,1,4))
The result of this can be used in date arithmetic with other dates or date durations. Of course, it may not perform extremely well, but it should return the results you desire.
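Putting the pieces together, a complete query might look something like the sketch below. The table ORDERS and its columns ORDER_NO and ORDER_DT_CHAR are hypothetical names used only for illustration, and the sketch assumes ORDER_DT_CHAR is a CHAR(8) column holding dates as yyyymmdd.

```sql
-- ORDERS, ORDER_NO and ORDER_DT_CHAR are hypothetical names.
-- ORDER_DT_CHAR is assumed to be CHAR(8) holding yyyymmdd.
SELECT ORDER_NO,
       DATE(SUBSTR(ORDER_DT_CHAR,5,2) || '/' ||
            SUBSTR(ORDER_DT_CHAR,7,2) || '/' ||
            SUBSTR(ORDER_DT_CHAR,1,4)) AS ORDER_DATE
FROM   ORDERS
WHERE  DATE(SUBSTR(ORDER_DT_CHAR,5,2) || '/' ||
            SUBSTR(ORDER_DT_CHAR,7,2) || '/' ||
            SUBSTR(ORDER_DT_CHAR,1,4)) >= CURRENT DATE - 30 DAYS;
```

Because the column is wrapped in functions, DB2 is unlikely to use an index on it for this predicate, which is one reason the approach may not perform well. Any row holding an invalid date string will also cause the DATE function to fail.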
Now, a quick word about using proper data types. I say this all of the time, but there are many applications and implementations "out there" that do not heed the advice: it is wise to use the DATE data type when you store dates in DB2 tables. It simplifies life later on when you want to do things like formatting dates and performing date arithmetic.
Using the appropriate data type also ensures that DB2 will perform the proper integrity checks on the columns when data is entered, instead of requiring application logic to ensure that valid dates are entered.
Wednesday, October 29, 2008
Q: I have a DATE column in a DB2 table, but I do not want it to display the way DB2 displays it by default. How can I retrieve a date from a DB2 table column in the format MM/DD/YYYY?
A: The simplest way to return a date in the format you desire is to use the built-in scalar function CHAR. Using this function you can convert a date column into any number of formats. The specific format you request, MM/DD/YYYY, is the USA date format. So, for example, to return the date in the format you requested for a column named START_DATE you would code the function as follows:
CHAR(START_DATE,USA)
The first argument is the column name and the second argument is the format. Consult the following table for a list of the date formats that are supported by DB2.
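| Format | Layout | Example |
| ISO | yyyy-mm-dd | 2008-10-29 |
| USA | mm/dd/yyyy | 10/29/2008 |
| EUR | dd.mm.yyyy | 29.10.2008 |
| JIS | yyyy-mm-dd | 2008-10-29 |
| LOCAL | Locally defined layout | |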
You may also have an installation-defined date format that would be named LOCAL. For LOCAL, the date exit for ASCII data is DSNXVDTA, the date exit for EBCDIC is DSNXVDTX, and the date exit for Unicode is DSNXVDTU.
Of course, this is a simple date question... I will follow up with some additional date-related questions and answers in my next couple of blog posts.
Wednesday, October 22, 2008
Monday, October 20, 2008
Just as important as technical acumen, though, is the ability to carry oneself properly and to embrace the job appropriately. With this in mind, I wrote a series of blog entries on DBA Rules of Thumb over at my Data Management Today blog... and I thought the information I wrote there may be helpful to my DB2 and mainframe readership here, so I'm sharing the eight rules of thumb (with links) here on my DB2 Portal blog:
- Document Everything!
- Automate Intelligently
- Don't Panic!
- Focus Your Efforts
- Invest In Yourself
- Develop Business Acumen
P.S. Just a reminder that I will be presenting a webinar on assuring DB2 recoverability with my colleague, Michael Figaro, this Thursday, October 23, 2008 at 10:30 Central time. If you are at all interested in the topic, be sure to register today - and attend this Thursday!
Thursday, October 09, 2008
But how many of us can answer, with any degree of certainty, the question “How long will this outage last?” There are many variables that need to be considered when estimating a DB2 recovery time: backups available, quality, point-in-time requirements, amount of log processing, disk speed, tape mounts, and on and on and on...
With these thoughts in mind, Michael Figaro and I will be delivering a webinar titled Assuring the Recoverability of Your DB2 Databases, on Thursday, October 23, 2008 at 10:30 am CDT.
We will tackle issues ranging from regulations, IT complexity, and business continuity, to DSNZPARMs and backup/recovery planning. We’ll also make the case that planning for database recoverability is the most important task among the many tasks of the DBA.
As part of the webinar we will introduce and demonstrate Recovery AssuranceExpert, a new technology to help you ensure that all of your critical DB2 objects are recoverable within your recovery time objectives. Recovery AssuranceExpert is an automated solution to perform daily health checks of data availability and recoverability, as well as provide actual recovery times required for a DB2 object, a complete application, or even a whole DB2 subsystem. Join us on October 23rd to find out how you can ensure that your actual recoverability times fit into your SLAs.
Tuesday, September 30, 2008
As more of the tasks required of DBAs become more automated, the DBA will be freed to expand into other areas. So one front on this storm is the autonomic computing initiatives that automate DBA tasks. At the same time, IT professionals are being asked to know more about the business instead of just knowing the technology. So DBAs need to understand the business purpose and definition of the data they manage, as well as the technological underpinnings of the DBMS. The driving force here is predominantly regulatory compliance. This second front of the perfect storm will cause DBAs to work more closely with metadata to drive database archiving, data auditing, and security to ensure their organization complies with regulations like Sarbanes-Oxley, HIPAA, and others.
Regarding the wireless aspect of things, pervasive devices (PDA, handhelds, cell phones, etc.) will increasingly interact with database systems. DBAs will need to get involved there to ensure successful data synchronization. And database systems will work with disconnected data seamlessly, such as data generated by RFID tags.
Yet another big database trend is technology "suck." By that I mean that the DBMS sucks up technologies and functions that previously required you to purchase separate software. Remember when the DBMS had no ETL or OLAP functionality? Those days are gone. This will continue as the DBMS adds capabilities to tackle more and more IT tasks.
Another trend impacting DBAs will be a change in some of their roles as more and more of the recent DBMS features actually start being used in more production systems.
The net result of this perfect storm of changes is that data professionals are absolutely being required to do more... sometimes with less (less time, less money, less staff, etc.)
If you know the technology but are then required to know the business, this is doing more – much more. But the technology, in many cases, is also expanding. For example, DB2 9 incorporates native XML. Most DBAs are not XML savvy, but increasingly they will have to learn more about XML as the DBMS technology expands. And this is just one example.
Additionally, data is growing at an ever-increasing rate. Every year the amount of data under management increases (some analysts peg the compound annual rate of data growth at 125%) and in many cases the number of DBAs to manage that growing data is not increasing, and indeed, could be decreasing.
And budgetary limitations can cause DBAs to have to do more work, to more data, with fewer resources. When a company reduces budget but demands more work, automation is an absolute necessity. Turning work over to the computer can help (although it is unlikely to solve every administrative issue). Sometimes IT professionals fight against the very thing they excel in – that is, automating work. If you think about it, every computer program is written to automate someone’s work – the writer (word processing), the accountant (financials, payroll, spreadsheets), and so on. This automation did not put the executives whose work was automated out of a job; instead it made them more efficient. Yet, for some reason, there is a notion in the IT industry that automating IT tasks will eliminate jobs. You cannot automate a DBA out of existence – but you can make that DBA’s job more effective and efficient with DBA tools and autonomic computing.
And the sad truth of the matter is that there is still a LOT more that can, and should, be done in most companies. We can start with better automation of DBA tasks, but we shouldn't stop there!
Corporate governance is hot – that is, technologies to help companies comply with governmental regulations. Software to enable archiving for long-term data retention, auditing to determine who did what to which piece of data, and security to better protect data are all hot data technologies right now. But database security needs to be improved and technologies for securing and auditing data need to be more pervasively implemented.
Metadata is increasing in importance. As data professionals really begin to meld together technology and business, they find that metadata is imperative. But most organizations do not have a fully populated, up-to-date metadata repository that acts as a lexicon for business data.
And finally, something that isn’t nearly hot enough is data quality and integrity. Tools, processes, and database options that can be used to make data more accurate and reliable are not implemented appropriately with any regularity. So the data stored in our corporate databases is suspect. According to Thomas C. Redman, data quality guru, poor data quality costs the typical company at least ten percent (10%) of revenue. That is a significant cost! Data quality is generally bad in most organizations – and more needs to be done to address that problem.
With all of these thoughts in mind, are you prepared to weather this perfect storm?
Tuesday, September 23, 2008
Database auditing, sometimes called data access auditing, is one technique growing in popularity as a response to the demands of regulatory compliance. At a high level, database auditing is basically a facility to track the use of database resources and authority. It can be used to help answer questions like “Who accessed or changed data?” and “What was actually changed?” and “When did it change?”
But how you implement your database auditing, especially in a mainframe environment, will have a significant impact on not just "the completeness" of what you capture in the audit trail, but on the performance and availability of your entire environment.
Join me on Wednesday, September 24, 2008 at 10:30 am, Central Daylight Time, for a free webinar where I will discuss the issues and requirements driving database auditing. This presentation can help to serve as a roadmap of sorts for your data access auditing needs.
Monday, September 22, 2008
Mainframe Executive magazine just published my article on mainframe database auditing. Click here to read all about it: What Every Good CIO Needs to Know About Mainframe Database Auditing.
Friday, September 05, 2008
Anyway, why do I place recoverability at the very top of the DBA task list? Well, if you cannot recover your databases after a problem then it won’t matter how fast you can access them, will it? Anybody can deliver fast access to the wrong information. It is the job of the DBA to keep the information in our company’s databases accurate, secure, and accessible.
So what do we need to do to assure the integrity of our database data? First we need to understand the availability needs of our data in terms of the business. In the event of a failure how rapidly must we be able to recover from that failure? Keep in mind that the failure could be either physical, such as a failed disk drive, or logical, such as applying the wrong input to a process which corrupts the database.
Only after we know the impact to the business can we develop an appropriate backup and recovery plan. We need service level agreements (SLAs) for recovery just like we have SLAs for performance. The recovery SLA, or Recovery Time Objective (RTO), needs to be stated from an application perspective, such as “Time to restore application availability after a failure for application X cannot exceed 2 hours (or 10 minutes or …)”
To create effective RTOs you must be able to answer the question “What is the cost of not having this data available?” When we know the expectations of the business we can work to create a backup and recovery plan that matches the requirements. There are multiple techniques and methods for backing up and recovering databases. Some techniques, while more costly, can enhance availability by recovering data more rapidly.
It is imperative that the DBA team creates an appropriate recovery strategy for each database object. This requires mapping database objects to applications so we can adopt the proper strategy in accordance with RTOs. Some database objects will participate in multiple applications, and their recovery strategy will therefore be more complex.
Not all data is created equal. Some of your databases and tables contain data that is necessary for the core of your business. Other database objects contain data that is less critical or easily derived from other sources. Armed with this information -- and our RTOs -- a DBA can create a recovery plan that matches the needs of the business.
Establishing a reasonable backup schedule requires you to balance two competing demands: the need to take image copy backups frequently to assure reasonable recovery time, and the need to take image copies infrequently so as not to interrupt daily business. Keep in mind that if you make fewer image copies you will need to apply more log records during the recovery, and the recovery will take longer. The DBA must balance these competing objectives based on SLAs, usage criteria, and the capabilities of the DBMS.
When was the last time you re-evaluated and tested your backup and recovery plans? Oh, you may have looked at disaster plans, but have you examined your ability to recover locally? Do you know how long it would take to recover your most important primary customer tables, for example, if you took a hit in the middle of the day?
Regular recoverability health checking should be a standard documented responsibility for every DBA staff; and if you can acquire software to automate the health-check process, all the better.
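Even without a tool, a simple catalog query can serve as a crude first health check. The sketch below assumes DB2 for z/OS, where image copies are recorded in SYSIBM.SYSCOPY with ICTYPE = 'F' for full copies; the seven-day threshold is an arbitrary number I chose for illustration, so substitute whatever your own RTOs imply.

```sql
-- List table spaces whose most recent full image copy is more than 7 days old.
-- The 7-day threshold is an arbitrary illustrative value.
SELECT DBNAME,
       TSNAME,
       MAX(TIMESTAMP) AS LAST_FULL_COPY
FROM   SYSIBM.SYSCOPY
WHERE  ICTYPE = 'F'
GROUP BY DBNAME, TSNAME
HAVING MAX(TIMESTAMP) < CURRENT TIMESTAMP - 7 DAYS
ORDER BY LAST_FULL_COPY;
```

This only tells you how stale the last full copy is. It says nothing about table spaces that have never been copied at all, and nothing about how long the recovery itself would take, so treat it as a starting point rather than a recoverability guarantee.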
SEGUS offers a nice option for checking the recoverability of your DB2 databases called Recovery AssuranceExpert. Using this automated tool you can monitor the recoverability of your DB2 environment including DB2 settings (such as DB2 logging, buffer pools, DSNZPARMs), recovery prerequisites, recovery service levels, and recovery time objectives for your database objects.
When was the last time you tested recovery? Are you sure you can recover your DB2 databases within a satisfactory timeframe? Wouldn’t you sleep better if you had a methodology and process in place for doing so? I know I would…
Thursday, September 04, 2008
Actually, the question asked what kind of a performance impact might be expected if a query was issued against two similar tables. The first table had (say) 20 columns, and the second table had the same 20 columns, as well as 35 additional columns.
Well, most of the basic responses were similar. The consensus was that as long as the query was going against the same columns then performance should be about the same. I disagree. Here is why.
You also need to factor in the I/O requests that are required to return the data. The DBMS will perform I/O at the block (or page) level - this is so whether you return one row or millions of rows. For multi-row results, accessing data from the table with the wider row (more columns) will usually be less efficient. This is so because fewer rows will exist on each page (the row with 20 columns is smaller than the row with 55 columns so more rows can reside in a single, pre-sized block/page). The bigger the result set, the more pronounced the performance degradation can be (because more physical I/Os are required to retrieve the data).
Think about it this way. Is it faster to pull smaller peaches out of a basket than bigger peaches? That is about the same type of question and anybody can envision the process. Say you want 100 peaches. Small peaches fit 25 per basket; big peaches fit ten per basket. To get 100 small peaches you'd need to pull 4 baskets from the shelf. To get 100 big peaches you'd need to pull 10 baskets from the shelf. The second task will clearly take more time.
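The same arithmetic can be run for pages instead of baskets. In the sketch below the peach numbers come from the example above, while the 10,000-row result set and the rows-per-page figures (40 for the narrow table, 15 for the wide one) are purely hypothetical values chosen to show the effect. SYSIBM.SYSDUMMY1 is assumed to be available as the one-row dummy table.

```sql
-- Baskets (or pages) needed = rows wanted / rows that fit per basket (or page), rounded up.
SELECT CEILING(100 / 25.0)   AS SMALL_PEACH_BASKETS,  -- 4
       CEILING(100 / 10.0)   AS BIG_PEACH_BASKETS,    -- 10
       CEILING(10000 / 40.0) AS NARROW_TABLE_PAGES,   -- 250 (hypothetical 40 rows per page)
       CEILING(10000 / 15.0) AS WIDE_TABLE_PAGES      -- 667 (hypothetical 15 rows per page)
FROM   SYSIBM.SYSDUMMY1;
```

The actual rows-per-page figures depend on the page size, the column data types, and compression, so your real numbers will differ, but the direction of the effect is the same.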
Of course, the exact performance difference is difficult to calculate - especially over an online forum and without knowledge of the specific DBMS being used. But there will, more than likely, be a performance effect on queries when you add columns to a table.
Wednesday, September 03, 2008
When does it make more sense not to build an index for a DB2 table?
I'll attempt to answer this question for any SQL DBMS, not just for DB2:
First of all, this is a very open-ended question, so I will give a high-level answer. Let's start by saying that most of the time you will want to build at least one - and probably multiple - indexes on each database table that you create. Indexes are crucial for optimizing performance of SQL access. Without an index, queries must scan every row of the table to come up with a result. And that can be very slow.
OK, that being said, here are some times when it might make sense to have no indexes defined on a table:
- When all accesses retrieve every row of the table. Because every row will be retrieved every time you want to use the table, an index (if used) would just add extra I/O and would decrease, not enhance, performance. Though not extremely common, you may indeed come across such tables in your organization.
- For a very small table with only a few pages of data and no primary key or uniqueness requirements. A very small table (perhaps 10 or 20 pages) might not need an index because simply reading all of the pages is very efficient already.
- When performance doesn't matter and the table is only accessed very infrequently. But, hey, when do you ever have that type of requirement in the real world?
Other than under these circumstances, you will most likely want to build one or more indexes on each table, not only to optimize performance, but also to ensure uniqueness, to support referential integrity, and perhaps to drive data clustering.
Of course, indexes do not come without cost. Every index takes up disk space, so the more indexes you add, the more space you consume. For some DBMS products, adding many indexes can impact the working set size and perhaps raise memory problems. Additionally, although indexes speed up queries they degrade inserts and deletes, as well as any modification to indexed columns.
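For the common case where you do want indexes, the definitions might look something like this sketch. The table ORDERS, its columns, and the index names are all hypothetical, and the CLUSTER keyword is DB2 syntax; other DBMS products specify clustering differently.

```sql
-- ORDERS, ORDER_NO, CUSTOMER_NO and the index names are hypothetical.

-- A unique index to enforce uniqueness of the key and support primary key access.
CREATE UNIQUE INDEX XORDERS_ORDNO
    ON ORDERS (ORDER_NO ASC);

-- A non-unique index on a frequently searched column,
-- also designated as the clustering index for the table.
CREATE INDEX XORDERS_CUSTNO
    ON ORDERS (CUSTOMER_NO ASC)
    CLUSTER;
```

Each additional index like these has to be maintained on every insert and delete, and on every update to ORDER_NO or CUSTOMER_NO, which is the cost side of the trade-off.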
What do you think? Are there other situations where a table should have no indexes? Are there any pertinent high-level issues I missed? Feel free to add your thoughts and comments below!
Friday, August 22, 2008
If you are interested in this topic I will be conducting a free webinar titled Data Breach Protection: From a Database Perspective on Wednesday, August 27, 2008 at 10:30 am CDT. This presentation will provide an overview of the data breach problem, providing examples of data breaches, their associated cost, and a series of best practices for protecting your valuable production data.
This webinar offers you the opportunity to:
- Understand the various laws that have been enacted to combat data breaches and the trends toward increasing legislation
- Learn how to calculate the cost of a data breach based on industry best practices and research from leading analysts
- Gain knowledge of several best practices for managing data with the goal of protecting the data from surreptitious or nefarious access (and/or modification)
- Learn about the available techniques for securing, encrypting, and masking data to minimize exposure of critical data
- Uncover new data best practices for auditing access to database data and for protecting data stored for long-term retention
Monday, July 28, 2008
Well, my first reaction was to think "this guy doesn't understand the way a SQL DBMS like DB2 works." The data in DB2 tables is not ordered, so there is no way to guarantee that the rows are odd or even numbered. While that observation may (or may not) have been true, it didn't help the guy. So I thought about it and came up with a possible work-around solution.
The first thing we have to do is to mimic row numbers in DB2. Until V9, DB2 did not support the row number construct (such as you can find in Oracle), and we'd like this to work for the versions in support today (V8 and V9).
So, to do this we start by using the COUNT(*) function and a table expression. A table expression is when you substitute SQL in place of the table in the FROM clause of another SQL statement. For example, consider this SQL:
SELECT DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB;
That puts a pseudo-row number on the table that we can access in our SQL predicates. If, say, we only want to return the even results, we could write the following query:
SELECT DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB
WHERE MOD(ROWNUM,2) = 0
ORDER BY ROWNUM;
The MOD function returns the remainder of dividing the second argument into the first. So, if the remainder is zero, we have an even number. So, this query returns every other row to the result set. If you want the odd rows only, change the predicate with the MOD function to this:
WHERE MOD(ROWNUM,2) <> 0
Of course, there is no guarantee that the same exact rows will be even (or odd) for subsequent executions of this query. It all depends how DB2 optimizes the query for execution. But it does provide a nice way to produce samples of the data (perhaps to populate a test bed of data).
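If you are already on DB2 9 and do not need the query to run on V8, the ROW_NUMBER function offers a more direct route to the same result. The following is a sketch under that assumption; the aliases RN and NUMBERED are names I made up, and on a V9 subsystem the sample table qualifier will likely differ from DSN8810.

```sql
-- DB2 9 alternative: number the rows with ROW_NUMBER, then keep the even ones.
-- RN and NUMBERED are illustrative aliases.
SELECT DEPTNO, RN
FROM   (SELECT DEPTNO,
               ROW_NUMBER() OVER (ORDER BY DEPTNO) AS RN
        FROM   DSN8810.DEPT) AS NUMBERED
WHERE  MOD(RN, 2) = 0
ORDER BY RN;
```

As with the COUNT(*) approach, changing the predicate to MOD(RN, 2) <> 0 returns the odd rows instead.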
Thursday, July 24, 2008
Learn all about an exciting new solution for auditing your DB2 for z/OS databases and resources - Guardium for Mainframes - at this free webinar on July 29, 2008.
Guardium for Mainframes provides 100% visibility into mainframe database activities without impacting normal business operations. This webinar will show you how to get better insight into database activity without the performance penalty of typical database trace utilities and without relying on inadequate log file data.
I'll be introducing the webinar and giving a quick overview of the issues, and Bill Baker, a senior software consultant with NEON Enterprise Software, will walk through a demonstration of Guardium for Mainframes in action!
Monday, July 21, 2008
A RedPaper is sort of like a tip, only longer... and sort of like a RedBook, only shorter... Anyway, if you are interested in the topic, the RedPaper can be downloaded for free by following this link:
DB2 9 for z/OS Data Sharing: Distributed Load Balancing and Fault Tolerant Configuration
Monday, July 07, 2008
When I spoke at the Techxans event in Houston this past May (2008) I was interviewed beforehand on what my presentation would cover. And lo and behold, the Techxans folks have put that interview up on YouTube, so I thought I'd share it here with my regular blog readers. Enjoy!
Wednesday, June 25, 2008
Q: We have a CHAR(10) column that cannot contain alphabetic characters. How can we make sure that the letters A thru Z are not allowed?
A: Well, think about the characteristics of alphabetic characters versus the other "things" that can be stored in a CHAR column. One thing that separates an alphabetic letter from numbers, punctuation, etc. is that there are upper and lower case versions (e.g. A, a). So, you could use the following predicate to preclude alphabetic characters from being accepted:
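A predicate along these lines fits that description. This is a sketch only: CODE_VALUE and CUSTOMER_CODES are hypothetical names standing in for your CHAR(10) column and its table.

```sql
-- CODE_VALUE and CUSTOMER_CODES are hypothetical names.
-- A value with no alphabetic characters is unchanged by case conversion,
-- so this predicate is true only when the column contains no letters:
--     UPPER(CODE_VALUE) = LOWER(CODE_VALUE)

-- Find existing rows that violate the rule (they contain at least one letter).
SELECT CODE_VALUE
FROM   CUSTOMER_CODES
WHERE  UPPER(CODE_VALUE) <> LOWER(CODE_VALUE);
```

Digits, blanks, and punctuation have no upper or lower case form, so they pass the test untouched.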
Of course, you will not be able to put this into a CHECK constraint because of restrictions on their content (for example, you cannot use functions in a CHECK constraint). But you could use this in SQL statements and as a check in your programs before allowing data to be inserted or modified in your CHAR(10) column.
Anyone else have any other ideas?
Monday, June 16, 2008
The Gartner middleware market numbers were reported in a recent article in eWeek. Evidently, the worldwide application infrastructure and middleware software market revenue totaled $14.1 billion in 2007, a 12.9 percent increase from 2006 revenue of $12.5 billion.
Now that is quite healthy growth in what is a somewhat slow market. And right there at the top of the pile is IBM with a 28.9 percent share of what Gartner identifies as the AIM market...BEA Systems came in second with 9.3 percent of the market, followed by Oracle with 8.5 percent. However, Oracle now owns BEA and will benefit from BEA's market share (next year).
Oracle will likely continue its acquisitive ways, but IBM has not been silent on the acquisition front lately either. So I'm guessing that next year IBM will retain its #1 position with Oracle coming in solidly at #2.
For 2007, though, in terms of growth, Microsoft and Software AG posted impressive gains. Among the big enterprise software vendors, Microsoft came in at 41.6 percent revenue growth year over year. And Software AG showed strong growth with a 107 percent increase from 2006.
This is a market segment, like database software, where a small number of big players own most of the market. However, it is not quite as monopolized as the database market where three players (IBM, Oracle, and Microsoft) dwarf the rest of the field. The top five middleware vendors hold over 50 percent of the overall market and Gartner indicates that the big players are slowly eroding market share from the smaller vendors.
Monday, June 09, 2008
This week (the second week of June) I will be traveling to Washington, DC to speak to the Baltimore-Washington DB2 User Group (BWDUG) on June 11th, where I will deliver "The Impact of Regulatory Compliance on Database Administration." And then, later in the week, on June 13th, I will be in Tampa to speak on the topic of database auditing to the Tampa Bay Relational User Group (TBRUG).
And the week after that I will be speaking to the Chicago chapter of DAMA on June 18th, on the topic of "Managing Data For Long Retention Periods."
So, if you are in one of the regions where I'll be speaking, I hope you can take the time to attend. And if not, you can always keep track of my speaking schedule on my web site at http://www.craigsmullins.com/speak.htm.
Wednesday, May 28, 2008
I'll touch on trends such as regulatory compliance issues, e-discovery, operational performance improvement, and retiring legacy applications. After examining the forces driving the need to archive database data, we'll look at the requirements for implementing database archiving appropriately, and walk thru an example using TITAN Archive.
If your databases are bursting at the seams, your organization is experiencing compliance-related troubles and/or lawsuits, or you need to figure out how to sunset an old database application or two, this presentation will provide guidance, advice, and a workable template for you to follow.
I hope you can find the time to attend!
Monday, May 26, 2008
It is titled Dimensional Modeling: In a Business Intelligence Environment. The book is not intended to be an academic treatise, but a practical guide for implementing dimensional models oriented specifically to business intelligence systems.
Particularly interesting are the case studies in Chapters 7 and 8 that walk you through BI implementations.
Download it today... and enjoy it at your leisure.
Thursday, May 22, 2008
From the start of the festivities on Monday with the welcome address and keynote session (which can be downloaded here) to the traditional IBM panel and closing session today, IDUG offered consistently high quality education and unparalleled networking opportunities for DB2 professionals.
Usually I blog about the sessions I attend but this year I used Twitter instead to micro-blog the highlights of the sessions I attended right from the sessions using my Treo. I hope you followed my Twitter posts (Tweets, they're called). But even if you didn't it is not too late to follow me on Twitter at www.twitter.com/craigmullins.
One thing I would like to mention, though, is that it looks like the Special Interest Groups are finally being taken seriously. Used to be that the SIGs were put on the schedule late in the day and almost nobody showed up. This year, there were more SIGs and they were scheduled at better times throughout the day - and people showed up for them... and participated. I very much enjoyed participating as a subject matter expert in the Changing Role of the DBA SIG, and I attended a couple other SIGs that were very worthwhile, too!
If you didn't get to the conference this year (or even if you did and missed a few sessions) IDUG will be making audio recordings from this year’s technical sessions available on the IDUG Online Learning Center in July 2008. Full-conference attendees get twelve complimentary downloads with their registration. If you did not attend, individual sessions can be downloaded for a nominal fee. You can check out the IDUG Online Learning Center here (again, that is where the session downloads will be).
And if you just want to voyeuristically take a look at what you missed, you can check out photos from this year's conference online at http://idug2008northamerica.site.shutterfly.com/.
Thanks for another great event, IDUG... and hopefully we'll see you next year in Denver, CO (May 11-15, 2009).
Wednesday, May 14, 2008
I put up Twitter feeds on my home page and here on my blog, too (it is over there on the right). I'm not sure if I'll stick with Twittering long-term, but I probably will - it is a bit addictive. If you want to try it out yourself, click on the follow me on Twitter link over on the right hand side of this page - or click here if you don't want to be bothered tracking it down over there!
I noticed, too, that Willie Favero will be twittering during the upcoming IDUG conference next week and since I can recognize a good idea when I hear/read/see one (good idea, Willie), I think I'll try it, too. So sign up on Twitter before next week if you want to virtually attend IDUG by following our twittering.
Monday, May 12, 2008
A variety of trends and issues are contributing to the growing requirement within enterprises to archive database data for long-term retention and preservation. This webinar will review the trends driving database archiving, including regulatory compliance issues, e-discovery, operational performance improvement, and retiring legacy applications. After examining the driving forces for database archiving, we will walk through the basic steps required to implement a database archiving practice based on best practices.
If your databases are bursting at the seams, your organization is experiencing compliance-related troubles and/or lawsuits, or you need to figure out how to sunset an old database application or two, this presentation will provide guidance, advice, and a workable template for you to follow.
Monday, April 28, 2008
On April 30th, 2008 I'll be speaking at Alabama DB2 User Group on the topic of Managing Data For Long Retention Periods.
Then, on May 2nd, I mosey on over to Dallas to speak on two topics at the DB2 Forum meeting. I'll cover database auditing in one talk and the other will be my "famous" DB2 Top 10 Lists presentation.
The following week, on May 8th, I'll be in Arizona to discuss The Impact of Regulatory Compliance on Database Administration at SWARUG.
And in my last presentation before IDUG, I'll be giving a shortened version of the regulatory compliance presentation in my hometown of Houston, TX at the Techxans: CIO Speaker Forum.
So maybe I'll see you on the road... and, if not, I hope to see you in Dallas for IDUG the week of May 18 thru 22, 2008. I've got a presentation on data breaches from a database perspective (4 PM on Tuesday), and I'll also be leading a Special Interest Group discussion on the changing role of the DBA (9:15 AM on Thursday). You can see the entire agenda here on IDUG's web site.
Tuesday, April 22, 2008
Anyway, the following three articles might be of interest to DB2 for z/OS folks:
Use Real Time Statistics to Automate Your Database Maintenance was published in the April/May 2008 issue of z/Journal. This article examines Real Time Statistics (RTS) and the benefits that can be accrued by using RTS. If you aren't using RTS yet, be sure to read this article to learn why you should!
Collecting Histogram Statistics With RUNSTATS was published in the March 2008 issue of DB2 Update. This article discusses one of the many new enhancements that have found their way into DB2 9 for z/OS -- the ability to gather histogram statistics with the IBM RUNSTATS utility.
And finally, the February/March 2008 issue of z/Journal contains Much Ado About DB2 Locking. This installment of the z/Data Perspectives column takes a look at the most recent locking-related features of DB2 for z/OS.
Thursday, April 17, 2008
Here is the list for those not inclined to click on the link:
1. Lowest outage costs from highest platform reliability, availability, and serviceability.
2. Lowest security breach risks/costs via most secure design, encryption, etc.
3. Highest resource use efficiency/utilization for mixed commercial workloads.
4. Widest platform scalability supports any workload size, mix, growth.
5. Consolidates many new workloads, extends traditional workload strengths.
6. Top data-serving capacity, performance, value—best Information on Demand host.
7. Highest QoS, best performance with fastest response times.
8. Best enterprise SOA platform; enables fullest reuse of mainframe application assets.
9. Much-improved cost model transformed mainframe economics.
10. Lowest power consumption, cooling, and data center floor space needs.
11. Lowest staffing and support costs for enterprise workloads.
12. Lowest total cost of ownership, total cost per user, and total cost per transaction.
13. Best customer investment protection for any enterprise platform.
14. Lowest business risk platform with best world class support.
15. Healthy, expanding mainframe ecosystem is supporting the platform.
If you are a mainframer this list won't come as any surprise to you... but it can be handy to keep it readily available for the next time someone attempts to convince you that mainframes are already obsolete, or should be.
In fact, maybe you can come up with additional reasons. After reading the list (http://www.mainframe-exec.com/articles/?p=12) feel free to submit comments here with any additional reasons you might come up with!
Also, for those who don't know, Mainframe Executive is published by Thomas Communications, the same folks who publish the excellent bi-monthly z/Journal.
Thursday, April 10, 2008
For example, consider the following query:
SELECT D.DEPTNO,
       MIN(D.DEPTNAME) AS DEPT_NAME,
       MIN(D.LOCATION) AS DEPT_LOCATION,
       SUM(E.SALARY) AS TOTAL_SALARY
FROM   DEPT D,
       EMP E
WHERE  D.DEPTNO = E.WORKDEPT
AND    E.BONUS BETWEEN 0.00 AND 1000.00
GROUP BY D.DEPTNO;
In this query, the detail rows that qualify from each table are joined prior to the GROUP BY processing. In general, there will be more EMP rows than DEPT rows because a department comprises multiple employees. Suppose there were 200 DEPT rows joined to 75,000 EMP rows. The join is done and then the GROUP BY is processed.
Instead, you can use table expressions to force the optimizer to process the aggregations on a table-by-table basis:
SELECT D.DEPTNO,
       D.DEPTNAME AS DEPT_NAME,
       D.LOCATION AS DEPT_LOCATION,
       E.TOTAL_SALARY
FROM   DEPT D,
       (SELECT WORKDEPT, SUM(SALARY) AS TOTAL_SALARY
        FROM   EMP E
        WHERE  E.BONUS BETWEEN 0.00 AND 1000.00
        GROUP BY E.WORKDEPT) AS E
WHERE  D.DEPTNO = E.WORKDEPT;
This will produce the same results but it should perform better.
In general, consider using table expressions to pre-filter FULL JOIN tables, to pre-filter null supplying tables of LEFT/RIGHT joins, to separate GROUP BY work, and to generate or derive data.
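As a sketch of the second case, pre-filtering the null-supplying table of a LEFT join, the bonus filter can be moved into a table expression so that only qualifying EMP rows reach the join. The LASTNAME column is an assumption here; it does not appear in the queries above.

```sql
-- EMP is the null-supplying table of the LEFT join. Filtering it inside the
-- table expression keeps every DEPT row in the result, including departments
-- with no qualifying employees (their E.LASTNAME comes back as NULL).
-- LASTNAME is an assumed column, used only for illustration.
SELECT D.DEPTNO,
       D.DEPTNAME,
       E.LASTNAME
FROM   DEPT D
       LEFT OUTER JOIN
       (SELECT WORKDEPT, LASTNAME
        FROM   EMP
        WHERE  BONUS BETWEEN 0.00 AND 1000.00) AS E
       ON D.DEPTNO = E.WORKDEPT;
```

Had the BONUS predicate been coded in the outer WHERE clause instead, the NULL-extended rows would fail the test and be discarded, and the query would behave like an inner join.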
Thursday, March 27, 2008
This new edition of an all-time favorite RedBook is newly updated to show the changes that have happened to DB2 stored procedures and related tools from V8 to V9. It offers examples and guidelines for developing stored procedures in several languages. You will also find many useful recommendations for setting up and tuning your environment for stored procedures in this free-to-download manual.
And if you are looking for some "stuff" on using Data Studio with stored procedures, this is the place to go... so, it is time to update by downloading this new edition today!
Wednesday, March 19, 2008
Thanks to everyone who voted for my book… I appreciate your support.
Tuesday, March 18, 2008
Pearson and IBM Press have just published a new eBook on IBM database technology that is available for free download. Just follow this link, provide your e-mail, and they’ll direct you to a PDF containing sample chapters from six recently published books.
The information in the eBook comes from the following books:
- Understanding DB2: Learning Visually With Examples, Second Edition by Chong, Wang, Dang, and Snow - Chapter 2: DB2 at a Glance: The Big Picture
- DB2 9 for Linux, UNIX, and Windows, Sixth Edition by Baklarz & Zikopoulos - Chapter 8: pureXML Storage Engine
- Understanding DB2 9 Security by Bond, See, Wong & Chan - Chapter 11: Database Security — Keeping It Current
- DB2 SQL PL: Essential Guide for DB2 UDB on Linux, UNIX, Windows, iSeries by Janmohamed, et al - Chapter 3: Overview of SQL PL Language Elements
- Mining the Talk by Spangler & Kreulen - Chapter 5: Mining to Improve Innovation
- An Introduction to IMS by Meltz, et al - Chapter 18: Application Programming in Java
But make sure you download that free eBook... after all, what is better than free?
Monday, March 03, 2008
Question: Let's say I have a table A which has 500 columns. Out of those 500 columns only 5 columns have been defined as not nullable and the rest have been defined as NULLS allowed. And out of those 500 columns I have found that 300 columns are totally unused (empty). My business allows me to remove those 300 columns. My doubt is if I remove those 300 empty columns will I save on DASD space occupied by DB2? Will empty columns occupy DASD space? It would be really helpful if you can guide me on this.
Answer: I'm happy to try to help out. First of all, the short answer to your question is "Yes!" Those 500 columns are all consuming valuable disk space. To determine how much space is being consumed, you will need to examine the data type and length assigned to each column and add them up. And to make matters worse, you must add an additional 1 byte to each of them because the columns are nullable.
In DB2, a NULL is stored using a special one-byte null indicator that is "attached" to every nullable column. If the column is set to NULL, then the indicator field is used to record this. Using NULL will never save space in a DB2 database design - in fact, it will always add an extra byte for every column that can be NULL. The byte is used whether or not the column is actually set to NULL.
So, a column defined as CHAR(5) NOT NULL will require five bytes of storage space - but if it is defined as nullable, then it requires six bytes of storage space - five bytes for the data, and one byte for the null indicator.
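To put a number on the null-indicator overhead alone, you could count the nullable columns in the DB2 catalog. This is a rough sketch: 'MYSCHEMA' is a hypothetical creator name, 'A' is the table from the question, and the result covers only the indicator bytes, not the data lengths of the columns themselves.

```sql
-- Each nullable column costs one null-indicator byte in every row,
-- so the count of nullable columns is the indicator overhead per row.
-- 'MYSCHEMA' is a hypothetical creator; 'A' is the table from the question.
SELECT COUNT(*) AS NULL_INDICATOR_BYTES_PER_ROW
FROM   SYSIBM.SYSCOLUMNS
WHERE  TBCREATOR = 'MYSCHEMA'
AND    TBNAME    = 'A'
AND    NULLS     = 'Y';
```

For the table described in the question, with 495 of its 500 columns nullable, that works out to 495 bytes per row for null indicators before any column data is counted.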
Given all of this, it would seem that there is a very viable case to be made for you to remove those columns that are not being used. Of course, this means that you will likely have to make changes to any programs accessing that table. Because the table definition will change (fewer columns) you will need new DCLGENs and those will have to be included and bound into your programs. Be sure to factor this additional workload into your planning before moving forward with this change.
The better question to ask is "How the heck did all of the empty columns get put into the table to begin with and how did that design get past the DBAs?"
If you have an answer for that one, please share it by posting your answer in a comment here!
Wednesday, February 20, 2008
I've written briefly about COBOL before, in my Data Management Today blog. COBOL is still all over the place and in no danger of dying off. According to the Computerworld article, 75% of the world's business data is still processed in COBOL, and about 90% of all financial transactions are in COBOL.
Yet there is a lingering perception "out there" that COBOL is dead (or at least dying). And as far as graduating seniors and new programmers are concerned, COBOL ain't cool! New programmers don't want to learn it and most universities don't teach it in their computer science or information science curricula. Just like the mainframe (which is alive and well, too), COBOL is ignored and a big problem is developing.
Analysts at Gartner estimate that there are 180 billion lines of COBOL code in existence and about 90,000 COBOL programmers. To convert all of that to something else "each programmer will require 100,000 hours to complete the conversion of 2 million lines. That works out to 12,500 eight-hour workdays. If we figure 250 workdays per year (though it’s unlikely any Cobol programmers are settling for just two weeks of vacation per year), these guys should be done in 50 years."
Who knows, when I retire (sometime in the far-off future) maybe I'll hang up a shingle and offer my services as a COBOL coder... after all, that is what I started out doing right out of college (all those years ago)...
Tuesday, February 05, 2008
Furthermore, we know that we can guide DB2 on how best to approach this situation using the REOPT parameter of the BIND command. Prior to DB2 V9, there were three options for REOPT:
REOPT(NONE) – DB2 will not reoptimize SQL at run time.
REOPT(ALWAYS) – DB2 will prepare SQL statements again at run time when the host variable values are known. This enables the DB2 optimizer to formulate the query execution plan using the actual host variable values, which can result in better performing access paths.
REOPT(ONCE) – DB2 will prepare SQL statements only once, using the first set of host variable values, no matter how many times the statement is executed by the program. The access path is stored in the Dynamic Statement Cache (DSC) and will be used for all subsequent executions of the same SQL statement. REOPT(ONCE) only applies to dynamic SQL statements and is ignored if you use it with static SQL statements. This option was introduced in DB2 V8.
What is New in V9?
DB2 9 for z/OS introduces a new REOPT option: REOPT(AUTO). The idea behind REOPT(AUTO) is to come up with the optimal access path in the minimum number of prepares.
The basic premise of REOPT(AUTO) is to re-optimize only when host variable values change. Using this option, DB2 will examine the host variable values and will generate new access paths only when host variable values change and DB2 has not already generated an access path for those values.
REOPT(AUTO) only applies to dynamic statements that can be cached.
After migrating to DB2 9, consider re-evaluating programs bound with REOPT(ALWAYS) and REOPT(NONE). In many cases, switching from REOPT(ALWAYS) to REOPT(AUTO) can improve performance. And in some cases you can now use re-optimization, via REOPT(AUTO), for programs that were bound with REOPT(NONE) only out of fear that too-frequent re-optimization would cause a performance hit.
In particular, consider specifying REOPT(AUTO) for SQL statements that at times can take a relatively long time to execute, depending on the values of parameter markers. This is especially worth doing when the parameter markers refer to non-uniform data that is joined to other tables.
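As an illustration of the kind of statement this advice has in mind, here is a dynamic query with a parameter marker on a column whose values might be unevenly distributed, joined to a second table. It is only a sketch: the EMPNO, LASTNAME, and JOB columns are assumed, and the skew in JOB is hypothetical.

```sql
-- A cacheable dynamic statement with a parameter marker on a skewed column.
-- If a handful of JOB values account for most EMP rows and the rest are rare,
-- the best access path can differ from one marker value to the next, which is
-- the situation REOPT(AUTO) is meant to address.
-- Assumed columns: EMPNO, LASTNAME, JOB (the skew is hypothetical).
SELECT E.EMPNO,
       E.LASTNAME,
       D.DEPTNAME
FROM   EMP E,
       DEPT D
WHERE  E.WORKDEPT = D.DEPTNO
AND    E.JOB = ?;
```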
Friday, January 25, 2008
This webinar will be presented by myself (Craig Mullins) and Joe Brockert, Sr. Software Consultant for NEON Enterprise Software. We'll discuss the issues associated with dynamic SQL during a DB2 migration and offer a live demo of Bind ImpactExpert. Join us to see the solution that provides predictability in access path changes.
Enroll by clicking on this link.
Wednesday, January 16, 2008
Anyway, at times I will take a question I get and blog about it in Q+A format. Today is one of those days!
The question was: I want to perform a retry on an INSERT under DB2 Z/OS when I get a deadlock/timeout. -911 causes a rollback automatically. Is there a ZPARM or other method of turning this off? I am inserting millions of rows and do not want a rollback to the last commit point.
Here is my answer:
Well, first of all, let me recommend that you minimize the size of your unit of work. If you are inserting millions of rows without a COMMIT you are likely causing locking issues in your environment. The pages you have locked while you are waiting for your millions of inserts to finish are all unavailable to any other user of the table (assuming page locking). That means any data on any page that you have locked cannot be read by anyone else until your unit of work is committed. Any other user, running at the same time as you are, trying to get to any page you have modified, would be getting -911 too.
That being said, you can control whether or not the work is rolled back automatically in CICS (on a thread basis) using an RDO parameter (or RCT if on an ancient CICS). The parameter is called ROLBE (RCT) or DROLLBACK (RDO). If it is set to YES a CICS SYNCPOINT ROLLBACK is issued and a -911 SQLCODE is returned to the program. If NO is coded, a CICS SYNCPOINT ROLLBACK is not issued and the SQLCODE is set to -913. You will have to programmatically either specify COMMIT or ROLLBACK for the unit of work.
In a batch environment you will need to code your programs to periodically issue COMMITs after so many modifications (or using some other method like a timer or loop counter). There is no method I am aware of to automatically control this behavior outside of looking into a third party product (for example, Softbase Checkpoint Restart, and others).
Basically, blog-tagging is a game, of sorts, that has been crawling its way through the blogosphere for a while now. The way it works, when you are tagged by another blogger, you have to write a blog posting about yourself, with 8 things that others might not know. . . and then tag 8 other bloggers.
So here goes:
- I am an avid music fan. At last count, I have 5,281 CDs and albums (yes, I still have records). I know exactly how many I have because, geek that I am, I keep a list of them in a Filemaker database that I sync up with my Treo. I need that list on my Treo because, without it, I have been known to buy a CD I already own.
- I currently live in Texas, but I was born and raised in Pittsburgh, PA. Go Steelers (we'll get 'em next year)! My Mom, my brother, and his family still live in the Pittsburgh area and I get back to visit them at least once a year.
- I've also lived in the Chicago area. When people ask how I like it in Texas after living up North for so long, I tell 'em "I like it. I basically traded snow for humidity, and you don't have to shovel humidity!"
- I've written two books - DB2 Developer's Guide and Database Administration: The Complete Guide to Practices & Procedures... and I'm working on co-authoring another one on DB2 performance.
- I'm married, and I met my wife while working at PLATINUM technology. Remember them? A lot of good things happened during my days at PLATINUM! In fact...
- At one point, I used to write those monthly DB2 tips you DB2 people used to get in the mail from PLATINUM.
- I currently write four different columns for industry publications, as well as several blogs.
- I own a dog, an English springer spaniel named Jerry... I call him my Jerry Springer spaniel... and two cockatiels.
OK, I guess that means I now have to tag eight others. Willie beat me to the punch on a lot of my favorite DB2 bloggers, though. So I'll tag Peter Armstrong, Chris Foot, Chris Eaton, Trevor Eddolls, Dave Moore, Phil Nelson, Fred Sobotka, and Ralph Wilson.
You folks are now "it"...
Tuesday, January 15, 2008
First up, on The History of Computing Project's site, is this entertaining and informative timeline of mainframe history. The timeline starts in 1939 with the creation of the Atanasoff-Berry Computer at Iowa State. If you are looking for historical events in the life of the mainframe, then this is a good place to start. It contains links to information about, and pictures of, some early mainframes including the ENIAC and the IBM 701.
Another interesting mainframe-related page is at Carnegie-Mellon's Software Engineering Institute site. I point it out not because I agree with the "stuff" written there, but because I find it amusing to see the word "LEGACY" stamped over every inch of the page. Wise up! The mainframe is not just legacy, folks!
And finally a nice little article with the proper perspective on mainframe architecture from IBM. I particularly liked the way this article ended:
As the image of the mainframe computer continues to evolve, you might ask: Is the mainframe computer a self-contained computing environment, or is it one part of the puzzle in distributed computing? The answer is that The New Mainframe is both...
Wednesday, January 09, 2008
With today’s posting we return to our examination of the new features of DB2 9 for z/OS. With V9, DB2 storage groups can be better integrated with SMS storage classes.
Prior to DB2 9, you could only specify SMS storage classes, data classes, and management classes when using explicit IDCAMS defines. You could use those SMS specifications with your SMS ACS routines, but ACS routines filter on data set names, so those routines could become large and unwieldy if you defined multiple different combinations for different data sets.
The improvement in DB2 9 modifies the CREATE and ALTER STOGROUP statements to utilize SMS classes. This can greatly improve ease-of-use by minimizing the manual effort involved in managing DB2 data sets using DFSMS.
There are three new keywords in the CREATE STOGROUP syntax. You can specify just one, two or even all three of them on one CREATE STOGROUP statement:
- DATACLAS - influences characteristics such as the data set control block (DCB), striping, extended format usage, extended addressability usage and so on.
- MGMTCLAS - defines the frequency of volume backups, migration requirements, and related tasks for the data set.
- STORCLAS - defines guaranteed space and other requirements.
DB2 will not check to verify that the data class, management class, or storage class specified actually exists. In that regard, the parameters are designed to work the same way that the VCAT and VOLUMES parameters have always worked. When the STOGROUP is used to allocate a data set, the specified classes are passed to DFSMS, which does the actual work.
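A statement that uses all three keywords might look something like the following. This is a sketch only: the storage group name, the VCAT name, and the three class names are hypothetical, and the classes would have to be defined to DFSMS by your storage administrators.

```sql
-- SGSMS01, DSNCAT, DCDB2, MGMTDB2, and SCDB2 are hypothetical names.
-- DB2 does not validate the class names; they are simply passed to DFSMS
-- when a data set is allocated using this storage group.
CREATE STOGROUP SGSMS01
       VCAT DSNCAT
       DATACLAS DCDB2
       MGMTCLAS MGMTDB2
       STORCLAS SCDB2;
```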
The intent of this posting is not to act as an SMS tutorial. If you wish to investigate the details of SMS in more depth, consult the IBM manual titled -- z/OS DFSMS Implementing System-Managed Storage, SC26-7407.
Additionally, these same parameters have been added to the ALTER STOGROUP statement. When you alter SMS class names of a DB2 STOGROUP, this does not affect the existing data sets. However, if you run the REORG, RECOVER, or LOAD REPLACE utility, DB2 deletes the associated data set and redefines it using the new description of the storage group.
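For example, changing the management class of an existing storage group could be sketched as follows (the storage group and class names are hypothetical):

```sql
-- SGSMS01 and MGMTNEW are hypothetical names.
-- Existing data sets keep their current SMS classes until they are
-- deleted and redefined, for example by REORG, RECOVER, or LOAD REPLACE.
ALTER STOGROUP SGSMS01
      MGMTCLAS MGMTNEW;
```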
Finally, to accommodate the metadata for these new parameters, three new columns have been added to the SYSIBM.SYSSTOGROUP DB2 catalog table: DATACLAS, MGMTCLAS, and STORCLAS.
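A simple catalog query, sketched here with a hypothetical storage group name, shows which SMS classes are recorded for a storage group:

```sql
-- SGSMS01 is a hypothetical storage group name.
-- DATACLAS, MGMTCLAS, and STORCLAS are the three columns added in DB2 9.
SELECT NAME,
       VCATNAME,
       DATACLAS,
       MGMTCLAS,
       STORCLAS
FROM   SYSIBM.SYSSTOGROUP
WHERE  NAME = 'SGSMS01';
```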
Thursday, January 03, 2008
But two recent posts at my other blog may be of interest to readers of my DB2 Portal blog. They deal with the topics of employability and pay -- two topics that are near and dear to the heart of IT and database professionals.
Here are links to those posts:
- DBA Salary Update - a look at a couple of recent salary surveys
- Database Skills Are In Demand - a look at a recent study on IT skills
If you find these posts interesting, subscribe to my Data Management Today blog (via RSS) and/or check in regularly.