Wednesday, December 18, 2019
High Level Db2 Indexing Advice for Large and Small Tables
Friday, March 04, 2016
The Most Misunderstood Features of DB2 – Part 5: Choosing the Clustering Key
Sunday, September 01, 2013
Top Ten Common SQL Mistakes (with DB2 for z/OS)
- Assuming an ORDER BY is not necessary for ordered results - if you want your results always to be returned in a specific order, you must include the ORDER BY clause. If you fail to do so, a future rebind can change the access path and also possibly change the order of your results set.
- Forgetting the NULL indicator - when your query can return a null, but sure to include a null indicator or you will get a -305 SQLCODE (22002 SQLSTATE). And be sure to check the null indicator to see if the data is null or not!
- Incorrect expectations when using the NOT IN predicate with NULLs - Nulls can be confusing when using the IN predicate. For example, what if we want to find all Colors who are not assigned to a particular Product using a query like shown below. THe problem arises when the P.color can be NULL. The NULL causes the predicate to be UNKNOWN so the results set is always empty.
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
FROM Products AS P);
- Coding predicates appropriately in Outer Joins - Sometimes it can be difficult to figure out how to code outer joins appropriately if you don't code them very often. Terry Purcell has written a nice article on the topic that can be found here.
- Not coding a cursor for a multi-row result - When more than one row can be returned by your query you must use a cursor (or specify FETCH FIRST 1 ROW ONLY)
- Recompiling but not binding - if you make changes to the SQL, you have to BIND it again. Otherwise it won't work!
- Forgetting to use single quotes around strings (instead of double quotes) - DB2 SQL expects single quotes around character strings.
- Trying to modify a Primary Key column - you cannot modify a primary key. A primary key should be unique within the table and immutable.
- Forcing dynamic SQL into static SQL (sometimes hundreds of static SQL statements) - you should analyze the type and nature of your database queries to determine whether they should be static or dynamic. Consider using the advice here (static SQL v. dynamic SQL) to guide you.
- Asking for more data than you need (columns and/or rows) - specify only the columns that you actually need in your SELECT-list... and use WHERE clauses (predicates) to filter the data to just that data that you need before bringing it into the program. The lesser the amount of data that DB2 needs to transfer from the database to your program, the more efficient things will be!
Monday, April 04, 2011
What About Surrogate Keys?
As is so often the case with my blog, today's topic came about as the result of an e-mail question I received from a DBA I know. His question was this:
"A great debate rages here about the use of ‘synthetic’ keys. We read all sorts of articles on the wild wild web but none seem to address the database performance impacts of designs using synthetic keys. I wondered if you could point me to any information on this…"
If you've ever Googled the term "surrogate key" you know the hornet's nest of opinions that swirls around "out there" about the topic. For those who haven't heard the term, here is my attempt at a quick summary: a surrogate key is a generated unique value that is used as the primary key of a database table; database designers tend to consider surrogate keys when the natural key consists of many columns, is very long, or may need to change.
And here is the response I sent to my e-mail inquisitor:
I doubt that there is any “final word” on this topic. It has been raging on for years and years; so folks pro, others con. This Wikipedia article offers up a nice start: http://en.wikipedia.org/wiki/Surrogate_key
However, when I get to the performance area of this article I don’t think I agree. The article puts a lot of emphasis on there being fewer columns to join and therefore better performance.. If you’ve got an index on those multiple columns how much “worse” will the performance be, really? Sure, the SQL is more difficult to write, but will a join over 4 or 5 indexed columns perform that much worse than a join on one indexed column? I suppose as the number of columns required for the natural key increases the impact could be greater (e.g. 10 columns???)
I guess I can see the argument if you are swapping a variable length key with a surrogate having a fixed length key – that should improve things!
Furthermore consider this: the natural key columns are still going to be there, after all, they are naturally part of the data, right? So the surrogate (synthetic) key gets added to each row. This will likely reduce the number of rows per page (maybe not, but probably). And that, in turn, will negatively impact the performance of sequential access because more I/O will be required to read the “same” number of rows.
And what about the impact of adding data? If there are a significant number of new rows being added at the same time by different processes there will be locking issues as they all try to put the new data on the same page (unless, of course, your surrogate key is not a sequential number and is, instead, something like the microseconds portion of the current timestamp [that must be tested to avoid duplicates]).
The one thing that usually causes me to tend to favor natural keys is just that – they are natural. If the data is naturally occurring it becomes easier for end users to remember it and use it. If it is a randomly generated surrogate nobody will actually know the data. Yes, this can be masked to a great deal based on the manner in which you build your applications to access the data, but ad hoc access becomes quite difficult.
I guess the bottom line is that “it depends” on a lot of different things! No surprise there, I suppose.
Here are a few other resources with information (not so much on performance though) that you may or may not have reviewed already:
- http://www.agiledata.org/essays/keys.html
- http://www.techrepublic.com/forum/discussions/10-85582-3172364
- http://www.infobright.org/forums/viewthread/348/
- http://www.bcarter.com/intsurr1.htm
- http://weblogs.sqlteam.com/mladenp/archive/2009/10/06/Why-I-prefer-surrogate-keys-instead-of-natural-keys-in.aspx
- http://www.sqlservercentral.com/articles/Primary+key/70747/ - this one talks about SQL Server performance
What do you think about natural keys versus surrogate keys? Surely some readers here have an opinion on this topic! If so, post them as comments...