We all can relate to dealing with systems that have data
integrity problems. But some data integrity problems can be cleaned up using a
touch of SQL. Consider the common data entry problem of extraneous spaces (or
blanks) inserted into a name field. Not only is it annoying, sometimes it can
cause the system to ignore relationships between data elements because the
names do not match. For example, “Craig
Mullins” is not equivalent to “Craig Mullins”; the first one has two
spaces between the first and last name whereas the second one only has one.
You can write an SQL UPDATE statement to clean up the
problem, if you know how to use the REPLACE function. REPLACE does what it
sounds like it would do: it reviews a source string and replaces all
occurrences of a one string with another. For example, to replace all
occurrences of Z with A in the string BZNZNZ you would code:
REPLACE(‘BZNZNZ’,’Z’,’A’)
And the result would be BANANA. So, let’s create a SQL
statement using REPLACE to get rid of any unwanted spaces in the NAME column of
our EMPLOYEE table:
UPDATE EMPLOYEE
SET NAME = REPLACE(
REPLACE(
REPLACE(NAME, SPACE(1),
'<>')
'><', SPACE(0))
'<>', SPACE(1));
"Wait-a-minute," you might be saying. "What are all of those
left and right carats and why do I need them?"
OK, fair enough. Let’s explain how this works starting from the inside
out. The inside REPLACE statement takes the NAME column and converts every
occurrence of a single space into a left/right carat. The next REPLACE (working
outward), takes the string we just created, and removes every occurrence of a right/left
carat combination by replacing it with a zero length string. The final REPLACE
function takes that string and replaces any left/right carats with a single
space. The reversal of the carats is the key to removing all spaces except one
– remember, we want to retain a single space anywhere there was a single space
as well as anywhere that had multiple spaces. Try it, it works.
Of course, you can use any two characters you like, but the
left and right carat characters work well visually. Be sure that you do not
choose to use characters that occur naturally in the string that you are acting
upon.
Finally, the SPACE function was used for clarity. You could
have used strings encased in single quotes, but the SPACE function is easier to
read. It simply returns a string of spaces the length of which is specified as
the integer argument. So, SPACE(12) would return a string of twelve spaces.
Interesting thanks Craig - it is always good to see demonstrated what use could be made of seldom used SQL features, stretches the imagination. PS There are 2 commas missing from the SQL - easily fixed (",)
ReplyDeleteGwyn
This was very helpful, thanks!
ReplyDeleteYes, I am on Twitter. You can follow me at https://twitter.com/craigmullins
ReplyDelete