Wednesday, June 01, 2016

Carefully Code Your DB2 LIKE Predicates

The LIKE predicate is a powerful but potentially problem-causing operator that can be used to great effect in your SQL development efforts.

The LIKE predicate searches for strings that match a specified pattern. Here is the definition from the DB2 SQL Guide:

>>-match-expression--+-----+--LIKE--pattern-expression---------->
                     '-NOT-'                             

>--+---------------------------+-------------------------------><
   '-ESCAPE--escape-expression-'   

But what does this mean? The match-expression is the string to be tested for conformity to the pattern specified in pattern-expression. You can use the underscore and the percent sign as wildcards in the pattern: the underscore stands for exactly one indeterminate character, and the percent sign stands for zero or more.

The ESCAPE clause is used when you want to actually search for one of the wildcard characters.
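
To make the ESCAPE clause a little more concrete, here is a minimal Python sketch of how a program might prepare user input so that '%' and '_' are treated as literal characters. The function name escape_like and the choice of backslash as the escape character are my own assumptions for illustration. The resulting string would be supplied as the pattern of a predicate such as LIKE :pattern ESCAPE '\'.

```python
def escape_like(text: str, escape: str = "\\") -> str:
    """Prefix '%', '_' and the escape character itself with the escape character.

    Hypothetical helper: the backslash default is an assumption, and it must
    agree with the character named in the ESCAPE clause of the SQL statement.
    """
    out = []
    for ch in text:
        if ch in ("%", "_", escape):
            out.append(escape)  # mark the next character as a literal
        out.append(ch)
    return "".join(out)


print(escape_like("50%_off"))  # prints 50\%\_off
```

The program can still append its own unescaped '%' after the escaped text when it wants a prefix search.
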
But I do not really want to get into explaining the basics of how LIKE works here. If you need more detail on LIKE, refer to the appropriate IBM Knowledge Center documentation.

The Semantics of LIKE
What I do want to do today is to give some advice on LIKE usage. First of all, be careful how you use the wildcard characters (underscore and percent sign). The '_' character requires exactly one character in its position, whereas the '%' character matches any number of characters, including none. This can produce interesting results. For example, the following two queries are not equivalent, though at first glance you might think they are:

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter_%');

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter%');

Both will match 'Peterson' and 'Peters', but the first will not match 'Peter' because the underscore requires one more character after 'Peter'.
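
If you want to experiment with the difference between the two patterns without a DB2 connection, the wildcard rules can be approximated in a few lines of Python. This is only a sketch under simplifying assumptions: like_to_regex and like_match are hypothetical helpers, there is no ESCAPE support, and DB2-specific behavior such as blank padding of CHAR columns is ignored.

```python
import re


def like_to_regex(pattern: str):
    """Translate a LIKE pattern into a compiled regular expression.

    Simplified: only '_' and '%' are treated specially.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")        # exactly one character
        elif ch == "%":
            parts.append(".*")       # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_match(value: str, pattern: str) -> bool:
    """Return True when the whole value conforms to the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


for name in ("Peter", "Peters", "Peterson"):
    print(name, like_match(name, "Peter_%"), like_match(name, "Peter%"))
```

Only 'Peter' produces different answers for the two patterns, which is the case described above.
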

The LIKE predicate offers a great deal of flexibility and power to your SQL statements. Using LIKE you can quickly retrieve data based on patterns and wildcards. However, some uses of LIKE can be confusing to implement appropriately, especially when LIKE is used with host variables.

Let’s assume that you need to create an application that retrieves customers by last name, but the supplied value for last name can be either the entire name or just the first few bytes of that name. In that case, the following query can suffice:

SELECT custno, firstname, lastname
FROM   cust
WHERE  lastname LIKE :host_variable;

In order for this to work, the program must append percent signs (“%”) to the end of the value before it is placed in host_variable. The percent sign specifies that DB2 should accept as a match any number of characters (including 0). This must be done programmatically. So, if the value entered is SM, the host_variable should contain “SM%%%%%%%%” and if the value entered is SMITH, the host_variable should contain “SMITH%%%%%”. Append as many percent signs as required to fill up the entire length of the host variable. Failure to do so will result in DB2 searching for blank spaces. Think about it: if you assign “SMITH%” to a 10-byte fixed-length host variable, the variable is padded with blanks and DB2 searches for “SMITH%    ”, that is, SMITH at the beginning, four blanks at the end, and anything in the middle.
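
The padding step is easy to get wrong, so here is a minimal sketch of it in Python. The helper name pad_like_pattern and the 10-byte length are assumptions for illustration only; a real program would use the declared length of its own host variable and whatever host language it is written in.

```python
HOST_VAR_LENGTH = 10  # assumed length of the fixed-length host variable


def pad_like_pattern(value: str, length: int = HOST_VAR_LENGTH) -> str:
    """Right-pad the search value with '%' so no trailing blanks remain."""
    if len(value) > length:
        raise ValueError("value is longer than the host variable")
    # A value that already fills the variable gets no '%' and is an exact match.
    return value.ljust(length, "%")


print(pad_like_pattern("SM"))     # prints SM%%%%%%%%
print(pad_like_pattern("SMITH"))  # prints SMITH%%%%%

# What a blank-padded assignment would produce instead:
print(repr("SMITH%".ljust(HOST_VAR_LENGTH)))  # prints 'SMITH%    '
```
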

So, for “SMITH%%%%%”, SMITH will be returned, but so will SMITHLY (or any name beginning with SMITH). There is no way to magically determine if what was entered is a complete name or just a portion thereof. If this is not acceptable, then a single query will not likely be feasible. Instead, you would have to ask the user to enter whether a full name or just a portion is being entered.

What About Performance?

Notwithstanding the semantic details, there are performance considerations to understand when using LIKE, too. It is good practice to avoid LIKE predicates in which the percent sign (%) or the underscore (_) appears at the beginning of the comparison string, because a leading wildcard prevents DB2 from using a matching index scan.

The LIKE predicate can produce efficient results, however, when you use the wildcard characters at the end or in the middle of the comparison string, for example:

Inefficient       Can be efficient with index
LIKE '%NAME'      LIKE 'NAME%'
LIKE '_NAME'      LIKE 'NA_ME'


DB2 will not use direct index lookup when a wildcard character is supplied as the first character of a LIKE predicate. At bind time, DB2 cannot determine when a host variable contains a wildcard character as the first character of a LIKE predicate. The optimizer therefore does not assume that an index cannot be used; rather, it indicates that an index might be used. At runtime, DB2 determines whether the index will be used based on the value supplied to the host variable. When a wildcard character is specified for the first character of a LIKE predicate, DB2 uses a non-matching index scan or a table space scan to satisfy the search.
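
Because the value of a host variable is only known at runtime, an application may want to check the pattern itself before issuing the query. The following Python sketch shows one rough way to do that. Both function names are hypothetical, ESCAPE handling is ignored, and the check is only a rule of thumb: it does not reproduce how the DB2 optimizer actually chooses an access path.

```python
def literal_prefix(pattern: str) -> str:
    """Return the characters of a LIKE pattern that precede its first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "_%":
            return pattern[:i]
    return pattern  # no wildcard at all


def has_leading_literal(pattern: str) -> bool:
    """Rule of thumb: a non-empty literal prefix leaves room for a matching index scan."""
    return literal_prefix(pattern) != ""


for p in ("%NAME", "_NAME", "NAME%", "NA_ME"):
    print(p, repr(literal_prefix(p)), has_leading_literal(p))
```

A program could use such a check to warn the user, or to require a minimum number of leading characters, before running a search that is likely to scan.
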

Summary

The LIKE operator brings powerful search capabilities to your DB2 SQL queries. Be sure to understand its capabilities and to use it appropriately in your development efforts.

3 comments:

Unknown said...

I am USING LIKE predicate in a clause of mine. In a specific table, when I do

select * from X_table where X_object like :host_X_object, the result is returned only if I add to the host variable i.e. 'host_X_object%' adding the '%' to the end. If I do not, even if the host_x_object is correct, the result is not retrieved. That happens only for that specific table about a VARCHAR. For CHAR it works fine. Any ideas why?

Craig S. Mullins said...

That is the way that LIKE works with VARCHAR

Srikanth said...

I am trying to use the CLP to call an SQL query that uses LIKE:

SELECT NAME, PLACE, ANIMAL from ZOOTABLE where NAME like 'TIG%' or BNAME like 'LIO%';

With DB2 CLP, i run:

db2 "SELECT NAME, PLACE, ANIMAL from ZOOTABLE where NAME like 'TIG%' or BNAME like 'LIO%'";

I get this error:
SQL0104N An unexpected token "%" was found following "where NAME like TIG".
Expected tokens may include: "". SQLSTATE=

Any suggestions would be greatly appreciated. Thank you!