
Monday, November 15, 2021

Db2, SQL, and Modern Coding Techniques

 

Mainframe application development has changed dramatically during the lifetime of Db2. Way back in the beginning, and even for a long time thereafter, most Db2 programs were written in COBOL and coded to use static SQL. This meant that the SQL was bound before it was executed and the access paths were etched in stone. And it also meant that the access paths for any SQL in almost any program were readily available to DBAs in PLAN_TABLEs.

Fast forward to the present and static SQL in COBOL is the exception rather than the rule. Oh, sure, those COBOL programs with static SQL are still there, running in CICS or in batch (or even IMS/TM), but most new development is not done this way anymore. Today, programmers use IDEs to build code that accesses mainframe Db2 data using distributed data access over the web or from a GUI. 

Most modern, distributed application development projects rely upon application development frameworks. The two most commonly used frameworks are Microsoft .NET and Java/J2EE. And these frameworks use dynamic SQL, not static.

An additional contributing force is the adoption of commercial off-the-shelf applications for ERP and CRM like SAP, PeopleSoft, and Siebel (the last two of which are now Oracle products). These applications are not tied to a specific DBMS but are supported by multiple different DBMSs, one of which is (was) DB2 for z/OS. So these applications use dynamic SQL because that makes it easier to accomplish multi-vendor DBMS support. For the most part, though, none of these ERP packages run on the mainframe (or Db2) anymore because two of them are now Oracle products and SAP is promoting its own DBMS (HANA).

The point, though, is that the manner in which Db2 applications are developed has changed. And that means the manner in which Db2 is managed is changing, too. Instead of relying on access paths already being there, DBAs must develop ways of capturing access paths for dynamic SQL statements. Prepared dynamic SQL can be cached in the dynamic statement cache so that the same SQL statement can reuse the mini plan for the statement the next time it runs. And the EXPLAIN STMTCACHE statement can be used to snap the statements in the cache and find the access paths used. But the dynamic statement cache works like a buffer pool, with least recently used statements being flushed to make room for new statements… so you might not find the dynamic SQL you are looking for, at least not without helpful tools or scripts to stay on top of things.
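
For example, here is a minimal sketch of snapping the cache (assuming the EXPLAIN tables, including DSN_STATEMENT_CACHE_TABLE, have been created under your current SQLID):

EXPLAIN STMTCACHE ALL;

SELECT STMT_ID, STAT_EXEC, STAT_CPU, STMT_TEXT
FROM   DSN_STATEMENT_CACHE_TABLE
ORDER BY STAT_CPU DESC;

The EXPLAIN externalizes the cached statements and their runtime statistics; the SELECT then surfaces the biggest CPU consumers.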

This change has caused many organizations to experience SQL performance problems. Because dynamic SQL is easier to introduce into the system, performance can become problematic. It is difficult to get a handle on what dynamic SQL runs, and when. And it is difficult to track because many DBAs are not accustomed to monitoring and tuning dynamic SQL… or they just do not have the proper tools to do the job appropriately. So black holes of dynamic SQL performance problems open up and are left to run wild.

New SQL Stuff

It is not just the new programming paradigm that causes management problems for modern Db2 usage. The number of new SQL statements, functions, and features continues to grow with each new version (and even function level) of Db2. This can be a mixed blessing, though. It is good because expanding SQL functionality makes it easier to program and access Db2 data. But it can be problematic because these new features can be confusing and difficult to learn.

Exacerbating the issue is that many organizations do not invest sufficient time and effort in educating their staff. If the DBAs are not trained in all of the new functionality, then new SQL can cause big issues. How? Imagine a shop that just migrated to a new version or level of Db2, but the only education available was to read the manuals. So some of the DBAs are not knowledgeable about the new SQL code. Then a developer picks up a manual and learns how to use a new feature, such as arrays as parameters. Kudos to the developer for the initiative, but if problems arise there may not be anybody who knows how to support the new feature.

And there is a lot of new SQL functionality being added. If we focus on just the past few versions of Db2, here is a representative list of new SQL programming related enhancements that have been introduced: TRUNCATE, new data types (DECFLOAT, VARBINARY), optimistic locking, FETCH CONTINUE, ROLE, MERGE, SELECT from MERGE, pureXML, FETCH FIRST & ORDER BY in subselect and fullselect, INTERSECT, EXCEPT, indicator variables, TIMESTAMP precision and time zones, moving sums and averages, inline and non-inline SQL scalar functions, SQL table functions, extended implicit casting, RANK(), ROW_NUMBER(), XQuery, transparent archive query, APPLCOMPAT, IDAA/analytics, grouping sets, ROLLUP, Hadoop access, FTBs, LISTAGG…

That is a lot to learn and this is just a selection of what has been added!

Summary

Things are moving at a fast and furious pace for application developers these days. And the resultant changes can introduce problems that impact your business unless you adapt to align your management and DBA capabilities with the new development methods.

 

Monday, August 16, 2021

SQL to Return the nth Maximum Value

Sometimes what seems like a simple request might take a little bit of thinking to devise and implement a solution. Recently, I was asked how to write SQL that returns the nth maximum value from a column. If you think about it, it isn't too difficult.

Assume that you have a table with 10 rows, and there is a column in that table that holds the values 1 through 10. The idea is to create a query that returns the nth maximum value: for n = 7 it would return 7, for n = 8 it would return 8, and so on.

This can be accomplished with a common table expression (CTE) and the following query:

WITH TOPN AS 
   (
    SELECT YOUR_COLUMN 
    FROM    YOUR_TABLE
    ORDER BY YOUR_COLUMN DESC
    FETCH FIRST 7 ROWS ONLY
   )
SELECT MIN(YOUR_COLUMN)
FROM   TOPN;

Simply change the value of FETCH FIRST from 7 to whatever you wish n to be.

If there are duplicate values and you want to eliminate them from consideration, just change the SELECT to SELECT DISTINCT in the TOPN common table expression.
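
In other words, the duplicate-eliminating version of the query looks like this:

WITH TOPN AS 
   (
    SELECT DISTINCT YOUR_COLUMN 
    FROM    YOUR_TABLE
    ORDER BY YOUR_COLUMN DESC
    FETCH FIRST 7 ROWS ONLY
   )
SELECT MIN(YOUR_COLUMN)
FROM   TOPN;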

This should work for any type of values (numeric, character, date/time) that you wish to query.

Wednesday, November 18, 2020

Deleting "n" Rows From a Db2 Table

I regularly receive database- and Db2-related questions via e-mail. And that is great, but I don't always get a chance to respond to everything. If you've sent me a question and I haven't replied, I apologize. But every now and then, I will use one of the e-mail questions in my in-box and write about it in the blog. Today's question is this:

How do you delete N rows from a Db2 table?

Also, how do you retrieve bottom N rows from a Db2 table without sorting the table on key?

And here is my response:

First things first, you need to refresh your knowledge of "relational" database systems and Db2. There really is no such thing as the "top" or "bottom" N rows in a table. Tables are sets of data that have no inherent logical order.

With regard to the result set though, there is a top and a bottom. You can use the FETCH FIRST N ROWS ONLY clause to retrieve only the first N rows, but to retrieve only the bottom N rows is a bit more difficult. For that, you would have to use scrollable cursors.

A scrollable cursor allows you to move back and forth through a results set without first having to read/retrieve all of the preceding rows. I suggest that you read up on scrollable cursors in the Db2 SQL Reference manual and the Db2 Application Programming manual. All Db2 manuals can be downloaded in Adobe PDF format for free from the IBM web site.

Basically, you would want to FETCH LAST from the scrollable cursor and then loop through with a FETCH PRIOR statement executing the loop N-1 times. That would give you the "bottom" N of any results set -- sorted or not.
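
Here is a rough sketch of that logic in embedded SQL (the table, column, and host variable names are hypothetical, and the loop control lives in your host program):

DECLARE BOTN INSENSITIVE SCROLL CURSOR FOR
  SELECT COL1
  FROM   YOUR_TABLE;

OPEN BOTN;

-- position on the last row of the results set
FETCH LAST FROM BOTN INTO :HVCOL1;

-- then execute this FETCH in a host-language loop N-1 times
FETCH PRIOR FROM BOTN INTO :HVCOL1;

CLOSE BOTN;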

As for your other question, I am confused as to why you would want to delete N rows from a table. Doesn't it matter what the data in the rows is? My guess is that you are asking how you would limit a DELETE to a subset of the rows that would apply to the WHERE condition of the DELETE. The answer is, you cannot, at least not without writing some code.

You would have to open a cursor with the same WHERE conditions specifying FOR UPDATE OF. Then you would FETCH and DELETE WHERE CURRENT OF cursor for that row in a loop that occurs N times. Of course, that means you have to write a program to embed that SQL in.
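
A quick sketch of that program logic (the table name and WHERE condition are hypothetical; the N-times loop control belongs to the host program):

DECLARE DELN CURSOR FOR
  SELECT COL1
  FROM   YOUR_TABLE
  WHERE  STATUS = 'X'
  FOR UPDATE;

OPEN DELN;

-- in a host-language loop executed N times:
FETCH DELN INTO :HVCOL1;
DELETE FROM YOUR_TABLE
WHERE  CURRENT OF DELN;

CLOSE DELN;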

Hope this answer helps...

Wednesday, October 07, 2020

IDUG 2020 EMEA Db2 Tech Conference Goes Virtual

For those of you who have attended an IDUG conference before you know why I am always excited when IDUG-time rolls around again. And the EMEA event is right around the corner!

Participating at an IDUG conference always delivers a boatload of useful information on how to better use, tune, and develop applications for Db2 – both for z/OS and LUW. IDUG offers phenomenal educational opportunities delivered by IBM developers, vendor experts, users, and consultants from all over the world.

Unfortunately, due to the COVID-19 pandemic, in-person events are not happening this year... and maybe not for some time to come, either. But IDUG has gone virtual, and it is the next best thing to being there! The IDUG EMEA 2020 virtual event will take place November 16–19, 2020. So you have ample time to plan for, register for, and attend this year.

If you attended the IDUG North American virtual conference earlier this year, you know that you can still get great Db2 information online at an IDUG event. And there are a ton of great presentations at the EMEA virtual IDUG conference – just check out the agenda for the event.

Of course, a virtual event does not offer the face-to-face camaraderie of an in-person event, but it still boasts a bevy of educational opportunities. And the cost is significantly less than a traditional IDUG conference, both because the up-front registration fee is lower and because there are no travel costs...

For just $199, you get full access to the virtual conference, as well as a year-long premium IDUG membership and a complimentary certification or proctored badging voucher. If you're already a premium member, you can add the EMEA 2020 Conference access to your membership for just $99.

You can register here: https://www.idug.org/p/cm/ld/fid=2149

The Bottom Line

So whether you are a DBA, a developer, a programmer, an analyst, a data scientist, or anybody else who relies on and uses Db2, the IDUG EMEA Db2 Tech Conference will be the place to be this November 2020. 

With all of this great stuff available online from this IDUG virtual event, why wouldn’t you want to participate?


Tuesday, March 10, 2020

A Guide to Db2 Performance for Application Developers



DBAs: are you looking for a way to help train your developers to code more efficient Db2 application programs? 
Programmers: do you want to understand the best practices for writing high-performing Db2 applications?
Well, my latest book, A Guide to Db2 Performance for Application Developers, is just what you are looking for! Available in both printed and eBook formats, this is the book you need to assure that you are building effective, efficient Db2 applications.


This book will make you a better programmer by teaching you how to write efficient code to access Db2 databases. Whether you write applications on the mainframe or distributed systems, this book will teach you practices, methods, and techniques for optimizing your SQL and applications as you build them. Write efficient applications and become your DBA's favorite developer by learning the techniques outlined in this book!

The methods outlined in this book will help you improve the performance of your Db2 applications. The material is written for all Db2 professionals, whether you are coding on z/OS (the mainframe) or on Linux, Unix or Windows (distributed systems). When there are pertinent differences between the platforms, they are explained in the text.

The focus of the book is on programming, coding and developing applications. As such, it does not focus on DBA, design, and data modeling issues, nor does it cover most Db2 utilities, DDL, and other non-programming related details. If you are a DBA, the book should still be of interest to you because DBAs are responsible for overall Db2 performance. Therefore, it makes sense to understand the programming aspect of performance.

It is important also to understand that the book is not about performance monitoring and tuning. Although these activities are important, they are typically not the domain of application developers. Instead, the book offers guidance on application development procedures, techniques, and philosophies. The goal of the book is to educate developers on how to write "good" application code that lends itself to optimal performance. By following the principles in this book you will be able to write code that does not require significant remedial, after-the-fact modifications by performance analysts. If you follow the guidelines in this book your DBAs and performance analysts will love you!

The assumption is made that the reader has some basic SQL knowledge, so the book does not cover how to write Db2 SQL code or how to code a Db2 program. It is also important to point out that the book does not rehash material that is freely available in Db2 manuals that can be downloaded or read online.

What you will get from reading this book is a well-grounded basis for designing and developing efficient Db2 applications that perform well.

You can order your copy of A Guide to Db2 Performance for Application Developers today at:

Monday, April 22, 2019

Db2 Application Testing Considerations


Testing application programs is a discipline unto itself and there are many considerations and nuances to be mastered to be able to test software appropriately. This blog post will not go into a lot of depth regarding testing practices, other than to highlight the important things to keep in mind to assure optimal performance of your Db2 applications.

Statistics

The data that you use in your test environment will not be the same as your production data. Typically, you will have less test data than you do in production. So, if you run the RUNSTATS utility on your test data you will get different statistics than in production. 

Instead of running RUNSTATS, you can test your SQL access paths by updating the test catalog statistics to be the same as your production system. Do not attempt to modify test statistics on your own. You should work with your DBA group to set up a process for updating test statistics. This can be accomplished in various ways. Your organization may have a tool that makes it easy to copy statistics from production to test; or your DBA team may use a script of queries and modification statements to populate test statistics from production.
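
To give you a feel for it, here is a hedged sketch of the kind of statements such a script contains (the names and numbers are hypothetical; the values would be pulled from the production catalog, and updating catalog statistics requires appropriate authority):

UPDATE SYSIBM.SYSTABLES
SET    CARDF   = 2750000,
       NPAGESF = 31500
WHERE  CREATOR = 'TESTSCHM'
  AND  NAME    = 'CUSTOMER';

UPDATE SYSIBM.SYSINDEXES
SET    FIRSTKEYCARDF = 50,
       FULLKEYCARDF  = 2750000
WHERE  CREATOR = 'TESTSCHM'
  AND  NAME    = 'XCUST01';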

If your table definitions are different in test and production you will need to take this into account in the script. For example, things like creator, name, indexes, number of partitions, and even columns can differ between environments. Furthermore, you may have new tables and columns for which there are no current production statistics, meaning that you will need to create estimated statistics based on your knowledge of the business and application.

Some organizations make the mistake of copying production statistics to test once, and never (or rarely) populating test again. This is a mistake because most production databases change over time, sometimes dramatically. When you run RUNSTATS for your production applications it is a good idea to also update your test statistics from the new production statistics.

Modeling a Production Environment in Test

Another tactic you can use to improve the accuracy of access path testing is to model the configuration and settings of your production environment in your test system. Remember that the Db2 optimizer does not just use statistics, but also information about your computing environment.

Db2 test systems typically vary from the production system. Application testing is often accomplished on test systems that have different parameters and configurations than the production systems that run the applications. Test systems usually get set up on a less powerful processor (or LPAR), and use less memory for buffering, sorting, and other system processes. This can result in different access paths for test and production systems, which can cause performance problems that only show up after you move to production.

However, it is possible to model the configuration and parameters of your production environment in your test system. You can specify configuration details for Db2 to use for access path selection in your test system using profile tables.

Work with your DBA group to configure proper profile settings for testing your applications.

Test Cases for Skewed Data

Db2 assumes that data values are mostly uniformly distributed throughout the data. However, not all data is uniformly distributed. Db2 RUNSTATS can capture information about non-uniformly distributed and skewed data.

When data is non-uniformly distributed a subset of the values occur much more frequently than others. A special case of non-uniformly distributed data is skewed data. When data is skewed, one value (or a very small number of values) occurs much more frequently than others.

Non-uniformly distributed and skewed data presents a performance testing challenge. The Db2 optimizer can formulate different access paths for non-uniformly distributed data based on the actual values supplied. This is particularly important for dynamic SQL applications, but you should be aware of non-uniformly distributed and skewed data even for static SQL applications.

For non-uniformly distributed data you can examine the Db2 catalog to obtain values for the most commonly occurring values. For Db2 for z/OS this information is in the SYSIBM.SYSCOLDIST table.

Be sure to test with several of the values that are stored in the COLVALUE column of that table, and some that are not. This will enable you to test the performance of both the most common values and less common values. The access paths may differ for each, and the performance can vary as well.
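
For example, a query along these lines (a sketch; the owner and table names are hypothetical) returns the most frequently occurring values captured for a table:

SELECT NAME, COLVALUE, FREQUENCYF
FROM   SYSIBM.SYSCOLDIST
WHERE  TBOWNER = 'TESTSCHM'
  AND  TBNAME  = 'YOUR_TABLE'
  AND  TYPE    = 'F'
ORDER BY FREQUENCYF DESC;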

An Example

Let’s discuss an example. Suppose you operate a bar with a cigar room and you want to keep track of customers with a table of cigar smokers. We gather information like name, address, phone number, sex, and favorite cigar in the table. Cigar smokers skew male, so there will likely be many more rows where sex is 'M' than where sex is 'F'. With that background, consider a query like this one:

  SELECT name, phoneno, fav_cigar
  FROM   cigar_smokers
  WHERE  sex = ?;

Because the data is skewed, it is possible that Db2 will choose a different access path for M than for F. If the vast majority of the rows are men, then a table scan might be chosen for Sex = ‘M’; whereas with only a few rows for women, an index might be chosen if one exists on the Sex column.

This is just a simple example. You need to understand your data and how it skews to make sure that you create and test sample test cases for all of the pertinent values.

SQL Variations

A final performance testing consideration is to test multiple SQL variations, especially for queries that access a lot of data or have complex access paths. Do not just find one SQL formulation that works and stick with it. Remember that you can code multiple variations of SQL statements that return the same data, but that perform quite differently.
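
As a quick illustration, consider these two hypothetical formulations. Both return customers that have at least one order, but they can be optimized quite differently:

SELECT C.CUSTNO, C.LASTNAME
FROM   CUST C
WHERE  EXISTS (SELECT 1
               FROM   ORDERS O
               WHERE  O.CUSTNO = C.CUSTNO);

SELECT DISTINCT C.CUSTNO, C.LASTNAME
FROM   CUST C
       INNER JOIN ORDERS O
          ON O.CUSTNO = C.CUSTNO;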


Note
This blog post was adapted and excerpted from my latest book, A Guide to Db2 Performance for Application Developers. Click the link for more information or to buy a copy (both print and ebook available).



Thursday, February 01, 2018

Db2 Application Development for Performance: Be Early and Be Informed

The title of this blog post is also the title of one of my two IDUG Db2 Tech Conference presentations in Philadelphia this year. I chose this topic because I am in the process of writing a book on Db2 performance from a developer perspective.

There are a lot of performance-focused presentations, but most of them are from the perspective of monitoring and tuning. In other words, they are about looking for and fixing problems in code that already exists. Now there is nothing wrong with this. In fact, it is necessary. Even if everything is written correctly from the get-go (right, sure it is), over time things will change and performance will decline. So we need tools to ferret out what is going wrong and help us fix it. This is mostly the domain of the DBA.

From the programmer's point-of-view, though, it seems that things are lacking. I frequently see SQL and program code that seems to have been written by someone with no understanding of relational basics. This has to change.

That is the reason for my IDUG presentation... and also for the book. The idea is to give advice and direction to Db2 application developers and programmers on writing efficient, well-performing programs. If you write code and access data in a Db2 database, then this book and presentation should be helpful to you.

The material is written for all Db2 professionals. It will be useful for users of Db2 for Linux, Unix and Windows as well as for users of Db2 for z/OS. When there are pertinent differences between the two I point them out. Also, much of the material will apply to any relational database system, not just Db2; nevertheless, the book is written and intended for Db2 users.

I don't try to teach basic programming skills, other than, of course, some guidance on SQL. And even that is not basic. I assume you can figure out a basic Select, Insert, Update or Delete... and even some more complex stuff like joins. The focus of the book is on programming, coding and developing applications.

If you are a DBA, most of the material will still be of interest to you because DBAs are responsible for overall Db2 performance. Therefore, it makes sense to understand the programming aspect of performance. That said, I will not be covering DBA and system administration level performance. But if you think the material is worthy, knowing it is there to recommend to your developers (new and old) can be worthwhile!

The book and presentation focus on guidance regarding application development procedures, techniques, and philosophies. The goal is to educate developers on how to write good application code that lends itself to optimal performance. Of course, this does not mean that every program you write will perform at top speed. But it should enable you to write code that does not require remedial after-the-fact modifications by performance analysts. If you follow the guidelines I outline in this book and presentation, I can say with confidence that your DBAs and performance analysts will love you!

So if you are going to IDUG in Philadelphia this year, be sure to attend my presentation. Be sure to say "Howdy!" and that this blog post guided you to the presentation... and also, keep an eye on my web site, Mullins Consulting, Inc., for information about the book when it gets published, hopefully later in 2018.

Friday, February 24, 2017

The DB2 12 for z/OS Blog Series - Part 5: Enhancements for Managing Dynamic and Static SQL

Most modern application development is done using dynamic SQL. But some features work only with static SQL and others only with dynamic SQL. DB2 12 for z/OS delivers functionality that minimizes the difference between static and dynamic SQL.

Dynamic plan stability brings the plan stability feature of static SQL to dynamic SQL. With plan stability for static SQL, you can use the PLANMGMT parameter of REBIND to save off old access paths that can be switched back to active if the new access paths are inefficient for any reason.

DB2 12 introduces dynamic plan stability, which is a little different but should prove to be quite useful. You can choose to stabilize dynamic SQL access paths by storing them in the DB2 system catalog. DB2 will not just automatically start to capture all dynamic access paths. There are options for selecting which queries to stabilize, so you do not have to store all dynamic SQL. This is controlled by the command:

 -START DYNQUERYCAPTURE

When a dynamic statement is run, DB2 will look in the dynamic statement cache first. If it is not found there, DB2 will look in the system catalog before resorting to a full prepare. This is particularly useful when statements are flushed from the DSC and a prepare would otherwise be required.

You can label a set of stabilized SQL statements into a group known as a stabilization group. This makes it easier for DBAs to track and manage stabilized queries.
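
Stabilized queries land in the Db2 catalog in the SYSIBM.SYSDYNQRY table, so reviewing a group can be as simple as this sketch (the group name is hypothetical; verify the table and column names for your Db2 level):

SELECT *
FROM   SYSIBM.SYSDYNQRY
WHERE  STBLGRP = 'PAYROLL';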

Click here for more details on Dynamic Plan Stability.

So dynamic plan stability can make your dynamic SQL more static. But there is another new DB2 12 capability that can make your static SQL more dynamic: static Resource Limit Facility (RLF). The RLF, heretofore, could only be used to govern dynamic SQL statements. RLF tables, manipulated by DBAs, contain limits that make sure that dynamic SQL statements do not consume too many resources, such as CPU, I/O, and locks. This prevents improperly tested SQL or poor resource planning from disrupting performance.

But dynamic SQL is not the only type of SQL that can cause performance issues; static SQL transactions can benefit from the same controls. DB2 12 extends the RLF to support static SQL statements, thereby improving your ability to prevent problem applications from dominating your system’s resource consumption.

Specifically, you can set up reactive governing for static SQL statements by adding new rows in resource limit facility tables. Static SQL statements will be governed by rows where RLFFUNC='A' (for DSNRLSTxx tables) and RLFFUNC='B' (for DSNRLMTxx tables).
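
For instance, here is a hedged sketch of a row that would reactively govern static SQL in a given collection to roughly 5,000 service units (the table name DSNRLST01, the column list, and the values are assumptions to verify against your own RLF table DDL):

INSERT INTO DSNRLST01
       (RLFFUNC, AUTHID, RLFCOLLN, RLFPKG, ASUTIME)
VALUES ('A', ' ', 'MYCOLL', ' ', 5000);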

You can control whether resource limits apply to dynamic only, static only or all SQL statements using the DSNZPARM RLFENABLE. Acceptable values are DYNAMIC, STATIC, or ALL and the default is DYNAMIC. Furthermore, you can specify the default resource limit actions for static SQL statements by setting two DSNZPARMS:
  • RLFERRSTC for local statements
  • RLFERRDSTC for remote statements


In each case, the acceptable values are NOLIMIT, NORUN, or a number between 1 and 500000. You use NOLIMIT to let any static SQL statement that does not correspond to a row in the resource limit table run freely. Alternately, NORUN indicates that any static SQL statement that does not correspond to a row in the resource limit table is prevented from running. Finally, if you specify a number between 1 and 500000, that will be the number of service units to use as the default resource limit. If the limit is exceeded, the SQL statement is terminated.

If you need more details on setting up resource limits in general, you can find that information in the IBM manuals and online here.

But the bottom line is that DB2 12 gives users more and better options for managing both their dynamic and static SQL performance.  And that is definitely a good thing!

Tuesday, January 17, 2017

The DB2 12 for z/OS Blog Series - Part 1: SQL Pagination

Today’s blog post kicks off a new series of posts that examine new features and functionality delivered with the latest and greatest new version of DB2 for z/OS – Version 12.

We’ll begin with a nice new feature that should make it easier for programmers writing queries where the program needs to be able to deal with a subset of results. For example, to show the first 25 rows, then the next 25, and so on. This is a frequent requirement for the mobile and web applications that are common these days.

This separating of rows into pieces can now be accomplished quite easily in DB2 12 with the new OFFSET syntax. OFFSET is specified as a clause of the SELECT statement, and it is used to specify the number of rows in the result table that will be skipped before any rows are retrieved.

The OFFSET clause is simple to code. You just need to determine the number of rows that you want to skip and code that in the clause. For example, to skip 5 rows you would code OFFSET 5 ROWS. Simple, right? Well, conceptually, yes, but in practice you have to be careful.

First of all, you must know where and when you can code an OFFSET clause. OFFSET can be specified on a subselect or fullselect that is the outermost fullselect in a prepared SQL statement or a DECLARE CURSOR statement. OFFSET is also permissible in a SELECT INTO statement. However, you cannot code an OFFSET clause in a view definition, an MQT, the RETURN statement of SQL table functions, in row permissions or column masks, or in the outermost fullselect for a sensitive dynamic cursor. A further restriction is that a statement that includes the OFFSET clause cannot contain an expression that is not deterministic or that has external action.

The second, and I think more interesting aspect to consider when using the OFFSET clause is that your result set must be predictable to assure that you are retrieving useful data. As we all should know, the access path chosen by the optimizer can alter the order in which rows are returned. If you want to access rows in sets of 25, for example, then you want to make sure that each successive set does not contain any rows from previous sets, and does not omit any rows in between sets.

So how do we do this? The best approach is to code an ORDER BY clause that specifies columns that uniquely identify each row in the result table. If there are duplicates, then there is no way to verify that you are getting the data that you want. In other words, the order of the rows is not deterministic. The data being skipped will not be predictable, and it is highly likely that you will not be accessing all of the data that you want (or perhaps you will even access the same data twice).

It is also important to understand that if the OFFSET number of rows is greater than the number of rows in the intermediate result table, you will get an empty result.

So let’s look at a quick example. Suppose we are looking for employees earning more than $50000. Further, we want to skip the first 10 rows of the EMP table, and grab the next 10. This SQL should do the trick:

SELECT *
FROM EMP
WHERE SALARY > 50000.00
ORDER BY EMPNO
OFFSET 10 ROWS
FETCH FIRST 10 ROWS ONLY;

The OFFSET 10 ROWS will cause the first 10 qualifying rows to be skipped. The FETCH clause gets the next 10 qualifying rows (after skipping the first 10).

But OFFSET is not the only new feature to help developers with SQL pagination. Also available with DB2 12 is data-dependent pagination, which uses row value expressions in a basic predicate. That sounds more complicated than it really is, so let’s look at an example:

We can code a WHERE clause like this to search for rows with a name greater than mine:

WHERE (LASTNAME, FIRSTNAME) > ('MULLINS', 'CRAIG')

This is easier than what we previously had to do (before DB2 12), which was breaking the WHERE up as follows:

WHERE (LASTNAME = 'MULLINS' AND FIRSTNAME > 'CRAIG')
OR (LASTNAME > 'MULLINS')
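
Combining the two features gives you simple keyset pagination. Here is a hedged sketch (reusing the sample names from above) that fetches the next 25 rows following the last row previously returned:

SELECT EMPNO, LASTNAME, FIRSTNAME
FROM   EMP
WHERE  (LASTNAME, FIRSTNAME) > ('MULLINS', 'CRAIG')
ORDER BY LASTNAME, FIRSTNAME
FETCH FIRST 25 ROWS ONLY;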


The bottom line is that application coding for pagination becomes a lot easier in DB2 12 for z/OS…

Wednesday, June 01, 2016

Carefully Code Your DB2 LIKE Predicates

The LIKE predicate is a powerful but potentially problem-causing operator that can be used to great effect in your SQL development efforts.

The LIKE predicate searches for strings that match a specified pattern. Here is the definition from the DB2 SQL Guide:

>>-match-expression--+-----+--LIKE--pattern-expression---------->
                     '-NOT-'                             

>--+---------------------------+-------------------------------><
   '-ESCAPE--escape-expression-'   

But what does this mean? Well, the match-expression is the string to be tested for conformity to the pattern specified in pattern-expression. You can use the underscore and percent sign characters as wildcards in the pattern: the underscore stands for exactly one indeterminate character, while the percent sign stands for any number of indeterminate characters, including zero. 

The ESCAPE clause is used when you want to actually search for one of the wildcard characters.

But I do not really want to get into explaining the basics of how LIKE works here. If you really need more details on LIKE, I refer you to the appropriate IBM Knowledge Center details.

The Semantics of LIKE
What I do want to do today is to give some advice on LIKE usage. First of all, be careful in terms of how you use the wildcard characters (underscore and percent sign).  The '_' character requires a matching character and the '%' character does not.  This can produce interesting results. For example, the following two queries are not equivalent, though at first glance you might think they were:

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter_%');

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter%');

Both will match to 'Peterson' and 'Peters', but the first will not match to 'Peter' because of the underscore. 

The LIKE predicate offers a great deal of flexibility and power to your SQL statements. Using LIKE you can quickly retrieve data based on patterns and wildcards. However, some uses of LIKE can be confusing to implement appropriately—especially when LIKE is used with host variables.

Let’s assume that you need to create an application that retrieves customers by last name, but the supplied value for last name can be either the entire name or just the first few bytes of that name. In that case, the following query can suffice:

SELECT custno, firstname, lastname
FROM   cust
WHERE  lastname LIKE :host_variable;

In order for this to work, when you enter the value for host_variable be sure to append percent signs (“%”) to the end of the value. The percent sign specifies that DB2 should accept as a match any number of characters (including 0). This must be done programmatically. So, if the value entered is SM, the host_variable should contain “SM%%%%%%%%” and if the value entered is SMITH, the host_variable should contain “SMITH%%%%%”. Append as many percent signs as required to fill up the entire length of the host variable. Failure to do so will result in DB2 searching for blank spaces. Think about it—if you assign “SMITH%” to a 10-byte host variable, that host variable will think it should search for “SMITH%”, that is SMITH at the beginning, four blanks at the end, and anything in the middle.

So, for “SMITH%%%%%”, SMITH will be returned, but so will SMITHLY (or any name beginning with SMITH). There is no way to magically determine if what was entered is a complete name or just a portion thereof. If this is not acceptable, then a single query will not likely be feasible. Instead, you would have to ask the user to enter whether a full name or just a portion is being entered.
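
One hedged alternative that avoids the padding chore is to trim the host variable and concatenate the wildcard within the predicate itself (a sketch; it assumes the data itself has no meaningful trailing blanks):

SELECT custno, firstname, lastname
FROM   cust
WHERE  lastname LIKE RTRIM(:host_variable) || '%';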

What About Performance?

Notwithstanding the semantic details, there are performance considerations to understand when using LIKE, too. It is a good practice to avoid using the LIKE predicate when the percentage sign (%) or the underscore (_) appears at the beginning of the comparison string because they prevent DB2 from using a matching index. 

The LIKE predicate can produce efficient results, however, when you use the wildcard characters at the end or in the middle of the comparison string, for example:

Inefficient            Can be efficient with index
LIKE '%NAME'           LIKE 'NAME%'
LIKE '_NAME'           LIKE 'NA_ME'


DB2 will not use a direct index lookup when a wildcard character is supplied as the first character of a LIKE predicate. At bind time, DB2 cannot determine whether a host variable will contain a wildcard character as the first character of a LIKE predicate. The optimizer therefore does not assume that an index cannot be used; rather, it indicates that an index might be used. At runtime, DB2 determines whether the index will be used based on the value supplied to the host variable. When a wildcard character is specified for the first character of a LIKE predicate, DB2 uses a non-matching index scan or a table space scan to satisfy the search.

Summary

The LIKE operator brings powerful search capabilities to your DB2 SQL queries. Be sure to understand its capabilities and to use it appropriately in your development efforts.

Tuesday, May 17, 2016

Come See Me at IDUG!

Just a quick blog post today to let all of my readers know my schedule for the IDUG DB2 Technical Conference being held next week in Austin, TX. With this information you can track me down in multiple places if you want to ask me a question, catch up, or just say “Hi!”

First of all, you can find me on Tuesday at the Expo Hall in the CorreLog booth (#300) from 12:00 to 1:00 and from 5:00 to 6:00. CorreLog will also be raffling off copies of my book, DB2 Developer’s Guide, to 4 lucky winners… so be sure to stop by. And chat with CorreLog about their SIEM and auditing solutions for DB2.

You should also make sure to attend my IDUG session titled “It’s Not Your Daddy’s DB2!” on Wednesday at 3:30 PM (session B13). The general idea of the session is that DB2 is changing and you should be changing with it. Over the course of the past few releases of DB2 for z/OS, IBM has added many features and capabilities that are transforming the platform. I’ll take a look at the big changes that have been introduced to DB2 including new SQL, universal table spaces, improved security, and more. The session also offers guidance on how to continue improving your DB2 environment to keep up with industry, technology and DBA trends circa 2016.

But that’s not all. On both Tuesday and Wednesday I will be co-presenting with SEG at their VSP sessions. On Tuesday at 1:00 PM (Session V02) I will be co-presenting with Ulf Heinrich on DB2 audit and compliance issues and how they impact your DB2 data and database assets. In particular, we will focus on database auditing, talking about what is needed, the various methods of auditing, and a new offering from SEG that you can use to run audit analytics against a long-term repository. And on Wednesday at 10:30 AM (Session V08) I will co-present with Roy Boxwell about DB2 release incompatibilities and how they impact your DB2 applications. And I hear that SEG will have some of my books to raffle off, too!

And there’s still more! On Tuesday night (6pm to 9pm) I will be speaking at a DataKinetics event with Colin Oakhill on the topic of SQL quality assurance. Be sure to stop by the DataKinetics booth (#509) to get an invitation to the Tuesday night event where you can hear me and others speak about SQL quality and performance… as well as enjoy a tasty snack or beverage.


So if you’ll be at IDUG and you want to find me, there is really no reason why you shouldn’t be able to track me down at one or more of these places… 

See you in Austin!

Wednesday, April 20, 2016

IDUG is Coming to the Lone Star State

The weather is improving, Spring has sprung and that means it is, once again, time to start planning your trip to this year's IDUG North American DB2 Tech Conference. This year it is being held in Austin, TX at the Renaissance Austin Hotel. This is a very nice area and the hotel is great – I’ve stayed there numerous times in the past.

If you’ve ever attended an IDUG conference before then you know all about the fantastic educational and entertainment opportunities that IDUG offers. Of course there will be a ton of informative technical sessions on all of the latest and greatest DB2 technologies and features. The presentations vary in length from an hour to a half day to complete full day training sessions. 

And if you are looking to learn something completely new, but data-related, this year there is even a special opportunity to learn about Spark technology on Wednesday.

But IDUG is not just a bunch of presentations… it is so much more. There are countless networking opportunities to meet and talk with your peers, IBM developers, and industry consultants. And let’s not forget about the vendor exhibit hall where you can talk to the ISVs and learn all about the software and tools that can help you get the most out of DB2... not to mention picking up a few tchotchkes and mementos.

Additionally, there are Special Interest Groups (where you can discuss the latest industry trends and topics with other interested technicians), Hands-On Labs (delivering practical training led by IBM), and the opportunity to take complimentary IBM certification exams.

I’ll be there, too. So if you're going to IDUG be sure to find me and say “Hello.” I'll be talking about the changing state of DB2 for z/OS in my session: It’s Not Your Daddy’s DB2! And I’ll also be talking at the VSPs this year… 2 of them. I'll be co-presenting with SEG on DB2 auditing and also on managing DB2 incompatibilities. So be sure to attend these informative sessions!

I am also planning to present at a special evening meeting being held by DKL. I’ll talk about SQL quality assurance and performance… but there will also be libations and snacks, too. So hunt us down on Tuesday night… stop by the DKL booth for an invitation.


The IDUG DB2 Tech Conference is the place to be to learn all about DB2 from IBMers, gold consultants, IBM champions, end users, and more. With all of this great stuff going on in Austin this May, why wouldn't you want to be there!?!?

Thursday, March 17, 2016

Digital Transformation and DB2 for z/OS: It’s Not Your Daddy’s DB2!

If you are a DBA who has been using DB2 for z/OS for a while you should have noticed that we are not doing things the same way we used to. DB2 is changing and we should be changing with it. If you are still using DB2 the same way you did 10 or 20 years ago, then you are definitely not adhering to industry best practices!

The same trends that are driving the digital explosion are also changing DB2 and the traditional role of the DBA. We are storing more data, and different types of data, for longer periods of time and in different ways than we have in the past.

And DB2 for z/OS keeps changing to adopt and embrace modern data management requirements and techniques. Whether it is modernizing storage with universal table spaces, embracing unstructured data in LOBs, or expanding the SQL language with new and more functionality, today’s DB2 looks a lot different than it did yesterday. Indeed, it is different – it is not your daddy’s DB2.

I’ve been writing a series of blog posts for BMC about this topic under the title It’s Not Your Daddy’s DB2!  You can find the first three blog posts in this series here: 1 2 3

But you can also attend a live webinar that BMC is sponsoring where I will talk about these issues. You can learn about:

·        Trends that influence the size and complexity of your DB2 environment and how this impacts data management
·        How to adapt to new DB2 data types and structures
·        Best practices and technologies for managing DB2 in the digital age
·        BMC’s next generation technology for managing the new world of DB2 for z/OS

Learn how digital transformation will change the way your DBAs manage critical business needs. Attend this webinar on March 30, 2016, at 12:00 pm CT.

Tuesday, October 06, 2015

Sorting by Day of the Week in DB2

Today's blog will walk you through a sorting "trick" that you can use when you are dealing with days of the week. A relatively common problem involves sorting a column that contains a day-of-the-week name not in alphabetical order, but in the order the day appears in the week...

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing the name of the day on which the transaction happened; let’s call this column DAY_NAME. Now, let’s further assume that we want to write queries against this table that order the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, if we write the first query that comes to mind, the results will obviously be sorted improperly:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY DAY_NAME;

The results from this query would be ordered alphabetically; in other words
  1. FRI
  2. MON
  3. SAT
  4. SUN
  5. THU
  6. TUE
  7. WED

And it would be rare to want this type of data sorted in that order, right? The more common need would be to sort the data the way it appears on the calendar.



One solution would be to design the table with an additional numeric or alphabetic column that would allow us to sort that data properly. By this I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. Then we could SELECT DAY_NAME but sort by DAY_NUM.

But this is a bad fix. Not only does it require a database design change, this "fix" also introduces the possibility for the DAY_NUM and DAY_NAME values to get out of sync... unless we are very careful, or perhaps do not allow DAY_NUM to be changed other than via an INSERT trigger that automatically populates the correct number. But requiring a trigger adds even more complexity to this "fix", which really should indicate to us that it is not a very good proposal.

A better solution uses just SQL and requires no change to the database structures. All you need is an understanding of SQL and SQL functions – in this case, the LOCATE function.  Here is the SQL:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT');

The trick here is to understand how the LOCATE function works: it returns the starting position of the first occurrence of one string within another string. So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position.

So, if DAY_NAME is WED, the LOCATE function in the above SQL statement returns 10... and so on. To summarize, Sunday returns 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

(Note: Some other database management systems have a function similar to LOCATE called INSTR.) 

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT') / 3) + 1;


Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week. 
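
Putting it all together in a query (using the same hypothetical TXN_TABLE):

SELECT DAY_NAME,
       INT(LOCATE(DAY_NAME, 'SUNMONTUEWEDTHUFRISAT') / 3) + 1 AS DAY_NUM
FROM   TXN_TABLE;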

You can monkey with the code samples here to modify the results to your liking. For example, if you prefer Monday to be the first day of the week and Sunday the last day of the week, simply change the text string in the LOCATE function as follows:

LOCATE(DAY_NAME, 'MONTUEWEDTHUFRISATSUN')

My intent with this blog posting is actually twofold. Firstly, I wanted to share a method of sorting days-of-the-week... but secondly, I hope to also have piqued your interest in using SQL functions to achieve all types of different processing requirements. DB2 offers quite a lot of functions and it seems to me that many programmers fail to use functions to their fullest.

So if you are ever face-to-face with a difficult query request, step back and take a moment to consider how DB2 functions might be able to help!