Tuesday, February 08, 2022

Db2 User-Defined Functions (UDFs)

A user-defined function, or UDF for short, enables you to customize Db2 to your shop's requirements. It is a very powerful feature that  can be used to add procedural functionality, coded by the user, to Db2. The UDF, once coded and implemented extends the functionality of Db2 by enabling users to specify the UDF in SQL statements just like built-in SQL functions.

Additional functionality and capability aside, it can also be complex to deliver and requires detailed knowledge, additional application development skills, and extra administrative handling.

User-defined functions are ideal for organizations wishing to utilize Db2 and SQL to perform specialized, corporate routines performing business logic and data transformation.

Types of UDFs

There are two ways of creating a user-defined function: you can code your own program from scratch using a traditional programming language; or you can use an existing function as a template, of sorts.

There are two types of user-defined functions that can be written from scratch: scalar functions and table functions. A scalar function is applied to a column or expression and operates on a single value. DB2 offers multiple built-in scalar functions, each of which can be applied to a column value or expression. Examples of built-in scalar functions include LTRIM, SQRT, and SUBSTR. You can read more about Db2 built-in functions in my earlier blog post, A Quick Intro to Db2 SQL Functions.

Table functions are a different type of function that, when invoked, returns an entire table. A table function is specified in the WHERE clause of a SELECT statement taking the place of a table, view, synonym, or alias.

Scalar and table user-defined functions are referred to as external functions, because they are written and developed outside of (or external to) DB2. External UDFs must be written in a host programming language. DB2 UDB user-defined functions can be written in Assembler, C, COBOL, or PL/I.

The final type of user-defined function is a sourced function. A sourced function is based on a function that already exists, that is it can be based on a built-in function or another user-defined function that has already been created. A sourced function can also be based on an existing scalar or column function.

User-defined functions are similar in functionality to application subroutines. But user-defined functions are different because they are used inside of SQL statements. This gives them great power. A user-defined function is not a substitute for an application subroutine, subprogram, or procedure. Instead, user-defined functions are used to extend the functionality of the SQL language.

The Schema

User-defined functions, user-defined distinct types, stored procedures, and triggers all are associated with a schema. By default, the schema name is the authid of the process that issues the CREATE FUNCTION, CREATE DISTINCT TYPE, CREATE PROCEDURE, or CREATE TRIGGER statement.

You can specify a schema name when you create a user-defined function, user-defined type, or trigger. If the schema name is not the same as the SQL authorization ID, then the issuer of the statement must have either SYSADM or SYSCTRL authority, or the authid of the issuing process must have the CREATEIN privilege on the schema.

For example, the following statement creates a user-defined function named FUNCX in the schema named MYSCHEMA:

     CREATE FUNCTION MYSCHEMA.FUNCX ...

If the MYSCHEMA component was not included in the CREATE statement, then the schema would default to the authid of the person (or process) that executed the CREATE statement. In short, the schema is set to the owner of the function. If the CREATE statement was embedded in a program, the owner is the authid of the owner of the plan or package; if the statement is dynamically prepared, the owner is the authid in the CURRENT SQLID special register.

Table Functions

Table functions are different in nature than scalar functions. A table function is designed to return multiple columns and rows. In other words, the output of a table function is itself a table. An example using a table function follows:

SELECT WINNER, WINNER_SCORE,
             LOSER, LOSER_SCORE
      FROM   GAME_RESULTS(5)
      WHERE  LOSER_SCORE = 0;

In this case, the table function GAME_RESULTS( ) is used to return the win/loss statistics for football games. The table function can be used in SQL statements just like a regular DB2 table. The function program is designed to fill the rows and columns of the "table." The input parameter is an INTEGER value corresponding to the week the game was played; if 0 is entered, all weeks are considered. The query above would return all results where the losing team was shut out (had 0 points) during the fifth week of the season.

The following or similar CREATE FUNCTION statement could be used to define the GAME_RESULTS( ) function:

      CREATE FUNCTION GAME_RESULTS(INTEGER)
     RETURNS TABLE (WEEK INTEGER,
                    WINNER CHAR(20),
                    WINNER_SCORE INTEGER,
                    LOSER CHAR(20),
                    LOSER_SCORE INTEGER)
       EXTERNAL NAME GAMES
     
   LANGUAGE C
        PARAMETER STYLE DB2SQL
          NO SQL
          DETERMINISTIC
          NO EXTERNAL ACTION
          FENCED
    
      SCRATCHPAD
    
      FINAL CALL
    
      DISALLOW PARALLEL
    
      CARDINALITY 300;

The parameter identifying this as a table function is the RETURNS TABLE parameter. This parameter is used to define the columns of the table function. The function program must create these rows itself or from another data source such as a flat file.

The value supplied for the CARDINALITY parameter is only an estimate. It is provided to help Db2 optimize statements using the table function. It is possible to return more or fewer rows than is specified in CARDINALITY.

Sourced Functions 

Sourced functions are created from functions that already exist. A sourced function can be based on any existing function, whether built-in (scalar and column) or user-defined (scalar). The biggest reasons that sourced functions are created is to enable functions for user-defined distinct data types. This is required because DB2 implements strong typing. Here is an example of creating a sourced UDF:

 CREATE FUNCTION FINDWORD
    (DOCUMENT, VARCHAR(50))
  RETURNS INTEGER
  SPECIFIC FINDWORDDOC
  SOURCE SPECIFIC FINDWORDCLOB;

In this example a new function, FINDWORD, is created from an existing function FINDWORDCLOB. The function finds the location of the specified word expressed as a VARCHAR(50) value in the supplied DOCUMENT. The function returns an INTEGER indicating the location of the word in the DOCUMENT. DOCUMENT is a user-defined data type based on a CLOB data type.

Summary

UDFs provide powerful functionality that can be used to customize Db2 SQL functionality and standardize your operational functtions and algorithms into your DBMS environment. 

2 comments:

Muehe said...

Let’s say I want to have a function were I return multiple values.
Like I pass in a customer number and I want the formatted name, and formatted address
For example I have a list of customers I need to create a mailinglist that has the the formatted name and address.
So 2 Columns back -- not a table

Select MyLib.Myfunc(CustId) From Mylib.Mailing ;

Name Address
John A. Smith 400 1St New York, NY 45456-2323
Jane Doe 500 E. James Chicago, IL 65421-6565


Craig S. Mullins said...

Well, a scalar UDF returns a single value each time it is invoked. So you can either concatenate the name with the address into a single value (which you can break apart when you receive it) or you can create a table UDF which will return a two column table.

Alternately, you can create a stored procedure that accepts the customer number as an input parameter and returns name and address as output parameters. Of course, you do not use stored procedures the same way that you do UDFs.

All the best.