Sybase Business Intelligence Solutions - Database Management, Data Warehousing Software, Mobile Enterprise Applications and Messaging
Sybase Brand Color Bar
delete

Search for    in all of Sybase.com
view all search results right arrow
  blank
 
 
 
 
 
Support > Technical Documents > Document Types > White Paper-Technical > User-Defined Function Development for Sybase IQ an...

User-Defined Function Development for Sybase IQ and RAPStore - Best Practices Guide

This Best Practices Guide is a supplement to the Sybase IQ User-Defined Functions Guide, and is meant to assist developers in the development of User-Defined Functions for Sybase IQ and RAPStore (part of RAP - The Trading Edition R3).
 
RSS Feed
 
 
 
User-Defined Function Development for Sybase IQ and RAPStore - Best Practices Guide - Sybase

Supplement to the Sybase IQ User-Defined Functions Guide

Abstract: This paper provides a basic overview of the implementation of User-Defined Functions (UDFs) for Sybase IQ, available starting with release 15.1. It is meant to be a supplement to the Sybase IQ 15.1 User-Defined Functions Guide, which is provided as part of the product documentation set for Sybase IQ 15.1 and higher releases.

Audience: This paper is intended for developers of UDFs, as well as administrators responsible for installing and configuring UDFs on a Sybase IQ instance.

Author: Jay Mueller, Sybase Partner Certification Group, Application Engineering. Additional information provided by Sybase IQ Engineering.


Contents

REVISIONS and REFERENCED DOCUMENTS

OVERVIEW

TYPES OF USER-DEFINED FUNCTIONS
o Cardinality of Input Rows (scalar vs. aggregate)
o Cardinality of Output Results (single output vs. multiple ouput)
o Summary of the different types of UDFs (diagram)
o Implementation Choices for Multiple-output UDFs
o Other Attributes (number of input columns/parameters, etc.)

UDF LICENSING

PLANNING AND DESIGN
o Considerations for Developing within a Database Context
o Practices to Avoid
o UDF Naming Conventions
o Design – Basics

IMPLEMENTATION - Basic Workflow
o Setting the dynamic library interface
o Defining the descriptor structure
o Function entry points
o Context Storage for Aggregate UDFs
o Calling patterns

SELECTED SQL CALL-BACK FUNCTIONS
o Getting Parameters and Setting Results
o Allowing Server to Terminate Functions
o Generating Exceptions and Logging Messages
o Converting Data, OLAP, and other callback functions

DECLARING A SQL FUNCTION TO ACCESS THE UDF

INSTALLATION, UPGRADE AND UNINSTALLATION

RUNTIME CONFIGURATION AND TESTING

ADDITIONAL INFORMATION

Skip forward to OVERVIEW


Revisions:

Revision Author Date
Initial publication Jay Mueller - Partner Certification - Application Engineering October 2009
Added UDF LICENSING Information Jay Mueller - Partner Certification - Application Engineering November 2009
Expanded "Types of UDFs" section Jay Mueller - Partner Certification - Application Engineering November 2009

Referenced Documents:

Document Title Location

Product Manual:

Sybase IQ 15.1 User-Defined Functions Guide

Navigation:
Sybase.com -> Support & Services -> Product Manuals -> Product: Sybase IQ ->
Document Set: Sybase IQ 15.1 -> User-Defined Functions Guide

Direct Link:
http://infocenter.sybase.com/help/index.jsp?topic=
/com.sybase.infocenter.dc01034.1510/doc/html/title.html

Product Manual:

Sybase IQ 15.1 System Administration Guide, Volume 1

Navigation:
Sybase.com -> Support & Services -> Product Manuals -> Product: Sybase IQ ->
Document Set: Sybase IQ 15.1 -> System Administration Guide, Volume 1

Direct Link:
http://infocenter.sybase.com/help/index.jsp?topic=
/com.sybase.infocenter.dc00170.1510/html/iqapgv1/title.htm

Product Manual:

Sybase IQ 15.1 System Administration Guide, Volume 2

Navigation:
Sybase.com -> Support & Services -> Product Manuals -> Product: Sybase IQ ->
Document Set: Sybase IQ 15.1 -> System Administration Guide, Volume 2

Direct Link:
http://infocenter.sybase.com/help/index.jsp?topic=
/com.sybase.infocenter.dc00800.1510/html/iqapgv2/title.htm

Product Manual:

Sybase IQ 15.1 Reference: Building Blocks, Tables, and Procedures

Navigation:
Sybase.com -> Support & Services -> Product Manuals -> Product: Sybase IQ ->
Document Set: Sybase IQ 15.1 -> Sybase IQ 15.1 Reference: Building Blocks, Tables, and Procedures

Direct Link:
http://infocenter.sybase.com/help/index.jsp?topic=
/com.sybase.infocenter.dc38151.1510/html/iqrefbb/title.htm

White Paper:

Debugging Using Cross-Character-Set Maps

Navigation:
Sybase.com -> Support & Services -> Technical Documents -> Document Types ->
White Paper-Technical -> Debugging Using Cross-Character-Set Maps

Direct Link:
http://www.sybase.com/detail?id=1009201

Note: Occasionally, maintenance at Sybase.com will cause documents to be moved to different URLs. If the direct links in this Best Practices Guide fail to work, navigate to the referenced document and search for the relevant section within the document.

Back to Contents


Overview

The purpose of this User-Defined Function (UDF) Development Best Practices Guide is to supplement the Sybase IQ User-Defined Functions Guide, by giving the developers of UDFs some general guidelines on how to get started, some design considerations, etc. There are several references from this Best Practices Guide back to the User-Defined Functions Guide, where additional detailed information is available.

With release 15.1 of Sybase IQ, and release 3 of RAP - The Trading Edition™, an optional feature was added which enables partners to create User Defined Functions (hereafter referred to as “UDFs”) that execute within the database container. The UDF execution feature is available as an optionally-licenseable component for use within Sybase IQ or within the RAPStore component of RAP - The Trading Edition™ R3. For the remainder of this report, wherever Sybase IQ 15.1 is mentioned, the statement also applies to the RAPStore component of RAP - The Trading Edition™ R3.

In order to execute external UDFs, the Sybase IQ database server must be started with the IQ_UDF license (see UDF Licensing below for details).

Also available with RAP - The Trading Edition™ R3: Enterprise is a set of optional time-series functions which were implemented with this same UDF architecture. These functions are named with a 'ts_' prefix. These finance-specific functions are available only with RAP - The Trading Edition™ R3: Enterprise, and are documented in the Sybase IQ 15.1 Reference: Building Blocks, Tables, and Procedures, in the section titled Time series and forecasting functions.

In order to execute the TimeSeries functions, the Sybase IQ database server must be started with the IQ_TS_FSF license option in addition to the IQ_UDF license option (these are both included in the license for RAP The Trading Edition R3 Enterprise - see UDF Licensing below for details).

The purpose of developing and invoking UDFs that execute within Sybase IQ is to take advantage of the extreme performance of the IQ server, while also providing users the flexibility of analyzing their data with the flexibility of a programmatic solution.

User-Defined Functions comprise two components.

  • UDF Declaration, and
  • UDF Executable Code

A UDF is declared in the SQL environment via the creation of a SQL function or stored procedure which describes the parameters and provides a reference to the external library.

The actual executable portion of the UDF is contained within an external (shared object or dynamic load) library file, which will be automatically loaded by the IQ server upon the first invocation of a UDF Declaration function or stored procedure associated with that library. Once loaded, the library will then remain resident in the IQ server for rapid access via subsequent invocations of SQL functions or stored procedures that reference that library.

This architecture is represented in a diagram below.

User-Defined Function architecture

The mechanism for developing UDFs is via C or C++ coding.

The C/C++ source code for the UDFs is compiled into one or more external libraries that are subsequently loaded into the IQ server's process space when needed. The UDF calling mechanism is defined to the Sybase IQ server via a SQL function. When the SQL function is invoked from a SQL query, the IQ server will load the corresponding library if it has not already been loaded.

For simplicity of managing the UDF installation, it is recommended that UDF developers package many UDF functions within a single library

To facilitate the construction of UDFs, a C-based API has been developed and provided by Sybase IQ. The API comprises a set of pre-defined entry points for the UDFs, a well-defined context data structure, and a series of SQL call-back functions that provide a communication mechanism from the UDF back to the Sybase IQ server.  The Sybase IQ UDF API allows software vendors and expert end-users to develop their own UDFs.  Software vendors may package and sell UDFs independently from Sybase IQ releases.

A User-Defined Functions Guide has been delivered along with release 15.1 and later releases of Sybase IQ . The guide describes the UDF API, gives several examples of calling patterns for the functions, and details the operation of the data structure and call-back functions.

Back to Contents


Types of UDFs

There are several types of user-defined functions, depending on the following attributes:

Cardinality of Input - Number of Input Rows per function call (Scalar vs. Aggregate)

The number of input rows against which the function will operate will determine the classification of the UDF and how the underlying function code will be implemented.

Scalar User-Defined Functions

Functions that operate against a single row of input are refered to as Scalar User-Defined Functions, also known simply as User-Defined Functions or UDFs. Unless otherwise indicated, a User-Defined Function is assumed to be scalar.

Since scalar UDFs operate on a single row of input, all values are passed into the underlying function code at invocation time. Hence, no contextual information is stored between invocations of the underlying code associated with a scalar UDF.

Example scalar (native) function invocation:

select UCASE(DepartmentName) from Departments

Example scalar UDF (sample UDF delivered with Sybase IQ):

select MY_PLUS_COUNTER(DepartmentID) from Departments

User-Defined Aggregate Functions (UDAFs)

Functions that operate against a group of rows of input are referred to as Aggregate User-Defined Functions, also known as User-Defined Aggregate Functions, UDAFs or UDAs. Aggregate UDFs should be specifically referred to as such, since the default assumption for UDFs is that they are scalar. For example, the Sybase IQ User-Defined Functions Guide refers to these functions as UDAFs.

The context structure for coding UDAFs is slightly different than the context structure used for coding scalar UDFs. Since a UDAF needs to collect data from multiple rows, the intermediate results will need to be stored in a context area. For more details about managing the context storage, see the section Context Storage for Aggregate UDFs.

Example aggregate (native) function invocation:

select MAX(DepartmentID) from Departments

Example UDAF function (sample UDAF delivered with Sybase IQ):

select MY_SUM(DepartmentID) from Departments

Cardinality of Output - Number of Output values per function call (single output vs. multiple outputs)

The number of output values to be generated by each invocation of the function will determine how the underlying function code will be implemented, as well as having implications on how users will invoke the UDF.

Single-output UDFs

With traditional SQL functions, only a single output was generated upon each invocation of a function.

Invoking a traditional, single-output scalar function on each row in a set of rows would generate a single output value for each row. For example, the native scalar function 'UCASE' returns a single, upper-case result from each line of mixed-case input:

select UCASE(DepartmentName) from Departments

Invoking a traditional, single-output aggregate function on a group of rows will generate a single output value for the set of input rows. For example, the native aggregate function MAX returns the largest numerical value from a list:

select MAX(DepartmentID) from Departments

Multiple-output UDFs

With UDFs, it is possible to develop functions that generate multiple output values per set of input data.

Invoking a scalar UDF on each row in a set of rows could generate multiple output values for each row. For example, a multiple-output scalar UDF could be written to simultaneously compute the minimum and maximum values from a set of columns in a single row. A scalar UDF that produces multiple outputs could be referred to as a "UDF-MO". Unless otherwise specified, a scalar UDF can be assumed to only produce a single output.

Invoking a UDAF on a group of rows could generate multiple output values for the set of input rows. For example, a multiple-output UDAF could be written to simultaneously compute the count, sum and average of a column of numerical values. A UDAF that produces multiple outputs could be referred to as a "UDAF-MO". Unless otherwise specified, a UDAF can be assumed to only produce a single output.

Summary of the different types of UDFs

Given the two classes of input row cardinality and two classes of output result cardinality, there are four basic types of UDFs. The following diagram illustrates these different types of UDFs.

Types of UDFs

Note that any type of UDF could operate either on a single column from the input row(s), or on multiple columns within the input row(s). See the Other Attributes of User-Defined Functions section below for more details.

Implementation Choices for Multiple-Output UDFs (UDF-MOs and UDAF-MOs)

When multiple outputs are desired from a UDF or UDAF, there are a couple different ways of implementing them. There are advantages and disadvantages of these implementations.

Multiple analysis functions (each having a single output)

The simplest solution (albeit not the most efficient) would be to develop multiple functions that each produce a single-output. Single-output functions fit nicely into traditional SQL environments, and are relatively straightforward to implement. For the scalar example above, this could translate into two single-output UDF invocations:

select my_min(col1, col2, ..., colN) Min, 
       my_max(col1, col2, ..., colN) Max from SampleTable;

For large amounts of data that require extensive analysis, this approach is not the most efficient (consider the possibility that there are 1000 columns in the table, or a scenario where more complex analysis is being performed).

Single analysis, multiple accessors (using an intermediate variable)

An alternate solution would be to develop a single 'analysis' function which performs the analysis once, and packages the multiple results into a single object, such as a binary or varchar variable. This would then be coupled with one or more result 'accessor' functions, which would be used to parse the binary or varchar variable and return the desired results. For the scalar example above, this would translate into a UDF-MO invocation having a pattern similar to the following:

-- Set up a variable to store intermediate result object
create variable @mm_result_string varchar (16); -- Perform analysis
select my_minmax_compute(col1, col2, ..., colN) into @mm_result_string from SampleTable;
-- Retrieve results select my_minmax_value(@mm_result_string, 'minimum') Min, my_minmax_value(@mm_result_string, 'maximum') Max from iq_dummy;
-- Clean up (remove temporary variable)
drop variable @@mm_result_string;

In the above example, the analysis UDF-MO ('my_minmax_compute') would return a single object (@mm_result_string), a string representing two float values appended together (for example: '123456.7987654.3'). The accessor single-output UDF ('my_minmax_value') would take the result object as one parameter, a request for a value as another parameter, and then return the designated result.

For the simple example above, the 'my_minmax_value' functionality could also be implemented using a native substring function surrounded by a native cast operation. For example, using native functions, the results could also be retrieved as follows (non-UDF solution for retrieving values from a results object):

...
-- Retrieve results select cast (substring(@mm_result_string,1,8) as float) Min, cast (substring(@mm_result_string,9,8) as float) Max from iq_dummy;
...

Either method should result in output similar to the following:

Min Max

123,456.703125

987,654.3125

The resulting values may not match the result string exactly due to the choice of intermediate result variable type and/or internal rounding caused by the cast operation. For this example, a binary variable or longer varchar (padded with zeros) would probably be a better choice for the intermediate result values, where final results are float values.

Note that as of release 15.1 GA of Sybase IQ, there was a limit of 256 bytes for return values of UDFs and UDAFs. With release 15.1 ESD1, this limit has been increased to 32K. In a future release of Sybase IQ, this limit will be expanded to 2MB. Eventually, it will be possible to use binary large objects (BLOBs) or character large objects (CLOBs) as return values from UDFs and UDAFs. This will allow for much larger result objects (limited only by the architecture of the hardware).

Since there was a strict limitation on the size of the return values in 15.1 GA, a few UDAF-MOs were developed which wrote intermediate results to a temporary output file. These functions will be redesigned with the intermediate variable paradigm in the near future. Going forward, UDF-MOs and UDAF-MOs should NOT write to output files (see the section Practices to Avoid below).

Other attributes of User-Defined Functions

In addition to the cardinality of input and output, there are other attributes that could be considered to distinguish the functions. These attributes are not as significant as the cardinality attributes described above, but interesting nonetheless.

Number of Input Columns / Parameters per Row of Input

UDFs and UDAFs (as well as UDF-MOs and UDAF-MOs) may operate on a single input column or multiple input columns. For example, my_sum is a UDAF that operates on a single column of input data. The sample UDF my_plus is an example of a scalar UDF that operates on multiple input columns within a single input row. An example of a UDAF that operates on multilple columns could be a correlation function that compares two columns of input data.

Multiple-column input values are specified on the SQL command line as a comma-delimited list. The underlying code is then created to handle these multiple parameters. See the example code for the scalar UDF function 'my_sum' to see how this is implemented. The same convention applies to multiple-column inputs to UDAFs.

Other than parameter-parsing, the number of input columns does not affect the inherent semantics of the underlying UDF code, so there is no need to make a distiction between single-column input and multiple-column input for the various types of UDFs.

Deterministic vs. Non-deterministic

Another attribute is whether the function returns a deterministic value, or a non-deterministic value. The result of a function could either be determined solely by the input parameters and data (deterministic), or by some random behavior (non-deterministic). Parameters of non-deterministic UDFs typically need a random seed as one of the input parameters. An example of a non-deterministic function would be a normal bell curve distribution, with parameters specifying the shape of the bell curve.

Note: Since non-deterministic functions require an additional random seed, if the random seed is set up to be the first parameter of the function, than the additional parameters that represent the shape of the distribution will not map as you may expect. In other words, if the parameters of a non-deterministic function are (RandomNumber, A, B, C, ...), then parameter 2 maps to A, parameter 3 maps to B, etc.

Return values, Null handling, Aggregation allowed, SQL Security.

UDFs can be defined to return a variety of data types in the result.

Another attribute of a scalar UDF defines whether to ignore or respect null values.

Additional attributes of UDAFs define what to return on empty input (null or a value) and whether and how the function is designed to operate within various OLAP aggregation windows. For an in-depth discusion about OLAP (online analytical processing), refer to Chapter 2: Using OLAP, in the Sybase IQ 15.1 System Administration Guide: Volume 2.

SQL Security can also be defined for each type of UDF. With SQL Security, the UDF can be defined to run as the owner (DEFINER) or the user that is executing the query (INVOKER).

For more information about these attributes, see the sections Declaring a Scalar UDF (CREATE FUNCTION statement) or Declaring a UDAF (CREATE AGGREGATE FUNCTION statement) in the Sybase IQ User-Defined Functions Guide.

Back to Contents


UDF Licensing

In order to execute UDFs residing in external libraries, the Sybase IQ server must be running with the "IQ_UDF" license option (the "IQ_TS_FSF" license option is also required for executing the TimeSeries UDFs).

Determining if the license is present

To determine if the appropriate license has been obtained, view the Sybase IQ Message Log (filename with extension '.iqmsg') co-located with the database (.db) file and configuration (.cfg) file.

To verify the ability to execute UDFs, look for the string "Checked out license for {n} IQ_UDF" (where {n} is the number of CPU cores on the machine where Sybase IQ is running). This message is written to the file during database startup, so if it is present, it should be located near the top of the iqmsg file.

To verify the ability to execute TimeSeries functions, look for the string "Checked out license for {n} IQ_TS_FSF" (where {n} is the number of CPU cores on the machine where Sybase IQ is running). This message is written to the file during database startup, so if it is present, it should be located near the top of the iqmsg file.

The database may be located in a variety of places. For example, by default, the demo database is located at:

  • $IQDIR15/demo (on Unix), or
  • C:\Documents and Settings\All Users\SybaseIQ (on Windows)

Alternately, examine the license file or files (.lic extension) in your environment. The license file(s) will be either co-located with your database (.db) file and configuration (.cfg) file (see above), or will be located in the Sybase Software Asset Management (SySAM) folder.

Depending on the operating system flavor, the SySAM folder is located at:

  • $IQSYSAM/licenses (on Unix), or
  • %SYBASE%\SYSAM-2_0\licenses (on Windows)

In the license (.lic) file, look for the "IQ_UDF" option in the list of "COMPONENTS=".

Obtaining the UDF license

The IQ_UDF license is available from the Sybase Product Download Center (SPDC) as the "Sybase IQ 15.1 Partner Solutions Option".

Trouble-shooting - license is present, but not recognized during DB startup

There is a known issue with Sybase IQ regarding the way databases are created and how they are initially configured to look for license files.

The database-creation script 'mkiqdemo' creates a License Management Protocol (.lmp) file along with the new database. This file will have the same name as the database instance (e.g. 'iqdemo.lmp'), and will be co-located with the database (.db) file and configuration (.cfg) file.

When first generated, the first two lines of this file contain the strings "PE=" and "LT=". The string "PE=" tells the Sybase IQ server which Platform Edition license to look for, and is initially blank. The string "LT=" tells the Sybase IQ server which License Type to look for, and is also initially blank. These two fields must be edited in the .lmp file in order to tell the Sybase IQ server how to find the available licenses. See the next paragraph to determine how these two fields must be modified.

When downloaded from the SPDC, the "Partner Solutions Option" license will contain a string "VENDOR_STRING=PE=xx;LT=yy", where 'xx' is the Platform Edition and 'yy' is the License Type for this license. These are the values that must be present in the .lmp file.

Back to Contents


Planning and Design

Sybase IQ is a world-class, high-speed, robust database and analytics engine. Any components such as UDFs that are added to Sybase IQ should maintain these qualities of the Sybase IQ server. The following sections describe some details of these considerations.

As with any modern programming project, there are some basic considerations to keep in mind while developing UDFs. In addition to the standard considerations related to software development in general, there are some additional considerations when working within a database container. For example, concurrency, availability and data integrity. See the section Considerations for Working within the Sybase IQ Container below for a more complete list and additional details.

When executing within a database container, there are certain activities that should be avoided. See the section Practices to Avoid below for details.

For additional considerations regarding UDF development, refer to the section User-defined function restrictions in the User-Defined Functions Guide.

There are several flavors of UDFs that can be created. For example, scalar vs. aggregate, single-output vs. multiple-output, determinate vs. non-determinate. See the section Types of UDFs for a brief description of these different flavors of UDFs.

After reading and understanding the considerations, practices to avoid and learning about the types of UDFs, the section Design - Basics describes the sample UDF source code provided along with Sybase IQ, and gives a high-level description of some of the components within the sample UDF source code.

Back to Contents


Considerations for Working within the Sybase IQ Container

Some of the additional considerations for working within the Sybase IQ database container are listed below.

Robustness

The reliability required by a database is extremely high. UDFs running within a database environment must maintain this high level of reliability. With the first implementation of the UDF API, UDFs run within the Sybase IQ server. If a UDF aborts prematurely or unexpectedly, this may cause the Sybase IQ server to also abort. It is important to ensure that UDFs do not terminate prematurely or abort unexpectedly under any circumstances.

Data Integrity

It is extremely important that a database maintains data integrity. In no circumstances should data be lost, modified, augmented, or corrupted. Since UDF execution happens within the Sybase IQ server, there is a risk of corrupting data. It is important to practice caution with memory management and any other use of pointers. A recommendation has been made for users to install and execute UDFs within a read-only multiplex node, but that policy cannot be guaranteed. For added protection each IQ server can enable or disable the execution of UDFs, via a startup option. See the section Enabling and disabling user-defined functions in the User-Defined Functions Guide.

Flexible Installation

The Sybase IQ product can be installed in a wide variety of configurations. UDFs must be easily installed within this environment, and must be able to run within all supported configurations. The Sybase IQ installer provides a default installation directory , but allows users to select a different installation directory if they so choose. In order to complement Sybase IQ, UDF developers should consider providing the same flexibility with the installation of the UDF libraries and associated SQL function definition scripts.

Seemless Execution

UDFs will be expected to run seemlessly within the database container. Although Sybase IQ is a complex product consisting of many files, the main user interaction is via an IQ server process (iqsrv15), using industry-standard SQL (Structured Query Language). Execution of UDFs should be accomplished entirely via SQL commands, and the user should not be required to understand the underlying implementation method in order to use the UDFs.

UDFs run under the cover of Sybase IQ, so no console messages should be written. Any feedback to the user should be presented via pre-defined exception messages.

UDFs should manage memory and temporary results as defined by the Sybase IQ UDF API. For additional details on the Sybase IQ UDF API, refer to the User-Defined Functions Guide.

Sybase IQ manages disk I/O in a reliable manner to guarantee data availability and integrity. UDFs should generally not write to or read from the file system.

Sybase IQ is a multi-user application. Many users could be executing the same UDF simultaneously. Certain OLAP queries will cause a UDF to be executed multiple times within the same query, sometimes in parallel. For additional details on setting UDFs to be able to run in parallel, refer to the section Aggregate UDF calling patterns in the User-Defined Functions Guide.

Rapid Upgrade

Many Sybase IQ installations are running in mission-critical environments. Customers require an extremely high level of availability from Sybase IQ. To provide the highest availability, it is important that system administrators be able to install and upgrade UDFs with little or no impact to the Sybase IQ server. Upgrades to Sybase IQ or underlying components (such as UDFs) should be accomplished with little or no downtime to the IQ server.

Serviceability

Sybase IQ provides extensive logging and tracing capabilities. In the unlikely event that there is a problem with Sybase IQ, system administrators have access to this very detailed logging and tracing information. It is important that UDFs provide the same or better level of detailed logging, in the event that trouble-shooting indicates that a problem was encountered within the UDF code.

User Control

Sybase IQ can handle extremely large data sets, and some queries can potentially run for long periods of time. Occasionally a query will take an unusually long time to execute. The SQL client provides the user with the ability to cancel a query if it is taking too long to complete. Native functions provided by Sybase IQ are implemented in a manner that allows them to know when a user has canceled a query. UDFs must also be written in a manner that they understand if a query has been canceled by the user. In other words, UDFs should support the ability for users to cancel long-running queries that invoke UDFs. For details on this implementation, see the 'get_is_canceled' function description in the list of SQL callback functions below.

Internationalization

Sybase IQ has been internationalized, so that it can be sold in different countries around the world, to users who speak many different languages. Error messages have been extracted from the code and put into external files. This provides the capability to localize those error messages to new languages, without having to make extensive code changes.

In order to support multiple languages, UDFs should also be internationalized. In general, most UDFs will operate on numeric data. In some cases, a UDF may accept string keywords as one or more of the parameters. These keywords should be placed in external files, in addition to any exception text and log messages used by the UDF.

Sybase IQ has also been localized to a few non-English foreign languages. In order to support localization to the same languages as Sybase IQ supports, it is recommended that UDFs be internationalized to facilitate the localization process. Localization may be performed at a later date, by an independent organization.

For details about international language support in Sybase IQ, refer to the Sybase IQ System Administration Guide, Volume 1, Chapter 11: International Languages and Character Sets

There is also a white paper available at Sybase.com which describes how to debug multi-language client-server environments. This white paper is titled Debugging Using Cross-Character-Set Maps. This paper describes mainly the task of dealing with multi-byte data, as opposed to input keywords, exception messages and log entries. Most UDFs will deal with numeric data, as opposed to character data.

Platform Differences

Be aware that the functions will need to be flexible to run on a variety of platforms supported by Sybase IQ. The Sybase IQ 15.x server runs on 64-bit architectures. It is supported under several flavors of the MS Windows (64-bit) family of operating systems. It is also supported on various types and versions of Unix (64-bit), including Solaris, HP-UX, AIX and Linux.

Customizable Installation Location

See Flexible Installation above.

Back to Contents


Practices to Avoid

The following is a list of practices to avoid.

  • Do not hard-code library paths in SQL registration scripts. This practice makes it difficult to provide flexibility to the user to install the UDFs into the same directory as Sybase IQ.
  • Do not write output files. With the initial release 15.1, there was an architectural limitation on UDF results within Sybase IQ. Due to this limitation, some UDFs were developed which wrote to temporary results files outside of the Sybase IQ container. With the release of the first ESD for Sybase IQ 15.1, this architectural limit has been expanded to a usable size. The limit will be essentially removed in a future release of Sybase IQ.
  • Do not write ambiguous code, or constructs that could unexpectedly loop forever, without providing a mechanism for the user to cancel the UDF invocation (see the function 'get_is_cancelled()' below in the discussion of SQL Callback Functions.
  • Do not perform complex, or memory-intensive operations that will be repeated every invocation. When a UDF call is made against a table that contains many thousands of rows, efficient execution becomes paramount. It is recommended to allocate blocks of memory for a thousand to several thousand rows at a time, rather than on a row-by-row basis.
  • Do not open a database connection from within a UDF. Database operations are are not to be performed within a UDF. All parameters and data required for UDF execution should be passed as parameters to the UDF.
  • Do not use reserved words when naming UDFs. For more details, see the section UDF naming conventions.

Back to Contents


UDF Naming Conventions

UDF names must follow the same restrictions as other identifiers in Sybase IQ.

Sybase IQ identifiers have a maximum length of 128 bytes. For simplicity of use, UDF names should start with an alphabetic character as defined by Sybase IQ. Alphabetic characters as defined by Sybase IQ are the letters of the alphabet, plus underscore (_), at sign (@), number or pound sign (#) and dollar sign ($). UDF names should consist entirely of these alphabetic characters as well as digits (the numbers 0 through 9). UDF names should not conflict with SQL reserved words. There is a list of SQL reserved words in the Sybase IQ 15.1 manual Reference: Building Blocks, Tables, and Procedures, Chapter 2: SQL Language Elements, section Reserved words.

Although UDF names (as other identifiers) may also contain reserved words, spaces, characters other than those listed above, and may start with a non-alphabetic character, this is not recommended. If UDF names have any of these characteristics, then they must be enclosed in quotes or square brackets, which makes it more difficult to use them.

A list of reserved words in Sybase IQ can be found on the page titled Reserved words under the section Keywords in Chapter 2: SQL Language Elements of the Sybase IQ manual Reference: Building Blocks, Tables, and Procedures. The direct link to the Reserved words section is http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/Alhakeywords.htm

The UDFs will reside in the same name space as other SQL functions and stored procedures. In order to avoid conflicts with existing stored procedures and functions, UDFs should be prefaced with a unique short (2-letter to 5-letter) acronym and underscore. UDF names should be chosen that will not conflict with other SQL functions or stored procedures already defined in the local environment. The following are some of the prefixes which are already in use by native Sybase IQ, the optional Time Series financial functions, and UDFs available from other vendors:

UDF Name or Prefix Description

debugger_tutorial

stored procedure delivered with the native Sybase IQ installation

FL...

a package of DBLytix UDFs from Fuzzy Logix, LLC *

ManageContacts

stored procedure delivered with the Sybase IQ demo database to manage the Contacts table

Show... a set of stored procedures that display various Sybase IQ demo database objects
sp_Detect_MPX_DDL_conflicts stored procedure used in a Sybase IQ Multiplex environment
sp_iqevbegintxn stored procedure used to begin a new transaction in the Multiplex environment
sp_iqmpx... a set of functions and stored procedures provided by Sybase IQ to assist in MultiPlex administration

ts_...

optional financial time series and forecasting functions **

* For a list of DB Lytix functions, see the Partner Certification Report: Fuzzy Logix, LLC - DB Lytix version 1.0 User-Defined Functions (UDFs) running on Sybase IQ 15.1 with the IQ_UDF license option, Report 321, August 2009. For additional details on the DB Lytix functions, refer to DB Lytix product documentation or contact Fuzzy Logix, LLC (see Vendor Contact Information below).

** For a list of the financial time series analysis functions, see the Sybase IQ 15.1 manual Reference: Building Blocks, Tables, and Procedures, Chapter 4: SQL Functions, section Time series and forecasting functions. For additional details on these functions, refer to the section Alphabetical list of functions within the same document. The time series and forecasting capability is only available with RAP - The Trading Edition™ R3 Enterprise.

Back to Contents


Design – Basics

As with any modern programming project, there are some basic considerations to keep in mind while developing UDFs.

This Best Practices Guide assumes that the UDF developer is familiar with the basics of developing shrink-wrap software, including good program design and development, independent testing, etc.

In addition to standard software development practices, the UDF developer should keep in mind that they are developing code to be executed within the Sybase IQ database container (see Considerations for Working within the Sybase IQ Database Container above). Also, it is important to understand the limitations imposed by the database container (see the section Practices to Avoid).

Developers of aggregate UDFs should also be familiar with OLAP queries, and how they translate into UDF calling patterns (see the section Aggregate UDF calling patterns in the User-Defined Functions Guide for various OLAP examples, and their associated calling patterns).

Because the UDFs may be invoked by several threads simultaneously, they must be constructed to be thread-safe.

Sample code

Starting with Sybase IQ 15.1 and subsequent releases, sample UDF source code is delivered along with the product. As this code may evolve slightly from release to release, the newest version of the sample code will be that delivered with the most recent release.

To see if there were last-minute changes to the sample UDF source code, refer to the Sybase IQ Release Bulletin for the relevant release and operating system platform. For Sybase IQ 15.1, the Release Bulletins for various supported platforms are located at: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.help.iq.15.1/title.htm

It should be noted that the sample UDF code documented in the User-Defined Functions Guide may not be the latest version as delivered with the Sybase IQ product.

On Unix platforms, the sample UDF code is delivered at:

$SYBASE/IQ-15_1/samples/udf

(where $SYBASE is the installation root)

On Windows platforms, the sample UDF code is delivered at:

C:\Documents and Settings\All Users\SybaseIQ\samples\udf

Note: The sample code for the 'my_interpolate' function has several variables named with the prefix 'preceeding_'. Although the code will compile and execute properly, these variables should be renamed to 'preceding_' to correct the spelling. SQL function definitions and associated SQL queries that employ window framing should use the correct spelling of the keyword 'PRECEDING'.

Back to Contents


Implementation - Basic Workflow

This section describes the components of a UDF program, and how the program should be built. There is also another overview of this process in the section Sybase IQ overview in the User-Defined Functions Guide.

The UDF executable code must be developed in a 64-bit environment. The Sybase IQ 15.x server runs in 64-bit environments only.

The basic steps to developing a UDF are as follows:

  1. Setting the dynamic library interface
  2. Defining the descriptor structure
  3. Programming the Function entry points
  4. Managing the Context Storage for Aggregate UDFs
  5. Being aware of the possible Calling patterns for the function

The next section lists the various SQL call-back functions available for use within UDF code.

Setting the dynamic library interface

For Sybase IQ UDFs, only version 3 interface style (EXTFN_V3_API) is supported. See the section Setting the dynamic library interface in the User-Defined Functions Guide for the C syntax to define this interface. The UDF code must include the header file 'extfnapiv3.h'.

Defining the descriptor structure

The descriptor structure varies between scalar and aggregate UDFs.

For scalar UDFs, define the structure 'a_v3_extfn_scalar' (see the section Scalar UDF descriptor structure in the User-Defined Functions Guide for syntax).

For aggregate UDFs, define the structure 'a_v3_extfn_aggregate' (see the section Aggregate UDF descriptor structure in the User-Defined Functions Guide for syntax).

Function entry points

The function entry points differ for scalar and aggregate UDFs. Information about calling patterns and how the Sybase IQ server invokes the scalar and aggregate UDF entry points is documented in the User-Defined Functions Guide, in the section UDF callback functions and calling patterns and its sub-sections.

Scalar UDF - Entry Points

Scalar UDFs must have a descriptor function. They also have one required entry point and two optional entry points.

The required entry point for a scalar UDF is:

  • _evaluate_extfn

The optional entry points for a scalar UDF are:

  • _start_extfn
  • _finish_extfn

The descriptor function and entry points are defined in a scalar UDF context structure. For details, refer to the section Defining a scalar UDF in the User-Defined Functions Guide.

Aggregate UDF - Entry Points

Aggregate UDFs must have a descriptor function. They also have five required entry points and five optional entry points.

The required entry points for aggregate UDFs are:

  • _start_extfn
  • _finish_extfn
  • _evaluate_extfn
  • _reset_extfn
  • _next_value_extfn

The optional entry points for aggregate UDFs are:

  • _drop_value_extfn
  • _evaluate_cumulative_extfn
  • _next_subaggregate_extfn
  • _drop_subaggregate_extfn
  • _evaluate_superaggregate_extfn

For details on the purpose of each of these functions, refer to the section Defining an aggregate UDF in the User-Defined Functions Guide. The descriptor function and entry points (along with several other variables used for optimization and context storage) are defined in an aggregate UDF context structure. For details, refer to 'UDAF context structure' in the User-Defined Functions Guide.

Context Storage for Aggregate UDFs

Context variables control whether the intermediate results of aggregate functions are to be managed by the UDF itself (forcing the IQ Server to run the UDFs serially), or whether the memory is to be managed by the IQ server. The context area is used to transfer or communicate data between multiple invocations of the UDF within the same query (particularly within OLAP-style queries).

If the '_calculation_context_size' is set to 0, then the UDF will be required to manage all interim results in memory (forcing the IQ server to invoke the UDF sequentially over the data (vs. being able to invoke many instances of the UDF in parallel during an OLAP query).

If the '_calculation_context_size' is set to a non-zero value, then the IQ server will manage a separate context area for each invocation of the UDF, allowing multiple instances of the UDF to be invoked in parallel. To make the most efficient use of memory, consider setting the '_calculation_context_alignment' to something smaller than the default (depending on the size of the context storage needed).

For details on context storage, refer to the description of '_calculation_context_size' and '_calculation_context_alignment' in the section Aggregate UDF descriptor structure in the User-Defined Functions Guide. These variables are near the end of the descriptor structure.

There is also a detailed discussion of the use of context storage in the section Calculation context in the User-Defined Functions Guide.

IMPORTANT: Whenever it is required to use memory to store intermediate results within an aggregate UDF, remember to initialize the memory with the _start_extfn function, and to clean-up and de-allocate any memory with the _finish_extfn function.

Calling patterns

The calling pattern for a scalar UDF is provided in the section Scalar UDF calling pattern in the User-Defined Functions Guide

The various calling patterns for the entry points in an aggregate UDF are defined in the User-Defined Functions Guide, in the section Aggregate UDF calling patterns and subsequent sections.

Back to Contents


Selected SQL Callback Functions

This section describes some of the SQL callback functions available for use within the UDF code. Note that some of these call-back functions are specific to scalar UDF coding, or aggregate UDF coding: For additional details about callback functions, see the section UDF and UDAF callback functions in the User-Defined Functions Guide.

Getting the UDF Input Parameters (scalar or aggregate)

Use the 'get_value( )' call-back function to get input values for the UDF.

Getting Subsequent Input Data (aggregate)

Use the 'get_piece( )' call-back function (with an offset) for additional data, if the piece length is less than the total length.

Returning the UDF Result back to the IQ Server (scalar)

Use the 'set_value( )' call-back function to return the result back to the IQ server.

Allowing Server to Terminate Functions (scalar or aggregate)

Use the 'get_is_cancelled( )' call-back function within loops in the UDF code, to ensure that the user is able to cancel a query containing this UDF if the query is taking too long to complete.

Generating an Exception (scalar or aggregate)

Use the 'set_error( )' call-back function to set an exception that will be forwarded by the IQ server to the SQL client.

Logging a Message (scalar or aggregate)

Use the 'log_message( )' call-back function to log a message to the IQ Message Log (file named <database name>.iqmsg). Note that messages longer than 255 bytes may be truncated in the Message Log.

Converting Data (scalar or aggregate)

To convert data from one SQL Data Type to another, use the 'convert_data()' call-back function.

Other OLAP-style Query Information Functions (aggregate)

  • _max_rows_in_frame( )
  • _estimated_rows_per_partition( )
  • _is_used_as_a_superaggregate( )
  • _is_window_used( )
  • _window_has_unbounded_preceding( )
  • _window_contains_current_row( )
  • _window_is_range_based( )
  • _num_rows_in_partition( )
  • _result_row_from_start_of_partition( )

Future Call-back Functions

Note that additional call-back functions will be provided over time. Please refer to the User-Defined Functions Guide for the complete list of SQL call-back functions.

Back to Contents


Declaring a SQL Function to Access the UDF

After the UDF code has been written and compiled, the next step is to create a SQL Function that will invoke the UDF from the appropriate library file, sending the input data to the UDF.

There are many styles of SQL Functions. The syntax versions for declaring SQL functions for UDFs are provided in the User-Defined Functions Guide, in the sections titled Declaring a Scalar UDF and Declaring a UDAF.

There are also descriptions in the User-Defined Functions Guide on how to create SQL functions using Sybase Central, in the sections titled Declaring a scalar user-defined function in Sybase Central and Declaring a user-defined aggregate function in Sybase Central.

Examples of SQL function definitions are supplied in the User-Defined Functions Guide along with the various code samples.

Note: The sample code for the 'my_interpolate' function has several variables named with the prefix 'preceeding_'. Although the code will compile and execute properly, these variables should be renamed to 'preceding_' to correct the spelling. SQL function definitions and associated SQL queries that employ window framing should use the correct spelling of the keyword 'PRECEDING'. See Problems with the 'PRECEEDING' keyword below.

Back to Contents


Installation, Upgrade and Uninstallation

Sybase IQ should be shut down during UDF file maintenance

As with any application which employs external libraries, it is very important that the application not be attempting to access an external library while the associated library file is being moved, overwritten or deleted. Since libraries are automatically loaded whenever an associated SQL function is invoked, it is important to follow these steps in the exact order whenever performing any type of maintenance on existing UDF libraries:

  1. Ensure all users who invoke UDFs do not have any pending queries in progress
  2. Revoke the execute permission from users, and/or drop the SQL functions and/or stored procedures which reference external UDF code modules
  3. Unload the library from the IQ server, using the 'unload' command (note that shutting down the IQ server will also automatically unload the library)
  4. Perform the desired maintenance on the external library files (copy, move, update, delete)
  5. Edit SQL function and/or stored procedure definitions in the registration scripts to reflect external library locations, if the libraries were moved
  6. Grant the execute permission to users, and/or run registration scripts to re-create the SQL functions and/or stored procedures which reference external UDF code modules
  7. Invoke a SQL function or stored procedure which references the external UDF code to ensure the IQ server is able to dynamically load the external library

For additional details about external libraries, refer to the section Managing external libraries in the User-Defined Functions Guide.

For syntax on revoking and granting execute permissions, see the section Granting and revoking permissions in the User-Defined Functions Guide.

Back to Contents


Runtime Configuration and Testing

After UDF external code has been coded, compiled and linked, and the corresponding SQL functions and/or stored procedures have been defined, the UDFs are ready to be tested.

UDFs should be run in a read-only node in a multiplex environment

In order to ensure the safest environment possible, it is highly recommended that UDFs be installed and invoked from a read-only IQ server node in a multiplex installation.

Disable UDF execution in critical IQ server nodes

To disable the ability to execute external UDFs in a particular IQ server, the administrator may specifiy the following flag in the startup command or configuration file:

-sf external_procedure_v3

See the section Enabling and disabling user-defined functions in the User-Defined Functions Guide.

Debugging UDFs with a Safer Execution Mode

To catch some of the more serious API errors, set the Sybase IQ option 'external_udf_execution_mode' to 2 while testing. For increased performance, this setting should be reduced back to 1 or 0 at the completion of testing. For additional details regarding this execution mode, see the section Controlling error checking and call tracing in the User-Defined Functions Guide.

Problems with the 'PRECEEDING' keyword

Note: In the Sybase IQ 15.1 System Administration Guide, Volume 2, there is a typo in the paragraph title "UNBOUNDED PRECEEDING and FOLLOWING" on the page Window framing in Chapter 2: Using OLAP ->Analytical Functions. The proper spelling of the keyword is 'PRECEDING' (remove the double-E). SQL function definitions and associated SQL queries that employ window framing should use the correct spelling of the keyword 'PRECEDING'.

How to enable full tracing in Sybase IQ

Consider turning on full tracing within Sybase IQ in a debug environment. To enable IQ tracing, add the following flags to the Sybase IQ server startup command line or the Sybase IQ config file:

-zr all -zo filename 

where filename is the path and filename of the tracing output file.

Location of Log Files

Logfiles for the database are generally co-located with the database file and configuration file. On Unix platforms, there are two files named after the database instance, one with a .stderr extension and one with a .stdout extension. On Windows, the stderr file is not generated by default by the start_iq utility. To capture the stderr messages along with the stdout messages under Windows, run the iqsrv15.exe directly, redirecting the stdout and stderr as follows:

iqsrv15.exe @iqdemo.cfg iqdemo.db 2>&1 > iqdemo.stdout

The Windows output messages are slightly different from what is generated on Unix platforms.

The first execution of a UDF will take longer due to library load

The Sybase IQ server will not load the library containing the UDF code until the first time the UDF is invoked. The first execution of a UDF residing in a library that has not yet been loaded may be unusually slow. After the library is loaded, the subsequent invocation of the same UDF or another UDF contained in the same library will have the expected performance.

Note that libraries may be manually unloaded with the stored procedure SA_EXTERNAL_LIBRARY_UNLOAD and will not be re-loaded after the IQ server is stopped and restarted.

In environments where after-hours maintenance operations require a shutdown and restart of the IQ server, it is advisable to run some test queries after the Sybase IQ server has been restarted. This will ensure that the appropriate libraries are loaded in memory for optimal query performance during business hours.

Back to Contents


Additional Information

Sybase Product Documentation

Product Manuals are available from Sybase.com, through the following navigation:
Sybase.com -> Support & Services -> Product Manuals (direct link: http://sybooks.sybase.com/nav/base.do ).

From this page, select Sybase IQ from the Product menu, which will bring up the documentation sets for all releases Sybase IQ.

Select Sybase IQ 15.1 from the list of Documentation Sets. This will bring up all of the documents for release 15.1 of Sybase IQ.

Direct Link: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.help.iq.15.1/title.htm

Partner Certification Reports

Partner Certification Reports describe the interaction of third-party products with Sybase IQ, RAPStore, and other Sybase database types.

To see the list of Partner Certification Reports available, use the navigation path:
Sybase.com -> Support & Services -> Technical Documents -> Document Types -> Partner Certification Report

The direct link to the Partner Certification Reports is:
http://www.sybase.com/detail_list?id=9784

Sybase Partner Program

For more information about the Sybase Partner Program, please visit the Sybase Partner Program website.
http://www.sybase.com/partner

Vendor Contact Information

Vendor Name:  Fuzzy Logix, LLC

Address:  5900 Graburns Ford Dr., Charlotte, NC 28269
Technical Support Number:  704.307.4819
Technical Support Email:  support@fuzzyl.com
Web Page: http://www.fuzzyl.com

Technical White Papers

Technical White Papers discuss topics of a technical nature, and augment Sybase Product Manuals.

To see the list of Technical White Papers available, use the navigation path:
Sybase.com -> Support & Services -> Technical Documents -> Document Types -> White Paper-Technical

Direct Link:
http://www.sybase.com/detail?id=1066612

Sybase e-Shop

Sybase provides product and technical documentation online at MySybase.com, including the certification reports written by the Sybase Interoperability Group. Paper versions of product manuals may also be ordered through Sybase's online store, http://e-shop.sybase.com.

Back to Contents



 

DOCUMENT ATTRIBUTES
Last Revised: Nov 16, 2009
Product: Sybase IQ, RAP - The Trading Edition
Technical Topics: Data Marts, Data Mining, Data Analysis, Data Modeling, Database Admin, Data Management, Decision Processing, High-Speed Analytics, Distributed Computing, Transaction Processing, Application Development, Application Integration, Enterprise Architecture, Total Cost of Ownership, Client/Server Development, Accelerated Decision Making, Stored Procedures, Functions & Database Development
  
Business or Technical: Technical
Content Id: 1066612
Infotype: White Paper-Technical
 
 
 

© Copyright 2013, Sybase Inc. - v 7.6 Home / Contact Us / Help / Jobs / Legal / Privacy / Code of Ethics