IN-list predicates of the form T.A IN T.A IN ( < item 1 >, < item 2 > ) are commonly found in SQL queries. IN-lists are a straightforward way to qualify a result set, and are equivalent to a chain of OR'd conditions of the form T.A = < item1 > OR T.A = < item2 >. In fact, if the set of IN-list elements is small enough, the Adaptive Server Anywhere optimizer will automatically convert a series of OR'd conditions to an IN-list, since they are semantically equivalent. This conversion is done to enable the Adaptive Server Anywhere optimizer to more easily find opportunities to apply optimizations to the processing of the IN-list elements. In the same way, Adaptive Server Anywhere will merge two IN-lists together when the predicates refer to the same LHS expression.
Large IN-list predicates, particularly those generated by application programs, are problematic for several reasons:
- they take time and memory to construct within the application (client)
- they result in a very large SQL statement, which in a client-server environment must be passed over the network
- the statement uses a considerable amount of RAM within the Adaptive Server Anywhere server, which can affect the efficiency of other requests
To illustrate this, consider a simple query with a single IN-list predicate consisting of 250,000 literal constants (integers). If we assume that each constant is five digits (bytes), then the SQL statement text alone is roughly 1.25 MB.
Once transmitted over the network to the Adaptive Server Anywhere server, the statement is parsed and analyzed prior to optimization. Parsing the statement requires building internal data structures to represent the different tokens within the SQL request. For our example query, the parse tree representation requires 40 bytes for each literal constant element in the IN-list. This will (roughly) require an additional 10 MB of RAM, subtracting 10 MB of memory from availability in the Adaptive Server Anywhere buffer pool for concurrently executing requests.
After parsing, Adaptive Server Anywhere builds optimizer data structures for the request and constructs an execution plan from the alternative chosen by the optimizer. Fortunately, IN-list predicates and most other predicates are shared between these two phases, so only one copy of each IN-list predicate need be built. However, the Adaptive Server Anywhere database server will require an additional 88 bytes for each IN-list element, using an additional 22 MB of database cache. All together, this statement will require approximately 33 MB of cache. In an environment with limited cache, or with a number of concurrent connections, queries of this form can cause the server to either run out of memory, or affect the performance of other queries by limiting the amount of buffer pool space available to them.
Instead of using large IN-lists, a better approach may be to load the IN-list values into a temporary table for that connection, and then re-compose the query to join the temporary table with the tables in the original query. This approach does not require Adaptive Server Anywhere to represent all of the IN-list values in memory. Instead, Adaptive Server Anywhere can use high-performance, memory-efficient query processing operators, such as a nested-loop join or hash join, to restrict the query's result set based on the values stored in the temporary table.
When trading off the two approaches, you can use the number of IN-list values as a guide. This number depends on system-specific parameters, but a rule of thumb would be that somewhere between 100 and 1000 IN-list elements is a good point to decide if using a temporary table is warranted.