Project

General

Profile

Sorting Query Results

Database Indexes and Query Sorting

In order to accurately duplicate the behavior of the Progress 4GL single record reading approach, one must dynamically select each record of the specified table in the same traversal order which Progress would generate. This problem is complicated by the feature provided by Progress where the same record may be visited more than once if its data is changed in a manner that reorders it later (or earlier if the walk is backwards) in the sequence of records. This is very different from SQL's set oriented approach where one can be sure that the set returned is of a static size. Besides the obvious danger of creating infinite loops or otherwise non-intuitive behavior, this requirement means that each "next" or "previous" record retrieval must be based on a relative movement between records instead of a walk of a static list.

In order to duplicate the traversal order of Progress, the relative movement from one record to the next/previous must be based on knowledge of the exact sorting criteria used by Progress. As with many of the behaviors of the Progress language, there is a mixture of explicit and implicit factors that determine the exact sorting criteria used for each query.

The Progress database accesses all records using an index, via the combined use of multiple indexes or via sequential table scan. When a single index is used, if the sorting criteria isn't specified or in some cases where the sort criteria doesn't fully specify an unambiguous ordering (but is compatible with the index), the index used determines the order in which the records are returned. In other words, the index in use can and often does determine (or partially determine) the real sort order.

In the multiple index and table scan approaches, if no explicit sort order is specified, the sort order is undefined. In addition, even in cases where the sort order is explicitly specified, it is possible that records can be ambiguously ordered at a more granular level that the explicit sort order specifies. When this occurs, the "secondary" sort order is undefined.

The BY phrase is used to specify a sort order explicitly. It is especially important to note that the sort order can be:

  • completely implicit (if no BY clause is provided);
  • partially implicit (if the BY clause does not unambiguously specify an exact ordering);
  • completely explicit (if the BY clause results in an unambiguous order).

For this reason, it is possible (and even likely) that a given Progress application will have ordering dependencies built in, possibly without the specific intention of the original author. This may appear as the order of records in a report or the order in which records are displayed in the user interface.

Implicit sort behavior in the single index case is dependent upon the selected index. For this reason, one must duplicate the implicit index selection that the Progress compiler makes. This is driven by analysis of query predicates (WHERE, OF and USING clauses), sort specification (BY clause) and explicit index specifications (USE-INDEX clause). These rules may be used for all query types: standalone FIND queries, queries which allow the iteration of results preselected by DO/REPEAT PRESELECT blocks, FOR loops and queries opened by the OPEN QUERY statement. Since there are differences in what can be done with these query types, some rules may not apply to all of them. For instance, BY clauses cannot occur in standalone FIND statements; i.e., sort order always is based only on the selected index.

This chapter describes the rules by which the sorting criteria are determined during the FWD conversion process, which follows, to the extent possible, the behavior of the Progress 4GL environment.

Single versus Multiple Index Selection

As noted above, some ambiguity in sorting is resolved using the index selected for record reading. This point really assumes that a single index is in use. Multiple indexes can be selected and are used together to read the specified records. However, in the multiple index case, all non-explicit sorting behavior is undefined. According to 4GL documentation, in the single index case, the 4GL developer can sometimes rely upon secondary sort behavior based on the index. So, in order to provide more predictable sorting, the FWD conversion always assumes that a single index is used. This approach is designed to approximate the 4GL environment's sorting to some extent, though it is not expected to completely match the undefined behavior.

All of the queries presented in 4GL use multiple indexes whenever possible (assuming the -v6q command line option is not active). This multiple index support appeared in Progress v7 and applications that had a dependency on single index selection (an implicit sorting dependency) needed the v6 query compatibility mode (the -v6q command line option).

Even though certain language constructs are enabled for multiple indexes, it is the query predicate (i.e., record selection criteria) which determines whether a single index or multiple indexes are chosen. In particular:

  • Having OR criteria makes multiple index use much more likely. For block queries, if an OR operator is involved, multiple indexes will be selected even if there is only a subset of the index components that are involved in equality or range matches.
  • With AND criteria an index must have all index components referenced in the predicate in equality matches (see below). This means that indexes that have only one component and predicates that reference a superset of fields in multiple indexes are more likely to trigger this condition.
  • Query predicate role is most important when there is no BY clause.

General Sorting Behavior

As discussed earlier, when multiple indexes are selected any implicit sort order is undefined. 4GL documentation indicates that the 4GL developer should not rely upon any particular sort order that may be implicitly provided in the multiple index case, since this behavior may change in future releases. One is expected to explicitly specify a BY clause in this case, or to write the code in such a manner that it is not dependent upon of the resulting sort order.

When a single index is used and there is no explicit sort order, the sort order is defined by the index. This is behavior upon which 4GL developers can rely.

To the extent a BY clause does not specify an unambiguous sort order, the sorting within the "buckets" defined in the BY clause is usually undefined. In other words, if multiple records have the same data in the given set of fields specified in the BY clause, the sort order of these records will be undefined even though the overall sort order of the query is defined. This is called the secondary sort order. For example, given the BY clause:

BY a1 BY a2

the sort order may be:

a1  a2  a3
--- --- ---
1   1   1
1   1   3
1   1   2
1   2   1
1   2   2

as well as:

a1  a2  a3
--- --- ---
1   1   3
1   1   2
1   1   1
1   2   2
1   2   1

Thus, the secondary sort order of this data set using the given BY clause generally is undefined, with the following exception: if a single index is used for record access and the BY clause matches that selected index, then that index defines the secondary sort order, to the extent the index is more specific than the BY clause. The term matches in this case means that the sort string (i.e., BY clause) components match the leading components of the index or the sort string matches the index completely. An index can be matched in either the forward direction (if sorting directions match for each matching field) or the inverse direction (if sorting directions are opposite for each matching field). Consider that we have a table with the fields a1, a2 and a3 and the index a1 a2:

def temp-table a
     field a1 as integer
     field a2 as integer
     field a3 as integer
     index idxa a1 a2.

The following sort strings all match the index idxa:

a1
a1 a2
a1 desc
a1 desc a2 desc

The following sort strings do not:

a3
a3 a1
a2
a1 a2 a3
a1 desc a2
a1 a2 desc

Index matching usually occurs when the index is selected based on sort matches (see Index Selection Rules below). In this case, the index and the sort criteria are compatible and the sorting is completely specified (and implemented) by the index on the 4GL database server. This means that in such cases, there will be a reliable secondary sort ordering based on the additional (more specific) fields in the index that were not part of the BY clause.

General sorting rules for both single and multiple index cases can be summarized in this way:

  1. The BY clause is honored as the primary sort criteria.
  2. The selected indexes can affect the secondary sorts. Other than the exceptions noted above (single index with matching or missing BY clause), the rules by which this occurs are undefined.
  3. In some cases a secondary sort is based on the record create or recid/rowid order, but one should not rely upon this kind of sorting for the following reasons:
    • It is not definitely known whether create order or rowid order is used. Tests have been performed with the sets of records where records that had been added sequentially and therefore the create order is the same as rowid order. In the future, a more active set of data with volumes of deletes and adds might be able to differentiate this undetermined behavior.
    • It is not known to which extent indexes and create/rowid order affect the sort. It is also unknown how these two methods may be chosen separately or used together.
    • One may delete a record and then create an identical record and the resulting record may have a different rowid than the original. Since rowid is a physical pointer to the location of the record in the database, anything that changes the physical order of records in the database may change the secondary sort order in these undefined cases. This means that reloading the database, exporting/importing and compacting activities can change sorting.

Implicit WHERE Clause Components

Query predicate (i.e. record selection criteria) plays a role in index selection mechanism and therefore in record sorting. As with many things in Progress, a query predicate can be specified explicitly (WHERE clause) or implicitly (through OF or USING clauses and through the special unqualified constant form of FIND). Before analyzing the query predicate, one must add any implicit portions to get the complete expression used to select records.

Unqualified Constant

A FIND with a single constant (literal value) instead of a standard record phrase is the same as making an addition to the WHERE clause referencing a primary index (which must have only a single field) in an equality match. Assuming the customer table has a primary key which has the single index component cust-num, this:

FIND customer 1 WHERE country = "USA".

is actually the same as:

FIND customer WHERE country = "USA" AND cust-num = 1.

OF Clause

Usage of the OF clause causes an addition of an equality match for each field in the common index (an index that is common between the two tables and unique in at least one of them). These equality matches are connected via an AND operator. If the common index between two tables is comprised of fields A, B and C:

FIND customer OF order WHERE name = "Bogus".

is converted into:

FIND customer WHERE name = "Bogus" AND customer.A = order.A AND customer.B = order.B AND customer.C = order.C.

Direct Logical Field References

A logical field that is directly referenced as the entire predicate or as a direct operand of a logical operator is actually considered to be the same as an equality comparison with the literal true(thus it will affect the index selection process described below). This means that:

WHERE logicalField

is the same as

WHERE logicalField = true

However the use of the NOT operator in this case does not get translated into the corresponding equality comparison (so it does not affect index selection). For instance:

WHERE NOT logicalField

is not the same as

WHERE logicalField = false

USING Clause

The usage of USING clause causes an addition of an equality match for each referenced field with the instance of that field in the default or specified (using FRAME option) screen buffer. So:

PROMPT-FOR customer.cust-num customer.seq.
FIND customer USING cust-num AND seq WHERE country = "Finland".

is the same as:

PROMPT-FOR customer.cust-num customer.seq.
FIND customer WHERE country = "Finland" AND cust-num = INPUT customer.cust-num AND seq = INPUT customer.seq.

An alternate form of USING usage exists for any field that is defined as an abbreviated field in an index (abbreviate index fields are character fields that lets you conveniently search for a partial match based on the first few characters of a field). In this case:

PROMPT-FOR customer.cust-name.
FIND customer USING cust-name WHERE country = "Finland".

is the same as:

PROMPT-FOR customer.cust-name.
FIND customer WHERE country = "Finland" AND cust-name BEGINS INPUT customer.cust-name.

The difference is that the BEGINS operator is used instead of equality operator.

Single Index Selection Rules

Since the selected index affects the resulting sorting order, it is important to know which index will be selected. FWD always assumes that a single is index is used (or none). In this section the selection process of this single index will be covered.

Indexes are selected using explicit and implicit criteria. In the absence of an explicit index specification (USE-INDEX), the Progress compiler makes an implicit index selection. This selection is based on an analysis of the query predicate (WHERE / OF / USING clauses) and sorting specifications (BY clause). The idea is that the Progress database server is completely driven by index processing (or table scans if no index can be used). There are a very limited set of operations that can be specified in a predicate which can be translated into index operations. Anything that can be translated into an index operation can be resolved by the database server without the Progress client evaluating expressions to determine if a given record matches.

In other words, a single query predicate can be processed completely on the client, completely on the database server or in both places. The degree with which the predicate can be processed on the server has a great impact on the performance of the database access since it is highly inefficient to read many records from the server only to reject them at the client when they do not match the predicate. This is a statement of data transfer volume (a less visible factor in shared memory implementations than on network implementations) and, more importantly, it is a statement of the efficiency of general purpose interpreted 4GL expression performance compared to a database server that is tuned for such processing.

Operands Which Affect Index Selection

Only database fields that are components of an index of the current table affect index selection. Consider the following table:

DEF TEMP-TABLE tt
    FIELD f1 AS INTEGER
    FIELD f2 AS INTEGER
    FIELD f3 AS INTEGER
    FIELD f4 AS INTEGER
    FIELD f5 AS INTEGER
    INDEX i1 f1
    INDEX i2 f2 f3
    INDEX i3 IS PRIMARY f1 f3 f4.

In this case fields f1, f2, f3 and f4 if encountered in a WHERE clause will have an impact on the index selection process, while field f5 will not.

The following operands are ignored during analysis:

  • literals;
  • variable references;
  • widget attributes references;
  • field references to fields of the current table which are not components of any index;
  • field references to fields external to the current table.

Operators Which Affect Index Selection

The set of operators known by the Progress database server is limited: EQ (=), LT (<), GT (>), LE (<=), GE (>=), BEGINS and CONTAINS. The CONTAINS operator works only for OPEN QUERY / FOR statements and if the referenced field is contained in a word index. When any of these operators are encountered, the compiler is allowed to examine the two operands to determine if one of them is an index component. If one of the operands (it does not have to be on the left side only) is an index component, then this may modify the index selected (based on the rules provided below).

As an example, the following predicates affect index selection:

indexField1 = 5
aVariable > indexField1
indexField2 begins “some”

Any operator than is not noted in the list above is unknown to the database server. Furthermore, any operand of the operators unknown to database server will be ignored and it (and its children) will not have any impact on the index selection.

Here is the list of operators and functions which do not participate in the index selection process:

  • Comparison operators which are unknown to the database server: NE (<>), NOT, MATCHES. For example, these predicates will be ignored:
    indexField <> 5
    not (indexField = 5)
    
  • Other operators with no functional counterpart on the server (e.g., arithmetic operators). For example, these predicates will be ignored:
    indexField + 1 = 5
    indexField1 + indexField2 > 5
    
  • Methods, constructions, built-in and user-defined functions. For example, these predicates will be ignored:
    substring(indexField, 3) = “some string”
    someFunction(indexField) > 5
    IF indexField1 THEN indexField2 ELSE indexField3 = 5
    

The only other operators that are recognized during this analysis are the conjunction (AND) and precedence (parentheses) operators.

In the single index case, the AND operator allows the compiler to analyze both operands for index selection purposes, because the best fit index will naturally be able to reduce the result set based on an AND condition. However OR operator does not allow operand analysis, because it is significantly harder (or sometimes impossible) to ensure that the results of a single index can fulfill all possible conditions joined via an OR operator, so FWD ignores the OR operator and its children during analysis. In the multiple index case, the AND and OR operators may both have some support, however this has not been explored or documented.

Parentheses neither hinder nor help the analysis process since they merely ensure that the structure/precedence of the expression is forced into a desired pattern. This resulting structure is what is analyzed, after the parentheses already have been applied.

Referencing Fields

There are two ways that a table field (either index or non-index) can be a reference in an operand of an operator known to Progress database server:

  1. Direct reference. In this case a field is the operand of an operator known to Progress database server. The following example contains two direct references:
    field1 <= field2
    
  2. Nested reference. In this case a field participates in an expression (at an arbitrary nesting level) which represents an operand. The following example contains two nested references:
    field1 + 5 = someFunction(field2)
    

The sub-expression rooted at an operator known by Progress database server will affect index selection if one of the operands is a direct reference to an index field and the other operand does not contain any references to the fields of the current table. This happens because in other cases (including the case when both operands are direct references), the sub-expression is evaluated on the Progress client after the query has been executed on the database server. Note that an index field which is an operand of a BEGINS or CONTAINS operator affects index selection only if it is the first operand. Also, the CONTAINS operator requires that the first operand is contained in a word index.

The following sub-expressions affect index selection:

indexField1 = 5
5 = indexField1

While the following do not affect it:

indexField1 = indexField2
indexField1 = indexField2 + 5
indexField1 = entry(1, indexField2)

Allowed Nesting Level

In order to affect index selection, a sub-expression (rooted by an operator known to Progress database server) should be the root expression, or should be an operand of a top-level AND operator. For instance, all sub-expressions participating in the expressions below affect index selection:

indexField1 = 5
indexField1 = 5 AND indexField2 > 6 AND (indexField3 < 7 AND indexField4 = 8)

while these do not:

(indexField1 = 5) = (indexField2 > 6)
(indexField1 = 5 AND indexField2 > 6) OR indexField3 < 7

Summary of Expressions That Affect Index Selection

Summarizing information about operations, types of operands, and the index field referencing rules provided above, we can form the rules below.

A sub-expression affects index selection only if:

  1. It is represented by an operator known to the Progress database server (EQ (=), LT (<), GT (>), LE (<=), GE (>=), BEGINS and CONTAINS).
  2. One of the operands is a direct reference to an index field and the other operand doesn't contain any references to the fields of the current table.
  3. The sub-expression is the root expression of a WHERE clause or an operand of a top-level AND operator.

Let's take a look at some sample WHERE clauses.

Example 1:

indexField1 + 5 = 6 AND 3 > indexField2 AND indexField3 <> 1

indexField1 + 5 is ignored because of the arithmetic operator +. indexField3 <> 1 is ignored because of the <> operator, which is unknown to the database server. So, the part which affects the index selection process is:

                          > indexField2

Example 2:

(indexField1 = variable1 + 5) AND someFunction(indexField2) = 10

someFunction(indexField2) = 10 is ignored because a function is used. The part that affects the index selection process is:

 indexField1 =

Example 3:

indexField4 BEGINS “some string” AND indexField2 = indexField3 AND indexField1 > nonIndexField1 + 1

indexField2 = indexField3 and indexField1 > nonIndexField1 + 1 are ignored because both sides of operands contain references to the fields of the current table. The part that affects the index selection process is:

indexField4 BEGINS

Example 4:

indexField1 = 5 OR indexField2 = 1

Because of the top-level OR operator, FWD ignores the whole expression and it will not affect index selection process (however Progress may use single or multiple indexes that contain index fields indexField1 or indexField2).

Example 5:

(indexField1 > 5) = true

In this case there is no part that can affect analysis because the indexField1 > 5 sub-expression is neither the root expression nor an operand of a top-level AND operator.

Types of Matches

Each operator known by the Progress database server can be classified by the type of match it provides. The table below specifies this correspondence. Additionally, rows in the table are ordered by the relative weight of match types, which is defined by the performance value of different types of index traversals (rows are ordered from the fastest operation to the slowest). This order in fact formed the order of index selection rules provided further.

Operator(s) Match type
EQ (=) Equality match
LT (<), GT (>), LE (<=), GE (>=) Range match
BEGINS Begins match
CONTAINS Word match

Sort Match

A sort match is a special type of match that is determined based on analysis of a BY clause rather than by analyzing a WHERE predicate. Identifying sort matches is performed using these rules:

  1. There is a specificity to the order in which the BY clauses occur. The first BY clause is the least specific component of the sort order (the least granular which forces the overall ordering of the results). Each subsequent BY clause only changes the sort order within the larger/broader categories created by the previous sort specification. Thus, subsequent BY clauses are increasingly specific sort criteria. This order must be preserved in the list of sort matches and any comparisons must take this order into account. Consider that we have the index indexField1, indexField2. Then we will have the following number of matches for the specified sort strings:
    BY  indexField1 BY indexField2                  /* 2 matches */
    BY  indexField1                                 /* 1 match */
    BY  indexField3 BY indexField1 BY indexField2   /* no matches */
    BY  indexField2 BY indexField1                  /* no matches */
    BY  indexField1 BY indexField2 BY indexField3   /* 2 matches */
    BY  indexField1 BY indexField3 BY indexField4   /* 1 match */
    
  2. There is a direction component to the BY clause (implicitly this is ASCENDING, but it can be explicitly specified as DESCENDING). If a BY clause references an index component field, BUT the sorting direction is not compatible with the index component's definition, Progress still counts this as a sort match (this statement is based on index selections reported in the compiler's XREF output). However in this case the Progress client would be forced to ignore the ordering provided by the server index and handle sorting itself in this case (unless EACH matched component of the sort string have sorting direction that is opposite the corresponding index component). Consider that we have the index indexFiled1, indexField2. Then we will have the following number of matches for the specified sort strings:
    BY  indexField1 DESC BY indexField2                 /* 2 matches */
    BY  indexField1 DESC                                /* 1 match */
    BY  indexField1 BY indexField2 DESC BY indexField3  /* 2 matches */
    
  3. The expression in a BY clause can be more complex than a simple field reference (e.g., expressions like BY SUBSTRING(field1) or BY filed2 + filed3 are allowed). Only a simple field reference can be counted as a sort match as more complicated expressions must be resolved on the client at runtime (and thus they cannot be used with an index, so they do not affect index selection).

Single Index Selection Rules

For each of the match types (equality match, range match, begins match, word match and sort match) a set of statistics is recorded by index component that was matched. This information is then used with the rules described in this section to pick the best index.

This list of rules is specified in priority order (1 is the highest priority). Each priority level is checked in turn, from highest to lowest. At the start of processing, there is a "selectable" list that includes all possible indexes for the given table. As each priority level is processed, indexes that are selectable based on the criteria at that level will remain in the selectable list and all other indexes will be removed. If the selectable list contains a single matching index after a given priority level is evaluated, that index will be the selected index. If there are multiple matching indexes at a given priority level, then the reduced selectable list is used in subsequent priority levels to provide a "tie breaking" process to determine which index is selected. This general rule does have some exceptions, which are noted below.

This example table will be used in the following examples:

define temp-table tt
    field f1 as integer
    field f2 as integer
    field f3 as integer
    field f4 as integer
    field f5 as integer
      field f6 as integer
      index idx1 is primary f1 f2
    index idx2 is unique f3 f4
    index idx3 is unique f3 f5
      index idx4 f3 f6 s1
      index idx5 f3 f6 s2
      index idx6 f3 f1 f2
    index idx7 f1 f4
    index idx8 f4 f1
    index word1 is word-index s1
      index word2 is word-index s2.

Rules:

1. An index that was specified in USE-INDEX*.* The use of the USE-INDEX phrase overrides any compiler efforts at selecting the index(es) to use and limits the selection to that single index. This is the (one and only) explicit index case. Only one USE-INDEX clause can be present in any record phrase, so there is never a need for a tie breaker for this priority level. Sample WHERE clause:

f1 = 1 AND f2 = 1 USE-INDEX idx2

Selected index: idx2.

2. A unique index with all components in equality matches. When each and every index field of an unique index is involved in an equality match that index will be selectable. Since there can be more than one unique index and the predicate could cause the selection of multiple unique indexes, the unique index that has the most components will be the one chosen. If there are multiple unique indexes that are selectable with the same number of components, then this list of selectable choices will be further processed starting at rule 7 below (this is a tie breaker). Sample WHERE clause:

f3 = 1 AND f4 = 3

Selected index: idx2

f3 = 1 AND f4 = 3 AND f5 = 5

Because idx2 and idx3 have the same number of matching components, no index will be selected at this step and processing goes to the step 7.

3. A word index referenced through CONTAINS*.* When a word index is referenced through a CONTAINS operator, that index is selectable (this operator can take as the first operand only a field referenced in a word index, and a word index always contains a single index field). If only one word index is selectable, that index is used. If there are multiple word indexes that are selectable, then this list of selectable choices will be further processed starting at rule 7 below (this is a tie breaker). Sample WHERE clause:

s1 CONTAINS “item”

Selected index: word1.

s1 CONTAINS “item” AND s2 CONTAINS “work”

Because indexes word1 and word2 have the same number of matching components, no index will be selected at this step and processing goes to the step 7.

Note that the described order of steps: first step 2 then step 3 matches FWD behavior, however Progress documentation tells that the step 3 should go before the step 2, while analysis of .xrf files provided by Progress compiler shows that in the case when an unique index with all components in equality matches occurs along with a word index match, both indexes are selected (according to Progress documentation the word index should have preference). Also note that so far FWD does not completely support neither word indexes nor CONTAINS operator.

4. Most sequential leading equality matches. Due to the special nature of rules 2 and 3, when this rule is evaluated, it always processes starting with the full list of all possible indexes as the potential indexes. The indexes that have the most sequential leading components with equality matches are selectable. Most is interpreted in an absolute sense, not a relative or percentage sense. For example, if one index has the first 4 index components all involved in equality matches, and no other index has 4 or more leading index components in equality matches then this index is chosen. Sample WHERE clause:

f3 = 1 AND f1 = 1

Selected index: idx6.

If no indexes were selectable via this rule, then the full list of all possible indexes will be checked using the next rule (№5). If more than one index has the same number of leading index components in equality matches, then this reduced list of selectable indexes further refined:

  • The next index component (after those already known to be involved in an equality match) of each selectable index must be evaluated to see if it has a begins match in this component. For example, an index with its first 3 components with equality matches and the 4th component as a begins match will be selected over an index with 3 equality matches (only).
    The equality matches must all occur before the begins matches and all equality/begins matches must still be sequential (the first index component without any match ends the calculation). Only one component after the equality matches is checked for a begins match.
    If only one index is selectable after the begins match check, this index is chosen.
    If more than one index is still selectable after the begins test (i.e. if multiple indexes have the next component as a begins match), then this reduced list of selectable indexes is processed further starting at the rule 6.
    If none of the selectable indexes were matched via this begins tie breaker, processing of the selected indexes (all with the same number of equality matches) continues in the next sub-rule (b).
    Sample WHERE clause:
    f3 = 1 AND f6 = 1 AND s2 BEGINS “item”
    

    Selected index: idx5.
  • The next index component (after those already known to be involved in an equality match) of each selectable index must be evaluated to see if it has a range match in this component. Note that despite the fact that Progress documentation refers "most range matches" as an index selection criteria, only one range match is ever considered.
    If only one index is selectable after the range match check, this index is chosen.
    If more than one selectable index is still selectable after the range test (i.e. if multiple indexes have the next component as a range match), then this reduced list of selectable indexes is processed further starting at the rule 6.
    If none of the selectable indexes were matched via this range tie breaker, processing of the selected indexes (all with the same number of equality matches) continues in rule 6.
    Sample WHERE clause:
    f3 = 1 AND f5 > 10
    

    Selected index: idx3.

5. An index with its first component involved in a range or BEGINS match. All indexes that have their leading component in a range or begins match are selectable. Note that despite the fact that Progress documentation refers "most range/begins matches" as an index selection criteria, only one range/begins match is ever considered.

If only one index is selectable after the range/begins match check, this index is chosen.

If more than one index is selectable, then the reduced list of selectable indexes is evaluated by the next rule (№6).

Sample WHERE clause:

f1 > 0 AND f4 < 10

Because idx1, idx7 and idx8 have the first component involved in a range match, no index will be selected at this step and processing goes to the step 6.

6. Most sequential leading sort matches. The indexes that have the most sequential leading components with sort matches (an index component directly specified in a BY clause rather than the WHERE clause) are selectable. Most is interpreted in an absolute sense, not a relative or percentage sense. For example, if one index has the first 4 index components all involved in sort matches, and no other index has 4 or more leading index components in sort matches then this index is chosen. You can read more about matching rules in the Sort Match section above.

If only one index is selectable after the sort match check, this index is chosen.

If more than one index has the same number of leading index components in sort matches, then this reduced list of selectable indexes is evaluated by the next rule (№7).

Sample case (no WHERE clause specified):

BY f1 BY f4

Selected index: idx7.

7. The primary index. This rule is special because it can be "directly" reached from the rules №2 and №3 above. When this rule is evaluated, the selectable list may include all possible indexes (if no rule above was matched) or it may include a reduced set of selectable indexes (if one or more of the rules above selected indexes). Whichever is the case, this list is searched for the primary index.

If the primary index is selectable, then this index is used.

There can be only one primary index, so there can never be a "tie" here and in the case where the primary index is not selectable, the next rule (№8) is used.

Sample WHERE clause:

f1 > 0 AND f4 < 10

For this case, on the step №5 the list of selected indexes has been refined to idx1, idx7 and idx8. On this step the primary index was selected: idx1.

8. The index with the alphabetically first name. In general, the index name (from the selectable list) that is lexicographically first (case insensitive) is the index chosen. The only exception is if this is being used as a tie breaker for unique indexes (rule №2), where the lexicographically last (case insensitive) index name is the index chosen. Since all index names must be different (using case insensitive comparison) within a given table, this rule will always result in a single index choice.

Sample WHERE clause:

s1 CONTAINS “item” AND s2 CONTAINS “work”

For this case, on the step №3 the list of selected indexes has been refined to word1 and word2. On this step the lexicographically first index was selected: word1.

Index Selection Special Case

A FIND lookup by recid/rowid does not use an index.

Conversion Examples

The index selection examples are grouped in three sections:

  • general index criteria selection examples;
  • single index selection rules examples;
  • operators and functions with no impact on index selection.

The following examples will use the person temporary table, as defined in person-temp-table.i:

/* person-temp-table.i */
define temp-table person
     field site-id as integer
     field emp-num as integer
     field first-name as character
     field last-name as character
     field middle-init as character
     field ssn as character
     field pay-type as character
     field hire-date as date
     field term-date as date
     field comments as character
     field area-code as character
     field phone-number as character
     field extension as character
     field schedule as character
     field hours as integer
     index site-emp is primary unique site-id emp-num
     index comments is word-index comments
     index emp-ssn is unique ssn
     index exten is unique extension
     index hire-date hire-date
     index name last-name first-name middle-init
     index pay-type pay-type
     index phone area-code phone-number extension
     index term-date term-date.

Example1: Default Index Selection

4GL source:

{person-temp-table.i}

find first person no-lock no-error.

if available(person)
then display person.
else message "No person available".

Converted example:

...
new FindQuery(person,
              (String) null,
              null,
              "person.siteId asc, person.empNum asc",
              LockType.NONE).first();
...

Details:

This example illustrates simple default index selection. The default index used is the primary index as there are no criteria to work with.

Example 2: Implicit Index Selection

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.hire-date = 01/01/2000 :
    display person.
end.

Converted example:

...
query0 = new AdaptiveQuery(person, "person.hireDate = '2000-01-01'", null,
                           "person.hireDate asc", LockType.NONE);

Details:

FIND uses an implicit index. Since the hire-date field is used in an equality match, and no USE-INDEX clause is specified, the hire-date index is selected.

Example 3: Explicit Index Selection (rule 1)

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.hire-date = 01/01/2000
    use-index name:

    display person.

end.

Converted example:

...
query1 = new AdaptiveQuery(person, "person.hireDate = '2000-01-01'", null,
                           "person.lastName asc, person.firstName asc, person.middleInit asc",
                           LockType.NONE);
...

Details:

This time USE-INDEX overrides default index selection and the name index is selected.

Example 4: BY Clause Index Selection

4GL source:

{person-temp-table.i}

for each person no-lock
  by person.last-name:

  ...

end.

Converted example:

...
query0 = new AdaptiveQuery(person,
                           (String) null,
                           null,
                           "person.lastName asc, person.firstName asc, person.middleInit asc",
                           LockType.NONE);
...

Details:

This example illustrates index selection based on a BY clause. The additional sort criteria in the converted code result from the selection of the name index as a sort match for the less specific BY clause (see Sort Match section above).

Example 5: Criteria Index Selection

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.site-id = 1
  by person.last-name:

  ...

end.

Converted example:

...
query0 = new PreselectQuery(person,
                            "person.siteId = 1",
                            null,
                            "person.lastName asc, person.siteId asc, person.empNum asc",
                            LockType.NONE);
...

Details:

This example illustrates criteria index selection combined with a simple sort specified by a BY clause. The primary index site-emp is selected, based on the equality match against the site-id field in the WHERE clause. However, since the BY clause calls for an explicit sort on the last-name field, lastName becomes the leading component in the HQL order by clause passed to PreselectQuery's constructor. The components of the site-emp index (site-id and emp-num) are added to the HQL order by clause (converted to siteId and empNum, respectively), to ensure that the sort order of records with identical last-name values matches as closely as possible the (undefined) sort order provided by the 4GL.

Example 6: One Index For Fully Qualified Unique Option (Rule 2 )

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.site-id = 1
    and person.ssn = "111223333":

  ...

end.

Converted example:

...
query2 = new AdaptiveQuery(person,
                           "person.siteId = 1 and upper(person.ssn) = '111223333'", null,
                           "person.ssn asc", LockType.NONE);
...

Details:

The example illustrates the case of a* unique index with all components in equality matches. *The selected index is the fully qualified unique index emp-ssn.

Example 7: One Unique Index For Two Fully Qualified Options (Rule 2)

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.site-id = 1
    and person.emp-num = 123
    and person.ssn = "111223333":

  ...

end.

Converted example:

...
query3 = new AdaptiveQuery(person,
                           "person.siteId = 1 and person.empNum = 123 and "+
                           "upper(person.ssn) = '111223333'", null,
                           "person.siteId asc, person.empNum asc", LockType.NONE);
...

Details:

Both site-emp and emp-ssn have all components in equality matches. Rule number 2 applies, with the emp-ssn index selected based on the fact that it has the most components.

Example 8: One Unique Index For Two Fully Qualified Options Plus Sorting (Rule 2)

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.site-id = 1
    and person.emp-num = 123
    and person.ssn = "111223333" 
  by person.ssn:

  ...

end.

Converted example:

...
query4 = new PreselectQuery(person,
                            "person.siteId = 1 and person.empNum = 123 and " +
                            "upper(person.ssn) = '111223333'", null,
                            "person.ssn asc, person.siteId asc, person.empNum asc", LockType.NONE);
...

Details:

Both site-emp and emp-ssn have all components in equality matches. Here rule number 2 applies, with the emp-ssn index selected based on the fact that it has the most components. The sorting on the ssn field was added to because of the BY clause.

Example 9: Most Sequential Leading Equality Matches (Rule 4)

4GL source:

{person-temp-table.i}

find first person no-lock
  where person.area-code = "111" 
    and person.phone-number = "2223333" 
    and person.last-name = "Jones" 
  no-error.

if available(person)
then do: display person.

end.

Converted example:

...
new FindQuery(person,
              "upper(person.areaCode) = '111' and upper(person.phoneNumber) = '2223333' and " +
              "upper(person.lastName) = 'JONES'", null,
              "person.areaCode asc, person.phoneNumber asc, person.extension asc",
              LockType.NONE).first();
...

Details:

In this case no unique indexes are in the list of candidates. Both phone and name indexes have all components in equality matches. The phone index is selected, since it has the most components.

Example 10. On Same Number of Matches BEGINS Is Used To Make the Pick (Rule 4a)

4GL source:

{person-temp-table.i}

find first person no-lock
  where person.area-code = "111" 
    and person.last-name = "Jones" 
    and person.first-name begins "R" 
  no-error.

Converted example:

...
new FindQuery(person,
              "upper(person.areaCode) = '111' and upper(person.lastName) = 'JONES' and " +
              "upper(person.firstName) like 'R%'", null,
              "person.lastName asc, person.firstName asc, person.middleInit asc",
              LockType.NONE).first();
...

Details:

In this case no unique indexes are in the list of candidates. Both phone and name indexes have equal number of components in equality matches but not all of them. The name index is selected, since the BEGINS expression contains the next available component of the name index not already used in an equality match.

Example 11. On Same Number of Matches* the Range Match Is Used To Make the Pick (Rule 4b)*

4GL source:

{person-temp-table.i}

find first person no-lock
  where person.area-code = "111" 
    and person.phone-number > "2220000" 
    and person.last-name = "Jones" 
  no-error.

Converted example:

...
new FindQuery(person,
              "upper(person.areaCode) = '111' and upper(person.phoneNumber) > '2220000' and " +
              "upper(person.lastName) = 'JONES'", null,
              "person.areaCode asc, person.phoneNumber asc, person.extension asc",
              LockType.NONE).first();
...

Details:

In this case, no unique indexes are in the list of candidates. Both the phone and name indexes have an equal number of components in equality matches, but not all of them. The phone index is selected, since the range expression contains the next available component of the phone index not already used in an equality match.

Example 12. An Index with its First Component Involved in a Range or BEGINS Match (Rule 5)

4GL source:

{person-temp-table.i}

find first person no-lock
  where person.hire-date >= 01/01/2000
  no-error.

Converted example:

...
new FindQuery(person, "person.hireDate >= '2000-01-01'", null,
              "person.hireDate asc", LockType.NONE).first();
...

Details:

In this case there are no equality matches. The hire-date index will be selected based on the index component reference in the range expression.

Example 13. Range Versus Sort Matches* (Rule 5)*

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.hire-date >= 01/01/2000
    and person.hire-date <  01/01/2001
  by person.last-name by person.first-name:
  ...
end.

Converted example:

...
query5 = new PreselectQuery(person,
                            "person.hireDate >= '2000-01-01' and person.hireDate < '2001-01-01'",
                            null,
                            "person.lastName asc, person.firstName asc, person.hireDate asc",
                            LockType.NONE);
...

Details:

In this case there are no equality matches. The hire-date index is selected based on the index component reference in the range expression with no regard to sort matches. The BY clauses are honored as the primary sort criteria.

Example 14. Most Sequential Leading Sort Matches (Rule 6)

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.phone-number > "2220000" 
    and person.last-name > "Jooooo" 
  by person.last-name by person.first-name by person.area-code:
  ...
end.

Converted example:

...
query6 = new PreselectQuery(person,
                            "upper(person.phoneNumber) > '2220000' and " +
                            "upper(person.lastName) > 'JOOOOO'", null,
                            "person.lastName asc, person.firstName asc, " +
                            "person.areaCode asc, person.middleInit asc",
                            LockType.NONE);
...

Details:

The name index is selected in this case because more components of this index match the BY clauses.

Example 15. Primary Versus Alphabetical Tie Break (Rule 7)

4GL source:

{person-temp-table.i}

for each person no-lock
  where person.site-id = 1
    and person.last-name = "Jones":

  ...
end.

Converted example:

...
query7 = new AdaptiveQuery(person,
                           "person.siteId = 1 and upper(person.lastName) = 'JONES'", null,
                           "person.siteId asc, person.empNum asc", LockType.NONE);
...

Details:

Both site-emp and name indexes have an equal number of components in equality matches. The primary index site-emp is selected in this case.

Example 16. Two Equal Range Selections with a Find (Rule 8)

4GL source:

{person-temp-table.i}

find first person no-lock
  where person.hire-date >= 01/01/2000
    and person.hire-date <  01/01/2002
    and person.term-date >= 01/01/2004
    and person.term-date <  01/01/2005
  no-error.

Converted example:

...
new FindQuery(person,
              "person.hireDate >= '2000-01-01' and person.hireDate < '2002-01-01' and " +
              "person.termDate >= '2004-01-01' and person.termDate < '2005-01-01'", null,
              "person.hireDate asc", LockType.NONE).first();
...

Details:

The hire-date index is used as alphabetical choice.

Example 17. Operators and Functions with No Impact On Index Selection

4GL source:

{person-temp-table.i}

find first person no-lock
  where not (person.site-id = 1) and
        person.last-name <> "Gaither"  and
        person.last-name matches "R*" and
        person.first-name = entry(1,"Rod,George,Frank") and
        person.term-date = person.hire-date and
        person.term-date + 5 = 01/01/2003
  no-error.

Converted example:

...
new FindQuery(person,
              "not (person.siteId = 1) and " +
            "upper(person.lastName) != 'GAITHER' and " +
            "upper(person.lastName) like 'R%' and " +
            "upper(person.firstName) = ? and " +
            "person.termDate = person.hireDate and " +
            "plus(person.termDate, 5) = '2003-01-01'",
            null,
            "person.siteId asc, person.empNum asc",
            new Object[]
              {
                 toUpperCase(entry(1, "Rod,George,Frank"))
              },
              LockType.NONE).first();
...

Details:

The operators and functions used do not participate in the index selection process and as a result the index selected is the default, primary index.

Example 18. Lookup Function

4GL source:

{person-temp-table.i}

find first person no-lock
  where lookup(person.pay-type,"H,S") = 1
  no-error.

Converted example:

...
new FindQuery(person, "lookup(upper(person.payType), 'H,S') = 1", null,
              "person.siteId asc, person.empNum asc", LockType.NONE).first();
...

Details:

The primary index is selected as a default.

Example 19. ROW-ID

4GL source:

{person-temp-table.i}

define buffer b-person for person.

define variable t-rowid as rowid no-undo.

find first person no-lock.
assign t-rowid = rowid(person).

find b-person no-lock
  where rowid(b-person) = t-rowid
  no-error.

Converted example:

...
new FindQuery(person,
              (String) null,
              null,
              "person.siteId asc, person.empNum asc",
              LockType.NONE).first();

tRowid.assign(RecordBuffer.rowID(person));
ErrorManager.silentErrorEnable();

new FindQuery(bPerson,
              "bPerson.id = ?",
              null,
              "bPerson.siteId asc, bPerson.empNum asc",
              new Object[] { tRowid },
              LockType.NONE).unique();
ErrorManager.silentErrorDisable();
...

Details:

FIND with ROWID selection shows no impact on index selection. The default primary index is used. Of course, since a unique record is being found here, a sort order essentially is meaningless.*


© 2004-2017 Golden Code Development Corporation. ALL RIGHTS RESERVED.