com.goldencode.expr (P2J - Progress 4GL to Java Conversion and Runtime)

Interface Summary
Interface	Description
CompilerConstants	Constants used by the expression compilation support classes.
ExpressionFlags	Expression flags that affect the compilation and runtime behavior of a compiled expression.
ExpressionParserTokenTypes
Scope	Objects of this type are used by the `SymbolResolver` to partition library objects and user variables registered by clients of the expression engine.
WritableData	Classes which implement this interface must be able to write their data to a `DataOutput` stream.

Class Summary
Class	Description
Argument	This class represents an argument to a bytecode instruction.
Attribute	This is the base class for all classes which represent attribute info structures found within a class file.
BytecodeContainer	This class represents any object which contains bytecode, such as a bytecode instruction or a code unit containing multiple instructions.
ClassConstant	This class represents the `CONSTANT_Class_info` structure defined by the Java Virtual Machine Specification.
ClassFile	This class encapsulates knowledge about the Java class file format as defined in The Java Virtual Machine Specification, Second Edition by Tim Lindholm and Frank Yellin.
CodeAttribute	This class manages information necessary to create the `Code` attribute structure for a method.
CodeUnit	Instances of this class organize one or more bytecode instructions into logical units of code.
CompiledExpression	This is the base class for all custom, compiled expression classes.
Compiler	Creates in-memory Java classes on the fly from expression strings written in infix notation.
Compiler.LogicalData	Helper class which stores information to assist when assembling branch instructions for logical tests.
Compiler.PrimitiveInfo	A helper class which stores information about the various primitive data types supported by the expression compiler.
Constant	This is the parent class to all of the various types of constants supported by the Java class file format.
ConstantArgument	This class represents an instruction argument which is a constant.
ConstantPool	This class represents the constant pool found within the Java class file format (see The Java Virtual Machine Specification, Second Edition).
DoubleConstant	This class represents a double constant.
ExceptionAttribute
Expression	This class provides transparent access to compiled expressions.
Expression.CacheKey	Definition of the key to the cache of `CompiledExpression` instances.
ExpressionClassLoader	This class extends `ClassLoader` to enable the loading of a class file stored as a byte array in memory.
ExpressionInitializer	Generates an initial value for a variable, using a contained `Expression`.
ExpressionLexer	Tokenizes an arithmetic or logical expression in infix notation (input as a stream of characters) into a stream of tokens suitable for the `ExpressionParser`.
ExpressionParser	Creates an Abstract Syntax Tree (AST) representation of an expression in infix notation from an input stream of tokens (provided by the `ExpressionLexer`).
ExtraAst	A modified abstract syntax tree node which provides an additional service for storing a single object reference of the user's choosing.
FieldrefConstant
FloatConstant
FMIrefConstant
Function	Support class which assists in parse-time resolution of method references in user expressions.
Initializer	Generates an initial value for a variable.
Instruction	Instances of this class represent individual bytecode instructions.
IntegerConstant
InterfaceMethodrefConstant
LongConstant
MethodInfo
MethodrefConstant
NameAndTypeConstant
NumericArgument
SimpleInitializer	Generates an initial value for a variable, using a contained instance that is returned directly.
StringConstant
SymbolResolver	The abstract base class of application-specific symbol resolvers.
SymbolResolver.CacheKey	Helper class which is used to assist in lookups of methods when introspecting invocation target classes.
SymbolResolver.TestResolver	Concrete implementation of a `SymbolResolver` for simple testing and debug purposes.
TestDriver	Test driver and sample client code for the compiled expressions package.
UnknownType
Utf8Constant	This class represents a string constant in UTF-8 format, which is used by the JVM to reference strings.
Variable	Implementation of a user variable which is accessible from expressions.
Verifier	Provides inspection methods that search down the expression tree to determine operator/operand types.

Exception Summary
Exception	Description
AmbiguousSymbolException	Exception thrown when a lookup of a function or variable is ambiguous.
CompilerException	Exception thrown during `Compiler` operations.
ExpressionException	Exception thrown during `ExpressionCompiler` operations.
SymbolException	Exception indicating an error relating to symbol resolution processing.
UnresolvedSymbolException	This exception should be thrown by classes which implement the `SymbolResolver` interface, when a variable is encountered which cannot be resolved.

Package com.goldencode.expr Description

Provides a cached, high performance (dynamically compiled) expression processing engine for the generic processing of expressions that can return objects or perform assignment operations.

Author	Eric Faulhaber
Modification Date	July 4, 2005
Access Control	CONFIDENTIAL

Introduction
Why Compile Expressions?

The Trade-Offs

Expression Types
General Syntax
Variables
User Functions
Method Invocation
Supported Operators
Symbol Resolution

Scope
Callback Libraries
Constants
Security
Resolution Error Handling

Debugging
Other Options
Test Driver Example
Just-In-Time Compilation
Known Limitations

Introduction

The expr package was created to address performance issues with often-evaluated runtime expressions. The expression engine implemented in this package dynamically compiles a runtime expression's logic directly into Java bytecode instructions to create a Java object which is well suited to execute the expression repeatedly in performance critical situations. The object uses callback libraries and variables to interact with client code and to allow client code to provide extended services to the user via the expression engine. Client code must supply a symbol resolver implementation for this purpose.

Each expression is compiled into a discrete class. This is done by assembling a Java class in memory, structured according to the Java class file format (see The Java™ Virtual Machine Specification, Second Edition by Tim Lindholm and Frank Yellin, ISBN 0-201-43294-3). The class includes the Java bytecode instructions necessary to execute the expression efficiently and quickly, as well as the minimal infrastructure necessary for a valid and verifiable class file. The class is loaded directly from memory into the JVM via a custom class loader. It is not stored in the file system, except when the expression is compiled in debug mode in a special developer build of the expression engine. Once compiled and loaded into the virtual machine, expression classes are never explicitly unloaded.

The semantics of compiling an expression are encapsulated in the Expression class, and are transparent to client code. Client code simply prepares an instance of this class using an expression string in infix notation and a symbol resolver object, then invokes a method to execute the expression. Internally, the infix expression is submitted to the expression compiler for "just-in-time" compilation the first time the expression is to be executed. The compiler parses the expression, assembles the proper bytecode instructions, compiles a Java class, loads it into the current Java virtual machine, instantiates it, and executes it using a well-known method invocation convention. The compiled expression instance is cached such that subsequent requests for an instance of that expression object can skip the parsing, assembly, compilation, class loading and instantiation steps entirely and simply return the existing instance immediately.
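The compile-once, reuse-thereafter behavior described above can be sketched in a few lines. This is only an illustration: the class name CompileOnceCacheSketch, the use of the raw expression string as the cache key, and the Function standing in for the compiler are all assumptions made for the example, not the package's actual Expression or Expression.CacheKey implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a compile-once cache; not the package's Expression class. */
final class CompileOnceCacheSketch<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;   // assumed: the expensive parse/compile/load step
    private final AtomicInteger compileCount = new AtomicInteger();

    CompileOnceCacheSketch(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached instance, invoking the compiler only on a cache miss. */
    T get(String infix) {
        return cache.computeIfAbsent(infix, key -> {
            compileCount.incrementAndGet();
            return compiler.apply(key);
        });
    }

    /** Number of times the compiler was actually invoked. */
    int compileCount() {
        return compileCount.get();
    }
}
```

Requesting the same expression string twice returns the same instance and runs the stand-in compiler only once.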

Why compile expressions?

Expressions are compiled primarily as a performance optimization. A secondary reason to compile expressions is that certain convenience features, such as auto-boxing, automatic type conversion, automatic null checking, and shorthand notation are built into the expression compiler. These features allow very robust expression logic to be written in a compact and generally safe manner.

First, let us consider the performance implications. Traditionally, runtime expressions are evaluated using some variant of the following approach:

Lexically analyze an infix notation expression (e.g., '(A + B == C * 2) OR (A < D)') into tokens.
Parse the tokens into a postfix (a.k.a. reverse Polish) notation (e.g., 'A B + C 2 * == A D < OR'), or into a similarly structured tree.
Evaluate the modified expression completely: the result of each subexpression determines the input to the next higher subexpression, until the entire expression is evaluated and the result is returned.

This approach is effective and elegant in its simplicity. However, much time (relatively speaking) is spent lexing and parsing; probably more than is spent actually evaluating the result. The actual execution of the expression involves the additional overhead of at least one runtime data structure (typically a stack) to manage pushing and popping the operator and operand objects (and casting them to the appropriate data types). In practical terms, this is not concerning for expressions which must only be evaluated a few times. However, for expressions that must be evaluated many times quickly, this overhead is considerable.
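For comparison, step 3 of the traditional approach might look like the following minimal sketch. It assumes integer variables, a space-separated postfix string, and only the operators used in the example above; the class PostfixEvalSketch is hypothetical and is not part of this package.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/** Hypothetical stack-based interpreter for a tiny postfix subset. */
final class PostfixEvalSketch {
    /** Evaluates a space-separated postfix expression over integer variables. */
    static Object eval(String postfix, Map<String, Integer> vars) {
        Deque<Object> stack = new ArrayDeque<>();
        for (String tok : postfix.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "*": case "==": case "<": {
                    // Operands come off the stack in reverse order and must be cast.
                    int right = (Integer) stack.pop();
                    int left = (Integer) stack.pop();
                    stack.push(apply(tok, left, right));
                    break;
                }
                case "OR": {
                    // Both operands were already fully evaluated before we get here.
                    boolean right = (Boolean) stack.pop();
                    boolean left = (Boolean) stack.pop();
                    stack.push(left || right);
                    break;
                }
                default:
                    // A variable name if known, otherwise an integer literal.
                    Integer value = vars.get(tok);
                    stack.push(value != null ? value : Integer.valueOf(tok));
            }
        }
        return stack.pop();
    }

    private static Object apply(String op, int left, int right) {
        switch (op) {
            case "+": return left + right;
            case "*": return left * right;
            case "==": return left == right;
            default: return left < right;   // "<"
        }
    }
}
```

Every evaluation repeats the token loop, the boxing, and the casts, which is the per-call overhead the paragraph above describes.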

Also, in a compound logical expression which uses the AND or OR conjunction operators, the above approach does not optimize out unnecessary processing if a result can be determined early in the expression's execution. Every sub-operation of the expression is evaluated to determine the final result. Consider that the right side of a compound expression using OR can be ignored completely if the left side evaluates to true, and that the right side of a compound expression using AND can be ignored if the left side evaluates to false.

When an expression is compiled, steps 1 and 2 above are performed only once, the first time the expression is encountered. Thereafter, the expression class is cached. Optimizations are performed to convert string constants to the literals they represent. The bytecode instructions are optimized for compound, logical expressions, so that as little of the expression is processed as is necessary to reach a definitive result.
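The effect of this short-circuit optimization can be observed with plain Java operators, which behave the same way. The sketch below is illustrative only: ShortCircuitSketch and its counter are hypothetical names, and the code shows the evaluation rule rather than the bytecode the compiler emits.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical demonstration of short-circuit evaluation; not the engine's code. */
final class ShortCircuitSketch {
    /** Counts how many operands were actually evaluated. */
    static int evaluated = 0;

    /** Wraps a constant so that each evaluation is counted. */
    static BooleanSupplier counted(boolean value) {
        return () -> {
            evaluated++;
            return value;
        };
    }

    /** The right side is skipped when the left side is true. */
    static boolean or(BooleanSupplier left, BooleanSupplier right) {
        return left.getAsBoolean() || right.getAsBoolean();
    }

    /** The right side is skipped when the left side is false. */
    static boolean and(BooleanSupplier left, BooleanSupplier right) {
        return left.getAsBoolean() && right.getAsBoolean();
    }
}
```

With a true left operand, or(...) evaluates one operand instead of two; the same holds for and(...) with a false left operand.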

Next, let us consider the convenience aspects of compiled expressions:

Auto-boxing. This feature allows seamless integration within an expression between primitive data types and their associated Java wrapper objects. For instance, a variable of type java.lang.Integer can be assigned an int value directly in an expression; a method which requires a primitive boolean parameter may be invoked from an expression which passes a java.lang.Boolean object as that parameter. The expression compiler detects these conditions, determines that these fixups are required, and compiles instructions accordingly to perform the proper wrapping and unwrapping of data at expression execution time.
Automatic Type Conversion. Similar to auto-boxing, the compiler will detect when a provided data type must (and can) be converted to a required type at expression execution time. This feature will perform both widening numeric and object reference conversions, as well as narrowing conversions. Note that this is a double-edged sword; truncation and data/precision loss may occur in a narrowing conversion. Thus, some of the safety of strict typing may be lost with this convenience, as the compiler assumes that the author of an expression knows best.
Automatic Null Checking. In (and only in) expressions which evaluate to a boolean result or which contain sub-expressions which evaluate to a boolean result, object references retrieved at runtime are checked for null before they are dereferenced by the expression, to prevent the expression from throwing a NullPointerException. If an object reference involved in some boolean operation evaluates to null, that operation will always evaluate to false. On the other hand, if an object reference is passed as a parameter to a method, it is not first checked against null, as the compiler cannot presume in this case that a null parameter to a method is invalid.
Shorthand Notation. Properties of an object accessible to an expression, which are exposed via a bean-like API, can be referenced from an expression using a shorthand notation which eliminates the semantics of method invocation. For instance, if an object foo has methods, int getBar() and void setBar(int), they can be referred to respectively, in expressions as follows: foo.bar > 10 and foo.bar = 55.
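To make the shorthand notation concrete, the following sketch maps a property name such as bar onto its getBar()/setBar(int) accessors. It uses runtime reflection purely for illustration; the engine itself resolves such references at expression compile time, and the names BeanShorthandSketch, accessorName, read, and write are hypothetical.

```java
import java.lang.reflect.Method;

/** Hypothetical illustration of bean-style shorthand; not the engine's resolver. */
final class BeanShorthandSketch {
    /** Sample bean with the getBar()/setBar(int) pair used in the text. */
    public static final class Foo {
        private int bar;
        public int getBar() { return bar; }
        public void setBar(int bar) { this.bar = bar; }
    }

    /** Builds an accessor name, e.g. ("get", "bar") becomes "getBar". */
    static String accessorName(String prefix, String property) {
        return prefix + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    /** Equivalent of reading foo.bar in an expression. */
    static Object read(Object target, String property) {
        try {
            Method getter = target.getClass().getMethod(accessorName("get", property));
            return getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read property " + property, e);
        }
    }

    /** Equivalent of the assignment foo.bar = value in an expression. */
    static void write(Object target, String property, int value) {
        try {
            Method setter = target.getClass().getMethod(accessorName("set", property), int.class);
            setter.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot write property " + property, e);
        }
    }
}
```

Under these assumptions, write(foo, "bar", 55) followed by read(foo, "bar") corresponds to the expressions foo.bar = 55 and foo.bar.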

The Trade-Offs

From a performance perspective, the following trade-offs should be considered. The first time an expression is encountered, it must be lexed and parsed, just as with the traditional approach. Additionally, compilation and class loading must be performed. This front-loads the performance cost. For expressions that must be evaluated only once or a handful of times, the break-even point for that performance penalty may not be reached. Memory consumption should also be considered when evaluating performance. Each unique runtime expression necessitates the creation and loading of a new class. By contrast, evaluating an expression using the traditional approach creates no lasting increase in memory footprint. Though the compiled expression classes are quite small, there is some amount of redundant overhead in each class, due to the structure of the Java class file format.

From a convenience perspective, the trade-offs to be considered are the reduction in strict type safety which comes with auto-boxing and auto-conversion. In some cases, these conversion features may cause ambiguity during method and user function resolution at expression compile time. However, these issues are usually remedied with well-placed typecasts. Finally, debugging may require additional effort to understand the compiler's default conversion behaviors, especially since source code level debugging is not possible inside of a compiled expression. This is because no intermediate Java source code is ever generated or stored; instead, the compiler accepts an infix notation expression as input and emits an in-memory Java class file image as output.

Expression Types

Two types of expressions are supported:

Assignment
Non-assignment

An assignment expression is one whose top level operation stores a literal, a variable value, or the result of a calculation or method invocation into a variable. This expression's top level operator is the assignment operator (=). This expression always returns null. Thus, inline or nested assignment operations are not possible. Examples of an assignment expression are:

myVar = 10
myVar = a + b
myVar = null
myVar = foo.getBar()
myVar = a > b

A non-assignment expression is one which returns some object, but which does not assign it to a variable. Any object type supported by Java may be returned. It is up to the application or framework which uses the expression to capture the object and act upon it or analyze it according to the needs of the application or framework. Examples of a non-assignment expression are:

7
true
myVar == a + b
a > b
a * b + 5
foo.getBar()

General Syntax

The syntax supported by the expression engine is largely the same as that supported by the Java language itself. All symbols are case sensitive. In addition to basic, scalar expressions, method invocation using familiar Java syntax is supported. In addition, where the default type conversion behavior is insufficient, type casting is possible. However, there are a number of differences and exceptions to the expression engine's support for Java syntax:

No direct static references. Static references in the form <class name>.<static member name> are not supported. For method invocation, one must either have an object instance upon which to invoke a method with the dot operator (.), or one must be invoking a user function in a registered callback library.
Special typecast syntax. If casting an object reference to a class type, the fully qualified class name is required. In addition, the opening left parenthesis is preceded by the hash symbol (#). For example, given a variable myInteger of type java.lang.Integer, an object foo, and a method of foo with signature Object getIntegerAsObject(), which is known to return a java.lang.Integer (widened to a java.lang.Object), the following cast would be appropriate: myInteger = #(java.lang.Integer) foo.integerAsObject
No inline assignments. Assignments can only be made at the top level of an expression. Thus, the following expression is not valid: (myVar = a + b) > 24, because the assignment to myVar occurs inline, whereas the greater than comparison is made at the top level of the expression.
No new keyword. There is no direct support in the expression engine for the construction of new object instances. Object references are available only from callback library methods (user functions) and from variable references. Any support for the construction of new object instances must be provided by the application code which uses the expression engine.
Arrays not supported. There is no syntax to create arrays, nor to reference array elements by subscript. As an alternative, use java.util.List and its implementation variants.
String delimiters. Strings embedded within expressions may be delimited by either single quotes (e.g., 'Hello World') or double quotes (e.g., "Hello World"), so long as the same character is used on both ends of the string. Note that it is possible to mix and match the delimiter for different strings within the same expression. However, this is bad form and should be avoided unless it is necessary to escape one delimiter, and then the other, from within the same expression.
No support for char literals. Primitive characters are not supported within expressions as literals; any single character inside single quotes is instead interpreted as a string of length one. However, values of type char may be returned from method calls and passed as parameters to other methods, just as any other primitive or object. For instance, the following expression is valid: string1.indexOf(string2.charAt(7)) >= 0
String literals may not be dereferenced. Thus, the following syntax is invalid: "Hello World".length(). Instead, intermediate storage of a string literal in a variable is necessary to perform this type of dereferencing.
Several operators are not available:

new (see above)
instanceof
string concatenation (+)
ternary (?:)
increment (++)
decrement (--)
assignment combination operators (+=, -=, *=, /=, %=, |=, &=, ^=, >>=, <<=, >>>=)

Method call chaining limited. Calls may be chained from a variable or from the result of an explicit method/user function invocation. No other forms of call chaining are supported. In particular, one cannot chain a method call from a parenthesized sub-expression.

Variables

Variables are containers which associate a symbol name, a data type, and a particular symbol resolution scope with an object reference (the referent). Variables are supported directly by the expression engine as first class objects. Variables must be declared to store a particular type of referent; this can be any non-primitive data type. A variable must be registered with the expression engine before it may be referenced by an expression; otherwise, the referencing expression will fail to compile. Variables are segregated into variable pools by associating a scope with a variable during registration.

A variable can be initialized by an arbitrary expression. This expression may refer to other variables which share scope with this variable, and which have been registered before it. Circular variable references within initializer expressions are not permitted. The initializer expression is executed whenever the variable's reset method is invoked (including during Variable construction), and the expression's result becomes the new referent of the variable. If no initializer expression is provided, the variable is initialized to null.

Variables are represented by instances of the Variable class. An instance of this class is created only when a variable is registered with the symbol resolver; this class cannot be instantiated directly. Variables optionally may be registered as read-only. In this mode, expressions which reference a variable may access the referent of a variable, but may not change it. This restriction holds only for variable access from within expressions, but not for programmatic manipulation of a Variable object. The latter is always permitted, regardless of the read-only state of the variable.
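The initializer, reset, and read-only rules described above can be summarized in a small model. This is a sketch under stated assumptions: VariableSketch, its Supplier-based initializer, and the setFromExpression method are hypothetical names chosen for the example and do not reflect the actual Variable or Initializer API.

```java
import java.util.function.Supplier;

/** Hypothetical model of a variable's lifecycle; not the package's Variable class. */
final class VariableSketch<T> {
    private final String name;
    private final boolean readOnly;
    private final Supplier<? extends T> initializer;   // assumed stand-in for an initializer expression
    private T referent;

    VariableSketch(String name, boolean readOnly, Supplier<? extends T> initializer) {
        this.name = name;
        this.readOnly = readOnly;
        this.initializer = initializer;
        reset();   // the initializer also runs during construction
    }

    /** Re-runs the initializer; without one, the referent becomes null. */
    void reset() {
        referent = (initializer == null) ? null : initializer.get();
    }

    T get() {
        return referent;
    }

    /** Programmatic update: always permitted, even for read-only variables. */
    void set(T value) {
        referent = value;
    }

    /** Update requested from within an expression: rejected for read-only variables. */
    void setFromExpression(T value) {
        if (readOnly) {
            throw new IllegalStateException(name + " is read-only");
        }
        referent = value;
    }
}
```

The split between set and setFromExpression mirrors the statement that the read-only restriction applies only to access from within expressions.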

Variable Syntax

A variable is used in an expression with Java-like syntax. For example, assuming a variable myVar has been registered as type Bar, the expression

myVar = foo.getBar()

represents an assignment to myVar. The expression

myVar.doSomething()

invokes the method doSomething() on myVar's referent, an instance of Bar. Note that any assignment, access, comparison, invocation, etc. against a variable always is applied against the variable's referent. Because of the automatic null checking, auto-boxing, and automatic type conversion features of the expression engine, variables which represent primitive values using wrapper objects can interact naturally with primitive literals. For example, given a variable num, registered as type java.lang.Integer, the following expression is perfectly valid, despite the apparent type mismatch:

num >= 100.5

In this case, num's referent is checked against null (auto null checking), then unwrapped to an int (auto-boxing), then widened to a double (auto-conversion) before the comparison takes place (in the event num's referent is null, this expression would return false).
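Written out as ordinary Java, the three steps applied to num >= 100.5 would look roughly like this. The class and method names are hypothetical; the sketch only mirrors the semantics stated above, not the bytecode the compiler generates.

```java
/** Hypothetical hand-written equivalent of the expression num >= 100.5. */
final class NullSafeCompareSketch {
    static boolean greaterOrEqual(Integer num, double literal) {
        if (num == null) {
            return false;                 // automatic null check: null yields false
        }
        int unwrapped = num.intValue();   // auto-boxing support: unwrap to a primitive int
        double widened = unwrapped;       // automatic type conversion: widen int to double
        return widened >= literal;
    }
}
```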

User Functions

User functions are a means of accessing application-provided functionality from within expressions. User functions are exposed by a callback library which is registered with the symbol resolver. The callback library defines the method which provides the backing implementation of a user function. A user function can provide any capability required by the application; there are effectively no functional limits. A user function can accept any number of arguments (including a variable argument list), or none at all. It can return any object or primitive value, or void.

User Function Syntax

A user function is invoked from an expression using Java-like syntax. For example:

someUserFunction(myVar, 39, constant)

It is recommended to qualify the invocation with the name of the callback library which implements the user function as follows:

myLib.someUserFunction(myVar, 39, constant)

Doing so ensures no ambiguity in resolving the correct function if more than one callback library implements a method with the same name and parameter signature. In the case where this occurs and there is no disambiguating qualifier, an AmbiguousSymbolException is thrown at expression compile time.

Argument types are somewhat flexible, within the bounds of the auto-boxing and automatic type conversion features the expression engine provides. For example, a user function which is defined to accept a parameter of type int can in fact accept any numeric primitive or numeric primitive wrapper object. Primitive unwrapping operations and narrowing conversions will occur as necessary to allow the user function to resolve at expression compile time. Note that this behavior may result in ambiguity among user function alternatives, which may require a typecast to eliminate. Consider, for instance, two alternatives of a method within callback library myLib which implement different user functions:

public void foo(long num)
public void foo(java.lang.Integer num)

and the expression:

myLib.foo(25)

This expression creates ambiguity, because both alternatives would be candidates for a match during user function resolution. The solution to this problem is to use an explicit typecast to disambiguate, as in:

myLib.foo(#(long) 25)

which forces a widening conversion of the literal 25 to a long, or

myLib.foo(#(java.lang.Integer) 25)

which forces the literal 25 to be wrapped into an instance of java.lang.Integer. Note that where there is no such ambiguity, the typecast is unnecessary.
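The ambiguity can be reproduced with a simplified candidate search. The sketch below assumes that an exact parameter type match takes priority and that otherwise any numeric primitive or wrapper parameter is a candidate for any numeric argument; both rules are assumptions made for illustration, and AmbiguitySketch is not the engine's actual resolution algorithm.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified user function matching; not the engine's resolver. */
final class AmbiguitySketch {
    /** Sample callback library with the two alternatives from the text. */
    static final class MyLib {
        public void foo(long num) { }
        public void foo(Integer num) { }
    }

    private static final Set<Class<?>> NUMERIC = Set.of(
        byte.class, short.class, int.class, long.class, float.class, double.class,
        Byte.class, Short.class, Integer.class, Long.class, Float.class, Double.class);

    /** Returns the one-argument methods named 'name' that could accept 'argType'. */
    static List<Method> candidates(Class<?> lib, String name, Class<?> argType) {
        List<Method> exact = new ArrayList<>();
        List<Method> convertible = new ArrayList<>();
        for (Method m : lib.getDeclaredMethods()) {
            if (!m.getName().equals(name) || m.getParameterCount() != 1) {
                continue;
            }
            Class<?> param = m.getParameterTypes()[0];
            if (param == argType) {
                exact.add(m);          // assumed: an exact match wins outright
            } else if (NUMERIC.contains(param) && NUMERIC.contains(argType)) {
                convertible.add(m);    // reachable via boxing or numeric conversion
            }
        }
        return exact.isEmpty() ? convertible : exact;
    }
}
```

Under these assumptions an int argument yields two candidates (the ambiguous case), while an argument already typed as long or java.lang.Integer, as produced by the typecasts above, yields exactly one.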

Variable Length Argument Lists

Variable length argument lists (varargs) are supported by the expression engine, even when using JVM releases prior to J2SE 5.0. To implement backing support for a user function, a callback library method is defined to accept an array of java.lang.Objects as its final parameter. For instance:

public void myVarArgFunction(Object[] args)
{
    // process variable arguments
    for (int i = 0; i < args.length; i++)
    {
       ...
    }
}

No parameters should come after the Object array in the method definition. If any parameters are required, these must be explicitly listed preceding the Object array, as in:

public void myOtherVarArgFunction(int len, String text, Object[] args)
{
    // process required arguments 'len' and 'text'
    ...

    // process variable arguments
    for (int i = 0; i < args.length; i++)
    {
       ...
    }
}

The array of Objects passed to such methods is guaranteed to be non-null. In the event an expression invokes such a user function, but passes no parameters for the variable portion of the parameter list, the Object array will be an array of size zero. Invocation of the above user functions is straightforward. The following expressions are all valid; note, however, that NullPointerException or ClassCastException may be thrown by the backing method if a null argument is dereferenced or an argument is assumed to be an incorrect type, respectively:

myLib.myVarArgFunction()
myLib.myVarArgFunction(1, 2, 3)
myLib.myVarArgFunction(1, myVar)
myLib.myOtherVarArgFunction(10, 'Hello World')
myLib.myOtherVarArgFunction(5, 'Some text', vararg1, vararg2)
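One way to picture how the supplied arguments reach such a backing method is the packing step sketched below: the required arguments stay in place and everything after them is gathered into a trailing Object array, which is empty rather than null when nothing is left over. VarArgPackingSketch and pack are hypothetical names; the engine performs the equivalent work in generated bytecode.

```java
import java.util.Arrays;

/** Hypothetical illustration of packing trailing arguments into an Object[]. */
final class VarArgPackingSketch {
    /**
     * Builds the argument list for a backing method that declares 'fixedCount'
     * required parameters followed by one trailing Object[] parameter.
     */
    static Object[] pack(int fixedCount, Object... supplied) {
        if (supplied.length < fixedCount) {
            throw new IllegalArgumentException("missing required arguments");
        }
        // Copy the required arguments and leave one extra slot for the array.
        Object[] invocationArgs = Arrays.copyOf(supplied, fixedCount + 1, Object[].class);
        // The trailing slot always receives an array, possibly of length zero.
        invocationArgs[fixedCount] =
            Arrays.copyOfRange(supplied, fixedCount, supplied.length, Object[].class);
        return invocationArgs;
    }
}
```

For instance, pack(2, 10, "Hello World") corresponds to myLib.myOtherVarArgFunction(10, 'Hello World') and produces a zero-length array in the final position.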

Method Invocation

The expression engine supports the direct invocation of arbitrary methods against an object reference using the dot (.) operator. This concept differs from user function invocation in that it requires no explicit registration of a callback library of backing methods with the symbol resolver, but it always requires an object instance upon which to apply the invocation (even for static methods). Thus, an unqualified method invocation will fail to compile, since unlike Java, the expression engine provides no implicit this reference. Likewise, a static method invocation qualified by a class name will fail to compile, since there is no implicit class resolution.

Method Invocation Syntax

Syntax for method invocation in an expression is quite similar to method invocation in Java, considering the caveats mentioned above and in the General Syntax section. It takes the form:

<object reference>.<method name>([param1 [, ...]])

For instance, given a variable string1 of type java.lang.String, initialized to "Hello World", the following represents a valid invocation of a java.lang.String method:

string1.indexOf('He')

which would return 0 upon execution.

Variable Length Argument Lists

Varargs are supported as they are with user functions. Please refer to the discussion of this topic in the User Functions section for details.

Symbol Resolution

Variables, user functions, methods, and constants are all represented as symbols within an expression. In order for an expression to compile correctly, each symbol it references must be resolved to a backing method or literal value. All symbol resolution occurs at expression compile time. Contrary to previous implementations of this framework, no symbol resolution is deferred to expression execution time. This provides for both enhanced performance and enhanced security, because expensive lookups and permission checks are performed only once, at compile time. If they fail, the expression fails to compile and can never be executed.

Scope

Callback Libraries

Constants

Security

Resolution Error Handling

Supported Operators

The set of operators which may be used within expressions is listed in the table below. Operators in this table are listed in order of their precedence, from those evaluated first to those evaluated last. Operators which have the same precedence are grouped together. When evaluating operations whose operators have the same precedence, operations are performed in the order in which they appear, from left to right. Parentheses (()) may be used to group operations which must be evaluated in a different order.

Precedence	Symbol	Type	Unary/Binary	Operation Performed
1	`!` or `not`	Logical	Unary	Logical complement
2	`~`	Bitwise	Unary	Bitwise complement
3	`-`	Arithmetic	Unary	Negation
4	`*`	Arithmetic	Binary	Multiplication
	`/`	Arithmetic	Binary	Division
	`%`	Arithmetic	Binary	Remainder
5	`+`	Arithmetic	Binary	Addition
	`-`	Arithmetic	Binary	Subtraction
6	`<<`	Bitwise	Binary	Left shift
	`>>`	Bitwise	Binary	Right shift w/ sign extension
	`>>>`	Bitwise	Binary	Right shift w/ zero extension
7	`<`	Logical	Binary	Is less than
	`<=`	Logical	Binary	Is less than or equal to
	`>`	Logical	Binary	Is greater than
	`>=`	Logical	Binary	Is greater than or equal to
8	`==`	Logical	Binary	Is equal to
	`!=`	Logical	Binary	Is not equal to
9	`&`	Bitwise	Binary	Bitwise AND
10	`^`	Bitwise	Binary	Bitwise XOR
11	`|`	Bitwise	Binary	Bitwise OR
12	`&&` or `and`	Logical	Binary	Conditional AND
13	`||` or `or`	Logical	Binary	Conditional OR
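The relative ordering of the binary operators in this table agrees with Java's own precedence rules, so plain Java can serve as a worked example of how operations group. The sketch below assumes the engine groups binary operators exactly as the table states; it exercises Java itself, not the expression engine, and PrecedenceSketch is a hypothetical name.

```java
/** Hypothetical worked examples of binary operator grouping, written in plain Java. */
final class PrecedenceSketch {
    /** Multiplication (level 4) binds tighter than addition (level 5): a + (b * c). */
    static int mulBeforeAdd(int a, int b, int c) {
        return a + b * c;
    }

    /** Addition (level 5) binds tighter than shift (level 6): a << (b + c). */
    static int addBeforeShift(int a, int b, int c) {
        return a << b + c;
    }

    /** Equal precedence is evaluated left to right: (a - b) - c. */
    static int leftToRight(int a, int b, int c) {
        return a - b - c;
    }

    /** Bitwise AND, then XOR, then OR (levels 9, 10, 11): ((a & b) ^ c) | d. */
    static int andXorOr(int a, int b, int c, int d) {
        return a & b ^ c | d;
    }

    /** Conditional AND (level 12) binds tighter than conditional OR (level 13): a || (b && c). */
    static boolean andBeforeOr(boolean a, boolean b, boolean c) {
        return a || b && c;
    }
}
```

Parentheses override these groupings in the same way the table's introduction describes.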

Resolving Symbols (TBD - update)

All but the most trivial expressions will contain symbols which have some application-specific meaning. These represent placeholders in the expression for values to be substituted when the expression is executed (or when it is compiled). To enable client code to provide these value substitutions at the appropriate times, compiled expression objects use a callback model. This model is described by the SymbolResolver interface.

When submitting an expression for compilation, client code must provide a SymbolResolver object. This object will be called:

by the expression parser during the parsing phase, when it encounters a string constant which must be resolved into a literal;
by the expression compiler during the compilation phase, when it encounters one of the following constructs in the expression:

a variable for which it requires data type information;
a user function for which it requires return type information;

by the compiled expression object itself during the runtime execution phase, when it needs to:

retrieve the current value of a variable;
execute a user function.

The SymbolResolver implementation supplied by client code is responsible for providing the correct substitution values based upon the context of the application at the time it executes an expression; the expression object itself has no awareness of the application's current state. The TestDriver class is a sample implementation of the SymbolResolver interface. It is discussed in greater detail below.

How Resolution Errors are Handled

If a string constant cannot be resolved during expression parsing, an error is reported to stderr, followed by an ExpressionException thrown at compile time. Client code generally should not throw an exception from the SymbolResolver resolveXXX methods; the subclasses of CompiledExpression do not have exception handlers, as these are expensive constructs. Any exception thrown by client code during symbol resolution will propagate up the call stack back to the client code which called ArithmeticExpression.compute() or LogicalExpression.evaluate().

Instead, if no suitable variable substitution exists, null should be returned from a SymbolResolver resolveXXX method. A null return is handled differently by ArithmeticExpression and LogicalExpression objects, as described below.

Arithmetic Expressions

An unresolved variable is a fatal condition for an arithmetic expression, since it cannot compute a result if a component value is missing. If null is returned from a variable resolver callback method, ArithmeticExpression throws an UnresolvedSymbolException.
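The convention described above (return null from the resolver, let the arithmetic expression treat it as fatal) can be sketched as follows. This is a minimal illustration only: MapResolverSketch, UnresolvedSketchException and the hand-written compute method are hypothetical stand-ins, not the package's SymbolResolver, UnresolvedSymbolException or generated bytecode, whose actual signatures may differ.

```java
import java.util.HashMap;
import java.util.Map;

class ArithmeticNullSketch
{
   /** Hand-written stand-in for a compiled "overtime * 1.5" expression. */
   static double compute(MapResolverSketch resolver)
   {
      Double overtime = resolver.resolveToDouble("overtime");
      if (overtime == null)
      {
         // a missing component value is fatal for an arithmetic expression
         throw new UnresolvedSketchException("overtime");
      }
      return overtime * 1.5;
   }

   public static void main(String[] args)
   {
      MapResolverSketch resolver = new MapResolverSketch();
      try
      {
         compute(resolver);
      }
      catch (UnresolvedSketchException exc)
      {
         System.out.println(exc.getMessage());
      }
      resolver.put("overtime", 4.0);
      System.out.println(compute(resolver));
   }
}

/** Hypothetical resolver which returns null rather than throwing. */
class MapResolverSketch
{
   private final Map<String, Double> values = new HashMap<>();

   void put(String name, double value)
   {
      values.put(name, value);
   }

   /** Returns the current value of the variable, or null if it is unknown. */
   Double resolveToDouble(String name)
   {
      return values.get(name);
   }
}

/** Hypothetical stand-in for the package's UnresolvedSymbolException. */
class UnresolvedSketchException extends RuntimeException
{
   UnresolvedSketchException(String symbol)
   {
      super("Unresolved symbol: " + symbol);
   }
}
```

The resolver itself never throws; the decision about whether a missing value is fatal is left to the expression type.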

Logical Expressions

Logical expressions are more lenient to null values returned by the variable resolver. However, the implication of this leniency is that unexpected results may occur. Consider the following expression:

MYVAR == 10

This expression is interpreted internally by the expression engine as

MYVAR != null and MYVAR == 10

Thus, if MYVAR cannot be resolved to a substitute value, then this expression would evaluate to false. This treatment of the above expression seems fairly intuitive. However, it should be noted that the expression

MYVAR != 10

also will evaluate to false if the variable MYVAR cannot be resolved at evaluation time. This may not be as immediately intuitive as the first example, but this behavior is consistent. This is because the latter expression is interpreted internally by the expression engine as

MYVAR != null and MYVAR != 10

Finally it should be noted that the expressions:

MYVAR != 10

and

!(MYVAR == 10)

are not equivalent. The former will return false if MYVAR cannot be resolved (as discussed above), while the latter will return true in the same circumstance. This is because the latter expression is interpreted internally by the expression engine as

!(MYVAR != null and MYVAR == 10)

Distributing the negation operator (!) across both subexpressions inside the parentheses yields

MYVAR == null or MYVAR != 10

Thus, the left side of the expression evaluates to true if MYVAR cannot be resolved. In this event, the right side of the expression is ignored and the overall expression returns true. It is important to keep in mind the subtleties of how unresolved variables are handled when considering input expressions.
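The three interpretations can be checked with plain Java. This is a sketch written by hand to mirror the rewritten forms given above; the class and method names are hypothetical and it does not use the expression engine itself.

```java
class NullAwareLogicSketch
{
   /** MYVAR == 10, interpreted as: MYVAR != null and MYVAR == 10 */
   static boolean equalsTen(Long myvar)
   {
      return myvar != null && myvar.longValue() == 10L;
   }

   /** MYVAR != 10, interpreted as: MYVAR != null and MYVAR != 10 */
   static boolean notEqualsTen(Long myvar)
   {
      return myvar != null && myvar.longValue() != 10L;
   }

   /** !(MYVAR == 10), interpreted as: !(MYVAR != null and MYVAR == 10) */
   static boolean negatedEqualsTen(Long myvar)
   {
      return !equalsTen(myvar);
   }

   public static void main(String[] args)
   {
      Long unresolved = null;   // what a resolver returns for an unknown variable
      System.out.println(equalsTen(unresolved));          // false
      System.out.println(notEqualsTen(unresolved));       // false
      System.out.println(negatedEqualsTen(unresolved));   // true
   }
}
```

For any resolved value the last two forms agree; they differ only when the variable resolves to null.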

Test Driver Example (TBD - obsolete)

The TestDriver class represents a trivial application which loads records from a simple database (implemented as a properties file) and allows expressions to be executed against these records. Either arithmetic or logical expressions can be evaluated against any valid range of records defined in the properties.

The program interprets the basic data types used by compiled expression objects. In addition, it processes one higher level data type -- a date -- which allows it to use the parsing capabilities of java.text.SimpleDateFormat to process date and time string constants. Date and time string constants are converted by the application to Long values (number of milliseconds since midnight, Jan. 1, 1970) for use in expressions.

The TestDriver class serves as an illustrative example of a VariableResolver implementation and of an ExpressionCompiler client. It can be launched from the command line, or may be used programmatically as a test harness. See TestDriver's main method for usage syntax and its class description for a sample properties format.

In its default properties configuration, the program manages the following information about a set of hypothetical employees:

Field Name	Application-Defined Data Type	Compiled Expression Data Type	Notes
name	string	string	Employee name
dob	date	long	Employee date of birth (translates to #millis since 01/01/1970)
overtime	double	double	Overtime hours for the current period
city	string	string	City in which employee works
union	boolean	long	Employee's union affiliation
begin	date	long	Time employee's regular work shift begins (translates to #millis since 01/01/1970)
end	date	long	Time employee's regular work shift ends (translates to #millis since 01/01/1970)

The following boolean expression determines who is affiliated with a union:

union

This is the equivalent:

union == true

Running either of these expressions against all default records should produce the following results:



[1] true

[2] true

[3] false

The following arithmetic expression determines the approximate age in years of each employee (recall that time values are measured in milliseconds):

(@now() - dob) / 86400000 / 365.25

This results in:

[1] 54.908003341413796

[2] 44.55904144317058

[3] 59.077750090215986

Notice the use of the @now() user function in the above expression. From the compiled expression's point of view, this symbol is simply a variable reference to be resolved to a numeric value. It is recognized by the application code (in the implementation of the resolveToLong method) as having special meaning. As a result, special processing is invoked to resolve this variable to the current date and time (as a millisecond value).
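A minimal sketch of this kind of special-case resolution follows. It is not TestDriver's code: the NowResolverSketch class, its injectable clock and the hand-written ageInYears method are assumptions made for illustration, and floating-point division is assumed because the results above are fractional.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class NowResolverSketch
{
   private static final String NOW = "@now()";

   private final Map<String, Long> record = new HashMap<>();
   private final LongSupplier clock;

   /** The clock is injectable so that results can be made repeatable. */
   NowResolverSketch(LongSupplier clock)
   {
      this.clock = clock;
   }

   void put(String field, long value)
   {
      record.put(field, value);
   }

   /** Resolves the special symbol to the current time, others to record fields. */
   Long resolveToLong(String symbol)
   {
      if (NOW.equals(symbol))
      {
         return clock.getAsLong();
      }
      return record.get(symbol);   // null if the field is unknown
   }

   /** Hand-written equivalent of (@now() - dob) / 86400000 / 365.25 */
   static double ageInYears(NowResolverSketch resolver)
   {
      long now = resolver.resolveToLong(NOW);
      long dob = resolver.resolveToLong("dob");   // null handling omitted for brevity
      return (now - dob) / 86400000.0 / 365.25;
   }

   public static void main(String[] args)
   {
      NowResolverSketch resolver = new NowResolverSketch(System::currentTimeMillis);
      resolver.put("dob", 0L);   // born at the epoch: Jan. 1, 1970 UTC
      System.out.println(ageInYears(resolver));
   }
}
```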

String constants may be used in comparisons; they must be enclosed in single quotes:

name == 'Larry' or city == 'St. Louis'

This produces the following results with the default data records:

[1] true

[2] true

[3] false

Omitting the single quotes from a string constant will result either in an ExpressionException being thrown or in a potentially incorrect result. The unquoted symbol will not be recognized as a string constant, but will instead be treated as a variable or as an unexpected token, depending upon the contents of the string constant. For instance, leaving the quotes off 'Larry' in the example above yields a valid expression, but Larry is treated as a variable at runtime and resolves to null. Compilation and execution succeed, but the results are probably not what was intended:

[1] false

[2] true

[3] false

On the other hand, if the quotes are instead removed from 'St. Louis' in the same expression, the compiler treats this condition as a fatal error, since the string Louis is now interpreted as an unexpected, extra token, which invalidates the expression.

The TestDriver application should be explored and tested with a variety of expressions, to determine the capabilities (and limitations) of expression processing with the expr package.

Expression Compilation (TBD - update)

Compilation of an expression into Java classes is a complex process. This section explains the flow of this process in some detail, but it is not intended to be exhaustive.

The first step of expression compilation is a combined lexing/parsing of the expression string from infix notation to a list of tokens in postfix notation. To extend the TestDriver example above, the following expression in infix notation:

union and begin < '08:00:00'

determines which employees have a union affiliation and work a shift which begins earlier than 8 o'clock in the morning. This expression string is converted in this initial stage into the following token stream in postfix order:

  union      begin    '08:00:00'     <         and

(variable  (variable  (constant   (binary    (binary
 operand)   operand)   operand)   operator)  operator)
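The reordering can be illustrated with the classic shunting-yard algorithm. This is a swapped-in technique used only to show the infix-to-postfix transformation; the package's own lexer/parser may work quite differently. The precedence table is an assumption, and tokens are assumed to be separated by whitespace, so quoted constants containing spaces are not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class PostfixSketch
{
   /** Assumed precedences (higher binds tighter); not taken from the package. */
   private static final Map<String, Integer> PRECEDENCE = Map.of(
      "or", 1, "and", 2,
      "==", 3, "!=", 3, "<", 3, ">", 3,
      "+", 4, "-", 4, "*", 5, "/", 5);

   /** Converts a whitespace-delimited infix expression to postfix tokens. */
   static List<String> toPostfix(String infix)
   {
      List<String> output = new ArrayList<>();
      Deque<String> operators = new ArrayDeque<>();
      for (String token : infix.trim().split("\\s+"))
      {
         if (PRECEDENCE.containsKey(token))
         {
            // emit pending operators which bind at least as tightly
            while (!operators.isEmpty() &&
                   PRECEDENCE.containsKey(operators.peek()) &&
                   PRECEDENCE.get(operators.peek()) >= PRECEDENCE.get(token))
            {
               output.add(operators.pop());
            }
            operators.push(token);
         }
         else if ("(".equals(token))
         {
            operators.push(token);
         }
         else if (")".equals(token))
         {
            while (!operators.isEmpty() && !"(".equals(operators.peek()))
            {
               output.add(operators.pop());
            }
            if (operators.isEmpty())
            {
               throw new IllegalArgumentException("Unbalanced parentheses");
            }
            operators.pop();   // discard the matching "("
         }
         else
         {
            output.add(token);   // operand: variable or constant
         }
      }
      while (!operators.isEmpty())
      {
         String operator = operators.pop();
         if ("(".equals(operator))
         {
            throw new IllegalArgumentException("Unbalanced parentheses");
         }
         output.add(operator);
      }
      return output;
   }

   public static void main(String[] args)
   {
      // prints [union, begin, '08:00:00', <, and]
      System.out.println(toPostfix("union and begin < '08:00:00'"));
   }
}
```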

The expression compiler creates a skeleton representation of a Java class file in memory. It generates a unique name for the new class, and sets its superclass to ArithmeticExpression for numeric expressions and to LogicalExpression for boolean expressions. It creates a default constructor and a stub execution method (compute() for numeric, evaluate() for boolean) for the new class. It is into this method that the expression's logic will be distilled as Java bytecode instructions.

Now that the expression's tokens are arranged in postfix order, they are iterated from left to right. If an operand is encountered, it is pushed onto a compile-time operand stack. Variable operands are pushed directly onto the stack as strings; they must be resolved into replacement values at runtime. The compiler calls back to the variable resolver to request the data type of each variable. This allows the compiler to select an appropriate callback method for use at runtime, to minimize the amount of type casting necessary.

When the iteration encounters an operator, the operand stack is popped to retrieve the operand(s) to which the operator will be applied: one operand is popped for a unary operator, two for a binary operator. The first operator encountered in the example above is the less-than (<) operator, which pops the constant string '08:00:00' and the variable begin. Once the operator has the needed operand(s), it examines them to determine what Java bytecode instructions are required to handle each at runtime.

If the operand is a constant string, the compiler will ask the variable resolver to resolve it into a simpler representation. For instance, the constant string '08:00:00' represents a time, which this application chooses to resolve to a number of milliseconds since midnight. When the compiler encounters this token, it calls TestDriver's resolveConstant method, which returns a Long with an internal value of 46800000. It is this simpler numeric representation which is compiled into the class as a constant, avoiding the need for the application to interpret the string '08:00:00' each time the expression is executed.
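A one-time conversion of this kind might look like the sketch below. It is not TestDriver's resolveConstant method: the class name, the HH:mm:ss pattern and the explicit time zone parameter are assumptions. Note that the numeric result depends on the time zone used for parsing; with a zone five hours behind UTC the sketch yields 46800000, the figure quoted above, though whether that is how the original value arose is only a guess.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class TimeConstantSketch
{
   /**
    * Strips the enclosing single quotes and parses the text as a time of
    * day on Jan. 1, 1970 in the given zone. Returns null if it cannot be parsed.
    */
   static Long resolveTime(String constant, TimeZone zone)
   {
      String text = constant.substring(1, constant.length() - 1);
      SimpleDateFormat format = new SimpleDateFormat("HH:mm:ss");
      format.setTimeZone(zone);
      try
      {
         return format.parse(text).getTime();
      }
      catch (ParseException exc)
      {
         return null;
      }
   }

   public static void main(String[] args)
   {
      // 8 hours after midnight UTC: 28800000
      System.out.println(resolveTime("'08:00:00'", TimeZone.getTimeZone("UTC")));
      // 08:00 in a zone 5 hours behind UTC is 13:00 UTC: 46800000
      System.out.println(resolveTime("'08:00:00'", TimeZone.getTimeZone("GMT-05:00")));
   }
}
```

Because the conversion runs once at compile time, its cost is not paid again on each execution of the expression.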

If the operand is a variable, bytecode instructions to access the expression's variable resolver and to make the appropriate type of callback are first assembled. Instructions for a null check on the callback result are then appended. Finally, whether the operand is a variable or a constant, instructions are appended to push the resolved value onto the JVM's runtime operand stack.

Once each operand's code has been assembled, bytecode instructions are added to perform the task of popping the analogous runtime operands off the JVM's runtime stack, and to perform the operator's actual runtime function. The assembled code unit for the sub-expression (e.g., begin < '08:00:00' in this case) is then pushed back onto the compile time operand stack.
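The pop-and-push-back cycle on the compile-time operand stack can be sketched as follows. Here a parenthesized string stands in for an assembled code unit; the real compiler assembles bytecode instead, and the class name and operator sets are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

class OperandStackSketch
{
   private static final Set<String> BINARY = Set.of("and", "or", "<", ">", "==", "!=");
   private static final Set<String> UNARY = Set.of("!");

   /** Folds postfix tokens, left to right, into a single "code unit" string. */
   static String assemble(List<String> postfix)
   {
      Deque<String> operands = new ArrayDeque<>();
      for (String token : postfix)
      {
         if (BINARY.contains(token))
         {
            String right = operands.pop();   // pushed last, so popped first
            String left = operands.pop();
            // the combined unit goes back onto the stack as a single operand
            operands.push("(" + left + " " + token + " " + right + ")");
         }
         else if (UNARY.contains(token))
         {
            operands.push("(" + token + operands.pop() + ")");
         }
         else
         {
            operands.push(token);   // variable or constant operand
         }
      }
      if (operands.size() != 1)
      {
         throw new IllegalArgumentException("Malformed postfix expression");
      }
      return operands.pop();
   }

   public static void main(String[] args)
   {
      // prints (union and (begin < '08:00:00'))
      System.out.println(assemble(List.of("union", "begin", "'08:00:00'", "<", "and")));
   }
}
```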

When the logical, binary operator and is encountered, the compiler pops this code unit and the union variable operand off the stack. It recognizes that the code unit operand has already been processed and only assembles bytecode instructions for the union operand and for the logical and operation. As an optimization, the algorithm which assembles instructions for the logical and operation ensures that if the union variable resolves to false at runtime, it will jump directly to the end of the method to return false, skipping the evaluation of the begin < '08:00:00' portion of the expression entirely.

Bytecode instructions to return the expression's overall result (in this case true or false) are finally assembled for the end of the method, branching logic code fixups and bytecode offset fixups are made, the class' constant pool is indexed, and the finished class file is written to a byte array in memory for further class loading and caching. If the expression was compiled in debug mode, the byte array is written to the file system as a class file, at a location specified during construction of the ExpressionCompiler object.

The following output displays the internal representation of the Java class which is created for the above expression:

Class filename:  LE0.class
Magic Number  :  0xCAFEBABE
Version       :  45.3
This Class    :  com/goldencode/expr/LE0
Super Class   :  com/goldencode/expr/LogicalExpression
Access Flags  :  ACC_PUBLIC 

Constant Count: 0x24 (36 dec)
1:  <String> begin
2:  <String> union
3:  <Class> com/goldencode/expr/CompiledExpression
4:  <Class> com/goldencode/expr/LE0
5:  <Class> com/goldencode/expr/LogicalExpression
6:  <Class> com/goldencode/expr/VariableResolver
7:  <Class> java/lang/Long
8:  <Field> com/goldencode/expr/CompiledExpression.resolverLcom/goldencode/expr/VariableResolver;
9:  <Method> com/goldencode/expr/LogicalExpression.<init>()V
A:  <Method> java/lang/Long.longValue()J
B:  <InterfaceMethod> com/goldencode/expr/VariableResolver.resolveToLong(Ljava/lang/String;)Ljava/lang/Long;
C:  <Double> 4.68E7
E:  <NameAndType> <init>()V
F:  <NameAndType> longValue()J
10:  <NameAndType> resolveToLong(Ljava/lang/String;)Ljava/lang/Long;
11:  <NameAndType> resolverLcom/goldencode/expr/VariableResolver;
12:  <Utf8> ()J
13:  <Utf8> ()V
14:  <Utf8> ()Z
15:  <Utf8> (Ljava/lang/String;)Ljava/lang/Long;
16:  <Utf8> <init>
17:  <Utf8> Code
18:  <Utf8> Lcom/goldencode/expr/VariableResolver;
19:  <Utf8> begin
1A:  <Utf8> com/goldencode/expr/CompiledExpression
1B:  <Utf8> com/goldencode/expr/LE0
1C:  <Utf8> com/goldencode/expr/LogicalExpression
1D:  <Utf8> com/goldencode/expr/VariableResolver
1E:  <Utf8> evaluate
1F:  <Utf8> java/lang/Long
20:  <Utf8> longValue
21:  <Utf8> resolveToLong
22:  <Utf8> resolver
23:  <Utf8> union

Interface Count:  0

Field Count    :  0

Method Count   :  2
0:  <Method> ACC_PUBLIC <init>()V 
    <Code> Max stack: 1, Max locals: 1
    5 bytes of code:
       0000  0x2A <aload_0        >
       0001  0xB7 <invokespecial  > 0009 [<Method> com/goldencode/expr/LogicalExpression.<init>()V]
       0004  0xB1 <return         >
1:  <Method> ACC_PUBLIC evaluate()Z 
    <Code> Max stack: 4, Max locals: 4
    65 bytes of code:
       0000  0x2A <aload_0        >
       0001  0xB4 <getfield       > 0008 [<Field> com/goldencode/expr/CompiledExpression.resolverLcom/goldencode/expr/VariableResolver;]
       0004  0x4C <astore_1       >
       0005  0x2B <aload_1        >
       0006  0x12 <ldc            > 02 [<String> union]
       0008  0x4D <astore_2       >
       0009  0x2C <aload_2        >
       000A  0xB9 <invokeinterface> 000B [<InterfaceMethod> com/goldencode/expr/VariableResolver.resolveToLong(Ljava/lang/String;)Ljava/lang/Long;] 02 00
       000F  0x4E <astore_3       >
       0010  0x2D <aload_3        >
       0011  0xC7 <ifnonnull      > 0006 [dest:0017]
       0014  0xA7 <goto           > 0029 [dest:003D]
       0017  0x2D <aload_3        >
       0018  0xB6 <invokevirtual  > 000A [<Method> java/lang/Long.longValue()J]
       001B  0x88 <l2i            >
       001C  0x99 <ifeq           > 0021 [dest:003D]
       001F  0x2B <aload_1        >
       0020  0x12 <ldc            > 01 [<String> begin]
       0022  0x4D <astore_2       >
       0023  0x2C <aload_2        >
       0024  0xB9 <invokeinterface> 000B [<InterfaceMethod> com/goldencode/expr/VariableResolver.resolveToLong(Ljava/lang/String;)Ljava/lang/Long;] 02 00
       0029  0x4E <astore_3       >
       002A  0x2D <aload_3        >
       002B  0xC7 <ifnonnull      > 0006 [dest:0031]
       002E  0xA7 <goto           > 000F [dest:003D]
       0031  0x2D <aload_3        >
       0032  0xB6 <invokevirtual  > 000A [<Method> java/lang/Long.longValue()J]
       0035  0x8A <l2d            >
       0036  0x14 <ldc2_w         > 000C [<Double> 4.68E7]
       0039  0x98 <dcmpg          >
       003A  0x9B <iflt           > 0005 [dest:003F]
       003D  0x03 <iconst_0       >
       003E  0xAC <ireturn        >
       003F  0x04 <iconst_1       >
       0040  0xAC <ireturn        >

Attribute Count:  0
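Read back into Java source, the evaluate() method in this listing behaves roughly like the hand-written sketch below. It is only an approximation for readers who do not read bytecode: the nested LongResolver interface is a hypothetical stand-in for the resolver type named in the listing, and the temporary locals the bytecode stores and reloads are folded away.

```java
import java.util.HashMap;
import java.util.Map;

class EvaluateSketch
{
   /** Hypothetical stand-in for the resolver interface named in the listing. */
   interface LongResolver
   {
      Long resolveToLong(String name);
   }

   /** Approximate source-level equivalent of the generated evaluate() method. */
   static boolean evaluate(LongResolver resolver)
   {
      Long union = resolver.resolveToLong("union");
      if (union == null)
      {
         return false;   // 0011-0014: null check, jump to "return false"
      }
      if ((int) union.longValue() == 0)
      {
         return false;   // 001B-001C: l2i, ifeq; right side is never evaluated
      }
      Long begin = resolver.resolveToLong("begin");
      if (begin == null)
      {
         return false;   // 002B-002E: null check, jump to "return false"
      }
      // 0035-003A: l2d, ldc2_w 4.68E7, dcmpg, iflt
      return (double) begin.longValue() < 4.68E7;
   }

   public static void main(String[] args)
   {
      Map<String, Long> record = new HashMap<>();
      record.put("union", 1L);
      record.put("begin", 25200000L);
      System.out.println(evaluate(record::get));   // true
      record.put("union", 0L);
      System.out.println(evaluate(record::get));   // false
   }
}
```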

Known Limitations (TBD - update)

The expr package has a number of known limitations. Some are the result of conscious design decisions, since the overriding concern in the development of this package was high performance. Others are bugs or unwelcome side effects of the implementation which may be addressed with future development.

Compiled expression objects not thread-safe

Compiled expression objects should not be used simultaneously from multiple threads. The ArithmeticExpression.compute() and LogicalExpression.evaluate() methods are not synchronized, as synchronization is expensive and is not needed in many use cases. In any case, synchronization only at the level of these methods would not be enough to ensure thread safety, since the VariableResolver itself is the most important component whose state must be synchronized to ensure the integrity of an expression result. Where it is necessary at all, synchronization must be done at the application level.
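One way to synchronize at the application level is to guard both the resolver's state and the evaluation call with a single lock, as in the sketch below. All names are hypothetical, and the private evaluate method is a hand-written stand-in for a compiled logical expression; the point is only that the record swap and the evaluation happen inside the same critical section.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class SynchronizedEvaluationSketch
{
   private final Object lock = new Object();
   private final Map<String, Long> current = new HashMap<>();

   /** Resolver callback; only ever reached while the lock is held. */
   private Long resolveToLong(String name)
   {
      return current.get(name);
   }

   /** Stand-in for a compiled logical expression consisting of: union */
   private boolean evaluate()
   {
      Long union = resolveToLong("union");
      return union != null && union.longValue() != 0L;
   }

   /** Swaps in the record and evaluates under one lock. */
   boolean evaluateAgainst(Map<String, Long> record)
   {
      synchronized (lock)
      {
         current.clear();
         current.putAll(record);
         return evaluate();
      }
   }

   /** Runs several threads against one shared instance; returns wrong results seen. */
   static int countMismatches(int threadCount, int iterations)
   {
      SynchronizedEvaluationSketch shared = new SynchronizedEvaluationSketch();
      AtomicInteger mismatches = new AtomicInteger();
      List<Thread> threads = new ArrayList<>();
      for (int t = 0; t < threadCount; t++)
      {
         final long unionFlag = t % 2;   // threads alternate between 0 and 1
         Thread thread = new Thread(() ->
         {
            Map<String, Long> record = Map.of("union", unionFlag);
            for (int i = 0; i < iterations; i++)
            {
               if (shared.evaluateAgainst(record) != (unionFlag == 1L))
               {
                  mismatches.incrementAndGet();
               }
            }
         });
         threads.add(thread);
         thread.start();
      }
      for (Thread thread : threads)
      {
         try
         {
            thread.join();
         }
         catch (InterruptedException exc)
         {
            Thread.currentThread().interrupt();
         }
      }
      return mismatches.get();
   }

   public static void main(String[] args)
   {
      System.out.println("mismatches: " + countMismatches(4, 10000));
   }
}
```

Locking only inside evaluate() would not help here, since another thread could replace the current record between the swap and the evaluation.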

This limitation is the result of a conscious design decision; it will not be removed in future development of the package.

Package com.goldencode.expr

Package com.goldencode.expr Description

Contents

Introduction

Why compile expressions?

The Trade-Offs

Expression Types

General Syntax

Variables

Variable Syntax

User Functions

User Function Syntax

Variable Length Argument Lists

Method Invocation

Method Invocation Syntax

Variable Length Argument Lists

Symbol Resolution

Scope

Callback Libraries

Constants

Security

Resolution Error Handling

Supported Operators

Resolving Symbols (TBD - update)

How Resolution Errors are Handled

Arithmetic Expressions

Logical Expressions

Test Driver Example (TBD - obsolete)

Expression Compilation (TBD - update)

Known Limitations (TBD - update)

Compiled expression objects not thread-safe