com.goldencode.p2j.pattern (P2J - Progress 4GL to Java Conversion and Runtime)

Interface Summary
Interface	Description
DebugLevels	Defines the list of possible debug levels used in the pattern engine.
GraphNodeFilter	The implementation class name of this interface is passed to the `PatternEngine` via the `graph-node-filter` configuration parameter.
PatternWorker	Each pattern engine worker class which provides variable and/or user function resolution support must implement this interface to perform initialization and termination processing, and to receive notifications from the pattern engine of processing events, such as a new AST being processed, or a new level of a call tree being processed.
RuleListElement	Represents a generic element in a list of rules/rule containers.
SourceFileHelper	Provides an API for an object that helps copy and link (from statistics reports) to source code.

Class Summary
Class	Description
AbstractPatternWorker	Provides a default implementation of the most common initialization and termination logic for pattern engine worker classes.
AstSymbolResolver	Provides symbol resolution of fundamental AST properties, user functions, and actions.
AstWalker	Traverses an AST hierarchy or an arbitrary iteration of AST nodes, and applies a set of rules against each source-copy AST node pair encountered during the walk.
CommonAstSupport	Provides common symbol resolution and function support for AST processing.
CommonAstSupport.WorkArea	Context local work area.
ConfigLoader	Loads pattern engine configuration profile data into the pattern engine from one or more XML documents.
CopySpec	Defines a file copy operation (source, target and type of copy).
DataSetRelation	Holds the details about a dataset relation, for a SOAP operation's dataset parameter.
DictionaryWorker	Exports named and scoped dictionaries to pattern engine rules supporting any number of user controlled scoped dictionaries that map names to any of the 4 supported pattern engine types (string, long, double and boolean).
DictionaryWorker.WorkArea	Context local work area.
FileOperationsWorker	Provides useful file operations for AST processing.
LegacyProxyClientConfig	Specifies details about an external program exposed as a legacy proxy open client, to generate its Java code.
LegacyProxyConfig	Configuration for programs exposed as legacy open client proxy programs, for which a Java class will be generated.
Match	A simple container class to store and access file and text related information about a single specific match to a condition within a specific file.
NamedFunction	Provides a container to manage, store and execute a named function.
PatternEngine	An engine which applies rules to the nodes of an abstract syntax tree (AST), in order to analyze or transform it, based upon patterns detected in the AST data.
PatternEngine.StorageKey	Container for an arbitrary category and id that is supposed to be used as a key for the `PatternEngine.storedObjects` map.
PropsWorker	Provides a set of user-defined functions for accessing system properties.
RestConditionVar	This variable is used to evaluate a runtime-level expression, and check if a certain `RestParameterMapping` applies to a specific resource, operation and verb.
RestLegacyParameter	This maps a legacy parameter configured in the service's `.restoe` file.
RestOperation	Holds the definition of a REST operation, as it appears in the resourceModel.xml file.
RestParameter	This information is not yet used - in resourceModel.xml, the `prgs:operation` has `prgs:param` children, with attributes like `name="value" type="PATH"`.
RestParameterMapping	Holds a list of mapping rules, which apply to a certain condition.
RestParameterMappingRule	From the `mapping.xml` file, holds a pair of source and target values, defining the mapping rule, as set in the `conditional:conditionalRuleSet/conditional:if/mapping:rule` XML node.
RestResource	Class identifying a REST resource from `resourceModel.xml`.
Rule	A single rule which defines a list of zero of more actions and/or nested rules to be performed if this rule's condition is met.
RuleContainer	Abstract base class for concrete implementations of rule containers.
RuleSet	A rule container implementation which adds a choice of input mode.
RulesTracing	The class implements support for simple tracing of rules processing.
RulesTracing.Location	Holds the positional information.
SearchTrees	Special-purpose driver for the pattern engine to search a list of ASTs for matches to a given expression.
ServiceSupport	Interpret and create supporting data, to be able to create `LegacyService` annotations, for any legacy service defined in 4GL code.
ServiceSupport.WorkArea	State which must survive for the duration of the conversion.
SoapConfig	Loads a .wsm file and computes the SOAP operations exported by this file.
SoapOperation	Definition of a SOAP operation exported via .wsm files.
SoapOperationParameter	Definition of a SOAP parameter exported via a .wsm file.
SoapWsdl	Generate the WSDL file for all services exported by a .wsm file.
StatisticsHelper	Provides a set of methods to calculate statistical values based upon the data provided.
TemplateWorker	Provides a set of user-defined functions for accessing libraries of pre-defined AST subtrees.
TemplateWorker.WorkArea	Context local work area.
WebHandlerConfig	Configuration for a WebHandler, as it was generated in the `openedge.properties`, at deploy time.

Enum Summary
Enum Description

CopySpec.Type
Operation type constants.

Enum Summary
Enum	Description
CopySpec.Type	Operation type constants.

Exception Summary
Exception	Description
CommonAstSupport.UserGeneratedException	Indicates a P2J internal error was encountered during pattern engine processing.
Rule.BreakException	Specialized exception type thrown by a break action to terminate a loop within a while rule.
Rule.ContinueException	Specialized exception type thrown by a continue action to continue a loop within a while rule.
TreeWalkException	Exception type for an error which occurs during a `PatternEngine` tree walk.

Package com.goldencode.p2j.pattern Description

A framework to post-process abstract syntax trees (ASTs) using pattern-matching techniques, for the purpose of examination, analysis, transformation, and ultimately conversion from one programming language's syntax to another's.

Author(s)	Eric Faulhaber Greg Shah
Date	November 1, 2007
Access Control	CONFIDENTIAL

Introduction

Rules
Symbol Resolver
Rulesets
Pipeline
Pattern Workers
Event Model
Why Use the Pattern Engine?

Package Class Hierarchy
High Level Flow
Configuration

Pipeline and Ruleset Profile File Format
Developing Rule Sets
How to Use Scoping During a Tree Walk

Introduction

An abstract syntax tree (AST) is a powerful representation of the source code of a program, in that it imposes a well-known, programatically accessible structure upon the logic and language statements of a program's source code. ASTs are built by language parsers or by other means. The purpose of the pattern package is to provide a processing engine which is applied against ASTs to detect structural and data patterns, and to perform actions conditionally, according to the needs of the user. Given a definition which includes, at a high level:

a programming language context;
instructions to access the ASTs to be processed; and
a pipeline of sets of rules to apply to the ASTs;

the pattern processing engine, or pattern engine, allows one to inspect, analyze, interpret and/or transform a group of ASTs in diverse ways.

Rules

At the core of the pattern engine is the notion of a rule. A rule defines a logical condition which must evaluate to true or false at runtime. Each rule may define zero or more actions, which are executed if and only if the rule's condition evaluates to true. An action can do anything a Java method can do; typical uses for actions are to record information for later use, annotate an AST node, edit an AST's structure, or to provide feedback to the user.

A rule generally is applied to a single node of an AST, though some rules (those used for global initialization and termination processing for a session of the pattern engine) are processed outside of the scope of any AST. The current node for a given rule can be accessed using this or copy. This corresponds to the source and target trees where this is the source (read-only) node that should not be changed and copy is the target (writable) node that can be changed. Both of these special variables are of type com.goldencode.p2j.uast.Aast and one can call any public method directly on these variables.

A rule's condition and its actions are each represented as logical, algebraic expressions in infix notation (e.g., a + b = c), which is a natural convention for most people. It is possible to embed constants, variables, functions, and a variety of operators within these expressions. An action typically utilizes one or more function invocations within its expression, since an algebraic comparison in and of itself does not invoke any behavior beyond simple evaluation to a boolean result. A symbol resolver is used to substitute runtime data for the constants, variables, and function invocations in an expression. This data may be extracted from the environment, from user-provided variables, or from an AST.

See this document for the syntax of the expression language.

Symbol Resolver

A specialized symbol resolver is used by the pattern engine. This symbol resolver is aware of the properties and general structure of an AST. Thus, expressions can reference properties of the AST, such as its token type, its text, its source filename, etc. Furthermore, the symbol resolver can answer questions about the structure of an AST via user functions, such as who its ancestors, siblings, and descendants are. The pattern engine inspects an AST, updating the state of the symbol resolver as it visits each node in the AST, such that the symbol resolver always has the correct AST node in context when a rule's expressions are applied.

Rulesets

Rules are organized into rulesets. Rulesets have a notion of the programming language tokens they expect to encounter within the ASTs they are required to process. They optionally may define a library of named expressions or functions which can be executed or invoked within expressions by an their name.

Rulesets differentiate between six types of rules, each of which is applied at a different time for a different purpose:

init-rules
walk-rules
descent-rules
ascent-rules
next-child-rules
post-rules

A ruleset is applied against each AST being processed by a run of the pattern engine.

Init-rules are run once at the beginning of the inspection of an AST hierarchy. They are applied in a pre-defined order, within the context of the root AST node of that hierarchy.

Walk-rules are run once per AST node visited. As the tree is inspected, each node in the tree (or in a filtered view of the tree) is visited by the engine. The entire list of walk-rules is applied in a pre-defined order, within the context of each node, before the next node is visited.

Descent-rules are run in between the walk-rules for a parent node and the walk-rule for that parent's first child.

Ascent-rules are run in after all of a parent's last child's children nodes are done being processed. This is the moment when the tree walk would naturally "ascend" back up the tree to the parent node.

Next-child-rules are run in before the walk-rules for rightmost of two sibling nodes, after all the child nodes of the leftmost sibling have been processed. This is equivalent to when the tree walk would move between the two sibling nodes.

Post-rules are run once at the end of the inspection of an AST hierarchy. They are applied in a pre-defined order, within the context of the root AST node of that hierarchy.

Each ruleset actually processes a related pair of AST hierarchies: a read-only, source AST and a writable, target AST. The target AST is simply a deep copy of the source AST. The source AST is used to drive the node-visiting logic, and as such, should not be modified by a rule's actions. The target AST is intended to be modified as a ruleset sees fit.

Pipeline

Rulesets may be chained together into a pipeline, such that the writable, target AST modified by one ruleset becomes the read-only, source AST available to the next ruleset in the pipeline. Additionally, each ruleset has the opportunity to create a filtered view on its target AST, to which it adds only those AST nodes which meet some criteria defined by its rules. This view is also made available to the next ruleset in the pipeline, which has the option to iterate over the filtered view only, or over the entire tree. The designation to iterate over the view vs. the tree is known as the ruleset's input mode.

Pattern Workers

The implementation of the functionality available to the expressions defined within rules is provided by pattern worker objects. These objects perform the grunt work of exposing variables and user functions, such that they can be used within expressions. They may also be used to implement actions for the purpose of collecting, maintaining, and otherwise using information specific to a particular set of functionality, such as statitistical analysis or AST transformation. It is the pattern worker objects that provide the implementation behind the symbol resolver's ability to expose AST properties and structural relationships, for instance, or to collect statistics about language structure.

Certain pattern workers are generally useful across a wide range of applications. These are registered directly by the pattern engine as default workers. The following are loaded by default:

Event Model

The pattern engine is a tree processing framework where the engine handles the details of walking the tree and the programmer uses hooks or callbacks that are triggered based on specific events that occur during the walk. The pattern engine is programmed using a "rule-set" or a combination of rule-sets that are executed in series. These programs are written in an XML-based markup language. Each rule-set is separated into sections that define the type of tree walk event in which that section is executed. There are 6 different event types:

Event Type	Occurs	Node in Scope	Notes
<init-rules>	once for each tree, before any other rule executes	tree root	Useful for initialization logic, especially for variables that are unique per-tree (a single instance is used to track state across an entire tree walk.
<walk-rules>	once for each node, this is the "primary" visit to the node	the current node being visited	Most logic is inserted in these rules. Anything that does not depend upon the order of processing children is usually placed here.
<descent-rules>	when the tree walk descends from a parent to its first child (it occurs after the parent has gotten its <walk-rules> and before the child gets its <walk-rules>)	the parent node	Very useful for implementing scope-aware data structures (such as a stack). Scopes are normally pushed (added) on descent. This never occurs for nodes that have no children.
<next-child-rules>	when the tree walk moves between siblings (it occurs after the current [left] sibling has gotten its <walk-rules>, and after all processing is complete for that left sibling's descendants and before the next [right] sibling gets its <walk-rules>)	the parent node	In rule-sets that convert a tree to a human readable form, these rules are highly useful for implementing differential processing based on the index of the child being processed (the "nextChildIndex"). For example, in converting the tree form of a Java method call (assume that the first child is the Object instance upon which the method is called) into a source code version, one might need to output a ".methodName(" after the first child has been processed but before all other children are processed. This is generally useful in any case where processing needs to occur between the processing of any two children. This never occurs for nodes that have no children.
<ascent-rules>	when the tree walk ascends from the last child to its parent (it occurs after the last child has gotten its <walk-rules> and after all processing is complete for the last child's descendants)	the parent node	Very useful for implementing scope-aware data structures (such as a stack). Scopes are normally popped (deleted) on ascent. In rule-sets that convert a tree to a human readable form, these rules are highly useful for closing processing after all children are processed. For example, in converting the tree form of a function call (assume that the children represent function call parameters) into a source code version, one might need to output a closing ")" after all parameters are emitted. This never occurs for nodes that have no children.
<post-rules>	once for each tree, after all other rules have executed	tree root	Useful for termination logic, especially for implementing any filesystem or database persistence that is done on a per-tree basis.

To hook a particular event for a specific node, just insert a rule that matches that node using its logical condition. Then inside this rule, you can put any combination of actions and nested rules (which can have any combination of actions and rules inside it). While the pattern engine will visit every node (and trigger every one of the actions as needed), if you have no matching expressions to a given node, the node will be ignored.

The following diagram shows a trivial example. The legend lists each event in the order that the events are triggered. The event number matches the numbers on the tree nodes and the arrows. The arrows designate events that are triggered in between <walk-rules>, while moving between the visits of two related nodes. The numbers in a node specify the case where there is a specific visit to a node (<init-rules>, <walk-rules> and <post-rules>).

Why Use the Pattern Engine?

The pattern engine is a tree processor. It walks every node in the tree in a very defined manner (as shown by the previous diagram). A rule set is really a set of call-backs that are invoked in response to events occuring during the tree walk. So, if a rule set has walk-rules, each one is executed (in the order of occurance) when the walk of any node triggers the walk-rules "event". For each walk-rule in that rule set, the condition is evaluated and if it evaluates true, then the body (actions or rules) of the rule will trigger (and so on as things are nested inside). But if a rule set has no rules that evaluate true for a given node, then that node is essentially ignored by the given rule set, even though it will be processed by the engine.

There is support for if/else conditional processing (this is the purpose of a <rule> which has a condition and then actions, rules and loops that can be used on either false or true) and loops (<while> which can be easily used to also simulate a for). Along with the variable support included (any kind of object can be instantiated or obtained and then all of its methods are available in rules and actions) and the assignment operator support, there is little that cannot be done directly in a rule set.

There is great power in letting the pattern engine handle the tree walking. The tree processing logic is expressed in a very compact form (much smaller than the corresponding Java program that implements the same logic) while still taking advantage of nearly every feature of the Java language. The only Java features that are not provided: array subscripting, static methods and some operators like "new".

The pattern engine removes the duplication of tree walking code that would be required in any set of Java programs that hard-code tree processing. This reuse reduces development time, makes the programs standardized, leverages tested/debugged/documented code and allows the developer to focus on the logic of the pattern matching and the objectives of what to do when a pattern is found.

Package Class Hierarchy

The primary classes of interest are those that correspond with the constructs described in the introduction:

Rule is the implementation of the conditional rule.
AstSymbolResolver is the implementation of the symbol resolver.
RuleSet is the implementation of the ruleset.
PatternEngine is the implementation of the pattern engine, and also manages the ruleset pipeline.
PatternWorker is the interface which all pattern worker objects implement. AbstractPatternWorker is a partial implementation of this interface to simplify the creation of concrete pattern worker implementations.

High Level Flow

The PatternEngine is launched as a command line driver, or it may be constructed, configured, and invoked programmatically. When launched from the command line, a configuration profile must be specified. Processing is invoked using the run method, which accepts the name of the profile to load and execute. The logic flow is as follows:

begin
perform global initialization processing:
- create a new symbol resolver
- initialize and register default worker objects
- load the pattern configuration profile
- set a default specification to find ASTs if not provided by profile
- apply global, user-defined init-rules
for each AST to be processed:
- for each ruleset defined in the pipeline:
  - make a deep copy of the source AST hierarchy; this copy becomes the editable copy for the current ruleset pass and becomes the read-only source AST for the next ruleset in the pipeline
  - perform ruleset-specific, initialization processing:
    - configure the symbol resolver for the current ruleset
    - set the source and copy root ASTs as the current ASTs in the symbol resolver
    - apply user-defined init-rules for the ruleset
  - walk the AST (or the filtered view of results of previous ruleset) and generate the proper events
  - perform ruleset-specific termination processing:
    - set the source and copy root ASTs as the current ASTs in the symbol resolver
    - apply user-defined post-rules for the ruleset
  - use the copy AST from the last ruleset as the source AST for the next ruleset
perform global termination processing:
- apply global, user-defined post-rules
- call each registered worker object's termination processing hook
- null out the symbol resolver
end

Configuration

A pattern processing engine configuration profile defines everything necessary for the engine to process one or more ASTs against a pipeline of rulesets. A profile is defined in one or more XML documents. A master document, the pipeline profile document, defines:

the ASTs to be processed;
the default programming language to use when processing rulesets;
the pipeline of rulesets;
global initialization and termination rules to be applied at the beginning and end of the processing session, respectively.

The pipeline profile may refer to zero or more ruleset profile documents. A ruleset profile document defines:

the rules to be applied against each AST;
the programming language to use when resolving language token constants;
an optional library of named expressions which may be referenced by alias from other expressions;
an optional list of other ruleset profiles from which public, named expression libraries are to be included for the current ruleset's use.

Pipeline profile documents always have a filename extension of xml. Ruleset profile documents always have a filename extension of rules. A ConfigLoader object is responsible for parsing the pipeline profile document, for resolving any references to external ruleset profile documents, and for configuring the pattern engine using all of the information extracted from the resulting network of documents.

Pipeline and Ruleset Profile File Format

The same general XML grammar is used for both the master profile, which drives the entire pipeline of ruleset processing, and for each, individual ruleset within the pipeline. A pipeline profile must use the .xml file name extension; a ruleset profile must use the .rules file name extension. The ruleset profile file format is a subset of the pipeline profile format. The following describes the grammar:

Name Scope Num Children Description

cfg P 1 ast-spec worker include variable init-rules descent-rules rule-set ascent-rules post-rules Outermost containing element (root) of the pipeline profile document.

No attributes are defined for this element.

ast-spec

none

Defines a directory and file specification used to find a list of either (a) persisted AST files to process; or (b) root node entries in a call graph. If no such element is provided, such a specification must be provided at the pattern engine command line; otherwise, no processing will occur.

Attribute	Required	Description
`directory`	yes	An absolute or relative path to a directory which is the root of a directory tree containing .ast files (persisted ASTs), or which contains one or more .graph files (call graph definitions). If relative, the directory will be processed relative to the value of the P2J_HOME system property.
`file-spec`	yes	A regular expression used to identify the names of source files (i.e., `.p` or `.w` files) to be processed. The appropriate `.ast` or `.graph` file name extension will be appended automatically, depending upon the pattern engine mode of operation.

worker

none

Defines a PatternWorker implementation for the pattern engine to load for a specialized task. Several pattern workers are loaded by default to provide standard services. This element gives the user the opportunity to load additional worker objects, to provide additional variables and/or user functions for a particular purpose.

Attribute	Required	Description
`class`	yes	The fully qualified name of the `PatternWorker` implementation class to load within a `worker` element.

include

P/R

none

Loads a function library from the given rules filename. Whether this filename is unqualified or partially qualified, each segment in the pattern path is searched for the first file that matches. That file is loaded and any public func-library element in that file is made accessible.

Attribute	Required	Description
`name`	yes	The unqualified or partially qualified name of the rules file to load.

func-library

P/R

expression

function

Provides a set of named expressions or functions which can be used in the current rule file or imported into an external rule file for use.

Attribute	Required	Description
`access`	no	The string "public" or "private". Private access means that the function library can only be accessed from within the same rules file. Public access allows the function library to be loaded (and then accessed) by any rules file. By default a function library is public.

expression

P/R

none

Use of this element is discouraged. This should be considered deprecated.

Contains a single expression which is evaluated unconditionally. This is similar to an action element, except:

actions cannot be named
actions can be set to execute based on the boolean result of an enclosing rule element, expressions cannot do this

When directly contained inside of a func-library element, a named expressions is the equivalent to a named function, except that the expression can only contain a single expression and none of the structure or contained elements of a function can be used.

Attribute	Required	Description
`name`	no	The name of the expression. This is the first parameter (a a string literal) to execLib() or evalLib().

function

P/R

parameter

variable

return

rule

while

expression

Defines a named block of logic that:

Is only invoked explicitly by the use of CommonAstSupport execLib() and evalLib() methods. This code does not get implicitly called based on tree walking.
Can be invoked at any time in an expression, from any action or rule (even from other functions).

The logic can be specified as a single expression or as a set of one or more contained rule, while or expression elements. If a single expression is specified, this is done without any contained elements, instead the text of the function element itself must be a single valid expression that can be evaluated to a result. This non-element mode for defining a function is considered deprecated and should not be used for new functions.

By default, if no return element is defined, the return value is the same as the last executed expression in the function (whether this expression is directly specified in the function element's text or whether this is a contained rule, action, expression or while element).

Variables that are in enclosing scopes can be accessed but this is discouraged. The function element is considered a new scope and can hide variables which have the same name in enclosing scopes.

Some problems to note:

Recursion is not allowed at this time.
Actions can't be embedded directly in a function, but can be embedded inside a rule in a function.
Pattern worker access is based on the worker definitions in the calling rule set NOT the rule set in which the function is defined.

Attribute	Required	Description
`name`	yes	The name of the function. This is the first parameter (a a string literal) to execLib() or evalLib().

variable

P/R

none

Declares a user variable to use in rule expressions. Variables are global across rulesets if defined in the scope of a pipeline profile. Variables are local to an individual ruleset if defined within the scope of a ruleset profile.

Attribute	Required	Description
`name`	yes	Name which will be used to refer to variable within user expression tests and assignments.
`type`	yes (if init does not exist)	Data type of the variable. Valid entries are `int, long`, `double`, `string, boolean` or any fully qualified Java classname.
`init`	yes (if type does not exist)	An expression which when evaluated yields a result that is the initial value (and type) for the variable.

parameter

P/R

none

Declares a parameter to use in a named function. This is equivalent to a local variable, except the value is assigned by the caller in the parameter list (2nd and subsequent parameters to the execLib() and evalLib() functions). Note that the order, type and number of the parameters in the function definition must match those passed in the function invocation.

Attribute	Required	Description
`name`	yes	Name which will be used to refer to parameter within the function.
`type`	yes (if init does not exist)	Data type of the parameter. Valid entries are `int, long`, `double`, `string, boolean` or any fully qualified Java classname.
`init`	yes (if type does not exist)	An expression which when evaluated yields a result that is the initial value (and type) for the parameter.

return

P/R

0 or 1

none

Declares a local variable, whose value will be returned to the caller of the function when the function completes. This also determines the type of the function's return value.

Attribute	Required	Description
`name`	yes	Name which will be used to refer to return variable within the function.
`type`	yes (if init does not exist)	Data type of the return value. Valid entries are `int, long`, `double`, `string, boolean` or any fully qualified Java classname.
`init`	yes (if type does not exist)	An expression which when evaluated yields a result that is the initial value (and type) for the return variable.

rule-set

P/R

expr-library


include


variable


init-rules


descent-rules


walk-rules


ascent-rules


post-rules

rule-set

Defines a set of rules and resources used by the pattern engine to process the information stored within an AST. Rulesets which are tightly coupled to the purpose of the pipeline's task may be defined inline, within the main pipeline profile document. Alternatively, more general purpose, reusable rulesets may be defined individually, within their own .rules files.

If defined within a .rules file, this element is the outermost containing element (root) of the profile document (rather than having an outermost cfg element).

If defined within a .xml file (the master profile document), this element can either define an entire ruleset inline, or refer to an external ruleset defined within a .rules file. If the element contains the name attribute (see below), it refers to an external ruleset profile. Otherwise, it is expected that an inline ruleset definition follows the opening tag.

This element defines a new scope for the purpose of resource access such as variables. Resources in enclosing scopes can be accessed but enclosing scopes cannot access resources defined within this element.

Attribute	Required	Description
`load-condition`	no	This attribute is processed only when the `rule-set` element is not the root element (to allow a function library to be included by other files unconditionally). It contains a comma-separated list of flags which will be checked only when conditional rule-set processing is activated (the `PatternEngine.conditionalRuleSets` field is set to `true`). When conditional rule-set processing is activated, it will process all the rule-sets which have at least one `load-condition` flag matching one from the `PatternEngine.activatedFlags` set. The following APIs from the `PatternEngine` class are related to conditional rule-set processing: `void setConditionalRuleSets(boolean)` to enable or disable conditional rule-set processing. `boolean isConditionalRuleSets()` to check the current state of conditional rule-set processing. `void addConditionalFlag(String)` to add a new flag to the `PatternEngine.activatedFlags` set. `void removeConditionalFlag(String)` to remove a flag from the `PatternEngine.activatedFlags` set. `boolean isFlagActivated(String)` when conditional rule-set processing is enabled, it will check if the given flag is part of the activated flags. `boolean activeFlags(Set)` when conditional rule-set processing is enabled, it will check if any flag from the given set is part of the activated flags.
`name`	no	If present, this attribute defines the name of an external ruleset definition which is included by reference in the master pipeline profile. It should not include the `.rules` file name extension. If this attribute is present, any information contained within the opening and closing `rule-set` tags is ignored. There are 2 ways to use this external reference: Top-Level - when the rule-set element is contained directly within a cfg element, this is considered a top-level rule-set. Such a rule-set will have its contents executed in its entirety with a tree walk that is specific to that rule-set. Any other top-level rule-set elements in the same cfg element will be executed in order. This provides the pipelining feature to the master profile document. In a file with 2 top-level rule sets, the first rule set will be processed through a complete tree walk (with all nodes being visited and all events being triggered) before any processing in the second rule set will occur. Merge - when the rule-set element is contained within another rule-set element, this has the effect of merging the named rules file into the containing rule-set element as if those rules were defined there directly. This allows rules to be split into multiple different rules files but then walked as a combined set. In such a case, each of the merged rule sets will have their events triggered in the same order as the rule-set elements are merged. In a rule-set that contains merged rule-set elements, the first event of the tree walk will be processed for each merged rule-set in turn and then the next tree walk event will be processed for each rule-set in turn and so on, until the tree walk is complete. These rule-set elements are really part of a simultaneously executing single tree walk that occurs for the top-level rule-set element. If this attribute is not provided, an inline ruleset definition is assumed.
`input`	no	Defines the type of input for this ruleset to process. Acceptable values are `tree` or `view`. A value of `tree` indicates that this ruleset should be applied against the entire AST hierarchy, regardless of the output of the previously processed ruleset in the pipeline. A value of `view` indicates that this ruleset should be applied only against the individual AST nodes which the previous ruleset in the pipeline added to its filtered view of ASTs. If this attribute is not provided, a default value of `tree` is assumed.
honor-hidden	no	If present, this boolean value determines if the tree walk will include nodes that have their hidden flag set to true. An AST can have its hidden flag set by com.goldencode.p2j/uast.Aast.setHidden(). If honor-hidden is true, then hidden nodes will NOT be included in the tree walk. Note that "manual" dereferencing of nodes (e.g. this.getChildAt(4)) will still return hidden nodes.

init-rules P/R 0-1 rule A list of rules to run as initialization processing, either globally across all rulesets in the pipeline, or locally, within an individual ruleset. If defined within the pipeline profile document, the init rules are global; if defined within a ruleset profile document, the init rules are local to that ruleset.

Global init rules are applied once per pipeline, after pattern workers are initialized, but before any individual rulesets are processed. No AST is in scope for purposes of rule expression processing when the global init rule list is applied.

Local init rules are applied each time a ruleset is applied against an AST, before any of the ruleset's walk rules are applied. The root node of the AST being processed by the current ruleset is in scope for purposes of rule expression processing when a local init rule list is applied. So, any rule expressions which refer to properties of an AST will use the root AST to resolve AST property variables or to apply changes against the AST.

No attributes are defined for this element.

descent-rules P/R 0-1 rule

walk-rules R* 0-1 rule A list of rules to apply against an iteration of AST nodes. Depending upon the value of the input attribute of the enclosing rule-set element, this list of rules is either applied against every AST node in the current AST hierarchy, or against only those AST nodes included in the filtered view defined by the previous ruleset in the pipeline.

No attributes are defined for this element.

ascent-rules P/R 0-1 rule

post-rules P/R 0-1 rule A list of rules to run as termination processing, either globally across all rulesets in the pipeline, or locally, within an individual ruleset. If defined within the pipeline profile document, the post rules are global; if defined within a ruleset profile document, the post rules are local to that ruleset.

Global post rules are applied once per pipeline, after all individual rulesets have been processed, but before pattern worker termination processing takes place. No AST is in scope for purposes of rule expression processing when the global post rule list is applied.

Local post rules are applied each time a ruleset is applied against an AST, after any of the ruleset's walk rules are applied. The root node of the AST being processed by the current ruleset is in scope for purposes of rule expression processing when a local post rule list is applied. So, any rule expressions which refer to properties of an AST will use the root AST to resolve AST property variables or to apply changes against the AST.

No attributes are defined for this element.

rule

rule


while


action

Defines a logical condition and an optional list of actions to perform if that condition is met or not met. The condition and actions are each defined as logical expressions. They may contain system/AST variables (exposed by pattern worker objects) or global or local user variables defined using the variable element.

CDATA may be used to contain the condition however, it is common to put the condition directly in the element text. Note that a rule with no contained elements is the functional equivalent of an action.

This element defines a new scope for the purpose of resource access such as variables. Resources in enclosing scopes can be accessed but enclosing scopes cannot access resources defined within this element.

Attribute	Required	Description
`on`	no	If `true`, this action is executed if the enclosing rule's condition evaluates to `true`. If `false`, this action is executed if the enclosing rule's condition evaluates to `false`. If not provided, defaults to `true`.

while

rule
while
action

This is a looping construct. While the logical condition is true, the optional list of rules, whiles and actions are performed. Any contained elements which use the on="false" attribute are never executed as the concept of a while loop inherently precludes this. The condition is defined as a logical expression. They may contain system/AST variables (exposed by pattern worker objects) or global or local user variables defined using the variable element.

Attribute	Required	Description
`on`	no	If `true`, this action is executed if the enclosing rule's condition evaluates to `true`. If `false`, this action is executed if the enclosing rule's condition evaluates to `false`. If not provided, defaults to `true`.

action

none

Defines an action to perform in the event the enclosing rule's condition is met or not met.

Attribute	Required	Description
`on`	no	If `true`, this action is executed if the enclosing rule's condition evaluates to `true`. If `false`, this action is executed if the enclosing rule's condition evaluates to `false`. If not provided, defaults to `true`.

TODO: provide example of each level of profile.

Developing Rule Sets

Let the pattern engine handle the tree traversal. As a developer of rule sets, you must only insert call-backs for those events that you care about. So you don't control the traversal, you instead control which events matter. A node is "skipped" by simply never matching it in the condition portion of the rule. This is simply a matter of carefully designing the rules such that they can only match the exact tree construct that you are looking for, and no others.

This is where it becomes important to review the related nodes and constructs in the parser that creates the tree since this is *the* authoritative source of what is *possible*. The general approach to identifying the pattern:

Identify the token types that you are looking for. Writing a small testcase, running it through the parsere and then reviewing the generated AST is a good way to do this.
Identify the location(s) in the parser at which the construct you are trying to match is created. This shows you the possible structure (or context) in which the node(s) will be found.
Search in the parser source for all other instances of the token types you are matching on. This will tell you if you need to make your rule conditions more specific in order to avoid false positives (matching where a match really doesn't exist). Many input data streams are heavily context sensitive. This means that the same tokens can be used differently based on context. In such context sensitive input streams, the matching rules generally must be more specific, often have a condition that tests data from multiple related nodes at once (e.g. the current node, the parent node, the parent's parent...).

Draw the pattern (the tree relationship/structure) out on paper as the above steps are taken. Once the unique pattern is defined, encoding this in rule form is usually very straightforward. Then add logic in the body of the rule to do whatever it is that is the objective of the rule.

How to Use Scoping During a Tree Walk

In many cases, one may need to keep a stack or a scoped dictionary of data to keep track of data which is inherently scoped. For example, in traditional languages the meaning of a particular symbol is different depending upon the subtree which is being visited. Due to the nesting features of many languages, it is often the case that there can be an arbitrary number of scoping levels.

Often a simple stack can be used for this, but a scoped dictionary is especially useful if you have > 1 variable which has the same scoping semantics. This is the more general case, so the following text will show how that works.

The DictionaryWorker is already loaded as one of the base workers so it doesn't need to be added manually. Examine this class to understand how it works. The key point is that each dictionary is named using an arbitrary string and you may have any number of variables stored in this named dictionary. Each variable is also named using a string and can be gotten, put and queried to see if the dictionary contains such a variable. A stack of maps is used and each level in the stack is a different scope. You can add and delete scopes (maps) and when a variable is accessed, the stack of maps is searched from the top down until the variable is found or there is nothing more to search.
You may want to "clear" your dictionary at a known point, especially if you are processing multiple trees. In this example, 2 variables named "process" and "num" are "prepopulated" with the value 0:

<init-rules>
<rule>deleteDictionary("my_scoped_dictionary")</rule>
<rule>addDictionaryLong("my_scoped_dictionary", "process", #(long) 0)</rule>
<rule>addDictionaryLong("my_scoped_dictionary", "num", #(long) 0)</rule>
</init-rules>

Find the correct place to add scopes. This is typically in a walk-rule or in a descent-rule. You do this to ensure that anything deeper in the tree will see the latest version of the variables. In this examples, new versions of the variables are added to the new scope, though this could be split into rules triggered elsewhere.

<walk-rules>
<rule>type == something
<action>addScope("my_scoped_dictionary")</action>
<action>addDictionaryLong("my_scoped_dictionary", "process", getSomeDataHere())</action>
<action>addDictionaryLong("my_scoped_dictionary", "num", getSomeOtherDataHere())</action>
</rule>
</walk-rules>

Find the correct place to delete scopes. This is usually done in an ascent-rule since this is the moment when the subtree rooted at a particular node is known to have completed processing.

<ascent-rules>
<rule>type == something
<action>deleteScope("my_scoped_dictionary")</action>
</rule>
</ascent-rules>

Use the dictionaryContains() or lookupDictionary*() methods to access the data and make decisions. Since these variables are scoped using a stack, this technique cleanly handles any amount of nesting. References to these variables always get the most recent version and when the variables go out of scope they automatcially get cleaned up by deleting the top scope from the stack.

Package com.goldencode.p2j.pattern

Package com.goldencode.p2j.pattern Description

Contents

Introduction

Rules

Symbol Resolver

Rulesets

Pipeline

Pattern Workers

Event Model

Why Use the Pattern Engine?

Package Class Hierarchy

High Level Flow

Configuration

Pipeline and Ruleset Profile File Format

Developing Rule Sets

How to Use Scoping During a Tree Walk