A framework to post-process abstract syntax trees (ASTs) using
pattern-matching techniques, for the purpose of examination, analysis,
transformation, and ultimately conversion from one programming
language's syntax to another's.
Author(s)
|
Eric Faulhaber
Greg Shah
|
Date
|
November 1, 2007
|
Access Control
|
CONFIDENTIAL
|
Contents
Introduction
Package
Class Hierarchy
High Level Flow
Configuration
Introduction
An abstract syntax tree (AST) is a powerful representation of the
source
code of a program, in that it imposes a well-known, programatically
accessible structure upon the logic and language statements of a
program's source code. ASTs are built by language parsers or by
other means. The purpose of the
pattern
package is
to provide a processing engine which is applied against ASTs to detect
structural and data patterns, and to perform actions conditionally,
according to the needs of the user. Given a definition which
includes, at a high level:
- a programming language context;
- instructions to access the ASTs to be processed; and
- a pipeline of sets of rules to apply to the ASTs;
the pattern processing engine, or
pattern
engine, allows one to inspect, analyze, interpret and/or
transform a group of
ASTs in diverse ways.
Rules
At the core of the pattern engine is the notion of a
rule. A rule defines a logical
condition which must evaluate
to true or false at runtime. Each rule may define zero or more
actions, which are executed if and
only if the rule's condition evaluates to true. An action can do
anything a Java method can do; typical uses for actions are to
record information for later use, annotate an AST node, edit an AST's
structure, or to provide feedback to the user.
A rule generally is applied to a single node of an AST, though some
rules (those used for global initialization and termination processing
for a session of the pattern engine) are processed outside of the scope
of any AST. The current node for a given rule can be accessed using
this or
copy. This corresponds to the source and target trees where
this is the source (read-only) node that should not be changed and
copy is the target (writable) node that can be changed. Both of these special variables are of type
com.goldencode.p2j.uast.Aast and one can call any public method directly on these variables.
A rule's condition and its actions are each represented as logical,
algebraic
expressions in infix
notation (e.g.,
a + b = c
), which is a natural convention
for most people. It is possible to embed
constants,
variables,
functions, and a variety of
operators within these
expressions. An action typically utilizes one or more function
invocations within its expression, since an algebraic comparison in and
of itself does not invoke any behavior beyond simple evaluation to a
boolean result. A
symbol
resolver is used to substitute runtime data for the constants,
variables, and function invocations in an expression. This data
may be extracted from the environment, from user-provided variables, or
from an AST.
See
this document for the syntax of the expression language.
Symbol Resolver
A specialized symbol resolver is used by the pattern engine. This
symbol resolver is aware of the properties and general structure of an
AST. Thus, expressions can reference properties of the AST, such
as its token type, its text, its source filename, etc.
Furthermore, the symbol resolver can answer questions about the
structure of an AST via user functions, such as who its ancestors,
siblings, and descendants are. The pattern engine inspects an
AST, updating the state of the symbol resolver as it visits each node
in the AST, such that the symbol resolver always has the correct AST
node in context when a rule's expressions are applied.
Rulesets
Rules are organized into
rulesets.
Rulesets have a notion of the programming language tokens they expect
to encounter within the ASTs they are required to process. They
optionally may define a library of named expressions or functions which
can be executed or invoked within expressions by an their name.
Rulesets differentiate between six types of rules, each of which is
applied at a different time for a different purpose:
- init-rules
- walk-rules
- descent-rules
- ascent-rules
- next-child-rules
- post-rules
A ruleset is applied against each AST being processed by a run of the
pattern engine.
Init-rules are run once at the beginning of the
inspection of an AST hierarchy. They are applied in a pre-defined
order, within the context of the
root
AST node of that hierarchy.
Walk-rules are run once per AST node
visited. As the tree is inspected, each node in the tree (or in a
filtered view of the tree) is visited by the engine. The entire
list of walk-rules is applied in a pre-defined order, within the
context of each node, before the next node is visited.
Descent-rules are run in between the walk-rules for a parent node and the walk-rule for that parent's first child.
Ascent-rules are run in after all of a parent's last
child's children nodes are done being processed. This is the
moment when the tree walk would naturally "ascend" back up the tree to
the parent node.
Next-child-rules are run in before the walk-rules for rightmost of two
sibling nodes, after all the child nodes of the leftmost sibling have
been processed. This is equivalent to when the tree walk would
move between the two sibling nodes.
Post-rules
are run once at the end of the inspection of an AST hierarchy.
They are applied in a pre-defined order, within the context of the
root AST node of that hierarchy.
Each ruleset actually processes a related pair of AST
hierarchies: a read-only,
source
AST and a writable,
target
AST. The target AST is simply a deep copy of the source
AST. The source AST is used to drive the node-visiting logic, and
as such, should not be modified by a rule's actions. The target
AST is intended to be modified as a ruleset sees fit.
Pipeline
Rulesets may be chained together into a pipeline, such that the
writable, target AST modified by one ruleset becomes the read-only,
source AST available to the next ruleset in the pipeline.
Additionally, each ruleset has the opportunity to create a filtered
view on its target AST, to which it
adds only those AST nodes which meet some criteria defined by its
rules. This view is also made available to the next ruleset in
the pipeline, which has the option to iterate over the filtered view
only, or over the entire tree. The designation to iterate over
the view vs. the tree is known as the ruleset's
input mode.
Pattern Workers
The implementation of the functionality available to the expressions
defined within rules is provided by
pattern
worker objects. These objects perform the grunt work of
exposing variables and user functions, such that they can be used
within expressions. They may also be used to implement actions
for the purpose of collecting, maintaining, and otherwise using
information specific to a particular set of functionality, such as
statitistical analysis or AST transformation. It is the pattern
worker objects that provide the implementation behind the symbol
resolver's ability to expose AST properties and structural
relationships, for instance, or to collect statistics about language
structure.
Certain pattern workers are generally useful across a wide range of
applications. These are registered directly by the pattern engine
as default workers. The following are loaded by default:
- CommonAstSupport
- PropsWorker
- DictionaryWorker
- FileOperationsWorker
Event Model
The pattern engine is a tree processing framework where the engine
handles the details of walking the tree and the programmer uses hooks
or callbacks that are triggered based on specific events that occur
during the walk. The pattern engine is programmed using a
"rule-set"
or a combination of rule-sets that are executed in series. These
programs are written in an XML-based markup language. Each
rule-set is
separated into sections that define the type of tree walk event in
which that section is executed. There are 6 different event types:
Event
Type
|
Occurs
|
Node
in Scope
|
Notes
|
<init-rules> |
once for each tree, before any
other rule executes
|
tree root
|
Useful for initialization logic,
especially for variables that are unique per-tree (a single instance is
used to track state across an entire tree walk.
|
<walk-rules> |
once for each node, this is the
"primary" visit to the node
|
the current node being visited
|
Most logic is inserted in these
rules. Anything that does not depend upon the order of processing
children is usually placed here.
|
<descent-rules> |
when the tree walk descends from
a parent to its first child (it occurs after the parent has gotten its
<walk-rules> and before the child gets its <walk-rules>) |
the parent node
|
Very useful for implementing
scope-aware data structures (such as a stack). Scopes are
normally pushed (added) on descent.
This never occurs for nodes that have no children.
|
<next-child-rules> |
when the tree walk moves between
siblings (it occurs after the current [left] sibling has gotten its
<walk-rules>, and after all processing is complete for that left sibling's descendants and before the next [right] sibling gets its
<walk-rules>) |
the parent node |
In rule-sets that convert a tree
to a human readable form, these rules
are highly useful for implementing differential processing based on the
index of the child being processed (the "nextChildIndex"). For
example, in converting the tree form of a Java method call
(assume that the first child is the Object instance upon which the
method is called) into a
source code version, one might need to output a ".methodName(" after
the first child has been processed but before all other children are
processed.
This is generally useful in any case where processing needs to occur
between the processing of any two children.
This never occurs for nodes that have no children. |
<ascent-rules> |
when the tree walk ascends from
the last child to its parent (it occurs after the last child has gotten
its <walk-rules> and after all processing is complete for the last child's descendants) |
the parent node |
Very useful for implementing
scope-aware data structures (such as a stack). Scopes are
normally popped (deleted) on ascent.
In rule-sets that convert a tree to a human readable form, these rules
are highly useful for closing processing after all children are
processed. For example, in converting the tree form of a function
call
(assume that the children represent function call parameters) into a
source code version, one might need to output a closing ")" after all
parameters are emitted.
This never occurs for nodes that have no children.
|
<post-rules> |
once for each tree, after all
other rules have executed |
tree root |
Useful for termination logic,
especially for implementing any filesystem or database persistence that
is done on a per-tree basis. |
To hook a particular event for a specific node, just insert a rule that
matches that node using its logical condition. Then
inside this rule, you can put any combination of actions and nested
rules (which can have any combination of actions and rules inside
it).
While the pattern engine will visit every node (and trigger every one
of the actions as needed), if you have no matching expressions to a
given node, the node will be ignored.
The following diagram shows a trivial example. The legend lists
each
event in the order that the events are triggered. The event
number
matches the numbers on the tree nodes and the arrows. The arrows
designate events that are triggered in between <walk-rules>,
while moving between the visits of two related nodes. The numbers
in a
node specify the case where there is a specific visit to a node
(<init-rules>, <walk-rules> and <post-rules>).
Why Use the Pattern Engine?
The pattern engine is a tree processor. It walks every node in
the tree in a very defined manner (as shown by the previous
diagram). A rule set is really a set of call-backs that are
invoked in response to events occuring during the tree walk. So,
if a rule set has walk-rules, each one is executed (in the order of
occurance) when the walk of any node triggers the walk-rules
"event". For each walk-rule in that rule set, the condition is
evaluated and if it evaluates true, then the body (actions or rules) of
the rule will trigger (and so on as things are nested inside).
But if a rule set has no rules that evaluate true for a given node,
then that node is essentially ignored by the given rule set, even
though it will be processed by the engine.
There is support for if/else conditional
processing (this is the purpose of a <rule> which has a condition
and then actions, rules and loops that can be used on either false or
true) and loops (<while> which can be easily used to also
simulate a for). Along with the variable support included (any
kind of
object can be instantiated or obtained and then all of its methods
are available in rules and actions) and the assignment operator
support, there is little that cannot be done directly in a rule set.
There is great power in letting the pattern engine handle the tree
walking. The tree processing logic is expressed in a very compact
form
(much smaller than the corresponding Java program that implements the
same logic) while still
taking advantage of nearly every feature of the Java language.
The
only Java features that are not provided: array subscripting, static
methods and some operators
like "new".
The pattern engine removes the duplication of tree walking code that
would be required in any set of Java programs that hard-code tree
processing. This reuse reduces development time, makes the
programs standardized, leverages tested/debugged/documented code and
allows the developer to focus on the logic of the pattern matching and
the objectives of what to do when a pattern is found.
Package Class Hierarchy
The primary classes of interest are those that correspond with the
constructs described in the introduction:
Rule
is the implementation
of the conditional rule.
AstSymbolResolver
is the implementation of the symbol
resolver.
RuleSet
is the
implementation of the ruleset.
PatternEngine
is
the implementation of the pattern
engine, and also manages the ruleset pipeline.
PatternWorker
is
the interface which all pattern worker
objects implement. AbstractPatternWorker
is a partial implementation of this interface to simplify the creation
of concrete pattern worker implementations.
High Level Flow
The PatternEngine is launched as a command line driver, or it may be
constructed, configured, and invoked programmatically. When
launched from the command line, a configuration profile must be
specified. Processing is invoked using the
run method,
which accepts the name of the profile to load and execute. The
logic flow is as follows:
- begin
- perform global initialization processing:
- create a new symbol resolver
- initialize and register default worker objects
- load the pattern configuration profile
- set a default specification to find ASTs if not provided by
profile
- apply global, user-defined init-rules
- for each AST to be processed:
- for each ruleset defined in the pipeline:
- make a deep copy of the source AST hierarchy; this copy
becomes the editable copy for the current ruleset pass and
becomes the read-only source AST for the next ruleset in the pipeline
- perform ruleset-specific, initialization processing:
- configure the symbol resolver for the current ruleset
- set the source and copy root ASTs as the
current ASTs in the symbol resolver
- apply user-defined init-rules for the ruleset
- walk the AST (or the filtered view of results of previous
ruleset) and generate the proper events
- apply all walk-rules of the ruleset to the each AST
visited
- apply descent-rules upon moving to the first child
- apply next-child-rules upon moving from one child to
another
- apply ascent-rules upon moving back up the tree from
the last child
- perform ruleset-specific termination processing:
- set the source and copy root ASTs as the
current ASTs in the symbol resolver
- apply user-defined post-rules for the ruleset
- use the copy AST from the last ruleset as the source
AST for the next ruleset
- perform global termination processing:
- apply global, user-defined post-rules
- call each registered worker object's termination processing
hook
- null out the symbol resolver
- end
Configuration
A pattern processing engine configuration profile defines everything
necessary for the engine
to process one or more ASTs against a pipeline of rulesets. A
profile is defined in one or more XML documents. A master
document, the
pipeline profile
document, defines:
- the ASTs to be processed;
- the default programming language to use when processing rulesets;
- the pipeline of rulesets;
- global initialization and termination rules to be applied at the
beginning and end of the processing session, respectively.
The pipeline profile may refer to zero or more
ruleset profile documents. A
ruleset profile document defines:
- the rules to be applied against each AST;
- the programming language to use when resolving language token
constants;
- an optional library of named expressions which may be referenced
by alias from other expressions;
- an optional list of other ruleset profiles from which public,
named expression libraries are to be included for the current ruleset's
use.
Pipeline profile documents always have a filename extension of
xml
.
Ruleset profile documents always have a filename extension of
rules
.
A
ConfigLoader
object is
responsible for parsing the pipeline profile document, for resolving
any references to external ruleset profile documents, and for
configuring the pattern engine using all of the information extracted
from the resulting network of documents.
Pipeline and Ruleset
Profile File Format
The same general XML grammar is used for both the master profile, which
drives the entire pipeline of ruleset processing, and for each,
individual ruleset within the pipeline. A
pipeline profile must use the
.xml
file name extension; a
ruleset
profile must use the
.rules
file name
extension. The ruleset profile file format is a subset of the
pipeline profile format. The following describes the grammar:
Name
|
Scope
|
Num
|
Children
|
Description
|
cfg
|
P
|
1
|
ast-spec
worker
include
variable
init-rules
descent-rules
rule-set
ascent-rules
post-rules
|
Outermost containing element
(root) of
the pipeline profile document.
No attributes are defined for this element.
|
ast-spec
|
P
|
0+
|
none
|
Defines a directory and file
specification used to find a list of either (a) persisted AST files to
process; or (b) root node entries in a call graph. If no
such element is provided, such a specification must be provided at the
pattern engine command line; otherwise, no processing will occur.
Attribute
|
Required
|
Description
|
directory |
yes
|
An absolute or relative
path to a directory which is the root of a
directory tree containing .ast files (persisted ASTs), or which
contains one or more .graph files (call graph definitions). If
relative, the directory will be processed relative to the value of the
P2J_HOME system property. |
file-spec |
yes
|
A regular expression used
to identify the names of source files (i.e., .p or .w
files) to be processed. The appropriate .ast or .graph
file name extension will be appended automatically, depending upon the
pattern engine mode of operation. |
|
worker
|
P
|
0+
|
none
|
Defines a PatternWorker
implementation for the pattern engine to load for a specialized
task. Several pattern workers are loaded by default to provide
standard services. This element gives the user the opportunity to
load additional worker objects, to provide additional variables and/or
user functions for a particular purpose.
Attribute
|
Required
|
Description
|
class |
yes
|
The fully
qualified name of the PatternWorker implementation class to load
within a worker element. |
|
include
|
P/R
|
0+
|
none
|
Loads a function library from
the given rules filename. Whether this filename is
unqualified or partially qualified, each segment in the pattern path is
searched for the first file that matches. That file is loaded and any
public func-library element in that file is made accessible.
Attribute
|
Required
|
Description
|
name |
yes
|
The unqualified or partially
qualified name of the rules file to load. |
|
func-library
|
P/R
|
0+
|
expression
function
|
Provides a set of named
expressions or functions which can be used in the current rule file or
imported into an external rule file for use.
Attribute
|
Required
|
Description
|
access |
no
|
The string "public" or
"private". Private access means that the function library can
only be accessed from within the same rules file. Public access
allows the function library to be loaded (and then accessed) by any
rules file. By default a function library is public. |
|
expression
|
P/R
|
1+
|
none
|
Use of this element is discouraged. This should be considered deprecated.
Contains a single expression which is evaluated unconditionally. This is similar to an action element, except:
- actions cannot be named
- actions can be set to execute based on the boolean result of an enclosing rule element, expressions cannot do this
When directly contained inside of a func-library element, a named expressions is the equivalent to a named function, except that the expression can only contain a single expression and none of the structure or contained elements of a function can be used.
Attribute
|
Required
|
Description
|
name |
no
|
The name of the expression. This is the first parameter (a a string literal) to execLib() or evalLib(). |
|
function |
P/R |
1+ |
parameter
variable
return
rule
while
expression |
Defines a named block of logic that:
- Is only invoked explicitly by the use of CommonAstSupport
execLib() and evalLib() methods. This code does not get implicitly
called based on tree walking.
- Can be invoked at any time in an expression, from any action or rule (even from other functions).
The logic can be specified as a single expression or as a set of one or more contained rule, while or expression elements. If a single expression is specified, this is done without any contained elements, instead the text of the function element itself must be a single valid expression that can be evaluated to a result. This non-element mode for defining a function is considered deprecated and should not be used for new functions.
By default, if no return
element is defined, the return value is the same as the last executed
expression in the function (whether this expression is directly
specified in the function element's text or whether this is a contained
rule, action, expression or while element).
Variables that are in enclosing scopes can be accessed but this is
discouraged. The function element is considered a new scope and can
hide variables which have the same name in enclosing scopes.
Some problems to note:
- Recursion is not allowed at this time.
- Actions can't be embedded directly in a function, but can be embedded inside a rule in a function.
- Pattern worker access is based on the worker definitions in
the calling rule set NOT the rule set in which the function is defined.
Attribute
|
Required
|
Description
|
name |
yes
|
The name of the function. This is the first parameter (a a string literal) to execLib() or evalLib(). |
|
variable
|
P/R
|
0+
|
none
|
Declares a user variable to use
in rule expressions. Variables are global across rulesets if
defined in the scope of a pipeline profile. Variables are local
to an individual ruleset if defined within the scope of a ruleset
profile.
Attribute
|
Required
|
Description
|
name
|
yes
|
Name which will be used to
refer to variable within user expression tests and assignments.
|
type
|
yes (if init does not exist)
|
Data type of the
variable. Valid entries are int, long , double ,
string, boolean or any fully qualified Java classname.
|
init
|
yes (if type does not exist)
|
An expression which when evaluated yields a result that is the initial value (and type) for the variable.
|
|
parameter |
P/R |
0+ |
none |
Declares a parameter to use
in a named function. This is equivalent to a local variable,
except the value is assigned by the caller in the parameter list (2nd
and subsequent parameters to the execLib() and evalLib() functions).
Note that the order, type and number of the parameters in the
function definition must match those passed in the function invocation.
Attribute
|
Required
|
Description
|
name
|
yes
|
Name which will be used to
refer to parameter within the function.
|
type
|
yes (if init does not exist)
|
Data type of the parameter. Valid entries are int, long , double ,
string, boolean or any fully qualified Java classname.
|
init
|
yes (if type does not exist)
|
An expression which when evaluated yields a result that is the initial value (and type) for the parameter.
|
|
return |
P/R |
0 or 1 |
none |
Declares a local variable, whose value will be returned to
the caller of the function when the function completes. This also
determines the type of the function's return value.
Attribute
|
Required
|
Description
|
name
|
yes
|
Name which will be used to
refer to return variable within the function.
|
type
|
yes (if init does not exist)
|
Data type of the return value. Valid entries are int, long , double ,
string, boolean or any fully qualified Java classname.
|
init
|
yes (if type does not exist)
|
An expression which when evaluated yields a result that is the initial value (and type) for the return variable.
|
|
rule-set
|
P/R
|
1+
|
expr-library
include
variable
init-rules
descent-rules
walk-rules
ascent-rules
post-rules
rule-set
|
Defines a set of rules and
resources used by the pattern engine to process the information stored
within an AST. Rulesets which are tightly coupled to the purpose
of the pipeline's task may be defined inline, within the main pipeline
profile document. Alternatively, more general purpose, reusable
rulesets may be defined individually, within their own .rules
files.
If defined within a .rules file, this element is the
outermost containing element (root) of the profile document (rather than having an outermost cfg element).
If defined within a .xml file (the master profile
document), this element can either define an entire ruleset inline, or
refer to an external ruleset defined within a .rules
file. If the element contains the name attribute
(see below), it refers to an external ruleset profile. Otherwise,
it is expected that an inline ruleset definition follows the opening
tag.
This element defines a new scope for the purpose of resource access
such as variables. Resources in enclosing scopes can be accessed
but enclosing scopes cannot access resources defined within this
element.
Attribute |
Required
|
Description
|
load-condition
|
no
|
This attribute is processed only when the rule-set element is not the root element
(to allow a function library to be included by other files unconditionally).
It contains a comma-separated list of flags which will be checked only when conditional
rule-set processing is activated (the PatternEngine.conditionalRuleSets field is set to true ).
When conditional rule-set processing is activated, it will process all the rule-sets which have at least one load-condition
flag matching one from the PatternEngine.activatedFlags set.
The following APIs from the PatternEngine class are related to conditional rule-set
processing:
void setConditionalRuleSets(boolean) to enable or disable conditional rule-set processing.
boolean isConditionalRuleSets() to check the current state of conditional rule-set processing.
void addConditionalFlag(String) to add a new flag to the PatternEngine.activatedFlags set.
void removeConditionalFlag(String) to remove a flag from the PatternEngine.activatedFlags set.
boolean isFlagActivated(String) when conditional rule-set processing is enabled, it will check if the given flag
is part of the activated flags.
boolean activeFlags(Set) when conditional rule-set processing is enabled, it will check if any flag from
the given set is part of the activated flags.
|
name
|
no
|
If present, this attribute
defines the name of an external ruleset definition which is included by
reference in the master pipeline profile. It should not include
the .rules file name extension. If this attribute
is present, any information contained within the opening and closing rule-set
tags is ignored.
There are 2 ways to use this external reference:
- Top-Level - when the rule-set element is contained directly within a cfg element, this is considered a top-level rule-set. Such a rule-set will have its contents executed in its entirety with a tree walk that is specific to that rule-set. Any other top-level rule-set elements in the same cfg
element will be executed in order. This provides the pipelining
feature to the master profile document. In a file with 2
top-level rule sets, the first rule set will be processed through a
complete tree walk (with all nodes being visited and all events being
triggered) before any processing in the second rule set will occur.
- Merge - when the rule-set element is contained within another rule-set element, this has the effect of merging the named rules file into the containing rule-set
element as if those rules were defined there directly. This
allows rules to be split into multiple different rules files but then
walked as a combined set. In such a case, each of the merged rule
sets will have their events triggered in the same order as the rule-set elements are merged. In a rule-set that contains merged rule-set elements, the first event of the tree walk will be processed for each merged rule-set in turn and then the next tree walk event will be processed for each rule-set in turn and so on, until the tree walk is complete. These rule-set elements are really part of a simultaneously executing single tree walk that occurs for the top-level rule-set element.
If this attribute is not provided, an inline ruleset definition is
assumed.
|
input
|
no
|
Defines the type of input
for this ruleset to process. Acceptable values are tree
or view . A value of tree indicates
that this ruleset should be applied against the entire AST hierarchy,
regardless of the output of the previously processed ruleset in the
pipeline. A value of view indicates that this
ruleset should be applied only against the individual AST nodes which
the previous ruleset in the pipeline added to its filtered view of ASTs.
If this attribute is not provided, a default value of tree
is assumed.
|
honor-hidden |
no |
If present, this boolean value determines if the tree
walk will include nodes that have their hidden flag set to true.
An AST can have its hidden flag set by com.goldencode.p2j/uast.Aast.setHidden(). If honor-hidden
is true, then hidden nodes will NOT be included in the tree walk. Note
that "manual" dereferencing of nodes (e.g. this.getChildAt(4)) will
still return hidden nodes. |
|
init-rules |
P/R |
0-1 |
rule |
A list of rules to run as
initialization processing, either globally
across all rulesets in the pipeline, or locally, within an individual
ruleset. If defined within the pipeline profile document, the
init
rules are global; if defined within a ruleset profile document,
the
init rules are local to that ruleset.
Global init rules are applied once per pipeline, after pattern workers
are initialized, but before any individual rulesets are
processed. No
AST is in scope for purposes of rule expression processing when the
global init rule list is applied.
Local init rules are applied each time a ruleset is applied against an
AST, before any of the ruleset's walk rules are applied. The root
node
of the AST being processed by the current ruleset is in scope for
purposes of rule expression processing when a local init rule list is
applied. So, any rule expressions which refer to properties of an
AST will use the root AST to resolve
AST property variables or to apply changes against the AST.
No attributes are defined for this element.
|
descent-rules
|
P/R
|
0-1
|
rule
|
|
walk-rules
|
R*
|
0-1
|
rule
|
A list of rules to apply against
an iteration of AST nodes. Depending upon the value of the input
attribute of the enclosing rule-set element, this list of
rules is either applied against every AST node in the current AST
hierarchy, or against only those AST nodes included in the filtered
view defined by the previous ruleset in the pipeline.
No attributes are defined for this element.
|
ascent-rules
|
P/R
|
0-1
|
rule
|
|
post-rules
|
P/R
|
0-1
|
rule
|
A list of rules to run as
termination processing, either globally
across all rulesets in the pipeline, or locally, within an individual
ruleset. If defined within the pipeline profile document, the
post
rules are global; if defined within a ruleset profile document,
the post rules are local to that ruleset.
Global post rules are applied once per pipeline, after all individual
rulesets have been processed, but before pattern worker termination
processing takes place. No
AST is in scope for purposes of rule expression processing when the
global post rule list is applied.
Local post rules are applied each time a ruleset is applied against an
AST, after any of the ruleset's walk rules are applied. The root
node
of the AST being processed by the current ruleset is in scope for
purposes of rule expression processing when a local post rule list is
applied. So, any rule expressions which refer to properties of an
AST
will use the root AST to resolve AST property variables or to apply
changes against the AST.
No attributes are defined for this element.
|
rule
|
R*
|
1+
|
rule
while
action
|
Defines a logical condition and
an optional list of actions to perform if that condition is met or not
met. The condition and actions are each defined as logical
expressions. They may contain system/AST variables (exposed by
pattern worker objects) or global or local user variables defined using
the variable element.
CDATA may be used to
contain the condition however, it is common to put the condition
directly in the element text. Note that a rule with no contained elements is the functional equivalent of an action.
This element defines a new scope for the purpose of resource access
such as variables. Resources in enclosing scopes can be accessed but
enclosing scopes cannot access resources defined within this element.
Attribute
|
Required
|
Description |
on
|
no
|
If true ,
this action is executed if the enclosing rule's condition evaluates to true .
If false , this action is executed if the enclosing rule's
condition evaluates to false . If not provided,
defaults to true .
|
|
while
|
R*
|
0+
|
rule
while
action
|
This is a looping
construct. While the logical condition is true, the optional list
of rules, whiles and actions are performed. Any contained
elements which use the on="false" attribute are never executed as the
concept of a while loop inherently precludes this. The condition
is defined as a logical
expression. They may contain system/AST variables (exposed by
pattern worker objects) or global or local user variables defined using
the variable element.
Attribute
|
Required
|
Description |
on
|
no
|
If true ,
this action is executed if the enclosing rule's condition evaluates to true .
If false , this action is executed if the enclosing rule's
condition evaluates to false . If not provided,
defaults to true .
|
|
action
|
R*
|
0+
|
none
|
Defines an action to perform in
the event the enclosing rule's condition is met or not met.
Attribute
|
Required
|
Description |
on
|
no
|
If true ,
this action is executed if the enclosing rule's condition evaluates to true .
If false , this action is executed if the enclosing rule's
condition evaluates to false . If not provided,
defaults to true .
|
|
TODO: provide example of each level of profile.
Developing Rule Sets
Let the pattern engine handle the tree traversal. As a developer
of rule sets, you must only insert call-backs for those events that you
care about. So you don't control the traversal, you instead
control which events matter. A node is "skipped" by simply never
matching it in the condition portion of the rule. This is simply
a matter of carefully designing the rules such that they can only match
the exact tree construct that you are looking for, and no others.
This is where it becomes important to review the related nodes and
constructs in the parser that creates the tree since this is
*the* authoritative source of what is
*possible*. The general approach to
identifying the pattern:
- Identify the token types that you are looking for. Writing
a small testcase, running it through the parsere and then reviewing the
generated AST is a good way to do this.
- Identify the location(s) in the parser at which the construct you
are trying to match is created. This shows you the possible
structure (or context) in which the node(s) will be found.
- Search in the parser source for all other instances of the token
types you are matching on. This will tell you if you need to make
your rule conditions more specific in order to avoid false positives
(matching where a match really doesn't exist).
Many input data streams are heavily context sensitive. This means
that the same tokens can be used differently based on context. In
such context sensitive input streams, the matching rules generally must
be more specific, often have a condition that tests data from multiple
related nodes at once (e.g. the current node, the parent node, the
parent's parent...).
Draw the pattern (the tree relationship/structure) out on paper as the above steps are taken. Once
the unique pattern is defined, encoding this in rule form is usually
very straightforward. Then add logic in the body of the rule to
do whatever it is that is the objective of the rule.
How to Use
Scoping During a Tree Walk
In many cases, one may need to keep a stack or a scoped dictionary of
data to keep track of data which is inherently scoped. For
example, in traditional languages the meaning of a particular symbol is
different depending upon the subtree which is being visited. Due
to the nesting features of many languages, it is often the case that
there can be an arbitrary number of scoping levels.
Often a simple stack can be used for this, but a scoped dictionary is
especially useful if you have > 1 variable which has the same
scoping semantics. This is the more general case, so the
following text will show how that works.
- The DictionaryWorker is
already loaded as one of the base workers so it doesn't need to be
added manually. Examine this class to understand how it
works. The key point is that each dictionary is named using an
arbitrary string and you may have any number of variables stored in
this named dictionary. Each variable is also named using a string
and can be gotten, put and queried to see if the dictionary contains
such a variable. A stack of maps is used and each level in the
stack is a different scope. You can add and delete scopes (maps)
and when a variable is accessed, the stack of maps is searched from the
top down until the variable is found or there is nothing more to
search.
- You may want to "clear" your dictionary at a known point,
especially if you are processing multiple trees. In this example,
2 variables named "process" and "num" are "prepopulated" with the value
0:
- <init-rules>
- <!-- keep track of the current scope -->
-
<rule>deleteDictionary("my_scoped_dictionary")</rule>
-
<rule>addDictionaryLong("my_scoped_dictionary", "process",
#(long) 0)</rule>
-
<rule>addDictionaryLong("my_scoped_dictionary", "num", #(long)
0)</rule>
- </init-rules>
- Find the correct place to add scopes. This is typically in
a walk-rule or in a descent-rule. You do this to ensure that
anything deeper in the tree will see the latest version of the
variables. In this examples, new versions of the variables are
added to the new scope, though this could be split into rules triggered
elsewhere.
- <walk-rules>
- <rule>type == something
-
<action>addScope("my_scoped_dictionary")</action>
-
<action>addDictionaryLong("my_scoped_dictionary", "process",
getSomeDataHere())</action>
-
<action>addDictionaryLong("my_scoped_dictionary", "num",
getSomeOtherDataHere())</action>
- </rule>
- </walk-rules>
- Find the correct place to delete scopes. This is usually
done in an ascent-rule since this is the moment when the subtree rooted
at a particular node is known to have completed processing.
- <ascent-rules>
- <rule>type == something
-
<action>deleteScope("my_scoped_dictionary")</action>
- </rule>
- </ascent-rules>
- Use the dictionaryContains() or lookupDictionary*() methods to
access the data and make decisions. Since these variables are
scoped using a stack, this technique cleanly handles any amount of
nesting. References to these variables always get the most recent
version and when the variables go out of scope they automatcially get
cleaned up by deleting the top scope from the stack.