public class E4GLParser extends antlr.LLkParser implements E4GLParserTokenTypes
This is NOT a general-purpose HTML parser. In fact, it is "dumbed down" from the full HTML specification to the subset handled by WebSpeed's very limited parsing. That parsing ignores certain valid features of HTML encoding, so this parser can ignore those features as well.
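The WebSpeed product supports 4 types of program:
1. static HTML with no embedded 4GL;
2. HTML with embedded 4GL (E4GL);
3. CGI Wrapper (4GL source code that writes the HTML itself);
4. HTML Mapping.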
This preprocessor handles cases 1 and 2 above. Cases 3 and 4 do not have any embedded 4GL to preprocess, so they do not need to run through this parser. At first glance, case 1 (static HTML with no E4GL) seems like it could be skipped, since turning it into 4GL is costly in terms of runtime efficiency. However, when such a file is run through this preprocessor, the output is still a 4GL procedure (or include file). Since that procedure or include file can itself be included in other E4GL programs, all static HTML and E4GL code must be converted into 4GL source code.
Interestingly, the 4GL source code produced by this preprocessor is the same as if one had coded a CGI Wrapper (case 3 above) instead of using static HTML or E4GL. At runtime there is no difference between these 3 cases; the only difference is whether this WebSpeed preprocessor is run first.
HTML Mapping (case 4) is not addressed by this preprocessor since the HTML involved in that solution is processed at runtime.
The filtering that happens can be customized to match the output of a specific E4GL implementation. At this time both WebSpeed and Blue Diamond implementations are supported.
This parser is heavily dependent upon the following code:
- Options - stores configuration that can be shared by multiple modules
- CompatibilityHelper - hides the differences in modes between WebSpeed and Blue Diamond
- E4GLPreprocessor - provides the external interface for the parser
There are 2 external entry points for the parser: preprocess(com.goldencode.p2j.e4gl.Options, java.io.PrintStream) and scan_options(com.goldencode.p2j.e4gl.Options).
Parser features:
- META or WSMETA elements with attributes of NAME="wsoptions" or HTTP-EQUIV="content-type"; in both cases the CONTENT attribute is read with the following options being honored
- … line prefix (`PUT Stream WebStream UNFORMATTED '`) and a line suffix (`~n'.`). There are many variations on these depending on what is being output and in which order; in addition, the actual text used differs depending on the compatibility helper in use (WebSpeed or Blue Diamond).
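Because the line prefix and suffix place each line of HTML inside a 4GL string literal, characters that are special inside such a literal have to be escaped first. The sketch below illustrates that idea for complete lines only. It is not the parser's actual code: the class `HtmlToPut` is hypothetical, the WebSpeed-mode prefix/suffix text is taken from above (assuming the trailing `.` is the 4GL statement terminator), and the set of escaped characters (`~`, `'` and `{`, each prefixed with the 4GL escape character `~`) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of E4GLParser.
final class HtmlToPut {
    static final String PREFIX = "PUT Stream WebStream UNFORMATTED '";
    static final String SUFFIX = "~n'."; // assumes '.' ends the 4GL statement

    // Assumption: ~, ' and { must be escaped inside a 4GL string literal
    // ('{' would otherwise be seen by the 4GL preprocessor).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '~' || c == '\'' || c == '{') {
                sb.append('~'); // tilde is the 4GL escape character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Produces one PUT statement per input line; String.split drops the
    // empty trailing element, so a final newline adds no extra statement.
    static List<String> convert(String html) {
        List<String> out = new ArrayList<>();
        for (String line : html.split("\n")) {
            out.add(PREFIX + escape(line) + SUFFIX);
        }
        return out;
    }
}
```

In the real parser this work is spread across output() and the escape-comment helpers, and the exact text comes from the CompatibilityHelper, so Blue Diamond mode would use different strings.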
TODO: before this output can be optimally converted, it is likely that the expansions, replacements and other state of the preprocessing will need to be recorded by the parser into a hints file. Much of the data to be stored will be in the form of a "section definition" which stores the start line/column and end line/column of a section of either the original source file or of the target file. The following will be needed:
Modifier and Type | Field and Description |
---|---|
static java.lang.String[] | _tokenNames |
static antlr.collections.impl.BitSet | _tokenSet_0 |
static antlr.collections.impl.BitSet | _tokenSet_1 |
static antlr.collections.impl.BitSet | _tokenSet_2 |
private Options | cfg - Configuration values to honor during preprocessing. |
private CompatibilityHelper | hlp - Helper to emit headers, footers, comments and other text. |
private java.io.InputStream | in - Source for raw input. |
private boolean | lineBegin - Flag to denote that the output is starting a new line. |
private static char | NL - Newline character to search for in line operations. |
private java.io.PrintStream | out - Destination for preprocessed output. |
private boolean | wasExprClose - Flag to denote that the last output was an expression escape close. |
private boolean | wasHtml - Flag to denote that the last output was HTML. |
private boolean | wasStmtClose - Flag to denote that the last output was a statement escape close. |
Fields inherited from class antlr.Parser: astFactory, inputState, returnAST, tokenNames, tokenTypeToASTClassMap, traceDepth
Fields inherited from interface E4GLParserTokenTypes: BACK_TICK, CLOSE_COMMENT, CLOSE_CURLY_EQ, CLOSE_PCT, CLOSE_QUESTION, CLOSE_TAG, CLOSE_TAG_NO_CONTENT, COLON, CONTENT, DIGIT, DOT, DQUOTE, ENCODED_CHAR, EOF, EQUALS, EXCLAIM, GT, HEX_DIGIT, HTML_DSTRING, HTML_SSTRING, HTTP_EQUIV, HYPHEN, JUNK, LANGUAGE, LEFT_CURLY, LETTER, LT, META, NAME, NULL_TREE_LOOKAHEAD, OPEN_COMMENT, OPEN_CURLY_EQ, OPEN_END_TAG, OPEN_PCT, OPEN_PCT_EQ, OPEN_QUESTION, OPEN_START_TAG, OTHER, PERCENT, QUESTION, RIGHT_CURLY, SCRIPT, SERVER, SLASH, SQUOTE, SYMBOL, UNDERSCORE, UNKNOWN, WS, WS4GL, WSE, WSMETA, WSPEED, WSS
Modifier | Constructor and Description |
---|---|
public | E4GLParser(antlr.ParserSharedInputState state) |
public | E4GLParser(antlr.TokenBuffer tokenBuf) |
protected | E4GLParser(antlr.TokenBuffer tokenBuf, int k) |
public | E4GLParser(antlr.TokenStream lexer) |
protected | E4GLParser(antlr.TokenStream lexer, int k) |
Modifier and Type | Method and Description |
---|---|
java.lang.String | attribute_value(java.lang.StringBuilder sb) - Parse an EQUALS (=) followed by a SYMBOL or STRING. |
void | backtick_escape() - Match a starting BACK_TICK, arbitrary 4GL expression text and a closing BACK_TICK. |
void | closeInput() - Close the input stream. |
void | curly_escape() - Match a starting OPEN_CURLY_EQ, arbitrary 4GL expression text and a closing CLOSE_CURLY_EQ. |
java.lang.String | dstring() - Matches any data delimited by two matching double quote characters. |
private void | generateComment(java.lang.String escape, boolean expr) - Comment out the given text of an escape and output it to the stream. |
java.io.InputStream | getInput() - Get the input stream. |
private java.lang.String | getTokenText(antlr.Token next) - Get the text of the token unless it is of type ENCODED_CHAR, in which case the contents are converted from the '%XX' form into the character whose hexadecimal code is encoded. |
boolean | meta_tag(Options opts, java.lang.StringBuilder sb) - Match OPEN_START_TAG META or OPEN_COMMENT WSMETA, the following NAME or HTTP_EQUIV, then the CONTENT and a closing CLOSE_TAG or CLOSE_COMMENT. |
private static long[] | mk_tokenSet_0() |
private static long[] | mk_tokenSet_1() |
private static long[] | mk_tokenSet_2() |
private void | output(java.lang.String txt, boolean cvt, boolean html) - Output the given text to the stream while maintaining the new line state. |
private void | outputCloseComment(java.lang.String escape, boolean expr) - Comment out the given text of an expression or statement close element and output it to the stream. |
private void | outputCloseComment(antlr.Token escape, boolean expr) - Comment out the given text of an expression or statement close element and output it to the stream. |
private void | outputOpenComment(java.lang.String escape, boolean expr) - Comment out the given text of an expression or statement open element and output it to the stream. |
private void | outputOpenComment(antlr.Token escape, boolean expr) - Comment out the given text of an expression or statement open element and output it to the stream. |
void | percent_eq_escape() - Match a starting OPEN_PCT_EQ, arbitrary 4GL expression text and a closing CLOSE_PCT. |
void | percent_escape() - Match a starting OPEN_PCT, arbitrary 4GL statement text and then CLOSE_PCT. |
void | preprocess(Options cfg, java.io.PrintStream out) - Top-level entry point that parses the input stream and preprocesses the HTML and embedded 4GL into a valid 4GL program. |
void | question_escape() - Match a starting OPEN_QUESTION WSPEED CLOSE_TAG, arbitrary 4GL statement text and then CLOSE_QUESTION WSPEED CLOSE_TAG. |
void | reportError(antlr.RecognitionException re) - Writes error data to System.err, including a full stack trace. |
void | scan_options(Options opts) - Top-level entry point that parses the input stream to read the E4GL options embedded in any META or WSMETA elements. |
void | script_escape() - Match a starting OPEN_START_TAG SCRIPT, arbitrary 4GL statement text and then CLOSE_QUESTION WSPEED CLOSE_TAG. |
boolean | script_tag(java.lang.StringBuilder sb) - Match a starting OPEN_START_TAG SCRIPT, arbitrary attributes and then CLOSE_TAG. |
void | server_escape() - Match a starting OPEN_START_TAG SERVER CLOSE_TAG, arbitrary 4GL statement text and then CLOSE_QUESTION WSPEED CLOSE_TAG. |
void | setInput(java.io.InputStream in) - Set the input stream. |
java.lang.String | sstring() - Matches any data delimited by two matching single quote characters. |
java.lang.String | string() - Matches various string types (data delimited by two matching single or double quote characters). |
void | wse_escape() - Match a starting OPEN_COMMENT WSE, arbitrary 4GL expression text and a closing CLOSE_COMMENT. |
void | wss_escape() - Match a starting OPEN_COMMENT followed by a WSS or WS4GL, arbitrary 4GL statement text and then CLOSE_COMMENT. |
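The ENCODED_CHAR handling described for getTokenText() can be illustrated with a small string-level decoder. This is a sketch under stated assumptions, not the method itself: the class `EncodedChar` is hypothetical, it works on raw text instead of an antlr.Token, and it assumes the token text is exactly `%` followed by two hexadecimal digits. Anything else is returned unchanged, mirroring the "get the text of the token unless" behavior.

```java
// Hypothetical stand-in for the ENCODED_CHAR branch of getTokenText().
final class EncodedChar {
    // Converts "%XX" (two hex digits) into the character with that code;
    // any other text is returned as-is.
    static String decode(String text) {
        if (text.length() == 3 && text.charAt(0) == '%') {
            int hi = Character.digit(text.charAt(1), 16); // -1 if not hex
            int lo = Character.digit(text.charAt(2), 16);
            if (hi >= 0 && lo >= 0) {
                return String.valueOf((char) (hi * 16 + lo));
            }
        }
        return text;
    }
}
```

For example, `%41` decodes to `A` and `%20` to a space, while `%zz` is left alone because it is not a valid hex encoding.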
Methods inherited from class antlr.Parser: addMessageListener, addParserListener, addParserMatchListener, addParserTokenListener, addSemanticPredicateListener, addSyntacticPredicateListener, addTraceListener, consumeUntil, consumeUntil, defaultDebuggingSetup, getAST, getASTFactory, getFilename, getInputState, getTokenName, getTokenNames, getTokenTypeToASTClassMap, isDebugMode, mark, match, match, matchNot, panic, recover, removeMessageListener, removeParserListener, removeParserMatchListener, removeParserTokenListener, removeSemanticPredicateListener, removeSyntacticPredicateListener, removeTraceListener, reportError, reportWarning, rewind, setASTFactory, setASTNodeClass, setASTNodeType, setDebugMode, setFilename, setIgnoreInvalidDebugCalls, setInputState, setTokenBuffer, traceIndent
private static char NL
private Options cfg
private java.io.InputStream in
private java.io.PrintStream out
private CompatibilityHelper hlp
private boolean lineBegin
private boolean wasHtml
private boolean wasExprClose
private boolean wasStmtClose
public static final java.lang.String[] _tokenNames
public static final antlr.collections.impl.BitSet _tokenSet_0
public static final antlr.collections.impl.BitSet _tokenSet_1
public static final antlr.collections.impl.BitSet _tokenSet_2
protected E4GLParser(antlr.TokenBuffer tokenBuf, int k)
public E4GLParser(antlr.TokenBuffer tokenBuf)
protected E4GLParser(antlr.TokenStream lexer, int k)
public E4GLParser(antlr.TokenStream lexer)
public E4GLParser(antlr.ParserSharedInputState state)
public java.io.InputStream getInput()
public void setInput(java.io.InputStream in)
Set the input stream.
Parameters:
in - Source stream.
public void closeInput()
Close the input stream.
public void reportError(antlr.RecognitionException re)
Writes error data to System.err, including a full stack trace.
Overrides:
reportError in class antlr.Parser
Parameters:
re - The error on which to report.
private void output(java.lang.String txt, boolean cvt, boolean html)
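Output the given text to the stream while maintaining the new line state. HTML entity conversion is controlled by the cvt parameter. If the type of output is html, then any unsafe characters are escaped before being output as 4GL code.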
This method is highly sensitive to the member state variables that are maintained in cooperation with outputOpenComment() and outputCloseComment(). Together these 3 methods handle all output for the preprocessor.
Based on the current state variables, the parameters and the CompatibilityHelper instance, this method handles all output that is not a start or end element of a statement or expression escape. This means that all raw HTML and all content of expression/statement escapes (the embedded 4GL source) is handled here.
Of great importance is that each line of output is started and ended with the proper text (in HTML mode). Likewise, when shifting into and out of expression and statement escapes, any raw HTML must be properly terminated and resumed as needed. Some suffix processing of the end elements of statement and expression escapes is deferred to this method, because only here is the following content known, and that content determines which variant of the output is required.
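The line-state bookkeeping described above can be sketched in isolation. In this hypothetical `LineEmitter` (not part of the parser), HTML arrives in chunks whose boundaries need not match line boundaries, and a `lineBegin` flag, analogous to the field of the same name, decides when the line prefix must be emitted. Escaping, the CompatibilityHelper and the escape-transition flags (wasHtml, wasExprClose, wasStmtClose) are deliberately left out, and the prefix/suffix strings are supplied by the caller.

```java
// Hypothetical sketch of the lineBegin handling described for output().
final class LineEmitter {
    private final StringBuilder out = new StringBuilder();
    private final String prefix;
    private final String suffix;
    // true when the next character written starts a new output line
    private boolean lineBegin = true;

    LineEmitter(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    // Writes HTML text that may start or stop in the middle of a line.
    void html(String txt) {
        for (int i = 0; i < txt.length(); i++) {
            char c = txt.charAt(i);
            if (lineBegin) {
                out.append(prefix); // open the line, even an empty one
                lineBegin = false;
            }
            if (c == '\n') {
                out.append(suffix).append('\n'); // close the line
                lineBegin = true;
            } else {
                out.append(c);
            }
        }
    }

    String result() {
        return out.toString();
    }
}
```

Writing `"<p>a"` followed by `"b</p>\n<br>\n"` with prefix `P:` and suffix `:S` yields exactly one prefix and one suffix per line, even though the first line was split across two calls.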
Parameters:
txt - Contents to write to the stream. Must be non-empty and not null.
cvt - true to force HTML entity conversion before the text is output.
html - true to add end of line and beginning of line output around new line characters in HTML. Since this preprocessor converts the HTML into strings that are output via streams in the 4GL source, these strings need to be augmented at the beginning and end of each line.
private void outputOpenComment(antlr.Token escape, boolean expr)
Comment out the given text of an expression or statement open element and output it to the stream.
This method is highly sensitive to the member state variables that are maintained in cooperation with output() and outputCloseComment(). Together these 3 methods handle all output for the preprocessor.
Based on the current state variables, the parameters and the CompatibilityHelper instance, this method handles all output that is a start element of a statement or expression escape.
Of great importance is that, when shifting into expression and statement escapes, any raw HTML must be properly terminated or started as needed. Some deferred suffix processing of the end elements of statement and expression escapes is handled here, since the following content is now known and the correct variant of the output can be determined.
Parameters:
escape - The token with the text to comment.
expr - true if this is an expression escape, false for a statement escape.
private void outputOpenComment(java.lang.String escape, boolean expr)
Comment out the given text of an expression or statement open element and output it to the stream.
This method is highly sensitive to the member state variables that are maintained in cooperation with output() and outputCloseComment(). Together these 3 methods handle all output for the preprocessor.
Based on the current state variables, the parameters and the CompatibilityHelper instance, this method handles all output that is a start element of a statement or expression escape.
Of great importance is that, when shifting into expression and statement escapes, any raw HTML must be properly terminated or started as needed. Some deferred suffix processing of the end elements of statement and expression escapes is handled here, since the following content is now known and the correct variant of the output can be determined.
Parameters:
escape - The text to comment.
expr - true if this is an expression escape, false for a statement escape.
private void outputCloseComment(antlr.Token escape, boolean expr)
Comment out the given text of an expression or statement close element and output it to the stream.
This method is highly sensitive to the member state variables that are maintained in cooperation with output() and outputOpenComment(). Together these 3 methods handle all output for the preprocessor.
Based on the current state variables, the parameters and the CompatibilityHelper instance, this method handles all output that is an end element of a statement or expression escape.
Of great importance is that, when shifting into and out of expression and statement escapes, any raw HTML must be properly resumed and terminated as needed. Some suffix processing of end elements for statement and expression escapes is also deferred to the output() and outputOpenComment() methods, where the following content is known and the correct variant of the output can be determined.
Parameters:
escape - The token with the text to comment.
expr - true if this is an expression escape, false for a statement escape.
private void outputCloseComment(java.lang.String escape, boolean expr)
Comment out the given text of an expression or statement close element and output it to the stream.
This method is highly sensitive to the member state variables that are maintained in cooperation with output() and outputOpenComment(). Together these 3 methods handle all output for the preprocessor.
Based on the current state variables, the parameters and the CompatibilityHelper instance, this method handles all output that is an end element of a statement or expression escape.
Of great importance is that, when shifting into and out of expression and statement escapes, any raw HTML must be properly resumed and terminated as needed. Some suffix processing of end elements for statement and expression escapes is also deferred to the output() and outputOpenComment() methods, where the following content is known and the correct variant of the output can be determined.
Parameters:
escape - The text to comment.
expr - true if this is an expression escape, false for a statement escape.
private void generateComment(java.lang.String escape, boolean expr)
Comment out the given text of an escape and output it to the stream.
Parameters:
escape - The text to comment.
expr - true if this is an expression escape, false for a statement escape.
private java.lang.String getTokenText(antlr.Token next)
Get the text of the token unless it is of type ENCODED_CHAR, in which case the contents are converted from the '%XX' form into the character whose hexadecimal code is encoded.
Parameters:
next - The token to potentially decode.
public final void preprocess(Options cfg, java.io.PrintStream out) throws antlr.RecognitionException, antlr.TokenStreamException
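Top-level entry point that parses the input stream and preprocesses the HTML and embedded 4GL into a valid 4GL program.
Uses the following expression escape rules:
- backtick_escape()
- percent_eq_escape()
- curly_escape()
- wse_escape()

Uses the following statement escape rules:
- script_escape()
- question_escape()
- percent_escape()
- server_escape()
- wss_escape()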
This rule also processes the meta_tag(com.goldencode.p2j.e4gl.Options, java.lang.StringBuilder) rule in the case where keep-meta-content-type is not set as an option. In this case, a META or WSMETA element with an HTTP-EQUIV attribute set to CONTENT-TYPE is removed from the output stream using comments. Anything else is passed through as raw HTML.
Anything that is not handled by one of the above listed rules is considered raw HTML and is output with little regard to the content. The exceptions to this include newline processing (since the beginning and end of each line must have text added) and the proper escaping of certain valid HTML characters that cannot be directly included in 4GL programs.
Internally, all of the state management and output processing is centralized in the following methods: output(), outputOpenComment() and outputCloseComment().
All rules are peers at this level to ensure that they do not cause parser ambiguity.
Parameters:
cfg - Storage container for the options that are found.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
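Because all escape rules are peers inside preprocess(), the parser must tell them apart by their opening delimiters, and a longer opener such as `<%=` has to win over its prefix `<%`. The hypothetical `EscapeClassifier` below shows that longest-match decision on raw characters. The literal spellings mapped to the token names are assumptions about the lexer (BACK_TICK as a backtick, OPEN_PCT_EQ as `<%=`, OPEN_CURLY_EQ as `{=`, OPEN_QUESTION WSPEED CLOSE_TAG as `<?WS>`, and so on), and script_escape() is omitted because it also depends on the LANGUAGE attribute checked by script_tag().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical classifier; the real parser decides via token lookahead.
final class EscapeClassifier {
    enum Kind { EXPRESSION, STATEMENT }

    // Assumed literal openers (matched case-insensitively) and their kinds.
    private static final Map<String, Kind> OPENERS = new LinkedHashMap<>();
    static {
        OPENERS.put("`", Kind.EXPRESSION);        // backtick_escape
        OPENERS.put("<%=", Kind.EXPRESSION);      // percent_eq_escape
        OPENERS.put("{=", Kind.EXPRESSION);       // curly_escape
        OPENERS.put("<!--wse", Kind.EXPRESSION);  // wse_escape
        OPENERS.put("<?ws>", Kind.STATEMENT);     // question_escape
        OPENERS.put("<%", Kind.STATEMENT);        // percent_escape
        OPENERS.put("<server>", Kind.STATEMENT);  // server_escape
        OPENERS.put("<!--wss", Kind.STATEMENT);   // wss_escape
        OPENERS.put("<!--ws4gl", Kind.STATEMENT); // wss_escape
    }

    // Returns the escape kind starting at pos, or null for raw HTML.
    // The longest matching opener wins, so "<%=" is not read as "<%".
    static Kind classify(String text, int pos) {
        String best = null;
        for (String open : OPENERS.keySet()) {
            if (text.regionMatches(true, pos, open, 0, open.length())
                    && (best == null || open.length() > best.length())) {
                best = open;
            }
        }
        return best == null ? null : OPENERS.get(best);
    }
}
```

Anything that matches no opener falls through to raw HTML, which corresponds to the catch-all behavior described for preprocess().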
public final void backtick_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting BACK_TICK, arbitrary 4GL expression text and a closing BACK_TICK.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void percent_eq_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_PCT_EQ, arbitrary 4GL expression text and a closing CLOSE_PCT.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void curly_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_CURLY_EQ, arbitrary 4GL expression text and a closing CLOSE_CURLY_EQ.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void wse_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_COMMENT WSE, arbitrary 4GL expression text and a closing CLOSE_COMMENT.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void script_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_START_TAG SCRIPT, arbitrary 4GL statement text and then CLOSE_QUESTION WSPEED CLOSE_TAG.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void question_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_QUESTION WSPEED CLOSE_TAG, arbitrary 4GL statement text and then CLOSE_QUESTION WSPEED CLOSE_TAG.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void percent_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_PCT, arbitrary 4GL statement text and then CLOSE_PCT.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void server_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_START_TAG SERVER CLOSE_TAG, arbitrary 4GL statement text and then CLOSE_QUESTION WSPEED CLOSE_TAG.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final void wss_escape() throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_COMMENT followed by a WSS or WS4GL, arbitrary 4GL statement text and then CLOSE_COMMENT.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final boolean meta_tag(Options opts, java.lang.StringBuilder sb) throws antlr.RecognitionException, antlr.TokenStreamException
Match OPEN_START_TAG META or OPEN_COMMENT WSMETA, the following NAME or HTTP_EQUIV, then the CONTENT and a closing CLOSE_TAG or CLOSE_COMMENT. This parses the meta tag and may set option values that are embedded in the attributes. It is called from scan_options(com.goldencode.p2j.e4gl.Options) and preprocess(com.goldencode.p2j.e4gl.Options, java.io.PrintStream).
Parameters:
opts - The options to update with the found values. If null, then this rule matches the given text but does not update any options, even if those attributes are present.
sb - The buffer in which to accumulate the parsed text. May be null to bypass text accumulation.
Returns:
true if the parsed meta tag has an HTTP-EQUIV attribute with a value of "content-type".
Throws:
antlr.RecognitionException
antlr.TokenStreamException
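To make the meta_tag() behavior concrete, the hypothetical `WsOptions` helper below splits a wsoptions CONTENT value into option names and combines the documented "content-type" return value with the keep-meta-content-type rule used by preprocess(). The separators (commas and/or whitespace) and the case-insensitive comparison are assumptions of this sketch; only the keep-meta-content-type option name comes from the text above.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not the parser's Options handling.
final class WsOptions {
    // Splits CONTENT on commas and/or whitespace (assumed separators) and
    // lower-cases each entry so lookups are case-insensitive (assumption).
    static Set<String> parse(String content) {
        Set<String> opts = new LinkedHashSet<>();
        for (String part : content.split("[,\\s]+")) {
            if (!part.isEmpty()) {
                opts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return opts;
    }

    // Mirrors meta_tag()'s return value: true for HTTP-EQUIV="content-type".
    static boolean isContentType(String httpEquiv) {
        return httpEquiv != null && httpEquiv.trim().equalsIgnoreCase("content-type");
    }

    // A content-type META element is commented out only when the
    // keep-meta-content-type option is absent.
    static boolean shouldStripMeta(Set<String> opts, String httpEquiv) {
        return isContentType(httpEquiv) && !opts.contains("keep-meta-content-type");
    }
}
```

Keeping parsing and the strip decision separate mirrors the two-pass design: scan_options() can collect the options first, and preprocess() can consult them later.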
public final boolean script_tag(java.lang.StringBuilder sb) throws antlr.RecognitionException, antlr.TokenStreamException
Match a starting OPEN_START_TAG SCRIPT, arbitrary attributes and then CLOSE_TAG. The attributes are scanned, and if a LANGUAGE attribute with a value of SpeedScript, WebSpeed4GL or Progress is found, this rule returns true. Either way, the text of all matched tokens is returned in the given buffer so that the calling rule can handle the output as needed.
Parameters:
sb - Buffer which will contain all text associated with the parsed tokens.
Returns:
true if this script element includes a language attribute that identifies the content as embedded 4GL.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
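The LANGUAGE test performed by script_tag() reduces to a set-membership check. In the hypothetical `ScriptLanguage` sketch below, the three accepted values come from the description above; treating them case-insensitively and stripping one pair of surrounding quotes are assumptions about how the attribute text reaches the check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper mirroring script_tag()'s boolean result.
final class ScriptLanguage {
    // Stored lower-case: case-insensitive matching is an assumption.
    private static final Set<String> EMBEDDED_4GL =
        new HashSet<>(Arrays.asList("speedscript", "webspeed4gl", "progress"));

    static boolean isEmbedded4gl(String language) {
        if (language == null) {
            return false; // no LANGUAGE attribute: not embedded 4GL
        }
        // strip one optional leading and trailing quote, e.g. "SpeedScript"
        String v = language.trim().replaceAll("^[\"']|[\"']$", "");
        return EMBEDDED_4GL.contains(v.toLowerCase(Locale.ROOT));
    }
}
```

A SCRIPT element with any other language (for example JavaScript) makes the rule return false, so its text would be emitted as ordinary HTML by the calling rule.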
public final java.lang.String attribute_value(java.lang.StringBuilder sb) throws antlr.RecognitionException, antlr.TokenStreamException
Parse an EQUALS (=) followed by a SYMBOL or STRING. Whitespace on either side of the EQUALS is tolerated.
Parameters:
sb - Buffer in which to store all matched text, if not null.
Returns:
… SYMBOL or STRING.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
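A character-level analogue of attribute_value() shows the tolerated whitespace and the two value forms. The hypothetical `AttrValue` below works on raw text instead of EQUALS, SYMBOL and STRING tokens. It assumes that a quoted value ends at the next quote of the same kind (as string(), dstring() and sstring() describe), that an unquoted symbol ends at whitespace or `>`, and that the returned value has its quotes removed; the real rule's return format is not confirmed here.

```java
// Hypothetical sketch; not the generated ANTLR rule.
final class AttrValue {
    final String value; // value without surrounding quotes (assumption)
    final int next;     // index just past the matched text

    private AttrValue(String value, int next) {
        this.value = value;
        this.next = next;
    }

    private static int skipWs(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Optional whitespace, '=', optional whitespace, then a single- or
    // double-quoted string or an unquoted symbol; null if nothing matches.
    static AttrValue parse(String s, int pos) {
        int i = skipWs(s, pos);
        if (i >= s.length() || s.charAt(i) != '=') {
            return null;
        }
        i = skipWs(s, i + 1);
        if (i >= s.length()) {
            return null;
        }
        char q = s.charAt(i);
        if (q == '"' || q == '\'') {
            int end = s.indexOf(q, i + 1); // closing quote of the same kind
            return end < 0 ? null : new AttrValue(s.substring(i + 1, end), end + 1);
        }
        int start = i;
        while (i < s.length() && !Character.isWhitespace(s.charAt(i)) && s.charAt(i) != '>') {
            i++;
        }
        return i == start ? null : new AttrValue(s.substring(start, i), i);
    }
}
```

Because the closing quote must match the opening one, a double-quoted value may contain single quotes and vice versa, which is the behavior the "two matching quote characters" wording suggests.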
public final void scan_options(Options opts) throws antlr.RecognitionException, antlr.TokenStreamException
Top-level entry point that parses the input stream to read the E4GL options embedded in any META or WSMETA elements.
Uses meta_tag(com.goldencode.p2j.e4gl.Options, java.lang.StringBuilder) to parse the options themselves. Anything else is simply ignored until EOF.
Parameters:
opts - Storage container for the options that are found.
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final java.lang.String string() throws antlr.RecognitionException, antlr.TokenStreamException
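Matches various string types (data delimited by two matching single or double quote characters).
Throws:
antlr.RecognitionException
antlr.TokenStreamException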
public final java.lang.String dstring() throws antlr.RecognitionException, antlr.TokenStreamException
Matches any data delimited by two matching double quote characters. See string().
Throws:
antlr.RecognitionException
antlr.TokenStreamException
public final java.lang.String sstring() throws antlr.RecognitionException, antlr.TokenStreamException
Matches any data delimited by two matching single quote characters. See string().
Throws:
antlr.RecognitionException
antlr.TokenStreamException
private static final long[] mk_tokenSet_0()
private static final long[] mk_tokenSet_1()
private static final long[] mk_tokenSet_2()