HQLLexer (P2J - Progress 4GL to Java Conversion and Runtime)

java.lang.Object
- antlr.CharScanner
- - com.goldencode.p2j.persist.hql.HQLLexer

All Implemented Interfaces:

antlr.TokenStream, HQLParserTokenTypes
```
public class HQLLexer
extends antlr.CharScanner
implements HQLParserTokenTypes, antlr.TokenStream
```
Tokenizes an HQL query statement (input as a stream of characters) into a stream of tokens suitable for the HQLParser.
Each token has an integer token type by which the parser references and matches tokens, which keeps parser matching fast. To make this work, the lexer must create a token object when it matches a top level rule. This token object includes the token type and the actual text found in the original character stream. The lexer's main job is to look ahead into the character stream, switch into the top level rule that matches some set of characters and, when this is complete, create a valid token object and return it to the parser. The lexer thus implements a token stream interface from the parser's perspective.
Please note that this file is generated by ANTLR 2.7.4 from the grammar specified in hql.g.
Symbol resolution is the process by which the proper token type is assigned to a token created from a match to the generic symbol rule. In the lexer, the only symbol resolution that can be done is to match with reserved keywords. Reserved keywords are given preference over other symbol types in any case where there is a conflict.
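The keyword lookup described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the generated code: the class name, the token type values and the three sample keywords are hypothetical (the real constants live in HQLParserTokenTypes), and lowercasing the text before the lookup is one assumed way of honoring the case-insensitive design.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; not the generated HQLLexer code. */
final class KeywordResolutionSketch
{
   // Hypothetical token type values, chosen for this example only.
   static final int SYMBOL = 1;
   static final int SELECT = 2;
   static final int FROM   = 3;
   static final int WHERE  = 4;

   // Keyword text (stored in lowercase) mapped to its token type.
   private static final Map<String, Integer> KEYWORDS = new HashMap<>();

   static
   {
      KEYWORDS.put("select", SELECT);
      KEYWORDS.put("from", FROM);
      KEYWORDS.put("where", WHERE);
   }

   /** Returns the keyword's token type if the text is reserved, else SYMBOL. */
   static int resolve(String text)
   {
      Integer type = KEYWORDS.get(text.toLowerCase(Locale.ROOT));
      return (type != null) ? type : SYMBOL;
   }

   public static void main(String[] args)
   {
      System.out.println(resolve("SELECT"));   // reserved keyword takes preference
      System.out.println(resolve("customer")); // falls back to the generic symbol type
   }
}
```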
Other key design points:
- This grammar has been designed to ignore case. All symbol definitions use lowercase characters and ANTLR generates code to match both the uppercase and lowercase version of each character. This is enabled using the lexer options section.
- The lexer is defined to create a single token for each string literal. All logic for matching string literals is embedded in the lexer.
- At the top level of the lexer, all tokens are one of the following types:
  - string literals
  - whitespace (spaces, tabs, carriage returns, line feeds)
  - operators
  - integer and decimal literals
  - reserved keywords
  - user-defined symbols
- The lexer's top level entry point is nextToken. This method has a for-loop that uses up to 3 characters of lookahead to identify which of the top level token rules to call. To do this, each token rule's left-most match characters are "rolled up" and tested in this top level rule.
- Whitespace tokens are artificially set to the special "skip" token type, which drops them from the token stream seen by the parser. This is an important feature that simplifies the parser design.
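The effect of skipping whitespace can be seen in a much simplified, hand-written tokenizer loop. This is a sketch under stated assumptions rather than the generated nextToken: the class and method names are hypothetical, it dispatches on a single character of lookahead instead of up to 3, and it ignores string literals, decimal literals and multi-character operators.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the generated nextToken() is considerably more involved. */
final class SkipWhitespaceSketch
{
   static boolean isWhitespace(char c)
   {
      return c == ' ' || c == '\t' || c == '\r' || c == '\n';
   }

   static boolean isSymbolChar(char c)
   {
      return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '$';
   }

   static boolean isDigit(char c)
   {
      return c >= '0' && c <= '9';
   }

   /** Returns the token texts in order; whitespace never reaches the result. */
   static List<String> tokenize(String input)
   {
      List<String> tokens = new ArrayList<>();
      int i = 0;
      while (i < input.length())
      {
         int start = i;
         char c = input.charAt(i);
         if (isWhitespace(c))
         {
            // Whitespace rule: consume the whole run, then "skip" it (emit nothing).
            while (i < input.length() && isWhitespace(input.charAt(i))) i++;
            continue;
         }
         if (isSymbolChar(c))
         {
            // Symbol rule: letters, '_' and '$', then digits are also allowed.
            while (i < input.length()
                   && (isSymbolChar(input.charAt(i)) || isDigit(input.charAt(i)))) i++;
         }
         else if (isDigit(c))
         {
            // Integer literal rule (decimal forms are left out of this sketch).
            while (i < input.length() && isDigit(input.charAt(i))) i++;
         }
         else
         {
            // Anything else is treated as a single-character operator.
            i++;
         }
         tokens.add(input.substring(start, i));
      }
      return tokens;
   }

   public static void main(String[] args)
   {
      System.out.println(tokenize("select a\n from t where a = 1"));
   }
}
```

Because the whitespace branch never adds to the result, the consumer of the list (the parser, in the real design) needs no rules for optional whitespace between tokens.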
All token types are defined as integer constants in HQLParserTokenTypes, an interface that the parser, lexer and other related classes all implement. This allows all of these classes to refer directly to the token types and share this common set of definitions. All top-level lexer rules generate a token of the same name in the HQLParserTokenTypes interface.
There is a tokens { } section in the parser where artificial tokens are defined (tokens that are not backed by a lexer rule). These tokens are also added to the HQLParserTokenTypes interface and can thus be referenced directly by the lexer and parser.
Author:

ECF, GES

Field Summary

Fields
Modifier and Type	Field and Description
`static antlr.collections.impl.BitSet`	`_tokenSet_0`
`static antlr.collections.impl.BitSet`	`_tokenSet_1`
`private static java.util.Map`	`keywords` Map of keywords to symbol tokens

Fields inherited from class antlr.CharScanner
_returnToken, caseSensitive, caseSensitiveLiterals, commitToPath, EOF_CHAR, hashString, inputState, literals, saveConsumedInput, tabsize, text, tokenObjectClass, traceDepth

Fields inherited from interface com.goldencode.p2j.persist.hql.HQLParserTokenTypes
ALIAS, AND, AS, BOOL_FALSE, BOOL_TRUE, CASE, CAST, COMMA, CONCAT, DEC_LITERAL, DIGIT, DIVIDE, DMO, DOT, ELSE, END, EOF, EQUALS, ESCAPE, FROM, FUNCTION, GT, GTE, IN, INDEX, IS, IS_NULL, LBRACKET, LETTER, LIKE, LONG_LITERAL, LPARENS, LT, LTE, MINUS, MULTIPLY, NOT, NOT_EQ, NOT_NULL, NULL, NULL_TREE_LOOKAHEAD, NUM_LITERAL, OR, PLUS, PROPERTY, RBRACKET, RPARENS, SELECT, STRING, SUBSCRIPT, SUBSELECT, SUBST, SYM_CHAR, SYMBOL, TERNARY, THEN, UN_MINUS, VALID_1ST_IDENT, VALID_SYM_CHAR, WHEN, WHERE, WS

Constructor Summary

Constructors
Constructor and Description
`HQLLexer(antlr.InputBuffer ib)`
`HQLLexer(java.io.InputStream in)`
`HQLLexer(antlr.LexerSharedInputState state)`
`HQLLexer(java.io.Reader in)`

Method Summary

Modifier and Type	Method and Description
`static void`	`main(java.lang.String[] args)` Provides a command line interface for an end user to drive and/or test the HQLLexer class.
`void`	`mCOMMA(boolean _createToken)` Matches the ',' character.
`void`	`mCONCAT(boolean _createToken)` Matches the '||' concatenation operator.
`protected void`	`mDIGIT(boolean _createToken)` Matches any numeric digit: `0 - 9`
`void`	`mDIVIDE(boolean _createToken)` Matches the '/' character.
`void`	`mEQUALS(boolean _createToken)` Matches the "=" string.
`void`	`mGT(boolean _createToken)` Matches the '>' character.
`void`	`mGTE(boolean _createToken)` Matches the '>=' character sequence.
`private static long[]`	`mk_tokenSet_0()`
`private static long[]`	`mk_tokenSet_1()`
`void`	`mLBRACKET(boolean _createToken)` Matches the '[' character.
`protected void`	`mLETTER(boolean _createToken)` Matches any alphabetic character: `a - z`
`void`	`mLPARENS(boolean _createToken)` Matches the '(' character.
`void`	`mLT(boolean _createToken)` Matches the '<' character.
`void`	`mLTE(boolean _createToken)` Matches the '<=' character sequence.
`void`	`mMINUS(boolean _createToken)` Matches the '-' character.
`void`	`mMULTIPLY(boolean _createToken)` Matches the '*' character.
`void`	`mNOT_EQ(boolean _createToken)` Matches the inequality string.
`void`	`mNUM_LITERAL(boolean _createToken)` Matches all forms of valid integer literals and decimal literals.
`void`	`mPLUS(boolean _createToken)` Matches the '+' character.
`void`	`mRBRACKET(boolean _createToken)` Matches the ']' character.
`void`	`mRPARENS(boolean _createToken)` Matches the ')' character.
`void`	`mSTRING(boolean _createToken)` Matches an opening single quote, arbitrary contents and an ending single quote.
`void`	`mSUBST(boolean _createToken)` Matches the query substitution parameter placeholder character.
`protected void`	`mSYM_CHAR(boolean _createToken)` Matches all characters that can be made part of a valid user-defined symbol (except alphabetic and numeric characters).
`void`	`mSYMBOL(boolean _createToken)` Match a valid symbol name.
`protected void`	`mVALID_1ST_IDENT(boolean _createToken)` Matches any character that can appear in the 1st position of a symbol (matches any `LETTER` or `SYM_CHAR`).
`protected void`	`mVALID_SYM_CHAR(boolean _createToken)` Matches any single character that can appear in the 2nd or later position in a symbol (matches any `LETTER, DIGIT or SYM_CHAR`).
`void`	`mWS(boolean _createToken)` Matches any amount of whitespace in an expression and sets the token type to SKIP.
`antlr.Token`	`nextToken()`

Methods inherited from class antlr.CharScanner
append, append, commit, consume, consumeUntil, consumeUntil, getCaseSensitive, getCaseSensitiveLiterals, getColumn, getCommitToPath, getFilename, getInputBuffer, getInputState, getLine, getTabSize, getText, getTokenObject, LA, makeToken, mark, match, match, match, matchNot, matchRange, newline, panic, panic, reportError, reportError, reportWarning, resetText, rewind, setCaseSensitive, setColumn, setCommitToPath, setFilename, setInputState, setLine, setTabSize, setText, setTokenObjectClass, tab, testLiteralsTable, testLiteralsTable, toLower, traceIn, traceIndent, traceOut, uponEOF

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - keywords
```
private static java.util.Map keywords
```
    Map of keywords to symbol tokens
  - _tokenSet_0
```
public static final antlr.collections.impl.BitSet _tokenSet_0
```
  - _tokenSet_1
```
public static final antlr.collections.impl.BitSet _tokenSet_1
```
- Constructor Detail
  - HQLLexer
```
public HQLLexer(java.io.InputStream in)
```
  - HQLLexer
```
public HQLLexer(java.io.Reader in)
```
  - HQLLexer
```
public HQLLexer(antlr.InputBuffer ib)
```
  - HQLLexer
```
public HQLLexer(antlr.LexerSharedInputState state)
```
- Method Detail
  - main
```
public static void main(java.lang.String[] args)
```
    Provides a command line interface for an end user to drive and/or test the HQLLexer class.
    Syntax:
```
    java HQLLexer <expression>
```
    Parameters:
    
    args - List of command line arguments.
  - nextToken
```
public antlr.Token nextToken()
                      throws antlr.TokenStreamException
```
    Specified by:
    
    nextToken in interface antlr.TokenStream
    
    Throws:
    
    antlr.TokenStreamException
  - mWS
```
public final void mWS(boolean _createToken)
               throws antlr.RecognitionException,
                      antlr.CharStreamException,
                      antlr.TokenStreamException
```
    Matches any amount of whitespace in an expression and sets the token type to SKIP. In addition, each newline character causes the lexer's line counter to be incremented in order to properly maintain each token's line information.
    Spaces, tabs, carriage returns and line feeds (newlines) are all matched.
    This is a top level lexer rule which means that there is an associated WS token.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mSYMBOL
```
public final void mSYMBOL(boolean _createToken)
                   throws antlr.RecognitionException,
                          antlr.CharStreamException,
                          antlr.TokenStreamException
```
    Match a valid symbol name. All symbols must start with an alphabetic character, the "_" character or the "$" character. Subsequent characters can be alphanumeric or one of the special symbol characters (_ and $). Symbols are matched case-insensitively.
    Once a symbol has been found, a keyword lookup occurs. If matched, the token's type is overridden from the default (SYMBOL) to the artificial token type associated with the keyword.
    This is a top level lexer rule which means that there is an associated SYMBOL token.
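    The character classes behind this rule (LETTER, DIGIT, SYM_CHAR, VALID_1ST_IDENT and VALID_SYM_CHAR) can be sketched as simple predicates. This is an illustration only: the class and method names are hypothetical, and both letter cases are tested explicitly here as an assumed stand-in for the lexer's case-insensitive matching.

```java
/** Illustrative sketch only; mirrors the documented character classes, not the generated code. */
final class SymbolShapeSketch
{
   static boolean isLetter(char c)
   {
      return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
   }

   static boolean isDigit(char c)
   {
      return c >= '0' && c <= '9';
   }

   static boolean isSymChar(char c)
   {
      return c == '_' || c == '$';
   }

   /** First position: any LETTER or SYM_CHAR. */
   static boolean isValidFirst(char c)
   {
      return isLetter(c) || isSymChar(c);
   }

   /** Second or later position: any LETTER, DIGIT or SYM_CHAR. */
   static boolean isValidSymChar(char c)
   {
      return isLetter(c) || isDigit(c) || isSymChar(c);
   }

   /** True if the whole text has the shape of a symbol name. */
   static boolean isSymbol(String text)
   {
      if (text.isEmpty() || !isValidFirst(text.charAt(0)))
      {
         return false;
      }
      for (int i = 1; i < text.length(); i++)
      {
         if (!isValidSymChar(text.charAt(i)))
         {
            return false;
         }
      }
      return true;
   }

   public static void main(String[] args)
   {
      System.out.println(isSymbol("_order$1")); // starts with '_', digits allowed later
      System.out.println(isSymbol("1abc"));     // a digit cannot appear in the 1st position
   }
}
```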
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mVALID_1ST_IDENT
```
protected final void mVALID_1ST_IDENT(boolean _createToken)
                               throws antlr.RecognitionException,
                                      antlr.CharStreamException,
                                      antlr.TokenStreamException
```
    Matches any character that can appear in the 1st position of a symbol (matches any LETTER or SYM_CHAR).
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mVALID_SYM_CHAR
```
protected final void mVALID_SYM_CHAR(boolean _createToken)
                              throws antlr.RecognitionException,
                                     antlr.CharStreamException,
                                     antlr.TokenStreamException
```
    Matches any single character that can appear in the 2nd or later position in a symbol (matches any LETTER, DIGIT or SYM_CHAR). This is simply a helper rule to make references easier.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mCONCAT
```
public final void mCONCAT(boolean _createToken)
                   throws antlr.RecognitionException,
                          antlr.CharStreamException,
                          antlr.TokenStreamException
```
    Matches the '||' concatenation operator. This is a top level rule, creating a CONCAT token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mCOMMA
```
public final void mCOMMA(boolean _createToken)
                  throws antlr.RecognitionException,
                         antlr.CharStreamException,
                         antlr.TokenStreamException
```
    Matches the ',' character. This is a top level rule, creating a COMMA token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mEQUALS
```
public final void mEQUALS(boolean _createToken)
                   throws antlr.RecognitionException,
                          antlr.CharStreamException,
                          antlr.TokenStreamException
```
    Matches the "=" string. This is a top level rule, creating an EQUALS token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mNOT_EQ
```
public final void mNOT_EQ(boolean _createToken)
                   throws antlr.RecognitionException,
                          antlr.CharStreamException,
                          antlr.TokenStreamException
```
    Matches the inequality string. This is a top level rule, creating a NOT_EQ token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mGT
```
public final void mGT(boolean _createToken)
               throws antlr.RecognitionException,
                      antlr.CharStreamException,
                      antlr.TokenStreamException
```
    Matches the '>' character. This is a top level rule, creating a GT token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mLT
```
public final void mLT(boolean _createToken)
               throws antlr.RecognitionException,
                      antlr.CharStreamException,
                      antlr.TokenStreamException
```
    Matches the '<' character. This is a top level rule, creating a LT token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mGTE
```
public final void mGTE(boolean _createToken)
                throws antlr.RecognitionException,
                       antlr.CharStreamException,
                       antlr.TokenStreamException
```
    Matches the '>=' character sequence. This is a top level rule, creating a GTE token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mLTE
```
public final void mLTE(boolean _createToken)
                throws antlr.RecognitionException,
                       antlr.CharStreamException,
                       antlr.TokenStreamException
```
    Matches the '<=' character sequence. This is a top level rule, creating a LTE token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mLBRACKET
```
public final void mLBRACKET(boolean _createToken)
                     throws antlr.RecognitionException,
                            antlr.CharStreamException,
                            antlr.TokenStreamException
```
    Matches the '[' character. This is a top level rule, creating a LBRACKET token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mRBRACKET
```
public final void mRBRACKET(boolean _createToken)
                     throws antlr.RecognitionException,
                            antlr.CharStreamException,
                            antlr.TokenStreamException
```
    Matches the ']' character. This is a top level rule, creating a RBRACKET token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mLPARENS
```
public final void mLPARENS(boolean _createToken)
                    throws antlr.RecognitionException,
                           antlr.CharStreamException,
                           antlr.TokenStreamException
```
    Matches the '(' character. This is a top level rule, creating a LPARENS token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mRPARENS
```
public final void mRPARENS(boolean _createToken)
                    throws antlr.RecognitionException,
                           antlr.CharStreamException,
                           antlr.TokenStreamException
```
    Matches the ')' character. This is a top level rule, creating a RPARENS token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mPLUS
```
public final void mPLUS(boolean _createToken)
                 throws antlr.RecognitionException,
                        antlr.CharStreamException,
                        antlr.TokenStreamException
```
    Matches the '+' character. This is a top level rule, creating a PLUS token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mMINUS
```
public final void mMINUS(boolean _createToken)
                  throws antlr.RecognitionException,
                         antlr.CharStreamException,
                         antlr.TokenStreamException
```
    Matches the '-' character. This is a top level rule, creating a MINUS token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mMULTIPLY
```
public final void mMULTIPLY(boolean _createToken)
                     throws antlr.RecognitionException,
                            antlr.CharStreamException,
                            antlr.TokenStreamException
```
    Matches the '*' character. This is a top level rule, creating a MULTIPLY token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mDIVIDE
```
public final void mDIVIDE(boolean _createToken)
                   throws antlr.RecognitionException,
                          antlr.CharStreamException,
                          antlr.TokenStreamException
```
    Matches the '/' character. This is a top level rule, creating a DIVIDE token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mSUBST
```
public final void mSUBST(boolean _createToken)
                  throws antlr.RecognitionException,
                         antlr.CharStreamException,
                         antlr.TokenStreamException
```
    Matches the query substitution parameter placeholder character. This is a question mark (?). This is a top level rule, creating a SUBST token type.
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mNUM_LITERAL
```
public final void mNUM_LITERAL(boolean _createToken)
                        throws antlr.RecognitionException,
                               antlr.CharStreamException,
                               antlr.TokenStreamException
```
    Matches all forms of valid integer literals and decimal literals. Scenarios that are matched:
    - Matches a sequence of one or more digits (without a decimal point and additional digits) as an integer literal with a NUM_LITERAL token type.
    - If the literal from the previous step can be parsed as an integer but does not fit in 32 bits, its token type is set to LONG_LITERAL.
    - If the literal cannot be parsed by Long.parseLong, it is too large to fit in a 64-bit variable, so it is matched as DEC_LITERAL.
    - Matches a sequence of one or more digits followed by a '.' but no additional digits, as a decimal literal with a DEC_LITERAL token type.
    - Matches one or more numeric digits followed by a decimal point and another sequence of numeric digits as a decimal constant with a DEC_LITERAL token type.
    - Matches a decimal point followed by a sequence of numeric digits as a decimal constant with a DEC_LITERAL token type.
    This is a top level rule. The token type defaults to NUM_LITERAL and is overridden by specific actions depending on the scenario matched.
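    The scenarios above amount to a small classification step over the matched text. The sketch below is an illustration under stated assumptions: the class, enum and method names are hypothetical, the input is assumed to be already matched (digits with at most one decimal point), and using Integer.parseInt for the 32-bit check is an assumption; only the Long.parseLong check is named in the description.

```java
/** Illustrative sketch only; not the generated rule's embedded actions. */
final class NumericLiteralSketch
{
   enum Kind { NUM_LITERAL, LONG_LITERAL, DEC_LITERAL }

   /** Classifies text that consists of digits with at most one '.'. */
   static Kind classify(String text)
   {
      // Any decimal point ("12.", "1.5", ".5") makes it a decimal literal.
      if (text.indexOf('.') >= 0)
      {
         return Kind.DEC_LITERAL;
      }
      try
      {
         Integer.parseInt(text);       // fits in 32 bits
         return Kind.NUM_LITERAL;
      }
      catch (NumberFormatException tooBigForInt)
      {
         // fall through to the 64-bit check
      }
      try
      {
         Long.parseLong(text);         // fits in 64 bits
         return Kind.LONG_LITERAL;
      }
      catch (NumberFormatException tooBigForLong)
      {
         return Kind.DEC_LITERAL;      // too large for a 64-bit variable
      }
   }

   public static void main(String[] args)
   {
      System.out.println(classify("42"));
      System.out.println(classify("2147483648"));
      System.out.println(classify("9223372036854775808"));
      System.out.println(classify("12."));
   }
}
```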
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mDIGIT
```
protected final void mDIGIT(boolean _createToken)
                     throws antlr.RecognitionException,
                            antlr.CharStreamException,
                            antlr.TokenStreamException
```
    Matches any numeric digit: 0 - 9
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mSTRING
```
public final void mSTRING(boolean _createToken)
                   throws antlr.RecognitionException,
                          antlr.CharStreamException,
                          antlr.TokenStreamException
```
    Matches an opening single quote, arbitrary contents and an ending single quote. Two contiguous single quote characters are used to specify one single quote in the output string (the pair does not terminate the string).
    Any newlines inside the string are identified and the lexer's internal newline counter is properly maintained.
    The greedy option does not need to be disabled here as the closure rule termination is built into the subrule itself: it accepts anything that isn't an unescaped single quote character (see above).
    Tabs and spaces are maintained inside strings. Opening and closing single quote characters are dropped.
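    The quoting rules above can be sketched as a small hand-written scanner. This is an illustration only, with hypothetical class and method names; it does not reproduce the generated rule and leaves out the newline counting mentioned above.

```java
/** Illustrative sketch only; not the generated mSTRING rule. */
final class StringLiteralSketch
{
   /**
    * Reads a single-quoted literal that starts at input.charAt(start) and returns
    * its contents with the delimiters dropped and each '' pair collapsed to one quote.
    */
   static String readString(String input, int start)
   {
      if (start >= input.length() || input.charAt(start) != '\'')
      {
         throw new IllegalArgumentException("expected opening quote at " + start);
      }
      StringBuilder contents = new StringBuilder();
      int i = start + 1;
      while (i < input.length())
      {
         char c = input.charAt(i);
         if (c == '\'')
         {
            if (i + 1 < input.length() && input.charAt(i + 1) == '\'')
            {
               // Two contiguous quotes: emit one quote and keep going.
               contents.append('\'');
               i += 2;
               continue;
            }
            // A lone quote closes the literal; the delimiter itself is dropped.
            return contents.toString();
         }
         // Everything else (including tabs and spaces) is kept as-is.
         contents.append(c);
         i++;
      }
      throw new IllegalArgumentException("unterminated string literal");
   }

   public static void main(String[] args)
   {
      System.out.println(readString("'it''s'", 0));
   }
}
```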
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mLETTER
```
protected final void mLETTER(boolean _createToken)
                      throws antlr.RecognitionException,
                             antlr.CharStreamException,
                             antlr.TokenStreamException
```
    Matches any alphabetic character: a - z
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mSYM_CHAR
```
protected final void mSYM_CHAR(boolean _createToken)
                        throws antlr.RecognitionException,
                               antlr.CharStreamException,
                               antlr.TokenStreamException
```
    Matches all characters that can be made part of a valid user-defined symbol (except alphabetic and numeric characters). These are: $ _
    
    Throws:
    
    antlr.RecognitionException
    
    antlr.CharStreamException
    
    antlr.TokenStreamException
  - mk_tokenSet_0
```
private static final long[] mk_tokenSet_0()
```
  - mk_tokenSet_1
```
private static final long[] mk_tokenSet_1()
```
