Running Conversion¶

In the Running the Front End Conversion chapter, the front end of the conversion process was executed by itself. That was done for two reasons: to obtain the ASTs in a pristine form suitable for reporting and to isolate the preprocessing and parsing issues of the front end from other problems that can occur in later phases of conversion processing.

This chapter assumes that the front end conversion runs successfully (it completes with no errors). In addition, it is important that all code preparation, hints definition and other customizations of the project are complete. The next step is to run the entire code conversion process from end to end.

Syntax¶

The same tool used for running the conversion front end is used to run the entire code conversion process. The difference in the command line is related to the mode parameter given.

Always run the driver with the current directory set to the project root:

cd $P2J_HOME

The ConversionDriver syntax is as follows:

java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver [options] <mode> <filelist>

java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -F[options] <mode> <filename>

java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -S[options] <mode> <directory> "<filespec>"

The driver always requires a mode parameter. Following the mode is one of three ways to specify the source files to be processed. Each of the above command lines corresponds to one of the three input file syntax options.

Option/Parameter	Meaning
`-D<debuglevel>`	`<debuglevel>` must be one of the following integers: 0 = no message output; 1 = status messages only; 2 = status + debug messages; 3 = verbose trace output.
`-I`	Execute "incremental conversion", see Incremental Conversion below.
`-F`	Read the list of source procedure files via a custom file list. Instead of a list of explicit file names, the last parameter must be a filename that encodes an arbitrary list of relative or absolute filenames. See the `filename` entry below.
`-N`	No recursion through subdirectories of `directory` when processing the `filespec`. This is only valid with the `-S` option.
`-S`	Generate a list of source procedure files by searching for a match to a `filespec` starting in the given `directory`. By default, the list of procedure files must be explicitly specified as the last command line parameters.
`mode`	A combination of one or more of the following values. When more than one is specified, use the '+' character to delimit the list (don't insert spaces). The most common usage is `F2+M0+CB`, which is a complete code conversion run.
	Mode	Phase	Steps
	F0	Front End	preprocessor (in honor cache mode), lexer, parser, AST persistence and post-parse fixups
	F1	Front End	F0 + forced preprocessor execution
	F2	Front End	F1 + schema loader/fixups
	F3	Front End	F2 + call graph generation
	M0	Middle	schema annotations, P2O generation, Hibernate mapping docs, DMO generation, brew for DMO classes, schema DDL generation
	M1	Middle	M0 + length statistics
	CB	Code Back End	unreachable code, annotations, frame generator, base structure, core conversion and brew
`directory`	The top-level directory in which to search for filenames which match the `filespec` pattern. This is only used with `-S` (file specification).
`filespec`	The file specification to use to find all source procedures under the `directory` given. Wildcards (`*` and `?`) and regular expressions can be used to create the filter. In most command shells, wildcard characters and regular expressions will be interpreted by the shell. Since the conversion driver must do the interpretation of this parameter, it is important to wrap the filter inside double quotes or some other shell-specific characters or escaping mechanism. This mode is naturally recursive (it will process all matching files found in all levels of sub-directory of the top directory) unless the `-N` option is passed.
`filename`	A filename for a text file that contains an explicit list of the procedure files. The filename parameter must be a single absolute or relative filename of a text file that contains a custom list of absolute and/or relative source procedures to scan. The file list will be read from the specified file instead of being hard coded on the command line. There must be one filename per line in the file and there is no limit on the number of files in the list. This is only available with the `-F` option (custom file list mode).
`filelist`	An explicit list of 4GL source procedures as one or more relative or absolute filenames. This is the default approach. The command line will contain an arbitrary list of absolute and/or relative file names to scan. This list is hard coded in the command line itself. Any number of files may be listed on the command line, subject to the shell's command line limits. This corresponds to the list of all `.p` files for the application. The actual file extension doesn't have to be `.p`; it can be anything, just as in Progress. Do not specify filenames that are only used as include files; only procedures must be specified.

To run a complete code conversion, the best mode to use is F2+M0+CB.

Examples:

java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver F2+M0+CB src/relative/path/one.p  src/relative/path/two.p  src/relative/path2/three.p

java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -S F2+M0+CB src/top/directory/ "*.[pPwW]" 

java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -F F2+M0+CB my_custom_file_list.txt

Note the use of the -S and -F options to respectively enable the non-default file specification and file list command line processing.
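A custom file list for the -F option does not have to be written by hand. The following is a minimal sketch, assuming the procedures live under a single directory and use .p or .w extensions (in either case). The helper name make_file_list and the file names in the usage comment are illustrative and are not part of FWD.

```shell
#!/bin/sh
# make_file_list is a hypothetical helper, not part of FWD.
# It writes one filename per line, which is the layout the -F option expects.
make_file_list() {
    src_dir=$1     # directory to search, e.g. src
    list_file=$2   # output file, e.g. my_custom_file_list.txt
    # Only procedure files are matched; include files (such as .i files)
    # are left out because only procedures should be listed.
    find "$src_dir" -type f -name '*.[pPwW]' | sort > "$list_file"
}

# Usage, from the project root (names are examples only):
#   make_file_list src my_custom_file_list.txt
```

The resulting file can then be passed as the last parameter of the -F form of the command line shown above.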

When the Sun Hotspot Java Virtual Machine is in use (e.g. Sun J2SE 1.6), there are 2 JVM compilers that can be used at runtime. Java uses these compilers to convert the platform independent Java bytecodes of a class file into native platform machine code, which will execute significantly faster than the interpreted bytecode approach. Hotspot has a client compiler (i.e. C1) and a server compiler (i.e. C2). The default for most environments is to run the client compiler. Long batch processes such as conversion can often run much faster with the server compiler. Generally, it is a good idea to try running conversion by adding the -server option to the java command line (between the java executable name and the -classpath option).

Normally, using the server compiler may save 10% - 15% of the total processing time. In rare cases, some combination of hardware, operating system and JVM has been seen where the difference between the client and server compilers is massive. In one example, a conversion of a single file that took over 20 minutes with the client compiler took only 30 seconds when run with the server compiler. Theoretically, it is also possible for the client compiler to be faster than the server compiler, but this would not be expected for this kind of Java work.

Incremental Conversion¶

Starting in FWD v4.0.0 there is an optional ability to run incremental conversion. This means that the conversion can optionally just convert the source files that changed (and any sources that are dependent upon those changes) since the last time conversion was run. The implementation was done in #3471. It is functional for most use cases, but it may not work in all cases. If a problem occurs you may always fall back to a full (non-incremental) conversion.

The core idea is that the ConversionDriver has an incremental conversion option (-I) to quickly convert just the changed files in the source tree. When this is active, it will detect the list of files (procedures, classes, includes and/or .df files) which have changed OR which must be reconverted because a dependency has changed. An example of a dependency is an include file. Any location in the project which directly or indirectly includes a changed include file must be reconverted. Only that list of files that is changed or depends on a changed file will be converted.

This means you can edit the source files as needed (or just put new versions of them in place) and FWD can detect the changes. It does not look at the date/time stamps. It uses MD5 hashes of the files to detect a change. Even if you just change the whitespace in a file, it will be detected as a difference because that will change the file-level hash!
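The effect of hash-based change detection can be seen with any MD5 tool. The sketch below is only an illustration of the principle and is not FWD's own implementation. It assumes the md5sum utility is available (as on most Linux systems) and uses two throwaway files that differ by a single space.

```shell
#!/bin/sh
# Illustration only: a whitespace-only edit changes the file-level MD5 hash.
tmp=$(mktemp -d)
printf 'def var i as int.\n'  > "$tmp/before.p"
printf 'def var i  as int.\n' > "$tmp/after.p"   # one extra space

hash_before=$(md5sum "$tmp/before.p" | cut -d' ' -f1)
hash_after=$(md5sum "$tmp/after.p" | cut -d' ' -f1)

if [ "$hash_before" != "$hash_after" ]; then
    echo "changed"
else
    echo "unchanged"
fi
rm -rf "$tmp"
```

Because the comparison is made on content rather than on timestamps, simply touching a file without editing it would not be seen as a change under this scheme.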

Please note that all of this assumes that you have successfully executed a full conversion in a single batch run. This will create and populate a database in the cvtdb/ folder off the main project root directory. Subsequent incremental runs will use the state stored in that database to determine what needs to be converted.

If incremental mode is used without having run a clean/full conversion first, then it will effectively detect all of the project as new/different and will do a full conversion. After having run some incremental conversions, if you want to go back to a clean/full conversion, delete the cvtdb/ folder and this same incremental mode will convert all files.

In our "standard project" configuration (see Hotel GUI or Hotel ChUI), one can add the -I option to the arguments of any of the convert targets (e.g. convert.blacklist) in the build.xml, as in <arg value="-IXd2"/>. After that, run only ant <convert_target_name> compile jar to get incremental mode. Using the compile and jar targets avoids the clean of the project. The standard build.xml will normally delete the cvtdb/ directory during the clean target, which causes a full (non-incremental) reconversion.

Conversion now requires more memory because more AST files are loaded at one time and because there are in-memory collections (maps, sets) which are saved in the cvtdb/ folder only when a conversion phase finishes (i.e. annotations phase). Where a project needed a minimum of 8GB before, you may need 16GB now. In other applications, we now use 12GB where we previously used 8GB. Regardless, more memory will be needed. Make sure you don't starve the conversion process, otherwise the performance will be exceptionally slow. We have an idea of how to reduce these memory requirements, but it is not implemented yet.

Limitations¶

Any changes in the permanent schema(s) will require a full conversion.
After upgrading the FWD project to a new revision, if there are parser or TRPL rules changes (in the rules/ folder, which are used in the conversion phase), it is recommended to run a full conversion.
Any abend during the conversion process (for example, because a file was changed and is no longer valid 4GL code, or because of a FWD bug) will leave the cvtdb/ database in an incorrect state. After the fix, you will need to run a full conversion again.

Guidelines and Tips¶

Make sure you have carefully read the Running the Front End Conversion chapter, including the sections on Interpreting Conversion Output, Problem Resolution Process and Logging and Debugging Tips.

In addition to those guidelines and recommendations, the following points should be considered.

When running the middle part of the conversion, always make sure to run the front end as well. This means that M0 should not be run by itself, but rather it should always be paired with a front end such as F2+M0.
When running the code back end, always make sure to run the front end and middle at the same time. This means that CB should not be run by itself, but rather it should always be paired with a front end and middle such as F2+M0+CB.
The first time you attempt to run the conversion process through to the code back end, it should be expected that there may be failures. These failures will have to be resolved before the conversion process can be run to completion in a single batch.
The entire project must be run together in a single batch, at least the first time you convert. This is due to certain parts of the conversion needing to be able to look up or resolve data that will only exist if all the source files for the whole application are known. For example, to convert RUN statements, the conversion must know the Java class names that map to those procedure names. After that first full batch run, you can sometimes convert individual files or small subsets of files. As a rule of thumb, all code that has common dependencies must be converted together when there is a change.
Any changes to temp-table/work-table definitions or to any database schema requires that the entire application be reconverted in a single common batch.
The conversion front end is designed to keep going regardless of failures, but the middle and code back end will halt on any fatal problem. This means that the preprocessing and parsing issues can be found in a single run but middle or back end failures will be encountered (and thus resolved) sequentially.
The time it takes to run a full conversion primarily varies based on the size of the application being processed. For an application that is a single file, the entire conversion may only take 30 seconds. For an application with 3000 procedures and over 500 KLOC (thousands of lines of code), the conversion may take 2 hours. Larger applications will take even longer. The speed of the CPU, disk and the amount of RAM will also have an effect.
Small application conversions will not require much memory. Medium sized application conversions may take 512MB to 768MB of memory to complete (for 3000 files and over 500 KLOC). The amount of memory will increase directly in relation to the size of the application, since there is some data that must be stored and processed across the entire application, rather than on a file by file basis. Use the command line option -XmxNNNm (where NNN is the number of megabytes of memory for the Java heap) just after the java executable name.
The conversion process is single threaded. This is something that will change in the future, but for now it means that multiprocessor CPUs will not have as big an impact on conversion batch time as one might expect.
When running the conversion on a remote Linux or UNIX system, run in a screen session to allow the process to survive a session disconnect.
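The placement of the JVM options discussed in this chapter (-server and -XmxNNNm) relative to the driver's own options can be sketched as follows. This is a dry run that only prints the command line. The helper name conversion_cmd, the 1024 MB heap size, the default P2J_HOME path and the source file name are placeholders chosen for illustration.

```shell
#!/bin/sh
# Dry run: print the java command line instead of executing it.
# P2J_HOME and HEAP_MB are placeholders; adjust them for the real project.
P2J_HOME=${P2J_HOME:-/path/to/project}
HEAP_MB=1024

# conversion_cmd is a hypothetical helper, not part of FWD.
conversion_cmd() {
    # JVM options (-server, -Xmx) sit between "java" and "-classpath";
    # driver options (-D2) follow the class name and precede the mode.
    echo "java -server -Xmx${HEAP_MB}m" \
         "-classpath $P2J_HOME/p2j/lib/p2j.jar" \
         "com.goldencode.p2j.convert.ConversionDriver -D2 $*"
}

conversion_cmd F2+M0+CB src/relative/path/one.p
```

Once the printed command looks right, it can be run directly from the project root, preferably inside a screen session when working on a remote system.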

Debugging Conversion Failures¶

To get meaningful diagnosis information on a failure, use the -D2 option on the conversion command line. Using -D0 will generate no information at all and -D1 will provide very little useful information. -D3 is much too verbose for normal usage, but -D2 is highly useful. The console output is not changed unless there is a failure, in which case a reasonably detailed error report is displayed.

While failures in the front end process do not halt the conversion run, once the parser step is complete all subsequent processing is handled using the TRPL engine (the PatternEngine program). In TRPL, a failure halts the process. This leads to a cycle: conversion is run and, on a failure, the failure is debugged. Once a fix is made, the conversion is restarted and the cycle repeats as necessary.

Understanding TRPL Error Output¶

This is an example of Progress 4GL code that can cause a failure:

def var ch as longchar.

At the time of this writing, the 4GL longchar datatype is supported only by the front end conversion phase. When the Code Conversion Annotations step is reached, it will fail because it cannot determine the datatype of the ch variable. This “unsupported” example will be used below to demonstrate how the TRPL error output can be interpreted. When attempting to convert the above code (assuming it is contained in the file broken_4gl_def_var_longchar.p), the following failure will occur (this is an example of the -D2 failure output):

Code Conversion Annotations
------------------------------------------------------------------------------

Optional rule set [customer_specific_annotations_prep] not found.
./broken_4gl_def_var_longchar.p
EXPRESSION EXECUTION ERROR:
---------------------------
throwException(errmsg)
^  { Unrecognized data type VAR_LONGCHAR. [DEFINE_VARIABLE id <133143986179> 1:1] }
---------------------------
ERROR:
java.lang.RuntimeException: ERROR!  Active Rule:
-----------------------
      RULE REPORT
-----------------------
Rule Type :   WALK
Source AST:  [ def ] BLOCK/STATEMENT/DEFINE_VARIABLE/ @1:1 {133143986179}
Copy AST  :  [ def ] BLOCK/STATEMENT/DEFINE_VARIABLE/ @1:1 {133143986179}
Condition :  throwException(errmsg)
Loop      :  false
--- END RULE REPORT ---

    at com.goldencode.p2j.pattern.PatternEngine.run(PatternEngine.java:718)
    at com.goldencode.p2j.convert.ConversionDriver.processTrees(ConversionDriver.java:892)
    at com.goldencode.p2j.convert.ConversionDriver.back(ConversionDriver.java:780)
    at com.goldencode.p2j.convert.ConversionDriver.main(ConversionDriver.java:1661)
Caused by: com.goldencode.expr.ExpressionException: Expression execution error @1:1 [DEFINE_VARIABLE id=133143986179]
    at com.goldencode.p2j.pattern.AstWalker.walk(AstWalker.java:226)
    at com.goldencode.p2j.pattern.AstWalker.walk(AstWalker.java:160)
    at com.goldencode.p2j.pattern.PatternEngine.apply(PatternEngine.java:1105)
    at com.goldencode.p2j.pattern.PatternEngine.processAst(PatternEngine.java:1003)
    at com.goldencode.p2j.pattern.PatternEngine.run(PatternEngine.java:690)
    ... 3 more
Caused by: com.goldencode.expr.ExpressionException: Expression execution error @1:1
    at com.goldencode.expr.Expression.execute(Expression.java:430)
    at com.goldencode.p2j.pattern.Rule.apply(Rule.java:401)
    at com.goldencode.p2j.pattern.Rule.executeActions(Rule.java:640)
    at com.goldencode.p2j.pattern.Rule.coreProcessing(Rule.java:609)
    at com.goldencode.p2j.pattern.Rule.apply(Rule.java:440)
    at com.goldencode.p2j.pattern.Rule.executeActions(Rule.java:640)
    at com.goldencode.p2j.pattern.Rule.coreProcessing(Rule.java:609)
    at com.goldencode.p2j.pattern.Rule.apply(Rule.java:440)
    at com.goldencode.p2j.pattern.RuleContainer.apply(RuleContainer.java:488)
    at com.goldencode.p2j.pattern.RuleSet.apply(RuleSet.java:1)
    at com.goldencode.p2j.pattern.AstWalker.walk(AstWalker.java:213)
    ... 7 more
Caused by: com.goldencode.p2j.pattern.CommonAstSupport$UserGeneratedException: Unrecognized data type VAR_LONGCHAR. [DEFINE_VARIABLE id <133143986179> 1:1]
    at com.goldencode.p2j.pattern.CommonAstSupport$Library.throwException(CommonAstSupport.java:2082)
    at com.goldencode.p2j.pattern.CommonAstSupport$Library.throwException(CommonAstSupport.java:2067)
    at com.goldencode.expr.CE1538.execute(Unknown Source)
    at com.goldencode.expr.Expression.execute(Expression.java:336)
    ... 17 more

At the top is the standard header output for the conversion step (Code Conversion Annotations in this case). Below that header is the list of files that have been processed so far. The last file in the list before the error output is the file in which the problem occurred. In this case the first and last file are both broken_4gl_def_var_longchar.p so there is no ambiguity in the example.
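When the console output has been saved to a file, the "last file before the error" rule can be applied mechanically. The following is a minimal sketch, assuming the output was captured to a log file (for example by appending 2>&1 | tee conversion.log to the conversion command) and that the error banner appears exactly as in the example above. The helper name and the log file name are hypothetical.

```shell
#!/bin/sh
# last_file_before_error is a hypothetical helper, not part of FWD.
# It prints the last non-blank line that precedes the error banner.
last_file_before_error() {
    awk '
        /^EXPRESSION EXECUTION ERROR:/ { print prev; exit }
        NF { prev = $0 }    # remember the most recent non-blank line
    ' "$1"
}

# Usage (log file name is an example only):
#   last_file_before_error conversion.log
```

For the example output in this section, the helper would report ./broken_4gl_def_var_longchar.p.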

The next section of the output provides a summary of the problem:

EXPRESSION EXECUTION ERROR:
---------------------------
throwException(errmsg)
^  { Unrecognized data type VAR_LONGCHAR. [DEFINE_VARIABLE id <133143986179> 1:1] }
---------------------------

This lists the TRPL expression (the line of code) that failed. The caret ^ identifies the point in the code at which the failure occurred. In this case, the exception is thrown via the TRPL throwException API. The text inside the curly braces { } describes the problem. To identify the TRPL rule-set that threw the exception, see the How to Find the Failing TRPL Code section below. In this case, the problematic file is the annotations/variable_definitions.rules rule-set.

The next section of the output reports the kind of error that occurred. Usually it is some Java exception that was thrown. In this case it is a RuntimeException:

ERROR:
java.lang.RuntimeException: ERROR!  Active Rule:

At that point there is a dump of useful information about the state of the trees at the time of the failure. This is called the rule report:

-----------------------
      RULE REPORT
-----------------------
Rule Type :   WALK
Source AST:  [ def ] BLOCK/STATEMENT/DEFINE_VARIABLE/ @1:1 {133143986179}
Copy AST  :  [ def ] BLOCK/STATEMENT/DEFINE_VARIABLE/ @1:1 {133143986179}
Condition :  throwException(errmsg)
Loop      :  false
--- END RULE REPORT ---

In TRPL, a rule is an executable line of code. These TRPL rules are grouped by a rule type, such as a DESCENT rule or a WALK rule. The TRPL engine traverses an AST and creates events associated with that traversal. The type of event corresponds to the different rule types. When an event occurs during the traversal, the rules associated with that rule type (or event) get executed. The rule type is displayed as the first line of the rule report.

After the rule type is some diagnostic state regarding the current AST nodes being processed at the time of the failure. During traversal of the AST, the TRPL engine keeps track of the current AST node (the source node) and also provides a duplicate of that node called the copy node. The source is read-only, but the copy is a version of that same node that can be modified or deleted. When the TRPL program (called a rule-set) completes, the copy tree is saved off and made into the source tree for the next rule-set. Then a new copy is made so the next rule-set can have a tree to edit. The reason there are two trees is so that edits do not cause the tree traversal to fail. For this reason, the traversal occurs on a read-only tree and the TRPL rules always have access to that input tree, even while they are editing the output tree.

The way to interpret the current AST listing is:

[ node text ] root/path/to/the/node/ @line:column {node id}

The node text is inside square brackets [ ] and it is the text for the AST node, as read from the source file. The path listing is next. It provides a set of token types from the AST root node (as the leftmost segment of the path) through each parent and child in the ancestry chain until the current AST token type is displayed in the rightmost segment. Each parent and child in the path is separated by a / slash character. In this case the root node for the AST is a BLOCK, the current node is a DEFINE_VARIABLE type and the immediate parent node is a STATEMENT.

The line and column information is next. For schema ASTs, that data references the original schema definition file (.df file). For code ASTs, that line and column data references the preprocessed .cache file. Since that file may have many expansions (e.g. includes), the line and column data will often vary between the cache file and the original source file.

Finally, inside the curly braces { } is the AST node identifier. This is a 64-bit integer that uniquely identifies the AST node across the entire project. The upper (most significant) 32-bits is unique to the source file and is shared across all nodes in the AST created from that source file. The lower (least significant) 32-bits is unique only within the AST and is a node-specific number.
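The split of the identifier into its two halves can be checked with plain integer arithmetic. The following is a small sketch using the identifier from the rule report above. It assumes a shell whose arithmetic is 64 bits wide, which is the case for common shells on 64-bit systems. The helper name split_node_id is illustrative.

```shell
#!/bin/sh
# split_node_id is a hypothetical helper, not part of FWD.
# It prints the upper and lower 32-bit halves of a 64-bit AST node id.
split_node_id() {
    id=$1
    file_part=$(( id >> 32 ))          # upper 32 bits: shared by the whole file
    node_part=$(( id & 0xFFFFFFFF ))   # lower 32 bits: node within the AST
    echo "$file_part $node_part"
}

split_node_id 133143986179   # prints "31 3"
```

Applying the same split to other identifiers taken from the same source file yields the same upper value with a different lower value, which is a quick way to confirm that two nodes belong to the same AST.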

Since both the source and copy AST nodes are different instances of the same tree, they usually will share the same node data (such as the identifier). To understand the AST node data in more detail, please see the Reviewing the AST and Understanding AST Identifiers sections of the Resolving Parsing Issues chapter.

Below the node data is the condition that was being processed. That is the TRPL expression (line of code) that failed. It should be the same as the expression shown in the summary section that had the header EXPRESSION EXECUTION ERROR:.

The final value displayed in the rule report is whether or not the TRPL code was in a loop. TRPL provides a <while> block which allows iteration based on a control condition.

Below the rule report is any stack trace that is associated with the failure. Usually, this is a series of chained stack traces. TRPL is based on Java and runs inside the Java Virtual Machine (JVM). Thus, normal Java stack trace behavior is in use. As with all chained stack traces, usually the most important portion is the last stack trace in the chain. In this case:

Caused by: com.goldencode.p2j.pattern.CommonAstSupport$UserGeneratedException: Unrecognized data type VAR_LONGCHAR. [DEFINE_VARIABLE id <133143986179> 1:1]
    at com.goldencode.p2j.pattern.CommonAstSupport$Library.throwException(CommonAstSupport.java:2082)
    at com.goldencode.p2j.pattern.CommonAstSupport$Library.throwException(CommonAstSupport.java:2067)
    at com.goldencode.expr.CE1538.execute(Unknown Source)
    at com.goldencode.expr.Expression.execute(Expression.java:336)
    ... 17 more

Looking at this makes it clear that the exception is thrown by the TRPL code, and not by the Java code. This is because the CommonAstSupport$Library.throwException API is called from a Java class that is part of the com.goldencode.expr package and whose name has a CE prefix followed by a numeric index. Another hint that this is interpreted TRPL code is that the CE1538 class has no meaningful name and also no source code information (the standard way to compile FWD is to include the source code information in the compiled bytecode).

If this diagnostic information does not make the problem obvious, the next step is to look at two things. First, the TRPL rule-set that failed must be found and reviewed. That will allow the developer to understand what the conversion code was trying to do at the time of the failure. Second, the failing location in the 4GL source code must be found and reviewed. The Progress 4GL source code that is being processed is the primary input to the conversion processing. The TRPL code is highly dependent upon this input, so it is critical to understanding any failure.

How to Find the Failing TRPL Code¶

At this time, the error report does not include the specific TRPL rule-set filename, nor does it list any line/column information regarding where in that rule-set the problem occurred.

The key to finding the failing TRPL code is the failing expression (a TRPL rule) itself. The idea is to use tools such as grep to search through the rule-sets (which can be found in the directories listed in the patpath global configuration parameter, see the Project Setup chapter). TRPL rules can be split onto multiple lines, so the developer cannot simply search on the entire expression as an exact match to the text string. Pick some subset of the expression to use as the match criteria. In the example above, the overall expression is throwException(errmsg), which results in multiple occurrences. Instead, search for the first part of the error message, Unrecognized data type.

The filenames for rule-sets can end in a variety of extensions (e.g. .xml, .rules, .rpt) so it is best to search across all files in the patpath directories.
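The search described above can be sketched with grep. This is a minimal example, assuming the rule-sets are stored under one or more directories taken from the patpath setting. The helper name find_rule_candidates and the rules/ directory in the usage comment are assumptions about the project layout, not fixed names.

```shell
#!/bin/sh
# find_rule_candidates is a hypothetical helper, not part of FWD.
# It searches every file under the given directories for a fixed string.
find_rule_candidates() {
    pattern=$1
    shift
    # -r: recurse into subdirectories, -n: show line numbers,
    # -F: treat the pattern as literal text rather than a regular expression
    grep -rnF -- "$pattern" "$@"
}

# Usage (directory name is an assumption; use the patpath directories):
#   find_rule_candidates "Unrecognized data type" rules/
```

No file extension filter is applied, so rule-sets with any extension are covered. Each output line has the form filename:line:text, which gives both the candidate rule-set and the position to review.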

If there are too many matches in the rule-sets, the developer can try to isolate the search to the code that is specific to the conversion step being run. This table gives a mapping between the conversion step names and the directories in which the primary TRPL code can be found. Warning: it is possible (and common) to use libraries of TRPL functions that are shared across conversion steps. These libraries will usually exist in common directories outside of a step-specific location. That means that limiting the search too greatly can cause the search to be unsuccessful.

Conversion Step	Primary TRPL Rules Directory
Post-Parse Fixups	fixups/
Schema Fixups (data dictionary)	schema/
Schema Fixups (Progress source file schemas)	schema/
Call Graph Generation	callgraph/
Schema Annotations (scan Progress source code)	schema/
P2O Generation (database schema files)	schema/
P2O Generation (temp-table schema files)	schema/
Database Field Length Statistics	schema/
Generate Hibernate Mappings (database P2Os)	schema/
Generate Hibernate Mappings (temp-table P2Os)	schema/
Generate Data Model Objects (database DMOs)	schema/
Generate Data Model Objects (temp-table DMOs)	schema/
Generate Java Source Data Model Objects (DMOs)	convert/
Unreachable Code Analysis	unreachable/
Code Conversion Annotations	annotations/
Frame Generator	convert/
Business Logic Base Structure	convert/
Core Code Conversion	convert/
Generate Java Source Business Logic and Frames Classes	convert/

Once the list of possible matches has been found, examine each potential match to see if the surrounding code matches the full expression. This should reduce the list of potential matches to a small number, ideally to only a single match. Failing expressions usually include references to named variables, constants and literals that can make a line of code unique among the matches.

How to Find the Failing Progress 4GL Source¶

The current 4GL source file being processed should be the last displayed filename before the error report was displayed. In the example above, the file is listed as broken_4gl_def_var_longchar.p which means that the preprocessed version would be named broken_4gl_def_var_longchar.p.cache.

The line and column of the code being processed is usually easily seen from the source AST node information. In the example above, the DEFINE_VARIABLE node that was being processed at the time of the failure was from line 1 column 1 (1:1). Remember that this line and column information references the preprocessed cache file, not the actual Progress 4GL input file.

When the parser builds the AST based on the input source code, there are times when the parser will create an artificial AST node. An artificial node is one that did not get created based on a source code token read from the lexer. This technique is used for purposes such as aggregation of multiple source code tokens under a common artificial parent in the tree. The idea is to make the tree easier to process, by improving the structure. When one of these artificial nodes is the current source node at the time of a failure, the line and column information will usually be listed as 0:0 since there is no real backing source line and column for an artificial node. This makes it more difficult to find the failing code in the preprocessor cache file.

The trick to use in this case is to search in the AST XML file for the node identifier being used. In the example above, the source and copy nodes both use an identifier of {133143986179}. Here is a portion from the AST file named broken_4gl_def_var_longchar.p.ast:

  <ast col="0" id="133143986177" line="0" text="block" type="BLOCK">
    <annotation datatype="java.lang.Boolean" key="reachable" value="true"/>
    <ast col="0" id="133143986178" line="0" text="statement" type="STATEMENT">
      <annotation datatype="java.lang.Boolean" key="reachable" value="true"/>
      <ast col="1" id="133143986179" line="1" text="def" type="DEFINE_VARIABLE">
        <annotation datatype="java.lang.String" key="name" value="ch"/>
        <annotation datatype="java.lang.Long" key="type" value="337"/>
        <annotation datatype="java.lang.Boolean" key="vardef" value="true"/>
        <annotation datatype="java.lang.Boolean" key="reachable" value="true"/>
        <ast col="9" id="133143986181" line="1" text="ch" type="SYMBOL">
          <annotation datatype="java.lang.Boolean" key="reachable" value="true"/>
        </ast>
        <ast col="12" id="133143986184" line="1" text="as" type="KW_AS">
          <annotation datatype="java.lang.Boolean" key="reachable" value="true"/>
          <ast col="15" id="133143986186" line="1" text="longchar" type="KW_LONGCHAR">
            <annotation datatype="java.lang.Boolean" key="reachable" value="true"/>
          </ast>
        </ast>
      </ast>
    </ast>
  </ast>

By searching that file for the text 133143986179, one will find this section. The nodes surrounding the failing node will have some line and column information that can be used in the cache file.
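The lookup can also be scripted. The following is a minimal sketch, assuming the persisted AST file has one element per line with double-quoted attributes, as in the excerpt above. The helper name ast_node_position is hypothetical.

```shell
#!/bin/sh
# ast_node_position is a hypothetical helper, not part of FWD.
# It prints "line:col" for the <ast> element with the given id.
ast_node_position() {
    node_id=$1
    ast_file=$2
    # Take the first element whose id attribute matches exactly.
    element=$(grep -F " id=\"$node_id\"" "$ast_file" | head -n 1)
    [ -n "$element" ] || return 1
    line=$(printf '%s\n' "$element" | sed -n 's/.* line="\([0-9]*\)".*/\1/p')
    col=$(printf '%s\n' "$element" | sed -n 's/.* col="\([0-9]*\)".*/\1/p')
    echo "$line:$col"
}

# Usage:
#   ast_node_position 133143986179 broken_4gl_def_var_longchar.p.ast
```

For an artificial node the result would be 0:0, in which case the same helper can be applied to the identifiers of the child nodes to obtain a usable position.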

It is possible that the current node at the time of the failure does not exist in the persisted AST XML file. This can occur since there are portions of the AST that are dynamically created during the conversion process. When a failure occurs, it is not always safe to persist the current AST into the XML format. The persisted AST file is the one which was last saved, which is not necessarily the exact version that was being processed at the time of the failure. If this occurs, try to use the node text (e.g. def) to find possible locations in the file that would match. The other thing that can be very useful is the path information in the rule report. The path information may uniquely identify a particular path in the tree, or it may at least reduce the possible locations in the file to a small number. In the above example, the path was BLOCK/STATEMENT/DEFINE_VARIABLE/. The failing code referenced a DEFINE_VARIABLE node whose parent was a STATEMENT and whose grandparent node was a BLOCK. This is often enough to find the location in the source code. Once the AST nodes in question are found, use the line="NNN" and col="YYY" attributes in the surrounding nodes to map back into the preprocessor cache file.
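When the identifier cannot be found, the path and node text from the rule report can be matched against the file instead. The following is a minimal sketch, assuming that each path segment corresponds to the `type` attribute of an `ast` element. The function name `find_by_path` is a hypothetical helper, not part of FWD. The path is matched against the tail of each node's ancestry, so it also works if the reported path does not start at the root of the tree (whether that ever happens is an assumption).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the AST excerpt shown earlier.
SAMPLE = """<ast col="0" id="133143986177" line="0" text="block" type="BLOCK">
  <annotation datatype="java.lang.Boolean" key="reachable" value="true"/>
  <ast col="0" id="133143986178" line="0" text="statement" type="STATEMENT">
    <ast col="1" id="133143986179" line="1" text="def" type="DEFINE_VARIABLE">
      <ast col="9" id="133143986181" line="1" text="ch" type="SYMBOL"/>
    </ast>
  </ast>
</ast>"""


def find_by_path(root, path, text=None):
    """Return all <ast> elements whose chain of ancestor types ends with
    the given slash-separated path, optionally filtered by node text."""
    wanted = [segment for segment in path.split("/") if segment]
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    matches = []
    for node in root.iter("ast"):
        chain = []
        current = node
        # Walk upward, collecting at most len(wanted) node types.
        while (current is not None and current.tag == "ast"
               and len(chain) < len(wanted)):
            chain.append(current.get("type"))
            current = parents.get(current)
        chain.reverse()
        if chain == wanted and (text is None or node.get("text") == text):
            matches.append(node)
    return matches


root = ET.fromstring(SAMPLE)
for hit in find_by_path(root, "BLOCK/STATEMENT/DEFINE_VARIABLE/", text="def"):
    print(hit.get("id"), hit.get("line"), hit.get("col"))
```

Each match is a candidate location. If several are returned, the line and column values of the matches and their neighbors still have to be checked by hand against the preprocessor cache file.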

Next Steps¶

Once the TRPL code and the Progress 4GL code have been found, analysis of that code in light of the error will usually make the problem clear. If the TRPL code is complex, carefully looking at the Progress 4GL and/or the Java ASTs may be very important. Please see the Reviewing the AST and Understanding AST Identifiers sections of the Resolving Parsing Issues chapter for details. For more details on the TRPL language and on debugging TRPL please see the FWD Internals book.

Common Issues¶

The following is a summary of the most common issues that may be encountered.

OutOfMemoryError¶

This error means that the Java environment (inside of which TRPL is executing) has run out of memory. In particular, the memory requirements for Java and TRPL programs are largely fulfilled out of the Java "heap". The heap is a pool of memory that is managed by the JVM. Try setting a larger maximum heap size on the java command line. The parameter must appear after the java executable name but before the class name that is being executed. The parameter is in the form -XmxNNNm where NNN is the maximum size (in megabytes) of the heap. For example, -Xmx2048m sets a maximum heap size of 2048 megabytes.

Broken or Incomplete ASTs¶

Since the conversion front end does not halt on a failure, it is possible to have errors during preprocessing, lexing or parsing which will cause the resulting ASTs to be broken, incomplete or invalid. Whatever AST is there (broken or not) when the parser completes operation will be persisted to the XML format.

During later conversion phases, these broken ASTs will be very likely to cause problems or lead to unexpected results. This is the reason that conversion should not be run while the front end still has errors occurring. If the AST looks invalid or broken, do check the output of the conversion front end (in the log file) for any possible reported problems. If no errors are present, the error may have been suppressed or otherwise may have gone undetected.

Note that when the parser fails, it often tries to recover and continue parsing. Part of this recovery process entails dropping tokens that are unrecognized. This can occur for tokens currently being processed as well as some number of following tokens. As soon as the token stream starts matching expected grammar again, the dropping of tokens will stop and regular parsing will continue. The result of these dropped tokens will be missing portions of the AST where a developer would normally expect to see one or more subtrees. These missing nodes are really the same thing as a broken AST, but the breakage just manifests itself as missing content.

Broken Progress 4GL Code¶

The conversion process assumes that all Progress 4GL source code is syntactically valid. While the FWD parser will usually detect syntax problems, it is not guaranteed that such problems will be identified. As a result, it is possible for invalid 4GL code to parse and be passed through to the later conversion steps. Faced with invalid input, the conversion processing is not expected to complete successfully. Fix the broken 4GL code and re-run the conversion.

Missing Source Files¶

The conversion command line requires a list of all of the application's source files to be given. If there are missing files in that list, then there are portions of the conversion process which may break as a result. This can happen in the Code Conversion Annotations step, but it is more likely to be seen in the Business Logic Base Structure step where RUN statements that reference hard coded filenames are statically resolved to Java class names. An example of this error can be seen in the section above entitled Debugging Conversion Failures. To resolve this, add the referenced file(s) to the conversion command line and re-run conversion.

In that same Business Logic Base Structure step, Progress 4GL code that directly references a missing internal procedure will fail. This is really a case of broken Progress 4GL code, but it will appear in the same part of the process as the missing file problem.

FWD Conversion Defect¶

The FWD conversion is a complex process which operates on a complex and often cryptic 4GL language as its input. On top of this, all of the conversion processing is automated, which adds an additional set of problems. It should come as no surprise that the conversion itself can be defective.

The unreachable processing is one of the most likely steps to fail. Unreachable processing attempts to detect code that cannot ever be executed. This is an inherently difficult problem since it must by nature deal with quirky and undocumented block properties/behavior, transaction processing, UNDO processing, RETRY processing, flow of control, conditions processing as well as all of the behavior and side effects of the full range of language statements that can generate a condition which was unexpected or not properly coded.

Code Conversion Annotations (or just “annotations”) is a long and complex step which is the most likely to fail of all conversion processing. That step is responsible for the widest range of 4GL code analysis that occurs during the conversion.

The Core Code Conversion step is where semantically identical Java ASTs are created based on a parallel traversal of each Progress 4GL AST. This is the second most complex step, behind annotations. Thus this is a common place for problems to occur.

No matter where the conversion process fails, TRPL programming knowledge is needed to solve the problem. Please see the FWD Internals book for more details.

Missing or Incomplete FWD Support for a 4GL Feature¶

Although the gaps in supported features between the Progress 4GL and FWD are constantly shrinking, at any given point in time there are some features that FWD has not yet implemented or which are only partially implemented. Attempting to convert such code will almost always cause failures during the conversion processing. These failures will often be exhibited in error reports about unexpected or unrecognized AST node types. As the input is unexpected, the errors cannot be completely generalized.

There are certain edge cases or rare cases that may have been excluded from support intentionally. Usually there are explicit tests for such cases in the conversion process and a clear exception is thrown to notify the developer that there is an unsupported feature present.

The resolution for either case is to remove or replace the feature usage in the 4GL source code or to fix the conversion to handle that feature properly. Often when the conversion is enhanced to handle a new feature, the FWD runtime will also need to be enhanced to handle the new runtime requirements of that feature.
