Running the Front End Conversion

Once enough of the global project configuration is in place, it is time to run the front end of the conversion process. The conversion process is automated by a tool called the ConversionDriver. It coordinates the processing and organizes it into three phases: the front end, the middle and the code back-end. The front end of the conversion reads the application source code and translates it from the Progress 4GL language (as a programmer would write it) into an abstract syntax tree (AST). The AST is a tree-structured format that is designed to be processed by software more easily than the original 4GL source code. So the input to the front end is the application source, and the output is a set of ASTs that can then be processed further by the middle and back-end of the conversion process.

Avoiding Common Mistakes

The front end conversion process does several things, but the most important are the preprocessing of 4GL input files into properly formed 4GL code and the parsing of that 4GL code into an Abstract Syntax Tree (AST) form for programmatic/automated processing. For this reason, the front end conversion process is also often just called "parsing".

Getting 4GL source code to preprocess and parse properly is a prerequisite for automated conversion and for use of the Code Analytics tools. Both of these downstream processes rely upon the code being in AST form.

Parsing the code is often straightforward, but it does have some prerequisites. The parsing process is doing much the same processing that the OpenEdge 4GL compiler does when generating r-code. For this reason, one should expect to provide the same kinds of inputs and configuration to FWD that would be needed for an OpenEdge 4GL compile of the entire application.

Addressing the following issues before running the front end conversion process will greatly reduce the effort needed to complete a successful run. The alternative is to run the front end conversion process multiple times, clearing as many errors as possible in each run.

1. Database Schema Issues

Parsing Progress 4GL source code is so dependent upon the database schema that it is a waste of time to run the conversion front end if the database schema configuration is not properly set up. The configuration described in the previous section on Schema Loading must be carefully implemented in p2j.cfg.xml.

The key issue here is whether FWD will know about all databases that must be connected when a particular source file is being parsed. If there is a single database in use, then it should be safe to set the default="true" attribute, and that database will be loaded for all source files in the project.

If there are multiple databases, but they are always connected for all source files, then the default="true" attribute can be set for all of them and they will all be available for all source files. This will work if the schemata were designed to be mutually exclusive (there is no overlap between unqualified table names and no overlap between unqualified field names).

If there are multiple databases and some are only connected sometimes (using a CONNECT statement at runtime), then it may be important to simulate this in the FWD configuration. Two problem cases can arise.

The first problem occurs when names conflict between databases. Such a case occurs where source code references are ambiguous between two or more database schemata. This can happen when database references are not fully qualified (e.g. order.name instead of awesome-db.order.name). Progress 4GL's database name abbreviation support can further worsen the potential for schema name ambiguity (e.g. order.na can match awesome-db.order.name as well as awful-db.order.nada).

A second problem exists when the same logical database is accessed via an alias (through CREATE ALIAS) and the code is referencing that alias in qualified table names.

Just like an OpenEdge compile, FWD parsing has no runtime stack in which CONNECT and CREATE ALIAS statements have been executed. When multiple database schemata exist AND the schemata are not all connected by default, then hints must be provided to FWD to simulate the CONNECT and CREATE ALIAS statements that some of the code will be dependent upon. In the 4GL at runtime, these statements are executed before dependent programs are run, so that those programs "see" the connected databases and can access them using the expected names.

It is important to provide properly encoded hints to the FWD conversion tools, such that the proper database schemata are available to match each source file in the project. Such hints can be specified at the individual source file level, for all procedures in an entire directory, or for all source files in an entire sub-directory tree. Please see the Conversion Hints chapter for details on how to use the database (the equivalent of CONNECT) and alias (the equivalent of CREATE ALIAS) hints for this purpose.

If the application has a complex set of possibly conflicting database schemata, then it is important to define the hints early, otherwise getting the conversion front end to run will be a very long and arduous process.

2. Missing Include Files

Another common problem occurs during preprocessing. If the FWD preprocessor cannot find referenced include files, a warning message will be generated noting that there was a missing include file, but the process will continue.

Missing includes may or may not cause other downstream problems. In particular, if the lack of that included code doesn't result in a syntax problem, then parsing may complete successfully. In this case the resulting preprocessed and parsed source code will be incorrect (which will lead to functional problems later). Often, the lack of included code will manifest itself in syntax problems during parsing. Either way, it is important to avoid missing includes when those files should actually exist.

The most common cause of missing include files is an incorrect or missing PROPATH. See the earlier section on Global Configuration in this chapter for details on how to specify the PROPATH (see the propath global parameter value). Some applications depend upon dynamic assignment of the PROPATH at runtime (the modification of the PROPATH during application execution). Some of these use cases can be duplicated with the dynamic-append and dynamic-prepend global parameter values. There are additional tools for setting or overriding these values on a per file or directory basis using hints. See the chapter on Conversion Hints for details.

Another common cause of missing includes is include files that are not in source control. This is very common for applications which use ADM/ADM2 or other frameworks that are part of the OpenEdge installation. These are truly application dependencies, but they are almost always overlooked since they are usually found through implicit entries in the PROPATH. This case can also happen when 3rd party frameworks are used, if those are not present in source control and are not added to the source tree when the FWD project is created.

If there are include files referenced in the code base which are not in source control or otherwise added to the FWD project, then there can be some effort and delays associated with gathering these items. It won't be apparent that something is missing until there is an attempt to parse that generates failures. Then each failure location must be investigated. This can be quite time consuming. Once the missing files are located and obtained, parsing is re-run and new failures can be encountered. This can happen because the parsing gets further or because of new include file references in the new code that was added. This is an iterative process and one can only know that everything has been found when all code parses without error.

For this reason, it is best to identify and resolve these "hidden dependencies" before running the front end conversion process.
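A rough scan of the source tree can surface many of these hidden include dependencies before the first parsing run. The following is a minimal POSIX shell sketch and is not part of FWD. The function name find_missing_includes and the SEARCH_DIRS variable (a colon-separated stand-in for the PROPATH) are hypothetical. It assumes include references are written as {file.i ...}, and it skips preprocessor references such as {&name} and argument references such as {1}. It does not know about PROPATH changes made through hints, so its output is only a list of candidates to investigate.

```sh
#!/bin/sh
# Hypothetical helper (not part of FWD): list include references that cannot
# be found in any directory of SEARCH_DIRS, a colon-separated stand-in for
# the PROPATH. This is a heuristic; braces inside strings or comments can
# produce false matches.
SEARCH_DIRS="${SEARCH_DIRS:-./abl}"

find_missing_includes() {
  src_root="$1"
  # Take the first token after each "{", skipping {&name} and {1} style
  # references, then strip the brace and any leading whitespace.
  find "$src_root" -type f \( -name '*.[pPwWiI]' -o -name '*.cls' \) \
    -exec grep -ohE '[{][[:space:]]*[^&0-9}{[:space:]][^}{[:space:]]*' {} + |
    sed 's/^[{][[:space:]]*//' | sort -u |
    while IFS= read -r inc; do
      found=no
      old_ifs=$IFS
      IFS=:
      # Try the include name relative to each search directory in turn.
      for dir in $SEARCH_DIRS; do
        if [ -f "$dir/$inc" ]; then found=yes; break; fi
      done
      IFS=$old_ifs
      if [ "$found" = no ]; then echo "MISSING: $inc"; fi
    done
}
```

For example, with SEARCH_DIRS set to a hypothetical ./abl:./vendor/adm2, running find_missing_includes ./abl prints one MISSING: line per include name that was not found in either directory.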

3. Specifying All Source Files

This problem does not usually manifest itself as errors during the front end run. The problem here will often be found later in the automated conversion process when something tries to reference a non-existent procedure (e.g. in a RUN language statement). The problem can also manifest at runtime if the missing files are referenced via a dynamic method such as in an expression passed to RUN VALUE(). Either way, many problems can be avoided by making sure that the list of files given to the front end conversion process is complete (and that the files are actually there).

The list must contain all procedure or class files that could possibly be executed in the application (either as the main program at the start of the user's session or indirectly from other code). If the application has a mixture of .w, .p and other procedure or class files, then this must be handled when providing the list of procedures to the ConversionDriver.

There are 3 ways to pass the list of program names to the ConversionDriver. Please see Running the Command for details.
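Static RUN statements offer a partial way to check a file list for completeness. The sketch below is hypothetical (POSIX shell, not an FWD tool). check_run_targets scans the sources for literal RUN name.p or RUN name.w targets and reports any that do not appear, as a path suffix, in a list file such as the one used with the -f option. It cannot see RUN VALUE() targets, internal procedures or class references. It also only approximates the real lookup, which goes through the PROPATH.

```sh
#!/bin/sh
# Hypothetical check (not part of FWD): report literal RUN targets that do
# not appear in a list file. Only targets written as name.p or name.w are
# found; RUN VALUE(...), internal procedures and classes are not.
check_run_targets() {
  src_root="$1"
  list_file="$2"
  find "$src_root" -type f \( -name '*.[pPwW]' -o -name '*.cls' \) \
    -exec grep -ohiE 'RUN[[:space:]]+[A-Za-z0-9_./-]+[.][pw]' {} + |
    sed 's/^[Rr][Uu][Nn][[:space:]]*//' | sort -u |
    while IFS= read -r target; do
      # Escape "." so the target is matched literally, then accept list
      # entries that equal the target or end with "/target".
      pattern=$(printf '%s' "$target" | sed 's/[.]/[.]/g')
      if ! grep -qiE "(^|/)${pattern}\$" "$list_file"; then
        echo "NOT IN LIST: $target"
      fi
    done
}
```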

4. Runtime Preprocessor Argument Usage

When runtime preprocessor arguments are used extensively in the code, effort is required to encode the "hints" that tell FWD how to preprocess the target procedures. Without this preprocessing, those procedures usually cannot be parsed. Encoding these hints requires a 4GL application developer who can provide one concrete example of the arguments for each location in the code that is called in this manner.

5. OO 4GL and .NET Class Usage

When OO 4GL code depends on external classes (built into OpenEdge itself or from 3rd parties), the FWD parser may not have enough information to parse these references. To resolve this, stub versions (called "skeletons") of any such classes/interfaces must be provided to allow this code to parse.

See Parsing OO 4GL for more details.

6. FWD Defects or Missing Code

Defects can exist in the FWD preprocessor or parser. Although the FWD parser has been used successfully with a large number of applications covering a wide range of features, a new application may trigger latent bugs in the FWD code. When this happens, the problem can most often be bypassed easily with minor changes to the 4GL sources. However, some problems may need to be fixed in FWD itself, which would take longer.

See Preprocessing Issues and Parsing Issues for more details.

Running the Command

The following discussion assumes that the user has a shell (command prompt) open and that the current directory is the project root directory ($P2J_HOME).

The ConversionDriver program manages the entire automated conversion process. The portions of the process to be run are chosen by command-line parameters that specify the “modes” to be executed.

To run the front end, the best mode to use is F2. Later on when a full conversion is being run, this mode parameter will be modified to include other portions of the process, the most common being F2+M0+CB. But for this case, F2 is most appropriate. This forces the front end to run with the schema loader, preprocessor, lexer, parser and AST persistence.

This process requires another input (besides the mode) which is the list of procedure files to be processed. This corresponds to the list of .p files for the application. The actual file extension doesn't have to be .p, it can be anything, just as in Progress.

There are three ways to specify the files to be processed. Regardless of the mechanism used for specifying the list of source files, you must make sure that EVERY external procedure and class needed by the application (at any time in its processing) is included in the list. This means all code used for ChUI, GUI, batch, appserver/PASOE, REST and SOAP/WSA. This also means any downstream procedure/class dependencies called from that code need to be specified too. For ADM/ADM2 applications this means using Possenet and including some of that code in the conversion list.

Explicit Filenames on the Command Line

An explicit list of 4GL procedures (one or more relative or absolute filenames). This is the default approach. The command line will contain an arbitrary list of absolute and/or relative file names to scan. This list is hard coded in the command line itself. Any number of files may be listed on the command line, subject to the shell's command line limits.

Example:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver f2 ./abl/relative/path/one.p ./abl/relative/path/two.p ./abl/relative/path2/three.p

File Specification

A top level directory and a file specification (as a regular expression) for the matching procedure files in the containing directory tree. The top level directory is the directory tree to search for files. The file specification is the filter that will be used to create a list of matching files from that directory tree. The file specification must be enclosed in double quotes if either of the wildcard characters * or ? are used. This mode is naturally recursive (it will process all matching files found in all levels of sub-directory of the top directory) unless the N option is passed. To use this approach, pass -s on the command line.

Example:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -s f2 ./abl/top/directory/to/process/ "(*.[pPwW]|*.cls)" 

The -s option enables the non-default file specification mode. The next two parameters specify the directory to process and the file specification that is used to match external procedures.

Filename List in a Text File

The input for this mode is the filename of a text file that contains an explicit list of the procedure and class filenames to parse or convert. The parameter must be a single relative filename, and the text file must contain the relative file names of the files to scan. The file list is read from the specified file instead of being hard coded on the command line. There must be one filename per line in the file, and there is no limit on the number of files in the list. To use this approach, pass -f on the command line.

Example:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -f f2 my_file_list.txt

The -f option enables file list command line processing.

An example of the content for my_file_list.txt is:

./abl/my_app/module1/some-file.p
./src/my_app/module1/another.p
./src/my_app/module2/different-file.p

Please follow these rules:

  • There is one file name per line.
  • All filenames are relative to the project root.
  • Starting in FWD v4:
    • The UNIX/Linux path separator character / is always used. The tools will convert this to the local platform's path separator as needed.
    • A # as the first character on a line will treat the entire line as a comment (it will be ignored). A # character anywhere else will be considered part of the filename.
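A list file that follows these rules can be generated instead of written by hand. This is a hypothetical sketch (make_file_list is not an FWD command). It assumes the function is run from the project root, so that find prints paths that begin with ./ and use / separators. The leading # comment line relies on FWD v4 or later.

```sh
#!/bin/sh
# Hypothetical helper (not an FWD command): write a file list for the -f
# option. Run it from the project root and pass a directory that starts
# with "./" so every entry is a relative path using "/" separators.
# The leading "#" comment line needs FWD v4 or later; remove it otherwise.
make_file_list() {
  src_dir="$1"
  out_file="$2"
  {
    echo "# generated by make_file_list from $src_dir"
    find "$src_dir" -type f \( -name '*.[pPwW]' -o -name '*.cls' \) | sort
  } > "$out_file"
}
```

For example, make_file_list ./abl my_file_list.txt followed by the -f f2 my_file_list.txt command shown above. Any files that the find expression does not select (such as E4GL .htm or .html files) would still need to be added.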

Command Line Specification Plus Ignore List in a Text File

This command line option is only available starting with FWD v4.

This feature combines a command line file specification with a custom file ignore/exclude list. The ignore/exclude list is a simple text file. Each line in this file specifies a file or set of files to be excluded from the set of files that would be gathered by the command line specification. All file specifications in the text file must be relative to the project root directory.

There must be one ignore/exclusion specification per line in the file, and there is no limit on the number of files in the list. Activate this with the -x option to the ConversionDriver and provide 3 parameters (in this order):

  • The top level directory is the directory tree to search for procedure and class files.
  • The file specification (as a regular expression) is the filter that will be used to create a list of matching files from that top level directory tree. The file specification must be enclosed in double quotes if either of the wildcard characters * or ? are used. A common regular expression for this is "(*.[pPwW]|*.cls)".
  • The name of the ignore/exclude list file.

Example:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -x f2 ./4gl/top/directory/to/process/ "(*.[pPwW]|*.cls)" my_ignore_list.txt

An example of the content for my_ignore_list.txt is:

./abl/skeleton/*
./abl/my_app/module1/a_file_to_exclude.p
./abl/my_app/module2/some/broken/subtree/*
./abl/my_app/a-dif???ent-file.cls

Please follow these rules:

  • Each line must reference either a single file name, or a set of zero or more files using the following wildcard characters:
    • the asterisk (*) symbol, to represent zero or more wildcard characters;
    • the question mark (?) symbol, to represent exactly one wildcard character.
  • There is one relative file name or specification per line.
  • All filenames/paths are relative to the project root.
  • A specification such as ./abl/skeleton/* excludes everything in the ./abl/skeleton/ directory, including all subdirectories and their contents.
  • Blank lines are ignored.
  • The UNIX/Linux path separator character / is always used. The tools will convert this to the local platform's path separator as needed.
  • A # as the first character on a line will treat the entire line as a comment (it will be ignored). A # character anywhere else will be considered part of the filename.
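The wildcard rules above can be checked before a run by applying an ignore list to a candidate set of files. The sketch below is hypothetical and is not how FWD implements the -x option. It relies on shell case patterns, where * matches any sequence of characters including /, and ? matches exactly one character, which reproduces the subtree behavior described for ./abl/skeleton/*. Note that shell patterns also treat [ as special, which the rules above do not mention.

```sh
#!/bin/sh
# Hypothetical sketch (not the FWD implementation of -x): apply ignore-list
# rules to candidate paths. In shell case patterns, * matches any characters
# including "/" (so "dir/*" covers a whole subtree) and ? matches exactly
# one character. Blank lines and lines starting with "#" are skipped.
is_ignored() {
  path="$1"
  ignore_file="$2"
  while IFS= read -r spec || [ -n "$spec" ]; do
    case "$spec" in
      '' | '#'*) continue ;;
    esac
    # $spec is deliberately unquoted so that * and ? act as wildcards.
    case "$path" in
      $spec) return 0 ;;
    esac
  done < "$ignore_file"
  return 1
}

# Read candidate paths on stdin and print only those not excluded.
filter_ignored() {
  while IFS= read -r candidate; do
    is_ignored "$candidate" "$1" || printf '%s\n' "$candidate"
  done
}
```

For example, find ./abl -type f -name '*.p' | filter_ignored my_ignore_list.txt lists the procedures that would remain after exclusion.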

Other Notes

Syntax help is available by running the ConversionDriver without any parameters. The same details are documented in the ConversionDriver javadoc for the main() method.

Some applications may be a mixture of regular 4GL source (e.g. usually files named with extensions .w and .p) as well as WebSpeed procedures. In particular, Embedded 4GL (E4GL) WebSpeed code will usually be stored in files with the .htm or .html extension. The command line must properly include all these files, even if there is a mixture. In the case of the E4GL source files, FWD includes an E4GL preprocessor that runs before the Progress 4GL-compatible preprocessor. This is handled automatically based on the filename of the input file.

Interpreting Conversion Output

The ConversionDriver will emit output to the console regarding the status and results of the conversion processing. The following assumes there are two Progress 4GL input files (named first.p and second.p), a single database schema named test.df and a metadata schema called standard.df to be processed. Given these inputs, the output will look something like the following:

------------------------------------------------------------------------------
FWD Conversion Driver
------------------------------------------------------------------------------

------------------------------------------------------------------------------
SchemaLoader
------------------------------------------------------------------------------

Importing 'standard.df' for schema 'standard'...
Persisted schema 'standard' to 'standard.dict'
Importing 'test.df' for schema 'test'...
Persisted schema 'test' to 'test.dict'

------------------------------------------------------------------------------
Scanning Progress Source (preprocessor, lexer, parser, persist ASTs)
------------------------------------------------------------------------------

first.p
second.p

------------------------------------------------------------------------------
Post-Parse Fixups
------------------------------------------------------------------------------

./first.p
./second.p
Elapsed job time:  00:00:01.380

------------------------------------------------------------------------------
Schema Fixups (data dictionary)
------------------------------------------------------------------------------

./data/namespace/test.dict
Elapsed job time:  00:00:00.610

------------------------------------------------------------------------------
Elapsed job time:  00:00:08

This output is abbreviated since this is just the conversion front end. If this were a full conversion, a much longer list of phases would be output. In this case, there are only 4 phases: Schema Loader, Scanning Progress Source, Post-Parse Fixups and Schema Fixups. Each phase has its own section of the output, in which the name of each input file is displayed as that file is processed. At the end of the section, an elapsed job time is displayed in the format hours:minutes:seconds.milliseconds.

While a given file name XYZ is the most recent file name displayed in a section of output, that input file is being processed. Any errors that are displayed after that file name (but before the next input file name ABC) are problems that occurred while processing XYZ in the FWD tools for that given phase. The conversion process is designed to continue processing all input files in all requested phases, regardless of whether any errors occur. If this were not done, it would be impossible to judge how close a project is to completion, since processing would abort at the first error. In addition, this allows multiple problems to be identified at once, so that they can be resolved simultaneously.

If a given input file has errors in earlier phases, then it is expected that later phases will also fail. There is usually no point in looking into failures in later phases. Instead, it is the first failure(s) for a given file that must be resolved. Note that it is possible for there to be multiple different problems for the same file.
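When the console output has been saved to a log (see Logging and Debugging Tips), the attribution rule described above can be applied mechanically. The following hypothetical awk-based sketch is inferred only from the sample output shown earlier in this section. It treats the line after a dashed rule as a phase title, and a line ending in .p, .w, .cls, .htm, .html or .dict as an input file name. Any other non-blank line is treated as a message belonging to the most recent file in that phase. Real output may contain lines that break these assumptions, so the result is a reading aid rather than a definitive report.

```sh
#!/bin/sh
# Hypothetical log reader, inferred only from the sample output in this
# section: the line after a dashed rule is a phase title, lines ending in a
# source or .dict extension are input file names, and any other non-blank
# line is printed as "phase | file | message" for the most recent file.
summarize_log() {
  awk '
    /^-+$/ { rule = 1; next }                             # dashed separator
    rule && NF { phase = $0; file = ""; rule = 0; next }  # phase title
    { rule = 0 }
    /^[ \t]*$/ { next }                                   # blank line
    /^Elapsed job time:/ { next }                         # timing line
    /\.([pPwW]|cls|html?|dict)$/ { file = $0; next }      # input file name
    file != "" { print phase " | " file " | " $0 }
  ' "$1"
}
```

Messages that appear before any file name in a phase (for example, the Schema Loader import messages) are not printed by this sketch.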

Problem Resolution Process

The Schema Loading section describes how to get the Schema Loader phase of the conversion front end to run successfully.

The Scanning Progress Source phase is where the preprocessor, lexer and parser steps are run. The chapters Resolving Preprocessing Issues and Resolving Parsing Issues provide more details on how to handle issues during this phase.

The Post-Parse Fixups and Schema Fixups are not normally phases in which problems occur. If problems do occur there (without any corresponding failures in earlier phases), then it is very likely that there is a bug or missing feature in the FWD tools for these phases. See the book FWD Developer Guide for details on debugging the FWD tools.

Once the conversion run completes, fixes, configuration updates or other resolutions can be put into place. Normally, to test a specific resolution, a developer would run the ConversionDriver with only the file(s) affected by the resolution. This will show if that resolution works or not. Once it works, a more complete run of the entire project can be executed, or resolutions for other problems can be added and then tested with limited runs of the conversion front end.

Some simple rules of thumb regarding running the conversion tools:

  • Only run one conversion tool or process at a time per project.
  • Do not modify files in the project while the conversion is running.
  • Do not rebuild the FWD jar file while the conversion tools are running.
  • You can manually abort the conversion processing at any time using CTRL-C or some other signal processing (e.g. kill in Linux or UNIX) in the console in which the FWD tools are running. This will immediately abort the processing. Please note that the results will be incomplete and any results currently being processed may be partial or otherwise corrupted.

Logging and Debugging Tips

The ConversionDriver can be configured to generate debugging output when problems occur. To set the debug level, use the D option with a numeric level between 0 and 3 inclusive. 0 means no debug output and 3 is very verbose (massive and constant tracing of all processing). The most useful level is normally D2.

For example, D2 is combined with the -S option that is already present in this command line:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -SD2 F2 src/top/directory/ "*.[pPwW]" 

All of this debugging output and any normal output are written to the standard output (STDOUT) and standard error (STDERR) of the shell. This makes it possible, and valuable, to log that output for future review.

In Linux or UNIX, a common logging technique is to add the following to the end of the command line:

2>&1 | tee "cvt_front_end_$(date '+%Y%m%d_%H%M%S').log"

This redirects the STDERR output to the same destination as STDOUT and then both outputs are piped into the tee program. tee displays all input on the console and simultaneously writes it to the filename specified (cvt_front_end_YYYYMMDD_hhmmss.log in this example). The YYYYMMDD_hhmmss will be a timestamp generated when the process is started. This allows the user to see the output in real time and also save that output for future reference.
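The same technique can be wrapped in a small shell function so that the timestamped log name is built in one place. This is a hypothetical helper (run_logged is not part of FWD). Because "$@" passes each argument through unchanged, a quoted file specification such as "*.[pPwW]" reaches the ConversionDriver intact. Plain POSIX sh has no pipefail option, so the sketch records the command's exit status in a temporary file instead of losing it to the pipeline.

```sh
#!/bin/sh
# Hypothetical wrapper (not part of FWD) for the tee technique above: run a
# command with STDERR merged into STDOUT, show the output on the console and
# save it to PREFIX_YYYYMMDD_hhmmss.log. POSIX sh has no pipefail, so the
# command's exit status is passed back through a temporary file.
run_logged() {
  prefix="$1"
  shift
  log_name="${prefix}_$(date '+%Y%m%d_%H%M%S').log"
  rc_file=$(mktemp)
  { "$@" 2>&1; echo "$?" > "$rc_file"; } | tee "$log_name"
  rc_val=$(cat "$rc_file")
  rm -f "$rc_file"
  echo "log written to $log_name" >&2
  return "$rc_val"
}
```

For example:

run_logged cvt_front_end java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -SD2 F2 src/top/directory/ "*.[pPwW]"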

For example:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -SD2 F2 src/top/directory/ "*.[pPwW]" 2>&1 | tee "cvt_front_end_$(date '+%Y%m%d_%H%M%S').log"

For more specifics on resolving issues that occur during front end conversion processing, please see the next two chapters: Resolving Preprocessing Issues and Resolving Parsing Issues.


© 2004-2017 Golden Code Development Corporation. ALL RIGHTS RESERVED.