Reporting

This chapter is only relevant for FWD v3.0 and earlier releases. As of FWD v3.1, the reporting tools described here are no longer available; they have been replaced in their entirety by Code Analytics. The Simple Search and List the Files Which Contain a Pattern Match tools are still available, but they are rarely used since the Code Analytics tools provide much more capability. This chapter is kept here for historical reference and for use with older releases.

The FWD conversion technology is largely based on a language called TRee Programming Language (TRPL). This language allows one to encode programs that inspect, analyze, search, modify and otherwise completely control abstract syntax trees (ASTs). This processing can be applied to a file set of arbitrary size on a non-interactive basis. This means that such processing is completely automated and is suitable for even the largest software projects.

One of the features FWD has built upon the TRPL language is a very powerful reporting facility. This facility can be used to search for and report on very complex code patterns which are highly specific to the Progress 4GL. In other words, all processing is highly 4GL syntax-aware, which makes this facility extremely useful since it is not limited to simple regular expression searches (e.g. grep). It can be used to calculate statistics, search for all instances of very specific patterns that would otherwise be impossible to find, and analyze, categorize and otherwise deeply inspect a Progress 4GL software project.

While not strictly a reporting tool, FWD includes a simple search feature that can be used to quickly find complex patterns across an entire project. This search tool is one of the facilities described in this chapter.

Beyond simple searching, FWD also includes a reporting facility. The reporting tools search a set of project files based on specific criteria. Matches to the criteria are categorized and tabulated into HTML report documents.

There are three ways of using the reporting facilities:

  • Canned Reports - FWD includes sets of predefined reports which can be run with no coding necessary. The results are stored in a set of hyperlinked HTML documentation which provides a valuable and unique insight into the project upon which the reports were run.
  • Ad-Hoc Reports - Search for arbitrary Progress 4GL code patterns across an entire project by specifying a custom match expression which can access the full range of 4GL syntax elements including the structure and relationship of AST nodes, text, token type, annotation data and much more.
  • Custom Reports - Create your own reports and leverage the full capabilities of the reporting facility and TRPL.

This chapter explains how to use each of these reporting approaches. All examples in this chapter will assume that the project root directory can be found in the shell variable $P2J_HOME.

Prerequisites

The search and reporting tools take a set of one or more ASTs as input. This means that the conversion front end must have been successfully run across the entire set of source files upon which reporting is to be done. When reporting is used to examine an entire project, it is very important to have successfully completed the conversion front end on the entire project. Any failure during the conversion front end will usually result in an AST that is missing or broken. For this reason, it is important to eliminate all errors reported during the front end processing. After all errors are resolved, make sure you have run the conversion front end across all source files in the project in a single pass (or batch). This is not strictly necessary, since it is possible to have run the front end in multiple passes so long as all source files have been successfully processed. But as a rule, always re-run the front end across all source files in the entire project any time the project configuration is changed, a database schema changes or the FWD code is changed. To do otherwise may cause a range of subtle problems in the results.

Please see the Conversion Environment Setup and Project Setup chapters for details on this processing. When the front end processing is complete, the reporting tools can be used.

If a project has been fully converted, or has had any of the middle or back end conversion processing executed, the canned reports may not work properly or may produce unexpected results. The reason is that after the front end of the conversion process, the input ASTs are heavily modified and augmented in ways that can obscure the original intent of the 4GL source code from the perspective of the canned reports. The canned reports were designed to be used on ASTs as they exist when the front end conversion process completes. If the ASTs have been processed by later conversion phases, it is important to run the front end (by itself) on the project before reporting is done. A copy of the project can be made to provide an isolated environment for reporting: just duplicate the entire directory tree of the project under a different file system name and then re-run the conversion front end. That copy is then safe for use with the canned reports.
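The duplication step can be sketched as a generic shell recipe. This is not an FWD tool, and the directory names below are illustrative only; a throwaway directory stands in for a real project root:

```shell
# Make an isolated copy of a project tree for reporting.  A scratch
# directory stands in for $P2J_HOME here; with a real project, copy
# the actual project root instead.
demo=$(mktemp -d)
mkdir -p "$demo/project/src"
echo 'message "hi".' > "$demo/project/src/hello.p"

# Duplicate the entire directory tree under a different name...
cp -a "$demo/project" "$demo/project_reporting"

# ...and confirm the copy is identical.  The copy is where the
# conversion front end would be re-run before running the reports.
diff -r "$demo/project" "$demo/project_reporting" && echo "copies match"
```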

Simple Search

FWD provides a search tool which is like a version of grep that is fully aware of the syntax and structure of the Progress 4GL. To leverage this, the user must know how to specify the search condition using a TRPL expression. The TRPL syntax is quite similar to Java, but it includes many extensions which provide easier access for inspection, analysis and modification of ASTs. For details on how to program the TRPL language for use with Progress 4GL ASTs, please see the FWD Internals book.

Start by changing directories to the project root:

cd $P2J_HOME

For the purposes of understanding search, the expression type == prog.STRING can be used to match all AST nodes that have the STRING token type. The STRING type is associated with all string literals in a Progress 4GL program (no matter whether they are delimited with single quotes or double quotes).

Here is an example command:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.PatternEngine criteria="'type == prog.STRING'" reports/simple_search ./src/ "*.[pPwW].ast" 

This runs the TRPL engine passing the search expression as a variable named criteria to a TRPL program named reports/simple_search. The TRPL engine will run that search program against all ASTs found in the ./src/ directory tree which came from source files that had an extension of .p, .P, .w or .W.
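For repeated searches it can be convenient to wrap this long command line in a small shell function. The function below is a hypothetical convenience wrapper, not part of FWD; it assumes $P2J_HOME is set and reuses the jar path and arguments from the example above:

```shell
# Hypothetical wrapper around the simple search; takes the TRPL match
# expression as its single argument and supplies the inner single
# quotes that the criteria value requires.
trpl_search()
{
   java -classpath "$P2J_HOME/p2j/build/lib/p2j.jar" \
        com.goldencode.p2j.pattern.PatternEngine \
        criteria="'$1'" \
        reports/simple_search ./src/ "*.[pPwW].ast"
}

# Usage (requires a converted FWD project):
#   trpl_search 'type == prog.STRING'
```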

Running the PatternEngine with no parameters will provide a syntax display. For more details on the syntax of the PatternEngine, please see the Running Ad-Hoc Reports section of this chapter.

Assume there is a file named hello.p with the following content:

def var txt as char init "World".
message "Hello " + txt + "!".

To see an example of the XML AST file created for this code, see the Reviewing the AST section of the chapter on Resolving Parsing Issues.

After the PatternEngine command is executed, each match of our condition is displayed on the command shell console (STDOUT). Here is sample output of the search:

./hello.p

"World" [STRING] @1:26

"Hello " [STRING] @2:9

"!" [STRING] @2:26

Elapsed job time:  00:00:00.854

This shows that there are 3 string literals in the hello.p program. Perhaps the user wants to know which string literals are being used with the string concatenation operator (+). The expression for this is type == prog.STRING and parent.type == prog.PLUS so the command would look like this:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.PatternEngine criteria="\"type == prog.STRING and parent.type == prog.PLUS\"" reports/simple_search ./src/ "*.[pPwW].ast" 

This is the output:

./hello.p

"Hello " [STRING] @2:9

"!" [STRING] @2:26

Elapsed job time:  00:00:00.856

As can be seen, only 2 of the 3 literals are used as operands to the string concatenation operator. This is just the merest glimpse of the power and flexibility of TRPL in regards to searches of ASTs. More details are beyond the scope of this book.

The TRPL code for this search is stored in reports/simple_search.xml which is found in the patpath (see the Project Setup chapter in regards to this Global Configuration parameter). As with all TRPL programs, it is an XML file. This TRPL program just evaluates the contents of the variable criteria and, for each match, it dumps the matched subtree in a human readable form to the console. Inside simple_search.xml, criteria is a variable of type java.lang.String. The pattern engine command line MUST have a parameter specified to initialize the value of this variable. The syntax is

criteria="'valid_TRPL_expression'" 

The tricky part here is that the outer double quote (") characters are removed by the operating system shell BUT TRPL does need the inner single quote (') characters to be part of the value. These are needed for the pattern engine to successfully assign the value of criteria which must be of type java.lang.String. Enclosing the value in single quotes makes this possible.

If the enclosed expression itself must have other string literals encoded inside it, backslash-escaped double quote characters should be used. For example, this expression matches all usage of the RETRY built-in function across the entire application:

criteria="'type == prog.func_logical and this.isAnnotation(\"oldtype\") and getNoteLong(\"oldtype\") == prog.kw_retry'" 

The benefit of this should be obvious: the text RETRY will appear thousands of times in a typical 4GL application, but only by understanding the 4GL language syntax and context can a tool differentiate between the different usages.
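The quoting layers can be verified without running FWD at all, by letting the shell print the single argument that the JVM would actually receive:

```shell
# printf shows the argument that survives shell processing.  The outer
# double quotes are stripped, the \" escapes become literal double
# quotes, and the inner single quotes remain part of the value.
printf '%s\n' criteria="'type == prog.STRING'"
# -> criteria='type == prog.STRING'
printf '%s\n' criteria="'getNoteLong(\"oldtype\") == prog.kw_retry'"
# -> criteria='getNoteLong("oldtype") == prog.kw_retry'
```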

List the Files Which Contain a Pattern Match

Sometimes getting a simple list of files that match a condition is more important than getting the detailed listings of every match location (see Simple Search above). This can often be the case for conditions (patterns) that are so complex (and possibly inexact), that a human needs to review the files manually.

FWD provides a listing tool which allows the same kind of search condition (using a TRPL expression) as the Simple Search. Please refer to the section above for an example. The following will show how to run the file listing report.

Start by changing directories to the project root:

cd $P2J_HOME

The example condition will use a function which attempts to detect any code that is interactive. It will search for user-interface code that cannot be redirected, but instead requires a user to interact with the runtime.

Here is an example command:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.PatternEngine condition="'evalLib(\"interactive_ui_code\", this)'" reports/rpt_file_list ./src/ "*.[pPwW].ast" 

This runs the TRPL engine passing the search expression as a variable named condition to a TRPL program named reports/rpt_file_list. The TRPL engine will run that search program against all ASTs found in the ./src/ directory tree which came from source files that had an extension of .p, .P, .w or .W.

Note: The name of the variable is different from the simple search case (where it is named criteria).

Running the PatternEngine with no parameters will provide a syntax display. For more details on the syntax of the PatternEngine, please see the Running Ad-Hoc Reports section of this chapter.

After the PatternEngine command is executed, each file which has at least 1 match of our condition will have its filename output into a text file named matched_filename_list.txt. By passing an additional parameter outputFilename="'my_custom_filename.whatever'", the output filename can be customized.

When processing database schema ASTs, add databaseMode="true" to the command line. This will enable the original source file to be properly found for temp-table definitions. For example:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.PatternEngine databaseMode="true" outputFilename="'my_custom_name'" condition="'evalLib(\"interactive_ui_code\", this)'" reports/rpt_file_list ./src/ "*.[pPwW].ast" 

Note: The databaseMode parameter is not used in this report in FWD v3.3 and later, and is ignored by the program. Instead, the database program name is automatically detected and used if it exists.

The TRPL code for this search is stored in reports/rpt_file_list.xml which is found in the patpath (see the Project Setup chapter in regards to this Global Configuration parameter). As with all TRPL programs, it is an XML file. This TRPL program just evaluates the contents of the variable condition and for each match, it adds the filename of the source file to a set. Once all ASTs have been processed, the set of files is output to a text file, whose name can be customized. Inside the rpt_file_list.xml, condition and outputFilename are variables of type java.lang.String. The pattern engine command line MUST have a condition parameter specified to initialize the value of that variable. The syntax is

condition="'valid_TRPL_expression'" 

The tricky part here is that the outer double quote (") characters are removed by the operating system shell BUT TRPL does need the inner single quote (') characters to be part of the value. These are needed for the pattern engine to successfully assign the value of condition which must be of type java.lang.String. Enclosing the value in single quotes makes this possible.

If the enclosed expression itself must have other string literals encoded inside it, backslash-escaped double quote characters should be used. For example, this expression matches all usage of the RETRY built-in function across the entire application:

condition="'type == prog.func_logical and this.isAnnotation(\"oldtype\") and getNoteLong(\"oldtype\") == prog.kw_retry'" 

Running Canned Reports

Setup

Once the front end conversion (and no other conversion phases: see the Prerequisites section above) has been run without errors, the ASTs are ready for reporting.

The next step is to create an output directory in which all of the reports will be generated (example command for Linux/UNIX):

cd $P2J_HOME
mkdir rpt/

Then create a symbolic link from the top-level source code directory to the location of the database schema AST files (schemas have ASTs just like 4GL source files). This is an example command for Linux/UNIX (assuming that all source code is under the directory $P2J_HOME/src/):

cd $P2J_HOME/src
ln -s $P2J_HOME/data/namespace data

This symbolic link is necessary to enable a simple file specification to be able to reference the schema AST files for the permanent database schema(s) as well as those AST files for the temp-table (and work-table) schemas. The database schemas reside in $P2J_HOME/data/namespace/. The temp-table schemas reside alongside the 4GL source files from which they are derived, which is normally $P2J_HOME/src/. By creating this link, there is a single subdirectory tree in which both sets of schema ASTs can be found.
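The effect of the link can be demonstrated with a throwaway directory tree (the file names below are illustrative): after linking, a single search rooted at src/ reaches both kinds of schema ASTs.

```shell
# Recreate the layout in a scratch directory: permanent schemas under
# data/namespace/, temp-table schemas alongside the 4GL sources.
demo=$(mktemp -d)
mkdir -p "$demo/data/namespace" "$demo/src"
touch "$demo/data/namespace/test.schema" "$demo/src/first.p.schema"

# The symbolic link from the source tree to the schema directory.
ln -s "$demo/data/namespace" "$demo/src/data"

# A single file specification rooted at src/ now finds both schemas
# (-L makes find follow the symbolic link).
find -L "$demo/src" -name "*.schema"
```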

Reporting Syntax

After this setup, the canned reports can be processed. There is a command line Java program called ReportDriver that is used to generate the reports. Here is the syntax for the two possible modes of operation:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.ReportDriver [options] <rptdef> <outdir> <filelist>

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.ReportDriver -S[options] <rptdef> <outdir> <directory> "<filespec>" 

The first mode (which is the default) requires an explicit list of AST filenames to process. The second mode uses a top-level directory and a file specification to find matching files in that directory tree.

Options include the following:

Option Meaning
Dn Set the debug level to n, which must be a numeric digit between 0 and 3 inclusive. D2 is the most useful setting for normal debugging since it will emit a better report on a failure but otherwise does not change the output. D3 is very verbose and is unsuitable for normal use.
S Use a top-level directory and a file specification instead of the default (explicit file list).
N Disable recursion in file specification mode (the S option).

The rptdef parameter is required in both modes. It defines the XML file that contains the set of report definitions to use. This XML file with an extension of .rpt must be found somewhere in the pattern path (see the patpath parameter in the Global Configuration section of the Project Setup chapter). The .rpt file extension must not be specified (it is added automatically by the report driver).

The outdir parameter is required in both modes. It defines the directory in which all report output will be written.

In the default mode, the filelist parameter is an arbitrary list of absolute and/or relative AST file names to scan. Any file name extension must be specified in this case (e.g. .schema or .ast) since the ReportDriver has no knowledge of how to convert input file names into AST names.

In filespec mode (the S option), the directory parameter specifies the relative or absolute path to the top-level directory in which AST files will be found. The filespec parameter defines the filter specification of the filenames in the given directory to scan. Most operating system shells will require this specification to be enclosed in double quotes (or the equivalent for that shell) if any of the wildcard characters * or ? are used in the specification. Such quoting is only needed in filespec mode, since in default mode the shell should be given the opportunity to expand such usage into a list of explicit filenames.
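The difference is easy to demonstrate: unquoted, the shell expands the wildcard into explicit filenames before the program ever sees it; quoted, the program receives the specification itself. A scratch directory stands in for a real AST tree here:

```shell
# Scratch directory with two candidate AST files.
demo=$(mktemp -d)
touch "$demo/a.p.ast" "$demo/b.p.ast"
cd "$demo"

printf 'arg: %s\n' "*.ast"   # quoted: one literal argument
# -> arg: *.ast
printf 'arg: %s\n' *.ast     # unquoted: expanded to two filenames
# -> arg: a.p.ast
# -> arg: b.p.ast
```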

There are 2 sets of reports: schema reports and code reports. Both are of great interest when reviewing a project.

Before running either set of reports, make sure your current directory is the project root directory:

cd $P2J_HOME

Schema Reports

To run the schema reports:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.ReportDriver -sd2 reports/schema rpt/schema src/ "*.schema" 2>&1 | tee schema_rpts.log

The first parameter after the options is reports/schema which is the rptdef. This will search for a file named reports/schema.rpt in the pattern path and run all of the reports listed in that file.

The second parameter is rpt/schema which is the outdir. Notice that in this case a subdirectory was specified to store all the schema reports in a unique directory.

The src/ "*.schema" are the directory and filespec parameters respectively. This will process all files with the extension .schema anywhere under the src/ directory.

The 2>&1 | tee schema_rpts.log is a way to redirect the console output to a log file while still seeing the output on the console. This is a useful technique available under Linux, but it is not required.
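The same redirection can be tried on any command: 2>&1 merges stderr into stdout, and tee then copies the combined stream to both the console and a file. The stand-in command below is illustrative only:

```shell
# A stand-in for the report command that writes to both streams.
demo=$(mktemp -d)
{ echo "progress on stdout"; echo "warning on stderr" >&2; } \
    2>&1 | tee "$demo/run.log"

# Both lines were shown on the console AND captured in the log.
grep -c "" "$demo/run.log"    # counts the captured lines -> 2
```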

Generally, the schema reports run quickly, normally finishing in minutes even for a large project.

Code Reports

To run the code reports:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.ReportDriver -sd2 reports/profile rpt/code src/ "*.p.ast" 2>&1 | tee code_rpts.log

The first parameter after the options is reports/profile which is the rptdef. This will search for a file named reports/profile.rpt in the pattern path and run all of the reports listed in that file.

The second parameter is rpt/code which is the outdir. Notice that in this case a subdirectory was specified to store all the code reports in a unique directory.

The src/ "*.p.ast" are the directory and filespec parameters respectively. This will process all files with the extension .p.ast anywhere under the src/ directory.

Warning: the code reports may take a very long time to run. If it takes 10 minutes to run the front end conversion on a project, then it may take several hours to run the code reports. If it takes 2 hours to run the front end conversion, then it will likely take over a day to run the code reports.

If the project is large, it is likely that additional memory will need to be provided to the Java Virtual Machine (JVM) to enable reporting to finish. Adding an -Xmx1024m parameter between java and -classpath on the command line would provide 1GB of memory for the reporting. The amount of memory needed cannot be known in advance. Start with a reasonable value and increase or decrease as needed.

When the Sun Hotspot Java Virtual Machine is in use (e.g. Sun J2SE 1.6), there are 2 JVM compilers that can be used at runtime. Java uses these compilers to convert the platform independent Java bytecodes of a class file into native platform machine code, which will execute significantly faster than the interpreted bytecode approach. Hotspot has a client compiler (i.e. C1) and a server compiler (i.e. C2). The default for most environments is to run the client compiler. Long batch processes such as reporting can often run much faster with the server compiler. Generally, it is a good idea to try running reporting by adding the -server option to the java command line (between the java executable name and the -classpath option).

Command Output

The schema reports will output a section like this for each report being run:

------------------------------------------------------------------------------
Tables by Database (reports/rpt_template)
------------------------------------------------------------------------------

WARNING:  could not find match phrase list!
./data/namespace/test.schema
./src/first.p.schema
./src/second.p.schema
Elapsed job time:  00:00:01.696

At the top is the name of the report being run and the specific XML file that contains the report definition. That is followed by the list of files processed, which will include any database schema ASTs and any temp-table (or work-table) schema ASTs. At the end of the section is an elapsed time in hours:minutes:seconds.milliseconds format.

The code reports will output the same section format, with the only difference being the list of filenames being processed:

------------------------------------------------------------------------------
Lines of Code Analysis By File (reports/rpt_lines_of_code)
------------------------------------------------------------------------------

WARNING:  could not find match phrase list!
./src/first.p
./src/second.p
Elapsed job time:  00:00:22.671

The only thing to note here is that the .ast extension is not printed in the console output, but it is the .ast file that is being processed (e.g. ./src/first.p.ast).

For details on how to find and interpret the generated reports, please see the How to Interpret Reports section later in this chapter.

Running Ad-Hoc Reports

The reporting facility can be run in a manner that allows arbitrary, user-defined queries across a single file, a set of files or the entire project. A user of this tool must be able (at a minimum) to specify a single, valid TRPL expression which expresses the condition upon which to base the search. Each specified AST will be traversed and the expression will be evaluated against each node in turn. Every time the expression evaluates to true, that node is added as a match in the search results. When all the matches are found, HTML reports are generated based on those search results.

With the canned reports, the user runs the ReportDriver which uses the TRPL engine to process a list of predefined report definitions. Each report definition in the list is a separate TRPL program. With ad-hoc reports, the user runs the TRPL engine directly and uses a generic report definition that allows a wide range of control based on command line parameters. The most important of the parameters is the specification of the expression which defines the search condition.

The statistics and reporting features are implemented as a plug-in to the TRPL engine. This generic report definition uses the same reporting facilities backing the canned reports. With the ad-hoc reports, the user has great ability to control and customize the results of that one report run. As a downside, there is no batching (of multiple reports) and the syntax is significantly more complicated. In addition, the user is limited to the processing that can be encoded in a single expression. More complex reporting is possible with TRPL, but not with the ad-hoc search facility. For example, any report that requires the storage of state and/or multiple expressions is too complex for the search facility. The possibilities are still extremely powerful. Tools like grep cannot begin to provide the same results.

The $P2J_HOME/p2j/rules/reports directory contains some sample reports that can be referenced using this driver. Each XML report definition defines a "pipeline" of "rule sets" that can be used to inspect, convert, transform or otherwise process one or more ASTs. The TRPL (pattern processing) engine handles obtaining the ASTs, providing tree walking services as needed, reading the directives in a pipeline or rule set and then processing the rule sets on each node of the tree. Based on user-defined expressions one can record matches in one or more "statistics" and then at the end of the rule set, these statistics are used to write reports into text files.

Before running a search, make sure your current directory is the project root directory:

cd $P2J_HOME

Here is an example command line (assumes that there is a ./reports subdirectory in your current directory in which the output will be created):

java -Xmx256m -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.PatternEngine condition="\"evalLib('statements',this)\"" filePrefix="'stmt'" sumTitle="'Language Statements'" dumpType="'simple'" xref="true" xrefname="'source_file_index.html'" outputDir="'reports'" reports/rpt_template ./src/my_app/ "*.[pPwW].ast" 

This example will process all AST files that correspond with 4GL source files ending in .p, .P, .w and .W in the $P2J_HOME/src/my_app/ subdirectory tree. It will load each AST (found by that specification) in turn and it will apply the pipeline of rule-sets defined in the reports/rpt_template.xml file that must reside in the patpath (this is a global configuration parameter, see Project Setup for more details). This rpt_template TRPL rules file is the report definition that provides the generic ad-hoc search capability. In the above example, all AST nodes where the library function statements returns true for the current node (this) will be matched as the condition. The statements library function matches all 4GL language statements, so this is a quick way to get a list of the language statements in use in the application. That is simply not possible with less capable tools such as grep.

For details on how to find and interpret the generated reports, please see the How to Interpret Reports section later in this chapter.

Many command prompt variable substitutions are provided in the generic search template to define runtime replaceable parameters that can be read and used by the rule set. These variable substitutions must be inserted after the Java class name, after any pattern engine option flags but before the report name. Warning: the variable names are case-sensitive! In particular, the following are provided:

Definition Default Value Required Use
condition="\"<expression>\"" n/a Y <expression> is a valid user-defined expression that represents the condition being tested (matched). The extra set of double quotes is needed to ensure that the rule set parses this as a string rather than as a boolean expression. At runtime, the string will be converted into a compiled boolean expression and executed.
databaseMode="<boolean>" false N Set this to true when processing schema ASTs. By default, this assumes that 4GL code ASTs are being processed. This value is critical for proper generation of an index for the source files (schema or code).
dumpAtAncestor=<boolean> false N If set to true, and so long as there is no dumpExpr set, the text dumped for each match of the condition is taken from the ancestor found level parents above the matched node (see the level variable). This takes precedence over the dumpType variable if that is set as well. The output generated will be the same as the "parser" mode using dumpType, but the sub-tree will be rooted at the node based on the specified level (dumpType using "parser" mode is the same as dumpAtAncestor set to true with level set to 1).
dumpExpr="\"<expression>\"" n/a N If <expression> is a non-empty String, then this user-defined expression will be evaluated to create the descriptive text for this match. This expression must be valid TRPL code that evaluates to a String result. This is an alternative to relying upon the dumpType variable. This type of dumping takes precedence over both dumpType and dumpAtAncestor if all 3 are present. The extra set of double quotes is needed to ensure that the rule set parses this as a string rather than evaluating it as an expression. At runtime, the string will be converted into a compiled String expression and executed.
dumpType="'<type>'" "parser" N This defines the technique used to generate the text that will be reported for each match. In each case, the node or the subtree rooted at that node will be described. Options for <type> are:
<type> Example Output
simple
message [KW_MSG]

simplePlus
message [KW_MSG] <total 7> <immed 1> <depth 2>

parser
message [KW_MSG] @2:1
    [CONTENT_ARRAY] @0:0
      expression [EXPRESSION] @0:0
         + [PLUS] @2:24
            + [PLUS] @2:18
               "Hello " [STRING] @2:9
               txt [VAR_CHAR] @2:20
            "!" [STRING] @2:26

lisp
message  ( KW_MSG ( CONTENT_ARRAY ( EXPRESSION ( PLUS ( PLUS STRING VAR_CHAR ) STRING ) ) ) )

The extra set of single quotes is needed to ensure that the rule set parses this as a string rather than as an unidentified symbol.
filePrefix="'<prefix>'" n/a N <prefix> is text to prepend to the auto-generated detailed report filenames to allow easy identification. The extra set of single quotes is needed to ensure that the rule set parses this as a string rather than as an unidentified symbol.
level=<num> 1 N The ancestor at which to dump (or capture) the descriptive text for each AST node matched by the search. This is honored when dumpAtAncestor is true and so long as there is no dumpExpr specified. If set to 0, it dumps the current node. If 1, the parent (and all children) are dumped. If 2, the grandparent tree is dumped and so on.
multiplexStringExpr="'<expression>'" n/a N If <expression> is non-empty, this enables a user-controlled multiplexing mode in which this expression is the basis for categorizing the matches into common "buckets" or groups. <expression> must be a valid user-defined expression that returns a string that uniquely identifies/categorizes a match. All nodes that match and for which this expression generates the same exact string will be counted in the same category. Each category gets a separate line in the summary report and a separate HTML detailed output report. This is useful to categorize nodes based on something like a lowercased version of their text rather than the token type, which is how it would be done by default. The extra set of quotes is needed to ensure that the rule set parses this as a string. At runtime, the string will be converted into a compiled string expression and executed.
mutex="<boolean>" false N If <boolean> is true, each group of the summary report will be assumed to be mutually exclusive with all the other groups (see multiplexStringExpr on how to create custom groups or "buckets" for matches). For this to be true, the condition must only allow an AST node to match with a single group. In such a case, this variable can be set to true and as a result the summary report will have 2 extra columns: percent of matches and percent of files. This greatly improves the summary report when possible.
outputDir="'<directory>'" "." N The relative or absolute directory name in which to generate all output. Note that the extra set of single quotes is needed to ensure that the rule set parses this as a string rather than as an unidentified symbol.
pageSize="<num>" 100 N <num> is the maximum number of lines per page in a detail report.
sortOrder="'<order>'" "alpha" N <order> specifies the sort order of the summary report. Options are "alpha" for alphanumeric ascending order, "insertion" for the order in which the statistics were added or "matches" for descending order from most to fewest number of matches. The extra set of single quotes is needed to ensure that the rule set parses this as a string rather than as an unidentified symbol.
sumTitle="'<title>'" "Summary Report" N <title> is the text used in the header of the summary report. The extra set of single quotes is needed to ensure that the rule set parses this as a string rather than as an unidentified symbol.
uniqueMatch="<boolean>" false N When <boolean> is set true, any duplicate nodes selected as a match are dropped so that each node can only appear as one match in the results. This only matters in the case where dumpAtAncestor is true since only in that case can the same node appear in the results more than once.
xref="<boolean>" false N When <boolean> is set true, and the xrefname is non-empty, that name will be used to create a source file index for all source files processed in the search run.
xrefname="'<index_name>'" n/a N When xref is true and xrefname is non-empty, <index_name> will be used as the file name for a source file index for all source files processed in the search run. That index will be created only if both of the above conditions hold. The index is created based on the type of AST being processed (database or code). See the databaseMode variable for more details.
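To illustrate the quoting conventions described above, here is how a mix of string, numeric and boolean variable assignments would be written (the values shown are purely illustrative):

   sumTitle="'Language Statement Usage'"   <- string: note the nested single quotes
   outputDir="'rpt'"                       <- string
   pageSize="50"                           <- number: no nested quotes
   mutex="true"                            <- boolean: no nested quotes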

Strictly speaking, only the condition variable is required; all other variables have reasonable defaults. Using these features, one has great control over the resulting search processing and the reports that are generated.

The first time a report is run against the entire project (or any significant subset, such as an entire subdirectory), text and HTML versions of the source files (for code ASTs: the original source, preprocessor cache file, lexer output and parser output) are saved under the outputDir. This process can take quite a while, but once these files have been copied/generated they will not be re-generated until the source files change.

It can be useful to have an index for the source files. The xref and xrefname variables provide that capability.

A library of common functions has been provided in the $P2J_HOME/p2j/rules/include/common-progress.rules file. These functions can be referenced using evalLib('<name>'[,optional_parms[,...]]) in any expression.
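For example, a condition expression might delegate part of its logic to such library functions. The function names below (is_literal, in_loop) are hypothetical placeholders used only to show the call syntax; consult common-progress.rules for the actual function names and their parameters:

   evalLib('is_literal') and evalLib('in_loop', 2)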

This generic reporting uses the TRPL engine directly instead of the ReportDriver. The TRPL engine command line syntax is:

java -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.pattern.PatternEngine [options] ["<variable>=<expression>"...] <profile> [<directory> "<filespec>" | <filelist>]
Option/Parameter Meaning
-d <debuglevel> <debuglevel> must be one of the following integers:
0 = no message output
1 = status messages only
2 = status + debug messages
3 = verbose trace output
-f Enables explicit file list mode. In this case the last parameter(s) must be an arbitrary list of relative or absolute filenames instead of a file specification.
-h Forces the TRPL engine to honor the hidden flag in AST nodes. When in this mode, any AST node with the hidden flag set true will be bypassed in the tree walk.
-r Forces the TRPL engine to run in read-only mode. This means that any changes made to the input trees are lost when the processing run is complete. The changes will not be saved back to the input files.
variable The name of a variable to create and initialize using a specific expression that is specified on the command line. This is a way to pass arbitrary named data values to the TRPL program.
expression Infix expression (i.e. human readable, as in x + y, rather than postfix, as in x y +) which is evaluated once at TRPL program startup and whose resulting value is assigned to the named variable. This is not executed once per AST walk; rather, it is evaluated only once per TRPL engine run.
profile The TRPL program name to process. This can have a name relative to the pattern path (see patpath in the Global Configuration section of the Project Setup chapter).
directory The top-level directory in which to search for filenames which match the filespec pattern. Not used with -f (explicit file list mode).
filespec The file specification to use to find all AST files under the directory given. Wildcards (* and ?) and regular expressions can be used to create the filter. In most command shells, wildcard characters and regular expressions will be interpreted by the shell. Since the TRPL engine must do the interpretation of this parameter, it is important to wrap the filter inside double quotes or to use some other shell-specific escaping mechanism.
filelist An arbitrary list of absolute and/or relative file names of persisted AST files to process. Only available with -f (explicit file list mode).
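Putting this together, the following is a sketch of an ad-hoc reporting run using the reports/rpt_template.xml profile. The condition expression and variable values are purely illustrative; substitute a profile, condition, directory and filespec appropriate for your project:

   java -classpath $P2J_HOME/p2j/build/lib/p2j.jar \
        com.goldencode.p2j.pattern.PatternEngine \
        -r \
        "condition='type == prog.kw_display'" \
        "filePrefix='display_usage'" \
        "sumTitle='DISPLAY Statement Usage'" \
        "outputDir='rpt'" \
        reports/rpt_template \
        . "*.p.ast"

Note how the shell's double quotes protect each variable assignment from shell interpretation, while the nested single quotes mark the string-valued variables for the rule set.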

Creating Custom Reports

The same reporting facilities are used in both the canned reports and the ad-hoc reporting. The ReportDriver which creates the canned reports is simply an automation tool that provides a mechanism to run a list of predefined reports on a non-interactive basis. These predefined reports could each be run manually using the same ad-hoc reporting mechanism described in the previous section. Most of the predefined reports are really just variants on the use of reports/rpt_template.xml. The inputs for these variants are stored in reports/profile.rpt for the code reports and in reports/schema.rpt for the database reports. These files are in XML format.

The full power and flexibility of the TRPL language is available for custom usage. When creating a custom report, it is important to review examples to understand what is possible. The simplest report to start with is not really a report at all; rather, it is a version of grep that is fully aware of the syntax and structure of the Progress 4GL: reports/simple_search.xml. As with all TRPL programs, it is an XML file. This TRPL program evaluates the contents of the variable criteria and, for each match, dumps the matched subtree in human readable form to the console. No reports are generated; all the output appears in the command shell itself.
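For instance, a quick syntax-aware search across all parsed procedure files might look like the following sketch (the criteria expression and token name are illustrative; -r keeps the run read-only):

   java -classpath $P2J_HOME/p2j/build/lib/p2j.jar \
        com.goldencode.p2j.pattern.PatternEngine \
        -r \
        "criteria='type == prog.kw_msg'" \
        reports/simple_search \
        . "*.p.ast"

Each match is dumped to the console as a human readable subtree, which makes this a convenient first step before building a full report around the same expression.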

reports/rpt_template.xml provides a much more complex example which fully leverages the reporting facilities. All the hyperlinked HTML documents are created in this example and a wide range of the possible report facilities are used.

Two other useful examples to review are reports/rpt_includes.xml (include file usage report) and reports/rpt_lines_of_code.xml (lines of code counter report) which are both used in the canned code reports.

Custom reports can be created using these same techniques. Please see the FWD Internals book for details on how to program in TRPL and for more details on the reporting facilities. If you create a new ad-hoc report, the TRPL engine (PatternEngine) would be used to run that report. The variables defined in your TRPL program can be overridden as command line parameters in the variable="<value>" format that is defined for the PatternEngine.

To add to or modify the canned reports, the files reports/profile.rpt and reports/schema.rpt can be edited or otherwise enhanced. A custom set of canned reports can be created using the same .rpt XML file format. The ReportDriver accepts the canned report list file as a parameter, so any list can be created and passed to that driver, so long as the format of the file is correct.

How to Interpret Reports

Figure 1

When the report tools (the PatternEngine for ad-hoc reports and the ReportDriver for predefined reports) are run, an output directory must be specified. Assuming that this has been specified on the command line as rpt/, then all report indexes and content will be found in ./rpt/.

Alongside the potentially hundreds of reports, the output also includes the various artifacts of the FWD front end conversion process: the preprocessor output (*.cache files), the human readable lexer output (*.lexer), the human readable parser output (*.parser) and the ASTs (abstract syntax trees in XML format, *.ast) created from the Progress 4GL source. Everything is in HTML or text files, all hyperlinked together for easy navigation.

Each set of predefined (canned) reports has an index page with a table of contents containing categorized lists of related reports. Each report is reachable via a hyperlink.

Figure 2

The index page for the schema reports would be found in rpt/schema_profile.html. This is the set of database schema and temp-table/work-table schema reports. See Figure 1 for an example of that index page.

If a report lists 0 matches in 0 files, then no instances of that condition were found in the set of files processed. As a result, there is no content for that report. Clicking on a hyperlink takes the user to the summary page for the named report.

The code report index page can be found in rpt/code_profile.html. See Figure 2 for an example of that index.

At the bottom of most pages is a link to a "Source File Index". That page is a way to see conversion process "artifacts" associated with each file. In the code reports, for each file in the project, this includes a link to:

  • the unmodified 4GL procedure file
  • the fully preprocessed version of the 4GL procedure file (named with an extension of .cache) which has all includes expanded and references fixed up just as the Progress compiler would expect to see it (please note that this may be different from the cached output that you can get from the Progress COMPILE statement since it seems that Progress outputs an intermediate version in that case, which itself is not always compilable)
  • a .lexer text file that is a human readable representation of the "token stream" created by the FWD lexer
  • a .parser text file that is a human readable representation of the "tree" created by the FWD parser
  • a .ast XML file that persistently encodes the actual Abstract Syntax Tree (AST) created by our parser


Figure 3

Figure 3 shows an example of a Source File Index page for a code reports run with 3 source files. Each row in the table has the related artifacts for a single source file, with a column for each artifact. Each cell has the name of the artifact as a hyperlink. Clicking on the link brings up the file in question.

Figure 4

Figure 4 shows the result of browsing the hello.p.ast link.

In the schema reports, for each schema file or 4GL source file that has a temp-table/work-table definition, the Source File Index includes a link to:

  • the original source file (.cache if it is a 4GL definition of a temp-table/work-table or a .df file if it is a database schema file)
  • the schema AST file that is the resulting structure of the schema (this is an XML AST file similar to the code AST, but designed to hold the structure of a temp-table or a database)

The idea of the Source File Index is to provide an easy way to navigate the artifacts of the conversion process front end. All of these files are copied into the output directory. This means that report output is a completely self-contained website, which can be moved and archived independent of the FWD project.

The Source File Index is generated when the ReportDriver is run. It is optionally created during ad-hoc reporting if the report being run includes the necessary processing. The reports/rpt_template.xml ad-hoc report does provide this feature (see xref and xrefname in the section on Running Ad-Hoc Reports above).

Whether the reports are generated using the ReportDriver or are created ad-hoc, there will be a Summary page created for each report. This summary page will be found in a subdirectory under the output directory. The subdirectory will be named with a predefined value in the canned reports (e.g. lang_stmt_usage for the report titled "Language Statement Usage"). For an ad-hoc report, the subdirectory name will be created from the filePrefix variable which is passed on the command line.

Inside that report-specific directory will be a summary page named index.html. The summary page includes a title that describes the report (the condition being searched for in the source code or schemas) with the total number of matches found and the total number of files in which those matches were found. The main contents are a table of results with columns for Condition, # Matches and # Files. If the report was marked as mutually exclusive (e.g. using the mutex variable in an ad-hoc report) there will also be columns for % Matches and % Files. Figure 5 shows an example of the summary page.

The Condition column has a row for each different "variant" or category of the match being searched for in this report. The # Matches is how many instances were found throughout the searched ASTs. % Matches is the number of matches for that condition as a proportion of the total number of matches. The # Files is how many files had matches of this type throughout the searched ASTs. The % Files is the number of files with that condition as a proportion of the total number of files with matches.

Each entry in the Condition column is a link to a details report for that set of matches.

Figure 5

Details reports come in 2 forms. If there are few enough entries to fit on a single page, then a single-page report is generated. If there are too many entries, then a 2nd level table of contents is presented which lists a page for each range of entries (100 per page by default). For ad-hoc reports, the pageSize variable sets the threshold that determines when a multi-page report must be created. Figure 6 is an example of a single-page details report.

Inside the details report are the following columns:

  • Filename: the specific AST in which the match was found. For code reports this is a hyperlink to the location in the .cache file where the match was found. For schema reports it is a hyperlink to the location in the .df file (for a database schema) or in the .cache file (for a temp-table/work-table).
  • Line №: the line number in the file at which the match was found.
  • Column №: the column number on the line at which the match was found.
  • Match Text: a useful representation of the code or schema definition that matched the condition. This is often a human readable dump of the sub-tree of the AST that matched, but in ad-hoc reports it is controlled via variables such as dumpType.

Figure 6

The canned reports include many useful reports. There are general purpose reports on every language statement, built-in functions, attributes, methods and other basic language features such as block structure and expression processing. There are also more specific reports for areas of an application that can be tricky, such as "Nested CAN-FINDs" or "Trigger Reversion".

These reports are only an example of the power of the FWD technology. Much more is possible with modest effort. Given an understanding of a pattern to be found, a report definition can usually be written in a matter of minutes or hours. Once written, the report definition can be added to the canned reports so that it can be run at any time with no additional development work. See the Creating Custom Reports section of this chapter for more details.


© 2004-2017 Golden Code Development Corporation. ALL RIGHTS RESERVED.