Call Graph Analyzer¶
What is the Call Graph¶
FWD Analytics provides tools to generate a graph database which will allow you to analyze the entire code set (programs, classes, include files and schemas). Starting from the specified entry points into your application, FWD will 'walk' your code and compute dependencies between call sites and call targets.
A call site is a place in 4GL code that invokes other code. The 4GL statement RUN some-program.p is a call site.
A call target is code that is invoked (called) by other code. In the example above, the external procedure some-program.p is the call target.
There are many different types of call sites, and each type of call site has its own set of possible call target types. For example, a RUN statement is one type of call site, and both external procedures and internal procedures can be its call targets. The call graph analysis supports processing all kinds of 4GL program linkage, including procedures, function calls, events, OS libraries, shell programs, triggers and more.
Each call site may be determined to call zero or more call targets. For each call target that can be reached from a call site, a linkage is defined between the call site and that call target.
The call graph is the set of all call site nodes, all call target nodes, and the linkages between each call site node and the call target nodes it can reach.
The linkages between call sites and call targets are determined by traversing the application ASTs. It starts at the entry points of your application and extends to all reachable code in the application. Any code that can actually be invoked is considered "reachable". The results form the call graph for the application.
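The definitions above can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not FWD's implementation (FWD builds the graph from the ASTs and stores it in JanusGraph). Each program unit is a node, and each call site contributes linkages to zero or more targets. A breadth-first walk from the entry points yields the reachable set, and everything else is a dead-code candidate. `CallGraphSketch` and its methods are illustrative names only.

```java
import java.util.*;

/** Hypothetical in-memory call graph: each unit maps to the targets its call sites reach. */
class CallGraphSketch {
    private final Map<String, Set<String>> linkages = new HashMap<>();

    /** Records a linkage from a call site inside {@code caller} to {@code target}. */
    void link(String caller, String target) {
        linkages.computeIfAbsent(caller, k -> new LinkedHashSet<>()).add(target);
        linkages.computeIfAbsent(target, k -> new LinkedHashSet<>()); // target is a node too
    }

    /** Registers a unit that may have no callers (e.g. an unreferenced program). */
    void addUnit(String unit) {
        linkages.computeIfAbsent(unit, k -> new LinkedHashSet<>());
    }

    /** Breadth-first walk from the entry points; returns every reachable unit. */
    Set<String> reachable(Collection<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(entryPoints);
        while (!queue.isEmpty()) {
            String unit = queue.poll();
            if (seen.add(unit)) { // first visit: follow its linkages
                queue.addAll(linkages.getOrDefault(unit, Set.of()));
            }
        }
        return seen;
    }

    /** Known units that no entry point reaches: candidates for a dead code report. */
    Set<String> deadCode(Collection<String> entryPoints) {
        Set<String> dead = new TreeSet<>(linkages.keySet());
        dead.removeAll(reachable(entryPoints));
        return dead;
    }
}
```

For example, suppose you link main.p to some-program.p and register an unreferenced old-report.p. Then `deadCode(List.of("main.p"))` returns a set containing only old-report.p.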
Once the call graph is generated, you will be able to use the built-in reports to determine dead code, dependencies on external resources (e.g. OS libraries), calls to code which does not exist, and more. The graph database is built on JanusGraph, which implements the Apache TinkerPop graph computing framework. Call sites and call targets are graph database nodes; the program invocation linkages are the graph database edges that connect nodes. In addition to the built-in call graph processing and reports, you can use the TinkerPop Gremlin graph traversal language to write custom reports, which allows analyzing your code in ways that were not possible before.
Using the provided reports and any custom tools you may want to build, you can remove from your application any code which is no longer in use, analyze the impact of a change by knowing which other programs depend on the code being changed, split your application into components for easier management, reduce the coupling between different programs or components, and more.
Running the Call Graph Analyzer¶
Call Graph Configuration¶
The configuration required to build the call graph is specified via p2j.cfg.xml parameters, which are described in the following table:
|Parameter Name||Default Value||Description
||The storage folder.
||The folder in which to search the programs, classes and the include files.
||A shell matching pattern, describing the files which are included by the preprocessor.
||Flag indicating if the 4GL application was originally run on a case-sensitive file system.
||Needs to be specified. This will point to an XML file, defining the list of entry points (by file or folder):
Of these, only callgraph-db-folder and the entry-point XML file setting are graph-specific; the others should already be configured as part of the initial FWD project setup.
Call Graph Generation¶
Call graph generation must always be done using the ASTs from the front-end conversion phase, with any post-parse fixups already applied. With this constraint, there are two ways of generating the callgraph: automatically (via the ConversionDriver), and explicitly (via the CallGraphGenerator tool).
If you are using the FWD sample applications as a template, the provided build.xml has a callgraph target which automatically runs the front-end phase (to create fresh ASTs) and generates the callgraph. This callgraph Ant task relies on the CallGraphGenerator to generate the call graph. To run it:
On Linux:
cd ~/app_name
ant callgraph
On Windows:
c:
cd c:\app_name
ant callgraph
You will also be able to export the graph in GraphML format. The export generates a callgraph.graphml file in the root of your project, containing your application's callgraph. You can import this file into other tools which provide graph visualisation, or process it in any other way. See the Using CallGraphGenerator section for more details.
The ConversionDriver allows callgraph generation by specifying the F3 mode for the front-end phase. This produces the same results as running the CallGraphGenerator tool in normal mode. To run the conversion and generate the callgraph (assuming the current folder is the project home folder), you can use the following command, which includes all programs matching "*.p" in the ./abl/ folder:
java -classpath p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -Sd2 f3 ./abl/ "*.p"
or, if you want to specify a list of programs, use:
java -classpath p2j/lib/p2j.jar com.goldencode.p2j.convert.ConversionDriver -Fd2 f3 file-cvt-list.txt
where file-cvt-list.txt contains all source files which need to be parsed.
When F3 mode is specified, the front-end phase will also generate the callgraph. If needed, the middle and code back-end phases can also be specified, so that full conversion is performed.
If the ConversionDriver was run only in F3 or F2 mode (i.e. the ASTs are the ones generated by the parser, with any post-parse fixups applied), the com.goldencode.p2j.uast.CallGraphGenerator tool can be used to generate the callgraph, avoiding the overhead of re-parsing the sources.
This cannot be used if the middle or code backend has been executed because the ASTs are heavily modified during the actual conversion process and these modifications render the ASTs unsuitable for the graph analysis.
This tool can generate the callgraph either from scratch or incrementally, to resolve ambiguities without a full run. The syntax of this tool is:
java -classpath p2j/lib/p2j.jar com.goldencode.p2j.uast.CallGraphGenerator [-Dn] [-u] [root_nodes...]
where:
-Dn sets the pattern engine debug level, where n is the numeric setting of the level (default is 1): none (0), status (1), debug (2), trace (3).
-u runs the callgraph processing in update mode, where only the ambiguous nodes and their newly-resolved targets are processed. The targets specified in the root_nodes list are always updated and must already exist in the callgraph.
[root_nodes...] is optional. It contains one or more root node filenames to process, which must be valid relative or absolute names based on the current directory. Each filename must specify an existing AST file associated with a legacy external procedure. These files do not affect the project's configured root program list (rootlist.xml).

If no root nodes are listed on the command line and the tool is not in update mode, the project's root list is used. In update mode without root programs on the command line, only the ambiguous programs are processed. When additional root programs are specified on the command line, they are processed according to the graph mode (update or non-update):
- in non-update mode, the entry point programs configured in rootlist.xml are ignored, and processing uses only the specified list of programs as entry points
- in update mode, these additional root programs are processed along with the ambiguous programs in the graph. The root programs specified in rootlist.xml are not used, as the goal is only to update the graph after hints to disambiguate some call sites have been added.
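The rules above for choosing the starting programs can be summarized as a small decision function. This is a hypothetical Java sketch that only restates the documented behaviour. `RootSelection.select` is an illustrative name, not part of the CallGraphGenerator API.

```java
import java.util.*;

/** Hypothetical restatement of how the starting programs are chosen. */
class RootSelection {
    static List<String> select(boolean updateMode,
                               List<String> cliRoots,
                               List<String> rootListXml,
                               List<String> ambiguousPrograms) {
        Set<String> roots = new LinkedHashSet<>();
        if (updateMode) {
            // Update mode: ambiguous programs plus any command-line roots; rootlist.xml is not used.
            roots.addAll(ambiguousPrograms);
            roots.addAll(cliRoots);
        } else if (cliRoots.isEmpty()) {
            // Normal mode without arguments: the configured root list is used.
            roots.addAll(rootListXml);
        } else {
            // Normal mode with arguments: only the listed programs are entry points.
            roots.addAll(cliRoots);
        }
        return new ArrayList<>(roots);
    }
}
```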
In non-update mode, the first part consists of loading the entire code set into the graph database, so that unreachable external programs and include files are also represented in the graph. This loading is done in the following steps:
- Using the include-spec parameter (which defaults to *.[iI]), all the include files matching the specification are listed from the basepath folder, and a special node is added to the graph for each of them.
- Using the *.ast pattern, all the external programs found in the basepath folder are processed. For each program, a special node is created in the graph, and the sub-graph of include linkages is generated from the associated pphints file (the preprocessor hints file output by the front-end parsing process). If an include file is not found in the graph while this sub-graph is built, a new node is created for it and a warning is logged to STDOUT.
- All schema triggers are loaded into the graph, and linked to their associated procedure trigger file, if it exists.
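The two code-set loading steps select files by shell-style patterns under basepath. The hypothetical Java sketch below shows that selection with the JDK's glob `PathMatcher`. It only illustrates which files patterns such as `*.[iI]` and `*.ast` would pick up; it does not reproduce how FWD creates the graph nodes. `CodeSetScan` is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: which files would become graph nodes during code-set loading. */
class CodeSetScan {
    /** Keeps the paths whose file name matches a shell glob such as "*.[iI]" or "*.ast". */
    static List<Path> filter(Collection<Path> candidates, String glob) {
        // The default file system's matcher follows platform rules (case-insensitive on Windows).
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return candidates.stream()
                .filter(p -> p.getFileName() != null && matcher.matches(p.getFileName()))
                .sorted()
                .collect(Collectors.toList());
    }

    /** Walks basepath recursively and returns the regular files matching the glob. */
    static List<Path> find(Path basepath, String glob) throws IOException {
        try (Stream<Path> walk = Files.walk(basepath)) {
            return filter(walk.filter(Files::isRegularFile).collect(Collectors.toList()), glob);
        }
    }
}
```

For example, `CodeSetScan.find(Paths.get("abl"), "*.[iI]")` would list every include file under ./abl.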
The second part of call graph processing requires creating a graph node for each defined schema trigger and resolving the linkage between the schema trigger and the target external program. The target external programs, even if determined as existing in the initial code-set, will not be automatically added to the root node list.
The final part consists of applying the call graph processing rules (which process the call sites and add linkages between each call site and its targets) to the root (entry-point) programs and all reachable programs, in batches:
- the rules are applied to the initial root list and the set of first-level reachable programs is determined.
- the rules are applied to the set of first-level reachable programs and the set of second-level reachable programs is determined.
- the rules are applied to the set of the second-level reachable programs and the set of third-level reachable programs is determined.
- and so on, until no more reachable programs are found OR all reachable programs were already processed.
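The batch-by-batch order described above is a level-by-level breadth-first traversal. The following Java sketch is a hypothetical illustration. `externalCalls` stands in for the linkages the processing rules would discover for each program, and each returned set is one batch. A program that has already been processed is never queued again, so the loop terminates even on recursive call chains.

```java
import java.util.*;

/** Hypothetical sketch of the level-by-level (batch) processing order. */
class BatchOrder {
    /**
     * Batch 0 is the root list, batch 1 the programs first reached from it, and so on,
     * until a batch finds no program that has not already been processed.
     */
    static List<Set<String>> batches(Map<String, Set<String>> externalCalls, Collection<String> roots) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> processed = new HashSet<>();
        Set<String> current = new LinkedHashSet<>(roots);
        while (!current.isEmpty()) {
            result.add(current);
            processed.addAll(current);
            Set<String> next = new LinkedHashSet<>();
            for (String program : current) {
                // Stand-in for "apply the rules": collect the external programs this one calls.
                for (String target : externalCalls.getOrDefault(program, Set.of())) {
                    if (!processed.contains(target)) {
                        next.add(target);
                    }
                }
            }
            current = next;
        }
        return result;
    }
}
```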
The callgraph processing rules are split into standard and customer-specific rules. The customer-specific rules must reside in a customer_specific_call_graph.rules file, which needs to be accessible via the patpath configuration parameter. For each external program, the standard rules are applied first, followed by the customer-specific rule-set. This allows the customer to write custom rules to automate the disambiguation of ambiguous call sites and/or to adjust any linkages generated by the standard rules.
After the callgraph is generated, you can create a GraphML representation of the graph database using this command:
java -Xmx8g -classpath p2j/lib/p2j.jar com.goldencode.p2j.uast.CallGraphGenerator -g
Depending on the size of your application, you might need to increase the Java heap (the -Xmx setting), and the export can take a while to complete. Once complete, the graph is saved in the callgraph.graphml file in the project home. GraphML is a standard file format for representing graph networks. This file can be viewed with tools like yEd or read with other applications. Describing the GraphML structure is outside the scope of this document.
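Before loading callgraph.graphml into a visualisation tool, a quick check of its size can be useful. The following Java sketch is a hypothetical helper that counts node and edge elements with the JDK's DOM parser. It assumes the export uses the standard GraphML `node` and `edge` elements. DOM keeps the whole document in memory, so for very large graphs a streaming parser (StAX) would be a better fit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch: count the nodes and edges in an exported GraphML document. */
class GraphMlStats {
    /** Returns {nodeCount, edgeCount}; assumes standard GraphML node and edge elements. */
    static int[] count(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(source);
            // "*" matches any namespace URI, so the GraphML default namespace is accepted
            int nodes = doc.getElementsByTagNameNS("*", "node").getLength();
            int edges = doc.getElementsByTagNameNS("*", "edge").getLength();
            return new int[] {nodes, edges};
        } catch (Exception e) {
            throw new IllegalStateException("cannot read GraphML", e);
        }
    }
}
```

It could be called as `GraphMlStats.count(new InputSource(new FileInputStream("callgraph.graphml")))`.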
Call Graph Connection Settings¶
The graph connection settings can be tweaked via the cfg/callgraph.properties file. The default settings are:
- storage.directory=callgraph/: the value of the callgraph-db-folder property (if not found, defaults to callgraph/).
- storage.backend=berkeleyje: the storage backend. Changing this might require adding JARs to the classpath.
- index.search.backend=lucene: the backend for the search index (search is the default index name used by FWD). Changing this might require adding JARs to the classpath.
- index.search.directory=callgraph/searchindex: the folder in which the search index is stored.
- cache.db-cache=true: enables the DB-level cache. Disabling this will slow down the graph queries.
- query.fast-property=false: do not pull all properties when reading a vertex. If you enable this, all vertex properties are read each time a vertex is loaded, which affects performance and memory usage.
More graph connection settings can be found at https://robertdale.github.io/docs.janusgraph.org/0.2.0-SNAPSHOT/config-ref.html.
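When troubleshooting, it can help to see which values are in effect when only some keys are overridden. FWD reads cfg/callgraph.properties itself. The Java sketch below is only a hypothetical illustration using `java.util.Properties`: it layers the file's contents over the defaults listed above, and any key missing from the file falls back to its default.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: view callgraph.properties layered over the documented defaults. */
class CallGraphSettings {
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("storage.directory", "callgraph/");
        d.setProperty("storage.backend", "berkeleyje");
        d.setProperty("index.search.backend", "lucene");
        d.setProperty("index.search.directory", "callgraph/searchindex");
        d.setProperty("cache.db-cache", "true");
        d.setProperty("query.fast-property", "false");
        return d;
    }

    /** Keys present in the source override the defaults; getProperty falls back for the rest. */
    static Properties load(Reader source) {
        Properties p = new Properties(defaults());
        try {
            p.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }
}
```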
Defining and Refining the Call Graph¶
The process of obtaining a graph definition always starts with determining the root list. Look through your .pf files, batch scripts and database, and gather all these program names. Then configure rootlist.xml as described in the Defining the Root List section.
The first run of the call graph generation tool will most likely produce ambiguous call sites. These can be found in the Ambiguous Call Sites report in FWD Analytics. Depending on how large your application is, there may be a number of programs which require disambiguation. Each ambiguous call site in each program must be reviewed by a developer who is knowledgeable about the application. That developer must then create hints that identify the exact list of call targets which can be reached from each ambiguous call site. On the next execution of the call graph (in update mode), these hints are processed. Any reachable code is then linked into the graph from that call site, and the call site is removed from the ambiguous call sites list.
To determine the reachable code, resolve all ambiguous RUN statements and OO invocation calls first. These call sites may refer to an external program which the call graph generation has not yet reached, so resolving them helps the call graph reach full code coverage faster. The other call sites (which target OS processes, internal procedures, user-defined functions and external libraries) can be handled next. The Missing Targets report will help you identify any programs which were not included in the parsed code. Analyze each case in this report and determine whether the program was missed, or whether the code which invokes this missing program is actually unreachable or otherwise not in use.
Repeat this process until all call-sites have been disambiguated and all missing programs were reviewed and resolved.
Once the callgraph has been defined, you can start to cut down your code by eliminating dead code (program files not reachable from the entry points). These dead programs can be found in the Dead Code report. Depending on the state of your ambiguous RUN statements, the report may present 'false positives': program files which are not actually dead, because they are reachable via a not-yet-disambiguated call site.
After removing dead code, it is important to ensure the application is still functional: compile and test the resulting cut-down code set. If this is functional, then the removed dead code was really dead and your callgraph is correct and complete.
Otherwise, you need to determine which code was assumed dead and removed, but is still needed: add each such program back, and re-generate the callgraph. If there are any ambiguities, resolve them. Repeat this process until all ambiguities are resolved and the application passes testing, with all the dead code removed.
In short, this iterative process to implement and test the call graph can be described by these steps:
- Define the root list.
- Run the call graph analyzer (generation).
- Review call graph reports for missing and ambiguous code.
- Define hints to resolve the ambiguous code, change the root list or add/modify code to the project for missing programs.
- Repeat from step 2 until there is nothing ambiguous or missing.
- Use the call graph reports to eliminate dead code from the project.
- Compile and test the resulting "cut down" distribution to sanity check the validity of the call graph.
- If there are any issues that show some code that was thought to be dead is actually needed, add these program files back (with their associated hints) and repeat from step 2.
- If no functional issues can be found in testing, the call graph is correct and complete.
- The resulting source code represents your application with all the code needed to run correctly.
Defining the Root List¶
Building the callgraph requires the list of entry point (root) programs which are accessible from the outside world. This list must include all programs that are directly executed:
- programs executed by a user, a batch process or via the appserver
- programs determined dynamically (e.g. saved in a database table and picked from it)
- programs run from the command line or from shell scripts
- programs targeted by schema triggers
- programs configured in
All such "top-level" programs must be specified as entry points.
The root programs have a special meaning in the graph database: only the initial root list and all reachable external programs (e.g. via a RUN external-program statement) will be processed during callgraph generation. External programs not reachable from any of the root programs will not be processed (they can be declared "dead").
The root programs can be specified either as arguments to the CallGraphGenerator tool, or via a special XML file specified in p2j.cfg.xml. This XML file is composed of a root XML element named roots, which has one or more child XML elements named node. Each node element can specify either a file name or a folder name, relative to the project home (the name must include the
- If the filename attribute is set, it must specify the name of an AST file associated with an external program.
- If the folder attribute is set, then the pattern attribute must exist: this is a shell matching pattern for the AST files to be included from the specified folder. The optional recursive attribute can be set to true, so that the pattern is applied to the folder and any subfolders, recursively.
Following is an example of how this file can look, assuming the basepath is the ./abl folder:
<?xml version="1.0"?>
<roots>
  <node filename="./abl/some/application/folder/some-external-program.p.ast" />
  <node folder="./abl/another/application/folder/" pattern="*.ast" recursive="true"/>
</roots>
This example adds to the root list the ./abl/some/application/folder/some-external-program.p program and all programs from the ./abl/another/application/folder/ folder and any of its subfolders.
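The root list format is plain XML, so it can be checked with the JDK's DOM parser before running the generator. The following hypothetical Java sketch (`RootListReader` is an illustrative name, not an FWD class) reads the node entries. It also checks that each entry uses either the filename form or the folder-plus-pattern form.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: read the node entries of a rootlist.xml document. */
class RootListReader {
    /** One node entry: either a single AST file, or a folder plus a shell pattern. */
    static final class RootEntry {
        final String filename;   // "" when the folder form is used
        final String folder;     // "" when the filename form is used
        final String pattern;
        final boolean recursive; // recursive="true" also scans subfolders

        RootEntry(String filename, String folder, String pattern, boolean recursive) {
            this.filename = filename;
            this.folder = folder;
            this.pattern = pattern;
            this.recursive = recursive;
        }
    }

    static List<RootEntry> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid root list XML", e);
        }
        NodeList nodes = doc.getDocumentElement().getElementsByTagName("node");
        List<RootEntry> entries = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element node = (Element) nodes.item(i);
            // getAttribute returns "" for attributes that are not present
            String filename = node.getAttribute("filename");
            String folder = node.getAttribute("folder");
            String pattern = node.getAttribute("pattern");
            if (filename.isEmpty() && (folder.isEmpty() || pattern.isEmpty())) {
                throw new IllegalArgumentException("node needs filename, or folder plus pattern");
            }
            boolean recursive = Boolean.parseBoolean(node.getAttribute("recursive"));
            entries.add(new RootEntry(filename, folder, pattern, recursive));
        }
        return entries;
    }
}
```

Folder entries could then be expanded with a glob `PathMatcher`, walking subfolders only when `recursive` is true.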
If you want to include all external programs as entry points, use a node with the folder name set to the basepath value and the recursive attribute set to true, as in (assuming the basepath is ./abl):
<?xml version="1.0"?>
<roots>
  <node folder="./abl" pattern="*.ast" recursive="true"/>
</roots>
This ensures the entire code set is processed by the callgraph, and all the external targets are determined, regardless of whether an external program is dead or not. Be careful if you do this: it is the same as stating that all external procedures and classes are reachable (not dead). You might then want to run the callgraph twice and generate two different databases. One uses the entire code set, to resolve all external targets; the other uses the explicit list of entry points, to determine the dead files.
By having these two graphs, you will be able to compare the resolved external targets and determine, for example:
- which OS libraries are no longer needed
- what Web Services are no longer in use
- what Socket connections are no longer needed
For more details, see the External Dependencies report
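Once both databases exist, the comparison itself is a set difference. This hypothetical Java sketch assumes each graph's resolved external targets are already available as plain strings, for example after exporting them with a custom Gremlin report. It returns the targets that appear only when every program is treated as an entry point, i.e. those referenced only from dead code.

```java
import java.util.*;

/** Hypothetical sketch: compare the external targets resolved by two call graphs. */
class ExternalTargetDiff {
    /** Targets resolved in the all-programs graph but not in the explicit-entry-point graph. */
    static SortedSet<String> onlyFromDeadCode(Set<String> allProgramsGraph, Set<String> entryPointGraph) {
        SortedSet<String> unused = new TreeSet<>(allProgramsGraph);
        unused.removeAll(entryPointGraph);
        return unused;
    }
}
```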
Resolving Ambiguous Call Sites¶
During call graph processing, all targets (external programs, procedures, OS calls, etc.) which are specified as a literal filename or as a string literal are resolved automatically. These are "hard coded" program linkages that cannot be changed at runtime. An example of this would be
When the target is determined at runtime (via a complex expression), the call graph reports it as ambiguous. These call sites can be found in the Ambiguous Call Sites report, provided by FWD Analytics. An example of an ambiguous call site would be
The report lists each program with ambiguous call sites and, for each program, the details of each call site as determined in the associated preprocessed program file (the .cache file for that program).
To resolve these ambiguities, you need to build a UAST hint file for each program. This hint file has the same name as the original program, ending with the .hints suffix, and it needs to be placed in the same location as the associated program. So, for a program named
./abl/some/app/program.p, the hint file will be named
The structure of this file is explained in the Conversion Hints chapter. For our call graph purposes, the hint will have a structure like this:
<?xml version="1.0"?>
<hints>
  <!-- multiple targets -->
  <uast name="<hint_ID>" datatype="string">
    <array-val value="target1" />
    <array-val value="target2" />
  </uast>
  <!-- single target -->
  <uast name="<hint_ID>" datatype="string" value="target1"/>
  <!-- just ignore this ambiguous call-site -->
  <uast name="<hint_ID>" datatype="string" />
</hints>
For each ambiguous call site, you will be able to:
- specify multiple targets
- specify a single target
- ignore this call-site
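When many call sites need hints, generating the file can be less error-prone than editing it by hand. The Java sketch below is a hypothetical helper, not an FWD tool. It emits the three forms of the hint file structure: an empty target list produces the "ignore" form, a single target the value form, and several targets the array-val form.

```java
import java.util.*;

/** Hypothetical sketch: build a call-graph hints file from hint IDs and their targets. */
class HintsXml {
    /** Maps each hint ID to its targets; an empty list means "ignore this call site". */
    static String build(Map<String, List<String>> hints) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<hints>\n");
        hints.forEach((id, targets) -> {
            String open = "  <uast name=\"" + escape(id) + "\" datatype=\"string\"";
            if (targets.isEmpty()) {
                xml.append(open).append(" />\n"); // ignore this call site
            } else if (targets.size() == 1) {
                xml.append(open).append(" value=\"").append(escape(targets.get(0))).append("\"/>\n");
            } else {
                xml.append(open).append(">\n"); // multiple targets
                for (String t : targets) {
                    xml.append("    <array-val value=\"").append(escape(t)).append("\" />\n");
                }
                xml.append("  </uast>\n");
            }
        });
        return xml.append("</hints>\n").toString();
    }

    /** Escapes the characters that would break an XML attribute value. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```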
Each hint ID has a format like RUN_VALUE_1. Here RUN_VALUE represents the call site's type (i.e. a RUN VALUE statement), and the _1 suffix is a 0-based counter that uniquely identifies each call site in a given program. The counter is needed because a program might contain, for example, multiple RUN VALUE statements, and each one must be uniquely identified to be disambiguated. The counter starts at 0 and is incremented separately for each call site type. The Ambiguous Call Sites report will help you identify the hint ID to use; see this report for more details.
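The numbering rule can be expressed in a few lines. This hypothetical Java sketch assumes call sites are numbered in the order they appear in the preprocessed program, with a separate 0-based counter for each type. The Ambiguous Call Sites report remains the authoritative source for the actual hint IDs. Type names other than RUN_VALUE in the example are placeholders.

```java
import java.util.*;

/** Hypothetical sketch of the hint ID scheme: call-site type plus a per-type 0-based counter. */
class HintIds {
    /** Assigns IDs in the given order; each type has its own counter starting at 0. */
    static List<String> assign(List<String> callSiteTypesInOrder) {
        Map<String, Integer> counters = new HashMap<>();
        List<String> ids = new ArrayList<>();
        for (String type : callSiteTypesInOrder) {
            int n = counters.merge(type, 1, Integer::sum) - 1; // 0 for the first of each type
            ids.add(type + "_" + n);
        }
        return ids;
    }
}
```

For the type sequence RUN_VALUE, OTHER, RUN_VALUE, this yields RUN_VALUE_0, OTHER_0 and RUN_VALUE_1.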
When a call site can be disambiguated to more than one name in more than one external program, a suffix is used. For example, a RUN VALUE statement might target an internal procedure OR an external program; in this case, the hints can be specified like this:
<?xml version="1.0"?>
<hints>
  <!-- multiple targets -->
  <uast name="RUN_VALUE_0" datatype="string">
    <array-val value="program1.p" />
    <array-val value="program2.p" />
    <array-val value="program3.p:someInternalProc" />
  </uast>
</hints>
Here, the hint specifies the external programs program1.p and program2.p, plus an internal procedure (someInternalProc in this case) which is part of program3.p. Note the : character, which separates the external program name from the internal procedure name. If the statement can also invoke otherInternalProc, which is part of two programs, then this hint can be added:
<uast name="RUN_VALUE_0" datatype="string">
  <array-val value="program3.p:otherInternalProc" />
  <array-val value="program4.p:otherInternalProc" />
</uast>
This suffix is required only when a RUN VALUE statement can target internal procedures; if it can invoke only external programs, you don't need this suffix.
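A hint value therefore takes one of two forms: a bare external program name, or program:internalProcedure. The hypothetical Java sketch below splits a value at the first : character. It assumes program names themselves contain no colon (for example, no Windows drive letters), which holds for the relative names used in the examples.

```java
/** Hypothetical sketch: split a hint value into external program and optional internal procedure. */
final class HintTarget {
    final String program;
    final String internalProc; // null when the value names only an external program

    private HintTarget(String program, String internalProc) {
        this.program = program;
        this.internalProc = internalProc;
    }

    /** Assumes the program name contains no ':' (relative names, no drive letters). */
    static HintTarget parse(String value) {
        int sep = value.indexOf(':');
        if (sep < 0) {
            return new HintTarget(value, null);
        }
        return new HintTarget(value.substring(0, sep), value.substring(sep + 1));
    }
}
```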
Once hints are added for one or more ambiguous call sites, the callgraph can be re-generated from scratch or processed in update mode. In update mode, only the ambiguous call sites plus the explicit list of programs passed as arguments to the CallGraphGenerator tool are processed. Update mode bypasses the code-set and schema-processing phases and just applies the callgraph processing rules until no more new external programs are linked. If an existing hint is changed, the CallGraphGenerator tool will not pick up the change when run in update mode. In these cases, the entire call graph must be re-generated from scratch.
AMBIGUOUS super-node. Each edge has these properties:
- hint-instance, which represents the counter suffix for this call site
- call-site-key, an integer value (with the token type), which can be transformed to the call-site name via
Possible Ambiguous Call Sites¶
The following table describes the possible ambiguous call sites; in this table, the # suffix is replaced with a 0-based index.
|call-site-key value (ProgressParserTokenTypes constant)||Hint Name||Statement||Description|
||Run the super function. The hint will specify one or more external program names.|
||Run the super internal-procedure. The hint will specify one or more external program names.|
||Invoke a dynamic function. The hint will specify one or more external program names. The hint's value can have the
||Associates a procedure defined
||Associates a function defined in a super procedure or in a procedure handle, with its possible definition(s). The hint will specify one or more external program names.|
||Disambiguate a dynamic OO object or method call. The hint will specify one or more fully-qualified class names (in case of
||OS process launching with a dynamic expression. The hint will specify one or more process launch commands.|
||OS process launching with a dynamic expression. The hint will specify one or more process launch commands.|
||OS process launching with a dynamic expression. The hint will specify one or more process launch commands.|
||Creates an ActiveX Automation object with a dynamic expression. The hint will specify one or more process launch commands.|
||Opens a DDE client with a dynamic expression.|
||Loads the control from specified file, via a dynamic expression.|
||Connects to an AppServer, Server Socket or a Web Service. The hint will specify the connection settings.|
||Opens a server socket, with a dynamic expression. The hint will specify the connection settings.|
Resolving Missing Code¶
While analysing the source code, you might encounter a number of calls to external programs, functions or internal procedures whose target does not exist in the graph database. This might mean:
- problems in the code, where dead external programs were removed, but references to them still remain in other parts of programs (which may in turn be dead code)
- external programs which were specified during call-site disambiguation, but are not part of the code set being parsed
- case-sensitivity inconsistencies: the wrong case-sensitivity setting was specified in the p2j.cfg.xml file, so program file names can't be matched properly
- AppServer programs invoked via a RUN statement, which are not included in the code being parsed
- functions or internal procedures part of the Possenet framework, which are called from ADM/ADM2 code which is not in use by your application
- functions or internal procedures which are assumed to be invoked by the Possenet framework, but are optional and not in use by your ADM/ADM2 windows
- if the program was removed incorrectly - if so, add it back. This may happen while ambiguous call sites still exist, and the graph is not populated fully and correctly.
- if some program file was missed from the parsed code. If so, copy it to the ABL source folder and re-generate the callgraph.
- if the call site is dead-code or not - in this case, you can either remove the code or just ignore it
- if the case-sensitivity configuration in
Limitations¶
At the time of this writing, a number of entry points and call sites are not processed or identified in the call graph:
- ON ... PERSISTENT RUN triggers
- RUN ... ASYNC calls
- OO class events
- OCX Event Procedures
- exported web services
- Web Service and SOAP invocations are not disambiguated from normal procedure calls, so they will be matched the same as an internal procedure call.
- stored procedure invocations are not processed
UI triggers are resolved once the graph is complete. The analyzer walks the graph, starting from all callers of a program with a UI statement which can process events (like WAIT-FOR), and looks for triggers which may be invoked by it. This should still be considered experimental, as the resolution does not take into account which widgets may be enabled at the scope of the UI statement.
OO features like method invocation are resolved using a greedy approach. For overloaded methods, a parameter type match is attempted by checking all method definitions with the same name in the method's class, and the first method which matches the caller's parameter types is used. Since the caller's parameter types may be ambiguous (e.g. the return type of a DYNAMIC-FUNCTION is unknown at parsing time), the first match may not be the real target. Also, matching is not done recursively in the class hierarchy, so there may be inconsistencies.
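The greedy matching described above can be sketched as follows. This is a hypothetical Java illustration of the behaviour, not FWD's code. Only the given class's definitions are scanned. A null argument type stands for a type unknown at parse time (such as a DYNAMIC-FUNCTION result) and matches any parameter, and the first compatible definition wins. Names and types are compared case-insensitively, as in 4GL.

```java
import java.util.*;

/** Hypothetical sketch of greedy overload matching: the first compatible definition wins. */
class GreedyOverload {
    /** A method definition: name plus declared parameter types, in source order. */
    static final class MethodDef {
        final String name;
        final List<String> paramTypes;

        MethodDef(String name, List<String> paramTypes) {
            this.name = name;
            this.paramTypes = paramTypes;
        }
    }

    /**
     * Scans only this class's definitions (no walk up the class hierarchy).
     * A null argument type means "unknown at parse time" and matches any parameter type.
     */
    static MethodDef resolve(List<MethodDef> classMethods, String name, List<String> argTypes) {
        for (MethodDef m : classMethods) {
            if (!m.name.equalsIgnoreCase(name) || m.paramTypes.size() != argTypes.size()) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < argTypes.size(); i++) {
                String arg = argTypes.get(i);
                if (arg != null && !arg.equalsIgnoreCase(m.paramTypes.get(i))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                return m; // greedy: stop at the first compatible definition
            }
        }
        return null; // no definition with that name and arity in this class
    }
}
```

Suppose Show(INTEGER) is declared before Show(CHARACTER). A call whose argument type is unknown then resolves to Show(INTEGER), even if at runtime it would pass a character value.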
.NET class usage is not separated from normal OpenEdge classes; if needed, you can differentiate them by checking the qualified class name associated with that OO usage.
When upgrading FWD, if there are parser changes (e.g. support for new features), you need to re-generate the callgraph, even if your application doesn't use the new features. The token IDs (for 4GL keywords, like RUN) might have changed, and the existing call graph would then not be interpreted correctly.
© 2004-2018 Golden Code Development Corporation. ALL RIGHTS RESERVED.