Progress 4GL to Java (P2J)


Modification Date
January 21, 2004
Access Control
CONFIDENTIAL

Contents

Introduction
High Level Design
Common Infrastructure
Secure Transport
P2J Protocol
Transaction Requester
Transaction Server
Security Model
Directory Server
Expression Engine
Manageability Service
Manageability Agent and Instrumentation
Application Client/Server Components
Runtime Environment - Execution
Threading and Multiprocessing
Shared Variable Support
Procedure Handles and Persistent Procedures
Business Logic
Input Validation
Runtime Libraries
Runtime Environment - Data
Progress 4GL Transaction Processing Facilities
Data Model
Data Storage
Object to Relational Mapping
Runtime Environment - User Interface
Character Mode User Interface
Menuing and Navigation
Online Help
Data and Database Conversion
Database Schema Conversion (Stage 1)
Data Conversion and RDBMS Creation/Loading (Stage 2)
Data Model Conversion (Stage 3)
Code Conversion
Progress 4GL Lexical Analyzer and Parser
4GL Unified Abstract Syntax Tree
Java Unified Abstract Syntax Tree
Tree Processing Harness
Progress 4GL Preprocessor (Stage 1)
Dead Code Analysis (Stage 2)
Annotation (Stage 3)
Structural Analysis (Stage 4)
Code Conversion (Stage 5)
Output Generation (Stage 6)
Other Conversion Tasks
Miscellaneous Issues
Load Balancing/Failover/Redundancy
User Hooks and Implementation Control
Internationalization
Regression Testing
Build Environment
References
Trademarks

Introduction

By converting Progress 4GL source code into native Java source code, a customer can eliminate the disadvantages of the Progress 4GL environment while retaining the significant investment in the custom application itself.  It is possible to create a set of runtime libraries and conversion tools to automatically convert an application's source code from Progress 4GL to Java.  These tools will be written in Java, which means that after conversion the customer will have a pure Java version of its application with all of the same functionality as the current application.  Of equal importance is the fact that the user interface will remain identical between the Progress version and the Java version.  This eliminates any need to modify business processes and retrain users.  From the user perspective, the application will be the same.

All the advantages of Java will accrue to the new Java version.  Any ongoing costs from Progress Software Corp. will be eliminated, since none of their technology will remain in use.  With no dependency upon Progress Software Corp., the vendor-related problems are eliminated.  From a technology perspective, Java is one of the most popular and capable application environments.  It provides complete portability across all hardware and operating system platforms (from mainframe to mini to PCs to handhelds).  There are more than 4 million (and growing) Java developers available worldwide, along with countless books, sample code and other technical resources.  This eliminates the artificial limitations imposed by the proprietary nature of the Progress environment.  Finally, there is an entire industry investing in the Java environment and a virtually unlimited pool of Java-based technology available, which the new Java version of the application will now be able to leverage.  The Java environment has extremely strong technological potential and thus the Progress 4GL issues in this regard are also eliminated.

Progress 4GL is based on a traditional interpreter plus runtime library model.  Although the 4GL is highly procedural in nature, from a functionality perspective the Java platform can be made into a complete replacement.  P2J provides an application environment and a set of conversion tools which, when added to the capabilities of the J2SE 1.4.x platform, allow this conversion to be accomplished.  This document describes the P2J design at a high level.

High Level Design

The following diagram shows the high level components of the P2J application environment (it does not show any of the conversion tools or processes):



Common Infrastructure

Secure Transport

Java systems use TCP/IP sockets as the preferred inter-process communication (IPC) mechanism, for the following reasons:
  1. Sockets support is built into the J2SE platform.
  2. Using the sockets support is very easy.
  3. It is the only IPC that is 100% portable across operating systems.
  4. It provides an IPC that is useful between processes on the local system or separated by an IP network (Internet or intranet).  Thus the natural ability to distribute processing is built in: by using this IPC mechanism, the decision of where to run a particular process can be made at implementation time rather than at the time the system is programmed.
These advantages are significant and any secure transport layer will be based upon sockets.

Requirements for a secure transport:
  1. Built on TCP sockets - provides the benefits of a connection oriented, session protocol and wide interoperability on public and private IP networks.
  2. Privacy - the payload of every packet must be encrypted using a strong encryption algorithm.  In this way, it is impractical to read the data even if the packets are intercepted.
  3. Integrity - the payload of every packet must be protected from modification. This way if the packets are compromised, it will be detected at the receiving end.  No hijacking of a session is possible.
  4. Authentication - both ends of a session can determine the identity of the other end point.  This is important to ensure that a secure session is not established with an impostor.
  5. Platform and language independence - while pure Java will be used to implement all P2J services and clients for the foreseeable future, the design ensures that a different language or platform could be chosen later without precluding the use or the interoperability of the secure transport.
Transport Layer Security (TLS) is a standard for creating secure sockets.  This standard meets all of the requirements stated above and has been selected as the basis for the secure transport.  For a primer regarding this technology:

TLS Primer

The key idea here is that every P2J module in the system, whether it is a client, a server, a service... will have a unique identity in the distributed system.  This identity will be comprised of the following:
  1. name
  2. IP address and TCP port
  3. public/private key pair and any associated certificate that has been issued which authenticates our public key based on some trusted 3rd party
Most important in this case is the certificate, which is how TLS implements authentication.  Between server/service modules (where there is no "logged on" user), it is this certificate that is checked before a session is established.  A key assumption here is that the keys and certificates are strongly secured at the operating system level such that no unauthorized individual has access to these credentials.  Otherwise it would be possible to impersonate one of the trusted P2J services.

Starting with J2SE 1.4.x, the Java platform has TLS support built in.  The toolset is called Java Secure Socket Extensions (JSSE) and it provides a very simple interface for establishing TLS sessions between two arbitrary processes.  Using TLS from Java requires only slightly more effort than using an unsecured socket.  There is a bit more work to include the authentication capability, though this is still quite modest.  The most important open item here is key management, which has not yet been investigated.
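As an illustration, a minimal JSSE sketch for establishing such sessions might look like the following (the class and method names are illustrative placeholders, not the prototype's final API; the default factories read keys and trusted certificates from keystores configured via system properties):

import javax.net.ssl.SSLServerSocket;
import javax.net.ssl.SSLServerSocketFactory;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsTransportSketch
{
   // Server side: listen for TLS connections and require client authentication
   // (bidirectional authentication as described above).
   public static SSLServerSocket listen(int port) throws Exception
   {
      SSLServerSocketFactory factory =
         (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
      SSLServerSocket server = (SSLServerSocket) factory.createServerSocket(port);
      server.setNeedClientAuth(true);
      return server;
   }

   // Client side: connect to a P2J server and force the handshake immediately so
   // that authentication failures are detected before any application data flows.
   public static SSLSocket connect(String host, int port) throws Exception
   {
      SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
      SSLSocket socket = (SSLSocket) factory.createSocket(host, port);
      socket.startHandshake();
      return socket;
   }
}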

Detailed Design/Implementation Process:
  1. Investigate the options for key management and distribution.
  2. Prototype a set of classes to implement a simple, abstracted way of creating and using a TCP session between two arbitrary Java processes.  Define the user's identity (noted above) in a configuration file and allow options to set the target IP address, TCP port and whether the session should be secured or not.  This will allow us to easily substitute a non-TLS socket for testing the performance overhead of the secure transport compared to a standard TCP socket.  Implement full TLS authentication on a bidirectional basis, although ensure that authentication of a client can be disabled.
  3. Set up a set of test cases to simulate transaction traffic and bulk data transfer.  Use the prototype to measure performance metrics for both the secured and unsecured sockets.
  4. Document the process for creating keys and certificates.
  5. Document the configuration and operation of the secure transport.
  6. Fully JavaDoc the API.

P2J Protocol

The primary focus of the P2J application environment is the execution of transactions between a client and a server (synchronous request/response). However, there is also a requirement for the protocol to support forms of asynchronous messaging. A protocol must be selected or created which can provide the following:
  1. Synchronous request/response transactions where both sides can pass transaction specific data. These transactions can be defined by a well known identifier and each one can have its own structure for both the request and the response.
  2. Asynchronous events where either partner in a session can send a "one-way" event which is received by the other end point but which does not generate a synchronous response. This is trivial to mimic using a request/response transaction that has an immediate, empty response. However, the case that is not as trivial is when the server wishes to send an event notification to a client. Since the client is not normally listening for unsolicited events, this must be planned for.
  3. All communication must be via the Secure Transport.
At this time a publish/subscribe model is probably not necessary.

P2J protocol can be split into two logical parts:
  1. Low-Level communication protocol responsible for establishing connection, serialization/deserialization, data formats, etc.
  2. High-Level application protocol consisting of a set of application specific commands and/or API's and objects which are used to wrap and hold data during exchange and processing inside the application.
Low-Level Protocol alternatives:
  1. RMI
  2. IIOP
  3. RMI-IIOP
  4. Web Services (e.g. JAX-RPC)
  5. Roll Your Own

High-Level Protocol considerations:

Possible choices depend on the low level protocol. Most of the low level protocols listed above are designed with the assumption that objects are specially designed for the distributed environment.  For example, objects implement a specified interface and/or have limitations on data types.  In this case, high level protocol logic is hidden inside Java "client" code.  Also, all objects used to build the high level part of the protocol must be implemented in both a server side and a client side version.  In some cases this also requires additional classes for registering objects in a Naming Service.  The following set of problems may appear from this approach:

Approach:

From the information provided above we conclude that two protocols may satisfy our needs: RMI and our own protocol. Selecting between them is complicated because there are only two critical points which may affect the decision: language/platform neutrality and any issues or limitations due to the design or implementation of RMI.

An effort was made to use RMI since it exists in a working, well known form without further necessary development.  However during the course of the detailed design work, many limitations were found and documented.  The result is that RMI is not suitable for the P2J low level protocol.  Please see the following link to understand the limitations of RMI.

Instead a custom protocol has been designed to meet the project's exact requirements.

Transaction Requester

The Transaction Requester provides the client side of transaction processing.  It hides details such as how to contact the transaction server and how to establish the secure session (including user level authentication via userid/password).  It exposes a set of transactions to the client application (usually the presentation layer) whose implementation and location are hidden. From the application point of view, the Transaction Requester is an API (or Java interface) or set of APIs (Java interfaces).

The Transaction Requester consists of two main parts:

The Initialization Module reads the configuration, attempts to establish TLS connections with the server and performs other relevant startup tasks (for example, it starts the name service, if necessary). If initialization is successful, the Transaction Requester is ready and can perform other tasks.  The main problem with the initialization module is storing the client configuration.  The configuration contains security sensitive data, in particular certificates, so the configuration must be protected from outside access.  On the other hand, the client configuration should be manageable and needs to be maintainable in a convenient way.

The Protocol Module implements the High-Level protocol (see P2J Protocol), provides the API for the client side and uses the connection established by the Initialization Module.

Transaction Server

The Transaction Server provides the server side of transaction processing, including security management and transaction routing. To perform this task, the server manages "sessions".  A session is opened at the moment of a successful login and its state is managed by the Transaction Server until the user explicitly closes it by logging off. The session holds all state variables, including the execution point in the application logic, active windows, etc.  The session state can be read and changed by client requests, by passing request data to the runtime environment and processing responses.  This approach enables reconnection to an existing session, for example after a client crash or long term link downtime.  Another interesting feature is session state logging, which is very useful for problem determination.

The Transaction Server consists of the following modules:
A key point: the security context will *never* be passed from the client to the server.  To do so would be unnecessary and would also cause a security risk.  It is unnecessary since there is a dedicated TLS connection that is associated with a security context on the server side.  This means that every packet can be associated with the right context on the server without any additional information.  Secondly, if the server were to accept a context from the client and assign this, then a modified/compromised client application could escalate its own rights... thus the server cannot honor any security context passed from the clients.  The protocol will be designed to remove any such possibility.

This security context will be made available to the entire server-side P2J environment using a static ThreadLocal object.  This will provide for fast, simple access to the session object that is specific to the current thread.  This allows different threads to each have a separate context without making the access to this context a reference that must be passed through all levels of method calls.
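A minimal sketch of this technique follows (the class and method names are illustrative only, not the final design):

public class SecurityContext
{
   // One context slot per server thread; the transport layer sets the session
   // when a request arrives and clears it when the request completes.
   private static final ThreadLocal CONTEXT = new ThreadLocal();

   public static void set(Object session)
   {
      CONTEXT.set(session);
   }

   public static Object get()
   {
      return CONTEXT.get();
   }

   public static void clear()
   {
      CONTEXT.set(null);
   }
}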

After establishing a connection and completing a successful authentication there is no need for an explicit security context on the client.  Implicitly it will exist in the TLS but we don't need to control it or even have access to it from the client.

It should be noted that "transaction" in the description of the Transaction Requester and Transaction Server should not be confused with the concept of database or data model transactions performed by the Runtime Environment.  The Transaction Requester and Transaction Server work with "transactions" as a mechanism for sending requests from client to server, processing them and sending results back (with or without success). Such behavior is well described as a "transaction" (a sequence of call-reply).  Therefore in this case we call the Requester and Server the Transaction Requester and Transaction Server respectively. These transactions may or may not change the state of transactions in the Runtime Environment, but there is no direct link between these concepts.

Security Model

The Security Model is the approach used in P2J to provide valid users with their assigned access to resources and to prohibit valid or invalid users from accessing resources to which they have no rights.   At the most basic level, there are 3 important concepts:

A database of subjects, resources and access rights (also known as Access Control Lists or ACLs) is maintained as part of the Security Model.  A unified API provides a transparent way to perform authentication, password, group and ACL management and access checking for other parts of the application. Access checking is performed by a generic, pluggable security decision engine.

For the purposes of P2J, standardized access control points in various places of the application represent resources.  A user can be assigned to one or more groups for the purposes of access rights management.

Subjects

All users of the system will be represented by an account definition in the P2J Directory.  This account definition will store information specific to the associated user, including identifying information (e.g. passwords or passphrases, certificates), policy rules (e.g. may only logon between 8am and 5pm on non-holiday weekdays), status information (e.g. last logon time, account is enabled/disabled) or other security relevant data.  This data will be accessible to the other components of P2J.

All servers and client programs will also be represented in the P2J Directory.  These are also considered subjects, although they do not have a traditional password based authentication sequence.  It is important to associate rights with programmatic subjects just as with users.  This enhances security as it may disallow certain information or function access from specific P2J components.  For example, by removing the ability to edit the P2J Directory from a standard client, even highly privileged users (that might normally have access to directory editing) would be required to handle directory maintenance from a known special terminal.  This approach of applying security even to system components is also critical to limiting the damage that can occur by any security flaw found in one of those system components.  This enables a layered, orthogonal approach to security that greatly strengthens the system.

P2J will provide group support.  This is the concept that access rights (ACLs) can be managed and assigned to an arbitrary entity called a group.  Users can then be included in one or more groups, allowing a single policy definition to apply to multiple users.  A group can be considered a class or type of user, and this greatly eases administration effort.  From the point of view of ACL management, groups look like ordinary users, i.e. ACLs can be assigned, changed or retrieved for a group just as for a user.  Groups do not have assigned passwords and it is impossible to log in to the system using a group name as a userid.  Any user that is included in multiple groups will obtain the union of all possible access for all groups of which they are a member, with the caveat that the first explicit refusal of access (a negative ACL - see below) will halt further access checks.

Various systems take different approaches to group management. Some allow groups to be included within other groups. Other systems allow just one level of groups and do not permit nesting.  The ability to nest groups provides more flexibility for access rights management but requires more attention (or a more clearly defined policy) because of possible loops and unwanted rights assignments.  Such systems are also more complex to implement.  Single level systems are less flexible but do not have the issues described above.  Since there are no compelling reasons for implementing groups within groups, a single level system will be used in P2J.

Authentication

Authentication is the process of associating a user or system component with a system recognized identity (an account in the P2J Directory).

Users are authenticated using a userid and password.  Authentication is performed by a pluggable component which implements a standard interface. For example, the following components can be implemented:

Authentication components may:

Based on analysis regarding the Secure Transport (TLS), it is clear that compromising a client side certificate does not allow an attacker to read data transferred through that TLS connection.  This makes it unnecessary to implement any form of challenge-response authentication scheme.  Challenge-response is a scheme in which, after connection, the server sends the client a random challenge string which is then merged with the password.  A hash function is calculated for the resulting string.  The result returned by the hash function is then passed to the server along with the userid.  The server side performs the identical calculation for the reversibly encrypted password stored in module storage.  The server's result is compared with the data provided by the client.  If these values match then authentication succeeds.  The advantage of this approach is that the password is not sent over the connection, so it cannot be breached in this manner.  However, TLS provides sufficient safeguards by using a symmetric session key that is never sent over the connection (it is calculated based on data that is sent such that only the server can decrypt it).  In other words, a TLS session is only vulnerable when the server side certificates are compromised.

All schemes have strengths and weaknesses.  In all cases the server side is assumed to be secure, i.e. all security measures are correctly implemented and maintained.  Examples:

Ordinary userid/password authentication is simple to implement and reasonably safe as long as the server is secure from the point of view of third party access.  The problem is that passwords are generally weak and often are written down or are easily guessed.

The individual certificates scheme does not have the problems of the above described scheme since the user must have something unique (not just know something) in order to access the server.  It is the most secure of the options.  However, it does require storing personal data on the client side which makes maintenance more complicated.  Another problem is that personal data also may be accessed by unauthorized subjects.  Encryption and/or storing of this personal data on individual removable devices can address this problem.  In addition, in a terminal oriented system the "client" side is really the server side.  If the server can be adequately secured, the maintenance and security issues of the individual certificates approach can be minimized.

It should be noted that it is of greatest importance to ensure that the TLS certificates are secured, since these may be the sole basis for authentication in some implementations.

The ordinary userid/password approach and the individual certificates approach will both be implemented in the first pass of P2J. In all cases, the P2J Directory will be the storage mechanism for the authentication data.

Server (and possibly client) components will be authenticated using TLS certificates.  As a result of this it is critical to separate and isolate different components into different filesystem paths.  Then by associating that process with a specific operating system account and only allowing that account to have access to the filesystem locations defined for it, the certificates can be reasonably secured and a breach of one server or service will not necessarily represent a security breach of the entire system.  As noted above, this approach requires that the hardware and operating system security is implemented properly.  P2J has no control over this level of security and is completely dependent upon it.

Communication Session Integration

All P2J clients must process authentication as part of setting up the secure transport and connecting to the server. Nothing is accessible or visible before authentication succeeds. Once authentication succeeds, the user's identity is established and is associated with the secure transport session in use. This identity is used as the subject in security decisions. When a request for access is made from a non-client module, then the identity is established via the key based authentication of the secure transport itself.

Resources and ACLs

Standardized access control points (resources) will be established in:

Each such location is considered a different resource.  A resource type is defined as any unique decision hook (access control point) that implements unique backing logic.  Where multiple access control points implement the same backing logic (usually through shared code), all such resources are considered to be of the same resource type.  Each resource type has a naming convention for differentiating which specific resource is being requested.  In addition, each resource type has a set of valid ACLs by which it both defines the possible access methods and also defines the rights (or lack thereof) assigned to specific subjects.  For example, in Unix systems it is common for there to be the access rights RWX (read, write and execute).  In traditional SMB file/print servers (e.g. IBM's LAN Server) one can see CDRWXAPN (create, delete, read, write, execute, change attributes, change permissions, no access) as the list of possible permissions.  In both cases, these are normally defined as a set of boolean values (a bit set) that can be easily defined and quickly tested.  All resource types will define their own set of possible permissions along with the representation of those permissions in the ACL itself.

Note that it is useful in some cases to have an explicit "no access" bit (a "negative ACL"), which can override access that a subject might otherwise have.  By default, all hooks implement no access in the case where no positive access right is defined.

All resources will be "invisible" when access is not granted.  There must be no indication to the subject as to whether or not a resource actually exists if no access is allowed.  This is an important factor in limiting attacks that are designed to learn about the resources of the system as a preface to making additional attacks through other means.  It is much harder to attack a system if one can not even determine that a particular resource exists.  In addition, sometimes knowing that a resource exists is a security breach itself if the resource is unique enough (e.g. through its name) to tell the attacker something sensitive about the contents.  For example, if one resource type is a database and a database exists for each customer, then being able to see the list of database names that are disallowed still allows the attacker to get a customer list from the system.  For these reasons, no access always means invisibility.

Each access control point has a unique type, a name and implements a hook that calls the security decision engine for service. This hook is responsible for reading the correct ACL data from the P2J Directory and properly invoking the security decision engine.  The names of resources are used as identifiers for the purposes of ACL set management (storage/retrieval/change) and the subject is known from the context of the current session. All resources are named in a unified way.

It is possible that some resource types may implement their own hierarchical structure of related resources.  A traditional filesystem is a classic example of this.  In these cases, access to one resource can be made dependent upon prior access to higher nodes in the tree.  Resources should be able to implement inheritance schemes and hierarchy traversal in a manner that makes hierarchical resources possible.
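The bit set idea can be illustrated with a small sketch (the resource type, permission names and values below are hypothetical; the real permission sets and decision engine interface are defined by each resource type as described above):

public class FileResourceHook
{
   // Hypothetical permission bits for one resource type.
   public static final int READ    = 0x01;
   public static final int WRITE   = 0x02;
   public static final int EXECUTE = 0x04;
   public static final int DENY    = 0x80;   // negative ACL - overrides all other rights

   // Tests an ACL bitmask (already resolved for the current subject by the
   // security decision engine) against the requested access.
   public static boolean isAllowed(int acl, int requested)
   {
      if ((acl & DENY) != 0)
      {
         return false;               // explicit refusal halts further checks
      }
      return (acl & requested) == requested;
   }
}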

Security Decision Engine

The Security Decision Engine processes generic security rules (ACLs) using the Expression Engine.  The rules specify a logical expression that defines whether a particular subject (user or groups to which user belongs) has specific access rights to a particular resource. Each resource has its own set of possible access rights. This may be as simple as a boolean permit/deny or as complex as a bitmask of possible rights. The interpretation of the rights is up to the particular code that provides access to the resource. This code implements the hook to call the security decision engine with enough information regarding the resource, the rights requested and the subject. The security decision engine then reads the related security rules and processes these rules to determine the outcome. This result is specified as a logical expression (a boolean outcome). The caller of the Security Decision Engine is responsible for enforcing the resulting outcome.

The ACLs need to be generically stored and accessed, even though the resources and access rights that may be referenced are arbitrary and application specific. The security decision engine should not need to be aware of the meaning of such values, but it should still be able to properly evaluate the ACL. Besides the standardized data (like resource name and the access rights bitmask), there may be an arbitrary amount of application specific data that is associated with an ACL. There may also be security relevant data associated with other parts of the P2J Directory database such as a user's account. This approach allows a generic security infrastructure to secure a customized set of resources and implement the corresponding custom access rights.

Directory Server

With the distributed design of P2J, it is critical that all clients and servers share a common set of security and configuration data.  This will be accomplished by maintaining a centralized directory service.  This service will be responsible for the user authentication, the storage and retrieval of security and configuration data and access control over that data.  It will provide this information via the secure transport to properly authorized entities.

The stored data can be classified in 2 categories (the following are only examples and are subject to change):
  1. Security Related
  2. Other Configuration
The following design decisions have been made:
  1. The directory service is logically separate from the rest of the application server processes.  It may reside on the same system or it may reside on a separate system which is accessible via the secured transport.
  2. The secured transport will be the only method of communication with this service.
  3. Each client and server in the system will have the following "bootstrap" configuration:
  4. With the bootstrap configuration, any client or server has enough information to contact the directory server and authenticate.  Once this occurs, the rest of its authorized configuration is accessible.
  5. Lightweight Directory Access Protocol (LDAP) will be used as the access mechanism for the directory data.
  6. A directory service that is specific to P2J will be the front-end to the LDAP accessible directory server.  Only this directory service will be coded to LDAP.  This abstraction layer will provide a P2J interface internally and will use LDAP to access the data.
  7. The directory service will provide a configurable mapping between P2J entities/attributes and the specific customer implemented LDAP schema.
The specific design above provides the following benefits:
  1. Common directory storage between multiple applications, at least one of which will be P2J based and others of which may be written in any arbitrary language and running on arbitrary operating systems.  This is enabled by the logical separation of the directory server as well as the choice of LDAP as the access protocol.  LDAP is so commonly available that data exposed via LDAP is easily accessed from virtually any environment and OS platform.
  2. Logical separation of the directory service from the rest of the P2J architecture allows the customer to make the implementation choice of where to run the service (physical co-location with the other P2J services or not) to maximize reliability, availability, security and performance.
  3. The abstraction of the directory service from the backing LDAP server allows:
Please see the LDAP Primer for more information on LDAP.
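Only the P2J directory service front-end would contain LDAP specific code.  For illustration, reading an entry's attributes through the standard JNDI LDAP provider might look like the following (the URL, DN layout and attribute names are placeholders; the real mapping is configurable as described above):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class DirectoryLookupSketch
{
   public static Attributes readUser(String userid) throws Exception
   {
      Hashtable env = new Hashtable();
      env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
      env.put(Context.PROVIDER_URL, "ldap://directory.example.com:389");

      DirContext ctx = new InitialDirContext(env);
      try
      {
         // Placeholder DN layout; the actual schema mapping is implementation specific.
         return ctx.getAttributes("uid=" + userid + ",ou=users,dc=example,dc=com");
      }
      finally
      {
         ctx.close();
      }
   }
}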

Detailed Design Process:
  1. Make a detailed list of requirements for the LDAP server software.
  2. Review the open source options for LDAP servers.  The preference would be for a pure Java LDAP server, if one is available with the right function, reliability, performance and scalability.  Choose the best match.
  3. Once the security model detailed design has been completed, it must be converted into a specific set of LDAP schema files suitable for implementation.  These may be product specific (e.g. OpenLDAP format) unless there is a well supported standard for this.  Note that we must map the security model into the standard LDAP attributes that are available.  Any additional data in the security model which cannot be mapped into the standard attributes must be mapped into custom attributes that are specific to P2J.  Make every effort to use the standard attributes if at all possible.
  4. Design the directory service interface as presented to the P2J application environment.  Make sure to take into account the requirement that an administration toolset will need add/edit/delete access to maintain the directory database.  In addition, the security model must be handled cleanly and without hard coding to LDAP-only security features.
  5. Investigate the effort to design a pluggable interface to decouple the directory service front-end from the specific back-end LDAP client.  This would allow other (non-LDAP) back-ends to be installed at implementation time, based on customer requirements.
  6. Design the format for the mapping file in which the P2J configuration variables are associated with the implementation specific LDAP schema information for proper retrieval.
  7. Document the requirements for the core directory service itself (other than the public interface).
  8. Design the administrative interface (requirements, user interface and flow) for the directory service.
  9. Identify the source data tables and fields in the current Progress database from which we can populate the directory server (at least in regards to the user list, access rights and menuing/navigation).
  10. Design the process for converting this data into the proper format and loading it into the directory server.

Expression Engine

All general purpose languages provide support for expressions.  Logical expressions are those that generate a single boolean result.  Arithmetic expressions generate a numeric result (which could be floating point or integer).

Many areas of the P2J environment require the ability to evaluate logical or arithmetic expressions at runtime.  These complex algebraic relationships can be represented in custom written Java source code.  This has the advantage of doing exactly what a particular P2J component needs with a minimum of code.  The downside is that such code needs to be written for every place in P2J that needs similar functionality.  Thus a great deal of redundancy exists and/or there are many separate places in the code that will have related processing.  Testing, debugging and maintaining these separate areas of the code will take significant effort.

The alternative approach is to create an engine that provides a generic expression service.  The expression language is common to all locations in the code that need such processing.  All expressions are represented as data using this language.  A common, well-tested, well-debugged code base is used to evaluate the expressions, regardless of the component in P2J that needs such processing.  This means the expression processing can be done right, once and then leveraged in many places.

Golden Code has a Java Trace Analyzer (JTA) that includes a generic expression engine.  It is designed to be loosely coupled with its client code, allowing different applications to provide expression support using a common language and a well tested and debugged technology.

Please see the javadoc for package com.goldencode.expr for details on the API.

This expression engine may require some modifications to support all functions necessary across P2J.  The current engine accepts an expression which may contain user defined functions and user resolved variables.  The client of the engine submits an in-fix expression as either a logical or arithmetic type.  The expression engine parses the expression and converts it into a post-fix format.  It then generates Java byte code for each element of the expression, calling back to the client to obtain service for the user defined functions or for resolution of variable values.  This byte code is written as a proper class format and is loaded and run.  This Just In Time (JIT) compiling of expressions means that subsequent runs of the same expression (with different variable values) bypass the compilation step and run immediately from the expression cache.
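To illustrate the callback relationship described above, a hypothetical client-side resolver might look like the following (this is a sketch of the concept only and does not reflect the actual com.goldencode.expr API):

// The engine calls back to its client whenever a compiled expression references
// a variable (e.g. "count" in the expression "count = 105") or a user function.
public interface VariableResolver
{
   // Return the current value of the named variable for this evaluation.
   Object resolve(String variableName);
}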

Expression Language Overview:

An expression consists of one or more operations.  An operation may consist of one or two operands and an operator, such as

count = 105
where count and 105 are the operands, and the equals symbol (=) is the operator, which indicates the purpose of the operation (to compare the variable count to the constant 105 for equality).  When evaluated using actual data to substitute for the variable operand, this expression will evaluate to true or false.

Alternatively, an operation may be self contained, as in the case of a user function.  For example,

@pattern('Some text to look for', A)
A user function is evaluated as an atomic operation which returns true or false.  The value returned depends upon the purpose of the user function.  In this example, the function will return true if the data being tested contains the ASCII byte pattern 'Some text to look for' (not including the single quotation marks).

An expression may be comprised of the following elements:

Operators

Operators within filter expressions may be logical or arithmetic, binary or unary.  In order to construct valid expressions, it is important to understand the relative precedence of the available operators, and to use parentheses to group subexpressions properly.

The set of operators which may be used within filter expressions is listed in the following table.  Operators in this table are listed in order of their precedence, from those evaluated first to those evaluated last.  Operators which have the same precedence are grouped together.  When evaluating operations whose operators have the same precedence, the program will evaluate the operations in the order in which they appear, from left to right.  Parentheses (()) may be used to group operations which must be evaluated in a different order.

Precedence   Symbol      Type         Unary/Binary   Operation Performed
----------   ---------   ----------   ------------   -----------------------------
1            ! or not    Logical      Unary          Logical complement
2            ~           Bitwise      Unary          Bitwise complement
3            *           Arithmetic   Binary         Multiplication
             /           Arithmetic   Binary         Division
             %           Arithmetic   Binary         Remainder
4            +           Arithmetic   Binary         Addition
             -           Arithmetic   Binary         Subtraction
5            <<          Bitwise      Binary         Left shift
             >>          Bitwise      Binary         Right shift w/ sign extension
             >>>         Bitwise      Binary         Right shift w/ zero extension
6            <           Logical      Binary         Is less than
             <=          Logical      Binary         Is less than or equal to
             >           Logical      Binary         Is greater than
             >=          Logical      Binary         Is greater than or equal to
7            = or ==     Logical      Binary         Is equal to
             !=          Logical      Binary         Is not equal to
8            &           Bitwise      Binary         Bitwise AND
9            ^           Bitwise      Binary         Bitwise XOR
10           |           Bitwise      Binary         Bitwise OR
11           && or and   Logical      Binary         Conditional AND
12           || or or    Logical      Binary         Conditional OR

Operators by Precedence


User Functions

A number of special purpose user functions exist to simplify the encoding of operations which would be cumbersome or impossible to represent in a conventional expression.  User functions may be embedded in an expression, or may comprise the entire expression.  A user function is evaluated as an atomic unit.

The syntax of a user function varies from function to function, but is always of the general form

@function_name(arg_1[, arg_2...arg_N])
where:
the @ symbol
Designates the following text as a user function.  Every user function must begin with this symbol.
function_name
The name of the user function.  For example, protocol, pattern, in, etc.
arg_X
An argument required by the specific user function being used.  This may be a variable or a numeric or string constant.  Each user function takes at least one argument; some require additional arguments and some accept a variable number of arguments.
The user functions listed in the following table are examples:

Function Name   Purpose
-------------   ----------------------------------------------------------------------------
in              Test the data within a protocol field for a match against a list of constants
isnull          Test for the presence of a specific data field within a record
pattern         Test for a specific byte pattern within a record

User Functions

Manageability Service

Warning: these features are desirable and Golden Code does intend to implement them if at all possible in the first release.  However, these features are not critical to the implementation and will be dropped if necessary to meet the final milestone.

This is a logically separate process running on the same or different physical hardware as other P2J application modules.  It uses the secure transport to authenticate and communicate with an in-process agent in the other P2J modules.  For the purposes of this document, the term "manager" will be another name for the manageability service.

The following lists the purposes of the manageability service:
  1. It is the central location where all P2J agents register at startup and where periodic "heartbeat" notifications are sent.  The manageability server thus maintains a registry of all running P2J modules and their high level status.
  2. It provides a central control point for sending management commands to P2J modules.  These commands can be sent on a scheduled or ad-hoc basis.
  3. It provides the central collection point for arbitrary data records relating to the specific status, security, integrity or execution of P2J modules.
At a minimum, each heartbeat (#1 above) should contain the identification of the P2J module, the identification of any authenticated user that may be logged on (this would be done on client UI modules and possibly other modules like administrative UIs) and any other general purpose status information such as a timestamp, memory usage/heap size or other process specific status information.  The first heartbeat may be a special one that includes more static information that will not change during runtime, such as system information or JVM specific values (e.g. Java properties).  At the exit of a P2J process, a final heartbeat should be sent which includes a notification that the P2J module is exiting as well as any return code or condition codes that indicate the reason for the exit.  This should be sent whether the exit is caused by an abnormal or normal end.

Commands (#2 above) that could be sent to the agent for execution:
  1. display a message to the user or on the console
  2. enable/disable specific tracepoints (to forward data records to the manageability server)
  3. query or set the value of specific attributes
  4. enable periodic or change notification monitoring of specific attributes
  5. set the alert level for asynchronous events (the threshold at which an event is important enough to forward to the manageability server)
  6. run user specified Java code
  7. halt the process
There should be a mechanism for broadcasting a command or list of commands to an arbitrary set of agents (based on the registry).

As a central collection point (#3 above), once the data is collected, it will be processed by a set of filters/rules and based on the outcome the data record can be routed to one or more particular output targets such as:
  1. write it to a log file (probably using log4j)
  2. write it to a trace file
  3. send an email (probably using Java Mail)
  4. set the value of a local variable, increment a counter or accumulate the value (to gather statistics/trending and/or implement monitors)
  5. forward to another "upstream" P2J manageability service
  6. forward to another "upstream" management system (e.g. via SNMP)
  7. run an external program
  8. run a user defined hook (Java based Interface)
Each record received by the manageability service will be queued.  A daemon thread will read the head of the queue when the queue is non-empty; otherwise it blocks.  Once it finds a record, it will dequeue the record and traverse the list of defined rules to determine the disposition of the record.  Each rule is made up of an expression (written in a modified form of the JTA filter language) and a chain of targets that will receive the record if the expression evaluates true.  There will need to be a default disposition in the case that no rule matches.
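A minimal sketch of such a consumer using J2SE 1.4 primitives follows (the Rule interface is a placeholder for the rule/target design described above):

import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class RecordDispatcher implements Runnable
{
   // Placeholder for a rule definition: an expression plus a chain of targets.
   public interface Rule
   {
      boolean matches(Object record);
      void sendToTargets(Object record);
   }

   private final LinkedList queue = new LinkedList();
   private final List rules;

   public RecordDispatcher(List rules)
   {
      this.rules = rules;
   }

   // Called by the communication layer for every record received from an agent.
   public void enqueue(Object record)
   {
      synchronized (queue)
      {
         queue.addLast(record);
         queue.notify();
      }
   }

   // Daemon thread body: block while the queue is empty, then dispatch the record.
   public void run()
   {
      while (true)
      {
         Object record;
         synchronized (queue)
         {
            while (queue.isEmpty())
            {
               try { queue.wait(); } catch (InterruptedException e) { return; }
            }
            record = queue.removeFirst();
         }
         for (Iterator it = rules.iterator(); it.hasNext(); )
         {
            Rule rule = (Rule) it.next();
            if (rule.matches(record))         // evaluates the rule's expression
            {
               rule.sendToTargets(record);    // log file, email, SNMP forward, user hook, etc.
            }
         }
         // a default disposition would be applied here when no rule matched
      }
   }
}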

Detailed Design Process:
  1. Review the JMX specifications (such as they are). Make sure that we understand (to the extent possible) the approach to the JMX Manager so that we design our manageability service with the future goal of conforming to JMX.  Note that even if the JMX Manager specification was available now, it is likely that we would not implement it because of the expected TCK requirement.  The TCK requires a license agreement (and fees) from Sun and greatly limits distribution because every change has to be tested for compliance before we ship.  This burden makes sense when shipping a JVM but it does not make sense when shipping an application environment that is designed completely by Golden Code.
  2. Design the wire-level protocol to be used between the agent and manager.  It must support the heartbeat, asynchronous events, asynchronous attribute values/change notifications and a request/response transaction interface from the manager to the agents.  All functions defined above need to be possible via this protocol.
  3. Evaluate and document the requirements for the incoming queuing of data records.  In particular, should there be any prioritization of the records or should it be a simple FIFO?
  4. Document any unique requirements for the JTA expression language and the expected changes.  This should include a mechanism for user defined variables (referencing the event record being processed) and any necessary user defined functions that need to be provided.
  5. Define the format for the rules definitions including providing a chain of targets and any specific parameters that must be passed for each target in the chain.  For example, if one of the targets is to log the record to file using log4j, then the log file name may need to be parameterized.
  6. Define how much of the configuration of the manager is stored in the directory versus provided ad-hoc at runtime.
  7. Define whether the input queue is persistent (survives a restart of the manager).
  8. Design the registry and heartbeat processing.  Include the ability to define arbitrary sets of agents in the registry which can be referred to by a group name.  This would be used in broadcasting commands.
  9. Define the Interface by which the registry and heartbeat status information can be accessed.
  10. Define the Interface by which command processing will occur (by which a command or list of commands can be submitted, the status of the commands can be monitored and the results of the command(s) can be obtained).
  11. Define the Interface by which events and attributes are enabled/disabled and accessed in specific agents or sets of agents for forwarding to the manager.   This must provide for query, set as well as the enablement of asynchronous forwarding.
  12. Define the Interface through which a new target can be added to the list of possible targets.  This will need to include a method of passing the record data.

Manageability Agent and Instrumentation

Warning: these features are desirable and Golden Code does intend to implement them if at all possible in the first release.  However, these features are not critical to the implementation and will be dropped if necessary to meet the final milestone.

To the extent possible, the Java Management Extensions (JMX) will be used to implement the P2J manageability agent.  The JMX architecture is not yet complete.  At the time of this writing, some of it is standardized and some of it is still in the JSR process.  The JMX instrumentation architecture and the JMX agent architecture are the 2 parts of the specification that are currently available.  Sun provides reference implementations of these.  It is not known how mature and production-ready these reference implementations are.  If for some reason the reference implementations are not suitable for production, then we will need to implement our own minimum function with the design goal that moving to a JMX environment in the future would be possible without a huge rewrite.  Note that Sun's approach to JMX requires that any supplier making a JMX compliant agent must license the JMX TCK and execute it to prove compatibility.  For this reason, it is critical that, if the reference implementation proves insufficient, we not implement any agent that is based on the JMX APIs or JMX Agent specification.  This way there is no Sun certification or licensing requirement that inhibits the P2J project.

The functions of the P2J manageability agent are the following:
  1. Register the P2J module (in which the agent is running) with the manageability service that is defined as this agent's manager.  For the lifetime of the process, the agent will provide a periodic "heartbeat" that includes specific predefined status data.  At process exit, the agent will provide a heartbeat that includes a notification of termination and the reason/return code.
  2. Provide an access point from which the manageability service can query/set and control the P2J module.  This will be provided through the secure transport.
  3. Provide a local proxy for all local Java objects on behalf of the manageability service.  This will allow any local Java object to send an event or provide an attribute value/notification.
  4. Provide an adapter to translate and forward logging from 3rd party modules into the P2J manageability service.  An important example is the need to create an adapter for LOG4J (used by Hibernate).
Java objects in the P2J environment will be optionally instrumented to support manageability.  To the degree that is possible, this instrumentation will be in accordance with JMX specifications (note that this assumes a JMX compliant agent is in use).  The JMX specification does provide the ability to expose events and attributes to the agent.  The agent then becomes the gateway for these to be accessible from the manageability service.
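As an illustration of JMX-style instrumentation (assuming a JMX compliant agent and the javax.management classes are available; the attribute and object names below are hypothetical):

import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

// Standard MBean pattern: the management interface name must be the
// implementation class name plus the "MBean" suffix.
interface SessionStatsMBean
{
   int getActiveSessions();
}

public class SessionStats implements SessionStatsMBean
{
   private int activeSessions;

   public synchronized int getActiveSessions()
   {
      return activeSessions;
   }

   public synchronized void increment() { activeSessions++; }
   public synchronized void decrement() { activeSessions--; }

   // Registration would be performed by the P2J agent at module startup.
   public static void register(SessionStats stats) throws Exception
   {
      MBeanServer server = MBeanServerFactory.createMBeanServer();
      server.registerMBean(stats, new ObjectName("p2j:type=SessionStats"));
   }
}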

For more information on JMX, please see:

Sun JMX Home
JMX Instrumentation and Agent Specification, v1.2
JMX Remote API 1.0
JMX White Paper


Detailed Design Process:
  1. Review the JMX specifications (such as they are). Make sure that we understand (to the extent possible) the approach to the JMX Agent  and JMX Instrumentation so that we design our code with the goal of conforming to JMX (subject to the limitations noted above).
  2. Define the requirements for the P2J agent (it will try to use the Sun reference JMX Agent and perform the functions of a connector as well as some of the proxy functions for the manager).
  3. Prototype the solution using the Sun JMX reference implementation.  The main objectives are to determine if the Sun JMX is ready for production and if it will support the function we are trying to implement.

Application Client/Server Components

Runtime Environment - Execution

Threading and Multiprocessing

Progress 4GL has no threading or multiprocessing.  Everything is single threaded.  Even features like persistent procedures are implemented as a one time execution of the external block on the main thread.  The internal procedures, functions and triggers are compiled, loaded and made available to subsequent procedures, but at no time does any of the code in a persistent procedure ever run in a separate thread.  Likewise, although event processing (e.g. user interface triggers that can be executed from wait-for or other statements that block waiting for user input) has the appearance of threading, all event processing is done on the main thread in a synchronous manner.  Even the "process events" feature is just a cooperative yield to the input processing to give it a synchronous opportunity to handle any pending input.

This execution model makes the P2J implementation straightforward in its architecture.  For each service (a single JVM process) that is running on the server, multiple client applications may be concurrently executing at any given time.  Each client represents a single user and a single thread of execution.  This exactly matches the semantics of the single threaded Progress 4GL runtime environment.
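As a simple sketch of this model (names are illustrative; the real server uses the secure transport described earlier), the server side can dedicate one thread to each connected client:

import java.net.ServerSocket;
import java.net.Socket;

public class SessionListener
{
   // Accept loop: each accepted connection (one per user) gets its own thread,
   // matching the single threaded semantics of a Progress 4GL session.
   public static void serve(ServerSocket listener) throws Exception
   {
      while (true)
      {
         final Socket client = listener.accept();
         new Thread(new Runnable()
         {
            public void run()
            {
               runSession(client);   // single thread of execution for this user
            }
         }).start();
      }
   }

   private static void runSession(Socket client)
   {
      // authenticate, establish the session context and drive the converted
      // application logic for this user...
   }
}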

Shared Variable Support

Shared variables are used heavily in Progress 4GL applications.  This is a consequence of the fact that early in the Progress 4GL lifecycle, there was no such thing as procedure parameters.  Thus most code uses shared variables (with newer functions using parameters).  Unfortunately, the definition and use of these is not as consistent as one might wish, so it can be difficult to find where something is defined or to find all the places that use a given variable.  One good practice (that is not used as frequently as is optimal) is to use include files to centrally define access to shared variables defined elsewhere.  Note that there are efficiency implications of using parameters (which presumably require space on the stack) versus shared variables which are probably handled internally with pointers.  This would be especially pronounced with anything that is large in size, like an array.

Alternatives for converting global and scoped shared variables:
  1. a user-specific instance of a shared variable manager that is centrally accessible and provides lookup facilities
  2. pass references to scoped shared vars as parameters to method calls (if the points of access are few)
  3. provide access via a common ancestor (possibly by passing it to the constructor of the object and then referencing it as needed)
  4. group related shared variables in classes, create instances at the proper level of scoping and then access specific data in place using a reference to the properly scoped instance and the associated accessor methods (getters and setters)
When using the terms "shared" and "global" it is important to note that only the Java code executing on behalf of the current user can ever access shared and/or global variables.  Other users running in a different context on the server have no access to these variables.  For this reason, P2J cannot use techniques such as static methods to access such variables.

The choice of approach will be made at conversion time, but some runtime infrastructure may exist to support these alternatives.
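A minimal sketch of alternative 1 above, a user-specific shared variable manager, might look like this (the class and method names are illustrative only; one instance would exist per user context, never shared statically across users):

import java.util.HashMap;
import java.util.Map;

public class SharedVariableManager
{
   private final Map variables = new HashMap();

   public void define(String name, Object initialValue)
   {
      variables.put(name, initialValue);
   }

   public Object lookup(String name)
   {
      return variables.get(name);
   }

   public void update(String name, Object value)
   {
      variables.put(name, value);
   }
}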

Detailed Design/Implementation Process:
  1. Design and implement the shared variable manager.

Procedure Handles and Persistent Procedures

All procedures that are currently on the call stack in Progress can be referenced using a unique "handle" that is associated with that instance of the procedure.

The Progress 4GL also allows a procedure to be defined as "persistent".  The addition of this keyword to a "run" statement causes the external procedure to be run once and its context remains permanently in memory for future use (instead of on the stack).  This means that any internal procedures and triggers it defines can be executed at any later time as long as one has the associated procedure handle. 

To use these procedure handles, one can store a handle in a variable and then use it with the "in" clause of a subsequent "run" statement.  Alternatively, one can use the "session manager" to walk the list of procedures and, based on attributes, the specific (persistent or not persistent) procedure can be found and executed.

In any case where the code explicitly uses a handle to reference a loaded procedure (persistent or not), this code will very naturally be represented in Java.  This corresponds to the standard case where a Java object is instantiated and then a reference to this object is contained and used as needed.  One can just consider a procedure handle to be the equivalent of an object reference.

The status of a procedure as persistent or not is determined by the run statement rather than the procedure statement (don't confuse this with the "persistent" option on the procedure statement, which relates to shared libraries only).  This means that the caller of a procedure determines its context (global or stack based).  Thus the same procedure can be run both ways and this has to be possible in the target P2J environment.

In addition, it is possible to run multiple instances of a persistent procedure with different state.  By accessing the same internal procedures and triggers, in different instances, the varying state can be accessed.  This means it is not possible to implement persistent procedures using static methods in Java.

The Progress 4GL concept of persistent procedures can be recreated in Java by creating additional runtime infrastructure.  An analog to the session manager can be created which keeps track of the chain of procedures (persistent or not) and allows one to walk this chain and inspect attributes and execute methods.  To the degree that Progress 4GL procedure attributes are used, these will need to be mapped into J2SE equivalents or artificial constructs will need to be provided to make this functionally equivalent.  To the extent that classes need to implement these features, the generated code will need to derive from the right object hierarchy (inheritance) or will need to implement the correct Interface definitions.
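
The following is a minimal sketch of what such a session manager analog might look like.  The class name, its methods and the use of plain object references are all hypothetical; the actual interface will be defined during detailed design.

    import java.util.Iterator;
    import java.util.LinkedList;
    import java.util.List;

    // Hypothetical per-user registry that tracks procedure instances, whether
    // persistent or stack based, and allows the chain to be walked.
    public class ProcedureRegistry
    {
       private final List procedures = new LinkedList();

       // Register a procedure instance (the Java analog of a procedure handle).
       public void add(Object procedure)
       {
          procedures.add(procedure);
       }

       // Remove a procedure instance when it is deleted or goes out of scope.
       public void remove(Object procedure)
       {
          procedures.remove(procedure);
       }

       // Walk the chain of registered procedures, the analog of walking the
       // session handle's list of procedures and inspecting attributes.
       public Iterator iterator()
       {
          return procedures.iterator();
       }
    }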

Detailed Design Process:
  1. Catalog all uses of handles in the existing (non-dead) 4GL source code.
  2. Catalog all uses of the "session" system handle and its traversal features.
  3. Document all requirements for the conversion of this usage into native Java equivalents.
  4. For anything that can be represented directly using the standard Java language (object references), define the mapping from source to target environments.
  5. Design the interface for any handle oriented features that require runtime support (cannot be implemented using object references alone).

Business Logic

The converted application is exposed as transactions to the clients and possibly to other servers.  Each transaction is registered in the directory and made accessible to clients that have the proper authorization.

The application's business logic (decoupled from the user interface processing) runs inside the runtime environment.  It handles all application processing and is the only direct client of the data model.

Input Validation

Transaction entry points into the business logic and many other general purpose methods can benefit from strong and consistent input validation.  In most applications there are common rules for input validation which are shared by many different locations in the source code.  Unfortunately, most of the time although the rules are common, the implementation is not.  Instead, the logic that validates input is hard coded into each location where such validation is needed.

P2J will implement a separated validation layer that is leveraged at the transaction interface.  Other methods will be able to access this runtime service as it makes sense.  It is not practical to implement all input validation throughout the application using common rules, since in many cases the code to leverage a common infrastructure might exceed the code to do the check inline.  In these cases a common implementation might obscure rather than help.  However, it is likely that the transaction interface will consistently implement such a common approach.  Transaction parameters can be validated (on the server side) before the request is dispatched to the actual transaction handler.

Note that due to the limitations of the proxy approach, P2J will not implement this validation layer as a proxy.

Input validation rules will be specified using the common expression syntax and will be evaluated using the Expression Engine which will be driven by an Input Validation manager.  Entry points in the Input Validation manager will be called with the data to be validated and a reference to the rule which defines the expression.  The rules will be stored in the database.  Since this process is data driven, the rules can be edited without recompiling the system.  This reduces effort and risk.

The validation referenced here concerns program input, although technically the approach could also be used for more direct user interface input validation.  The rules are business focused and will be generated during conversion based on validation done in the current Progress source code.
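
As a rough illustration of the intended flow (all names and signatures here are hypothetical, and the rule lookup and expression evaluation are stubbed out), a transaction's parameters might be validated as follows before dispatch:

    import java.util.Map;

    // Hypothetical sketch of the Input Validation manager entry point.  The
    // real rule storage and Expression Engine integration are designed later.
    public class InputValidationManager
    {
       // Validate the parameters of a transaction before it is dispatched.
       // 'ruleId' identifies a rule (an expression) stored in the database.
       public boolean validate(String ruleId, Map parameters)
       {
          String expression = loadRule(ruleId);
          return evaluate(expression, parameters);
       }

       // Placeholder: the real implementation reads the rule expression from
       // the database.
       private String loadRule(String ruleId)
       {
          return "true";
       }

       // Placeholder: the real implementation delegates to the Expression
       // Engine, resolving variable references from the parameter map.
       private boolean evaluate(String expression, Map parameters)
       {
          return true;
       }
    }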

Detailed Design Process:
  1. Catalog the validation logic in the current code base.
  2. For any logic that cannot be represented using the current expression language, design and implement user defined functions to provide the needed tools.
  3. Design and implement the variable resolution scheme and the manner in which variable references will be specified in the expressions.
  4. Design the format for the storage of the input validation rules.
  5. Implement the database tables necessary to support this storage.
  6. Implement the Input Validation manager.

Runtime Libraries

Progress provides a long list of language statements, built-in functions and widgets or objects with associated methods/attributes.  For each of these kinds of Progress runtime support, there will be a corresponding conversion plan for a native Java implementation.

Many of the above features may be implemented by a straight or direct mapping of code into a Java equivalent.  For example, features such as the IF THEN ELSE language statement directly translate into the Java if ( ) {} else {} construct.  Other features might require a backing method, but the J2SE platform may provide a valid replacement.  An example would be the EXP(base,exponent) function which can be easily replaced with the java.lang.Math.pow(double,double) method.
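
For instance, a trivial direct mapping might look like the following sketch (the surrounding method and variable names are invented purely to make the fragment self-contained):

    public class RuntimeMappingExample
    {
       // Illustrative only: a Progress IF THEN ELSE that computes EXP(x, 2)
       // translates directly into a Java if/else using java.lang.Math.pow.
       public static double convert(double x)
       {
          double y;
          if (x > 10)
          {
             y = Math.pow(x, 2);
          }
          else
          {
             y = 0;
          }
          return y;
       }
    }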

During the Stage 5 conversion process, features will be identified which need to be implemented as specific helpers (rather than a straight remapping of code into Java).  For each feature there will be a corresponding Java API defined and implemented.  This set of APIs will represent the runtime libraries for the P2J system.

These libraries will maintain a consistent naming scheme and structure.  The code conversion process will rewrite code as necessary to use these runtime APIs.

Runtime Environment - Data

Progress 4GL Transaction Processing Facilities

All database processing in Progress 4GL is performed at the record level. Each statement which retrieves data from the database always returns only one record at a time.  Each statement which updates/adds a record does that by updating/adding one record at a time. This is significantly different from the set-oriented approach of a modern RDBMS. This approach has significant consequences. First of all, many things which usually are hidden (done automatically by the RDBMS) are exposed and the Progress 4GL programmer has control of them. This yields great flexibility but causes complexity to grow enormously when trying to handle complex data processing.   Depending on the implementation, set-oriented optimizations become significantly harder to realize.  In some cases such optimizations are impossible without a logic rewrite.

The following things are exposed to the Progress 4GL programmer:
  1. Table buffers.  A table buffer contains some number of records from the table.  The actual number of records is not important because only one record is active at a time.  The 4GL automatically maintains one table buffer for each used table.  The programmer can declare and use additional table buffers if necessary.
  2. Transaction management. Although the 4GL tries to maintain transactions automatically this is not always correct and/or optimal so the programmer is encouraged to maintain transaction blocks manually.
  3. Locking.   The 4GL tries to maintain locking automatically but the programmer is encouraged to maintain locks manually, especially because the default behavior may be not obvious (see below about transactions and locking).
  4. Temp tables. Usually (in a set-oriented RDBMS) temporary tables are completely internal and are not visible to the user. But the 4GL allows the programmer to create and maintain temp tables because this enables more complex queries and data processing. It appears that temp tables are created at the level where they are defined and destroyed when the containing block goes out of scope.  Usually temp tables are defined at the beginning of the procedure and destroyed at the end of the procedure.
  5. Handling tables as variables.  The 4GL allows tables to be passed by value and by reference as parameters to a procedure.  When tables are passed by value, the table content is copied into a formal parameter.  This can be a very time and resource consuming procedure. Passing tables by reference (handle) resolves the issue but requires attention from the programmer because the table content may be changed by the called procedure.
Progress queries can be implicit when certain fields are referenced or based on certain language keywords/facilities (e.g. certain looping constructs automatically query the database and iterate over records).  These implicit database accesses must be converted to explicit use of the data model in the target environment.

Automatic Joins

In Progress, database queries can be nested arbitrarily.  Depending on the structure of the query, Progress may or may not automatically join these into a single query.   The data model must handle all such situations of arbitrary nesting of queries, including the situations in which queries are automatically joined.

The following are examples of situations in which Progress does support automatic joining (including inner and outer joins). From the 'Progress Programming Handbook':

    FOR EACH Customer WHERE State = "NH",
        EACH Order OF Customer WHERE ShipDate NE ? :
        DISPLAY Customer.Custnum Name OrderNum ShipDate.
    END.

or

    DEFINE QUERY CustOrd FOR Customer, Order.
    OPEN QUERY CustOrd FOR EACH Customer, EACH Order OF Customer.

or even

    DEFINE QUERY CustOrd FOR Customer, Order.
    OPEN QUERY CustOrd FOR EACH Customer, EACH Order OF Customer OUTER-JOIN.

Such simple joins can be directly mapped into SQL.  This may have an impact on the implementation and optimization of the data model (specifically the implementation of methods exposed to the data model's clients).
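
As a hedged illustration of the first example above, assuming the Customer/Order relation is on the Custnum column and that the relational table and column names shown here are what the schema conversion produces, the nested iteration could collapse into a single SQL join issued through JDBC:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class JoinExample
    {
       // Illustrative only: the nested FOR EACH ... EACH Order OF Customer
       // example collapses into one inner join in SQL.
       public static void listOrders(Connection conn) throws SQLException
       {
          String sql = "select c.custnum, c.name, o.ordernum, o.shipdate "
                     + "from customer c join orders o on o.custnum = c.custnum "
                     + "where c.state = ? and o.shipdate is not null";
          PreparedStatement stmt = conn.prepareStatement(sql);
          try
          {
             stmt.setString(1, "NH");
             ResultSet rs = stmt.executeQuery();
             while (rs.next())
             {
                // the converted DISPLAY logic would process each row here
             }
          }
          finally
          {
             stmt.close();
          }
       }
    }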

Transactions
  1. It appears that transactions in the 4GL assume changing/rolling back ONE RECORD IN EACH TABLE involved in the transaction. In other words, if more than one record is changed in one transaction block, then during roll back (UNDO in 4GL terminology) only one (current) record is rolled back. So, if there is a statement which processes many records (for example FOR EACH) then each update of the database is treated as a separate transaction. There are many signs that this assumption is correct but it should be verified on a real 4GL installation.
  2. Transactions in the 4GL can affect table fields and variables. A variable or temp-table should be explicitly defined as 'NO-UNDO' if rolling back a transaction should not restore its value.
  3. Transaction bounds can be controlled implicitly or explicitly. An implicit transaction is started by the following statements if they DIRECTLY UPDATE the DATABASE:
  4. It should be noted that the 4GL may propagate the start of a transaction to an upper block level, up to the procedure level (making the entire procedure execute as a single transaction). Propagation rules are described in the 'Progress 4GL Handbook'.
  5. Addition of the TRANSACTION keyword to a DO, FOR EACH or REPEAT block makes its boundary explicitly defined.
  6. Each statement mentioned above can be marked with a label and this label can then be used as a parameter for the UNDO statement (see below) to specify the transaction to roll back.
  7. Transactions can be nested just as the blocks which specify transaction boundaries can be nested.
  8. A transaction does lock management on behalf of the application but the following note should be taken into account (citing the 'Handbook'):
  9. "Never read a record before a transaction starts, even with NO-LOCK, if you are going to update it inside the transaction. If you read a record with NO-LOCK before a transaction starts and then read the same record with EXCLUSIVE-LOCK within the transaction, the lock is automatically downgraded to a SHARE-LOCK when the transaction ends. Progress does not automatically return you to NO-LOCK status."
  10. At present it is not known if the application programmers took this into account or not. In either case this semantic should be implemented or taken into account during conversion.
  11. UNDO-ing (rolling back) transactions can be done using the UNDO statement. If a transaction label is not specified then the innermost transaction containing the block with the error property is rolled back. Otherwise the transaction which starts at the specified label is rolled back.
  12. Unlike other systems, the 4GL automatically supports retrying or skipping of failed transactions, i.e. in case of failure the update of a record can be retried or the record can be skipped and the update of the next record started.  The actual action depends on the UNDO parameters. This specific behavior should also be taken into account during conversion.
Locks
  1. Locks are not applied to temp tables because they are completely internal to the procedure which contains them.
  2. Each retrieved record automatically receives a SHARE-LOCK. This allows others to read the record, but any attempt to apply an EXCLUSIVE-LOCK (for example, for an update) will fail.
  3. An EXCLUSIVE-LOCK can be applied explicitly (by passing options to, for example, the FIND statement).
  4. By passing the NO-LOCK option one may read any record (even records from incomplete transactions).
  5. A transaction automatically upgrades a lock from any level to EXCLUSIVE-LOCK and then returns it to SHARE-LOCK.
  6. The default locking mode for queries with an associated browse is NO-LOCK.
  7. The RELEASE statement releases the lock (the locking mode is set to NO-LOCK).

Data Model

The data model is a set of objects which provide a native Java view of the business data in the application.  This business data may be comprised of local variables and/or data stored in the database.  For each type of data used in the Progress 4GL environment, there must be an equivalent representation in the Java language.  This corresponds to a set of Java objects that must be designed with suitable facilities to replace the matching facilities in the Progress 4GL.

The P2J environment has a strong separation between business logic and the data model.  The business logic is the client or user of the data model.  The data model represents a native Java access and storage mechanism for data created and used by the business logic.  The objects in the data model provide a gateway from the object-oriented Java environment to the Relational Database Management System (RDBMS) environment.  This mapping is defined during Stage 3 of the Data/Database Conversion process.  These objects use Hibernate, an Object to Relational Mapping (O2R Mapping) technology, to implement this gateway.

The data model must support facilities designed to replace the Progress 4GL implementation of transactions, locking and undo as summarized in the previous section.

There will be a transaction oriented interface built into the data model which is available to the business logic.  This interface will be scoped to only include committing or rolling back changes to the specific data model object in question.  This level of transaction support is called an "Application Level Transaction" in Hibernate but it will be referred to as a "data model transaction".  This level of transaction support is not equivalent to database-level transaction support (SQL commit or rollback).  Database level transactions are very short lived.  For example, one may start a transaction, read a result set and then close the transaction.  However, in an application level transaction, one may read a purchase order (under the covers this executes the previously described database transaction), then edit some of the purchase order's details and finally save the purchase order.  When the save occurs, a second database level transaction will occur bracketing the SQL update.  So a single data model object (e.g. an order) level transaction may yield multiple database level transactions.  The following diagram illustrates this example flow:



Any particular data model transaction might be long lived (compared to the very short duration of the database level transactions).  In fact, it might take minutes to complete in a situation in which the user pauses in the middle of a specific application screen.  So data model transactions and database transactions have different granularities and life spans.  To the degree that is practical, the business logic will not have direct access to the database-level transaction control.  However there are optimization considerations that may influence the final results in this area.
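
The following is a minimal sketch of this flow using the Hibernate 2.1 API.  The Order class, its fields and the surrounding method are hypothetical, and the real data model transaction interface will wrap this kind of logic rather than expose it directly to business logic.

    import net.sf.hibernate.HibernateException;
    import net.sf.hibernate.Session;
    import net.sf.hibernate.SessionFactory;
    import net.sf.hibernate.Transaction;

    public class OrderEditExample
    {
       // Hypothetical data model class, shown here only so the example is
       // self-contained; the real class would be generated in Stage 3.
       public static class Order
       {
          private Long id;
          private String comment;

          public Long getId()                    { return id; }
          public void setId(Long id)             { this.id = id; }
          public String getComment()             { return comment; }
          public void setComment(String comment) { this.comment = comment; }
       }

       // One data model transaction (read, edit, save an order) bracketing two
       // short database-level transactions.
       public static void editOrder(SessionFactory factory, Long orderId)
          throws HibernateException
       {
          // First database-level transaction: read the order.
          Session session = factory.openSession();
          Order order;
          try
          {
             Transaction tx = session.beginTransaction();
             order = (Order) session.load(Order.class, orderId);
             tx.commit();
          }
          finally
          {
             session.close();
          }

          // The user edits the detached object here; this may take minutes but
          // holds no database-level transaction or connection.
          order.setComment("expedite");

          // Second database-level transaction: save the changes.
          session = factory.openSession();
          try
          {
             Transaction tx = session.beginTransaction();
             session.update(order);
             tx.commit();
          }
          finally
          {
             session.close();
          }
       }
    }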

The functionality of Progress 4GL locking controls must be replaced with equivalent mechanisms in the data model.  These will either be mapped directly into RDBMS locking concepts or will be implemented in the data model itself.

There may also be a business level transaction capability in which the updates to multiple data model objects occur as an atomic entity.  This is not part of the data model itself, though the data model must be designed to enable it.

This transactional support must be carefully designed to handle a wide range of business situations. 

Closely related to the transactional support is the multi-level undo functionality.  The use of undo is controlled by the block structure of the Progress procedures and the implicit or explicit properties associated with each block.  Depending on the nesting of these blocks, there may be a complex multi-level set of transactions including a concept of primary transactions and 1 or more sub-transactions.  The undo implementation is by default associated with the current transaction/sub-transaction, but by labeling blocks and referencing those labels on UNDO statements, the association can be changed to an arbitrary point in the levels of nested blocks.  This means that one can undo multiple levels up or just one level up at a time.

The data model must provide for an equivalent of the Progress 4GL capabilities regarding undo.   The data model must handle all kinds of variables, not just those backed by the database.  This is because even the simple variables support the Progress concept of undo.  There is a set of Java classes that back each data type.  For each Progress 4GL datatype a specific mapping needs to be made into a Java primitive or class, although the undo feature may preclude the direct use of primitives.
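
As a purely hypothetical sketch of why primitives alone may not suffice, a wrapper type could record savepoints at the start of each undoable block and restore them on undo; the real design of these backing classes is part of the data model work.

    import java.util.Stack;

    // Hypothetical undo-capable backing class for an integer variable.
    public class UndoableInteger
    {
       private int value;
       private final Stack savepoints = new Stack();

       public UndoableInteger(int initial)
       {
          value = initial;
       }

       public int getValue()
       {
          return value;
       }

       public void setValue(int newValue)
       {
          value = newValue;
       }

       // Called when an undoable block (transaction or sub-transaction) starts.
       public void pushSavepoint()
       {
          savepoints.push(new Integer(value));
       }

       // Called when the block completes successfully; the savepoint is dropped.
       public void commit()
       {
          savepoints.pop();
       }

       // Called when the block is undone; the value reverts to the savepoint.
       public void undo()
       {
          value = ((Integer) savepoints.pop()).intValue();
       }
    }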

Some of the Progress functionality related to undo is designed to provide a standard flow control mechanism.  In other words, undo can be used in Progress to change the point of execution in a program as well as to restore variables or database fields to a specific prior state.  The flow control aspects of undo will be handled by other parts of the runtime and code conversion as this is quite separate from the variable/database rollback.   There may be cases where undo is being used for both flow control and variable/database rollback.  However it is also possible that a particular use of undo ignores the results of the intermediate levels of variable/database rollback.  Detecting this situation may be an important optimization as it may allow the elimination or reduction of levels of undo to be supported.  This would reduce working set and CPU utilization.

Security will be an important feature of the data model.  This is a critical location because it is the gateway to all business data.  There are 2 types of security mechanisms that will be supported in the data model:
  1. Whether access to a specific data object is allowed at all.
  2. Filtering of the results that are valid for this user.
To implement such security capabilities, any such constructs in the Progress 4GL must be detected and converted into a set of rules that can then be processed in a generic manner by the data model.  In other words, the data model will implement a standard layer of access control and filtering.  This layer will consult a set of rules (access control lists or ACLs and filter descriptors) describing the policies that must be enforced by the data model based on the runtime context (e.g. the user) and the context in the data model (which resource is being accessed).  In order to implement a generic approach to filtering, it is likely that an expression language and expression processing engine will be necessary.  The filter descriptors would be rules written in this expression language and the expression engine would evaluate the expression for each record of a result set to determine if the record could be accessed in the current context.

To optimize filtering performance it will be important to leverage any possible conditions that can be placed on queries to reduce the result sets based on known security parameters.  This means that to optimize performance, data model filtering may be implemented in 2 passes:
  1. Where possible, generate a query to the database that eliminates as much of the result set as possible, if those results would be eliminated by the filter descriptors active in this context.  This will be dependent upon:
  2. Run the expression engine against the final (smaller) result set to handle those parts of the expression that could not be delegated to the database via SQL.
By maximizing the filtering that will be done by the database, the overall performance can be maximized.  Note that we may wish to leverage Hibernate's filtering and query capabilities to perform this filtering at a higher level of abstraction, rather than managing the SQL directly.
Detailed Design Process:
  1. Document the detailed requirements for the data model.  Include a complete treatment of the following:
  2. Design the object hierarchy for the data model.  Specific issues needing resolution include:
  3. Design the approach to transaction processing, locking and undo.
  4. Define the list of patterns in which Progress 4GL code implements data model security (variable or database access control, result set filtering).
  5. Make a complete list of the application source code that implements data model security.
  6. Design the logic and locations for implementing access control security in the data model.
  7. Define the expression language and requirements for the expression processing engine.  Use the Java Trace Analyzer (JTA) expression language and engine as a starting point.
  8. Design the logic and locations for implementing filtering in the data model.
  9. Define the format and storage/access mechanism for the:

Data Storage

All persistent data storage is in a Relational Database Management System (RDBMS).  This provides a well structured storage and access environment which can be accessed via standard Structured Query Language (SQL).  The use of this approach will be kept generic.  No database-specific features will be used, except to the extent that these are encapsulated and managed transparently by Hibernate's database dialect abstraction layer.  This will allow the implementor of the P2J environment to choose the database based on implementation requirements rather than driving the database choice based on artificial limits in the P2J source code.  It must be equally possible to use open source database technologies such as MySQL or PostgreSQL as it is to use commercial databases such as DB2 or Oracle.

PostgreSQL is the database of choice for the P2J implementation.  It is important to test using this environment, but other environments must also be tested to ensure that the SQL implementation is generic.

Detailed Design Process:
  1. Ramp up (acquire basic skills) on PostgreSQL RDBMS.  Create a "cheat sheet" of critical processes and commands so that others can implement each of the below areas without first reading all manuals or becoming a PostgreSQL expert.  Focus on:
  2. Setup PostgreSQL in development and test environments.  Make sure that an adequate number of such environments exist which can properly support the size of the development team (4-8 people).

Object to Relational Mapping

Accessing data in a relational database (typically via JDBC) does not produce a set of native Java objects as a result.  In order to provide simple native database access, an Object to Relational Mapping (ORM) solution is used.  This technology provides the infrastructure to expose the database as a set of Java objects.  Another common name for ORM solutions is "persistence frameworks" since the technology is used to  store data to and retrieve data from a persistent storage mechanism (a database).

Hibernate v2.1 was chosen as the ORM implementation for this project for a number of reasons:
This open source framework was developed by community developers led by JBoss team members. It is made available via LGPL. Community input and support seems strong, though paid support and developer training contracts are available from JBoss as a last resort. The free documentation is decent but leaves a number of questions unanswered. The published API seems well documented via JavaDoc, though the internal implementation classes are often completely undocumented. Several excellent books are available for Hibernate:
Hibernate is designed to work best with "Plain Old Java Objects" and thus does not require persistable objects to extend or implement framework-specific classes or interfaces, respectively. It is not completely transparent, however; data model objects will have some minimal persistence-related design requirements (at minimum a database identifier field for most persistable classes). Nevertheless, we believe the benefits of this framework far outweigh these inconveniences, which we expect would be required in any ORM approach.

Hibernate's SchemaExport (hbm2ddl) tool will be used to generate the Data Definition Language (DDL) and apply it to the target database. This isolates us from differences in implementation between database vendors, significantly simplifying both the abstract definition of the schema and the creation of DDL. Initially, one Hibernate mapping document (*.hbm.xml) will be created per relational table. This approach may be modified iteratively as we refine our understanding (and schema hints documents) of the relations between Progress tables.
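
A minimal sketch of driving this tool programmatically is shown below, assuming the mapping documents are listed in hibernate.cfg.xml; the exact invocation and configuration approach may differ in the final implementation.

    import net.sf.hibernate.HibernateException;
    import net.sf.hibernate.cfg.Configuration;
    import net.sf.hibernate.tool.hbm2ddl.SchemaExport;

    public class GenerateSchema
    {
       // Illustrative only: reads the Hibernate configuration (which lists the
       // *.hbm.xml mapping documents), then generates the DDL and applies it
       // to the target database.
       public static void main(String[] args) throws HibernateException
       {
          Configuration cfg = new Configuration().configure();
          SchemaExport export = new SchemaExport(cfg);
          export.create(true, true);   // print the DDL and export it to the database
       }
    }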

We will use Hibernate's XML configuration file to configure the framework, which is the preferred mechanism according to its designers. Object to table mappings for use at runtime will be configured using the *.hbm.xml files used for schema creation. It may also be necessary to dynamically configure additional (and likely temporary) mappings at runtime in order to mimic support for Progress' temp table mechanism.

Connection pooling technology will be implemented to mitigate the overhead of creating new JDBC connections for each database request. Hibernate implements a plug-in architecture for connection pooling and manages the connection pool transparently. A default connection pooling implementation is provided, but it is intended for development use only, not deployment. The top, contending, production-level technologies (all open source) which are currently supported by Hibernate are C3P0 (ships w/ Hibernate), Proxool, and Apache's DBCP. We have had previous (satisfactory) experience with C3P0.

Second level caching will be used to minimize database round trips wherever possible, primarily with read-only and read-mostly data which is least likely to become stale when cached. For example, a good candidate for a read-only second level cache is the table to which we map metadata associated with Progress database fields, which is used by the user interface (format clauses, labels, help text, etc.). Second level caching is optional, but for all practical purposes is probably necessary for the P2J system to scale well. Hibernate implements a plug-in architecture for second level cache. Candidates here are EHCache (the default), OSCache, SwarmCache, and JBossCache.

Where it is possible and feasible to extract sufficient data query information from Progress source code, these queries will be mapped to compile-time, named queries using HQL (Hibernate Query Language). HQL provides a mechanism to generate polymorphic queries and to perform more intuitive substitutions than prepared statements (which are used under the covers). It uses an SQL-like syntax which refers to object-level constructs rather than database table-level constructs. Using named queries (i.e., those stored outside Java code in XML and accessed by name at runtime) makes for easier query maintenance and cleaner code.
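
For example (the query name and the HQL shown in the comment are hypothetical), business logic might execute a named query as follows:

    import java.util.List;
    import net.sf.hibernate.HibernateException;
    import net.sf.hibernate.Query;
    import net.sf.hibernate.Session;

    public class NamedQueryExample
    {
       // Illustrative only: "customersByState" is a hypothetical named query
       // defined in a mapping document, e.g.
       //    from Customer c where c.state = :state
       public static List findCustomers(Session session, String state)
          throws HibernateException
       {
          Query query = session.getNamedQuery("customersByState");
          query.setString("state", state);
          return query.list();
       }
    }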

Where the use of HQL is not practical or possible, we will use native SQL queries (which can also be named and maintained outside Java code). A possible candidate for this approach is P2J's implementation of Progress temp table support. If implemented to leverage temp table support in the backing, relational database, this will require the use of syntax which HQL does not support (e.g. SELECT INTO TEMP...). Hibernate's Criteria API will not be used, unless we find a compelling need, due to constructs in Progress source code which are not easily replaced otherwise.

Detailed Design Process:
  1. Investigate connection pooling technology candidates. Review the types of open source licenses associated with these technologies.
  2. Investigate second level cache technology candidates. Part of this task is to review the types of open source licenses associated with these technologies.

Runtime Environment - User Interface

Character Mode User Interface

The P2J user interface (UI) architecture is designed to offer multiple client types (e.g. Swing, Web, ASCII Terminal) by using a pluggable front-end that handles the client-type specifics.

P2J will initially provide a complete replacement for the terminal oriented user interface of the Progress 4GL environment.  This character mode UI (sometimes referred to as "CHUI") will have multiple components:
  1. CHARVA
  2. Progress 4GL Compatible CHUI Widgets
  3. Client Application Launcher
  4. Application Specific Classes
Future enhancements of P2J will be targeted at providing additional client types.

Detailed Design Process:
  1. Make a complete list of all user interface controls and features used in the current Progress 4GL application.
  2. For each control or feature, document the exact specification of its visual look and functional behavior.  This will likely require some amount of prototyping research in the Progress 4GL environment to "fill in" the inevitable gaps in the documentation.
  3. Map each control and feature into its CHARVA equivalent, where there is one.  Document all gaps as well as all required changes or enhancements that will be needed to provide the needed function.  This will require experimentation with the CHARVA code.
  4. Analysis/testing of the CHARVA code in specific terminal environments.  All issues will be documented in detail. The exact terminal software used by the client will be used for this testing and the following terminal types will be tested:
  5. Implement each Progress 4GL Compatible CHUI Widget and all proper behaviors.
  6. Design the process and the specific logic (flow chart + pseudocode) for refactoring the UI code from the business logic.  This means that all 4GL UI code must be:
  7. Develop the application launcher class (to startup the client side).

Menuing and Navigation

Although menuing and navigation is closely related to the user interface, it is a separate topic.  P2J provides a generic mechanism for implementing a menu-based interface that allows the user to navigate through an application.  The menus are defined in the Directory and each user's specific access rights are likewise in the Directory.  This allows a generic menuing mechanism to operate differently for each user.

Security will be implemented by the menu manager.  Only those entries to which a user has access will be available.

The navigation itself will have a user interface component that is responsible for managing the interaction with the user.  It will display the menu to the user and process the menu choices.  This user interface component must have a customizable look and feel, such that customers can properly mimic current menu interfaces.

This processing must be locale enabled.

Implementation Process:
  1. Document requirements for a generic menu/navigation processing facility.
  2. Define the syntax and format for the navigation rules.

Online Help

A replacement for the Progress 4GL context sensitive help system will be built.  This facility will look and feel the same (using the CHARVA widgets) as the Progress equivalent.  The key here is that this is a hand-built Java application that uses the widgets to implement the UI for the online help.  A back-end set of transactions for accessing the help data must be built.  This processing must be locale enabled.

Implementation Process:
  1. Create a set of transactions to access specific help topics based on context.
  2. Document look and feel for the UI.
  3. Implement the UI using the CHARVA widgets.

Data and Database Conversion

The following diagram illustrates the 3 stage process through which the application data and Progress 4GL database schema will be converted:



Database Schema Conversion (Stage 1)

Using the Progress 4GL development environment, one can export the current database schema. This creates a set of .df text files that have a structured definition of the Progress database schema. An early assumption of this project was that the entire schema definition is exported into this format, so no other data would be necessary to perform the schema conversion. This is not the case.

In particular, the schema dump file does not contain information about the data integrity relations between database tables. There is no explicit information in the dump file to determine a table's primary and foreign keys, which are necessary to create the relational schema correctly and ensure the data is migrated correctly. The dump file does include information about a table's indices, one of which must be designated the primary index. However, the semantics of a primary index in Progress differ from that of primary keys in a relational data model, such that we can't map Progress primary indices directly to relational primary keys in all cases.

Specifically, Progress primary indices are not necessarily unique.  It appears that a Progress primary index is simply a "regular" index, but with some special significance. Most likely the "primary" designation merely ensures that the index so marked will determine the default scanning order for the records in that table during a query, and that index will be preferred for this purpose over other indices (if any) defined for that table.

The net effect of this deficiency is that the 3 stages of database conversion described herein are actually converging with one another and to some degree with the code conversion area of the project. A dependency upon the ability to scan database access code to analyze query logic now exists. Table relation information will have to be gleaned from automated and manual code analysis.

A schema conversion tool parses the .df files and for each element of that file maps this to the appropriate counterpart in a relational database structure.  The tool generates an output report indicating any issues encountered during the conversion.

A "P2R Hints" (P2R stands for Progress to Relational) file allows the persistent specification of any deviations from the default conversion choices. It will also contain information about the primary-foreign key relations between tables. The schema conversion process will take the default conversion mapping and sizes unless otherwise overridden by entries in this hints file, and will in any event use the relational information in that file. The default mappings for the data type, column widths/constraints and name conversions can be overridden.  The hints file also allows the specification of new/deleted tables, columns, indices... and other refactoring to be implemented in the conversion process.

There is an optional "Usage Pattern Hints" file.  By encoding the Progress database's pattern of use the schema conversion can be augmented with any additional columns or features needed to properly support the Hibernate caching and locking strategies defined in stage 3.  This file will be generated through interviews with the Progress application developers, analysis of the application source and analysis of the exported data.  The knowledge that will be encoded includes:
  1. For each table, the frequency of:
  2. The relative volumes of data in given tables:
Some data types have no fixed length columns (see below for some needed investigations here).  Note that it is likely that only character data has this behavior.  In addition, there is a "hard limit" on the total record size (the cumulative size of all columns in a record) of 32KB.  In the case where no fixed column size exists, the schema only holds hints about the expected size.  To find out the actual size the actual data records must be reviewed.  Thus the Progress database contents will be exported to .dd files. These are structured text files that have the actual records on a table by table basis.  Analysis of these files will allow the calculation of the current column size based on production data.  In a Relational Database Management System (RDBMS) most columns require a real maximum to be set in advance.  Calculations from the actual data, heuristic rules and overrides in the P2R hints file will provide the means to establish these limits.

Progress does not have a timestamp data type.  This means that timestamps are really implemented by each application developer and most applications can have multiple different approaches to handling this.  For example:
  1. A date field + an integer that holds seconds past midnight.
  2. A date field + a string.
  3. A string.
  4. A decimal.
This is not an exhaustive list.  Since this can be done in so many ways and there are no standards, it may be quite challenging to implement a conversion to a timestamp data type.  At a minimum, the brute force approach of leaving the code as is and running the equivalent in Java will provide the same capabilities as today. However, this may forgo opportunities to produce a cleaner application by converting a more complicated approach into a common, standard timestamp approach.  If there is enough commonality in the patterns of how timestamps are done that we can detect and convert this code in a reasonable manner, then we will implement such an approach.  Note that this has a big impact on all stages of the data/database conversion process as the schema, the data conversion and the data model must all be aware of this non-default conversion.  This may also be a good example of an override in the P2R Hints file.

The output from the schema conversion tool is as follows:
  1. Progress Schema Namespace
  2. P2R Mapping
  3. Hibernate Mapping Documents
  4. Data Model Hints
  5. Issue Report
Detailed Design Process:
  1. Determine if the variable column width extends to all data types (e.g. do integers have unlimited size).
  2. Determine if there is any inappropriate reliance upon the unlimited column size feature of Progress.  Specifically, check for situations in which a particular field of a specific record is being used in an ever growing never shrinking manner to accumulate a log or other history-like data.  If so we will need to either replace this approach manually or design a method to detect and convert such situations automatically.  The choice of approach will depend upon how many instances of such use there are and how uniform these instances are.
  3. The format of the .df schema files is documented in the data_df.g ANTLR grammar to a degree sufficient for parsing and creating a hierarchy of objects which represent the application schema. The meaning and purpose of the particular keywords/properties must yet be documented.
  4. Reverse engineer and document the format of the .dd data export files.
  5. For each schema element, determine and document the proper mapping(s) in a standard RDBMS environment.
  6. Define the rules for converting Progress names to names appropriate in a standard RDBMS environment, where it makes sense to change the names.
  7. Identify the list of places in the application that use timestamps.  Categorize the number of discrete timestamp implementations.  Make a determination of the feasibility of automatically identifying and converting each implementation pattern.
  8. List the requirements for the P2R Hints file.  The list of specific overrides and refactoring features must be documented.
  9. Define the format of the P2R Hints file.
  10. List the requirements for the Usage Pattern Hints file.
  11. Define the format of the Usage Pattern Hints file.
  12. Extend the format of the P2R Mapping output file prototype.  This needs to contain a complete, bidirectional mapping between every Progress schema element and the final RDBMS equivalent.

Data Conversion and RDBMS Creation/Loading (Stage 2)

The first step in the data conversion stage is to create the relational database.  DDL must be generated using the P2R Mapping output from Stage 1.  That DDL is then submitted to the proper database via JDBC.  Any issues that occur during the creation process need to be reported clearly and completely.

The second step is to read the data from the exported .dd text files and convert each field of each record from the Progress format into the format needed for the relational database.  While most string data may not need modification, numeric data, timestamps or data in other formats might need conversion or transformation before it can be loaded into the RDBMS.  Most important is that this processing must maintain the correct referential integrity in the target database, such that all record level relationships are maintained.  In instances where there are new or different  primary keys, the conversion code or the target database itself must generate these keys.  Of course, all foreign keys that refer to these new or modified primary keys must be kept synchronized (changed in "lock-step").   The data will be converted on a per-record basis, with the process starting at a root point in the tree of dependent tables.  Following this dependency tree, a graph of related record data will be read and converted into the target format.

The third step is to take the converted data and load the RDBMS.  While this step is displayed as a logically separate step, it is likely that this will be done at the same time as the data conversion, on a per-record basis.  Each related graph of converted data will be inserted into the RDBMS at the same time, thus maintaining referential integrity.  The final result will be a fully populated RDBMS with the production data in a form ready for application access.

It is likely that the calculation of the tree of table dependencies (that defines the order in which the tables of the database need to be loaded) and the graph of relationships between tables (which defines the manner in which related records are processed to be loaded at the same time to maintain referential integrity) will be useful information for use in Stage 3.  Caching this information in the "Data Model Hints" may be done if this is of value.

While performance is always of some concern, the primary design drivers of this processing are simplicity and reliability.  As long as the resulting database can be populated with the complete set of the correct data in the correct format that maintains all necessary relationships, then the process is a success.  Simplicity is important because the maintenance and improvement of this processing must be easy and less error prone.

Detailed Design Process:
  1. Define the specific set of rules by which one determines the tree of dependent tables.
  2. Define the specific set of rules by which one determines the graph of relationships for record level processing.
  3. Define the list of all data conversions that must be done based on the P2R Mapping (output from Stage 1) and the format of the .dd files.  This task is dependent upon the documentation of the P2R Mapping and the .dd file format in the Stage 1 design.

Data Model Conversion (Stage 3)

This stage has the objective of creating a set of business objects that represent the database in a format that is easily programmed and understood.  This is called the data model and the core idea is that Java business logic needs a set of objects that provide a natural interface to the database, and which abstract the mechanics of how the data is stored and retrieved.  The business logic is "client" code in this regard and it just uses and modifies the data model and the data model handles the database specifics without the business logic's involvement.

The data model is a representation of the different (and possibly overlapping) business-defined views of the database.  The definitions of these views are stored in the Progress 4GL application source code.  Thus the core input for the data model conversion process is the preprocessed 4GL application source code.  It must be preprocessed because until the preprocessor runs, the source file may still contain code that is not valid Progress 4GL.

The P2R Mapping is a second input that is extremely important to the data conversion process.  This is the primary output from Stage 1 and it is needed here to allow the data conversion process to convert the business views in the source code into the correct Java representation that abstracts the RDBMS structure.  In order to abstract that structure, the data conversion process must have the structure's definition and it must know the exact Progress 4GL source of each target element.

Other inputs to the conversion process include usage pattern hints and data model conversion hints, both of which are created during Stages 1 and 2.  These hints define overrides and other non-default behavior in order to resolve ambiguous decisions or optimize performance.  In particular, the usage pattern hints will define where non-default approaches are needed in regards to the caching and locking strategies of the Object to Relational Mapping layer.

Outputs include:
  1. Object to Relational Mapping (O2R Mapping)
  2. Progress 4GL to Object Mapping (P2O Mapping)
  3. Code Conversion Hints
  4. Cross-reference of Progress 4GL variables/schema elements to Java Data Model replacements (and vice versa) for the following:
Once the data model conversion process completes, the O2R mapping output can be converted into a specific set of Object to Relational Mapping configuration/mapping files.  This will be done by a P2J specific process and it will generate Hibernate specific output files.  In the future, if other persistence frameworks are supported, this generation process will require changes.  Note that the O2R mapping will not need to be changed in this case.

Once the Hibernate mapping files are created, the actual Java data model source code can be generated.

Detailed Design Process (highly dependent upon the completion of the Data Model detailed design process):
  1. Define the rules for converting Progress names to data model names, where it makes sense to change the names.
  2. List the requirements for the O2R mapping file.
  3. Define the format of the O2R mapping file.
  4. List the requirements for the P2O mapping file.
  5. Define the format of the P2O mapping file.
  6. List the requirements for the Code Conversion Hints file.
  7. Define the format of the Code Conversion Hints file.
  8. Document the process of writing the Hibernate-specific mapping files from the O2R output file.
  9. Determine the approach for generating code for the data model.  Can/will Hibernate utilities be used or leveraged?  Will these utilities need to be enhanced or replaced?

Code Conversion

The following diagrams illustrate the 6 phase process through which the Progress 4GL source code will be converted into Java:




Progress 4GL Lexical Analyzer and Parser

Multiple components of the P2J conversion tools will need the ability to process Progress 4GL source code.  To do this in a standard way, a Progress 4GL-aware lexical analyzer and parser will be created.  Some details on lexical analyzers and parsers:

Lexical Analyzer and Parser Primer

By using a common lexical analyzer and parser, the tricky details of the Progress 4GL language syntax can be centralized and solved properly in one location.  Once handled properly, all other elements of the conversion tools can rely upon this service and focus on their specific purpose. 

Parser and lexer generators have reached a maturity level that can easily meet the requirements of this project.  This saves considerable development time while generating results that have a structure and level of performance that meets or exceeds the hand-coded approach.

In a generated approach, it is important to ensure that the resulting code is unencumbered from a license perspective.

The current approach is to use a project called ANTLR.  This is a public domain (it is not copyrighted which eliminates licensing issues) technology that has been in development for over 10 years.  It is used in a large number of projects and seems to be quite mature.  It is even used by one of the Progress 4GL vendors in a tool to beautify and enhance Progress 4GL source code (Proparse).  The approach is to define the Progress 4GL grammar in a modified Extended Backus-Naur Form (EBNF) and then run the ANTLR tools to generate a set of Java class files that encode the language knowledge.  These resulting classes provide the lexical analyzer, the parser and also a tree walking facility.  One feeds an input stream into these classes and the lexer and parser generate an abstract syntax tree of the results.  Analysis and conversion programs then walk this tree of the source code for various purposes.  There are multiple different "clients" to the same set of ANTLR generated classes.  Each client would implement a different view and usage of the source tree.  For example, one client would use this tree to analyze all nodes related to external or internal procedure invocation.  Another client would use this tree to analyze all user interface features.

P2J uses ANTLR 2.7.4.  This version at a minimum is required to support the token stream rewriting feature which is required for preprocessing.

Some useful references (in the order in which they should be read):

Building Recognizers By Hand
All I know is that I need to build a parser or translator. Give me an overview of what ANTLR does and how I need to approach building things with ANTLR
An Introduction To ANTLR
BNF and EBNF: What are they and how do they work?
Notations for Context Free Grammars - BNF and EBNF
ANTLR Tutorial (Ashley Mills)
ANTLR 2.7.4 Documentation
EBNF Standard - ISO/IEC 14977 - 1996(E)

Do not confuse BNF or EBNF with Augmented BNF (ABNF).  The latter is defined by the IETF as a convenient reference to their own enhanced form of BNF notation which they use in multiple RFCs.  The original BNF and the later ISO standard EBNF are very close to the ABNF but they are different "standards".

Golden Code has implemented its own lexical analyzers and parsers before (multiple times).  A Java version was done for the Java Trace Analyzer (JTA).  However, the ANTLR approach has already built a set of general purpose facilities for language recognition and processing.  These facilities include the representation of an abstract syntax tree in a form that is easily traversed, read and modified.  The resulting tree also includes the ability to reconstitute the exact, original source.  These facilities can do the job properly, which makes ANTLR the optimal approach.

The core input to ANTLR is an EBNF grammar (actually the ANTLR syntax is a mixture of EBNF, regular expressions, Java and some ANTLR unique constructs).  It is important to note that Progress Software Corp (PSC) uses a BNF format as the basis for their authoritative language parsing/analysis.  They even published (a copyrighted) BNF as 2 .h files on the "PEG", a Progress 4GL bulletin board (www.peg.com).  In addition, the Proparse grammar (EBNF based and used in ANTLR) can also be found on the web.  Thus it is clear that EBNF is sufficient to properly define the Progress 4GL language and that ANTLR is an effective generator for this application.  This is the good news.

However, there is a very important point here:  the developers have completely avoided even looking at ANY published Progress 4GL grammars so that our hand built grammar is "a clean room implementation"  that is 100% owned by Golden Code. The fact that they published these on the Internet does not negate their copyright in the materials, so these have been treated as if they did not exist.

An important implementation choice must be noted here.  ANTLR provides a powerful mechanism in its grammar definition.  The structure of the input stream (e.g. a Progress 4GL source file) is defined using an EBNF definition.  However, the EBNF rules can have arbitrary Java code attached to them.  This attached code is called an "action".  In the preprocessor, we are deliberately using the action feature to actually implement the preprocessor expansions and modifications.  Since these actions are called by the lexer and parser at the time the input is being processed, one essentially is expanding the input directly to output.  This usage corresponds to a "filter" concept which is exactly how a preprocessor is normally implemented.  This is also valid since the preprocessor and the Progress 4GL language grammars do not have much commonality.  Since we are hard coding the preprocessor logic into the grammar, reusing this grammar for other purposes will not be possible.  This approach has the advantage of eliminating the complexity, memory footprint and CPU overhead of handling recursive expansions in multiple passes (since everything can be expanded inline in a single pass).  The down side to this approach is that the grammar is hard coded to the preprocessor usage and it is more complicated since it includes more than just structure.  This makes it harder to maintain because it is not independent of the lexer or parser.

The larger, more complex Progress 4GL grammar is going to be implemented in a very generic manner.  Any actions will only be implemented to the extent that is necessary to properly tokenize, parse the content or create a tree with the proper structure.  For example, the Progress 4GL language has many ambiguous aspects which require context (and sometimes external input) in order to properly tokenize the source.  An example is the use of the period as both a database name qualifier as well as a language statement terminator.  In order to determine the use of a particular instance, one must consult a list of valid database names (which is something that is defined in the database schema rather than in the source code itself).  Actions will be used to resolve such ambiguity but they will not be used for any of the conversion logic itself.  This will allow a generic syntax tree to be generated and then code external to the grammar will operate on this tree (for analysis or conversion).    In fact, many different clients of this tree will exist and this allows us to implement a clean separation between the language structure definition and the resulting use of this structure.
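
As a sketch of how a client drives the generated classes (ProgressLexer, ProgressParser and the top level rule name "program" are hypothetical and depend on the grammar; tree construction is assumed to be enabled), the lexer and parser are fed a source file and the resulting tree is then walked by external code:

    import java.io.FileReader;
    import antlr.collections.AST;

    public class ParseDriver
    {
       // Illustrative only: parse one Progress 4GL source file and return the
       // resulting abstract syntax tree.
       public static AST parse(String filename) throws Exception
       {
          ProgressLexer lexer = new ProgressLexer(new FileReader(filename));
          ProgressParser parser = new ProgressParser(lexer);
          parser.program();              // run the top level grammar rule
          return parser.getAST();        // the resulting abstract syntax tree
       }

       // A simple client that walks the tree; real clients would perform
       // analysis or conversion at each node instead of printing it.
       public static void walk(AST node, int depth)
       {
          while (node != null)
          {
             System.out.println(depth + ": " + node.getText());
             walk(node.getFirstChild(), depth + 1);
             node = node.getNextSibling();
          }
       }
    }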

The following is a list of the more difficult aspects of the Progress 4GL:
Based on analysis and testing of a simple Progress 4GL grammar, all of the above problems can be solved using ANTLR.

For the implementation process, please see the detailed design document's Implementation Plan section.

4GL Unified Abstract Syntax Tree

An abstract syntax tree (AST) is a hierarchical data structure which represents the semantically significant elements of a language in a manner that is easily processed by a compiler or interpreter.  Such trees can be traversed, inspected and modified.  They can be directly interpreted or used to generate machine instructions.  In the case of P2J, all Progress 4GL programs will be represented by an AST.  The AST associated with a specific source file will be generated by the lexer and parser.

Since most Progress 4GL procedures call other Progress 4GL procedures which may reside in separate files, it is important to understand all such linkages.  This concept is described as the "call tree".  The call tree is defined as the entire set of external and internal procedures (or other blocks of Progress 4GL code such as trigger blocks) that are accessible from any single "root" or application entry point.  It is necessary to process an entire Progress 4GL call tree at once during conversion, in order to properly handle inter-procedure linkages (e.g. shared variables or parameters), naming and other dependencies between the procedures.

One can think of these 2 types of trees (abstract syntax trees and the call tree) as 1 larger tree with 2 levels of detail.  The call tree can be considered the higher level root and branches of the tree.  It defines all the files that hold reachable Progress 4GL code and the structure through which they can be reached.  However as each node in the call tree is a Progress 4GL source file, one can represent that node as a Progress 4GL AST.  This larger tree is called the "Unified Abstract Syntax Tree" (Unified AST or UAST).  Please see the following diagram:



By combining the 2 levels of trees into a single unified tree, the processing of an entire call tree is greatly simplified.  The trick is to enable the traversal back and forth between these 2 levels of detail using an artificial linkage.  Thus one must be able to traverse from any node in the call tree to its associated Progress 4GL AST (and back again).  Likewise one must be able to traverse from AST nodes that invoke other procedures to the associated call tree node (and back).

Note that strictly speaking, when these additional linkage points are added the result is no longer a "tree".  However for the sake of simplicity it is called a tree.

This representation also simplifies the identification of overlaps in the tree (via recursion or more generally wherever the same code is reachable from more than 1 location in the tree).
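
The following sketch suggests one possible shape for the artificial linkage (the class names are hypothetical and the AstNode stand-in is greatly simplified; the real 4GL AST nodes are ANTLR based):

   import java.util.*;

   // One node at the call tree level: it names a source file, holds the root
   // of that file's 4GL AST and remembers which AST nodes invoke this file.
   public class CallTreeNode
   {
      private final String  sourceFile;                 // external procedure (file)
      private       AstNode ast;                        // root of the file's AST
      private final List    callers = new ArrayList();  // AST nodes that run this file

      public CallTreeNode(String sourceFile)  { this.sourceFile = sourceFile; }

      public String  getSourceFile()          { return sourceFile; }
      public void    setAst(AstNode root)     { this.ast = root; }
      public AstNode getAst()                 { return ast; }
      public void    addCaller(AstNode run)   { callers.add(run); }
   }

   // Minimal stand-in for a 4GL AST node.  A node that invokes another external
   // procedure (e.g. a RUN statement) points back up to that file's call tree node.
   class AstNode
   {
      private CallTreeNode target;

      public void         setTarget(CallTreeNode t) { target = t; }
      public CallTreeNode getTarget()               { return target; }
   }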

To create this unified tree, the call tree must be generated starting at a known entry point.  This call tree will be generated by the "Unified AST Generator" (UAST Generator) whose behavior can be modified by a "Call Tree Hints" file which may fill in part of the call tree in cases where there is no hard coded programmatic link.  For example, it is possible to execute a run statement where the target external procedure is specified at runtime in a variable or database field.  For this reason, the source code alone is not deterministic and the call tree hints will be required to resolve such gaps.

Note that there will be some methods of making external calls which may be indirect in nature (based on data from the database, calculated or derived from user input).  The call tree analyzer may be informed about these indirect calls via the Call Tree Hints file.  This is most likely a manually created file that overrides the default behavior in situations where that behavior doesn't make sense or where the call tree generator would not be able to properly process what it finds.

The UAST Generator will create the root node in the call tree for a given entry point.  Then it will call the lexer/parser to generate a Progress 4GL AST for that associated external procedure (source file).  An "artificial linkage" will be made between the root call tree node and its associated AST.   It will then walk the AST to find all reachable linkage points to other external procedures.  Each external linkage point represents a file/external procedure that is added to the top level call tree and "artificial" linkages are created from the AST to these call tree nodes.  Then the process repeats for each of the new files.  They each have an AST generated by the lexer/parser and this AST is linked to the file level node.  Then additional call tree/file nodes will be added based on this AST (e.g. run statements...) and each one gets its AST added and so on until all reachable code has been added to the Unified AST.
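
The generation loop just described can be sketched as follows (building on the hypothetical CallTreeNode and AstNode classes above; the two private helpers are placeholders for the ANTLR based lexer/parser and for the logic that collects statically resolvable RUN targets plus any Call Tree Hints):

   import java.util.*;

   public class UastGenerator
   {
      public CallTreeNode generate(String rootFile)
      {
         Map  visited = new HashMap();      // file name -> CallTreeNode
         List work    = new LinkedList();   // files still to be processed

         CallTreeNode root = new CallTreeNode(rootFile);
         visited.put(rootFile, root);
         work.add(root);

         while (!work.isEmpty())
         {
            CallTreeNode node = (CallTreeNode) work.remove(0);
            AstNode ast = parse(node.getSourceFile());   // lexer/parser output
            node.setAst(ast);                            // artificial linkage

            Iterator it = findExternalCalls(ast).iterator();
            while (it.hasNext())
            {
               String target = (String) it.next();
               CallTreeNode child = (CallTreeNode) visited.get(target);
               if (child == null)
               {
                  child = new CallTreeNode(target);
                  visited.put(target, child);
                  work.add(child);          // each file is processed exactly once
               }
               // the full implementation would also link the invoking AST node
               // to 'child' (and record the caller on the child) at this point
            }
         }
         return root;
      }

      // Placeholder helpers only.
      private AstNode parse(String file)           { return new AstNode(); }
      private List    findExternalCalls(AstNode a) { return new ArrayList(); }
   }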

The resulting UAST is built based on input that is valid, preprocessed Progress 4GL source code.  This is used as input to both Stage 2 (Dead Code Analysis) and Stage 3 (Annotation).  Stage 3 modifies the Unified AST to add "annotations".  Annotations are flags or other attributes attached to specific nodes that provide more information than just the node type and contents.

The output from Stage 3 is an Annotated Unified AST.  The structure is the same as the Unified AST, but there is much more information stored in the Annotated Unified AST.  This Annotated Unified AST is an input to Stages 4 (Structure Analysis) and 5 (Code Conversion).

All of these different progressions of the Unified AST are related to and based upon the 4GL application source.  Since this UAST represents the source side of the conversion process, this document may refer to the 4GL UAST as the "Source UAST".

Detailed Design/Implementation Process:
  1. Make a full list of possible call node types.  This means that all possible methods of program invocation must be documented including enough information to build a pattern recognition module to appropriately match such instances.
  2. Document the requirements for data that must be captured and/or derived and subsequently stored in each call node.  Each node must store any generic data in addition to type-specific data.
  3. Analyze the list of indirect or otherwise troublesome program invocation constructs in the 4GL application source code.
  4. Design the interface for the call tree level nodes.
  5. Design the mechanism for "artificial linkage" between the call tree and each node's AST.  This mechanism must allow the seamless traversal between the 2 levels.  It is this facility that allows this to be called "unified".
  6. Design a hint format to store enough information to identify all such troublesome situations as well as to "patch" the call tree to traverse such situations.
  7. Design a persistence mechanism for Unified ASTs.  XML may be an appropriate technology to leverage since it easily represents tree structured data.
  8. Implement the UAST call tree nodes and the associated AST node modifications.
  9. Implement the call tree creation logic to walk the source tree and create a UAST using the lexer/parser output on each source file to create the associated AST.  Ensure that all linkages are properly created.
  10. Implement persistence.

Java Unified Abstract Syntax Tree

The Java Unified Abstract Syntax Tree is based on the same UAST concept described in the preceding section.

Where the Source UAST represents a call tree of 4GL source code, the Java UAST represents the converted set of Java class/interface files, resource bundles, expressions, rules and other output from the conversion process.  For this reason, we may refer to this UAST as the "Target UAST".

Where the Source UAST is built by lexing/parsing the 4GL source code, the Target UAST is constructed in memory through multiple stages (Stages 4 and 5) and has no filesystem backing in Java source code.  For this reason, the nodes in the Target UAST are not based on ANTLR derived objects but instead are created on a custom basis for P2J.  These nodes represent the semantic equivalent of Java language constructs and other resources that in total represent the converted Java application equivalent of the Source UAST.  To be clear: while in the Source UAST each node represents some compilable 4GL source code, in the Target UAST, each node represents the conceptual Java construct but cannot be rendered into a source code form without further processing (and formatting).  Thus a node might represent a Java language "for loop" and it would store the minimum information necessary to describe the loop, such as the variable initializer(s), the expression that tests for completion, and the update expression that executes at the end of each iteration.  It would also have a subtree of other nodes that represent the for loop contents.
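
For example, a Target UAST node for a Java "for" loop might be sketched as follows (hypothetical classes; real nodes would hold expression subtrees and richer attributes rather than plain strings):

   import java.util.*;

   public class ForLoopNode extends JavaAstNode
   {
      private String initializer;                   // e.g. "int i = 0"
      private String condition;                     // e.g. "i < count"
      private String update;                        // e.g. "i++"
      private final List body = new ArrayList();    // child nodes (loop contents)

      public void setInitializer(String s)    { initializer = s; }
      public void setCondition(String s)      { condition = s; }
      public void setUpdate(String s)         { update = s; }
      public void addBodyNode(JavaAstNode n)  { body.add(n); }

      public String getInitializer()          { return initializer; }
      public String getCondition()            { return condition; }
      public String getUpdate()               { return update; }
      public List   getBody()                 { return body; }
   }

   // Minimal common base class for all Target UAST nodes (placeholder).
   class JavaAstNode
   {
   }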

Stages 4 and 5 analyze the Source UAST (after it has been annotated) and build up a representation of the converted Java application, in the form of the Target UAST.  By handling all this separately from the actual source code output and formatting, these tools can focus on getting the logical equivalence correct.  In addition, this architectural design makes multiple passes significantly easier.  If the output of Stages 4 and 5 were Java source code, then multiple passes could only be enabled by editing the source code or parsing it back into a tree form.

The Target UAST will be populated in Stage 4 with structural information (classes, interfaces, methods, data members...), naming information and linkage information (method signatures...).

In Stage 5, this skeleton structure is then "fleshed out" with the actual replacement function.

This split in the building of structure versus the generation of the details is necessary because the resulting programs are being refactored into separate components (UI is separate from business logic which is separate from the data model...).  In order for these components to work together (to handle the logical equivalent of the previous, highly interdependent 4GL approach), the structure, naming and linkage between the parts must be known in advance (even when the actual client components that call these interfaces have not yet been generated).

So in Stage 5, the actual Java language constructs (and other supporting facilities like resource bundles or generic rules/expressions) are added to the Target UAST.  When this UAST is complete, the actual output of the conversion can be prepared (Stage 6).  Stage 6 actually writes out the compilable Java source code (and other resources or rules) that is represented in a more abstract manner in the Target UAST.

Detailed Design/Implementation Process:
  1. Design and implement the generic base class(es) that will be needed to construct the tree.  Specific language constructs (see Stages 4 and 5) will be handled as subclasses of these base classes.  Make sure that all standard behavior and data to be stored is handled by the base classes.

Tree Processing Harness

Since multiple code conversion stages (3-6) require processing an entire UAST, the complexity of walking the tree is centralized in a "Tree Processing Harness".  This harness provides the following capabilities (a sketch of one possible interface follows the list):
  1. Provides a collection of those nodes of the tree that meet some criteria (based on annotations, node type or other node data).
  2. Calls a client-specified callback method for each node in the tree or a subset of nodes (as in #1 above).
  3. Provides an option to ignore processing a section of the tree that represents a recursion or more generally any call to a branch of the tree that has already been processed.
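
A minimal sketch of one possible public interface for the harness follows (hypothetical names, covering the three capabilities listed above):

   import java.util.*;

   public interface TreeHarness
   {
      // Capability 1: collect every node in the tree that the filter accepts.
      List collect(UastNode root, NodeFilter filter);

      // Capabilities 2 and 3: call back once per (optionally filtered) node,
      // optionally skipping branches that have already been processed.
      void walk(UastNode root, NodeFilter filter, NodeVisitor visitor,
                boolean skipProcessedBranches);
   }

   interface NodeFilter
   {
      boolean accept(UastNode node);      // based on annotations, type or data
   }

   interface NodeVisitor
   {
      void visit(UastNode node);
   }

   // Stand-in for the common UAST node type.
   class UastNode
   {
   }
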
Detailed Design/Implementation Process:
  1. Design the public interface for the tree processing harness.
  2. Implement the tree processing harness.

Progress 4GL Preprocessor (Stage 1)

The objective of this stage of code conversion is to process the entire 4GL source tree and resolve/expand all preprocessor directives in order to turn an arbitrary source file into a valid Progress 4GL source file.  In general, the assumption must be that a file is not  syntactically valid Progress 4GL until after the preprocessor runs.

The Progress 4GL environment uses a purely interpreted execution model.  While this has significant performance limitations, it does allow for some advantages over a compiled environment (e.g. Java).  External procedures can be written with preprocessor references to named or positional arguments (not to be confused with parameters which are runtime entities that exist on the stack).  These arguments can be static (determined by the source code) or they can be dynamic (determined at runtime through logic, calculation, user input or other data input sources such as database fields).  All static references can be resolved in a standalone preprocessor (such as P2J has).  However, any dynamic arguments cannot be known in advance and thus there can be preprocessor constructions or directives which cannot be resolved by a standalone preprocessor.  Such external procedures cannot be precompiled in the Progress 4GL environment.  However at runtime, when all such information is available for the interpretation of the RUN statement, the external procedure is preprocessed just before it is converted into R-Code.  The interpretive model enables a "late" preprocessor which is what allows this feature to be provided.
  1. Note that such problems do not affect include files as all such references are statically available in this case.  Everything for an include file is processed as a string where expressions such as value() can be used in RUN statements.
  2. A valid use for this feature is to make template code for something that can otherwise only be referenced statically.  An example is a database name (e.g. customer.address).  These references are statically compiled and are not processed at runtime.  By using an expression to generate the name customer.address and passing it as an argument to an external procedure, you don't have to hard code the static customer.address anywhere.  Instead you can derive, calculate or query this value at runtime.  The key point here is you can't pass such a static value as a parameter because parameters are processed on the stack at runtime and database names are processed at compile time statically.
  3. Include files do not solve this particular problem because for each database name you still would need to have a separate external procedure in which the specific database name is statically defined.  Thus while some of the logic could be made into a "template" and centrally maintained, there are parts that could not be handled dynamically at runtime.
  4. The feature can be abused and is considered an abuse in cases where a parameter would have worked (in other words, where the resulting reference could have been handled dynamically at runtime).  Evidently we may find some such situations.
Some of these cases will be converted in a programmatic manner.  For example, the xt/pushhot.p is a very simplistic external procedure that uses positional arguments.  In this case, the problem can be resolved by converting the RUN statement into an inline preprocessor directive (see xt/x-login.p for an example of one such caller in which this can be done).  Other cases may take more complex conversion where the Java result is written to be equivalent in function while the logic is very different.  Yet other cases may remain that will require manual intervention.

The preprocessor itself is designed to handle a single external procedure (and an arbitrary number of included files).  To feasibly preprocess the thousands of 4GL source files in a given project, a Preprocessor Harness will be created.  This harness is responsible for properly managing the input to the Preprocessor and for driving the preprocessing of an entire source tree in a batch mode.  To do this the harness must have access to the original or "raw" 4GL source code.  In addition, in order to provide the proper arguments and options to the Preprocessor, the harness has a Preprocessor Hints file.  This file is manually created and it stores such input.   The Preprocessor Hints file may also include lists of source files which do not require processing.  These "ignores" may be listed due to manual conversion or they may be manually inserted due to the code being found to be dead in the Stage 2 analysis.

The Preprocessor is created based on a grammar which is used by ANTLR to create a set of lexers and a parser.  This grammar is being implemented with all of the preprocessor logic embedded in Java based actions attached to specific rules as necessary.  This usage of ANTLR is considered proper for the implementation of a filter.  A preprocessor is simple enough to be implemented as a filter and such an implementation avoids the complexity of an undefined number of iterations walking through the entire syntax tree.  Instead the expansion of each preprocessor feature is done by the lexer as it tokenizes.  Then this result is just written to output and the next token is processed.  When the lexer is done, the stream has been processed and all preprocessor expansion is complete.

The Preprocessor conceptually does the following:
  1. Uses an ANTLR generated set of 3 Lexers (to tokenize) and 3 Parsers (to syntactically check and structure the input).
  2. Implements a filter approach where the input stream (a text file) is preprocessed and written to an output stream (stdout or an output file).
  3. At a high level, it is byte for byte compatible with the following Progress Preprocessor features:
  4. Everything else is unrecognized and is assumed to be the Progress 4GL language.
  5. This unrecognized input is left unchanged and is copied directly to the output.
  6. Produces statistics and reports.
Special Character support:

Input     Output
;&        @
;<        [
;>        ]
;*        ^
;'        '
;(        {
;%        |
;)        }
;?        ~

Operator and Function support:

Operator/Function     Type
+                     operator
-                     operator
*                     operator
/                     operator
= or EQ               operator
<> or NE              operator
< or LT               operator
> or GT               operator
<= or LE              operator
>= or GE              operator
AND                   operator
OR                    operator
NOT                   operator
BEGINS                operator
MATCHES               operator
MODULO                operator
DEFINED()             Preprocessor symbol dictionary function

The following functions are NOT IMPLEMENTED since they are not necessary to preprocess the application.  These are the only known features of the Preprocessor that are not supported.  All other features are 100% implemented.

Due to the problem with using named or positional arguments in external procedures, it is possible that some (hopefully small) percentage of the source cannot be completely preprocessed without custom logic (probably enabled through manual analysis and the Preprocessor Hints file).  So on output we will have a processed 4GL source file, though there may be cases in which this processing is partial.  In all these cases, a report will be generated that documents such problems.  Other report output will also be available to allow easy analysis of any arbitrary source file for preprocessor content.

A good question the reader may be asking is: why not just use the Progress environment's ability to save a preprocessor output file (by using an option to the COMPILE language statement) instead of spending the time to make a clean room replacement for the Progress preprocessor?  There are 4 reasons:
  1. Having a pure Java, clean room preprocessor eliminates all dependencies upon having a working Progress runtime environment in order to make a conversion.
  2. The code conversion process needs to know exactly which include files are used and exactly where they are included in order to make proper decisions about which code can be made into a common class and used from multiple locations.  Only the preprocessor has this knowledge.  Without it the code conversion would have to run arbitrary comparisons to find all the locations of larger code patterns that match across the entire project.  This is a task that is not feasible.
  3. Preprocessor conditionals (&IF directives) cause each file to potentially have multiple output variants.  All of these locations would have to be known and each variant would have to be manually generated (using the COMPILE statement) in order to ensure that all possible code paths are visible to the P2J environment.  Instead, the P2J preprocessor will have a "generate all variants" mode that allows this to occur automatically (since there are only a finite number of possible permutations).
  4. RUN statement arguments cause a runtime form of multiple variants.  The big problem is that these can be theoretically infinite in number.  Thus having the preprocessor available at all times makes this problem more easily resolved.
Detailed Design/Implementation Process:
  1. Design the format for the Preprocessor Hints file.
  2. Design and implement the set of statistics to be captured.
  3. Design the format and content of the output report.  Based on options, this must provide a detailed listing of the expansions and modifications that occurred during this preprocessor run.  Implement this report output.
  4. Design the format for the Code Conversion Hints file.  Note that from a preprocessor perspective, the detailed listing of the expansions must be written into the Code Conversion Hints format.  It is most important that the information regarding include file processing is complete.  The later stages of code conversion would have no knowledge of where includes are used without this output.  The objective is to maximize the ability of the later conversion stages to write a common class (and then call into that code multiple times) for as much included code as possible.
  5. Implement helper class(es) to process the hints files on input and output.
  6. Implement the file tree processing harness which takes its input from the hints file and calls the preprocessor to process the entire application source tree.

Dead Code Analysis (Stage 2)

The objectives of this stage are twofold:
  1. To make a complete list of all possible "root" procedure entry points into the 4GL application.  This is a simple list of all possible external procedures (files) that can ever be run as a starting point for an end user.
  2. To make a list of all 4GL application files that cannot ever be executed due to a lack of a call path that allows them to be reached.  This is the dead code list.  These files are not to be converted.
The effort starts with a manual analysis of the possible methods of launching Progress (in particular the name referenced in the -p option on the command line).  A list is made of all external procedures which can be launched into at startup.  Then the call tree is manually analyzed to trace the top level flow.  This step is only necessary if there is a layer of abstraction between the top level entry points and the real application entry points.  The information is encoded in a form that can be easily read, parsed and used by the dead code analyzer.

The dead code analyzer is a module that uses the list of root entry points as a starting point.  It reads each root entry point and generates the resulting UAST.  The call tree nodes in this UAST comprise a list of all possible external procedures accessible from this root entry point.  This process is repeated for all root entry points and the union of all possible accessible external procedures is recorded.  All other files that are not part of this union are considered dead code since they can never be executed.

The dead code list must be created with awareness of which include files are actually in use.  Only those include files that can never be reached should appear in the dead code list.  This input is a combination of the knowledge output by the preprocessor into the Code Conversion Hints and the first pass at the Dead Code List where all the dead external procedures are known.  This second pass can complete the list using these first two inputs (any include files only accessible from dead external procedures are also considered dead).
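
The reachability computation described above could be sketched as follows (a hypothetical class; the abstract helper stands in for UAST generation, and the include file handling of the second pass is assumed to happen inside it):

   import java.util.*;

   public abstract class DeadCodeAnalyzer
   {
      public Set findDeadFiles(List rootEntryPoints, Set allSourceFiles)
      {
         Set live = new HashSet();
         Iterator roots = rootEntryPoints.iterator();
         while (roots.hasNext())
         {
            String root = (String) roots.next();

            // every external procedure named by a call tree node in this
            // root's UAST is reachable and therefore live
            live.addAll(reachableFiles(root));
         }

         Set dead = new HashSet(allSourceFiles);
         dead.removeAll(live);      // never reachable from any root -> dead code
         return dead;
      }

      // Assumed helper: generate (or load) the UAST for the given root entry
      // point and return the set of source files named by its call tree nodes.
      protected abstract Set reachableFiles(String rootEntryPoint);
   }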

Detailed Design/Implementation Process (starting this is dependent upon the completion of development of the Progress 4GL lexer/parser and UAST):
  1. Design the root entry point list format.
  2. Design the dead code list format.
  3. Develop the dead code analyzer.
  4. Run this against the complete application source tree.
  5. Review the results with the client and obtain confirmation that the dead code list is accurate.

Annotation (Stage 3)

Once the Source UAST is available, the job of conversion cannot be started without additional analysis.  The majority of this analysis is a matter of pattern recognition.  As certain patterns are found in the Source UAST, it is useful to remember that these patterns exist, where they exist and other related or derived information about these patterns.  This information will be stored in the Source UAST itself in the form of "annotations" to the tree.  So the primary input to Stage 3 is the Source UAST and an Annotated Source UAST is the primary output.

This stage leverages a generic framework and infrastructure for implementing a data-driven pattern recognition and annotation process.  The infrastructure will be built once and each situation that calls for an annotation will be described as a rule.  This rule is made up of a simple or complex pattern to be matched and one or more annotations that should be made in the case of a match.  The patterns themselves are written as expressions using a modified form of the Java Trace Analyzer (JTA) expression language.

Other options for handling this include:
  1. Hard coding the pattern recognition and annotation logic into a large amount of custom Java source code that runs against the UAST.
  2. Hard coding complex and varied logic into the lexer and parser to try to annotate as each AST is built.
The second option is not technically viable.  One will need to annotate based on analysis of nodes that exist across multiple ASTs or between multiple levels of the UAST.  Since the lexer and parser only ever operate on a single Progress 4GL AST, this would be impossible.  In addition, it is troublesome and limiting to only be able to annotate while lexing or parsing.  In other words, one cannot look ahead enough to make a wide enough range of pattern recognition possible.  For these reasons, there is a strong separation between the Progress 4GL lexing/parsing and the subsequent annotation process.

While the first option is very possible, taking the generic, data driven approach has the advantage of allowing the same, well debugged, working Java code to be used thousands of times by just specifying rules rather than hand coding Java source for each one.  This will save a great deal of coding, debugging and testing time.

Most importantly, the same core technology used here in Stage 3 will also be used in Stages 4/5 to create the target tree.  It is expected that 80% of the conversion can be done using this generic set of pattern recognition technologies and the subsequent rule based actions.  Knowing this, the obvious question is: why split the conversion up into 3 phases when the same technology is used for all 3?  The reason is simple: by using multiple, sequential passes (a pipelined approach), certain problems can be solved in one pass, which means that subsequent passes can ignore these issues.  For example, the lexer deals with the problems of converting characters to tokens and the parser only deals with tokens and relationships between tokens.  This simplifies the implementation of both the lexer and the parser and allows each to do one thing well.  For the same reason, there are things that must be done to the entire tree before other processes can easily be started.  These tasks have been split up (loosely) into the Stage 3, 4 and 5 processing.

How the technology works:

A list of rules encodes the core data necessary for the annotation process.  This list of rules is called a ruleset.  There can be multiple rulesets, each used for specific purposes.

Each rule includes an expression (either simple or compound) which matches some subset of all UAST nodes.  If the expression only refers to the contents of a single node (Progress 4GL language statement) and its immediate children then it is considered a simple expression.  If the expression refers to data from multiple nodes (multiple language statements) in order to determine a match, this is a compound expression.  In other words, a compound rule is one that references stateful information stored regarding previous nodes (parents and/or siblings and their subtrees) in the tree.  This state information may be stored in a scoped dictionary and can be referenced to match context sensitive patterns. 

A rule also includes an action that should be executed on any match of the associated expression.  The list of valid actions is the list of annotations that are possible.  Many annotations will edit the UAST itself, while other annotations may maintain stateful information for other rules or to provide dictionaries for conversion processing.  An Interface will be defined to allow annotation actions to be "plugged in" without having to modify core processing.  Another action example is creating a custom logfile or statistics (e.g. a scanning report).
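
A minimal sketch of such a plug-in Interface is shown below (the signature is an assumption; UastNode is the same stand-in used in the harness sketch earlier):

   public interface AnnotationAction
   {
      // Called for each node matched by the rule's expression.  The action
      // either edits the tree (an annotation) or records state, statistics
      // or log output through the engine supplied context.
      void apply(UastNode matchedNode, RuleContext context);
   }

   // Placeholder for the state that the Pattern Recognition Engine would
   // expose to actions (current scope, dictionaries, logs).
   interface RuleContext
   {
      Object lookup(String key);                 // read from the current scope
      void   record(String key, Object value);   // store state for later rules
   }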

The standard Tree Processing Harness will be used to iteratively process the Source UAST.  Every rule will be tested against every node in the tree (see below for the order in which this occurs).

The Pattern Recognition Engine is the primary component of this stage.  It reads the list of rules and is responsible for utilizing the tree processing harness to walk the tree, testing each rule against each node.  The pattern recognition engine is responsible for providing user functions and variable resolution to the expression engine.  The first time an expression is used, the generic expression engine parses and compiles the expression into a Java class which is stored in the expression cache.  Every subsequent time that expression is used, it is directly run from the cache.  Variable values and the results of user functions are derived at runtime.  These callbacks are made by the compiled Java class and the pattern recognition engine services these needs.

As the harness walks the tree, at each node the entire list of rules will be processed (as opposed to walking the tree once for each filter).  This is an important enabler of compound rules because this allows a real-time scoping methodology to be used.  As nodes in the tree are encountered that add or remove scopes to a data structure or dictionary, annotation actions will always be able to simply trigger their actions on the correct scope (usually the current scope, but sometimes a global scope).  Then subsequent rules will be able to directly lookup or reference such data that is naturally scoped to their current node.  The alternative would be to make each added record have a scoped definition and to make the lookup itself be aware of only returning the value that matches the correct scope.  This is possible but is much more work.
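
The scoped lookup described above can be illustrated with the following sketch (a hypothetical class; the real dictionaries track richer records than simple key/value pairs):

   import java.util.*;

   public class ScopedDictionary
   {
      private final LinkedList scopes = new LinkedList();   // innermost scope first

      public ScopedDictionary() { pushScope(); }            // the global scope

      public void pushScope()   { scopes.addFirst(new HashMap()); }
      public void popScope()    { scopes.removeFirst(); }

      // Records a value in the current (innermost) scope.
      public void define(String key, Object value)
      {
         ((Map) scopes.getFirst()).put(key, value);
      }

      // Searches from the innermost scope outward, so rules always see the
      // value that is naturally scoped to the current node.
      public Object lookup(String key)
      {
         Iterator it = scopes.iterator();
         while (it.hasNext())
         {
            Map scope = (Map) it.next();
            if (scope.containsKey(key))
            {
               return scope.get(key);
            }
         }
         return null;     // not defined in any enclosing scope
      }
   }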

When a match is found, the associated action is triggered which makes an annotation to the tree or to other related data structures, dictionaries, trace or log files.  Once all rules have been processed against the current node, the current node is changed to the next node and the rule list is evaluated again.  This process continues until all nodes have been processed.  At that point, the Source UAST is fully annotated and Stage 3 is complete.  The Source UAST will most likely be stored persistently in the filesystem to ensure that the results are saved for future stages.

A powerful feature of actions is their ability to calculate or derive the data which they will insert as an annotation ("smart actions").  For example, one would imagine that encoding the conversion category or conversion level would simply set a known attribute of a node to a constant value.  However, one could also envision translating a Progress name into a Java name and saving this as an annotation.   This latter example is a condition where the annotation applied by the same rule is different for each node because it is dynamically generated.

Detailed Design/Implementation Process (starting this is dependent upon the completion of development of the Progress 4GL lexer/parser and UAST):
  1. Make a list of the known types of annotations that need to be made.  For each type, design a method to store the annotation in the Source UAST, Target UAST or in a target data structure if the annotation is not a tree edit.  At a minimum, the following must be provided for:
  2. Make a list of the user functions (usable inside the expressions) needed to process the Source UAST.  For each one, detail the signature and function to be provided.
  3. Design the approach for variable resolution.  This must include support for both simple (referencing data within the current node of the tree) and compound (referencing data across multiple nodes of the tree) rules.  Some ideas about handling compound rules:
  4. Design the Interface to plug in actions.
  5. Design the format for encoding each rule (the simple or compound expression and the associated action).
  6. Implement the Pattern Recognition Engine, variable resolution and user-defined functions.
  7. Implement the known list of actions.
  8. Re-implement (in Java) the REXX based scanning and analysis code using rulesets and the Pattern Recognition Engine.

Structural Analysis (Stage 4)

In this stage the 4GL application source will be analyzed and a structural design of the target Java application will be constructed.  The output of this stage can be considered the "skeleton" of the Target UAST which will be completed by the Stage 5 Code Conversion.  The structure and interfaces of the Target UAST must be generated completely in Stage 4, before any code is actually converted.  This is necessary because of the refactoring that is occurring between the source and target application.  Since the code's natural structure is not staying intact, it is necessary to generate the complete target structure so that when code is converted, the code locations and the interfaces (used as linkage between related objects) are known.  Otherwise, there would be a problem in generating any code that might have to link to code that is not yet generated.  By precalculating the interfaces and structure, each piece of code can be converted separately in Stage 5.

The tree processing harness module and the Pattern Recognition Engine from Stage 3 will simplify the processing of each (Annotated) Source UAST as a related unit.  Each such tree can be considered a standalone application since each tree encompasses all source code that can be possibly reached from a specific root entry point.  The full set of these trees can have overlapping contents to a greater or lesser degree, as many files (and thus the resulting subtrees rooted at those locations) will be reachable from multiple trees.  To the extent that these overlapping branches are already processed, the tree processing harness allows bypassing reprocessing such branches.

Refactoring is done by defining a set of conversion categories that in total, represent the complete set of function of the target application.  Each category is only responsible for a specific subset of the application function.  In Stage 4, the function that is category-specific is called the "Category Strategy Choice(s)".  Each Category Strategy Choice module will use the Stage 3 Pattern Recognition Engine and a category specific ruleset to walk the tree looking for structural patterns that correspond with a resulting target structure.  Much (if not all) of the recognition logic needed to categorize the Source UAST is handled by Stage 3 (Annotation).  By starting with the annotated UAST, the job of analyzing the category-specific nodes of the tree is significantly easier.  The Stage 4 ruleset is designed to analyze the category-specific structure and to use actions to generate the structural nodes or skeleton of the Target UAST.  It is expected that 80% or more of the conversion cases can be handled in this manner.

Once the pattern recognition engine has run for a specific category, the Category Strategy Choice module must process all remaining unhandled nodes of that category in the Source UAST.  This is where category specific custom coding may occur to handle cases that cannot be properly (or easily) represented using the Pattern Recognition Engine.  At a minimum, it is important that the conversion ruleset will at least flag those areas that require more attention.

In cases where there is more than one strategic choice for how to structure the target, the Category Strategy Choice module must decide on the correct structure.  It is possible that multiple strategies are chosen for different parts of the source application, however the strategic choice can be as simple as is appropriate for a given situation.  Note that the Code Conversion Hints file is an input to this process.  This file can store input from prior stages as well as manually created input.  In either case, this input may have the effect of overriding a default structural choice in favor of another.

The Category Strategy Choice must generate nodes in the Target UAST for the target Classes, Interfaces and the corresponding inheritance model.  In addition, the Source UAST will be cross-referenced to the Target UAST locations (to allow Stage 5 to know exactly where to place its results) and vice versa.  For example, a particular user interface "Frame" may be analyzed by the UI Category Strategy Choice module and a class may be created in the Target UAST.  Then both the source and target will be annotated with a reference to the associated node(s) in the other UAST.

The Code Conversion Hints that are generated by the Preprocessor will be used to know where inlining has occurred (and which source files the inlining occurred from).  This will allow the code conversion to generate Java classes once that are reused in many places where inlining used to occur.  We will need to programmatically determine when it is appropriate to create a class versus allow the inlining to occur:
  1. If there are no { } substitutions and the code doesn't directly reference specific local variable names, the odds increase that a separate class is possible.  Even in these cases, the code may be rewritten to handle these issues.
  2. If the inline file contains function/procedure definitions as the primary or only code, then it is likely these can be moved into a central class.
The next step in Stage 4 is to perform a Linkage Analysis.  This is the identification of each place where two pieces of code interface.  For each of these locations, a decision must be made about the linkage strategy used in the target.  As a result, each Class and Interface in the Target UAST  must have its methods (including signatures) generated.  This also includes defining accessor methods (getters and setters) for data members.  This process may be overridden or modified by code conversion hints.  Both the Source UAST and Target UAST will be annotated with a cross-reference to the associated node(s) in the other UAST.

Once the object or class hierarchy is generated and all linkage points have been resolved, the Naming Generation/Conversion must run to convert or translate Progress 4GL names into valid Java names.  Certain changes will be required since some Progress 4GL symbol characters are not valid Java symbol characters.  For example, all hyphens must be converted to another character (the hyphens cannot necessarily be removed outright because doing so may generate a naming conflict).  In particular, it is a challenge to generate "good" Java names based on the current (good or bad) Progress 4GL names.  Heuristics are written for doing the best job possible and for identifying those situations in which the result is just unacceptable (flagging these for human review).  Any human generated overrides or modifications would be stored in the hints input to the code conversion processing.
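
As an illustration only (one possible heuristic, not the actual ruleset), a hyphenated 4GL name could be converted into a camel-case Java name as follows; collision detection and flagging for human review would be handled separately:

   public class NameConverter
   {
      // e.g. "cust-num" becomes "custNum"
      public static String toJavaName(String progressName)
      {
         StringBuffer out = new StringBuffer();
         boolean upperNext = false;
         for (int i = 0; i < progressName.length(); i++)
         {
            char c = progressName.charAt(i);
            if (c == '-')
            {
               upperNext = true;      // drop the hyphen, capitalize what follows
            }
            else
            {
               out.append(upperNext ? Character.toUpperCase(c) : c);
               upperNext = false;
            }
         }
         return out.toString();
      }
   }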

When complete, Stage 4 leaves behind the complete structure of the target application.  Please note that due to refactoring, the Target UAST will not have a single root, but instead it will have many smaller trees that are each likely to be category specific.

Outputs include:
  1. Interactive hints mode where the conversion process halts at ambiguous decisions or decisions that need human review.  At these points, the operator would be prompted with the choices and any non-default choices would be stored in the Code Conversion Hints file.
  2. Cross-reference annotations written back into the Source UAST.
  3. Creation of a skeleton structure in the Target UAST.
Detailed Design/Implementation Process (start is dependent upon completion of Stage 3 implementation AND upon the Java UAST implementation):
  1. Design and implement a mechanism to store cross-references in both Source and Target UASTs.
  2. For each form of possible programmatic linkage (this includes all call or program invocation mechanisms as well as direct variable references using shared variables), define the rules by which the Java equivalent is made.  Document any ambiguous cases where manual review is required.
  3. For each Java language construct that must be represented (structural, naming, linkage), design and implement a Target UAST node to represent this construct in the tree.
  4. Design and implement a mechanism to track the conversion status of each node in the Source and Target UASTs.  This will allow a standard tree walk to be implemented that only walks those nodes that still must be processed.  This is a concept of saving state to allow a form of incremental or multi-pass tree processing.
  5. Strategy Definition
  6. Define and implement the Pattern Recognition Engine ruleset for name generation/conversion.  These are by nature category specific as Java has well defined rules for naming that  differ by type of symbol (e.g. method name versus variable name).  Document any ambiguous cases where manual review is required.

Code Conversion (Stage 5)

The goal of Code Conversion is to finish the Target UAST skeleton created in Stage 4.  Once Stage 5 is complete, the Target UAST is done and ready to be output to Java source files.

All nodes in the Source UAST can be classified in one of five possible levels:

Conversion Levels
  1. L0 - Elimination
  2. L1 - Direct Mapping (to Java language, J2SE and/or GCD runtime)
  3. L2 - Automated Rewriting
  4. L3 - Manual Rewriting
  5. L4 - Unknown (review required)
L0 code does not have to be converted because it serves no functional purpose in the target application.  Examples of reasons for L0 code include Progress-specific batch processing and code that is not reachable because it is platform specific (for a platform that is not run by the client).

L1 code has a direct mapping from Progress 4GL into Java.  This means that not only is the function equivalent between the two environments, the source and target logic is identical.  An example of L1 code is an arithmetic expression where the operators and formatting may be different, but there is a 1 to 1 correspondence in the logic between the two environments.

L2 code can be programmatically converted such that the result is functionally equivalent, but the resulting Java is logically different from the original source.  This conversion is called a "rewrite" of the code and an L2 is automatically rewritten.

L3 code can be rewritten by hand to provide equivalent function, but (at this time) it cannot be programmatically rewritten.  This is the same as L2 but with a manual conversion.

L4 code is unknown.  This means that some manual review is required.

Each node in the Source UAST has an assigned level.  This assignment occurs using the Stage 3 Annotation process.  The Stage 5 processing only handles L1 or L2 code.  All other code is ignored.

As noted in prior stages, all code will be handled by separating the application function into a finite number of categories.  The following is the current list (it may be incomplete or incorrect):

Conversion Categories
  1. comments
  2. constants
  3. string resources
  4. flow of control
  5. expressions
  6. assignments
  7. other base language statements
  8. functions
  9. methods/attributes
  10. variables including array processing
  11. database access
  12. I/O (files, pipes, devices and printers)
  13. user input processing (keys)
  14. accelerator keys (hotkeys)
  15. event processing (UI, database, procedure)
  16. screens and dialogs
  17. UI controls or widgets
  18. menus, navigation and scrolling
  19. security rules
  20. input validation rules (used in transactions, procedures)
  21. transaction interfaces (client code, service interface)
  22. operating system commands and external programs/scripts
  23. dynamic source file creation and execution (it is possible to write Progress 4GL source code to a file and then execute that code at runtime)
Conversion Categories organize the activities in both Stage 4 and Stage 5.  This is the essence of how refactoring is handled by the P2J environment.  Each category handles its portion of the conversion:
  1. Without mixing in code from other categories.
  2. In a manner that is optimal for how that category of function is handled in Java.
The result is a set of Classes, Interfaces, methods, resources and other output that is a set of well designed Java objects that represent the minimum code necessary to implement the specific category of code.

In Stage 5, there is a Code Conversion module for each category.  Each such Conversion module processes the Source UAST using the Tree Processing Harness and the Pattern Recognition Engine from Stage 3.  For each node which is marked for the category being converted and which has conversion level L1 or L2, the Code Conversion module will run one or more rulesets through the Pattern Recognition Engine.  The ruleset actions will create/edit the proper Target UAST nodes to ensure that the resulting application has equivalent function.  If necessary, processing can be implemented as multiple, sequentially run rulesets (each ruleset is processed across the entire tree before the next ruleset runs).  After the Pattern Recognition Engine is done processing all category specific rulesets, the Conversion module will process all category specific nodes for which conversion could not be properly or easily specified in the rulesets.  This is a kind of post-processing that allows any custom conversion strategy to be implemented.  The idea is to handle 80% or more of the conversion using the Pattern Recognition Engine and then to handle the rest with custom code if needed.

Once all category specific nodes in the Source UAST are processed, that category's Code Conversion is complete.  Note that the Code Conversion Hints provide a mechanism for feedback from other stages and from manual review.  These hints can modify or override the default conversion behavior that would otherwise occur.  In some cases, nodes in the Source UAST may be ignored using hints.  In other cases, a manually created replacement for some specific nodes may be specified.  In still other cases, a non-default conversion choice may be forced via hints.  This is especially important in any conversion situation in which the choice is ambiguous.

Note that all hints are customer application specific.  These are ways to override the default behavior or processing.  Some hints are global (ignore the following list of language keywords: VMS, OS2...) and some hints are file-specific (tied to a specific piece of Progress 4GL source code).

An important point must be made regarding the lack of a language neutral intermediate representation of the application.  One can consider that in between the Source and Target UAST, there could be an artificial language neutral representation (an "Intermediate UAST").  This would allow one to more easily abstract (and thus substitute) front ends (4GLs) and back ends (target languages other than Java) that provide support for different language conversions.  Instead, we have chosen a direct conversion between Source and Target UASTs, which means that the P2J code conversion tools are very specific to a Progress 4GL front end and a Java back-end.   The Source UAST is Progress 4GL specific and the Target UAST is Java specific.  No language neutral Intermediate UAST will be used.  This limits the tools (as implemented at this time) to Progress to Java conversions.  It also allows the process to be simplified and for specific conversion problems to be solved with an optimal solution that is much easier to achieve than it would be with a decoupled process.

Outputs include:
  1. Reports mode where the nodes are flagged that have ambiguous decisions or decisions that need human review.  A manual review would be done off this report and any non-default choices would be stored in the Code Conversion Hints file.
  2. A Target UAST that is ready for output into Java source code and/or into a bidirectional cross-reference of the conversion.
Detailed Design/Implementation Process (start is dependent upon completion of Stage 3 implementation AND upon the Java UAST implementation):
  1. Determine if categories are mutually exclusive or if they can be overlapping.  For example, there can be user interface related options or clauses to variable declarations (e.g. format strings for how a variable should be displayed).  While the greater variable declaration has nothing to do with the UI, some portion of it may be UI related.
  2. For each Java language construct that must be represented, design and implement a Target UAST node to represent this construct in the tree.
  3. Code Conversion Plan

Output Generation (Stage 6)

This stage is designed to generate output based on the Source and Target UASTs.  It does not alter either UAST in any way, nor does it make any decisions.  It simply uses the Tree Processing Harness to walk the UASTs and for each node, it will generate output.

It is important to note that no conversion logic is embedded in the output engines.  The Java Language Formatting Engine (JLFE) processes the Target UAST and for each node it generates the syntactically correct Java source.  If the node represents a Java language "for" loop, the JLFE will output the text that includes the for ( ; ; ) header and the enclosing { and }.  Between the braces it would output any enclosed nodes in the tree.  The JLFE only knows how to output Java source for each Java source construct represented in the tree.  It does not make any decisions, it only outputs what it finds.  Using JLFE options, the formatting of the text output can be controlled in certain ways.  For example, the number of spaces in every indent can be specified as an option though it defaults to 3 spaces.
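
Using the hypothetical ForLoopNode sketch from the Java UAST section, the kind of text the JLFE emits for that node type might look like the following.  Note that the real JLFE is template driven; this hand written emitter only illustrates the output for a single node type, not the implementation approach:

   public class ForLoopEmitter
   {
      public static String emit(ForLoopNode node, int indentSpaces, int depth)
      {
         String indent = pad(indentSpaces * depth);
         StringBuffer out = new StringBuffer();
         out.append(indent)
            .append("for (")
            .append(node.getInitializer()).append("; ")
            .append(node.getCondition()).append("; ")
            .append(node.getUpdate())
            .append(")\n")
            .append(indent).append("{\n");
         // each child node in node.getBody() would be emitted here, indented
         // one level deeper, by its own node-specific emitter/template
         out.append(indent).append("}\n");
         return out.toString();
      }

      private static String pad(int n)
      {
         StringBuffer sb = new StringBuffer();
         for (int i = 0; i < n; i++)
         {
            sb.append(' ');
         }
         return sb.toString();
      }
   }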

The Report Engine reads both the Source and Target UASTs and it generates a bidirectional cross-reference between the Progress 4GL and Java source code.  All of the cross-reference information is already stored in the UASTs so this engine only formats this representation into HTML and writes this output into multiple files.  This output is used to verify the correctness of the conversion.  More importantly, it is a continuing resource for the application developers that are to maintain the converted application.

Both engines are based on a core Template Processing Engine that takes a UAST node and a corresponding template and emits some text as a result.  Once the base templating code is working, the development of each node-specific output is much faster because the knowledge of how to handle that node is encoded in a small piece of data that is distilled down to the minimum necessary information.  Rather than spending time coding, debugging and testing custom Java code for every node, one just needs to debug the template itself.  This is a much smaller effort than writing custom Java code for each node.  The templates provide the ability to handle literal text, substitution of node attributes and the specification of how the standard formatting rules apply to this template.  Any such standard formatting rules would be highly standardized across all output and might be modified by customer specific formatting options.  For example, some customers may wish to insert a new line before opening any curly brace while others prefer the open curly brace to be on the same line as the language construct it is "connected" with.  There is a syntax for specifying these different parts of the template.

There may even be more than one template for each node, with one being a default and the others being alternate formatting versions that are used in certain circumstances.  For example, there may be 2 different formats of "for" loop that are used based on whether or not the total line length is smaller than some value (e.g. 78 chars).

The output from multiple nodes may go to the same file.   It is likely that some nodes may contribute output from multiple templates and each template's output may be destined for different files/streams.  For example, a node representing a Java Class would (usually) generate source code into a file of the same name as the class.  In addition, it may generate some content into an ANT build.xml file to ensure that the generated build environment has the proper knowledge of how to build the resulting class.  There is a mechanism for the template to be matched with the correct output stream.

Outputs:
  1. Syntactically correct Java source code and any resource bundles or other files that may be needed.  An ANT build.xml file is one example.  This represents the complete converted application source code.
  2. A bidirectional, cross-reference of all source and target constructs.  This documentation provides an HTML view of the source and target trees with both indices and details.  Some of the contents:
In the future, we want to automate the comparison of the source and target trees for logical equivalence.   This would be possible with L0 and L1 code.  This might be a precursor to Stage 6 or just run at the beginning of Stage 6.  At this time, this is out of scope.

Detailed Design/Implementation Process (start is dependent upon completion of Stage 3 implementation AND upon the Java UAST implementation):
  1. For each Target UAST node type, specify the syntactically correct Java source code that should be generated and the placement of substitutions based on attributes in the node.  In the "for" example above, the loop criteria being tested would be an expression stored in the node.  This would need to be emitted in the correct part of the "for ( ; <criteria> ; )" text.
  2. Make a list of the standard source code formatting options and the variations that can be specified.
  3. Design the HTML formats for the report output (see #2 above).
  4. Design a template syntax that provides sufficient function to handle both types of output.
  5. Develop the Template Processing Engine, the Report Engine and the JLFE.

Other Conversion Tasks


Implementation Process (start is dependent upon completion of the runtime Menuing and Navigation implementation):
  1. Document application-specific requirements for the account and menuing database.
  2. Identify the specific Progress account/menuing database tables and fields that need to be converted and any changes that need to be made to the data.
  3. Develop the schema hints to specify how to redefine the P2R Mapping (in Database Schema Conversion) and to subsequently capture this data in Data Conversion.  It is possible that additional types of conversion mappings will be necessary to implement this function.  If so, add these new mappings.

Miscellaneous Issues

A modest up-front effort to design for a wide range of implementation scenarios is a significant value when compared to the alternative of deferring that effort.  Adding the ability to control or modify processing later typically requires the modification or replacement of working components in the system.  This is time consuming, error prone and can generally be made unnecessary with the proper design at the beginning.  Another way of stating this is that the least costly way (measured as total cost of ownership over the lifetime of a system) to solve problems is to design the system with those problems in mind.  Adding solutions later, to an already implemented system that was poorly designed to handle them, is the most costly approach.  Simply put, getting the design right is critical.  If a system wasn't designed to handle a particular problem, making it do so is going to be much harder than it ever should be.

Many of the following issues relate to the proper design of P2J to handle a wider range of implementation requirements.  While none of these problems may be completely resolved in the first version, the design of P2J will be such that resolving these in future versions will be a reasonable effort.

Load Balancing/Failover/Redundancy

Load balancing is the concept that multiple servers share a combined workload but appear as a single server.  This is typically done to allow multiple physical hardware platforms to provide a service that appears to be from a single virtual server.  Systems implementing this facility obtain scalability and capacity by using multiple physical hardware systems instead of a single relatively larger system.

Failover is the concept where a failed server's workload is transparently shifted to and handled by another server.  Systems designed for failover provide a significantly higher level of availability than is otherwise possible.

Redundancy is the concept that multiple equivalent paths through a system can be implemented to reduce possible points of failure.  More specifically, these multiple paths must be implemented using duplicate components that can failover transparently.  In every place where this is implemented it eliminates a single point of failure and increases the reliability and availability of the system.

These three problems are closely related, and typically a single solution addresses all three.

Naming and routing can be used to enable multiple identical, physically separated back end services to provide what appears to be a single unified service.  This is very important: the client must be able to locate the service, establish communications and execute transactions with it without any knowledge that it is being serviced by multiple processes spread across multiple systems on the network.

The physical design of these services must provide for network, hardware and software redundancy.  The best scenario is to ensure that the network is meshed such that a single failing component cannot make the service unavailable.  Then each cooperating service must be located on a separate physical platform, on separate network devices.

The directory service combined with the router provides the load balancing and failover from the client perspective.  It must be aware of the status of each cooperating service and it distributes workload accordingly.  Most importantly, when a service becomes unavailable, the router must ensure that the work is redirected to other services AND that any transactions in process are restarted as necessary to continue processing.  At worst the client should only see a temporary failure and a retry will succeed.  Optimally, even this temporary failure would not occur; however, this is highly dependent upon the state of the database/transactions and the session state of the client.  While the session state will be stored at or near the router, the database and transaction state is typically very difficult to share.

The P2J environment is quite reliant upon the directory server (an LDAP implementation) and upon the relational database that the customer chooses to implement.  Both of these facilities must be implemented as a redundant solution such that these do not become a single point of failure.

Finally, the transaction server/router itself must be made redundant with automatic failover.  This will likely involve an external (customer-supplied) solution that implements a single IP address from the client view but multiple back end destinations in reality.  Many such solutions exist (some hardware based, some software based).  These solutions must handle the proper matching of the session traffic with the right destination.  To make this more seamless, the transaction server/router will be written to carefully maintain an identical shared copy (accessible to each instance of the transaction server/router) of each session's state.  By ensuring consistency of this shared state at all times, seamless failover can occur because individual transactions can be handled by a different transaction server/router each time without any negative impact.
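
As a rough illustration of the shared session state idea, the sketch below persists each session's state to a store that every transaction server/router instance can reach, so any instance can service the next transaction.  The class names and fields (SessionState, SharedSessionStore, locale) are hypothetical and the in-memory map stands in for whatever shared/replicated store is actually deployed:

import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-session state; serializable so it can be shipped
// to a shared store (database, replicated cache, etc.).
class SessionState implements Serializable
{
   String sessionId;
   String locale;
   Map    attributes = new HashMap();
}

// Hypothetical shared store; in a real deployment this would be backed
// by a facility reachable from every transaction server/router instance.
public class SharedSessionStore
{
   private final Map store = Collections.synchronizedMap(new HashMap());

   // Persist the session state after every transaction so that a
   // different instance can transparently handle the next one.
   public void save(SessionState state)
   {
      store.put(state.sessionId, state);
   }

   public SessionState load(String sessionId)
   {
      return (SessionState) store.get(sessionId);
   }
}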

There is no plan to implement these features for the first version.  It is important to note that the system will be designed such that adding these features in the future will not require a complete rewrite of the system.

User Hooks and Implementation Control

It is much less effort to modify a system's behavior via a configuration file than it is to modify or replace the components of the system itself.  Thus in the areas of P2J where it is likely that implementor control will have great value, it is important to plan for such control in the design.

The following approaches will be used to provide such control (a sketch of the user hook approach follows the list):
  1. User Hooks
  2. Configuration Values
  3. Data Driven Rules
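
A minimal sketch of how a user hook might be wired in via configuration appears below.  The interface name (ConversionHook), its method signature and the configuration key (conversion.hook) are hypothetical; the point is only that an implementor-supplied class is loaded by name from configuration rather than by modifying P2J itself:

import java.util.Properties;

// Hypothetical hook interface that an implementor can supply.
interface ConversionHook
{
   // Called with a candidate name; returns the (possibly modified) name.
   String rename(String original);
}

public class HookLoader
{
   // Instantiate the hook class named in the configuration, if any.
   public static ConversionHook load(Properties config)
      throws Exception
   {
      String cls = config.getProperty("conversion.hook");
      if (cls == null)
      {
         return null;
      }
      return (ConversionHook) Class.forName(cls).newInstance();
   }
}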

Internationalization

All runtime modules will be enabled for internationalization.  This includes the user interface components, the online help system, and string and other data processing and formatting.  This enablement means that if multiple resource bundles exist, they will be properly recognized and utilized by the runtime components.  One feature that will not be present in the first version is right-to-left (RTL) language support (e.g. Arabic or Hebrew).  The Java language itself handles the normal problems related to character sets by using Unicode; P2J picks up this benefit "for free".

To the extent practical, string resources (literals) of the converted application will be collected into resource bundles in a format that facilitates internationalization.  In particular, it is important to enable the simple maintenance, editing and runtime substitution of alternate resource bundles based on the target locale.

Locale specific data formats (e.g. dates, monetary amounts...) will likewise be easily maintained and selected at runtime using configuration (a stored file, database or the directory) rather than hard coding these values into the converted application code.
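
As an illustration, the standard J2SE facilities below select both message strings and data formats based on a locale supplied at runtime rather than hard coded values.  The bundle name ("Messages"), the "greeting" key and the hard coded German locale are hypothetical; in P2J the locale would come from the client's session state:

import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;
import java.util.ResourceBundle;

public class LocaleExample
{
   public static void main(String[] args)
   {
      // Hard coded here only for illustration.
      Locale sessionLocale = Locale.GERMANY;

      // Message strings come from a locale specific resource bundle
      // (e.g. Messages_de_DE.properties), not from the converted code.
      ResourceBundle bundle =
         ResourceBundle.getBundle("Messages", sessionLocale);
      System.out.println(bundle.getString("greeting"));

      // Dates and monetary amounts are formatted per the same locale.
      DateFormat df =
         DateFormat.getDateInstance(DateFormat.LONG, sessionLocale);
      NumberFormat nf = NumberFormat.getCurrencyInstance(sessionLocale);

      System.out.println(df.format(new Date()));
      System.out.println(nf.format(1234.56));
   }
}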

Please note that this support is designed to separate out the resources into sets that can be translated or replaced and then selected at runtime.  This represents a significant amount of the effort in internationalizing an application, but by no means does it cover the full effort.  In particular, it is extremely likely that any given Progress 4GL source file has dependencies on a hard coded locale.  An example would be language statements that concatenate English strings in a given order and with hard coded padding or character widths (possibly for columns in a report).  Internationalizing the strings alone does not resolve the basic problem since the source code itself has assumptions regarding the locale.  P2J will not address this issue at all in the first version.  If such hard coding exists in the application, it will exist after conversion as well.
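
The hypothetical snippet below illustrates the kind of locale assumption that string externalization alone cannot fix: even if the literals were moved into a resource bundle, the word order and fixed column widths remain hard coded for English.  All names and values are invented for illustration:

public class HardCodedLocale
{
   public static void main(String[] args)
   {
      // Hypothetical converted report line.  Even if the two literals
      // below were externalized, the word order
      // ("<count> <noun> shipped on <date>") and the fixed-width
      // padding are still hard coded English locale assumptions.
      String count = "5";
      String noun  = "orders";       // candidate for a resource bundle
      String verb  = "shipped on";   // candidate for a resource bundle

      String line = pad(count, 6) + pad(noun, 12) + verb + " 01/21/2004";
      System.out.println(line);
   }

   // Pads to a column width chosen for the English text.
   private static String pad(String s, int width)
   {
      StringBuffer buf = new StringBuffer(s);
      while (buf.length() < width)
      {
         buf.append(' ');
      }
      return buf.toString();
   }
}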

At this time there is no plan to provide multiple translations of the P2J runtime resources (for standard dialogs, error messages or the administrative interface) and documentation.  The runtime will support multiple locales, but only the US English locale will be provided in the first versions.  Enhancements to this will be possible in the future based on customer requirements.

Since the vast majority of the data processing (including string handling and data formatting) is done on the server side of the P2J process, the client's session must define the locale and that locale must be honored at the server.  For example, a user in Japan whose terminal is connected to a CHARVA client running on the P2J server must be running a client with the expected locale.  This locale is picked up by the server side, stored in the session state and used for all subsequent processing on the server.  Users running clients with different locales must each find that the same server honors their specific settings regardless of the locale in which the server itself is running, and they must never see any indication that other locales (besides their own) are in use.

Regression Testing

Unit Testing

Unit testing is designed to determine the conformance of a specific module or component to its corresponding specification.  There are two approaches to unit testing, which differ based on the type of processing being tested (a sketch of the second follows the list):
  1. Batch Processing
  2. API Conformance
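
The following is a minimal sketch of the API conformance style of unit test, written against the JUnit 3.x conventions commonly used with J2SE 1.4.  The use of JUnit is an assumption, and the class under test (TextUtil.padRight) is hypothetical, standing in for any runtime library routine with a documented specification:

import junit.framework.TestCase;

// Hypothetical runtime library routine under test.
class TextUtil
{
   static String padRight(String s, int width)
   {
      StringBuffer buf = new StringBuffer(s);
      while (buf.length() < width)
      {
         buf.append(' ');
      }
      return buf.toString();
   }
}

// API conformance test: checks the routine against its specification.
public class TextUtilTest extends TestCase
{
   public void testPadsToRequestedWidth()
   {
      assertEquals("abc  ", TextUtil.padRight("abc", 5));
   }

   public void testDoesNotTruncateLongerInput()
   {
      assertEquals("abcdef", TextUtil.padRight("abcdef", 3));
   }
}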

System Testing

System testing is designed to certify the proper functioning of the entire system.  By nature, such tests are more end-user oriented.  This means that the testcases are designed to match real application processing as closely as possible.  Testcases will be generated by both the customer and Golden Code.  At this time, it is expected that system level testing will be a manual process of running each testcase and confirming the results or noting the deficiencies.

Automated system tests are a future objective.  This is significantly more complicated than the majority of the unit testing, since most of the system testing is likely to involve interactive components.  It is unlikely that an automated approach can be achieved before the first production installation.

Build Environment

ANT is used to manage builds.  ANT was chosen for the following reasons:
  1. It is well accepted by the Java developer community.  It is the de facto standard for Java build environments.
  2. It is platform independent (since it is written in Java).
  3. It has many features that are specifically for Java.
No platform-specific scripts or components are used in the build process.  The build can run on any platform with a Sun-based JDK.  This may exclude some IBM JDKs, but most JDK ports are based on the Sun reference implementation.

The entire P2J environment (conversion tools and runtime) is built with ANT.

References

Progress® Version 9.1 and WebSpeed® Version 3.1 PDF Documentation (Progress Software Corporation)
Progress Programming Handbook
Progress Language Reference
PROGRESS E-Mail Group Home Page
Joanju
Progress User Groups Worldwide Listing (Progress Software Corporation)

Trademarks

'Golden Code' is a registered trademark of Golden Code Development Corporation.

Progress is a registered trademark of Progress Software Corporation.

'Java', 'J2SE', and 'Java Compatible' are trademarks of Sun Microsystems, Inc.

Other names referenced in this document are the property of their respective owners.


Copyright (c) 2004-2005, Golden Code Development Corporation.
ALL RIGHTS RESERVED. Use is subject to license terms.