
Package com.goldencode.p2j.persist

Runtime services which support persistence of an application's business domain Data Model Objects using the Hibernate ORM framework.


Authors
Eric Faulhaber
Constantin Asofiei
Date
June 4, 2009
Access Control
CONFIDENTIAL

Contents

Introduction
Design Goals
The Data Model
Supported Data Types
Transaction Management
Record Retrieval
Row Level Access
Pessimistic Record Locking
Lock Types and Characteristics
Lock State Transition Rules
P2J Implementation
Temporary Tables
Transaction Isolation and Index Updates
Record Inserts
Record Updates
Record Deletes
Summary of Progress Behavior
P2J Runtime Implementation
Dirty Sharing
Data Validation
Runtime Support for Natural Joins
Using Foreign Keys for Runtime Joins
Keeping Foreign Keys in Synch
Special Where Clause Considerations
Client-Side Where Clause Processing
Valid Date Ranges

Introduction

This P2J runtime component comprises the persistence layer between an application's business domain objects and a relational database backend.  The Hibernate Object/Relational Mapping (ORM) framework is used to implement this component.

There is a significant impedance mismatch to overcome in this area, as there are considerable semantic differences between using Progress 4GL to access a Progress database, and using Java to access a non-Progress, relational database.  As an example, all database access in Progress is fundamentally row-level access.  That is, even in cases where a set of records may be preselected with query criteria (and possibly cached at some level), each record is effectively retrieved and acted upon in the 4GL independently.  In the target environment, the natural idiom is set-oriented database access.  That is, a set of records is retrieved by a query, then those records are acted upon in Java code as a set.  Furthermore, Progress manages concurrency with pessimistic locking, while this is generally (though not always) avoided in favor of optimistic locking strategies in the target environment.

There are numerous other differences in approach and implementation.  There are advantages and disadvantages to each, but the point is that the "best practices" approaches and implementations in either environment are fundamentally different.  This package (and converted application code) must overcome those differences to produce the same results in the target environment as are produced by the pre-conversion application.

Design Goals

The design goals of this package are to:

  • enable new development in the target environment to follow that environment's best practices, unencumbered by legacy semantics;
  • faithfully reproduce the data access semantics of the original Progress environment for converted application code.

The latter is required to ensure a converted application maintains the integrity and consistency of data access required so as to not deviate from its current and expected behavior.  The former is provided to ensure that future development in the target environment does not require that the baggage of reproducing the existing semantic be carried forward as new modules and features are added.

The differences in approach and consequently, facilities, between Progress and the target environment necessitate that we accept some performance penalties associated with reproducing the existing environment's semantics in an environment which was not necessarily intended to accommodate them.  The overriding goal of this package with respect to the converted application is that data integrity and application behavior be maintained identically.  Thus, where a tradeoff exists between performance and preserving the semantic of the existing application to ensure correct behavior, we always choose the latter in the first pass conversion.  Once a converted application is certified to behave correctly, we aim to review areas which might be safely improved, replacing portions of or all of the Progress-like semantic in those areas with a best practices implementation for the target environment.

The Data Model

The data model supported by the P2J environment consists of POJO ("Plain Old Java Objects") mapped to relational tables using the Hibernate framework.

Each Data Model Object (DMO) class uses Java bean-like semantics to provide public, read/write access to its various private data members, each of which logically represents a field from the original Progress database table.  Thus, a Progress table foo containing a data field bar will convert roughly to a DMO class Foo with a private instance variable bar, and public getBar and setBar methods (note:  the accessor will be isBar in the case where bar represents a logical data type).  In actuality, the naming conversions may be more complex than this, in that arbitrarily granular substitutions and expansions are applied during the conversion process, which are specific to the application being converted.  This allows for a more verbose DMO class name (often a compound word), which is typical in the target environment, as opposed to the often terse, and sometimes cryptic, table and field names used in the source environment, imposed by historical limitations.
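
The following is a hedged sketch of what such a converted DMO might look like;  the wrapper package name and surrogate key details are assumptions based on the description in this overview, not generated code.

   import com.goldencode.p2j.util.character;

   // Rough sketch of a DMO converted from Progress table "foo" with a single
   // character field "bar";  package and key details are assumptions.
   public class Foo
   {
      private Long id;          // surrogate primary key, used by Hibernate
      private character bar;    // converted Progress character field

      public Long getId() { return id; }
      public void setId(Long id) { this.id = id; }

      public character getBar() { return bar; }
      public void setBar(character bar) { this.bar = bar; }
   }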

The moniker "POJO" implies that the objects are essentially business objects with no database intelligence or facilities built in.  In truth, the objects are not completely transparent, in that they each contain a unique identifier which is primarily for the benefit of Hibernate to manage these objects with respect to their backing database table(s).  The persist package may also use this identifier directly to retrieve individual records in certain cases.  This unique identifier maps directly to the primary key value of the backing record in the database.  Temp table DMO classes are slightly more polluted:  each additionally contains a multiplex ID field and implements the Temporary marker interface.

Each DMO class is mapped to one or more database tables using a Hibernate mapping document (HBM) written in XML.  Each such document shares the root name of the top level DMO class, but has a filename extension of .hbm.xml instead of .class (.java in the source directories).  These documents live alongside their Java counterparts in the same directory inside the deployment archive (generally a JAR file).

In some cases, a top level DMO will have one or more inner classes which are each mapped to a secondary database table, separate from (but related by foreign key to) the table mapped to the top level class.  These mappings are defined in the same HBM as the corresponding, top level class.  This structure is designed to support Progress array fields, a generic analog for which does not exist in the target environment.  For details on the mappings of inner classes to secondary tables, please refer to the Converted Table Structure and Data Model Objects subsections in the schema package overview.

Some DMOs contain references to other DMO instances, if the underlying database tables for these objects have a relational association with one another.  A detailed discussion of the possible associations is available in the Key Constraints subsection of the schema package overview.

Supported Data Types

All DMO classes which have been converted from an application's Progress schema use the P2J wrapper data types for all data members except the primary key (and multiplex ID for temp tables).  This is necessary to preserve proper Progress semantics in arithmetic calculations, logical comparisons, etc.  For further reading on why these wrapper data types are needed in converted code, please refer to the Data Type Mapping subsection of the convert package overview.

Thus, all accessor and mutator methods return and accept, respectively, one of the P2J wrapper types listed in the table below:  character, date, decimal, integer, logical, raw, recid, and rowid.  While Hibernate built-in types are used for java.lang.Integer (Hibernate type integer) and java.lang.Long (Hibernate type long), neither the backend database in the target environment nor Hibernate itself is aware of the P2J wrapper data types.  This is another area where the mismatch between Progress 4GL semantics and the target environment must be managed.

The solution is the use of Hibernate's user data type facility;  specifically, custom implementations of the org.hibernate.usertype.UserType interface.  This provides transparent transitions between the P2J wrappers needed by the converted application, and the SQL data types needed by the database.  For the custom user type implementations, we provide a mapping between the P2J wrapper types and the closest corresponding JDBC data type (defined in the java.sql.Types class), as follows:

Progress Type   JDBC/SQL Type   P2J Type   User Type Implementation   Notes
character       VARCHAR         character  CharacterUserType          User type uses java.lang.String internally.
date            DATE            date       DateUserType               User type uses java.sql.Date internally.
decimal         NUMERIC         decimal    DecimalUserType            User type uses java.math.BigDecimal internally.
integer         INTEGER         integer    IntegerUserType            User type uses int internally.
recid           INTEGER         recid      IntegerUserType            User type uses int internally.
rowid           BIGINT          rowid      RowidUserType              User type uses java.lang.Long internally.
logical         BIT             logical    LogicalUserType            User type uses boolean internally.
raw             VARBINARY       raw        RawUserType                User type uses byte[] internally.
unknown         NULL            n/a        n/a                        All user type implementations transparently implement this mapping.

The mapping between DMO properties (i.e., fields/columns) and the appropriate user type implementation is defined in the HBM document for each DMO class on a per-property basis.  By using this architecture, the transition between database and DMO data types is hidden inside the Hibernate layer and the Progress semantic is preserved for converted code.  However, the mapping is easily changed post-conversion, if it is deemed that different data types (e.g., primitives or J2SE wrappers) are suitable in certain cases.  Perhaps more importantly, this architecture supports the development of new DMOs and supporting application modules which do not use the Progress semantic at all.
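
To make the user type facility concrete, the following is a minimal sketch of a Hibernate 3 UserType implementation in the style described above.  For brevity it maps a BIT column to java.lang.Boolean;  the actual LogicalUserType returns the P2J logical wrapper instead, and also handles the unknown value mapping.

   import java.io.Serializable;
   import java.sql.PreparedStatement;
   import java.sql.ResultSet;
   import java.sql.SQLException;
   import java.sql.Types;
   import org.hibernate.usertype.UserType;

   // Minimal custom user type sketch:  BIT column <-> java.lang.Boolean.
   public class BitBooleanUserType implements UserType
   {
      public int[] sqlTypes() { return new int[] { Types.BIT }; }
      public Class returnedClass() { return Boolean.class; }
      public boolean equals(Object x, Object y)
      {
         return x == y || (x != null && x.equals(y));
      }
      public int hashCode(Object x) { return x.hashCode(); }
      public Object nullSafeGet(ResultSet rs, String[] names, Object owner)
         throws SQLException
      {
         boolean value = rs.getBoolean(names[0]);
         return rs.wasNull() ? null : Boolean.valueOf(value);  // NULL <-> unknown
      }
      public void nullSafeSet(PreparedStatement st, Object value, int index)
         throws SQLException
      {
         if (value == null)
         {
            st.setNull(index, Types.BIT);
         }
         else
         {
            st.setBoolean(index, ((Boolean) value).booleanValue());
         }
      }
      public Object deepCopy(Object value) { return value; }   // immutable type
      public boolean isMutable() { return false; }
      public Serializable disassemble(Object value) { return (Serializable) value; }
      public Object assemble(Serializable cached, Object owner) { return cached; }
      public Object replace(Object original, Object target, Object owner)
      {
         return original;
      }
   }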

Transaction Management

The persistence layer relies heavily upon the TransactionManager to map Progress transaction semantics to SQL transactions.  As much transaction processing as possible has been made transparent to application code.  The persistence layer uses TransactionManager's various callback services to determine when a transaction must be commited or rolled back, and when database resources and record locks are released.

Buffer Manager

An instance of the BufferManager class is created immediately upon the creation of each new user context.  It manages all buffers open in that context.  As an implementor of Scopeable, BufferManager is notified by the TransactionManager each time a new block scope is opened and again when it is closed.  The object is also notified by each RecordBuffer instance whenever a new scope is opened on that buffer.  BufferManager tracks all open buffer scopes within the local user context.  When it receives notice that a transaction level block has opened, it starts a new database level transaction for each database currently referenced by an open buffer scope.  When any block opens that is within a transaction, it registers all open buffers for a commit/rollback callback.

BufferManager also implements BatchListener, so that it can receive notifications about assign batch mode processing.  This mode is used when multiple changes must be made to one or more DMO objects without triggering validation until the batch ends.  BatchListener.batchNotify methods trigger calls to RecordBuffer's static startBatch and endBatch methods.

Record Buffer

An instance of the RecordBuffer class is created each time a buffer is defined in application code.  This object manages all resources associated with a buffer throughout its lifetime, including any record locks acquired through that buffer.  RecordBuffer implements Finalizable and registers with the TransactionManager each time openScope is invoked.  The context-local ConnectionManager is also notified of the buffer scope open event.  Upon closing of the scope, the buffer attempts to transition any record locks acquired by the buffer during the lifetime of the current scope.  This may mean releasing each lock entirely, or downgrading it to a share lock from an exclusive lock.

RecordBuffer manages the interaction between DMOs and Hibernate's Session object (indirectly, via the Persistence class), in order to maintain sub-transaction level undo (i.e., rollback) capability across nested scopes.  For each create, delete, and load operation which occurs in a RecordBuffer instance, a RecordBuffer.Reversible instance is stored for the current scope.  RecordBuffer implements Commitable and is registered to receive commit and rollback callbacks by the BufferManager at each block scope inside a transaction.  When a subtransaction or transaction commit occurs, all Reversible objects in the current scope are copied up to the parent scope, to preserve undo capability at that scope.  When a rollback occurs, all Reversible objects in the current scope have their rollback methods invoked, which resets the buffer's state to enable a retry.
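
The following toy sketch (not the P2J API) illustrates the bookkeeping pattern just described:  a commit promotes the current scope's undo records to the parent scope, while a rollback replays them in reverse order.

   import java.util.ArrayDeque;
   import java.util.ArrayList;
   import java.util.Deque;
   import java.util.List;

   // Toy model of scope-based undo bookkeeping (illustrative only).
   public class ReversibleSketch
   {
      interface Reversible { void rollback(); }

      private final Deque<List<Reversible>> scopes =
         new ArrayDeque<List<Reversible>>();

      public void openScope() { scopes.push(new ArrayList<Reversible>()); }

      public void record(Reversible r) { scopes.peek().add(r); }

      public void commitScope()
      {
         List<Reversible> current = scopes.pop();
         if (!scopes.isEmpty())
         {
            scopes.peek().addAll(current);   // preserve undo capability in parent
         }
      }

      public void rollbackScope()
      {
         List<Reversible> current = scopes.pop();
         for (int i = current.size() - 1; i >= 0; i--)
         {
            current.get(i).rollback();       // undo in reverse order
         }
      }
   }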

Connection Manager

An instance of the ConnectionManager class is created the first time it is needed in the current user context.  This is typically during instantiation of the first RecordBuffer used in that context.  This object tracks all connections made to a physical database, in the sense of Progress connections (i.e., uses of the CONNECT and DISCONNECT statements in pre-converted code).  This does not trigger a JDBC connection, since this occurs implicitly when a transaction is started or when a query is executed.  Rather, this object is used to track references to physical, transient (i.e., not configured by default) databases which are dynamically registered and deregistered by application logic.  Connections trigger transient databases to be registered with the DatabaseManager, and disconnections cause them (eventually) to be deregistered.

Requests to disconnect a database are not honored immediately if the database is still being referenced by any buffer in the current scope.  Instead, the disconnection is deferred until the last scope in which the database is referenced is closed.  While a transaction is in progress, any databases involved in that transaction are actively referenced.  ConnectionManager implements Finalizable, and registers itself for master transaction finish notifications upon construction.  Whenever a transaction scope closes, all pending disconnect requests are honored and those databases are deregistered with the DatabaseManager if no other buffers in the current context still reference them.

ConnectionManager also registers an anonymous inner Finalizable implementation class which deregisters all connected databases immediately upon session closure for the current context.

Record Retrieval

This section discusses the various record retrieval mechanisms in Progress, and how these are implemented in the target environment.

Row Level Access

Progress 4GL generally handles all database access on a row by row basis.  FIND and FOR loops each fetch a single record at a time.  Even the newer QUERY and PRESELECT mechanisms fetch records individually.  Efficiencies have been implemented to minimize the impact on performance.  For instance, PRESELECT retrieves a results list of row identifiers up front, then fetches each record in the results list efficiently, using its unique identifier.  FOR in certain circumstances will prefetch more records from the server than it needs to fulfill the current request.

Database access in Progress 4GL is centered around the concept of a buffer which contains a single record at any given time.  Multiple buffers can be created to handle multiple records simultaneously.  State is maintained by the server as to which record is the "current" fetch record for a given buffer, on a per client basis, or by the client directly.  The former is done in the Progress server by maintaining cursors on table indexes, on behalf of clients.  The latter is accomplished by maintaining results lists directly on the client, which are refreshed as needed by the server (similar in concept to scrollable cursors in SQL databases).

The index cursor implementation in Progress implies a tight coupling between database server and client, which is not the case in the target environment.  Thus, we must simulate the use of index cursors using SQL, and by maintaining fetch state for each active buffer/client combination.  This is conceptually similar to the results list approach.  Thus, the replacement for both approaches in Progress is a results list oriented approach implemented in the persistence service layer of the P2J server.

FIND Statement

The Progress FIND statement is used to retrieve a single database record into a buffer.  This record is considered the "current" record.  The current record is generally replaced when another FIND statement is executed (except FIND CURRENT, which is typically used to modify the lock type on the current record).

FIND is implemented in Progress by selecting the appropriate index as described above, then applying selection criteria against all or a subset of records in that index, in order to find the first, last, previous, or current record which matches the conditions, where order is defined by the active index.  The Progress server maintains an index cursor for the current buffer on behalf of the client.  This index cursor is moved by subsequent FIND statements which use the same buffer.  If a different index is used for a subsequent FIND, the current record is moved relative to its position in the new index.  For instance, one FIND may be used to move to (and fetch) a desired record by the most efficient manner, which may be one particular index.  A subsequent FIND may then be used in a loop to navigate from that initial record to process a subset of records along a different index.  Cursor position is always managed relative to the index in use by the current FIND statement.  This is an important point, as this semantic must be preserved in the replacement implementation.

The P2J implementation uses a multi-step query to simulate the behavior of FIND, except in the case of a unique FIND, where only a single query is executed.  In either case, a static HQL statement generated at conversion time is executed the first time a FIND statement with a particular signature is encountered within a scope.  Runtime parameters are substituted into this statement as necessary.

For a unique FIND, this query returns either null or a unique record into the current buffer, or throws an exception if the result was not unique.  For a non-unique FIND, the HQL statement is designed to retrieve a scrollable result set (i.e., backed by a scrollable cursor) of primary key values for those records which meet the selection criteria.  The sort order of the query is critical to the proper function of the implementation.  The sort order is determined by the ORDER BY clause of the HQL statement.  The components of this clause and their sort direction correspond with the components of the Progress index which was associated with the FIND statement in the original application.  The appropriate index is chosen at conversion time by applying the index selection rules described above and incorporating this information into the HQL ORDER BY text.

For a non-unique FIND, the initial query is followed up immediately by a second query which retrieves the current record (or subset of fields thereof), as directed by the FIND statement.  This query is prosecuted using the record's primary key value, which is retrieved from the initial query's result set.  Thus, the first record selected incurs the penalty of two database accesses, though the second should be the most efficient access possible.  The primary key chosen depends upon the navigational directive of the FIND:  FIRST selects the first key in the result set, LAST selects the last key, and NEXT/PREV select the key immediately following/preceding the current position in the result set.
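
The two-step approach can be sketched with the Hibernate 3 Session API as follows.  The entity name (Person), property names, and selection criteria are hypothetical;  the actual HQL is generated at conversion time as described above, and lock and error handling are omitted.

   import org.hibernate.ScrollMode;
   import org.hibernate.ScrollableResults;
   import org.hibernate.Session;

   // Illustrative two-step emulation of a non-unique FIND FIRST.
   public class FindSketch
   {
      public static Object findFirst(Session session, int minEmpNum)
      {
         // Step 1:  select primary keys only, ordered as the Progress index
         // backing the original FIND statement would order the records.
         ScrollableResults keys = session
            .createQuery("select p.id from Person p where p.empNum >= :n " +
                         "order by p.empNum asc, p.id asc")
            .setParameter("n", Integer.valueOf(minEmpNum))
            .scroll(ScrollMode.SCROLL_INSENSITIVE);

         if (!keys.first())
         {
            return null;     // no record satisfies the criteria
         }

         Long id = (Long) keys.get(0);

         // Step 2:  fetch the full record by primary key, the most efficient
         // access possible.
         return session.get("Person", id);
      }
   }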
A pessimistic lock may be requested for the fetched record, depending upon the lock type associated with the original FIND statement.  See Locking for a discussion of how pessimistic locking is managed.

As subsequent iterations of a FIND statement with the same signature are executed, the result set is traversed as directed by the FIND statement (next, previous, etc.).  As soon as a FIND statement with a different signature is encountered in the same (or narrower) scope for the same buffer, the current result set is invalidated, and the process begins again.  However, the primary key of the most recently fetched record is retained as long as the scope is active;  it is bound to the next HQL statement as a runtime parameter to an equality match via a logical OR in the WHERE clause.  This is necessary to ensure navigation is always resumed relative to the most recently fetched record, in the new result set.  If the primary key is not made an additional search criterion, there is no guarantee the new result set will contain the current record, and proper navigation will not be maintained.

Another event which can invalidate or modify the current result set is an update to the table currently being navigated.  A simplifying assumption is made here that only a modification made in the current application thread will invalidate/modify the current result set.  The reasoning behind this assumption is that it would be highly unusual for an application design to rely upon asynchronous changes made by another client, precisely timed with the logic executing on the current client, such that the result set driving navigation for the current buffer would be invalidated as an intentional process of the system.  It is much more likely that the application design might rely upon changes made synchronously by the logic flow within the current client, in which case the current result set could no longer be used for navigation without being modified or replaced.

FIND NEXT/PREV

When executed inside a REPEAT, FOR or PROCEDURE block, the FIND NEXT/PREV statements will generate an END-KEY condition when no more records are available.  This will result in executing the defined END-KEY action (which defaults to UNDO, LEAVE, if it was not explicitly overridden by the user).

When a FIND NEXT/PREV statement is nested in a FOR/FOR EACH block and it generates an END-KEY condition, the buffers associated with the FOR/FOR EACH block must reference the same record as on the iteration when the END-KEY was generated.

FOR Statement - TBD

The Progress FOR EACH statement is used to iterate through records which meet a set of criteria.  The FOR FIRST and FOR LAST variants are used to select a single record.  Records may be accessed from one or more tables by joining tables using a WHERE or OF clause in the record phrase of this statement.

Like the FIND statement, FOR keeps track of the currently fetched record for each table being accessed.

When the query finds no more records, the FOR loop must end and the buffers must be left referencing no record.  In P2J, this is accomplished in two phases:
  1. On block initialization, the buffers associated with the FOR queries are saved as query off-end listeners.
  2. After the last record is reached, the query will throw a QueryOffEndException.  If all the off-end listeners collected by the QueryOffEndException are the ones registered on block initialization, then the buffers are marked for off-end processing and an UNDO, LEAVE action is executed (the default action for the END-KEY condition).  After the UNDO, the buffers will reference no record.
When an END-KEY condition is generated inside a FOR block, the buffers will roll back any changes to the current record, but the referenced record will not be changed.

TODO: confirm the behavior with the UNDO statement too - it seems the referenced record for the FOR associated buffers doesn't change.

Pessimistic Record Locking

The Progress programming model relies heavily on pessimistic record locking to maintain data integrity in concurrent applications.  P2J emulates this locking behavior in the runtime environment, to ensure converted applications manage concurrent access to database records in exactly the same manner.

Note that locking is ignored for temp-tables.  In Progress, temp-tables are private to a user context.  Since no sharing of data is possible in such a model, no locking is necessary.  Consequently, locking operations on temp-tables are no-ops.  The remainder of this discussion of lock behavior assumes permanent tables are in use, unless otherwise indicated.

Lock Types and Characteristics

Progress offers the following lock types:

  • NO-LOCK:  the record is read without acquiring any lock;  no protection against concurrent modification is provided.
  • SHARE-LOCK:  allows other sessions to read the record and to acquire their own share locks, but prevents any session from modifying the record while the lock is held.
  • EXCLUSIVE-LOCK:  required to modify a record;  while held, no other session may acquire any lock on the record.

There are a number of ways to obtain record locks, both implicit and explicit.  An attempt to edit a record upon which the current session holds a SHARE-LOCK will implicitly cause that lock to be upgraded to an EXCLUSIVE-LOCK.  Locks commonly are obtained using data access language statements such as FIND, FOR, OPEN QUERY, etc.  By default, a lock request (implicit or explicit) will block the session indefinitely if the lock is not available;  however, locks also can be requested with the NO-WAIT option, which causes the data access request to fail if the requested lock is not immediately available.

Lock acquisition, use and release for the same record among several buffers within the same user context are communal and asymmetric.

Lock State Transition Rules

Locks behave differently within and outside of transactions.  Very specific rules apply to the release of locks, particularly when crossing a transaction boundary.  The following code samples illustrate the lock type held on a single record through various state transitions.  At any point the record is in one of four states:  no record in scope, NO-LOCK, SHARE-LOCK, or EXCLUSIVE-LOCK.  The notes following each example describe the lock state after each statement is executed.

Lock Type State Transitions

Example 1

001 do transaction:
002  find first person exclusive-lock.
003  person.emp-num = 100.
004  find current person no-lock.
005 end.
At line 002, a record is found with EXCLUSIVE-LOCK inside the transaction, then re-found with NO-LOCK at line 004.  However, because we are inside a transaction, the lock is downgraded to SHARE-LOCK rather than released.  After the transaction ends, the lock is released.
Example 2

001 find first person no-lock.
002 do transaction:
003    find current person exclusive-lock.
004    person.emp-num = 100.
005    find current person no-lock.
006 end.
007 find current person no-lock.
In this case, the buffer scope of the person buffer is larger than the transaction scope, because it begins at line 001, when the record is found with NO-LOCK.  As in the previous example, it is downgraded rather than released at line 005, but the SHARE-LOCK remains even after the transaction ends at line 006.  It is not until line 007 that the lock is finally released by the explicit find NO-LOCK statement.
Example 3

001 find first person no-lock.
002 do transaction:
003    find current person exclusive-lock.
004    person.emp-num = 100.
005    undo, leave.
006 end.
Even though the buffer scope is larger than the transaction scope in this case, the undo at line 005 causes the lock to revert at the end of the transaction to its state at the beginning of the transaction, NO-LOCK.
Example 4

001 find first person no-lock.
002 do transaction:
003    repeat:
004       find current person exclusive-lock.
005       person.emp-num = 100.
006       undo, leave.
007    end.
008 end.
In this case, the EXCLUSIVE-LOCK is acquired in a repeat block nested within the transaction scope at line 004.  Even though the repeat block is undone at line 006, the exclusive lock is not released when the repeat block ends at line 007.  It is not until the transaction exits at line 008 that the lock is released.

Note that even though the buffer scope is larger than the transaction scope, the lock is not downgraded to SHARE-LOCK when the transaction ends, unlike Example 2.  The existence of the UNDO preempts this behavior and causes the lock instead to revert to its state at the beginning of the transaction, even though it remains at EXCLUSIVE-LOCK throughout the remainder of the transaction.  Thus, had we initially found the record with SHARE-LOCK at line 001, it would still be locked with SHARE-LOCK at line 008.
Example 5

001 find first person no-lock.
002 do transaction:
003    repeat:
004       find current person exclusive-lock.
005       person.emp-num = 100.
006       leave.
007    end.
008 end.
009 find current person no-lock.
This example is similar to Example 4, however, without the UNDO, the EXCLUSIVE-LOCK is downgraded to SHARE-LOCK at the end of the transaction.  It remains SHARE-LOCKED until it is explicitly released at line 009.
Example 6

001 do transaction:
002    repeat:
003       find first person exclusive-lock.
004       person.emp-num = 100.
005       leave.
006    end.
007 end.
In this case, the buffer scope is smaller than the transaction scope.  Even though the record is no longer referenced after the end of the inner repeat loop, it remains locked through the end of the transaction, so that no other session can edit it before the transaction has been committed (or undone).  However, the EXCLUSIVE-LOCK is downgraded to a SHARE-LOCK at the end of the buffer's scope.  It is released entirely at the end of the transaction.

P2J Implementation

P2J uses a runtime implementation of the pessimistic locking model.  While this is not optimal, there are several reasons for using a runtime implementation instead of relying upon the backing database's pessimistic lock model;  there are likewise several downsides to this approach.  A number of classes in the persistence runtime collaborate to manage the acquisition and release of record locks.  The locking implementation is logically divided into a low level interface and a higher level interface which uses it.

Low Level Interface

The low level interface is responsible for accessing the core locking implementation, which responds to requests to acquire and release locks.  It is this interface's job to manage the locks held by each user context, and to respond to requests to lock, unlock, and query lock status for individual records, per context.

The LockManager Java interface was developed for this purpose.  A particular concrete implementation is created at P2J server initialization;  the implementation class to use is specified in the directory.  The Persistence class uses the lock manager transparently to do its work;  generally, the lock manager does not need to be accessed directly by application level code.  There is one Persistence instance per physical database managed by a P2J application server.  External applications can access a Persistence instance via remote method calls, or they may implement a remotable implementation of the LockManager interface (see below).  To date, a non-Java mechanism does not exist for external access.

The default implementation of the LockManager interface is InMemoryLockManager.  An instance of this class resides in the server JVM.  As the name suggests, all locking behavior is managed in the internal memory of the server process.  Thus, coordinated locking is available only to objects which exist in the JVM server process.  To maintain the integrity of concurrent record access, all lock requests must be routed through this object.  If locking needs to be shared among external JVM instances, one of the mechanisms described above must be employed.  The default mechanism for such coordination among separate P2J server instances is described below.

Each P2J server instance is responsible for managing record locks for the database instances to which it permanently connects (as configured in the P2J directory).  Each such P2J server instance is said to be authoritative for those database instances, and each such database instance is said to be a local database with respect to that server.  Databases which are accessed by a P2J server (server A) via an explicit connection request in business logic (i.e., the converted form of the Progress CONNECT statement), which are local to a different P2J server instance (server B), are said to be remote database instances with respect to server A.

A P2J server instance (the authoritative server) is responsible for managing all record locks for its local database instances, including requests made by external P2J servers.  Locking services for a remote database are accessed via a distributed implementation of the LockManager interface:  RemoteLockManager.  An instance of this class is automatically created for and used by a specialized Persistence instance, which is responsible for accessing a remote database.  RemoteLockManager acts as a proxy for the remote server's corresponding, primary LockManager for the target database:  as a method on the RemoteLockManager instance is invoked, the request is forwarded to the remote server and is handled there by a LockManagerMultiplexer implementation, which dispatches the request to the appropriate, local LockManager for the target database.  A response is returned to the RemoteLockManager which initiated the request on the requesting server.  Other than the increased latency to communicate between P2J servers, the use of the remote LockManager proxy mechanism is transparent, and is managed automatically by the P2J runtime environment.  This mechanism presumes a direct, server-to-server connection can be established between the two P2J servers.

High Level Interface

The high level locking interface is provided by the RecordLockContext Java interface.  It is intended for use only within the P2J runtime environment, to manage communal locks among RecordBuffer instances.  A no-op implementation is used for temporary tables, and a package private, inner class implementation is used for permanent tables.  When a RecordBuffer or query object needs to acquire or release a record lock, it does so through this interface.  A RecordLockContext implementation uses the low level interface to perform the actual lock operations.  This implementation, in collaboration with the RecordBuffer class, enforces the lock transition rules specified above.  It is also responsible for ensuring the asymmetric rules of lock acquisition and release when multiple buffers within a user context access locks for the same record.

Usage from Business Logic

Locks can be acquired explicitly during record retrieval or implicitly during record retrieval or update.  Explicit locking is managed by specifying a lock type when defining or using the various query implementations.  Implicit locking occurs either when no lock type is specified (the various query implementations may elect default lock types), or when data model object (DMO) modification requires a lock upgrade.

Specifically, an implicit lock upgrade occurs when application code modifies the data within a DMO instance by invoking one of its mutator methods.  This is achieved through the use of Java's dynamic proxy mechanism.  When a DMO is defined by the application (via RecordBuffer.define or TemporaryBuffer.define), the application is given a proxy to the underlying DMO, rather than a direct reference.  The proxy's invocation handler (implemented within RecordBuffer as an inner class) intercepts mutator method invocations on the DMO and upgrades the lock from SHARE to EXCLUSIVE if necessary.  It is assumed a transaction is active when such an upgrade is attempted.
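
The interception technique can be sketched with Java's standard dynamic proxy facility as follows.  The DMO interface, handler body, and lock upgrade call are illustrative stand-ins, not the P2J implementation.

   import java.lang.reflect.InvocationHandler;
   import java.lang.reflect.Method;
   import java.lang.reflect.Proxy;

   // Sketch of intercepting DMO mutator calls via a dynamic proxy.
   public class MutatorInterceptSketch
   {
      interface PersonDmo          // stand-in for a generated DMO interface
      {
         Integer getEmpNum();
         void setEmpNum(Integer empNum);
      }

      static class PersonImpl implements PersonDmo
      {
         private Integer empNum;
         public Integer getEmpNum() { return empNum; }
         public void setEmpNum(Integer empNum) { this.empNum = empNum; }
      }

      public static PersonDmo wrap(final PersonDmo target)
      {
         return (PersonDmo) Proxy.newProxyInstance(
            PersonDmo.class.getClassLoader(),
            new Class[] { PersonDmo.class },
            new InvocationHandler()
            {
               public Object invoke(Object proxy, Method method, Object[] args)
                  throws Throwable
               {
                  if (method.getName().startsWith("set"))
                  {
                     // The real handler would upgrade SHARE to EXCLUSIVE here.
                     System.out.println("upgrading lock before " + method.getName());
                  }
                  return method.invoke(target, args);
               }
            });
      }

      public static void main(String[] args)
      {
         PersonDmo person = wrap(new PersonImpl());
         person.setEmpNum(Integer.valueOf(100));   // intercepted mutator call
      }
   }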

Temporary Tables

DMO interfaces and classes which represent converted temp tables use a special database instance name _temp, which is reserved for this purpose.  This circumvents the problem of having to associate temp tables at runtime with different, existing databases.  Whereas regular table DMOs are backed by RecordBuffer instances and are defined with a call to RecordBuffer.define(), temp table DMOs are backed by TemporaryBuffer instances and are defined with a call to TemporaryBuffer.define().  Since TemporaryBuffer extends RecordBuffer, they are otherwise used the same way.

It is possible in Progress to create multiple instances of the same temp table.  One way is to define multiple buffers for a temp table;  another is to define parameters for temp table instances.  When a temp table parameter is passed to another procedure by a RUN statement, certain data copying semantics occur implicitly, depending on the use of the INPUT, OUTPUT, or INPUT-OUTPUT options to DEFINE PARAMETER.  Progress creates separate temp table instances for the caller and the callee procedures.  All record rows are copied from the caller's instance to the callee's instance at the beginning of the called procedure for the INPUT and INPUT-OUTPUT options.  All record rows are copied from the callee's instance to the caller's instance at the end of the called procedure for the OUTPUT and INPUT-OUTPUT options.  This functionality is provided by the TemporaryBuffer.associate method, which should be called at the beginning of a method to which a temp table (i.e., Temporary instance) is passed.  This method does an immediate INPUT copy if this option is specified.  It registers for an OUTPUT copy at the end of the method's scope if this option is specified.

The need for multiple temp table instances as temp table parameters are passed to nested method invocations presents a difficulty in the P2J environment.  Namely, only one physical temp table can be created in the backing SQL database per table name, and as discussed above, we are limited to mapping our DMO objects to only one backing table name by Hibernate.  The same DMO type is needed in both the calling and called method, meaning we are limited to a single physical temp table name.  Because of this restriction, it is necessary to multiplex a single physical temp table instance in the database to appear as multiple, virtual temp tables to the persistence runtime.  This is managed transparently to the application, but requires some special preparation of temp table DMO interfaces and classes:  each contains a multiplex ID property which partitions the physical table's rows among its virtual instances, and each implements the Temporary marker interface (a toy illustration of the multiplexing idea follows the next paragraph).
Physical temp table instances are created and dropped dynamically, as needed.  The first temporary DMO instance of a given DMO type to be used in the current context triggers table creation.  When the final DMO instance of a given DMO type in the current context is finished with the table, the table is dropped.  If a temp table defines fields with extents, secondary tables are created as necessary to support these, each with a foreign key back to the primary table.
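
The following toy illustration shows the multiplexing idea using plain JDBC against an in-memory H2 database.  The table layout and column names are hypothetical;  the real runtime works through Hibernate and TemporaryBuffer.

   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;

   // Two virtual temp tables sharing one physical table, disambiguated by a
   // multiplex ID column (names are hypothetical).
   public class MultiplexSketch
   {
      public static void main(String[] args) throws Exception
      {
         Connection conn = DriverManager.getConnection("jdbc:h2:mem:temp");
         Statement stmt = conn.createStatement();
         stmt.execute("create table tt_order (id bigint primary key, " +
                      "multiplex_id integer, amount numeric)");
         stmt.execute("insert into tt_order values (1, 1, 10.0)");  // caller's instance
         stmt.execute("insert into tt_order values (2, 2, 99.0)");  // callee's instance

         // Every query issued on behalf of a virtual temp table is qualified
         // by that instance's multiplex ID.
         ResultSet rs = stmt.executeQuery(
            "select count(*) from tt_order where multiplex_id = 1");
         rs.next();
         System.out.println("rows visible to virtual instance 1: " + rs.getInt(1));
         conn.close();
      }
   }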

Refer to the TemporaryBuffer class for additional details on multiplexing.

Transaction Isolation and Index Updates

Progress has an unusual mode of transaction isolation, in that there is some visibility of uncommitted transaction changes across sessions.   This applies only to permanent tables;  temp-tables and work tables are necessarily session-specific and are not visible at all across sessions.

Specifically, a change to a database index is visible immediately to all sessions, even if the transaction in which the change occurred has not yet been committed.  The change is visible the moment the index is modified, whether the modification represents an insert, update, or delete.  Records can be found (or not found in the case of a delete) based upon the index change;  however, the new data cannot necessarily always be accessed directly by other sessions.  The level of visibility depends upon the type of modification.

Record Inserts

The insertion of a new record is the most complicated form of an uncommitted change which may be visible across sessions.  It is not the creation of the record per se which triggers the record to be visible across sessions, but rather it is the modification of any index on the table to which that new record will be added.

A new record is created with the 4GL CREATE statement.  In and of itself, record creation does not modify any index.  Once the new record is created, it is not actually added to the database until something causes it to be flushed from the record buffer to the database.  The one exception to this rule seems to be the creation of a temp-table record, when the corresponding temp-table has no indices on it -- in this case, flushing seems to happen immediately upon creation.  Other than this exception, flushing can only be caused by the following events:
Before a permanent record is flushed to the database, the indices on that record's table may be modified by changes to the record.  These index changes are immediately visible to other sessions, even if the enclosing transaction has not yet been committed.  That is, the new record can be found by other sessions which use an index so modified to retrieve records from the target table.  Furthermore, until it is flushed to the database, the record may be found by external sessions, if and only if those sessions use such a modified index to find it.  A search which uses an unmodified index to retrieve records will be unable to find the new record, even if its default field values would suggest it should be present at a particular location in the unmodified index.

A new record is first added to an index or to multiple indices as the result of having one or more of its fields assigned values, by means of either the ASSIGN statement, or simply via the assignment operator (=), or at the end of a buffer copy operation.  If multiple fields are being updated in a BUFFER-COPY or in an ASSIGN statement, updates to affected indices are deferred at least until all fields in the respective statement have been assigned.  As soon as enough assignments have been made so as to have "touched" all the fields participating in any index (or group of indices) of the target table, a snapshot of the new record is made in its current state, and this "unfinished", placeholder record is inserted into that index or into those indices.  All other indices remain unaffected.  Note that for the placeholder record to be inserted into an index, it is not necessary that the new values of the assigned fields be different than their initial, default values;  all that is necessary is the act of assignment itself.

The unflushed record which will be found by external sessions represents an unfinished form of the record, a placeholder record of sorts.  It appears to be a snapshot of the record's state at the moment it was first added to any index, after it was newly created.  Subsequent assignments to other fields of the record in the creating session may cause other indices of the target table to be modified.  However, until the record is flushed to the database and the enclosing transaction committed, external sessions will only see the data in the placeholder record as it was at the moment the original snapshot was taken.

Example

Consider the following table definition for the "sample" table, and the corresponding 4GL code (annotated with line numbers):

TABLE DEFINITION:  sample
Field   Type      Initial Value
f1      integer   0
f2      integer   2

index idxF1 field f1
index idxF2 field f2

CURRENT DATA IN TABLE sample

Field f1   Field f2
1          1
3          3

Session A:

1  CREATE sample.
2  f1 = 2.
3  pause.
4  f2 = 2.
5  pause.
6  f2 = 7.

Session B:

1  FIND FIRST sample WHERE f1 = 2 USE-INDEX idxF1 NO-LOCK NO-ERROR.
2  FIND FIRST sample WHERE f2 = 2 USE-INDEX idxF2 NO-LOCK NO-ERROR.
3  pause.
4  FIND FIRST sample WHERE f2 = 2 USE-INDEX idxF2 NO-LOCK NO-ERROR.
5  pause.
6  FIND FIRST sample WHERE f2 = 7 USE-INDEX idxF2 NO-LOCK NO-ERROR.
7  IF AVAILABLE sample
8  THEN MESSAGE "Found sample record where f2 = 7; f1:" f1 ", f2:" f2.
Assume session A runs through line 2 and pauses at line 3.  At this point, a new sample record has been created and only field f1 has been assigned.  The new sample record has a value of 2 in field f1 (explicitly assigned), and a value of 2 in field f2 (initial, default value).  If the code in session B is now run, the FIND statement in line 1 will find a snapshot of the record newly created in session A.  The FIND statement in line 2 will not find a record, even though the default value of field f2 matches the search criteria.  This is because the assignment in line 2 of session A has only updated index idxF1 so far.  Index idxF2 has not yet been updated to reflect the presence of the new record.

Now session A is allowed to continue through line 4 and it pauses at line 5.  Even though the assignment in line 4 sets field f2's value as 2 (the same value it already had as its initial value), index idxF2 is now updated.  If the FIND statement in line 4 of session B is executed, it will find the new record's snapshot.

Finally, session A is allowed to continue through line 6, which updates index idxF2 again.  Now, session B is allowed to continue through line 8.  Line 6 in session B will find the new record's snapshot.  However, the message which is printed at line 8 will look like:
   Found sample record where f2 = 7;  f1: 2 , f2: 2
Note that -- even though the new record was found in session B using the criterion f2 = 7 -- session B will not be aware of the new value (7) of f2, assigned in line 6 of session A.  Only the data which were in the new record at the moment the snapshot was made (that is, at the moment the first index was modified in session A, line 2) are visible from other sessions.  Subsequent updates to the new record are not visible in other sessions until the new record is flushed to the database and the enclosing transaction is committed;  however, insofar as updates made in session A change the location of the new record in any index on the sample table, those index updates will change where, and whether, the new record can be found in other sessions.

This unusual, partial, transaction isolation seems to be a quirk of the Progress database implementation.  Note that it is only apparent when the second session uses NO-LOCK to find a newly created record.  Had SHARE-LOCK or EXCLUSIVE-LOCK been used in the example above, session B would have blocked waiting on session A to release its exclusive lock on the new record.  [TODO:  determine if a different reread-no-lock setting changes this behavior.]

Record Updates

The rules for updates of existing records are consistent with the discussion above regarding record inserts.

If an existing record is updated, such that the update is to a field participating in one or more indexes, the change to any affected index is immediately visible to all other sessions, regardless of whether the enclosing transaction has been committed.  However, the new data are not yet visible, and will not be until the enclosing transaction is committed.  That is, the modified record can be retrieved in a separate session from its new location in a modified index immediately, but any field data dereferenced from the retrieved record (including the updated field(s) which triggered the index change(s)) will not be the new data as updated in the uncommitted transaction in the first session[1].  Rather, the second session will still be able to access only the old data available at the time its current transaction began.

Note that in the case of updating properties of an existing record (as opposed to updating properties of a newly created record), it is not necessary to touch all fields participating in a multi-field index in order for that index change to be visible across contexts.  The assignment of any single field or any combination of fields of a multi-field index will trigger an index change which is immediately visible to all sessions, whether or not all fields participating in the index were actually touched.

In summary, there is some level of transaction isolation for record updates, in that record data retrieved (NO-LOCK) within a transaction remains consistent throughout the transaction, even if another transaction modifies the retrieved record.  Only the index changes are visible across sessions immediately, so, while the uncommitted changes to a record are not visible in a separate session, that session may find the modified record at a different location within its containing table's indices, as a result of those uncommitted changes.
Notes
  1. A quirk/bug has been discovered in Progress v9.1C, which may or may not exist in other versions.  Under certain conditions, uncommitted changes to fields in one session may be fully visible in other sessions.  Empirical testing has shown that executing any query (in the generic sense of the term;  i.e., any FIND, FOR EACH, PRESELECT, etc.) which performs an inequality test in its where clause between a database field and any other operand, will trigger all uncommitted updates to any records containing that field, which have been made up to that point, to be "leaked" to other sessions.  That is, other sessions will be able to dereference fields in those modified records and access that uncommitted data.  It appears a snapshot of the uncommitted, modified state of those records is made at the moment the triggering query is performed, and that snapshot is exposed to other sessions which retrieve those records (with NO-LOCK).  Subsequent changes in the first session are not visible, until another such query exposes them, or until the enclosing transaction is committed.  For our purposes, this behavior is termed the update isolation leak.

Record Deletes

When an existing record is deleted, that record no longer is available to any other session, even if the transaction in which the delete took place has not yet been committed.  All affected indexes are updated immediately, and there appears to be no way to retrieve that record in another session.  Surprisingly, even OPEN QUERY PRESELECT and REPEAT PRESELECT loops which began before the delete took place in a separate session will not retrieve the deleted record.

Summary of Progress Behavior

The following table summarizes the behavior of the various record retrieval statements in one session, when a database operation occurs in another session, within a transaction which has not yet been committed.  In the event of looping statements, it is assumed the loop has already been entered at the time a record is inserted/updated/deleted in the separate session.

FIND;  FOR;  QUERY FOR / GET

  INSERT:
  • Inserted record immediately available at the index location determined by the indexed fields at the moment the related index is updated.
  • Only data in the record at the moment of the first index update are visible to the second session, even after subsequent updates in the first session.
  • Upon undo of the insert, a lock on the new record cannot be acquired in the second session.

  UPDATE:
  • Updated record immediately available at the index location determined by the updated index fields.
  • Updated record no longer available at its old index location.
  • Data changes not visible to the second session, including changes to indexed fields (a change affects the record's location in the index, but not its contents).

  DELETE:
  • Deleted record can no longer be retrieved.

REPEAT PRESELECT / FIND;  QUERY PRESELECT / GET

  INSERT:
  • Inserted record not available to the second session.

  UPDATE:
  • Record update affects neither record retrieval nor data visibility in the second session.

  DELETE:
  • Deleted record can no longer be retrieved.

Additional Detail

All of the issues noted below were observed by running the driver program dmo_oper.p in session 1 and the driver program dmo_test.p in session 2.  The testing strategy covered the following scenarios:

  • UPDATE statement:  session 1 holds a pending, uncommitted update before session 2 starts to iterate through buffer records.
  • UPDATE statement:  session 2 has already started to iterate through records when session 1 modifies a record without committing.
  • CREATE statement:  session 1 holds a pending, uncommitted insert before session 2 starts to iterate through buffer records.
  • CREATE statement:  session 2 has already started to iterate through buffer records when session 1 inserts a record without committing.
  • DELETE statement:  session 1 holds a pending, uncommitted delete before session 2 starts to iterate through buffer records.
  • DELETE statement:  session 2 has already started to iterate through buffer records when session 1 deletes a record without committing.

The observed behavior in each scenario is summarized in the table above.

P2J Runtime Implementation

Post-conversion, an application is backed by a relational database that generally has very different transaction isolation behavior than the Progress database.  PostgreSQL, for instance, uses Multi-Version Concurrency Control (MVCC) to ensure that each transaction has a separate copy of the state of the records it uses.  Index changes are not visible to other transactions until the transaction which caused those changes is committed.  Because of the impedance mismatch created by these different approaches to transaction isolation, it is necessary to emulate Progress' isolation quirks in the P2J runtime environment.  This adds an unfortunate level of complexity to the implementation.

Dirty Sharing

The sharing of uncommitted changes (i.e., dirty sharing) is accomplished via tightly synchronized, multi-session access to an embedded, transient, in-memory database resident in the P2J server's JVM.  This database is known as the dirty database.  At the time of writing, P2J uses the H2 Database for this purpose.  In addition, a number of related data structures are maintained outside the dirty database to assist in dirty sharing.

At a high level, updates are made to the dirty database whenever a session makes changes (inserts, updates, deletes) to a permanent database table in the primary database, which in Progress would update one or more indexes in the associated permanent tables, as described above.  These updates to the dirty database are committed immediately in micro-transactions, the scope of which generally matches a single field assignment, ASSIGN statement, or BUFFER-COPY statement in the corresponding, pre-conversion Progress code.  A separate, dirty database instance exists for each primary database used in the application.  No dirty database is maintained for temporary tables, since these are private to a session by nature and thus require no sharing of uncommitted transaction state across sessions.  The data managed by a dirty database is transient, in that it exists only in memory, and is never persisted when the server is stopped.

An instance of DirtyShareManager manages access to each dirty database instance and its supporting data structures.  There is one such object per primary, permanent database instance.  It is this object which executes queries, inserts, and updates against the dirty database when information about uncommitted transaction state in the corresponding, primary database must be shared.  This object lazily creates tables in the dirty database as needed.  Each such table mirrors the structure of its corresponding relation in the primary database, including indexes, except that no index (save the surrogate primary key index) is unique, even if its analog is unique in the primary database.  This enables the P2J runtime to detect potential unique constraint violations safely, without triggering an actual error in the dirty database.  Only information about uncommitted inserts of new records and updates of new or existing records is maintained inside the dirty database.  Information about uncommitted record deletes is maintained in separate data structures, within the dirty share manager implementation.
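
As a simplified illustration of this design (table and column names are hypothetical, and the real implementation works through Hibernate rather than raw JDBC), a dirty mirror table might be used like this:

   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;

   // Sketch of the dirty database idea:  the mirror table indexes emp_num
   // non-uniquely, so potential unique constraint violations can be detected
   // by querying instead of by triggering a real constraint error.
   public class DirtyMirrorSketch
   {
      public static void main(String[] args) throws Exception
      {
         Connection conn = DriverManager.getConnection("jdbc:h2:mem:dirty");
         Statement stmt = conn.createStatement();
         stmt.execute("create table person_dirty (id bigint primary key, " +
                      "emp_num integer)");
         stmt.execute("create index idx_emp_num on person_dirty (emp_num)");

         // An uncommitted change is published in a micro-transaction...
         stmt.execute("insert into person_dirty values (1, 100)");

         // ...and another session checks for a potential unique conflict
         // before assigning emp_num = 100 to its own record (id = 2).
         ResultSet rs = stmt.executeQuery(
            "select count(*) from person_dirty where emp_num = 100 and id <> 2");
         rs.next();
         System.out.println("potential unique conflicts: " + rs.getInt(1));
         conn.close();
      }
   }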

Each user session creates its own instance of DirtyShareContext, one per primary database instance.  This DirtyShareContext instance is that session's access point to the associated DirtyShareManager.  When a session needs information about uncommitted state in other sessions, or needs to share information about its own uncommitted state, it does so using the DirtyShareContext API.  A DefaultDirtyShareContext implementation coordinates with the appropriate DirtyShareManager implementation object for permanent tables.  A NopDirtyShareContext implementation, whose methods generally do nothing and return quickly, is used for temporary tables.  This choice was made to avoid having to write and maintain specialized runtime code specifically for temporary tables in many places, since so much of the runtime is common for permanent and temporary tables.

If a user session connects with a remote database, its DirtyShareContext instance for that database will coordinate with a special implementation of the DirtyShareManager interface:  RemoteDirtyShareManager.  RemoteDirtyShareManager acts as a local proxy to the remote database's DirtyShareManager instance.  All requests are executed over the network to the remote server, where they are serviced by the appropriate dirty share manager.  The remote nature of a database and any dirty sharing done with respect to that database is thus logically transparent to the requesting session.

The Progress update isolation leak quirk/bug is emulated through the cooperation of several classes in the persistence framework.  HQLPreprocessor analyzes each where clause and determines which types of DMOs must have their uncommitted changes leaked when the associated query is executed.  RandomAccessQuery and PreselectQuery use this information to trigger the dirty share manager to update its snapshots of the associated, modified records in the current session via DirtyShareContext.updateSnaphots(List<String>, Persistence), effectively publishing these uncommitted updates to all sessions.

Data Validation

Dirty sharing is tightly coupled with the concept of data validation.  Data validation is the process of vetting the changes to new and existing records, to ensure those changes do not violate constraints imposed by a primary database's schema, before those changes are flushed to the primary database.  It is important to note that only certain, schema-level validation is performed within the persistence runtime;  specifically, only uniqueness and nullability constraints are validated.  Other validation defined within the Progress schema, such as that triggered by validation statements and messages, is transformed to application-level validation during conversion, and is triggered by business logic.

While unique and nullability constraint checking will happen naturally within the backing relational database, it is necessary to perform a discrete validation step before changes are flushed to the database, for several reasons.
Data validation is managed across several classes.
Validation Triggers
DMO validation may be triggered in a number of ways.  When a record buffer creates a new record, modifies an existing record, or is prepared for a query, validation of the buffer's current record is necessary.  Business logic may invoke an explicit validation of a record.

A record buffer may be flushed at various points.  Flushing the buffer only has an effect when there is a record in the buffer, and that record is transient.  A transient record is one that has been created in memory, but has not yet been persisted to the backing Hibernate session.  Flushing a buffer which contains a transient record triggers validation.

The creation of a new DMO per se does not trigger validation.  However, just before a new DMO is instantiated, a check is performed to see whether a transient record remains in the target buffer from a previous operation.  If so, the buffer is flushed, which triggers validation of the existing record as described above.

Another validation trigger is the modification of a buffer's current record by business logic.  A record can be modified in several ways.  The invocation of any of a DMO's setter methods will trigger immediate validation of the affected property and of any unique index(es) which would be changed as a result of the updated property.  A converted ASSIGN statement will create a batch operation which can update multiple properties across multiple buffers.  In this case, multiple setter methods may be invoked, with validation deferred until the batch is closed (see RecordBuffer.endBatch()).  Finally, a converted BUFFER-COPY statement will update multiple (potentially all) of a DMO's properties.  Again, validation of the changes to the destination DMO is deferred until all targeted properties are copied.
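
The deferral semantics can be modeled with a small sketch; the names here are hypothetical (only endBatch() corresponds to a documented RecordBuffer method):

    import java.util.ArrayList;
    import java.util.List;

    /** Toy model of deferred batch validation. */
    class BatchSketch
    {
       private boolean inBatch = false;
       private final List<String> pendingChecks = new ArrayList<>();

       void setProperty(String name, Object value)
       {
          pendingChecks.add(name);
          if (!inBatch)
          {
             validatePending();       // single assignment:  validate at once
          }
       }

       void startBatch() { inBatch = true; }     // converted ASSIGN opens a batch...

       void endBatch()                           // ...and closes it here
       {
          inBatch = false;
          validatePending();          // all deferred checks run now
       }

       private void validatePending()
       {
          // dirty level validation of the accumulated properties would go here
          pendingChecks.clear();
       }
    }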

When a converted query operation (e.g., FIND, FOR EACH, REPEAT PRESELECT, etc.) is about to fetch its result(s), it first flushes its backing buffer(s), triggering validation of any transient records therein.

Business logic may invoke the static RecordBuffer.validate(DataModelObject) method explicitly.  This triggers full validation of the target DMO.

Finally, the rollback of modifications to a DMO's properties or of the deletion of a DMO can trigger validation of the target record.
Levels of Validation
Validation of a DMO may be full or partial.  Full validation checks that every non-nullable property of the DMO has not been set to unknown value, and that every unique constraint on the DMO's backing table has not been violated by the latest changes to the DMO.  Partial validation checks some subset of these nullability and unique constraints.

There are two levels of partial validation:  remaining and dirty.  A dirty level validation checks only those non-nullable properties which have been most recently modified (i.e., since the record was loaded or since the last validation was performed against the current record, whichever occurred more recently), and any unique indexes which have been most recently modified.  A remaining level validation checks all non-nullable properties and unique indexes which have not otherwise been checked since the record was loaded.  A full validation ignores both the dirty and remaining state.  After a full or remaining level validation, subsequent remaining level validation attempts will check nothing until a different record is loaded into the buffer.
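
A minimal sketch of the bookkeeping behind these levels follows; the names are hypothetical (the real state lives in RecordBuffer's ValidationHelper, described below):

    import java.util.HashSet;
    import java.util.Set;

    /** Toy model of dirty/remaining/full validation bookkeeping. */
    class ValidationState
    {
       private final Set<String> allProps;                    // every constrained property
       private final Set<String> dirty = new HashSet<>();     // touched since last check
       private final Set<String> validated = new HashSet<>(); // checked since load

       ValidationState(Set<String> allProps) { this.allProps = allProps; }

       void touch(String prop) { dirty.add(prop); }           // property modified

       Set<String> toCheck(boolean full, boolean remainingOnly)
       {
          if (full)
          {
             return new HashSet<>(allProps);   // ignore dirty and remaining state
          }
          if (remainingOnly)
          {
             Set<String> result = new HashSet<>(allProps);
             result.removeAll(validated);      // only what was never checked
             return result;
          }
          return new HashSet<>(dirty);         // dirty level:  only recent changes
       }

       void markChecked(Set<String> props)
       {
          validated.addAll(props);
          dirty.removeAll(props);
       }
    }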

Buffer flushes trigger a remaining level validation.  DMO property modifications trigger a dirty level validation.  Explicit validations by business logic, and rollbacks of record deletions, trigger full validations.  The ValidationHelper inner class of RecordBuffer manages the state which tracks which properties and unique constraints have been validated and which still require validation.

While this distinction of partial validation levels may appear arbitrarily complex, it is necessary in order to mimic the way in which the Progress database manages its index changes.  When 4GL code changes fields in a database record within a transaction, only those mandatory fields and unique indexes which are directly affected are updated;  the other fields have not been changed, so they trigger no constraint checks.  It is especially important to mimic this behavior when considering newly created records, whose fields may have default values which would violate unique constraints, if those indexes were validated too early.
The Validation Process
When the RecordBuffer determines a particular level of validation is necessary, it uses the ValidationHelper to prepare a ValidationData instance.  This object contains metadata about the validation to be performed.  ValidationHelper passes this object to an instance of DMOValidator.  One instance of DMOValidator is created for each DMO type.  These instances are immutable and stateless (with respect to any discrete validation operation, at least).  As such, they can be shared across user contexts and are thus created lazily and cached in a static map by DMO type.  The DMOValidator interrogates the ValidationData object to determine which nullability and unique constraints must be checked.

Nullability constraints are checked first.  Any non-nullable DMO property which contains unknown value causes a ValidationException to be thrown, which is caught by the calling RecordBuffer and reported to the user as an error.

Next, unique constraints are checked.  For each unique index which would be modified if the proposed property changes were to be permitted, a query is executed against the primary database to check whether the values of the target DMO's properties which participate in that index clash with any record already in the backing table.  For permanent tables, a similar query is executed against the dirty database.  If either query produces a clashing record, a ValidationException is thrown.  If all checks pass, the change is immediately flushed to the database.  This flushing and checking is synchronized at an index level, so that only one user context can be updating an index at a time.  It is also at this stage (for permanent tables) that index changes are shared with other sessions via the dirty share manager.
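
The unique constraint check reduces to a count query per affected index.  The following sketch shows the general shape for a single-property index; the DMO type, property names, and method are invented for the example (the real DMOValidator derives its queries from ValidationData):

    import org.hibernate.Session;

    /** Hypothetical shape of the clash check for one unique index over a
     *  custNum property. */
    class UniqueCheckSketch
    {
       static boolean violatesUnique(Session session, int custNum, long selfId)
       {
          Long hits = (Long) session
             .createQuery("select count(*) from Customer c" +
                          " where c.custNum = :num and c.id <> :self")
             .setParameter("num", custNum)
             .setParameter("self", selfId)
             .uniqueResult();

          // for permanent tables, the same check is repeated against the
          // dirty database;  any hit raises a ValidationException before
          // the change is flushed
          return hits != null && hits.longValue() > 0;
       }
    }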

Runtime Support for Natural Joins

Natural joins are detected during schema conversion.  This analysis of natural joins in Progress source code (as supported by the Progress schema) results in the creation of foreign relations between tables in the P2J environment.  These are integrated into the database schema as foreign keys, and into the Java Data Model Objects (DMOs) as embedded references from one DMO to another DMO or to a collection of DMOs, as follows:

Relation:  many-to-one ("many" end represents the "foreign" end of the relation)
DMO Instance Member:  the DMO on the "many" end stores a direct reference to the DMO at the "one" end of the relation

Relation:  one-to-many ("many" end represents the "foreign" end of the relation)
DMO Instance Member:  the DMO on the "one" end stores a Set of the DMOs at the "many" end of the relation

Relation:  one-to-one (a conversion hint determines the "foreign" end of the relation)
DMO Instance Member:  the DMO on either end stores a direct reference to the DMO at the opposite end of the relation

In each case, the table representing the "foreign" end of the relation defines a foreign key to the related table's primary key.  The Progress fields which previously defined the relation, while now redundant, are not removed from the converted tables and DMOs, as they generally will still be referenced (as DMO properties) within the converted application code.  At the same time, we prefer to use the foreign key to perform runtime joins where possible, as this generally is more efficient than using the legacy keys, which often are composites of multiple fields.  These conflicting needs present two issues which must be resolved at runtime (and to some extent at code conversion time):
  1. the Progress <record> of <table> join construct must be supported by the use of foreign keys;
  2. an update to any property or properties of a "foreign" record which previously defined the join between two Progress tables must cause the foreign key to be updated appropriately.

Using Foreign Keys for Runtime Joins

All P2JQuery implementations support the use of an inverse DMO, either at query construction, or when adding a component to a multi-table query.  The inverse parameter represents the DMO instance on the other end of the foreign relation which defines the natural join.  When such a parameter is provided to a query, the persistence runtime does the necessary work behind the scenes to execute an ANSI-style join between the necessary tables using foreign keys.  Calling code needs to provide the inverse parameter to a query constructor or to an addComponent() method variant, but the natural join query support is otherwise transparent to application code.
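
A toy model of the idea follows; the class, method, and generated HQL are invented and do not reproduce the real P2JQuery constructors or addComponent() signatures:

    /** Toy model of the inverse DMO idea. */
    class JoinSketch
    {
       static String buildJoin(String table, String inverseProperty, Long inverseId)
       {
          // with an inverse DMO supplied, the runtime can emit an ANSI-style
          // join over the generated foreign key instead of re-deriving the
          // legacy (often composite) key fields
          return "from " + table + " t join t." + inverseProperty +
                 " i where i.id = " + inverseId;
       }

       public static void main(String[] args)
       {
          // FOR EACH order OF customer  -->  join via the order's foreign key
          System.out.println(buildJoin("Order", "customer", 42L));
       }
    }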

Keeping Foreign Keys in Synch

A converted application does not update foreign keys directly, since there was no awareness of their existence in pre-conversion business logic.  However, an application often will contain code which creates, deletes, or updates records which contain foreign keys in the post-conversion environment.  In order to maintain relational integrity, the persistence runtime must track all changes to a record which indirectly impact its foreign key, and reflect those changes by updating the foreign key accordingly.  This is the responsibility of the RecordBuffer class.

Since the deletion of a record eliminates its foreign key along with it, this case requires no management.  The creation of a new record also does not trigger foreign key synchronization, since all properties are initialized to default values.  However, any update to a property which represents a field in the legacy index defining the implicit foreign relation to another record in Progress will trigger foreign key synchronization in the P2J runtime.  The list of each such property (if any) in a table is stored in the dmo_index.xml file for the application, specifically in the path dmo-index/schema/class/foreign/property/.  This document is created during application conversion and is read by the persistence framework at P2J server initialization.

The AssociationSyncher class does the grunt work of foreign key synchronization.  At defined points (after a batch assignment of property values, or upon the assignment of a single property when not in batch assign mode), this class queries the database for the foreign record which matches the new property value (or set of new values), and updates the appropriate instance member in the current buffer record with whatever DMO, if any, is found.  When this change is flushed to the database, Hibernate automatically updates the associated foreign key.
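
Schematically, the synchronization step looks like the following; the DMO types, property names, and query are invented for this sketch:

    import org.hibernate.Session;

    /** Schematic of one foreign key synchronization step. */
    class SyncSketch
    {
       /** Minimal stand-ins for converted DMO types. */
       static class Customer { }
       static class Order
       {
          Integer custNum;            // legacy field which defined the relation
          Customer customer;          // Hibernate-mapped association member
       }

       static void syncForeignKey(Order order, Session session)
       {
          // after the legacy custNum field changes, find the customer
          // record which now matches it...
          Customer match = (Customer) session
             .createQuery("from Customer c where c.custNum = :num")
             .setParameter("num", order.custNum)
             .uniqueResult();

          // ...and repoint the association;  Hibernate updates the backing
          // foreign key column when this change is flushed
          order.customer = match;     // may be null if nothing matches
       }
    }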

Special Where Clause Considerations

There is an inherent mismatch between P2J's database access architecture and that of Progress, in that the Progress database server and 4GL client are very tightly coupled, while the P2J server (i.e., the database "client") and the database backend (i.e., the database "server") communicate only through the JDBC driver.  Progress' architecture allows the Progress database server to execute user defined functions and many built-ins and operators within the context of the 4GL client, which provides great flexibility in what may be embedded within a where clause.  In addition to logical and arithmetic operators, ternary constructs and built-in functions, it is possible to embed user-defined functions which will actually execute in the client's context.  This last point is important to take into consideration, to the extent that the execution of a user defined function from within a where clause may have side effects which impact the current state of the client process.

The P2J architecture pushes as much of the work of restricting the records included in a result set as possible to the database server.  A converted application's HQL where clauses include basic functionality supported directly by the backend, such as ternary clauses, logical operators, table and field references and certain standard, built-in functions (note however that the built-in functions supported by HQL are a different subset than those supported by Progress, and may differ in their implementations).  In addition, many Progress-specific built-in functions and specialized operator implementations are supported as custom, database functions which are mapped to their P2J Java implementations.  Currently, this is achieved with PostgreSQL using PL/Java, though the same concept could be applied to other backends which support Java as a function definition language.  For details on this implementation, please refer to the com.goldencode.p2j.persist.pl package description.
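
For a flavor of the approach (this is not the actual P2J implementation), a PL/Java function is simply a public static Java method, which is then registered with PostgreSQL via a CREATE FUNCTION statement using LANGUAGE java:

    /** Sketch of how a 4GL built-in might be exposed through PL/Java;  the
     *  class, method, and semantics shown are invented for the example. */
    public class BuiltinsSketch
    {
       /** Rough analog of the 4GL SUBSTRING built-in (1-based start). */
       public static String substring(String text, int start, int length)
       {
          if (text == null)
          {
             return null;             // unknown value propagates
          }
          int from = Math.max(0, start - 1);
          int to   = Math.min(text.length(), from + Math.max(0, length));
          return (from >= text.length()) ? "" : text.substring(from, to);
       }
    }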

Client-Side Where Clause Processing

To date there is no reliable way to support converted, user defined functions directly in the P2J backend database server.  This is simply because server-side database functions in the P2J architecture operate within the context of a separate JVM executing within the backend database server process, NOT within the client's context in the P2J server JVM.  Since the execution of a converted, user defined function may have side effects to the state of the invoking client context, or may require resources not available within the context of the database server, these functions must be executed with the proper client context within the P2J server.  Note that this is an inefficient (yet necessary) implementation, because each record to be checked for inclusion in the result set must be retrieved from the backend database, hydrated into a DMO instance, and checked against the criteria set by the client-side component(s) of the where clause.

We refer to the process of retrieving records from the database server and checking them for result set inclusion within the P2J server as client-side where clause processing.  The term client-side can seem misleading, because the processing actually occurs in the P2J server.  However, from the perspective of a database operation, which takes place between the database backend server and the P2J server middle tier, the P2J server actually acts as a database client.

Fortunately, the embedding of user defined functions within a where clause seems to be an esoteric use case.  Also, we know of no built-in function in Progress which is usable within a where clause and which is not immutable, that is, which would have any impact on the state of the client context, or which would require access to any resource which would be unavailable from the backend database server process.  So currently all built-in Progress functions are supported in P2J as custom, server-side functions.  The occurrence of client-side where clause processing therefore is likely to be rare.  However, the primitive data wrapper types used in the P2J built-in function implementations have specialized features which, if used, currently are not fully supported in the server-side implementation.  Expressions which rely upon these features must be converted to client-side where expressions.  See the following table for details.

Wrapper Type:  com.goldencode.p2j.util.character
Feature:  Case sensitivity
Notes:  Case sensitivity is an attribute of the P2J character class.  However, server-side database functions currently use java.lang.String as the data type for parameters and return types where a text value is needed.  Thus, all text comparison operations which occur internally within the database function will be case sensitive.  For case-insensitive comparisons, this issue usually can be circumvented by uppercasing text parameters in the HQL expression as necessary, before they are passed to the database function.  However, functions whose work relies upon querying or setting this attribute internal to the function implementation will not necessarily behave correctly.  Functions must therefore be reviewed on a case-by-case basis to determine whether such a dependency exists.

Wrapper Type:  com.goldencode.p2j.util.decimal
Feature:  Precision
Notes:  The P2J implementation of decimal uses a default precision value of 10 places after the decimal point.  However, server-side database functions currently use java.lang.Double as the data type for parameter and return types where a non-integral, numeric value is needed.  Thus, database function parameters which use a non-default precision setting currently disqualify a particular function or operator instance from converting to a server-side database function.  Likewise, database functions which set the precision value to a non-default value cannot be converted to server-side functions.

Wrapper Type:  com.goldencode.p2j.util.date
Feature:  Timezone
Notes:  The P2J implementation of date is timezone-neutral by default.  If a where clause expression requires a specific timezone setting for a function parameter or for an operand of a logical or arithmetic operation, the subexpression containing that function call or operation currently cannot be converted to a server-side database function.

To support client-side where clause processing, the persist package includes the WhereExpression class.  Each query type in the persist package allows one WhereExpression instance (or, for some types, several) to be registered with the query.  Where clauses in the pre-conversion application are analyzed during conversion.  All components of the where clause which can be executed at the database server are distilled into an HQL where clause.  Those which require client-side where clause processing result in an anonymous WhereExpression subclass being defined as an instance variable within the converted program's Java class.  This conversion implementation is designed to minimize the number of records brought back to the P2J server for client-side testing, by maximizing the restriction which occurs at the server.  However, this is not always possible;  in the worst case, where no HQL expression can be distilled from the original where clause, every record in the table must be returned to the P2J server for client-side testing.
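
The following toy stand-in illustrates the shape of the client-side evaluation step; the class and method here are invented and do not reproduce the real WhereExpression API:

    /** Toy stand-in for a client-side where clause test. */
    abstract class ClientSideTest<T>
    {
       abstract boolean accept(T candidate);   // runs in the P2J server ("client")
    }

    // In a converted program, the non-distillable remainder of a where
    // clause would become an anonymous subclass held in an instance
    // variable, conceptually:
    //
    //    private final ClientSideTest<Customer> test = new ClientSideTest<Customer>()
    //    {
    //       boolean accept(Customer c)
    //       {
    //          // a converted user defined function runs here, with full
    //          // access to the invoking client's context
    //          return isPreferred(c);   // hypothetical converted UDF
    //       }
    //    };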

Valid Date Ranges

The SQL standards are not particularly stringent when it comes to database support for dates.  The concept of a date data type is implemented differently across database implementations, and individual implementations do not always map well to Progress' implementation of the date data type.  Some SQL databases have time-agnostic date implementations;  others encapsulate date information within a timestamp data type.  Timezones are not handled uniformly across implementations, and the issue is further complicated by the JDBC driver layer, since java.sql.Date is actually a time and date value rolled into one, including a particularly confusing implementation of timezones, which may have implications for the use of dates in the persistence runtime.  The most obvious problem from a runtime perspective is the mismatch of valid date ranges supported by different database vendors, which creates the possibility that we will encounter dates within a query's where clause which are supported by Progress (and Java), but which are out of bounds (i.e., too early or too late) for a particular backend.

If out of bounds date values are permitted to flow through the query to the backend, the result will most likely be an error raised in the backend, which will be reported by the JDBC driver as some form of SQLException.  Therefore, we cannot allow out of bounds dates to be sent to the backend.  This problem may occur both with dates which are used as query substitution parameters, as well as date literals which are embedded directly in the where clause text.  The former issue has been addressed by clipping query substitution parameter dates which are invalid (i.e., too early or too late) with regard to the range of dates supported by the backend database's date type.  The latter issue remains unaddressed;  in this case, a different approach will be necessary, probably involving some advanced analysis by the HQLPreprocessor.

Hibernate's database dialect architecture is surprisingly silent on the issue of differing date implementations across databases.  We therefore extend Hibernate's dialect idiom to add an awareness of the low and high date supported by a particular database dialect.  When we encounter a query substitution parameter which is of type com.goldencode.p2j.util.date, we check whether its value is outside the range supported by the backend implementation.  If so, we reset the value to the closest possible date supported by the backend.  A debug message (level FINE) is logged whenever such a substitution is made.
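
A minimal sketch of the clipping step follows; the class and the dialect bound parameters are assumptions, not the real P2JDialect API:

    import java.util.Date;

    /** Minimal sketch of substitution parameter clipping. */
    class DateClipSketch
    {
       static Date clip(Date value, Date dialectLow, Date dialectHigh)
       {
          if (value.before(dialectLow))
          {
             return dialectLow;    // too early:  reset to the backend's low bound
          }
          if (value.after(dialectHigh))
          {
             return dialectHigh;   // too late:  reset to the backend's high bound
          }
          return value;            // in range:  pass through unchanged
       }
    }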

In the case of PostgreSQL, for instance, a date earlier than January 1, 4713 B.C. (the low end of PG's date range) is reset to January 1, 4713 B.C.  PG's high end happens to coincide with that of Progress, but the clipping idea is the same on the high end.  Note that we use the GMT timezone when defining our concrete implementations of com.goldencode.p2j.persist.dialect.P2JDialect, to match the implementation of com.goldencode.p2j.util.date.

This solution works well for queries where the restriction test is a range match (e.g., ...where my_date >= ? and my_date <= ?).  However, it may cause a change in behavior for equality and inequality matches (e.g., ...where my_date = ?).  We chose to log the message at level FINE, rather than WARNING.  In cases where the substitution is necessary, it may occur quite often, particularly for dynamic queries executed within a loop.  Thus, the cost of composing and logging the message repeatedly would become significant in a production environment.  Therefore, it should be noted that this higher level of log filtering may mask an unintended change in query behavior, which may only be detectable by its subtle side effects (missing records, too many records found, etc.).  We deemed this risk to be minimal, since it would be unusual for a business application to rely upon date processing outside the date ranges supported by most modern database implementations.
Copyright (c) 2004-2017, Golden Code Development Corporation.
ALL RIGHTS RESERVED. Use is subject to license terms.