Chapter 7 Schema Conversion - FWD - Golden Code Redmine

Schema Conversion¶

Schema Conversion

Inputs¶

In the Progress 4GL, the structure, naming, data types and other database or temp-table configuration is stored in database schema definitions. Temp-tables (and work-tables) are defined in the Progress 4GL source code itself (as DEFINE TEMP-TABLE or DEFINE WORK-TABLE language statements). Each database accessed has a schema that is defined as part of the database's internal configuration and data structures. Using the Progress Data Dictionary, each database's schema definition can be exported into a .df file.

These schema definitions must be converted into a form that can be used in a semantically identical way in the converted system.

Transformation Overview¶

The following diagram illustrates the high level inputs and outputs of the schema conversion process:

On the left side are the possible sources of schema definition input. In each case, the database-specific portions of the definitions are parsed and transformed into a tree-structured representation (an abstract syntax tree or AST). This AST is known as the Progress to Object (P2O) representation (or mapping). It is important to note that part of any schema definition can include configuration that is specific to the Progress 4GL user-interface (e.g. help strings, format strings, validation expressions...). The user-interface configuration has no impact on the schema conversion and so it is not included in the P2O mappings since the P2O mappings are only used for database conversion. The user-interface configuration is used only for conversion of the 4GL source code, see the Code Conversion section below.

On the right side of the diagram are the outputs that are generated by the schema conversion process.

Outputs¶

Relational databases generally use Structured Query Language (SQL) as the tool for the specification of the structure and configuration of the database as well as for data access/manipulation.

The SQL syntax to define the structure and configuration of the database is handled by a subset of SQL known as Data Definition Language (DDL). This is the relational database counterpart to the Progress 4GL schema definitions. By executing DDL, a database instance can be created and all of the tables, indexes, constraints and sequences can be created inside that database. This ensures that the database's structure, naming, types, relations, constraints and other configuration is properly setup.

Access and modification of the data in a database is done with a subset of SQL syntax that is referred to as the Data Manipulation Language (DML). In the converted system, the DML is generated and executed at runtime by a combination of FWD code, a 3 ^rd party technology called Hibernate, a 3 ^rd party Java Database Control (JDBC) driver and the J2SE platform's JDBC layer. The FWD layer handles providing a set of fully compatible Progress 4GL features. Hibernate is used to handle the translation of database features into Java objects, as well as to hide the differences between relational databases behind database-specific “dialects”. By representing the database features as Java objects, the use and manipulation of the database is made much easier. The JDBC layer is responsible for the low-level connection and transport to communicate with the database.

The conversion must create the Java classes used to represent the database features. These Java objects are called Data Model Objects (DMOs). There is one DMO per table (permanent database table or temp-table). Each DMO is made of two parts. It has a Java interface that defines an application programming interface (API) which is an abstract definition of the methods available for use on a given table. It also has a Java implementation class where the interface is implemented as real Java source code. The converted business logic uses objects of the DMO interface type which are actually instances of the DMO implementation class.

Hibernate handles the creation, update and deletion of a DMO instance for each row in the given table associated with that DMO. To do this job, Hibernate must be configured to define the mapping of the database to the Java objects (DMOs) used by the application. There is a Hibernate mapping file created for each DMO, which provides the configuration needed by Hibernate to associate each database column with its proper type and property in the DMO instance. Other configuration is also included to transparently handle the converted replacement for Progress extent fields.

The FWD runtime loads Hibernate when the FWD application server initializes. It loads Hibernate based on a list of DMO classes in a file called the DMO Index. Once Hibernate is loaded, FWD gets much of its configuration data by inspecting the Hibernate environment. Where there are requirements for features that are not supported by Hibernate and where the Hibernate configuration is insufficient, additional configuration data is read from the DMO Index.

Details¶

The DDL for permanent tables is created automatically during the M0 part of the conversion process, or it can be run independently, by hand, as explained in the Schema Conversion chapter. The DDL resides in two files, one with the core structure DDL and one with the index DDL. During the M0 phase, the DDL is generated for all dialects specified in the p2j.cfg.xml file, in the namespace/ddl-dialects node.

To generate the core structure DDL, the Hibernate DDL generation tools are used to create DDL for the specific dialect of SQL. Depending on the dialect, the Hibernate mapping file for each DMO may need to be preprocessed before the DMO is registered with Hibernate. For the H2 dialect, FWD will need to automatically add a computed column for each indexed, character (text) column. This is required because FWD requires that indexes containing text columns operate on right-trimmed (and possibly uppercased) string data, in order to support Progress-like selection and sorting behavior. Some databases (such as PostgreSQL) support the creation of indexes using built-in functions, like upper and rtrim. When such a database is in use, these functions are embedded directly within the index creation DDL. However, other databases (such as H2) do not support the use of functions within an index definition, so computed columns must be used instead, when a text column is involved. Once all the DMOs are registered, DDL creation is delegated to Hibernate's DDL generation tool.

The file containing the core structure DDL will first drop all tables in the database; after this, it will run any dialect-specific statements (e.g., setting the collation and installing the UDF for the H2 dialect), then it will create the actual tables (CREATE TABLE statements). These CREATE TABLE statements define the tables of the database, the columns in each table, the data types of the columns and configuration of the tables and columns.

Progress supports extent fields which are database columns that represent an array of multiple elements, where each element can be referenced directly via an integer index. Not all relational databases support array columns. As part of the conversion of any Progress table that has extent fields, separate tables are created to store the elements of the array. Those separate tables are related back to the “main” table using a one to many relationship. Each row of the main table has a unique id. Each row of the dependent array tables has a foreign key constraint which references back to the primary key of the main table (which is the unique id). The DDL defines the array tables and their indexes/constraints. This DDL consists of CREATE TABLE statements, CREATE INDEX statements and ALTER TABLE statements (to create the foreign key constraints). This DDL is included in the same DDL file that drops and creates the core structure. The DDL statements for defining the sequences, if any, are also appended at the end of this file.

Any DDL required specifically for the particular database dialect in use is emitted at the end of this file. At the time of this writing only H2 needs custom DDL - this consists of ALTER TABLE statements to set the formula for the special computed columns noted above. When using the H2 dialect, a computed column is defined for each indexed, character column; the computed column is then referenced by index definition DDL.

The core structure DDL does not include the CREATE DATABASE statement. This allows the details of the specific database instance to be controlled separate from the definition of the core structure. The CREATE DATABASE statement is something done as part of the data import process, just before the core structure DDL is executed.

The second part of the permanent tables DDL is the index DDL. The index DDL is created and executed during the data import process. That DDL is not stored in an output file, but is only created on a transient basis (in memory). While it would have been possible to generate the index DDL along with the core structure DDL, this was found to substantially lengthen the time needed for the import process. The issue is that the indexes must be maintained as each record is inserted (which takes time) but most indexes are not needed (and don't help) those same inserts. So there is a cost to each index, without any benefit. To resolve this problem, the CREATE INDEX statements are generated and executed during the import process as needed. As each table is processed, first the unique indexes are created. Then the data is imported to that table. The unique indexes are needed to ensure that the database can enforce the unique constraints on the table. Otherwise there would potentially be invalid rows in the table. Even though Progress has support for unique indexes, in practice it has been found that it is possible to have conflicting records (more than one record that matches the same unique criteria). Such records are rejected when inserted as rows during the import process. The import process handles the insertion of the extent field data into the associated rationalized array tables at this same time. Once all the data is imported, the remaining indexes for that table are created. Each table is processed in this manner in turn until all data has been imported.

The DDL for temp-tables is created dynamically at runtime. The FWD server using the DMO Index and Hibernate mapping files to load and configure Hibernate. FWD then adds index definitions based on data stored in the DMO Index and uses Hibernate to generate the temp-table DDL statements. Those statements are post-processed and then submitted to the H2 embedded Java database for both the _temp database (where temp-tables reside) as well as the “dirty database” which is an internal cache of edits that are visible across sessions in Progress. The H2 embedded database exists inside the same Java process as the FWD application server. Since all of this DDL is generated on the fly, it is never stored as output of the conversion process. That is why it does not appear on the diagram.

The DMO index is a file named dmo/dmo_index.xml which is stored in the root package of the converted Java code. This file is built into the application jar file and is loaded from there at runtime by FWD. This file is only used for runtime decisions and processing, and it is only used by FWD. The first function of this file is to provide a list of all tables (both permanent and temporary) that must be configured in Hibernate. When FWD loads, it must initialize and configure Hibernate. The list of tables loaded in Hibernate is based on the DMO index. Other Progress 4GL features that cannot be configured in Hibernate or directly in the database, may depend upon configuration data stored in this file. Database column case-sensitivity, unusual caching behavior associated with unique constraints and implicit/advisory foreign key relationships are all configured via data in this file. These features are implemented for both permanent tables and temp-tables. The unique constraint data, while present for permanent tables is no longer used since that data can be dynamically read from the Hibernate and/or database metadata.

The temp-table sections of the DMO index have additional definitions for all indexes. For permanent tables, the index definitions can be read from the database metadata. But for temp-tables, the DDL to create the indexes and to configure Hibernate must be obtained from some persistent storage available at runtime, since temp-tables have no other persistent configuration. Unfortunately, the Hibernate mapping documents do not properly support the necessary index configuration features. For example, Hibernate has index objects that can support multi-column indexes (which are pervasive in Progress 4GL) but the order of the columns in an index cannot be specified or controlled via the Hibernate mapping documents.

Hibernate is used for 3 purposes in FWD:

the first usage is when the permanent table DDL is generated (see above);
the second usage is for data import where each row of data is read from the export dump (.d files) and then set into the associated DMO for that table. Hibernate is used to insert that row (via the DMO) into the database. At this stage the current values of sequences are also set using Hibernate;
the third usage of Hibernate is at runtime for the FWD server.

The DDL generation and data import use cases are only for permanent tables. The FWD runtime Hibernate usage is for both permanent tables and temp-tables. Since the Hibernate mapping documents are used to load and configure Hibernate, all three of these processes are dependent upon the Hibernate mappings being properly generated. As noted above, due to limitations with multi-column indexes, the Hibernate mapping documents don't contain index configurations. This is true for the Hibernate mappings for both permanent tables and temp-tables. The FWD conversion process reads the table configurations, field configurations and data associated with extent fields from the P2O files. That data is then used to generate the Hibernate mapping documents. That same data is used to define the interface for each DMO and to write the implementation file. All three of these outputs are highly interrelated. The interface defines the getters and setters accessible in an instance of the DMO. The interface is found off the package root directory in a file named dmo/<database_name>/<table_name_as_a_Java_class>.java. The database name for a permanent database is read from the FWD project configuration. For temp-tables, _temp is used as the database name. The file is a valid Java interface and includes JavaDoc that describes each property and the original source field in Progress from which the property was derived. The implementation file is a Java class that implements that specific DMO interface. It is found in a file named dmo/<database_name>/impl/<table_name_as_a_Java_class>Impl.java (the same database and table naming rules apply). Each method of the interface is implemented with real backing code including the proper initialization of each property. The Hibernate mapping file for that DMO is written in a file dmo/<database_name>/impl/<table_name_as_a_Java_class>Impl.hbm.xml. That mapping file instructs Hibernate in how exactly to map each property in the DMO interface into a specific table's column. Thus it is a logical description of the DMO interface which is used as the configuration data for Hibernate.

Please see Part 3 - Schema Conversion for full details on the input and transformations that are supported.

Project

General

Profile

FWD

Wiki

Schema Conversion¶

Inputs¶

Transformation Overview¶

Outputs¶

Details¶