Chapter 10 Data Definition Language - FWD - Golden Code Redmine

Data Definition Language (DDL)¶

Data Definition Language (DDL)

During M0 part of the conversion process, the .df files with the schema for the permanent tables are used to generate the DMO files - the DMO interface, the Hibernate mapping file and the DMO implementation class. At the end of the M0 part and using the information in both the .df files and in the generated DMOs, there will be generated index, table and dialect specific DDL (for each of the dialects configured in the p2j.cfg.xml file). This DDL can be used to create an empty database in which the data can be imported or identify and extract DDL for permanent tables added to the 4GL schema after the first production release of the converted application.

Near the end of M0 part, the DDL is generated by executing the schema/generate_ddl rule set and using the settings in the p2j.cfg.xml file. At the very of M0 part, the sequence DDL are appended to output file by schema/generate_seq_ddl. Also, the generate_ddl and generate_seq_ddl rule sets can be executed from command line at any time, after at least one M0 conversion run was finished successfully. For more information about which commands can be used to generate the DDL, please see the Data Import section of the Data Migration chapter in the FWD Conversion Handbook.

Currently, only postgresql and h2 dialects are supported for permanent tables and DDL can be generated only for these dialects. To generate DDL for one or both dialects, each namespace referring a physical database defined in the p2j.cfg.xml file must be configured, to specify the required dialects; please see the Global Configuration section of the Project Setup chapter in the FWD Conversion Handbook for more details on how to configure the p2j.cfg.xml file.

As noted above, the DDL generation is done using as input the generated DMOs and definition data in the .df files. This is done in 3 steps: generate index and identity sequence DDL, generate table DDL and generate other dialect-specific DDL; most of the index and table DDL is generated using Hibernate's schema and index export tools. From this point on, we will detail each step explaining how DDL is generated for permanent databases, using the postgresql and h2 dialects. Note that no DDL is generated for the temporary tables as they are created at runtime, when the temporary table is first used.

Generating Index DDL¶

The index data is collected from the ADD INDEX clauses in 4GL schema file. The converted index name is built by appending the idx__ prefix to its converted, compatible SQL name. If the schema hints file doesn't have a drop annotation for this index, it will be processed and DDL will be emitted to the schema_index_{schema_name}_{dialect}.sql file in the $P2J_HOME/ddl/ folder. For each ADD INDEX clause, two SQL statements will be emitted: a DROP statement to remove the index (in case if it already exists) followed by a CREATE INDEX statement to create the actual index. The generated DDL will always group the statements by each table: first there will be a group with the DROP statements which will remove all indexes created for this table, followed by a group of CREATE INDEX statements to create the indexes for that table.

Although the ADD INDEX clause supports several parameters in 4GL, during DDL generation only the UNIQUE parameter is of concern, to mark an index unique or not. If an index is determined to not be unique then the id column will be automatically added to the index, as the last column in the list. The PRIMARY parameter is not needed for index DDL generation as the converted tables will have their primary key set on the id column of the converted table. Any UNIQUE PRIMARY index defined in the 4GL schema will be converted to an UNIQUE index.

In the 4GL schema file, the index components are defined using the INDEX FIELD command, following to the ADD INDEX clause. Each INDEX FIELD will have as parameter the 4GL field name followed by the direction of the sort: ASC for ascending and DESC for descending. By default, if no ASC or DESC parameter is present, all index fields are sorted ascending.

PostgreSQL Dialect¶

For each defined 4GL index, following DDL statements are emitted, to drop and then create the index:

DROP INDEX <index_name>;
CREATE [UNIQUE] INDEX <index_name> ON <table_name> ({ [ <column_name> | <formula> ] [ASC | DESC] },...);

In the above statements, the index_name, table_name and column_name always represent the converted name of the index, table and column. If the column is a character column, a formula will be emitted instead: this is needed to comply with the 4GL case-sensitivity and white space trimming rules. When a character column is encountered, the formula will be emitted using one of the following rules:

if this is a case-sensitive character column, the column's formula will be set to:

RTRIM(<column_name>, E' \t\n\r')

if this is a case-insensitive character column, the column's formula will be set to:

UPPER(RTRIM(<column_name>, E' \t\n\r'))

H2 Dialect¶

For each defined 4GL index, following DDL statements are emitted, to first drop and than create the index:

DROP INDEX <index_name>;
CREATE [UNIQUE] INDEX <index_name> ON <table_name> ({ <column_name> [ASC | DESC] },...);

In the above statements, the index_name, table_name and column_name always represent the converted name of the index, table and column 4GL names. If the column is a character column, it will always be prefixed with the computed column string, the two underlines, as in __<column_name>.

When converting an index, the only concern are the character columns which appear in the index definition. These will be collected and, when table DDL is generated, a special computed column will be emitted. With this in mind, whenever character columns are found to be part of the index, the index field will be altered so that it refers the computed column and not the original column. This is done by adding a special prefix to the column name: __ (two underlines), the computed column identifier; the formula for this column will be set with ALTER TABLE statements, appended at the end of the table DDL file. Please see the next section, Generating Table DDL, about how ALTER TABLE statement looks.

Identity Sequence DDL¶

After the index DDL is generated, at the end of the file following two statements will be emitted:

DROP SEQUENCE p2j_id_generator_sequence;
CREATE SEQUENCE p2j_id_generator_sequence START WITH <start_value>;

These statements create a generator sequence which will be used to generate the ID value when a new record is added to a table. The name for this sequence is the reserved p2j_id_generator_sequence. When the DDL is generated, if no start value is given for the ID generator sequence, it will default to 10000. When first importing the data, the value for this sequence will be set to the ID for the last inserted record (the maximum ID value, from all records in all tables). Note that at the time of this writing the, these two statements are emitted the same for both both H2 and PostgreSQL dialects, as they don't have dialect-specific structure.

Generating Table DDL¶

The table data is collected from the ADD TABLE entries in the 4GL schema file. If the schema hints file doesn't have a drop annotation for this table, it will be processed and DDL will be emitted to the schema_table_{schema_name}_{dialect}.sql file in the $P2J_HOME/ddl/ folder.

For each ADD TABLE clause, following SQL statements will be emitted: one or more DROP statements to remove the table and any extent tables (in case they already exist) and CREATE TABLE statement(s) to create the main table and any associated extent tables. For each table, the SQL table name is built by converting the 4GL table name to the compatible SQL format; in case of tables associated with extent fields, their name will be built from the SQL table name followed by the __<extent_size> suffix. Each extent table will always contain all the extent fields of the same size, from a certain table.

From each ADD TABLE definition, only the 4GL table name is needed during DDL generation. The table columns are converted using as source the ADD FIELD definition which follow the ADD TABLE definition. For each field, only the AS data-type, CASE-SENSITIVE,DECIMALS, EXTENT, INITIAL and MANDATORY clauses are used, to determine the SQL data type, to mark the case sensitivity state for character columns, the number of decimal digits for decimal columns, the extent size (for extent fields), the default value and the not-null state of this field. The impact of these clauses is explained in the next table:

ADD FIELD Clause	DDL Equivalent	Details
`AS data-type`	`data_type`	The converted SQL data type, based on the H2 dialect. See the ... TODO: book for more details on how the 4GL data type is converted to their SQL data type, based on the dialect.
`CASE-SENSITIVE`	n/a	Marks a certain character column to be case sensitive. Affects only the DMO implementation class - when a new DMO record is created, all case sensitive fields are marked accordingly.
`DECIMALS`	`numeric(<size>, <decimals>)`	For all decimal type fields, the `DECIMALS` clause provides the number of the decimal digits. In DDL, it is emitted as the second parameter for the `numeric` data type.
`EXTENT`	extent table	All extent fields with the same size are grouped and emitted in the same extent table. See bellow for more details.
`INITIAL`	n/a	The column's default value is not emitted at DDL, but at the DMO class implementation. Whenever a new DMO is created, all of its fields are initialized to the default values, based on the `INITIAL` clauses, if they exist.
`MANDATORY`	`NOT NULL`	Optional clause, marks a certain field as mandatory.

Table 1. TODO:

The generated DDL structure of a table without extent fields and without indexed character fields looks like this:

CREATE TABLE <table_name>
{
   id <id_type> NOT NULL,
   <column_name> <data_type> [NOT NULL],
   ...
   PRIMARY KEY (id)
}

Here, the table_name is the converted table name, data_type is the SQL data type and column_name is the converted column name. Each converted table will have as primary key the id column, its name being also a reserved name. The id_type represents the dialect-specific type of the ID field. For postgresql dialect, this type is int8 and for h2 dialect, this type is bigint.

In cases when there are indexed character fields, the

For the DROP TABLE statement, there is a dialect difference: the IF EXISTS clause is added only for h2 dialect, while postgresql doesn't support this clause:

DROP TABLE <table_name> [IF EXISTS];

When a 4GL table contains extent fields, all extent fields of the same size are grouped together in a single table. For each generated extent table, beside the extent table DDL, it is needed a foreign key constraint (to link the child extent table to its parent table) and an explicit index on the parent__id column; note that the primary key for the extent tables always contains both the parent__id and the list__index columns, in this order. Also, in 4GL, the extent columns can never be part of an index. Following are the DDL statements emitted for this extent table:

ALTER TABLE <extent_table> DROP CONSTRAINT <constraint_name>;
DROP TABLE <extent_table> [IF EXISTS];
CREATE TABLE <extent_table>
{
   parent__id <id_type> NOT NULL,
   <column_name> <data_type> [NOT NULL],
   ...
   list__index <idx_type> NOT NULL,
   PRIMARY KEY (parent__id, list__index)
}
CREATE INDEX <extent_table>_fkey ON <extent_table> (parent__id);
ALTER TABLE <extent_table>
   ADD CONSTRAINT <constraint_name>
   FOREIGN KEY (parent__id)
   REFERENCES <table_name>;

where:

the IF EXISTS clause for the DROP TABLE statement is h2 specific (is not emitted for postgresql).
id_type represents the dialect-specific type of the ID field. For postgresql dialect, this type is int8 and for h2 dialect, this type is bigint.
idx_type represents the dialect-specific type of the list__index field. For postgresql dialect, this type is int4 and for h2 dialect, this type is integer.
extent_table is the converted name of the extent table, which has the <table_name>__<extent_size> format. Here, the table_name is the converted 4GL table name and extent_size is the size of the extent fields which are grouped in this table.
parent__id is a special field which holds the ID of the parent record to which the data for the extent fields belongs.
list__index is the position of this row in the extent list; it can take values only from 0 to extent_size - 1
the parent__id and list__index columns will always form the primary key for any extent table.
an explicit index is created on the parent__id column.
constraint_name is the name of the constraint which links the child extent table to its parent table. On DDL generation, the name will have a form like FKC9CB32BFB4620D8.
column_name is the converted name of the 4GL extent field and data_type is the dialect-specific data type of the 4GL extent field. The generated DDL for a column in this table follows the same rules as the non-extent columns.

At the end of the file the DDL for creation of sequences are appended. Because sequences are database dependent they are described below, for each dialect.

PostgreSQL Dialect¶

The DDL statements emitted for sequences look like this:

DROP SEQUENCE IF EXISTS <seq_name>;
CREATE SEQUENCE <seq_name> INCREMENT <inc> [ MINVALUE <min> ] [ MAXVALUE <max> ] START <init> [CYCLE] CACHE 1;

where the CYCLE is only emitted for cycling sequences, and MINVALUE and MAXVALUE if they are defined into the export file.

Except the ID data type presented above, there are no other dialect specific DDL emitted for PostgreSQL, related to table creation.

H2 Dialect¶

The DDL statements emitted for sequences look like this:

DROP SEQUENCE IF EXISTS <seq_name>;
CREATE SEQUENCE <seq_name> START WITH <init> INCREMENT BY <inc> CACHE 1;

The H2 support for sequences is rather limited, only linear (non cycled) sequences can be handled and there is no support for bounds. These 4GL features will be handled by special code in FWD.

Indexed Character Columns¶

As H2 doesn't allow formulas in the index definition, the only solution is to add a copy of the character column which will have the correct formula set and will be used by the index, so that the index behaves as expected. For each character column collected during index DDL generation, another special column will be added to the table, with its name built using the original column name to which the special computed column prefix, __(two underlines), is added. So, each definition of an indexed character column will be followed by the definition of its associated computed column:

__<column_name> varchar [NOT NULL]

Before Hibernate loads the mapping file for each DMO, the conversion runtime checks if computed columns are needed and inserts them into the mapping file for this DMO. Note that the new computed columns will not have their formulas set at the CREATE TABLE statement, as the Hibernate mapping file doesn't support formulas to be added to the column's definition. The solution is to alter the column definition: after the DDL for all tables was generated, ALTER TABLE statements will be emitted to set the correct formula to the computed columns, following these rules:

if this is a case-sensitive character column, the column's formula will be set to:

RTRIM(<column_name>, ' \t\n\r')

if this is a case-insensitive character column, the column's formula will be set to:

UPPER(RTRIM(<column_name>, ' \t\n\r'))

The syntax of the ALTER TABLE statement which sets the computed column's formula is this:

ALTER TABLE <table_name> ALTER COLUMN __<column_name> <data_type> as <formula>;

where table_name and column_name are the converted 4GL table and column names and data_type is the SQL data type. Also, formula is one of the two expressions presented above.

As the extent columns can't be part of an index, computed columns will never be generated for extent character columns.

Other DDL¶

PostgreSQL Dialect¶

At the time of this writing, PostgreSQL requires DDL to be manually maintained only for the server-side database functions. The file which contains this DDL is located at $P2J_HOME/pl/p2jpl.ddr and contains definitions for all the methods defined in the Functions, Operators and ErrorHandler classes in the com.goldencode.p2j.persist.pl package.

The p2jpl.ddr file is a special file following the structure required by the PL/Java module for the PostgreSQL server and is an implementation of the SQLJDeploymentDescriptor, used by p2jpl.jar. Basically, this file contains two sections, one to install the server-side functions (which contains CREATE FUNCTION statements) and one to remove the server-side functions (which contains DROP FUNCTION statements). Please see the PL/Java documentation for more details about the structure of this file.

H2 Dialect¶

For H2 dialect, some other statements need to be emitted in the table DDL file. These statements are related to the database collation and server-side database functions.

Database Collation¶

For H2, the database files are created when the database is first accessed and there is no external command which creates the database and sets its collation. So, the following statement needs to be executed before any table is created (note that only en_US_FWD collation is supported at this time):

set collation en_US_FWD;

For convenience, after table DDL generation, the conversion rules read the H2 table schema file and, after the drop table section, before any table is created, the SET COLLATION statement is emitted.

Server-Side Database Functions¶

For H2 dialect, the server-side database functions need to be explicitly defined using CREATE ALIAS statements. They are emitted automatically during DDL generation and is needed because, even if the p2jpl.jar is set in the Java class path when the H2 server is started, the H2 server doesn't know which functions to load - they need to be created at the same time with the rest of the tables. Definitions are emitted for all the methods defined in the Functions, Operators and ErrorHandler classes in the com.goldencode.p2j.persist.pl package. All the server-side functions are defined using the following syntax:

CREATE ALIAS IF NOT EXISTS <alias> FOR “<class_name>.<method_name>(<data_type>,...)”;

where:

alias is the server-side function alias
class_name is the package-prefixed Java class name which defines the method
method_name is the Java method name
data_type is the list of parameter data types

DDL with Foreign-Keys Enabled¶

When the foreign-keys global parameter of the p2j.cfg.xml file is set to true, foreign key constraints will be emitted for all the natural joins encountered throughout the application code. When found, a special column will be emitted in the child table, which will have the role to refer the record ID in the parent table. This special column will have the name built from the parent table's converted name to which the __fk suffix is added and will always be emitted at the end of the column list of the CREATE TABLE statement. Also, this column will be indexed and a the foreign key constraint will be created for it, as in:

ALTER TABLE <child_table> DROP CONSTRAINT <constraint_name>;
CREATE TABLE <child_table>
{
   id <id_type>,
   ...
   <foreign_column> <id_type>,
   PRIMARY KEY (id)
}
CREATE INDEX <index_name> ON <child_table> (<foreign_column>);
ALTER TABLE <child_table>
   ADD CONSTRANT <constraint_name>
   FOREIGN KEY (<foreign_column>)
   REFERENCES <parent_table>;

where:

child_table is the name of the child table, determined from the natural join by the conversion rules.
parent_table is the name of the parent table, determined from the natural join by the conversion rules.
foreign_column is the name of the new foreign key column, with its name built from the parent table name to which the __fk suffix is added, as in <parent_table>__fk.
id_type is the dialect-specific data type used for ID columns, int8 for postgresql dialect and bigint for h2 dialect.
index_name is the name of the index on the foreign key column. This has a name built from the child table name and an unique ID, like idx_<child_table>_<unique_ID>
constraint_name is the name of the constraint which links the child extent table to its parent table. On DDL generation, the name will have a form like FKC9CB32BFB4620D8.

Note this feature is disabled by default and is not recommended to be used in a production environment. This is because it may cause deadlocks at runtime, depending upon the order in which an application updates or deletes records in tables which participate in foreign key relations.

Project

General

Profile

FWD

Wiki

Data Definition Language (DDL)¶

Generating Index DDL¶

PostgreSQL Dialect¶

H2 Dialect¶

Identity Sequence DDL¶

Generating Table DDL¶

PostgreSQL Dialect¶

H2 Dialect¶

Indexed Character Columns¶

Other DDL¶

PostgreSQL Dialect¶

H2 Dialect¶

Database Collation¶

Server-Side Database Functions¶

DDL with Foreign-Keys Enabled¶