Project

General

Profile

Data Migration

This chapter is out of date and is no longer being maintained. It is only being retained for historical purposes. The replacement chapter can be found in Data Migration.

All but the most trivial Progress 4GL applications require database access. In the FWD runtime environment, data persistence services are backed by a Relational Database Management System (RDBMS). To prepare your converted application to run, it will be necessary to move your data out of a Progress database and into an RDBMS. Currently, the FWD project supports PostgreSQL and H2 as its backing database systems. In this chapter we discuss:

  1. exporting your existing data from a Progress database;
  2. creating a new PostgreSQL or H2 database which provides compatibility with certain Progress features expected by the converted application;
  3. generating and applying a schema to the new database, using schema information from your original application; and
  4. migrating your data to its new home.

As a prerequisite to migrating your data, you should have completed at least the Front End and Middle portions of the conversion process for your application, and the results of that effort should build successfully (see the Building Converted Code chapter). Data migration relies upon classes and configuration files generated by those phases of conversion.

Data Export

The first step in migrating your data is to export it from its source Progress database into an intermediate form which the import process can use. FWD data import uses the standard format written by the Progress EXPORT statement. It expects each table's data to be stored in a separate text file with a .d extension. It is critically important that the root file name of each export file match the name specified for the corresponding table by the DUMP-NAME property in the .df schema file for the database. Otherwise, the import process will not be able to associate the exported data with the target table in the new database, and that table's data will not be imported.
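Because a mismatch between a DUMP-NAME and its .d file name silently drops that table from the import, it can be worth checking the correspondence up front. The following is a minimal, self-contained sketch of such a check; the paths and table names are illustrative, and the assumption that DUMP-NAME entries appear as `DUMP-NAME "name"` lines in the .df is exactly that, an assumption:

```shell
# Hedged sketch: verify each DUMP-NAME in a .df schema file has a matching
# .d export file. The demo paths and table names below are illustrative.
DF=/tmp/fwd-df-demo/schema.df
DUMP=/tmp/fwd-df-demo/dump
mkdir -p "$DUMP"
# Simulate a schema file and two exported table files.
printf 'DUMP-NAME "customer"\nDUMP-NAME "order"\n' > "$DF"
touch "$DUMP/customer.d" "$DUMP/order.d"
missing=0
for name in $(sed -n 's/.*DUMP-NAME "\([^"]*\)".*/\1/p' "$DF"); do
  if [ ! -f "$DUMP/$name.d" ]; then
    echo "MISSING: $name.d"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all export files present"
```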

Sequences are a special case. Their definitions are usually exported together with the rest of the tables of the database, but they can optionally be exported to a separate file named _seqdefs.df. Their current values, however, must be exported manually, and for the import to succeed the file must keep the default name _seqvals.d offered by the Data Dictionary tool; otherwise the file will not be found and the sequences will not be correctly initialized in FWD.

The most convenient - though not necessarily the fastest - way to generate this result is to use the Progress Data Dictionary utility.

Refer to the Internationalization chapter for how to deal with encoding requirements during the export.

Move these files to a location accessible to the target database system. For a large data set, we recommend placing them on as fast a partition/drive as is available, to maximize I/O throughput during data import. In preparation for import, the files should be uncompressed and in a single directory.

Data Import

In this phase you will create a new database, apply a schema which is compatible with your converted application, and populate the new database with the data exported from your Progress database. The next two sections explain how to set up the PostgreSQL and H2 databases, so that the exported data can be imported into each of these RDBMSs.

Importing to PostgreSQL

As a prerequisite, you should have installed the PostgreSQL database software and initialized a FWD database cluster as described in the Installation chapter.

  • Create the database. The first step is to ensure that there is sufficient disk space for the database to be stored. Plan for the PostgreSQL version of your database to be larger (by up to 50% or more) than it was under Progress. We suspect this difference is due to Progress' use of compressed indexes.
    For large data sets, we recommend using as fast a partition/drive as is available. PostgreSQL supports tablespaces, which provide an easy way to create a database on a specific partition, so you can choose the fastest available for the import process. Consult the PostgreSQL documentation if you want to use this feature.
    The minimal command line to create the database is as follows, but you may require additional parameters, depending on whether and how you configured the PostgreSQL server:
    sudo su - postgres
    createdb [-D {fast_tablespace}] {import_database}
    

    where {import_database} is the name of the database you are creating. If you created a tablespace on a fast partition, specify that you want the new database created in that tablespace, by using the optional -D {fast_tablespace} parameter.
  • Generate a PostgreSQL schema. Although FWD generates the table and index DDL for all specified dialects during the M0 conversion phase, there is a secondary way of generating this DDL: FWD provides a TRPL rule-set with which the database creation DDL can be generated. This can only be used once the conversion driver has successfully run through at least the M0 (middle) stages.
    For each Progress schema which is being converted, there will be a .p2o file in data/namespace/. To generate the schema, change directory to the root of your project directory and execute this command for each .p2o file:
    java -server -classpath $P2J_HOME/p2j/build/lib/p2j.jar:.: -Xmx256m com.goldencode.p2j.pattern.PatternEngine -d 2 "dbName=\"{schema_name}\"" "targetDb=\"postgresql\"" schema/generate_ddl data/namespace "{schema_name}.p2o"
    

    where {schema_name} is the name of the schema being converted. By default, this will generate files named schema_table_{schema_name}_postgresql.sql and schema_index_{schema_name}_postgresql.sql in the $P2J_HOME/ddl/ folder. Both these files contain Data Definition Language (DDL) statements to create the necessary database objects in your database.
    The index DDL should not be applied to the database (see step 4 below) unless no data is being imported, since the import process creates the necessary indexes in a specific order which causes the import process to perform significantly faster than it would if the indexes were created before the import. In fact, the import process will fail if the indexes already exist. In the case where data is not being imported using the FWD tools, the schema_index_{schema_name}_postgresql.sql can be used to create the indexes after the tables have been created.
    Besides the required dbName and optional targetDb substitution variables, some other command-line variable substitutions are provided to define runtime-replaceable parameters that can be read and used by the DDL generation rule-set. These variable substitutions must be inserted after the Java class name and after any pattern engine option flags, but before the rule-set name. Warning: the variable names are case-sensitive! The following command line parameters can be added to control the DDL generation:
Parameter Default Value Required Usage
dbName n/a Yes A string expression in the form dbName="name", where name is substituted into the file names for the index and table DDL. This is also the name of the converted schema for which the DDL is generated.
targetDb h2 No A string expression in the form targetDb="dialect", where dialect is either h2 or postgresql. The generated DDL will be specific to this dialect.
delimiter ; No A string expression in the form delimiter="';'" which specifies the DDL statement delimiter.
highID 10000 No A long integer expression in the form highID="#(long) num", where num specifies the starting identifier for the primary key sequence.
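As an illustration, a command that overrides both the statement delimiter and the starting primary key identifier might look like the following sketch (the value 50000 is purely illustrative, and exact quoting may vary with your shell):

```shell
java -server -classpath $P2J_HOME/p2j/build/lib/p2j.jar:.: -Xmx256m \
   com.goldencode.p2j.pattern.PatternEngine -d 2 \
   "dbName=\"{schema_name}\"" \
   "targetDb=\"postgresql\"" \
   "delimiter=\"';'\"" \
   "highID=\"#(long) 50000\"" \
   schema/generate_ddl data/namespace "{schema_name}.p2o"
```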
  • Install the schema into the new database. Now that you have generated the appropriate DDL for your database schema, you must install it into the target database. The PostgreSQL psql client provides a simple way to do this:
    psql -U postgres -f schema_table_{database_name}_postgresql.sql {database_name}
    

    where {database_name} is the name of the target database. Note that this is a minimal command line for psql. Additional options for host and port may be necessary, depending upon how you have configured the PostgreSQL server. Please refer to your PostgreSQL documentation if these parameters are necessary.
    This will emit many status messages to the console. The first part of the DDL script contains statements which attempt to drop constraints and tables that do not yet exist. As a result, you will see messages complaining about errors while dropping these objects. This is normal. After this phase, the remaining messages should not refer to errors.
  • Prepare a path to the exported data, so FWD can import it. The data import process will expect to find the data export (.d) files in a subdirectory of $P2J_HOME/data/. The approach we recommend is to create a symbolic link within $P2J_HOME/data/, which points to the directory containing those files. For example:
    cd $P2J_HOME/data
    ln -s {export_file_path} dump
    

    where {export_file_path} is the absolute path to the export file directory.
  • Set logging properties for the import process. The import program writes status and error messages using the J2SE logging API. This mechanism requires the logging subsystem be configured. To do this, FWD relies upon a logging.properties file in the $P2J_HOME/cfg/ subdirectory. A minimalist example of this file follows:
    ############################################################
    #       Default Logging Configuration File
    #
    # You can use a different file by specifying a filename
    # with the java.util.logging.config.file system property.
    # For example java -Djava.util.logging.config.file=myfile
    ############################################################
    
    ############################################################
    #       Global properties
    ############################################################
    
    # "handlers" specifies a comma separated list of log Handler
    # classes.  These handlers will be installed during VM startup.
    # Note that these classes must be on the system classpath.
    # By default we only configure a ConsoleHandler, which will only
    # show messages at the INFO and above levels.
    handlers= java.util.logging.ConsoleHandler
    
    # Default global logging level.
    # This specifies which kinds of events are logged across
    # all loggers.  For any given facility this global level
    # can be overridden by a facility-specific level.
    # Note that the ConsoleHandler also has a separate level
    # setting to limit messages printed to the console.
    .level = WARNING
    
    ############################################################
    # Handler specific properties.
    # Describes specific configuration info for Handlers.
    ############################################################
    
    # Pass all messages through to the console handler; the effective
    # verbosity is then controlled by the per-logger levels below.
    java.util.logging.ConsoleHandler.level = FINEST
    java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
    
    ############################################################
    # Facility specific properties.
    # Provides extra control for each logger.
    ############################################################
    
    com.goldencode.p2j.schema.ImportWorker.level = WARNING
    

    The above settings will log warning and error messages generated by the import process to the console from which the program is launched. This level of logging is appropriate in most cases. For more verbose logging, change the last line (for the com.goldencode.p2j.schema.ImportWorker logger) to a more verbose setting than WARNING, such as INFO, FINE, FINER, or FINEST.
    Hibernate uses a number of its own loggers to provide additional details. For example, you can add entries for the org.hibernate.SQL and org.hibernate.type loggers, which will log detail about the SQL statements used by the import. Be forewarned that this can produce a lot of output.
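For example, the following optional entries, added to logging.properties, would enable that Hibernate output; the levels shown here are illustrative choices, and the resulting output can be voluminous:

```properties
# Optional: log the SQL statements Hibernate issues during the import.
org.hibernate.SQL.level = FINE
# Optional: log the values bound to SQL parameters (very verbose).
org.hibernate.type.level = FINEST
```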
  • Run the import. The database has been created, its schema has been applied, the Progress export data has been prepared, and logging is configured. It is finally time to migrate the data to its new home.
    The FWD import is a multi-threaded process, in which each thread of execution is dedicated to the import of a single table's data from start to finish, until that table's migration is complete. The import program will attempt to process tables in the order which allows the overall process to complete most quickly. Thus, those tables which are deemed to require the most time are attacked first in parallel threads, then those which are expected to take less time, and so on.
    The following command should be run from the $P2J_HOME directory to launch the import program:
     java
        -server
        -classpath p2j/build/lib/p2j.jar:build/lib/{my_app}.jar:cfg:
        -Djava.util.logging.config.file=cfg/logging.properties
        -DP2J_HOME=.
        com.goldencode.p2j.pattern.PatternEngine
           -d 2
           "dbName=\"{export_file_path}\""
           "targetDb=\"postgresql\""
           "url=\"jdbc:postgresql://{host}[:{port}]/{database_name}\""
           "uid=\"{username}\""
           "pw=\"{password}\""
           schema/import
  • Install base PL/Java support into the new database. As noted in the Installation chapter, PL/Java is used to provide User Defined Function (UDF) support, which allows a compatible equivalent of Progress' built-in functions to be executed within SQL statements at the database server. That chapter described how to prepare the database server and cluster to execute PL/Java. In addition, two sets of changes must be made to each database which uses PL/Java UDFs. The first installs Java as a procedure language and adds a schema and several tools, so that PL/Java can be used to manage UDFs with the target database. This is done with the following command:
    psql -U postgres -f {pljava_home}/install.sql {database_name}
    

    where {pljava_home} is the directory where PL/Java 1.4 is installed and {database_name} is your target database.
    The second set of changes will install the FWD implementations of the Progress-compatible, built-in functions. However, we only want to install these on the working database copy, and the current database is about to be archived to a master database instance (see below). Therefore, we will defer this installation until after we have created the archive.
  • Create a “live”, master database backup. This step is not strictly necessary, but we strongly recommend making a pristine backup of the freshly imported database. This will be a “live” database instance in the same database cluster as your working database. While this will double the amount of storage space needed, it is invaluable in a test and development environment, where the quick restoration of your working database instance to a known, good state is necessary from time to time (e.g., regression testing, stress testing, etc.). Otherwise, you are left with the much slower alternatives of re-importing the data to a new database, or restoring from a file archive.
    If fast disk space is at a premium, and you have more abundant, but slower disk storage available, we recommend creating the master database instance in a separate tablespace. You can create this tablespace on your slower partition/device, since the live master database is only needed occasionally for recovery purposes. If you choose to go this route, please consult the PostgreSQL documentation for instructions on creating a tablespace.
    The PostgreSQL createdb utility offers a convenient way to copy a database instance:
    createdb -U postgres [-D {slower_tablespace}] -T {database_name} {master_name}
    

    where:
    {slower_tablespace} is the name of the tablespace you optionally created on a slower partition. If you do not want to use a separate tablespace, omit this parameter entirely.
    {database_name} is the name of the database you wish to use as your template for the master database; the database you created earlier and into which you imported your data.
    {master_name} is the name of the live master database you wish to create.
    Again, you may need to add parameters for host and port, depending upon your PostgreSQL server configuration.
    Note that you would use a variation of this command to restore a working copy from the master database instance. In that case, you would simply use {master_name} with the -T option, and optionally specify the target tablespace for your working copy.
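For instance, restoring a working copy from the master might look like the following sketch (the names in braces are placeholders; note that createdb -T requires that no sessions be connected to the template database):

```shell
dropdb -U postgres {database_name}
createdb -U postgres [-D {fast_tablespace}] -T {master_name} {database_name}
```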
  • Create a file archive of your database. This is another optional step, but as a best practice, we recommend dumping your imported database to file before continuing, and archiving that file per your usual backup procedures. PostgreSQL offers the pg_dump utility for this purpose. Please consult the PostgreSQL documentation for details of that utility's use.
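As a sketch, a compressed, custom-format dump and the corresponding restore might look like this (file names are illustrative; pg_restore is only needed for custom-format dumps):

```shell
pg_dump -U postgres -Fc -f {database_name}.dump {database_name}
pg_restore -U postgres -d {database_name} {database_name}.dump
```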
  • Install Progress-compatible built-in functions as PL/Java UDFs. This is the final phase of enabling Progress-compatible built-in functions in the database. Now that you have a live master database instance and a file archive as backups, you will apply this step to the working copy of your database. We recommend this approach, rather than installing the UDFs in the master database, because the UDF implementations may change between FWD versions, due to bug fixes, new functionality, etc. Thus, it is a good idea to be sure you are always applying the latest version to your working database.
    The UDFs are bundled as a jar file: p2jpl.jar, located in the p2j/build/ subdirectory of the project. You will use the PL/Java tools and base support installed in your database earlier to install this jar file into the database in a way PL/Java can access it. This step will attempt to (a) remove any previous version of the jar file (this will normally report an error the first time it is run, since there is no previous version to remove), (b) install the latest version, and (c) set a classpath for PL/Java, so it can locate the classes which implement the individual UDFs.
    First, the SQL script used for this purpose must be edited, to identify where in your file system PL/Java can find the p2jpl.jar file to install. Edit p2j/pl/install_p2j.sql:
    select sqlj.remove_jar('p2j', true);
    select sqlj.install_jar('file:///{p2j_home}/build/lib/p2jpl.jar', 'p2j', true);
    select sqlj.set_classpath('public', 'p2j');
    

    Replace {p2j_home} with the path to the FWD installation on your system. Note that this must be an absolute path from the system's root partition. Save the file.
    Issue the following command (including any parameters appropriate to your PostgreSQL installation), from the root directory of your application project:
    psql -U postgres -f p2j/pl/install_p2j.sql {database_name}
    

    You should see the following output:
    psql:install_p2j.sql:1: ERROR:  java.sql.SQLException: No Jar named 'p2j' is known to the system
     install_jar
    -------------
    
    (1 row)
    
     set_classpath
    ---------------
    
    (1 row)
    

    To ensure the installation completed successfully, try using one of the PL/Java UDFs you just installed. Log into the psql client:
    psql -U postgres {database_name}
    

    At the interactive prompt, emit the following (including trailing semi-colon) and press <Enter>:
    select todec('123.45');
    

    The result should be the following:
         todec
    ----------------
     123.4500000000
    (1 row)
    
  • Change ownership of your working database copy. The final step is to change the owner of your database from the postgres superuser to the limited-access user account you created in the Installation chapter. This will ensure the limited-access user is able to access and write data to the database. To do this, you will use psql, but connect to a different database. The postgres database exists for this type of administrative use:
    psql -U postgres postgres
    

    At the psql interactive command prompt, execute the following commands, pressing <Enter> after each command (note that the trailing semi-colon in the first line is necessary):
    ALTER DATABASE {database_name} OWNER TO {new_owner};
    \q
    

    where {database_name} is the name of your working database instance and {new_owner} is the limited access database user.

Importing to H2

When migrating the data to an H2 database, it is not necessary to have an H2 server running. Instead, H2's embedded mode will be used.

  • Create the database. The first step is to ensure that there is sufficient disk space for the database to be stored. Plan for the H2 version of your database to be larger (by up to 50% or more) than it was under Progress. We suspect this difference is due to Progress' use of compressed indexes.
    For large data sets, we recommend using as fast a partition/drive as is available. In H2, the database is specified in the connection URL by using an existing path, which ends with the database name.
    The database will be created automatically in the specified location when the schema is installed into the new database.
  • Generate an H2 schema. Although FWD generates the table and index DDL for all specified dialects during the M0 conversion phase, there is a secondary way of generating this DDL: FWD provides a TRPL rule-set with which the database creation DDL can be generated. This can only be used once the conversion driver has successfully run through at least the M0 (middle) stages.
    For each Progress schema which is being converted, there will be a .p2o file in data/namespace/. To generate the schema, change directory to the root of your project directory and execute this command for each .p2o file:
     java -server
        -classpath $P2J_HOME/p2j/build/lib/p2j.jar:.:
        -Xmx256m
        com.goldencode.p2j.pattern.PatternEngine
           -f -d 2
           "dbName=\"{schema_name}\""
           schema/generate_ddl
           data/namespace
           "{schema_name}.p2o"
    As in the PostgreSQL case, the following command line parameters can be added to control the DDL generation (the variable names are case-sensitive):
Parameter Default Value Required Usage
dbName n/a Yes A string expression in the form dbName="name", where name is substituted into the file names for the index and table DDL. This is also the name of the converted schema for which the DDL is generated.
targetDb h2 No A string expression in the form targetDb="dialect", where dialect is either h2 or postgresql. The generated DDL will be specific to this dialect.
delimiter ; No A string expression in the form delimiter="';'" which specifies the DDL statement delimiter.
highID 10000 No A long integer expression in the form highID="#(long) num", where num specifies the starting identifier for the primary key sequence.
  • Install the schema into the new database. Now that you have generated the appropriate DDL for your database schema, you must install it into the target database. This is done by running the following command:
     java
        -classpath $P2J_HOME/p2j/build/lib/h2.jar:$P2J_HOME/p2j/build/lib/p2jpl.jar
        org.h2.tools.RunScript
           -url jdbc:h2:{location}/{database_name}
           -user {username}
           -password {password}
           -script $P2J_HOME/ddl/schema_table_{database_name}_h2.sql
           -continueOnError

    where {location} is the directory in which the database files will be created and {database_name} is the name of the target database.
  • Prepare a path to the exported data, so FWD can import it. The data import process will expect to find the data export (.d) files in a subdirectory of $P2J_HOME/data/. The approach we recommend is to create a symbolic link within $P2J_HOME/data/, which points to the directory containing those files. For example:
    cd $P2J_HOME/data
    ln -s {export_file_path} dump
    

    where {export_file_path} is the absolute path to the export file directory.
  • Set logging properties for the import process. The import program writes status and error messages using the J2SE logging API. This mechanism requires the logging subsystem be configured. To do this, FWD relies upon a logging.properties file in the $P2J_HOME/cfg/ subdirectory. A minimalist example of this file follows:
    ############################################################
    #       Default Logging Configuration File
    #
    # You can use a different file by specifying a filename
    # with the java.util.logging.config.file system property.
    # For example java -Djava.util.logging.config.file=myfile
    ############################################################
    
    ############################################################
    #       Global properties
    ############################################################
    
    # "handlers" specifies a comma separated list of log Handler
    # classes.  These handlers will be installed during VM startup.
    # Note that these classes must be on the system classpath.
    # By default we only configure a ConsoleHandler, which will only
    # show messages at the INFO and above levels.
    handlers= java.util.logging.ConsoleHandler
    
    # Default global logging level.
    # This specifies which kinds of events are logged across
    # all loggers.  For any given facility this global level
    # can be overridden by a facility-specific level.
    # Note that the ConsoleHandler also has a separate level
    # setting to limit messages printed to the console.
    .level = WARNING
    
    ############################################################
    # Handler specific properties.
    # Describes specific configuration info for Handlers.
    ############################################################
    
    # Pass all messages through to the console handler; the effective
    # verbosity is then controlled by the per-logger levels below.
    java.util.logging.ConsoleHandler.level = FINEST
    java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
    
    ############################################################
    # Facility specific properties.
    # Provides extra control for each logger.
    ############################################################
    
    com.goldencode.p2j.schema.ImportWorker.level = WARNING
    

    The above settings will log warning and error messages generated by the import process to the console from which the program is launched. This level of logging is appropriate in most cases. For more verbose logging, change the last line (for the com.goldencode.p2j.schema.ImportWorker logger) to a more verbose setting than WARNING, such as INFO, FINE, FINER, or FINEST.
    Hibernate uses a number of its own loggers to provide additional details. For example, you can add entries for the org.hibernate.SQL and org.hibernate.type loggers, which will log detail about the SQL statements used by the import. Be forewarned that this can produce a lot of output.
  • Run the import. The database has been created, its schema has been applied, the Progress export data has been prepared, and logging is configured. It is finally time to migrate the data to its new home.
    The FWD import is a multi-threaded process, in which each thread of execution is dedicated to the import of a single table's data from start to finish, until that table's migration is complete. The import program will attempt to process tables in the order which allows the overall process to complete most quickly. Thus, those tables which are deemed to require the most time are attacked first in parallel threads, then those which are expected to take less time, and so on.
    The following command should be run from the $P2J_HOME directory to launch the import program:
     java
        -server
        -classpath p2j/build/lib/p2j.jar:build/lib/{my_app}.jar:cfg:
        -Djava.util.logging.config.file=cfg/logging.properties
        -DP2J_HOME=.
        com.goldencode.p2j.pattern.PatternEngine
           -d 2
           "dbName=\"{export_file_path}\""
           "targetDb=\"h2\""
           "url=\"jdbc:h2:{location}/{database_name};LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0\""
           "uid=\"{username}\""
           "pw=\"{password}\""
           schema/import
  • Create a master database backup. This step is not strictly necessary, but we strongly recommended making a pristine backup of the freshly imported database. While this will double the amount of storage space needed, it is invaluable in a test and development environment, where the quick restoration of your working database instance to a known, good state is necessary from time to time (e.g., regression testing, stress testing, etc.).
    Since in H2 the database files can be located anywhere on the available partitions, making a database backup is as simple as archiving or copying the contents of the folder specified in step 3 via the {location} substitution parameter.
    When restoring the master database backup to another instance, just copy or decompress the backed-up files to the desired location. In H2, a database instance is defined by the directory in which the database files reside plus the database name, so be sure to use the same database name when you build the new connection URL; only the location of the files will differ between instances.
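Under these assumptions, a file-level backup and restore might be sketched as follows (the database must not be open by any process while the files are copied, and all paths are illustrative):

```shell
# back up the entire database directory
tar czf {backup_archive}.tar.gz -C {location} .
# restore it to a new location
mkdir -p {new_location}
tar xzf {backup_archive}.tar.gz -C {new_location}
```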
  • Install Progress-compatible built-in functions as UDFs.
    The schema generated in step 2 already contains the necessary UDFs for the built-in functions. These were added to the database during schema installation in step 3, so no special action is needed.
  • Change ownership of your working database copy.
    You may want to limit access to this database to specific user accounts on the machine. This is done by directly changing the ownership and/or access rights of the database files.
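For example, on Linux this restriction might be sketched as follows (the exact file names produced by H2 depend on its version and storage engine, so the glob below is an assumption):

```shell
# give the database files to the limited-access OS account
chown {os_user}:{os_group} {location}/{database_name}*
# and make them readable/writable by that account only
chmod 600 {location}/{database_name}*
```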

Summary

At this point, you have migrated your application's data set into a PostgreSQL or H2 working database instance, and you have created live and archived backups of your new database. If you have successfully converted and built your entire application by this point, you are ready to configure your application to run in the FWD environment. If not, you can at least use your favorite SQL tool to connect to your database and browse through your data in its new home.


© 2004-2017 Golden Code Development Corporation. ALL RIGHTS RESERVED.