Data Migration

All but the most trivial Progress 4GL applications require database access. In the FWD runtime environment, data persistence services are backed by a Relational Database Management System (RDBMS). To prepare your converted application to run, it will be necessary to move your data out of a Progress database and into an RDBMS. The currently supported databases are PostgreSQL, SQL Server, and H2. As a prerequisite to the data import, you should have installed the necessary database software and prepared a new target database instance, as described in the vendor-specific database setup chapters. In this chapter, we discuss:
  1. exporting your existing data from a Progress database;
  2. the process of migrating your data to its new home.

As a prerequisite to migrating your data, you should have completed at least the front end and middle portions of the conversion process for your application, and the results of that effort should build successfully (see the Building Converted Code chapter). Data migration relies upon classes and configuration files generated by those phases of conversion.

Data Export

The first step in migrating your data is to export it from its source Progress database into an intermediate form which the import process can use. FWD data import uses the standard format written by the Progress EXPORT statement. It expects each table's data to be stored in a separate text file with a .d extension. It is critically important that the root name of each export file match the name specified for the corresponding table by the DUMP-NAME property in the .df schema file for the database. Otherwise, the import process will not be able to associate the exported data with the target table in the new database, and that table's data will not be imported.
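If you prefer to export programmatically rather than through the Data Dictionary (described below), a minimal 4GL sketch might look like the following. The customer table and its DUMP-NAME of customer are hypothetical:

/* Minimal sketch: export a hypothetical customer table to customer.d,
   where "customer" matches the table's DUMP-NAME in the .df schema file. */
OUTPUT TO VALUE("customer.d").
FOR EACH customer NO-LOCK:
   EXPORT customer.
END.
OUTPUT CLOSE.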

Sequences represent a special case. Their definitions usually are exported together with the rest of the tables of the database, but optionally they can be exported to a separate file named _seqdefs.df. Additionally, their current values must be exported manually. For the import of these values to succeed, the file must use the default name _seqvals.d offered by the Data Dictionary tool. Otherwise, the file will not be found and the sequences will not be correctly initialized in FWD.

The most convenient - though not necessarily the fastest - way to export data from the original Progress database is to use the Progress Data Dictionary utility. From the Data Dictionary, go to the Tools -> Data Administration menu. From there, use the Admin -> Dump Data and Definitions menu to dump table contents and definitions; the Sequences Current Values entry in that menu dumps the sequence values discussed above.

Refer to the Internationalization chapter for how to deal with encoding requirements during the export.

Move the exported files to a location accessible to the target database system. For a large data set, we recommend placing them on as fast a partition/drive as is available, to maximize I/O throughput during data import. In preparation for import, the files should be uncompressed and in a single directory.
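For example, a staging step might look like this (all paths here are hypothetical):

# Hypothetical paths: collect the export files in one directory and uncompress them.
mkdir -p /fastdisk/dump
cp /mnt/archive/hotel-export/*.d.gz /fastdisk/dump/
gunzip /fastdisk/dump/*.d.gz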

Data Import

In this phase you will populate a new target database with the data exported from the corresponding Progress database. The process is the same regardless of which type of database you are using, since it uses the FWD runtime to process the import.

Prepare a Path to the Exported Data

The data import process expects to find the data export (.d) files in a subdirectory of $P2J_HOME/data/. The approach we recommend is to create a symbolic link within $P2J_HOME/data/, which points to the directory containing those files. For example:

cd $P2J_HOME/data
ln -s {export_file_path} dump

where {export_file_path} is the absolute path to the export file directory.

If your system does not support symbolic links, store the exported data directly in a dump subdirectory of $P2J_HOME/data/.

Set Logging Properties

The import program writes status and error messages using the J2SE logging API. This mechanism requires that the logging subsystem be configured. To do this, FWD relies upon a logging.properties file in the $P2J_HOME/cfg/ subdirectory. A minimal example of this file follows:

############################################################
#       Default Logging Configuration File
#
# You can use a different file by specifying a filename
# with the java.util.logging.config.file system property.
# For example java -Djava.util.logging.config.file=myfile
############################################################

############################################################
#       Global properties
############################################################

# "handlers" specifies a comma separated list of log Handler
# classes.  These handlers will be installed during VM startup.
# Note that these classes must be on the system classpath.
# By default we only configure a ConsoleHandler, which will only
# show messages at the INFO and above levels.
handlers= java.util.logging.ConsoleHandler

# Default global logging level.
# This specifies which kinds of events are logged across
# all loggers.  For any given facility this global level
# can be overridden by a facility specific level
# Note that the ConsoleHandler also has a separate level
# setting to limit messages printed to the console.
.level = WARNING

############################################################
# Handler specific properties.
# Describes specific configuration info for Handlers.
############################################################

# Limit the messages that are printed on the console to FINEST and above.
java.util.logging.ConsoleHandler.level = FINEST
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

############################################################
# Facility specific properties.
# Provides extra control for each logger.
############################################################

com.goldencode.p2j.schema.ImportWorker.level = WARNING

The above settings will log warning and error messages generated by the import process to the console from which the program is launched. This level of logging is appropriate in most cases. For more verbose logging, change the last line (for the com.goldencode.p2j.schema.ImportWorker logger) to a more verbose setting than WARNING, such as INFO, FINE, FINER, or FINEST.
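For example:

com.goldencode.p2j.schema.ImportWorker.level = FINE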

Hibernate ORM is used under the covers for the import. It uses a number of its own loggers to provide additional details. For example, you can add entries for the org.hibernate.SQL and org.hibernate.type loggers, which will log detail about the SQL statements used by the import. Be forewarned that this can produce a lot of output.
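For example, adding entries like the following to logging.properties will log the generated SQL statements and type binding detail. The levels shown here are illustrative suggestions; any JDK logging level may be used:

# Illustrative levels; expect voluminous output at these settings.
org.hibernate.SQL.level = FINE
org.hibernate.type.level = FINEST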

Run the Import

The FWD import is a multi-threaded process, in which each thread of execution is dedicated to the import of a single table's data from start to finish. The import program attempts to process tables in the order that allows the overall process to complete most quickly: the tables deemed to require the most time are tackled first in parallel threads, then those expected to take less time, and so on.

The command to run the import is as follows. It is formatted on multiple lines for readability, but it must be executed as a single command.

java
   -server
   [-Xmx{max heap} [-XX:+HeapDumpOnOutOfMemoryError]]
   -classpath p2j/build/lib/p2j.jar:deploy/lib/{application}.jar:cfg:
   -Djava.util.logging.config.file=cfg/logging.properties
   -DP2J_HOME=.
   com.goldencode.p2j.pattern.PatternEngine
     -d 2
     "dbName=\"[{export path}]\"" 
     "targetDb=\"{postgresql | sqlserver | h2}\"" 
     "url=\"{jdbc url}\"" 
     "uid=\"{user role}\"" 
     "pw=\"{password}\"" 
     ["dataPath=\"{dataPath}\""]
     ["lobPath=\"{lobPath}\""]
     ["maxThreads={threads}"]
     schema/import
     data/namespace {schema}.p2o 2>&1 | tee data_import_$(date '+%Y%m%d_%H%M%S').log

where:
{max heap} is the maximum heap available to the JVM, specified as a number followed by an m for megabytes or a g for gigabytes;
{export path} is an optional path (the value given for the dbName parameter) which specifies the location of the export files, relative to $P2J_HOME/data/; if empty, the export files should reside directly in $P2J_HOME/data/;
{application} must be replaced with the name of the application (as used for an application jar file);
{jdbc url} must be replaced with a JDBC URL appropriate to the target database, as follows:
for PostgreSQL: jdbc:postgresql:[//{host}[:{port}]/]{database}
for SQL Server: jdbc:sqlserver://{host};instanceName={sqlInstance};databaseName={database};
for H2: jdbc:h2:{location}/{database_name};MV_STORE=FALSE;LOG=0;CACHE_SIZE=65536;LOCK_MODE=0;UNDO_LOG=0
{user role} should be replaced with a database role which has the right to insert records into the database;
{password} should be replaced with the password for the specified {user role};
{dataPath} is the path of the directory containing the .d files; if not specified, it defaults to data/dump/{dbName};
{lobPath} is the path of the directory containing LOB data; if not specified, it defaults to {dataPath}/lobs;
{threads} is a positive number indicating how many threads the import should use;
{schema} is the name of the database schema namespace, as specified in $P2J_HOME/cfg/p2j.cfg.xml.

For example, for a PostgreSQL target database named hotel_test_db whose schema is named hotel, we might use:

java
   -server
   -Xmx2g -XX:+HeapDumpOnOutOfMemoryError
   -classpath p2j/build/lib/p2j.jar:deploy/lib/hotel.jar:cfg:
   -Djava.util.logging.config.file=cfg/logging.properties
   -DP2J_HOME=.
   com.goldencode.p2j.pattern.PatternEngine
     -d 2
     "dbName=\"hotel\"" 
     "targetDb=\"postgresql\"" 
     "url=\"jdbc:postgresql://localhost:5433/hotel_test_db\"" 
     "uid=\"fwd_user\"" 
     "pw=\"password\"" 
     "maxThreads=2" 
     schema/import
     data/namespace hotel.p2o 2>&1 | tee data_import_$(date '+%Y%m%d_%H%M%S').log

The maximum heap setting is optional, as is the setting to create a heap dump in the event of an out-of-memory error. Note that these options are specific to the Oracle JVM implementation or to JVMs based upon it (e.g., OpenJDK). If you are using a different JVM, substitute options specific to that implementation.

Setting the number of threads to use is optional as well. The default is 4. A higher number here is not necessarily better, depending on the distribution of data across exported tables. If there is a small number of very large tables and a large number of smaller tables, using a higher number of threads may actually increase the overall import time. The reason is that each thread operates on a single table. A small number of very large tables (sometimes only one) will define the critical path of the import. Taking away I/O resources from the threads processing those very large tables by assigning additional threads to process smaller tables can slow the progress of those critical tables, and may therefore lengthen the critical path.

If exported data has a more even distribution among tables, a higher number of threads generally will be better. However, it is recommended to not exceed the number of available CPU cores.

Summary

At this point, you have migrated your application's data set into a new, working database instance. Before proceeding further, it would be wise to make live and archival backups of this pristine database. Please refer to the vendor-specific database setup chapter for instructions on doing this for the particular database implementation you are using.

If you have successfully converted and built your entire application by this point, you are ready to configure your application to run in the FWD environment. If not, you can at least use your favorite SQL tool to connect to your database and browse through your data in its new home.
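For example, using the PostgreSQL settings from the example import command above, a quick sanity check with psql might look like this (the table name is a placeholder for one of your converted tables):

psql -h localhost -p 5433 -U fwd_user hotel_test_db -c "select count(*) from {table};"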


© 2004-2017 Golden Code Development Corporation. ALL RIGHTS RESERVED.