General

Feature #3872

Implement a two-stage import process

Added by Greg Shah over 5 years ago. Updated over 5 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date:
Due date:
% Done: 0%
billable: No
vendor_id: GCD
version_reported:
version_resolved:

History

#1 Updated by Greg Shah over 5 years ago

Users with very large databases may have import times that are longer than any outage window that can be arranged. For example, if the cutover window is 24 hours and the import takes 48 hours, then some other approach is needed.

A simple solution is to implement a two-stage import process:

  1. The first stage works the same way as the current import, except that it is run ahead of time. The user dumps the database in OpenEdge but does not shut down the 4GL system, which continues to operate normally.
  2. The second stage applies only the changes made since the first stage was executed. Because it processes only these diffs, it is designed to be very fast; it could easily handle a few weeks or more of accumulated changes without taking very long at all.

The key is to implement some form of change capture and then write a second-stage process that applies those changes. It must handle deletes as well as inserts and updates.
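The replay step can be illustrated with a minimal Java sketch. It assumes each captured change has already been reduced to an operation (insert, update or delete), a table name, a unique key and the full after-image of the row. The `ApplyChanges` class, the `Op` enum, the `Change` record and the in-memory map standing in for the migrated database are all hypothetical; a real implementation would issue SQL against the target database instead.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ApplyChanges {
    // Hypothetical change kinds; a real capture mechanism would map its own codes onto these.
    enum Op { INSERT, UPDATE, DELETE }

    // One captured change: the table, the row's unique key and (for INSERT/UPDATE) the full
    // after-image of the row. The field map is ignored for DELETE.
    record Change(Op op, String table, String key, Map<String, String> fields) { }

    // In-memory stand-in for the migrated database: table name -> (key -> field values).
    static final Map<String, Map<String, Map<String, String>>> db = new HashMap<>();

    // Replay changes strictly in capture order: the same row may be inserted, updated and
    // deleted within one capture window, and only its final state should survive.
    static void apply(List<Change> changes) {
        for (Change c : changes) {
            Map<String, Map<String, String>> table =
                db.computeIfAbsent(c.table(), t -> new HashMap<>());
            switch (c.op()) {
                // With a full after-image, insert and update both just store the row.
                case INSERT, UPDATE -> table.put(c.key(), new LinkedHashMap<>(c.fields()));
                case DELETE -> table.remove(c.key());
            }
        }
    }

    public static void main(String[] args) {
        // Result of the stage 1 bulk import.
        Map<String, Map<String, String>> customer = new HashMap<>();
        customer.put("1", new LinkedHashMap<>(Map.of("name", "Acme")));
        customer.put("2", new LinkedHashMap<>(Map.of("name", "Globex")));
        db.put("customer", customer);

        // Changes captured while the 4GL system kept running after the stage 1 dump.
        apply(List.of(
            new Change(Op.UPDATE, "customer", "1", Map.of("name", "Acme Corp")),
            new Change(Op.DELETE, "customer", "2", Map.of()),
            new Change(Op.INSERT, "customer", "3", Map.of("name", "Initech"))));

        // Prints {1={name=Acme Corp}, 3={name=Initech}}
        System.out.println(new TreeMap<>(db.get("customer")));
    }
}
```

Treating an update as a full-row replacement keeps the apply logic trivial. If only the changed fields were captured, UPDATE would instead have to merge them into the existing row.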

OpenEdge 11.7 provides a Change Data Capture (CDC) facility that is designed for exactly this purpose (probably intended for efficient replication of changes). It should be leveraged rather than writing a trigger-based implementation. The 4GL docs describe how to configure CDC, but I don't see anything obvious about how to dump or use the captured results. They do, however, describe storing this data in "change tables", so presumably one uses 4GL code to read and dump the data in these change tables.

Unless there is a standard for storing the CDC results, we will have to implement the dump to some format of our own, and the second ("apply changes") phase would be written to read that format. If there is a standard, we should use it if at all practical.
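If no standard applies, the dump format can be very simple. As one possible sketch (the layout, the escaping rules and the `ChangeLogFormat`/`Entry` names are hypothetical, not an existing OpenEdge format), each change could be written as one tab-separated line: operation, table, unique key, then `name=value` pairs. Backslash escapes for tab, newline and backslash let the second phase split each line safely. Field names are assumed never to contain `=`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangeLogFormat {
    // One decoded change line.
    record Entry(String op, String table, String key, Map<String, String> fields) { }

    // Escape the characters that have structural meaning in this hypothetical format.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '\\' -> sb.append("\\\\");
                case '\t' -> sb.append("\\t");
                case '\n' -> sb.append("\\n");
                default -> sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Reverse of escape(): "\t" -> tab, "\n" -> newline, "\\" -> backslash.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                sb.append(next == 't' ? '\t' : next == 'n' ? '\n' : next);
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    // One line per change: OP <tab> table <tab> key <tab> name=value <tab> ...
    static String encode(String op, String table, String key, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append(escape(op)).append('\t').append(escape(table)).append('\t').append(escape(key));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append('\t').append(escape(e.getKey())).append('=').append(escape(e.getValue()));
        }
        return sb.toString();
    }

    static Entry decode(String line) {
        // Literal tabs only occur as separators, because tabs in the data were escaped.
        String[] parts = line.split("\t", -1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 3; i < parts.length; i++) {
            int eq = parts[i].indexOf('='); // first '=' separates name from value
            fields.put(unescape(parts[i].substring(0, eq)), unescape(parts[i].substring(eq + 1)));
        }
        return new Entry(unescape(parts[0]), unescape(parts[1]), unescape(parts[2]), fields);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Acme\tCorp");
        fields.put("note", "a=b\\c");
        String line = encode("UPDATE", "customer", "1", fields);
        System.out.println(line);
        // Prints true: the line round-trips back to the same change.
        System.out.println(decode(line).equals(new Entry("UPDATE", "customer", "1", fields)));
    }
}
```

A real format would also need to distinguish the 4GL unknown value (`?`) from an empty string and preserve field data types, which this sketch ignores.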

#2 Updated by Greg Shah over 5 years ago

A customer has posed this question:

Many customers are on OpenEdge 11.6, while CDC requires 11.7 and an extra license.

Is there another way to implement the two-stage import without CDC?

Without CDC, there are multiple possible approaches:

  • One could use schema triggers to log the changes. This change log would then be "replayed" in the migrated database.
  • An alternative is to compare a full export of the current (modified) database with the full export snapshot that was used in the stage 1 import. Java tools could be written to calculate the diffs by comparing the .d files.
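The snapshot comparison can be sketched in Java as a keyed diff of one table. This minimal sketch assumes the .d files have already been parsed so that each snapshot is a map from a unique key to the record's full text; that parsing step, and the `SnapshotDiff`/`Diff` names, are hypothetical. Rows present only in the old snapshot become deletes, rows present only in the new one become inserts, and rows whose content differs become updates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SnapshotDiff {
    enum Op { INSERT, UPDATE, DELETE }

    // For DELETE, row is null; otherwise it holds the record's new content.
    record Diff(Op op, String key, String row) { }

    // Compares two snapshots of one table, each mapping a unique key to the full record text.
    static List<Diff> diff(Map<String, String> before, Map<String, String> after) {
        List<Diff> out = new ArrayList<>();
        // Deletes are listed first so that, when replayed, unique values held by deleted
        // rows are freed before new rows are inserted. TreeMap gives a deterministic order.
        for (String key : new TreeMap<>(before).keySet()) {
            if (!after.containsKey(key)) out.add(new Diff(Op.DELETE, key, null));
        }
        for (Map.Entry<String, String> e : new TreeMap<>(after).entrySet()) {
            if (!before.containsKey(e.getKey())) {
                out.add(new Diff(Op.INSERT, e.getKey(), e.getValue()));
            } else if (!before.get(e.getKey()).equals(e.getValue())) {
                out.add(new Diff(Op.UPDATE, e.getKey(), e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Record text loosely modeled on .d lines (space-separated, quoted strings).
        Map<String, String> stage1 = Map.of("1", "1 \"Acme\"", "2", "2 \"Globex\"");
        Map<String, String> current = Map.of("1", "1 \"Acme Corp\"", "3", "3 \"Initech\"");
        diff(stage1, current).forEach(System.out::println);
    }
}
```

Holding both snapshots in memory will not scale to the very large tables that motivate this feature; for those, the same comparison can be done as a streaming merge over two exports sorted by key.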

I'm sure there are other approaches too.
