Feature #8786

incremental schema conversion and updates

Added by Greg Shah about 2 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date:
Due date:
% Done: 0%
billable: No
vendor_id: GCD
version_reported:
version_resolved:

Related issues

Related to Conversion Tools - Feature #6320: sharded schemata (New)

History

#2 Updated by Greg Shah about 2 months ago

Over time, developing 4GL code naturally requires schema updates. For existing non-trivial persistent database instances, there must be some kind of process to apply those updates without a dump and load of the entire database. This requirement exists in the converted system as well, so long as 4GL schema changes are being made.

During conversion, the schema is emitted as SQL dialect-specific DDL scripts. In p2j.cfg.xml, you specify the dialects to be supported, and a set of DDL is generated for each one. These scripts are used to create new database instances.

Once a database instance has been created, future schema updates have traditionally been left to the customer. We have multiple customers that use some variant of the following manual process:

  1. Save off the ddl/ output from the baseline conversion. The baseline conversion is the last conversion run against the .df before the changes that now need to be applied were made.
  2. Update the code and schemata in the project and run a new conversion.
  3. Compare the new ddl/ contents to the old ddl/ contents. You can use your favorite diffing tools.
  4. Extract the diffs. With little to no editing, these are the basis for a SQL script that applies the schema changes to the database for that dialect (PostgreSQL, MariaDB...).
  5. If needed, edit the scripts to initialize data for new columns and to migrate data for changed columns.
  6. Use psql to run the scripts against a specific, already existing and populated database instance. This can be automated.

Steps 1 through 5 are done once per schema update. Step 6 is done once per database instance that already exists.
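
As an illustration of steps 4 through 6, here is a minimal sketch of what a diff-derived upgrade script might look like for the PostgreSQL dialect. The table, column, index and file names are hypothetical; a real script would contain whatever the comparison of the two ddl/ trees produced, plus any data initialization from step 5.

  -- Hypothetical v1.0 -> v1.1 upgrade script for the PostgreSQL dialect,
  -- assembled from the ddl/ diff (step 4) plus data initialization (step 5).
  BEGIN;

  -- New column that appears only in the v1.1 DDL.
  ALTER TABLE customer ADD COLUMN loyalty_level integer;

  -- Initialize the new column so existing rows end up with the same values a
  -- freshly created v1.1 database would hold.
  UPDATE customer SET loyalty_level = 0;

  -- New index that appears only in the v1.1 DDL.
  CREATE INDEX idx_customer_loyalty ON customer (loyalty_level);

  COMMIT;

Step 6 is then a matter of running that script against each existing instance, e.g. psql -d <dbname> -f upgrade_1.0_to_1.1.sql.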

The idea is that these scripts should leave an upgraded database instance with a schema identical to that of a newly created instance. For example:

  • Conversion is run on 4GL app/schema v1.0.
  • Installations A and B go into production with the converted code for v1.0. Each has a database created using the DDL that came from the v1.0 conversion.
  • 4GL devs write changes to create 4GL app/schema v1.1. There are schema changes, so the .df is a superset of the v1.0 version.
  • Conversion is run for 4GL app/schema v1.1.
  • 4GL devs use the manual process noted above to create the differential v1.0 to v1.1 upgrade DDL.
  • Installations C and D go into production with the converted code for v1.1.
  • Installations A and B get upgraded to use the new v1.1 code, and the differential upgrade scripts are used to update their databases to the exact same schema as if they had been newly created with v1.1.

You should not be able to tell the difference between the schemata at A, B, C or D. They should be identical regardless of which installation path they took.

Note that we have one customer that has built their own automated process to create these scripts and they have their own process to apply those scripts in deployments (if I understand correctly).

This task is meant for us to create a tool that is part of conversion, which would build these scripts automatically. It will have to know the baseline version and will have to retain knowledge of the schema from that version so that the diffs can be used to emit the proper script output. There is no sense in leaving this to customers, since a manual scripting process is error prone AND it is easy to end up with different schemata for different installation paths, which must never happen.

#4 Updated by Roger Borrello about 2 months ago

For my customer, and in hotel_gui, I have an archive.incremental task in build.xml which attempts to collect all the artifacts that are required to perform an incremental conversion. Just a mention that these artifacts could be retrieved from an archive like this, and used to generate the delta per your specifications.

#5 Updated by Ovidiu Maxiniuc about 1 month ago

I would like to add a few thoughts here. We need to analyse all kinds of mutations that can happen during an upgrade from schema 1.0 to 1.1.
  • the simplest case is when a new table is added. The CREATE statement for it is just added to the 1.0 DDL;
  • next, a new field is added. Similarly, an ALTER TABLE ADD COLUMN is generated; in this case a default value should be provided, since we cannot assume the INITIAL value or NULL is valid for already existing records;
  • similarly for indices;
  • the cases where a table or a field is deleted are pretty simple as well;
  • things get more complicated if a table/field is renamed. Dropping and recreating might not be the correct choice, since there may be data stored in them and we do not want to lose it;
  • the most complicated case, I think, is when the 4GL team chooses to change the type of a field. Usually an int is widened to an int64, or a date to datetime or datetime-tz, but sometimes a logical gets changed to int, or some other type is changed to character. These are not trivial operations: a full column must be updated, and additional parameters are necessary for this: case-sensitivity, time zone, decimal part. If a field is morphed to datetime-tz, the additional offset column must also be injected.

Of course, this is not a complete list. I am sure we will encounter many other cases that will require solutions as well.
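
For the simpler of these cases the per-dialect statements are straightforward. A minimal PostgreSQL sketch with hypothetical table and column names follows (the harder conversions, such as logical to int or injecting the datetime-tz offset column, would need per-case scripting beyond this):

  -- Illustrative PostgreSQL statements for some of the mutation cases above;
  -- table and column names are hypothetical.

  -- New field: add the column and give existing rows an explicit value, since
  -- neither NULL nor the 4GL INITIAL value can be assumed to be valid.
  ALTER TABLE order_line ADD COLUMN discount numeric(10,2);
  UPDATE order_line SET discount = 0;

  -- Rename preserves the stored data, unlike drop-and-recreate.
  ALTER TABLE order_line RENAME COLUMN qty TO quantity;

  -- Widening int to int64 maps to integer -> bigint.
  ALTER TABLE order_line ALTER COLUMN quantity TYPE bigint;

  -- Widening date to datetime maps to date -> timestamp; USING converts the
  -- stored values in place.
  ALTER TABLE order_line ALTER COLUMN ship_date TYPE timestamp
      USING ship_date::timestamp;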

#6 Updated by Greg Shah about 1 month ago

things get more complicated if a table/field is renamed. Dropping and recreating might not be the correct choice, since there may be data stored in them and we do not want to lose it;

We may not need to support this. I think it is pretty specialized and probably not easily done in OE either. If this is not easily done in OE, then we can ignore it and assume that the customer will write code to do any such transition.

the most complicated case, I think, is when the 4GL team chooses to change the type of a field. Usually an int is widened to an int64, or a date to datetime or datetime-tz, but sometimes a logical gets changed to int, or some other type is changed to character. These are not trivial operations: a full column must be updated, and additional parameters are necessary for this: case-sensitivity, time zone, decimal part. If a field is morphed to datetime-tz, the additional offset column must also be injected.

I don't think we need to support this. I'm pretty sure OE doesn't make this easy.

#7 Updated by André van Hulzen about 1 month ago

We keep changing our database schema. We basically copied the mechanism we used in OpenEdge. It works like this:

  • We use schema version numbers and we have incremental schema change files for these versions. So schema version 300 has an OpenEdge schema change file 300.df and a FWD schema change file DB00300.sql. These schema change files contain the schema code to get from schema version 299 to 300 (a minimal sketch follows this list).
  • We record the current schema version number in a record in the database.
  • At every startup we check if there are schema change files with a higher number than the recorded version number in the database and we apply them to the database.
  • After every applied schema change file we record the schema version in the database. So if something goes wrong in the next one for some reason, the database contains the correct schema version number.
  • We create the schema change files by hand; we do not create them automatically, since that proved too complicated.
  • We do have tools that can be run on a developer's machine to compare the schema of a full OE conversion with the schema of the FWD application with all the current incremental schema files applied. Both resulting schema files should be exactly the same! Even the slightest difference must be resolved.
  • The downside of this is that it is a manual process. Sometimes people forget to do the comparison. However, most of the time when something goes wrong, it is because they were in a hurry, got overconfident and thought they could skip the comparison. This is why we have Jenkins jobs that run an extra check at least once a day. If these jobs find that something is wrong, it is published on Slack.
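
A minimal sketch of the versioning scheme described above; the bookkeeping table and its columns are hypothetical, not the actual project code:

  -- Hypothetical one-row table recording the current schema version.
  CREATE TABLE IF NOT EXISTS schema_version (
      current_version integer NOT NULL
  );

  -- Sketch of the contents of a schema change file such as DB00300.sql: the
  -- changes that take the schema from version 299 to 300, plus the version
  -- bump, applied in one transaction.
  BEGIN;
  ALTER TABLE customer ADD COLUMN loyalty_level integer;
  UPDATE customer SET loyalty_level = 0;
  UPDATE schema_version SET current_version = 300;
  COMMIT;

At startup the application would then compare the recorded current_version with the available schema change files and apply, in ascending order, every file whose number is higher.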

If you need more information like where to find all this in our code, please let me know.

On the discussion above about what to support and what not: developers do anything the (OE) code lets them do, just because they can. I am a developer.
