Project

General

Profile

Bug #7924

add support for creating an empty database

Added by Alexandru Lungu 9 months ago. Updated 8 days ago.

Status:
Internal Test
Priority:
High
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version_reported:
version_resolved:

7924-20231114.patch Magnifier (4.67 KB) Dănuț Filimon, 11/14/2023 07:07 AM


Related issues

Related to Database - Bug #8883: ensure FWD data import always creates indexes even if no dump file exists for a table New

History

#1 Updated by Alexandru Lungu 9 months ago

  • Assignee set to Dănuț Filimon
  • Status changed from New to WIP

A summary from a customer task - mostly from Eric:

  • Empty database: A database with a fully working schema (including UDFs), but no pre-existing data (other than perhaps converted _User records, if needed)
  • We can create the tables using the generated ddl/schema_table_<name>_<dialect>.sql
  • Indices will be missing, since currently, it is the import process which creates these (we use unique indices during the import to detect duplicate records; however, non-unique indices are not created until after the import, to allow bulk inserts to execute as quickly as possible).
  • I don't think the p2j_id_generator_sequence will be created without the import. This sequence is created after all records have been inserted.
  • If there needs to be authentication of one or more users, we will need those user records imported into the meta_user table.
  • Two ways of enabling this:
    • development environment: ant task named create.empty.db

Problems:

1. From Ovidiu:

The indices are created during the import of each table. If there is no dump file, the table is not touched. If the .d file exists but is empty (its size is 0), it is listed in import log but actually ignored.

The workaround for having the indices created is to have .d files for each table without any records, but just the PCS footer. It would look like this, for all files:

.
PSC
records=0000000000000
timestamp=2016/12/07-14:55:06
numformat=44,46
dateformat=dmy-1950
cpstream=ISO8859-1
.
0000000000

Running the import for this set of files will create all indices. There is another benefit here: these tables will look pristine, so a subsequent import process will allow the tables to be populated, if desired.

Of course, the ddl/schema_index_<name>_<dialect>.sql are available anyway and can be used at the same time with ddl/schema_table_<name>_<dialect>.sql

2. From Ovidiu:

A important issue: while doing some researches in this area, I noted that we have a small difference in index naming. For example, the import utility will attempt to delete and then recreate idx__meta_user_idx__userid while the static conversion will generate the same index with idx__meta_user__userid. Notice the extra _idx token. While working 'normally' (aka running an import without creating the indices) this is not a problem, there is a couple of problems if the indices are created before the import is performed:

* the import cannot delete the existing indices and the performance is penalized at import (because the kept indices are updated);
* after the table is imported, a duplicate index is created with unpredictable results (beside the extra used space).

Danut, please investigate the concern here. I tried to summarize the issue here, but maybe there are still questions left from your part. Please use these days to investigate and start implementing. We need a solution for this in a reasonable time.

Eric / Ovidiu, please feel free to complete with whatever you feel it is relevant here.

#2 Updated by Greg Shah 9 months ago

Please note that the ant task is a development convenience. When deploying to production environments, there will be no conversion project installed. Since there will be no ant task to run, we must have a clean/easy way to implement an empty database that can be run directly from FWD itself.

I'm fine with adding an ant task, but it should just run the simple process in FWD itself.

See #4722 for similar problems that we solved recently for non-empty database imports.

#3 Updated by Alexandru Lungu 9 months ago

Greg, thank you for the heads-up. This exact same thing I wanted to mention, but I missed it. Please note my "Two ways of enabling this", but there is only one way in fact :) The second one was meant to be done just like you state in #7924-2.

#5 Updated by Alexandru Lungu 8 months ago

  • Priority changed from Normal to High

Danut, mind that I will move this in High Priority. Even if it is part of the Database thread, it is also a feature required by a customer in a quite reasonable time.

#6 Updated by Dănuț Filimon 8 months ago

I started working on this task, should the import.sh added in trunk/rev.14773 be extended for this case or a new create_empty.sh be created? There is a lot of useful functionality in import.sh that can be used for creating an empty database.

#7 Updated by Greg Shah 8 months ago

It is better to augment import.sh than to add a new script.

Most importantly: import.sh is meant to be just a wrapper around features that alredy exist in FWD. We want all the "hard stuff" to be in FWD.

#8 Updated by Dănuț Filimon 8 months ago

  • % Done changed from 0 to 50

I extended import.sh to support creation of an empty database. I've committed 7924a/rev.14790, the changes consist of adding a new -e option to import.sh that sets the new empty_import_only variable. The new variable is used in the final import command and is a parameter for schema/import. This parameter is used to initialize ImportWorker.empty (default is false) and this property is used to toggle record processing and table analysis. It is important to note that indexes and sequences are still generated, including p2j_id_generator_sequence (since the value of the imported record doesn't influence if the table is added to the complete set, the dmo is always added when importAsync is called). There's still some investigation that needs to be done on the second issue mentioned by Ovidiu in #7924-1 and the _User records which should be imported (currently no table is imported).

The import "skipping" is done for each table so that only the record processing is skipped (this should allow some degree of flexibility if we want to allow certain tables to be imported e.g _User), the index creation before and after the processing will be done. The table analysis is skipped since there are no records in the table.

#9 Updated by Dănuț Filimon 8 months ago

Fixed a NPE at conversion time in 7924a/rev.14791.

#10 Updated by Dănuț Filimon 8 months ago

The additional idx__ is added when instantiating the index definition in schema/import using data.formatWithEscapedParameter("idx__%s", text). When indexes are created in ImportWorker$Library.createIndexes, they reach Dialect.getCreateIndexString which calls Dialect.getSqlIndexName. H2 and Postgres add the idx__ prefix to the index, but other dialects (MariaDB) don't do so, the name of the index is simply returned.

#11 Updated by Ovidiu Maxiniuc 8 months ago

Please replace data.formatWithEscapedParameter("idx__%s", text) with the simple text. H2 and PostgreSQL will still add another suffix, as in generated DDLs.

More details.
For example idx__meta_user_idx__userid. There are two idx__ tokens here. The first will be kept (it is optional, and comes form the Dialect.getSqlIndexName()), but the second must go (it comes from import.xml). The final name should be idx__meta_user_userid, the same as static conversion generates.

Than please do a conversion and an import test using hotel_gui with h2, psql and mariadb. Check whether the index names are matching (those from ddl/ against those reported in import's log for each dialect).

#12 Updated by Dănuț Filimon 8 months ago

Ovidiu Maxiniuc wrote:

Please replace data.formatWithEscapedParameter("idx__%s", text) with the simple text. H2 and PostgreSQL will still add another suffix, as in generated DDLs.
Than please do a conversion and an import test using hotel_gui with h2, psql and mariadb. Check whether the index names are matching (those from ddl/ against those reported in import's log for each dialect).

This change was already applied together with the _user import, I've also tested conversion using H2 and PostgreSQL on Hotel_GUI yesterday and today I ran the import.sh script on both types of databases.

I currently have a problem with MariaDB, it's related to the following ant job create.db.mariadb: [exec] mariadb: /lib/x86_64-linux-gnu/libncurses.so.6: no version information available (required by mariadb). I recently patched ncurses since it helped me solve a problem with a customer application. After a new setup, it's still not working and it seems that 6.2 version is available, while the wiki specifies version 6.1.

Can I receive some help with this? I've modified the commands to use 6.1 but it wasn't helpful.

#13 Updated by Dănuț Filimon 8 months ago

Committed 7924a/rev.14792. Fixed the additional idx__ and modified the import to always process the records of the meta_user table, even if an empty database is being created.

The only testing left that needs to be done is with MariaDB.

#14 Updated by Ovidiu Maxiniuc 8 months ago

I tested the import using both ant script and fwd's import.sh.

The project's build_db.xml needed some changes and I added them to hotel_gui in rev 94.

The import.sh performed o for empty database (-e) after replacing the mariadb-admin with mysqladmin. I guess this is debatable. However, the normal import failed with some errors like these:

_user.d: SQL: create index user_domain_name on meta_user (domain_name__null,domain_name, sql_only_user__null,sql_only_user, userid__null,userid, recid)
Error: 1072-42000: Key column 'sql_only_user__null' doesn't exist in table
checkout-rooms.d: SQL: create index idx_rc_room on checkout_rooms (room_number__null,room_number, recid)
_user.d: Error processing import data; Error: 1072-42000: Key column 'room_number__null' doesn't exist in table
1 of 1 record(s) successfully processed;  1 record(s) uncommitted due to this error;  0 record(s) dropped.
checkout-rooms.d: Error processing import data; 2 of 2 record(s) successfully processed;  2 record(s) uncommitted due to this error;  0 record(s) dropped.

This is because there seems to be a regression in generating the table DDL for the strict variant of the dialect.

OTOH, I was not able to test the lenient variant because it is not supported by the script. Here is how the script was executed from the project root:

p2j/import.sh -ddata/dump -hlocalhost -p3306 -fp2j/build/lib -abuild/lib/hotel.jar  \
              --db_type=mariadblenient --db_adminid=fwd_admin --db_adminpw=fwd_password \
              --db_userid=fwd_user --db_userpw=fwd_pasword --db_names=hotel

#15 Updated by Dănuț Filimon 8 months ago

Ovidiu Maxiniuc wrote:

I tested the import using both ant script and fwd's import.sh.

The import.sh performed o for empty database (-e) after replacing the mariadb-admin with mysqladmin. I guess this is debatable. However, the normal import failed with some errors like these:
[...]

This is because there seems to be a regression in generating the table DDL for the strict variant of the dialect.
[...]

It seems that MariaDB has a problem that is not related to the changes made here. As you mentioned, ddls contain incorrect named indexes and I can confirm the same after converting the project with a newer version of trunk. For the index sql_only_user__null, __null should probably not exist, correct me if I am wrong.

I've managed to solve the problem with MariaDB after upgrading my OS version, but I am still having issues when trying to connect to the database.

#16 Updated by Dănuț Filimon 8 months ago

... but I am still having issues when trying to connect to the database.

I overlooked the fact that I need to change the port in build.properties. There are no more problems and I am able to replicate the issues mentioned by Ovidiu when importing using MariaDB.

#17 Updated by Ovidiu Maxiniuc 8 months ago

Dănuț Filimon wrote:

It seems that MariaDB has a problem that is not related to the changes made here. As you mentioned, ddls contain incorrect named indexes and I can confirm the same after converting the project with a newer version of trunk. For the index sql_only_user__null, __null should probably not exist, correct me if I am wrong.

In fact, the sql_only_user__null is specific to strict variant of MariaDb. It is a computed column which is used to emulate the order of null elements in indices and sorting because this dialect does not feature a nulls last / nulls first option. So instead of having domain_name as index component, we add domain_name__null,domain_name pair instead. The semantic of this extra column is, as you expect, to test whether its pair the null and force the sorting order accordingly.
This is (one of) the reason we created the lenient variant, where the indices and queries are simplified by ignoring this this aspect. Note that this additional SQL column is created only for indexed columns of the non-lenient mariadb dialect.

However, it is a bit strange. I am looking at the DDLs generated for hotel_gui project and they are generated only for the meta_user table definitions, although the fields in other tables are recognized as indexed column and the index definitions are correctly defined. As noted in my previous note, this is the cause of the failure: the indices attempts to use these computed columns but they are not defined.

#18 Updated by Dănuț Filimon 8 months ago

Ovidiu Maxiniuc wrote:

As noted in my previous note, this is the cause of the failure: the indices attempts to use these computed columns but they are not defined.

I've managed to look over the import issue and found the following case where an undefined computed column is used:
  • The table has the following field defined: addr_id integer not null, the computed column is not generated because it is not a character type. During ddl generation, it reaches Dialect.getCreateIndexString which calls the MariaDbDialect specific method orderByNulls where an index is created on (addr_id_null, addr_id). The field should not exist since the property is defined as not null so the table definition is correct, but the index is created without knowing that the index field can't be null.

#19 Updated by Dănuț Filimon 8 months ago

  • File 7924-20231114.patchMagnifier added
  • % Done changed from 50 to 70
  • Status changed from WIP to Review

I attached a patch that solves the issue. Ovidiu, can you retest MariaDB and tell me if there are any more problems when using it? You'll have to reconvert the application to generate the new ddls.

#20 Updated by Ovidiu Maxiniuc 5 months ago

Dănuț Filimon wrote:

Ovidiu Maxiniuc wrote:

As noted in my previous note, this is the cause of the failure: the indices attempts to use these computed columns but they are not defined.

I've managed to look over the import issue and found the following case where an undefined computed column is used:
  • The table has the following field defined: addr_id integer not null, the computed column is not generated because it is not a character type. During ddl generation, it reaches Dialect.getCreateIndexString which calls the MariaDbDialect specific method orderByNulls where an index is created on (addr_id_null, addr_id). The field should not exist since the property is defined as not null so the table definition is correct, but the index is created without knowing that the index field can't be null.

Indeed, the addr_id integer not null column should not create (addr_id_null, addr_id) pair in the index definition. However, the reason is not because its type is character but because it is mandatory, Ie, not null. (Since it is defined as mandatory / not null by SQL constraints, we do not need to add this overhead to index definition). But we do need to have these generated columns in case of other types of nullable datatypes which are index components.

#21 Updated by Dănuț Filimon 5 months ago

  • Status changed from Review to WIP

Ovidiu Maxiniuc wrote:

Indeed, the addr_id integer not null column should not create (addr_id_null, addr_id) pair in the index definition. However, the reason is not because its type is character but because it is mandatory, Ie, not null. (Since it is defined as mandatory / not null by SQL constraints, we do not need to add this overhead to index definition). But we do need to have these generated columns in case of other types of nullable datatypes which are index components.

I've made an assumption based on the field types that were allowing this type of index definition. In this case, we should mark fields as mandatory so that we can create the correct index definition. I see that P2JField has a mandatory class member variable, maybe I can start from there.

#22 Updated by Dănuț Filimon 5 months ago

I've looked into DDLGeneratorWorker and how we can make use of the mandatory and while debugging the conversion I noted that the P2JField for the table are created after the index so we can't simply get the P2JField instance corresponding to the index component when it is created. And another concern is adding an additional field to the P2JIndex.addComponent() requires multiple changes to methods that eventually use it and in some cases, the mandatory value might not be available.

Is the order index -> index component -> fields important? The rules specify this order, but I did not did upwards to actually look into what rules are used. Even if the order is not necessary, the second concern is still a problem.

#23 Updated by Greg Shah 5 months ago

The order in the ruleset does not matter too much (in this case), but the structure of the tree will drive the rules in that order regardless of changing the sequence/order of the rules in the ruleset. In other words, if the tree is structured such that an index contains index components which contain fields, then normal walk rules will execute at the index node before the contined index component nodes (and so forth).

#24 Updated by Ovidiu Maxiniuc 5 months ago

I think you can access the not-null annotation when configuring the index components.

  • in index components are added using ddl.addIndexComponent() (rules/schema/dmo_common.rules:1024);
  • the added elements are INDEX_COL nodes. They have a refid annotation;
  • the refid annotation is already used (line 1027) to obtain the PROPERTY node which have the not-null annotation;
  • use this annotation as an additional parameter to addIndexComponent() and store it in P2JIndexComponent as a new flag;
  • when the Dialect method is invoked pass the value of the flag;

Note you should also take care of dynamic case. When ready, compare the index and table schema sql files to see whether they match.

#25 Updated by Dănuț Filimon 5 months ago

Ovidiu Maxiniuc wrote:

I think you can access the not-null annotation when configuring the index components.

  • in index components are added using ddl.addIndexComponent() (rules/schema/dmo_common.rules:1024);
  • the added elements are INDEX_COL nodes. They have a refid annotation;
  • the refid annotation is already used (line 1027) to obtain the PROPERTY node which have the not-null annotation;
  • use this annotation as an additional parameter to addIndexComponent() and store it in P2JIndexComponent as a new flag;
  • when the Dialect method is invoked pass the value of the flag;

Note you should also take care of dynamic case. When ready, compare the index and table schema sql files to see whether they match.

This is perfect. I wanted to know initially if we can get the fields before the index since we could just look into the table fields when the index component is created in addIndexComponent, but adding the parameter to addIndexComponent is better.

I will go ahead with this implementation.

#26 Updated by Dănuț Filimon 5 months ago

The changes work well, I tested conversion with hotel_gui and compared the index ddls for mariadb (before and after). When comparing the indices, I confirmed that __null is not added to fields that do not have not null in the table definition. At the same time, I ended up making more changes to the rules and fwd methods since both DDLGeneratorWorker.addIndexComponent and P2JIndex.addComponent are used in the rules and in both cases the not-null property is available for use. I'm planning to clear up the implementation and see if I can adjust anything. The default value for not-null is false, for cases where the mandatory value can't be found (there is no Property, P2JIndexComponent available, e.g. IndexMetadataCollector.retrieveIndexData).

#27 Updated by Dănuț Filimon 5 months ago

  • % Done changed from 70 to 90
  • Status changed from WIP to Review

Committed 7924a/rev.14793.

Based on the error from #7924-14 I noticed that the column does not exist because it is a boolean and the field for the table is not generated in DDLGeneratorWorker.addFieldImpl because it was not a character. At the same time, __null columns are generated for not null fields which are part of indices.

  • Mandatory character properties that are part of indices do not generate a second index component with __null;
  • Added a MANDATORY flag for index components;
  • Changes rules to fit the addition of the not-null value;

I am also looking into removing mandatory fields that are augmented with __null from the table schema ddl.

Please review.

#28 Updated by Ovidiu Maxiniuc 5 months ago

Review of 7924a/rev.14793.
The code seems fine, except two issues:

1. In case of nullable fields, why injection of generated <field>_null (MariaDbDialect:191) is conditioned by the type of field being a character?

For example, for the table:

ADD TABLE "pt1" 
  AREA "Schema Area" 
  DUMP-NAME "pt1" 

ADD FIELD "f1" OF "pt1" AS integer 
  FORMAT "->,>>>,>>9" 
  INITIAL "0" 
  POSITION 2
  MAX-WIDTH 4
  ORDER 10

ADD FIELD "f3" OF "pt1" AS integer 
  FORMAT "->,>>>,>>9" 
  INITIAL "0" 
  POSITION 4
  MAX-WIDTH 4
  ORDER 10

ADD INDEX "idx3" ON "pt3" 
  AREA "Schema Area" 
  UNIQUE
  PRIMARY
  INDEX-FIELD "f3" ASCENDING
the expected DDLs are as follows:
create table pt1 (
   recid bigint not null,
   f1 integer,
   f3 integer,
   f3__null boolean generated always as (f3 is null) virtual,
   primary key (recid)
);
drop index if exists idx3 on pt1;
create unique index idx3 on pt1 (f3__null,f3);

The f3__null is created/added as an index component regardless of f3's type (in this case it is an integer, not a character).

2. In case of MariaDbDialect, am not sure the proper order by clause is assembled in order to force/trick the planner to use the desired index.

I will do a smoke-test myself after this issues are fixed.

#29 Updated by Dănuț Filimon 5 months ago

Ovidiu, the problem does not start from the index, but from the table field. If you take a look at DDLGeneratorWorker.addFieldImpl you will notice that the additional field is created when isChar && dialect.injectComputedColumns() && isIndexComponent(field, table) condition is true. There are no other types that generate it aside from character so the method MariaDbDialect.getComputedColumnString is not called.

In fact, the f3__null field will not be generated. This is also backed by the error you mentioned in #7924-14 where the sql_only_user is a boolean field and sql_only_user__null is not generated. Let me know if f3__null should actually be generated since this is originally not done so. If this is the case, I can have this fixed immediately.

#30 Updated by Ovidiu Maxiniuc 5 months ago

  • % Done changed from 90 to 70

Dănuț,

sql_only_user__null must be generated.

This thing seemed familiar to me. I did a bit of research and found out why. I was working on a non-stock FWD, which includes the 8009a. This branch has been frozen last 2 months because the main issue in #8009 was identified at other level. Nonetheless, I think the code in that branch should go into trunk, and this might be the chance. I am talking about changes in DDLGeneratorWorker. Notice there were two methods for adding fields, with slightly different implementations. The character type you mention above was only checked if the filed was the last component of the index. Of course this is not correct.

#31 Updated by Dănuț Filimon 5 months ago

  • % Done changed from 70 to 100

Issue has been fixed in 7924a.rev.14794. All field types can be augmented with __null for MariaDbDialect when they are part of an index, but only when they are not mandatory. At the same time I expanded the import.sh to handle MariaDb lenient version with --db_type=mariadblenient, I noticed it when you tried to use it in #7924-14 and I've been trying to come back to it.

This should include all functionality for creating an empty database for each dialect which was the purpose of this issue.

#32 Updated by Constantin Asofiei 16 days ago

  • Related to Bug #8883: ensure FWD data import always creates indexes even if no dump file exists for a table added

#33 Updated by Dănuț Filimon 12 days ago

Rebased 7924a to latest trunk/rev.15306, now the branch is at revision 15311.

#34 Updated by Greg Shah 12 days ago

Ovidiu: Please review.

#35 Updated by Ovidiu Maxiniuc 11 days ago

Review of 7924a, rev.15307-15311.

Good job!
The majority of the changes are required for carrying along the mandatory flag needed by the MariaDB non lenient dialect when the indexes are re-created.
I saw no problem in the new code except for:

  • MariaDbDialect.java:
    • the value returned by orderByNulls() does not look correct. +1 should be added only if (!mandatory) was processed;

#36 Updated by Dănuț Filimon 8 days ago

  • Status changed from Review to Internal Test

Ovidiu Maxiniuc wrote:

Review of 7924a, rev.15307-15311.
I saw no problem in the new code except for:

  • MariaDbDialect.java:
    • the value returned by orderByNulls() does not look correct. +1 should be added only if (!mandatory) was processed;

Fixed in 7924a/rev.15312.

Moving to Internal Test.

Also available in: Atom PDF