Project

General

Profile

Bug #7454

Make ValueStringIgnoreCase the default generated value for setString in FWD-H2

Added by Alexandru Lungu 11 months ago. Updated 4 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version:

Related issues

Related to Database - Bug #7452: Analyze memory of the value cache from FWD-H2 and eventually increase size WIP

History

#1 Updated by Alexandru Lungu 11 months ago

This task is related to setString parameter setting in FWD-H2. Most of the strings we are using with this option are for case-insensitive fields (at least in _temp). We can design a FWD-H2 option OPTIMIZE_IGNORE_CASE that forces ValueStringIgnoreCase to be created on setString instead of the default ValueString. This way we avoid extra internal casts from ValueString to ValueStringIgnoreCase.

I expect to see this being implemented synchronously in FWD (add option for _temp) and FWD-H2 (add support for the option).
At this point I am not 100% sure that such change is safe, so this needs extensive testing and profiling.

#2 Updated by Alexandru Lungu 11 months ago

  • Related to Bug #7452: Analyze memory of the value cache from FWD-H2 and eventually increase size added

#3 Updated by Alexandru Lungu 11 months ago

  • Status changed from New to WIP
  • Assignee set to Dănuț Filimon

Danut, I created 7454a_h2 for this task. The change is not complex, but confirming this is the right thing to do is hard. I will need your expertise on feature testing and profiling on this one.

#4 Updated by Dănuț Filimon 11 months ago

Committed 7454a_h2/rev.24 which adds the OPTIMIZE_IGNORE_CASE database property, it is false by default.

I created a profiling test that consists of inserting a large number of records (1 million - 5 million) on a table with 5 to 40 fields where all fields are of type varchar_ignorecase:
Test type Default (ms) OPTIMIZE_IGNORE_CASE = true (ms) Difference (%)
1m records, 5 fields 4327.166 4178.833 -3.43
5m records, 5 fields 25465 25118.833 -1.36
1m records, 10 fields 7880.5 7509 -4.71
1m records, 20 fields 15294.66 14622.833 -4.39
1m records, 40 fields 31040 28919.5 -6.83

The average improvement is of 4%.

It might not be a problem to add this as a feature, most of the FWD-H2 tests pass with a few exceptions that I still need to look further into:
  • TestRtrim: there were previous problems with mismatching results between 4GL and FWD, this change might solve/create more problems;
  • TestMetaData: A test which matches a the column type of the values from the ResultSet and VARCHAR type directly. It can be ignored, all original tests will not use OPTIMIZE_IGNORE_CASE;
  • TestPreparedStatement: calls toString() on a PreparedStatement after setting a string parameter. It's the same situation encountered previously.

Overall it is great that most of the tests are not greatly impacted by this change, I will continue by creating a test file for this property and investigating TestRtrim results.

#5 Updated by Alexandru Lungu 11 months ago

  • % Done changed from 0 to 100
  • Status changed from WIP to Review

#6 Updated by Alexandru Lungu 11 months ago

Review of 7454a_h2/rev.24

  • I am OK with the changes. Good job!

I will start a testing + profiling round and get back to you when ready.

Danut, please fix the regression tests and do the same testing on a large customer application. Please get a bit deep down with the investigation to check that there are no performance issues using the debugger (no extra casts, etc.). Also, mind that we shall add OPTIMIZE_IGNORE_CASE in FWD. You shall test this locally. We will "officially" commit OPTIMIZE_IGNORE_CASE to FWD when FWD-H2 changes are completely tested.

#7 Updated by Dănuț Filimon 11 months ago

I looked into the rtrim test and the failed assertion was caused by something that should've been expected. If you create a table with a varchar_casesensitive field and insert the value 'A', you can do 2 prepared statements where you can try to select the inserted value by trying 'A' and then 'a'. In this case, since the field is case-sensitive, searching for 'a' will result into an empty ResultSet. Using OPTIMIZE_IGNORE_CASE will cause the 'a' to be ValueStringIgnoreCase and at the end, the ResultSet will contain the inserted row.

I am currently working on implementing a solution for this problem, the Value should be aware of the OPTIMIZE_IGNORE_CASE property when deciding on the higher order type which is used when casting two values of different types.

#8 Updated by Dănuț Filimon 10 months ago

The problem mentioned in #7454-7 is almost solved, all there is to do is to test it a bit more, I even managed to write a TestOptimizeIgnoreCase class where I added most of the troublesome statements that failed with the new property. When a comparison needs to be made between a ValueString and a ValueStringIgnoreCase, and OPTIMIZE_IGNORE_CASE is set to TRUE, then ValueString will have a higher order and the comparison will be done between ValueString instances. After this change a lot more tests failed and I took them one by one and analyzed them, here's a summary of my findings:
  • OPTIMIZE_IGNORE_CASE=TRUE should always be paired with the RTRIM=TRUE property to avoid spaces making a difference in a comparison between ValueString and ValueStringFixed.
  • There are some cases when the H2 behavior changed, but from the 4GL perspective it is correct.

#9 Updated by Dănuț Filimon 10 months ago

Committed 7454a_h2/rev.25.
  • Renamed OPTIMIZE_IGNORE_CASE to DEFAULT_IGNORE_CASE
  • Fixed test cases to handle this property by default
  • Fixed a regression caused by running similar statements that should return rows
  • Fixed individual TestRtrim test case (unrelated to current issue)
  • Added TestDefaultIgnoreCase test case

Please review.

EDIT: I forgot to bzr add the test case.

#10 Updated by Alexandru Lungu 10 months ago

Review

  • In Query, the comparison is not correct. If a and b don't have the same value type, it doesn't mean we should return false. Note that areEqual does implicit cast so it may succeed in comparing values of different types. Also, if you choose to handle the STRING_IGNORECASE in a particular manner, consider both return true and return false cases. In your implementation, this the values are the same, you fallback to areEqual; I guess you can short0circuit this to continue.
  • Is is safe to use false as defaultIgnoreCase in Value.getHigherType? This seems quite risky.

I didn't want to get into the tests as there are a lot. I just want to make sure your cover enough cases. Please consider:

  • IGNORECASE / CASESENSITIVE table field; this field set in an index
  • setString as parameter for:
    • value to be inserted in the table
    • value to be updated to the table
    • WHERE clause
      • make sure the WHERE uses =, < and > over the indexed field
  • Try simple where expressions ('a' = 'A') - static and dynamic (? = 'A', 'a' = ?, etc.)
  • Try to use joins over the character field; something like join tt2 on tt2.f2 = tt1.f1, where f1 and f2 can be of any combination of IGNORECASE / CASESENSITIVE
  • Try ? = ? with setString ignore-case

As a performance check, please use your FWD case-sensitive, case-insensitive you used when implementing RTRIM. Extend them into performance tests. Compare them with/without DEFAULT_IGNORE_CASE option.

I am really excited about these changes; having H2 work with string cases just like 4GL looks like a big deal.

#11 Updated by Dănuț Filimon 10 months ago

Committed 7454a_h2/rev.26. I added the missing test file and fixed the condition in Query as specified in #7454-10.
Alexandru Lungu worte:

Is is safe to use false as defaultIgnoreCase in Value.getHigherType? This seems quite risky.

I was thinking that it may cause problems if true was used instead, but I was wrong. Now I changed it to use true.

As a performance check, please use your FWD case-sensitive, case-insensitive you used when implementing RTRIM. Extend them into performance tests. Compare them with/without DEFAULT_IGNORE_CASE option.

Still have some work to do on the performance test.

#12 Updated by Dănuț Filimon 10 months ago

Before talking about the results of the performance tests, here is a list of short terms that I will use and their meanings:
  • CI-CS - case insensitive database field and case sensitive variable
  • CS-CS - case sensitive database field and case sensitive variable
  • CI-CI - case insensitive database field and case insensitive variable
  • CS-CI - case sensitive database field and case insensitive variable
  • Case-sensitivity - the name of the test that handles simple string comparisons
  • Rtrim - test that handles string comparison with multiple spaces at the end of the string
  • Ltrim - test that handles string comparison with multiple leading spaces in a string
  • Space - test that handles the comparison of empty strings/strings only containing spaces

7454a_h2/rev.26 (default) means that DEFAULT_IGNORE_CASE property is not used, which defaults to false.
7454a_h2/rev.26 (true) means that DEFAULT_IGNORE_CASE property is used and it is set to true.

I finished profiling the changes, here are the results (measured in ):
CI-CS CS-CS CI-CI CS-CI
Branch Case-sensitivity Rtrim Ltrim Space Case-sensitivity Rtrim Ltrim Space Case-sensitivity Rtrim Ltrim Space Case-sensitivity Rtrim Ltrim Space
7454a_h2/rev.23 3598.4 2246.8 1740.4 2353.6 1504.8 1936.8 1655.2 2286 1488.2 1967.2 1696.8 2260.4 1480.2 1924 1639.2 2232.2
7454a_h2/rev.26 (default) 3472.4 2167.8 1769.6 2326.2 1516.8 1889 1635.8 2239.2 1512.6 1943 1652.8 2235 1443.8 1880.2 1618.2 2179.8
7454a_h2/rev.26 (true) 3338.2 2148.2 1684.4 2350.4 1528.4 1957.8 1668.4 2268.8 1451.4 1877.2 1619.8 2187 1433.6 1921.8 1620 2226
From the results above, the performance improvement is:
CI-CS CS-CS CI-CI CS-CI
Branch Case-sensitivity Rtrim Ltrim Space Case-sensitivity Rtrim Ltrim Space Case-sensitivity Rtrim Ltrim Space Case-sensitivity Rtrim Ltrim Space
7454a_h2/rev.26 (default) -3.501% -3.516% +1.677% -1.164% +0.797% -2.467% -1.172% -2.047% +1.639% -1.23% -2.593% -1.123% -2.459% -2.276% -1.281% -2.347%
7454a_h2/rev.26 (true) -7.230% -4.388% -3.217% -0.135% +1.568% +1.084% +0.797% -0.752% -2.472% -4.575% -4.537% -3.247% -3.148% -0.114% -1.171% -0.277%

The tests consisted of creating temporary tables with 50k records and running checking if any match is found (the test will pass if any record is found). I also managed to extend the tests to not only compare simple strings, but strings with different amount of leading/trailing spaces and so on.

Even if I increased the numbers of records created from 10k to 50k already, the time spent executing the test is still low so I calculated the results as such. There is an overall improvement when using the latest version of 7454a_h2 and I think that the results are pretty good.

#13 Updated by Alexandru Lungu 10 months ago

I found a regression while testing. I attempted a fix in 7454a_h2 / rev. 27. This has a wide impact on direct-access, as the driver used DataType.convertToValue to convert from String to H2 internal string representation. As it was defaulting to ValueString, the comparisons were case-sensitive.

Danut, please review. If you have other suggestions, please advice.

#14 Updated by Alexandru Lungu 10 months ago

Also, Danut, please make the flag naming consistent with the option. Right now, the option is DEFAULT_IGNORE_CASE and the option is OPTIMIZE_IGNORE_CASE.

EDIT: Run a large customer application and put a break-point in ValueString.get. Check if the call sites are OK with using ValueString, instead of ValueStringIgnoreCase. We should lower the uses of ValueString as much as we can when DEFAULT_IGNORE_CASE is set (close to no instance of ValueString).

#15 Updated by Dănuț Filimon 10 months ago

Alexandru Lungu wrote:

I found a regression while testing. I attempted a fix in 7454a_h2 / rev. 27. This has a wide impact on direct-access, as the driver used DataType.convertToValue to convert from String to H2 internal string representation. As it was defaulting to ValueString, the comparisons were case-sensitive.

Danut, please review. If you have other suggestions, please advice.

The fix is correct. I've retrieved DEFAULT_IGNORE_CASE value in a similar manner in JdbcConnection. I've done a quick test just to be sure and it works properly.

Also, Danut, please make the flag naming consistent with the option. Right now, the option is DEFAULT_IGNORE_CASE and the option is OPTIMIZE_IGNORE_CASE.

This change was committed to 7454a_h2/rev.28.

#16 Updated by Alexandru Lungu 10 months ago

  • Status changed from Review to WIP
  • % Done changed from 100 to 70

Great!

I've added in rev. 29 the possibility to make sequences temporary. I just noticed that there are several attempts to flush the sequence for the other users, even if in per-user no-lock temporary H2 database. Therefore, this will be a separate performance increase from this FWD-H2 functionality.

I analyzed where ValueString is still used and there is still some concerns:
  • Character literals are ValueString by default. To be compatible with 4GL, we need character literals to be case insensitive (Parser.read() CHAR_STRING case). Basically, all clauses like tt.f1 = '' will attempt to convert tt.f1 to a case-sensitive value and compare. This is wrong both functionally and performance-wise.
  • We still emit upper clauses from FWD. Thus, I often encounter upper(tt.f1) = 'A'. upper is defined as a function which requires a case-sensitive string and returns a case-sensitive string:
    • Please test with 4GL UPPER function and check its behavior. Try to replicate for FWD-H2. Do the same for the other build-in character functions in Function.
    • FWD itself generates from UDF inside FWD-H2. These "initializing DDL" should correctly honor the ignore-case priority implemented in #7454. These should be eventually refactored.
    • We shall get into FWD and eliminate upper clauses when using FWD-H2 dialect. From my POV, this should go down exactly like RTRIM (right from FQLPreprocessor). However, this can be done independently from the implementation of FWD-H2. Keeping upper is just an performance overhead for now, not a functional concern. I will make a separate task for this eventually.

I created 7454a for any related FWD work.

#17 Updated by Alexandru Lungu 10 months ago

Profiling shows -0.3% improvement. this is decent, but can be way better. I expect way more improvement after we get #7454 done. Note that there are still ~5.000 ValueString.get that are quite useless in my POC, mostly due to character literal interpretation as ValueString instead of ignore-case. Also, the test was done without my temporary sequences change (I expect some extra boost from there).

#18 Updated by Dănuț Filimon 10 months ago

I looked into #7454-16, the change to the Parser works properly since it has access to the database settings.

After testing a customer application, I noticed that there are instances of internal conversion to ValueString when working with meta records, but also found out that user-defined functions will force a conversion from ValueStringIgnoreCase to ValueString. Before the value of the user-defined function is returned, a DataType.getTypeFromClass call returns the String class which forces a conversion to Value.STRING. In the same class, we have getTypeClassName which returns the String class for Value.STRING, Value.STRING_FIXED and Value.STRING_IGNORECASE.

Should we consider changing it? This will require a parameter to be passed through each time a function is called because FunctionAlias/DataType do not store any information related to the database settings, but the session is provided when calling the function.

#19 Updated by Alexandru Lungu 10 months ago

Dănuț Filimon wrote:

After testing a customer application, I noticed that there are instances of internal conversion to ValueString when working with meta records

This may be exactly due to the sequences we use that are not temporary. I added the support for temporary sequences, but you will also need changes in FWD to enable them. Consider them as fixed for now.

but also found out that user-defined functions will force a conversion from ValueStringIgnoreCase to ValueString. Before the value of the user-defined function is returned, a DataType.getTypeFromClass returns the String class which forces a conversion to Value.STRING. In the same class, we have getTypeClassName which returns the String class for Value.STRING, Value.STRING_FIXED and Value.STRING_IGNORECASE.

getTypeFromClass should honor defaultIgnoreCase. That is, if you receive a Java STRING, then you should implicitly consider it is an ignore-case H2 String. Here a change is required, I agree.
getTypeClassName is fine - all 3 FWD-H2 String representations are backed by Java strings.

Should we consider changing it? This will require a parameter to be passed through each time a function is called because FunctionAlias/DataType do not store any information related to the database settings, but the session is provided when calling the function.

In DataType, you should pass down a CastDataProvider (either the session or database).

These are the first-level problems. However, you should go deeper into:
  • Function.getSimpleValue (at least for UPPER, but other string function as well).
  • H2Helper.getPrepareStatements, more exactly Functions, and check they reach the FWD-H2 engine right. By default, Java String parameters are ignore-case string parameters. However, you will need to redo the investigation to check if the UDF are working alright in 4GL and FWD.

Last topic is how case-sensitive characters types will work with FWD-H2 UDF. This is not a concern now, but should be thought of as our changes will regress for case-sensitive types. We will leave this at the end.

#20 Updated by Dănuț Filimon 10 months ago

Found an interesting example when experimenting with functions in H2.

Here is a simple table definition: create table test(name varchar check(name = upper(name)));

In the first experiment we have the following:
- a varchar field compared with it's uppercase version;
- jdbc:h2:mem:;mv_store=false; as the connection url;
- the UPPER function that returns Value.STRING_IGNORECASE;
- the higher order type between ValueString and ValueStringIgnoreCase will be ValueStringIgnoreCase;
The execution of INSERT INTO table1 (f1) VALUES(?) {1: "a"}; will pass and the value "a" will be inserted because the name value is casted to the higher order type, ValueStringIgnoreCase. If the UPPER function returned a Value.STRING, the same statement would result in a constraint violation.

The second experiment is using the same scenario, but with the connection url: jdbc:h2:mem:;mv_store=false;default_ignore_case=true;. Executing the same insert statement will result in a constraint violation because the higher order is ValueString.

I discovered this scenario when running FWD-H2 tests. Changing the functions return type breaks FWD-H2 tests, but from the 4GL perspective it is correct. This change is also not influenced by the default_ignore_case property, all the built-in functions are instantiated using a static block and access to the flag can be provided later through Database.

#21 Updated by Alexandru Lungu 10 months ago

Danut, the following statements contradict:
  • the UPPER function that returns Value.STRING_IGNORECASE;: this theoretically happens when default_ignore_case is set.
  • the higher order type between ValueString and ValueStringIgnoreCase will be ValueStringIgnoreCase;: this happens when default_ignore_case is false.

We can ignore this now. From my POV, I feel like UPPER should be polymorphic. If the target is case-sensitive, it should return ValueString. If the target is case-insensitive, it should return ValueIgnoreCase. Therefore, UPPER shouldn't depend on the flag, so your first statement is irrelevant.

Danut, I would like to have another version of 7454a today/tomorrow. I think H2 functions may be an impediment here:
  • make them polymorphic. We should provide support for case-sensitive UPPER even when default_ignore_case=true is set.
  • I am not sure if two sets of functions are right now. The old tests should work, as the UPPER will honor the underlying case. If the old tests verify a different semantic, they should be refactored.

As for the FWD UDF, they should be eagerly tested across the 4GL behavior. Good news are that we won't have unit tests regressions on these :)

#22 Updated by Dănuț Filimon 10 months ago

Alexandru Lungu wrote:

Danut, the following statements contradict:
  • the UPPER function that returns Value.STRING_IGNORECASE;: this theoretically happens when default_ignore_case is set.

Functions are not influenced by the default_ignore_case flag because it can't be retrieved when the functions are added in the static initialization block from Function. This was just a test to see how changing the return type of the method will affect the existent FWD-H2 tests.

  • the higher order type between ValueString and ValueStringIgnoreCase will be ValueStringIgnoreCase;: this happens when default_ignore_case is false.

default_ignore_case is false by default when it is not specified.

Danut, I would like to have another version of 7454a today/tomorrow. I think H2 functions may be an impediment here:
  • make them polymorphic. We should provide support for case-sensitive UPPER even when default_ignore_case=true is set.
  • I am not sure if two sets of functions are right now. The old tests should work, as the UPPER will honor the underlying case. If the old tests verify a different semantic, they should be refactored.

I am currently working on it and it's possible to get the function based on the database flag, but not the parameter type. Even if we use a function that returns a ValueStringIgnoreCase value, it will be casted to ValueString when compared to another value of the same type (only when the flag is set to true).

#23 Updated by Alexandru Lungu 10 months ago

My point was to do a bit of refactoring on Functions to allow both IGNORECASE and CASESENSITIVE as parameters for such functions. I agree that we can retrieve "the function based on the database flag", but we still can use case-sensitive characters even when the flag is set right. Think of _meta database where we have case-sensitive variables there. What will happen when you will call UPPER? I don't think casting then to ValueStringIgnoreCase is OK, just because the flag was set.

#24 Updated by Dănuț Filimon 10 months ago

Alexandru Lungu wrote:

My point was to do a bit of refactoring on Functions to allow both IGNORECASE and CASESENSITIVE as parameters for such functions. I agree that we can retrieve "the function based on the database flag", but we still can use case-sensitive characters even when the flag is set right. Think of _meta database where we have case-sensitive variables there. What will happen when you will call UPPER? I don't think casting then to ValueStringIgnoreCase is OK, just because the flag was set.

Thank you for explanation. I understand what you mean now.

I began writing a test suite for 4GL functions and found a few problems.
For the following example:

DEFINE TEMP-TABLE tmp2 FIELD f1 AS CHARACTER CASE-SENSITIVE.
CREATE tmp2. tmp2.f1 = 'a'.
FIND FIRST tmp2 WHERE tmp2.f1 EQ UPPER(tmp2.f1) NO-ERROR.
IF AVAILABLE tmp2 THEN MESSAGE "yes". // 4GL: No output, FWD: yes

The converted query looks like this:

silent(() -> new FindQuery(tmp2, "tmp2.f1 = upper(tmp2.f1)", null, "tmp2.recid asc").first());
and the final where clause is:
 where tmp2_1_1__0_._multiplex = ? and ((tmp2_1_1__0_.f1 = tmp2_1_1__0_.f1 or tmp2_1_1__0_.f1 is null and tmp2_1_1__0_.f1 is null)) 

Upon investigating I found out that this is a regression introduced in #7108. P2JH2Dialect handles case-insensitive compare operations for case-insensitive columns and automatically rtrims character columns, the UPPER call is removed from the fql in FQLPreprocessor.preprocess.

I found no other problems with the modifications to UPPER and LOWER functions in H2. SUBSTRING, LEFT-TRIM, REPLACE, RIGHT-TRIM and SUBSTRING are already handled by FWD so I decided to not modify them.

#25 Updated by Dănuț Filimon 10 months ago

Committed 7454a_h2/rev.30 and testcases/rev.2383.
  • Character literals were modified to be ValueStringIgnoreCase by default.
  • UPPER and LOWER functions will return a value with the same type as the argument that it was given.
  • Modified a few test cases in TestDefaultIgnoreCase, some of them had to be changes to fit the usage of character literals and added a new test for upper.
  • Created a test suite for character fields and 4GL functions comparison.

#26 Updated by Alexandru Lungu 9 months ago

  • % Done changed from 70 to 80

Regression tests passed.
Profiling tests shown -0.3% improvement (without temporary sequences). This is modest, but this will go higher once we fix the following review.

I still have a concern with indexOf function. It takes two strings as parameters. It attempts to find the index of the pattern into a string target. However, casting to case-sensitive is attempted. Please test this and fix. If 4GL is doing case-sensitive search when computing the index-of, we can leave it like this / optimize. If not, please investigate and fix.

I just noticed that there are several attempts to flush the sequence for the other users, even if in per-user no-lock temporary H2 database.

Note my rev. 29 commit. It introduced a TEMPORARY keyword for sequences, so they are no longer flushing. Anyway, you should change FWD to append this keyword at the end of H2 _temp generated sequences.
Also, I still have UPPER generated in some of the _temp queries. Is this going to be handled in this task or not? If no, please create another Database task to remove upper from the generated FWD-H2 queries.

Finally, please commit to 7454a any FWD changes required (including the use of TEMPORARY sequences).

#27 Updated by Dănuț Filimon 9 months ago

Alexandru Lungu wrote:

I still have a concern with indexOf function. It takes two strings as parameters. It attempts to find the index of the pattern into a string target. However, casting to case-sensitive is attempted. Please test this and fix. If 4GL is doing case-sensitive search when computing the index-of, we can leave it like this / optimize. If not, please investigate and fix.

This is a mistake on my end, previously in #7454-18 and #7454-19 we discussed about DataType.getTypeFromClass which returns Value.STRING type and should honor defaultIgnoreCase. I must've missed it after reverting some changes in H2 for a commit or didn't find a way to get the flag for this method since there was no CastDataProvider available (we can pass the session from FunctionAlias.getValue where DataType.getTypeFromClass is called).

Also, I still have UPPER generated in some of the _temp queries. Is this going to be handled in this task or not? If no, please create another Database task to remove upper from the generated FWD-H2 queries.

If I remember correctly, upper and rtrim methods were removed but reintroduced later in FQLPreprocessor.mainWalk(). I will check again to make sure and if that is the case, I will create a separate issue. Otherwise I will make this change along with the TEMPORARY keyword for sequences.

#28 Updated by Dănuț Filimon 9 months ago

I've looked into the problem and upper methods are not reintroduced later in FQLPreprocessor.mainWalk(). It was a particular case of simplifyProperties that was left out where existent upper methods that were the parameter of other functions would not be removed. Considering the recent changes in 7454a_h2, we can include this case now.

Committed 7454a/rev.14669. Solved the upper methods still being generated in _temp queries, introduced temporary sequences and added the DEFAULT_IGNORE_CASE property to in-memory temp-tables.

Committed 7454a_h2/rev.31. Added missing change to DataType.getTypeFromClass, now it honors DEFAULT_IGNORE_CASE property value.

#29 Updated by Alexandru Lungu 9 months ago

  • % Done changed from 80 to 100
  • Status changed from WIP to Review

Review of 7454a and 7454a_h2

  • I am OK with the changes.
  • Regression tests passed.
  • I am no longer hitting ValueString.get in warm runs. In server-startup, I see some ValueString.get being called at _meta database initialization, which is fine. In a cold run I have only some calls to ValueString.get due to an in operation (upper(tt.f1) in ('A', 'B')). These are still slow, but are working correctly and I intend to merge this asap, so it is not a priority to solve this now. Danut, please try to recreate upper(tt.f1) in ('A', 'B') with a simple test-case and open a separate Redmine task where you will post the test-case. There might be other such constructs where upper and ValueString are still used, so let the issue have a broader scope: eliminate all usages of upper with _temp FWD-H2.
    • After a second round of tests, I also encountered constructs like upper(tt.f1) is null. This is not OK - the upper call looks redundant.
    • I encountered upper in constructs like upper(tt.f1) like 'a.%'. I think like can work with ignore-case by default if the flag is set. Needs better investigation.
    • I've seen several FWD method calls of numEntries_1, entryIn_1 and entryIn_2. I am not sure when are they used, but it is worth investigating.
    • All of the above should be included in the broader scope issue.
  • Profiling shows decent results. I have several metrics that show the improvement somewhere between -0.4% and -0.6%.

Danut, please let me know if there is anything left to be worked on this task. If everything is ready, I am considering merging it to FWD-H2 trunk, and integrating 7454a into FWD trunk once we have a FWD-H2 artifact uploaded.

#30 Updated by Dănuț Filimon 9 months ago

Alexandru Lungu wrote:

Danut, please let me know if there is anything left to be worked on this task. If everything is ready, I am considering merging it to FWD-H2 trunk, and integrating 7454a into FWD trunk once we have a FWD-H2 artifact uploaded.

There's nothing left to do for this task.

upper(tt.f1) in ('A', 'B') and upper(tt.f1) is null look to be very specific. The first one should still be easily eliminated once we find out the Aast type of the IN keyword and eliminate it (a similar situation with the indexOf, where the parent was a function), this should also extend to LIKE case.

I've seen several FWD method calls of numEntries_1, entryIn_1 and entryIn_2. I am not sure when are they used, but it is worth investigating.

Is upper still being generated in those cases?

I will create a test case for upper(tt.f1) like 'a.%' since it seems easier to recreate and open a new issue. I will still need to find a test case for upper(tt.f1) in ('A', 'B') to verify my assumption on the keyword type.

#31 Updated by Alexandru Lungu 9 months ago

Dănuț Filimon wrote:

upper(tt.f1) in ('A', 'B') and upper(tt.f1) is null look to be very specific.

Not really; upper(tt.f1) is null is generated each time you do something like tt.f1 != "a", because in 4GL, this condition evaluates to true if tt.f1 is unknown. In SQL, tt.f1 != 'a' doesn't evaluate to true if tt.f1 is null. That is why we generate tt.f1 != 'a' or tt.f1 is null - to match the 4GL behavior. I think we actually generate tt.f1 != 'a' or upper(tt.f1) is null, so this kinda creates the problem here.

Also, if you do tt.f1 = tt2.f2, this will generate tt.f1 = tt2.f2 or (tt.f1 is null and tt2.f2 is null) out of the same reasons. In fact, the upper versions are used, if I am not mistaken.

I've seen several FWD method calls of numEntries_1, entryIn_1 and entryIn_2. I am not sure when are they used, but it is worth investigating.

Is upper still being generated in those cases?

Yes.

#32 Updated by Alexandru Lungu 9 months ago

Committed 7454a_h2 to FWD_H2 trunk as rev. 25. Archived 7454a_h2.

Danut, please update H2_Database_Fork with DEFAULT_IGNORE_CASE in "FWD-H2 connection options" section. Consider if "Add a new database property to FWD-H2" section should be updated.

Note that I will merge 7454a into FWD trunk once we upgrade FWD-H2 to 1.25 or later.

#33 Updated by Dănuț Filimon 9 months ago

Alexandru Lungu wrote:

Not really; upper(tt.f1) is null is generated each time you do something like tt.f1 != "a", because in 4GL, this condition evaluates to true if tt.f1 is unknown. In SQL, tt.f1 != 'a' doesn't evaluate to true if tt.f1 is null. That is why we generate tt.f1 != 'a' or tt.f1 is null - to match the 4GL behavior. I think we actually generate tt.f1 != 'a' or upper(tt.f1) is null, so this kinda creates the problem here.

tt.f1 != "a" generates the expected tt.f1 is null but without the upper call. I am still trying to figure out a test where the upper is introduced.

Also, if you do tt.f1 = tt2.f2, this will generate tt.f1 = tt2.f2 or (tt.f1 is null and tt2.f2 is null) out of the same reasons. In fact, the upper versions are used, if I am not mistaken.

Won't the tt2.f2 actually be a substitution parameter and the tt2.f2 is null check be omitted? Same thing happens when doing tt.f1 != tt2.f2.

I've seen several FWD method calls of numEntries_1, entryIn_1 and entryIn_2. I am not sure when are they used, but it is worth investigating.

Is upper still being generated in those cases?

Yes.

UPPER is not generated in the mentioned methods when using 7454a/rev.14669.

#34 Updated by Alexandru Lungu 9 months ago

Dănuț Filimon wrote:

tt.f1 != "a" generates the expected tt.f1 is null but without the upper call. I am still trying to figure out a test where the upper is introduced.

I wonder if tt.f1 was case-sensitive in the first place. I need to double-check.

Won't the tt2.f2 actually be a substitution parameter and the tt2.f2 is null check be omitted? Same thing happens when doing tt.f1 != tt2.f2.

It is a substitution parameter if you have an un-optimized CompoundQuery. If you have a 2-component query that can be optimized (i.e. they have tt.f1 = tt2.f2 and f1, f2 are indexed), you end up with a multi-table query. That will have (tt.f1 is null and tt2.f2 is null), eventually with upper? I need to double-check.

UPPER is not generated in the mentioned methods when using 7454a/rev.14669.

Danut, please test the following. Note that all fields are case-insensitive characters.

   for each tt where
            tt.a = "test" and
            tt.b = "=" and
           (tt.c = "" or tt.c = "?" or tt.c = ?):
[...]
   end.

I get the following query:

select 
    tt.* 
from
    tt42 tt 
where
    tt._multiplex = ? and tt.a = 'test' and tt.b = '=' and (upper(tt.c) in ('', '?') or upper(tt.c) is null)
order by 
        [...]
lazy

#35 Updated by Dănuț Filimon 9 months ago

Alexandru Lungu wrote:

Danut, please update H2_Database_Fork with DEFAULT_IGNORE_CASE in "FWD-H2 connection options" section. Consider if "Add a new database property to FWD-H2" section should be updated.

H2_Database_Fork has been updated with details about the DEFAULT_IGNORE_CASE option. "Add a new database property to FWD-H2" section should not be updated, it only serves as an example on how a new property can be added.

Danut, please test the following. Note that all fields are case-insensitive characters.

   for each tt where
            tt.a = "test" and
            tt.b = "=" and
           (tt.c = "" or tt.c = "?" or tt.c = ?):
[...]
   end.
This test case covers all the current problems with upper. The converted query looks like this:
query0.initialize(tmp1, "upper(tmp1.f1) = 'TEST' and upper(tmp1.f2) = '=' and (upper(tmp1.f3) in ('', '?') or upper(tmp1.f3) is null)", null, "tmp1.recid asc");
This means that the upper already exists and is not introduced later when preprocessing the fql. The solution is to add IN and IS_NULL cases to FQLPreprocessor.simplifyProperties method.

#36 Updated by Alexandru Lungu 9 months ago

This test case covers all the current problems with upper. The converted query looks like this:

Note that the conversion always adds UPPER for character fields because it is independent from the underlying database. In PG we still need the upper, so we just generate upper for all dialect and remove them for dialects/cases we don't need them (FWD-H2).

#37 Updated by Dănuț Filimon 9 months ago

Alexandru Lungu wrote:

Note that the conversion always adds UPPER for character fields because it is independent from the underlying database. In PG we still need the upper, so we just generate upper for all dialect and remove them for dialects/cases we don't need them (FWD-H2).

From previous discussions, I assumed that upper(tmp1.f3) is null) can only be generated when preprocessing, I did not think it could also be done at conversion.

The solution mentioned in #7454-35 was committed to 7454a/rev.14670.

#38 Updated by Eric Faulhaber 6 months ago

Can this issue be closed?

#39 Updated by Dănuț Filimon 6 months ago

Eric Faulhaber wrote:

Can this issue be closed?

7454a was not merged, but 7454a_h2 was. If I remember correctly, Alexandru wanted to do a profiling test with these changes or the h2 trunk at the time had a regression, so this issue can't be closed yet.

Alexandru, should we rebase 7454a with the latest trunk and retest before anything else?

#40 Updated by Alexandru Lungu 6 months ago

Yes! I was caught up with #6720 by now and I didn't want to spoil my profiling test with it. I will attempt to check 7454a today so we can move forward with it.

#41 Updated by Alexandru Lungu 6 months ago

Rebased 7454a to latest trunk. It is now at rev. 14825.

#42 Updated by Alexandru Lungu 6 months ago

  • Status changed from Review to Internal Test

Profiled 7454a now and got -0.6% improvement. This was on-spot with the expectancy (or even quite better).
I will move them to regression testing and let you know of the results.
In the meant-time, please do some smoke tests with a large application.

#43 Updated by Dănuț Filimon 6 months ago

Alexandru Lungu wrote:

Profiled 7454a now and got -0.6% improvement. This was on-spot with the expectancy (or even quite better).
I will move them to regression testing and let you know of the results.
In the meant-time, please do some smoke tests with a large application.

I tested it and everything works well. I'll be waiting for the results of the regression test.

#44 Updated by Alexandru Lungu 6 months ago

Regression tests passed successfully. 7454a can be merged - Greg, waiting for your call

#45 Updated by Greg Shah 6 months ago

You can merge now.

#46 Updated by Alexandru Lungu 6 months ago

  • Status changed from Internal Test to Test

Branch 7454a was merged into trunk as rev. 14833 and archived.

#47 Updated by Alexandru Lungu 4 months ago

This can be closed.

#48 Updated by Greg Shah 4 months ago

  • Status changed from Test to Closed

Also available in: Atom PDF