Bug #7454
Make ValueStringIgnoreCase the default generated value for setString in FWD-H2
100%
Related issues
History
#1 Updated by Alexandru Lungu 11 months ago
This task is related to setString
parameter setting in FWD-H2. Most of the strings we are using with this option are for case-insensitive fields (at least in _temp). We can design a FWD-H2 option OPTIMIZE_IGNORE_CASE
that forces ValueStringIgnoreCase
to be created on setString
instead of the default ValueString
. This way we avoid extra internal casts from ValueString
to ValueStringIgnoreCase
.
I expect to see this being implemented synchronously in FWD (add option for _temp) and FWD-H2 (add support for the option).
At this point I am not 100% sure that such change is safe, so this needs extensive testing and profiling.
#2 Updated by Alexandru Lungu 11 months ago
- Related to Bug #7452: Analyze memory of the value cache from FWD-H2 and eventually increase size added
#3 Updated by Alexandru Lungu 11 months ago
- Status changed from New to WIP
- Assignee set to Dănuț Filimon
Danut, I created 7454a_h2 for this task. The change is not complex, but confirming this is the right thing to do is hard. I will need your expertise on feature testing and profiling on this one.
#4 Updated by Dănuț Filimon 11 months ago
Committed 7454a_h2/rev.24 which adds the OPTIMIZE_IGNORE_CASE
database property, it is false by default.
varchar_ignorecase
:
Test type | Default (ms) | OPTIMIZE_IGNORE_CASE = true (ms) | Difference (%) |
---|---|---|---|
1m records, 5 fields | 4327.166 | 4178.833 | -3.43 |
5m records, 5 fields | 25465 | 25118.833 | -1.36 |
1m records, 10 fields | 7880.5 | 7509 | -4.71 |
1m records, 20 fields | 15294.66 | 14622.833 | -4.39 |
1m records, 40 fields | 31040 | 28919.5 | -6.83 |
The average improvement is of 4%.
It might not be a problem to add this as a feature, most of the FWD-H2 tests pass with a few exceptions that I still need to look further into:TestRtrim
: there were previous problems with mismatching results between 4GL and FWD, this change might solve/create more problems;TestMetaData
: A test which matches a the column type of the values from the ResultSet andVARCHAR
type directly. It can be ignored, all original tests will not useOPTIMIZE_IGNORE_CASE
;TestPreparedStatement
: callstoString()
on aPreparedStatement
after setting a string parameter. It's the same situation encountered previously.
Overall it is great that most of the tests are not greatly impacted by this change, I will continue by creating a test file for this property and investigating TestRtrim
results.
#5 Updated by Alexandru Lungu 11 months ago
- % Done changed from 0 to 100
- Status changed from WIP to Review
#6 Updated by Alexandru Lungu 11 months ago
Review of 7454a_h2/rev.24
- I am OK with the changes. Good job!
I will start a testing + profiling round and get back to you when ready.
Danut, please fix the regression tests and do the same testing on a large customer application. Please get a bit deep down with the investigation to check that there are no performance issues using the debugger (no extra casts, etc.). Also, mind that we shall add OPTIMIZE_IGNORE_CASE
in FWD. You shall test this locally. We will "officially" commit OPTIMIZE_IGNORE_CASE
to FWD when FWD-H2 changes are completely tested.
#7 Updated by Dănuț Filimon 11 months ago
I looked into the rtrim test and the failed assertion was caused by something that should've been expected. If you create a table with a varchar_casesensitive field and insert the value 'A', you can do 2 prepared statements where you can try to select the inserted value by trying 'A' and then 'a'. In this case, since the field is case-sensitive, searching for 'a' will result into an empty ResultSet. Using OPTIMIZE_IGNORE_CASE
will cause the 'a' to be ValueStringIgnoreCase
and at the end, the ResultSet
will contain the inserted row.
I am currently working on implementing a solution for this problem, the Value
should be aware of the OPTIMIZE_IGNORE_CASE
property when deciding on the higher order type which is used when casting two values of different types.
#8 Updated by Dănuț Filimon 10 months ago
TestOptimizeIgnoreCase
class where I added most of the troublesome statements that failed with the new property. When a comparison needs to be made between a ValueString
and a ValueStringIgnoreCase
, and OPTIMIZE_IGNORE_CASE
is set to TRUE
, then ValueString
will have a higher order and the comparison will be done between ValueString
instances. After this change a lot more tests failed and I took them one by one and analyzed them, here's a summary of my findings:
OPTIMIZE_IGNORE_CASE=TRUE
should always be paired with theRTRIM=TRUE
property to avoid spaces making a difference in a comparison betweenValueString
andValueStringFixed
.- There are some cases when the H2 behavior changed, but from the 4GL perspective it is correct.
#9 Updated by Dănuț Filimon 10 months ago
- Renamed
OPTIMIZE_IGNORE_CASE
toDEFAULT_IGNORE_CASE
- Fixed test cases to handle this property by default
- Fixed a regression caused by running similar statements that should return rows
- Fixed individual
TestRtrim
test case (unrelated to current issue) AddedTestDefaultIgnoreCase
test case
Please review.
EDIT: I forgot to bzr add
the test case.
#10 Updated by Alexandru Lungu 10 months ago
Review
- In
Query
, the comparison is not correct. Ifa
andb
don't have the same value type, it doesn't mean we should return false. Note thatareEqual
does implicit cast so it may succeed in comparing values of different types. Also, if you choose to handle theSTRING_IGNORECASE
in a particular manner, consider bothreturn true
andreturn false
cases. In your implementation, this the values are the same, you fallback toareEqual
; I guess you can short0circuit this tocontinue
. - Is is safe to use
false
as defaultIgnoreCase inValue.getHigherType
? This seems quite risky.
I didn't want to get into the tests as there are a lot. I just want to make sure your cover enough cases. Please consider:
IGNORECASE
/CASESENSITIVE
table field; this field set in an indexsetString
as parameter for:- value to be inserted in the table
- value to be updated to the table
WHERE
clause- make sure the WHERE uses
=
,<
and>
over the indexed field
- make sure the WHERE uses
- Try simple where expressions ('a' = 'A') - static and dynamic (? = 'A', 'a' = ?, etc.)
- Try to use joins over the character field; something like
join tt2 on tt2.f2 = tt1.f1
, wheref1
andf2
can be of any combination ofIGNORECASE
/CASESENSITIVE
- Try
? = ?
with setString ignore-case
As a performance check, please use your FWD case-sensitive, case-insensitive you used when implementing RTRIM. Extend them into performance tests. Compare them with/without DEFAULT_IGNORE_CASE
option.
I am really excited about these changes; having H2 work with string cases just like 4GL looks like a big deal.
#11 Updated by Dănuț Filimon 10 months ago
Committed 7454a_h2/rev.26. I added the missing test file and fixed the condition in Query as specified in #7454-10.
Alexandru Lungu worte:
Is is safe to use
false
as defaultIgnoreCase inValue.getHigherType
? This seems quite risky.
I was thinking that it may cause problems if true
was used instead, but I was wrong. Now I changed it to use true
.
As a performance check, please use your FWD case-sensitive, case-insensitive you used when implementing RTRIM. Extend them into performance tests. Compare them with/without
DEFAULT_IGNORE_CASE
option.
Still have some work to do on the performance test.
#12 Updated by Dănuț Filimon 10 months ago
CI-CS
- case insensitive database field and case sensitive variableCS-CS
- case sensitive database field and case sensitive variableCI-CI
- case insensitive database field and case insensitive variableCS-CI
- case sensitive database field and case insensitive variableCase-sensitivity
- the name of the test that handles simple string comparisonsRtrim
- test that handles string comparison with multiple spaces at the end of the stringLtrim
- test that handles string comparison with multiple leading spaces in a stringSpace
- test that handles the comparison of empty strings/strings only containing spaces
7454a_h2/rev.26 (default) means that DEFAULT_IGNORE_CASE
property is not used, which defaults to false
.
7454a_h2/rev.26 (true) means that DEFAULT_IGNORE_CASE
property is used and it is set to true
.
CI-CS | CS-CS | CI-CI | CS-CI | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Branch | Case-sensitivity | Rtrim | Ltrim | Space | Case-sensitivity | Rtrim | Ltrim | Space | Case-sensitivity | Rtrim | Ltrim | Space | Case-sensitivity | Rtrim | Ltrim | Space |
7454a_h2/rev.23 | 3598.4 | 2246.8 | 1740.4 | 2353.6 | 1504.8 | 1936.8 | 1655.2 | 2286 | 1488.2 | 1967.2 | 1696.8 | 2260.4 | 1480.2 | 1924 | 1639.2 | 2232.2 |
7454a_h2/rev.26 (default) | 3472.4 | 2167.8 | 1769.6 | 2326.2 | 1516.8 | 1889 | 1635.8 | 2239.2 | 1512.6 | 1943 | 1652.8 | 2235 | 1443.8 | 1880.2 | 1618.2 | 2179.8 |
7454a_h2/rev.26 (true) | 3338.2 | 2148.2 | 1684.4 | 2350.4 | 1528.4 | 1957.8 | 1668.4 | 2268.8 | 1451.4 | 1877.2 | 1619.8 | 2187 | 1433.6 | 1921.8 | 1620 | 2226 |
CI-CS | CS-CS | CI-CI | CS-CI | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Branch | Case-sensitivity | Rtrim | Ltrim | Space | Case-sensitivity | Rtrim | Ltrim | Space | Case-sensitivity | Rtrim | Ltrim | Space | Case-sensitivity | Rtrim | Ltrim | Space |
7454a_h2/rev.26 (default) | -3.501% | -3.516% | +1.677% | -1.164% | +0.797% | -2.467% | -1.172% | -2.047% | +1.639% | -1.23% | -2.593% | -1.123% | -2.459% | -2.276% | -1.281% | -2.347% |
7454a_h2/rev.26 (true) | -7.230% | -4.388% | -3.217% | -0.135% | +1.568% | +1.084% | +0.797% | -0.752% | -2.472% | -4.575% | -4.537% | -3.247% | -3.148% | -0.114% | -1.171% | -0.277% |
The tests consisted of creating temporary tables with 50k records and running checking if any match is found (the test will pass if any record is found). I also managed to extend the tests to not only compare simple strings, but strings with different amount of leading/trailing spaces and so on.
Even if I increased the numbers of records created from 10k to 50k already, the time spent executing the test is still low so I calculated the results as such. There is an overall improvement when using the latest version of 7454a_h2 and I think that the results are pretty good.
#13 Updated by Alexandru Lungu 10 months ago
I found a regression while testing. I attempted a fix in 7454a_h2 / rev. 27. This has a wide impact on direct-access, as the driver used DataType.convertToValue
to convert from String to H2 internal string representation. As it was defaulting to ValueString
, the comparisons were case-sensitive.
Danut, please review. If you have other suggestions, please advice.
#14 Updated by Alexandru Lungu 10 months ago
Also, Danut, please make the flag naming consistent with the option. Right now, the option is DEFAULT_IGNORE_CASE
and the option is OPTIMIZE_IGNORE_CASE
.
EDIT: Run a large customer application and put a break-point in ValueString.get
. Check if the call sites are OK with using ValueString
, instead of ValueStringIgnoreCase
. We should lower the uses of ValueString
as much as we can when DEFAULT_IGNORE_CASE
is set (close to no instance of ValueString
).
#15 Updated by Dănuț Filimon 10 months ago
Alexandru Lungu wrote:
I found a regression while testing. I attempted a fix in 7454a_h2 / rev. 27. This has a wide impact on direct-access, as the driver used
DataType.convertToValue
to convert from String to H2 internal string representation. As it was defaulting toValueString
, the comparisons were case-sensitive.Danut, please review. If you have other suggestions, please advice.
The fix is correct. I've retrieved DEFAULT_IGNORE_CASE
value in a similar manner in JdbcConnection
. I've done a quick test just to be sure and it works properly.
Also, Danut, please make the flag naming consistent with the option. Right now, the option is
DEFAULT_IGNORE_CASE
and the option isOPTIMIZE_IGNORE_CASE
.
This change was committed to 7454a_h2/rev.28.
#16 Updated by Alexandru Lungu 10 months ago
- Status changed from Review to WIP
- % Done changed from 100 to 70
Great!
I've added in rev. 29 the possibility to make sequences temporary. I just noticed that there are several attempts to flush the sequence for the other users, even if in per-user no-lock temporary H2 database. Therefore, this will be a separate performance increase from this FWD-H2 functionality.
I analyzed whereValueString
is still used and there is still some concerns:
- Character literals are
ValueString
by default. To be compatible with 4GL, we need character literals to be case insensitive (Parser.read() CHAR_STRING case). Basically, all clauses likett.f1 = ''
will attempt to converttt.f1
to a case-sensitive value and compare. This is wrong both functionally and performance-wise. - We still emit
upper
clauses from FWD. Thus, I often encounterupper(tt.f1) = 'A'
.upper
is defined as a function which requires a case-sensitive string and returns a case-sensitive string:- Please test with 4GL UPPER function and check its behavior. Try to replicate for FWD-H2. Do the same for the other build-in character functions in
Function
. - FWD itself generates from UDF inside FWD-H2. These "initializing DDL" should correctly honor the ignore-case priority implemented in #7454. These should be eventually refactored.
- We shall get into FWD and eliminate
upper
clauses when using FWD-H2 dialect. From my POV, this should go down exactly like RTRIM (right fromFQLPreprocessor
). However, this can be done independently from the implementation of FWD-H2. Keepingupper
is just an performance overhead for now, not a functional concern. I will make a separate task for this eventually.
- Please test with 4GL UPPER function and check its behavior. Try to replicate for FWD-H2. Do the same for the other build-in character functions in
I created 7454a for any related FWD work.
#17 Updated by Alexandru Lungu 10 months ago
Profiling shows -0.3% improvement. this is decent, but can be way better. I expect way more improvement after we get #7454 done. Note that there are still ~5.000 ValueString.get
that are quite useless in my POC, mostly due to character literal interpretation as ValueString
instead of ignore-case. Also, the test was done without my temporary sequences change (I expect some extra boost from there).
#18 Updated by Dănuț Filimon 10 months ago
I looked into #7454-16, the change to the Parser
works properly since it has access to the database settings.
After testing a customer application, I noticed that there are instances of internal conversion to ValueString
when working with meta records, but also found out that user-defined functions will force a conversion from ValueStringIgnoreCase
to ValueString
. Before the value of the user-defined function is returned, a DataType.getTypeFromClass
call returns the String
class which forces a conversion to Value.STRING
. In the same class, we have getTypeClassName
which returns the String
class for Value.STRING
, Value.STRING_FIXED
and Value.STRING_IGNORECASE
.
Should we consider changing it? This will require a parameter to be passed through each time a function is called because FunctionAlias
/DataType
do not store any information related to the database settings, but the session is provided when calling the function.
#19 Updated by Alexandru Lungu 10 months ago
Dănuț Filimon wrote:
After testing a customer application, I noticed that there are instances of internal conversion to
ValueString
when working with meta records
This may be exactly due to the sequences we use that are not temporary. I added the support for temporary sequences, but you will also need changes in FWD to enable them. Consider them as fixed for now.
but also found out that user-defined functions will force a conversion from ValueStringIgnoreCase to ValueString. Before the value of the user-defined function is returned, a DataType.getTypeFromClass returns the String class which forces a conversion to Value.STRING. In the same class, we have getTypeClassName which returns the String class for Value.STRING, Value.STRING_FIXED and Value.STRING_IGNORECASE.
getTypeFromClass
should honor defaultIgnoreCase
. That is, if you receive a Java STRING, then you should implicitly consider it is an ignore-case H2 String. Here a change is required, I agree.getTypeClassName
is fine - all 3 FWD-H2 String representations are backed by Java strings.
Should we consider changing it? This will require a parameter to be passed through each time a function is called because
FunctionAlias
/DataType
do not store any information related to the database settings, but the session is provided when calling the function.
In DataType
, you should pass down a CastDataProvider
(either the session or database).
Function.getSimpleValue
(at least for UPPER, but other string function as well).H2Helper.getPrepareStatements
, more exactlyFunctions
, and check they reach the FWD-H2 engine right. By default, Java String parameters are ignore-case string parameters. However, you will need to redo the investigation to check if the UDF are working alright in 4GL and FWD.
Last topic is how case-sensitive characters types will work with FWD-H2 UDF. This is not a concern now, but should be thought of as our changes will regress for case-sensitive types. We will leave this at the end.
#20 Updated by Dănuț Filimon 10 months ago
Found an interesting example when experimenting with functions in H2.
Here is a simple table definition: create table test(name varchar check(name = upper(name)));
In the first experiment we have the following:
- a varchar
field compared with it's uppercase version;
- jdbc:h2:mem:;mv_store=false;
as the connection url;
- the UPPER
function that returns Value.STRING_IGNORECASE
;
- the higher order type between ValueString
and ValueStringIgnoreCase
will be ValueStringIgnoreCase
;
The execution of INSERT INTO table1 (f1) VALUES(?) {1: "a"};
will pass and the value "a"
will be inserted because the name
value is casted to the higher order type, ValueStringIgnoreCase
. If the UPPER
function returned a Value.STRING
, the same statement would result in a constraint violation.
The second experiment is using the same scenario, but with the connection url: jdbc:h2:mem:;mv_store=false;default_ignore_case=true;
. Executing the same insert statement will result in a constraint violation because the higher order is ValueString
.
I discovered this scenario when running FWD-H2 tests. Changing the functions return type breaks FWD-H2 tests, but from the 4GL perspective it is correct. This change is also not influenced by the default_ignore_case
property, all the built-in functions are instantiated using a static block and access to the flag can be provided later through Database
.
#21 Updated by Alexandru Lungu 10 months ago
the UPPER function that returns Value.STRING_IGNORECASE;
: this theoretically happens whendefault_ignore_case
is set.the higher order type between ValueString and ValueStringIgnoreCase will be ValueStringIgnoreCase;
: this happens whendefault_ignore_case
is false.
We can ignore this now. From my POV, I feel like UPPER
should be polymorphic. If the target is case-sensitive, it should return ValueString
. If the target is case-insensitive, it should return ValueIgnoreCase
. Therefore, UPPER
shouldn't depend on the flag, so your first statement is irrelevant.
- make them polymorphic. We should provide support for case-sensitive
UPPER
even whendefault_ignore_case=true
is set. - I am not sure if two sets of functions are right now. The old tests should work, as the
UPPER
will honor the underlying case. If the old tests verify a different semantic, they should be refactored.
As for the FWD UDF, they should be eagerly tested across the 4GL behavior. Good news are that we won't have unit tests regressions on these :)
#22 Updated by Dănuț Filimon 10 months ago
Alexandru Lungu wrote:
Danut, the following statements contradict:
the UPPER function that returns Value.STRING_IGNORECASE;
: this theoretically happens whendefault_ignore_case
is set.
Functions are not influenced by the default_ignore_case
flag because it can't be retrieved when the functions are added in the static initialization block from Function
. This was just a test to see how changing the return type of the method will affect the existent FWD-H2 tests.
the higher order type between ValueString and ValueStringIgnoreCase will be ValueStringIgnoreCase;
: this happens whendefault_ignore_case
is false.
default_ignore_case
is false by default when it is not specified.
Danut, I would like to have another version of 7454a today/tomorrow. I think H2 functions may be an impediment here:
- make them polymorphic. We should provide support for case-sensitive
UPPER
even when default_ignore_case=true is set.- I am not sure if two sets of functions are right now. The old tests should work, as the UPPER will honor the underlying case. If the old tests verify a different semantic, they should be refactored.
I am currently working on it and it's possible to get the function based on the database flag, but not the parameter type. Even if we use a function that returns a ValueStringIgnoreCase
value, it will be casted to ValueString
when compared to another value of the same type (only when the flag is set to true).
#23 Updated by Alexandru Lungu 10 months ago
My point was to do a bit of refactoring on Functions to allow both IGNORECASE and CASESENSITIVE as parameters for such functions. I agree that we can retrieve "the function based on the database flag", but we still can use case-sensitive characters even when the flag is set right. Think of _meta database where we have case-sensitive
variables there. What will happen when you will call UPPER? I don't think casting then to ValueStringIgnoreCase
is OK, just because the flag was set.
#24 Updated by Dănuț Filimon 10 months ago
Alexandru Lungu wrote:
My point was to do a bit of refactoring on Functions to allow both IGNORECASE and CASESENSITIVE as parameters for such functions. I agree that we can retrieve "the function based on the database flag", but we still can use case-sensitive characters even when the flag is set right. Think of _meta database where we have
case-sensitive
variables there. What will happen when you will call UPPER? I don't think casting then toValueStringIgnoreCase
is OK, just because the flag was set.
Thank you for explanation. I understand what you mean now.
I began writing a test suite for 4GL functions and found a few problems.
For the following example:
DEFINE TEMP-TABLE tmp2 FIELD f1 AS CHARACTER CASE-SENSITIVE. CREATE tmp2. tmp2.f1 = 'a'. FIND FIRST tmp2 WHERE tmp2.f1 EQ UPPER(tmp2.f1) NO-ERROR. IF AVAILABLE tmp2 THEN MESSAGE "yes". // 4GL: No output, FWD: yes
The converted query looks like this:
silent(() -> new FindQuery(tmp2, "tmp2.f1 = upper(tmp2.f1)", null, "tmp2.recid asc").first());and the final where clause is:
where tmp2_1_1__0_._multiplex = ? and ((tmp2_1_1__0_.f1 = tmp2_1_1__0_.f1 or tmp2_1_1__0_.f1 is null and tmp2_1_1__0_.f1 is null))
Upon investigating I found out that this is a regression introduced in #7108. P2JH2Dialect handles case-insensitive compare operations for case-insensitive columns and automatically rtrims character columns, the
UPPER
call is removed from the fql
in FQLPreprocessor.preprocess
.
I found no other problems with the modifications to UPPER
and LOWER
functions in H2. SUBSTRING
, LEFT-TRIM
, REPLACE
, RIGHT-TRIM
and SUBSTRING
are already handled by FWD so I decided to not modify them.
#25 Updated by Dănuț Filimon 10 months ago
- Character literals were modified to be
ValueStringIgnoreCase
by default. UPPER
andLOWER
functions will return a value with the same type as the argument that it was given.- Modified a few test cases in
TestDefaultIgnoreCase
, some of them had to be changes to fit the usage of character literals and added a new test for upper. - Created a test suite for character fields and 4GL functions comparison.
#26 Updated by Alexandru Lungu 9 months ago
- % Done changed from 70 to 80
Regression tests passed.
Profiling tests shown -0.3% improvement (without temporary sequences). This is modest, but this will go higher once we fix the following review.
I still have a concern with indexOf
function. It takes two strings as parameters. It attempts to find the index of the pattern into a string target. However, casting to case-sensitive is attempted. Please test this and fix. If 4GL is doing case-sensitive search when computing the index-of, we can leave it like this / optimize. If not, please investigate and fix.
I just noticed that there are several attempts to flush the sequence for the other users, even if in per-user no-lock temporary H2 database.
Note my rev. 29 commit. It introduced a TEMPORARY
keyword for sequences, so they are no longer flushing. Anyway, you should change FWD to append this keyword at the end of H2 _temp generated sequences.
Also, I still have UPPER
generated in some of the _temp queries. Is this going to be handled in this task or not? If no, please create another Database task to remove upper
from the generated FWD-H2 queries.
Finally, please commit to 7454a any FWD changes required (including the use of TEMPORARY sequences).
#27 Updated by Dănuț Filimon 9 months ago
Alexandru Lungu wrote:
I still have a concern with indexOf function. It takes two strings as parameters. It attempts to find the index of the pattern into a string target. However, casting to case-sensitive is attempted. Please test this and fix. If 4GL is doing case-sensitive search when computing the index-of, we can leave it like this / optimize. If not, please investigate and fix.
This is a mistake on my end, previously in #7454-18 and #7454-19 we discussed about DataType.getTypeFromClass
which returns Value.STRING
type and should honor defaultIgnoreCase
. I must've missed it after reverting some changes in H2 for a commit or didn't find a way to get the flag for this method since there was no CastDataProvider
available (we can pass the session from FunctionAlias.getValue
where DataType.getTypeFromClass
is called).
Also, I still have UPPER generated in some of the _temp queries. Is this going to be handled in this task or not? If no, please create another Database task to remove upper from the generated FWD-H2 queries.
If I remember correctly, upper and rtrim methods were removed but reintroduced later in FQLPreprocessor.mainWalk()
. I will check again to make sure and if that is the case, I will create a separate issue. Otherwise I will make this change along with the TEMPORARY
keyword for sequences.
#28 Updated by Dănuț Filimon 9 months ago
I've looked into the problem and upper
methods are not reintroduced later in FQLPreprocessor.mainWalk()
. It was a particular case of simplifyProperties
that was left out where existent upper
methods that were the parameter of other functions would not be removed. Considering the recent changes in 7454a_h2, we can include this case now.
Committed 7454a/rev.14669. Solved the upper
methods still being generated in _temp queries, introduced temporary sequences and added the DEFAULT_IGNORE_CASE
property to in-memory temp-tables.
Committed 7454a_h2/rev.31. Added missing change to DataType.getTypeFromClass
, now it honors DEFAULT_IGNORE_CASE
property value.
#29 Updated by Alexandru Lungu 9 months ago
- % Done changed from 80 to 100
- Status changed from WIP to Review
Review of 7454a and 7454a_h2
- I am OK with the changes.
- Regression tests passed.
- I am no longer hitting
ValueString.get
in warm runs. In server-startup, I see someValueString.get
being called at _meta database initialization, which is fine. In a cold run I have only some calls toValueString.get
due to an in operation (upper(tt.f1) in ('A', 'B')
). These are still slow, but are working correctly and I intend to merge this asap, so it is not a priority to solve this now. Danut, please try to recreateupper(tt.f1) in ('A', 'B')
with a simple test-case and open a separate Redmine task where you will post the test-case. There might be other such constructs whereupper
andValueString
are still used, so let the issue have a broader scope: eliminate all usages ofupper
with _temp FWD-H2.- After a second round of tests, I also encountered constructs like
upper(tt.f1) is null
. This is not OK - theupper
call looks redundant. - I encountered
upper
in constructs likeupper(tt.f1) like 'a.%'
. I thinklike
can work with ignore-case by default if the flag is set. Needs better investigation. - I've seen several FWD method calls of
numEntries_1
,entryIn_1
andentryIn_2
. I am not sure when are they used, but it is worth investigating. - All of the above should be included in the broader scope issue.
- After a second round of tests, I also encountered constructs like
- Profiling shows decent results. I have several metrics that show the improvement somewhere between -0.4% and -0.6%.
Danut, please let me know if there is anything left to be worked on this task. If everything is ready, I am considering merging it to FWD-H2 trunk, and integrating 7454a into FWD trunk once we have a FWD-H2 artifact uploaded.
#30 Updated by Dănuț Filimon 9 months ago
Alexandru Lungu wrote:
Danut, please let me know if there is anything left to be worked on this task. If everything is ready, I am considering merging it to FWD-H2 trunk, and integrating 7454a into FWD trunk once we have a FWD-H2 artifact uploaded.
There's nothing left to do for this task.
upper(tt.f1) in ('A', 'B')
and upper(tt.f1) is null
look to be very specific. The first one should still be easily eliminated once we find out the Aast type
of the IN
keyword and eliminate it (a similar situation with the indexOf
, where the parent was a function), this should also extend to LIKE
case.
I've seen several FWD method calls of numEntries_1, entryIn_1 and entryIn_2. I am not sure when are they used, but it is worth investigating.
Is upper
still being generated in those cases?
I will create a test case for upper(tt.f1) like 'a.%'
since it seems easier to recreate and open a new issue. I will still need to find a test case for upper(tt.f1) in ('A', 'B')
to verify my assumption on the keyword type.
#31 Updated by Alexandru Lungu 9 months ago
Dănuț Filimon wrote:
upper(tt.f1) in ('A', 'B') and upper(tt.f1) is null look to be very specific.
Not really; upper(tt.f1) is null
is generated each time you do something like tt.f1 != "a"
, because in 4GL, this condition evaluates to true
if tt.f1
is unknown. In SQL, tt.f1 != 'a'
doesn't evaluate to true if tt.f1
is null. That is why we generate tt.f1 != 'a' or tt.f1 is null
- to match the 4GL behavior. I think we actually generate tt.f1 != 'a' or upper(tt.f1) is null
, so this kinda creates the problem here.
Also, if you do tt.f1 = tt2.f2
, this will generate tt.f1 = tt2.f2 or (tt.f1 is null and tt2.f2 is null)
out of the same reasons. In fact, the upper
versions are used, if I am not mistaken.
I've seen several FWD method calls of numEntries_1, entryIn_1 and entryIn_2. I am not sure when are they used, but it is worth investigating.
Is
upper
still being generated in those cases?
Yes.
#32 Updated by Alexandru Lungu 9 months ago
Committed 7454a_h2 to FWD_H2 trunk as rev. 25. Archived 7454a_h2.
Danut, please update H2_Database_Fork with DEFAULT_IGNORE_CASE
in "FWD-H2 connection options" section. Consider if "Add a new database property to FWD-H2" section should be updated.
Note that I will merge 7454a into FWD trunk once we upgrade FWD-H2 to 1.25 or later.
#33 Updated by Dănuț Filimon 9 months ago
Alexandru Lungu wrote:
Not really;
upper(tt.f1) is null
is generated each time you do something likett.f1 != "a"
, because in 4GL, this condition evaluates totrue
iftt.f1
is unknown. In SQL,tt.f1 != 'a'
doesn't evaluate to true iftt.f1
is null. That is why we generatett.f1 != 'a' or tt.f1 is null
- to match the 4GL behavior. I think we actually generatett.f1 != 'a' or upper(tt.f1) is null
, so this kinda creates the problem here.
tt.f1 != "a"
generates the expected tt.f1 is null
but without the upper
call. I am still trying to figure out a test where the upper is introduced.
Also, if you do
tt.f1 = tt2.f2
, this will generatett.f1 = tt2.f2 or (tt.f1 is null and tt2.f2 is null)
out of the same reasons. In fact, theupper
versions are used, if I am not mistaken.
Won't the tt2.f2
actually be a substitution parameter and the tt2.f2 is null
check be omitted? Same thing happens when doing tt.f1 != tt2.f2
.
I've seen several FWD method calls of numEntries_1, entryIn_1 and entryIn_2. I am not sure when are they used, but it is worth investigating.
Is
upper
still being generated in those cases?Yes.
UPPER
is not generated in the mentioned methods when using 7454a/rev.14669.
#34 Updated by Alexandru Lungu 9 months ago
Dănuț Filimon wrote:
tt.f1 != "a"
generates the expectedtt.f1 is null
but without theupper
call. I am still trying to figure out a test where the upper is introduced.
I wonder if tt.f1
was case-sensitive
in the first place. I need to double-check.
Won't the
tt2.f2
actually be a substitution parameter and thett2.f2 is null
check be omitted? Same thing happens when doingtt.f1 != tt2.f2
.
It is a substitution parameter if you have an un-optimized CompoundQuery
. If you have a 2-component query that can be optimized (i.e. they have tt.f1 = tt2.f2
and f1
, f2
are indexed), you end up with a multi-table query. That will have (tt.f1 is null and tt2.f2 is null)
, eventually with upper? I need to double-check.
UPPER
is not generated in the mentioned methods when using 7454a/rev.14669.
Danut, please test the following. Note that all fields are case-insensitive characters.
for each tt where tt.a = "test" and tt.b = "=" and (tt.c = "" or tt.c = "?" or tt.c = ?): [...] end.
I get the following query:
select tt.* from tt42 tt where tt._multiplex = ? and tt.a = 'test' and tt.b = '=' and (upper(tt.c) in ('', '?') or upper(tt.c) is null) order by [...] lazy
#35 Updated by Dănuț Filimon 9 months ago
Alexandru Lungu wrote:
Danut, please update H2_Database_Fork with
DEFAULT_IGNORE_CASE
in "FWD-H2 connection options" section. Consider if "Add a new database property to FWD-H2" section should be updated.
H2_Database_Fork has been updated with details about the DEFAULT_IGNORE_CASE
option. "Add a new database property to FWD-H2" section should not be updated, it only serves as an example on how a new property can be added.
Danut, please test the following. Note that all fields are case-insensitive characters.
for each tt where tt.a = "test" and tt.b = "=" and (tt.c = "" or tt.c = "?" or tt.c = ?): [...] end.This test case covers all the current problems with
upper
. The converted query looks like this:query0.initialize(tmp1, "upper(tmp1.f1) = 'TEST' and upper(tmp1.f2) = '=' and (upper(tmp1.f3) in ('', '?') or upper(tmp1.f3) is null)", null, "tmp1.recid asc");This means that the upper already exists and is not introduced later when preprocessing the fql. The solution is to add
IN
and IS_NULL
cases to FQLPreprocessor.simplifyProperties
method.#36 Updated by Alexandru Lungu 9 months ago
This test case covers all the current problems with upper. The converted query looks like this:
Note that the conversion always adds UPPER
for character fields because it is independent from the underlying database. In PG we still need the upper, so we just generate upper for all dialect and remove them for dialects/cases we don't need them (FWD-H2).
#37 Updated by Dănuț Filimon 9 months ago
Alexandru Lungu wrote:
Note that the conversion always adds
UPPER
for character fields because it is independent from the underlying database. In PG we still need the upper, so we just generate upper for all dialect and remove them for dialects/cases we don't need them (FWD-H2).
From previous discussions, I assumed that upper(tmp1.f3) is null)
can only be generated when preprocessing, I did not think it could also be done at conversion.
The solution mentioned in #7454-35 was committed to 7454a/rev.14670.
#38 Updated by Eric Faulhaber 6 months ago
Can this issue be closed?
#39 Updated by Dănuț Filimon 6 months ago
Eric Faulhaber wrote:
Can this issue be closed?
7454a was not merged, but 7454a_h2 was. If I remember correctly, Alexandru wanted to do a profiling test with these changes or the h2 trunk at the time had a regression, so this issue can't be closed yet.
Alexandru, should we rebase 7454a with the latest trunk and retest before anything else?
#40 Updated by Alexandru Lungu 6 months ago
Yes! I was caught up with #6720 by now and I didn't want to spoil my profiling test with it. I will attempt to check 7454a today so we can move forward with it.
#41 Updated by Alexandru Lungu 6 months ago
Rebased 7454a to latest trunk. It is now at rev. 14825.
#42 Updated by Alexandru Lungu 6 months ago
- Status changed from Review to Internal Test
Profiled 7454a now and got -0.6% improvement. This was on-spot with the expectancy (or even quite better).
I will move them to regression testing and let you know of the results.
In the meant-time, please do some smoke tests with a large application.
#43 Updated by Dănuț Filimon 6 months ago
Alexandru Lungu wrote:
Profiled 7454a now and got -0.6% improvement. This was on-spot with the expectancy (or even quite better).
I will move them to regression testing and let you know of the results.
In the meant-time, please do some smoke tests with a large application.
I tested it and everything works well. I'll be waiting for the results of the regression test.
#44 Updated by Alexandru Lungu 6 months ago
Regression tests passed successfully. 7454a can be merged - Greg, waiting for your call
#46 Updated by Alexandru Lungu 6 months ago
- Status changed from Internal Test to Test
Branch 7454a was merged into trunk as rev. 14833 and archived.
#47 Updated by Alexandru Lungu 4 months ago
This can be closed.