Bug #7667
Support OUTER-JOIN with PRESELECT at conversion
30%
History
#1 Updated by Alexandru Lungu 9 months ago
- Status changed from New to WIP
- Subject changed from Support OUTER-JOIN with PRESELECT to Support OUTER-JOIN with PRESELECT at conversion
This is a continuation of the #6196 effort of optimizing CompoundQuery
to merge components that have OUTER-JOIN. In 6196a there is a run-time support for OUTER-JOIN in PreselectQuery (more specifically in AdaptiveQuery
). However, we can extend the solution to cover multi-table PreselectQuery
with OUTER-JOIN right from the conversion side.
Goal: change the conversion rules such that preselect multi-table queries with outer-join will convert into PreselectQuery
, not CompoundQuery
.
Andrei, please address this issue after #6196 is completed. AFAIK, there is some attempt already in 6582a (which is really old, but can provide some insight) - refer to #6582.
#3 Updated by Alexandru Lungu 9 months ago
Andrei, please address #7667-2. Also, please make some tests with CAN-FIND
for that matter.
#5 Updated by Andrei Bălteanu 8 months ago
I started by looking first at the part of this task related to CONTAINS
and run-time optimization of multiple CompoundComponents
into a PreselectQuery
.
In order to block this optimization, I think we can add a condition in Optimizer.isServerJoinPossible()
that will check if we have a CONTAINS
in the where clause (if so, we automatically return false).
At the time of this method call, the FQL
attached to a component is not yet computed.
As a trivial potential implementation for this, I added this:
=== modified file 'src/com/goldencode/p2j/persist/CompoundQuery.java' --- old/src/com/goldencode/p2j/persist/CompoundQuery.java 2023-08-28 14:07:19 +0000 +++ new/src/com/goldencode/p2j/persist/CompoundQuery.java 2023-09-22 13:30:18 +0000 @@ -3528,8 +3528,17 @@ return false; } - String where = query.getOriginalWhere(); - + String where = query.getOriginalWhere(); + if (where.contains("contains")) + { + if (fine) + { + log.fine(" --- can't optimize a query with 'contains': "); + } + + return false; + } + if (fine) { StringBuilder buf = new StringBuilder(top ? "*" : " ");
The problem encountered by Roger in #7683-5 seems to be solved.
Of course, this is just a quick fix and a more elegant approach will be needed.
#7 Updated by Igor Skornyakov 8 months ago
I've looked at the CompoundQuery in the trunk.
For me this check for CONTAINS looks oversimplified:- The
where
string can contain'contains'
substring as e.g. part of the string literal. - In theory is is possible to convert a CompoundQuery with CONTAINS to the PreselectQuery if the search expression of CONTAINS is a constant (does not depend on any other table used in the query).
- Are you absolutely sure that if
where
string really uses CONTAINS operator it will contain'contains'
string in lower case?
I would rather perform the check for the CONTAINS operation at the HQL level. In fact the Compound query that cannot be converted to the PreselectQuery because of the CONTAINS will have "contains_db_expr" annotation added in the scope of #7086.
#8 Updated by Andrei Bălteanu 8 months ago
Igor, I think I led you into a small misunderstanding.
The comment from #7667-5 was related to a run-time
optimization from a CompoundQuery
to an Adaptive/PreselectQuery
and it is not conversion related.
The case that I'm currently working on is a for each
that is converted into a CompoundQuery
and further optimized (at run-time) into an AdaptiveQuery
.
#7667-5 was related to a regression caused by #6196, that appears when the CompoundQuery.Optimizer
"succeeds" in doing it's job in cases where a CONTAINS
keyword appears in the FQL
. The presence of CONTAINS
should always stop the optimization.
For that, I added the boolean hasContains
in FQLPreprocessor
and the method hasContainsKeyword
to retrieve the value of it. The value of hasContains
is changed only in FQLPreprocessor.fixEmptyContains
.
The problem Roger encountered in #7683-5 seems to be solved.
Committed on 7667a revision 14752.
#9 Updated by Andrei Bălteanu 8 months ago
While trying to create a testcase that would help me solve this initial purpose of the task (changing the conversion rules such that multi-table queries with outer-join will convert into PreselectQuery
instead of not CompoundQuery
), I discovered a regression caused by #6196. This regression can be observed when we look at this testcase:
output to "resultFWD.txt". define temp-table t1 field f1 as integer index index1 is primary unique f1. define temp-table t2 field f2 as integer index index2 is primary unique f2. create t1. t1.f1 = 10. create t1. t1.f1 = -15. create t2. t2.f2 = -15. create t2. t2.f2 = 20. define query q0 for t1, t2. open query q0 preselect each t1, each t2 outer-join where t1.f1 eq t2.f2. get next q0. message t1.f1. message t2.f2. message "--------". get next q0. message t1.f1. message t2.f2. message "--------".
The result in 4GL is:
-15 -15 -------- 10 ? --------
The result in FWD is:
-15 -15 -------- -15 -15 --------
I think the problem is in
PreselectQuery.load()
. Inside this method, we iterate trough the row ids of the results. For the second next, the temp-table
t2
has no value that matches the join clause resulting in the row id to be null. That will cause this if to be hit:if (id == null) { isNullId = true; break; }
Because
isNullId
is TRUE
, a call to PreselectQuery.setEmptyBuffersUnknown()
will be done. Inside this method, we iterate trough the original components (which in this case are 2) and check if the buffer associated is null.Since that is not the case, because each component already has a buffer that contains the previous result (-15 for both), nothing will be done, resulting in the buffer not to change.
#10 Updated by Alexandru Lungu 8 months ago
Review of 7667a changes:
- Please limit this only to outer-join cases. I think it is right to join two tables if there is an
contains
clause as long as they are not outer. With your changes, the optimization will always be ruled out. - Please rework the
hasContains
flag into ahasNonConstantContains
to handle only the cases ofcontains
with dynamic parameter. You should do some extra steps infixEmptyContains
to set this flag only if the one of its parameter is non-constant. For this matter, you need eventually to build an auxiliary static methodFQLPreprocessor.isConstant
that is able to detect if a FQL AST is a constant expression (a.k.a doesn't have a database reference). Basically, it should return false if it ever encountersPROPERTY
type.
#12 Updated by Roger Borrello 8 months ago
Branch 7667a has been rebased to the latest trunk (14754).
#13 Updated by Andrei Bălteanu 8 months ago
Regarding the problem from #7667-9, I was thinking about the following changes:
=== modified file 'src/com/goldencode/p2j/persist/PreselectQuery.java' --- old/src/com/goldencode/p2j/persist/PreselectQuery.java 2023-08-30 14:49:03 +0000 +++ new/src/com/goldencode/p2j/persist/PreselectQuery.java 2023-09-28 08:31:41 +0000 @@ -2994,12 +2994,22 @@ try { boolean isNullId = false; - for (Long id : ids) + for (int i = 0; i < len; i++) { - if (id == null) + if (ids[i] == null) { - isNullId = true; - break; + QueryComponent comp = components.get(i); + + if (comp.isOuter() && i > 0) + { + RecordBuffer buffer = comp.getBuffer(); + buffer.setUnknownMode(); + } + else + { + isNullId = true; + break; + } } }
Len
is a variable that stores the number of original components before optimization. We iterate through rowIds
of the results. If any are null, we check if the component is outer and not first. For those components we set the buffer to the unknownMode.
Retested the simple testcase from #7667-9 and the output in FWD matches the one in 4GL.
#14 Updated by Alexandru Lungu 8 months ago
- Fix history entry of
CompoundQuery
(AB is on a different line than the entry description) - Recheck spacing (a return has align at 4 spaces instead of 3) - no need for extra history entries
I am functionally OK with the changes, so once these points are fixed, I will wait for Igor's review co continue with the merge.
#16 Updated by Alexandru Lungu 8 months ago
Note that this task is about conversion support, so 7667a is not going to address the task, but only fix a run-time regression required by this task later on.
I am OK with the changes now.
Igor, please review.
Greg, let me know if we can merge this after the review.
#17 Updated by Igor Skornyakov 8 months ago
Alexandru Lungu wrote:
Note that this task is about conversion support, so 7667a is not going to address the task, but only fix a run-time regression required by this task later on.
I am OK with the changes now.
Igor, please review.
Alexandru,
I do not think that I'm a right person to review this change since I do not undersrand the details of the PreselectQuery logic good enough.
I believe that Ovidiu or Eric can do it better.
#19 Updated by Eric Faulhaber 8 months ago
Code review 7667a/14757:
The logic looks good. A few minor issues:
- Please avoid using the
for-each
syntax with anArrayList
(or array) in potentially performance-sensitive code. Useint len = list.size(); for (int i = 0; i < len; i++) { Foo foo = list.get(i); ... }
instead. The former creates unnecessary objects under the covers, and is slightly slower. - In
CompoundQuery$Optimizer.isServerJoinPossible
in the comments starting at line 3667, please insert a space between the double-slash and the start of each comment. Please correct "non-contans" to "non-constant" in the comments and logging messages. - The javadoc for
FQLPreprocessor.hasContains
is a bit confusing. It references thecontains
keyword, but that is not checked here; the fact that the method (currently) is only called when processing acontains
operator is only known by the caller. Please use the term "node" instead of "root". Root has a very specific meaning: it is the single ancestor node of the entire AST, not the parent of a particular branch.
BTW, I know this next item is not part of this update, but the edit in FQLP.fixEmptyContains
made me aware of it: we really should not be doing a separate, dedicated walk of the entire AST for fixEmptyContains
(unless it is unavoidable, due to the change it makes to the tree). There is a cost to the iteration of nodes, and the contains
keyword is relatively rarely used. A missing operand for contains
is even more rare. So, we are doing an extra walk of the entire tree for every where clause, to detect/correct a missing contains
operand, when this is likely to not occur in 99.999...% of cases. If this adjustment can be integrated into mainWalk
instead, it would be more efficient. I do not want to hold up the merge for this, since it is pre-existing, but please queue this for attention afterward.
#20 Updated by Eric Faulhaber 8 months ago
Andrei Bălteanu wrote:
Regarding the problem from #7667-9, I was thinking about the following changes:
[...]Len
is a variable that stores the number of original components before optimization. We iterate throughrowIds
of the results.
If any are null, we check if the component is outer and not first. For those components we set the buffer to the unknownMode.
Retested the simple testcase from #7667-9 and the output in FWD matches the one in 4GL.
This change looks ok to me, but this is code which is executed in a lot of different scenarios, so regression testing is warranted.
#21 Updated by Roger Borrello 8 months ago
Branch 7667 has been rebased to trunk_14756.
#22 Updated by Alexandru Lungu 8 months ago
Andrei, please address #7667-19 asap (including the move of your code in mainWalk
to avoid extra AST traversals). As these are (theoretically) minor, I will do a quick review after and merge.
#24 Updated by Alexandru Lungu 8 months ago
Andrei, I am OK with the changes. Still, isConstant
is not "integrated" into the mainWalk - there are still two walks done in parallel. This should be fixed, but I don't have quite an idea how you can isolate.
There is a cost to the iteration of nodes, and the contains keyword is relatively rarely used. A missing operand for contains is even more rare. So, we are doing an extra walk of the entire tree for every where clause, to detect/correct a missing contains operand, when this is likely to not occur in 99.999...% of cases.
Creating a separate topic #7855.
I am ready for merge of 7667a.
#26 Updated by Alexandru Lungu 8 months ago
Branch 7667a was merged to trunk rev. 14757 and archived.
#28 Updated by Andrei Bălteanu 8 months ago
- Status changed from Review to WIP
- % Done changed from 20 to 30
The branch 7667a was created to solve a problem related to a run-time optimization. This was not the initial purpose of this task, it was
a regression that was discovered in the process.
The purpose of this task is to change the conversion rules such that preselect multi-table queries with outer-join will convert into PreselectQuery, not CompoundQuery, at conversion-time.
I think we can move it at 30%, because there is still work to do.
#29 Updated by Andrei Bălteanu 8 months ago
I retested the testcase from #6196-47, but instead of for each
I used preselect each
.
Without the changes from #7667-13 there are multiple places where the output differs from the one in 4GL.
Using the patch, the output in FWD is the same one as the one from 4GL.
Committed the changes to 7667b revision 14752.
#30 Updated by Alexandru Lungu 7 months ago
I just checked-out 7667b and changes are OK (#7667-13).
I suggest getting this into trunk when possible; this was an obvious regression that got fixed in 7667b. The sooner we merge this, the better. I will queue it in my testing routine. Keeping you updated with the testing.
#33 Updated by Alexandru Lungu 4 months ago
Branch 7667b was merged to trunk revision 14916 and archived.