Bug #2552: bad hql caching prevents queries to position on the correct rows - Database - Golden Code Redmine

Bug #2552

bad hql caching prevents queries to position on the correct rows

Added by Ovidiu Maxiniuc about 9 years ago. Updated over 7 years ago.

Status:

Closed

Priority:

High

Assignee:

Ovidiu Maxiniuc

Target version:

FWD - Cleanup and Stablization for Server Features

Start date:

Due date:

% Done:

100%

billable:

vendor_id:

GCD

case_num:

version_reported:

version_resolved:

hql-issue.p (2.49 KB) Ovidiu Maxiniuc, 04/20/2015 01:41 PM

hql-issue.p - updated/improved testcases (4.78 KB) Ovidiu Maxiniuc, 04/21/2015 02:02 PM

om_upd20150421a.zip (81.1 KB) Ovidiu Maxiniuc, 04/21/2015 02:02 PM

om_upd20150422a.zip - incremental update for relational ops / unknown values (28.4 KB) Ovidiu Maxiniuc, 04/22/2015 01:00 PM

hql-issue2.p - testcase for relational ops / unknown values (2.25 KB) Ovidiu Maxiniuc, 04/22/2015 01:00 PM

om_upd20150603a.zip (28.6 KB) Ovidiu Maxiniuc, 06/03/2015 02:07 PM

om_upd20150604a.zip (28.5 KB) Ovidiu Maxiniuc, 06/04/2015 11:38 AM

om_upd20150608c.zip (82.3 KB) Ovidiu Maxiniuc, 06/08/2015 12:53 PM

History

#1 Updated by Ovidiu Maxiniuc about 9 years ago

Status changed from New to WIP
Target version set to Milestone 11
Start date deleted (~~04/17/2015~~)

When a query iterates over multiple tables joined by a non-mandatory filed, sometimes it will fail to locate the correct rows.
Here is the testcase:

DEFINE TEMP-TABLE tt1
    FIELD f1 AS INT
    FIELD f2 AS DATE.

CREATE tt1.
ASSIGN tt1.f1 = 0 tt1.f2 = ?.

CREATE tt1.
ASSIGN tt1.f1 = 1 tt1.f2 = TODAY.

DEFINE TEMP-TABLE tt2
    FIELD f1 AS INT
    FIELD f2 AS DATE.

CREATE tt2.
ASSIGN tt2.f1 = 0 tt2.f2 = 12/12/12.

CREATE tt2.
ASSIGN tt2.f1 = 1 tt2.f2 = TODAY.

DEFINE VARIABLE qh AS HANDLE NO-UNDO.
CREATE QUERY qh.

qh:SET-BUFFERS(BUFFER tt1:HANDLE, BUFFER tt2:HANDLE).
qh:QUERY-PREPARE("FOR EACH tt1, EACH tt2 WHERE tt2.f1 EQ tt1.f1 AND tt2.f2 EQ tt1.f2").
qh:QUERY-OPEN.
qh:GET-NEXT(NO-LOCK).

IF NOT qh:QUERY-OFF-END THEN DO:
   DISPLAY tt1.
   DISPLAY tt2.
END.
ELSE 
   MESSAGE "QUERY FAILED.".

qh:QUERY-CLOSE().
DELETE OBJECT qh.

The output from P4GL is:

┌───────────────────────────────────────┐
│        f1 f2               f1 f2      │
│────────── ──────── ────────── ────────│
│         1 17/04/15          1 17/04/15│
└───────────────────────────────────────┘

P2J will fail to locate the records that are sought and displays QUERY FAILED. message.

The cause of the issue is the fact that PreselectQuery.Component keeps an private filed hqlPreprocessor that 'preprocessed' one-time-only.
The preprocessing involves replacing the "= NULL" with "NOT NULL" phrases. In a join where the common field is unknown/null in the first iteration will cause the entire query to get stuck with "not null" even if next row of the components are not null any more.

In the protected PreselectQuery.Component.getHQLPreprocessor(), hqlPreprocessor is lazy initialized and kept for subsequent usages, but it should not be cached at all. Or at least it should re-processed with each set of resolvedArgs.

#2 Updated by Eric Faulhaber about 9 years ago

Assignee set to Ovidiu Maxiniuc
Priority changed from Normal to High

#3 Updated by Ovidiu Maxiniuc about 9 years ago

Eric,
In HQLPreprocessor.get() there is a check (line ~454) if the current arguments contain a null/unknown value.
I believe that there was some kind of handling of this issue or something similar. I understand that at this moment, only HQLPreprocessor with not-null arguments are cached.
For a quick solution, I wonder: what if we cache the all the objects ?

Another thing that is worthy to be mentioned here: it is express noted that the cache is simplified of inlined HQLs. I'm thinking that replacing = null with correct syntax is also some king of inlining - the reason the HQL is not seek in the cache. I discovered this bug when I observed that the HQL contained a "not null" when there shouldn't be any. What if other inline operations causes the same bug? I mean: 1st time the HQL is processed, inlined, not saved in cache, but saved in PreselectQuery.Component as a private field. If the arguments that were inlined change during the loop iteration of the previous Component, wouldn't that render the HQL stored in PreselectQuery.Component member field invalid ?

#4 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

Eric,
In HQLPreprocessor.get() there is a check (line ~454) if the current arguments contain a null/unknown value.
I believe that there was some kind of handling of this issue or something similar. I understand that at this moment, only HQLPreprocessor with not-null arguments are cached.
For a quick solution, I wonder: what if we cache the all the objects ?

Please explain how this would solve the issue. Also, I'm concerned this would make the cache huge.

Another thing that is worthy to be mentioned here: it is express noted that the cache is simplified of inlined HQLs. I'm thinking that replacing = null with correct syntax is also some king of inlining - the reason the HQL is not seek in the cache. I discovered this bug when I observed that the HQL contained a "not null" when there shouldn't be any. What if other inline operations causes the same bug? I mean: 1st time the HQL is processed, inlined, not saved in cache, but saved in PreselectQuery.Component as a private field. If the arguments that were inlined change during the loop iteration of the previous Component, wouldn't that render the HQL stored in PreselectQuery.Component member field invalid ?

I think I tried to address this with PreselectQuery$Component.resetArgs(). I have not debugged this in a long time...perhaps that mechanism is flawed or is not called at the right time?

#5 Updated by Ovidiu Maxiniuc about 9 years ago

Indeed, storing all inlined HQLPreprocessor is not a good idea. Many of them won't ever be reused. The difficulty is identifying when the HQLPreprocessor should be dropped or reset. I had a short look over the PreselectQuery$Component.resetArgs() and this looks like the right solution. I am investigating now why this is not called at the right moment or is not doing its job right.

Thanks!

#6 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

The difficulty is identifying when the HQLPreprocessor should be dropped or reset.

The original idea was to cache only those which had NOT had specific arguments inlined, since those would be useful only with those specific substitution parameters (and the HQL we use as part of the key would have been changed).

The implementation of inlining was expanded over time to address other issues as we found them. Nevertheless, as long as the HQLPreprocessor instance is aware it had been inlined (i.e., wasInlined() returns true), this should still work. IIRC, the inlining of null substitution parameters as an is [not] null phrase was one of the first types of inlining to be implemented, so I would expect that to work, but it is worth confirming this mechanism has not been compromised by later changes.

Even if the wasInlined() part is still working properly, your findings suggest the code which uses that information may be flawed.

#7 Updated by Ovidiu Maxiniuc about 9 years ago

wasInlined() seems to work correctly but resetArgs() is not called when it should.
I checked and it is only called from PreselectQuery.reset(). This is not correct. The argument should be reset after each iteration from the loop. At this moment CompoundQuery.processComponent() calls reset() but with argument resolveArgs = false because resolveOnce = true.
This is NOT how it is documented in the javadoc of CompoundQuery:

FOR {EACH | FIRST | LAST}              resolveOnce = false

#8 Updated by Ovidiu Maxiniuc about 9 years ago

I believe that I found why resolveOnce = true.
The converted java code is:

qh.unwrapQuery().prepare("FOR EACH tt1, EACH tt2 WHERE tt2.f1 EQ tt1.f1 AND tt2.f2 EQ tt1.f2");

I was expected to apply the Progress construct from my previous note. This is not the case because QueryProcessor.preparePredicate() will rewrite the predicate like OPEN QUERY DynGenQuery FOR EACH tt1, EACH tt2 WHERE tt2.f1 EQ tt1.f1 AND tt2.f2 EQ tt1.f2. So the converter will go for 1st choice listed in the CompoundQuery javadoc:

OPEN QUERY FOR                         resolveOnce = true

#9 Updated by Eric Faulhaber about 9 years ago

So, the conversion semantic is being honored correctly, but there is a flaw with our runtime implementation? To help me understand this better, would you please post the converted form of the dynamic query? I'd like to see what the HQL and substitution parameter(s) look like.

#10 Updated by Ovidiu Maxiniuc about 9 years ago

I've done some investigations, rewriting the dynamic query from the note 1 as a static iteration and also in a Progress query. Here are the P4GL sources:

FOR EACH tt1, EACH tt2 WHERE tt2.f1 EQ tt1.f1 AND tt2.f2 EQ tt1.f2:
   DISPLAY tt1.
   DISPLAY tt2.
END.

DEFINE QUERY qq FOR tt1, tt2.
OPEN QUERY qq FOR EACH tt1, EACH tt2 WHERE tt2.f1 EQ tt1.f1 AND tt2.f2 EQ tt1.f2.
GET FIRST qq. 
DO WHILE AVAILABLE tt1:
   DISPLAY tt1.
   DISPLAY tt2.
   GET NEXT qq.
END.

and java sources respectively:

query0 = new CompoundQuery(false, false);
query0.addComponent(new AdaptiveQuery(tt1, (String) null, null, "tt1.id asc"));
RecordBuffer.prepare(tt2);
query0.addComponent(new AdaptiveQuery(tt2, "tt2.f1 = ? and tt2.f2 = ?", null, "tt2.id asc", new Object[]
{
   new FieldReference(tt1, "f1"),
   new FieldReference(tt1, "f2")
}));

public void body()
{
   query0.iterate();
   [...]
}

query0.assign(new CompoundQuery(true, false));
query0.addComponent(new AdaptiveQuery(tt1, (String) null, null, "tt1.id asc"));
RecordBuffer.prepare(tt2);
query0.addComponent(new AdaptiveQuery(tt2, "tt2.f1 = ? and tt2.f2 = ?", null, "tt2.id asc", new Object[]
{
   new FieldReference(tt1, "f1"),
   new FieldReference(tt1, "f2")
}));
query0.open();
query0.first();

while (tt1._available())
{
   [...]
   query0.next();
}

Note that the general structure is the same, one first major difference is the QUERY version has a precondition (in the while loop) while the 1st form depends on QueryOffEndException to leave the loop when the query is iterate() -ed. The second difference is in the CompoundQuery constructor arguments: false, false for FOR EACH construct and true, false for the QUERY. They are the in accord with the CompoundQuery javadoc.
There is one problem though: in P4GL, both codes will print out the same output:

┌───────────────────────────────────────┐
│        f1 f2               f1 f2      │
│────────── ──────── ────────── ────────│
│         1 20/04/15          1 20/04/15│
└───────────────────────────────────────┘

(this is the same as the dynamic part in fact), while running converted Java of the QUERY version will not print any matching rows.

The dynamic P2J is generated using the OPEN QUERY syntax, so the behavior became quite expected with these new informations.

I tried to force the resolveOnce to false for the query, too, and the java version gave the expected result. I checked the flow of that value and I saw that it is used only for reset() -ting the query when the next component is processed.

At this moment I think we should always add components of multiple joined tables with resolveOnce set to false. This was thought as an optimization (to skip re-evaluating the arguments and adjusting the hql when possible) but if the query iterates multiple tables joined this way, it won't work.

#11 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

At this moment I think we should always add components of multiple joined tables with resolveOnce set to false. This was thought as an optimization (to skip re-evaluating the arguments and adjusting the hql when possible) but if the query iterates multiple tables joined this way, it won't work.

So, is your conclusion limited to the [OPEN QUERY] FOR EACH form of CompoundQuery? Have you tested with [OPEN QUERY] PRESELECT forms as well?

What about this logic in PreselectQuery$Component.resetArgs:

if (hqlPreprocessor != null && hqlPreprocessor.wasInlined())
{
   resetHQLPreprocessor();
}

Can this go wrong in the case where the HQLPreprocessor was not inlined, because the first substitution parameter was not null, but in a subsequent pass of the loop is null? In this case, wouldn't the non-inlined preprocessor incorrectly be re-used?

If so, we need to find a way to correct this which does not throw away a good HQLPreprocessor unnecessarily, since these are quite expensive to create.

#12 Updated by Ovidiu Maxiniuc about 9 years ago

File hql-issue.p added

Eric Faulhaber wrote:

Ovidiu Maxiniuc wrote:

At this moment I think we should always add components of multiple joined tables with resolveOnce set to false. This was thought as an optimization (to skip re-evaluating the arguments and adjusting the hql when possible) but if the query iterates multiple tables joined this way, it won't work.

So, is your conclusion limited to the [OPEN QUERY] FOR EACH form of CompoundQuery? Have you tested with [OPEN QUERY] PRESELECT forms as well?

I tested right now. In my testcase the generated code for PRESELECT-ed query looks like

query0.assign(new PreselectQuery("tt1.id asc, tt2.id asc"));

so it does not use CompoundQuery constructor. When it is looped, the hql (

select tt1.id, tt2.id from TempRecord1Impl as tt1, TempRecord2Impl as tt2 where (tt1._multiplex = ?0) and ((tt2._multiplex = ?1) and (tt2.f1 = tt1.f1 and tt2.f2 = tt1.f2)) order by tt1.id asc, tt1._multiplex asc, tt2.id asc

) is feed directly to Hibernate Session to produce & scroll a Query. There is one more bug here! In Progress, tt2.f2 = tt1.f2 stands even if both are nulls/unknown. If this hql is used in hibernate without pre-processing, rows that satisfy tt2.f2 = tt1.f2 = null will not be selected!

What about this logic in PreselectQuery$Component.resetArgs:
[...]
Can this go wrong in the case where the HQLPreprocessor was not inlined, because the first substitution parameter was not null, but in a subsequent pass of the loop is null? In this case, wouldn't the non-inlined preprocessor incorrectly be re-used?
If so, we need to find a way to correct this which does not throw away a good HQLPreprocessor unnecessarily, since these are quite expensive to create.

Indeed, if the first loop does not need "is [not] null" inlining, the query gets stuck so future iterations. I believe we need to always reset the hql or we should at least use a different flag for is [not] null inlineing. It's preferable to be slow and correct rather than fast and wrong.

Please see the attached issue. In P4GL it prints of 30 (ie records 2, 3 and 5 all gets selected). In PJ2 does not.
Here are my findings in these scenarios:

with P2J untouched (bzr 10840) it will print (3 10 3 10), half of the records are clearly ignored;
if resolveOnce = false for OPEN QUERY we get (10 10 10 10) - slightly better, but the unknown pairs (record with id 3) are not selected because hql got stuck with inline = false;
otherwise, always resetting the hqlPreprocessor, we get (3 10 3 10), the same as the initial test. Since resolveOnce = true, the PreselectQuery.Component.resetArgs() does not get called so the hqlPreprocessor is kept;
if resolveOnce = false for OPEN QUERY and PreselectQuery.Component.resetArgs() always resets the hqlPreprocessor we get (30 30 30 10).

We can see that the last result (for preselect query) is clearly unaffected by these changes. The wrong values is the result of the missing records with id = 3 because both these records have f2=? - see my 1st answer in this note.

#13 Updated by Ovidiu Maxiniuc about 9 years ago

The solution I test at this moment uses a little more memory to save the last configuration of null/unknown values for each query component. The hql is pre-processed as usual, except that inlining to "is [not] null" does not change the inline flag. The HQLPreprocessor.get() is aware of the iterations with unknown parameters and will not cache these instances.
There are two important steps:

creation of the HQLPreprocessor. At this moment a 'fingerprint' of the argument used for creating the custom hql is saved (boolean array with true on unknown position). This has the same lifespan as the hqlPreprocessor in the PreselectQuery.Component.
when the Component is about to be reset. If the hqlPreprocessor was inlined (other cases than [not] null substitution), then it is dropped (and the fingerprint with it). Next, the current parameters are checked against the fingerprint from previous iteration. If they match, the
hqlPreprocessor stays, otherwise it is discarded because the hql contains another configuration of unknown fields.

This way, the hql is rebuild (or even extracted from cache) only when really needed. The price is composed from:

the fingerprint (array of N booleans). Usually they are only a couple of them, I never saw more than a dozen in the customer server project. They are discarded on each iteration so the memory is collected immediately.
a linear iteration on the parameter list to check and store if they are null when the hqlPreprocessor is created and one iteration on each loop to see if the fingerprint changed in the new list of parameters. And only if the hqlPreprocessor was not otherwise inlined case in which it is dropped beforehand.

#14 Updated by Eric Faulhaber about 9 years ago

This seems like a well-reasoned, practical approach. Please post an update for review when you are satisfied with your local testing.

#15 Updated by Ovidiu Maxiniuc about 9 years ago

File hql-issue.p added
File om_upd20150421a.zip added

The attached update fixes the management of the preprocessed hql. There are some things I am not completely satisfied so some part of the old code remained in place.

Regarding the remaining bug. Normally a query like tt2.f1 = tt1.f1 in SQL will select records with respective fields f1 not nulls. To mimic the Progress semantics, the SQL/hql should be rewritten into something like: (tt2.f1 = tt1.f1 or (tt2.f1 is null and tt1.f1 is null)).
However, the syntax change should be doe not only for EQ operator, but for all of the relational operators. I believe that the easiest solution is to use the functions exported from Operators that return the ~~correct~~ expected relations between database fields.

#16 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

Regarding the remaining bug. Normally a query like tt2.f1 = tt1.f1 in SQL will select records with respective fields f1 not nulls. To mimic the Progress semantics, the SQL/hql should be rewritten into something like: (tt2.f1 = tt1.f1 or (tt2.f1 is null and tt1.f1 is null)).
However, the syntax change should be doe not only for EQ operator, but for all of the relational operators. I believe that the easiest solution is to use the functions exported from Operators that return the ~~correct~~ expected relations between database fields.

The problem with this approach is the database query planner. We don't use the Operators UDFs (I've tried before) because they are opaque to the database's query planner and cause the database to choose very inefficient query plans (i.e., lots of table scans). This is a non-starter, since it effectively disables the use of database indexes in queries and thus absolutely kills query performance.

#17 Updated by Ovidiu Maxiniuc about 9 years ago

Then the only solution is to rewrite the relational operators applied on two fields in the hqls as mentioned in note 15.

#18 Updated by Eric Faulhaber about 9 years ago

Agreed, we should try this. Hopefully, the or in the where clause does not cause a performance problem as well.

I think we probably should implement this in the HQLPreprocessor, to keep the converted HQL cleaner. On the other hand, this approach is a bit opaque, in that the behavior of the HQL the developer sees is changed at runtime. What are your thoughts?

#19 Updated by Eric Faulhaber about 9 years ago

Code review 0421a:

Looks good, please regression test. It seems the change to the CompoundQuery(boolean, boolean) constructor be accompanied (perhaps in a separate pass) by a conversion change which eliminates the resolveOnce parameter? If it is always false now, why not eliminate it altogether?

#20 Updated by Ovidiu Maxiniuc about 9 years ago

File hql-issue2.p added
File om_upd20150422a.zip added

Eric Faulhaber wrote:

Agreed, we should try this. Hopefully, the or in the where clause does not cause a performance problem as well.

I think we probably should implement this in the HQLPreprocessor, to keep the converted HQL cleaner. On the other hand, this approach is a bit opaque, in that the behavior of the HQL the developer sees is changed at runtime. What are your thoughts?

Please find attached the testcase and the update that fixes it. As you can see, without the update the result is bad (it outputs the row results too, in a text file):

                  EQ         NE         LT         GT        LTE        GTE
Expected:     30,008    180,741        107        512     30,115     30,520
Computed:          5        619        107        512        112        517

The adjustments done are the following:

`EQ`	tt2.f2 = tt1.f2	tt2.f2 = tt1.f2 or tt2.f2 is null and tt1.f2 is null
`NE`	tt2.f2 != tt1.f2	tt2.f2 != tt1.f2 or tt2.f2 is null and tt1.f2 is not null or tt2.f2 is not null and tt1.f2 is null
`LT`	tt2.f2 < tt1.f2	changes not required
`GT`	tt2.f2 > tt1.f2	changes not required
`LTE`	tt2.f2 <= tt1.f2	tt2.f2 <= tt1.f2 or tt2.f2 is null and tt1.f2 is null
`GTE`	tt2.f2 >= tt1.f2	tt2.f2 >= tt1.f2 or tt2.f2 is null and tt1.f2 is null

The most complicated is the NE, I don't know if there is a shorter form for the expression. The good part is that formed of OR-s too.

All these changes happen in the HQLPreprocessor. I understand that the queries might confuse peoples, but they are required to match the results P4GL returns using HQL/SQL dialect. I suppose a knowledgeable programmer does not expect to have P4GL simply work in SQL as well. If needed, we can add some warnings in the docs/javadoc.

#21 Updated by Ovidiu Maxiniuc about 9 years ago

Eric Faulhaber wrote:

Code review 0421a:

Looks good, please regression test. It seems the change to the CompoundQuery(boolean, boolean) constructor be accompanied (perhaps in a separate pass) by a conversion change which eliminates the resolveOnce parameter? If it is always false now, why not eliminate it altogether?

The CompoundQuery remained the same because I wanted to be sure the runtime works fine before making more changes. Unfortunately, there were many (hundreds) regressions in the test. It's clear something is wrong in there.

#22 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

Eric Faulhaber wrote:

Code review 0421a:

Looks good, please regression test. It seems the change to the CompoundQuery(boolean, boolean) constructor be accompanied (perhaps in a separate pass) by a conversion change which eliminates the resolveOnce parameter? If it is always false now, why not eliminate it altogether?

The CompoundQuery remained the same because I wanted to be sure the runtime works fine before making more changes. Unfortunately, there were many (hundreds) regressions in the test. It's clear something is wrong in there.

Yes, I tested this update with the passing and failing tests overnight and it went badly (did not finish, lots of timeouts and failures). I noticed some new failures in the primary key search tests, so I then ran just the passing search tests. There were 30+ new failures. Unfortunately, my archive of the results was overwritten, so I need to run the tests again. I'll post the before and after results once that's done.

Also, you've added a SEVERE-level log message about the OUTER-JOIN to INNER-JOIN override. AFAIR, this is what Progress does (please double-check this), and I was just mimicking the behavior, so I don't think this deserves an error message -- I'm pretty sure Progress does not give you one. There are many of these messages in the server.log and it is unnecessarily alarming. I think we should (a) reduce the logging severity to FINE (since I believe this is Progress-compatible behavior); and (b) make the message more informative, so the reader can actually associated it with a query.

#23 Updated by Ovidiu Maxiniuc about 9 years ago

Yes, I tested this update with the passing and failing tests overnight and it went badly (did not finish, lots of timeouts and failures). I noticed some new failures in the primary key search tests, so I then ran just the passing search tests. There were 30+ new failures. Unfortunately, my archive of the results was overwritten, so I need to run the tests again. I'll post the before and after results once that's done.

I am really sorry I broke your test results. My tests were quite focused on the task, I did not fully test it to see any side effects.

Also, you've added a SEVERE-level log message about the OUTER-JOIN to INNER-JOIN override. AFAIR, this is what Progress does (please double-check this), and I was just mimicking the behavior, so I don't think this deserves an error message -- I'm pretty sure Progress does not give you one. There are many of these messages in the server.log and it is unnecessarily alarming. I think we should (a) reduce the logging severity to FINE (since I believe this is Progress-compatible behavior); and (b) make the message more informative, so the reader can actually associated it with a query.

I will certainly remove the message, it was not meant to reach the repository in this form. In fact the only thing that really needs logged is when the conversion from OUTER-JOIN to INNER-JOIN truly happens. So far I did not observe this, I just wanted to be sure it is logged if ever the case.

#24 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

I haven't reviewed the update yet, but to respond to one of your items...

The most complicated is the NE, I don't know if there is a shorter form for the expression. The good part is that formed of OR-s too.
...

NE tt2.f2 != tt1.f2 tt2.f2 != tt1.f2 or tt2.f2 is null and tt1.f2 is not null or tt2.f2 is not null and tt1.f2 is null

I'm going to use A and B (and parentheses), because it's easier for me to understand. The equivalent of your expansion is:

(B != A) or (B is null and A is not null) or (B is not null and A is null)

Doesn't this simplify to the following?

(B != A) or (B is null != A is null)

Is that valid HQL/SQL? If not, we go with your original version.

#25 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

I am really sorry I broke your test results. My tests were quite focused on the task, I did not fully test it to see any side effects.

No apology necessary. The purpose was to regression test your update.

I will certainly remove the message, it was not meant to reach the repository in this form. In fact the only thing that really needs logged is when the conversion from OUTER-JOIN to INNER-JOIN truly happens. So far I did not observe this, I just wanted to be sure it is logged if ever the case.

I didn't notice it either, though I did not comb through all the messages (there were a lot). I think leaving the inner message (perhaps as INFO) is good, but since you are removing the more general message before it, make the prefix something like, "Converting OUTER-JOIN to INNER-JOIN: ".

#26 Updated by Ovidiu Maxiniuc about 9 years ago

Doesn't this simplify to the following?
(B != A) or (B is null != A is null)

Is that valid HQL/SQL? If not, we go with your original version.

You are right. I hardcoded it (it also have the advantage of the similarity with = form, except for central node).
I tested it in H2 query tester and the results are very good (full parentheses were required). I quickly added into HQLPreprocessor:

{tt2.f2 != tt1.f2} HQLPreprocessor => [tt2.f2 != tt1.f2 or tt2.f2 is null <> tt1.f2 is null]

Unfortunately, it does not work. The result were different. The only way to see why was to enable SQL output, and here it is:

select temprecord0_.id as col_0_0_, temprecord1_.id as col_1_0_ 
      from tt1 temprecord0_ cross join tt2 temprecord1_ 
      where temprecord0_._multiplex=? and temprecord1_._multiplex=? 
         and (temprecord1_.f2<>temprecord0_.f2 
              or 
              (temprecord1_.f2=null)<>temprecord0_.f2 is null) 
      order by temprecord0_.id asc, temprecord0_._multiplex asc, temprecord1_.id asc

As you can see there is a big failure to properly convert it. An educated guess is that we need to patch Hibernate again because is not able to handle this kind of hql-s properly.

#27 Updated by Eric Faulhaber about 9 years ago

Ovidiu Maxiniuc wrote:

As you can see there is a big failure to properly convert it. An educated guess is that we need to patch Hibernate again because is not able to handle this kind of hql-s properly.

Let's not take this on if we don't have to. If your form works and doesn't cause a significant performance penalty, let's use it. The application developer won't see this complexity anyway, unless they have to log the expanded form.

#28 Updated by Ovidiu Maxiniuc about 9 years ago

Let's not take this on if we don't have to. If your form works and doesn't cause a significant performance penalty, let's use it. The application developer won't see this complexity anyway, unless they have to log the expanded form.

OK, I will document this in the code, leaving the second tree construction commented for a future optimization.

#29 Updated by Eric Faulhaber about 9 years ago

See #1868, note 51 for the primary key search test results with and without the 0421a update. There are 33 new failures which may help track down the problem.

#30 Updated by Eric Faulhaber about 9 years ago

It is difficult to review the 0422a update, because it is built on top of the 0421a update. Since we know we're not keeping the latter due to regressions, we either need a combined update which fixes the regressions from 0421a, or a clean version of 0422a which is not mixed with the 0421a changes.

#31 Updated by Ovidiu Maxiniuc about 9 years ago

File om_upd20150603a.zip added

Eric,
I found the probable cause for the failures in regression testing of the ? relational semantics. I was working with HQLAst tree and I was expecting that if I create an OR node under an AND node, the emit() will correctly walk and create the hql using parentheses where needed. Unfortunately, emit() method is not smart enough so the LPARENTS nodes must be forced in this case. The attached update fixes this issue.

I know that the file is a little messed mostly because of the createAstNode helper method. Please ignore the 'intermediary' System.out s. I will remove them at commit time, but now they log the changes I apply in code. If there are issues with Majic, I can see immediately the bad queries. The most important changes are the blocks starting at lines 1308 and 1426. The rest are small code improvements.

I have a small question: at lines 1599, shouldn't we also handle the NOT_EQ operator?

#32 Updated by Eric Faulhaber about 9 years ago

Code review 0603a:

Looks good, nice find. Agreed, will look better with the println statements removed. Do you want to leave these behind as FINEST-level log messages, or are you concerned about the overhead?

at lines 1599, shouldn't we also handle the NOT_EQ operator?

That already evaluates to the BOOL_FALSE result. Is that an incorrect outcome?

#33 Updated by Ovidiu Maxiniuc about 9 years ago

Looks good, nice find. Agreed, will look better with the println statements removed. Do you want to leave these behind as FINEST-level log messages, or are you concerned about the overhead?

I was going to remove them. I don't think processing of the relational operators worth getting to repo. Maybe the cache PUT operation, logged with lowest priority, eventually.

at lines 1599, shouldn't we also handle the NOT_EQ operator?

That already evaluates to the BOOL_FALSE result. Is that an incorrect outcome?

In P4GL two unknowns are equals, contrary to standard SQL. So yes, NOT_EQ operator must return No when comparing two ?.

#34 Updated by Eric Faulhaber about 9 years ago

Ovidiu, was your 0603a update meant to be applied on top of the 0421a update?

I tested P2J 10871 + 0603a (without 0421a) against the full set of search unit tests. It does not fix any additional tests on its own, but it doesn't seem to regress anything either. I guess we need to correct the HQLPreprocessor caching as well to potentially see an improvement? The good news is, there was no noticeable performance degradation with 0603a -- I was worried about this, given the greater complexity of the HQL.

#35 Updated by Ovidiu Maxiniuc about 9 years ago

0603a is an independent update. I intentionally removed any relation to previous updates in order to isolate the issue at hand.
From my analysis, the impact of the rewritten SQL should be minimal on performance: the location and HQL changes are processed quickly, no iterations involved, there are rare cases when the additional boolean terms are evaluated.
I could count only 7 occurrences in the PRIMARY_KEY searches on the customer's server but there are 1227 queries affected in Majic tests. There are quite many of them and I wonder how this unknown issue is not affecting any of the tests. Probably there are not so many NULLs in database (?).

BTW, with the exception of two failed testcases in gso_ctrlc_tests because of a client crash, the update passed the regression testing.

#36 Updated by Eric Faulhaber about 9 years ago

I have posted my full search test results at #1868, note 60 (including the stdout.log this time), for your reference.

Even though no new tests passed, I agree that this update produces more correct HQL than we had before. Perhaps we will see a different result when the original, HQL preprocessor caching issue is worked out.

Please very carefully remove the debug statements or convert them to proper logging as discussed above. After confirming the code compiles after these changes, please commit and distribute the revised update.

#37 Updated by Ovidiu Maxiniuc about 9 years ago

File om_upd20150604a.zip added

Eric Faulhaber wrote:

I have posted my full search test results at #1868, note 60 (including the stdout.log this time), for your reference.

Thanks, you read my mind.

Even though no new tests passed, I agree that this update produces more correct HQL than we had before. Perhaps we will see a different result when the original, HQL preprocessor caching issue is worked out.
Please very carefully remove the debug statements or convert them to proper logging as discussed above. After confirming the code compiles after these changes, please commit and distribute the revised update.

I removed the traces to System console and even ran a quick testset against the customer server project. The attached update was committed to bzr (10873) and distributed by mail, as usually.

#38 Updated by Ovidiu Maxiniuc about 9 years ago

File om_upd20150608c.zip added

This update should handle the main issue of this task: the caching of the hql-s. It will put into cache only the queries that have no null parameters. Also the constructor for CompoundQuery will ignore the resolveOnce parameter because the hql needs to be re-evaluated in the eventuality that unknown fields were inlined with in not null.
It a first run of regression testing it has failed some testcases but I cannot clearly state if it passed or not.

#39 Updated by Eric Faulhaber about 9 years ago

Code review 0608c:

I think the changes are OK. Please post your regression test results when finished. I'll test this update tonight against the full set of search unit tests.

#40 Updated by Eric Faulhaber about 9 years ago

Eric Faulhaber wrote:

I'll test this update tonight against the full set of search unit tests.

The test went well. A handful fewer tests failed than my last run, but it's hard to make a conclusion about whether this was due to the update or the instability of the current version of the test environment (my last few runs before this have had varying results, even though the code under test and runtime were the same). If it passes regular regression testing, please check this update in and distribute.

#41 Updated by Ovidiu Maxiniuc about 9 years ago

The 0608c update passed the regression testing, was committed to bzr as revno 10876 and distributed by mail.

#42 Updated by Eric Faulhaber about 9 years ago

Status changed from WIP to Closed
% Done changed from 0 to 100

#43 Updated by Eric Faulhaber about 9 years ago

Status changed from Closed to WIP

I've re-opened this task to address a defect: it is possible for the HQLPreprocessor to create invalid ASTs when the new, NULL-handling logic affects fields of type character. This is due to the fact that by the time this tree manipulation is done, the <alias>.<property> references may already be enclosed by functions (like rtrim). The fix is to confirm the top of the branch is an ALIAS node and if not, to traverse down the branch until a descendant node of type ALIAS is found. As an optimization, we only check the child at index 0 for each descendant level, since we know which functions can be in use and the order of their parameters.

The proposed fix is in 2552a/10888 (branched from trunk 10887). I currently am testing it with the set of unit tests which exposed this defect. If that goes well, I'll regression test.

#44 Updated by Eric Faulhaber about 9 years ago

Status changed from WIP to Closed

The change fixed the unit tests I was targeting.

I rebased 2552a to trunk rev. 10888 to pick up Hynek's redirected output fix, which put this fix at 2552a task rev. 10889.

The update passed regression testing and was merged into the trunk as rev. 10889. I'm re-closing this task as a result.

#45 Updated by Greg Shah over 7 years ago

Target version changed from Milestone 11 to Cleanup and Stablization for Server Features

Also available in: Atom PDF

Project

General

Profile

FWD » Core Development » Database

Issues

Custom queries

Bug #2552

bad hql caching prevents queries to position on the correct rows

History

#1 Updated by Ovidiu Maxiniuc about 9 years ago

#2 Updated by Eric Faulhaber about 9 years ago

#3 Updated by Ovidiu Maxiniuc about 9 years ago

#4 Updated by Eric Faulhaber about 9 years ago

#5 Updated by Ovidiu Maxiniuc about 9 years ago

#6 Updated by Eric Faulhaber about 9 years ago

#7 Updated by Ovidiu Maxiniuc about 9 years ago

#8 Updated by Ovidiu Maxiniuc about 9 years ago

#9 Updated by Eric Faulhaber about 9 years ago

#10 Updated by Ovidiu Maxiniuc about 9 years ago

#11 Updated by Eric Faulhaber about 9 years ago

#12 Updated by Ovidiu Maxiniuc about 9 years ago

#13 Updated by Ovidiu Maxiniuc about 9 years ago

#14 Updated by Eric Faulhaber about 9 years ago

#15 Updated by Ovidiu Maxiniuc about 9 years ago

#16 Updated by Eric Faulhaber about 9 years ago

#17 Updated by Ovidiu Maxiniuc about 9 years ago

#18 Updated by Eric Faulhaber about 9 years ago

#19 Updated by Eric Faulhaber about 9 years ago

#20 Updated by Ovidiu Maxiniuc about 9 years ago

#21 Updated by Ovidiu Maxiniuc about 9 years ago

#22 Updated by Eric Faulhaber about 9 years ago

#23 Updated by Ovidiu Maxiniuc about 9 years ago

#24 Updated by Eric Faulhaber about 9 years ago

#25 Updated by Eric Faulhaber about 9 years ago

#26 Updated by Ovidiu Maxiniuc about 9 years ago

#27 Updated by Eric Faulhaber about 9 years ago

#28 Updated by Ovidiu Maxiniuc about 9 years ago

#29 Updated by Eric Faulhaber about 9 years ago

#30 Updated by Eric Faulhaber about 9 years ago

#31 Updated by Ovidiu Maxiniuc about 9 years ago

#32 Updated by Eric Faulhaber about 9 years ago

#33 Updated by Ovidiu Maxiniuc about 9 years ago

#34 Updated by Eric Faulhaber about 9 years ago

#35 Updated by Ovidiu Maxiniuc about 9 years ago

#36 Updated by Eric Faulhaber about 9 years ago

#37 Updated by Ovidiu Maxiniuc about 9 years ago

#38 Updated by Ovidiu Maxiniuc about 9 years ago

#39 Updated by Eric Faulhaber about 9 years ago

#40 Updated by Eric Faulhaber about 9 years ago

#41 Updated by Ovidiu Maxiniuc about 9 years ago

#42 Updated by Eric Faulhaber about 9 years ago

#43 Updated by Eric Faulhaber about 9 years ago

#44 Updated by Eric Faulhaber about 9 years ago

#45 Updated by Greg Shah over 7 years ago