Project

General

Profile

Bug #7035

Extract query parameters from statically converted FQL

Added by Alexandru Lungu over 1 year ago. Updated 12 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version_reported:
version_resolved:

7035-20230201.patch Magnifier (43.8 KB) Dănuț Filimon, 02/01/2023 04:21 AM

7035-20230209.patch Magnifier (35 KB) Dănuț Filimon, 02/09/2023 06:55 AM

7035-20230331.patch Magnifier (1.03 KB) Dănuț Filimon, 03/31/2023 03:54 AM

7035-jmx-20230424.patch Magnifier (19.1 KB) Dănuț Filimon, 04/24/2023 08:59 AM

History

#1 Updated by Alexandru Lungu over 1 year ago

  • Assignee set to Dănuț Filimon
  • Status changed from New to WIP

The following is a testcase in which FWD hard-codes the arguments into the FQL:

define temp-table tt no-undo field f1 as int.

define query q for tt.
open query q for each tt where tt.f1 = 2. // query0.assign(new AdaptiveQuery().initialize(tt, "tt.f1 = 2", null, "tt.recid asc"));
open query q for each tt where tt.f1 = 3. // query0.assign(new AdaptiveQuery().initialize(tt, "tt.f1 = 3", null, "tt.recid asc"));
open query q for each tt where tt.f1 = 4. // query0.assign(new AdaptiveQuery().initialize(tt, "tt.f1 = 4", null, "tt.recid asc"));
open query q for each tt where tt.f1 = 5. // query0.assign(new AdaptiveQuery().initialize(tt, "tt.f1 = 5", null, "tt.recid asc"));

This testcase doesn't benefit from query preparing and caching due to the fact that 2, 3, 4 and 5 values are hard-coded into the SQL.
Ovidiu hinted that the processing we do in FqlPreprocessor and FqlToSqlConverted actually extracts these values, but they don't properly reach the database as query parameters. Another fact is that this technique we want here is already implemented for dynamic queries in some sort.

Danut, please check if different kind of queries are actually extracting the parameters right from the conversion part: RandomAccessQuery or PreselectQuery. Afterwards, do an in-depth investigation of FqlPreprocessor (still HqlPreprocessor in the trunk) to check how such arguments can be extracted (or if they are extracted as some point). Ultimately, check FqlToSqlConverted (toSQL method). Anyways, you should also do a skim investigation of the current implementation for dynamic query "argument extractor".

Please report back with any relevant finding for this topic. The goal to is achieve something like select [...] where [...] tt.f1 = ? [...] {1 : X} instead of select [...] where [...] tt.f1 = X [...] for any hard-coded X (note that "[..]" is a marker for my laziness to write the full query).

#3 Updated by Dănuț Filimon over 1 year ago

I've experimented with the testcase provided and found out a lot about how FQLPreprocessor works. I wrote an implementation that makes use of the FQLPreprocessor.mainWalk() method called by FQLPreprocessor.preprocess(). The mainWalk method walks depth-first from root to leaves and rewrites certain nodes, this means that it is possible to rewrite nodes of right value types (NUM_LITERAL, DEC_LITERAL, STRING with the exception of BOOL_TRUE and BOOL_FALSE which already have defined cases) into the substitution type (SUBST), while saving necessary data. I made use of a list of objects which is populated with both replaced and existent arguments in the order of the processed nodes, at the end we would have the arguments and the argumentsDiff that contain both the parameters and the substituted ones. After the type of the right value node is changed, it falls into the substitution case in which I met a problem. The method FQLPreprocessor.inlineSubstitutionParameter changes the substituted argument to it's original state.

For example, here is a query that I modified and used as reference for my current test:

OPEN QUERY q FOR EACH tt WHERE tt.f1 = 3 and tt.f1 < 5. 
// query0.assign(new AdaptiveQuery().initialize(tt, "tt.f1 = 3 and tt.f1 < 5", null, "tt.recid asc"));

In the case of tt.f1 = 3, the right value is modified to ? and doesn't reach the inlineSubstitutionParameter because the EQUALS case is not specified. But
tt.f1 < 5 reaches the method as tt.f1 < ? and is substituted back to tt.f1 < 5.

What is the purpose of inlineSubstitutionParameter? The modifications I made for extracting the hard-coded values in the query are inserted back by this method. Please note that I modified the parameters received by the method to keep the same functionality.

#4 Updated by Eric Faulhaber over 1 year ago

Dănuț Filimon wrote:

What is the purpose of inlineSubstitutionParameter? The modifications I made for extracting the hard-coded values in the query are inserted back by this method. Please note that I modified the parameters received by the method to keep the same functionality.

I think at some point long ago, I decided from looking at PostgreSQL logs that it was faster to inline parameters for range matches and LIKE expressions. But this may no longer hold: it was written this way when Hibernate was generating our SQL, it was tested on a much older version of PostgreSQL, and it was speculative, in that I was looking at limited cases.

If you've found through your testing that we get better better performance generally from using substitution parameters, I think we can disable the one call to inlineSubstitutionParameter. We should have a side benefit of having fewer variants of FQL statements to preprocess and cache, which should also help performance by bypassing the main walk in some cases. But please review the calls to FQLPreprocessor.get which pass true for the inline parameter (namely, RecordBuffer.aggregate and QueryComponent.prepareHQLPreprocessor). I don't recall why we thought it was necessary to enable the inlining in only these cases. Please try setting these to false and running a test case which flows through these paths (you've already found one for the QueryComponent.prepareHQLPreprocessor case, apparently), to be sure things are still working properly with the inlining turned off.

Note that there are other cases where we have to do some inlining outside of inlineSubstitutionParameter, due to other refactoring of the FQL during preprocessing. These cases should be left alone, but they don't go through that inlineSubstitutionParameter code path, and therefore should not impact FQL like the example you gave.

#5 Updated by Alexandru Lungu over 1 year ago

Eric Faulhaber wrote:

Dănuț Filimon wrote:
Note that there are other cases where we have to do some inlining outside of inlineSubstitutionParameter, due to other refactoring of the FQL during preprocessing. These cases should be left alone, but they don't go through that inlineSubstitutionParameter code path, and therefore should not impact FQL like the example you gave.

I like to point out a case where inlining was critical in one of my past tasks:
  • The multi-table AdaptiveQuery (optimized from a compound query) starts in preselect mode. That is, a clause like tt.f1 = ? with a parameter of type FieldReference should be "inlined". This way, the joining clause should be done at the SQL-side (lets say tt.f1 = tt2.f2).
  • Not inlining the parameter results in tt.f1 = ? with a resolved parameter, lets say purely 6. This way, the PreselectQuery will retrieve all records with f1 set on 6, not dynamically based on the value of the FieldReference (lets say tt2.f2).
  • However, hard-coding the FieldReference (tt2.f2) into the where clause, much like how PreselectQuery does, is not OK. In case the query is invalidated, it should fallback to a CompoundQuery. Evidently, the single-table component of this query can't handle tt1.f1 = tt2.f2, because tt2 is not joined in the SQL, but independently fetched by FWD. Only for this matter, it is OK not to inline the FieldReference, but resolve it to its static value (lets say 6).

Note that this example is only for FieldReference query parameters. This whole process is done by inlineReferenceSubs inside FQLPreprocessor. This task is related mostly to literal extraction, but please take care of other arguments like FieldReference.

#6 Updated by Dănuț Filimon over 1 year ago

I wrote an implementation for that makes use of the FQLPreprocessor.mainWalk method to parse the where clause AST and rewrite certain nodes. I added NUM_LITERAL, DEC_LITERAL, STRING cases to the mainWalk method which represent possible value types in the expression. BOOL_FALSE, BOOL_TRUE can also be types in the expression, but cases for them were already present so I decided to currently avoid them. My first idea was to let the mentioned cases continue into the SUBST case which eventually called the inlineSubstitutionParameter method and reverted the substitution changes, but eventually decided that it would be better to separate the cases from the SUBST.

I've refactored the ParameterIndices class into ParameterDiff and changed it's functionality. ParameterIndices was a mapping of query substitution parameter indices, represented by the value of the "index" annotation of the AST node and the change I made was to store an array of functions for retrieving the substitution parameters from the "extracted-value" annotation that stores the value of the replaced node or from an array of Objects given as a parameter.

I am currently testing how the changes affect functions and clauses where substitution parameters and query parameters are mixed. In the following example all three NUM_LITERAL values are replaced with ? and no error is thrown when the statement is executed.

    query0.assign(new AdaptiveQuery().initialize(tt, "tt.f1 > 1 and modulo(tt.f1, 3) = 0", null, "tt.recid asc"));

Are there any cases where NUM_LITERAL, DEC_LITERAL, STRING should not be substituted?

#7 Updated by Eric Faulhaber over 1 year ago

Dănuț Filimon wrote:

Are there any cases where NUM_LITERAL, DEC_LITERAL, STRING should not be substituted?

Not that I can think of at the moment, but please ensure that any re-ordering of the parameters is preserved.

The re-ordering might only have been used for "normalized" extent field element references (expressed as field[index] in the FQL, but needing to reference a list__index value in a secondary, extent field table).

Please add a test with such an extent field reference, and make sure it still works, both in the current ("normalized") form and also in the new default (expanded, a.k.a. "denormalized") form. Please see #7020 and #2134 for more info on what I am talking about.

Thanks!

#8 Updated by Alexandru Lungu over 1 year ago

Dănuț Filimon wrote:

What is the purpose of inlineSubstitutionParameter? The modifications I made for extracting the hard-coded values in the query are inserted back by this method. Please note that I modified the parameters received by the method to keep the same functionality.

Dănuț, I've found some cases where inlining can happen and may severely affect the parameter list:

define temp-table tt field f1 as char field fk as int index idx1 is unique f1 ascending.
define temp-table tt2 field f2 as char field fk as int field fk2 as int index idx2 fk ascending.
define temp-table tt3 field f3 as char field fk2 as int index idx3 is unique fk2 ascending.

def var hq as handle.
create query hq.
hq:forward-only = yes.
hq:add-buffer(temp-table tt:default-buffer-handle).
hq:add-buffer(temp-table tt2:default-buffer-handle).
hq:add-buffer(temp-table tt3:default-buffer-handle).
hq:query-prepare("for each tt where tt.f1 = 'abc' and true, each tt2 where tt.fk = tt2.fk and true, first tt3 where tt3.fk2 = tt2.fk2 and tt3.f3 = 'xyz' and true").
hq:query-open().
hq:get-first(no-lock).

This is a pretty complex test; having a good extraction implementation working on this test-case should be a good deal. Note that there are some true clauses. All of them are extracted as parameters by the dynamic query parser. However, the FQLPreprocessor notices that and ? with substitution true is an irrelevant clause, so it prefers to inline and optimize it out altogether.

A thing to note here is that FQLPreprocessor.get does the following if query parameters exist: inline = inline && dialect.isQueryRangeParameterInlined();. This setting is disabled for H2 dialect, so this kind of inlining is never done for H2. However, testing this with PostgreSQL using persistent tables has a different behavior as true is actually inlined and the clause is completely removed. I think this is the exact setting Eric is talking about in #7035-4, regarding range matches.

My point here is that temp-table examples are really different than the persistent-table examples just because of this inlining difference. Because inlining is done and the clause is removed, you may lack feedback and so you may end up with an extra true parameter in your list.

#9 Updated by Dănuț Filimon over 1 year ago

Eric Faulhaber wrote:

Please add a test with such an extent field reference, and make sure it still works, both in the current ("normalized") form and also in the new default (expanded, a.k.a. "denormalized") form. Please see #7020 and #2134 for more info on what I am talking about.

I tested extent fields, both normalized and denormalized cases executed without any problems. I noticed that, for example, when field[2] is given, the value 2 is extracted and it will be matched to the list__index in the prepared statement (normalized form).

Alexandru Lungu wrote:

My point here is that temp-table examples are really different than the persistent-table examples just because of this inlining difference. Because inlining is done and the clause is removed, you may lack feedback and so you may end up with an extra true parameter in your list.

You are right, from the provided test I noticed that the query requires 2 substitution parameters, but 3 are given. This causes an error, but I managed to fix it.

Here is a summary on how query parameters are extracted and used:
  • Refactored ParameterIndices into ParameterDiff. Instead of storing indices for parameters, lambdas for retrieveing each parameter are stored.
  • In FQLPreprocessor.mainWalk, cases for NUM_LITERAL, DEC_LITERAL, STRING are used to store data about the index, datatype, and the value of the node and change the node type to SUBST.
    Even if the node type is changed to SUBST, the case is not visited to avoid inlining the parameters again.
  • Refactored createParameterIndices into createParameterDiffs. Lambdas are created for all parameters (existent and extracted) which are not inlined.
  • Modified methods in QueryComponent, RandomAccessQuery, RecordBuffer, TemporaryBuffer for retrieving the functions from ParameterDiff and then getting the parameters.

I attached the patch, please review and advice.

#10 Updated by Alexandru Lungu over 1 year ago

Review for patch #7035-9
There is consistent work going on with this extraction and I am pleased with this new ParamersDiff new approach. This allows us to extract / inline parameters way easier than the former index mapping.

I have several remarks:
  • First of all, I get a NPE in com.goldencode.p2j.persist.ParameterDiff.getSubstParameters(ParameterDiff.java:194) on a large customer application. I think this can be avoided with a simple null check. If this is the case, just repost the patch with the according null check and follow this review. If you have trouble reproducing, I can dig the use-case.
  • ParameterDiff paramDiff = null and List<Object> parametersDiff are two new attributes to FQLPreprocessor. While their naming is similar and may create confusion, can we use only one of them? Or maybe combine their functionality directly in ParameterDiff?
  • I feel like embedding this into mainWalk is a bit fragile. Between mainWalk and createParameterIndices there are other tools modifying the AST (eventually inlining parameters). These are: augmentForUnknownValue, inlineDenormalizedField, analyzeComposites. I am afraid that current use-cases or further features may not honor the "index" and "parametersDiff" properly. Therefore, I suggest using the createParameterDiff visitor to do the extracting in the same time as building ParameterDiff. Please let me know if there is any issue with this - extraction at the very end. Just now, I am looking forward for some FQLPreprocessor analysis tools with possible inlining and clause removal. Taking extra care of "parametersDiff" and doing reindex of parameters is not something I am comfortable with; other may not be even aware of.
  • I agree that we need to refactor all places where the former getParameterIndices was used. However, can we make this parameter transformation part of ParameterDiff implementation. I am looking forward for something like Object[] newArgs = diff.apply(oldArgs);. This way we have a centralized transformation point, way more easier to handle.

In the end, I had a draft of this patch working for quite some time on a large customer application until errors appeared; so there were several (maybe thousands) of statements that successfully passed extraction.

Once we have a stable version of this, please consider the following:
  • Create a counter JMX for our ps cache hits/misses; this is needed for quite a while. We need to see a cache rate improvement with this patch.

#11 Updated by Dănuț Filimon over 1 year ago

Alexandru Lungu wrote:

I get a NPE in com.goldencode.p2j.persist.ParameterDiff.getSubstParameters(ParameterDiff.java:194) on a large customer application.

There is no need for a use-case. I solved this issue, thanks for pointing it out.

ParameterDiff paramDiff = null and List<Object> parametersDiff are two new attributes to FQLPreprocessor. While their naming is similar and may create confusion, can we use only one of them? Or maybe combine their functionality directly in ParameterDiff?

You are right, I make use of the parametersDiff list to get how many parameters are substituted by extraction and create an array of a set size.

I feel like embedding this into mainWalk is a bit fragile. [...] I suggest using the createParameterDiff visitor to do the extracting in the same time as building ParameterDiff.

There should be no problems with this change. I think it's actually better to make use of createParameterDiff to extract these parameters. This way, I can safely delete the usage of parametersDiff list entirely because there would be no need to save the extracted parameters after the mainWalk call, everything would be done in the same method.

I agree that we need to refactor all places where the former getParameterIndices was used. However, can we make this parameter transformation part of ParameterDiff implementation. I am looking forward for something like Object[] newArgs = diff.apply(oldArgs);. This way we have a centralized transformation point, way more easier to handle.

ParameterDiff.getSubstParameters() fits this requirement already.

Create a counter JMX for our ps cache hits/misses; this is needed for quite a while. We need to see a cache rate improvement with this patch.

I made 2 counters PSCacheHits and PSCacheMisses. I tested a customer application and noticed that the the psCache is hit a lot more times with no changes being made (patch was not applied) by ~50k times. Even if I followed the same steps in the application when testing both patched and clean versions, they still differ slightly might be wrong in this case.

Overall I found a parameter mismatch in a query using a persistent table (2 parameters were required, but only 1 was given) in a customer application where I noticed that records were missing from a page. No errors were thrown, so it took a bit to point out what statement executed incorrectly and solve it.

I am posting an updated patch for 6129c. Please review.

#12 Updated by Alexandru Lungu over 1 year ago

  • % Done changed from 0 to 80

I profiled #7035 fix again and I noticed a slight performance improvement this time. I think the optimization is having some kind of synergy with the latest performance improvements, allowing better caching and planning, faster direct-access and less parsing. I would like however to do a final run on this (review, test and profile) and merge it to trunk.

I created 7035a out of rev. 14506. I am aware that this is not a stable version of trunk, but I will rebase it eagerly from now on. The goal is to integrate this ASAP.
Danut, please feel free to move this to Review and 100% once you move the changes to 7035a. If you have several history entries on a file, you can merge them into one single entry.

#13 Updated by Dănuț Filimon over 1 year ago

  • Status changed from WIP to Review
  • % Done changed from 80 to 100

Committed 7035a/rev.14507. I applied the patch from #7035-11 which refactors ParameterIndices into ParameterDiff and how substitution parameters are handled in FQLPreprocessor.

#14 Updated by Constantin Asofiei over 1 year ago

Alexandru, what's the state of testing for 7035a?

#15 Updated by Alexandru Lungu over 1 year ago

Tested it against two large customer applications. Also profiled against a customer POC. There were no regressions.

The performance wasn't necessarily altered (I've got tests with better times and some with worse times). Last time this introduced +10ms, but I've tested before and got -10ms. However, the changes here can help other optimizations we are going to introduce, so indirectly it may boost performance.

#16 Updated by Constantin Asofiei over 1 year ago

Alexandru Lungu wrote:

Tested it against two large customer applications. Also profiled against a customer POC. There were no regressions.

The performance wasn't necessarily altered (I've got tests with better times and some with worse times). Last time this introduced +10ms, but I've tested before and got -10ms. However, the changes here can help other optimizations we are going to introduce, so indirectly it may boost performance.

After 7026b is merged to trunk, please rebase. I'll run ETF testing here.

#17 Updated by Alexandru Lungu over 1 year ago

Rebased 7035a with latest trunk/rev. 14523.

#18 Updated by Constantin Asofiei over 1 year ago

Alexandru Lungu wrote:

Rebased 7035a with latest trunk/rev. 14523.

ETF testing fails badly.

#19 Updated by Constantin Asofiei over 1 year ago

There are lots of these:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1967)
        at com.goldencode.p2j.persist.FQLPreprocessor.lambda$createParameterDiffs$3(FQLPreprocessor.java:6118)
        at com.goldencode.p2j.persist.ParameterDiff.getSubstParameters(ParameterDiff.java:203)
        at com.goldencode.p2j.persist.QueryComponent.filterArgs(QueryComponent.java:612)
        at com.goldencode.p2j.persist.QueryComponent.getArgs(QueryComponent.java:499)
        at com.goldencode.p2j.persist.AdaptiveComponent.getArgs(AdaptiveComponent.java:276)
        at com.goldencode.p2j.persist.PreselectQuery.prepareParameters(PreselectQuery.java:5595)
        at com.goldencode.p2j.persist.AdaptiveQuery.prepareParameters(AdaptiveQuery.java:2817)
        at com.goldencode.p2j.persist.PreselectQuery.execute(PreselectQuery.java:5330)
        at com.goldencode.p2j.persist.AdaptiveQuery.execute(AdaptiveQuery.java:2917)
        at com.goldencode.p2j.persist.PreselectQuery.getResults(PreselectQuery.java:6189)
        at com.goldencode.p2j.persist.AdaptiveQuery.next(AdaptiveQuery.java:1704)
        at com.goldencode.p2j.persist.PreselectQuery.next(PreselectQuery.java:2644)

#20 Updated by Dănuț Filimon over 1 year ago

I attached a patch that should solve the issue. When handling strings in FQLPreprocessor.createParameterDiff, they start and end with '. It seems that this is not happening in your case, even thought I asserted ' to be the first and last character in the string and then remove it, there might be a case when the string length is 1. Please provide more details if this patch doesn't work, preferably the list of arguments used.

#21 Updated by Constantin Asofiei over 1 year ago

Dănuț Filimon wrote:

I attached a patch that should solve the issue. When handling strings in FQLPreprocessor.createParameterDiff, they start and end with '. It seems that this is not happening in your case, even thought I asserted ' to be the first and last character in the string and then remove it, there might be a case when the string length is 1. Please provide more details if this patch doesn't work, preferably the list of arguments used.

Please commit to 7035a so I can run tests again.

#22 Updated by Alexandru Lungu over 1 year ago

Committed it as 7035a/rev. 14525. I will also redo some of my regression tests for this.

#23 Updated by Constantin Asofiei over 1 year ago

7035a still fails, I think is related to some date literal. I need to find the actual FQL and SQL, will get back to you.

#24 Updated by Constantin Asofiei over 1 year ago

The root cause seems to be an FQL like tt1.f1 <> 0 where f1 is an integer field. In this case, 0 is extracted as a character argument and passed like that to the prepared query, as a String instead of a numeric value.

I think there may be an issue when extracting the literal, only quoted strings must be resolved as a character. The exception is something like:

Caused by: java.sql.SQLException: org.postgresql.util.PSQLException: ERROR: operator does not exist: numeric <> character varying
  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.
  Position: 1443
        at com.goldencode.p2j.persist.orm.SQLStatementLogger.execute(SQLStatementLogger.java:383)
        at com.goldencode.p2j.persist.orm.SQLStatementLogger.log(SQLStatementLogger.java:288)
        at com.goldencode.p2j.persist.orm.SQLQuery.list(SQLQuery.java:573)
        at com.goldencode.p2j.persist.orm.Query.list(Query.java:316)
        at com.goldencode.p2j.persist.Persistence.list(Persistence.java:1553)
        at com.goldencode.p2j.persist.ProgressiveResults.getResults(ProgressiveResults.java:1112)
        at com.goldencode.p2j.persist.ProgressiveResults.moveTo(ProgressiveResults.java:943)
        at com.goldencode.p2j.persist.ProgressiveResults.moveTo(ProgressiveResults.java:801)
        at com.goldencode.p2j.persist.ProgressiveResults.next(ProgressiveResults.java:415)
        at com.goldencode.p2j.persist.ResultsAdapter.next(ResultsAdapter.java:160)
        at com.goldencode.p2j.persist.AdaptiveQuery.next(AdaptiveQuery.java:1720)
        at com.goldencode.p2j.persist.PreselectQuery.next(PreselectQuery.java:2644)

#25 Updated by Constantin Asofiei over 1 year ago

There is another peculiarity at least with PostgreSQL - select '1' = 1; will show 'true'. The same for select 'true' = true;. Now, if you extract the literals in a prepared statement, and keep their data type (text/numeric and text/boolean), you will no longer be able to execute these statements.

Also, this may be dependent on the dialect - I haven't checked how H2 or MariaDB (or even MSSQL) behaves.

My conclusion at this point: extracting the literals as query parameters may pose more challenges than expected.

#26 Updated by Alexandru Lungu about 1 year ago

  • % Done changed from 100 to 70
  • Status changed from Review to WIP

Constantin Asofiei wrote:

There is another peculiarity at least with PostgreSQL - select '1' = 1; will show 'true'. The same for select 'true' = true;. Now, if you extract the literals in a prepared statement, and keep their data type (text/numeric and text/boolean), you will no longer be able to execute these statements.

Very surprising that substitution isn't equivalent to inlining. In this case, mind my last comment.

Also, this may be dependent on the dialect - I haven't checked how H2 or MariaDB (or even MSSQL) behaves.

This need thorough testing then.

My conclusion at this point: extracting the literals as query parameters may pose more challenges than expected.

The performance statistic is low (<0.1%), so from this POV, it is not something we really are hanging on.
However, I wouldn't drop this effort that easy. This is a thought I had at one point, but didn't know if it would fit in this task: maybe we can extract only the values used in some specific expressions like tt.f1 = 1 or tt.f1 < 1. Right now, all literals are extracted and this may truly reveal some suspect behavior. Basically, these are the conditions we usually see and need improvement (<field> <comparison> <value>).

For the dialect differences, we can incrementally check each dialect we use. For '1' = 1, I think we can skip this issue as we would target way more specific constructs.

#27 Updated by Ovidiu Maxiniuc about 1 year ago

The H2 supports implicit conversion of 1 = '1', 'true' = true and even '1' = true. MariaDb will also return return 1 for 1='1' BUT 0 for true='true'. I cannot predict how Microsoft decided to implement their SQL Server. This is one reason for the strongly typed parameters of prepared queries.

Another, maybe more important reason for prepared statements to require the correct data type is security: if a parameter is required to be an integer, for example, it is impossible to do a SQL injection.

Note that the type of the parameter is not mandatory to be known beforehand. The parameter can be passed as an object and the dialect driver will cast it correctly. I think it will also handle some implicit conversions (character/String to date).

The DynamicQueryHelper.java extracts these constants using injectVariable(). It will analyse the literalType = litAst.getType() and store the literal node as a default value of the respective BDT type. However, this is not mandatory, FWD is able to hande the native Java types are equally well.

#28 Updated by Alexandru Lungu about 1 year ago

Constantin Asofiei wrote:

There is another peculiarity at least with PostgreSQL - select '1' = 1; will show 'true'. The same for select 'true' = true;. Now, if you extract the literals in a prepared statement, and keep their data type (text/numeric and text/boolean), you will no longer be able to execute these statements.

Just tested this. Indeed, in H2 the behavior using inline/prepared statement is the same. It does type casts implicitly for both prepared and static statements. However, in PostgreSQL, inlining works fine with implicit data casting. However, prepared statements are type checked and doesn't allow type casting. That is the reason for #7035-24. Strangely, I can't find a reproduction testcase, as something like FOR EACH tt WHERE '1' = 1 causes a compilation error in 4GL. So does WHERE '1' = x, where x is integer.

I am a bit intrigued of the case in #7035-24 and how could we end up requesting an implicit data type casting in the database. I don't think this should happen and may cause unexpected behavior in regard to how 4GL works, which doesn't do implicit data type casting in where clauses.

Note that I retested a large customer application with regression testing and a POC - no such errors occurred. Constantin, do you think that ETF tests exploit another regression in FWD to force something like 1 = '1'; in my POV, this shouldn't be allowed.

News are that I used the PSCacheHit and PSCacheMiss in my tests. I have 98.7% cache hit (for temp-table ps cache) with and without 7035a. Only < 0.01% of new prepared statements are hitting the cache. I don't have a statistic for persistent database as the caching is done by c3p0 there. This is bad. The only good side of 7035a now is that we have shorter SQL statements I guess. This is not very encouraging.

#29 Updated by Dănuț Filimon about 1 year ago

  • % Done changed from 70 to 100
  • Status changed from WIP to Review

Committed 7035a/rev.14526. This is a fix for #7035-24.
I found a similar problem on a customer application after not being able to replicate the behavior in #7035-24 on my own. When creating the ParameterDiff instance by calling FQLPreprocessor.createParameterDiffs, all inlined parameters are skipped after the parameterType of the node has been identified, this means that the next node is not to be extracted and the parameterType should be reset (which did not happen). Instead of checking for inline parameters in the switch case, I changed it so that it is checked before it and solved the problem.

#30 Updated by Alexandru Lungu about 1 year ago

I am OK with the fix. However, my last profile session shown a +0.2% performance decrease with 7035a. I don't think it is the time to integrate it as it is now. We need to go the other way around (<-0.2%).

Danut, please constrain the changes to target only a specific sub-set of query parameters for now:
  • tt.f1 * <literal> should be tt.f1 * ?, where * is from {=, <>, <, <=, >=, >}. The same for <literal> * tt.f1

I think having ParameterDiff instead of ParameterIndices is something we can benefit in further work; so I would let go such change.

#31 Updated by Dănuț Filimon about 1 year ago

Alexandru Lungu wrote:

Danut, please constrain the changes to target only a specific sub-set of query parameters for now:
  • tt.f1 * <literal> should be tt.f1 * ?, where * is from {=, <>, <, <=, >=, >}. The same for tt.f1 * <literal>
Committed 7035a/rev.14527. I applied the change to limit operators involved in parameter extraction. At the same time, I limited the type of expressions that are involved. Only simple expressions like tt.f1 * <literal> and tt.f1 * <literal> will have their literals extracted as parameters. For example:
  • 5 < tt1.f1 + 2: Before 5 and 2 would be extracted, now 2 will not be extracted because the operator is not part of the ones mentioned and 5 will not be extracted because it is not part of a simple expression.
  • true > (tt1.f1 > 7): This case will remain the same, 7 is extracted because it is part of the simple expression tt1.f1 > 7.

#32 Updated by Eric Faulhaber about 1 year ago

Are we testing/profiling these changes only with H2, or with other database dialects as well? I am concerned that what improves performance with H2 might not work the same across all dialects.

For instance (and this may be long out of date), I found, many versions of PostgreSQL ago, that hard-coding a literal integer 1 in a LIMIT phrase performed better in many cases than using a substitution parameter. That is, LIMIT 1 gave better performance than LIMIT ? with a substitution value of 1. Presumably, knowing in advance that we only need 1 record allowed the query planner to optimize its plan for this. I realize this is not exactly what we are doing here and there is statement parsing/preparation/caching to consider, but the point is that we have been heavily focused on H2, and I don't want to improve H2 at the expense of regressing performance with PostgreSQL, Maria, etc.

I did not do a thorough review of the code, but I see the work is being done mostly in FQLPreprocessor and there are no changes to the dialect classes, so it seems this refactoring is being applied unconditionally (w.r.t. to dialect, at least).

#33 Updated by Alexandru Lungu about 1 year ago

  • % Done changed from 100 to 90
  • Status changed from Review to WIP

Eric Faulhaber wrote:

Are we testing/profiling these changes only with H2, or with other database dialects as well? I am concerned that what improves performance with H2 might not work the same across all dialects.

My profiling is across all dialects. My hope was to have optimizations across multiple databases back-ends in the same time; the more queries optimized, the better? However, I see your point. Maybe PostgreSQL (or other database) is not suited for such optimizations.

This is really hard to profile out-of-context, as 1M queries with extracted parameters are performing better than 1M without extracted parameters in all dialects. However, in real applications, we don't have the same query executed 1M times with different hard-coded values. There are queries which are executed once, so the parameters are extracted for nothing; but there are queries which are executed >5 times, so the parameter extraction is worth it. I can't tell the ratio of these cases across temp and persistent databases.

Danut, can you think of a temporary JMX to check how often the extraction of parameters results in similar queries, across different databases: meta, temp, dirty and persistent? That is, store all FQL query strings in a map. Check if a new query is in the map with/without extracted parameters. This should be database-wise. Maybe we can consider this optimization only for some databases in the end.

As for the performance, we will check this afterwards, per dialect. The only performance increase is due to the reuse of prepared statements, so if the previous JMX shows a good ratio. Unless we see a good ratio, I think the extraction should be rethinked (per application?, per database?, never?).

#34 Updated by Dănuț Filimon about 1 year ago

I created the temporary JMX and added counters for meta, temp, dirty and persistent database hits/misses. The counter will increase the miss count each time a new FQL appears (after it is emitted in FQLPreprocessor.preprocess) and will increase the hit count each time the same FQL is emitted.

After comparing 7035a rev.14523 (no changes) and rev.14527 (latest changes) on a large customer application, I got the following results:

JMX 7035a rev.14523 7035a rev.14527 Difference
Query Counter 2833 2544 -10.201%
META_DB_HITS 0 0 -
META_DB_MISSES 5 5 -
TEMP_DB_HITS 52 74 +42.307%
TEMP_DB_MISSES 812 613 -24.507%
DIRTY_DB_HITS 155 195 +25.806%
DIRTY_DB_MISSES 636 556 -12.578%
PERSISTENT_DB_HITS 204 210 +2.941%
PERSISTENT_DB_MISSES 1380 1370 -0.724%

There are less misses and more hits for the temp and dirty databases, while the persistent database also shows a bit of an improvement. I've attached the patch for the temporary JMX since it may be needed for future tests.

#35 Updated by Alexandru Lungu about 1 year ago

I've redone the tests with your JMX. I get the following results in Cold/Warm runs on a customer application.

JMX Before changes (Cold/Warm) After changes (Cold/Warm)
Query Counter 1792 / 340 1741 / 341
META_DB_HITS 0 / 0 0 / 0
META_DB_MISSES 1 / 0 1 / 0
TEMP_DB_HITS 52 / 36 67 / 36
TEMP_DB_MISSES 587 / 10 557 / 10
DIRTY_DB_HITS 306 / 37 320 / 37
DIRTY_DB_MISSES 127 / 0 112 / 0
PERSISTENT_DB_HITS 2667 / 2403 2665 / 2403
PERSISTENT_DB_MISSES 1077 / 330 1071 / 331

This change turns to be quite irrelevant for the customer application I tested. I conclude that this change is something that is highly dependent upon customer's business logic (relying or not on static values, conventions, etc). I don't think that another profiling round is necessary; it proves that the extraction attempt overhead is bigger that the actual benefit, which is none.

Danut, please try to test these changes on a warm run. Therefore, do a scenario iteration through the customer application, open the JMX and redo it. I think your results (as mine) as good for cold runs, but quite modest for warm runs.
If your results show the same (that the warm runs are not optimized), we can drop this effort. Otherwise, we can conclude that these changes are customer code dependent and can be introduced conditionally (with a server setting).

For testing, I extracted your changes as a patch and applied over a local trunk to test (as 7035a is quite outdated). The same went for the JMX patch.

#36 Updated by Dănuț Filimon about 1 year ago

I retested the changes on a customer application as you said and the results are almost the same.

JMX Before changes (Warm) After changes (Warm)
Query Counter 277 274
META_DB_HITS 0 0
META_DB_MISSES 0 0
TEMP_DB_HITS 26 26
TEMP_DB_MISSES 40 40
DIRTY_DB_HITS 24 23
DIRTY_DB_MISSES 34 33
PERSISTENT_DB_HITS 131 130
PERSISTENT_DB_MISSES 203 201

#37 Updated by Alexandru Lungu about 1 year ago

  • % Done changed from 90 to 100
  • Status changed from WIP to Review

Considering the JMX results of hit/miss before/after static query parameter extraction, I think this effort can be dropped. If we ever encounter a customer which will need such optimization, we can revive the effort. By now, we couldn't find any improvement on two customer applications we tested.

This was an attempt to optimize our query caches hit ratio (c3p0, psCache), but apparently with no success.

#38 Updated by Eric Faulhaber 12 months ago

  • Status changed from Review to Closed

Work on this task is complete for now, but based on the findings, we will not be integrating into trunk.

Also available in: Atom PDF