Feature #3761: add DYNAMIC-FUNCTION() support to the dynamic database WHERE clause interpreter - Database - Golden Code Redmine

Feature #3761

add DYNAMIC-FUNCTION() support to the dynamic database WHERE clause interpreter

Added by Greg Shah over 5 years ago. Updated about 5 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Ovidiu Maxiniuc

Target version:

Start date:

Due date:

% Done:

100%

billable:

vendor_id:

GCD

version_reported:

version_resolved:

History

#1 Updated by Eric Faulhaber over 5 years ago

Assignee set to Ovidiu Maxiniuc

#2 Updated by Ovidiu Maxiniuc over 5 years ago

Status changed from New to WIP

Started by creating toy testcases that use DYNAMIC-FUNCTION.

#3 Updated by Ovidiu Maxiniuc over 5 years ago

Well, I encountered some strange results. Please confirm that the goal of this task is to add support for constructs like:

   bh:FIND-FIRST("WHERE DYNAMIC-FUNCTION('f', tt4.f1) EQ 4", NO-LOCK) NO-ERROR.

Where bh is a handle to tt4 table, f1 is an int field of same table and f is a function that takes an int and returns another (like its square). In this case, the FIND-FIRST should reposition the buffer on first record with f1 = 2.

I am asking because my testcase will always find nothing on P4GL test machines :(.

#4 Updated by Greg Shah over 5 years ago

Yes, this is the objective. I will separately send you some example customer code which is supposed to work.

#5 Updated by Ovidiu Maxiniuc over 5 years ago

This is a bit (or more) strange. The combination of dynamic queries and dynamic function call is in fact used as a workaround for the lack of dynamic expression evaluation. After the peculiar results I was obtaining I search the Progress KB and I found this:
https://knowledgebase.progress.com/articles/Article/P39990

I is not very evident at first why they need a dummy query but then I understood that the expression evaluator is implemented in 4GL only at dynamic query level. I understand that, in previous versions, all code was interpreted so this kind of issues were simply passed to 4GL interpreter. In newer versions, 4GL compiles the code so the interpretation is not available any more. Except with this workaround.

I did some tests and I noted that the dynamic function is called only ONCE, when the query is prepared (in QUERY-PREPARE). More than that, the function is evaluated with current content of the buffers. Once the evaluation is done, the result is saved (and I quote the article: "in a global variable"). There are no re-evaluation of the dynamic function for each iterated record, if any field is passed to dynamically called function. So no re-evaluation for find-first, find-next and all other methods.

#6 Updated by Greg Shah over 5 years ago

In other words, it is treated as a query substitution parameter?

#7 Updated by Ovidiu Maxiniuc over 5 years ago

The problem is that the current implementation of the dynamic queries is too logical in FWD. If the parameters and any of its node are not field in query's buffers then FWD and P4GL will work identically: the dynamic call is a substitution parameter that is evaluated only once.

Otherwise, P4GL will work as expected to do (at least from my point of view): invoking dynamically the function for each iterated row when a field of the queried tables/buffers are used in the dynamic call. P4GL, on the other hand, will not. P4GL will evaluate the function using the value of the fields at the moment when the query is open and statically evaluate it only once - as I could deduct.

As I understand, in dynamic queries, DYNAMIC-FUNCTION behaves like some kind of bracket that separates two kind of field evaluations:

the fields that appear in its parameters tree will be statically evaluated to the value at the moment when the query gets open;
other fields occurrences are dynamically evaluates as the query is navigated (with GET-NEXT / GET-FIRST, etc).

Although I hate the idea of altering the dynamic query engine, the good news is that I think I may be able to emulate the broken semantic of P4GL rather easy, by using a 'static' region when DYNAMIC-FUNCTION is processed and replace the buffer fields with their actual value at the moment the query is open. I see an issue though: doing so will alterate the interpreted AST tree and break the caches.

#8 Updated by Eric Faulhaber over 5 years ago

Ovidiu Maxiniuc wrote:

I see an issue though: doing so will alterate the interpreted AST tree and break the caches.

Is this inability to cache limited to the cases which use DYNAMIC-FUNCTION? Or are you saying all caching will be broken?

The former I think is an acceptable limitation. The latter is not an acceptable setback to performance.

#9 Updated by Ovidiu Maxiniuc over 5 years ago

Eric Faulhaber wrote:

Is this inability to cache limited to the cases which use DYNAMIC-FUNCTION? Or are you saying all caching will be broken?
The former I think is an acceptable limitation. The latter is not an acceptable setback to performance.

My proposed solution was to use annotations (we are already using some of them to cache some information (like classes and literal values) to speed up the interpreter) to store the result of the DYNAMIC-FUNCTION. However, this will break the cache because the annotation will be stored in cached tree. At a next execution of a similar dynamic query the DYNAMIC-FUNCTION call might need to return a different value - because the current values of the parameters is different. Once the annotations are set they cannot be dropped and will be used as long as the JAST interpreted tree is in cache.
So yes, the solution here is to skip adding DYNAMIC-FUNCTION queries to caches. Even better, we should drop the 'generification' in this case.

However, while doing my investigations, at a finer level, I discovered another issue with the dynamic query implementation. It seems that the evaluation of the substitution parameters are evaluated when the query is open. We use the prepare step to parse and construct FWD-specific query: adding query components AND evaluation of the substitutions (P2JQuery.Parameter). The parser returns a completely built P2JQuery, ready to be open. From my tests, 4GL evaluates the substitutions (or similar concept) when the dynamic query is open. This was indeed transparent for all queries we encountered until now. If this dynamic evaluation is used only as in your example, there is no problem since the query is sequentially prepared-open-closed in same statement. In a more complex language constructs the effect is most likely different.

#10 Updated by Ovidiu Maxiniuc over 5 years ago

The current implementation should convert and compile without any problems. However, since this part of FWD uses the TRPL to convert code dynamically, it can affect the static conversion. The good part is that most of the changes I am writing now are additive so the impact should be minimum.

The solution I am working right now is driven by the observed pattern this 'feature' is used and the behaviour I noticed while analysing the P4GL code that uses it.
I will describe what I am trying to do. Suppose we have a construct like this:

qh:SET-BUFFERS(bh).
qh:QUERY-PREPARE("FOR EACH tt4 WHERE DYNAMIC-FUNCTION('f', tt4.f1) EQ 4").
qh:QUERY-OPEN().
qh:GET-FIRST().
qh:GET-NEXT().

1. When the dynamic engine processes this (in QUERY-PREPARE) it will create a hidden class like this:

public class DynGenQuery2
{
   AdaptiveQuery query0 = new AdaptiveQuery();

   public void execute()
   {
      query0.initialize(
         tt4, 
         ((String) null), 
         () -> isEqual(
                  new int64(
                     ControlFlowOps.invokeDynamicFunction(
                        hqlParam1dq, 
                        tt4.getF1())), 
                  hqlParam2dq), 
         "tt4.id asc");
  }
}

Before returning from QUERY-PREPARE, the method execute is executed (so that query0 gets initialized) and query0 is returned and assigned to qh (which is a QueryWrapper). Notice that at this moment no evaluation is done yet.

2. In 4GL when QUERY-OPEN is called, the dynamic function is called using the current values (of buffers) and the result is apparently cached. During this operation the dynamic call is executed for the single time.

3. The GET-FIRST() and GET-NEXT() will iterate the query for the value of the where predicate at OPEN time: if false, there will be no results (even if there may be some records in our table that satisfy the predicate), if true will iterate all table content (even if none of them satisfy the predicate). This is indeed odd, but these two methods are not used in practice, as far as I could observe. However, we need to mimic this for completeness.

The solution I am working on is the following: when such dynamic query is detected, a special post-processing step is called that will convert the above class into something like this:

public class DynGenQuery
{
   AdaptiveQuery query0 = new AdaptiveQuery();
   logical dynVal = null;

   public void execute()
   {
      query0.initialize(
         tt4, 
         ((String) null), 
         () -> dynVal, 
         "tt4.id asc");
  }

   public void open()
   {
      dynVal = isEqual(
                  new int64(
                     ControlFlowOps.invokeDynamicFunction(
                        hqlParam1dq, 
                        tt4.getF1())), 
                  hqlParam2dq);
   }
}

The open() method should be called when the query is open, before making sure the query is closed (and releasing the buffers). The result of the predicate based on current record loaded in buffer will be cached in dynVal at this moment and will be used for any future query navigations.

For the moment, the cache is deactivated, but as things evolved, I might be add it back. For the moment I see a single issue: how to call the open() method from the QueryWrapper.queryOpen() but I will think of a solution after getting the rest done.

I estimate that tomorrow, in the evening, to have an 'alpha' working and entering testing and fine-tuning.

#11 Updated by Ovidiu Maxiniuc over 5 years ago

It actually works! At least for the testcase the client provided and for a a few other I tested.
The lambda is evaluated through a queryOpen listener that I added quite similar to the reposition listener.

However, some simpler cases are not recognized as dynamic-evaluations and something does not go entirely as expected.

Please let me know where should I commit my changes after I do a bit of cleanup.

#12 Updated by Greg Shah over 5 years ago

Great! Please use 3750a.

#13 Updated by Ovidiu Maxiniuc over 5 years ago

Committed revision 11303.

#14 Updated by Ovidiu Maxiniuc over 5 years ago

I found an issue with the solution. When the converted query has a RandomAccessQuery, the P2JQuery.Parameter lambdas are not processed at the correct time. This is because the RandomAccessQuery.initialize() method is called when the query is assembled in the query-prepare. In turn, the method will call AbstractQuery.preprocessSubstitutionArguments() and so the parameters are evaluated and the dynamic-function is called for the current values of the buffers which might not the the correct ones if the buffer is altered between query-prepare and query-open calls.

#15 Updated by Ovidiu Maxiniuc over 5 years ago

I did some tests today in order to understand the exact moment when the evaluation of the P2JQuery.Parameter is needed for RandomAccessQuery. For this I create the static counterparts for this testcase:

FUNCTION f RETURNS INTEGER (INPUT x AS INTEGER, INPUT log-it AS LOGICAL):
    IF log-it THEN MESSAGE "f(" x ")".
    RETURN x * x.
END FUNCTION.

DEFINE TEMP-TABLE tt4
    FIELD f1 AS INT.

CREATE tt4. f1 = 0.
CREATE tt4. f1 = 1.
CREATE tt4. f1 = 2.
CREATE tt4. f1 = 3.
CREATE tt4. f1 = 4.

FIND FIRST tt4 WHERE tt4.f1 EQ 2.
FIND FIRST tt4 WHERE DYNAMIC-FUNCTION('f', tt4.f1, YES) EQ 4 NO-LOCK NO-ERROR.
MESSAGE "Static-q dynamic-f Found: " f1 f(f1, NO) EQ 4.

FIND FIRST tt4 WHERE tt4.f1 EQ 2.
FOR EACH tt4 WHERE DYNAMIC-FUNCTION('f', tt4.f1, YES) EQ 4:
    MESSAGE "Static-FOR Dynamic-f:" f1 f(f1, NO) EQ 4.
END.

The output in ABL is the following:

f( 2 )
Static-q dynamic-f Found:  0 no
f( 2 )
Static-FOR Dynamic-f: 0 no
Static-FOR Dynamic-f: 1 no
Static-FOR Dynamic-f: 2 yes
Static-FOR Dynamic-f: 3 no
Static-FOR Dynamic-f: 4 no

The code FWD generates today is the following:

new FindQuery(tt4, "tt4.f1 = 2", null, "tt4.id asc").first();
silent(() -> new FindQuery(tt4, 
                           (String) null, 
                           () -> isEqual(ControlFlowOps.invokeDynamicFunctionWithMode(integer.class, "f", "II", tt4.getF1(), true), 4), 
                           "tt4.id asc", 
                           LockType.NONE).first());
[...]
AdaptiveQuery query0 = new AdaptiveQuery();
forEach("loopLabel0", new Block((Init) () -> 
{
   query0.initialize(tt4, 
                     ((String) null), 
                     () -> isEqual(ControlFlowOps.invokeDynamicFunctionWithMode(integer.class, "f", "II", tt4.getF1(), true), 4),
                     "tt4.id asc");
},

To match the output of ABL, the generated code should be more like:

silent(() -> new FindQuery(tt4,
                           "?",
                           null,
                           "tt4.id asc",
                           new Object[] {
                              isEqual(ControlFlowOps.invokeDynamicFunctionWithMode(integer.class, "f", "II", tt4.getF1(), true), 4)
                           },
                           LockType.NONE).first());
[...]
query0.initialize(tt4, 
                  "?", 
                  null, 
                  "tt4.id asc", 
                  new Object[] {
                     isEqual(ControlFlowOps.invokeDynamicFunctionWithMode(integer.class, "f", "II", tt4.getF1(), true), 4)
                  });

Note the f(2) is called only once for the FIND FIRST statement. This means the query 'takes' the current value of the buffer (well, that was before the buffer is reset/flushed) and checks the predicate. Since I positioned the buffer intentionally on 2 the line before, the predicate will return YES.
This is not very relevant, the loop gives more insights: the query iterates ALL records in the table as they would match the predicate. Of course, this is not the truth, the MESSAGE statement prints the actual field value from the buffer and the value of the predicate. Evidently, the predicate is true/yes only for 2. But then how are the other records iterated???

In conclusion, the DYNAMIC-FUNCTION makes all its sub-nodes opaque and evaluated only once when the query is initialized. This is really counter-intuitive but this seems to be the way ABL does the things. Most likely, this is caused by the dynamic nature of the dynamic calls, which cannot be executed directly on the database. So instead of bringing all records to client side (as FWD does) they have chosen to evaluate statically the dynamic-function (isn't that funny) before the query is sent to database.

#16 Updated by Ovidiu Maxiniuc over 5 years ago

The function call seems to be more complicated. In fact, I think that all calls to UDF (ABL internal functions) are executed at most ONCE, when the query is open.
I am basing the assumption on the following code:

FIND FIRST tt4 WHERE tt4.f1 EQ 3 NO-LOCK.
FIND FIRST tt4 WHERE f(tt4.f1, yes) EQ 9 NO-LOCK NO-ERROR.
MESSAGE "f1:" tt4.f1 f(tt4.f1, false) EQ 9.

RELEASE tt4.
FIND FIRST tt4 WHERE f(tt4.f1, yes) EQ 9 NO-LOCK NO-ERROR.
MESSAGE "f1:" tt4.f1.

This will print

f( 3 )
f1: 0 no
** No tt4 record is available. (91)
f1:

The same observations I wrote in my previous note apply here, too: the first FIND FIRST loads an arbitrary record in buffer. The second FIND FIRST will just use the current value from the buffer to test the predicate when the query is open, but it will load the very first record from table, regardless it matches the predicate.

There is a new observation here: the 3rd FIND FIRST will actually NOT call the function AT ALL. I cannot explain this. The query execution is practically stopped when the substitution (using FWD idiom) is evaluated. I would expect to see at least a f( ? ) printed. At least this is what FWD does now, for me.

To get a closer match of the behaviour of ABL I changed FWD conversion to extract ALL call of ABL UDFs as P2JQuery.Parameter s. Except for the 'abort' issue.

#17 Updated by Eric Faulhaber over 5 years ago

Ovidiu, I am concerned your changes to index_selection.rules may be too aggressive. Won't this extract built-in functions with current buffer field arguments as well?

#18 Updated by Ovidiu Maxiniuc over 5 years ago

Eric Faulhaber wrote:

Ovidiu, I am concerned your changes to index_selection.rules may be too aggressive. Won't this extract built-in functions with current buffer field arguments as well?

Eric, you are right. I had a feeling last night when I mentioned the "bit of a risk". Thank you so much for the review!

I changed the code to ignore the built-in functions. The DYNAMIC-FUNCTION is also a built-in so the rules became a bit complicated.
After a quick test (well it failed at first, the AST-types were really messed, but a fresh rebuild fixed it) I committed the change to 3750b as r11351.

#19 Updated by Eric Faulhaber over 5 years ago

Thanks for fixing this. Please note that index_selection.rules lines 487-512 and lines 594-619 are exact duplicates. Perhaps move this code into a function?

#20 Updated by Greg Shah over 5 years ago

Is there anything more to do on this task?

Does the customer's testcase work properly?

#21 Updated by Ovidiu Maxiniuc over 5 years ago

Greg Shah wrote:

Does the customer's testcase work properly?

Yes, it did.

Is there anything more to do on this task?

I think it ready for the final regression test. It must be the ETF, the standard tests on devsrv01 do not cover the changes.

#22 Updated by Eric Faulhaber over 5 years ago