Feature #4347
add runtime support for STOP-AFTER block option
100%
Related issues
History
#1 Updated by Constantin Asofiei over 4 years ago
The STOP-AFTER option requires runtime - see the BlockManager.stopAfter
stubs.
#2 Updated by Constantin Asofiei over 4 years ago
- Related to Feature #3751: implement support for OO 4GL and structured error handling added
#3 Updated by Greg Shah over 4 years ago
- Related to Feature #4373: finish core OO 4GL support added
#4 Updated by Constantin Asofiei about 2 years ago
We need tests to see what happens if there are nested blocks with STOP-AFTER - as I think the STOP should target the block with the STOP-AFTER clause which expired. And this needs to cleanup any other nested STOP-AFTER stopper threads.
#5 Updated by Greg Shah about 2 years ago
I would like to avoid having a thread for each STOP-AFTER block. I think we need to sort the timeouts and process them all on 1 thread.
#6 Updated by Constantin Asofiei about 2 years ago
Greg Shah wrote:
I would like to avoid having a thread for each STOP-AFTER block. I think we need to sort the timeouts and process them all on 1 thread.
OK, makes sense, I think it can be done, we just need to figure out the cleanup (when block terminates, cleanup all nested blocks when targeting an outer block, etc).
#10 Updated by Alexandru Lungu 2 months ago
Eduard and I had a chat yesterday on this task an I took some "thinking" time on my own on this matter:
#4347-6 is not a trivial thing from my POV - in fact, it is the hardest part of this task. I think the 4GL statements are atomic, so in 4GL the functionality is something like:
do stop after 3: // if 3 seconds already passed, do stop. message "a". // if 3 seconds already passed, do stop. message "b". // if 3 seconds already passed, do stop. // ... end.To replicate this in Java, we need all converted statements to check if there was a time-out or not, so it can raise
stop
. To do that, we can either:
- Cherry-pick some places in FWD run-time:
BlockManager
is the obvious one, but there may be others likeAbstractQuery.next
(converted statement) orBDT.assign
(converted statement), etc. - Use AOP and annotate the methods used exclusively from the conversion side.
No need to use parallel thread. For each session, we can keep a stack of deadlines and the tightest deadline (current-time + after clause). For the hotspots we choose, we need to check if the tightest deadline was passed, so we raise the STOP condition. We can do that in BlockManager
for each session context.
- Nested blocks with
STOP AFTER
. I don't think this is necessarily hard - if a certain time-out occurred, simply raise STOP, wherever the execution stands (in a top-block or a nested-block). - Edge cases:
- Database (schema or session) triggers.
- UI statements including WAIT-FOR or UI triggers.
- What happens when a long running IO operations happens (including database interaction). Is this the case in #8573?
- If so, we can't cancel a DB query AFAIK (or maybe we can with some specific JDBC settings). How is 4GL dealing with this? Even if 4GL is waiting for the query to end, in FWD we don't have cursor walking, but special queries to prefetch data (like
ProgressiveResults
), which are not that granular.
My end-game here: if the urgency of #4347 is due to #8573 and #8573 is timing out due to a long running query, maybe we should optimize that first, right?
If we need to handle this now, I suggest doing some detective work for the edge cases and IO operations. After, we can deliver a more relaxed implementation where only BlockManager
is honoring the STOP AFTER
.
#11 Updated by Constantin Asofiei 2 months ago
Alexandru, the STOP AFTER needs to use a similar mechanism we have for CTRL-C support (which actually raises a STOP condition in ChUI). This will interrupt the target Conversation thread once the timeout exceeded and it will perform a rollback/rollup until the targeted DO STOP AFTER block is reached, via a STOP condition.
If you run this test in ChuI:
do on quit undo, leave: do on stop undo, leave: do stop-after 1: pause 3. end. message "no stop". end. message "no quit". end. message "done".
you will see that a STOP condition is being raised for both CTRL-C and the STOP-AFTER timeout.
#12 Updated by Greg Shah 2 months ago
To replicate this in Java, we need all converted statements to check if there was a time-out or not, so it can raisestop
. To do that, we can either:
- Cherry-pick some places in FWD run-time:
BlockManager
is the obvious one, but there may be others likeAbstractQuery.next
(converted statement) orBDT.assign
(converted statement), etc.- Use AOP and annotate the methods used exclusively from the conversion side.
I disagree here. We don't want to do this. It will make a mess of the code and will also add new processing that is a performance drag.
No need to use parallel thread. For each session, we can keep a stack of deadlines and the tightest deadline (current-time + after clause).
Actually, a timer thread is exactly what is needed. And as Constantin notes, we have the interruption mechanism already available to generate the STOP. This will work for all cases (base lang, UI, database, processing on the client or the server). See Control.interrupt()
.
This timer thread can:
- Be a single thread for all sessions. In this case, we must save some context local state to identify the session to be interrupted along with the scheduled deadline. OR
- Be a single thread per session that handles all timers for that context. In this case, the thread itself would have the context needed to know which session is being interrupted.
Most important from my POV:
- Database (schema or session) triggers.
- UI statements including WAIT-FOR or UI triggers.
- What happens when a long running IO operations happens (including database interaction). Is this the case in #8573?
- If so, we can't cancel a DB query AFAIK (or maybe we can with some specific JDBC settings). How is 4GL dealing with this? Even if 4GL is waiting for the query to end, in FWD we don't have cursor walking, but special queries to prefetch data (like
ProgressiveResults
), which are not that granular.
The Control.interrupt()
works for all of these cases.
#13 Updated by Eduard Soltan about 2 months ago
- Status changed from WIP to Review
Committed on 4347a, revision 15204.
#14 Updated by Eric Faulhaber about 2 months ago
Greg or Constantin: please review.
#15 Updated by Constantin Asofiei about 2 months ago
- we need a
SessionListener
to clean up theStopAfterTimer.stopAfterTimerOuts
map. Currently this is not done. stopAfterTimerOuts
needs to be synchronized- what happens if there are nested blocks with
STOP-AFTER
and i.e. the outer block has 1 second and the inner blocks 10 seconds? - two or more timers can start executing 'in parallel' -
cancel
is not a problem, but:- the linked list for the Task may be problematic.
- calling
Control.blockInterrupt();
twice, in parallel, from two timers is possible. I think we need to ensure we call it only once.
- there are some formatting issues with
StopAfterTimer
- new line between field definitions, afterextends
- please take a look - I think if we need to have a way to disable
STOP-AFTER
, via a directory configuration - the reason is if theSTOP-AFTER
values are tuned for OE performance, in FWD we may get into canceling tasks which should not be canceled.
#16 Updated by Eduard Soltan about 2 months ago
Constantin Asofiei wrote:
Review for 4347a rev 15204:
- what happens if there are nested blocks with
STOP-AFTER
and i.e. the outer block has 1 second and the inner blocks 10 seconds?
In that case it is iterating over nested blocks using nextTask
, and canceling the TimerTasks submitted to the timer corresponding to nested blocks.
- two or more timers can start executing 'in parallel' -
cancel
is not a problem, but:
- the linked list for the Task may be problematic.
- calling
Control.blockInterrupt();
twice, in parallel, from two timers is possible. I think we need to ensure we call it only once.
Isn't this something that wanted to be avoided? To have separate timer for each block.
#17 Updated by Constantin Asofiei about 2 months ago
Eduard Soltan wrote:
Constantin Asofiei wrote:
Review for 4347a rev 15204:
- what happens if there are nested blocks with
STOP-AFTER
and i.e. the outer block has 1 second and the inner blocks 10 seconds?In that case it is iterating over nested blocks using
nextTask
, and canceling the TimerTasks submitted to the timer corresponding to nested blocks.
Thanks, I see it now.
- two or more timers can start executing 'in parallel' -
cancel
is not a problem, but:
- the linked list for the Task may be problematic.
- calling
Control.blockInterrupt();
twice, in parallel, from two timers is possible. I think we need to ensure we call it only once.Isn't this something that wanted to be avoided? To have separate timer for each block.
Ah, you are right, there is a single thread where the tasks are posted. So there is no concurrency here. But, can anything wrong happen if you fire two tasks in a row, which ends up calling Control.blockInterrupt
almost one after the other? The point here is to look for race conditions.
#18 Updated by Eduard Soltan about 2 months ago
Constantin Asofiei wrote:
Ah, you are right, there is a single thread where the tasks are posted. So there is no concurrency here. But, can anything wrong happen if you fire two tasks in a row, which ends up calling
Control.blockInterrupt
almost one after the other? The point here is to look for race conditions.
I guess such a condition could happen.
I am thinking about adding a lock in a Session implementation, and to allow execution of the run
method of TimerTask only after lock acquiring. After acquiring the lock could be released if a on stop undo
phrase is encountered or at the program termination if no such phrase is encoutered.
#19 Updated by Greg Shah about 2 months ago
I'm reviewing the code now. After reviewing the 4GL docs, I do have some questions. I also added Marian to coordinate testcases and also he may know some of the answers.
Did you write testcases to check how OE works in these cases:
- 4GL syntax
- "time limit" value
- Test both literals and "complex" expressions (a variable and a calculation should be enough).
- Is the value really required to be an integer? Can it be decimal? Can it be int64?
- Can you have both an
ON STOP
phrase and aSTOP-AFTER
phrase on the same block? - Can you have
catch
blocks forProgress.Lang.Stop
(or subclasses) in the same block as anON STOP
phrase or aSTOP-AFTER
phrase?
- "time limit" value
- Exceptions versus Conditions and abnormal control flow
- The 4GL docs say that when the timeout occurs, both a
STOP
condition is raised and aProgress.Lang.StopAfter
exception will be thrown. This (and the references to enabling the oldSTOP
behavior using-catchStop 0
) suggests they have moved to an exception-first approach withSTOP
where they throwProgress.Lang.Stop
or one of its subclasses likeProgress.Lang.StopAfter
,Progress.Lang.UserInterrupt
for CTRL-C andProgress.Lang.LockConflict
for record lock timeouts) and probably have an implicit catch block for any blocks that haveON STOP
phrases. Is that how it works? - Does one need to use
ROUTINE-LEVEL ON ERROR UNDO, THROW.
orBLOCK-LEVEL ON ERROR UNDO, THROW.
to get this exception behavior or is it really present in later releases? (extra credit: which OE release brought this?) - Does
STOP-AFTER
add an implicit catch block forProgress.Lang.StopAfter
to the block in which it is specified? - Does using
STOP-AFTER
provide any behavior forProgress.Lang.Stop
instances that are notProgress.Lang.StopAfter
? - Does it add an implicit
ON STOP
phrase in the case that exceptions are not being used (-catchStop 0
) - Is
STOP-AFTER
essentially anON STOP UNDO, LEAVE
but only matching a timeout and not other kinds ofSTOP
? - If you can have both an
ON STOP
phrase and aSTOP-AFTER
phrase on the same block, how do they interact? - If you can have
catch
blocks forProgress.Lang.Stop
(or subclasses) in the same block as anON STOP
phrase or aSTOP-AFTER
phrase, how do they interact? - Does an
ON STOP
clause "catch" allProgress.Lang.Stop
exceptions, no matter the subclass?
- The 4GL docs say that when the timeout occurs, both a
- Timers
- Does the timer get reset at the top of every looping block that has
STOP-AFTER
?- Check this for both iterate and RETRY.
- Does the timer get removed after the final exit from any block that has
STOP-AFTER
?
- Does the timer get reset at the top of every looping block that has
- Nested Blocks
- Outer block is
STOP-AFTER 3
and inner block isSTOP-AFTER 8
, does a timeout in the inner block properly target the outer block? - Outer block is
STOP-AFTER 8
and inner block isSTOP-AFTER 3
, does a timeout in the inner block properly target the inner block?
- Outer block is
If you've already written these, just point them out and explain the results here. If not, then we probably need these testcases to confirm how things work. Many of these things are statements or implications from the 4GL documentation. We have learned not to trust that documention, it is often incomplete and sometimes incorrect.
LE: Added a Timers section above with more questions.
#20 Updated by Greg Shah about 2 months ago
Code Review Task Branch 4347a Revision 15204
1. Please explain why Control.blockInterrupt()
adds a remote call and uses the remote form of interrupting (sending a message)? I would have expected that we would use the Control.interrupt()
implementation (which has both local and remote interruption paths). As far as I know, Control.interrupt()
is exactly what we need.
2. In BlockManager.stopAfter(long)
, please call wa.tm.getBlock()
only once and save the result in a local var.
3. As noted in #4347-19, I think there is likely to be a lot of exception/condition processing that the 4GL implements. I suspect we can implement this properly in a couple of days if we have the 4GL behavior fully tested and documented. More importantly, the 4GL docs suggest that there is a difference between STOP
conditions processed for different causes. This could explain why they implemented an "exception first" approach to handling these cases. With that in mind, I don't understand the current BlockManager
implementation. Even beyond the exception approach, I suspect that the current implementation does not properly implement the control flow in many cases.
- I would expect that the timer would be a
Finalizable
- Reset at the
iterate()
orretry()
of a block. - Removed at the
finished()
.
- Reset at the
- Don't we need to add processing in locations where we are catching
ConditionException
otherwise how do we implement the impicitLEAVE
of a block in response toSTOP-AFTER
timeout? - Doesn't any
STOP
condiiton processing need to differentiateSTOP-AFTER
timeouts from otherSTOP
sources? - In the
finally
blocks forBlockManager.processBody()
andBlockManager.processForBody()
, there is theif (reason == null && wa.tm.getBlock().hasStopAfter)
conditional that removes the timer. We do need to remove the timer but I think this should be done using theFinalizable
.
4. The following usage in BlockManager
is expensive (in BM.stopAfter()
and in the finally
blocks of BM.processBody()
and BM.processForBody()
):
SessionManager sm = SessionManager.get(); Session session = sm.getSession();
For performance reasons, let's save the session
in the BlockManager.WorkArea
. We can do that the first time we encounter a STOP-AFTER
(in BM.stopAfter()
). Then we don't need to look it up ever again.
5. I would prefer if we keep a reference to the StopAfterTimer
instance in the BlockDefinition
. That way, all the BlockManager
operations can directly reference things without calling StopAfterTimer.getStopAfterTimer()
and we don't need to have the StopAfterTimer.stopAfterTimerOuts
map. This is faster and easier to uderstand. It also avoids the synchronization issues.
6. Some minor things:
- If we need to keep "Control.blockInterrupt()@, we need to change the name because it is confusing. It can suggest that we are "blocking interrupts" (preventing them from occurring).
- If we need to keep
Control.sendIntreruptMessage()
, it should be spelledsendInterruptMessage()
.
#21 Updated by Greg Shah about 2 months ago
Code Review (Part 2) Task Branch 4347a Revision 15204
7. Should we (can we?) name the Timer
thread to make it clear which session it belongs to? Otherwise it is just going to be chaos.
8. I'm surprised the timer thread doesn't need a FWD context. On that thread we are calling multiple context-local iooperations. This doesn't seem right.
9. I don't undertstand the usage of Control.init()
in the task's run()
method. That adds the current thread to the stack which seems wrong.
#22 Updated by Eduard Soltan about 1 month ago
def var a as int. def var b as int. do stop-after 5 on stop undo, leave: repeat: a = a + 1. end. catch eStopAfter as Progress.Lang.StopAfter: message "eStopAfter Error". b = b + 1. end catch. end. message b. message "Normal End".
I tried to run this code in Procedure Editor, but it does not work as expected.
In documention for Stop After
it is stated that catch
takes precedence over on stop
phrase. So the message from catch
should be displayed.
However the catch
is completely ignored. And it just follow the on stop
phrase behaviour.
And even if I modify stop
phrase to on stop undo, retry
and add if retry then do
. It will follow the retry behaviour.
Maybe some configuration should be added to enable behaviour describe in the documentation?
#23 Updated by Constantin Asofiei about 1 month ago
Greg, -catchStop 1
startup option is required for progress.lang.stopafter
to be thrown. Do we want to implement this now, as we don't really have even the skeletons for these classes.
Eduard, please change the test so that the CATCH ... StopAfter
is in an outer block, is the behavior the same? What if the program in note 22 is without a CATCH block, and is ran from another external program with the CATCH ... StopAfter
block? Is the StopAfter structured exception still raised? What if the external program has just the ON STOP condition, without the CATCH?
In any case, if the STOP condition is morphed into a structured Stop exception, then we need to behave somehow similar to the ERROR condition which is morphed into a structured App/SysError exception.
#24 Updated by Marian Edu about 1 month ago
As Constantin said you have to move the catch
block out of the stop-after
one and also make sure you remove the on stop
condition from that block else you actually tell it how to handle the STOP
condition itself so it won't raise the STOP
condition anymore.
def var a as int. def var b as int. do stop-after 5: repeat: a = a + 1. end. end. message b. message "Normal End". catch eStopAfter as Progress.Lang.StopAfter: message "eStopAfter Error". b = b + 1. end catch.
#25 Updated by Eduard Soltan about 1 month ago
Marian Edu wrote:
As Constantin said you have to move the
catch
block out of thestop-after
one and also make sure you remove theon stop
condition from that block else you actually tell it how to handle theSTOP
condition itself so it won't raise theSTOP
condition anymore.
I tested your example and other similar cases, but still get the same behaviour. Even Normal End
is not showed.
I think that here a stop condition is raised from do
block and is propagated up to the call stack, and not being catch by any catch
statement.
May guess is that it still might be a problem with configuration.
#26 Updated by Greg Shah about 1 month ago
I tried to run this code in Procedure Editor, but it does not work as expected.
Do not run any test code from inside the Procedure Editor. The Procedure Editor is 4GL code and it changes the execution environment. Make sure you run from the command line with the -p <filename>
option.
#27 Updated by Greg Shah about 1 month ago
Greg,
-catchStop 1
startup option is required forprogress.lang.stopafter
to be thrown.
Are you saying that -catchStop 0
is the default? I thought that -catchStop 1
was the default.
Do we want to implement this now, as we don't really have even the skeletons for these classes.
If the default in 11.x is -catchStop 0
, then we can defer the implementation of that part.
#28 Updated by Constantin Asofiei about 1 month ago
Greg Shah wrote:
Greg,
-catchStop 1
startup option is required forprogress.lang.stopafter
to be thrown.Are you saying that
-catchStop 0
is the default? I thought that-catchStop 1
was the default.
According to https://docs.progress.com/bundle/openedge-startup-and-parameter-reference-117/page/Catch-STOP-catchStop.html , 0 is default. This was introduced with 11.7, I didn't double-check on this OpenEdge version.
#29 Updated by Greg Shah about 1 month ago
Good.
Eduard: Create a new task for the -catchStop
behavior. Focus your implementation for this task on the traditional STOP
condition behavior.
#30 Updated by Marian Edu about 1 month ago
Constantin Asofiei wrote:
According to https://docs.progress.com/bundle/openedge-startup-and-parameter-reference-117/page/Catch-STOP-catchStop.html , 0 is default. This was introduced with 11.7, I didn't double-check on this OpenEdge version.
Yes, it was added as 'technology-preview' in 11.7 and it was off by default, however it is turned on by default starting with version 12.0.
#31 Updated by Greg Shah about 1 month ago
That explains it. I was looking at the 12.x docs.
#32 Updated by Eduard Soltan about 1 month ago
- File testsFile.p
added
Committed on 4347a, revision 15205.
Created a suite of test cases.
Also find some scenarios which behaviours in 4gl i, don't quiet understand.
outerBlock1: do stop-after 3 on stop undo, leave: innerBlock1: do stop-after 8 on stop undo, retry: if retry then do: message "InnerBlock retry". leave innerBlock1. end. repeat: a = a + 1. end. end. end.
Here a StopCondition is triggered on outerBlock1
, however because the nested innerBlock1
which has a on stop undo, retry
(this happens only with retry) it will not break from infinite repeat
and even retry
will not be triggered for innerBlock1
block.
for each tt1 stop-after 3 on stop undo, next: message tt1.f1. repeat: a = a + 1. end. end.
Here I would expect after each StopCondition
to move to the next reocr from tt1, however at first triggered stopcondition
for each is exited.
#33 Updated by Eric Faulhaber about 1 month ago
Eduard Soltan wrote:
Committed on 4347a, revision 15205.
I understand from today's meeting that this commit addresses the code reviews. Is it ready for a follow-up review, or is there more work to go?
Created a suite of test cases.
Also find some scenarios which behaviours in 4gl i, don't quiet understand.[...]
Marian, Greg, or Constantin, can you provide any guidance w.r.t. Eduard's findings here?
#34 Updated by Marian Edu about 1 month ago
Eduard Soltan wrote:
Here a StopCondition is triggered on
outerBlock1
, however because the nestedinnerBlock1
which has aon stop undo, retry
(this happens only with retry) it will not break from infiniterepeat
and evenretry
will not be triggered forinnerBlock1
block.
Guess this is just a bug in 4GL, the client becomes irresponsive in this case so really there isn't much to explain. Nesting stop-after
doesn't seems to be a good idea :)
Here I would expect after each
StopCondition
to move to the next reocr from tt1, however at first triggeredstopcondition
for each is exited.
That one works as expected for me (OE12.2), what you can do so it doesn't look like it's mining bitcoins is to use `pause x` (with x higher than the value of stop-after) instead of the repeat loop ;)
#35 Updated by Constantin Asofiei about 1 month ago
Marian Edu wrote:
That one works as expected for me (OE12.2), what you can do so it doesn't look like it's mining bitcoins is to use `pause x` (with x higher than the value of stop-after) instead of the repeat loop ;)
By 'expected' you mean the FOR EACH
is honoring the 'next' in on stop undo, next
?
#36 Updated by Marian Edu about 1 month ago
Constantin Asofiei wrote:
Marian Edu wrote:
That one works as expected for me (OE12.2), what you can do so it doesn't look like it's mining bitcoins is to use `pause x` (with x higher than the value of stop-after) instead of the repeat loop ;)
By 'expected' you mean the
FOR EACH
is honoring the 'next' inon stop undo, next
?
Yes, i see all records in the temp-table so it's doing the next as requested. When using `repeat` the client just looks like non responding but it works as well.
#37 Updated by Constantin Asofiei about 1 month ago
I think there is some behavior related to blocking UI statements (like UDPATE, or PAUSE): if these exist, then the STOP condition is delayed until there is some user interaction (a key is pressed).
#38 Updated by Greg Shah about 1 month ago
I think there is some behavior related to blocking UI statements (like UDPATE, or PAUSE): if these exist, then the STOP condition is delayed until there is some user interaction (a key is pressed).
Are you talking about "infinite loop retry protection"?
What led you to the linkage with STOP
conditions?
#39 Updated by Constantin Asofiei about 1 month ago
Greg Shah wrote:
I think there is some behavior related to blocking UI statements (like UDPATE, or PAUSE): if these exist, then the STOP condition is delayed until there is some user interaction (a key is pressed).
Are you talking about "infinite loop retry protection"?
Once the STOP-AFTER time has elapsed, the blocking UI operation like UPDATE does not get dismissed until a user presses a key.
#40 Updated by Greg Shah about 1 month ago
Eduard, Please do the following:
- In the testcase, add elapsed timers around each of the cases where you are expecting a timeout. This will tell us the granularity of the timer services. For example, when you use
decimal_timeout = 3.14.
does the timer have a granularity of seconds or is it down to a fractional second? - Please analyze the results of the testcase in answering the questions from #4347-19. Document the findings here. I don't want someone to have to do that analysis twice when you already have to do it to make this implementation. You can ignore the questions related to exception processing since that is being deferred.
#41 Updated by Greg Shah about 1 month ago
Constantin Asofiei wrote:
Greg Shah wrote:
I think there is some behavior related to blocking UI statements (like UDPATE, or PAUSE): if these exist, then the STOP condition is delayed until there is some user interaction (a key is pressed).
Are you talking about "infinite loop retry protection"?
Once the STOP-AFTER time has elapsed, the blocking UI operation like UPDATE does not get dismissed until a user presses a key.
Interesting. Eduard: Make sure you have tests to show this. See Language Statements that Block for User Input.
#42 Updated by Greg Shah about 1 month ago
Here a StopCondition is triggered on
outerBlock1
, however because the nestedinnerBlock1
which has aon stop undo, retry
(this happens only with retry) it will not break from infiniterepeat
and evenretry
will not be triggered forinnerBlock1
block.Guess this is just a bug in 4GL, the client becomes irresponsive in this case so really there isn't much to explain. Nesting
stop-after
doesn't seems to be a good idea :)
OK, we don't need to implement this bug (or "quirk") of the 4GL. Since the result in OE is "unresponsive", it should not be used in real code today. We just need to implement the reasonable approach and document it here.
Here I would expect after each
StopCondition
to move to the next reocr from tt1, however at first triggeredstopcondition
for each is exited.That one works as expected for me (OE12.2), what you can do so it doesn't look like it's mining bitcoins is to use `pause x` (with x higher than the value of stop-after) instead of the repeat loop ;)
Do we need to confirm the 11.x behavior?
#43 Updated by Eduard Soltan about 1 month ago
Greg Shah wrote:
- 4GL syntax
- "time limit" value
- Test both literals and "complex" expressions (a variable and a calculation should be enough).
Both literals and complex expression can be used for stop-after values. Both for 4gl and fwd.
- Is the value really required to be an integer? Can it be decimal? Can it be int64?
It can also be decimal or int64.
In fwd stopAfter(NumberType) method is called for decimal or int64 case.
I tried to put 3.9 seconds for stop-after, but elapsed time always shows around 3010 milisends. So in think that the decimal value is ceiled.
- Can you have both an
ON STOP
phrase and aSTOP-AFTER
phrase on the same block?
Absolutely, for both fwd and 4gl.
- Is
STOP-AFTER
essentially anON STOP UNDO, LEAVE
but only matching a timeout and not other kinds ofSTOP
?
From my point of view yes. The inner blocks with on stop undo, ...
does not respond to StopCondition raised by outer block.
- If you can have both an
ON STOP
phrase and aSTOP-AFTER
phrase on the same block, how do they interact?
StopCondition raised by this block timeout should be handled by this block upwards, even if a condition was raised in a nested block with on stop undo, ...
phrase
- Timers
- Does the timer get reset at the top of every looping block that has
STOP-AFTER
?
- Check this for both iterate and RETRY.
Yes, timer is reseted at every iteration of the looping block and for retry iteration also, regrardless of the nature of the retry (cause by stop after timeout or not).
- Does the timer get removed after the final exit from any block that has
STOP-AFTER
?
I think it is removed, because a stop condition is not triggered if the block was finished before timeout was hit.
- Nested Blocks
- Outer block is
STOP-AFTER 3
and inner block isSTOP-AFTER 8
, does a timeout in the inner block properly target the outer block?
Stop condition for the outer block will be raised and will be handled at the outer block.
Except for the case when the inner block has a on stop undo, retry
. In my current implementation I don't replicate this, I just throw the StopCondition until I reach outerBlock scope and continue to do that until I reach on stop undo, ...
.
- Outer block is
STOP-AFTER 8
and inner block isSTOP-AFTER 3
, does a timeout in the inner block properly target the inner block?
If the inner block has a on stop undo, ...
is handled there at the inner block level or is passed to the next level if not.
I noticed that pause
is not stopped by stop-after timeouts in 4gl, also in fwd that does not happen either. Because Control.intrerupt
does not stop the pause. But I have to investigate if it matches with Language Statements that Block for User Input.
#44 Updated by Greg Shah about 1 month ago
Is
STOP-AFTER
essentially anON STOP UNDO, LEAVE
but only matching a timeout and not other kinds ofSTOP
?From my point of view yes. The inner blocks with
on stop undo, ...
does not respond to StopCondition raised by outer block.
Did you check that an explicit STOP.
statement will NOT be caught?
I just throw the StopCondition until I reach outerBlock scope and continue to do that until I reach on stop undo, ....
This suggests that the STOP-AFTER
does not add an "implicit" ON STOP
phrase.
1. Unless there is an explicit ON STOP
phrase the timeout will not be caught. Is that correct? OR will the timeout generated STOP
exit from the outerBlock scope and cause the logic to continue processing even without an explicit ON STOP
?
2. Does the ON STOP
have to be be in an enclosing scope or can it be caught at the outerBlock scope?
#45 Updated by Eduard Soltan about 1 month ago
Greg Shah wrote:
From my point of view yes. The inner blocks with on stop undo, ... does not respond to StopCondition raised by outer block.
Did you check that an explicit STOP. statement will NOT be caught?
In that case the StopCondition will be caught at innerblock scope.
1. Unless there is an explicit
ON STOP
phrase the timeout will not be caught. Is that correct? OR will the timeout generatedSTOP
exit from the outerBlock scope and cause the logic to continue processing even without an explicitON STOP
?
Unless there is an explicit ON STOP
phrase the timeout will not be caught.
2. Does the
ON STOP
have to be be in an enclosing scope or can it be caught at the outerBlock scope?
outerBlock: do stop-after 5 on stop undo, leave: ... end.
If you mean that a 5 seconds timeout was raised by outerBlock in the example, then the StopCondinition will be caught at outerBlock scope.
#46 Updated by Eduard Soltan about 1 month ago
Constantin Asofiei wrote:
- I think if we need to have a way to disable
STOP-AFTER
, via a directory configuration - the reason is if theSTOP-AFTER
values are tuned for OE performance, in FWD we may get into canceling tasks which should not be canceled.
I added in dataset directory.xml, server/runtime/default
a new configuration node, is this a good location for it?
<node class="boolean" name="stopAfter"> <node-attribute name="value" value="TRUE"/> </node>
Committed on 4347a, revision 15206. Support for directory parameter.
#47 Updated by Constantin Asofiei 30 days ago
- StandardServer
- only change is a 'bad import', revert this file to rev 15203.
- BlockManager
- in
ContextContainer.initialValue
, read the directory value usingUtils.getDirectoryNodeBoolean
, withUtils.DirScope.BOTH
, and depending on this flag, either create or don't theWorkArea.stopAfterTimer
instance. After that, you just check forstopAfterTimerl
is null or not, to see if FWD has disabled this feature. The path needs to belegacy-system/stopAfter
.
- in
- StopAfterTimer
- in the constructor, the
if (timer == null)
is not needed - you can even maketimer
final I think. Also, the javadoc is wrong - please do a 'ant javadoc' and fix any errors. - add a string parameter to
new Timer
, with the name similar to the Conversation thread (i.e. StopAfter [<id>:bogus]), so theTimer
thread is named. - retry/iterate/finished - what happens if the timer fires when these are executed? Do we need some synchronization here?
- I see that there is a StopAfterTimer for each context now - we should create this instance 'lazily', when the first STOP-AFTER block is encountered.
- the constructor should get directly the Control instance, and not the
ContextLocal
instance.
- in the constructor, the
- for the new
Control
methods,getCurrentContext
andsetControl
, I think we need to secure them viaSecurityUtil.checkCaller
.
#48 Updated by Eduard Soltan 29 days ago
Constantin Asofiei wrote:
- StopAfterTimer
- in the constructor, the
if (timer == null)
is not needed - you can even maketimer
final I think. Also, the javadoc is wrong - please do a 'ant javadoc' and fix any errors.
Does this mean that a timer thread will be running even if there isn't any stop-after blocks?
innerBlock: do stop-after 5: ... end. some other code without stop-after blocks ....
So after execution of innerBlock
, we will still have a timer thread without any task scheduled.
#49 Updated by Eduard Soltan 29 days ago
Committed on 4347a, revision 15208.
#51 Updated by Greg Shah 22 days ago
Code Review 4347a Revisions 15205 through 15208
1. The BlockManager.LOG
should not be initialized using ErrorManager.class
.
2. We should default isStopAfterActive
to true
. Setting it to false
should be an exceptional case.
3. The registration code wa.stopAfterTimer = new StopAfterTimer(Control.getCurrentContextName(), Control.getCurrentContext());
. This is 2 calls to checkCaller()
and each one is very expensive. We definitely don't want 2 calls there. One call to Control.getCurrentContext()
already returns the Control
instance so why look it up again? In fact, can't we just use the thread name from the CurrentThread
instead? Then Control.getCurrentContextName()
can be removed.
4. In BlockManager.stopAfter()
, please convert e.printStackTrace();
to proper logging.
5. The use of checkCaller()
in Control
is incorrect. The concept of checkCaller()
is that it creates a stack trance and looks at the caller to see if the call comes from an authorized location. For example, in SecurityManager
you can see:
public static synchronized boolean updateCachedUserAccount(UserDef user) throws RestrictedUseException { // restrict the use of this method SecurityUtil.checkCallerAbort("com.goldencode.p2j.main.UpdateAccountTask.run"); ...
Notice that the calling class and method name is specified (this can also be an array of possible callers). It does not pass "com.goldencode.p2j.security.SecurityManager.updateCachedUserAccount"
since it isn't a caller of itself.
The way you've written the code, those new methods cannot be called from anywhere but itself which doesn't make sense.
Also, why not call checkCallerAbort()
and let it handle raising the RestrictedUseException
?
6. Control.getCurrentContext()
there is a call to controlThread.getName();
that seems unnecessary.
7. Help me understand why the implementation of Control.setControl()
is correct. In particular, don't we need to restore the state after we are done interrupting? I'm also worred that the current approach won't work when the control flow is on the client side. Have you checked that?
8. In StopAfterTimer.addTimeOut()
, directly raising NumberedException
seems to be incorrect. Is there a 4GL-compatible error that needs to be raised?
9. In StopAfterTimer.Task.run()
there is a e.printStackTrace();
that needs to be reworked.
10. Please rename StopAfterTimer.isStopedBlock()
to StopAfterTimer.isStoppedBlock()
.
11. Is the change in EnvironmentOps.getKeyValue()
based on 4GL behavior?
12. Control
should have all history entries for the branch under H019.
#52 Updated by Eduard Soltan 22 days ago
Greg Shah wrote:
7. Help me understand why the implementation of Control.setControl() is correct. In particular, don't we need to restore the state after we are done interrupting?
With Control.setControl() I just set serverInstance context local variable of the timers thread, to the control of the Conversation thread.
What do you mean by restore the state? Perhaps restore the state of control.local
?
I'm also worred that the current approach won't work when the control flow is on the client side. Have you checked that?
As was mentioned in #4347-37, some blocking UI statements will block until a user interaction or in case of for example pause 30
, it will wait for 30 seconds to raise the StopCondition.
I am thinking about adding something like this.
+ public void checkStopAfterCondition() + throws StopConditionException + { + if (stopAfterTimer != null && + stopAfterTimer.isStopAfterConditionRaised()) + { + throw new StopConditionException("Thread"); + } + } + /** * Return the network interface of LOG-MANAGER client service. * @@ -3423,6 +3435,8 @@ targetWindowWorker(hWin, false), stack); + lt.checkStopAfterCondition(); + if (mrv != null) { // update the caller's var with the result of the edit operation @@ -9319,6 +9333,7 @@ ChooseData res = lt.client.choose(frame.getFrameId(), inc, wl); + lt.checkStopAfterCondition(); Accessor kv = c.getKeysVar(); // make the list of keys available to the caller if requested @@ -15313,6 +15328,8 @@ // notified via the applyChanges() method of this class int keycode = lt.client.pause(millis, text, noMessage, targetWindowWorker(hWin, true)); + lt.checkStopAfterCondition(); +
#53 Updated by Eduard Soltan 22 days ago
Greg Shah wrote:
8. In
StopAfterTimer.addTimeOut()
, directly raisingNumberedException
seems to be incorrect. Is there a 4GL-compatible error that needs to be raised?
From the documentation I read, in 4gl a block with stop-after 0
is treated a normal block and not a error of any type thrown.
By throwing a NumberedException I prevented it going further.
#54 Updated by Greg Shah 21 days ago
Eduard Soltan wrote:
Greg Shah wrote:
7. Help me understand why the implementation of Control.setControl() is correct. In particular, don't we need to restore the state after we are done interrupting?
With Control.setControl() I just set serverInstance context local variable of the timers thread, to the control of the Conversation thread.
It is a good point that this is for the timer thread.
What do you mean by restore the state? Perhaps restore the state of
control.local
?
Yes, but you're right that it may not matter. Though my sense is that it is the cause of the client side issue.
I'm also worred that the current approach won't work when the control flow is on the client side. Have you checked that?
As was mentioned in #4347-37, some blocking UI statements will block until a user interaction or in case of for example
pause 30
, it will wait for 30 seconds to raise the StopCondition.
Yes, this is a problem. It shouldn't block waiting on the server for the client to return. For example, what happens if the block is indefinite? That is not OK.
Unfortunately, the "aynchronous interrupt" processing was designed to be sent from the client to the server, but not the other way around. This is because it is completely based on the concept of the user triggering CTRL-C
on the client which the key processing code will detect and then call ThinClient.watchCtrlC()
. That code is aware of whether the current control flow is on the client or server and acts accordingly.
Perhaps we need to down-call to cause this to occur. But I don't think it is as simple as making a downcall, since this needs to be done from the timer thread which has no context and thus no access to the protocol/queue. Even if we solve the downcall issue, we can't invoke TC.watchCtrlC()
directly. I think we might have to insert CTRL-C
into the type ahead queue. Interestingly, that should even work when the client is not blocked waiting for input.
I am thinking about adding something like this.
No, this is not the correct approach. The existing Control
facility already provides almost a complete soution. The challenge here is to use it properly. When used properly, the interruption code can detect where the control flow currently exists (local or remote) and it will either trigger the interrupt locally or send the interrupt over the network to occur on the other side. As noted above, the gap right now is the lack of a server to client asynch interrupt path.
I think your code that sets local
to true
is a problem. It tells the Control
subsystem that execution is happening on the server, but that isn't necessarily correct.
#55 Updated by Eduard Soltan 21 days ago
Greg Shah wrote:
Perhaps we need to down-call to cause this to occur. But I don't think it is as simple as making a downcall, since this needs to be done from the timer thread which has no context and thus no access to the protocol/queue. Even if we solve the downcall issue, we can't invoke
TC.watchCtrlC()
directly. I think we might have to insertCTRL-C
into the type ahead queue. Interestingly, that should even work when the client is not blocked waiting for input.
In fact, I saved Control of the Conversation thread in StopAfterTimer object, and this one could be used in timer thread as well.
Also in Control, there is the session of the Conversation thread and I could use it the send message to remote connection. As it done in Control.interruptImpl method.
However as you mentioned that aynchronous interrupt was designed to be triggered from client to server. Running the interruptImpl() on server with the control on client side, will force the execution to enter if(clientInstance == null && !rightSide)
and thus not sending an message to client.
I think that we should check the control
.
If it is true
, call the Control.interrupt
as before and interrupt the conversation thread.
If it is false
, have a separate method that will send a interrupt message to remote side.
#57 Updated by Eduard Soltan 20 days ago
Imagine an situation.
If I am in timer thread and control is on the client side, I send a interrupt message to the client.
While I send that message, control changes from back to the server. In this case client while processing the interrupt message, will send another interrupt message to server. And from this back and forth passing of the ball will result an infinite loop.
My question is, can I prevent changing of control when interrupt is about to happen or how can I synchronize server and client?
#58 Updated by Greg Shah 20 days ago
Eduard Soltan wrote:
Imagine an situation.
If I am in timer thread and control is on the client side, I send a interrupt message to the client.While I send that message, control changes from back to the server. In this case client while processing the interrupt message, will send another interrupt message to server. And from this back and forth passing of the ball will result an infinite loop.
My question is, can I prevent changing of control when interrupt is about to happen or how can I synchronize server and client?
This case should already be handled. You cannot block the flow of control from shifting from the client back to the server during the moment that you send the interrupt message to the client. However, we planned for this case. At the moment that the CTRL-C is generated, if the flow of control has moved to the server, then we forward the interrupt there. The Control
instance there will either
- Detect that the processing is local and cause a local delivery. OR
- Detect that the processing has moved back to the client, in which case the client will handle it. I think this scenario is detected by the client side and a local interruption will happen there if we've sent a message to the server and in the intervening time the control flow has come back to the client.
Anyway, this is what I recall. If we simulate the client side generation of the CTRL-C
, the rest should just work as it does today.
#59 Updated by Eduard Soltan 19 days ago
Greg Shah wrote:
I think we might have to insert
CTRL-C
into the type ahead queue. Interestingly, that should even work when the client is not blocked waiting for input.
From I found it gets keys is in TypeAhead$KeyReader.run
, directly from the ScreenDriver. And in that case I am not sure, how it could be done.
Perhaps we need to down-call to cause this to occur. But I don't think it is as simple as making a downcall, since this needs to be done from the timer thread which has no context and thus no access to the protocol/queue. Even if we solve the downcall issue, we can't invoke
TC.watchCtrlC()
directly. I think we might have to insert CTRL-C into the type ahead queue. Interestingly, that should even work when the client is not blocked waiting for input.
Is there any any reason, that we can't invoke TC.watchCtrlC()
directly?
Because I think that we can add another data member to LogicalTerminal (similar to client
of type ClientExports), of type Watcher (which is implemented by ThinClient). And also give LogicalTerminal to timer's thread context, and from there invoke TC.watchCtrlC
.
Also we should synchronized the message queue, between Conversation and Timer threads.
#60 Updated by Eduard Soltan 15 days ago
Committed on 4347a, revision 15209.
I implemented a mechanism to interrupt a Conversation thread and synchronize the actions between Conversation and Timer thread's.
When a stop-after timeout is raised, Control.stopAfterInterrupt
method will be called. And a semaphore will be acquired.
There are same cases I took into consideration for interrupt synchronization:
- Control is on server side, if we want to move to remote side. pushRemoteCall
will be blocked (if a switch to remote is needed) and will wait until we handle interrupt on server side.
- Control is on client side, we send a interrupt message to client side.
- Control is on client side, but in meantime it switch back to server. For this case criticalSection
lock was introduced, and it will either:
1) send a interrupt message to the client. client will return back the interrupt message (because it has already switch control), and the clinet will be stopped.
2) or the server will be stopped from popRemoteCall
method.
#61 Updated by Greg Shah 14 days ago
I think we might have to insert
CTRL-C
into the type ahead queue. Interestingly, that should even work when the client is not blocked waiting for input.From I found it gets keys is in
TypeAhead$KeyReader.run
, directly from the ScreenDriver. And in that case I am not sure, how it could be done.
We could deliver the "event" (just a volatile boolean
local var that has a setter) to the TypeAhead.KeyReader
. Then inside run()
, we would check if that flag is set. If so, we raise CTRL-C instead of calling mgr.readKey()
.
This has the very real advantage of allowing the system to work as it already is designed.
Perhaps we need to down-call to cause this to occur. But I don't think it is as simple as making a downcall, since this needs to be done from the timer thread which has no context and thus no access to the protocol/queue. Even if we solve the downcall issue, we can't invoke
TC.watchCtrlC()
directly. I think we might have to insert CTRL-C into the type ahead queue. Interestingly, that should even work when the client is not blocked waiting for input.Is there any any reason, that we can't invoke
TC.watchCtrlC()
directly?
Maybe. I don't know for sure, but I suspect that if we let the TypeAhead
processing happen independently, that it could lead to weird results. You could try it.
Because I think that we can add another data member to LogicalTerminal (similar to
client
of type ClientExports), of type Watcher (which is implemented by ThinClient). And also give LogicalTerminal to timer's thread context, and from there invokeTC.watchCtrlC
.
Also we should synchronized the message queue, between Conversation and Timer threads.
This is something I want to avoid. Rewriting the synchronization of our protocol and/or Control
is not something that is likely to work well.
#62 Updated by Greg Shah 14 days ago
Code Review Task Branch 4347a Revision 15209
1. The changes to rewrite the Control
synchronization are extensive and they do not look safe. The core idea of the class is being changed in multiple ways. For example, the class was written to deliver asynchronous interrupts cleanly into a full-duplex communication protocol but this new version attempts to block control flow transfer while still using the protocol for delivery of the interrupt. My suggestion to insert the CTRL-C
on the client side (in TypeAhead
) would eliminate the need for changing the Control
implementation. That is much more likely to be a safe approach than the extensive changes that have been made. My sense is that the current changes will cause deadlocks.
By the way, we have the ctrl-c
tests as part of the ChUI Regression suite. When you have an implementation that passes review, those tests will need to pass.
2. The StopAfterTimer
logger is using BlockManager.class
.
#63 Updated by Greg Shah 14 days ago
Perhaps we need to down-call to cause this to occur. But I don't think it is as simple as making a downcall, since this needs to be done from the timer thread which has no context and thus no access to the protocol/queue. Even if we solve the downcall issue, we can't invoke
TC.watchCtrlC()
directly. I think we might have to insert CTRL-C into the type ahead queue. Interestingly, that should even work when the client is not blocked waiting for input.Is there any any reason, that we can't invoke
TC.watchCtrlC()
directly?Maybe. I don't know for sure, but I suspect that if we let the
TypeAhead
processing happen independently, that it could lead to weird results. You could try it.
To clarify: I am worried that in an interactive client, the TypeAhead.KeyReader
will still be processing at the same time as any timer thread downcalls to TC.watchCtrlC()
. This means there is a latent race condition where the user could generate a CTRL-C
causing TypeAhead.KeyReader
to call TC.watchCtrlC()
at the same time as the timer thread downcalls to TC.watchCtrlC()
. The results could be bad.
On the other hand, this could probably be dealt with using a synchronized
block around the body of watchCtrlC()
. I think this is something to try before you try to inject a CTRL-C into TypeAhead
.
#64 Updated by Eduard Soltan 14 days ago
Committed on 4347a, revision 15210.
Changed implementation to insert Ctrl-C into TypeAhead queue.
Added as special RoutingKey, for server side interrupt.
KeyReader thread has to be interrupted during generation of server side Ctrl-C, to make it break from mgr.readKey().
Tested latest implementation on customer application scenario, it solves the issue and didn't see any obvious regressions.
#65 Updated by Greg Shah 13 days ago
Code Review Task Branch 4347a Revision 15210
This is good. It is very close.
1. I'm worried about the use of message.setType(MessageTypes.REQUEST_SYNCH);
in interruptImpl()
. I'm not sure this is OK. We will see in testing.
2. I don't want UI code to be mixed into the Dispatcher
code. Move the TypeAhead
access into a new method in Control
and call that from the Dispatcher
.
3. Remove the unnecessary imports from Control
.
4. Fix the StopAfterTimer
logger init.
#67 Updated by Eric Faulhaber 12 days ago
Eduard reported in today's meeting that he was working to get Ctrl+C testing running. Greg, FYI, there is a 4347a/15211 revision, which I assume is what he is testing.
#69 Updated by Eduard Soltan 12 days ago
I tested Ctrl-C tests, with 4347a changes. And haven't got any regressions.
Also tested customer scenario, and it also works as expected.
#70 Updated by Alexandru Lungu 12 days ago
- Status changed from Review to Internal Test
Eduard, please assess #4347-68 for 4347a.
This should be ready to be merged, right? Greg, other testing you suggest for this?
#71 Updated by Eduard Soltan 12 days ago
Committed on 4347a, revision 15212.
#72 Updated by Greg Shah 12 days ago
Additional testing:
- ChUI regression testing. It uses virtual sessions and more than one FWD server which is definitely affected by these changes.
- Smoke testing with a large customer GUI application.
- ETF or some other appserver application testing to confirm that we haven't affected anything related to appserver.
At that point, we should be good to merge.
#73 Updated by Alexandru Lungu 12 days ago
ChUI regression testing. It uses virtual sessions and more than one FWD server which is definitely affected by these changes.
I am picking these up on my bare metal machine and test this weekend. The point is that on local environments, the tests are a bit slow so there is a high change to have timeouts on screens.
Smoke testing with a large customer GUI application.
Andrei Iacob will handle this now on another customer application.
ETF or some other appserver application testing to confirm that we haven't affected anything related to appserver.
This is for Constantin.
#74 Updated by Alexandru Lungu 12 days ago
I am rebasing this now.
#75 Updated by Alexandru Lungu 12 days ago
Rebased 4347a to latest trunk. It is now at rev. 15315.
#76 Updated by Șerban Bursuc 12 days ago
Smoke tests for a customer GUI application passed.
#77 Updated by Constantin Asofiei 11 days ago
Alexandru Lungu wrote:
ETF or some other appserver application testing to confirm that we haven't affected anything related to appserver.
This is for Constantin.
ETF testing is OK.
#78 Updated by Alexandru Lungu 7 days ago
- Status changed from Internal Test to Merge Pending
- % Done changed from 0 to 100
ChUI regression passed. One test failed (time-out) when running all tests automatically, but I re-run it manually and it passed. It was a false negative.
I re-run the CTRL-C tests and they passed.
Eduard, please merge to trunk now.
Also, patch it to 7156b.
#79 Updated by Eduard Soltan 7 days ago
4347a was merged into trunk revision 15310, and archived.
#80 Updated by Alexandru Lungu 7 days ago
- Status changed from Merge Pending to Test
#81 Updated by Greg Shah 7 days ago
- Related to Feature #8904: implement -catchStop and the integration/conversion of STOP conditions into exceptions (progress.lang.stop and subclasses) added
#83 Updated by Eduard Soltan 5 days ago
Greg Shah wrote:
Are all of your testcases meeting the following criteria?
- Checked into xfer Testcases project.
- Implemented and runnable with ABLUnit.
- Consistently pass with trunk 15310.
I committed my testcases on TestCases project (stopAfter/StopAfterTest.cls), implemented with ABLUnit. And yes, they are passing with trunk 15310.