Bug #6923: inner transaction block reverts too much - Database - Golden Code Redmine

Bug #6923

inner transaction block reverts too much

Added by Radu Apetrii over 1 year ago. Updated over 1 year ago.

Status:

WIP

Priority:

Normal

Assignee:

Radu Apetrii

Target version:

Start date:

Due date:

% Done:

billable:

vendor_id:

GCD

case_num:

version_reported:

version_resolved:

6923-2.patch (1.55 KB) Radu Apetrii, 11/23/2022 06:14 AM

History

#1 Updated by Radu Apetrii over 1 year ago

Here is the code that I've been working with, on 3821c, rev.14354:

//define temp-table test field f1 as integer field f3 as integer.

for each test:
 delete test.
end.

do transaction:
    create test.
    test.f1 = 999.
    test.f3 = 2.
    create test.
    test.f1 = 998.
    test.f3 = 5.
end.

do transaction on error undo, leave:
    create test.
    test.f1 = 997.
    test.f3 = 7.
    create test.
    test.f1 = 996.
    test.f3 = 11.
    test.f3 = 13.
    do transaction:
        test.f3 = 17.
        undo, leave.
    end.
end.

find last test.
message test.f3.

define query q for test scrolling.
open query q preselect each test.
get first q.
do while available(test):
  message test.f1 test.f3.
  get next q.
end.

for each test:
    delete test.
end.

If I test with temporary tables, here are the results:

4GL	FWD
13	7
999 2	999 2
998 5	998 5
997 7	997 7
996 13

If I test with persistent tables, here are the results:

4GL	FWD
2	2
996 13	997 7
997 7	998 5
998 5	999 2
999 2

I want to point out that my persistent table called test has 2 fields (f1 and f3, both of type integer), but, the field f1 is part of an index (I don't think it matters, but just in case).
The problem seems to be inside the inner transaction when encountering undo, leave. Instead of undo-ing/reverting only the value of test.f3 to 13, the program deletes the whole record of test with test.f1 = 996.

#2 Updated by Radu Apetrii over 1 year ago

Update: If you replace the middle transaction block with the code below, the program works fine. It reverts only the value of the test.f3, (re)making it equal to 13, and no other places are affected by the undo, leave.

do transaction on error undo, leave:
    create test.
    test.f1 = 997.
    test.f3 = 7.
    create test.
    test.f1 = 996.
    test.f3 = 11.
    test.f3 = 13.
    create test.
    test.f1 = 995.
    test.f3 = 19.
    find prev test.
    do transaction:
        test.f3 = 17.
        undo, leave.
    end.
end.

Also, maybe it was not clear enough, but the definition of the temp-table is commented out because I also have a persistent table called test. If I were to test this example with a temporary table, I would simply uncomment the first line.

#3 Updated by Radu Apetrii over 1 year ago

Start date deleted (~~11/10/2022~~)
Assignee set to Radu Apetrii
Status changed from New to WIP

An update to this issue:
I narrowed down the code, in order to describe what I have found until this moment. This is the most minimalistic example that does not work (I found others too):

define temp-table test field f1 as integer field f3 as integer.

do transaction:
    create test.
    test.f1 = 995.
    test.f3 = 13.

    do transaction:
        test.f3 = 17.
        undo, leave.
    end.
end.

find first test.

In 4GL there is no error given, while in FWD it says that it cannot find the first value of test.
Each record has a RecordState that defines the state that it is currently in. This is done as follows:

In the first do transaction, the record of the table test will receive the state NEW CHANGED TRACKED, which is fine because the intent is to flush it to the database if no errors appear.
In the nested do transaction, at line test.f3 = 17., the record will receive again the state NEW CHANGED TRACKED, which is fine again, due to the same reason as before.
At line undo, leave, the program reaches ChangeSet.rollback, more specifically, liveState.state = baselineState. This means that the record will reset its state to the baseline one.

In this case, the baseline should be NEW CHANGED TRACKED because we still have the outside transaction (the first one), but, that is not the case. The value of baselineState is DEFAULT, which means that when the flushing to the database happens, the record would remain untouched, because there is nothing to do with it (this is the DEFAULT behavior). It will exist in the buffer, but not in the database.
After that, I have investigated why the value of baselineState is DEFAULT at that moment and I have come to these conclusions:

Its value is only modified inside the constructor of ChangeSet.
The code that deals with creating new ChangeSet s is located in BaseRecord.prepareChangeSet:

// create a change set if one with the same transaction ID does not already exist
         if (changeSet == null || !changeSet.verify(session, txId))
         {
            changeSet = new ChangeSet(session, txId, data, state.state, txLevel);
         }

For this example, there should be 2 ChangeSet s, because there are 2 transactions that start. The second one is not created because the transaction ID is the same as the one before (and they should not be equal).
txId is obtained by taking the last 32 bits from txNestingId.
txNestingId is obtained in Session.getTxNestingId, and it consists of 2 parts: the first 32 bits represent the block nesting level and the last 32 bits are obtained from txCount.
I see that txCount is increased inside Session.beginTransaction, but that only happens once (for the first do transaction) and it seems to me that this method does not take into consideration nested do transaction s. Thus, the value of both ids is the same which leads to not making a new ChangeSet.

I've seen this behavior only inside nested do transaction s. Also, I have tested with 3821c, rev. 14361.
Should I continue to investigate this further? If so, can you give me some information on how txCount is supposed to work? Thank you.

#4 Updated by Eric Faulhaber over 1 year ago

When you nest transactions like:

do transaction:
   ...
   do transaction:
      ...
   end.
   ...
end.

...this does not represent two full transactions. It represents a full transaction with an enclosed sub-transaction. When OE opens a full transaction, any block further down the stack which normally would open its own, full transaction becomes a sub-transaction block. Think of a sub-transaction as a savepoint, where anything done in that block can be committed or rolled back. If it is rolled back, it does not affect work done outside of it, in the context of the full transaction. Sub-transactions can be nested arbitrarily deeply (there may be a practical limit, but I mean this conceptually).

In FWD, we only open a full database transaction at the outermost, full transaction block we encounter. It stays open and is committed or rolled back when that block ends or iterates. This is not ideal, as SQL best practices encourage transactions to be short-lived, and we don't know how long such a block will take to finish its work, but we follow the 4GL semantic in this regard.

As we encounter sub-transactions within that outermost, open transaction, we use:

savepoints to represent these sub-transaction units of work on the database side, and
ChangeSet levels to track the in-memory state of a record.

Both the database and in-memory state need to remain consistent w.r.t. each transaction and sub-transaction level, but they are not necessarily in sync with each other at a given point, because of the timing with which in-memory changes are flushed to the database. So, we use these two independent but related mechanisms.

Session.txCount is the count of full transactions opened by a particular Session instance. The purpose of this variable is only to produce a unique identifier for a particular, full transaction within that scope/context. It is not incremented for sub-transactions. Individual sub-transactions are identified by their level within a given full transaction.

Session.getTxNestingId provides both these values in a 64-bit long. The javadoc for this method is wrong (I must have changed the implementation without updating the javadoc). The method stores the current block's sub-transaction nesting level in the high-order 4 bytes of the return value, and the full transaction identifier (i.e., txCount) in the low-order 4 bytes.

There should only ever be 0 or 1 ChangeSet instances per record at any moment. There is never more than 1, because only one full transaction can be active at a time. So, I believe that part is working correctly. The bug could be in the restoration of state for a sub-transaction, but that is based on intuition only; I have not debugged into your test cases. OTOH, the fact that the results are different in your first test case between temp-table and persistent table use cases suggests some difference in the savepoint processing. Are you using H2 or PostgreSQL as the back-end for your persistent table use case?

Please continue with the investigation, taking my description of the intended design into account, and document your further findings here.

#5 Updated by Eric Faulhaber over 1 year ago

Eric Faulhaber wrote:

...the fact that the results are different in your first test case between temp-table and persistent table use cases suggests some difference in the savepoint processing.

I just realized the results of the 4GL programs are different between these two cases as well.

I think this is explained by the difference in transaction behavior between the temp-table and persistent table use cases.

For the persistent-table case, I believe the for each test loops at the start and end of the program which delete all the records in the table cause an implicit, full transaction to be scoped to the external procedure block. So, the two do transaction blocks are treated as sub-transactions.

For the temp-table case, I believe the delete loops do NOT cause an implicit, full transaction to be scoped to the external procedure block. So, the two do transaction blocks each are treated as full transactions.

You can create a listing of the original program in both forms using the COMPILE statement to confirm this.

Regardless, FWD's results still differ, so we need to understand why.

#6 Updated by Radu Apetrii over 1 year ago

Thank you for the explanations!
I've taken them on board and I have seen that the bug is in fact about the restoration of the state for a sub-transaction. Unfortunately, it seems that there are more factors to this than just assigning the correct state. I'll use two examples to illustrate this point.

1. This is the current implementation:

When entering the full transaction, the record will be given the NEW CHANGED TRACKED state. Also, this means that inside the ChangeSet class, the array states will have CHANGED on the first position.
When entering the sub-transaction, the state of the record will remain unchanged, but, the array states will have CHANGED on the second position as well.
At the line with undo, leave, the program would execute ChangeSet.rollback, which means that the record would be given the state DEFAULT.
When the sub-transaction ends, the program would try to flush the record to the database, which won't work because of the DEFAULT state that indicates that there is nothing to do with the record. This is fine because we do not want to flush the record when the sub-transaction ends.
When the outer transaction ends, the program would try to flush the record to the database, which won't work due to the same reason: the record's state is DEFAULT. The problem is that the record should've been committed to the database.

2. Instead of doing liveState.state = baselineState inside ChangeSet.rollback, I tried this:

if (inner > 0)
{
 liveState.state = states[inner - 1];
}
else
{
 liveState.state = baselineState;
}

This means that instead of assigning the baseline state, the record would receive the stored state (the one from states ) of the "parent" transaction block. This leads to the following steps:

The first two steps remain the same as in case 1.
At the line with undo, leave, the program would execute ChangeSet.rollback, which means that the record would receive the state CHANGED, taken from the array states. This state corresponds to the full transaction's stored state inside the array.
When the sub-transaction ends, the program would enter the flush method, but will not commit the record to the database. This happens because a transaction is already active (the full-transaction), and it is good, because this is not the moment to commit. But, inside the Validation.flush method, the record will change its state, becoming CACHED.
When the outer transaction ends, the program would try to flush the record to the database, but this will not work because the state CACHED does not suggest that it needs flushing.

In both cases the commit to the database will not happen, even though it should.
In the second case, the record has a good state for committing (CHANGED) at a given moment, but that moment is not the right one. In order to commit the record, its state should be CHANGED when the full transaction ends and the flushing is tried.
There are 2 solutions that come to my mind, but I'm not sure if any of them is actually intended:

The changing of the record's state could be done in two steps (at the moment, it happens only in the ChangeSet.rollback method): first, in the rollback method, make the state equal to DEFAULT so when the flushing part happens (from the sub-transaction), nothing would be done. Then, after the flushing from the sub-transaction, make the state equal to what it is supposed to be. That means that if we are still inside a sub-transaction, make the state equal to states[inner - 1], or, if we are in the full transaction block, make it equal to baselineState. Thus, the commit will only happen at the end of the full transaction, which I think was the intent anyways (with accent on "I think").
The other solution, perhaps not as "clean" as the other one, is an extension of the second case. The if (inner > 0) would remain inside ChangeSet.rollback, which means that the record's state will be updated (in this case it will become CHANGED). But, inside the flush method, instead of making the state equal to CACHED, maybe another check could be done to see if indeed the CACHED state should be given to the record or maybe another state (or remain unchanged, in this case CHANGED). This way, the state would remain the same until the record is finally committed to the database, and then it would be changed.

I will continue to investigate this, but I just wanted to provide an update. The main test is the same as in the #6923-3, the branch and revision are 3821c, rev. 14361.

Eric Faulhaber wrote:

Are you using H2 or PostgreSQL as the back-end for your persistent table use case?

I am using PostgreSQL to test with persistent tables.

For the persistent-table case, I believe the for each test loops at the start and end of the program which delete all the records in the table cause an implicit, full transaction to be scoped to the external procedure block. So, the two do transaction blocks are treated as sub-transactions.

For the temp-table case, I believe the delete loops do NOT cause an implicit, full transaction to be scoped to the external procedure block. So, the two do transaction blocks each are treated as full transactions.

You can create a listing of the original program in both forms using the COMPILE statement to confirm this.

I have done so. In my findings, the for each loops at the start and end do not affect the full transaction scope, so those two do transaction blocks are not treated as sub-transactions. A case where this would happen would be if the program would start with create test. Then, the whole program would be considered a large, full-transaction.
Still, I will provide further updates on this issue.

#7 Updated by Radu Apetrii over 1 year ago

File 6923-2.patch added

I attached a patch with some changes that, in my opinion, make more sense than the current implementation. Mainly, there are 2 changes that affect the program:

In a rollback situation, instead of updating a record's state with the baselineState in all cases, I made the state equal to the stored state of the "parent" transaction, if the "parent" transaction is a sub-transaction, or equal to the baselineState otherwise. This happens in ChangeSet.rollback.
When the states array is being updated, instead of storing the value flags, I thought that the intent was to store the state of the record. So I modified the parameter from flags to state.state. This happens in BaseRecord.dataChanged and in BaseRecord.bulkDataChanged.

Even though this patch solved some of the problems that I previously had, I discovered a test in which this logic doesn't fully work. There are some cases when it is not correct to restore a record's state by giving it the state that it had in the "parent" transaction.
This is the test with the problem:

define temp-table test field f1 as integer field f3 as integer.

do transaction:
    create test.
    test.f1 = 995.
    test.f3 = 13.

    create test.
    test.f1 = 996.
    test.f3 = 14.

    do transaction:
        find first test.
        test.f3 = 17.
        undo, leave.
    end.
end.

find first test.
message test.f1 test.f3.

What happens here is:

Right before the creation of the second record (the one with test.f1 = 996.), the first record is committed to the database. This means that the record changed its state, becoming CACHED TRACKED from NEW CHANGED TRACKED. This is fine and it makes sense.
The second record was created and right before the sub-transaction begins, the states array is updated with the state NEW CHANGED TRACKED on level 0 (corresponding to the full transaction) for that record.
Inside the sub-transaction, right before find first test., the record with f1 = 996 is committed to the database and its state changes from NEW CHANGED TRACKED to CACHED TRACKED. Also, the states array for that record, on level 1 (because we are in the sub-transaction) becomes CACHED TRACKED.
When the rollback happens, the second record "reverts" its state, becoming NEW CHANGED TRACKED again, because that was the value it last had in the full transaction.
When the flush happens, because the second record has the state NEW CHANGED TRACKED, it is committed to the database (or at least tried). This breaks the program, because the record already exists there and I cannot insert one with the same primaryKey.

I'll pend this issue for the moment, because I'm not sure when I should revert the state and when not. Maybe you can give me some information on this, if that's OK.
Also, these changes were tested with 3821c, rev.14373.

Also available in: Atom PDF

Project

General

Profile

FWD » Core Development » Database

Issues

Custom queries