Bug #8593

H2 performance improvement for table 'truncate by multiplex'

Added by Constantin Asofiei about 1 month ago. Updated 1 day ago.

Status:
Review
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version:

Related issues

Related to Database - Bug #8703: NO-UNDO option at DMO must be removed New

History

#2 Updated by Constantin Asofiei about 1 month ago

  • Status changed from New to Review
  • Assignee set to Constantin Asofiei
  • % Done changed from 0 to 100
Created task branch 8593a_h2 which improves the DELETE FROM tt1 WHERE tt1._multiplex = ?. The reasoning is this:
  • normal DELETE operations in H2 will first do a SELECT to gather all the rows. This includes DELETE FROM tt1 WHERE tt1._multiplex = ?
  • with normal DELETE, each row is deleted individually from the index - this means that all the index columns are used to find the target node to be removed
  • the implementation first checks that the DELETE is for a _MULTIPLEX = ? condition, and if so, then:
    • the MultiplexedScanIndex is cleaned first, and gathers all PKs for the records being deleted with this multiplex ID
    • all indexes which have as first index column the _MULTIPLEX column will remove all these records, by matching only the multiplex ID
    • all remaining indexes assume the PK is the first column (I need confirmation that this will always be true), and just find that record in the index and remove it.
    • this avoids using all the index columns to find the record
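
The pruning idea in the bullets above can be sketched as follows. This is a minimal illustration, not the actual FWD-H2 classes: an index whose first column is _MULTIPLEX is modeled as a sorted set of (multiplex, f1, recid) tuples, and deleting a whole multiplex becomes a single range removal instead of one full-key lookup per row. The class and field names are hypothetical.

```java
import java.util.*;

class MultiplexPrune {
   // order by the _MULTIPLEX column first, then the remaining index columns
   static final Comparator<long[]> CMP =
         Comparator.<long[]>comparingLong(r -> r[0])
                   .thenComparingLong(r -> r[1])
                   .thenComparingLong(r -> r[2]);

   final NavigableSet<long[]> index = new TreeSet<>(CMP);

   // remove every entry with the given multiplex by clearing the subrange
   // [(mpid, MIN, MIN), (mpid, MAX, MAX)] in one operation, instead of
   // locating each row by all of its index columns
   int deleteMultiplex(long mpid) {
      NavigableSet<long[]> range = index.subSet(
            new long[] {mpid, Long.MIN_VALUE, Long.MIN_VALUE}, true,
            new long[] {mpid, Long.MAX_VALUE, Long.MAX_VALUE}, true);
      int n = range.size();
      range.clear();   // clearing the view prunes the backing index
      return n;
   }
}
```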

Alexandru: please take a look at 8593a_h2.

#3 Updated by Alexandru Lungu about 1 month ago

Created task branch 8593a_h2 which improves the DELETE FROM tt1 WHERE tt1._multiplex = ?. The reasoning is this:
  • all remaining indexes assume the PK is the first column (I need confirmation that this will always be true), and just find that record in the index and remove it.

If you meant PK as recid, this is always the last column in an index, but the first column in the row. If you meant PK as H2 internal row key, this is kept outside the data array and I think it is not ever indexed.
This being said, you can't remove a row only based on the recid/PK from a tree index. You can however prune the tree index based on multiplex (as it is the first field in the index).
Please let me know if I misunderstood your intent. Without a full-scan of the index, you can't find a row solely on its PK.

One last important thing is that you still need to log the deletes, so you will most certainly need to "register" the row as being deleted in case of a rollback (unless you do this only for NO-UNDO tables).

I didn't look into the changes yet.

#4 Updated by Constantin Asofiei about 1 month ago

You mean the Row.key is not the same as the internal PK?

Also, there is 'recid' index (with just this column) in H2 created for FWD temp-tables.

#5 Updated by Constantin Asofiei about 1 month ago

Constantin Asofiei wrote:

You mean the Row.key is not the same as the internal PK?

I mean Row.key to be the same as the 'recid' PK we set on position zero in the row's data array.

#6 Updated by Constantin Asofiei about 1 month ago

Alexandru, the assumption was that the MultiplexedBatchIndex$Batch holds data only for a certain multiplex. This is not correct (maybe because of re-used keys?).

Currently a batch can have rows for more than one multiplex ID.

#7 Updated by Alexandru Lungu about 1 month ago

Constantin Asofiei wrote:

Alexandru, the assumption was that the MultiplexedBatchIndex$Batch holds data only for a certain multiplex. This is not correct (maybe because of re-used keys?).

Currently a batch can have rows for more than one multiplex ID.

This is absolutely no good. The invariant shouldn't ever be broken. Do you have a scenario that triggers such an anomaly?

#8 Updated by Constantin Asofiei about 1 month ago

The recreate is this:
  • mpid1.p
    def temp-table tt1 field f1 as int index ix1 f1.
    def var k as int.
    
    procedure proc0.
       def var i as int.
       run mpid2.p(input-output table tt1).
       buffer tt1:empty-temp-table().
       do i = 1 to 10:
          create tt1.
          tt1.f1 = i.
          release tt1.
       end.
    end.
    
    do transaction:
       run proc0.
    end.
    
    
  • mpid2.p
    def temp-table tt1 field f1 as int index ix1 f1.
    
    def var i as int.
    def input-output param table for tt1.
    
    do i = 1 to 10000:
       create tt1.
       tt1.f1 = i.
       release tt1.
    end.
    
    for each tt1: delete tt1. end.
    
    

For an undoable temp-table, an EMPTY-TEMP-TABLE results in an UPDATE tt1 SET _multiplex = <delete-multiplex> WHERE _multiplex = <old-multiplex>. The assumption with the batches was that the multiplex ID is read-only - this is not true for undoable temp-tables, where a delete is emulated by changing the multiplex (and thus hiding the record from the legacy temp-table).
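
The "emulated delete" described above can be sketched in a few lines. This is a hypothetical model (the names emptyTempTable, undo and multiplexOf are illustrative, not the FWD API): rows are never removed, their multiplex ID is rewritten to a hidden <delete-multiplex>, and an UNDO rewrites it back.

```java
import java.util.*;

class EmulatedDelete {
   final Map<Long, Long> multiplexOf = new HashMap<>();   // recid -> multiplex

   // UPDATE tt1 SET _multiplex = delMpid WHERE _multiplex = oldMpid
   void emptyTempTable(long oldMpid, long delMpid) {
      multiplexOf.replaceAll((recid, m) -> m == oldMpid ? delMpid : m);
   }

   // rollback of the subtransaction: restore the original multiplex,
   // un-hiding the records
   void undo(long oldMpid, long delMpid) {
      multiplexOf.replaceAll((recid, m) -> m == delMpid ? oldMpid : m);
   }

   long visibleCount(long mpid) {
      return multiplexOf.values().stream().filter(m -> m == mpid).count();
   }
}
```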

#9 Updated by Alexandru Lungu about 1 month ago

Constantin Asofiei wrote:

For an undoable temp-table, an EMPTY-TEMP-TABLE results in an UPDATE tt1 SET _multiplex = <delete-multiplex> WHERE _multiplex = <old-multiplex>. The assumption with the batches was that the multiplex ID is read-only - this is not true for undoable temp-tables, where a delete is emulated by changing the multiplex (and thus hiding the record from the legacy temp-table).

You mean the INPUT-OUTPUT TABLE functionality, right? I see your point, but I thought this wasn't implemented yet (it seems it was implemented in the meantime).

Anyway, the UPDATE in H2 is a DELETE + INSERT. All records will be first removed (doing something like a DELETE ... WHERE multiplex = <old-multiplex>) and then added (doing an insert for each row). The only issue I see is the recid. It is still the old recid from the source table. I am mostly sure this is wrong. 4GL reindexes the records, so we should do the same (reserve the id before insert).

1. How urgent is this? Anything breaking right now?
2. We can have a direct-access API for this (changeMultiplex). It will basically do the UPDATE + reindexing during the update.
3. Otherwise, we can have a special SQL keyword (REINDEX) for the UPDATE, that reindexes the records after the update.

#10 Updated by Constantin Asofiei about 1 month ago

No, the empty-temp-table is the problem: while executing in a transaction, for an undoable temp-table, it does not remove the records, but changes the multiplex ID.

For MultiplexedScanIndex, the target batch in setRow(key, row) is calculated from the record's key (which is not changing!) - so a remove/add will end up in the same batch.
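
This can be sketched numerically: if the target batch is derived from the key alone (the 256-keys-per-batch size below is an assumption for illustration, as is the batchOf name), then an emulated delete that rewrites only _MULTIPLEX never moves the row to another batch, so one batch can end up holding rows from several multiplex IDs.

```java
class BatchOfKey {
   static final int BATCH_SIZE = 256;   // illustrative batch capacity

   // the batch is a pure function of the record's key; changing the
   // _MULTIPLEX column of the row leaves the key (and batch) unchanged
   static long batchOf(long key) {
      return key / BATCH_SIZE;
   }
}
```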

Urgency: the performance for #8363 is urgent; the bug appears only when introducing 8593a_h2 in ETF.

#11 Updated by Alexandru Lungu about 1 month ago

For MultiplexedScanIndex, the target batch in setRow(key, row) is calculated from the record's key (which is not changing!) - so a remove/add will end up in the same batch.

Hmm, I see. Drop my statement then - the INPUT-OUTPUT tables are not yet implemented with UPDATE.

Still, why are we updating with a <delete-multiplex> instead of actually deleting them? If they are undoable, we can rollback (with a savepoint or not). Why do we need to have such functionality of marking them as deleted?

#12 Updated by Eric Faulhaber about 1 month ago

Alexandru Lungu wrote:

Still, why are we updating with a <delete-multiplex> instead of actually deleting them? If they are undoable, we can rollback (with a savepoint or not). Why do we need to have such functionality of marking them as deleted?

This technique of updating the multiplex ID to emulate an undoable delete predates (by almost two decades) our use of savepoints and any editing of H2 source code.

#13 Updated by Greg Shah about 1 month ago

Seems like a long-standing leak of records. If that built up to a large number, it might be both a memory leak and a drag on performance.

#14 Updated by Eric Faulhaber about 1 month ago

Greg Shah wrote:

Seems like a long-standing leak of records. If that built up to a large number, it might be both a memory leak and a drag on performance.

No, this "emulated delete" only happens at the subtransaction level, so that an UNDO is able to restore the records by switching the multiplex ID back. They are actually deleted at transaction commit, or restored at transaction rollback. At least, that was the original implementation; I can't speak to this new issue that Constantin detected.

#15 Updated by Alexandru Lungu about 1 month ago

Eric, even for a sub-transaction. We can simply delete and rollback to savepoint, right?

#16 Updated by Eric Faulhaber about 1 month ago

Alexandru Lungu wrote:

Eric, even for a sub-transaction. We can simply delete and rollback to savepoint, right?

Yes, we should be able to now. We just weren't using savepoints back then, and this technique has survived all these years. If you make this change, we should test thoroughly, since this has been a core design point from the beginning. I don't recall if/where we have dependent assumptions on this approach in the runtime.

#17 Updated by Alexandru Lungu about 1 month ago

Just a thing that was on my mind and wanted to have it here as well:

For NO-UNDO cases, we used to rollback (transaction or savepoint) and redo the state of the NO-UNDO tables (i.e. rolling back the rollback). This is no longer the case, as FWD-H2 has the NO-UNDO keyword in place, so rollbacks are only for UNDO tables.

Yes, we should be able to now. We just weren't using savepoints back then, and this technique has survived all these years. If you make this change, we should test thoroughly, since this has been a core design point from the beginning. I don't recall if/where we have dependent assumptions on this approach in the runtime.

Ok

#18 Updated by Greg Shah 29 days ago

  • % Done changed from 100 to 50
  • Status changed from Review to WIP

#19 Updated by Constantin Asofiei 29 days ago

Previously we used the UPDATE ... SET _MULTIPLEX = ? paired with a ReversibleBulkDelete, which was meant to restore the multiplex ID (via a reverse UPDATE), in case of rollback of an undoable temp-table.

ReversibleBulkDelete was removed I think when we introduced no-undo temp-tables directly in FWD-H2 (or the savepoints?).

I do not think we actually need the UPDATE ... SET _MULTIPLEX = ? in TemporaryBuffer.removeRecords anymore, the forceDelete flag should always be true. Currently it is set to:

forceDelete = forceDelete || !isUndoable() || !bufferManager.isTransaction()

where:
  • !isUndoable() will be true for NO-UNDO temp-tables
  • !bufferManager.isTransaction() means we are not in a transaction
So, for a bulk delete, this means that forceDelete needs to also be true when both:
  • we are inside a transaction
  • the temp-table is undoable
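
A quick check of the claim: with the expression from TemporaryBuffer.removeRecords, the only combination that still yields false (i.e. takes the UPDATE path) is an undoable temp-table inside a transaction - exactly the case the proposed change switches to an actual DELETE. A minimal sketch (the method name effective is hypothetical):

```java
class ForceDelete {
   // mirrors: forceDelete = forceDelete || !isUndoable() || !bufferManager.isTransaction()
   static boolean effective(boolean forceDelete, boolean undoable, boolean inTx) {
      return forceDelete || !undoable || !inTx;
   }
}
```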

I'm testing this approach now. Please let me know if you see anything wrong.

#20 Updated by Constantin Asofiei 27 days ago

  • % Done changed from 50 to 100
  • Status changed from WIP to Review
Created task branch 8593a from trunk rev 15157, with:
  • For a NEW temp record which was not flushed and is being deleted or rolled back in the transaction, the allocated primary key row (an empty template reserving this primary key) needs to be explicitly removed.
  • Savepoints are used to rollback undoable temp-tables, and no-undo support is built into H2 itself; so, TemporaryBuffer.removeRecords will always perform an actual DELETE.

8593a_h2 rev 48 contains the FWD-H2 changes related to this.

The test for this is:

def temp-table tt1 field f1 as int index ix1 f1.
def var c as int.

do transaction:

   do transaction:
      create tt1.
      assign tt1.f1 = 1.
      release tt1.
      create tt1.
      assign tt1.f1 = 2.
      // do a delete or an undo, not both
      // delete tt1.  
      // undo, leave.
   end.

   undo, leave.
end.

c = 0.
for each tt1: c = c + 1. end.
message c.

#21 Updated by Alexandru Lungu 25 days ago

The FWD direct-access changes are OK!
Also, getting rid of the forceDelete is good, but requires extensive testing!

For H2:
  • rowCount -= removed.cardinality; is something that is theoretically irrelevant. The cardinality should always be 0 when killing the multiplex. The killMultiplex API always assumes that this is a batch clean-up attempt, not an actual removal of rows. If you encountered this as being "required", it means some kind of invariant was broken.
  • return i == SearchRow.ROWID_INDEX ? ValueLong.get(key) : data == null ? null : data[i]; - why is data allowed to be null in the first place?
  • of course, this also requires extensive testing.

#22 Updated by Constantin Asofiei 25 days ago

Alexandru Lungu wrote:

The FWD direct-access changes are OK!
Also, getting rid of the forceDelete is good, but requires extensive testing!

ETF testing passed with the current changes, and also was the reason why these bugs surfaced.

  • rowCount -= removed.cardinality; is something that is theoretically irrelevant. The cardinality should always be 0 when killing the multiplex. The killMultiplex API always assumes that this is a batch clean-up attempt, not an actual removal of rows. If you encountered this as being "required", it means some kind of invariant was broken.
Well, the invariant was/is already broken:
  • rowCount at the MultiplexedScanIndex tracks only the 'flushed' records - it does not include the 'rows allocated for primary key reservation'
  • cardinality at the batch tracks all rows added to the batch.

So that was the reason I had to add 'clearPrimaryKey' for unflushed records which are deleted/rollback, as otherwise the DELETE ... WHERE _MULTIPLEX = ? statement in TemporaryBuffer.removeRecords failed the invariant.

  • return i == SearchRow.ROWID_INDEX ? ValueLong.get(key) : data == null ? null : data[i]; - why is data allowed to be null in the first place?

I think this was because of the 'template row' with null data used to allocate a row to reserve a primary key in a batch.

#23 Updated by Alexandru Lungu 25 days ago

Constantin Asofiei wrote:

  • rowCount -= removed.cardinality; is something that is theoretically irrelevant. The cardinality should always be 0 when killing the multiplex. The killMultiplex API always assumes that this is a batch clean-up attempt, not an actual removal of rows. If you encountered this as being "required", it means some kind of invariant was broken.
Well, the invariant was/is already broken:
  • rowCount at the MultiplexedScanIndex tracks only the 'flushed' records - it does not include the 'rows allocated for primary key reservation'
  • cardinality at the batch tracks all rows added to the batch.

So that was the reason I had to add 'clearPrimaryKey' for unflushed records which are deleted/rollback, as otherwise the DELETE ... WHERE _MULTIPLEX = ? statement in TemporaryBuffer.removeRecords failed the invariant.

Do you mean that the invariant was broken and now it was fixed by clearPrimaryKey? I understood the fact that there were "leaking reservations" and these were removed by the rollback changes.

The original code always presumed that the records were deleted before the multiplex was killed, so we always presumed that the cardinality is 0 when killing the batches. I agree this was broken before your changes, but now it should be fixed, right? So rowCount -= removed.cardinality; is no longer needed.

I think this was because of the 'template row' with null data used to allocate a row to reserve a primary key in a batch.

Oh yeah, right. The code is reserving slots with a null data for the sake of performance/memory.

#24 Updated by Constantin Asofiei 25 days ago

Alexandru Lungu wrote:

Do you mean that the invariant was broken and now it was fixed by clearPrimaryKey? I understood the fact that there were "leaking reservations" and these were removed by the rollback changes.

It is still somewhat broken: a template row for the allocated primary key is not added to the other indexes. So the row count at MultiplexedScanIndex, the table and the other indexes will always be the 'flushed row count', while the sum of the batches' cardinalities will be different.

The original code always presumed that the records were deleted before the multiplex was killed, so we always presumed that the cardinality is 0 when killing the batches. I agree this was broken before your changes, but now it should be fixed, right? So rowCount -= removed.cardinality; is no longer needed.

I want to keep it, as at the moment the multiplex is deleted, we want to ensure all is stable.

Also, I think the actual DirectAccessHelper.killMultiplex is no longer needed - it tries to remove rows which were previously removed via the DELETE from TemporaryBuffer.removeRecords, plus it clears only the MultiplexedScanIndex - what about the other indexes?

#25 Updated by Alexandru Lungu 25 days ago

Constantin Asofiei wrote:

Alexandru Lungu wrote:

Do you mean that the invariant was broken and now it was fixed by clearPrimaryKey? I understood the fact that there were "leaking reservations" and these were removed by the rollback changes.

It is still somewhat broken: a template row for the allocated primary key is not added to the other indexes. So the row count at MultiplexedScanIndex, the table and the other indexes will always be the 'flushed row count', while the sum of the batches' cardinalities will be different.

Oh, the invariant I was talking about is removed.cardinality == 0 when killing a multiplex. I agree that the sum of cardinalities is different from the rowCount.

Also, I think the actual DirectAccessHelper.killMultiplex is no longer needed - it tries to remove rows which were previously removed via the DELETE from TemporaryBuffer.removeRecords, plus it clears only the MultiplexedScanIndex - what about the other indexes?

The need for killMultiplex was due to the empty batches that were left behind. For UNDO tables, empty batches are left behind in case an UNDO happens. For instance, if you remove a record, the batch (even empty) will stay alive for UNDOABLE tables, waiting for an UNDO. If the UNDO never comes, the batch will leak. killMultiplex was intentionally designed to only drop the empty batches of UNDOABLE tables. That is why the removed.cardinality == 0 invariant should hold when killing the multiplex and that is why we always need to clear the table before killing the multiplex.

#26 Updated by Constantin Asofiei 24 days ago

Alexandru Lungu wrote:

Oh, the invariant I was talking about is removed.cardinality == 0 when killing a multiplex. I agree that the sum of cardinalities is different from the rowCount.

When the actual killMultiplex(multiplexId) is being called from FWD, no more 'primary key allocation rows' should exist; also, all batches should be already removed because of the removeAll(session, multiplexID) call.

The need for killMultiplex was due to the empty batches that were left behind. For UNDO tables, empty batches are left behind in case an UNDO happens. For instance, if you remove a record, the batch (even empty) will stay alive for UNDOABLE tables, waiting for an UNDO. If the UNDO never comes, the batch will leak. killMultiplex was intentionally designed to only drop the empty batches of UNDOABLE tables. That is why the removed.cardinality == 0 invariant should hold when killing the multiplex and that is why we always need to clear the table before killing the multiplex.

I need to double-check for undoable temp-tables. The case I'm thinking of is an empty-temp-table call followed by undo, leave. This would trigger a DELETE ... FROM ... _MULTIPLEX = ?, which will call removeAll and also remove all the batches. The rollback operation should create the batches again.

#27 Updated by Alexandru Lungu 24 days ago

Constantin Asofiei wrote:

I need to double-check for undoable temp-tables. The case I'm thinking of is an empty-temp-table call followed by undo, leave. This would trigger a DELETE ... FROM ... _MULTIPLEX = ?, which will call removeAll and also remove all the batches. The rollback operation should create the batches again.

This is a huge thing. Removing a batch will make the MultiplexScanIndex quite unstable. If you want to recreate it, you will need to do some kind of linked list traversal to find its spot and "revive" it. This is exactly why I preferred to leak them and let them be cleared entirely.

I think the removeAll caused by the truncate should only clear the batches (not actually remove them). In fact, it can remove the batches only for NO-UNDO. Also, mind that the "last" batch is a bit more special as it can reclaim keys. Therefore, never remove a batch if it is the "last" for a multiplex.

#28 Updated by Constantin Asofiei 24 days ago

Alexandru Lungu wrote:

This is a huge thing. Removing a batch will make the MultiplexScanIndex quite unstable. If you want to recreate it, you will need to do some kind of linked list traversal to find its spot and "revive" it. This is exactly why I preferred to leak them and let them be cleared entirely.

I think the removeAll caused by the truncate should only clear the batches (not actually remove them). In fact, it can remove the batches only for NO-UNDO. Also, mind that the "last" batch is a bit more special as it can reclaim keys. Therefore, never remove a batch if it is the "last" for a multiplex.

Understood, thanks.

#29 Updated by Constantin Asofiei 24 days ago

Alexandru, please review 8593a_h2 rev 49/50 and 8593a rev 15158. I'll need to re-test these changes.

#30 Updated by Constantin Asofiei 24 days ago

Constantin Asofiei wrote:

Alexandru, please review 8593a_h2 rev 49/50 and 8593a rev 15158. I'll need to re-test these changes.

ETF testing passed. Alexandru, if you can run a ChUI regression testing with this and 8363g, please let me know.

#31 Updated by Alexandru Lungu 24 days ago

ETF testing passed. Alexandru, if you can run a ChUI regression testing with this and 8363g, please let me know.

ChUI takes ~2h:30mins. Do you want to test them both or separately? I would prefer to have them together as I have limited time on the bare-metal due to the intensive performance testing we are doing today.

#32 Updated by Constantin Asofiei 24 days ago

You can test them both (8593a + 8636g). I've uploaded fwd-h2 1.46 and 8593a rev 15161 has the change in build.gradle, so you can build directly. Just patch 8593a over 8363g.

#33 Updated by Constantin Asofiei 23 days ago

Alexandru, please let me know if you managed to run ChUI - I'd like to merge these tomorrow.

#34 Updated by Alexandru Lungu 23 days ago

  • Status changed from Review to Internal Test

Yes. I had to upgrade to the latest 6667i (after a recent rebase), reconvert and set a new baseline. Afterwards, I patched (8593a + 8636g), ensured only FWD-H2 1.46 is in deploy/lib and re-ran the tests. I left them overnight.
I am looking at them now - everything PASSED.

The changes are good to go!

#35 Updated by Constantin Asofiei 23 days ago

  • Status changed from Internal Test to Merge Pending

#36 Updated by Greg Shah 22 days ago

You can go ahead with the merge of 8593a and 8636g.

#37 Updated by Constantin Asofiei 22 days ago

  • Status changed from Merge Pending to Test

8593a was merged to trunk rev 15169 and archived.
8593a_h2 was merged to fwd-h2 rev 46 and archived.

#38 Updated by Constantin Asofiei 19 days ago

  • Related to Bug #8703: NO-UNDO option at DMO must be removed added

#40 Updated by Constantin Asofiei 16 days ago

  • Status changed from Test to Internal Test
Created task branch 8593b from trunk rev 15186. Contains regression fixes (mixed):
  • revno: 15190 - Fixed regression in trunk rev 15169; refs #8593:
    • A NEW and INVALID record has already been removed from the temp-table, so no need to clear its primary key.
    • updated to fwd-h2 rev 1.47 via 8593b_h2 - a bulk delete which removes all rows from the multiplex must clean the multiplex and leave it in a state to reclaim the very first key.
  • revno: 15189 - Fixed bulk delete where the FQL was prepared with a faulty alias. Refs #8593 #8715
  • revno: 15188 - When a BEFORE-TABLE is being built too, for a dynamic temp-table, the associated DMOs for both the before and after table must have the after/beforeTable annotations set. Refs #8545
  • revno: 15187 - Fixed regression in trunk rev 15179: when waiting for data on the socket, the 'hasMoreData' worker must not assume there is no more data if the waiting has finished because of a timeout. Also, a SocketTimeoutException must be thrown only if SO_TIMEOUT is not zero and the timeout has actually expired. Refs #8486

I'm placing this in ETF testing. There are also changes in 8593b_h2.

Eric/Ovidiu/Greg - please review.

#41 Updated by Greg Shah 16 days ago

Code Review Task Branch 8593b Revision 15179

The LowLevelSocketImpl changes look good.

Eric/Ovidiu: Please look at the persistence changes ASAP.

#42 Updated by Ovidiu Maxiniuc 16 days ago

Code Review Task Branch 8593b.

I do not see anything wrong with the code (r15187-).

#43 Updated by Constantin Asofiei 16 days ago

8593b passed ETF testing.

#44 Updated by Greg Shah 16 days ago

  • Status changed from Internal Test to Merge Pending

You can merge to trunk.

#45 Updated by Constantin Asofiei 16 days ago

  • Status changed from Merge Pending to Test

Branch 8593b was merged to trunk rev 15189 and archived.

#46 Updated by Alexandru Lungu 10 days ago

  • Status changed from Test to WIP
  • % Done changed from 100 to 80

Timeline:

1. I was thinking of replacing the sequence-based PK generator used by FastCopyHelper with a more customized syntax to retrieve the PK directly from the table (MultiplexedScanIndex). However, I started off by doing some tests and I couldn't confirm your finding that led to 1.47. I used:

def temp-table tt no-undo field f1 as int.

def var i as int.

do transaction:
    repeat i = 1 to 300:
        create tt.
        tt.f1 = i.
    end.
end.

output to test.txt.

for each tt:
    message "B" recid(tt).
end.

run b.p(input-output table tt).

for each tt:
    message "A" recid(tt).
end.

and

def temp-table tt field f1 as int.

define input-output parameter table for tt.

do transaction:
    for each tt:
        tt.f1 = tt.f1 * 10.
    end.
end.

for each tt:
    message "I" recid(tt).
end.

In this case, I had:
  • B 2304-2475 and 2560-2686 (2 batches)
  • I 4352-4524 and 4608-4734 (2 batches)
  • A 6400-6567 and 6656-6787 (2 batches)

Disregard the huge offsets (besides the batches, 4GL uses some kind of pages, so the offsets are sometimes bigger across multiplexes - this is not implemented in FWD). Also, the batches have some padding, so they don't have exactly 256 records.
I also tried with UNDO or APPEND, but got similar results (however, the A(fter) results use the same page as B(efore), but without reclaiming).

PS: Using BY-REFERENCE results in the same ids all over the place.

2. But finally I found some extra weirdness that drove me to another idea. If I am doing the same test as above, but only with 100 records, I have:
  • B 2304-2403 (1 batch)
  • I 4352-4451 (1 batch)
  • A 2304-2403 (1 batch)

It reclaims! I tried edge cases (with 173/174 records - the break point between batch sizes) and confirmed that reclaiming happens only when there is a single batch.

Overall, this means that the changes in FWD-H2 1.47 should apply only when there is a single batch. When there are multiple batches, the old implementation (of assigning a fresh new batch) is correct.

I am planning to fine-tune the getNextPrimaryKey bulk: if there is only one batch, use the 1.47 implementation; if there are at least 2 batches, use the old implementation.
Anything else to take into account?

#47 Updated by Constantin Asofiei 10 days ago

This is a test where it gets into an infinite loop:

def temp-table tt1 no-undo field f1 as int index ix1 f1.
def temp-table tt2 like tt1.

do transaction:
  create tt1.
  tt1.f1 = 1.
  release tt1.
end.

for each tt1 where tt1.f1 > 0:
   message tt1.f1.
   run proc0 (input-output table tt1).
end.

procedure proc0.
   def input-output parameter table for tt2.
end.

This can be expanded to have more than one batch. So I don't think the 'old implementation' is OK.

#48 Updated by Alexandru Lungu 10 days ago

Constantin Asofiei wrote:

This is a test where it gets into an infinite loop:
[...]

This can be expanded to have more than one batch. So the 'old implementation' I don't think is OK.

With your test:
  • If I have only one batch (1 - ~150 records), I have the following output (including the recid):
    2304 1
    2305 2
    2306 3
    ...
    2402 99
    2403 100
  • If I have more than one batch (I tested with 300 records), I have the following output:
    2304 1
    6400 1
    4353 2
    2306 3
    6402 3
    4355 4
    2308 5
    6404 5
    4357 6
    ....
    2667 299
    6763 299
    4716 300

So as said, there is a difference between copying 1 batch and copying more than 1 batch. Apparently it doesn't enter an infinite loop.

The answer is regarding the pages (that are not implemented in FWD). Note that only odd records are iterated twice:
  • The original rows are in page 1.
  • When going into proc0, the rows are moved in page 2.
  • When returning, if there are more than one batch, the records go on page 3.
  • This way record 1 is iterated again
  • proc0 is called and rows are moved to page 1 (see how multiplex 2 is reusing page 1, which was used by multiplex 1 at first). This is happening just because the table is NO-UNDO and we know that multiplex 1 can't reuse items from page 1.
  • proc0 returns and rows are moved back to page 2. At this point, the for each goes to the element 2.

This confirms my idea from #8593-46. It is OK to move records to a new page (higher PK) if there is more than 1 batch.
The only missing puzzle piece is the page implementation in FWD. Because there are no pages, we always assign higher batches and don't recycle the empty pages.

Pages are always 2048 in size and can contain (apparently) only 8 batches. If all batches are empty, the page can be reused cross-multiplex.

Lastly: For UNDO, the situation is different and requires way more investigation overall (it increases the PK several times until the end of the pages; at that point it seems it forces reclaiming to avoid leaking out the page).
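
The page/batch arithmetic stated above (2048-key pages holding 8 batches, hence 256 keys per batch) is consistent with the observed recids from #8593-46. A small sketch with hypothetical helper names (pageOf, batchOf) to make the decoding explicit:

```java
class RecidPages {
   static final int PAGE_SIZE = 2048;            // pages are always 2048 in size
   static final int BATCH_SIZE = PAGE_SIZE / 8;  // 8 batches per page -> 256 keys

   static long pageOf(long recid)  { return recid / PAGE_SIZE; }
   static long batchOf(long recid) { return (recid % PAGE_SIZE) / BATCH_SIZE; }
}
```

With these sizes, the B/I/A runs from #8593-46 (2304..., 4352..., 6400...) land on pages 1, 2 and 3 respectively, and 2304/2560 fall into consecutive batches of the same page.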

#49 Updated by Alexandru Lungu 6 days ago

  • Status changed from WIP to Review

Committed 8593c_h2/rev. 48 and 8593c/rev. 15203. Contains the following:

  • H2 support for NEXT PK FOR (<table-name>, <multiplex>) construct to work similarly as NEXT VALUE FOR <sequence-name>, but using the internal multiplexed scan index PK reservation mechanism.
  • This includes a bit of refactoring of MultiplexScanIndex.getNextPrimaryKey. There is no way to distinguish between append and no-append, so it simply works as "always reclaim the lowest PK possible"; if there is no PK to reclaim, assign a new batch. This is an aggressive implementation that reclaims whenever possible to avoid infinite loops. It is not 100% accurate to how 4GL works (page-based indexing), but it is better than what we have now. I left a TODO on this which can be tackled in a future iteration.
  • The FWD changes include a refactoring of FastCopyHelper to switch from sequence-based indexing to table-based indexing (the NEXT PK FOR syntax). This way, I purged the getNextPrimaryKey bulk functionality entirely.
    • Constantin, there was some functionality that relied on consecutive numbering before, but it was more like an optimization. If the copy was not in append mode, we always presumed consecutive numbering so we could more easily index normalized extents and before tables. However, there was an in-DB PK mapping functionality that was used for append mode and I widened its usage in 8593c. This may imply some performance drawback. TL;DR: If a non-consecutive mapping was required, it was done in-DB. If not, a consecutive order was presumed and a sequence was used. With my changes, all non-trivial cases that require a PK mapping will use an in-DB one.
    • Also, this means that even "simple" fast copy operations will do in-DB PK mappings, so there are new execution paths enabled that should be carefully tested.

- by in-DB I mean "a database table that holds the PK mapping and is joined when doing the INSERT INTO SELECT FROM"
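The "always reclaim the lowest PK possible, otherwise assign a new batch" rule described above could be sketched roughly like this. This is a standalone simulation, not the actual MultiplexScanIndex code; the class name, BATCH_SIZE value and all member names are invented for illustration:

```java
import java.util.TreeSet;

// Hypothetical sketch of the aggressive reclaim strategy: released PKs go
// into an ordered set; the lowest one is always handed out first, and a new
// batch is reserved only when nothing is left to reclaim.
public class ReclaimSketch
{
   private static final int BATCH_SIZE = 8;   // invented value for the sketch

   private final TreeSet<Long> reclaimable = new TreeSet<>();
   private long next = 0;       // next fresh PK in the current batch
   private long batchEnd = 0;   // end (exclusive) of the reserved batch

   public long getNextPrimaryKey()
   {
      // aggressive reclaim: always prefer the lowest released PK
      if (!reclaimable.isEmpty())
      {
         return reclaimable.pollFirst();
      }
      if (next == batchEnd)
      {
         batchEnd = next + BATCH_SIZE;   // reserve a new batch lazily
      }
      return next++;
   }

   public void release(long pk)
   {
      // a deleted record makes its PK reclaimable
      reclaimable.add(pk);
   }

   public static void main(String[] args)
   {
      ReclaimSketch idx = new ReclaimSketch();
      long a = idx.getNextPrimaryKey();   // first PK of a fresh batch
      long b = idx.getNextPrimaryKey();
      idx.release(a);                     // simulate a delete
      long c = idx.getNextPrimaryKey();   // lowest reclaimed PK wins
      long d = idx.getNextPrimaryKey();   // nothing left to reclaim
      System.out.println(a + " " + b + " " + c + " " + d);
   }
}
```

Note that, unlike the 4GL's page-based behavior, this sketch reclaims immediately and unconditionally, which is what makes the implementation only an approximation.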

With this, the example from #8593-47 no longer runs infinitely (with 1, 150 or 300 records) and the leak is gone. However, these are the only tests I have done so far. Please review this first iteration of changes.
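To illustrate the in-DB PK mapping idea in isolation, here is a hypothetical simulation: plain Java maps stand in for the mapping table and for the join done during the INSERT INTO SELECT FROM. None of these names come from FWD; it only shows why a mapping table handles non-consecutive source PKs where a sequence could not:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Invented simulation of "in-DB PK mapping": a side table maps each source
// PK to a freshly reserved destination PK, and the copy joins through it
// instead of presuming consecutive numbering.
public class PkMappingSketch
{
   public static void main(String[] args)
   {
      // source rows: PK -> payload (PKs intentionally non-consecutive)
      LinkedHashMap<Long, String> source = new LinkedHashMap<>();
      source.put(3L, "a");
      source.put(17L, "b");
      source.put(42L, "c");

      // the "mapping table": source PK -> reserved destination PK
      Map<Long, Long> pkMap = new HashMap<>();
      long nextPk = 1;
      for (Long srcPk : source.keySet())
      {
         pkMap.put(srcPk, nextPk++);
      }

      // the copy "joins" through the mapping, analogous to
      // INSERT INTO dest SELECT ... FROM source JOIN mapping
      Map<Long, String> dest = new TreeMap<>();
      for (Map.Entry<Long, String> row : source.entrySet())
      {
         dest.put(pkMap.get(row.getKey()), row.getValue());
      }

      System.out.println(dest);
   }
}
```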

#50 Updated by Greg Shah 5 days ago

Ovidiu/Eric: Please review 8593c.

This is quite urgent, since it resolves a regression in a customer application that has been present for over a week.

#51 Updated by Alexandru Lungu 5 days ago

I committed 8593c/rev. 15204 just now. This adds a requirement to use in-DB PK mapping if the table has normalized extent fields, which was found during regression testing.
With this final version, the full unit test suite of a large customer doesn't regress.

#52 Updated by Ovidiu Maxiniuc 5 days ago

Eric, I am starting the review process in the next minutes. Please let me know if you are already in the process.

#53 Updated by Alexandru Lungu 5 days ago

I committed 8593c_h2/rev. 50: forced reclaiming is now allowed only for NO-UNDO tables. Otherwise, on a rollback, we may end up with conflicting PKs.
This means COPY-TEMP-TABLE will never reclaim keys for an UNDO table. This matches the findings in 4GL.
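A minimal illustration of this guard (not the actual 8593c_h2 code; all names here are invented). The point is that for an UNDO table, a rollback could resurrect a record whose PK was already handed out again, so reclaiming must be disabled there:

```java
import java.util.TreeSet;

// Hypothetical sketch: reclaiming released PKs is only safe for NO-UNDO
// tables; UNDO tables always take a fresh PK so that a rollback cannot
// produce a PK conflict.
public class NoUndoReclaimSketch
{
   private final boolean noUndo;
   private final TreeSet<Long> reclaimable = new TreeSet<>();
   private long next = 0;

   public NoUndoReclaimSketch(boolean noUndo)
   {
      this.noUndo = noUndo;
   }

   public long getNextPrimaryKey()
   {
      // forced reclaim is allowed only for NO-UNDO tables
      if (noUndo && !reclaimable.isEmpty())
      {
         return reclaimable.pollFirst();
      }
      return next++;
   }

   public void release(long pk)
   {
      reclaimable.add(pk);
   }

   public static void main(String[] args)
   {
      NoUndoReclaimSketch undoTable = new NoUndoReclaimSketch(false);
      NoUndoReclaimSketch noUndoTable = new NoUndoReclaimSketch(true);

      undoTable.getNextPrimaryKey();     // 0
      undoTable.release(0L);
      long u = undoTable.getNextPrimaryKey();     // never reclaims

      noUndoTable.getNextPrimaryKey();   // 0
      noUndoTable.release(0L);
      long n = noUndoTable.getNextPrimaryKey();   // reclaims the released PK

      System.out.println(u + " " + n);
   }
}
```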

#54 Updated by Alexandru Lungu 5 days ago

I am also having a hard time tracking down an issue with the truncate now (not related to my changes in 8593c/8593c_h2). It seems that, at the moment of the truncate, there are two records in the table: the first one has data, but the second one is just a reserved PK.

The root issue is that dataset fill may sometimes reach BufferImpl.FillWorker.fillRow with the following code:

if (!validated)
{
   if (isMerge)
   {
      // merge mode: ignore it, just skip to next record
      TransactionManager.disableLoopProtection(null);
      hasMore.assign(fillQuery._getNext());
      // we need to release the newly created record as if undone. We cannot call
      // BlockManager.undoNext(lbl) here as it will affect the buffer(s) from the query
      // driving the fill operation.
      recBuf.release(true, false); // I.e. drop() - this form avoids a buffer() lookup
      return false;
Before reaching this, it always does selfBuf.create and it eventually releases the buffer via recBuf.release. However, this is quite a special release, as it doesn't flush due to the first parameter being true, which stands for undo. By avoiding the flush, this DMO leaks its reserved PK.

Constantin: this is part of your effort to support clearPrimaryKey. Currently, it takes effect in ChangeSet.rollback and Session.delete, but this case is a bit more special. The record may be registered so that ChangeSet.rollback will clear its PK, but that would be too late. If we hurry the process, we may end up clearing the same PK twice, which is again bad. I am not quite sure how to handle the PK clear on release with undo.

I wonder if we can't be more lenient with the reserved PKs and simply ignore them when truncating. On the other hand, I identified this exact scenario (hopefully the last one), so I can work around it.

#55 Updated by Constantin Asofiei 5 days ago

Alexandru, I don't like leaving 'allocated records' with no meaning in H2. Can we explicitly clear the primary key for this fill case, too?

#56 Updated by Alexandru Lungu 5 days ago

These are the changes that unblocked me:

=== modified file 'src/com/goldencode/p2j/persist/RecordBuffer.java'
--- old/src/com/goldencode/p2j/persist/RecordBuffer.java    2024-04-23 07:57:16 +0000
+++ new/src/com/goldencode/p2j/persist/RecordBuffer.java    2024-05-13 18:47:14 +0000
@@ -11862,6 +11862,19 @@
                                                                dbTrigManager);
             }

+            if (isTemporary && oldRecord.checkState(DmoState.NEW) && undo)
+            {
+               try
+               {
+                  // this is important to dealloc the reserved PK
+                  persistence.getSession().delete(oldRecord);
+               }
+               catch (PersistenceException e)
+               {
+                  LOG.log(Level.SEVERE, "Failed deleting the transient record on undo", e);
+               }
+            }
+            
             updateNoUndoState();

This basically removes the DMO entirely from the session if it is isTemporary, NEW and undo. The delete from the session also removes the reservation. The undo release is used only by this fill and by BufferImpl.drop (in case exceptions occur). The changes are mostly on the safe side, as they are hit only in very specific cases.

But this requires testing.

#57 Updated by Ovidiu Maxiniuc 5 days ago

Greg Shah wrote:

Ovidiu/Eric: Please review 8593c.

I am OK with the code. A few notes (cleanup/low priority):
  • TemporaryBuffer.java
    • nextPrimaryKey() is now public and the @Override annotation was dropped. It is never used in a way that requires public visibility, and it still overrides the method in RecordBuffer;
    • getSequenceName() doesn't seem to be used any more. More than that, Context.sequenceName is used to create the sequence and delete it, but there are no other uses, since FastCopyHelper relies on H2 now. I think these occurrences can be dropped as well.
  • FastCopyHelper.java:
    • you marked FastCopyBundlePKElement as @Deprecated, but I think it can be dropped completely since it is no longer in use.

#58 Updated by Alexandru Lungu 4 days ago

  • % Done changed from 80 to 100

Constantin, please review 8593c_h2 changes. If they are OK, please upload it to the GCD repository.

#59 Updated by Alexandru Lungu 4 days ago

  • % Done changed from 100 to 80

Committed 8593c/rev. 15205 addressing the review from Ovidiu.

#60 Updated by Ovidiu Maxiniuc 4 days ago

I have reviewed the changes from 8593c/rev. 15205. Good job!

Is there a reason for downgrading the percentage done to 80?

#61 Updated by Constantin Asofiei 4 days ago

I've uploaded 1.48 to projsrv01 and upgraded the version in rev 15206.

I'll run ETF and test with another customer app tomorrow.

#62 Updated by Alexandru Lungu 4 days ago

  • % Done changed from 80 to 100

Reset %Done.

#63 Updated by Constantin Asofiei 3 days ago

ETF testing passed, the original #7143-903 is fixed, and another app works OK with this.

#64 Updated by Greg Shah 3 days ago

You can merge to trunk now.

#65 Updated by Constantin Asofiei 3 days ago

Greg Shah wrote:

You can merge to trunk now.

Starting.

#66 Updated by Constantin Asofiei 3 days ago

Branch 8593c was merged to trunk revision 15212 and archived.

Branch 8593c_h2 was merged to fwd-h2 trunk revision 48 and archived.

#67 Updated by Greg Shah 3 days ago

This resolves the regression documented in #7143-900?

#68 Updated by Constantin Asofiei 3 days ago

Greg Shah wrote:

This resolves the regression documented in #7143-900?

Yes

#69 Updated by Alexandru Lungu 1 day ago

Committed 8593d/rev. 15206 to fix #8593-54. It contains the patch from #8593-56.
Recap: there was an issue in dataset fill where a transient record is undone using a "special" release that doesn't flush. In this case, the DMO must be deleted to drop the reserved PK.

#70 Updated by Greg Shah 1 day ago

Is there any further review needed for this or do we just need testing?
