Bug #8593

H2 performance improvement for table 'truncate by multiplex'

Added by Constantin Asofiei 22 days ago. Updated 4 days ago.

Status: Test
Priority: Normal
Target version: -
Start date:
Due date:
% Done: 100%
billable: No
vendor_id: GCD
case_num:
Related issues

Related to Database - Bug #8703: NO-UNDO option at DMO must be removed (New)

History

#2 Updated by Constantin Asofiei 22 days ago

  • Status changed from New to Review
  • Assignee set to Constantin Asofiei
  • % Done changed from 0 to 100
Created task branch 8593a_h2 which improves DELETE FROM tt1 WHERE tt1._multiplex = ?. The reasoning is this (a sketch follows the list):
  • normal DELETE operations in H2 first do a SELECT to gather all the target rows; this includes DELETE FROM tt1 WHERE tt1._multiplex = ?
  • with a normal DELETE, each row is deleted individually from the index - this means that all the index columns are used to find the target node to be removed
  • the implementation first checks that the DELETE is for a _MULTIPLEX = ? condition, and if so:
    • the MultiplexedScanIndex is cleaned first, gathering all PKs for the records being deleted with this multiplex ID
    • all indexes which have the _MULTIPLEX column as their first index column remove all these records by matching only the multiplex ID
    • all remaining indexes assume the PK is the first column (I need confirmation that this will always be true) and just find that record in the index and remove it
    • this avoids using all the index columns to find the record
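
For clarity, a minimal sketch of the intended flow; every name below (deleteByMultiplex, removeRange, removeByPrimaryKey, firstColumnIs) is illustrative, not the actual FWD-H2 API:

// Hypothetical sketch of the optimized bulk delete path.
void deleteByMultiplex(Session session, int multiplexId)
{
   // clean the scan index first, collecting the PKs of the doomed rows
   List<Long> pks = multiplexedScanIndex.removeAll(session, multiplexId);
   for (Index index : table.getIndexes())
   {
      if (firstColumnIs(index, MULTIPLEX_COLUMN))
      {
         // prune the whole range by matching only the multiplex ID
         index.removeRange(session, multiplexId);
      }
      else
      {
         // locate each row by its PK alone, instead of comparing all
         // the index columns to find the target node
         for (long pk : pks)
         {
            index.removeByPrimaryKey(session, pk);
         }
      }
   }
}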

Alexandru: please take a look at 8593a_h2.

#3 Updated by Alexandru Lungu 22 days ago

Constantin Asofiei wrote:

Created task branch 8593a_h2 which improves DELETE FROM tt1 WHERE tt1._multiplex = ?. The reasoning is this:
  • all remaining indexes assume the PK is the first column (I need confirmation that this will always be true) and just find that record in the index and remove it.

If you meant PK as recid: this is always the last column in an index, but the first column in the row. If you meant PK as the H2 internal row key: this is kept outside the data array and I think it is never indexed.
That being said, you can't remove a row from a tree index based only on the recid/PK. You can, however, prune the tree index based on multiplex (as it is the first field in the index).
Please let me know if I misunderstood your intent. Without a full scan of the index, you can't find a row solely by its PK.
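
For illustration, the layout under discussion (an assumed example for a FWD temp-table with index ix1 on f1; not taken from the actual code):

// Assumed illustration of the layout discussed above:
// row data array: [ recid, _multiplex, f1, ... ]   // recid at position 0
// tree index ix1: ( _multiplex, f1, recid )        // recid is the last column
// H2 Row.key:     kept outside the data array and not indexed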

One last important thing: you still need to log the deletes, so you will most certainly need to "register" the row as being deleted in case of a roll-back (unless you do this only for NO-UNDO tables).

I didn't look into the changes yet.

#4 Updated by Constantin Asofiei 22 days ago

You mean the Row.key is not the same as the internal PK?

Also, there is a 'recid' index (with just this column) created in H2 for FWD temp-tables.

#5 Updated by Constantin Asofiei 22 days ago

Constantin Asofiei wrote:

You mean the Row.key is not the same as the internal PK?

I mean Row.key being the same as the 'recid' PK we set at position zero in the row's data array.

#6 Updated by Constantin Asofiei 21 days ago

Alexandru, the assumption was that the MultiplexedBatchIndex$Batch holds data only for a certain multiplex. This is not correct (maybe because of re-used keys?).

Currently a batch can have rows for more than one multiplex ID.

#7 Updated by Alexandru Lungu 20 days ago

Constantin Asofiei wrote:

Alexandru, the assumption was that the MultiplexedBatchIndex$Batch holds data only for a certain multiplex. This is not correct (maybe because of re-used keys?).

Currently a batch can have rows for more than one multiplex ID.

This is absolutely no good. The invariant shouldn't (ever) be broken. Do you have a scenario that triggers such an anomaly?

#8 Updated by Constantin Asofiei 20 days ago

The recreate is this:
  • mpid1.p
    def temp-table tt1 field f1 as int index ix1 f1.
    def var k as int.
    
    procedure proc0.
       def var i as int.
       run mpid2.p(input-output table tt1).
       buffer tt1:empty-temp-table().
       do i = 1 to 10:
          create tt1.
          tt1.f1 = i.
          release tt1.
       end.
    end.
    
    do transaction:
       run proc0.
    end.
    
    
  • mpid2.p
    def temp-table tt1 field f1 as int index ix1 f1.
    
    def var i as int.
    def input-output param table for tt1.
    
    do i = 1 to 10000:
       create tt1.
       tt1.f1 = i.
       release tt1.
    end.
    
    for each tt1: delete tt1. end.
    
    

For an undoable temp-table, an EMPTY-TEMP-TABLE results in an UPDATE tt1 SET _multiplex = <delete-multiplex> WHERE _multiplex = <old-multiplex>. The assumption with the batches was that the multiplex ID is read-only - this is not true for undoable temp-tables, where a delete is emulated by changing the multiplex (and thus hiding the record from the legacy temp-table).
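
Over JDBC, the emulated delete would look roughly like this (emulateDelete and the parameter names are hypothetical; the UPDATE statement is the one quoted above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical illustration of the emulated (undoable) delete.
static void emulateDelete(Connection conn, int oldMultiplex, int deleteMultiplex)
   throws SQLException
{
   try (PreparedStatement ps = conn.prepareStatement(
           "UPDATE tt1 SET _multiplex = ? WHERE _multiplex = ?"))
   {
      ps.setInt(1, deleteMultiplex);   // the hidden "deleted" multiplex
      ps.setInt(2, oldMultiplex);      // the live multiplex being emptied
      ps.executeUpdate();
   }
   // an UNDO restores the rows by running the reverse UPDATE
   // (swap the two parameters above)
}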

#9 Updated by Alexandru Lungu 20 days ago

Constantin Asofiei wrote:

For an undoable temp-table, an EMPTY-TEMP-TABLE results in an UPDATE tt1 SET _multiplex = <delete-multiplex> WHERE _multiplex = <old-multiplex>. The assumption with the batches was that the multiplex ID is read-only - this is not true for undoable temp-tables, where a delete is emulated by changing the multiplex (and thus hiding the record from the legacy temp-table).

You mean the INPUT-OUTPUT TABLE functionality, right? I see your point, but I thought this wasn't implemented yet (it seems it was implemented in the meantime).

Anyway, an UPDATE in H2 is a DELETE + INSERT. All records are first removed (doing something like a DELETE ... WHERE multiplex = <old-multiplex>) and then added back (doing an insert for each row). The only issue I see is the recid: it is still the old recid from the source table. I am fairly sure this is wrong. 4GL reindexes the records, so we should do the same (reserve the id before the insert).
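
Schematically (a simplified paraphrase; index.remove/index.add follow H2's Index interface, but the loop below is an assumption, not the actual code):

// Simplified schematic of an UPDATE on this table:
// for each matching row:
//    for each index: index.remove(session, oldRow);   // delete phase
//    for each index: index.add(session, newRow);      // insert phase
// note: newRow keeps the old recid unless it is explicitly re-reserved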

1. How urgent is this? Is anything breaking right now?
2. We can have a direct-access API for this (changeMultiplex). It would basically do the UPDATE plus the reindexing during the update.
3. Otherwise, we can have a special SQL keyword (REINDEX) for the UPDATE, which reindexes the records after the update.

#10 Updated by Constantin Asofiei 20 days ago

No, the empty-temp-table is the problem: while executing in a transaction, for an undoable temp-table, it does not remove the records but changes the multiplex ID.

For MultiplexedScanIndex, the target batch in setRow(key, row) is calculated from the record's key (which is not changing!) - so a remove/add will end up in the same batch.
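
In other words (a comment-only sketch; the key-to-batch mapping shown is assumed, not the actual implementation):

// Why the UPDATE stays in the same batch:
// int batch = batchFor(row.key);   // derived from the key, not from _multiplex
// batches[batch].remove(row);      // delete phase of the UPDATE
// batches[batch].add(updatedRow);  // insert phase lands in the same batch,
//                                  // even though _multiplex has changed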

Urgency: the performance fix for #8363 is urgent; the bug appears only when introducing 8593a_h2 in ETF.

#11 Updated by Alexandru Lungu 20 days ago

Constantin Asofiei wrote:

For MultiplexedScanIndex, the target batch in setRow(key, row) is calculated from the record's key (which is not changing!) - so a remove/add will end up in the same batch.

Hmm, I see. Drop my statement then - the INPUT-OUTPUT tables are not yet implemented with UPDATE.

Still, why are we updating with a <delete-multiplex> instead of actually deleting them? If they are undoable, we can rollback (with a savepoint or not). Why do we need to have such functionality of marking them as deleted?

#12 Updated by Eric Faulhaber 20 days ago

Alexandru Lungu wrote:

Still, why are we updating with a <delete-multiplex> instead of actually deleting them? If they are undoable, we can rollback (with a savepoint or not). Why do we need to have such functionality of marking them as deleted?

This technique of updating the multiplex ID to emulate an undoable delete predates (by almost two decades) our use of savepoints and any editing of H2 source code.

#13 Updated by Greg Shah 20 days ago

Seems like a long-standing leak of records. If that built up to a large number, it might be both a memory leak and a drag on performance.

#14 Updated by Eric Faulhaber 20 days ago

Greg Shah wrote:

Seems like a long-standing leak of records. If that built up to a large number, it might be both a memory leak and a drag on performance.

No, this "emulated delete" only happens at the subtransaction level, so that an UNDO is able to restore the records by switching the multiplex ID back. They are actually deleted at transaction commit, or restored at transaction rollback. At least, that was the original implementation; I can't speak to this new issue that Constantin detected.

#15 Updated by Alexandru Lungu 20 days ago

Eric, even for a sub-transaction, we can simply delete and roll back to a savepoint, right?

#16 Updated by Eric Faulhaber 20 days ago

Alexandru Lungu wrote:

Eric, even for a sub-transaction, we can simply delete and roll back to a savepoint, right?

Yes, we should be able to now. We just weren't using savepoints back then, and this technique has survived all these years. If you make this change, we should test thoroughly, since this has been a core design point from the beginning. I don't recall if/where we have dependent assumptions on this approach in the runtime.

#17 Updated by Alexandru Lungu 19 days ago

Just a thing that was on my mind and that I wanted to have here as well:

For NO-UNDO cases, we used to rollback (transaction or savepoint) and then redo the state of the NO-UNDO tables (i.e. rolling back the rollback). This is no longer the case, as FWD-H2 has the NO-UNDO keyword in place, so rollbacks only affect UNDO tables.

Eric Faulhaber wrote:

Yes, we should be able to now. We just weren't using savepoints back then, and this technique has survived all these years. If you make this change, we should test thoroughly, since this has been a core design point from the beginning. I don't recall if/where we have dependent assumptions on this approach in the runtime.

Ok

#18 Updated by Greg Shah 11 days ago

  • % Done changed from 100 to 50
  • Status changed from Review to WIP

#19 Updated by Constantin Asofiei 10 days ago

Previously we used the UPDATE ... SET _MULTIPLEX = ? paired with a ReversibleBulkDelete, which was meant to restore the multiplex ID (via a reverse UPDATE), in case of rollback of an undoable temp-table.

ReversibleBulkDelete was removed, I think, when we introduced no-undo temp-tables directly in FWD-H2 (or the savepoints?).

I do not think we actually need the UPDATE ... SET _MULTIPLEX = ? in TemporaryBuffer.removeRecords anymore; the forceDelete flag should always be true. Currently it is set to:

forceDelete = forceDelete || !isUndoable() || !bufferManager.isTransaction()

where:
  • !isUndoable() will be true for NO-UNDO temp-tables
  • !bufferManager.isTransaction() will be true when we are not in a transaction
So, for a bulk delete, the remaining case where forceDelete must also become true (sketched below) is:
  • when we are inside a transaction, and
  • when the temp-table is undoable
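
A condensed sketch of the proposal (the condition is the one quoted above; the method shape around it is hypothetical):

// Hypothetical reduction of TemporaryBuffer.removeRecords to the flag logic.
void removeRecords(boolean forceDelete)
{
   // current: fall back to the multiplex-switch emulation for undoable
   // temp-tables inside a transaction
   forceDelete = forceDelete || !isUndoable() || !bufferManager.isTransaction();

   if (forceDelete)
   {
      // DELETE FROM tt1 WHERE tt1._multiplex = ?
   }
   else
   {
      // UPDATE tt1 SET _multiplex = <delete-multiplex> WHERE _multiplex = ?
   }

   // proposed: savepoints handle undoable rollback and NO-UNDO support is
   // built into FWD-H2, so the DELETE branch can always be taken
   // (forceDelete is effectively always true)
}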

I'm testing this approach now. Please let me know if you see anything wrong.

#20 Updated by Constantin Asofiei 8 days ago

  • Status changed from WIP to Review
  • % Done changed from 50 to 100
Created task branch 8593a from trunk rev 15157, with:
  • For a NEW temp record which was not flushed and is being deleted or rolled back in the transaction, the allocated primary key row (an empty template which reserves this primary key) needs to be explicitly removed.
  • Savepoints are used to rollback undoable temp-tables, and no-undo support is built into H2 itself; so, TemporaryBuffer.removeRecords will always perform an actual DELETE.

8593a_h2 rev 48 contains the FWD-H2 changes related to this.

The test for this is:

def temp-table tt1 field f1 as int index ix1 f1.
def var c as int.

do transaction:

   do transaction:
      create tt1.
      assign tt1.f1 = 1.
      release tt1.
      create tt1.
      assign tt1.f1 = 2.
      // do a delete or an undo, not both:
      // delete tt1.
      // undo, leave.
   end.

   undo, leave.
end.

c = 0.
for each tt1: c = c + 1. end.
message c.

#21 Updated by Alexandru Lungu 7 days ago

The FWD direct-access changes are OK!
Also, getting rid of the forceDelete is good, but requires extensive testing!

For H2:
  • rowCount -= removed.cardinality; is something that is theoretically irrelevant. The cardinality should always be 0 when killing the multiplex; the killMultiplex API always assumes this is a batch clean-up attempt, not an actual removal of rows. If you encountered this as being "required", it means some kind of invariant was broken.
  • return i == SearchRow.ROWID_INDEX ? ValueLong.get(key) : data == null ? null : data[i]; - why is data allowed to be null in the first place?
  • of course, this also requires extensive testing.

#22 Updated by Constantin Asofiei 6 days ago

Alexandru Lungu wrote:

The FWD direct-access changes are OK!
Also, getting rid of the forceDelete is good, but requires extensive testing!

ETF testing passed with the current changes, and it was also how these bugs surfaced.

  • rowCount -= removed.cardinality; is something that is theoretically irrelevant. The cardinality should always be 0 when killing the multiplex; the killMultiplex API always assumes this is a batch clean-up attempt, not an actual removal of rows. If you encountered this as being "required", it means some kind of invariant was broken.
Well, the invariant was/is already broken:
  • rowCount at the MultiplexedScanIndex tracks only the 'flushed' records - it does not include the 'rows allocated for primary key reservation'
  • cardinality at the batch tracks all rows added to the batch.

So that was the reason I had to add 'clearPrimaryKey' for unflushed records which are deleted/rolled back, as otherwise the DELETE ... WHERE _MULTIPLEX = ? statement in TemporaryBuffer.removeRecords failed the invariant.
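
Schematically (an assumed illustration of the accounting described above):

// Assumed illustration of the accounting mismatch:
// MultiplexedScanIndex.rowCount   -> counts flushed rows only
// sum of batch cardinalities      -> flushed rows + reserved-PK template rows
// so, before clearPrimaryKey, DELETE ... WHERE _MULTIPLEX = ? could observe
// sum(cardinality) > rowCount and fail the invariant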

  • return i == SearchRow.ROWID_INDEX ? ValueLong.get(key) : data == null ? null : data[i]; - why is data allowed to be null in the first place?

I think this was because of the 'template row' with null data used to allocate a row to reserve a primary key in a batch.

#23 Updated by Alexandru Lungu 6 days ago

Constantin Asofiei wrote:

  • rowCount -= removed.cardinality; is something that is theoretically irrelevant. The cardinality should always be 0 when killing the multiplex; the killMultiplex API always assumes this is a batch clean-up attempt, not an actual removal of rows. If you encountered this as being "required", it means some kind of invariant was broken.
Well, the invariant was/is already broken:
  • rowCount at the MultiplexedScanIndex tracks only the 'flushed' records - it does not include the 'rows allocated for primary key reservation'
  • cardinality at the batch tracks all rows added to the batch.

So that was the reason I had to add 'clearPrimaryKey' for unflushed records which are deleted/rolled back, as otherwise the DELETE ... WHERE _MULTIPLEX = ? statement in TemporaryBuffer.removeRecords failed the invariant.

Do you mean that the invariant was broken and is now fixed by clearPrimaryKey? I understood that there were "leaking reservations" and these were removed by the rollback changes.

The original code always presumed that the records were deleted before the multiplex was killed, so we always presumed that the cardinality is 0 when killing the batches. I agree this was broken before your changes, but now it should be fixed, right? So rowCount -= removed.cardinality; is no longer needed.

I think this was because of the 'template row' with null data used to allocate a row to reserve a primary key in a batch.

Oh yeah, right. The code reserves slots with null data for the sake of performance/memory.

#24 Updated by Constantin Asofiei 6 days ago

Alexandru Lungu wrote:

Do you mean that the invariant was broken and is now fixed by clearPrimaryKey? I understood that there were "leaking reservations" and these were removed by the rollback changes.

It is still somehow broken: a template row for the allocated primary key is not added to the other indexes. So the row count at the MultiplexedScanIndex, the table and the other indexes will always be the 'flushed row count', while the sum of the batches' cardinality will be different.

The original code always presumed that the records were deleted before the multiplex was killed, so we always presumed that the cardinality is 0 when killing the batches. I agree this was broken before your changes, but now it should be fixed, right? So rowCount -= removed.cardinality; is no longer needed.

I want to keep it: at the moment the multiplex is deleted, we want to ensure everything is stable.

Also, I think the actual DirectAccessHelper.killMultiplex is no longer needed - it tries to remove rows which were previously removed via the DELETE from TemporaryBuffer.removeRecords, plus it clears only the MultiplexedScanIndex - what about the other indexes?

#25 Updated by Alexandru Lungu 6 days ago

Constantin Asofiei wrote:

Alexandru Lungu wrote:

Do you mean that the invariant was broken and is now fixed by clearPrimaryKey? I understood that there were "leaking reservations" and these were removed by the rollback changes.

It is still somehow broken: a template row for the allocated primary key is not added to the other indexes. So the row count at the MultiplexedScanIndex, the table and the other indexes will always be the 'flushed row count', while the sum of the batches' cardinality will be different.

Oh, the invariant I was talking about is removed.cardinality == 0 when killing a multiplex. I agree that the sum of cardinalities is different from the rowCount.

Also, I think the actual DirectAccessHelper.killMultiplex is no longer needed - it tries to remove rows which were previously removed via the DELETE from TemporaryBuffer.removeRecords, plus it clears only the MultiplexedScanIndex - what about the other indexes?

The need for killMultiplex was due to the empty batches that were left behind. For UNDO tables, empty batches are left behind in case an UNDO happens. For instance, if you remove a record, the batch (even empty) will stay alive for UNDOABLE tables, waiting for an UNDO. If the UNDO never comes, the batch will leak. killMultiplex was intentionally designed to only drop the empty batches of UNDOABLE tables. That is why the removed.cardinality == 0 invariant should hold when killing the multiplex and that is why we always need to clear the table before killing the multiplex.

#26 Updated by Constantin Asofiei 6 days ago

Alexandru Lungu wrote:

Oh, the invariant I was talking about is removed.cardinality == 0 when killing a multiplex. I agree that the sum of cardinalities is different from the rowCount.

When the actual killMultiplex(multiplexId) is being called from FWD, no more 'primary key allocation rows' should exist; also, all batches should already be removed because of the removeAll(session, multiplexID) call.

The need for killMultiplex was due to the empty batches that were left behind. For UNDO tables, empty batches are left behind in case an UNDO happens. For instance, if you remove a record, the batch (even empty) will stay alive for UNDOABLE tables, waiting for an UNDO. If the UNDO never comes, the batch will leak. killMultiplex was intentionally designed to only drop the empty batches of UNDOABLE tables. That is why the removed.cardinality == 0 invariant should hold when killing the multiplex and that is why we always need to clear the table before killing the multiplex.

I need to double-check for undoable temp-tables. The case I'm thinking of is an empty-temp-table call followed by undo, leave. This would trigger a DELETE FROM ... WHERE _MULTIPLEX = ?, which will call removeAll and also remove all the batches. The rollback operation should then create the batches again.

#27 Updated by Alexandru Lungu 6 days ago

Constantin Asofiei wrote:

I need to double-check for undoable temp-tables. The case I'm thinking of is an empty-temp-table call followed by undo, leave. This would trigger a DELETE FROM ... WHERE _MULTIPLEX = ?, which will call removeAll and also remove all the batches. The rollback operation should then create the batches again.

This is a huge thing. Removing a batch will make the MultiplexScanIndex quite unstable. If you want to recreate it, you will need to do some kind of linked-list traversal to find its spot and "revive" it. This is the exact reason why I preferred to leak them and let them be cleared entirely.

I think the removeAll caused by the truncate should only clear the batches (not actually remove them). In fact, it can remove the batches only for NO-UNDO. Also, mind that the "last" batch is a bit more special, as it can reclaim keys. Therefore, never remove a batch if it is the "last" one for a multiplex. The rules are distilled in the sketch below.
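
Distilled into pseudo-rules (an assumed sketch of the guidance above, not actual FWD-H2 code):

// Assumed distillation of the batch lifecycle rules.
void removeAllForTruncate(int multiplexId, boolean noUndo)
{
   for (Batch batch : batchesOf(multiplexId))
   {
      batch.clear();                   // always safe: the linked list stays intact
      if (noUndo && !batch.isLast())   // the "last" batch can reclaim keys
      {
         removeBatch(batch);           // only NO-UNDO may drop batches
      }
   }
}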

#28 Updated by Constantin Asofiei 6 days ago

Alexandru Lungu wrote:

This is a huge thing. Removing a batch will make the MultiplexScanIndex quite unstable. If you want to recreate it, you will need to do some kind of linked-list traversal to find its spot and "revive" it. This is the exact reason why I preferred to leak them and let them be cleared entirely.

I think the removeAll caused by the truncate should only clear the batches (not actually remove them). In fact, it can remove the batches only for NO-UNDO. Also, mind that the "last" batch is a bit more special, as it can reclaim keys. Therefore, never remove a batch if it is the "last" one for a multiplex.

Understood, thanks.

#29 Updated by Constantin Asofiei 6 days ago

Alexandru, please review 8593a_h2 rev 49/50 and 8593a rev 15158. I'll need to re-test these changes.

#30 Updated by Constantin Asofiei 5 days ago

Constantin Asofiei wrote:

Alexandru, please review 8593a_h2 rev 49/50 and 8593a rev 15158. I'll need to re-test these changes.

ETF testing passed. Alexandru, if you can run a ChUI regression test with this and 8363g, please let me know.

#31 Updated by Alexandru Lungu 5 days ago

Constantin Asofiei wrote:

ETF testing passed. Alexandru, if you can run a ChUI regression test with this and 8363g, please let me know.

ChUI takes ~2h30m. Do you want to test them together or separately? I would prefer to have them together, as I have limited time on the bare-metal machine due to the intensive performance testing we are doing today.

#32 Updated by Constantin Asofiei 5 days ago

You can test them both (8593a + 8363g). I've uploaded fwd-h2 1.46, and 8593a rev 15161 has the change in build.gradle, so you can build directly. Just patch 8593a over 8363g.

#33 Updated by Constantin Asofiei 4 days ago

Alexandru, please let me know if you managed to run ChUI - I'd like to merge these tomorrow.

#34 Updated by Alexandru Lungu 4 days ago

  • Status changed from Review to Internal Test

Yes. I had to upgrade to the latest 6667i (after the recent rebase), reconvert and set a new baseline. Afterwards, I patched (8593a + 8363g), ensured only FWD-H2 1.46 is in deploy/lib and re-ran the tests. I left them running overnight.
I am looking at them now - everything PASSED.

The changes are good to go!

#35 Updated by Constantin Asofiei 4 days ago

  • Status changed from Internal Test to Merge Pending

#36 Updated by Greg Shah 4 days ago

You can go ahead with the merge of 8593a and 8363g.

#37 Updated by Constantin Asofiei 4 days ago

  • Status changed from Merge Pending to Test

8593a was merged to trunk rev 15169 and archived.
8593a_h2 was merged to fwd-h2 rev 46 and archived.

#38 Updated by Constantin Asofiei about 15 hours ago

  • Related to Bug #8703: NO-UNDO option at DMO must be removed added
