Project

General

Profile

Bug #7389

Make fast-copy statements cachable

Added by Alexandru Lungu about 1 year ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version_reported:
version_resolved:

Related issues

Related to Database - Bug #7351: Reduce SQL query string size of an INSERT INTO SELECT FROM Closed

History

#1 Updated by Alexandru Lungu about 1 year ago

Fast copy statements (INSERT INTO [...] SELECT FROM [...]) are quite often, especially as we have multiplex on tables and not separate tables for each DEFINE TEMP-TABLE.

However, in certain cases (when append option is used or normalized extent fields are used), an intermediate table is created named MASTER__PK__MAPPING. This allows mapping the old recid with the newly generated recid. Note that in append mode, only the records which don't violate any unique constraint are copied!

With MASTER__PK__MAPPING, the master table row copies and extent table row copies are done by joining this intermediate table. Similar semantics are for BEFORE__PK__MAPPING and LIST__INDEX__TABLE.

The concern here is that:
  • MASTER__PK__MAPPING is created before this process and dropped after. This forces the prepared statement to be reparsed each time re-used. psCache has several such statements, but even if they are cached, a reparsing is done when the cache is hit (because we are talking about another MASTER__PK__MAPPING table object each time).
  • Forcing out reparsing may increase performance as it avoids checking each time if a specific statement should be reparsed or not. For statements over tables with >100 columns, this checking is significant.
  • We've tried in #7351 to make unique MASTER__PK__MAPPING by appending a uid at the end. This way, we can force out reparsing as database objects will always have different names. However, this really flooded the cache with INSERT INTO SELECT FROM statements, which are large by nature and can't be ever reused.

One option is to avoid caching such statements altogether, but it feels quite hacky and not optimal.

The best to identify which is the main trigger for such MASTER__PK__MAPPING: append mode or normalized extent fields, so we can focus on the right scenario. I have a feeling that normalized extent fields are causing trouble here, but I will provide an update from a large customer application here.

#2 Updated by Alexandru Lungu about 1 year ago

  • Related to Bug #7351: Reduce SQL query string size of an INSERT INTO SELECT FROM added

#3 Updated by Dănuț Filimon about 1 year ago

I tested a large customer application (~30 minutes) and managed to gather the following data about the number and type of the insert statements executed:
Type Total No append or loose mode Append mode Loose mode
Fast insert 4504 4504 0 0
Extent insert 361 357 4 0
Simple extent insert 0 0 0 0
Shadow insert 1081 0 1081 0
Please note that:
  • Fast insert - generated by FastCopyHelper.getFastTableCopySql
  • Extent insert - generated by FastCopyHelper.getExtentInsertSql
  • Simple extent insert - generated by FastCopyHelper.getSimpleExtentInsertSql
  • Shadow insert - generated by FastCopyHelper.getShadowCopySql

From the inserts mentioned above, only fast insert statements are the ones that don't use MASTER_PK_MAPPING/BEFORE_PK_MAPPING/LIST__INDEX__TABLE, so this means that ~24.25% from the total use this mapping. Other interesting findings are that ~81.75% don't use any mode, while ~18.25% use append mode and 0% use loose mode.

#4 Updated by Alexandru Lungu about 1 year ago

  • Status changed from New to WIP
  • % Done changed from 0 to 20

This basically means that we should focus on the append mode: this is the one that usually triggers MASTER_PK_MAPPING (at least for your application). I don't think we should rule out extents; AFAIK, there are several applications rather using extents in temp-tables with copy.

I uploaded testcases/uast/fast-copy with the fast-copy test when I first designed FastCopyHelper. They are mostly profiling tests. You can test different scenarios with different kind of tables. Note that large tables / large populates are working very slow, so you might tune these values on your environment.

Notes:
  • if you have append and (extent or before) case, you shall stick with MASTER_PK_MAPPING as this will persist the pk mapping in the database for several insert into ... select from.
  • there are several places where such database-level mapping is done, so I don't think we have an easy task here:
    • If append: generate MASTER_PK_MAPPING because when we will copy extents / before tables, we won't be able to identify the right mapping (note that some records may be skipped due to unique constraint violation).
    • If extent: we simply need / generate MASTER_PK_MAPPING and do the copy using it.
    • If before: the same as extent, we need / generate MASTER_PK_MAPPING and do the copy (and before extent copy)
    • If we need destRecId: we compute the MASTER_PK_MAPPING to use it with srcRecId - I think this can be optimized

In general, MASTER_PK_MAPPING is good as you can execute copy-temp-table over the same table (with append eventually). So you should avoid an infinite recursion when doing INSERT INTO SELECT FROM. I doubt that WITH will help us here, as it is a view, not a materialized table. I think we should pay more attentions to #7382: where we can keep a single MASTER_PK_MAPPING which we are clearing, considering we have single-user _temp databases.

The only idea I have here is for append without extent and before: we can "inline" the MASTER_PK_MAPPING into the where clause and add a new condition (select only the rows with recid < ?, where ? is the last pk from that table). This way, we avoid any infinite loop. Note that this works only before copying the results in the master table. Also, we need to make sure we don't need that mapping afterward for extents, before table or destRecId (set postCopyPkMapping on false).

Please start your investigation from FastCopyHelper.executeCopy: it does the copy step-by-step (master, extents, before, before extents, destRecId). You can use a noMappingNeeded flag and send it to copyMasterTable. Let it decide if it can use an eventual safeTableCopyNoMapping. Profile with test-append.p.

#5 Updated by Alexandru Lungu about 1 year ago

I have the following quite nice test-case. Please make sure the final answer is still 1 2. I am a bit surprised that this actually works now with fast-copy helper ... congrats to me :) Never-mind, this doesn't use fast-copy.

def temp-table tt field f1 as INTEGER 
                  field f2 as INTEGER
                  index idx1 f1 f2 asc.

def temp-table tt2 field f1 as INTEGER 
                  field f2 as INTEGER 
                  index idx1 is unique f1.

do transaction:                 
    create tt.
    tt.f1 = 1.
    tt.f2 = 2.

    create tt.
    tt.f1 = 1.
    tt.f2 = 3.
end.

temp-table tt2:handle:copy-temp-table(temp-table tt:handle, true).

for each tt2:
    message tt2.f1 tt2.f2.
end.

The idea is that the second record 1 3 is not inserted, as 1 2 already matches the unique constraint. I think the idea in #7389-4 is still OK, as the unique constraints checks (made as subselects) are not depending on what was inserted / what was there before insert.

#6 Updated by Dănuț Filimon about 1 year ago

Committed 7389a.rev14593. Added FastCopyHelper.safeTableCopyNoMapping that doesn't use mapping.
At the moment, the copy statement is executed multiple times and I am investigating this issue. It may be related to the multiplex or sorting, but I am not sure.

Please review.

#7 Updated by Alexandru Lungu about 1 year ago

  • % Done changed from 20 to 80

Review of 7389a.rev14593

  • I am OK with the changes.
  • Please also consider srcRecId when ruling our pk mappings. If this parameter is set, you will need a mapping.

Danut, could you track down the issue and reach a stable state? I intend to start profiling this solution this week.

#8 Updated by Dănuț Filimon about 1 year ago

Committed 7389a.rev14594. Fixed the problem mentioned in #7389-6. It was caused by missing records that were not copied in tables that use the same buffer but have different multiplex values. The recid clause should use the source buffer primary key and not the one of the destination buffer.

Alexandru Lungu wrote:

Please also consider srcRecId when ruling our pk mappings. If this parameter is set, you will need a mapping.

Done.

Now I am going to do the same test I did in #7389-3 and see how many times FastCopyHelper.safeTableCopyNoMapping produces the new statement.

#9 Updated by Dănuț Filimon about 1 year ago

Here is an update of the test from #7389-3:
Type Total No append or loose mode Append mode Loose mode
Fast insert 4490 4490 0 0
Insert, no mapping 1061 0 1061 0
Extent insert 377 375 2 0
Simple extent insert 0 0 0 0
Shadow insert 2 0 2 0

The new insert method replaces almost all shadow inserts.

#10 Updated by Alexandru Lungu about 1 year ago

  • % Done changed from 80 to 100
  • Status changed from WIP to Review

This is good. I am extracting your changes and add them to my profiling branch. I am curious of the improvement. However, I expect even more improvement once #7404 is ready, as there will be even more append copies which will be optimized by your changes. If the profiling doesn't show any performance regression, the changes are good to go.

#11 Updated by Alexandru Lungu about 1 year ago

The changes are good. However, I think this is quite related to #7404. Danut, please move 7389a changes to 7404a - I will like to profile them together. Back there, all replace-mode are converted into append-mode if the target is empty. I have cases where there are "tons" of replace modes, so it will make sense to have append faster (as the replace will be converted into appends).

#12 Updated by Dănuț Filimon about 1 year ago

Alexandru Lungu wrote:

The changes are good. However, I think this is quite related to #7404. Danut, please move 7389a changes to 7404a - I will like to profile them together. ...

Committed 7404a/rev.14630. I moved 7389a/rev.14593 and 14594 to the specified revision.

#13 Updated by Alexandru Lungu about 1 year ago

  • Status changed from Review to Test

Great. I will notify here when 7404a will be eventually merged.

#14 Updated by Alexandru Lungu about 1 year ago

  • Assignee set to Dănuț Filimon

#15 Updated by Alexandru Lungu 10 months ago

Committed 7404a to trunk as rev. 14730.

#16 Updated by Alexandru Lungu 6 months ago

This can be closed.

#17 Updated by Greg Shah 6 months ago

  • Status changed from Test to Closed

Also available in: Atom PDF