Bug #7389
Make fast-copy statements cachable
100%
Related issues
History
#1 Updated by Alexandru Lungu about 1 year ago
Fast copy statements (INSERT INTO [...] SELECT FROM [...]) are quite often, especially as we have multiplex on tables and not separate tables for each DEFINE TEMP-TABLE.
However, in certain cases (when append option is used or normalized extent fields are used), an intermediate table is created named MASTER__PK__MAPPING
. This allows mapping the old recid with the newly generated recid. Note that in append mode, only the records which don't violate any unique constraint are copied!
With MASTER__PK__MAPPING
, the master table row copies and extent table row copies are done by joining this intermediate table. Similar semantics are for BEFORE__PK__MAPPING
and LIST__INDEX__TABLE
.
MASTER__PK__MAPPING
is created before this process and dropped after. This forces the prepared statement to be reparsed each time re-used.psCache
has several such statements, but even if they are cached, a reparsing is done when the cache is hit (because we are talking about anotherMASTER__PK__MAPPING
table object each time).- Forcing out reparsing may increase performance as it avoids checking each time if a specific statement should be reparsed or not. For statements over tables with >100 columns, this checking is significant.
- We've tried in #7351 to make unique
MASTER__PK__MAPPING
by appending auid
at the end. This way, we can force out reparsing as database objects will always have different names. However, this really flooded the cache withINSERT INTO SELECT FROM
statements, which are large by nature and can't be ever reused.
One option is to avoid caching such statements altogether, but it feels quite hacky and not optimal.
The best to identify which is the main trigger for such MASTER__PK__MAPPING
: append mode or normalized extent fields, so we can focus on the right scenario. I have a feeling that normalized extent fields are causing trouble here, but I will provide an update from a large customer application here.
#2 Updated by Alexandru Lungu about 1 year ago
- Related to Bug #7351: Reduce SQL query string size of an INSERT INTO SELECT FROM added
#3 Updated by Dănuț Filimon about 1 year ago
Type | Total | No append or loose mode | Append mode | Loose mode |
---|---|---|---|---|
Fast insert | 4504 | 4504 | 0 | 0 |
Extent insert | 361 | 357 | 4 | 0 |
Simple extent insert | 0 | 0 | 0 | 0 |
Shadow insert | 1081 | 0 | 1081 | 0 |
- Fast insert - generated by
FastCopyHelper.getFastTableCopySql
- Extent insert - generated by
FastCopyHelper.getExtentInsertSql
- Simple extent insert - generated by
FastCopyHelper.getSimpleExtentInsertSql
- Shadow insert - generated by
FastCopyHelper.getShadowCopySql
From the inserts mentioned above, only fast insert statements are the ones that don't use MASTER_PK_MAPPING/BEFORE_PK_MAPPING/LIST__INDEX__TABLE
, so this means that ~24.25% from the total use this mapping. Other interesting findings are that ~81.75% don't use any mode, while ~18.25% use append mode and 0% use loose mode.
#4 Updated by Alexandru Lungu about 1 year ago
- Status changed from New to WIP
- % Done changed from 0 to 20
This basically means that we should focus on the append mode: this is the one that usually triggers MASTER_PK_MAPPING
(at least for your application). I don't think we should rule out extents; AFAIK, there are several applications rather using extents in temp-tables with copy.
I uploaded testcases/uast/fast-copy
with the fast-copy test when I first designed FastCopyHelper
. They are mostly profiling tests. You can test different scenarios with different kind of tables. Note that large tables / large populates are working very slow, so you might tune these values on your environment.
- if you have
append
and (extent
orbefore
) case, you shall stick withMASTER_PK_MAPPING
as this will persist the pk mapping in the database for severalinsert into ... select from
. - there are several places where such database-level mapping is done, so I don't think we have an easy task here:
- If append: generate
MASTER_PK_MAPPING
because when we will copy extents / before tables, we won't be able to identify the right mapping (note that some records may be skipped due to unique constraint violation). - If extent: we simply need / generate
MASTER_PK_MAPPING
and do the copy using it. - If before: the same as extent, we need / generate
MASTER_PK_MAPPING
and do the copy (and before extent copy) - If we need destRecId: we compute the
MASTER_PK_MAPPING
to use it withsrcRecId
- I think this can be optimized
- If append: generate
In general, MASTER_PK_MAPPING
is good as you can execute copy-temp-table
over the same table (with append eventually). So you should avoid an infinite recursion when doing INSERT INTO SELECT FROM
. I doubt that WITH
will help us here, as it is a view, not a materialized table. I think we should pay more attentions to #7382: where we can keep a single MASTER_PK_MAPPING
which we are clearing, considering we have single-user _temp databases.
The only idea I have here is for append
without extent and before: we can "inline" the MASTER_PK_MAPPING
into the where clause and add a new condition (select only the rows with recid < ?, where ? is the last pk from that table). This way, we avoid any infinite loop. Note that this works only before copying the results in the master table. Also, we need to make sure we don't need that mapping afterward for extents, before table or destRecId
(set postCopyPkMapping
on false
).
Please start your investigation from FastCopyHelper.executeCopy
: it does the copy step-by-step (master, extents, before, before extents, destRecId). You can use a noMappingNeeded
flag and send it to copyMasterTable
. Let it decide if it can use an eventual safeTableCopyNoMapping
. Profile with test-append.p
.
#5 Updated by Alexandru Lungu about 1 year ago
I have the following quite nice test-case. Please make sure the final answer is still 1 2
. I am a bit surprised that this actually works now with fast-copy helper ... congrats to me :) Never-mind, this doesn't use fast-copy.
def temp-table tt field f1 as INTEGER field f2 as INTEGER index idx1 f1 f2 asc. def temp-table tt2 field f1 as INTEGER field f2 as INTEGER index idx1 is unique f1. do transaction: create tt. tt.f1 = 1. tt.f2 = 2. create tt. tt.f1 = 1. tt.f2 = 3. end. temp-table tt2:handle:copy-temp-table(temp-table tt:handle, true). for each tt2: message tt2.f1 tt2.f2. end.
The idea is that the second record 1 3
is not inserted, as 1 2
already matches the unique constraint. I think the idea in #7389-4 is still OK, as the unique constraints checks (made as subselects) are not depending on what was inserted / what was there before insert.
#6 Updated by Dănuț Filimon about 1 year ago
Committed 7389a.rev14593. Added FastCopyHelper.safeTableCopyNoMapping
that doesn't use mapping.
At the moment, the copy statement is executed multiple times and I am investigating this issue. It may be related to the multiplex or sorting, but I am not sure.
Please review.
#7 Updated by Alexandru Lungu about 1 year ago
- % Done changed from 20 to 80
Review of 7389a.rev14593
- I am OK with the changes.
- Please also consider
srcRecId
when ruling our pk mappings. If this parameter is set, you will need a mapping.
Danut, could you track down the issue and reach a stable state? I intend to start profiling this solution this week.
#8 Updated by Dănuț Filimon about 1 year ago
Committed 7389a.rev14594. Fixed the problem mentioned in #7389-6. It was caused by missing records that were not copied in tables that use the same buffer but have different multiplex values. The recid clause should use the source buffer primary key and not the one of the destination buffer.
Alexandru Lungu wrote:
Please also consider
srcRecId
when ruling our pk mappings. If this parameter is set, you will need a mapping.
Done.
Now I am going to do the same test I did in #7389-3 and see how many times FastCopyHelper.safeTableCopyNoMapping
produces the new statement.
#9 Updated by Dănuț Filimon about 1 year ago
Type | Total | No append or loose mode | Append mode | Loose mode |
---|---|---|---|---|
Fast insert | 4490 | 4490 | 0 | 0 |
Insert, no mapping | 1061 | 0 | 1061 | 0 |
Extent insert | 377 | 375 | 2 | 0 |
Simple extent insert | 0 | 0 | 0 | 0 |
Shadow insert | 2 | 0 | 2 | 0 |
The new insert method replaces almost all shadow inserts.
#10 Updated by Alexandru Lungu about 1 year ago
- % Done changed from 80 to 100
- Status changed from WIP to Review
This is good. I am extracting your changes and add them to my profiling branch. I am curious of the improvement. However, I expect even more improvement once #7404 is ready, as there will be even more append copies which will be optimized by your changes. If the profiling doesn't show any performance regression, the changes are good to go.
#11 Updated by Alexandru Lungu about 1 year ago
The changes are good. However, I think this is quite related to #7404. Danut, please move 7389a changes to 7404a - I will like to profile them together. Back there, all replace-mode are converted into append-mode if the target is empty. I have cases where there are "tons" of replace modes, so it will make sense to have append faster (as the replace will be converted into appends).
#12 Updated by Dănuț Filimon about 1 year ago
Alexandru Lungu wrote:
The changes are good. However, I think this is quite related to #7404. Danut, please move 7389a changes to 7404a - I will like to profile them together. ...
Committed 7404a/rev.14630. I moved 7389a/rev.14593 and 14594 to the specified revision.
#13 Updated by Alexandru Lungu about 1 year ago
- Status changed from Review to Test
Great. I will notify here when 7404a will be eventually merged.
#14 Updated by Alexandru Lungu about 1 year ago
- Assignee set to Dănuț Filimon
#15 Updated by Alexandru Lungu 10 months ago
Committed 7404a to trunk as rev. 14730.
#16 Updated by Alexandru Lungu 6 months ago
This can be closed.