Feature #6821
Java collection performance
#1 Updated by Constantin Asofiei over 1 year ago
For example, using a Java for-each loop over an ArrayList, instead of a for loop that gets the elements directly, is more expensive: more objects will be allocated and hasNext() will be called on each step.
Another example is using Map.entrySet().iterator() when only the value or the key will be accessed.
In a large application, a next() method of an Iterator is being called some 60 million times, for various lists/maps/sets. We need to analyze the iterator usage and improve/reduce it.
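To make the comparison concrete, here is a minimal standalone sketch (not FWD code) of the two loop forms being discussed: the for-each form compiles to an iterator() call, allocating an Iterator and invoking hasNext()/next() per element, while the indexed form allocates nothing extra.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopSketch
{
   // for-each form: javac compiles this to list.iterator(), allocating an
   // Iterator object and calling hasNext()/next() once per element
   static int sumForEach(List<Integer> list)
   {
      int sum = 0;
      for (int v : list)
      {
         sum += v;
      }
      return sum;
   }

   // indexed form: no iterator allocation; get(i) is O(1) for ArrayList
   static int sumIndexed(ArrayList<Integer> list)
   {
      int sum = 0;
      for (int i = 0; i < list.size(); i++)
      {
         sum += list.get(i);
      }
      return sum;
   }
}
```

Both forms are semantically equivalent for an ArrayList; the difference is purely in per-iteration allocation and call overhead.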
#2 Updated by Constantin Asofiei over 1 year ago
Some changes are in 3821c/14272. Greg, please review.
#3 Updated by Constantin Asofiei over 1 year ago
Eric: you may want to take a look at ScopedList, too.
#4 Updated by Constantin Asofiei over 1 year ago
Some notes about the profiling of a large customer application; this includes only JRE or FWD runtime calls:
Related to map access (hashCode calls):

String.hashCode() - 45 million calls (mostly from Map.get/put/containsKey)
   Utils.invoke - 16 million
   TableMapper$LegacyFieldInfo.getMutableInfo - 1.8 million
   TemporaryBuffer$ReferenceProxy.bind - 1.25 million
   TableMapper$TempTableMapper.mapClass - 1 million
   HandleChain.delete - 880k
   FieldReference.getGetter/getSetter/getExtent - 600k
   DmoMetadataManager.getDmoInfo(String) - 537k
   SQLQuery.hydrateRecord - 440k
   TriggerTracker.isTriggerEnabled - 426k
   TextOps.matchesList - 225k
   TemporaryBuffer.createPropsMap - 180k
   BaseDataType$Type.of(String) - 80k
HashMap.get - 20 million calls
   String.hashCode - 7 million calls
   Method.hashCode - 8 million (will be solved by Utils.invoke in #6819)
   TableMapper.getMutableInfo - 1.8 million
iterator() calls:

ArrayList.iterator - 2 million calls - replace with basic 'for' loop

IdentityHashMap$Values.iterator() - 8 million calls - 7.8 million from BufferManager.deregisterBindingBuffer

ArrayDeque.iterator() - 1.1 million calls
   DatabaseTriggerManager.peekTrigger - 400k
   DatabaseTriggerManager.getTriggerStatus - 160k
   WidgetPool$WorkArea.scopeStart/scopeFinished - 180k
   ObjectOps.isInitializing - 100k
   ProcedureManager$WorkArea.getStackEntry - 60k
   ProcedureManager$WorkArea.searchInStack - 31k
next() calls:

HashMap$ValueIterator.next() - 20 million calls
   Session.invalidateRecords - 20 million

IdentityHashMap$ValueIterator.next() - ~21 million calls
   BufferManager.deregisterBindingBuffer - ~21 million - CA: already fixed in 6129b

ArrayList$Itr.next() - 7 million calls (~80% is in H2 code)
   AnnotatedAst.notifyListenerLevelChanged - 230k
   ProcedureManager$WorkArea.scopeFinished - 136k
   BufferImpl.attachDataSource - 130k
   PreselectQuery.getRecordBuffers - 113k
   Collections$UnmodifiableCollection - 97k (from RandomAccessQuery.executeImpl, FastCopyHelper.processProperties, and more)
   TransactionManager.processCommit - 96k
   TransactionManager.processValidate - 96k
   DynamicTablesHelper$CacheKey.hashCode - 92k
   DynamicTablesHelper$CacheKey.equals - 184k
   PreselectQuery.verifyJoins - 63k
   AbstractParameter.processAssignments - 56k

IdentityHashMap$EntryIterator.next() - ~5.8 million calls
   BufferManager.scopeStart - ~5.8 million calls

ArrayDeque$DeqIterator.next() - 2.2 million calls
   ObjectOps.isInitializing - 1.4 million
   WidgetPool$WorkArea.scopeFinished/scopeStart - 180k
   ObjectOps.delete - 171k
   ProcedureManager.getStackEntry - 114k
   ObjectOps.instanceAssigned - 70k
   ArrayAssigner$WorkArea.isRegistered - 52k

LinkedHashMap$LinkedEntryIterator.next() - 1.8 million calls
   AnnotatedAst.duplicateImpl - 980k
   SQLQuery.hydrateRecord - 460k
   BufferImpl.attachDataSource - 127k
   FqlToSqlConverter.expandAlis - 80k
   BufferImpl.fill - 80k
   Persister.update - 54k
#5 Updated by Greg Shah over 1 year ago
- Assignee set to Stanislav Lomany
#6 Updated by Stanislav Lomany over 1 year ago
- Status changed from New to WIP
Constantin,
For example, using a Java for-each loop over an ArrayList, instead of a for loop that gets the elements directly, is more expensive: more objects will be allocated and hasNext() will be called on each step.
I assume we're talking about replacing

   for (Buffer buf : buffers)

with

   for (int i = 0; i < buffers.size(); i++)
   {
      Buffer buf = buffers.get(i);
      ...
   }

if buffers is an ArrayList. I found that there are not many places where the iterated list is explicitly an ArrayList; however, there are many cases where it effectively is one - simple analysis shows that an ArrayList is always supplied, usually as a function input parameter or the return value of another function. Should we replace those cases as well? Performance-wise it may be a good idea, but an input parameter / return value has the generic List type, and what if another List implementation is passed, like LinkedList, which has poor get() performance?
Another example is using Map.entrySet().iterator() when only the value or the key will be accessed.

I don't think there's much difference. HashMap.EntrySet, HashMap.KeySet and HashMap.Values are similar wrappers for the map nodes with no internal state. And performance-wise they're similar too:

   final class KeyIterator extends HashIterator implements Iterator<K>
   {
      public final K next() { return nextNode().key; }
   }

   final class ValueIterator extends HashIterator implements Iterator<V>
   {
      public final V next() { return nextNode().value; }
   }

   final class EntryIterator extends HashIterator implements Iterator<Map.Entry<K,V>>
   {
      public final Map.Entry<K,V> next() { return nextNode(); }
   }
#7 Updated by Constantin Asofiei over 1 year ago
Stanislav Lomany wrote:
I assume we're talking about replacing

Correct.

I found that there are not many places where the iterated list is explicitly an ArrayList; however, there are many cases where it effectively is one - simple analysis shows that an ArrayList is always supplied, usually as a function input parameter or the return value of another function. Should we replace those cases as well? Performance-wise it may be a good idea, but an input parameter / return value has the generic List type, and what if another List implementation is passed, like LinkedList, which has poor get() performance?
If there is a Java method with a List argument that iterates the list, and all calls receive an ArrayList, then just change the parameter type to ArrayList and use a for counter.
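Where narrowing the parameter type is not practical, one hedged alternative (my sketch, not part of the actual patch) is to key the loop style off the java.util.RandomAccess marker interface, so a LinkedList argument still takes the iterator path and is not degraded to O(n^2):

```java
import java.util.List;
import java.util.RandomAccess;
import java.util.function.Consumer;

public class SafeIteration
{
   // Use an indexed loop only when get(i) is known to be O(1); otherwise
   // fall back to the iterator so LinkedList keeps O(n) total traversal.
   static <T> void forEachFast(List<T> list, Consumer<T> action)
   {
      if (list instanceof RandomAccess)
      {
         for (int i = 0; i < list.size(); i++)
         {
            action.accept(list.get(i));
         }
      }
      else
      {
         for (T item : list)
         {
            action.accept(item);
         }
      }
   }
}
```

This keeps the generic List signature while still avoiding iterator allocation for the common ArrayList case.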
Another example is using
Map.entrySet().iterator()
when only the value or the key will be accessed.
The point with all the Map and Set iterators is that they allocate lots of short-lived objects.
Something else to consider; please check this test in a profiler:
procedure proc0.
   def input param i as int.
end.

function func0 returns int (input i as int).
   return i.
end.

def var i as int.
def var l1 as int64.
def var l2 as int64.

l1 = etime.
do i = 1 to 1000000:
   run proc0(i).
end.
l2 = etime.
message (l2 - l1) "ms".

l1 = etime.
do i = 1 to 1000000:
   dynamic-function("func0", i).
end.
l2 = etime.
message (l2 - l1) "ms".
You will see that the Scopeable scopestart/finished do lots of work - please check if it may be possible to condense two or more dictionaries/stacks into a single one, where the element encapsulates what we currently have in separate dictionary/stack.
#8 Updated by Igor Skornyakov over 1 year ago
Actually, I doubt that such optimization will result in any significant performance improvement in all situations. See e.g. the discussion here - https://stackoverflow.com/questions/256859/is-there-a-performance-difference-between-a-for-loop-and-a-for-each-loop.
In any case, I believe that generally, it is not a good idea to perform optimization relying just on common sense. It is very easy to overlook some nuances such as an impact of a JIT compiler or the JRE version. I would perform benchmarking/profiling first.
#9 Updated by Greg Shah over 1 year ago
I would perform benchmarking/profiling first.
This is what was done. See #6821-4.
#10 Updated by Stanislav Lomany over 1 year ago
Actually, I doubt that such optimization will result in any significant performance in all situations.
I did some testing, and in this particular ArrayList case the suggested approach is 50% faster and has no memory implications compared to some GC work to garbage collect iterators.
#11 Updated by Stanislav Lomany over 1 year ago
The point with all the Map and Set iterators is that they allocate lots of short-lived objects.
That's true. I mean, are you suggesting to replace Map.entrySet().iterator() with Map.keySet()/values().iterator()? If yes, then I'm telling you there's no difference between these iterators memory- or performance-wise.
#12 Updated by Igor Skornyakov over 1 year ago
Greg Shah wrote:
I would perform benchmarking/profiling first.
This is what was done. See #6821-4.
The number of calls says nothing about potential performance improvement after the suggested changes. I've performed multiple profiling sessions and cannot remember ever seeing any of the iterator() or next() calls mentioned in #6821-4 as CPU hotspots, despite the huge number of such calls.
In addition, as far as I understand, String.hashCode() can hardly cause any fixable performance issues, since the corresponding value is cached. Well, in some situations HashMap can be less efficient than e.g. TreeMap because of e.g. a big number of collisions, but that is a completely different story.
I believe that the big number of old-style synchronization primitives from the pre-java.util.concurrent era can be a more promising target for optimization.
#13 Updated by Greg Shah over 1 year ago
Small improvements over millions of calls add up. The fact that the profiling tools don't show obvious hotspots means that there is not a single cause or a small list of causes, but instead we must implement efficiencies in a wider range of locations. We have already made measurable improvements with this specific technique and as Stanislav notes he sees additional improvements are possible.
If we need to switch data structures to gain efficiencies, we will do so.
We are pursuing many different approaches in parallel (see #4061 for base language and #4011 for persistence). No one single task will be the full answer. I expect this phase of work to require improvements from approximately 15-20 different tasks.
I believe that a big number of the old-style synchronization primitives from the pre-java.util.concurrent era can be a more promising target for optimization.
I am open to specific recommendations here. Please create a new task related to the old-style synch primitives and document your proposal. We will discuss it further there.
#14 Updated by Igor Skornyakov over 1 year ago
Stanislav Lomany wrote:
Actually, I doubt that such optimization will result in any significant performance in all situations.
I did some testing, and in this particular ArrayList case the suggested approach is 50% faster and has no memory implications compared to some GC work to garbage collect iterators.
I do believe that one can see some performance boost with the suggested optimization, especially in standalone artificial tests. What I doubt is that replacing all for-each loops with for loops can result in an overall performance improvement significant enough to justify the effort and the loss of code clarity.
#15 Updated by Stanislav Lomany over 1 year ago
That may be true. There are about 2k for-each loops, and about 10% of them can be updated. If we want, we can limit the optimization to specific packages, like p2j/persist, p2j/ui, p2j/util.
Greg?
#16 Updated by Greg Shah over 1 year ago
The ones that are most important are listed in #6821-4. We aren't suggesting changing things everywhere.
#17 Updated by Constantin Asofiei over 1 year ago
- File Back-traces.csv added
For ArrayList.iterator(), the top callers are:
   PreselectQuery.components() - 430k
   AnnotatedAst.notifyListenerLevelChanged() - 170k
   HQLBundle.statements - via Collections's unmodifiable collection - 100k
   ErrorManager.nestedSilent - 120k
   TransactionManager.stopOffEndRegistration - via Collection.addAll - 40k
   ProcedureManager$WorkArea.scopeFinished - 36k
   ScrollableResults.get() - 30k
   DataSet.clearImpl - 40k
   AbstractParameter$Scope.processAssignments - 26k
See the attached file for a more comprehensive report after a single run of some tests in a large customer app. A significant part is in H2, too.
A part of the goal here is to reduce the memory footprint of the iterator usage - for ArrayList, this can be solved by using a for counter instead of a for-each.
For the other cases - some analysis is needed to check if the reason for the large number of iterations is in some other place (or if it can be solved in some other ways).
#18 Updated by Stanislav Lomany over 1 year ago
- File 6821.diff added
Please review the diff which replaces "for-each" loops with indexed "for" loops (I'll add headers later), specifically:
- you may consider changes in some files unnecessary;
- in EventList, EventList.events is normally an ArrayList but could be a LinkedList in a single case. I replaced LinkedList with ArrayList as a faster implementation for EventList needs.
#19 Updated by Stanislav Lomany over 1 year ago
Related to map access (hashCode calls):
HandleChain.delete - 880k

Maybe we don't need to perform first/lastResource.get(type), because we can determine if the resource is first/last in the handle chain by checking its HandleChain.prev/nextSibling?

   HandleChain first = wa.firstResource.get(type);
   HandleChain last  = wa.lastResource.get(type);

   if (this == first)
   {
      wa.firstResource.put(type, this.nextSibling);
   }
   if (this == last)
   {
      wa.lastResource.put(type, this.prevSibling);
   }
#20 Updated by Constantin Asofiei over 1 year ago
Stanislav Lomany wrote:
Please review the diff which replaces "for-each" loops with indexed "for" loops (I'll add headers later), specifically:
- you may consider changes in some files unnecessary;
- In
EventList
,EventList.events
is normally anArrayList
but could be aLinkedList
in a single case. I replacedLinkedList
withArrayList
as a faster implementation forEventList
needs.
What branch and revision did you use for this patch?
#21 Updated by Constantin Asofiei over 1 year ago
Stanislav Lomany wrote:

Related to map access (hashCode calls):
HandleChain.delete - 880k

Maybe we don't need to perform first/lastResource.get(type), because we can determine if the resource is first/last in the handle chain by checking its HandleChain.prev/nextSibling?
[...]

I think you are right:
- if prevSibling is null, this must be the first resource being deleted, so nextSibling becomes the first resource
- if nextSibling is null, this must be the last resource being deleted, so prevSibling becomes the last resource

You also need a check for the case when this is the single resource in the chain being deleted; in that case firstResource/lastResource must remove it completely.
Please check some standalone tests.
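A minimal standalone sketch of the sibling-based bookkeeping being discussed (a hypothetical, simplified chain, not the real HandleChain class), including the single-resource case:

```java
import java.util.HashMap;
import java.util.Map;

public class ChainSketch
{
   static class Node
   {
      Node prevSibling;
      Node nextSibling;
   }

   static final Map<String, Node> firstResource = new HashMap<>();
   static final Map<String, Node> lastResource  = new HashMap<>();

   // Delete without firstResource/lastResource lookups for the common
   // middle-of-chain case: a null sibling tells us we are at an end.
   static void delete(String type, Node n)
   {
      if (n.prevSibling == null && n.nextSibling == null)
      {
         // single resource in the chain: remove the map entries entirely
         firstResource.remove(type);
         lastResource.remove(type);
      }
      else if (n.prevSibling == null)
      {
         firstResource.put(type, n.nextSibling);
         n.nextSibling.prevSibling = null;
      }
      else if (n.nextSibling == null)
      {
         lastResource.put(type, n.prevSibling);
         n.prevSibling.nextSibling = null;
      }
      else
      {
         n.prevSibling.nextSibling = n.nextSibling;
         n.nextSibling.prevSibling = n.prevSibling;
      }
   }
}
```

The map lookups (and thus the String.hashCode calls) now happen only when an end of the chain is actually removed.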
#22 Updated by Stanislav Lomany over 1 year ago
what branch and revision did you use for this patch?
3821c rev 14467.
#23 Updated by Constantin Asofiei over 1 year ago
- File CheckForEach.java added
Stanislav Lomany wrote:
what branch and revision did you use for this patch?
3821c rev 14467.
The patch is OK, just a minor formatting issue in EventList.intersects, with the closing braces on lines 2012/2013.
Attached is a space-delimited file with all FOR EACH loops which have a List, ArrayList or Java array as the expression. The format is:

   classqname:methodname expressionType line

Please look through this and fix cases in p2j.persist, goldencode.ast, and p2j.util. Use common sense to prioritize these; you don't have to fix them all (for example, I think the PayloadSerializer Java array usage is a good place to fix, and also the goldencode.ast cases). I would start with Java arrays first (look for [] in the text). This was run on 3821c/14468.
This file was generated with https://spoon.gforge.inria.fr/ - the file is attached.
#24 Updated by Constantin Asofiei over 1 year ago
Keep in mind that this:
IdentityHashMap$ValueIterator.next() - ~21 million calls
   BufferManager.deregisterBindingBuffer - ~21 million
is no longer an issue in 6129b.
#25 Updated by Constantin Asofiei over 1 year ago
- File foreach.csv added
Forgot to add the csv file.
#26 Updated by Constantin Asofiei over 1 year ago
Constantin Asofiei wrote:
Attached is a space-delimited file with all FOR EACH loops which have a List, ArrayList or Java array as the expression. The format is:
[...] This was run on 3821c/14468.
The report was generated with the patch in #6821-18 applied, so these are excluded.
#27 Updated by Stanislav Lomany over 1 year ago
You also need checks if this is the single resource in the chain being deleted, in this firstResource/lastResource must remove it completely.
Please check some standalone tests.
Committed as 3821c rev 14472. Please review. I did some standalone testing.
#28 Updated by Constantin Asofiei over 1 year ago
Stanislav Lomany wrote:
You also need a check for the case when this is the single resource in the chain being deleted; in that case firstResource/lastResource must remove it completely.
Please check some standalone tests.

Committed as 3821c rev 14472. Please review. I did some standalone testing.
Changes are OK.
Please backport this 3821c/14472 to 6129b and commit it there.
The same for the changes you make in this task, they need to be in 6129b, so please port them to this branch.
#29 Updated by Stanislav Lomany over 1 year ago
TableMapper$LegacyFieldInfo.getMutableInfo 1.8 million
This one looks like caching may be slightly improved, but I don't know how.
Session.invalidateRecords - 20 million
invalidateRecords has a TODO left:

   * TODO: this is a terribly inefficient algorithm. We scan the entire contents of the cache and compare
   * table name strings of each entry to find a subset of records (possibly none). If it is used often, we
   * need a better caching strategy to support this use case.

What do you think about maintaining a Map of table name -> set of records for this table? Entries are removed on record expiration events.
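A rough sketch of that idea (hypothetical names and shape; the real Session cache is more involved): keep a secondary index from table name to the cached record keys for that table, so invalidation no longer scans the whole cache:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CacheSketch
{
   // primary cache: record key -> record, plus a per-table index of keys
   static final Map<String, Object> cache = new HashMap<>();
   static final Map<String, Set<String>> byTable = new HashMap<>();

   static void put(String table, String key, Object record)
   {
      cache.put(key, record);
      byTable.computeIfAbsent(table, t -> new HashSet<>()).add(key);
   }

   // invalidate only that table's records instead of scanning the cache;
   // a record expiring from the cache would also have to be removed from
   // byTable (omitted here for brevity)
   static void invalidateRecords(String table)
   {
      Set<String> keys = byTable.remove(table);
      if (keys != null)
      {
         for (String key : keys)
         {
            cache.remove(key);
         }
      }
   }
}
```

The trade-off discussed below applies: every put now also pays for the index maintenance, which only pays off if invalidateRecords is hot.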
#30 Updated by Constantin Asofiei over 1 year ago
Stanislav Lomany wrote:
TableMapper$LegacyFieldInfo.getMutableInfo 1.8 million
This one looks like caching may be slightly improved, but I don't know how.
TableMapper is improved via #6825, don't touch that. I'll have a batch of changes today or tomorrow.
Session.invalidateRecords - 20 million
invalidateRecords has a TODO left:
[...]
What do you think about maintaining a Map of table name -> set of records for this table? Entries are removed on record expiration events.

The downside will be maintaining these separate sets of records, which will require map lookups and set management. But look at the invalidateRecords callers - maybe we can keep just a counter of records per table, and if there are no cached records for that table, just do nothing?
#31 Updated by Greg Shah over 1 year ago
All usage of Session.invalidateRecords() comes from RecordBuffer.pruneSessionCache(), which is only called in 2 places, both of which are in BufferImpl.acceptChanges().
Is a given DMO instance always specific to a Session? That DMO is specific to the table already and could be used to store a direct link to the Set of records in the cache. Something would still need to be done to handle the multiplex issue (an iteration of the set and removal of only those matches). Even that would be way faster than the current approach. The invalidation operation, where the set could simply be cleared, would be very fast for permanent tables.
#32 Updated by Stanislav Lomany over 1 year ago
The invalidateRecords method is called only for temp-tables, and only by the ACCEPT-CHANGES method (I'm interested in what app you were profiling, because I couldn't find its usages in what I have). So maintaining a map is an overhead for most applications. Now I understand why it was left as a TODO. I don't think a counter will help, because most likely we do have changes to commit for an ACCEPT-CHANGES call.

Greg, are you suggesting to maintain, for temp-tables, a map of RecordBuffer.dmoClass -> Set of records? Make a directory option to enable this extra mapping for a customer's app?
#33 Updated by Greg Shah over 1 year ago
Greg, are you suggesting to maintain for temp-tables a map of RecordBuffer.dmoClass -> Set of records?
Not exactly. I was suggesting that your idea of the mapping between table name and a Set of the records could be taken a little further, and the reference to the Set could be stored in the DMO such that it would be very fast to access/clear.
My real point here is that it is OK to do something more aggressive/complex if there is a big performance win. More sophisticated caching is certainly worth the effort. I did not analyze all the map maintenance, nor did I look at the various use patterns.
Make a directory option to enable this extra mapping for a customer's app?
Whatever we do it would be preferred to have the code be standard for all customers.
#34 Updated by Greg Shah over 1 year ago
One thing that is not obvious to me is how the cache is used. Since we are using temp-tables with this mechanism, it seems to me that the same RecordIdentifier
can be present under different multiplex ids. Do I understand correctly that only the most recently added entry will be kept and any older entry is overwritten? Is that intentional?
#35 Updated by Eric Faulhaber over 1 year ago
Greg Shah wrote:
Greg, are you suggesting to maintain for temp-tables a map of RecordBuffer.dmoClass -> Set of records?
Not exactly. I was suggesting that your idea of the mapping between table name and a Set of the records could be taken a little further, and the reference to the Set could be stored in the DMO such that it would be very fast to access/clear.
What do you mean specifically by "the DMO" in this context? An individual instance? The implementation class? Something else?
#36 Updated by Stanislav Lomany over 1 year ago
Do I understand correctly that only the most recently added entry will be kept and any older entry is overwritten? Is that intentional?
Looks like a good catch: it seems that a record with the same record ID but a different multiplex ID can be returned by the cache.
#37 Updated by Eric Faulhaber over 1 year ago
Greg Shah wrote:
One thing that is not obvious to me is how the cache is used. Since we are using temp-tables with this mechanism, it seems to me that the same
RecordIdentifier
can be present under different multiplex ids. Do I understand correctly that only the most recently added entry will be kept and any older entry is overwritten? Is that intentional?
You shouldn't have multiple DMO instances with different multiplex IDs but the same RecordIdentifier. Every RecordIdentifier should identify one and only one DMO instance in a user context (for temp-tables) or across the server (for persistent tables). If the multiplex ID is different between two temp-table DMO instances of the same type, the RecordIdentifier must be different as well; there can be only one. If we have drifted from this, it is a bug.
There are special exceptions to this rule for housekeeping DMO instances (like a snapshot copy taken as a reference for query navigation). But these special DMOs must never be stored in the session cache.
#38 Updated by Eric Faulhaber over 1 year ago
Stanislav Lomany wrote:
Do I understand correctly that only the most recently added entry will be kept and any older entry is overwritten? Is that intentional?
Looks like a good catch: it seems that a record with the same record ID but different multiplex ID it can be returned by the cache.
Are you making a theoretical statement, or are you actually seeing this in the wild?
#39 Updated by Stanislav Lomany over 1 year ago
I made an assumption that temp-table record IDs are not unique across multiplex IDs, which is wrong, as you're saying. No, I'm not seeing it.
#40 Updated by Greg Shah over 1 year ago
Do we assign a new rowid when we copy temp-table records from one temp-table to another, like in table parameter processing?
#41 Updated by Greg Shah over 1 year ago
Eric Faulhaber wrote:
Greg Shah wrote:
Greg, are you suggesting to maintain for temp-tables a map of RecordBuffer.dmoClass -> Set of records?
Not exactly. I was suggesting that your idea of the mapping between table name and a Set of the records could be taken a little further, and the reference to the Set could be stored in the DMO such that it would be very fast to access/clear.

What do you mean specifically by "the DMO" in this context? An individual instance? The implementation class? Something else?
An individual instance.
#42 Updated by Stanislav Lomany over 1 year ago
Please backport this 3821c/14472 to 6129b and commit it there.
Committed as 6129b rev 14343.
#43 Updated by Eric Faulhaber over 1 year ago
Greg Shah wrote:
Eric Faulhaber wrote:
Greg Shah wrote:
Greg, are you suggesting to maintain for temp-tables a map of RecordBuffer.dmoClass -> Set of records?
Not exactly. I was suggesting that your idea of the mapping between table name and a Set of the records could be taken a little further, and the reference to the Set could be stored in the DMO such that it would be very fast to access/clear.

What do you mean specifically by "the DMO" in this context? An individual instance? The implementation class? Something else?
An individual instance.
I REALLY dislike the idea of DMO instances having references to sets of other DMO instances. This sounds very messy. DMO instances are meant to be the endpoints of an object graph, not a set of branches with graphs of references to each other.
#44 Updated by Stanislav Lomany over 1 year ago
The "for" cycle changes reviewed by Constantin were merged and committed as 6129b rev 14344.
#45 Updated by Stanislav Lomany over 1 year ago
I would start with Java arrays first (look for [] in the text).
Constantin, are you suggesting to replace enhanced "for" loops with indexed loops for arrays?

   int[] arr = new int[10];
   for (int i : arr)

I'm afraid this can be even counterproductive: https://stackoverflow.com/questions/70583053/looping-over-array-performance-difference-between-indexed-and-enhanced-for-loop
#46 Updated by Constantin Asofiei over 1 year ago
Stanislav Lomany wrote:
I would start with Java arrays first (look for [] in the text).

Constantin, are you suggesting to replace enhanced "for" loops with indexed loops for arrays?
[...]
I'm afraid this can be even counterproductive: https://stackoverflow.com/questions/70583053/looping-over-array-performance-difference-between-indexed-and-enhanced-for-loop
Stanislav, you are correct, I got carried away with that.
#47 Updated by Greg Shah over 1 year ago
Eric Faulhaber wrote:
Greg Shah wrote:
Eric Faulhaber wrote:
Greg Shah wrote:
Greg, are you suggesting to maintain for temp-tables a map of RecordBuffer.dmoClass -> Set of records?
Not exactly. I was suggesting that your idea of the mapping between table name and a Set of the records could be taken a little further, and the reference to the Set could be stored in the DMO such that it would be very fast to access/clear.

What do you mean specifically by "the DMO" in this context? An individual instance? The implementation class? Something else?
An individual instance.
I REALLY dislike the idea of DMO instances having references to sets of other DMO instances. This sounds very messy. DMO instances are meant to be the endpoints of an object graph, not a set of branches with graphs of references to each other.
I was thinking of the DMO in its buffer form, not the concept of the specific record that is contained.
#48 Updated by Stanislav Lomany over 1 year ago
PreselectQuery.components() - 430k

Committed 6129b rev 14345: replaced the components iterator with ArrayList.get().
Please review.
Please review.
#49 Updated by Eric Faulhaber over 1 year ago
Greg Shah wrote:
I was thinking of the DMO in its buffer form, not the concept of the specific record that is contained.
It would be fairly straightforward to track all records which are or were at one point held by each buffer, but this wouldn't necessarily be all the records for that buffer's backing table. Other buffers could be defined for that same backing table, which could hold a different or overlapping set of records. If I understand the pruning case correctly, we need to invalidate the union of all these.
For this, we would need some more complex map/set management which would track the superset of all buffers for that table in the local context. It certainly could be done, and the prune operation itself most likely would be more efficient than the current mechanism. What I am concerned with is that we would be adding the overhead of constantly managing these sets of records for what seems like a very specific case of needing to prune them only for ACCEPT-CHANGES calls.
If this is only needed for temp-tables involved in a data set (Ovidiu, please verify this), we could reduce the cases where we are actively tracking these "prunable" records. If this is only needed for data sets, it seems we should manage this information at the data set level. As temp-tables are associated with a data set, they would be notified that they need to update this information when records flow through them.
Ovidiu, can you help me understand the purpose of the pruning? Is it functional, or is it about housekeeping to save memory or to manage the cache? Perhaps we can scope this overhead down even further, for instance, if it can be limited to a certain operation. Should we be temporarily disabling caching while some operation occurs, rather than trying to clean up the cache after that operation? Now that we know there is a bottleneck in the pruning, let's consider whether we can avoid it altogether, while still performing the functionality we need.
#50 Updated by Ovidiu Maxiniuc over 1 year ago
pruneSessionCache() is called on afterBuf and beforeBuf after a batch operation executed using SQL directly on the database. Since this bypasses the FWD caching mechanism, the latter must be invalidated for the affected table, or else the information in the cache(s) might not be correct. The afterBuf data is altered by an UPDATE statement and the beforeBuf by a DELETE, because ACCEPT-CHANGES basically makes the after table "definitive" by simply deleting the "history" (it drops all before images and the now-invalid links to them). Since we do not have direct fine control over which records were altered, the whole tables are invalidated. Ultimately, we could iterate the tables individually and process (delete/update then invalidate) the records individually, but this would affect performance.
Note that this call is only valid for BEFORE-TABLEs and only temp-tables are affected. If the method is called on a permanent table or even on a simple or after-table, it will quickly exit with an error (11887).
These invalidations were added in r12998. Commit details can be found in #5671-8.
#51 Updated by Greg Shah over 1 year ago
Ultimately, we could iterate the tables individually and process (delete/update then invalidate) the records individually, but this would affect performance.
Don't we already know the exact list of rowids used for all UPDATE statements? We also know the table and multiplex ID, so we could track these as a list and then use that instead of searching through.
I'm not sure if something similar can be calculated or tracked for deletes but if we can just track the exact list, then a full invalidation would not be needed. Perhaps this could optimize this ACCEPT-CHANGES
and also avoid over-aggressive invalidation which would have downstream performance benefits.
#52 Updated by Ovidiu Maxiniuc over 1 year ago
Greg Shah wrote:
Don't we already know the exact list of rowids used for all UPDATE statements? We also know the table and multiplex ID, so we could track these as a list and then use that instead of searching through.
There is no list of these rowids, but it can be extracted from SQL using a query like:

   select <PK> from <after-table> where _rowState is null or _rowState = 0 /*ROW_UNMODIFIED*/;

before the update.
I'm not sure if something similar can be calculated or tracked for deletes, but if we can just track the exact list, then a full invalidation would not be needed. Perhaps this could optimize ACCEPT-CHANGES and also avoid over-aggressive invalidation, which would have downstream performance benefits.
We are completely dropping the content of this temp-table (if the changes were accepted, the before-table is reset), so the invalidation of all records sharing the specific _multiplex is the right solution here.
#53 Updated by Stanislav Lomany over 1 year ago
Committed 6129b rev 14351: P2JIndex.components() provides direct access to the components array in order to improve performance.
Please review.
#54 Updated by Stanislav Lomany over 1 year ago
Committed 6129b rev 14358: Final round of performance improvements which are replacements of some 'for-each' loops with indexed 'for' loops.
Please review.
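For reference, the general shape of these replacements (not the actual committed code) looks like this; for a RandomAccess list such as ArrayList, the indexed form avoids allocating an Iterator and the repeated hasNext()/next() calls:

```java
import java.util.ArrayList;
import java.util.List;

class LoopStyles
{
    // for-each: allocates an Iterator and calls hasNext()/next() per element
    static long sumForEach(List<Long> list)
    {
        long sum = 0;
        for (Long v : list)
        {
            sum += v;
        }
        return sum;
    }

    // indexed for: no Iterator at all; cheap for RandomAccess lists
    static long sumIndexed(List<Long> list)
    {
        long sum = 0;
        for (int i = 0, n = list.size(); i < n; i++)
        {
            sum += list.get(i);
        }
        return sum;
    }
}
```

Note that this rewrite only pays off for RandomAccess implementations; for a LinkedList, get(i) would be O(n) and the iterator form remains the right choice.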
#55 Updated by Stanislav Lomany over 1 year ago
l1 = etime.
do i = 1 to 1000000: run proc0(i). end.
l2 = etime.
message (l2 - l1) "ms".
l1 = etime.
do i = 1 to 1000000: func0(i). end.
l2 = etime.
message (l2 - l1) "ms".
It looks interesting to me that functions are executed 10 times slower than procedures due to the presence of returnNormal in functions (I didn't look deeper yet).
In 4GL execution of both segments takes ~4s for each. In FWD the first segment takes ~2.5s and the second ~25s.
#56 Updated by Greg Shah over 1 year ago
Stanislav Lomany wrote:
[...]
It looks interesting to me that functions are executed 10 times slower than procedures due to the presence of returnNormal in functions (I didn't look deeper yet).
In 4GL execution of both segments takes ~4s for each. In FWD the first segment takes ~2.5s and the second ~25s.
Hmm, I think it is worth looking deeper. Please go ahead with that.
#57 Updated by Stanislav Lomany over 1 year ago
The performance overhead of functions vs procedures is caused by throwing ReturnUnwindException. I think that in the vast majority of cases RETURN is the last statement in a function, and in that case returnNormal can be replaced with setting the return value and letting the function end naturally.
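A toy illustration of the two unwinding strategies (the real returnNormal/ReturnUnwindException machinery in FWD is more involved; these names and methods are simplified stand-ins):

```java
// Sketch of why exception-based "normal" unwinding is expensive and what
// the natural-fall-through alternative looks like when RETURN is the last
// statement of the function body.
class Unwinding
{
    static class ReturnUnwindException extends RuntimeException
    {
        final Object value;

        ReturnUnwindException(Object value)
        {
            // disable stack trace capture; even so, throw/catch control
            // flow remains far more costly than a plain return
            super(null, null, false, false);
            this.value = value;
        }
    }

    // current approach: RETURN throws, an enclosing frame catches
    static Object viaException()
    {
        try
        {
            throw new ReturnUnwindException("result");
        }
        catch (ReturnUnwindException e)
        {
            return e.value;
        }
    }

    // proposed approach when RETURN is the final statement: just set the
    // return value and fall off the end of the function
    static Object viaNaturalEnd()
    {
        Object retVal = "result";
        return retVal;
    }
}
```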
#58 Updated by Greg Shah over 1 year ago
- Related to Feature #7045: re-implement "normal" (non-abend) stack unwinding to avoid throwing an exception added
#59 Updated by Greg Shah over 1 year ago
Stanislav Lomany wrote:
The performance overhead of functions vs procedures is caused by throwing ReturnUnwindException. I think that in the vast majority of cases RETURN is the last statement in a function, and in that case returnNormal can be replaced with setting the return value and letting the function end naturally.
Nice find. We always knew it was expensive, but now you've demonstrated that it needs serious attention. I'm going to work this in #7045.
#60 Updated by Stanislav Lomany over 1 year ago
You will see that the Scopeable scopestart/finished do lots of work - please check if it may be possible to condense two or more dictionaries/stacks into a single one, where the element encapsulates what we currently have in separate dictionary/stack.
I'm a bit skeptical about this: Scopables don't have much uniformity in logic or in the data types used: they use ScopedDictionary, PartitionedArray and Deque instances.
#61 Updated by Stanislav Lomany over 1 year ago
As far as I can see, only the pruneSessionCache() issue is left here.
#62 Updated by Constantin Asofiei about 1 year ago
- Avoid the LinkedList overhead; use ArrayDeque instead, for NamedFunction.enclosingScopes.
- In ScopedDictionary and TailedScopedDictionary, avoid the overhead of calculating an empty map when the scope has no dictionary - internally, Node.getDictionary(false) will return null if there is no dictionary, but getDictionaryAtScope() will still return an empty map instead of null.
- Avoid listIterator usage in TransactionManager.nearestTopLevel, as a plain for loop can be used.
Tijs: you've last touched ScopedDictionary and TailedScopedDictionary, please review these changes.
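As an aside, the LinkedList-to-ArrayDeque change boils down to the following pattern (the class and field names here are illustrative, not the actual NamedFunction code). ArrayDeque stores its elements in a plain array, while LinkedList allocates a node object per element and has worse cache locality, so for a push/pop scope stack the deque is the cheaper choice:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative scope stack: was "new LinkedList<>()", now ArrayDeque,
// which avoids per-element node allocation for the same Deque API.
class Scopes
{
    private final Deque<String> enclosingScopes = new ArrayDeque<>();

    void scopeStart(String scope)
    {
        enclosingScopes.push(scope);
    }

    String scopeFinished()
    {
        return enclosingScopes.pop();
    }
}
```

Because both classes implement Deque, the swap is usually a one-line change; the only caveat is that ArrayDeque rejects null elements.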
#64 Updated by Tijs Wickardt about 1 year ago
Reviewed p2j 7026a -r14497..14499
.
The changes are good.
Remark on "Avoid the overhead of calculating an empty map":
Previously the empty map originated from Collections.emptyMap(). This returns a singleton, so we are only removing the overhead of assigning that singleton to a member. Am I missing something here (probably)?
The old behavior, using the immutable empty map, guards against NPEs and makes the intention clearer than null.
#65 Updated by Tijs Wickardt about 1 year ago
- Status changed from Review to WIP
#66 Updated by Constantin Asofiei about 1 year ago
Tijs Wickardt wrote:
Reviewed p2j 7026a -r14497..14499.
The changes are good.
Remark on "Avoid the overhead of calculating an empty map":
Previously the empty map originated from Collections.emptyMap(). This returns a singleton, so we are only removing the overhead of assigning that singleton to a member. Am I missing something here (probably)?
The old behavior, using the immutable empty map, guards against NPEs and makes the intention clearer than null.
The overhead is seen when this code is executed millions of times - I agree that the code is cleaner with an immutable empty map, but every member access of the map (clear, contains, get, etc.) is expensive when executed that many times. Plus, the actual invocation to get the empty map via supplyImmutableEmptyMap is costly.
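A minimal sketch of the trade-off being discussed, with illustrative names rather than the actual ScopedDictionary internals. The old path always hands callers a map (the Collections.emptyMap() singleton when the scope has no dictionary), which is safe but adds an extra branch and call on every lookup; the new path returns null and shifts the null check to the caller:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative scope node: getDictionary(false) may return null when the
// scope has no dictionary; getDictionary(true) lazily creates one.
class Node<K, V>
{
    private Map<K, V> dictionary;

    Map<K, V> getDictionary(boolean create)
    {
        if (dictionary == null && create)
        {
            dictionary = new HashMap<>();
        }
        return dictionary;  // may be null when create == false
    }

    // old behavior: callers always receive a (possibly empty) map,
    // which guards against NPEs but costs an extra call per access
    Map<K, V> getDictionaryAtScopeOld()
    {
        Map<K, V> dict = getDictionary(false);
        return dict == null ? Collections.emptyMap() : dict;
    }

    // new behavior: callers must handle null, but the hot path skips
    // the empty-map indirection entirely
    Map<K, V> getDictionaryAtScopeNew()
    {
        return getDictionary(false);
    }
}
```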
#67 Updated by Tijs Wickardt about 1 year ago
Constantin Asofiei wrote:
The overhead is seen when this code is executed millions of times - I agree that the code is cleaner with an immutable empty map, but every member access of the map (clear, contains, get, etc.) is expensive when executed that many times. Plus, the actual invocation to get the empty map via supplyImmutableEmptyMap is costly.
Agreed, all small things count on a hot path. If the impact is noticeable, I fully agree.
Did you profile this? (I see #6821-55 but it seems high level).
#68 Updated by Constantin Asofiei about 1 year ago
Tijs Wickardt wrote:
Did you profile this? (I see #6821-55 but it seems high level).
Yes, this was from profiling a customer's application.
#69 Updated by Tijs Wickardt about 1 year ago
Constantin Asofiei wrote:
Yes, this was from profiling a customer's application.
Great. Review complete, approved.
#70 Updated by Alexandru Lungu about 1 year ago
Greg Shah wrote:
Ultimately, we could iterate the tables individually and process (delete/update then invalidate) the records individually, but this would affect performance.
Don't we already know the exact list of rowids used for all UPDATE statements? We also know the table and multiplex ID, so we could track these as a list and then use that instead of searching through. I'm not sure if something similar can be calculated or tracked for deletes, but if we can just track the exact list, then a full invalidation would not be needed. Perhaps this could optimize this ACCEPT-CHANGES and also avoid over-aggressive invalidation, which would have downstream performance benefits.
I read your discussion regarding Session.invalidateRecords, because I stumbled upon a hot-spot on HashMap.ValueIterator.next() caused by this method. I see that you link this method very closely to ACCEPT-CHANGES, but in my tests, 75% of the back traces lead to TemporaryBuffer.doCloseMultiplexScope (when the multiplex is no longer referenced), 22% to TemporaryBuffer.removeRecords (when doing a full remove) and 3% to TemporaryBuffer.loopDelete (when doing a full remove). All three of these are widely used in large customer applications. None of the samples were related to ACCEPT-CHANGES.
In a server start-up, I've got: 2,887 invalidateRecords calls, 1,774,419 table mismatches, 7,374 multiplex mismatches and 117ms wasted in the for loop.
In a POC run, I've got: 44,768 invalidateRecords calls, 26,603,934 table mismatches, 75,784 multiplex mismatches and 1,004ms wasted in the for loop.
Therefore, should we consider that this method has wider usage and should be optimized? I don't have an idea for this yet - thinking.
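One possible direction, sketched with hypothetical names (not the FWD API): given that the mismatch counts above dwarf the call counts, the loop is clearly filtering the whole cached collection on every call. Keying the cached records by (table, multiplex) would make invalidation visit only the matching bucket:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group cached records into buckets keyed by
// table + multiplex, so invalidateRecords is O(bucket size) instead of
// O(total cache size) with millions of mismatched comparisons.
class RecordCache
{
    private final Map<String, List<Object>> byTableAndMultiplex = new HashMap<>();

    private static String key(String table, int multiplex)
    {
        return table + ":" + multiplex;
    }

    void add(String table, int multiplex, Object record)
    {
        byTableAndMultiplex.computeIfAbsent(key(table, multiplex), k -> new ArrayList<>())
                           .add(record);
    }

    // drops the whole matching bucket; returns how many records it held
    int invalidateRecords(String table, int multiplex)
    {
        List<Object> bucket = byTableAndMultiplex.remove(key(table, multiplex));
        return bucket == null ? 0 : bucket.size();
    }
}
```

The cost moves to maintaining the buckets on insert/delete, so whether this wins depends on the ratio of invalidations to record churn.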