Project

General

Profile

Bug #7336

optimize DmoMeta.byLegacyName

Added by Alexandru Lungu about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version_reported:
version_resolved:

performance-tests.png (33.4 KB) Alexandru Lungu, 05/10/2023 05:12 AM

History

#1 Updated by Alexandru Lungu about 1 year ago

  • File performance-tests.png added
  • Status changed from New to WIP
  • Start date deleted (05/09/2023)
  • % Done changed from 0 to 30

The effort for this task is related to the DmoMeta.byLegacyName hotspot, caused by lots of calls to TextOps.rightTrimLower. This is required to identify a certain property in a map based on a rtrimmed case-insensitive comparison.

Exploring

General use-case

In a run of a large customer application, I get ~588k hits of DmoMeta.byLegacyName.
A single TextOps.rightTrimLower takes on average 0.00057ms, while an IdentityHashMap look-up is done in 0.00007ms close to 0.00009ms.
Our use-case requests to support a lot of look-ups (~588k) and no real-time updates. Also, the build-up time is negligible, especially in warm runs.
In conclusion, the map we use stores the intern representation of strings, so the map look-up is very fast. However, we are forced to rtrim and lower the strings we use before comparison (+ interning) at look-up time. The solution is to move the complexity from look-up time to build-up time.

Specific use-case

In the customer application I tested, a temp-table has ~35 fields, while a persistent-table has ~65 fields. These are only static tables, the real use-case may be skewed due to dynamic definitions.

Patricia Trie Performance

I had an attempt to implement a classical character based trie, but end up with very bad performance results. I don't have a clear mean of storing the edges, without obvious problems comparing to a simple IdentityHashMap.
  • A hash map in each trie node explodes the put time (x50). The get time is only x2 slower.
  • An array of 100 positions (just on the presumptions that characters are usually ASCII) causes OOM.
  • I encountered other implementations in Java of such trie, but most of them relate to suffix compression or edge compression. These are heuristics and don't guarantee a good fit on our use-case.
Of course, I had to fall-back to a classic PATRICIA tree (bitwise trie over strings). Luckily, apache.commons already has such implementation. However, there are some advantages and disadvantages:
  • The time is considerably better, considering we don't have to generate huge structures in each node, but only keep a reference to left/right nodes.
  • Each character is converted to its binary representation to use the left/right model from bitwise trie model. In Java, characters have 16 bits, so a 20 character long string determines a 320 depth tree. An easy optimization here is to extend the PATRICIA trie implementation to allow only 8-bit binary representation. I expect most of the table field names to use only ASCII characters anyway.
  • apache.commons4 allows the override of AbstractBitwiseTrie, but not the PatriciaTrie. Therefore, we don't have a mean of easily extending the PATRICIA tree there, beside re-extending it from AbstractBitwiseTrie (or do a pull request).

I've done some tests in that sense:

The y-axis is the time and the x-axis is the number of .get calls. In the left diagram we have a target map with 20 entries. In the right diagram we have a target map with 100 entries. The put time is similar between these. All times are warm.
For IdentityHashMap, I use the intern of strings. Each look-up and update is done with toLowerCase.
For PatriciaTrie, I use the original strings. Each look-up and update is done with the string as it is. This approach presumes that we will extend the PatriciaTrie implementation to solve the case-insensitive issue implicitly. Therefore, we can consider that these times will increase a bit after integration.
For CaseInsensitiveHashMap, I use the original strings. Each look-up and update is done with the string as it is. The implementation already takes case of case-sensitivity.

The results are not promising for the PatriciaTrie, especially in comparison with CaseInsensitiveMap. Also note that the PatriciaTrie should be extended to achieve something like this. Also, it may prove faster with lowering the bits used per character (8 instead of 16).

IdentityHashMap optimization

This is important!

I experimentally added a map look-up before TextOps.rightTrimLower(lf).intern(). If the parameter is already in the map, return it. I used the fact that the map entries are interned, so usually the string we will get may be already in the map as they are.
Out of 588k calls, 440k resolve with this short-circuit. The total time for this is 32ms. The other 148k are not found in the map in the first try (or the map is not created yet). All these 148k are resolved in 86ms.
With a rough calculation on my side, the old implementation shown 376ms spent in DmoMeta.byLegacyName. With this short-circuit, only 124ms are spent here, meaning a -66% improvement. This doesn't eliminate the lowerCase and trim, but rapidly handles the cases which don't require them.
With a real profiling scenario, the old implementation shown 250ms spent on the look-up. With this short-circuit, only 139ms are spent here, meaning a -45% improvement. This doesn't eliminate the lowerCase and trim, but rapidly handles the cases which don't require them.
The downside here is that we increase the dependence upon an IdentityHashMap and rule out the possibility to switch to a CaseInsensitiveHashMap.

I will update this task with a comparison between IdentityHashMap with short-circuit and CaseInsensitiveHashMap.

Right trimming

TODO

#2 Updated by Alexandru Lungu about 1 year ago

Done multiple runs of a customer POC. I got:

Original: 50ms spent on map look-up, 200ms spent on lowering, right-trimming and interning.
Original optimized: 63ms spent on map look-up and short-circuit, 76ms spent on lowering, right-trimming and interning.
CaseInsensitiveMap: 90ms spent on map look-up, 21 ms spent on right trimming.
Note that I also tried CaseInsensitiveMap with interning, but it was disastrous (94 ms look-up, 173 rtrim and intern).

The best solution for now is using CaseInsensitiveMap to a total of 111ms comparing to the original 250ms (-55% improvement). Fortunately, I think that we can make CaseInsensitiveMap aware of rtrim: compute the hash without the last spaces and do regionMatch properly. Hopefully, this can go under 100ms and have our optimization close to -60%.

On the other side, I would like to go ahead with a apache.commons4 PATRICIA test. Eventually, I will do some magic to have a case-sensitive, 8bit implementation there and see some proper results. If PATRICIA will have a look-up < 90ms (comparing to CaseInsensitiveMap), we can switch to PATRICIA and do the rtrimming inside.

#3 Updated by Alexandru Lungu about 1 year ago

  • Status changed from WIP to Review
  • % Done changed from 30 to 100

Right trimming

I found it accessible to add rtrim in some structures:
  • For CaseInsensitiveMap I stop computing the hash before the trailing spaces (detected after doing a reverse look-up). The string regionMatches will be done until the trailing spaces
  • For PatriciaTrie, string regionMatches is done until the trailing spaces. Fortunately, the suffix compression technique used there allowed me to approach such simple solution.
  • For IdentityHashMap and HashMap, such extension wasn't possible, so rtrimming should be done before look-up.

Patricia Trie optimizations

I've done a lot of PatriciaTrie optimizations. I cut down the whole implementation to the following traversing function.

public boolean isBitSet(final String key, final int bitIndex, final int lengthInBits) 
{
   // [sanity check]
   return BITS[key.charAt(bitIndex >> 3)][bitIndex & 7]; // BITS[i][j] -> if the j-th bit of the upper-case version of i is set
}

This is called between 1-8 times per character in a look-up to reach the longest matching prefix in the trie (return false means left, true means right). The upper-bound is Nodes times, but the average is log(Nodes) times.

After the closest node is found, it should be checked. This is done with regionMatches between the first mismatching character till the trailing spaces.
Unless BITS (which is a boolean[][]) can be optimized, I am afraid this is the best we can achieve here to support both case-insensitivity and rtrimming in a Patricia Trie.

Results

HashMap

Of course, I had to give HashMap a shot. Impressively, it set a really low standard of 118 ms.
  • Lower + r-trim: 51ms
  • Look-up: 67ms

This made me think that out problem here was the .intern. In fact, removing it and using HashMap cut down the time from 250ms to 118ms. Our issue was interning, causing a ~133ms overhead.. The results now on are only fine-tuning of these 118ms, but certainly not that big as we got after removing .intern. Doing a look-up into the string pool 558K times wasn't that fast. Also, we risked interning string patterns which were not in the map, so this was leaking random strings into the string pool.

Patricia Trie

This is a test with Patricia Trie without any changes. This uses the right implementation from apache.commons4.
  • Lower + r-trim: 52ms
  • Look-up: 88 ms

Of course, this approach fails as long as it doesn't succeed to surpass the simple HashMap. The whole idea of using a Patricia Trie is to have access to its internals and do the character comparison in ignore-case. Also, this allows rtrimming. It is good to see however what are the implications of using the Patricia Trie straight-away and what is the optimization degree afterwards.

Optimized Patricia Trie

Reducing 8bit to 16bit wasn't that relevant considering the path compression technique. I replaced s1.compareTo(s2) == 0 with s1.equals(s2) (or for that matter, equalsIgnoreCase). I replaced any division and mod operations with bit-wise operands and implemented support for rtrim. Of course, I've done the checks case-insensitivity. The results:
  • Look-up: 101ms

This is better than the HashMap and way better than the initial IdentityHashMap. However, it has a really big limitation. I had to copy-paste the apache.commons4 source code to be able to extend it. As said, AbstractPatriciaTrie is not public and some methods from AbstractBitwiseTrie were protected, so I couldn't just extend. In a ideal case, I should have been able to extend AbstractPatriciaTrie, assigning it another StringKeyAnalyzer which does the whole case-sensitive/rtrim job.

CaseInsensitiveMap

We already have an implementation of such map. I extended to conditionally support rtrim.
  • Look-up: 98ms

This is the fastest solution profiled by now. It is close to the Optimized Patricia Trie, but it is cleaner. For that matter, I would go with such replacement in DmoMeta.byLegacyName for a total of -60% improvement (-16% comparing to the HashMap).

Conclusion

Even though I had high hopes with the Patricia Trie, I have to admit that using CaseInsensitiveMap is the best way to go here. I had several performance tests in which I've got better results with Patricia Trie comparing to CaseInsensitiveMap, but all were of a really slight difference (< 5%). In the tested POC I have better results with CaseInsensitiveMap (~3%). Also, the integration of Patricia Trie is not that clean due to the need of a better interface with apache.common4 (allowing us to extend it easily and not integrate code as it is).

The goal of this task was to optimize DmoMeta.byLegacyName, but the bottleneck was related to .intern rather than the data structure used. Even so, we can agree that 16% optimization is for us to take with CaseInsensitiveMap, beside the whole .intern removal.

If there are no other suggestions here, I will go ahead committing CaseInsensitiveMap inside DmoMeta.byLegacyName to 7026d.

#4 Updated by Alexandru Lungu about 1 year ago

Committed the switch to CaseInsensitiveHashMap in DmoMeta.byLegacyName with 7026d/rev. 14565.

#5 Updated by Alexandru Lungu about 1 year ago

  • Status changed from Review to Test

7026d was merged to trunk as rev. 14587. This can be closed.

#6 Updated by Greg Shah about 1 year ago

  • Status changed from Test to Closed
  • Assignee set to Alexandru Lungu

#7 Updated by Eric Faulhaber about 1 year ago

  • Subject changed from Trie implementation to support case insensitive and rtimmed string structures to optimize DmoMeta.byLegacyName

Also available in: Atom PDF