Feature #7363

Improve H2 Value caching, hashing and equals

Added by Alexandru Lungu about 1 year ago. Updated 29 days ago.

Status:
Closed
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
version_reported:
version_resolved:

Related issues

Related to Database - Feature #7447: Compare ValueString and ValueStringIgnoreCase faster in FWD-H2 Closed
Related to Database - Bug #7448: Optimize FWD-H2 ValueTimestampTimeZone and maybe avoid caching Rejected

History

#1 Updated by Alexandru Lungu about 1 year ago

  • Assignee set to Ștefan Roman

Recently we enabled support for case-insensitive character values in H2 using ValueStringIgnoreCase.

However, ValueStringIgnoreCase.hashCode pops up as a performance issue. This is because toUpperCase may instantiate another String if the target is not already upper-cased. Either way, the current implementation traverses the target string twice (once for toUpperCase and once for hashCode).

In FWD, we handled this using StringHelper.hashCodeCaseInsensitive which computes the hash directly, without toUpperCase. The same should be done in H2. Implement the UPPER_CASE array and hashCodeCaseInsensitive function in org.h2.util.StringUtils. Use this approach in ValueStringIgnoreCase.
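
A minimal sketch of what such a helper could look like (assumption only; the actual FWD StringHelper / H2 StringUtils code may differ in details, e.g. how UPPER_CASE is populated):

// Sketch: hash the string as if it were upper-cased, without allocating an
// upper-cased copy and without traversing the string twice.
private static final char[] UPPER_CASE = new char[128];

static {
    for (char c = 0; c < 128; c++) {
        UPPER_CASE[c] = Character.toUpperCase(c);
    }
}

public static int hashCodeCaseInsensitive(String s) {
    int hash = 0;
    for (int i = 0, len = s.length(); i < len; i++) {
        char c = s.charAt(i);
        char u = c < 128 ? UPPER_CASE[c] : Character.toUpperCase(c);
        hash = 31 * hash + u;
    }
    return hash;
}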

I created 7363a_h2.

#2 Updated by Alexandru Lungu about 1 year ago

We have the same issue in FWD, in P2JField.computeHash. Please use hashCodeCaseInsensitive there instead of .toLowerCase().hashCode().

#3 Updated by Alexandru Lungu about 1 year ago

One last comment here. If you already plan to do some changes to ValueStringIgnoreCase, also consider optimizing ValueStringIgnoreCase.get. This is called thousands of times and pops up in the profiler.

There are two theoretical features there:
  • H2 does some caching on String instances (much like Java does with String Pool). However, StringUtils.cache works as a fixed-size String Pool with eviction. This way, H2 will help the performance by recycling the same String instances as much as possible. If we "play" with the same String instances over and over again, then a == b is more probable in equality checks, so a.equals(b) is faster.
  • H2 does the same "caching" technique for ValueStringIgnoreCase using Value.cache. This cache allows us to recycle ValueStringIgnoreCase instances if they store the same String.

This whole caching overhead may take some time to execute. My concern is that ValueStringIgnoreCase will hit the cache case-insensitively (as ValueStringIgnoreCase.equals is case-insensitive); so "aBc" will match the cached "abc". For this matter, H2 will do a second equals check to see if the cached entry has the same case as the look-up key. This basically means two equals.

Possible solutions:
  • Easy, but slower: do the final equality check between cache.value and obj.value (instead of s), as obj.value has a higher chance to be a cached string, so .equals may be faster.
  • Hard, but faster: change/overload Value.cache to extract from the cache case-sensitively when ValueStringIgnoreCase is used. Basically, check in Value.cache the type of the operands (look-up key / cache entry) and, if they are both STRING_IGNORECASE, cast to ValueStringIgnoreCase and do something like equalsCaseSensitive (sketched below). This will avoid the second equals in ValueStringIgnoreCase.
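
A minimal, self-contained sketch of the second option (assumptions: getType()/getString() stand in for whatever accessors FWD-H2 actually exposes, and the soft-cache handling around this check is omitted):

// Sketch: when both the cached entry and the look-up key are STRING_IGNORECASE,
// a single case-sensitive comparison of the raw strings replaces the
// case-insensitive equals plus the extra case-sensitive check.
static Value lookUp(Value[] cache, Value v) {
    int index = v.hashCode() & (cache.length - 1);
    Value cached = cache[index];
    if (cached != null && cached.getType() == v.getType()) {
        boolean hit = v.getType() == Value.STRING_IGNORECASE
              ? cached.getString().equals(v.getString())   // case-sensitive, one equals
              : cached.equals(v);
        if (hit) {
            return cached;
        }
    }
    cache[index] = v;   // miss: overwrite the bucket with the new value
    return v;
}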

#4 Updated by Ștefan Roman about 1 year ago

  • % Done changed from 0 to 50
  • Status changed from New to WIP

I added hashCodeCaseInsensitive in StringUtils and used it in ValueStringIgnoreCase.hashCode.
I also changed P2JField.computeHash to use hashCodeCaseInsensitive.
Lastly, I changed Value.cache so that it now checks whether the values are equal only if the types are STRING_IGNORECASE, and removed the equals in ValueStringIgnoreCase.get, because Value.cache will check and return the appropriate value.

Committed to 7363a_h2 revision 18.

#5 Updated by Alexandru Lungu about 1 year ago

  • % Done changed from 50 to 100
  • Status changed from WIP to Review
  • Assignee changed from Ștefan Roman to Eric Faulhaber
  • vendor_id deleted (GCD)

Review of 7363a_h2 revision 18
I am OK with the changes.

Profiled and got a -0.6% improvement on a customer application. Tests pass successfully.
Merged 7363a_h2 to FWD-H2 trunk as rev. 18. Archived 7363a_h2.

This can be closed.

#6 Updated by Alexandru Lungu about 1 year ago

  • Assignee changed from Eric Faulhaber to Ștefan Roman

#7 Updated by Alexandru Lungu about 1 year ago

  • vendor_id set to GCD

#8 Updated by Alexandru Lungu about 1 year ago

  • % Done changed from 100 to 50
  • Status changed from Review to WIP

I also changed P2JField.computeHash to use hashCodeCaseInsensitive.

Stefan, where did you commit this change?
Please check out 7026d and commit the fix there (that branch is performance-oriented; we usually gather performance patches there).
The P2JField.computeHash change is required ASAP (by the end of the day).

FWD-H2 value cache optimization

Further, we can take another step here regarding H2 value caching. This is not as urgent as the previous matter.

Problem
We still create a ValueStringIgnoreCase instance with new each time we call get. This costs time in memory allocation and in the garbage collector's de-allocation. We need to make sure we create a new instance only if we don't already find a suitable one in the cache. Currently, we use a unified Value.cache(Value), which allows caching all kinds of types in the same structure (softCache).

For ValueStringIgnoreCase, we have ~500k get calls on a customer application. Most of them are for non-empty strings, so new is triggered.

Goal
Let some H2 values check the cache faster, without using the unified Value.cache(Value) interface. I am thinking of Value.cache(int type, Object lowLevelValue, int hashCode) - retrieve the cached instance of type type that wraps lowLevelValue. This will overload the existing Value.cache and it will work almost the same way:
  • check if the bucket is filled with an entry of the same type
    • if yes, do a low level comparison between lowLevelValue and the underlying value of the cached object. We will need a lowLevelEquals(Object obj) method on Value which returns false by default. For instance, ValueStringIgnoreCase will implement lowLevelEquals which checks if the value equals lowLevelValue directly.
      • if values are equal, just return the cached instance and that is all
      • if values are not equal, this is a cache miss
    • if no, this is a cache miss
  • in case of a cache miss, create an overrideCache(Value obj, int hashCode). This will be called with a new instance of ValueStringIgnoreCase which will simply override the bucket.

Note that you can compute the hashCode only once and this should match the hash of the ValueStringIgnoreCase; use it to identify the proper bucket in the cache.
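
A rough sketch of how these pieces could fit together (all names are assumptions based on the description above; getCacheArray() is a placeholder for access to the existing soft cache, and the final signatures may differ):

// In Value (sketch):
protected boolean lowLevelEquals(Object lowLevelValue) {
    return false;                                    // no low-level comparison by default
}

static Value cache(int type, Object lowLevelValue, int hashCode) {
    Value[] cache = getCacheArray();                 // placeholder for the soft cache
    Value cached = cache[hashCode & (cache.length - 1)];
    if (cached != null && cached.getType() == type
          && cached.lowLevelEquals(lowLevelValue)) {
        return cached;                               // cache hit: no new Value allocated
    }
    return null;                                     // cache miss: the caller allocates
}

static void overrideCache(Value obj, int hashCode) {
    Value[] cache = getCacheArray();
    cache[hashCode & (cache.length - 1)] = obj;      // simply override the bucket
}

// In ValueStringIgnoreCase (sketch):
@Override
protected boolean lowLevelEquals(Object lowLevelValue) {
    return value.equals(lowLevelValue);              // compare the raw String directly
}

public static ValueStringIgnoreCase get(String s) {
    int hash = StringUtils.hashCodeCaseInsensitive(s);   // must match hashCode()
    Value cached = cache(STRING_IGNORECASE, s, hash);
    if (cached != null) {
        return (ValueStringIgnoreCase) cached;
    }
    ValueStringIgnoreCase obj = new ValueStringIgnoreCase(StringUtils.cache(s));
    overrideCache(obj, hash);
    return obj;
}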

Extension
If this works properly, we can use the same technique for other types as well (ValueLong, ValueString, etc.). I can't tell if this will bring a major performance boost, but it is worth a try. I will do a profile once this is finished.

I created 7363b_h2 for any related work.

#9 Updated by Ștefan Roman about 1 year ago

I changed toLowerCase().hashCode() into StringHelper.hashCodeCaseInsensitive() in P2JField.computeHash().

Committed to 7026d revision 14575.

#10 Updated by Alexandru Lungu about 1 year ago

Ștefan, please provide a (comprehensive) report on your progress here. Include what has been implemented and what is left. Please share any serious impediments along the way, if any.
I would also like to have a testing and custom profiling round at the end. That is, run testFWD and a large customer application at least.

Status
Please build a table here (use Redmine's hypertext feature) with:
  • every value type from H2
  • whether it uses the value cache (Value.cache). If yes, whether you've already refactored it / are going to refactor it / can't refactor it due to a logical constraint.
  • the number of new calls in <value-type>.get. For each call, note whether it can be optimized or not.
  • whether it overrides lowLevelEquals: already written the code for it / planning to do so / can't use a low-level implementation here.
  • what hashing you are using for the cache look-up: the same hash technique as for the value / a custom, different hash of the low-level value / can't compute the hash over a low-level value to use with the cache.

Snapshot your work so far in this table. Edit it as you implement further.

Profiling
After completion, please make a Java test stressing H2 with numerous values of different kinds (close to all available in H2, but most importantly the ones used by FWD). Run it before / after the changes. Use prepared statements to avoid allocating new resources for each statement.
I would expect:
  • Fewer objects allocated on the heap in total (check with VisualVM).
  • Faster execution time due to faster cache look-up (check with System.nanoTime).
  • Correct results: make sure you end up with the results you expect.
  • A complete profiling report for each value in the table above. Thus, track how much time was spent before / after your changes in each <value-type>.get. This should go in extra columns of the table: Time before, Time after, Diff.

Make sure each profiling result is computed by averaging several (usually 3) individual runs. Each run should do ~10 iterations (a for loop with 10 iterations, basically). The time of a run is the average of the 10 iterations, excluding the cold one (the first). Also, make sure you build a profiling test that takes ~10 seconds in total, so that any fluctuation is way more visible.
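
A minimal harness following this methodology might look like the sketch below; stressValues is a placeholder for the actual workload that exercises H2 values:

// Average of several runs; each run averages its warm iterations (the first,
// cold iteration is excluded). Returns seconds.
public static double profile(int runs, int iterations, Runnable stressValues) {
    double total = 0;
    for (int r = 0; r < runs; r++) {
        double runTotal = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            stressValues.run();
            long elapsed = System.nanoTime() - start;
            if (i > 0) {
                runTotal += elapsed;                 // skip the cold iteration
            }
        }
        total += runTotal / (iterations - 1);        // average warm iteration time
    }
    return total / runs / 1e9;                       // average run time in seconds
}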

Please make this profiling test easy to use, as we may want to share it and re-use it in the future. Therefore, consider implementing a nice small Java "Value profiler" that allows us to check how fast H2 handles internal values. Note that for now the whole per-value profiling can't be done externally (as it needs changes to the H2 internals to trace <value-type>.get). Therefore, profile only an overall time.

After this is completed, we prefer to integrate the profiling tool into H2 so we can run it from the CLI (./build.sh benchmark value-test).

#11 Updated by Ștefan Roman about 1 year ago

  • % Done changed from 50 to 70

I started with ValueStringIgnoreCase and everything worked as you suggested. Value.cache(int type, Object lowLevelValue, int hashCode) and overrideCache(Value obj, int hashCode) seem to do the job; I still have to do more detailed profiling to make sure that new is indeed called fewer times.
I changed Value.cache in almost all types that used a similar approach.
Here is the table with the changes that I already made:

| Type | Use Value.cache(TYPE, lowLevelValue, hashCode) | Reason | lowLevelEquals | Hash | Time with old Value.cache | Time with new Value.cache | Time difference |
| Array | No | get has multiple parameters | return false; | | | | |
| Boolean | No | get doesn't call new | return false; | | | | |
| Byte | Yes | | return obj instanceof Byte && value == (byte) obj; | value | 0,992 s | 0,912 s | -0,08 s |
| Bytes | Yes | | return this.value.equals(obj); | Utils.getByteArrayHash(value) | 1,024 s | 0,937 s | -0,087 s |
| CollectionBase | No | Doesn't use Value.cache(Value) | The class is abstract, so there is no need to override it | | | | |
| Date | Yes | | return obj instanceof Long && this.dateValue == (long) obj; | (int) (value ^ (value >>> 32)) | 1,457 s | 1,462 s | +0,005 s |
| Decimal | Yes | | return this.value.equals(obj); | BigInteger.hashCode | 1,057 s | 1,094 s | +0,037 s |
| Double | Yes | | return obj instanceof Double && this.value == (double) obj; | long hash = Double.doubleToRawLongBits(value); return (int) (hash ^ (hash >>> 32)); | 1,311 s | 1,326 s | +0,015 s |
| Enum | No | It does not extend Value | | | | | |
| EnumBase | No | get has multiple parameters | return false; | | | | |
| Float | Yes | | return obj instanceof Float && this.value == (float) obj; | Float.floatToRawIntBits(value) | 0,935 s | 0,912 s | -0,023 s |
| Geometry | No | get has multiple parameters | return false; | | | | |
| Int | No | It is using ValueInt.DYNAMIC_CACHE / ValueInt.STATIC_CACHE | return false; | | | | |
| Interval | No | Does not use get and has multiple parameters on similar functions | return false; | | | | |
| JavaObject | No | It does not extend Value | | | | | |
| Json | No | get has multiple parameters | return false; | | | | |
| Lob | No | Doesn't use Value.cache(Value) | return false; | | | | |
| LobDb | No | Doesn't use Value.cache(Value) | return false; | | | | |
| Long | Yes | | return obj instanceof Long && this.value == (long) obj; | (int) (value ^ (value >> 32)) | 0,991 s | 1,02 s | +0,029 s |
| Null | No | Doesn't use Value.cache(Value) | return false; | | | | |
| ResultSet | No | Doesn't use Value.cache(Value) | return false; | | | | |
| Row | No | Doesn't use Value.cache(Value) | return false; | | | | |
| Short | Yes | | return obj instanceof Short && this.value == (short) obj; | value | 1,078 s | 0,99 s | -0,088 s |
| String | Yes | | return this.value.equals(obj); | ValueString.hashCode() | 0,811 s | 0,808 s | -0,003 s |
| StringFixed | Yes | | return this.value.equals(obj); | String.hashCode() | 1,161 s | 1,144 s | -0,017 s |
| StringIgnoreCase | Yes | | return this.value.equals(obj); | StringUtils.hashCodeCaseInsensitive(value) | 1,334 s | 1,032 s | -0,302 s |
| Time | Yes | | return obj instanceof Long && this.nanos == (long) obj; | (int) (value ^ (value >>> 32)) | 1,282 s | 1,298 s | +0,016 s |
| Timestamp | Yes | | return obj1 instanceof Long && obj2 instanceof Long && this.dateValue == (long) obj1 && this.timeNanos == (long) obj2; | (int) (dateValue ^ (dateValue >>> 32) ^ timeNanos ^ (timeNanos >>> 32)) | 1,167 s | 1,189 s | +0,022 s |
| TimestampTimezone | No | get has multiple parameters | return false; | | | | |
| TimeTimezone | Yes | | return obj1 instanceof Long && obj2 instanceof Integer && this.nanos == (long) obj1 && this.timeZoneOffsetSeconds == (int) obj2; | (int) (nanos ^ (nanos >>> 32) ^ timeZoneOffsetSeconds) | 2,388 s | 2,871 s | +0,483 s |
| Uuid | Yes | | return obj1 instanceof Long && obj2 instanceof Long && this.high == (long) obj1 && this.low == (long) obj2; | (int) ((high >>> 32) ^ high ^ (low >>> 32) ^ low) | 2,268 s | 2,093 s | -0,175 s |

#12 Updated by Alexandru Lungu about 1 year ago

  • Subject changed from Improve H2 ValueStringIgnoreCase.hashCode to avoid the use of toUpperCase to Improve H2 Value caching, hashing and equals

Some remarks:

  • Bytes, String and Date, as I understand, are still WIP.
  • Date has an interesting static API. I don't think we can do the caching over the other inputs (after String or TimeZone). Stick to optimizing Date.fromDateValue. In the future, we may want to check how we can optimize this Date for FWD's use.
  • Note that Decimal has two get methods. Consider caching the hashCode in ValueDecimal. For small numbers, this is fast. However, decimals with several digits may have an expensive hashCode.
  • JavaObject extends ValueBytes, which extends Value. Even so, I don't think it can be optimized due to several parameters being used.
  • Time uses fromNanos, which uses Value.cache. fromNanos has only one parameter, namely long nanos. This is the only low-level value stored by Time. Please consider it for improvement.

I agree with the rest. However, it really feels like we can cover several more values with a two-parameter cache look-up. That is, overload Value.cache(int type, Object firstLowLevelValue, Object secondLowLevelValue, int hashCode) and lowLevelEquals(Object firstLowLevelValue, Object secondLowLevelValue). This way we cover ValueUuid, ValueTimeTimeZone, ValueTimestamp and ValueEnumBase.
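
A hedged sketch of what the two-value overloads could look like, mirroring the single-value variant above (names are assumptions, not existing FWD-H2 code):

// In Value (sketch):
protected boolean lowLevelEquals(Object first, Object second) {
    return false;
}

static Value cache(int type, Object first, Object second, int hashCode) {
    Value[] cache = getCacheArray();                 // same placeholder as before
    Value cached = cache[hashCode & (cache.length - 1)];
    if (cached != null && cached.getType() == type
          && cached.lowLevelEquals(first, second)) {
        return cached;
    }
    return null;
}

// In ValueUuid, for example (sketch):
@Override
protected boolean lowLevelEquals(Object first, Object second) {
    return first instanceof Long && second instanceof Long
          && high == ((Long) first).longValue()
          && low == ((Long) second).longValue();
}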

Finally, ValueTimestampTimeZone is used across FWD-H2 to measure different internal times. However, this kind of value is usually built using System.currentTimeMillis. Such a value will certainly miss the cache, so we can avoid the whole caching look-up for it. We will discuss it later.

#13 Updated by Ștefan Roman about 1 year ago

I updated the table according to the new changes.
I changed get method in ValueBytes and ValueString, fromNanos in ValueTime and fromDateValue in ValueDate.

#14 Updated by Alexandru Lungu about 1 year ago

Ștefan, please prepare a commit on 7363b_h2 with your latest changes. Make sure you update #7363-11 accordingly. Also, post a status on your progress so far. I am ready to review/test/profile a functional version of your changes by tomorrow EOD.

#15 Updated by Ștefan Roman about 1 year ago

I made a profiler to test the new caching method.
I updated the table with the average time of 5 runs for each value type. The total time of the new caching method (21,255 s) is smaller than the old one (21,637 s).
The TimeTimezone test result is strange; it is ~20% slower. I will investigate that particular value further.
I will commit to 7363b_h2 ASAP; I want to test the changes with a large application before committing.

#16 Updated by Ștefan Roman about 1 year ago

Committed to 7363b_h2 revision 20.

#17 Updated by Alexandru Lungu about 1 year ago

Review of 7363b_h2 revision 20.

  • Use a space after the if keyword to separate it from ( (according to the H2 standard)
  • Use else on the same line as the closing brace } (according to the H2 standard)
  • In fact, there is no need for an else branch; please use one single fat if conditional; don't extract cachedType
  • Please set the hash of the object if it is already computed. For example, in ValueBytes, as you already computed hash, you can assign it to the newly created object (obj.hash = hash). This avoids a second hash computation for the object when it is first used. Unfortunately, this slow-down couldn't have been caught by your profiling.
  • For ValueFloat and ValueDouble, you slightly changed the equals approach. H2 uses Double.compare(a, b) == 0 and Float.compare(a, b) == 0. Please do the same, instead of using ==.
  • In ValueString, you are using StringUtils.cache twice. No need for the second cache check.
  • In ValueStringFixed and ValueStringIgnoreCase, it is fine to let lowLevelEquals be inherited; no need to redefine.

In general, obj instanceof Long && this.nanos == (long) obj doesn't feel like a safe comparison. Most probably it will work, as the (long) cast will unbox the Long, but since you do an instanceof Long check, I would expect the cast to be done on Long, not long. An ideal solution is obj instanceof Long && this.nanos == ((Long) obj).longValue(). This is the same code the JDK uses behind the scenes in equals, so I guess it is better tested. The same goes anywhere you do such primitive comparisons (i.e. Long, Integer, etc.).
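
For example, a lowLevelEquals following this recommendation could look like (sketch only):

@Override
protected boolean lowLevelEquals(Object obj) {
    // the explicit cast to Long plus longValue() mirrors what Long.equals does
    // internally, instead of relying on the (long) cast to unbox after instanceof
    return obj instanceof Long && this.nanos == ((Long) obj).longValue();
}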

#18 Updated by Ștefan Roman about 1 year ago

I added the changes.

Commited to 7363b_h2 revision 21.

#19 Updated by Alexandru Lungu about 1 year ago

Review of 7363b_h2 revision 21

  • ValueByte.lowLevelEquals should use ((Byte) obj).byteValue() to stay consistent with other implementations.
  • ValueBytes.lowLevelEquals is not right: you are using this.value.equals(obj) instead of the current ValueBytes.equals implementation Arrays.equals(value, ((ValueBytes) other).value). Please change.
    • Please note that H2 uses different hash-code computation for byte arrays of over/under 50 elements.
    • Please extend your test-cases with byte arrays over/under 50 elements so that you trigger different Utils.getByteArrayHash computations.
  • ValueDate / ValueLong / ValueTime / others: I think the cognitive complexity of Long.compare(this.dateValue, ((Long) obj).longValue()) == 0 is too high. Please fall back to this.dateValue == ((Long) obj).longValue(). Doing compare makes sense for Double and Float because == is subject to precision issues, while compare does a better job by retrieving more complex representations of the numeric data. Please let only Double and Float use compare.
  • ValueDecimal: the implementation here does not match the original implementation:
    • use a space between if and (
    • the get(BigInteger) is not querying the cache with a BigDecimal, but with a BigInteger. This will always fail and miss the cache! Please create a new BigDecimal(bigInteger) and use it for hash, cache look-up and instantiation. Otherwise, lowLevelEquals will always try to compare a BigDecimal with a BigInteger and it will fail.
    • Please extend your test-cases to cover this case: assign BigInteger to BigDecimal field.
  • ValueString doesn't use String.hashCode, but a custom, faster implementation of string hashing, which can be found in ValueString.hashCode. This should be used for the cache look-up.
    • Move the hash computation into a separate method (protected static int computeHashCode(String value)). Use it in ValueString.hashCode and ValueString.get.
    • Especially when you do obj.hash = hash, you assign a differently computed hash compared to the one computed by ValueString.hashCode.
    • Please extend your test-cases with strings over/under 16 characters so that you trigger different ValueString.hashCode computation.
  • The same goes for ValueStringFixed; please use ValueString.computeHashCode for that matter, not String.hashCode
    • You are missing obj.hash = hash for ValueStringFixed; the hash computation is quite heavy for string data
  • The same goes for ValueStringIgnoreCase; please use ValueString.computeHashCode for that matter, not String.hashCode
    • You are missing obj.hash = hash for ValueStringIgnoreCase as well; the hash computation is quite heavy for string data
  • The ValueTimeTimezone performance decrease is really bad. Please take a look into it and try to understand why this is happening. Check that you don't do too much overhead in the caching and that you have the same cache hit rate as before.

Avoid caching ValueTimestampTimeZone

I see a potential optimization of CurrentTimestamp.get. This retrieves a ValueTimestampTimeZone based on System.currentTimeMillis(). However, this means that the parameters have a really high variance: the chance to have a cache hit is relative to the number of calls per ms. I wonder how many instances of ValueTimestampTimeZone are created per ms and if it makes sense to cache them. From my POV:
  • if there are few (< 2 per ms), we can avoid caching of course
  • if there are too many (>100 per ms), we can avoid caching as this will actually pollute our cache. With an ideal hashing technique, the cache will be full of ValueTimestampTimeZone in 10s.
I suggest making a test on this. Please compute:
  • The number of times CurrentTimestamp.get is called (in total and in avg. per ms)
  • The number of ValueTimestampTimeZone in the cache once every 100 ms (a ratio considering the total size of the cache)
  • The total time and avg. per call spent in CurrentTimestamp.get (with vs without caching)
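
A hypothetical instrumentation sketch for the first metric (calls per ms); everything besides the CurrentTimestamp.get name is illustration only:

import java.util.concurrent.atomic.AtomicLong;

// Temporary counter that CurrentTimestamp.get could call to estimate its call rate.
final class CurrentTimestampStats {
    private static final AtomicLong TOTAL_CALLS = new AtomicLong();
    private static final long START_MS = System.currentTimeMillis();

    static void recordCall() {
        long calls = TOTAL_CALLS.incrementAndGet();
        long elapsedMs = Math.max(1, System.currentTimeMillis() - START_MS);
        if (calls % 100_000 == 0) {
            // report the running total and the average rate every 100k calls
            System.out.println("CurrentTimestamp.get: " + calls + " calls, avg "
                  + (calls / (double) elapsedMs) + " calls/ms");
        }
    }
}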

#20 Updated by Ștefan Roman about 1 year ago

I added the changes.

Committed to 7363b_h2 revision 22.

#21 Updated by Alexandru Lungu about 1 year ago

Review of 7363b_h2 revision 22

  • ValueBytes is not using instanceof to check that obj is of type byte[]. This is fatal and should be fixed right away. If some other value with the same hash collides here, H2 will throw a ClassCastException.
  • I am OK with the other changes

Please fix ValueBytes now. I am planning to do a profiling round asap.

#22 Updated by Ștefan Roman about 1 year ago

Added the change and committed it to 7363b_h2 revision 23.

#23 Updated by Alexandru Lungu about 1 year ago

  • Related to Feature #7447: Compare ValueString and ValueStringIgnoreCase faster in FWD-H2 added

#24 Updated by Alexandru Lungu about 1 year ago

  • Related to Bug #7448: Optimize FWD-H2 ValueTimestampTimeZone and maybe avoid caching added

#25 Updated by Alexandru Lungu about 1 year ago

  • Status changed from WIP to Review
  • % Done changed from 70 to 100

Review of 7363b_h2 revision 23

I am OK with the changes.
Most of the testing doesn't show regressions. I am doing the profiling now.
I created #7447 and #7448 for further work.

#26 Updated by Alexandru Lungu about 1 year ago

  • Status changed from Review to Test

Completely tested - everything looks right.

I tested a large customer application and I got:
  • 425.568 cache hits (1 low level value)
  • 209.659 cache misses (1 low level value)
  • therefore, a ratio of ~66% (1 low level value)
  • 212 cache hits (2 low level values)
  • 748 cache misses (2 low level values)
  • therefore, a ratio of ~12% (2 low level values)

I've done some memory tracking with/without your modifications. The changes are quite slim; the biggest memory spike was 50MB smaller than before for a single user.
Profiling shows modest improvement: around -0.16% at best.

Merged into trunk as rev. 22.
The changes from #7363 are more prominent when the cache hit ratio is better (or even when the cache is bigger). Therefore, #7448 can benefit from these changes.

#27 Updated by Alexandru Lungu 11 months ago

  • Assignee changed from Ștefan Roman to Alexandru Lungu

#28 Updated by Alexandru Lungu 29 days ago

FWD-H2 is now at rev. 48 in trunk. This can be closed.

#29 Updated by Greg Shah 29 days ago

  • Status changed from Test to Closed
