Feature #7363
Improve H2 Value caching, hashing and equals
#1 Updated by Alexandru Lungu about 1 year ago
- Assignee set to Ștefan Roman
Recently we enabled support for case-insensitive character values in H2 using ValueStringIgnoreCase. However, ValueStringIgnoreCase.hashCode pops up as a performance issue: toUpperCase may instantiate another String if the target is not already upper-cased, and the current implementation traverses the target string twice (once for toUpperCase and once for hashCode).
In FWD, we handled this with StringHelper.hashCodeCaseInsensitive, which computes the hash directly, without toUpperCase. The same should be done in H2: implement the UPPER_CASE array and a hashCodeCaseInsensitive function in org.h2.util.StringUtils, and use this approach in ValueStringIgnoreCase.
I created 7363a_h2.
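A minimal sketch of the idea, assuming a hypothetical class name (the real code goes into org.h2.util.StringUtils): a precomputed upper-case table lets us hash in a single pass, without calling toUpperCase() and without a second traversal.

```java
// Sketch only: single-pass case-insensitive hash using a lookup table.
public final class StringUtilsSketch {
    // precomputed upper-case table for the ASCII range
    private static final char[] UPPER_CASE = new char[128];
    static {
        for (char c = 0; c < 128; c++) {
            UPPER_CASE[c] = Character.toUpperCase(c);
        }
    }

    public static int hashCodeCaseInsensitive(String s) {
        int hash = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            char c = s.charAt(i);
            // fast path for ASCII via the table; fall back for other chars
            c = c < 128 ? UPPER_CASE[c] : Character.toUpperCase(c);
            hash = 31 * hash + c;    // same polynomial as String.hashCode
        }
        return hash;
    }
}
```

Because the polynomial matches String.hashCode, the result equals the hash of the upper-cased string, but no intermediate String is allocated and the string is traversed only once.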
#2 Updated by Alexandru Lungu about 1 year ago
We have the same issue in FWD, in P2JField.computeHash. Please use hashCodeCaseInsensitive there, instead of .toLowerCase().hashCode().
#3 Updated by Alexandru Lungu about 1 year ago
One last comment here. If you already plan to do some changes to ValueStringIgnoreCase, also consider optimizing ValueStringIgnoreCase.get. This is called thousands of times and pops up in the profiler.
- H2 does some caching of String instances (much like Java does with the String Pool). However, StringUtils.cache works as a fixed-size String Pool with eviction. This way, H2 helps performance by recycling the same String instances as much as possible. If we "play" with the same String instances over and over again, then a == b is more probable in equality checks, so a.equals(b) is faster.
- H2 does the same "caching" technique for ValueStringIgnoreCase using Value.cache. This cache allows us to recycle ValueStringIgnoreCase instances if they store the same String.

This whole caching overhead may take some time to execute. My concern is that ValueStringIgnoreCase will hit the cache case-insensitively (as ValueStringIgnoreCase.equals is case-insensitive), so "aBc" will match the cached "abc". For this matter, H2 does a second equals check to see if the cached entry has the same case as the look-up key. This basically means two equals.
- Easy, but slower: do the final equality check between cache.value and obj.value (instead of s), as obj.value has a higher chance to be a cached string, so .equals may be faster.
- Hard, but faster: change/overload Value.cache to extract from the cache case-sensitively when ValueStringIgnoreCase is used. Basically, check in Value.cache the type of the operands (look-up key / cache entry) and, if they are both STRING_IGNORECASE, cast to ValueStringIgnoreCase and do something like equalsCaseSensitive. This will avoid the second equals in ValueStringIgnoreCase.
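A self-contained sketch of the case-sensitive extraction idea. The type code, cache layout, and equalsCaseSensitive are simplified stand-ins for the actual H2 internals, not the real implementation:

```java
// Sketch: inside the cache look-up, a case-sensitive check for
// STRING_IGNORECASE values replaces the usual case-insensitive equals
// followed by a second same-case verification.
public final class IgnoreCaseCacheSketch {
    static final int STRING_IGNORECASE = 21;          // illustrative type code
    static final Value[] SOFT_CACHE = new Value[1 << 8];

    static class Value {
        final int type;
        final String value;
        Value(int type, String value) { this.type = type; this.value = value; }

        // case-insensitive hash, so "aBc" and "abc" land in the same bucket
        int cacheHash() {
            return type == STRING_IGNORECASE
                    ? value.toUpperCase().hashCode()
                    : value.hashCode();
        }

        boolean equalsCaseSensitive(Value other) {
            return value.equals(other.value);
        }

        boolean equalsIgnoreCase(Value other) {
            return value.equalsIgnoreCase(other.value);
        }
    }

    static Value cache(Value v) {
        int index = v.cacheHash() & (SOFT_CACHE.length - 1);
        Value cached = SOFT_CACHE[index];
        if (cached != null && cached.type == v.type) {
            boolean hit = (v.type == STRING_IGNORECASE)
                    ? cached.equalsCaseSensitive(v)   // one check instead of two
                    : cached.equalsIgnoreCase(v);
            if (hit) {
                return cached;
            }
        }
        SOFT_CACHE[index] = v;                        // override bucket on miss
        return v;
    }
}
```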
#4 Updated by Ștefan Roman about 1 year ago
- % Done changed from 0 to 50
- Status changed from New to WIP
I added hashCodeCaseInsensitive in StringUtils and used it in ValueStringIgnoreCase.hashCode.
I also changed P2JField.computeHash to use hashCodeCaseInsensitive.
Lastly, I changed Value.cache so that it checks whether the values are equal case-sensitively when the types are STRING_IGNORECASE, and removed the second equals in ValueStringIgnoreCase.get because Value.cache now checks and returns the appropriate value.
Committed to 7363a_h2 revision 18.
#5 Updated by Alexandru Lungu about 1 year ago
- % Done changed from 50 to 100
- Status changed from WIP to Review
- Assignee changed from Ștefan Roman to Eric Faulhaber
- vendor_id deleted (GCD)
Review of 7363a_h2 revision 18
I am OK with the changes.
Profiled and got a -0.6% improvement on a customer application. Tests pass successfully.
Merged 7363a_h2 to FWD-H2 trunk as rev. 18. Archived 7363a_h2.
This can be closed.
#6 Updated by Alexandru Lungu about 1 year ago
- Assignee changed from Eric Faulhaber to Ștefan Roman
#7 Updated by Alexandru Lungu about 1 year ago
- vendor_id set to GCD
#8 Updated by Alexandru Lungu about 1 year ago
- % Done changed from 100 to 50
- Status changed from Review to WIP
> I also changed P2JField.computeHash to use hashCodeCaseInsensitive.

Stefan, where did you commit this change?
Please check out 7026d and commit the fix there (that branch is performance-oriented; we usually gather performance patches there).
The P2JField.computeHash change is required ASAP (by the end of the day).
FWD-H2 value cache optimization
We can take this a step further regarding H2 value caching. This is not as urgent as the previous matter.
Problem
We still create a ValueStringIgnoreCase instance with new each time we call get. This costs time in memory allocation and in the garbage collector's de-allocation. We need to make sure we create a new instance only if we don't find a suitable one in the cache already. Currently, we use a unified Value.cache(Value), which allows caching all kinds of types in the same structure (softCache).
For ValueStringIgnoreCase, we have ~500k get calls on a customer application. Most of them are for non-empty strings, so new is triggered.
Let some H2 values check the cache faster, without using the unified Value.cache(Value) interface. I am thinking of Value.cache(int type, Object lowLevelValue, int hashCode) - retrieve the cached instance of type type that wraps lowLevelValue. This will overload the existing Value.cache and work almost the same way:
- check if the bucket is filled with an entry of the same type
  - if yes, do a low-level comparison between lowLevelValue and the underlying value of the cached object. We will need a lowLevelEquals(Object obj) method on Value which returns false by default. For instance, ValueStringIgnoreCase will implement lowLevelEquals, which checks if the value equals lowLevelValue directly.
    - if the values are equal, just return the cached instance and that is all
    - if the values are not equal, this is a cache miss
  - if no, this is a cache miss
- in case of a cache miss, create an overrideCache(Value obj, int hashCode). This will be called with a new instance of ValueStringIgnoreCase, which will simply override the bucket.

Note that you can compute the hashCode only once and it should match the hash of the ValueStringIgnoreCase; use it to identify the proper bucket in the cache.
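The steps above can be sketched as follows. The names Value.cache(int, Object, int), lowLevelEquals and overrideCache come from this note; the cache layout and type code are simplified stand-ins for H2's softCache:

```java
// Sketch of the proposed low-level cache look-up, not the actual H2 code.
public final class LowLevelCacheSketch {
    static final Value[] SOFT_CACHE = new Value[1 << 8];

    static abstract class Value {
        abstract int getType();
        // default: no low-level comparison available
        boolean lowLevelEquals(Object obj) { return false; }
    }

    static final class ValueStringIgnoreCase extends Value {
        static final int TYPE = 21;                    // illustrative type code
        final String value;
        ValueStringIgnoreCase(String value) { this.value = value; }
        @Override int getType() { return TYPE; }
        @Override boolean lowLevelEquals(Object obj) {
            return value.equals(obj);                  // compare the raw String
        }
    }

    // returns the cached instance on a hit, or null on a miss
    static Value cache(int type, Object lowLevelValue, int hashCode) {
        Value cached = SOFT_CACHE[hashCode & (SOFT_CACHE.length - 1)];
        if (cached != null && cached.getType() == type
                && cached.lowLevelEquals(lowLevelValue)) {
            return cached;                             // hit: no new instance
        }
        return null;                                   // miss: caller calls new
    }

    // on a miss, the freshly created instance simply overrides the bucket
    static void overrideCache(Value obj, int hashCode) {
        SOFT_CACHE[hashCode & (SOFT_CACHE.length - 1)] = obj;
    }
}
```

The point is that new only runs on the null path, so a hit costs one type check plus one low-level comparison, and no allocation.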
Extension
If this works properly, we can use the same technique for other types as well (ValueLong, ValueString, etc.). I can't tell if this will bring a major performance boost, but it is worth a try. I will do a profiling round once this is finished.
I created 7363b_h2 for any related work.
#9 Updated by Ștefan Roman about 1 year ago
I changed toLowerCase().hashCode() into StringHelper.hashCodeCaseInsensitive() in P2JField.computeHash().
Committed to 7026d revision 14575.
#10 Updated by Alexandru Lungu about 1 year ago
Ștefan, please provide a (comprehensive) report on your progress here. Include what has been implemented and what is left. Please share any serious impediments along the way, if any.
I would also like to have a testing and custom profiling round at the end. That is, run testFWD and a large customer application at least.
Please compute a table here (use Redmine's hypertext feature) with:
- every value type from H2
- whether it uses the value cache (Value.cache). If yes, whether you've already refactored it / are going to refactor it / can't refactor it due to a logical constraint.
- the number of new calls in <value-type>.get. For each call, tell whether it can be optimized or not.
- whether it overrides lowLevelEquals: already written the code for it / planning to do so / can't use a low-level implementation here.
- what hashing you are using for the cache look-up: the same hash technique as for the value / a custom different hash of the low-level value / can't compute the hash over a low-level value to use with the cache.

Snapshot your current work in this table. Edit it as you implement further.
Profiling
After completion, please make a Java test stressing H2 with numerous values of different kinds (close to all available in H2, but most importantly the ones used by FWD). Run it before / after the changes. Use prepared statements to avoid allocating new resources for each statement.
I would expect:
- Fewer objects allocated on the heap in total (check with VisualVM).
- Faster execution time due to faster cache look-up (check with System.nanoTime).
- Correct results: make sure you end up with a proper result in regard to your expectation.
- A complete profiling report for each value in the table above. Thus, track how much time was spent before / after your changes in each <value-type>.get. This should go in extra columns of the table: Time before, Time after, Diff.

Make sure each profiling result is computed by averaging several (usually 3) individual runs. Each run should do ~10 iterations (a for loop with 10 iterations, basically). The time of a run is the average of the 10 iterations, excluding the cold one (the first). Also, make sure the profiling test takes ~10 seconds in total, so that any fluctuation is way more visible.
Please make this profiling test easy to use, as we may want to share and re-use it in the future. Therefore, consider implementing a nice small Java "Value profiler" that allows us to check how fast H2 handles internal values. Note that for now the whole per-value profiling can't be done externally (as it needs changes to the H2 internals to trace <value-type>.get). Therefore, profile only an overall time.
After this is completed, we prefer to integrate the profiling tool into H2 so we can run it from the CLI (./build.sh benchmark value-test).
#11 Updated by Ștefan Roman about 1 year ago
- % Done changed from 50 to 70
I started with ValueStringIgnoreCase and everything worked as you suggested. Value.cache(int type, Object lowLevelValue, int hashCode) and overrideCache(Value obj, int hashCode) seem to do the job. I still have to do more detailed profiling to make sure that new is indeed called fewer times.
I changed Value.cache in almost all types that used a similar approach.
Here is the table with the changes I have already made:
Type | Uses Value.cache(TYPE, lowLevelValue, hashCode) | Reason | lowLevelEquals | Hash | Time with old Value.cache | Time with new Value.cache | Time difference
---|---|---|---|---|---|---|---
Array | No | get has multiple parameters | return false; | | | |
Boolean | No | get doesn't call new | return false; | | | |
Byte | Yes | | return obj instanceof Byte && value == (byte) obj; | value | 0.992 s | 0.912 s | -0.080 s
Bytes | Yes | | return this.value.equals(obj); | Utils.getByteArrayHash(value) | 1.024 s | 0.937 s | -0.087 s
CollectionBase | No | Doesn't use Value.cache(Value) | The class is abstract, so there is no need to override it | | | |
Date | Yes | | return obj instanceof Long && this.dateValue == (long) obj; | (int) (value ^ (value >>> 32)) | 1.457 s | 1.462 s | +0.005 s
Decimal | Yes | | return this.value.equals(obj); | BigInteger.hashCode | 1.057 s | 1.094 s | +0.037 s
Double | Yes | | return obj instanceof Double && this.value == (double) obj; | long hash = Double.doubleToRawLongBits(value); return (int) (hash ^ (hash >>> 32)); | 1.311 s | 1.326 s | +0.015 s
Enum | No | It does not extend Value | | | | |
EnumBase | No | get has multiple parameters | return false; | | | |
Float | Yes | | return obj instanceof Float && this.value == (float) obj; | Float.floatToRawIntBits(value) | 0.935 s | 0.912 s | -0.023 s
Geometry | No | get has multiple parameters | return false; | | | |
Int | No | It uses ValueInt.DYNAMIC_CACHE / ValueInt.STATIC_CACHE | return false; | | | |
Interval | No | Does not use get and has multiple parameters on similar functions | return false; | | | |
JavaObject | No | It does not extend Value | | | | |
Json | No | get has multiple parameters | return false; | | | |
Lob | No | Doesn't use Value.cache(Value) | return false; | | | |
LobDb | No | Doesn't use Value.cache(Value) | return false; | | | |
Long | Yes | | return obj instanceof Long && this.value == (long) obj; | (int) (value ^ (value >> 32)) | 0.991 s | 1.020 s | +0.029 s
Null | No | Doesn't use Value.cache(Value) | return false; | | | |
ResultSet | No | Doesn't use Value.cache(Value) | return false; | | | |
Row | No | Doesn't use Value.cache(Value) | return false; | | | |
Short | Yes | | return obj instanceof Short && this.value == (short) obj; | value | 1.078 s | 0.990 s | -0.088 s
String | Yes | | return this.value.equals(obj); | ValueString.hashCode() | 0.811 s | 0.808 s | -0.003 s
StringFixed | Yes | | return this.value.equals(obj); | String.hashCode() | 1.161 s | 1.144 s | -0.017 s
StringIgnoreCase | Yes | | return this.value.equals(obj); | StringUtils.hashCodeCaseInsensitive(value) | 1.334 s | 1.032 s | -0.302 s
Time | Yes | | return obj instanceof Long && this.nanos == (long) obj; | (int) (value ^ (value >>> 32)) | 1.282 s | 1.298 s | +0.016 s
Timestamp | Yes | | return obj1 instanceof Long && obj2 instanceof Long && this.dateValue == (long) obj1 && this.timeNanos == (long) obj2; | (int) (dateValue ^ (dateValue >>> 32) ^ timeNanos ^ (timeNanos >>> 32)) | 1.167 s | 1.189 s | +0.022 s
TimestampTimezone | No | get has multiple parameters | return false; | | | |
TimeTimezone | Yes | | return obj1 instanceof Long && obj2 instanceof Integer && this.nanos == (long) obj1 && this.timeZoneOffsetSeconds == (int) obj2; | (int) (nanos ^ (nanos >>> 32) ^ timeZoneOffsetSeconds) | 2.388 s | 2.871 s | +0.483 s
Uuid | Yes | | return obj1 instanceof Long && obj2 instanceof Long && this.high == (long) obj1 && this.low == (long) obj2; | (int) ((high >>> 32) ^ high ^ (low >>> 32) ^ low) | 2.268 s | 2.093 s | -0.175 s
#12 Updated by Alexandru Lungu about 1 year ago
- Subject changed from Improve H2 ValueStringIgnoreCase.hashCode to avoid the use of toUpperCase to Improve H2 Value caching, hashing and equals
Some remarks:
- Bytes, String and Date, as I understand, are still WIP.
- Date has an interesting static API. I don't think we can do the caching over the other inputs (after String or TimeZone). Stick to optimizing Date.fromDateValue. In the future, we may want to check how we can optimize this Date for FWD's use.
- Note that Decimal has two get methods. Consider caching the hashCode in ValueDecimal. For small numbers, this is fast. However, decimals with several digits may have a costly hashCode.
- JavaObject extends ValueBytes, which extends Value. Even so, I don't think it can be optimized due to several parameters being used.
- Time uses fromNanos, which uses Value.cache. fromNanos has only one parameter, namely long nanos. This is the only low-level value stored by Time. Please consider it for improvement.

I agree with the rest. However, it really feels like we can cover several more values with a two-parameter cache look-up. That is, overload Value.cache(int type, Object firstLowLevelValue, Object secondLowLevelValue, int hashCode) and lowLevelEquals(Object firstLowLevelValue, Object secondLowLevelValue). This way we cover ValueUuid, ValueTimeTimeZone, ValueTimestamp and ValueEnumBase.
Finally, ValueTimestampTimeZone is used across FWD-H2 to measure different internal times. However, this kind of value is usually built using System.currentTimeMillis. Such a value will almost certainly miss the cache, so we can avoid the whole caching look-up for it. We will discuss it later.
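The two-parameter look-up could be sketched as below, here with a UUID-like value backed by two longs. The signatures mirror this note; the type code and cache layout are simplified assumptions:

```java
// Sketch of the two-low-level-value cache overload, not the actual H2 code.
public final class TwoValueCacheSketch {
    static final Value[] SOFT_CACHE = new Value[1 << 8];

    static abstract class Value {
        abstract int getType();
        boolean lowLevelEquals(Object first, Object second) { return false; }
    }

    // e.g. a UUID-like value backed by two longs
    static final class ValueUuid extends Value {
        static final int TYPE = 36;                    // illustrative type code
        final long high, low;
        ValueUuid(long high, long low) { this.high = high; this.low = low; }
        @Override int getType() { return TYPE; }
        @Override boolean lowLevelEquals(Object first, Object second) {
            return first instanceof Long && second instanceof Long
                    && high == ((Long) first).longValue()
                    && low == ((Long) second).longValue();
        }
    }

    // hit: cached instance; miss: null, caller creates and overrides the bucket
    static Value cache(int type, Object first, Object second, int hashCode) {
        Value cached = SOFT_CACHE[hashCode & (SOFT_CACHE.length - 1)];
        if (cached != null && cached.getType() == type
                && cached.lowLevelEquals(first, second)) {
            return cached;
        }
        return null;
    }

    static void overrideCache(Value obj, int hashCode) {
        SOFT_CACHE[hashCode & (SOFT_CACHE.length - 1)] = obj;
    }
}
```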
#13 Updated by Ștefan Roman about 1 year ago
I updated the table according to the new changes.
I changed the get method in ValueBytes and ValueString, fromNanos in ValueTime and fromDateValue in ValueDate.
#14 Updated by Alexandru Lungu about 1 year ago
Ștefan, please prepare a commit on 7363b_h2 with your latest changes. Make sure you update #7363-11 accordingly. Also, post a status on your progress so far. I am ready to review/test/profile a functional version of your changes by tomorrow EOD.
#15 Updated by Ștefan Roman about 1 year ago
I made a profiler to test the new caching method.
I updated the table with the average time of 5 runs for each value type. The total time of the new caching method (21.255 s) is smaller than the old one (21.637 s). The TimeTimezone test result is strange: it is ~20% slower. I will investigate that particular value further.
I will commit 7363b_h2 ASAP; I want to test the changes with a large application before committing.
#16 Updated by Ștefan Roman about 1 year ago
Committed to 7363b_h2 revision 20.
#17 Updated by Alexandru Lungu about 1 year ago
Review of 7363b_h2 revision 20.
- Use a space after the if keyword to separate it from ( (according to the H2 standard).
- Use else on the same line as the closing brace } (according to the H2 standard).
- In fact, there is no need for an else branch; please use one single fat if conditional; don't extract cachedType.
- Please override the hash of the object if it is already computed. For example, in ValueBytes, as you already computed hash, you can assign it to the newly created object (obj.hash = hash). This avoids a second hash computation when the object is first used. Unfortunately, this slow-down couldn't have been caught by your profiling.
- For ValueFloat and ValueDouble, you slightly changed the equals approach. H2 uses Double.compare(a, b) == 0 and Float.compare(a, b) == 0. Please do the same, instead of using ==.
- In ValueString, you are using StringUtils.cache twice. There is no need for the second cache check.
- In ValueStringFixed and ValueStringIgnoreCase, it is fine to let lowLevelEquals be inherited; there is no need to redefine it.

In general, obj instanceof Long && this.nanos == (long) obj doesn't feel like a safe comparison. Most probably it will work, as the (long) cast will unbox the Long, but since you do an instanceof Long, I expect the cast to be done to Long, not long. An ideal solution is obj instanceof Long && this.nanos == ((Long) obj).longValue(). This is the same code the JDK uses behind the scenes in equals, so I guess it is better tested. The same goes anywhere you do such primitive comparisons (i.e. Long, Integer, etc.).
#18 Updated by Ștefan Roman about 1 year ago
I added the changes.
Committed to 7363b_h2 revision 21.
#19 Updated by Alexandru Lungu about 1 year ago
Review of 7363b_h2 revision 21
- ValueByte.lowLevelEquals should use ((Byte) obj).byteValue() to stay consistent with the other implementations.
- ValueBytes.lowLevelEquals is not right: you are using this.value.equals(obj) instead of the current ValueBytes.equals implementation, Arrays.equals(value, ((ValueBytes) other).value). Please change it.
  - Please note that H2 uses a different hash-code computation for byte arrays over/under 50 elements.
  - Please extend your test-cases with byte arrays over/under 50 elements so that you trigger both Utils.getByteArrayHash computations.
- ValueDate / ValueLong / ValueTime / others: I think the cognitive complexity of Long.compare(this.dateValue, ((Long) obj).longValue()) == 0 is too high. Please fall back to this.dateValue == ((Long) obj).longValue(). Doing compare makes sense for Double and Float because == is subject to precision issues, while compare does a better job by retrieving more complex representations of the numeric data. Please let only Double and Float use compare.
- ValueDecimal: the implementation here mismatches the original implementation:
  - use a space between if and (
  - get(BigInteger) is not querying the cache with a BigDecimal, but with a BigInteger. This will always fail and miss the cache! Please create a new BigDecimal(bigInteger) and use it for the hash, the cache look-up and the instantiation. Otherwise, lowLevelEquals will always try to compare a BigDecimal with a BigInteger and it will fail.
  - Please extend your test-cases to cover this case: assign a BigInteger to a BigDecimal field.
- ValueString doesn't use String.hashCode, but a custom, faster implementation of string hashing, which can be found in ValueString.hashCode. This should be used for the cache look-up.
  - Move the hash computation into a separate method (protected static int computeHashCode(String value)). Use it in ValueString.hashCode and ValueString.get.
  - Especially when you do obj.hash = hash, you assign a differently computed hash compared to the one computed by ValueString.hashCode.
  - Please extend your test-cases with strings over/under 16 characters so that you trigger both ValueString.hashCode computations.
- The same goes for ValueStringFixed; please use ValueString.computeHashCode for that matter, not String.hashCode.
  - You are missing obj.hash = hash for ValueStringFixed; the hash computation is quite heavy for string data.
- The same goes for ValueStringIgnoreCase; please use ValueString.computeHashCode for that matter, not String.hashCode.
  - You are missing obj.hash = hash for ValueStringIgnoreCase; the hash computation is quite heavy for string data.
- The ValueTimeTimezone performance decrease is really bad. Please take a look into it and try to understand why it is happening. Check that you don't add too much overhead in the caching and that you have the same cache hit rate as before.
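The ValueDecimal point can be demonstrated in isolation: a BigDecimal never equals a BigInteger, so querying the cache with the raw BigInteger always misses, while wrapping it first fixes the look-up. The get/cache names discussed above belong to H2; this snippet only shows the equality behavior:

```java
// Demonstration of the BigInteger/BigDecimal cache-miss concern.
import java.math.BigDecimal;
import java.math.BigInteger;

public final class DecimalCacheSketch {
    public static void main(String[] args) {
        BigInteger bigInteger = new BigInteger("12345678901234567890");
        BigDecimal cached = new BigDecimal("12345678901234567890");

        // broken look-up: BigDecimal.equals(BigInteger) is always false
        assert !cached.equals(bigInteger);

        // fixed look-up: wrap the BigInteger before hashing and comparing
        BigDecimal key = new BigDecimal(bigInteger);
        assert cached.equals(key);
        assert cached.hashCode() == key.hashCode();
    }
}
```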
Avoid caching ValueTimestampTimeZone
I see a potential optimization of CurrentTimestamp.get. This retrieves a ValueTimestampTimeZone based on System.currentTimeMillis(). However, this means that the parameters have a really high variance: the chance of a cache hit is relative to the number of calls per ms. I wonder how many instances of ValueTimestampTimeZone are created per ms and if it makes sense to cache them. From my POV:
- if there are few (< 2 per ms), we can avoid caching, of course
- if there are too many (> 100 per ms), we can avoid caching as well, as this will actually pollute our cache. With an ideal hashing technique, the cache would be full of ValueTimestampTimeZone in 10 s.

Please measure:
- The number of times CurrentTimestamp.get is called (in total and on avg. per ms)
- The number of ValueTimestampTimeZone in the cache once every 100 ms (a ratio considering the total size of the cache)
- The total time and avg. per call spent in CurrentTimestamp.get (with vs. without caching)
#20 Updated by Ștefan Roman about 1 year ago
I added the changes.
Committed to 7363b_h2 revision 22.
#21 Updated by Alexandru Lungu about 1 year ago
Review of 7363b_h2 revision 22
- ValueBytes is not using instanceof to check that obj is of type byte[]. This is fatal and should be fixed right away. If some other value with the same hash collides here, H2 will throw a ClassCastException.
- I am OK with the other changes.

Please fix ValueBytes now. I am planning to do a profiling round ASAP.
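The guarded comparison asked for above can be shown in a few lines. The field and method names follow the thread, not H2 itself; without the instanceof check, a hash collision with a non-byte[] entry would throw ClassCastException:

```java
// Sketch: instanceof guard before comparing the raw byte arrays.
import java.util.Arrays;

public final class BytesGuardSketch {
    final byte[] value;
    BytesGuardSketch(byte[] value) { this.value = value; }

    boolean lowLevelEquals(Object obj) {
        // colliding entries of another type are simply a cache miss
        return obj instanceof byte[] && Arrays.equals(value, (byte[]) obj);
    }
}
```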
#22 Updated by Ștefan Roman about 1 year ago
Added the change and committed it to 7363b_h2 revision 23.
#23 Updated by Alexandru Lungu about 1 year ago
- Related to Feature #7447: Compare ValueString and ValueStringIgnoreCase faster in FWD-H2 added
#24 Updated by Alexandru Lungu about 1 year ago
- Related to Bug #7448: Optimize FWD-H2 ValueTimestampTimeZone and maybe avoid caching added
#25 Updated by Alexandru Lungu about 1 year ago
- Status changed from WIP to Review
- % Done changed from 70 to 100
#26 Updated by Alexandru Lungu about 1 year ago
- Status changed from Review to Test
Completely tested - everything looks right.
I tested a large customer application and got:
- 425.568 cache hits (1 low-level value)
- 209.659 cache misses (1 low-level value)
- therefore, a ratio of ~66% (1 low-level value)
- 212 cache hits (2 low-level values)
- 748 cache misses (2 low-level values)
- therefore, a ratio of ~12% (2 low-level values)

I've done some memory tracking with/without your modifications. The changes are quite slim; the biggest memory spike was 50MB less than before for a single user.
Profiling shows a modest improvement: around -0.16% at best.
Merged into trunk as rev. 22.
The changes from #7363 are more prominent when the cache ratio is better (or even when the cache is bigger). Therefore, #7448 can benefit from these changes.
#27 Updated by Alexandru Lungu 11 months ago
- Assignee changed from Ștefan Roman to Alexandru Lungu
#28 Updated by Alexandru Lungu 29 days ago
FWD-H2 is now at rev. 48 in trunk. This can be closed.