Feature #6815

configure all cache sizes in the directory, and create documentation for them

Added by Constantin Asofiei over 1 year ago. Updated 4 months ago.

Status: Review
Priority: Normal
Target version: -
Start date:
Due date:
% Done: 100%
billable: No
vendor_id: GCD
version:

Related issues

Related to Database - Bug #7388: Create server configuration container for cache sizes Test

History

#1 Updated by Constantin Asofiei over 1 year ago

In 6129a, the prepared statement cache in TempTableDataSourceProvider is still configured to 100. This was not enough, so I will increase it to 65536 in 6129a. In 3821c, it is configured to 1000.

We need to allow configuration of all caches in the directory, plus document them.

#2 Updated by Constantin Asofiei over 1 year ago

This includes the query_cache_size=1024 for temp-tables in H2Helper.setCommonInMemoryProperties.

#3 Updated by Alexandru Lungu 11 months ago

  • Assignee set to Dănuț Filimon

This is already implemented in 7388a.

#4 Updated by Constantin Asofiei 11 months ago

  • Related to Bug #7388: Create server configuration container for cache sizes added

#5 Updated by Constantin Asofiei 11 months ago

Alexandru Lungu wrote:

This is already implemented in 7388a.

We also need the documentation.

#6 Updated by Alexandru Lungu 6 months ago

  • Status changed from New to WIP

Danut, please point to the documentation in this task and mark it as Review when finished.

#7 Updated by Constantin Asofiei 6 months ago

Alexandru, in #7669, I started seeing "DynamicQueryHelper cache level #1 is full. Dumped 1 old entries" messages (some 750k log messages; this uses the default 65536, as there is no config in directory.xml). Is there a way to get info when a cache size is small enough that it starts evicting entries aggressively? Maybe some JMX instrumentation? This may help to find some good values to tune the cache size.

OTOH, for this specific case, it may just be that the parsed predicate is 'one of a kind', in that it uses some unique value which is never reused, so caching it is moot anyway.
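One way to notice that a cache is too small is to count evictions and expose the counter. The following is only a minimal sketch built on `java.util.LinkedHashMap`; `CountingLruCache` and `getEvictions` are hypothetical names for illustration and are not FWD's LRUCache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that counts evictions; not FWD's LRUCache. */
class CountingLruCache<K, V> extends LinkedHashMap<K, V>
{
   private final int capacity;
   private long evictions = 0;

   CountingLruCache(int capacity)
   {
      // accessOrder = true keeps entries ordered from least to most recently used
      super(16, 0.75f, true);
      this.capacity = capacity;
   }

   @Override
   protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
   {
      // called by LinkedHashMap after each insertion
      boolean evict = size() > capacity;
      if (evict)
      {
         evictions++;
      }
      return evict;
   }

   /** Number of entries dropped so far because the cache was full. */
   long getEvictions()
   {
      return evictions;
   }
}
```

A counter that grows about as fast as the number of puts would suggest that entries are evicted before they can be reused, either because the cache is too small or because the keys are never repeated.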

#8 Updated by Dănuț Filimon 6 months ago

Alexandru Lungu wrote:

Danut, please point to the documentation in this task and mark it as Review when finished.

The documentation for the cache sizes is available in Database Configuration. Currently, the CacheManager stores and uses the cache sizes defined in the directory configuration; each cache size is retrieved internally when creating caches (LRUCache or Map). If we want to use the CacheManager to set the size for the property mentioned in #6815-2, we can simply make the CacheManager.getCacheSize method public. Should I go ahead and do that?

#9 Updated by Alexandru Lungu 6 months ago

OTOH, for this specific case, it may just be that the parsed predicate is 'one of a kind', in that it uses some unique value which is never reused, so caching it is moot anyway.

Maybe we can make them "more of a kind" :) I don't know if this is possible without context.

Is there a way to get info when a cache size is small enough that it starts evicting entries aggressively? Maybe some JMX instrumentation? This may help to find some good values to tune the cache size.

I think we have some specific JMX on some caches ... or maybe not. Now that we use the CacheManager, I see a really neat implementation: add dynamic JMX information for each cache generated by the CacheManager. This way we can instrument all caches for fine tuning. This is not implemented, but can be done. I don't quite have the inspiration now to design and implement this, so I will ask Danut to tackle this.

In my head, having a fat JMX storing a map of Cache Name -> Cache Statistics is the way to go. Take for example QueryProfiler, which computes statistics like cache hit / miss / row count etc. for each query. I see cache statistics as storing the number of elements / evictions / cache hits / etc. To avoid any potential overhead, maybe we can deliver InstrumentedLRUCache from the cache manager if we want to do the instrumentation, or simply LRUCache if in production. Constantin, any feedback on this?
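The "Cache Name -> Cache Statistics" idea could look roughly like the sketch below. All class and method names (`CacheStats`, `CacheStatsRegistry`, `InstrumentedCache`) are hypothetical and chosen only for illustration; this is not the CacheManager or InstrumentedLRUCache implementation, and the JMX exposure itself is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-cache counters. */
class CacheStats
{
   final LongAdder hits = new LongAdder();
   final LongAdder misses = new LongAdder();
   final LongAdder puts = new LongAdder();
}

/** Hypothetical registry: cache name -> statistics, the shape a JMX bean could expose. */
class CacheStatsRegistry
{
   private static final Map<String, CacheStats> STATS = new ConcurrentHashMap<>();

   static CacheStats forCache(String name)
   {
      return STATS.computeIfAbsent(name, k -> new CacheStats());
   }
}

/** Hypothetical wrapper that records statistics around a plain map. */
class InstrumentedCache<K, V>
{
   private final Map<K, V> delegate = new HashMap<>();
   private final CacheStats stats;

   InstrumentedCache(String name)
   {
      this.stats = CacheStatsRegistry.forCache(name);
   }

   V get(K key)
   {
      V value = delegate.get(key);
      // a null result is counted as a miss in this sketch
      if (value == null)
      {
         stats.misses.increment();
      }
      else
      {
         stats.hits.increment();
      }
      return value;
   }

   void put(K key, V value)
   {
      delegate.put(key, value);
      stats.puts.increment();
   }
}
```

Keeping the counters in a separate wrapper means the uninstrumented cache class stays untouched, which matches the idea of delivering a plain cache in production.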

#10 Updated by Dănuț Filimon 4 months ago

  • Status changed from WIP to Review
  • % Done changed from 0 to 70

Created 6815a and committed rev.14916. As suggested by Alexandru in #6815-9, I've created a way to instrument caches through the CacheManager. Similar to QueryProfiler, I created CacheProfiler and instrumented a total of 16 methods (number of calls and total execution time).

We also talked about expanding the CacheManager out of the persistence container if we want to configure caches that are outside of this scope. This includes modifying the searched containers (for cache sizes and instrumentation value) and documentation.

Alexandru, please review and modify the Done% if you think there is more work to do. I plan to test a customer application and post the results here.

#11 Updated by Dănuț Filimon 4 months ago

Committed 6815a/rev.14917. Removed unnecessary methods and changed the container accessed for the instrumentation and cache sizes. The container accessed is at the same level as the persistence container and is called cache-config.

As for the results of the instrumentation, I got the following:
| Name | GET (count) | GET Time (ms) | PUT (count) | PUT Time (ms) |
| BufferManager | 23964 | 8.76 | 297 | 0.18 |
| DynamicTablesHelper | 2357 | 32.453 | 306 | 1.626 |
| FQLHelperCache | 2245427 | 479.051 | 1984 | 0.718 |
| FQLPreprocessor | 377640 | 159.429 | 4082 | 1.832 |
| FQLPreprocessor ast | 26339 | 22.65 | 3003 | 1.208 |
| FastFindCache L2 | 2116970 | 92.151 | - | - |
| FastFindCache L3 | 2075153 | 362.788 | 1700792 | 466.015 |
| Persistence | 1558366 | 362.994 | 14043 | 17.436 |
| SortCriterion | 382860 | 165.883 | 7463 | 1.7 |
| TemporaryBuffer | 7894 | 9.726 | 277 | 0.135 |
| Session local/_temp/primary | 5048192 | 5.048 | - | - |
| Session local/<db>/meta | 25419 | 2.605 | - | - |
| Session local/<db>/primary | 2911160 | 299.126 | - | - |
| TempTableDataSourceProvider | 4210094 | 608.273 | 4064 | 1.719 |
| SourceNameMapper search | 23 | 0.096 | 3 | 0.01 |
| SourceNameMapper source | 105954 | 11.696 | - | - |

There were more methods instrumented, but I just noticed an issue that invalidates some of the results so I solved it in 6815a/rev.14918.

#12 Updated by Alexandru Lungu 4 months ago

What other statistics can be obtained? I suspect the ratio is the most important one, so let's get it done. Also, we need a size, so we can understand under what circumstances those statistics were obtained.
Secondly, please address the export function. We want to have an easy to read format (table-like). Maybe use the Textile format we also use for Redmine (for easy copy-paste).
Mind that I didn't review the changes yet; please address the features above first.

Finally, we need to segregate the GET into GET hit and GET miss. We also need the timings separated. Also, we need different options for "instrument" and "profile". We can allow instrumentation without considerable overhead, but profiling may introduce too much of an overhead. Let's have them enabled separately.
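The split into GET hit and GET miss, with separate timings and a derived ratio, can be sketched as follows. `GetStats` and its methods are hypothetical names used only for illustration; this is not CacheProfiler's code.

```java
/** Hypothetical GET statistics, split into hits and misses with separate timings. */
class GetStats
{
   private long hits;
   private long misses;
   private long hitNanos;
   private long missNanos;

   /** Records one GET call and the time it took. */
   void record(boolean hit, long elapsedNanos)
   {
      if (hit)
      {
         hits++;
         hitNanos += elapsedNanos;
      }
      else
      {
         misses++;
         missNanos += elapsedNanos;
      }
   }

   /** hits / (hits + misses), or 0 when no GET was recorded. */
   double hitRatio()
   {
      long total = hits + misses;
      return total == 0 ? 0.0 : (double) hits / total;
   }

   /** Average time of a GET hit in nanoseconds, or 0 when there were no hits. */
   double avgHitNanos()
   {
      return hits == 0 ? 0.0 : (double) hitNanos / hits;
   }

   /** Average time of a GET miss in nanoseconds, or 0 when there were no misses. */
   double avgMissNanos()
   {
      return misses == 0 ? 0.0 : (double) missNanos / misses;
   }
}
```

Since the ratio and averages are derived from the raw counters, they can be computed at export time and do not add work to the cache operations themselves.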

#13 Updated by Dănuț Filimon 4 months ago

  • Status changed from Review to WIP

Alexandru Lungu wrote:

What other statistics can be obtained? I suspect the ratio is the most important one, so let's get it done. Also, we need a size, so we can understand under what circumstances those statistics were obtained.
Secondly, please address the export function. We want to have an easy to read format (table-like). Maybe use the Textile format we also use for Redmine (for easy copy-paste).
Mind that I didn't review the changes yet; please address the features above first.

Finally, we need to segregate the GET into GET hit and GET miss. We also need the timings separated. Also, we need different options for "instrument" and "profile". We can allow instrumentation without considerable overhead, but profiling may introduce too much of an overhead. Let's have them enabled separately.

Currently only the count and execution time are monitored; the ratio can be calculated during export (there is no method for exporting the results atm). I'll start working on the suggested points (the export and the GET hit / GET miss ratio are a must), and I especially like the idea of using the Textile format.

You can put the review on hold for the moment.

#14 Updated by Dănuț Filimon 4 months ago

I added monitoring for the configured and the in-use cache size (latest size of the cache) and for GET hit/miss, and modified the export method to print the results in the Textile format. However, the resulting table is too big to be posted on Redmine due to the number of instrumented methods. Can we consider redirecting the output to a CSV file by adding an additional parameter to toggle between the formats, or disregard the Textile format altogether?

Alexandru Lungu:

Also, we need different options for "instrument" and "profile".

I think instrumentation should always be enabled so that method calls can be counted. Profiling should only monitor the execution time of the methods and can be an additional parameter; when it is used, the average method call time can be calculated based on the counted calls.
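For the export question, both formats can be produced from the same rows. The sketch below is illustrative only: `StatsExportSketch`, `toTextile` and `toCsv` are hypothetical names, and the CSV variant assumes the values contain no commas or quotes, so it does no escaping.

```java
import java.util.List;

/** Hypothetical exporter; not the export method from 6815a. */
class StatsExportSketch
{
   /** Renders a Textile table: header cells use the "|_." prefix. */
   static String toTextile(String[] header, List<String[]> rows)
   {
      StringBuilder sb = new StringBuilder();
      sb.append("|_. ").append(String.join(" |_. ", header)).append(" |\n");
      for (String[] row : rows)
      {
         sb.append("| ").append(String.join(" | ", row)).append(" |\n");
      }
      return sb.toString();
   }

   /** Renders plain CSV; assumes values need no quoting (no commas or quotes). */
   static String toCsv(String[] header, List<String[]> rows)
   {
      StringBuilder sb = new StringBuilder();
      sb.append(String.join(",", header)).append('\n');
      for (String[] row : rows)
      {
         sb.append(String.join(",", row)).append('\n');
      }
      return sb.toString();
   }
}
```

A format parameter could then simply select which of the two methods is called.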

#15 Updated by Alexandru Lungu 4 months ago

Dănuț Filimon wrote:

I added monitoring for the configured and the in-use cache size (latest size of the cache) and for GET hit/miss, and modified the export method to print the results in the Textile format. However, the resulting table is too big to be posted on Redmine due to the number of instrumented methods. Can we consider redirecting the output to a CSV file by adding an additional parameter to toggle between the formats, or disregard the Textile format altogether?

Why is it too big? Shouldn't it be similar to #6815-11, but with some extra columns?
Anyway, you only have the possibility to use print with our FWD JMX. CSV will do, as it can be easily imported into spreadsheets.

I think instrumentation should always be enabled so that method calls can be counted. Profiling should only monitor the execution time of the methods and can be an additional parameter; when it is used, the average method call time can be calculated based on the counted calls.

The instrumentation implies overhead which is not desired in production systems. Instrumentation (and profiling) is meant to be used by us / system administrators that want to tune the caches in testing environments. Thus, please disable the instrumentation by default.
  • the CacheManager should easily return bare caches when instrumentation is disabled
  • the CacheManager should deliver instrumented caches when instrumentation is enabled,
  • the CacheManager should deliver instrumented caches with profiling enabled only when both instrumentation and profiling are enabled.

I think these settings can go in the directory.xml together with the sizes. After the size definition, we can enable instrumentation and profiling eventually (but they are disabled by default). Also, if you have a cache configuration in directory.xml, but no size, let the default size kick in.
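The three rules above, together with the fallback to a default size, can be sketched as a small decision helper. Everything here is hypothetical (`CacheFactorySketch`, `Mode`, `resolveMode`, `resolveSize`), and the `DEFAULT_SIZE` of 65536 is only an assumed placeholder taken from the default mentioned earlier in this task; the real CacheManager may resolve these differently.

```java
import java.util.Map;

/** Hypothetical decision helper; not the CacheManager implementation. */
class CacheFactorySketch
{
   /** Assumed placeholder default, used when no size is configured. */
   static final int DEFAULT_SIZE = 65536;

   enum Mode { BARE, INSTRUMENTED, PROFILED }

   /** Both flags default to false, so production gets bare caches. */
   static Mode resolveMode(boolean instrumentation, boolean profiling)
   {
      if (!instrumentation)
      {
         // profiling alone has no effect without instrumentation
         return Mode.BARE;
      }
      return profiling ? Mode.PROFILED : Mode.INSTRUMENTED;
   }

   /** Returns the configured size for a cache, or the default when none is set. */
   static int resolveSize(Map<String, Integer> configuredSizes, String cacheName)
   {
      Integer size = configuredSizes.get(cacheName);
      return size != null ? size : DEFAULT_SIZE;
   }
}
```

With this shape, a configuration that enables profiling but leaves instrumentation disabled still yields bare caches.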

#16 Updated by Dănuț Filimon 4 months ago

Alexandru Lungu wrote:

Why is it too big? Shouldn't it be similar to #6815-11, but with some extra columns?
Anyway, you only have the possibility to use print with our FWD JMX. CSV will do, as it can be easily imported into spreadsheets.

Not only the get and put methods are instrumented; methods like containsKey, containsValue, getOrDefault, putAll, putIfAbsent, remove and replace are also instrumented. If you also want an "Average" computed from the call count and execution time, we will have additional columns. I am currently looking into removing some of the methods that are not called, based on previous instrumentation results.

The instrumentation implies overhead which is not desired in production systems. Instrumentation (and profiling) is meant to be used by us / system administrators that want to tune the caches in testing environments. Thus, please disable the instrumentation by default.
  • the CacheManager should easily return bare caches when instrumentation is disabled
  • the CacheManager should deliver instrumented caches when instrumentation is enabled,
  • the CacheManager should deliver instrumented caches with profiling enabled only when both instrumentation and profiling are enabled.

I think these settings can go in the directory.xml together with the sizes. After the size definition, we can enable instrumentation and profiling eventually (but they are disabled by default). Also, if you have a cache configuration in directory.xml, but no size, let the default size kick in.

I took this into account and it is already handled.

#17 Updated by Alexandru Lungu 4 months ago

Danut, please merge some of the statistics in one: get, getOrDefault can be merged. The same as put, putAll and putIfAbsent. The others can be standalone: containsKey, containsValue, remove and replace.
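The requested merging can be expressed as a mapping from method name to statistic group. This is only a sketch with hypothetical names (`MergedCounters`, `Group`, `groupOf`); it shows the grouping, not how 6815a implements it.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical call counters grouped as requested; not the 6815a code. */
class MergedCounters
{
   enum Group { GET, PUT, CONTAINS_KEY, CONTAINS_VALUE, REMOVE, REPLACE }

   /** Maps an instrumented method name to the statistic it is reported under. */
   static Group groupOf(String method)
   {
      switch (method)
      {
         case "get":
         case "getOrDefault":
            return Group.GET;
         case "put":
         case "putAll":
         case "putIfAbsent":
            return Group.PUT;
         case "containsKey":
            return Group.CONTAINS_KEY;
         case "containsValue":
            return Group.CONTAINS_VALUE;
         case "remove":
            return Group.REMOVE;
         case "replace":
            return Group.REPLACE;
         default:
            throw new IllegalArgumentException("Not instrumented: " + method);
      }
   }

   private final Map<Group, Long> counts = new EnumMap<>(Group.class);

   /** Counts one call of the given method under its group. */
   void record(String method)
   {
      counts.merge(groupOf(method), 1L, Long::sum);
   }

   long count(Group group)
   {
      return counts.getOrDefault(group, 0L);
   }
}
```

Merging this way reduces the exported table from one column set per method to one per group.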

#18 Updated by Dănuț Filimon 4 months ago

  • Status changed from WIP to Review
  • % Done changed from 70 to 100

Committed 6815a/rev.14919. Reduced the number of instrumented methods (based on results from a large application, those methods were not used at all), combined similar methods, separated instrumentation and profiling. Profiling can be configured in the cache-config container, through profiling (similar to instrumentation, it is a boolean).

I just remembered that the Redmine tables are really responsive and I've seen large tables being posted so we can ignore my previous suggestion of using a csv (we can add it if we really want to).

We also received a fix for the POC, so I'll resume work on other issues and also run the instrumentation.
