Bug #6627

improve conversion memory footprint for AstKey persistent maps

Added by Constantin Asofiei almost 2 years ago. Updated about 1 year ago.

Status:
Review
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD
case_num:
version:

Related issues

Related to Conversion Tools - Bug #6642: improve parsing memory footprint Review

History

#2 Updated by Constantin Asofiei almost 2 years ago

There are these persistent (in the H2 conversion database) maps:

frame_generator_pre.xml:
117: <rule>frameInterfaces    = createMapFrameAstKeyToString("frameInterfaces")</rule> 

p2o_pre.xml
139: <action>tmpTabNames = createMapAstKeyToString("pre_tmpTabNames")</action> 

p2o.xml (2 matches)
537: <action>tmpTabNames = createMapAstKeyToString("tmpTabNames")</action> 
538: <action>tmpTabNodes = createMapAstKeyToAst('tmpTabNodes')</action> 

These maps are keyed by an AstKey or FrameAstKey. The problem here is that each key keeps its AST in memory, even though the AST is used only by equals (which, AFAIK, the map calls only when there is a hashcode collision). My idea here is to:
  • keep the AST id as an instance field, astId
  • keep the ast field null; all access to it goes through a getter which loads the AST (if not already loaded) using the astId field, so the AST is loaded only when actually needed.
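
A minimal sketch of such a lazy key, using hypothetical Ast and AstRegistry stand-ins rather than FWD's real classes (the actual FWD types and loader API differ):

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-ins for the AST and the H2-backed registry; names are hypothetical.
class Ast
{
   final String text;
   Ast(String text) { this.text = text; }
   @Override public boolean equals(Object o) { return o instanceof Ast && ((Ast) o).text.equals(text); }
   @Override public int hashCode() { return text.hashCode(); }
}

class AstRegistry
{
   static final Map<Long, Ast> db = new HashMap<>();
   static int loads = 0;                         // counts loads, to demonstrate laziness
   static Ast load(long id) { loads++; return db.get(id); }
}

public class LazyAstKey
{
   private final long astId;  // persistent AST id, always kept in memory
   private final int hash;    // hash computed at creation, so hashCode() never loads the AST
   private Ast ast;           // stays null until equals() actually needs it

   public LazyAstKey(long astId, int hash)
   {
      this.astId = astId;
      this.hash = hash;
   }

   /** Load the AST from the conversion database only on first access. */
   private Ast getAst()
   {
      if (ast == null)
      {
         ast = AstRegistry.load(astId);
      }
      return ast;
   }

   @Override public int hashCode() { return hash; }

   @Override public boolean equals(Object o)
   {
      if (this == o) return true;
      if (!(o instanceof LazyAstKey)) return false;
      // only a hashcode collision in the map forces the ASTs into memory
      return getAst().equals(((LazyAstKey) o).getAst());
   }
}
```

Since hashCode() returns the precomputed value, map lookups that never collide never touch the database; only an equals call pays the load cost.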

I have some changes in testing which allow this 'lazy loading' of the AST only when reading the map from the H2 database; during map creation, the AST is still kept in memory. I'll see how this conversion goes, and if it is OK with a smaller heap, I'll make the ast fully lazy, even for new map entries. The point is to let the AST be loaded, and remain in memory in the AstKey field, only if equals is ever called.

The above should give these maps a smaller memory footprint, at some penalty to the total conversion time. Considering that these maps are used in conversion phases which are not that time-consuming (about 5 minutes on a large project for P2O processing and temp-table Java code generation), the impact should not be massive.

#3 Updated by Greg Shah almost 2 years ago

It is a good plan and a reasonable trade-off.

#4 Updated by Constantin Asofiei almost 2 years ago

Experimenting showed that the impact of these changes did not have the magnitude I first assumed. The changes I have may improve incremental conversion performance somewhat, but I will not release them until I have a chance to test them more.

As a side note, what the customer was actually experiencing was a conversion heap set too high: the operating system was terminating the Java process with error 137 because the machine was out of resources (this was not an OutOfMemoryError thrown by the JVM). After experimenting with lower heap values, I found a setting where the conversion time was minimally impacted and the customer could convert with that machine's RAM (the conversion heap was reduced by ~25%).

And as a second side note, from experimenting with the customer apps, the FWD conversion does not require much additional memory once the parsing phase is done. During parsing:
  • part of the memory is occupied by the legacy 4GL classes - ClassDefinition structures are kept in memory, there can be tens of thousands of them, and each pins its entire AST in memory. These instances remain in memory for the entire conversion.
  • there are .schema files which can be in the 100 MB range on disk - some analysis of the SchemaDictionary memory footprint should be performed; this can be done easily by converting only the .df files (without any 4GL code) and looking at the used heap (after garbage collection) in VisualVM.
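
As a rough complement to inspecting the heap in VisualVM, the post-GC used heap can also be sampled programmatically. This is only approximate, since System.gc() is merely a hint to the JVM:

```java
public class HeapCheck
{
   /** Approximate used heap in bytes after requesting a GC (System.gc() is only a hint). */
   public static long usedHeapAfterGc()
   {
      System.gc();
      Runtime rt = Runtime.getRuntime();
      return rt.totalMemory() - rt.freeMemory();
   }
}
```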

#5 Updated by Constantin Asofiei almost 2 years ago

  • Related to Bug #6642: improve parsing memory footprint added

#6 Updated by Constantin Asofiei about 1 year ago

  • Status changed from New to Review
In #7079 the customer reported this issue again, and a more stable recreate was found. 6129b/14391 fixes performance for the incremental conversion. The main fixes were:
  • lazy-load the AST used as the map's value.
  • when a sub-AST is required at the key or the value, keep the entire AST file in the registry - otherwise the same AST file will be re-loaded tens or hundreds of times, depending on how many sub-ASTs are required from that file.

In short, the problem above was with the temp-table ASTs: the previous approach reloaded the full AST each time a temp-table AST was required. But, for temp-tables, the full AST must be loaded in the end anyway, so this re-loaded the same AST hundreds of times and hurt the overall performance. The solution was to load the temp-table AST fully, only once.
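
The "load fully, only once" approach amounts to memoizing the file-to-AST lookup. A minimal sketch with hypothetical names (the real FWD registry is more involved):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Memoizes full-AST loads per file, so repeated sub-AST lookups hit the cache
// instead of re-reading the file from disk.
public class AstFileCache
{
   private final Map<String, Object> cache = new HashMap<>();
   private final Function<String, Object> loader;  // hypothetical file -> full-AST loader
   int loads = 0;                                  // counts real loads, for illustration

   public AstFileCache(Function<String, Object> loader)
   {
      this.loader = loader;
   }

   /** Return the full AST for the given file, loading it at most once. */
   public Object getFullAst(String file)
   {
      return cache.computeIfAbsent(file, f -> { loads++; return loader.apply(f); });
   }
}
```

With this in place, asking for a hundred sub-ASTs from the same file triggers a single load instead of a hundred.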

The same approach can't be applied to frame ASTs, as these reside in the main .ast file, and we can't keep all the .ast files in memory.
