Project

General

Profile

Bug #7641

Lower the number of Map.get() calls in the persistence layer

Added by Alexandru Lungu 11 months ago. Updated 10 months ago.

Status:
Review
Priority:
High
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version_reported:
version_resolved:

History

#1 Updated by Alexandru Lungu 11 months ago

From Constantin:

  • AbstractTempTable.getMutableInfo - a plain array can be kept for cases when the LegacyFieldInfo instance is already resolved (thus each field has an unique index in this array).
  • TableMapper.locatePerm - each schema can have an unique index in an ArrayList. This can get managed via removePermanentSchema and mapPermanentDMO, and the only issue is to compress the permanentDBs (which will be an ArrayList) if its size gets over a certain threshold
  • something similar as above can be done for TempTableMapper.TT_CACHE and TempTableMapper.p2j, and also PermanentTableMapper.p2j and j2p maps (where possible).
  • the point of the above is: if the API's caller has access to the DmoMeta, we can have an unique index which will be used instead of its name.

#2 Updated by Alexandru Lungu 11 months ago

  • Status changed from New to WIP
  • Assignee set to Eduard Soltan
  • Priority changed from Normal to High

#3 Updated by Eduard Soltan 11 months ago

In TempTableMapper there is a private final Map<TempTable, LegacyTableInfo> p2j = new HashMap<>(); which should be replaced by an ArrayList.

I tried to index LegacyTableInfo elements in the follwing way, get the DmoMeta data member of TempTable using getDMOInfo method, after that get the DmoMeta id, and use that id as an index in the ArrayList I want to replace p2j.

The problem in current implementation is that TempTable does not have an override for equals and hashCode methods, this means that elements in p2j are mapped base on references of TempTable objects. And multiple elements with the same DmoMeta id , are introduced as keys in p2j, which obviously could not happen with ArrayList.

Also when running a converted application, in on case getAllLegacyFieldInfo method from TableMapper is called after removeClass of TempTableMapper with TempTable parameter. Please follow the table:

method DmoMeta Id TempTable Reference
getAllLegacyFieldInfo 23 1321
removeClass 23 14234

So removeClass deletes mapping from index 23 from ArrayList, remove from p2j mapping with reference 14234.
And getAllLegacyFieldInfo method called after removeClass, find a mapping with reference 1321 in p2j, and find a null at index 23 in ArrayList.

#4 Updated by Alexandru Lungu 11 months ago

Created 7641a.

#5 Updated by Eduard Soltan 11 months ago

Committed on 7641a revision 14683.

fieldInfo in AbstractTempTable became an ArrayList, instead of HashMap.
permanetBDs in TableMapper became an ArrayList, instead of HashMap.
j2p in PermantTableMapper became an ArrayList, instead of Hashmap.

p2j from PermanentTempTable probably can not be switch to ArrayList because methods where p2j is used, returns the DmoMeta which we use to get an id or an index.
Switching p2j from TempTableMapper to ArrayList cause the problem from #7641-3

#6 Updated by Constantin Asofiei 11 months ago

Eduard, fieldInfo (and other structures related to fields) can be 'frozen' as a Java array once this table info was built. This will remove the ArrayList overhead.

#7 Updated by Eduard Soltan 11 months ago

Made fieldInfo a simple Java array, also added to getMutableInfo a new parameter fieldCount, those meaning is number of tables' columns.

TableMapper.MutableFieldInfo getMutableInfo(Integer                    fieldIndex,
                                               int                        fieldCount,     
                                               boolean                    setter, 
                                               Supplier<TableMapper.MutableFieldInfo> field)
{
    if (!setter)
    {

       if (fieldInfo == null || fieldInfo.length <= fieldIndex)
       {
          return null;
       }

       return fieldInfo[fieldIndex];
    }

    if (fieldInfo == null)
    {
       fieldInfo = new TableMapper.MutableFieldInfo[fieldCount + 1];
    }

    TableMapper.MutableFieldInfo info = fieldInfo[fieldIndex];

   if (info == null)
   {
     fieldInfo[fieldIndex] = field.get();
   }

   return fieldInfo[fieldIndex];
}

I am getting fieldCount parameter from DmoMeta, getFieldCount method.

#8 Updated by Alexandru Lungu 10 months ago

  • % Done changed from 0 to 50

I am OK with most of the changes; there seemed to be quite some places which needed refactoring.

Eduard, please consider using int instead of Integer where possible. I committed some changes to SchemaHelper on that matter (rev. 14684 - please run bzr update).

  • I came up with quite an idea. Please "force" DatabaseManager.TEMP_TABLE_SCHEMA to have schema id 0. This way, we can convey that the 0 schema is the _temp schema. Rewrite any DatabaseManager.TEMP_TABLE_SCHEMA.equals(schema) into schemaId 0. Also, you won't need to propagate String schema through all API of TableMapper - so just remove the parameter.
  • Passing DmoMeta so much might be confusing. If somebody is going to call void removeTable, won't expect to need a DmoMeta. Can we strictly require (where possible) only what we need (i.e. getSchemaOrder or getIndexFromSchema)?
  • Please fix typos (e.g. BufferFieldImpl.getLcLegacyId javadoc - "field" instead of "filed")
  • You need history entries to all files you changed.
  • I think it makes sense to have orderInSchema member, rather then schemaOrder, for DMO meta. Or maybe even idInSchema. Mind the getter name as well.
  • You can make schemaIndexNumber a list as well, based on the schema id.
  • You have some mismatchde in the javadoc (parameter is dmoMeta, javadoc param is dmoInfo).
  • if (schemaId null) this shouldn't happen.

Eduard, fieldInfo (and other structures related to fields) can be 'frozen' as a Java array once this table info was built. This will remove the ArrayList overhead.

Can we always use exponentially growing arrays instead of ArrayList, and do the array growing in-house? This will basically reallocate an array twice as large when in need and do the array-copy. This will slow down the server start-up, but will "freeze" the state at some point, avoiding get calls at run-time.

#9 Updated by Alexandru Lungu 10 months ago

Rebased 7641a to trunk. 14711.

#10 Updated by Eduard Soltan 10 months ago

Committed refactoring the code on 7641a revision 14715.

#11 Updated by Alexandru Lungu 10 months ago

  • Status changed from WIP to Review
  • % Done changed from 50 to 100

I committed 7641a rev. 14716 and 14717 with some improvements, including the use of Java array instead of ArrayList. Also, I cached the schemaId at Database to avoid further map look-ups. I am doing some profiling now. There were no regressions some tests of mine. Eduard, please retest 7641a (rev. 14717) with other customer applications.

Constantin, please review.

#12 Updated by Alexandru Lungu 10 months ago

I continued improving 7641a, which is now at rev. 14720. Basically, I moved SchemaHelper to persist, instead of orm package. I also cached the schema id at persistence level, instead of database level. Finally, I ensured that there is no map.get() call through AbstractTempTable, TableMapper and such.

Profiling is not quite good, out of an unknown reason. With rev. 14717 I've got 8.210s of execution time and with rev. 14720 I've for 8.230s of execution time. The base-line is 8.193s. Even so, I've got quite mixed results with rev. 14820 (first run in 8.149, second in 8.259s, one run in 8.329s and another in 8.067s).

I will do a final review tomorrow morning.

#13 Updated by Alexandru Lungu 10 months ago

  • Assignee changed from Eduard Soltan to Alexandru Lungu

Done some improvements and committed 7641a / rev. 14721. I didn't expect to have that many (new) instances of StaticTempTable and TempTableBuilder especially at run-time. In fact, there are maybe hundreds of such instances created in warm runs. Therefore, these cannot be "freezed". I actually created an getEstimatedFieldCount to set-up a reasonable initial size. Surprisingly, this estimation is always on spot, so the field array is created only once per StaticTempTable instance.

This was my last attempt to improve. I can't find any other hot-spot in the changes and there are considerable less map calls.

The final time is 8.226s, which is still +0.3% slower than my baseline.

As this doesn't provide consistent improvements in this state, I am pending this for further review / advice.

#14 Updated by Constantin Asofiei 10 months ago

Alexandru Lungu wrote:

As this doesn't provide consistent improvements in this state, I am pending this for further review / advice.

I'll do some profiling when I get a chance.

Also available in: Atom PDF