Project

General

Profile

Bug #2569

memory leaks of procedure, buffer, handle resources

Added by Eric Faulhaber about 9 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Normal
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:

tt_buffer_gc_shortest_paths.txt Magnifier (14 KB) Eric Faulhaber, 05/03/2015 05:17 AM

ca_upd20150503a.zip (98.5 KB) Constantin Asofiei, 05/03/2015 01:56 PM

ca_upd20150503b.zip (99.2 KB) Constantin Asofiei, 05/03/2015 04:38 PM

ca_upd20150504a.zip (125 KB) Constantin Asofiei, 05/04/2015 09:25 AM

ecf_upd20150517a.zip (409 KB) Eric Faulhaber, 05/18/2015 12:23 AM

ecf_upd20150518a.zip (409 KB) Eric Faulhaber, 05/18/2015 05:40 AM

ecf_upd20150518b.zip (408 KB) Eric Faulhaber, 05/18/2015 03:10 PM

History

#1 Updated by Eric Faulhaber about 9 years ago

See #2555 for the beginning of this discussion, which diverged from the original focus of that issue.

#2 Updated by Eric Faulhaber about 9 years ago

A related matter is that since all dynamically defined temp-tables are created at global scope, we have additional, strong references to ExternalProgramWrapper instances which have lived beyond their intended lifespan, via this reference chain:

Class Name                                                                                                              | Ref. Objects | Shallow Heap | Ref. Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                        |              |              |                   |              
java.lang.Thread @ 0x71e2801b8  Conversation [00000003:customer_appserver] Native Stack, Thread                          |       17,790 |          104 |         4,838,880 |        30,352
'- <Java Local> com.goldencode.p2j.net.DirectSession @ 0x71e280918                                                      |       17,790 |           32 |         4,838,880 |           128
   '- context com.goldencode.p2j.security.SecurityContext @ 0x646dd7d68                                                 |       17,790 |           72 |         4,838,880 |    86,258,152
      '- tokenMap java.util.HashMap @ 0x646dd7e40                                                                       |       17,790 |           48 |         4,838,880 |    86,257,896
         '- table java.util.HashMap$Entry[128] @ 0x73e4c0208                                                            |       17,790 |          528 |         4,838,880 |    86,257,848
            |- [74] java.util.HashMap$Entry @ 0x646dda068                                                               |       16,425 |           32 |         4,467,600 |        83,240
            |  '- value com.goldencode.p2j.security.ContextLocal$Wrapper @ 0x646dda088                                  |       16,425 |           24 |         4,467,600 |        83,208
            |     '- contents com.goldencode.p2j.util.TransactionManager$WorkArea @ 0x646dda0a0                         |       16,425 |          104 |         4,467,600 |        83,184
            |        '- globalBlock com.goldencode.p2j.util.BlockDefinition @ 0x646dda400                               |       16,425 |           80 |         4,467,600 |        81,144
            |           '- finalizables java.util.ArrayList @ 0x646dda468                                               |       16,425 |           24 |         4,467,600 |        81,040
            |              '- elementData java.lang.Object[6246] @ 0x72c43c128                                          |       16,425 |       25,000 |         4,467,600 |        81,016
            |                 |- [109] com.goldencode.p2j.persist.TemporaryBuffer @ 0x60c3b9bb8                         |           13 |          272 |             3,536 |         2,840
            |                 |  '- tempTableRef com.goldencode.p2j.persist.TempTableBuilder @ 0x60c3b1138              |           12 |           80 |             3,264 |         6,624
            |                 |     '- instProc com.goldencode.p2j.util.handle @ 0x60c3b1188                            |           12 |           16 |             3,264 |         6,040
            |                 |        '- value com.goldencode.p2j.util.ExternalProgramWrapper @ 0x60c3b1198            |           12 |           24 |             3,264 |         6,024
            |                 |           '- referent com.something.server.common.Apictrl0 @ 0x60c3b11b0|           12 |          320 |             3,264 |         5,976
            |                 |              |- ttmap com.goldencode.p2j.persist.$__Proxy18 @ 0x60c3b1990               |            1 |           64 |               272 |           200
            |                 |              |- bufttxmltrans com.goldencode.p2j.persist.$__Proxy21 @ 0x60c3b32f8       |            1 |           64 |               272 |           200
            |                 |              |- bufsubTtsearch com.goldencode.p2j.persist.$__Proxy20 @ 0x60c3b52c0      |            1 |           64 |               272 |           200
            |                 |              |- ttapiprocs com.goldencode.p2j.persist.$__Proxy17 @ 0x60c3b1640          |            1 |           64 |               272 |           200
            |                 |              |- ttsearchBuf2 com.goldencode.p2j.persist.$__Proxy20 @ 0x60c3b4e28        |            1 |           64 |               272 |           200
            |                 |              |- bufTtxmltrans com.goldencode.p2j.persist.$__Proxy21 @ 0x60c3b4ad8       |            1 |           64 |               272 |           200
            |                 |              |- ttsearch com.goldencode.p2j.persist.$__Proxy20 @ 0x60c3b2030            |            1 |           64 |               272 |           200
            |                 |              |- ttorder com.goldencode.p2j.persist.$__Proxy16 @ 0x60c3b12f0             |            1 |           64 |               272 |           200
            |                 |              |- ttxmltrans com.goldencode.p2j.persist.$__Proxy21 @ 0x60c3b2420          |            1 |           64 |               272 |           200
            |                 |              |- ttfiltercriteria com.goldencode.p2j.persist.$__Proxy30 @ 0x60c3b2920    |            1 |           64 |               272 |           200
            |                 |              |- ttfilter com.goldencode.p2j.persist.$__Proxy19 @ 0x60c3b1ce0            |            1 |           64 |               272 |           200
            |                 |              |- bufparentttsearch com.goldencode.p2j.persist.$__Proxy20 @ 0x60c3b2fa8   |            1 |           64 |               272 |           200
            |                 |              '- Total: 12 entries                                                       |              |              |                   |              
            |                 |- [145] com.goldencode.p2j.persist.TemporaryBuffer @ 0x60c391fd0                         |           13 |          272 |             3,536 |         2,840
            |                 |- [91] com.goldencode.p2j.persist.TemporaryBuffer @ 0x74effcb88                          |           13 |          272 |             3,536 |         2,840
            |                 |- [127] com.goldencode.p2j.persist.TemporaryBuffer @ 0x74ef8e5e0                         |           13 |          272 |             3,536 |         2,840
            |                 |- [37] com.goldencode.p2j.persist.TemporaryBuffer @ 0x75723fd08                          |           13 |          272 |             3,536 |         2,840
            |                 |- [28] com.goldencode.p2j.persist.TemporaryBuffer @ 0x757242698                          |           13 |          272 |             3,536 |         2,840
            |                 |- [19] com.goldencode.p2j.persist.TemporaryBuffer @ 0x757245028                          |           13 |          272 |             3,536 |         2,840
            |                 |- [82] com.goldencode.p2j.persist.TemporaryBuffer @ 0x7572484f0                          |           13 |          272 |             3,536 |         2,840
            |                 |- [73] com.goldencode.p2j.persist.TemporaryBuffer @ 0x75724ae70                          |           13 |          272 |             3,536 |         2,840
            |                 |- [64] com.goldencode.p2j.persist.TemporaryBuffer @ 0x75724d800                          |           13 |          272 |             3,536 |         2,840
            |                 |- [55] com.goldencode.p2j.persist.TemporaryBuffer @ 0x757250190                          |           13 |          272 |             3,536 |         2,840
            |                 |- [118] com.goldencode.p2j.persist.TemporaryBuffer @ 0x60c3ba7c0                         |           13 |          272 |             3,536 |         2,840
            |                 |- [46] com.goldencode.p2j.persist.TemporaryBuffer @ 0x75723d378                          |           13 |          272 |             3,536 |         2,840
            |                 |- [154] com.goldencode.p2j.persist.TemporaryBuffer @ 0x60c3bc7c8                         |           13 |          272 |             3,536 |         2,840
            |                 |- [100] com.goldencode.p2j.persist.TemporaryBuffer @ 0x60b68aa70                         |           13 |          272 |             3,536 |         2,840
            |                 |- [136] com.goldencode.p2j.persist.TemporaryBuffer @ 0x60c365c70                         |           13 |          272 |             3,536 |         2,840
            |                 |- [5287] com.goldencode.p2j.persist.TemporaryBuffer @ 0x6b4a2d620                        |           12 |          272 |             3,264 |         1,688
            |                 |- [2038] com.goldencode.p2j.persist.TemporaryBuffer @ 0x5dc4c4dc8                        |           12 |          272 |             3,264 |         1,752
            |                 |- [588] com.goldencode.p2j.persist.TemporaryBuffer @ 0x61d36a378                         |           12 |          272 |             3,264 |         1,736
            |                 |- [5593] com.goldencode.p2j.persist.TemporaryBuffer @ 0x5c243d968                        |           12 |          272 |             3,264 |         2,072
            |                 |- [2418] com.goldencode.p2j.persist.TemporaryBuffer @ 0x612a52630                        |           12 |          272 |             3,264 |         1,544
            |                 |- [3154] com.goldencode.p2j.persist.TemporaryBuffer @ 0x72e46c738                        |           12 |          272 |             3,264 |         1,496
            |                 |- [4530] com.goldencode.p2j.persist.TemporaryBuffer @ 0x6cf6c9118                        |           12 |          272 |             3,264 |         1,688
            |                 |- [4920] com.goldencode.p2j.persist.TemporaryBuffer @ 0x69adb0030                        |           12 |          272 |             3,264 |         1,608
            |                 |- [5703] com.goldencode.p2j.persist.TemporaryBuffer @ 0x69eb99720                        |           12 |          272 |             3,264 |         1,544
            |                 '- Total: 25 of 2,533 entries; 2,508 more                                                 |              |              |                   |              
            |- [33] java.util.HashMap$Entry @ 0x754ac0510                                                               |        1,331 |           32 |           362,032 |    14,133,648
            |- [68] java.util.HashMap$Entry @ 0x6473a36f8                                                               |           34 |           32 |             9,248 |        53,568
            '- Total: 3 entries                                                                                         |              |              |                   |              
com.goldencode.p2j.persist.TemporaryBuffer @ 0x69a79cf88 Native Stack                                                   |           25 |          272 |             6,800 |         1,576
com.goldencode.p2j.util.ExternalProgramWrapper @ 0x74136b038 Native Stack                                               |           13 |           24 |             3,536 |            24
com.goldencode.p2j.persist.$__Proxy20 @ 0x74136abd8 Native Stack                                                        |           12 |           64 |             3,264 |           200
java.util.HashMap$Entry @ 0x5c12d5f00 Native Stack                                                                      |            3 |           32 |               816 |            32
com.goldencode.p2j.persist.$__Proxy21 @ 0x74136b188 Native Stack                                                        |            2 |           64 |               544 |           200
java.util.HashMap$Entry @ 0x6d425e450 Native Stack                                                                      |            1 |           32 |               272 |            32
com.goldencode.p2j.persist.$__Proxy928 @ 0x74136b050 Native Stack                                                       |            1 |           64 |               272 |           272
java.util.HashMap$Entry @ 0x5c12d6010 Native Stack                                                                      |            1 |           32 |               272 |            32
com.goldencode.p2j.persist.$__Proxy928 @ 0x74136b2c0 Native Stack                                                       |            1 |           64 |               272 |           200
com.goldencode.p2j.persist.$__Proxy926 @ 0x74136ad88 Native Stack                                                       |            1 |           64 |               272 |           200
java.util.HashMap$Entry @ 0x5c12d6928 Native Stack                                                                      |            1 |           32 |               272 |            32
com.goldencode.p2j.persist.$__Proxy30 @ 0x74136ae88 Native Stack                                                        |            1 |           64 |               272 |           200
com.goldencode.p2j.persist.$__Proxy20 @ 0x74136ab50 Native Stack                                                        |            1 |           64 |               272 |           200
com.goldencode.p2j.persist.$__Proxy926 @ 0x74136aee0 Native Stack                                                       |            1 |           64 |               272 |           200
com.goldencode.p2j.persist.QueryWrapper @ 0x74136b228 Native Stack                                                      |            1 |           80 |               272 |           344
Total: 16 entries                                                                                                       |       17,855 |        1,120 |         4,856,560 |              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This occurs because of our TM finalizable design, regardless of whether the temp-table buffers are part of a persistent procedure, so we get many more temp-table buffers surviving an external procedure than actually need to be persistent.

We don't necessarily need to make this part of the current discussion, but I wanted to document it so I don't lose track.

#3 Updated by Eric Faulhaber about 9 years ago

For better readability of the content in the last last post, please see attached file.

#4 Updated by Constantin Asofiei about 9 years ago

Eric Faulhaber wrote:

A related matter is that since all dynamically defined temp-tables are created at global scope, we have additional, strong references to ExternalProgramWrapper instances which have lived beyond their intended lifespan, via this reference chain:

I don't understand something: aren't the TemporaryBuffer and TempTableBuilder instances you mention supposed to be cleaned up at some point? Are they dynamic or static?

The reference you mention is for the INSTANTIATING-PROCEDURE attribute - all resources know their creator. But I think you are correct, and there is a problem: as a dynamic resource can survive its creator, the INSTANTIATING-PROCEDURE for these resources gets set to unknown when the persistent proc which created it gets deleted.

The fix for this is a little more complicated, as it looks like each external program needs to track ALL resources created within it, not just static ones - so their INSTANTIATING-PROCEDURE attr can be set to unknown, when the external program gets deleted.

#5 Updated by Constantin Asofiei about 9 years ago

Eric Faulhaber wrote in #2555 note 53:

I tried a few tests and spent a lot of time in the debugger tonight. We still have otherwise unreachable instances of ExternalProgramWrapper collecting in idResources and resourceIds. The problem is that everywhere we call the ExternalProgramWrapper c'tor, we potentially add an entry into these maps. Since TransparentWrapper forwards hashCode and equals calls to the referent, we are at least not creating new entries if another instance has the same referent. However, the problem is we don't ever call handle.removeResource for most of these instances. Most of the handles which reference them eventually go out of scope and are no longer referenced by application code, but the strong references to their values remain in the context-local maps.

I'll add some logging to track cases when the handle.resourceId is called for an invalid resource... maybe we can find exactly who/when re-attaches these.

I did turn on FINE logging for ProcedureManager (I changed the LOG initializer to use ProcedureManager.class.getName(), btw). The only lines it logged were:

[05/03/2015 03:42:47 EDT] (com.goldencode.p2j.util.ProcedureManager:FINE) Postponing deletion of procedure common/aapsvpr0.p
[05/03/2015 03:42:47 EDT] (com.goldencode.p2j.util.ProcedureManager$WorkArea:FINE) Processing postponed delete for procedure common/aapsvpr0.p has failed

That should have bee has passed, I've got the texts backwards somehow.

BTW, this is a general problem which affects other WrappedResource implementors as well, e.g., BufferFieldImpl, StaticTempTable, TempTableBuilder, etc. These are not huge memory consumers, but there are many of them in the context-local maps. I didn't notice them there before; I think it is a side effect of this update.

Do you mean the handle$WorkArea context-local maps or something else?

#6 Updated by Eric Faulhaber about 9 years ago

Constantin Asofiei wrote:

I don't understand something: aren't the TemporaryBuffer and TempTableBuilder instances you mention supposed to be cleaned up at some point? Are they dynamic or static?

The resources being held at the ends of these GC chains are static temp-table buffers which are held by the converted program instances wrapped in ExternalProgramWrapper objects. These seem to be held in place by dynamic temp table buffers, which have the strong reference chain: TM@WorkArea.finalizables -> TemporaryBuffer.tempTableRef -> TempTableBuilder.instProc -> handle.value -> ExternalProgramWrapper.referent -> converted business logic -> proxies to static temp-tables.

What I don't yet understand is why these dynamic temp-table buffers are still in TM$WorkArea.finalizables. Perhaps the code I wrote (not yet checked in) to remove them from the global finalizables list via BufferImpl.resourceDelete is never invoked. This might be the case if the corresponding, dynamic 4GL objects are not being deleted properly by business logic, which would represent leaks in the original code, too...

#7 Updated by Constantin Asofiei about 9 years ago

Eric Faulhaber wrote:

Constantin Asofiei wrote:

I don't understand something: aren't the TemporaryBuffer and TempTableBuilder instances you mention supposed to be cleaned up at some point? Are they dynamic or static?

The resources being held at the ends of these GC chains are static temp-table buffers which are held by the converted program instances wrapped in ExternalProgramWrapper objects. These seem to be held in place by dynamic temp table buffers, which have the strong reference chain: TM@WorkArea.finalizables -> TemporaryBuffer.tempTableRef -> TempTableBuilder.instProc -> handle.value -> ExternalProgramWrapper.referent -> converted business logic -> proxies to static temp-tables.

OK, but what is relevant for me is that the instProc is referencing a (I think) deleted procedure. Keeping all the procedure's resources (static or not) with it is just a side-effect.

Are there any other references to the proxies to static temp-tables (the end of the chain)? If not, then the attached update should solve this problem... and allow the external programs to be GC'ed, once they are deleted (as they will no longer be referenced by the instProc field).

What I don't yet understand is why these dynamic temp-table buffers are still in TM$WorkArea.finalizables. Perhaps the code I wrote (not yet checked in) to remove them from the global finalizables list via BufferImpl.resourceDelete is never invoked. This might be the case if the corresponding, dynamic 4GL objects are not being deleted properly by business logic, which would represent leaks in the original code, too...

Well, I think you are correct, there is a problem there. When adding things to the global scope, you need to manually remove it from there. I see these cases which might be problematic:
  1. TemporaryBuffer.openScope:2849
    // A global temp table buffer is cleaned up when the context ends.
    TransactionManager.registerFinalizable(multiplexer, true);
    
  2. RecordBuffer.openScopeAt:5825
          // Register this buffer as a Finalizable with the TM, so that its
          // finished method is called when this buffer scope is about to close.
          TransactionManager.registerFinalizableAt(blockDepth, this);
    
  3. TemporaryBuffer.openScopeAt:2891
          // register for temp table cleanup
          Multiplexer multiplexer = new Multiplexer();
          TransactionManager.registerFinalizableAt(blockDepth, multiplexer);
    
          // register for no-undo validation if outside an application transaction
          if (!isUndoable() && !TransactionManager.isTransaction())
          {
             TransactionManager.registerFinalizableAt(blockDepth, new NoUndoValidator());
          }
    

All of these need to be cleaned up when the buffer gets deleted. And I don't think the right place is in BufferImpl.resourceDelete - the proper place should be in the RecordBuffer.finishedImpl and TemporaryBuffer.finishedImpl, so that all Finalizable instances get removed from the global scope...

#8 Updated by Constantin Asofiei about 9 years ago

I've just noticed the handle's logging for invalid resources is too noisy, it gets logged for valid cases (as when a resource is just created, before it gets valid)... please remove this before starting testing:

      if (!resource.valid())
      {
         if (LOG.isLoggable(Level.WARNING))
         {
            LOG.log(Level.WARNING, 
                    "You are creating a new ID for an invalid resource " + 
                    resource.getClass().getName() + "!",
                    new Exception());
         }
      }

#9 Updated by Eric Faulhaber about 9 years ago

Constantin Asofiei wrote:

Are there any other references to the proxies to static temp-tables (the end of the chain)? If not, then the attached update should solve this problem... and allow the external programs to be GC'ed, once they are deleted (as they will no longer be referenced by the instProc field).

Yes (speaking before your 0503a update, which I am testing now): besides the reference chain from TM$WorkArea.finalizables, the ExternalProgramWrapper references in idResources and resourceIds are still all there, and thus keep all these same static DMO proxies alive. I may have misunderstood your changes, but it didn't seem from my review that 0503a addresses this. Anyway, I'll let you know how the test goes...

[...] And I don't think the right place is in BufferImpl.resourceDelete - the proper place should be in the RecordBuffer.finishedImpl and TemporaryBuffer.finishedImpl, so that all Finalizable instances get removed from the global scope...

Yes, that's actually how I did it, but clearly there's a problem with my implementation, as the MAT output shows.

#10 Updated by Eric Faulhaber about 9 years ago

The 0503a appears to be working as designed with respect to cleaning up the reference chain previously pinned by TempTableBuilder.instProc, in that the handle instances stored in that field are cleared, AFAICT.

However, we are still holding lots of objects in the idResources and resourceIds maps (more than before, because now every WrappedResource instance is stored in there upon construction). I think we need to reconsider the approach introduced with 0502b.

#11 Updated by Constantin Asofiei about 9 years ago

Eric Faulhaber wrote:

The 0503a appears to be working as designed with respect to cleaning up the reference chain previously pinned by TempTableBuilder.instProc, in that the handle instances stored in that field are cleared, AFAICT.

However, we are still holding lots of objects in the idResources and resourceIds maps (more than before, because now every WrappedResource instance is stored in there upon construction). I think we need to reconsider the approach introduced with 0502b.

Is this from an intermediate state or final state, when everything should have cleaned up? I'll look for a lazy approach of setting the ID, only when the resource ID gets called explicitly.

#12 Updated by Constantin Asofiei about 9 years ago

Eric Faulhaber wrote:

[...] And I don't think the right place is in BufferImpl.resourceDelete - the proper place should be in the RecordBuffer.finishedImpl and TemporaryBuffer.finishedImpl, so that all Finalizable instances get removed from the global scope...

Yes, that's actually how I did it, but clearly there's a problem with my implementation, as the MAT output shows.

Please try this version. I think it might solve your problem: previously the postponed procedure delete was being done too early, and static resources were not being deleted properly.

I've also added lazy setting of the resource's ID, so the handle's resource-related maps should not explode. The warning is now OK to be used - handle.resourceID should never be called for an invalid resource and with no ID previously set.

#13 Updated by Eric Faulhaber about 9 years ago

Constantin Asofiei wrote:

Is this from an intermediate state or final state, when everything should have cleaned up?

I take the heap dumps before disconnecting the BPM (so the app server connection is still up), but after all the tests have completed and the test harness has stopped running. So, it is from a final state.

#14 Updated by Eric Faulhaber about 9 years ago

Constantin Asofiei wrote:

Please try this version. I think it might solve your problem: previously the postponed procedure delete was being done too early, and static resources were not being deleted properly.

I've also added lazy setting of the resource's ID, so the handle's resource-related maps should not explode. The warning is now OK to be used - handle.resourceID should never be called for an invalid resource and with no ID previously set.

This looks promising on review; will let you know how testing goes. Thanks!

#15 Updated by Eric Faulhaber about 9 years ago

Test runs with the last few updates have been noticeably slower (~2.5x). I haven't profiled yet to determine the cause, but the log from the last run using 0503b is full of these, which probably contributes:
[05/03/2015 17:08:55 EDT] (com.goldencode.p2j.util.handle:WARNING) You are creating a new ID for an invalid resource com.goldencode.p2j.util.ExternalProgramWrapper!
java.lang.Exception
        at com.goldencode.p2j.util.handle.resourceId(handle.java:515)
        at com.goldencode.p2j.util.handle.toString(handle.java:2280)
        at com.goldencode.p2j.util.handle.toString(handle.java:2257)
        at com.goldencode.p2j.util.handle.toStringMessage(handle.java:2294)
        at com.goldencode.p2j.persist.type.AbstractWrapperUserType.toString(AbstractWrapperUserType.java:295)
        at com.goldencode.p2j.persist.type.HandleUserType.bindNonNull(HandleUserType.java:141)
        at com.goldencode.p2j.persist.type.AbstractWrapperUserType.nullSafeSet(AbstractWrapperUserType.java:278)
        at com.goldencode.p2j.persist.type.HandleUserType.nullSafeSet(HandleUserType.java:27)
        at org.hibernate.type.CustomType.nullSafeSet(CustomType.java:155)
        at org.hibernate.persister.entity.AbstractEntityPersister.dehydrate(AbstractEntityPersister.java:2747)
        at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:3152)
        at org.hibernate.persister.entity.AbstractEntityPersister.updateOrInsert(AbstractEntityPersister.java:3087)
        at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:3416)
        at org.hibernate.action.internal.EntityUpdateAction.execute(EntityUpdateAction.java:140)
        at org.hibernate.engine.spi.ActionQueue.execute(ActionQueue.java:362)
        at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:354)
        at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:276)
        at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:326)
        at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:52)
        at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1213)
        at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:402)
        at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.beforeTransactionCommit(JdbcTransaction.java:101)
        at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:175)
        at com.goldencode.p2j.persist.Persistence$Context.commit(Persistence.java:4184)
        at com.goldencode.p2j.persist.Persistence.commit(Persistence.java:3083)
        at com.goldencode.p2j.persist.RecordBuffer$Handler.invoke(RecordBuffer.java:10741)
        at com.goldencode.p2j.persist.$__Proxy34.setCrudhandle(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor1328.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.goldencode.p2j.persist.FieldReference.set(FieldReference.java:787)
        at com.goldencode.p2j.persist.FieldReference.set(FieldReference.java:723)
        at com.goldencode.p2j.persist.type.HandleFieldRef.set(HandleFieldRef.java:76)
        at com.goldencode.p2j.util.ControlFlowOps.invokeExternalProcedure(ControlFlowOps.java:3368)
        at com.goldencode.p2j.util.ControlFlowOps.invokePersistentImpl(ControlFlowOps.java:4309)
        at com.goldencode.p2j.util.ControlFlowOps.invokePersistentImpl(ControlFlowOps.java:4209)
        at com.goldencode.p2j.util.ControlFlowOps.invokePersistentSet(ControlFlowOps.java:1335)
        at com.goldencode.p2j.util.ControlFlowOps.invokePersistentSet(ControlFlowOps.java:1314)
        at com.something.server.common.Aapicrd0$15.body(Aapicrd0.java:1351)
        ...

There is only one log entry from ProcedureManager:
[05/03/2015 16:54:43 EDT] (com.goldencode.p2j.util.ProcedureManager:FINE) Postponing deletion of procedure common/aapsvpr0.p

There are still 3,339 ExternalProgramWrapper instances pinned by the following:
  • 2,277 by handle$WorkArea.idResources (Conversation thread);
  • 450 by handle$WorkArea.resourceIds (Conversation thread);
  • 608 by BufferManager.inUseRecords (Conversation thread); and
  • 4 by LogicalTerminal.survivingScopes (Writer thread).

I don't really understand why the numbers for idResources and resourceIds are not the same.

There are 1,063 distinct EPW referent types.

The BufferManager.inUseRecords references are all handles stored in the TempRecord143Impl.crudhandle field. Those DMOs should have been released and evicted from the Hibernate session when that static temp-table buffer went out of scope; I'll have to look into what went wrong there.

I haven't finished my analysis, but I wanted to post some preliminary feedback (especially the handle warning, since that's new). I'll pick it up again tomorrow. Let me take it a little further before you spend more time on this; I know you have other time-critical tasks.

#16 Updated by Constantin Asofiei almost 9 years ago

Eric Faulhaber wrote:

Test runs with the last few updates have been noticeably slower (~2.5x).

Are there regressions in the failed tests (except the duration)?

I haven't profiled yet to determine the cause, but the log from the last run using 0503b is full of these, which probably contributes:
[...]

Well, this one is a real issue. The handle is assigned too early, when the procedure is ran persistent. It needs to be assigned only if the procedure is executed (and not ended with an error). Attached solves this issue.

There is only one log entry from ProcedureManager:
[...]

Hm... you mean there is only one entry?

There are still 3,339 ExternalProgramWrapper instances pinned by the following:
  • 2,277 by handle$WorkArea.idResources (Conversation thread);
  • 450 by handle$WorkArea.resourceIds (Conversation thread);
  • 608 by BufferManager.inUseRecords (Conversation thread); and
  • 4 by LogicalTerminal.survivingScopes (Writer thread).

I don't really understand why the numbers for idResources and resourceIds are not the same.

There are 1,063 distinct EPW referent types.

The BufferManager.inUseRecords references are all handles stored in the TempRecord143Impl.crudhandle field. Those DMOs should have been released and evicted from the Hibernate session when that static temp-table buffer went out of scope; I'll have to look into what went wrong there.

I haven't finished my analysis, but I wanted to post some preliminary feedback (especially the handle warning, since that's new). I'll pick it up again tomorrow. Let me take it a little further before you spend more time on this; I know you have other time-critical tasks.

Yes, I'm not using this task as my primary, I'm using it as a "get out of the box" task :).

#17 Updated by Eric Faulhaber almost 9 years ago

Constantin Asofiei wrote:

Eric Faulhaber wrote:

Test runs with the last few updates have been noticeably slower (~2.5x).

Are there regressions in the failed tests (except the duration)?

No, the same number of tests pass as before.

[...]

Well, this one is a real issue. The handle is assigned too early, when the procedure is ran persistent. It needs to be assigned only if the procedure is executed (and not ended with an error). Attached solves this issue.

Thanks.

There is only one log entry from ProcedureManager:
[...]

Hm... you mean there is only one entry?

Yes, with FINE-level logging enabled, that is the only line in the server log from ProcedureManager. Did you expect a lot more?

#18 Updated by Eric Faulhaber almost 9 years ago

Constantin Asofiei wrote:

[...]
Attached solves this issue.

Confirmed, no more warnings in the log. Performance is still a problem, though; I'll profile the next test run. I've made some changes, too, so it could be something I introduced.

#19 Updated by Eric Faulhaber almost 9 years ago

I had to step away for a few days to chase down a related task. Now I've had a chance to do some more debugging.

The ExternalProgramWrapper instances that are being left behind in the resource maps are getting added with calls like this:

ControlFlowOps.invokeWithMode("ip-create-licence", "III", new character("L01000"), new character("Common Module"), new character("commondb01"));

They are added to the maps by side effect of a CompareOps.isEqual call at ControlFlowOps:4025:
if (callerHandle != null && !CompareOps._isEqual(callerHandle, h))

These programs are never added to ProcedureManager.deleteOnExit (in fact, nothing is -- it's always empty when ProcedureManager$WorkArea.scopeFinished is invoked). So, when one is popped off the thisProcedures deque, the code at line 2291:
// check if we have a postponed delete
Boolean delete = deleteOnExit.remove(referent);

always returns null. Since delete is always null, we never reach line 3338 where these would be cleaned up:
handle.removeResource(new ExternalProgramWrapper(referent));

I still think the core of this problem is that we are creating strong references to resources by side effect of a comparison operation. These resources should only be collected deliberately, not incidentally. I don't understand the overall mechanism well enough to propose an approach, but we seem to be calling handle.resourceId all over the place without registering a cleanup mechanism. Are all the places that call an asHandle method (which in turn calls handle.resourceId) meant to be matched with calls to handle.removeResource via 4GL business logic statements like DELETE OBJECT?

I've only been looking at the ExternalProgramWrapper instances, because they are pinning record buffers, but I'm beginning to suspect we may be leaking other kinds of WrappedResource objects as well.

#20 Updated by Constantin Asofiei almost 9 years ago

Eric, don't know how I didn't notice this until now. handle.removeResource should be called for every kind of resource (dynamic or static), when that resource gets deleted. HandleChain.delete does this. But external programs are a special kind and don't extend from HandleChain... and I was calling handle.removeResource only in ExternalProgramWrapper.delete (called only when persistent procedures get deleted). This call should happen in ProcedureManager.delete, which is called for both static and dynamic procedures.

So if you add a handle.removeResource at the end of ProcedureManager.delete might solve the issue:

handle.removeResource(pdata.phandle.getResource());

#21 Updated by Eric Faulhaber almost 9 years ago

Constantin, I included the suggested handle.removeResource fix, and it did indeed solve that problem. Thank you!

The attached update addresses this and a number of other memory leaks. These were accumulating to somewhere between 1.5-2GB of wasted heap in the largest set of customer unit tests. With the fix, the server remains in what appears to be a stable memory state throughout that test set. This gives us a nice performance boost as well, as the server previously would become progressively slower as more tests were run and more resources leaked in various data structures. We still seem to collect more dynamic temp-table buffer instances than I would like in TM global finalizables, but that's a relatively small leak compared to some of the others that were fixed, and I want to get those in sooner rather than later, so I'll defer fixing that for now.

Please note that the latest versions of the various updates you provided over the course of this discussion have been rolled into this update, and I fixed what appeared to be issues in several of them. In particular, where you added the id field to various classes, you used the idiom new WorkArea() in a number of places where I think the intention was work.obtain() or work.get(), or whatever flavor of retrieving a context local variable was appropriate for that particular class. Feel free to review those if you think I've misunderstood your intent. Anyway, bottom line is: you don't need to worry about testing and committing those files.

I am regression testing the update with Majic now. Thanks for all your help!

#22 Updated by Constantin Asofiei almost 9 years ago

Eric Faulhaber wrote:

Constantin, I included the suggested handle.removeResource fix, and it did indeed solve that problem. Thank you!
In particular, where you added the id field to various classes, you used the idiom new WorkArea() in a number of places where I think the intention was work.obtain() or work.get(), or whatever flavor of retrieving a context local variable was appropriate for that particular class. Feel free to review those if you think I've misunderstood your intent. Anyway, bottom line is: you don't need to worry about testing and committing those files.

There's one more place with a new WorkArea() in ColorTable.asHandle. Please fix this too.

Otherwise, I'm a worried about the unsynchronized access to the DatabaseManager.schemaByDMOClass map - this is populated by registerDatabase which has a lock on configLock, but the normalizeDMOClass does not sync on it. The other usage - getSchemaByClass - does sync on it.

#23 Updated by Eric Faulhaber almost 9 years ago

Constantin Asofiei wrote:

There's one more place with a new WorkArea() in ColorTable.asHandle. Please fix this too.

Done.

Otherwise, I'm a worried about the unsynchronized access to the DatabaseManager.schemaByDMOClass map - this is populated by registerDatabase which has a lock on configLock, but the normalizeDMOClass does not sync on it. The other usage - getSchemaByClass - does sync on it.

Fixed, thank you.

I've restarted regression testing with the attached. First run got hung up in Ctrl-C testing anyway.

#24 Updated by Eric Faulhaber almost 9 years ago

The attached update is essentially the same as ecf_upd20150518a, but with the original error logging in TM.abnormalEnd reverted back to its original, FINE setting (my temporary change in 0518a to WARNING is good for debugging, but too verbose for production).

This update has passed regression testing and is committed to bzr rev. 10866.

#25 Updated by Eric Faulhaber over 8 years ago

  • % Done changed from 90 to 100
  • Status changed from WIP to Closed

#26 Updated by Greg Shah over 7 years ago

  • Target version changed from Milestone 11 to Cleanup and Stablization for Server Features

Also available in: Atom PDF