Bug #2695

race conditions when starting an appserver/batch process

Added by Constantin Asofiei over 8 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Normal
Start date:
09/09/2015
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD
case_num:

History

#1 Updated by Constantin Asofiei over 8 years ago

When adding batch programs to be started via the schedulers, there are intermittent errors:

[09/09/2015 01:44:47 BST] (SessionManager.listen():INFO) {00000000:00000001:standard} Server ready
[09/09/2015 01:44:47 BST] (UpdateAccountTask.run():WARNING) {00000000:00000023:standard} Cannot change 7e691M11zN0cs5TQ account
Problem retrieving the spawn command, using these settings: port [3333] host [localhost] alias [standard] UUID [fadd8254-bac4-4c66-8a08-9a483b44e489]
java.lang.Exception: Authentication failed
        at com.goldencode.p2j.net.SessionManager.createQueue(SessionManager.java:1042)
        at com.goldencode.p2j.net.LeafSessionManager.connectDirect(LeafSessionManager.java:201)
        at com.goldencode.p2j.main.NativeSecureConnection.command(NativeSecureConnection.java:82)
[09/09/2015 01:44:49 BST] (AppServerLauncher:INFO) Appserver 'app_server' was started successfully!

This happens because each job is executed in a different thread, using the server's context. The DirectoryService instance is therefore shared among the threads, and this class is not thread-safe. If one thread calls ds.unbind while another thread is still inside an unfinished ds.openBatch bracket, the unbind closes the batch and the thread that opened it can no longer continue its operations on the batch.

I was thinking of synchronizing on the DirectoryService instance; currently I think only two places require this:
  1. UpdateAccountTask.run (where AdminServerImpl.setUser is called)
  2. AppServerDefinition.refresh - where the appserver details are read
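To illustrate the idea, here is a minimal sketch of guarding a shared, non-thread-safe service with synchronized. It is not the code from revision 10933: FakeDirectory is a hypothetical stand-in (the real DirectoryService API is not shown in this task), and only the openBatch/unbind interaction described above is modelled.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedDirectoryDemo {
    /** Hypothetical stand-in for a shared, non-thread-safe directory service. */
    public static class FakeDirectory {
        private boolean batchOpen = false;

        public void openBatch() { batchOpen = true; }

        public void write() {
            if (!batchOpen) {
                throw new IllegalStateException("batch was closed by another thread");
            }
        }

        public void closeBatch() { batchOpen = false; }

        // Assumption: like the unbind described above, this also closes any open batch.
        public void unbind() { batchOpen = false; }
    }

    /** Runs a mix of batch writers and unbind callers; returns how many writers failed. */
    public static int runJobs(FakeDirectory ds, int jobs) {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < jobs; i++) {
            final boolean writer = (i % 2 == 0);
            pool.submit(() -> {
                try {
                    // The whole bracket runs under the service's monitor, so an
                    // unbind from another thread cannot land in the middle of it.
                    synchronized (ds) {
                        if (writer) {
                            ds.openBatch();
                            ds.write();
                            ds.closeBatch();
                        } else {
                            ds.unbind();
                        }
                    }
                } catch (IllegalStateException e) {
                    failures.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + runJobs(new FakeDirectory(), 1000));
    }
}
```

Without the synchronized block, a writer can intermittently find its batch closed between openBatch and write, which matches the sporadic nature of the errors in the log.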

There was also another race condition, in TemporaryAccountWorker.callerLatch: the latch was set to null once the task completed, so if the task finishes before the caller has a chance to start waiting on the latch, an NPE results.
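One way to avoid this kind of NPE is to keep the latch in a final field and never reset it. This is only a sketch: Worker and awaitDone are hypothetical names, not the actual TemporaryAccountWorker code or the change in revision 10933. It relies on the fact that CountDownLatch.await returns immediately once the count has reached zero, so a caller that arrives late is still safe.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchDemo {
    /** Hypothetical worker; stands in for a task that a caller waits on. */
    public static class Worker implements Runnable {
        // Final and never set to null, so a late caller always finds a valid latch.
        private final CountDownLatch done = new CountDownLatch(1);

        @Override
        public void run() {
            // ... the task's real work would go here ...
            done.countDown();
        }

        /** Waits for completion; returns false on timeout or interruption. */
        public boolean awaitDone(long timeoutMs) {
            try {
                return done.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        Worker w = new Worker();
        w.run(); // the task completes before the caller starts waiting
        System.out.println("completed: " + w.awaitDone(1000));
    }
}
```

Here the task finishes first and the wait still succeeds, which is exactly the ordering that failed when the latch reference was cleared.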

Branch 2695a was created for this task.

Revision 10933 contains fixes for the above issues - please review.

#2 Updated by Greg Shah over 8 years ago

Code Review Task Branch 2695a Revision 10933

I'm OK with the changes.

I think the TemporaryAccountWorker change is hit by MAJIC regression testing, but I don't think the UpdateAccountTask or AppServerDefinition can be hit. How do you want to test this?

#3 Updated by Constantin Asofiei over 8 years ago

Greg Shah wrote:

Code Review Task Branch 2695a Revision 10933

I'm OK with the changes.

I think the TemporaryAccountWorker change is hit by MAJIC regression testing, but I don't think the UpdateAccountTask or AppServerDefinition can be hit. How do you want to test this?

Actually TemporaryAccountWorker.run (where the change is) is not hit by MAJIC, unless we connect to it via a web client or use the spawner in another way. I'll double-check to confirm this is still working, and if it does, it can be merged to trunk.

#4 Updated by Greg Shah over 8 years ago

I'll double-check to confirm this is still working, and if it does, it can be merged to trunk.

OK, go ahead with this plan.

#5 Updated by Constantin Asofiei over 8 years ago

Greg Shah wrote:

I'll double-check to confirm this is still working, and if it does, it can be merged to trunk.

OK, go ahead with this plan.

Web clients in MAJIC are working properly.

#6 Updated by Greg Shah over 8 years ago

Please merge to trunk.

#7 Updated by Constantin Asofiei over 8 years ago

2695a was merged to trunk rev 10933 and archived.

#8 Updated by Greg Shah over 8 years ago

  • Target version set to Milestone 11
  • Status changed from WIP to Closed

#9 Updated by Greg Shah over 7 years ago

  • Target version changed from Milestone 11 to Cleanup and Stablization for Server Features
