Project

General

Profile

Debugging Converted Code

This chapter provides a general process for debugging converted code. The most common facilities, tools and techniques used during debugging issues will be covered. Using these facilities, tools and techniques within the structured framework of the debugging process provides a sound basis for handling any kind of issue.

Debugging Process

Converted applications are Java classes that run inside an application server environment. This can be thought of as a "container" which provides services and Progress 4GL compatibility. Special understanding and strategies are needed to effectively debug applications in such an environment.

Any deviation between a converted application and the behavior of the original 4GL application can be due to a problem in the conversion process, a defect in the runtime or both.

Step 0: Prepare the Dual Environments

Since this debugging session is all about determining the root cause of the difference in behavior between a 4GL program and the converted equivalent, it is critical to have "twin" environments, one in Progress 4GL and the other in Java.

These twin environments must match (be identical) in all of the following:

  • database schema
  • database contents/data
  • the same application source code (being run in Progress and as a conversion input to the FWD process)
  • the same terminal types (e.g. VT320) and sizes (80x24) in CHUI applications

Any deviation may make debugging painful or even impossible, depending on the problem.

Before going further, confirm the following:

  • The 4GL program at issue was converted to its Java equivalent via the FWD conversion process.
  • Neither the 4GL program nor the converted Java program have been manually modified since the conversion ran.

Another important point is that the debugging environments should sufficiently duplicate the conditions of the production environment such that the problem can be duplicated. It is assumed that the production environment is never used for debugging.

Step 1: Confirm the Deviation

Make sure that the problem recreate is well documented.

Using the same client system and two instances of the same terminal/terminal emulator, connect to each of the dual environments. One terminal client instance would be connected to the Progress 4GL environment and the other terminal client instance would be connected to the FWD-based converted application. Make sure that the configuration settings and size for the terminal clients are identical. Even a 1 row or 1 column difference can have a major effect on the resulting look and behavior of the application.

Execute the recreate with the same exact keystrokes (and timing if possible) on both clients simultaneously (or as near to simultaneously as possible). After each keystroke, compare the screens. The result should be identical if there is no problem.

Confirm the following:

  • The original 4GL program works as expected.
  • The appearance or behavior for the Java program deviates from the original Progress 4GL implementation.

It must be possible to deterministically cause the difference/problem to occur in the Java environment. It is often important to document the exact key sequences the user must execute to get to the difference, since minor changes in user input can subtly change the runtime behavior in the Java code even when such differences are not obvious in the 4GL environment. Once the problem has a documented and consistent recreation scenario that shows a deviation from the original 4GL environment, this step is complete.

WARNING: Progress 4GL is absolutely full of bizarre, non-intuitive, hidden behavior that appears to be wrong. Do not assume that a strange application behavior on the Java side is wrong before you check the exact case on the 4GL side. FWD works to duplicate all of these user-visible behaviors exactly. Always compare the environments with the same exact recreate steps and using screens/terminals that are compared "side by side".

Step 2: Find a Starting Point for Debugging

In order to debug a problem, a developer must have a good starting point. The following are the most common (and useful) cases:

  • There is a relatively unique and/or descriptive error message that occurs at the time of the deviation or failure.
  • There is relative unique text on the screen at the point of failure. For example, a title or label text from a frame that can be traced back to the source code.
  • There is a Java exception with a stack trace (usually this will be found in the server log, but it may appear on the screen or on a client-side log file).
  • Server (and rarely, client) log file entries at the moment of failure.
  • A thread dump from the FWD application server and/or the FWD client.

Gather as much of this as is available, to ensure that if one is not a useful starting point, that the odds of having something useful are increased.

If there is no evidence of a failure except a user report that some behavior is different, get the recreate going to the point of the failure (or as close to the failure point where there is a pause) and then take a thread dump. Capture one or more screen shots just before, at and after the failure point.

Step 3: Find the Java Code Where the Failure Occurs

Before a problem can be debugged, the failing code must be identified.

Using unique error message text or for unique text from a frame on the screen, the developer can search through the converted Java code to find matches.

Stack traces (from an exception or from a thread dump) have specific Java filenames and line numbers to help locate the source line of a failure.

Step 4: Debug

Use the facilities, tools and techniques that are documented below to isolate and fix the problem.

Once the Java code is found, it can be helpful to review the 4GL code that was the original source. This can help one understand the original intent of the code. Or if there is some deviation that was caused by a defect in conversion processing, that problem may become apparent.

It may also help to debug through FWD code to see how or why particular behaviors are being triggered by the Java business logic

Facilities, Tools and Techniques

Error Messages

Error messages can be application-level or they can be originated in the FWD runtime. Either way, they are normally displayed on the screen (often in one of the MESSAGE lines, sometimes in an alert box).

A runtime error message will usually begin with two asterisk characters, followed by the error text and an error number in parenthesis. For example:

** Some error message text. (-1)

Runtime errors will also be written to the logging implementation. This may be useful to put the problem in the context of the larger log file. It is also very useful in the case where a user reports that he or she saw an error message, but did not report the exact text of the message.

The specific error text of a given runtime error will be easily traced back to a small number of locations (usually only one location in practice) in the runtime code.

If the error message displayed is an application message, then the error text can be traced back to its source in the application code.

Once the source of the message is found, a debugging breakpoint can be set at that location such that an effective debugging session can be started.

Of course, it is possible that the error message itself contains enough information to diagnose the problem, avoiding the need for a debugger session.

Warning: many errors that are generated in FWD are also generated under the Progress 4GL environment. Such errors are not an indication of a deviation, if both environments generate the same error message. So the presence of an error is not proof that there is a deviation between FWD and the 4GL.

Exception Stack Trace

Some abnormal ends (i.e. abends) are accompanied by such a specific exception that the problem is self-evident. If this is not the case, then a more thorough investigation will be needed.

There are 4 common or expected conditions that are normally encountered by Progress 4GL code. These conditions are ERROR, END-KEY, STOP and QUIT. In the FWD runtime, these conditions map to a sub-class of the ConditionException class. Any such conditions that are raised by the operation of a converted application will not cause an abnormal end. Rather, these exceptions are naturally handled and in fact are necessary to ensure the compatible application control flow.

In addition to the Progress conditions, FWD has other expected uses of exceptions which are not considered abends or failures. For example, in FWD special purpose exceptions are used as a mechanism to duplicate the exact flow of control mechanisms of the 4GL, which could not otherwise be duplicated.

The runtime is fully aware of, expects and handles ConditionException instances as well as the flow of control exceptions. An abend only occurs when an unhandled exception is encountered by the application code. This will naturally unwind the stack completely and the user's session will exit (abnormally). Before exiting, the runtime will generally log the stack trace for any such exception. When a user reports that their session had an abend, the stack trace can usually be found in the server's log file. In some cases, the STDERR of the client may have received the stack trace output.

Server Stack Traces

When an abend occurs on the server, it will usually be generated on a user's session thread. This means that only that user will be affected by the abend. The server will continue running and all other users will continue working without any affect. All other (non-user) threads are especially well protected from exceptions and are running a small, well tested and well known code base.

Since all real server-side work occurs on user-session threads, this is where the problems most often occur. At the root of a server stack trace will usually be a Thread.run() and a Conversation.run() call respectively. User session threads are "conversation" threads that are dynamically created and are used for the lifetime of the user's session. This is a dedicated thread for that specific user and it will not return until the user exits the application, generates an unhandled STOP condition (usually via CTRL-C on the client), the user's client diconnects or the user encounters an abend.

Further up the server stack trace one will usually find a reference to StandardServer.standardEntry() which is the primary entry point on the server for interactive clients. The Java reflection classes are used in many places to implement dynamic invocation of classes and methods. This means that the J2SE reflection methods often appear in groups on the stack in between FWD classes and methods. Further up the stack will be an invocation of the "top-level" application entry point. This is the execute() method of the Java class which was generated from the Progress 4GL program that is normally used at startup of the application. From there the stack trace will be completely dependent upon the call chain of processing in the application.

Client Stack Traces

When an abend occurs on the client, this will almost always be thrown on the main thread of the application.

For an interactive client, the root of the stack would normally be ClientDriver.main(). Further up the stack is usually a reference to something like $Proxy2.standardEntry() which is a remote invocation of the StandardServer.standardEntry() method that exists on the server side. Above there, one will normally see Dispatcher.processInbound() which is an indication that the server has called a client-side exported API to obtain some kind of service such as rendering an application frame (a screen).

Exceptions thrown on the client during the processing of some server-driven request (this is almost always how things work) will unwind the stack "over" the network and be re-thrown on the server side. This means that client side stack traces can and will be seen in the server's log file. Just because the stack trace is present in the server log, does not mean that the problem is a server-side issue.

Near the top of the stack (client or server) will usually be a good location where one can set a breakpoint to start a productive debugging session.

More information about the roles of specific threads in the system can be found in the section Understanding FWD Threading below.

Fixing Regressions and Abnormal Ends

As mentioned in the previous section, abends usually are accompanied with a stack trace, either on the server or client side. Sometimes, due to the complexity of how various 4GL features interact, looking only at the code which caused the regression is not enough to isolate and duplicate the problem in a minimal testcase. This section will cover some techniques about how to isolate the problematic code, with or without access to a testing environment for the converted application; but, if help can be provided by someone with access to a testing environment (running on converted or legacy code), use this to gain access to heap/thread dumps, recreation scenario, or other business logic info which might be of help. In any case, access to the application's converted Java code (and ideally the original 4GL code) is needed.

Analysis Without Access to a Testing Server

If access to a testing environment with a running server is not available, we need to make use of any information we have at hand: this can include stacktraces, application screen shots, application error messages, recreation scenario, etc. Regardless of the available information, the final goal is to identify the 4GL code which causes the problem and, if possible, the runtime callpath to reach the 4GL code (to give an idea of the runtime state of various 4GL features active/in-use at the time of the abend).

Looking at the stacktrace, we can identify the location of the 4GL code which caused the abend, by looking at the last entry which targets Java code in a class located in one of the converted application's (sub)packages. To find the converted application's root package, see the pkgroot configuration parameter in the p2j.cfg.xml file (in the conversion project for this application). This will give you an idea of the 4GL code which causes the abend. Locate the 4GL program file associated with this Java class and determine the corresponding line for this Java code. This can be done either by looking at surrounding Java code and finding the corresponding code in the 4GL file; to speed up the search, you can use buffer names, frame names, variable names, string literals, comments, or whatever else seems to be unique to that portion of converted Java code. Some times, if include files or preprocessor directives are used, you will need to look at the generated .cache file, for that external program. If the preprocessed .cache file is not available, you can run the Front Phase conversion only for the problematic external program (to avoid running a full conversion for the entire code set, which can be time consuming). If access to the original 4GL code is not available, use the details in the Conversion Reference book to identify which 4GL features stand behind the emitted FWD APIs.

Having a stacktace, this can help in identifying the runtime state for some features, as the call path for the problematic code is available. This state includes especially cases when shared variables, buffers or parameters are used (i.e. state is computed by the caller and passed to this program). Go down the stack trace entries and check how this state is computed.

Without a stack trace, look at the available information and think how you can make the best of it. If screen shots are available, try to identify some unique features about that screen: a frame title, a widget's label or other static text which seems to be specific to this screen. Search through the 4GL code (searching through the .cache files is easier, as you will not have to determine which external program uses an include file) and find the external programs using this text. If a recreation scenario is available, identify which code calls the the problematic (usually the last) step. If the callpath needs to be identified, walk the recreation scenario backwards, to determine who calls who: this can be tricky, so you can use what you found until now to ask for help from someone with access to a running server, if you think it can get you the remaining information faster.

Now, that you have a knowledge about the problematic 4GL feature and the runtime state, look into how this 4GL code is setup: the goal is to find a minimal standalone testcase which can reproduce the problem. If the failure is not dependent on prior state, it can help to duplicate the entire business logic from that program into a standalone testcase and, if the testcase shows the problem, use a “divide-and-conquer” approach to minimize it: remove any code which is not related to the failure, in sections, until the problem is no longer seen - at this step, the last removed section should be the problematic one, but don't forget that this still can be dependent on the structure of outer blocks or some other state. In other cases, it is easier to look at the sequence of statements being called and some other state (i.e. frame setup, loaded records, transaction states, etc) and just follow the business logic of the application to create a standalone test which duplicates it, and after this start minimize it.

For example:

  • when looking into frame-related errors, you will need to identify how is the frame configured, what is the frame state upon the point of failure, which FWD APIs are used to setup/configure the frame, which widgets are used in the frame, etc
  • when looking into persistence-related errors, you should identify how are the transactions set up (are sub-transactions used?), how the record (and DB state) looks at the point of failure, are multiple users involved?
  • when looking into FWD APIs which require input parameters, determine what are the arguments passed to these APIs.
  • if user input is required by the point of failure, determine the user input.

Analysis With Access to a Testing Server

If access to a testing server is provided, running the recreate scenario using the server should duplicate the problem. If the testing DB doesn't show this problem, determine the production state of the tables in use by the problematic code (or better, find if you can connect the server to a snapshot of a production DB showing the problem). In any case, is best to run the server on your local machine, in debug mode (the location of the DB to which the server connects matter only in a query performance point-of-view, as connecting to a remote DB might slow down things when compared with a local DB). If the FWD server can't be ran on your local machine, start it on the remote machine and connect to this remote server by redirecting the server's debugging port; but keep in mind that the debugging will be significantly slower and will depend a lot on the connection speed between your location and the remote machine.

Once the server is started and a debug session is up, if a stacktrace is available, you can place a breakpoint at the point of invocation of the last FWD API before the failure (in the converted server code). Start the client and execute the recreate scenario. When the server stops at the breakpoint, look at the application state before invoking the FWD API and try to isolate the problem. The server and client logs will provide info about any exceptions were thrown when this scenario is executed (sometimes adjusting the logging settings to a finer level might be helpful). Also, note that if the client restarts or stops unexpectedly when a step is executed, this is almost surely a problem: when an abend is encountered on server side, the FWD runtime wraps the Java exception in a 4GL-style STOP condition which, if explicitly handled by the business logic or depending on how the client is setup, will either restart or stop the client.

If a stacktrace is not available, use the recreate to position your client just before the last step in the scenario is executed. Once this is done, you can either get a thread dump from the server process and find the client's Conversation thread or you can suspend the client's Conversation thread (on server-side) and determine the current stacktrace. Use your debugger to determine the application state before the FWD API is invoked and step into the problematic FWD API, to check how it behaves.

After you have found the problematic 4GL application code, if the bug is not obvious after debugging into the runtime, use the notes in the previous section (as if a running server was not available) and the found runtime state, to duplicate the scenario in a standalone case. But even if the bug is obvious after debugging into the runtime, create a standalone testcase which duplicates it, as it will be easier to work on a fix using this minimal case than going through the entire recreate scenario each time a fix attempt is made.

Logging

For the runtime, there is extensive and configurable logging for the server (but none for the client). As with the majority of the runtime configuration, these values are read from the directory. The logging infrastructure is based on that provided by the java.util.logging package in J2SE. The terms and data structures used in that package will map to those used below. Logging levels in J2SE are constants of the Level class. The following mappings are used to convert a symbolic string in the directory (the server's primary configuration file) into a Level constant. This table is in order of least verbose logging to most verbose logging:

Configuration Value Logging Level
OFF Level.OFF
SEVERE Level.SEVERE
WARNING Level.WARNING
INFO Level.INFO
FINE Level.FINE
FINER Level.FINER
FINEST Level.FINEST
ALL Level.ALL

The directory will usually contain at least one "logging" section, however the defaults set below will take effect if there is no such section. The configuration is normally found via the path /server/default/logging and in that case it defines the default values for all servers. A given server can have a more specific or custom logging configuration by adding the section via the path /server/[serverid]/logging where [serverid] is the unique server account name for the server being configured. The server-specific path is searched first and if found it is honored. Otherwise the default path is searched and honored if found. If no logging configuration is found at all, the defaults will be used. The following is an example of the directory logging section:

<node class="container" name="logging">
  <node class="string" name="level">
    <node-attribute name="value" value="WARNING"/>
  </node>
  <node class="container" name="handlers">
    <node class="container" name="console">
      <node class="string" name="level">
        <node-attribute name="value" value="FINEST"/>
      </node>
    </node>
    <node class="container" name="file">
      <node class="string" name="level">
        <node-attribute name="value" value="OFF"/>
      </node>
      <node class="boolean" name="append">
        <node-attribute name="value" value="FALSE"/>
      </node>
      <node class="integer" name="limit">
        <node-attribute name="value" value="256"/>
      </node>
      <node class="string" name="pattern">
        <node-attribute name="value" value="%h/server%g.log"/>
      </node>
      <node class="integer" name="count">
        <node-attribute name="value" value="4"/>
      </node>
    </node>
  </node>
  <node class="container" name="loggers">
    <node class="string" name="com.goldencode.p2j.util">
      <node-attribute name="value" value="INFO"/>
    </node>
    <node class="string" name="com.goldencode.p2j">
      <node-attribute name="value" value="WARNING"/>
    </node>
    <node class="string" name="com.goldencode.p2j.net">
      <node-attribute name="value" value="INFO"/>
    </node>
    <node class="string" name="com.goldencode.p2j.persist">
      <node-attribute name="value" value="INFO"/>
    </node>
    <node class="string" name="org.hibernate.cache">
      <node-attribute name="value" value="SEVERE"/>
    </node>
  </node>
</node>

The following documents the options for configuring the directory based logging. All node paths are relative to the .../logging/ section itself (which may be located in different paths, see above). So level would be found at .../logging/level in the directory.

Node Path and ID Type Default Value Meaning
level string WARNING This specifies the logging level for any otherwise unconfigured handlers. Note that at this time, additional handler support is not provided or used in FWD.
handlers/console/level string WARNING The logging level for the console handler. This will limit all log output via the console to this level and above. All of this console output will be written to the server process' STDERR.
Warning: no matter what logging level the logger (see below) is set to, only log entries of equivalent or less verbosity will be allowed to emit on this handler. Thus this logging level is a filter to exclude content that has been created by one or more loggers.
handlers/file/level string OFF The logging level for the file handler. This will limit all log output via the file to this level and above.
Warning: no matter what logging level the logger (see below) is set to, only log entries of equivalent or less verbosity will be allowed to emit on this handler. Thus this logging level is a filter to exclude content that has been created by one or more loggers.
handlers/file/append boolean FALSE TRUE will append to the file handler's previous log file, FALSE will overwrite.
handlers/file/limit integer 8 Maximum file size for each log file in megabytes for the file handler.
handlers/file/pattern string %h/p2j_server_%s_%g.log File name pattern for the file handler. %h is the current user's home directory, %s is the server account name and %g is a generation number.
handlers/file/count integer 4 Number of log file generations for the file handler.
loggers/[logger_name]/level string n/a The logging level for any arbitrary logger name (in a hierarchical namespace). By convention, the namespace is based on the package/class name hierarchy. This is not required since any arbitrary text can be inserted. However, FWD does use the package and optionally the class name when creating loggers. If these loggers have a name that begins with at least one of the specified logging levels in this section, the most specific (longest) matching name will specify the logging level for that logger. There may be any number of these entries. As a result, one may specify the logging level for a specific package (and by default all contained classes and sub-packages and their classes). In addition, a more specific class or package name will override the "parent" logging level with the one specified in that more specific entry.
This sets the level for the source of a log message. That means that the class that is logging will only log messages at that level (and all less verbose levels). But the handler's (the output destination for loggers) logging level (which is set separately) may exclude the generated messages if the handler's logging level is less verbose than that of the logger being used.

Although J2SE supports pluggable handlers and formatters, only console and file handler types are supported at this time. Likewise, user-defined formatters are not supported. FWD uses its own custom formatter that displays state from the current session and thread's context. Each log entry will look like the following in the case that the message was output from a server thread that does not have an end-user context:

[10/02/2007 13:49:11 EDT] (SecurityManager:INFO) {main} Message text.

In the case that the message is logged from a thread that has an end-user context, the message would look like this:

[10/02/2007 13:49:11 EDT] (Persistence:WARNING) {026:014:ges} Message text.

In both cases, the first section is the date/timestamp. Likewise, both will have a second section with the class name of the source of the message and the message level. The portion with curly braces is either a {thread_name} or user context data. The user context data is in the format {session_id:thread_id:user_id}. The main content of the log entry follows the curly braces.

Useful General Purpose Logging

When all else fails, logging can be a useful fall-back plan. The reason one doesn't usually start with logging is that the output from logging can be quite voluminous and thus very time consuming to interpret.

Please see the section above on the logging facilities built into the runtime. These facilities would be used to configure each server's logging settings. After changing any of the directory configuration for logging, the server will need to be restarted for those settings to take effect.

The following classes have highly useful, general purpose logging implemented:

Class Best Logging Level Output
com.goldencode.p2j.util.TransactionManager FINE This will provide a log entry on every block related event and the block's current state at the time of the event. Events include entering a block, iterating a block, retrying a block and exiting a block.
com.goldencode.net.RemoteObject FINER Provides a log entry for every outbound remote call.
com.goldencode.net.RemoteObject FINEST In addition to the output from FINER level (see above), this provides a dump of the remote call parameters and a log entry (including the elapsed time) for when the remote call completes.
com.goldencode.net.Dispatcher FINER Provides a log entry for every inbound remote call.
com.goldencode.net.Dispatcher FINEST In addition to the output from FINER level (see above), this provides a dump of the remote call parameters and a log entry (including the elapsed time) for when the call completes.

Since the FWD remote object protocol is bi-directional (the client can call the server and the server can call the client), it is usually important to implement RemoteObject logging (to see outbound calls) in conjunction with Dispatcher logging (to see inbound calls). The combination allows the developer to see all protocol traffic.

Thread Dumps

If one does not have any error message or exception stack trace from which to start, one can generate a thread dump on either the client and/or the server. A thread dump is a listing of all currently existing threads in the JVM and a stack trace from each of those threads. By interpreting these stack traces, one can get a very clear snapshot of the status of the client and/or server.

Before triggering the thread dump, it is best to recreate the problem and to get as close to the failure point as possible without passing the failure point. It is also important to ensure that the processing stays at that location long enough for one to trigger the dump.

The thread dumping feature is a standard facility of the Sun HotSpot based JVM, however virtually all JVMs will have this feature. In the Sun JVM, there is a simple mechanism to notify the JVM that a thread dump needs to be produced. The JVM will pause all threads, for each one it will write the stack trace to STDOUT and then the threads will be resumed. The threads do not halt permanently and there is no negative side effects of this thread dump. For this reason, this procedure is safe to use on a production server.

On Linux, one must know the process ID (pid) of the client or server process which needs to produce the dump. This can normally be obtained via the ps command. To generate the dump, run the command kill -SIGQUIT <pid> OR kill -3 <pid>.

On Windows, one must find the console in which the client or server is running and then manually execute the CTRL-BREAK key combination while that console has the "focus" (while user input is directed to that console).

Please note that the server output will be cleanly captured on STDOUT, but on an interactive CHUI client, STDOUT is in use for the application itself. In other words, generating a thread dump on a Linux CHUI client process will corrupt the screen. In addition, it is likely that the terminal is not in a scrolling mode. This would cause most of the thread dump to be "lost" since the user will not be able to scroll up the terminal buffer to see all of the output. On Linux, in the KDE Konsole terminal application, this problem can be resolved by executing the "Edit -> Reset & Clear Terminal" menu item BEFORE the thread dump is triggered. That will allow the terminal buffer to be scrolled and the full thread dump output will be visible. It is likely that after this the CHUI client will not operate properly so it may need to be exited.

A better approach to generate a stack trace on the client is to use the jstack utility. This works for the server as well. The idea is that the jstack program connects to the JVM and obtains the stack trace and then displays it on the console of the command shell in which jstack is running. That means that the thread dump does not appear on STDOUT of the client or server JVM, which in the client case is desirable. The syntax is:

jstack [option] <pid>

The <pid> is the process identifier which can be obtained with ps on Linux. The options are not mandatory, but the -F should be used if the jstack command hangs and never returns. The -F is a force option that makes sure that a thread dump of some sort will be obtained.

It is perfectly acceptable to generate thread dumps more than once for the same process. There is no adverse affect, however one will need to be careful in the interpretation of the STDOUT contents since the thread dumps will concatenate. Make sure the thread dump being analyzed is the one that is expected.

Once the thread dump is available, it can be used just like an exception stack trace to get a starting point for debugging. Please see the Understanding FWD Threading section below for more details on the interpretation of thread names and their associated purposes.

Understanding FWD Threading

The FWD runtime environment is heavily threaded. These threads are organized in related sets called thread groups.

Threads that are created as part of the Java Virtual Machine (JVM) but which are not application threads (e.g. any thread in the "system" thread group) can be ignored. These threads handle JVM housekeeping tasks and are not FWD threads, nor do they provide any direct functionality for a converted application. The following are possible JVM thread names (any use of X in these names implies an integer value such as 0 or 1):

  • Attach Listener
  • Low Memory Detector
  • CompileThreadX
  • Signal Dispatcher
  • Finalizer
  • Reference Handler
  • GC task thread#X (ParallelGC)
  • VM Periodic Task Thread
  • RMI *

In different JVMs and even in different versions of the same JVM, the set of VM threads can change. The key is to identify the VM threads and ignore them.

The following is a guide to the FWD threads on the FWD client:

Client Thread Name Thread Group Purpose
main main This is the thread on which the system initializes. Once a session with the server is established, this thread becomes the "conversation" thread (a dedicated message processing thread) if the network protocol is in conversation mode.
Reader protocol Reads new messages out of the network session's socket and enqueues them for local processing.
Writer protocol Dequeues outbound messages and serializes them to the network session's socket.
Dispatcher X dispatcher A thread that processes inbound messages when the session is not in conversation mode. This thread also processes all asynchronous messages for the session, even when in conversation mode. There are usually at least 2 of these threads. Each thread has an integer index, with the first thread being Dispatcher 0, the second being Dispatcher 1 and so on.
KeyReader main This is a dedicated thread that fills the type-ahead buffer with any user generated keystrokes. It is in an infinite loop handling this function. Other client threads read the keys out of the type-ahead buffer as needed. Note that this must be a separate thread to duplicate the asynchronous nature of the CTRL-C key combination in a Progress environment. CTRL-C interrupts most processing whether the current logic is executing on the client or the server.

The following is a guide to the FWD threads on the FWD server:

Server Thread Name Thread Group Purpose
main main This is the thread on which the system initializes. Once initialization is complete, this thread listens on the open server socket for new secure socket (SSL) connections. The thread stays in a loop to accept connections until the server needs to exit. The function of this thread would be best called the Secure Socket Listener, but since the main thread is a special thread for the server it is better to leave that name alone. When the main thread dies, the server process will end.
Reader protocol Reads new messages out of the network session's socket and enqueues them for routing or local processing.
When a session is first created and before the peer side is authenticated, the protocol is not yet fully initialized. For this reason, the thread name does not yet have any identifying data (a session ID or the account name).
Reader [session_id:account_name] protocol Reads new messages out of the network session's socket and enqueues them for routing or local processing.
Once a session is fully initialized (authentication has completed), the thread name of the Reader thread is modified to include the session ID and the account name.
Writer protocol Dequeues outbound messages and writes them to network session's socket.
When a session is first created and before the peer side is authenticated, the protocol is not yet fully initialized. For this reason, the thread name does not yet have any identifying data (a session ID or the account name).
Writer [session_id:account_name] protocol Dequeues outbound messages and writes them to network session's socket.
Once a session is fully initialized (authentication has completed), the thread name of the Writer thread is modified to include the session ID and the account name.
Incoming INSECURE Connector connection Temporary worker thread spawned by the Insecure Socket Listener thread to handle the initialization and authentication of the session. This also handles the startup of the protocol's queue instance.
Incoming SSL Connector connection Temporary worker thread spawned by the main thread (which is performing the function of a “Secure Socket Listener”) to handle the initialization and authentication of the SSL session. This also handles the startup of the protocol's queue instance.
Conversation [session_id:account_name] conversation A dynamically created thread that is dedicated to message processing for a specific session that is in conversation mode. The thread is created at session establishment and ends when the session terminates. When processing is on the server side, this thread is active or it is blocked waiting for service from some external entity such as a response to a database query. When processing is on the client side, this thread is blocked waiting for a response from the client or waiting to process a new inbound request from the client. The client/server protocol for FWD provides for nested calls and reentrancy.
Dispatcher X dispatcher A thread that processes inbound messages when the session is not in conversation mode. This thread also processes all asynchronous messages for the session, even when in conversation mode. There are usually at least 2 of these threads. Each thread has an integer index, with the first thread being Dispatcher 0, the second being Dispatcher 1 and so on.
Router Y router A thread that processes inbound messages which are not destined for the local system. These threads also handle all address space management for the FWD network. There are usually at least 2 of these threads. Each thread has an integer index, with the first thread being Router 0, the second being Router 1 and so on.
Insecure Socket Listener connection This is spawned from the main thread during server startup if and only if insecure sockets are enabled. This thread operates the same socket listening loop as the main thread. The only difference is that the server socket is a regular (non-SSL) socket, so all incoming sessions on this listener are insecure.
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#Z n/a 3 threads that provide JDBC resource pooling. Implemented by C3P0 in conjunction with Hibernate and JDBC. Each thread ends with an integer index that is 0, 1 or 2.
GlobalEventManager Queue Purge Thread n/a Cleanup thread which clears expired multi-session data edit notifications.
zzzzzzzzzz@qtp0-1 - Acceptor0
SslSocketConnector@0.0.0.0:yyyy
n/a Embedded Web Server (Jetty) socket listener thread. Only present when FWD administration support is used. The zzzzzzzzzz is a unique identifier and the yyyy is the port on which it is listening.
nnnnnnnnnnnnn@qtp0-0 n/a Embedded Web Server (Jetty) thread pool support. Only present when FWD administration support is used. The nnnnnnnnnnnnn is a unique identifier.
Identity pool scanning thread n/a Calculates gaps in the usage of primary keys across a database. These unused primary keys are stored in a pool to be used when a primary key is requested.
SequenceIdentityManager [local/database_name/primary] session closer n/a Allocates unique primary keys from a database sequence. The text inside square brackets includes the database name that this thread will handle. There will be one of these threads for each local database which is not supporting temp-tables.

Using a Debugger

It is quite useful to use debuggers to enable source level debugging. All the normal debugger facilities are available including single-step execution and variable inspection. Some Integrated Development Environments (IDEs) may provide enough flexibility to allow the IDE to dynamically launch the FWD client or server. However, the following tips will help to enable remote debugger connection at the startup the FWD server or FWD client. This is known to work very well and it has the advantage of allowing one or both sides of the FWD environment to run at full speed until it is time for the debugger to connect.

Add the following parameters to the JVM command line to enable a remote connection via sockets:

-Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=<port>,server=y,suspend=n

Change the <port> to the TCP/IP port on which the JVM will listen for the remote debugger's connection. Note that the suspend=n means that the JVM process will be allowed to run normally. To suspend the JVM after the debugging support is ready for connections, use suspend=y. That will allow debugging very early initialization of FWD, but it generally is not needed.

These are JVM parameters which means they must be added to the command line after the java command and before the name of the Java class to execute. Here is an example of starting the FWD server with debugging enabled:

java -XX:MaxPermSize=256m -Xmx2048m -server -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=2080,server=y,suspend=n -classpath $P2J_HOME/p2j/build/lib/p2j.jar com.goldencode.p2j.main.ServerDriver server.xml

In this case the Java class name being executed is com.goldencode.p2j.main.ServerDriver.

The sample server startup script (server.sh) and client startup script (client.sh) both support a -d option to enable debugging. Run the scripts with the -? option to see a syntax diagram.

It is possible and sometimes useful to simultaneously debug both sides of a session at once (both the server and the client). This is very useful when the developer has no idea on which side the problem occurs. The following tips can be useful:

  • The developer must know when the flow of control has been sent to the other side to know when to switch debuggers.
  • It is best to set temporary breakpoints at locations that you know will be hit when the flow of control switches. This allows the debugged process that is not being actively debugged to be in “run” mode instead of blocked. When flow of control shifts to that process, the debugger will automatically break.
  • To ensure that a shift back can be caught, set a temporary breakpoint on the line after the return from the other side and then leave that process in “run” mode.
  • Make sure to clear temporary breakpoints after they are triggered.

For more details and advanced debugging tips please see the book entitled FWD Developer Guide. That book also includes a list of useful locations to set breakpoints on both the FWD client and the FWD server.

Using a Profiler

Normal profiling techniques can be used on FWD code. The most commonly used tool is called HPROF. This is an example JVM option to force the JVM (at exit) to emit binary hprof profiling output in a file called output.hprof for further analysis:

-Xrunhprof:heap=dump,doe=y,format=b,file=output.hprof

The option must be placed on the command line between the java executable name and the class name of the Java program being executed.

The resulting HPROF binary file can be analyzed in various profilers and profile analyzers. For more details on HPROF, consult the J2SE documentation.

It is worth mentioning that when a thread is blocked outside of the JVM (e.g. in the kernel), the kernel may know that the thread is not runnable, but Java will not have this knowledge. If Java thinks a thread is runnable but in fact it is blocked, then any execution counters or execution times reported by the profiler can show significant differences from reality. This is due to the fact that Java profilers run inside or in cooperation with the JVM and more importantly they have no internal knowledge of the operating system kernel state.

To compensate for this, any dead time must be understood and discounted by the user since the profiler tool itself cannot help. Since FWD is very network dependent (between application server and database server, between client and application server, between application servers), there can be substantial dead time where the thread is runnable in Java but is blocked in the kernel (waiting for a network response in this example). File system and child process I/O wait time is another example of this potential dead time.

Techniques for Dealing with OutOfMemoryError (OOME)

The OOME usually is caused by the application using too much of the heap. The Java heap is the pool of memory used to satisfy application processing. Virtually all memory requirements for an application are satisfied from the JVM heap. The maximum heap size can be specified on the JVM command line using the sytax -XmxYYYm which sets the heap size at a maximum of YYY. The heap will grow to that point and if enough memory cannot be freed up (through garbage collection) to satisfy new allocation requests, then an OOME will be thrown. Usually, the JVM halts in such a case. The FWD server process has been somewhat hardened to the OOME and it will attempt to recover. This recovery should not be relied upon as a solution to an OOME. Any OOME is a real problem that needs to be addressed.

Most importantly, some OOMEs are due to the fact that the application just needs more memory. Other OOMEs are due to memory/resource leaks where unnecessary or dead resources are kept around too long and cannot be reclaimed. It is also possible that the FWD runtime has implementations of support that use more memory than needed to accomplish a particular function. Diagnosing and resolving an OOME requires the developer to determine the root cause of the problem.

J2SE ships a useful command line memory tool called jstat. It is a very good place to start with memory analysis as it gives real-time snapshots of the memory/heap usage and can be used on a production system.

The following command generates a report on basic garbage collection and heap utilization. The report is for the JVM with the process identifier of 8490 and each line in the report is a snapshot taken every 10 sections. There is a header line every 20 rows.

jstat -gcutil -h20 8490 10s

This is some same output from that command:

  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
 99.40   0.00  46.08   7.45  86.68      4    0.054     0    0.000    0.054
 99.40   0.00  46.09   7.45  86.68      4    0.054     0    0.000    0.054
 99.40   0.00  46.09   7.45  86.68      4    0.054     0    0.000    0.054
 99.40   0.00  46.09   7.45  86.68      4    0.054     0    0.000    0.054

This shows the utilization percentages for the two survivor areas (S0 and S1), eden (E), the old generation (O) and the permanent generation (P). YGC shows the number of young generation garbage collections and YGCT is the cumulative number of seconds spent in young generation garbage collection since the JVM was started. FGC is the number of full garbage collections and FGCT is the cumulative number of seconds spent in full garbage collection since the JVM was started. GCT is the total number of seconds spent in any form of garbage collection.

Other reports are also available, consult the jstat documentation for more details.

To go deeper into an OOME, the developer will need a heap dump. Normally, this is best taken at the point of the OOME. By adding the -XX:+HeapDumpOnOutOfMemoryError option to the JVM command line (between the java executable name and the Java class name), if/when an OOME occurs, the JVM will write out a heap dump.

Heap dumps can also be obtained using HPROF (see the section on Using a Profiler above), VisualVM (jvisualvm), using the Java Monitoring and Management Console (jconsole) or using the command line tool jmap. All of these tools usually are included with J2SE.

An example of jmap (substitute the JVM's process identifier for <pid>):

jmap -F -dump:[live,]format=b,file=<filename> <pid>

The -F option forces a dump to be taken even if the JVM is not responding. It is not required but it can be necessary if the JVM is not responding. The live parameter is optional; if specified, only live objects in the JVM's heap are dumped. The filename parameter specifies the output file which will be created. By convention, the filename ends with .hprof@; some analysis tools will expect this extension, so it is best to follow this convention.

In order to analyze a heap dump, having a full thread dump is very useful. Sometimes the heap dump may include this data but it is not always guaranteed. The normal methods of taking a heap dump interactively can be used (see the section above on Thread Dumps above).

There is also a mechanism in the JVM to capture a thread dump at the time of an OOME. Add this to the JVM command line:

-XX:OnOutOfMemoryError='./stack_dump.sh %p'

There must be a shell script called ./stack_dump.sh (in this case, but the path and file name can be anything) which would take a single parameter that is the process identifier of the JVM. The script name and the %p had to be put inside single quotes to ensure that those characters were passed unchanged to the JVM. This is an example of the contents of a ./stack_dump.sh:

jstack -F "$1" 2>"$1_stack_trace.txt" 1>&2

The reason the stack traces are very useful is to enable the tracing of heap usage back to specific code. Heap objects can have references that link them back to where they are being held in memory. However, in many cases the references can be intermixed into collections or arrays that include valid references. Thread dumps can show what was happening at the time of the failure which can help isolate the code that caused the invalid references to remain.

Once a heap dump is available, normal heap analysis tools can be used. Included with many J2SE implementations are tools such as Heap Analysis Tool (jhat) and VisualVM (jvisualvm) which can be quite useful. Many IDEs such as Eclipse and NetBeans also support heap analysis tool plugins. Analysis of a heap dump is a topic beyond the scope of this document; however, most analysis tools will provide some useful documentation to get you started.


© 2004-2017 Golden Code Development Corporation. ALL RIGHTS RESERVED.