Troubleshooting Tools

This is a high-level guide to some troubleshooting tools the FWD team has found useful when working with FWD applications. Generally, these tools are free to use; some are open source.

Performance and Monitoring

VisualVM (Java 8)

VisualVM is a free tool that comes with the Oracle reference JDK distribution, many (all?) OpenJDK distributions, and probably others. If you've installed the full JDK, you should already have this tool available on your system.

VisualVM is a graphical application based on the NetBeans IDE platform, but it runs stand-alone. It can perform a range of troubleshooting operations, including real-time memory, thread, and performance monitoring, as well as thread and heap dump creation and analysis. It is extensible; additional plug-ins can be downloaded and installed from within the application. This discussion is focused primarily on the CPU sampling aspects of the tool, though the optional "Visual GC" tool is excellent for monitoring memory allocation and garbage collection in real time. IIRC, the CPU profiling/sampling facility is part of the base platform.

I typically use VisualVM's CPU sampling to track down performance bottlenecks, where there is a test/process/routine for which performance differs between the original code running in Progress and the corresponding, converted code running in FWD. It is most useful to have a test that runs long enough (at least a minute, if not longer) to generate enough CPU sample data to support a meaningful conclusion. Generally, the shorter a sampling session, the less reliable the results, and the harder it is to determine the cause.

VisualVM is straightforward to use. If you didn't install the visualvm package when you installed your JDK, you will need to do so. The project has a home page at https://visualvm.github.io/, but you don't need to do much research to get started. Just run "visualvm" from the command line (or set up a launcher for it). If you find you need to increase the max heap (I think this is the only configuration I have ever had to change), you can find the configuration file at /etc/visualvm/visualvm.conf on Ubuntu Linux. There is likely a corresponding file for Windows JDKs.
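
As a hedged example of that setting: on an Ubuntu installation, the maximum heap available to VisualVM itself is controlled by a line in /etc/visualvm/visualvm.conf roughly like the one below (the exact option name and default values can vary between VisualVM versions, so treat this as illustrative):

    # -J passes the option through to VisualVM's own JVM
    visualvm_default_options="-J-Xms24m -J-Xmx2048m"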

To access the tools, you have to connect to the running JVM you wish to examine. To connect, hover over the vertical "Applications" button in the left margin to open a view of all running JVMs, then double-click the one you want to monitor.

The tab you need for CPU performance profiling (sampling, actually) is the "Sampler" tab. I don't recommend using the "Profiler" tab for anything but the simplest application; it will spend forever instrumenting your bytecode and ultimately is so slow, I've found it to be unusable for anything complex. The settings for the Sampler let you specify the packages you want to include or exclude from sampling. Generally, the only settings changes I've made are to set the sampling frequency to its most granular (20ms) and to exclude the com.goldencode.p2j.net package. The latter is useful, because otherwise your results are dominated by the time spent waiting on sockets (which the sampler counts in its profiling, because the socket reader threads are marked "runnable", rather than "waiting").

Start up the FWD server on an otherwise quiet system. When you're measuring performance, you don't want background tasks to skew your results and you don't want to swap, ever. Attach the sampler to the FWD application server JVM as described above (or to the client JVM, if measuring client performance; note that the client does not execute converted business logic). I usually sort by the "Self Time (CPU)" column, both while the sampler is running and when reviewing snapshots. Run the target test/process/routine whose performance you want to measure. When the test is done, or at any time during its execution, you can push the "Snapshot" button to capture accumulated results. Each snapshot is separate from the others, so you can take multiple without dropping earlier ones. It is from the snapshots and the active sampling view that you can get an idea where the performance bottlenecks are. You may have to do a little googling or reading from the project home page to get a better idea how to interpret your results; it's too much to summarize these techniques here.
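
If you are unsure which running JVM is the FWD application server, the jps tool that ships with the JDK lists the main class and JVM arguments of each local JVM, which makes the server easy to pick out in VisualVM's "Applications" view. A minimal sketch (the PIDs and the FWD main class shown here are purely illustrative):

    $ jps -lv
    12345 com.goldencode.p2j.main.ServerDriver -Xmx4g ...
    23456 sun.tools.jps.Jps -lv ...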

One feature I use a lot is the backtraces feature. From the "HotSpots" tab in a snapshot view, right-click on any row and select "Show Back Traces". When using these, it is helpful to also enable the "Samples" column, using the little table icon at the far right of the column headers. It's annoying that you have to do this every time you open the view, though. This column will tell you how many samples were taken in each method reported in the backtraces.

The thing to keep in mind when using the sampler is that the times are not necessarily accurate. They are a statistical estimate of the times spent in individual methods, based on how many times the CPU sampler found itself in a given method when taking a sample every x milliseconds. So, if the sampler is set to a 20ms frequency and takes 10 samples in a certain method, it will attribute 20ms x 10 = 200ms to that method, regardless of how long it actually takes. The idea is that over a long enough time period, statistically, the accumulated sample times will estimate reality.

Thus, it is important to determine whether an apparent bottleneck is due to a method's implementation being inefficient, or just because it was called a gazillion times. Usually, the truth is some combination of both. It is important to know as a starting point whether to look at the method's implementation itself (i.e., it is not called that often, but takes a long time to execute), or at the method's caller(s) (i.e., it is not necessarily slow to execute, but is called often). This informs which code needs to be scrutinized for potential performance problems. Hence, the note about enabling the samples column in the previous paragraph.

pgBadger (PostgreSQL 9.x)

This is a free, open source PostgreSQL log analyzer which parses the log file and provides some interesting analytics. This can help you track down database performance bottlenecks typically caused either by slow queries or by executing the same query many times (i.e., lots of database round trips). It will also highlight errors and warnings that you might otherwise miss in voluminous log files.

The pgBadger home page is at http://dalibo.github.io/pgbadger/. Technology prerequisites are a modern Perl installation and a modern browser (the presentation of results is rendered using JavaScript).

To use pgBadger to postprocess your database log files, you will need to (1) enable logging; and (2) modify your logging options in postgresql.conf, so that PostgreSQL logs information in a format pgBadger can understand. The specific options to change depend upon the way your particular PostgreSQL cluster is installed. The details are documented in the Documentation section of the pgBadger website. One thing to keep in mind is that you generally want very verbose logging when generating log files for troubleshooting, but you probably will not want the same level of verbosity in production, since the logging itself can impact production performance and disk use. Thus, you may want to maintain two postgresql.conf configurations: one for troubleshooting and one for production.
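
As a hedged illustration only (the pgBadger documentation is the authoritative reference, and the appropriate log_line_prefix depends on whether you log to stderr, csvlog, or syslog), a verbose troubleshooting configuration for a stderr-based setup might look roughly like this:

    # postgresql.conf -- troubleshooting profile (values are illustrative)
    logging_collector = on
    log_min_duration_statement = 0        # log every statement with its duration
    log_line_prefix = '%t [%p]: user=%u,db=%d '
    log_checkpoints = on
    log_connections = on
    log_disconnections = on
    log_lock_waits = on
    log_temp_files = 0
    lc_messages = 'C'                     # keep messages in English for the parser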

The high level troubleshooting process with pgBadger is to:
  1. enable pgBadger-friendly database logging and restart the database cluster;
  2. start up the FWD application server;
  3. perform the test/routine which you want to analyze;
  4. use the pgBadger command line script to process the PostgreSQL log file(s) (you can use optional parameters to scope the analysis to a certain time window; a sample invocation is sketched after this list);
  5. review the results in a browser.
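
A hedged example of step 4, assuming a Debian/Ubuntu-style log location (adjust the log path, output file name, and time window for your environment):

    pgbadger --begin "2018-06-01 14:00:00" --end "2018-06-01 15:00:00" \
             --outfile report.html /var/log/postgresql/postgresql-9.5-main.log

The resulting report.html is what you open in a browser in step 5.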

Don't forget to reset the logging options and bounce your cluster again after your troubleshooting session, if you are using different logging options normally.

Memory Leaks

When a review of code is not sufficient to determine the root cause of a memory leak, a heap dump analysis often is necessary. A good tool for this is the eclipse Memory Analyzer Tool (MAT - https://www.eclipse.org/mat/). It can be installed either standalone or as an eclipse IDE plug-in. Installation is left as an exercise for the reader (see https://www.eclipse.org/mat/downloads.php).

The author of this page has not used the standalone version, so this page will be written from the perspective of using the MAT as a plug-in from within the eclipse IDE.

Note: to use the MAT with larger heap dumps, eclipse itself will need to be given a larger maximum heap setting, generally enough to accommodate the size of the heap dump captured. This can be done by adjusting the -Xmx<n>g setting within the eclipse.ini file and restarting eclipse. See your eclipse documentation for the location of eclipse.ini on your particular system.
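
For example, to analyze a dump of several gigabytes, the JVM arguments at the end of eclipse.ini might be adjusted along these lines (only the relevant entries are shown, and the values are illustrative; note that -Xmx must appear after the -vmargs marker):

    -vmargs
    -Xms512m
    -Xmx8g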

Capture a Heap Dump

The simplest way to acquire a heap dump is to do so from within the eclipse IDE. When installed as a plug-in, a "Memory Analysis" perspective is added to eclipse. This perspective includes two useful buttons in eclipse's tool bar (see the attached mat_toolbar_buttons.png):

The button on the left is used to acquire a heap dump by attaching to a running JVM. You will be prompted to select the JVM to which to attach, and to provide a file name/location within which to store the acquired dump. When you acquire a heap dump in this way, you can choose to open the heap dump in the MAT as soon as it is acquired.

The button on the right is used to open a previously acquired heap dump from the file system.

It is advisable to use a disk with the fastest possible I/O (e.g., an SSD) if the heap dumps you acquire are large. Larger dumps take a while to acquire and to write to disk, and take even longer to open in the tool for the first time, as multiple indices are created and cached on disk. Be prepared to consume a lot of disk space quickly, and to either archive or discard old heap dumps as soon as possible to avoid running out of space.

eclipse can acquire a heap dump from a locally running JVM, but if your target JVM is not running on the same system as eclipse, you will need to use a different tool to acquire the heap dump and then transfer the resulting file to a system where the MAT is available. One such tool that comes with the JDK is jmap (https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr014.html). jmap has several modes of operation; use the mode which produces a binary heap dump. A typical invocation is sketched below.
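
A hedged sketch of such an invocation (the output path is a placeholder, and <pid> is the process ID of the target JVM; the live option limits the dump to reachable objects, which is usually what you want when hunting a leak):

    jmap -dump:live,format=b,file=/tmp/fwd_server.hprof <pid>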

Analyzing Heap Dumps
