Troubleshooting Tools

This is a high-level guide to some troubleshooting tools the FWD team has found useful for working with FWD applications. Generally, these tools are free to use; some are open source.

Performance and Monitoring

VisualVM (Java 8)

VisualVM is a free tool that comes with the Oracle reference JDK distribution, many (all?) OpenJDK distributions, and probably others. If you've installed the full JDK, you should already have this tool available on your system.

VisualVM is a graphical application based on the NetBeans IDE platform, but it runs stand-alone. It can perform a range of troubleshooting operations, including real-time memory, thread, and performance monitoring, as well as thread and heap dump creation and analysis. It is extensible; additional plug-ins can be downloaded and installed from within the application. This discussion is focused primarily on the CPU sampling aspects of the tool, though the optional "Visual GC" tool is excellent for monitoring memory allocation and garbage collection in real time. IIRC, the CPU profiling/sampling facility is part of the base platform.

I typically use VisualVM's CPU sampling to track down performance bottlenecks, in cases where there is a test/process/routine whose performance differs between the original code running in Progress and the corresponding, converted code running in FWD. It is most useful to have a test that runs long enough (at least a minute, if not longer) to generate enough CPU sample data to reach a meaningful conclusion. Generally, the shorter the sampling session, the less reliable the results, and the harder it is to determine the cause.

VisualVM is easy to get started with. If you didn't install the visualvm package when you installed your JDK, you will need to do so. The project has a home page at https://visualvm.github.io/, but you don't need to do much research to get started. Just run "visualvm" from the command line (or set up a launcher for it). If you find you need to increase the maximum heap (I think this is the only configuration setting I have ever had to change), you can find the configuration file at /etc/visualvm/visualvm.conf on Ubuntu Linux. There is likely a corresponding file for Windows JDKs.
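If you do need to raise VisualVM's own heap limit, the change is a single line in visualvm.conf. A sketch of what to look for (the exact default options vary by version and distribution, so treat the values shown as illustrative):

```shell
# /etc/visualvm/visualvm.conf (Ubuntu packaging; the path varies by platform)
# Options prefixed with -J are passed through to VisualVM's own JVM.
# Raise -J-Xmx if VisualVM runs out of memory loading large snapshots.
visualvm_default_options="-J-client -J-Xms24m -J-Xmx1024m"
```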

To access the tools, you have to connect to the running JVM you wish to examine. To connect, hover over the vertical "Applications" button in the left margin, which will open a view of all running JVMs. Double-click the one you want to monitor.
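If it's not obvious which entry in the Applications view is the FWD server, the JDK's jps tool can help; it lists running JVMs from the command line, and its -v flag shows the JVM arguments, which usually make the server process easy to spot:

```shell
# List local JVMs: process ID, fully qualified main class (-l), JVM arguments (-v).
jps -lv
```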

The tab you need for CPU performance profiling (sampling, actually) is the "Sampler" tab. I don't recommend using the "Profiler" tab for anything but the simplest applications; it spends a very long time instrumenting your bytecode and is ultimately so slow that I've found it unusable for anything complex. The Sampler's settings let you specify the packages you want to include in or exclude from sampling. Generally, the only settings I've changed are the sampling frequency, which I set to its most granular value (20ms), and an exclusion for the com.goldencode.p2j.net package. The latter is useful because otherwise your results are dominated by time spent waiting on sockets (which the sampler counts in its profiling, because the socket reader threads are marked "runnable" rather than "waiting").

Start up the FWD server on an otherwise quiet system; when you're measuring performance, you don't want background tasks to skew your results, and you never want the system to swap. Attach the sampler to the FWD application server JVM as described above (or to the client JVM, if measuring client performance; note that the client does not execute converted business logic). I usually sort by the "Self Time (CPU)" column, both while the sampler is running and when reviewing snapshots. Run the target test/process/routine whose performance you want to measure. When the test is done, or at any time during its execution, you can push the "Snapshot" button to capture the accumulated results. Each snapshot is independent of the others, so you can take several without losing earlier ones. The snapshots and the active sampling view will give you an idea of where the performance bottlenecks are. You may have to do a little googling or reading on the project home page to get a better idea of how to interpret your results; those techniques are too much to summarize here.

One feature I use a lot is back traces. From the "HotSpots" tab in a snapshot view, right-click on any row and select "Show Back Traces". When using these, it is helpful to also enable the "Samples" column, using the little table icon at the far right of the column headers. (It's annoying that you have to do this every time you open the view, though.) This column tells you how many samples were taken in each method reported in the back traces.

The thing to keep in mind when using the sampler is that the reported times are not necessarily accurate. They are a statistical estimate of the time spent in individual methods, based on how many times the sampler found itself in a given method when taking a sample every x milliseconds. So, if the sampler is set to a 20ms frequency and takes 10 samples in a certain method, it will attribute 20ms x 10 = 200ms to that method, regardless of how much time was actually spent there. The idea is that over a long enough period, statistically, the accumulated sample times will approximate reality.
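The attribution is simple arithmetic; the sampler never measures elapsed time inside a method, it just multiplies the number of samples that landed in the method by the sampling interval:

```shell
# Estimated self time = samples observed in the method x sampling interval.
samples=10        # times the sampler caught a thread in this method
interval_ms=20    # sampling frequency set in the Sampler settings
echo "$(( samples * interval_ms )) ms"   # prints "200 ms"
```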

Thus, it is important to determine whether an apparent bottleneck is due to a method's implementation being inefficient, or just to the method being called a gazillion times. Usually, the truth is some combination of both. As a starting point, you need to know whether to look at the method's implementation itself (i.e., it is not called that often, but takes a long time to execute) or at the method's caller(s) (i.e., it is not necessarily slow to execute, but is called often). This determines which code needs to be scrutinized for potential performance problems; hence the note about enabling the "Samples" column in the previous paragraph.

pgBadger (PostgreSQL 9.x)

This is a free, open source, PostgreSQL log analyzer which parses the log file and provides some interesting analytics. This can help you track down database performance bottlenecks typically caused either by slow queries or by executing the same query many times (i.e., lots of database round trips). It will also highlight errors and warnings that you might otherwise miss in voluminous log files.

The pgBadger home page is at http://dalibo.github.io/pgbadger/. Technology prerequisites are a modern Perl installation and a modern browser (the presentation of results is rendered using JavaScript).

To use pgBadger to postprocess your database log files, you will need to (1) enable logging and (2) modify your logging options in postgresql.conf, so that PostgreSQL logs information in a format pgBadger can understand. The specific options to change depend upon the way your particular PostgreSQL cluster is installed. The details are documented in the Documentation section of the pgBadger website. One thing to keep in mind is that you generally want very verbose logging when generating log files for troubleshooting, but you probably will not want the same level of verbosity in production, since the logging itself can impact production performance and disk use. Thus, you may want to maintain two versions of postgresql.conf: one for troubleshooting and one for production.
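As a sketch, the options below reflect the sort of verbose logging configuration the pgBadger documentation recommends for a stderr-based log; treat the exact values as assumptions and verify them against the pgBadger documentation for your PostgreSQL version:

```shell
# postgresql.conf -- verbose logging for pgBadger analysis (troubleshooting only)
log_min_duration_statement = 0                       # log every statement with its duration
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '   # a prefix format pgBadger can parse
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
lc_messages = 'C'                                    # pgBadger expects English log messages
```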

The high level troubleshooting process with pgBadger is to:
  1. enable pgBadger-friendly database logging and restart the database cluster;
  2. start up the FWD application server;
  3. perform the test/routine which you want to analyze;
  4. use the pgBadger command line script to process the PostgreSQL log file(s) (you can use optional parameters to scope the analysis to a certain time window);
  5. review the results in a browser.
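Step 4 is a single command. A hypothetical invocation (the file paths and timestamps are examples), using pgBadger's -b/-e options to scope the analysis to a time window and -o to name the HTML report:

```shell
# Analyze the cluster log, restricted to a one-hour window, and write report.html.
pgbadger -b "2016-05-01 09:00:00" -e "2016-05-01 10:00:00" \
         -o report.html /var/log/postgresql/postgresql-9.5-main.log
# Then open report.html in a browser to review the results (step 5).
```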

If you normally run with different logging options, don't forget to reset them and bounce your cluster again after your troubleshooting session.