async-profiler is a low-overhead sampling profiler for the JVM, and a powerful tool for identifying performance bottlenecks in Java applications.
Let’s see it in action. Imagine you have a web service that’s experiencing slow response times. You suspect a CPU-intensive part of your code is the culprit. You can attach the async-profiler to your running JVM process and capture a profile.
Here’s how you might start a profiling session, targeting CPU usage for 30 seconds:
./profiler.sh -d 30 -f profile.jfr <pid>
This command tells the profiler to run for 30 seconds (-d 30), save the output to a Java Flight Recorder (JFR) file named profile.jfr (-f profile.jfr), and attach to the Java process with the specified Process ID (<pid>).
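If you run this often, a small wrapper can catch a mistyped PID before the profiler starts. This is only a sketch: the function name `profile_cpu` and the `PROFILER` variable are made up for illustration, and it assumes `profiler.sh` sits in the current directory unless you override the path.

```shell
# Hypothetical helper; PROFILER and profile_cpu are names invented for this sketch.
PROFILER="${PROFILER:-./profiler.sh}"

# profile_cpu PID [SECONDS] [OUTPUT]
# Validates the arguments, then runs the same command shown above.
profile_cpu() {
    pid="$1"
    seconds="${2:-30}"
    output="${3:-profile.jfr}"

    # Reject an empty PID or one containing anything other than digits.
    case "$pid" in
        ''|*[!0-9]*) echo "usage: profile_cpu PID [SECONDS] [OUTPUT]" >&2; return 2 ;;
    esac

    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or no permission): $pid" >&2
        return 1
    fi

    "$PROFILER" -d "$seconds" -f "$output" "$pid"
}
```

Calling `profile_cpu 12345` would then behave like the command above, while `profile_cpu 12345 60 long.jfr` would record for 60 seconds into a different file.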
After the profiling session completes, you’ll have a profile.jfr file. You can then analyze this file using tools like Java Mission Control (JMC), or convert it to a flame graph with the converter utilities that ship with async-profiler.
Let’s say you open the profile.jfr in JMC. You’d navigate to the "Method Profiling" page (the exact name can vary between JMC versions), and you’d see a breakdown of where the CPU time is being spent. This isn’t just about which methods are called most often; it’s about which methods are consuming the most CPU cycles.
The async-profiler works by periodically interrupting running threads with a signal and recording each thread’s call stack at that moment. It collects Java stacks through HotSpot’s AsyncGetCallTrace API, so samples are not restricted to safepoints, and it does this with very low overhead, making it suitable for production environments. Because the stacks can include native code and the JVM’s internal threads (such as garbage collection and JIT compilation), it can reveal performance issues that might be hidden in traditional profiling methods.
The output typically shows a flame graph or a call tree, where the width of a "flame" or a node represents the amount of CPU time spent in that method and its children. This makes it incredibly easy to spot "hotspots" – the methods that are consuming the most resources.
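To make the width idea concrete, here’s a sketch that computes each frame’s inclusive sample count from collapsed-format stacks (one line per unique stack, frames separated by semicolons with the root first, followed by a sample count). The stacks and counts are made up for illustration, and `inclusive_samples` is a hypothetical helper name, not something async-profiler provides.

```shell
# inclusive_samples: for every frame, sum the samples of all stacks that
# contain it. This is the frame's total width across the flame graph.
inclusive_samples() {
    awk '
    NF == 0 { next }                     # skip blank lines
    {
        count = $NF                      # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # strip the count, keep the frames
        n = split(stack, frames, ";")
        split("", seen)                  # reset the per-stack "seen" set
        for (i = 1; i <= n; i++) {
            if (!(frames[i] in seen)) {  # count a recursive frame once per stack
                seen[frames[i]] = 1
                total[frames[i]] += count
            }
        }
    }
    END { for (f in total) print total[f], f }
    ' "$@" | sort -k1,1nr -k2
}

# Made-up sample data, not real profiler output.
inclusive_samples <<'EOF'
Main.main;Handler.handle;Json.parse 60
Main.main;Handler.handle;Db.query 25
Main.main;Logger.flush 15
EOF
```

With this input, Main.main covers all 100 samples, Handler.handle covers 85 (its own stacks plus its children), and Json.parse covers 60, which is exactly how the frame widths would relate in the flame graph.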
Here’s a deeper dive into what you control:
- Profiling Duration (-d): How long you collect data, in seconds. Longer durations capture more representative behavior but also produce larger output files.
- Output File and Format (-f, -o): -f names the output file, and the format is inferred from its extension. A .jfr file is excellent for integration with existing JVM tools, and an .html file gives you a flame graph directly (older 1.x releases produced .svg instead). To choose a format explicitly, use -o; for example, -o collapsed is good for simple text-based analysis.
- Event Types (-e): By default, it profiles CPU time (cpu). But you can profile other events like allocation (alloc), lock contention (lock), and wall-clock time (wall). There is no dedicated garbage collection event; GC work shows up as JVM-internal frames in a CPU profile. For example, to profile allocations: ./profiler.sh -e alloc -d 30 -f profile.jfr <pid>
- Sampling Interval (-i): Controls how often the profiler takes a sample, in nanoseconds by default. A lower interval means more samples, higher accuracy, and higher overhead. The default (10 ms for CPU profiling) is usually sufficient.
- Per-Thread Profiling (-t): Reports each thread’s stacks separately, which can be useful for isolating issues. For wall-clock profiling, the --filter option can additionally restrict sampling to specific thread IDs.
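The options above can be combined in a few typical ways. The following is a sketch rather than a recipe: the `run` helper, the `DRY_RUN` variable, and the placeholder PID are hypothetical conveniences that only print each command by default, and the flags are assumed to match the `profiler.sh` script from async-profiler 2.x, so check the usage text of the version you have installed.

```shell
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
PID="${PID:-12345}"    # placeholder PID for illustration only

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# CPU profile; the .html extension selects flame graph output.
run ./profiler.sh -e cpu -d 30 -f cpu.html "$PID"
# Allocation profile written as a JFR recording.
run ./profiler.sh -e alloc -d 30 -f alloc.jfr "$PID"
# Lock contention, with collapsed stacks chosen explicitly via -o.
run ./profiler.sh -e lock -d 30 -o collapsed -f lock.collapsed "$PID"
# CPU profile sampled every 1 ms (interval given in nanoseconds), split per thread.
run ./profiler.sh -e cpu -i 1000000 -t -d 30 -f cpu-threads.html "$PID"
```

Printing the commands first makes it easy to review them before attaching to a production process; setting `DRY_RUN=0` and a real `PID` would execute them instead.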
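The true power of async-profiler lies in how it collects samples. Unlike instrumenting profilers, which modify code and add overhead to every method call, async-profiler takes periodic snapshots of the call stack. The "async" in its name refers to the AsyncGetCallTrace API, which lets it sample a thread at any point instead of only at safepoints, so it avoids the safepoint bias that skews many other sampling profilers. Keep in mind that a CPU profile only sees threads that are actually running; to study blocking I/O and other waiting time, use wall-clock mode (-e wall), and look for garbage collection work in the JVM-internal frames of a CPU profile. It’s this sampling approach that allows it to have such minimal impact on your application’s performance while still providing deep insights.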
When you see a large flame in the flame graph, it’s not just that the method is called a lot. It means that during the sampled periods, the JVM was actively executing that method’s code and its descendant calls, consuming CPU cycles. This is the signal you’re looking for to find performance regressions or inefficient algorithms.
The most common mistake when analyzing profiler output is to focus solely on the single widest frame at the top of a flame graph or call tree. While these are often hotspots, the real problem might be a method that’s called infrequently but does an enormous amount of work, or a method that’s called by many different, seemingly unrelated parts of the application, creating a distributed hotspot in which each individual frame looks narrow. Always add up a method’s width across every place it appears to understand the full picture.
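One way to surface a distributed hotspot is to total samples by the frame that was actually on CPU, regardless of the call path that led there. The sketch below does this for collapsed-format stacks; the data is invented for illustration and `self_samples` is a hypothetical helper name.

```shell
# self_samples: sum samples by leaf frame (the method executing when the
# sample was taken), merging all call paths that end in that method.
self_samples() {
    awk '
    NF == 0 { next }                  # skip blank lines
    {
        count = $NF                   # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)    # strip the count, keep the frames
        n = split(stack, frames, ";")
        self[frames[n]] += count      # frames[n] is the leaf frame
    }
    END { for (f in self) print self[f], f }
    ' "$@" | sort -k1,1nr -k2
}

# Made-up data: Util.hash is reached from three unrelated callers.
self_samples <<'EOF'
Main.main;Orders.load;Util.hash 12
Main.main;Cache.lookup;Util.hash 14
Main.main;Auth.verify;Util.hash 13
Main.main;Report.render 30
EOF
```

In this made-up profile, Report.render is the widest single frame at 30 samples, yet Util.hash accounts for 39 samples once its three call sites are added together, so it ranks first in the output.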
Once you’ve identified a hotspot, the next step is often to dive into the specific code within that method or to investigate related system events like garbage collection or thread contention.