You probably thought JVM GC logs were just a stream of incomprehensible text, but they’re actually a highly detailed, real-time diagnostic log of how your application is managing its memory.
Let’s see what’s happening under the hood. Imagine a simple Java application that’s constantly creating new objects.
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
public class MemoryHog {
public static void main(String[] args) throws InterruptedException {
List<String> data = new ArrayList<>();
while (true) {
data.add(UUID.randomUUID().toString());
if (data.size() % 10000 == 0) {
System.out.println("List size: " + data.size());
}
if (data.size() > 1000000) {
data.clear(); // Simulate releasing memory
}
Thread.sleep(1); // Slow down a bit to observe
}
}
}
To generate GC logs for this, you’d start the JVM with flags like these. The exact flags depend on your JVM version, but the principle is the same: tell the JVM to log GC activity.
java -Xmx512m -Xms512m \
-XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps \
-XX:+PrintGCDateStamps \
-XX:+UseG1GC \
-Xloggc:gc.log \
MemoryHog
Now, let’s look at a snippet from gc.log and break it down.
2023-10-27T10:30:05.123+0000: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.0156789s, 100% (1024/1024)] [ParallelTime: 10.0ms, GCThreadBytesAllocated: 10.0M] [Times: user=0.03s sys=0.01s, real=0.02s
This line tells us:
2023-10-27T10:30:05.123+0000: The exact date and time the GC event started.[GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.0156789s, 100% (1024/1024)]: This is the core event. It’s a Garbage Collection pause.G1 Evacuation Pause: The type of GC pause (we’re using G1 collector).young: It occurred during the "young generation" collection (where new objects are created and die quickly).to-space exhausted: The specific reason for this pause was that the "to-space" region, where live objects are copied to during a collection, ran out of space. This is a strong indicator of high object allocation.0.0156789s: The duration of the pause. Pauses are where your application threads are stopped.100% (1024/1024): This relates to the occupancy of the young generation regions.1024is the total number of regions, and1024means all of them were full.
[ParallelTime: 10.0ms, GCThreadBytesAllocated: 10.0M]: Details about parallel GC threads and memory allocated by them.[Times: user=0.03s sys=0.01s, real=0.02s]: CPU time breakdown (useris application code,sysis kernel code) and therealtime (wall-clock time) of the GC event.
The goal of GC is to reclaim memory that is no longer reachable by the application. The JVM divides memory into generations (young and old) to optimize this. New objects are born in the young generation, which is collected frequently. Objects that survive multiple young generation collections are promoted to the old generation. Full GC cycles collect the entire heap.
In our MemoryHog example, the (young) (to-space exhausted) message is key. It means we’re creating objects so rapidly that the young generation is filling up and needs to be cleared. The data.clear() in the loop helps, but if allocations outpace clears, you’ll see frequent young GCs.
Let’s look at a full GC event.
2023-10-27T10:31:15.543+0000: [Full GC (Ergonomics) (Allocation Overflow) "C2 thread 0x00007f8b8c001800"
... heap before GC ...
Heap
PS Young Gen Total Size Used Avail Capacity Garbage Collection
...
PS Old Gen Total Size Used Avail Capacity Garbage Collection
...
... heap after GC ...
...
2023-10-27T10:31:16.789+0000: [Full GC (Ergonomics) (Allocation Overflow)
... heap details ...
23.4567890s: Full GC (Metadata GC Threshold) 23.4567890s: Pause Full GC (Metadata GC Threshold)
...
Total time for which application threads were stopped: 1.2467890s, GC time: 1.2000000s
A Full GC means the entire Java heap was scanned.
(Allocation Overflow): This reason means that the JVM couldn’t free up enough space in the young generation to satisfy the current allocation rate, forcing a full collection.Pause Full GC: This is the pause time for the full collection. Notice this is much longer than the young GC pause.Total time for which application threads were stopped: This is the critical number. If this is high, your application is experiencing significant pauses.
The G1 collector tries to maintain a pause time goal (-XX:MaxGCPauseMillis). If it can’t meet this goal, it might trigger more aggressive collections or, in extreme cases, a Full GC.
To diagnose and fix issues:
-
Identify High Allocation Rate: Look for frequent young GCs, especially those with
to-space exhaustedorAllocation Overflowreasons. This means your application is creating too many short-lived objects.- Diagnosis: Use GC log analysis tools (like GCeasy, fastThread) or manually scan
gc.logfor the frequency and duration of young GCs. - Fix: Optimize object creation. Can you reuse objects instead of creating new ones? Are you holding onto references longer than necessary? Consider object pooling or using primitive types where appropriate. For our
MemoryHog, thedata.add(UUID.randomUUID().toString())is the primary culprit. Reducing the frequency ofdata.addor making thedatalist smaller would help. - Why it works: Less object creation means less work for the GC, leading to fewer and shorter pauses.
- Diagnosis: Use GC log analysis tools (like GCeasy, fastThread) or manually scan
-
Tune Heap Size: An undersized heap will lead to frequent GCs. An oversized heap can lead to longer GC pauses when they do occur.
- Diagnosis: Observe heap usage in GC logs. If the heap is consistently near 100% usage before a GC, it might be too small. If old generation GCs (often
Full GCorG1 Old Gen) are frequent and long, the heap might be too large for the available memory or GC tuning is needed. - Fix: Adjust
-Xmxand-Xms. For example, increase-Xmxfrom512mto1024mif memory pressure is the issue. - Why it works: A larger heap provides more breathing room, reducing GC frequency. A smaller heap might reduce the time taken for a full collection.
- Diagnosis: Observe heap usage in GC logs. If the heap is consistently near 100% usage before a GC, it might be too small. If old generation GCs (often
-
Optimize GC Algorithm: Different collectors (G1, Parallel, Shenandoah, ZGC) have different strengths and weaknesses. G1 is a good general-purpose collector.
- Diagnosis: If you’re experiencing long pauses with G1, investigate if another collector might be a better fit for your workload’s specific characteristics (e.g., low latency requirements).
- Fix: Change
-XX:+UseG1GCto-XX:+UseParallelGC(for throughput) or-XX:+UseZGC(for very low latency). - Why it works: Different collectors use different strategies and thread models, which can significantly impact pause times and throughput.
-
Tweak G1 Region Size and Pause Target: G1 divides the heap into regions. The size of these regions can be adjusted, impacting how it chooses regions to collect. The pause time goal also influences G1’s behavior.
- Diagnosis: If G1 is struggling to meet
-XX:MaxGCPauseMillis, it might be due to region sizing or too aggressive a target. - Fix: Consider
-XX:G1HeapRegionSize=N(e.g.,32m) and-XX:MaxGCPauseMillis=200(if the default200msis too low). - Why it works: Adjusting region size can affect evacuation efficiency. A more realistic pause target allows G1 more flexibility in its collection strategy.
- Diagnosis: If G1 is struggling to meet
-
Address Fragmentation: Over time, especially with older collectors or certain allocation patterns, the heap can become fragmented, making it harder to find contiguous free space.
- Diagnosis: Look for
Full GCevents that are not directly tied to allocation overflow but rather to "System.gc()" calls or "Metadata GC Threshold." - Fix: Ensure you’re using a modern collector like G1 or ZGC, which are designed to handle fragmentation better. Sometimes, a larger heap can alleviate fragmentation issues by reducing the frequency of collections that might exacerbate it.
- Why it works: Modern collectors use techniques like region management and compaction to minimize fragmentation, allowing for more efficient allocation.
- Diagnosis: Look for
-
Check for Application-Induced GC: Sometimes, applications explicitly call
System.gc(). This is generally discouraged as it can force a full GC at an inopportune time.- Diagnosis: Look for
Full GC (System)in the GC logs. - Fix: Remove
System.gc()calls from your application code. - Why it works: Prevents the JVM from being forced into a full collection when it might not be necessary or optimal.
- Diagnosis: Look for
The next thing you’ll likely encounter is understanding the different phases within a GC cycle, like "Mark Start," "Mark," "Remark," and "Cleanup" in G1, and what each signifies about the GC’s progress.