The most surprising thing about Java garbage collection is that choosing a collector is less about raw throughput than about predictable latency, and the "best" one is often the one you don’t have to tune obsessively.
Let’s see how these modern collectors handle a typical application. Imagine an e-commerce site with a microservice that processes user sessions. This service sees bursts of activity, creating many short-lived objects (like session data) and occasional longer-lived ones (cached user profiles).
Here’s a simplified scenario. We’ll use a hypothetical application that simulates this:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class SessionProcessor {
    // Short-lived session payloads and a small set of longer-lived cache entries
    private static final List<byte[]> sessionData = new ArrayList<>();
    private static final List<byte[]> cache = new ArrayList<>();
    private static final int SESSION_SIZE = 1024 * 10; // 10KB
    private static final int CACHE_SIZE = 1024 * 1024 * 5; // 5MB

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Starting session processing...");
        long startTime = System.currentTimeMillis();
        int counter = 0;
        while (true) {
            // Simulate session creation and processing
            byte[] newSession = new byte[SESSION_SIZE];
            sessionData.add(newSession);

            // Simulate occasional caching (every 100th request)
            if (counter % 100 == 0) {
                byte[] cachedItem = new byte[CACHE_SIZE];
                cache.add(cachedItem);
            }

            // Drop the oldest entries so they become garbage for the collector
            if (sessionData.size() > 5000) {
                sessionData.subList(0, 1000).clear(); // Remove old sessions
            }
            if (cache.size() > 5) {
                cache.subList(0, 1).clear(); // Remove old cache entries
            }

            counter++;
            if (counter % 10000 == 0) {
                long elapsed = TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis() - startTime);
                System.out.printf("Processed %d requests in %d seconds. Heap usage: %.2fMB%n",
                        counter, elapsed, getHeapUsageMB());
            }
            // Small delay to make output readable, in a real app this would be request driven
            // Thread.sleep(1);
        }
    }

    private static double getHeapUsageMB() {
        // This is a simplified way to get heap usage.
        // In production, use JMX or GC logs for accurate metrics.
        return (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024.0 * 1024.0);
    }
}
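The comment in getHeapUsageMB points at JMX for better metrics. As a rough sketch of what that could look like, the standard java.lang.management API exposes one GarbageCollectorMXBean per collector component. The class name GcStats and its methods are hypothetical helpers, not part of the application above, and the bean names and what each bean counts vary by collector and JDK version. For concurrent collectors in particular, the reported time may cover whole concurrent cycles rather than pauses, so treat the numbers as a coarse signal.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: coarse GC counters read through the standard JMX beans.
final class GcStats {

    /** Total collections reported so far by all collector beans (undefined counts are skipped). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds, as reported by the beans. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        return total;
    }

    /** One line per collector bean: name, collection count, accumulated time. */
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.gc(); // only a request; the JVM may ignore it
        System.out.print(summary());
    }
}
```

Calling GcStats.summary() from the periodic log line in SessionProcessor would show which collector beans are active and how their counters grow under each set of JVM arguments.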
To run this with different GCs, you’d use JVM arguments.
Running with G1 (the default collector since JDK 9):
java -Xms2g -Xmx2g -XX:+UseG1GC -jar SessionProcessor.jar
G1 aims for a pause time goal. It divides the heap into equally sized regions and, in each pause, collects as many regions as it expects to fit within that goal, preferring the ones that contain the most garbage. This balances throughput and latency, and it’s generally good for larger heaps and mixed workloads.
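Running with ZGC (JDK 11+, production-ready since JDK 15):
java -XX:+UseZGC -Xms2g -Xmx2g -jar SessionProcessor.jar
On JDK 11 through 14, ZGC is still experimental, so add -XX:+UnlockExperimentalVMOptions before -XX:+UseZGC.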
ZGC is designed for extremely low pause times that do not grow with heap size. Its original goal was pauses under 10 ms, and since JDK 16 pauses are typically below a millisecond. It does this by performing most of its work concurrently with the application threads, using techniques like load barriers and colored pointers. It’s ideal for latency-sensitive applications and very large heaps.
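Running with Shenandoah (JDK 12+, production-ready since JDK 15; not every vendor’s build includes it):
java -XX:+UseShenandoahGC -Xms2g -Xmx2g -jar SessionProcessor.jar
On JDK 12 through 14, Shenandoah is still experimental, so add -XX:+UnlockExperimentalVMOptions before -XX:+UseShenandoahGC.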
Shenandoah also focuses on low, predictable pauses, similar to ZGC. It achieves this by marking and evacuating (compacting) objects concurrently with the application, using forwarding pointers and barriers so that application threads always reach the current copy of an object. It’s another strong contender for latency-sensitive applications and large heaps.
The core problem these collectors solve is the "stop-the-world" pause. Older collectors such as Parallel halt all application threads for the entire collection. CMS (removed in JDK 14) was mostly concurrent in the old generation, but it still stopped the world for young collections and could fall back to a long, single-threaded full collection. On large heaps such pauses can reach hundreds of milliseconds or even seconds, which is unacceptable for applications that must stay responsive, especially those with user-facing components or real-time data processing. G1 limits pause times by collecting the heap incrementally, a set of regions at a time. ZGC and Shenandoah take this further by doing almost all the heavy lifting while the application is still running, only requiring brief pauses at critical synchronization points.
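To make stop-the-world pauses visible from the application’s side, one common trick is a "hiccup meter": a thread that sleeps for a fixed interval and records how much later than expected it wakes up. The sketch below is a minimal version under stated assumptions. HiccupMeter and maxOvershootNanos are hypothetical names, and the measured overshoot also includes OS scheduling jitter and timer granularity, so it gives an upper bound on stalls rather than a precise GC pause time.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: measures how late a 1 ms sleep wakes up.
final class HiccupMeter {

    /**
     * Sleeps in 1 ms steps for roughly durationMillis and returns the largest
     * observed overshoot in nanoseconds. A stop-the-world pause during a sleep
     * shows up as an overshoot of about the pause length.
     */
    static long maxOvershootNanos(long durationMillis) {
        final long intervalNanos = TimeUnit.MILLISECONDS.toNanos(1);
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        long worst = 0;
        while (System.nanoTime() - deadline < 0) {
            long before = System.nanoTime();
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop measuring
                break;
            }
            long overshoot = (System.nanoTime() - before) - intervalNanos;
            if (overshoot > worst) {
                worst = overshoot;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        long worst = maxOvershootNanos(2_000);
        System.out.printf("Worst observed stall: %.3f ms%n", worst / 1_000_000.0);
    }
}
```

In practice you would run the measurement on a daemon thread next to the real workload and compare the worst stall across collectors; run on its own, as in main here, it mostly shows scheduler noise.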
The mental model for these collectors revolves around concurrent work and barriers. While application threads are running, the GC has to keep its view of the object graph consistent with what those threads are doing. A load barrier is a small piece of code that the JIT compiler inserts wherever the application loads an object reference from the heap (it is not tied to allocation). If the GC is in the middle of marking or moving objects, the barrier checks the loaded reference and, when needed, marks the object or redirects the reference to the object’s new location before the application sees it. ZGC and recent versions of Shenandoah rely on load barriers to maintain consistency between the application threads and the concurrent GC threads. G1 instead uses write barriers alongside its region-based heap and concurrent marking phases, and its evacuation phase still runs in stop-the-world pauses, albeit ones it tries to keep within the pause goal.
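One way to picture a load barrier working with colored pointers is a toy model in plain Java. This is a deliberately simplified simulation, not how HotSpot implements it: real barriers are JIT-compiled machine code, and ZGC’s actual color bits and phases are more involved. Every name here (ToyLoadBarrier, startRelocation, load) is hypothetical. References are modeled as long values whose high bits carry a "color"; when the simulated collector starts moving objects, it flips the good color, and the barrier repairs stale references the first time they are loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Toy simulation only: illustrates the idea of colored references and a
// self-healing load barrier. It is not HotSpot's implementation.
final class ToyLoadBarrier {
    static final long COLOR_A = 1L << 62;
    static final long COLOR_B = 1L << 61;
    static final long COLOR_MASK = COLOR_A | COLOR_B;

    private long goodColor = COLOR_A;                            // color meaning "up to date"
    private final Map<Long, Long> forwarding = new HashMap<>();  // old address -> new address
    private final long[] fields;                                 // reference fields in the "heap"
    private int slowPathCount = 0;

    ToyLoadBarrier(long... addresses) {
        fields = new long[addresses.length];
        for (int i = 0; i < addresses.length; i++) {
            fields[i] = addresses[i] | goodColor;
        }
    }

    /** Simulated GC step: objects move, and every stored reference now has a stale color. */
    void startRelocation(Map<Long, Long> moves) {
        forwarding.clear();
        forwarding.putAll(moves);
        goodColor = (goodColor == COLOR_A) ? COLOR_B : COLOR_A;
    }

    /** Simulated application load of a reference field, with the barrier applied. */
    long load(int fieldIndex) {
        long ref = fields[fieldIndex];
        if ((ref & COLOR_MASK) != goodColor) {                   // fast path: one mask and compare
            slowPathCount++;
            long address = ref & ~COLOR_MASK;
            long current = forwarding.getOrDefault(address, address);
            ref = current | goodColor;
            fields[fieldIndex] = ref;                            // self-heal: later loads take the fast path
        }
        return ref & ~COLOR_MASK;
    }

    int slowPathCount() {
        return slowPathCount;
    }

    public static void main(String[] args) {
        ToyLoadBarrier heap = new ToyLoadBarrier(0x1000L, 0x2000L);
        heap.startRelocation(Map.of(0x1000L, 0x9000L));          // the "GC" moves one object
        System.out.printf("field 0 -> 0x%x%n", heap.load(0));    // slow path, remapped to 0x9000
        System.out.printf("field 0 -> 0x%x%n", heap.load(0));    // fast path, already healed
        System.out.printf("slow paths taken: %d%n", heap.slowPathCount());
    }
}
```

The point of the model is the cost profile: every load pays for the cheap color check, while the more expensive lookup happens at most once per stale reference.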
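The levers you control are mostly pause time goals and heap sizing. For G1, -XX:MaxGCPauseMillis=N is your primary tuning knob (the default is 200 ms, and it is a goal, not a guarantee). For ZGC and Shenandoah, the heap size (-Xms, -Xmx) is often the most critical factor: their pause times depend little on heap size, but because they collect concurrently they need enough free headroom to keep up with the allocation rate, otherwise application threads end up stalling while they wait for memory. For G1 you can also influence young generation sizing with -XX:G1NewSizePercent and -XX:G1MaxNewSizePercent (experimental flags that require -XX:+UnlockExperimentalVMOptions). With any of these collectors it also helps to understand how very large objects are handled, such as the humongous objects in G1.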
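Before tuning any of these flags, it can help to confirm what the running JVM actually uses. The following sketch reads flag values through HotSpotDiagnosticMXBean, which is HotSpot-specific and lives in the com.sun.management package. GcFlagReport and flag are hypothetical names, and the assumption is that flags which are unknown or still locked in a given JVM are reported as "n/a".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints a few GC-related HotSpot flags of the running JVM.
final class GcFlagReport {

    /** Returns "value (origin)" for a HotSpot flag, or "n/a" if this JVM does not expose it. */
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "n/a";
            }
            VMOption option = bean.getVMOption(name);
            return option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return "n/a"; // unknown flag, or not a HotSpot JVM
        }
    }

    public static void main(String[] args) {
        String[] names = {"UseG1GC", "UseZGC", "UseShenandoahGC", "MaxGCPauseMillis", "G1HeapRegionSize"};
        for (String name : names) {
            System.out.printf("%-18s %s%n", name, flag(name));
        }
        System.out.printf("Max heap: %d MB%n", Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Under G1, the G1HeapRegionSize value is worth a look for the sample application: G1 treats objects of at least half a region as humongous, so if the region size comes out at 1 MB, as I would expect for a 2 GB heap, every 5 MB cache entry in SessionProcessor is a humongous allocation.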
What most people don’t realize is that ZGC’s and Shenandoah’s "concurrent" collection isn’t free for application threads. ZGC stores GC state in otherwise unused bits of each reference (colored pointers), while Shenandoah uses forwarding pointers; in both cases, barriers executed by the application threads check that state. This lets a thread notice when it is about to use a reference the GC has not yet processed in the current cycle, for example one pointing at an object that has since been moved, and fix it up on the spot. This coordination is what allows for such short pauses, but it adds a small cost to every reference load, and the concurrent GC threads compete with the application for CPU time.
The next concept you’ll grapple with is understanding how to measure GC performance accurately, moving beyond simple heap usage to analyze GC logs and pause times.