JVM False Sharing: Fix Cache Line Contention Issues (2026)

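JVM false sharing occurs when threads running on different cores repeatedly write to independent variables that happen to sit on the same CPU cache line. The threads share no data logically, yet the hardware treats the whole line as contended, and throughput drops.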

Let’s see it in action. Imagine two threads, Thread A and Thread B, each operating on a separate counter, counterA and counterB.

class Counter {
    volatile long value = 0;
}

class Worker implements Runnable {
    private final Counter counter;
    private final int id;

    Worker(Counter counter, int id) {
        this.counter = counter;
        this.id = id;
    }

    @Override
    public void run() {
        for (int i = 0; i < 1_000_000_000; i++) {
            counter.value++;
        }
        System.out.println("Thread " + id + " finished.");
    }
}

public class FalseSharingDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter counterA = new Counter();
        Counter counterB = new Counter();

        Thread threadA = new Thread(new Worker(counterA, 1));
        Thread threadB = new Thread(new Worker(counterB, 2));

        long startTime = System.nanoTime();
        threadA.start();
        threadB.start();

        threadA.join();
        threadB.join();

        long endTime = System.nanoTime();
        System.out.println("Execution time: " + (endTime - startTime) / 1_000_000_000.0 + " seconds");
    }
}

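If counterA and counterB happen to fall on the same cache line (likely here, because two small objects allocated back to back usually sit next to each other on the heap), every write by Thread A invalidates Thread B's copy of the line, and vice versa. The line then bounces between the two cores' caches on nearly every increment, and each of those transfers costs far more than a hit in the local cache, which drastically slows down the increments. The surprising part is that nothing in the source hints at the problem: the counters are independent and there is no lock. The slowdown only appears when the threads really run in parallel on separate cores, and it can come and go, because whether two objects share a line depends on where the allocator and the garbage collector happen to place them.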

The core problem is that modern CPUs move data between main memory and their caches in cache lines (typically 64 bytes). When a core needs data, it fetches the entire cache line containing that data. Before a core can write to a line, it must take exclusive ownership of it, which invalidates the copies held in other cores' caches. If multiple cores keep writing to different variables that reside within the same cache line, ownership of that line moves back and forth on every write. This is "false" sharing because the data being accessed is logically independent; only the physical memory layout causes the contention.
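The mapping from an address to a cache line is plain integer division, which the following minimal sketch illustrates. The 64-byte line size is an assumption, and the addresses are hypothetical values picked for the example, not anything read from a real JVM.

```java
public class CacheLineMath {
    // Assumed line size; 64 bytes is typical on x86-64 but not universal.
    static final long CACHE_LINE_BYTES = 64;

    /** Index of the cache line that contains the given byte address. */
    static long lineIndex(long address) {
        return address / CACHE_LINE_BYTES;
    }

    /** True when both addresses fall inside the same cache line. */
    static boolean sameLine(long a, long b) {
        return lineIndex(a) == lineIndex(b);
    }

    public static void main(String[] args) {
        long counterA = 0x1010; // hypothetical address of counterA.value
        long counterB = 0x1028; // hypothetical address 24 bytes later
        long farAway  = 0x1060; // hypothetical address 80 bytes after counterA

        // Both addresses lie in the line covering 0x1000..0x103F.
        System.out.println(sameLine(counterA, counterB)); // prints true
        // 0x1060 lies in the next line, 0x1040..0x107F.
        System.out.println(sameLine(counterA, farAway));  // prints false
    }
}
```

Two fields only 24 bytes apart usually share a line, while fields 64 bytes or more apart never can.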

To fix this, we need to control the memory layout. The JVM, by default, packs objects tightly. A Counter object holds just one long (8 bytes) plus its object header, so on a typical 64-bit HotSpot JVM it occupies about 24 bytes, and two Counter objects allocated one after the other fit comfortably inside a single 64-byte line. To prevent this, we can pad the objects.

The most direct way to pad is by adding unused fields to the Counter class. Since a cache line is typically 64 bytes, we can add seven extra long fields (8 bytes each) to push the value fields of counterA and counterB sufficiently far apart.

class PaddedCounter {
    volatile long value = 0;
    // Padding so that the 'value' fields of two PaddedCounter objects
    // cannot land on the same cache line (assuming 64-byte cache lines).
    // 7 * 8 bytes = 56 bytes of padding, plus 8 bytes for 'value', is 64 bytes
    // of fields. The object header adds more, so each instance exceeds 64 bytes.
    // 'value' sits at the same offset in every instance, so the 'value' fields
    // of two instances are always more than one cache line apart.
    long p1, p2, p3, p4, p5, p6, p7;
}

class PaddedWorker implements Runnable {
    private final PaddedCounter counter;
    private final int id;

    PaddedWorker(PaddedCounter counter, int id) {
        this.counter = counter;
        this.id = id;
    }

    @Override
    public void run() {
        for (int i = 0; i < 1_000_000_000; i++) {
            counter.value++;
        }
        System.out.println("Thread " + id + " finished.");
    }
}

public class PaddedFalseSharingDemo {
    public static void main(String[] args) throws InterruptedException {
        PaddedCounter counterA = new PaddedCounter();
        PaddedCounter counterB = new PaddedCounter();

        Thread threadA = new Thread(new PaddedWorker(counterA, 1));
        Thread threadB = new Thread(new PaddedWorker(counterB, 2));

        long startTime = System.nanoTime();
        threadA.start();
        threadB.start();

        threadA.join();
        threadB.join();

        long endTime = System.nanoTime();
        System.out.println("Execution time: " + (endTime - startTime) / 1_000_000_000.0 + " seconds");
    }
}

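Running this PaddedFalseSharingDemo on a multi-core machine should show a significant performance improvement compared to the first version, because the value fields of counterA and counterB can no longer share a cache line. The padding fields p1 through p7 occupy 56 bytes. Together with the 8-byte value field and the object header, a PaddedCounter instance takes more than 64 bytes (typically 72 to 80 bytes on 64-bit HotSpot). Since value sits at the same offset in every instance, the two value fields are at least one full object size apart even when the JVM allocates the instances back to back. Note that objects are aligned to 8 bytes, not to cache lines, so a PaddedCounter usually straddles two lines. A frequently written field in some other neighboring object can still share a line with value unless there is padding on both sides of it.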
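The arithmetic behind the padding can be checked with a small sketch that tries every 8-byte-aligned placement of two back-to-back objects and counts how often their value fields share a line. The sizes are assumptions, not measurements: a 64-byte line, 8-byte object alignment, and the first long field at offset 16, which is common on 64-bit HotSpot but depends on the JVM version and flags.

```java
public class PaddingStrideCheck {
    static final int CACHE_LINE_BYTES = 64; // assumed line size
    static final int OBJECT_ALIGNMENT = 8;  // assumed object alignment (HotSpot default)

    /**
     * Counts the aligned start positions (out of 8 within one line) at which
     * two fields that are `stride` bytes apart end up on the same cache line.
     */
    static int sharedPlacements(int fieldOffset, int stride) {
        int shared = 0;
        for (int base = 0; base < CACHE_LINE_BYTES; base += OBJECT_ALIGNMENT) {
            int first = base + fieldOffset;
            int second = first + stride;
            if (first / CACHE_LINE_BYTES == second / CACHE_LINE_BYTES) {
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        int valueOffset = 16;                 // assumed offset of the first long field
        int counterSize = valueOffset + 8;    // unpadded Counter: 24 bytes
        int paddedSize = valueOffset + 8 * 8; // PaddedCounter with 8 longs: 80 bytes

        System.out.println("Counter:       " + sharedPlacements(valueOffset, counterSize) + " of 8 placements share a line");
        System.out.println("PaddedCounter: " + sharedPlacements(valueOffset, paddedSize) + " of 8 placements share a line");
    }
}
```

Under these assumptions the unpadded layout shares a line in 5 of the 8 placements and the padded layout in none, because any stride of 64 bytes or more puts the two fields on different lines regardless of alignment.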

Another common cause is arrays. If you have an array of longs and multiple threads write to elements that are close together, you can hit false sharing. Adjacent long elements are only 8 bytes apart, so if Thread A writes to myArray[0] while Thread B writes to myArray[1], the two elements will almost always be on the same cache line.

The fix here is similar: pad the array elements or the array itself. A common pattern for padding array elements is to create a wrapper object for each element and pad that wrapper.

class PaddedLong {
    volatile long value = 0;
    long p1, p2, p3, p4, p5, p6, p7; // 56 bytes padding
}

public class PaddedArrayFalseSharingDemo {
    private static final int ARRAY_SIZE = 2; // Small array to demonstrate
    private static final PaddedLong[] array = new PaddedLong[ARRAY_SIZE];

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < ARRAY_SIZE; i++) {
            array[i] = new PaddedLong();
        }

        Thread thread1 = new Thread(() -> {
            for (int i = 0; i < 1_000_000_000; i++) {
                array[0].value++; // Thread 1 accesses element 0
            }
            System.out.println("Thread 1 finished.");
        });

        Thread thread2 = new Thread(() -> {
            for (int i = 0; i < 1_000_000_000; i++) {
                array[1].value++; // Thread 2 accesses element 1
            }
            System.out.println("Thread 2 finished.");
        });

        long startTime = System.nanoTime();
        thread1.start();
        thread2.start();

        thread1.join();
        thread2.join();

        long endTime = System.nanoTime();
        System.out.println("Execution time: " + (endTime - startTime) / 1_000_000_000.0 + " seconds");
    }
}

This PaddedArrayFalseSharingDemo uses an array of PaddedLong objects. The array itself holds only references; each PaddedLong instance carries more than 64 bytes of header, value, and padding, so the value fields of two instances are always more than one cache line apart. Accessing array[0].value and array[1].value from different threads therefore does not cause false sharing between the two counters.
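Padding "the array itself" can be sketched without wrapper objects by giving each thread a slot that is a fixed stride away from the others. The example below is a minimal illustration rather than a rigorous benchmark. It uses AtomicLongArray so the JIT cannot keep the counter in a register, and the names STRIDE and ITERATIONS, the stride of 16 longs (128 bytes), and the iteration count are all assumptions chosen for the example.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class StridedArrayDemo {
    // Assumed spacing: 16 longs = 128 bytes, leaving margin over a 64-byte line.
    static final int STRIDE = 16;
    // Hypothetical iteration count, kept small so the sketch finishes quickly.
    static final int ITERATIONS = 5_000_000;

    /** Two threads increment slots indexA and indexB; returns elapsed nanoseconds. */
    static long run(AtomicLongArray slots, int indexA, int indexB) {
        Thread a = new Thread(() -> {
            for (int i = 0; i < ITERATIONS; i++) slots.incrementAndGet(indexA);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < ITERATIONS; i++) slots.incrementAndGet(indexB);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // The first STRIDE slots are unused so the hot slots stay away from the array header.
        AtomicLongArray slots = new AtomicLongArray(3 * STRIDE);

        long adjacent = run(slots, STRIDE, STRIDE + 1); // neighbors, 8 bytes apart
        long spaced = run(slots, STRIDE, 2 * STRIDE);   // 128 bytes apart

        System.out.println("Adjacent slots: " + adjacent / 1_000_000 + " ms");
        System.out.println("Spaced slots:   " + spaced / 1_000_000 + " ms");
    }
}
```

The trade-off is memory: most of the array is unused padding. Timings from a single short run like this are noisy, so a harness such as JMH is a better basis for real measurements.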

If you’re using a library that already pads its hot fields, you get this protection without writing the padding yourself. The LMAX Disruptor is a well-known example: its Sequence class surrounds the sequence value with padding fields.

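For Java 8 and later, you can use the @Contended annotation. In Java 8 it is sun.misc.Contended; from Java 9 onward it is jdk.internal.vm.annotation.Contended. It’s an internal API, so on Java 9+ compiling against it requires the --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED option, and HotSpot ignores the annotation on application classes unless the JVM is started with -XX:-RestrictContended. The annotation asks the JVM to pad the annotated field or class.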

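// Compile with: --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED
// Run with:     -XX:-RestrictContended (otherwise the annotation is ignored for application classes)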
import jdk.internal.vm.annotation.Contended;

class ContendedCounter {
    @Contended("group1") // Fields in the same group are laid out together and padded from all other fields
    volatile long value = 0;
}

class ContendedWorker implements Runnable {
    private final ContendedCounter counter;
    private final int id;

    ContendedWorker(ContendedCounter counter, int id) {
        this.counter = counter;
        this.id = id;
    }

    @Override
    public void run() {
        for (int i = 0; i < 1_000_000_000; i++) {
            counter.value++;
        }
        System.out.println("Thread " + id + " finished.");
    }
}

public class ContendedFalseSharingDemo {
    public static void main(String[] args) throws InterruptedException {
        ContendedCounter counterA = new ContendedCounter();
        ContendedCounter counterB = new ContendedCounter();

        Thread threadA = new Thread(new ContendedWorker(counterA, 1));
        Thread threadB = new Thread(new ContendedWorker(counterB, 2));

        long startTime = System.nanoTime();
        threadA.start();
        threadB.start();

        threadA.join();
        threadB.join();

        long endTime = System.nanoTime();
        System.out.println("Execution time: " + (endTime - startTime) / 1_000_000_000.0 + " seconds");
    }
}

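This ContendedFalseSharingDemo relies on the JVM’s interpretation of @Contended. HotSpot inserts padding around the value field (128 bytes by default, tunable with -XX:ContendedPaddingWidth) so that it doesn’t share a cache line with other fields. Fields that carry the same group name are laid out together and are isolated only from fields outside the group, so fields written by different threads belong in different groups, or in no named group at all. This is often a cleaner approach than manual padding, but it depends on the JVM implementation and on the flags described above.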

A less common but still relevant cause is when unrelated volatile fields within the same object are written concurrently by different threads. volatile guarantees visibility and ordering for individual reads and writes (it does not make compound operations such as value++ atomic), and it does nothing to prevent cache line contention if those fields are close together in memory. The padding techniques described above apply here too.
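For two hot fields inside one object, padding has to go between the fields, and field order within a single class is up to the JVM. One common workaround, sketched below, is to spread the fields across a small class hierarchy. This relies on the assumption that HotSpot lays out superclass fields before subclass fields. All class and field names here are hypothetical.

```java
public class TwoHotFieldsDemo {
    // Each level of the chain contributes its fields after those of its superclass
    // (assumed HotSpot behavior), which fixes the relative order of the fields.
    static class LeftPad { long l1, l2, l3, l4, l5, l6, l7; }
    static class ProducerSide extends LeftPad { volatile long produced; }
    static class MiddlePad extends ProducerSide { long m1, m2, m3, m4, m5, m6, m7; }
    static class ConsumerSide extends MiddlePad { volatile long consumed; }
    static final class Stats extends ConsumerSide { long r1, r2, r3, r4, r5, r6, r7; }

    /** Runs one writer thread per field and returns the shared object. */
    static Stats runOnce(int iterations) {
        Stats stats = new Stats();
        // Each field has exactly one writer, so the non-atomic ++ is safe here.
        Thread producer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) stats.produced++;
        });
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) stats.consumed++;
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats;
    }

    public static void main(String[] args) {
        Stats stats = runOnce(1_000_000);
        System.out.println("produced=" + stats.produced + ", consumed=" + stats.consumed);
    }
}
```

With seven longs between them, produced and consumed start 64 bytes apart, so under the layout assumption they cannot share a 64-byte line. The left and right padding protects them from fields of neighboring objects.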

Once false sharing is out of the way, the next problem you may run into is a livelock in a complex concurrent data structure, where threads repeatedly acquire and release locks without making progress.