Keycloak JVM Tuning: Handle Thousands of Auth Requests (2026)

Keycloak’s JVM tuning is less about squeezing out marginal performance gains and more about preventing catastrophic slowdowns and outright outages when the load hits.

Let’s see Keycloak in action under load. Imagine a typical scenario: a frontend application with 1,000 concurrent users, each making an average of one API call per second. Each API call requires a valid JWT. Keycloak’s job is to issue and validate these tokens.

Here’s a token request issued from inside a running Keycloak pod via kubectl exec: a POST to /realms/myrealm/protocol/openid-connect/token.

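# Open a shell inside the Keycloak pod
kubectl exec -it keycloak-0 -- bash

# From that shell, request a token with the password grant
# (assumes curl is available in the image and direct access grants are enabled for myclient)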
curl -X POST \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=password&username=testuser&password=password123&client_id=myclient" \
  http://localhost:8080/realms/myrealm/protocol/openid-connect/token

The response is a JSON document whose access_token field contains a JSON Web Token. Now, scale that up to 1,000 concurrent users. If the JVM isn’t tuned, this is where you start seeing latency spikes, rising error rates, and eventually a cascade of failures.
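One way to approximate that jump in concurrency from a shell is to fire the same request many times in parallel and tally the HTTP status codes. This is a rough sketch, not a real load test: KC_URL, REALM, and the function names are hypothetical, the credentials are the placeholder ones from the request above, and a dedicated load-testing tool will give far better latency data.

```shell
#!/bin/sh
# Hypothetical defaults; override them for your environment.
KC_URL="${KC_URL:-http://localhost:8080}"
REALM="${REALM:-myrealm}"

# Send one password-grant token request and print only the HTTP status code.
token_status() {
  curl -s -o /dev/null -w '%{http_code}\n' -X POST \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=password&username=testuser&password=password123&client_id=myclient" \
    "$KC_URL/realms/$REALM/protocol/openid-connect/token"
}

# Launch N requests in parallel, wait for all of them,
# then print how many responses came back per status code.
burst() {
  n="$1"
  {
    i=0
    while [ "$i" -lt "$n" ]; do
      token_status &
      i=$((i + 1))
    done
    wait
  } | sort | uniq -c
}

# Usage: burst 50
```

A growing share of non-200 codes as the burst size increases is a hint that the server, or the JVM underneath it, is starting to struggle.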

The core problem Keycloak solves is centralized identity and access management. It acts as an OpenID Connect and OAuth 2.0 provider, handling user authentication, authorization, single sign-on (SSO), and token issuance. Internally, it relies heavily on its underlying Java Virtual Machine (JVM) for execution. The JVM’s garbage collection, thread management, and memory allocation directly impact Keycloak’s ability to respond quickly to authentication requests.

Here are the key levers you can pull:

Heap Size (-Xms, -Xmx): This is the most fundamental tuning parameter. It defines the minimum and maximum amount of memory allocated to the JVM heap. Too small, and you’ll get frequent OutOfMemoryErrors or excessive garbage collection. Too large, and GC pauses can become prohibitively long.
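- Diagnosis: Watch for java.lang.OutOfMemoryError: Java heap space in your Keycloak logs. Also track heap usage and GC activity through JVM metrics. Exact names depend on your exporter: Micrometer-based endpoints typically expose jvm_memory_used_bytes{area="heap"} and jvm_gc_pause_seconds_count, while older Prometheus Java clients report jvm_gc_collection_seconds_count.
- Fix: For a deployment handling thousands of requests, start with -Xms4g -Xmx8g and adjust based on your workload and observed memory usage. In a container, make sure the memory limit leaves headroom above -Xmx for Metaspace, thread stacks, and direct memory, or the pod will be killed before the JVM ever reports an error.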
- Why it works: A larger heap allows Keycloak to hold more session data, cache, and objects in memory without needing to constantly reclaim space, reducing GC overhead.

Garbage Collector (GC): The choice of GC algorithm is critical. Different GCs offer different trade-offs between throughput and pause times. For interactive applications like Keycloak, minimizing pause times is usually preferred.
- Diagnosis: Observe GC pause times. Frequent pauses that last seconds will make your application unresponsive. Tools like GCViewer or JVM monitoring probes can help.
- Fix: G1 GC (-XX:+UseG1GC) is a good default, and it has been the JVM’s own default collector since JDK 9. For even lower latency, consider Shenandoah (-XX:+UseShenandoahGC) or ZGC (-XX:+UseZGC) if your JVM build includes them and you’re willing to experiment. Both became production features in JDK 15, and Shenandoah is not shipped in every vendor’s build.
- Why it works: G1 balances throughput and pause times by dividing the heap into regions and collecting the regions with the most garbage first, in short stop-the-world pauses sized to a pause-time goal, while marking runs concurrently. Shenandoah and ZGC are designed for extremely short, consistent pause times, even with very large heaps, by performing most of the GC work concurrently with the application threads.
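To put numbers on pause times without extra tooling, you can turn on unified GC logging (JDK 9+), for example with -Xlog:gc:file=/tmp/gc.log, and summarize the pause lines. The sketch below assumes the default log layout, in which each completed pause line contains the word Pause and ends with a duration such as 3.456ms. The function name and log path are hypothetical.

```shell
#!/bin/sh
# Summarize stop-the-world pauses from a unified JVM GC log read on stdin.
# Assumption: pause lines contain "Pause" and end in a duration like "3.456ms".
gc_pause_summary() {
  awk '
    /Pause/ && $NF ~ /^[0-9.]+ms$/ {
      ms = $NF
      sub(/ms$/, "", ms)   # strip the unit, keep the number
      ms += 0
      if (ms > max) max = ms
      total += ms
      n++
    }
    END { printf "pauses=%d max_ms=%.3f total_ms=%.3f\n", n, max, total }
  '
}

# Usage: gc_pause_summary < /tmp/gc.log
```

Comparing max_ms before and after a collector change gives a quick first signal; a proper analysis would also look at pause frequency over time.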

Thread Stack Size (-Xss): Each thread in the JVM has its own stack, allocated outside the heap. If you have many threads, the total stack memory can become significant.
- Diagnosis: Watch for java.lang.OutOfMemoryError: unable to create new native thread. This often indicates the JVM is running out of address space for thread stacks or the OS has hit its thread limit.
- Fix: Reduce the thread stack size from its default (typically 1 MB on 64-bit Linux). A common value to try is -Xss256k. If you start seeing StackOverflowError in deep call chains, the value is too small.
- Why it works: Smaller stack sizes allow the JVM to create more threads within the same memory footprint, which is crucial for handling high concurrency where Keycloak might spawn many threads for incoming requests or background tasks.
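As a rough worked example of that footprint, the sketch below multiplies a thread count by a stack size. The thread count of 500 is a hypothetical figure and the function name is made up for illustration; 1024 KiB is the usual default stack size on 64-bit Linux. This is address space reserved per thread, so resident memory is normally lower.

```shell
#!/bin/sh
# Upper bound on thread stack reservation in MiB:
# threads * stack size (KiB) / 1024.
stack_reserve_mib() {
  threads="$1"
  xss_kib="$2"
  echo $(( threads * xss_kib / 1024 ))
}

stack_reserve_mib 500 1024   # default-sized stacks: prints 500
stack_reserve_mib 500 256    # with -Xss256k: prints 125
```

On Linux, the live thread count of a JVM with process ID PID can be read with ls /proc/PID/task | wc -l.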
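
Max Metaspace Size (-XX:MaxMetaspaceSize): Metaspace stores class metadata. If Keycloak loads a large number of classes (e.g., through extensions or dynamic class loading), this can grow.
- Diagnosis: Look for java.lang.OutOfMemoryError: Metaspace in logs. HotSpot does not cap Metaspace unless this flag is set, so the error usually means a cap is already present somewhere in your launch options.
- Fix: Raise the Metaspace cap. -XX:MaxMetaspaceSize=256m is a reasonable starting point; if your start script already sets a cap at that level and you still hit the error, increase it further.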
- Why it works: Provides more memory for the JVM to store class definitions, preventing it from running out of space as it loads and manages classes.

Connection Pool Tuning: Keycloak uses a connection pool to manage database connections (Agroal in the Quarkus-based distribution). The size of this pool directly impacts its ability to serve requests that require database access.
- Diagnosis: Monitor database connection usage. If your database shows a high number of active connections and Keycloak logs show connection acquisition timeouts, the pool might be too small.
- Fix: Adjust the db-pool-max-size option (environment variable KC_DB_POOL_MAX_SIZE). A value between 50 and 100 is often appropriate for high-load scenarios, but this is highly dependent on your database’s capacity and your query patterns.
- Why it works: A larger connection pool allows Keycloak to have more concurrent database transactions ready, reducing the latency associated with acquiring a new connection when needed.
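Because every Keycloak instance opens its own pool, the database’s connection limit has to be shared across all replicas. A minimal sketch of that arithmetic follows, using hypothetical numbers (a database limit of 300 connections, 3 Keycloak replicas, 20 connections held back for admin tools and other clients) and a made-up helper name. KC_DB_POOL_MAX_SIZE is the environment-variable form of Keycloak’s db-pool-max-size option.

```shell
#!/bin/sh
# Largest per-instance pool that still fits under the database limit:
# (db connection limit - reserved connections) / number of Keycloak replicas.
max_pool_per_instance() {
  db_max="$1"
  replicas="$2"
  reserved="$3"
  echo $(( (db_max - reserved) / replicas ))
}

# Hypothetical figures: 300 connections, 3 replicas, 20 reserved.
KC_DB_POOL_MAX_SIZE="$(max_pool_per_instance 300 3 20)"
export KC_DB_POOL_MAX_SIZE
echo "KC_DB_POOL_MAX_SIZE=$KC_DB_POOL_MAX_SIZE"   # prints KC_DB_POOL_MAX_SIZE=93
```

If the result falls well below what the workload needs, the constraint is the database rather than the pool setting.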

Off-Heap Memory (-XX:MaxDirectMemorySize): Direct memory is used by certain Java NIO operations, including those involved in network communication.
- Diagnosis: Monitor direct memory usage. A java.lang.OutOfMemoryError: Direct buffer memory, typically thrown from java.nio.ByteBuffer.allocateDirect, indicates this is a bottleneck.
- Fix: Increase -XX:MaxDirectMemorySize. A value like -XX:MaxDirectMemorySize=1g can be a good starting point if you suspect direct memory issues.
- Why it works: Ensures that Keycloak and the JVM have sufficient off-heap memory for efficient network I/O operations, which are frequent in a high-throughput authentication service.
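Taken together, these levers determine how much memory the JVM can claim beyond the heap, which matters when sizing a container limit. The sketch below adds up the example values from this section. The thread count of 300 is a hypothetical figure, the helper name is made up, and the result ignores smaller native areas such as the code cache and GC bookkeeping, so treat it as a floor rather than an exact number.

```shell
#!/bin/sh
# Rough JVM memory budget in MiB:
# heap + Metaspace cap + direct memory cap + thread stacks.
jvm_budget_mib() {
  heap_mib="$1"
  metaspace_mib="$2"
  direct_mib="$3"
  threads="$4"
  xss_kib="$5"
  echo $(( heap_mib + metaspace_mib + direct_mib + threads * xss_kib / 1024 ))
}

# -Xmx8g, -XX:MaxMetaspaceSize=256m, -XX:MaxDirectMemorySize=1g,
# 300 threads (hypothetical) at -Xss256k
jvm_budget_mib 8192 256 1024 300 256   # prints 9547
```

With these inputs the JVM may need about 9.3 GiB, so a container limit of exactly 8 GiB would be too small for an 8 GiB heap.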

When tuning, remember that these JVM arguments are usually passed via environment variables in containerized deployments. Keycloak’s start script reads JAVA_OPTS, which replaces its default JVM options, and JAVA_OPTS_APPEND, which adds to them. For example, in Kubernetes, you might set these in your workload’s spec.template.spec.containers[0].env.
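As an illustration, the flags discussed above could be collected into a single variable before starting Keycloak. This sketch assumes the Quarkus-based distribution, whose start script appends JAVA_OPTS_APPEND to its default JVM options. The values are this article’s starting points, not tuned recommendations.

```shell
#!/bin/sh
# Build the option string one lever at a time so each value is easy to change.
JAVA_OPTS_APPEND="-Xms4g -Xmx8g"                                   # heap
JAVA_OPTS_APPEND="$JAVA_OPTS_APPEND -Xss256k"                      # thread stacks
JAVA_OPTS_APPEND="$JAVA_OPTS_APPEND -XX:MaxMetaspaceSize=256m"     # class metadata
JAVA_OPTS_APPEND="$JAVA_OPTS_APPEND -XX:MaxDirectMemorySize=1g"    # NIO direct buffers

# A collector flag is left out on purpose: if the start script's defaults
# already select a collector, adding a second -XX:+Use...GC flag makes the
# JVM refuse to start. Check the defaults of your version first.

export JAVA_OPTS_APPEND
echo "$JAVA_OPTS_APPEND"
```

In Kubernetes, the same string would go into an env entry named JAVA_OPTS_APPEND on the Keycloak container.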

The next error you’ll likely hit after optimizing JVM settings is around database contention or network saturation, as those become the new bottlenecks.