Keycloak’s Prometheus metrics are not just a way to see how many logins you’re getting; they’re a real-time pulse of your authentication system’s health and performance, offering granular insights into every critical step of the auth flow.

Let’s see this in action. Imagine a user trying to log in. Keycloak is doing a lot behind the scenes: validating credentials, checking session validity, potentially interacting with external identity providers, and issuing tokens. Each of these steps can be a bottleneck.

Here’s a snapshot of what you might see in Prometheus, querying for active sessions and login latency:

keycloak_sessions_active{realm="myrealm", client="myclient"} 150
keycloak_login_request_duration_seconds_bucket{realm="myrealm", client="myclient", le="0.1"} 120
keycloak_login_request_duration_seconds_bucket{realm="myrealm", client="myclient", le="0.5"} 145
keycloak_login_request_duration_seconds_bucket{realm="myrealm", client="myclient", le="+Inf"} 150

This tells us there are 150 active sessions for myclient in myrealm. The histogram buckets are cumulative: 120 logins completed within 0.1 seconds, and 145 (including those 120) completed within 0.5 seconds. The +Inf bucket always equals the total number of observations, here 150, so 5 logins took longer than 0.5 seconds.
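To make the bucket arithmetic concrete, here is a minimal Python sketch that estimates a latency quantile from cumulative buckets using linear interpolation inside the matching bucket, similar in spirit to PromQL's histogram_quantile(). The function name and the bucket values are illustrative (the values are the sample numbers above); it is an approximation under those assumptions, not Keycloak or Prometheus code.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: iterable of (upper_bound_seconds, cumulative_count) pairs,
    including the +Inf bucket. Assumes the lowest bucket starts at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bucket, so the
                # best available answer is that bucket's upper bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Sample values from the snapshot above (hypothetical data).
login_buckets = [(0.1, 120), (0.5, 145), (math.inf, 150)]
p90 = histogram_quantile(0.9, login_buckets)
```

With the sample buckets, the 90th percentile lands at roughly 0.34 seconds. Any quantile that falls into the +Inf bucket is reported as 0.5, the highest finite bound, which is why well-chosen bucket boundaries matter.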

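The real power comes from understanding what these metrics represent and how they map to Keycloak’s internal workings. Keycloak exposes metrics across several categories (exact metric names vary with the Keycloak version and any metrics extension installed, so check your own /metrics output):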

  • Sessions: keycloak_sessions_active, keycloak_sessions_max, keycloak_sessions_offline_max – These give you a direct view of user activity and the system’s capacity.
  • Login/Logout: keycloak_login_request_duration_seconds, keycloak_logout_request_duration_seconds, keycloak_login_failures_total – Crucial for understanding user experience and identifying authentication issues.
  • Token Issuance: keycloak_token_request_duration_seconds, keycloak_refresh_token_request_duration_seconds – Measures the performance of issuing JWTs, a core function.
  • User Federation: keycloak_user_federation_mapper_duration_seconds, keycloak_user_federation_search_duration_seconds – If you’re using LDAP or Active Directory, these metrics reveal performance of those integrations.
  • Database Operations: keycloak_jpa_transaction_duration_seconds, keycloak_jpa_query_duration_seconds – These are the bedrock. Slow DB operations will cascade into slow authentication.

To get these metrics, you first have to tell Keycloak to expose them. On current Quarkus-based distributions this is done by starting the server with --metrics-enabled=true or by setting the KC_METRICS_ENABLED=true environment variable. Then, on your Prometheus server, add a scrape job to prometheus.yml that targets the /metrics endpoint exposed by Keycloak. The exact host, port, and path depend on your Keycloak version and deployment (e.g., http://keycloak.example.com:8080/metrics); recent releases serve metrics on a separate management port rather than the main HTTP port.
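Before pointing Prometheus at the endpoint, it can help to confirm that Keycloak is actually serving metrics and to see which metric names your version emits. The following Python sketch fetches a metrics URL and parses the Prometheus text format into (name, labels, value) tuples. The helper names are hypothetical, and the parser is deliberately simplified (it skips comment lines and ignores timestamps), so treat it as a quick check rather than a full exposition-format parser.

```python
import re
import urllib.request

# Matches: metric_name{label="value", ...} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def metric_names(url, timeout=5.0):
    """Fetch a metrics endpoint and return the sorted set of metric names."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return sorted({name for name, _, _ in parse_samples(text)})
```

Calling metric_names("http://keycloak.example.com:8080/metrics") (substituting your own host and port) returns the list of metric names that instance exposes, which is a quick way to verify both that metrics are enabled and which names to use in your queries.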

The core problem these metrics solve is observability into a black box. Before metrics, troubleshooting authentication slowness meant guesswork. Now, you can pinpoint if latency is in credential validation, token generation, or an external user federation. For instance, if keycloak_login_request_duration_seconds is high, but keycloak_user_federation_search_duration_seconds is low, the bottleneck is likely within Keycloak itself, not your LDAP server.

You control the granularity of what you see through Keycloak’s configuration and the labels you use in PromQL. Want latency per realm? Filter on the realm label or group with by (realm). Need to isolate a specific client application? Do the same with the client label. You can also combine rate() and sum() to calculate throughput and error rates. For example, sum(rate(keycloak_login_failures_total{realm="myrealm"}[5m])) shows the per-second rate of login failures for myrealm, averaged over the last 5 minutes.
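If you want to run such queries from a script, one option is Prometheus's HTTP API, which accepts instant queries at /api/v1/query. The Python sketch below only builds the PromQL string and the request URL; the function names and the Prometheus address are assumptions for illustration, and the metric name is the one used in this article.

```python
from urllib.parse import urlencode


def failure_rate_query(realm, window="5m", client=None):
    """Build a PromQL expression for the per-second login failure rate."""
    labels = {"realm": realm}
    if client is not None:
        labels["client"] = client  # narrow to one client application
    selector = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"sum(rate(keycloak_login_failures_total{{{selector}}}[{window}]))"


def instant_query_url(prometheus_base, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical Prometheus address; adjust for your environment.
url = instant_query_url("http://localhost:9090", failure_rate_query("myrealm"))
```

Fetching that URL returns a JSON document whose data.result field holds the computed rate. Passing client="myclient" adds the second label matcher without changing the rest of the expression.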

A common pitfall is overlooking the database. Keycloak relies heavily on its underlying database for nearly every operation, from user lookups to session management. If your database is slow, every authentication request will be slow. The keycloak_jpa_query_duration_seconds metric is your best friend here. A sudden spike in this metric, especially for specific queries like UserAdapter.findByUsername, almost always points to a database issue – be it indexing, connection pool exhaustion, or resource contention on the database server itself.
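The triage reasoning described here (database first, then user federation, otherwise Keycloak itself) can be sketched as a small Python function. Everything in it is an assumption for illustration: the inputs are taken to be average seconds spent per login in each layer, and the 50% threshold is arbitrary. In practice a single login can issue several queries, so you would aggregate the metrics before comparing them.

```python
def likely_bottleneck(login_s, federation_s, db_query_s, share=0.5):
    """Return a coarse guess at where login latency is being spent.

    login_s:      average login request duration in seconds
    federation_s: average time spent in user federation per login
    db_query_s:   average time spent in database queries per login
    share:        fraction of login time that counts as "dominant"
                  (hypothetical threshold, tune for your system)
    """
    if login_s <= 0:
        return "no data"
    if db_query_s / login_s >= share:
        return "database"
    if federation_s / login_s >= share:
        return "user federation"
    return "keycloak internal"


# Slow logins, fast LDAP, slow queries: look at the database first.
verdict = likely_bottleneck(login_s=0.8, federation_s=0.05, db_query_s=0.6)
```

The database is checked first because, as noted above, slow queries affect nearly every operation. A result of "keycloak internal" means neither external dependency dominates, which points toward Keycloak's own processing, such as credential hashing or token signing.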

The next concept you’ll likely explore is setting up alerting based on these metrics, so you’re notified before users start complaining about slow logins.
