A health check endpoint is surprisingly useless if it doesn’t tell you why it’s unhealthy.
Let’s get Fluent Bit’s HTTP health check endpoint up and running. This isn’t just about knowing if Fluent Bit is alive, but about getting actionable data when it’s not. We’ll configure it to expose metrics and status information that can be scraped by Prometheus or other monitoring tools.
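First, enable Fluent Bit’s built-in HTTP server. In the classic configuration format this is done with three keys in the [SERVICE] section of your fluent-bit.conf; there is no separate [HTTP] section.
[SERVICE]
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port   2020
This tells Fluent Bit to start its HTTP server and listen on port 2020 on all available network interfaces (0.0.0.0). Every monitoring endpoint, including the health check, is served from this one listener.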
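Next, we need to enable the health check itself. It is configured in the same [SERVICE] section as the HTTP server.
[SERVICE]
    HTTP_Server            On
    HTTP_Listen            0.0.0.0
    HTTP_Port              2020
    Health_Check           On
    HC_Errors_Count        5
    HC_Retry_Failure_Count 5
    HC_Period              5
The Health_Check On directive enables the health check, which is served at /api/v1/health on the same port as the rest of the HTTP server; there is no separate health check port. The HC_ keys define what counts as unhealthy: Fluent Bit reports an unhealthy status when, within the last HC_Period seconds, output errors exceed HC_Errors_Count or failed retries exceed HC_Retry_Failure_Count. The values shown here are illustrative; tune them to your pipeline.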
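Now, let’s consider what metrics we want to expose. Fluent Bit can provide a wealth of information, and the Prometheus endpoint needs no directive of its own: it is available as soon as HTTP_Server is On in the [SERVICE] section. The one optional addition is storage.metrics, which also reports buffer (storage layer) usage.
[SERVICE]
    HTTP_Server            On
    HTTP_Listen            0.0.0.0
    HTTP_Port              2020
    Health_Check           On
    HC_Errors_Count        5
    HC_Retry_Failure_Count 5
    HC_Period              5
    storage.metrics        On
With the HTTP server running, Fluent Bit serves metrics in Prometheus exposition format at /api/v1/metrics/prometheus (newer releases also offer /api/v2/metrics/prometheus). These cover records, bytes, errors, and retries per input and output plugin. With storage.metrics On, chunk and buffer statistics are also available as JSON at /api/v1/storage.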
Let’s look at an abridged sample from /api/v1/metrics/prometheus after Fluent Bit has been running for a while and processing some logs. The values are illustrative, and the millisecond timestamp that the v1 endpoint appends to each sample line is omitted here.
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="tail.0"} 15485760
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="tail.0"} 100000
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="es.0"} 14567890
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="es.0"} 5
This output shows the key metrics. fluentbit_input_bytes_total and fluentbit_output_proc_bytes_total give us insight into data flow, and fluentbit_input_records_total tracks overall processing volume. fluentbit_output_retries_total is critical for identifying issues with downstream services. The name label identifies the plugin instance, for example es.0 for the first Elasticsearch output. Buffer and memory usage is not part of this list; with storage.metrics On in the [SERVICE] section it is reported at /api/v1/storage.
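To act on this output outside Prometheus, the exposition text has to be turned into values first. The following is a minimal sketch, not an official client: parse_metrics is a hypothetical helper name, and it assumes plain counter and gauge lines of the shape shown above, with or without a trailing timestamp.

```python
import re
from typing import Dict, Tuple

# Key for one time series: (metric name, sorted (label, value) pairs).
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$'
)
# One label pair inside the braces, e.g. name="es.0".
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> Dict[SeriesKey, float]:
    """Parse Prometheus exposition text into {(name, labels): value}."""
    samples: Dict[SeriesKey, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # Not a sample line this sketch understands.
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples
```

Filtering the result for keys whose metric name is fluentbit_output_retries_total then gives one value per output plugin, whatever labels your Fluent Bit version attaches.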
The /api/v1/health endpoint (enabled by Health_Check On) provides a simpler, more direct status. It returns HTTP 200 with the body ok when Fluent Bit considers itself healthy, and HTTP 500 with the body error otherwise. This is what your orchestrator or load balancer would poll.
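A poller only needs to tell three cases apart: the endpoint answered 200, it answered with something else, or nothing answered at all. Here is a small sketch using only the Python standard library. The names classify_health and probe_health are hypothetical, and the URL you pass in (for example http://localhost:2020/api/v1/health) is an assumption about your setup.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def classify_health(status: Optional[int]) -> str:
    """Map an HTTP status (None if nothing answered) to a health label."""
    if status is None:
        return "unreachable"
    return "healthy" if status == 200 else "unhealthy"


def probe_health(url: str, timeout: float = 2.0) -> Tuple[str, Optional[int], str]:
    """Request the health URL once and return (label, http_status, body)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace").strip()
    except urllib.error.HTTPError as err:
        # Non-2xx answers are raised as exceptions: the server is up
        # but reports a problem.
        status = err.code
        body = err.read().decode("utf-8", "replace").strip()
    except OSError:
        # Connection refused, timeout, DNS failure: nothing answered.
        status, body = None, ""
    return classify_health(status), status, body
```

Separating "unreachable" from "unhealthy" is a deliberate choice: the first usually means the process or its HTTP server is down, while the second means Fluent Bit is running and reporting trouble itself.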
The counterintuitive part of Fluent Bit’s health check is that while /api/v1/health gives a binary yes/no answer, it’s the metrics endpoint that provides the diagnostic depth. A 200 OK doesn’t guarantee smooth operation; it only means that output errors and failed retries stayed under the configured thresholds during the last check period. Problems that build up slowly, like an output that succeeds only after several retries or a buffer that keeps growing, show up as changes in the metrics well before the health endpoint reports unhealthy, if it ever does. You need to actively monitor metrics such as fluentbit_output_retries_total, fluentbit_output_retries_failed_total, and fluentbit_output_errors_total to detect and diagnose these issues before they cause a full service outage.
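Because retry counts are counters, the signal is how much they grew between two scrapes, not their absolute value. The sketch below illustrates that idea under stated assumptions: the inputs are plain dictionaries mapping an output name to its fluentbit_output_retries_total value, and both the function names and the output names used as keys (such as es.0) are hypothetical.

```python
from typing import Dict, List


def counter_increase(previous: Dict[str, float],
                     current: Dict[str, float]) -> Dict[str, float]:
    """Per-series growth of a counter between two scrapes."""
    increases: Dict[str, float] = {}
    for series, value in current.items():
        before = previous.get(series, 0.0)
        # A counter that went down means Fluent Bit restarted,
        # so everything in the current value is new.
        increases[series] = value - before if value >= before else value
    return increases


def degraded_outputs(previous: Dict[str, float],
                     current: Dict[str, float],
                     max_new_retries: float = 0.0) -> List[str]:
    """Names of outputs whose retry counter grew by more than the allowance."""
    growth = counter_increase(previous, current)
    return sorted(name for name, inc in growth.items() if inc > max_new_retries)
```

In a Prometheus setup the same comparison is normally written as an alert rule over the counter’s rate; this version only shows the logic for two raw snapshots.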
To apply these changes, you’ll need to restart your Fluent Bit service. For example, if you’re using systemd:
sudo systemctl restart fluent-bit
Once restarted, you can verify the HTTP server is running by curling both endpoints:
curl http://localhost:2020/api/v1/health
curl http://localhost:2020/api/v1/metrics/prometheus
The first should return ok and the second the Prometheus-formatted metrics. The next step is to configure your Prometheus server to scrape these metrics from your Fluent Bit instances.