Grafana’s data source proxy is the unsung hero that lets your dashboards talk to your databases, but when it breaks, it’s usually because the network path between Grafana and the data source is silently broken.
Here’s Grafana trying to reach Prometheus, which is running on 192.168.1.100:9090. Grafana’s internal proxy is making a GET request to http://192.168.1.100:9090/api/v1/query?query=up.
GET /api/v1/query?query=up HTTP/1.1
Host: 192.168.1.100:9090
User-Agent: Grafana
And Prometheus responds:
HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "job": "prometheus",
          "instance": "localhost:9090"
        },
        "value": [
          1678886400,
          "1"
        ]
      }
    ]
  }
}
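To make the shape of that exchange concrete, here is a minimal Python sketch that builds the same query URL and reads the `up` value out of a response body like the one above. The function names `build_query_url` and `up_instances` are made up for this illustration; they are not part of Grafana or Prometheus.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL like the one Grafana's proxy requests."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def up_instances(body: str) -> dict:
    """Map each instance label to True when its sample value is "1"."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    result = {}
    for series in payload["data"]["result"]:
        # Each instant-vector sample is a [timestamp, "value-as-string"] pair.
        _timestamp, value = series["value"]
        result[series["metric"]["instance"]] = (value == "1")
    return result
```

Feeding it the response shown above would give one entry for `localhost:9090`, marked as up.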
This is the happy path. Now, let’s break it.
Connection Refused
The most common error is that the Grafana proxy can’t even establish a TCP connection to the data source. The operating system on the Grafana server is telling Grafana, "Nope, nothing listening on that IP and port."
Diagnosis: From the Grafana server’s command line, run:
nc -vz 192.168.1.100 9090
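If nc isn't installed on the Grafana server, roughly the same check can be sketched in Python. This is an illustration, not a Grafana tool: `probe_tcp` is a hypothetical helper that takes the data source URL as configured in Grafana and reports whether the TCP connection was accepted, refused, or timed out.

```python
import socket
from urllib.parse import urlsplit


def probe_tcp(url: str, timeout: float = 3.0) -> str:
    """Try a plain TCP connect to the host and port named in `url`."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host found in {url!r}; include the scheme, e.g. http://")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # host reachable, nothing listening (or an active reject)
    except socket.timeout:
        return "timeout"   # packets silently dropped somewhere on the path
    except OSError as exc:
        return f"error: {exc}"
```

Calling `probe_tcp("http://192.168.1.100:9090")` from the Grafana server separates "refused" (the cases in this section) from "timeout", which usually points at a firewall that drops packets rather than rejecting them.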
Common Causes & Fixes:
- Data Source Not Running: The service you’re trying to connect to (e.g., Prometheus, InfluxDB) is simply not running on its host.
  - Fix: Start the service. For systemd systems, run sudo systemctl start prometheus. This works because the start command tells the prometheus service to initialize and begin listening for connections on its configured port.
- Incorrect IP Address or Port: You’ve misconfigured the data source’s address in Grafana, or the data source is listening on a different port than you think.
  - Fix: Double-check your Grafana data source configuration. Ensure the "URL" field is set to http://192.168.1.100:9090 (or whatever the correct IP/port is). This is crucial because Grafana uses this exact string to construct its outgoing HTTP requests.
- Firewall Blocking Port: A firewall on the Grafana server, the data source server, or an intermediate network device is preventing traffic on the data source’s port.
  - Fix: On the data source server, allow incoming TCP traffic on port 9090. For ufw, this would be sudo ufw allow 9090/tcp. This command explicitly permits packets destined for TCP port 9090 to reach the server, enabling the connection.
- Data Source Binding to Wrong Interface: The data source is configured to listen only on localhost (127.0.0.1) and not on its external network interface.
  - Fix: Reconfigure the data source to bind to 0.0.0.0 or its specific network IP. For Prometheus, this is often done via the --web.listen-address flag in its startup command, e.g., prometheus --web.listen-address="0.0.0.0:9090". Binding to 0.0.0.0 tells the application to accept connections on all available network interfaces, not just the loopback interface.
- Network Interface Down: The network interface on either the Grafana server or the data source server is down or misconfigured.
  - Fix: Ensure the network interface is up and has a valid IP address. Use ip addr show to check. If down, bring it up with sudo ip link set eth0 up (replace eth0 with your interface name). This command re-enables the network interface, allowing it to send and receive network traffic.
- Container Networking Issues (if applicable): If Grafana or your data source is running in Docker or Kubernetes, network policies or incorrect container networking configurations can prevent communication.
  - Fix: Verify your container network configuration. For Docker, check docker network inspect <network_name>. For Kubernetes, examine NetworkPolicy objects. Ensure the network allows the containers or pods to reach each other on the required ports.
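The wrong-interface case is easy to misread in a listing of listening sockets, so here is a small sketch of the rule it comes down to. It assumes you already have a listen address in plain host:port form (for example from the --web.listen-address flag); `bound_beyond_loopback` is a hypothetical name, and the check says nothing about firewalls or routing.

```python
import ipaddress


def bound_beyond_loopback(listen_address: str) -> bool:
    """Return False when a host:port listen address is loopback-only."""
    host, _, _port = listen_address.rpartition(":")
    host = host.strip("[]")            # IPv6 literals are written as [::1]:9090
    if host in ("", "*"):
        return True                    # ":9090" or "*:9090" means every interface
    if host == "localhost":
        return False
    # 0.0.0.0 and :: are wildcards, not loopback, so they count as reachable.
    return not ipaddress.ip_address(host).is_loopback
```

With this rule, 127.0.0.1:9090 and [::1]:9090 are flagged as loopback-only, while 0.0.0.0:9090 and 192.168.1.100:9090 are not.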
Timeout
If the connection is established but no response is received within Grafana’s configured timeout, you’ll see a timeout error. This means packets are likely reaching the data source, but the response is getting lost or delayed.
Diagnosis: Use tcpdump on the Grafana server to see if it’s sending requests and if any responses are coming back.
sudo tcpdump -n -i any host 192.168.1.100 and port 9090
Common Causes & Fixes:
- Data Source Overloaded: The data source is too busy to process the query and send a timely response.
  - Fix: Scale up your data source (more CPU, RAM) or optimize your queries. Check the data source’s own metrics for CPU/memory usage and query latency. This addresses the root cause of slowness, allowing it to respond within the expected timeframe.
- Network Latency/Packet Loss: High latency or packet loss between Grafana and the data source is delaying or dropping the response packets.
  - Fix: Investigate the network path. Use ping -c 10 192.168.1.100 and mtr 192.168.1.100 from the Grafana server. If issues are found, work with your network team to resolve routing or congestion problems. Improving network reliability ensures data packets arrive promptly.
- Grafana Proxy Timeout Too Low: Grafana’s default timeout might be too aggressive for your data source’s response time.
  - Fix: Increase the timeout in Grafana’s data source configuration. Navigate to Configuration -> Data Sources -> [Your Data Source] and raise the HTTP timeout to 30 or 60 seconds (the exact label and location of this field vary by Grafana version and data source type). The server-wide data proxy timeout is set separately, with the timeout option in the [dataproxy] section of grafana.ini. A longer timeout gives the data source more time to process the request and send back its response without Grafana giving up.
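To tell these causes apart, it helps to know which phase is slow: connecting, or waiting for the response. The sketch below times both using Python's standard http.client. `time_request` is a made-up helper for illustration; it sends a plain GET without the headers or authentication Grafana may add, so treat the numbers as a rough guide only.

```python
import http.client
import socket
import time
from urllib.parse import urlsplit


def time_request(url: str, timeout: float = 30.0) -> dict:
    """Time the TCP connect and the full GET separately."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    timings = {"phase": "connect"}     # phase names where a failure occurred
    start = time.monotonic()
    try:
        conn.connect()
        timings["connect_s"] = time.monotonic() - start
        timings["phase"] = "response"
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        timings["total_s"] = time.monotonic() - start
        timings["status"] = response.status
        timings["phase"] = "done"
    except socket.timeout:
        timings["error"] = "timeout"
    except OSError as exc:
        timings["error"] = str(exc)
    finally:
        conn.close()
    return timings
```

A fast connect followed by a timeout in the "response" phase suggests an overloaded data source or a timeout set too low. A slow or failed "connect" phase points back at the network path.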
SSL Handshake Failures
If you’re using HTTPS for your data source connection, errors during the SSL/TLS handshake are common.
Diagnosis: Check Grafana server logs (/var/log/grafana/grafana.log or via journalctl -u grafana-server) for detailed SSL errors.
Common Causes & Fixes:
- Incorrect/Expired Certificate: The data source’s SSL certificate is invalid, expired, or not trusted by the Grafana server’s CA bundle.
  - Fix: Ensure the data source has a valid, trusted certificate. If using self-signed certificates, you might need to configure Grafana to trust them, either by supplying the CA certificate in the data source’s TLS settings (labelled "With CA Cert" in many Grafana versions, or tlsAuthWithCACert and tlsCACert when provisioning) or by adding it to the system trust store on the Grafana server. A valid certificate is required for the TLS handshake to succeed, establishing a secure encrypted channel.
- Mismatched TLS Versions/Ciphers: Grafana and the data source have no TLS version or cipher suite in common, so the negotiation fails.
  - Fix: Ensure both Grafana and the data source support compatible TLS versions (e.g., TLS 1.2 or 1.3) and cipher suites. This might involve updating Grafana or the data source software, or configuring their respective TLS settings. Negotiating compatible protocols allows for secure communication.
- Proxy/Load Balancer SSL Termination: If an intermediate proxy or load balancer is handling SSL, it may be presenting the wrong certificate or failing to forward traffic to the data source.
  - Fix: Verify the SSL configuration on the proxy/load balancer. If it’s terminating SSL, ensure it’s presenting a valid certificate and that Grafana is configured to connect to the proxy’s HTTPS endpoint. Correct proxy configuration ensures traffic is decrypted and forwarded properly.
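When the logs are ambiguous, a direct handshake attempt from the Grafana server can narrow things down. The sketch below uses Python's standard ssl module. `probe_tls` and `days_until_expiry` are hypothetical helpers, and the stage labels are this example's own, not Grafana's. Note that passing `cafile` makes the probe trust only that CA file, not the system store.

```python
import socket
import ssl


def days_until_expiry(not_after: str, now: float) -> int:
    """Whole days from `now` (epoch seconds) to a certificate's notAfter date."""
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


def probe_tls(host: str, port: int, cafile=None, timeout: float = 5.0) -> dict:
    """Attempt a verified TLS handshake and report where it failed, if it did."""
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return {
                    "ok": True,
                    "version": tls.version(),       # e.g. "TLSv1.3"
                    "cipher": tls.cipher()[0],
                    "not_after": tls.getpeercert()["notAfter"],
                }
    except ssl.SSLCertVerificationError as exc:
        # Untrusted, expired, or hostname-mismatched certificate.
        return {"ok": False, "stage": "certificate", "detail": exc.verify_message}
    except ssl.SSLError as exc:
        # Other handshake failures, such as no shared protocol version or cipher.
        return {"ok": False, "stage": "handshake", "detail": str(exc)}
    except OSError as exc:
        return {"ok": False, "stage": "network", "detail": str(exc)}
```

A "certificate" failure maps to the first cause above and a "handshake" failure most often to the second. When a load balancer terminates SSL, the certificate reported is the load balancer's, not the data source's.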
The next error you’ll likely encounter after fixing data source connection issues is a "query execution error," indicating the connection is fine but the query itself is malformed or the data source can’t process it.