The Istio ingress gateway is reporting "connection reset by foreign host" errors because it’s receiving unexpected TCP RST packets from upstream services.
This typically happens when an upstream service, often outside of Istio’s control or misconfigured within it, prematurely closes the connection before the gateway has finished its request or received a full response. Here are the most common reasons and how to fix them:
1. Upstream Service Crashing or Restarting
The most frequent culprit is the application itself within the upstream service crashing and restarting. When a pod restarts, its network connections are abruptly terminated.
- Diagnosis: Check the logs of your upstream service pods. Look for OOMKilled messages, unhandled exceptions, or any indication of a crash. You can also monitor the pod lifecycle in Kubernetes using `kubectl get pods -n <namespace>`.
- Fix: Address the root cause of the application crash. This might involve increasing resource limits (`resources.limits.cpu`, `resources.limits.memory` in your deployment YAML), fixing bugs in the application code, or ensuring proper graceful shutdown handling. For example, if a pod is OOMKilled, increase its memory limit:

      resources:
        limits:
          memory: "512Mi" # Increase from previous value
        requests:
          memory: "256Mi"

- Why it works: By stabilizing the upstream application, it can handle requests without crashing, thus maintaining active connections until the request is fully processed or a proper error response is sent.
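To make the OOMKilled check repeatable, the pod list can be inspected programmatically. The following is a minimal sketch, assuming the input is the parsed JSON from `kubectl get pods -n <namespace> -o json`; the function name `find_oom_killed` is hypothetical and introduced only for illustration.

```python
def find_oom_killed(pod_list):
    """Return (pod, container, restart_count) for every container whose
    last termination reason was OOMKilled.

    `pod_list` is the dict obtained by parsing the JSON output of
    `kubectl get pods -o json`.
    """
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            # lastState.terminated is only present after a container has exited
            terminated = status.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, status["name"], status.get("restartCount", 0)))
    return hits
```

A container that appears here with a growing restart count is a strong candidate for a higher memory limit or a memory-leak investigation.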
2. Upstream Service Idle Timeout Exceeded
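Many applications and load balancers have idle connection timeouts. If a request takes too long to process on the upstream service, or if the connection remains idle for too long after a response has started but before it’s fully received, the upstream might close it. A common variant is an upstream keep-alive timeout that is shorter than the idle timeout of the proxy in front of it: the proxy reuses a pooled connection just after the upstream has closed it, and the request is answered with a reset.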
- Diagnosis: Examine the configuration of your upstream service’s web server (e.g., Nginx, Apache, Node.js HTTP server) or any intermediary load balancers before Istio. Look for `keepalive_timeout`, `client_header_timeout`, `send_timeout`, or similar settings.
- Fix: Increase the idle timeout settings on the upstream application or any intermediary load balancers. For Nginx, you might increase `keepalive_timeout` and `send_timeout`:

      http {
          send_timeout 300s;       # Increased from default (e.g., 60s)
          keepalive_timeout 120s;  # Increased from default (e.g., 75s)
          # ... other http settings
      }

  Apply these changes to your application’s configuration and redeploy/reload.
- Why it works: A longer idle timeout allows the connection to remain open for the duration of longer-running requests or during periods of slow data transfer, preventing premature closure.
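When several Nginx configurations need to be audited, the timeout directives can be extracted with a small script. This is a rough sketch, not a full Nginx parser: it assumes one directive per line, reads only the first value of each directive, and skips compound values such as `1m30s`. The names `nginx_timeouts` and `too_short` and the `minimum_seconds` threshold are hypothetical.

```python
import re

# Seconds per Nginx time suffix; a bare number means seconds.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

_DIRECTIVE = re.compile(
    r"^\s*(keepalive_timeout|send_timeout|client_header_timeout)"
    r"\s+(\d+)(ms|s|m|h|d)?\b",
    re.MULTILINE,
)


def nginx_timeouts(conf_text):
    """Map each timeout directive found in conf_text to its value in seconds.

    Commented-out directives are ignored because the directive name must be
    the first token on its line.
    """
    found = {}
    for name, number, unit in _DIRECTIVE.findall(conf_text):
        found[name] = int(number) * _UNITS[unit or "s"]
    return found


def too_short(conf_text, minimum_seconds):
    """Return the directives whose value is below minimum_seconds."""
    timeouts = nginx_timeouts(conf_text)
    return sorted(name for name, secs in timeouts.items() if secs < minimum_seconds)
```

Choosing `minimum_seconds` is a judgment call; one reasonable choice is the idle timeout of whatever proxy sits in front of the upstream, so that the upstream is never the first side to close an idle connection.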
3. Istio Sidecar Misconfiguration or Resource Starvation
This cause is less common, because "connection reset by foreign host" implies that the foreign host sent the reset, and a struggling Istio sidecar usually manifests as different errors. However, if the sidecar in the upstream pod is overwhelmed, it might not proxy connections properly, which can surface as connection problems on the upstream side.
- Diagnosis: Check the resource utilization (CPU, memory) of the Istio sidecar proxy (Envoy) in the upstream pods: `kubectl top pod <upstream-pod-name> -n <namespace> --containers`. Also, check the sidecar’s logs for any errors: `kubectl logs <upstream-pod-name> -n <namespace> -c istio-proxy`.
- Fix: Increase the resource requests and limits for the `istio-proxy` container in your application’s deployment. On Istio versions that support customizing the injected sidecar this way, `image: auto` is the placeholder that the injector replaces with the real proxy image.

      containers:
        - name: my-app
          # ... app config
        - name: istio-proxy
          image: auto # Placeholder filled in by the sidecar injector
          resources:
            limits:
              cpu: "500m" # Increased
              memory: "256Mi" # Increased
            requests:
              cpu: "100m"
              memory: "128Mi"

- Why it works: Providing sufficient resources to the Envoy proxy ensures it can efficiently handle and forward network traffic, preventing it from becoming a bottleneck that could indirectly cause upstream connection issues.
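To judge whether the sidecar is close to its limits, the figures reported by `kubectl top` can be compared with the configured limits. The sketch below converts Kubernetes quantity strings to numbers and computes a usage ratio. It handles only the common suffixes, and the helper names are hypothetical.

```python
# Multipliers for Kubernetes memory suffixes (binary and decimal forms).
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity):
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as "256Mi" to bytes."""
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _MEMORY_UNITS[suffix]
    return int(quantity)


def utilization(used, limit, parse):
    """Return used/limit as a fraction, parsing both with `parse`."""
    return parse(used) / parse(limit)
```

For example, `utilization("230Mi", "256Mi", parse_memory)` gives roughly 0.9, which suggests the proxy has little memory headroom left.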
4. Network Policy Blocking or Dropping Packets
Kubernetes Network Policies can restrict traffic. If a policy is too restrictive, it might inadvertently drop packets, leading the upstream service (or the client trying to send data) to believe the connection is dead and reset it.
- Diagnosis: Review your Kubernetes Network Policies in the namespace of the upstream service. Ensure that traffic from the Istio ingress gateway (or the pod network if traffic passes through other Istio components) is explicitly allowed.
- Fix: Adjust your Network Policies to explicitly allow the necessary ingress traffic to your upstream service pods. For example:

      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-ingress-to-app
        namespace: <upstream-namespace>
      spec:
        podSelector:
          matchLabels:
            app: <upstream-app-label>
        policyTypes:
          - Ingress
        ingress:
          - from:
              - podSelector: {} # Allow from all pods in the namespace
              # Or more specific, if ingress gateway is in a different namespace:
              # - namespaceSelector:
              #     matchLabels:
              #       istio: ingressgateway # This label must exist on the gateway's namespace
              # Or from specific pods that the gateway talks to
            ports:
              - protocol: TCP
                port: 8080 # Port your upstream app listens on

- Why it works: Correctly configured Network Policies ensure that legitimate traffic from the ingress gateway reaches the upstream service without being silently dropped, preventing the foreign host from perceiving an invalid connection state.
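A quick way to tell a dropped connection from a rejected one is to attempt a plain TCP connection from a pod that should be allowed through. This is a minimal sketch using only the standard library; `probe` is a hypothetical helper, and the interpretation of each outcome is a rule of thumb rather than a guarantee.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as "open", "refused", "timeout", or "error"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered the SYN with a RST: nothing is listening on the
        # port, or something on the path actively rejects the connection.
        return "refused"
    except socket.timeout:
        # No answer at all, which is consistent with packets being dropped,
        # for example by a restrictive NetworkPolicy.
        return "timeout"
    except OSError:
        # Anything else, such as an unreachable network or a DNS failure.
        return "error"
```

A `"timeout"` from the gateway’s network position combined with `"open"` from inside the upstream namespace typically points at a policy problem rather than at the application.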
5. Upstream Service Not Ready or Not Listening Correctly
The upstream service might be reporting "Ready" to Kubernetes but isn’t actually listening on the expected port, or it’s misconfigured and not accepting connections on the IP address Istio is trying to reach.
- Diagnosis: Exec into the upstream pod and use `netstat -tulnp` or `ss -tulnp` to verify that the application is listening on the correct port and IP address (usually `0.0.0.0` or `::`). Also, try `curl localhost:<port>` from within the pod to see if it responds locally.
- Fix: Correct the application’s listening configuration or the Kubernetes `containerPort` and `targetPort` in your deployment/service definitions. Note that `containerPort` belongs to the Deployment’s container spec, while `targetPort` belongs to the Service. Ensure your readiness and liveness probes are accurate and that the application is truly ready before K8s marks it as such.

      # Example Deployment snippet
      spec:
        containers:
          - name: my-app
            ports:
              - containerPort: 8080 # The port the application listens on inside the container
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5

      # Example Service snippet
      spec:
        ports:
          - port: 8080 # The port the Service exposes
            targetPort: 8080 # Must match the containerPort above

- Why it works: This ensures that the upstream service is properly configured to accept incoming TCP connections on the port Istio is attempting to send traffic to.
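The listening check can also be scripted against captured `ss` output. The sketch below assumes the usual column layout of `ss -tln` or `ss -tuln` (state, queues, then local address) and reports which addresses a port is bound to; both function names are hypothetical.

```python
def listening_addresses(ss_output, port):
    """Return the local addresses that have a LISTEN socket on `port`."""
    addresses = []
    for line in ss_output.splitlines():
        fields = line.split()
        # With several protocols selected, ss prefixes each row with a Netid column.
        if fields and fields[0] in ("tcp", "udp"):
            fields = fields[1:]
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # Local address looks like 0.0.0.0:8080, [::]:8080, or *:8080.
        address, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.append(address)
    return addresses


def accepts_remote_connections(ss_output, port):
    """True if `port` is bound to at least one non-loopback address."""
    def is_loopback(address):
        return address.startswith("127.") or address in ("[::1]", "::1")

    return any(not is_loopback(a) for a in listening_addresses(ss_output, port))
```

If `accepts_remote_connections` returns `False` for the application port, the process is either not listening or bound only to loopback, and the fix lies in the application’s bind address rather than in Istio.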
6. MTU Mismatch
An MTU (Maximum Transmission Unit) mismatch between network interfaces along the path can cause large packets to be dropped, leading to connection issues. While often causing timeouts or dropped packets rather than resets, it can sometimes manifest as connection issues that the upstream interprets as an invalid state.
- Diagnosis: This is harder to diagnose. You might need to inspect network configurations on your nodes, CNI plugin, and any intervening network devices. Tools like `ping -s <size> -M do <destination>` can help: `-M do` forbids fragmentation, so the ping fails once `<size>` plus the packet headers exceeds the path MTU.
- Fix: Ensure a consistent MTU is configured across your Kubernetes nodes, CNI network, and any external network gateways. This often involves configuring your CNI plugin (e.g., Calico, Flannel) and potentially node network interfaces. For example, with Calico, you might set `ipip.mtu` or `vxlan.mtu`.
- Why it works: A consistent MTU prevents packet fragmentation or dropping, allowing TCP segments to be transmitted and received reliably end-to-end.
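The sizes to use with the ping test follow from simple header arithmetic. The sketch below assumes an IPv4 underlay and the standard header sizes (20-byte IPv4 header, 8-byte ICMP header, 20 bytes of IP-in-IP overhead, 50 bytes of VXLAN overhead); the function names are hypothetical.

```python
IPV4_ICMP_OVERHEAD = 20 + 8  # IPv4 header + ICMP echo header, in bytes

# Bytes added to each packet by the overlay, assuming an IPv4 underlay.
ENCAP_OVERHEAD = {"none": 0, "ipip": 20, "vxlan": 50}


def max_ping_payload(mtu):
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_ICMP_OVERHEAD


def pod_mtu(node_mtu, encapsulation):
    """MTU the pod interface should use so encapsulated packets fit the node MTU."""
    return node_mtu - ENCAP_OVERHEAD[encapsulation]
```

With a 1500-byte node MTU, `max_ping_payload(1500)` is 1472, so `ping -s 1472 -M do` should succeed between nodes while `-s 1473` should fail. Behind VXLAN, `pod_mtu(1500, "vxlan")` is 1450, and the largest payload between pods drops to 1422.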
Once the resets are resolved, the next error you’ll likely encounter is 503 Service Unavailable, which appears when the upstream service is still unhealthy or unready.