The Upstream Reset: Remote Reset error in Istio means the remote (upstream) service abruptly terminated a TCP connection instead of closing it gracefully. This typically happens when the upstream service (the one receiving the request) crashes, times out internally, or forcefully rejects the connection. In each case the proxy sees a TCP RST (or, over HTTP/2, a stream reset) rather than the FIN exchange of a graceful shutdown.
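To make the difference between a reset and a graceful close concrete, here is a minimal Python sketch that reproduces both on the loopback interface. It is an illustration only and is not part of Istio: the helper names serve_once and observe_close are hypothetical, and it assumes a platform where SO_LINGER with a zero timeout makes close() send RST (true on Linux).

```python
import socket
import struct
import threading


def serve_once(listener, abrupt):
    """Accept one connection and close it either abruptly (RST) or gracefully (FIN)."""
    conn, _ = listener.accept()
    if abrupt:
        # l_onoff=1, l_linger=0: close() discards the connection and sends RST.
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def observe_close(abrupt):
    """Return what a client observes when the server closes the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener, abrupt))
    server.start()
    client = socket.create_connection(listener.getsockname(), timeout=5)
    try:
        data = client.recv(1024)
        # An empty read means the peer closed its side with FIN.
        return "graceful close (FIN)" if data == b"" else "data"
    except ConnectionResetError:
        # ECONNRESET: the peer answered with RST.
        return "reset (RST)"
    finally:
        client.close()
        server.join()
        listener.close()
```

Calling observe_close(True) models what the proxy experiences when the upstream resets the connection, while observe_close(False) models a clean shutdown.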

Here are the common causes and how to fix them:

1. Upstream Service Crashing or OOM Kills

Diagnosis: Check the logs of the actual application pods receiving the traffic. Look for any crash logs, segmentation faults, or messages indicating an Out-Of-Memory (OOM) kill.

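kubectl logs <upstream-pod-name> -n <namespace>
kubectl logs <upstream-pod-name> -n <namespace> --previous   # logs from the previous (crashed) container instance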

If pods are restarting frequently, check their restart counts:

kubectl get pods -n <namespace> | grep <upstream-app-label>

Also, check the pod’s last container state and events for an OOM kill (Last State: Terminated, Reason: OOMKilled):

kubectl describe pod <upstream-pod-name> -n <namespace>

Fix: Increase the memory (and potentially CPU) limits and requests for the upstream application’s deployment. In your Kubernetes Deployment YAML:

resources:
  limits:
    memory: "512Mi" # Increase from current value
    cpu: "500m"    # Increase from current value
  requests:
    memory: "256Mi" # Increase from current value
    cpu: "250m"    # Increase from current value

Why it works: A container that exceeds its memory limit is OOM-killed, and its open connections are reset abruptly. A higher limit lets the application run without hitting that ceiling. If memory keeps growing regardless of the limit, suspect a leak (see cause 6).
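When many pods are involved, the OOM check can be scripted. The following is a minimal Python sketch, assuming the input is the JSON that kubectl get pod <upstream-pod-name> -n <namespace> -o json prints. The function names are hypothetical; the fields it reads (status.containerStatuses[].lastState.terminated.reason and restartCount) are part of the standard Pod status.

```python
import json


def find_oom_killed(pod):
    """Return (container name, restart count) for containers last terminated by an OOM kill."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState is empty for containers that have never been restarted.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append((status["name"], status.get("restartCount", 0)))
    return hits


def find_oom_killed_in_json(text):
    """Same check, starting from the raw JSON text printed by kubectl."""
    return find_oom_killed(json.loads(text))
```

A non-empty result names the containers whose memory limit is the first thing to revisit.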

2. Upstream Application Internal Timeouts

Diagnosis: The upstream application itself might have internal timeouts for processing requests. If a request takes too long, the application might drop the connection. Check the upstream application’s logs for any messages indicating slow processing or internal timeouts.

kubectl logs <upstream-pod-name> -n <namespace>

If the application is a web server, check its access logs for requests that took an unusually long time to complete.

Fix: Increase the internal timeouts within the upstream application’s configuration. This is highly application-specific. For example, if it’s a Node.js app, you might adjust server.timeout or similar settings. If it’s a Java app, you might look at thread pool timeouts.

Alternatively, if the timeout is being enforced by Istio rather than by the application, raise the request timeout on the route in the VirtualService associated with the upstream service. For a VirtualService:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-upstream-service
spec:
  hosts:
  - my-upstream-service.example.com
  http:
  - route:
    - destination:
        host: my-upstream-service
        port:
          number: 8080
    timeout: 30s # Duration string; increase from the current value

Why it works: This gives the upstream application more time to process the request before Istio (or the application itself) gives up and terminates the connection.

3. Upstream Application Graceful Shutdown Issues

Diagnosis: When an upstream pod is scaled down or updated, Kubernetes sends a SIGTERM signal. The application is supposed to shut down gracefully within terminationGracePeriodSeconds. If it doesn’t, Kubernetes forcefully kills it (SIGKILL), which can cause connection resets. Check kubectl describe pod <upstream-pod-name> for the container’s Last State (an exit code of 137 indicates SIGKILL) and the Termination Message if available, and look for logs indicating a slow shutdown.

Fix: Ensure your upstream application handles SIGTERM signals correctly and shuts down gracefully within the configured terminationGracePeriodSeconds. In your Deployment YAML, increase terminationGracePeriodSeconds in the pod spec (spec.template.spec):

terminationGracePeriodSeconds: 60 # Increase from default (30s) or current value

Make sure your application code is designed to:

  1. Stop accepting new connections.
  2. Finish processing existing in-flight requests.
  3. Release resources and exit.

Why it works: A longer grace period allows the application more time to complete ongoing requests before being forcibly terminated, preventing abrupt connection drops.
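The three shutdown steps above can be sketched with Python’s standard library. This is one possible shape, not a prescribed implementation: Handler, make_server, and run are hypothetical names, and port 8080 is only an example. A real service would also have to close any long-lived keep-alive connections.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host="0.0.0.0", port=8080):
    server = ThreadingHTTPServer((host, port), Handler)
    # Non-daemon handler threads are joined by server_close(),
    # so in-flight requests get to finish before the process exits.
    server.daemon_threads = False
    return server


def run(server):
    """Serve until SIGTERM arrives, then drain and exit. Call from the main thread."""
    def on_sigterm(signum, frame):
        # shutdown() blocks until serve_forever() returns, so it must run on
        # a different thread than the one inside serve_forever().
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()  # returns once shutdown() stops the accept loop
    server.server_close()   # closes the listening socket, waits for handler threads
```

The entry point would be run(make_server()). The drain has to complete within terminationGracePeriodSeconds, otherwise the SIGKILL still arrives mid-request.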

4. Network Instability or Firewall Resets

Diagnosis: Intermittent network issues between the Istio sidecar (in the client pod) and the upstream application pod can produce TCP RST packets. This is harder to diagnose directly. Use tools like tcpdump on both the client and server pods to capture traffic during the error. The commands below assume that tcpdump is installed in the container image and that the container is allowed to capture packets. On the client pod:

kubectl exec <client-pod-name> -n <namespace> -- tcpdump -i eth0 'port <upstream-port>' -w /tmp/client_capture.pcap

On the upstream pod:

kubectl exec <upstream-pod-name> -n <namespace> -- tcpdump -i eth0 'port <upstream-port>' -w /tmp/upstream_capture.pcap

Look for RST packets without a corresponding FIN packet. Also, check intermediate network devices (load balancers, firewalls) for any signs of connection termination.

Fix: Address underlying network issues. This might involve:

  • Ensuring stable network connectivity within your Kubernetes cluster.
  • Verifying firewall rules between nodes or network segments are not prematurely closing connections.
  • If using external load balancers, check their connection timeout and idle timeout settings.

Why it works: This directly addresses the source of network interruptions, ensuring connections are not being unexpectedly terminated by infrastructure.

5. Istio Egress Gateway or Sidecar Misconfiguration (Less Common for Remote Reset)

Diagnosis: A misconfigured egress gateway or a problem with the client sidecar is a less common cause of Remote Reset (it more often shows up as Upstream Connection Termination), but it can still lead to premature connection termination. Check Istio proxy logs (istio-proxy container) in both the client and upstream pods for any unusual errors.

kubectl logs <client-pod-name> -c istio-proxy -n <namespace>
kubectl logs <upstream-pod-name> -c istio-proxy -n <namespace>

Look for errors related to connection handling or upstream communication.

Fix: Ensure your Istio configuration, particularly ServiceEntry and VirtualService for egress traffic, is correct.

  • If using an egress gateway, verify the gateway configuration and the associated VirtualService.
  • For sidecars, ensure meshConfig.outboundTrafficPolicy.mode is set appropriately (REGISTRY_ONLY or ALLOW_ANY).
  • Check istio-proxy configuration for specific timeouts if they’ve been manually overridden.

Why it works: Correct Istio configuration ensures that traffic is routed and handled as intended by the mesh, preventing proxy-level issues from causing connection drops.

6. Upstream Application Resource Leaks or Deadlocks

Diagnosis: An application that leaks resources (like file handles, network sockets, or threads) can eventually become unstable and crash or stop responding, leading to connection resets. Deadlocks within the application can also cause it to hang indefinitely. Debugging this requires deep introspection into the upstream application’s behavior (profiling, thread dumps). Use application-specific profiling tools. For Java, jstack or jmap might be useful. For Go, pprof.

Fix: Identify and fix the resource leaks or deadlocks in the upstream application’s codebase. This is a development task, not an infrastructure one.

Why it works: Resolving these internal application bugs makes the application stable and responsive, preventing it from reaching a state where it needs to drop connections.
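For a Python service, a rough equivalent of a jstack thread dump can be produced with the standard library alone. The sketch below uses a hypothetical helper name, dump_threads, and relies on sys._current_frames(), which is a CPython-specific function. Threads that sit on the same lock acquisition in every dump are deadlock candidates.

```python
import sys
import threading
import traceback


def dump_threads():
    """Return the current stack of every thread as text, similar in spirit to jstack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(ident, '?')} ({ident}):")
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)
```

On Unix, the standard faulthandler module offers a similar dump on demand: faulthandler.register(signal.SIGUSR1, all_threads=True) writes every thread’s traceback to stderr when the process receives SIGUSR1.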

After addressing these, if the upstream service is still unhealthy or slow, the next error you might encounter is upstream connect error or disconnect (if the connection itself can’t be established or is dropped by the network layer), or a gateway timeout if the upstream eventually responds but takes longer than the configured timeout.
