Using HTTP/2 as the transport for internal microservice communication can, surprisingly, be a net negative for latency in most common microservice patterns.
Let’s see it in action. Imagine a simple request flow: a frontend service (frontend-svc) calls a backend service (backend-svc) which in turn calls a database service (db-svc).
# Request 1: frontend-svc -> backend-svc
POST /process_order HTTP/2
Host: backend-svc:8080
Content-Type: application/json
...
# Response 1: backend-svc -> frontend-svc
HTTP/2 200
Content-Type: application/json
...
# Request 2: backend-svc -> db-svc
POST /save_order HTTP/2
Host: db-svc:9000
Content-Type: application/json
...
# Response 2: db-svc -> backend-svc
HTTP/2 200
Content-Type: application/json
...
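If we were using HTTP/1.1, each of these calls would occupy its own connection while in flight: the client either opens a new TCP connection or borrows an idle one from a connection pool. With HTTP/2, each client opens a single TCP connection to each peer (frontend-svc to backend-svc, backend-svc to db-svc), and all subsequent requests between that pair multiplex over that same connection. If traffic goes through a proxy, one connection to the proxy can even carry requests destined for different hosts. This removes the need to open new TCP connections, or to grow a connection pool, as the number of concurrent RPC calls rises.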
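To put rough numbers on the connection overhead, here is a minimal back-of-the-envelope sketch. The function name and the round-trip model are assumptions made for illustration: one RTT for the TCP handshake, one more for a full TLS 1.3 handshake, and one for the request and response themselves.

```python
def call_latency_ms(rtt_ms: float, service_ms: float,
                    new_connection: bool, tls: bool = False) -> float:
    """Estimate the latency of one RPC under a simple round-trip model."""
    handshake_rtts = 0
    if new_connection:
        handshake_rtts += 1      # TCP three-way handshake: about one RTT before data flows
        if tls:
            handshake_rtts += 1  # assumption: full TLS 1.3 handshake, one more RTT
    # One RTT for the request/response exchange, plus server processing time.
    return handshake_rtts * rtt_ms + rtt_ms + service_ms


# Hypothetical numbers: 1 ms RTT inside a datacenter, 5 ms of server work.
fresh = call_latency_ms(1.0, 5.0, new_connection=True, tls=True)
reused = call_latency_ms(1.0, 5.0, new_connection=False)
print(f"fresh connection: {fresh} ms, reused connection: {reused} ms")
```

Under this model, a pooled HTTP/1.1 connection and a multiplexed HTTP/2 connection both skip the handshake cost; the saving applies to any reused connection, not only to HTTP/2.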
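The core problem HTTP/2 solves is the inefficiency of HTTP/1.1. An HTTP/1.1 connection carries only one request at a time (pipelining exists but is rarely used in practice), and that "one request in flight per connection" model led to: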
- Head-of-Line Blocking: If a request on a connection is slow, it blocks all subsequent requests on that same connection.
- Connection Overhead: Establishing TCP connections is expensive (three-way handshake), and TLS handshakes are even more so. Many short-lived connections chew up resources.
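The head-of-line effect is easy to see in a toy model. The sketch below assumes each connection handles one request at a time and that each request goes to whichever connection frees up first; the function name and the durations are made up for illustration.

```python
import heapq


def completion_times(durations_ms: list[int], connections: int) -> list[int]:
    """Return when each request finishes, given one request per connection at a time."""
    free_at = [0] * connections          # time at which each connection becomes idle
    heapq.heapify(free_at)
    finished = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)   # wait for the earliest idle connection
        end = start + duration
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


# One slow request (500 ms) followed by two fast ones (10 ms each).
print(completion_times([500, 10, 10], connections=1))  # fast requests queue behind the slow one
print(completion_times([500, 10, 10], connections=2))  # a second connection lets them overtake
```

With a single connection the two fast requests finish at 510 ms and 520 ms; with two connections they finish at 10 ms and 20 ms. This is why HTTP/1.1 clients lean on connection pools.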
HTTP/2 addresses these with:
- Multiplexing: Multiple requests and responses can be in flight simultaneously over a single TCP connection, each broken down into frames that are interleaved on the wire. This means a slow response doesn’t block others at the HTTP layer.
- Header Compression (HPACK): Reduces the size of HTTP headers, which are particularly verbose in microservice communication.
- Server Push: Allows the server to send resources to the client that it anticipates the client will need, before the client explicitly requests them. (Less common for RPC, more for web assets).
- Flow Control: Prevents a fast sender from overwhelming a slow receiver.
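To make the multiplexing point concrete, here is a minimal sketch of frame interleaving. It assumes a plain round-robin scheduler and the protocol's default maximum frame size of 16,384 bytes; real implementations schedule frames in their own ways, so treat this as an illustration rather than a description of any particular library.

```python
from collections import deque


def interleave(stream_sizes: dict[int, int], frame_size: int = 16_384) -> dict[int, int]:
    """Round-robin DATA frames across streams.

    Returns, per stream id, how many bytes had gone over the wire in total
    at the moment that stream finished.
    """
    pending = deque(stream_sizes.items())
    sent_total = 0
    finished_at = {}
    while pending:
        stream_id, remaining = pending.popleft()
        chunk = min(frame_size, remaining)   # one frame's worth of this stream
        sent_total += chunk
        remaining -= chunk
        if remaining:
            pending.append((stream_id, remaining))  # back of the queue for the next round
        else:
            finished_at[stream_id] = sent_total
    return finished_at


# Stream 1 carries a 100 kB response, stream 3 a 1 kB response (hypothetical sizes).
print(interleave({1: 100_000, 3: 1_000}))
```

The small response on stream 3 completes after 17,384 bytes on the wire instead of waiting for all 100,000 bytes of stream 1 to go first.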
The real levers you control are the network- and application-layer settings of your HTTP/2 clients and servers.
- Client-side settings: You can configure things like MaxConcurrentStreams (how many requests can be outstanding on a single connection) and InitialWindowSize (how much data the peer may send on a stream before it has to wait for a flow-control window update).
- Server-side settings: Similar configurations apply, dictating how many streams the server will accept and how it handles incoming data.
- TLS Configuration: While HTTP/2 can run over plain TCP (h2c), it’s almost always used with TLS (h2). Proper TLS cipher suite selection and certificate management are crucial.
- Proxy Configuration: If you’re using a service mesh like Istio or Linkerd, or even a simple reverse proxy like Nginx or Envoy, their configurations for HTTP/2 settings will heavily influence behavior.
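Whatever names a given client, server, or proxy uses for these knobs, stream limits and window sizes reach the peer as parameters in a SETTINGS frame. The sketch below encodes such a frame by hand, following the frame layout in the HTTP/2 specification. The function name and the values (100 streams, a 1 MiB window) are illustrative assumptions, not recommendations.

```python
import struct

# Identifiers defined by the HTTP/2 specification.
SETTINGS_MAX_CONCURRENT_STREAMS = 0x3
SETTINGS_INITIAL_WINDOW_SIZE = 0x4
FRAME_TYPE_SETTINGS = 0x4


def encode_settings_frame(settings: dict[int, int]) -> bytes:
    """Encode a SETTINGS frame: 9-byte frame header plus 6 bytes per setting."""
    # Each setting is a 16-bit identifier followed by a 32-bit value.
    payload = b"".join(struct.pack("!HI", ident, value)
                       for ident, value in settings.items())
    length = len(payload)
    # Header: 24-bit length, 8-bit type, 8-bit flags, 32-bit stream id (0 for SETTINGS).
    header = struct.pack("!BHBBI", length >> 16, length & 0xFFFF,
                         FRAME_TYPE_SETTINGS, 0, 0)
    return header + payload


frame = encode_settings_frame({
    SETTINGS_MAX_CONCURRENT_STREAMS: 100,       # hypothetical limit
    SETTINGS_INITIAL_WINDOW_SIZE: 1_048_576,    # hypothetical 1 MiB stream window
})
print(frame.hex())
```

In practice you would set these through your HTTP/2 library or proxy configuration rather than build frames yourself, but seeing the wire format helps when reading packet captures or debug logs.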
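The most surprising thing about HTTP/2 multiplexing for internal microservices is how easily it can increase latency if not managed carefully. It eliminates per-request connection overhead, but it also concentrates many requests onto one TCP connection, and everything on that connection shares the same fate. HTTP/2 removes head-of-line blocking at the HTTP layer, not at the transport layer: TCP delivers bytes in order, so a single lost packet stalls every stream on the connection until it is retransmitted. The streams also share one connection-level flow-control window, so a receiver that is slow to drain one large response can leave no window for the others. This is a different form of head-of-line blocking, now occurring across all the streams within a single TCP connection rather than between requests queued on an HTTP/1.1 connection. A single misbehaving or overloaded peer can effectively throttle every request sharing its connection, a problem often masked by the perceived "efficiency" of HTTP/2.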
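The shared-window effect can be sketched with a toy model. It assumes a single connection-level window at the protocol default of 65,535 bytes and ignores per-stream windows entirely. The class and function names are hypothetical, and whether a real deployment hits this depends on how the receiver's implementation issues WINDOW_UPDATE frames.

```python
DEFAULT_WINDOW = 65_535  # initial flow-control window size in HTTP/2


class SharedConnectionWindow:
    """Toy connection-level flow-control window shared by every stream."""

    def __init__(self, window: int = DEFAULT_WINDOW) -> None:
        self.window = window

    def send(self, nbytes: int) -> int:
        """Try to send nbytes of DATA; return how many bytes the window allowed."""
        allowed = min(nbytes, self.window)
        self.window -= allowed
        return allowed

    def window_update(self, nbytes: int) -> None:
        """The receiver has drained nbytes and grants that much window back."""
        self.window += nbytes


def demo() -> tuple[int, int, int]:
    conn = SharedConnectionWindow()
    big = conn.send(100_000)      # large response to a slow reader fills the window
    blocked = conn.send(1_000)    # an unrelated small response gets no window at all
    conn.window_update(big)       # the slow reader finally drains its data
    unblocked = conn.send(1_000)  # the small response can now go out
    return big, blocked, unblocked


print(demo())
```

In this model the 1 kB response sends zero bytes until the slow reader catches up, even though the two requests have nothing to do with each other. Spreading traffic over a few connections, or raising the connection window, are the usual ways to soften this.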
The next concept you’ll likely grapple with is how to effectively monitor and debug performance issues within these multiplexed HTTP/2 streams.