HTTP/2 gets rid of head-of-line (HOL) blocking at the HTTP layer by multiplexing requests and responses over a single TCP connection.
Let’s see it in action. Imagine you’re fetching a webpage with several assets: an HTML document, a CSS file, and a JavaScript file.
// Client initiates multiple requests
Client -> Server: GET /index.html
Client -> Server: GET /styles.css
Client -> Server: GET /script.js
In HTTP/1.1, responses on a connection must come back in the same order the requests were sent. If index.html is slow to arrive because of network congestion or server processing, the connection stalls: the responses for styles.css and script.js, even if they are ready on the server, have to wait until index.html has finished. This is HTTP-level HOL blocking.
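To put rough numbers on that stall, here is a minimal Python sketch of in-order delivery on one connection. The function names and the timing values (in milliseconds) are illustrative assumptions, not measurements, and the "independent" case is an idealised bound that ignores bandwidth sharing.

```python
def http11_completion_times(responses):
    """Model one HTTP/1.1 connection: responses are delivered strictly in request order.

    `responses` is a list of (name, ready_at, transfer_time) tuples in request order.
    """
    clock = 0.0
    done = {}
    for name, ready_at, transfer_time in responses:
        # A response cannot start until the previous one has finished
        # AND the server has this one ready.
        start = max(clock, ready_at)
        clock = start + transfer_time
        done[name] = clock
    return done


def independent_completion_times(responses):
    """Idealised bound: each response is limited only by its own readiness."""
    return {name: ready_at + transfer_time for name, ready_at, transfer_time in responses}


# Hypothetical timings: index.html is slow to generate, the assets are ready at once.
responses = [
    ("/index.html", 300.0, 50.0),
    ("/styles.css", 0.0, 20.0),
    ("/script.js", 0.0, 30.0),
]

print(http11_completion_times(responses))
print(independent_completion_times(responses))
```

In this toy model the stylesheet finishes at 370 ms behind the slow document, although it would need only 20 ms if nothing were ahead of it in the queue.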
HTTP/2 changes this dramatically. All these requests and their responses still share the same TCP connection, but each message is broken down into smaller frames that can be sent independently of the frames belonging to other requests.
// Client sends all three requests up front; server interleaves the response frames
Client -> Server: (Request 1: GET /index.html)
Client -> Server: (Request 2: GET /styles.css)
Client -> Server: (Request 3: GET /script.js)
Server -> Client: (Frame A: part of index.html)
Server -> Client: (Frame B: part of styles.css) <-- Notice this!
Server -> Client: (Frame C: part of index.html)
Server -> Client: (Frame D: part of script.js) <-- And this!
Server -> Client: (Frame E: part of styles.css)
The client receives these frames interleaved across streams, although the frames within any one stream still arrive in order. It reassembles each response by grouping frames on their stream identifiers. The crucial part is that the delivery of Frame B (from styles.css) is not blocked by the delivery of Frame A or Frame C (from index.html). Each stream (representing an individual request/response pair) is independent.
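The reassembly step can be sketched in a few lines of Python. This is a simplified illustration, not how any particular HTTP/2 library is implemented: the `reassemble` helper, the stream IDs 1, 3, and 5, and the payload bytes are all assumptions chosen to mirror Frames A to E above.

```python
from collections import defaultdict


def reassemble(frames):
    """Group frame payloads by stream ID, keeping arrival order within each stream."""
    streams = defaultdict(bytearray)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return {stream_id: bytes(buf) for stream_id, buf in streams.items()}


# Assumed mapping: stream 1 = index.html, stream 3 = styles.css, stream 5 = script.js.
frames = [
    (1, b"<html>"),   # Frame A
    (3, b"body{"),    # Frame B
    (1, b"</html>"),  # Frame C
    (5, b"run();"),   # Frame D
    (3, b"}"),        # Frame E
]

bodies = reassemble(frames)
print(bodies[1])  # b'<html></html>'
print(bodies[3])  # b'body{}'
print(bodies[5])  # b'run();'
```

Because each payload is appended only to its own stream's buffer, the interleaving on the wire never mixes bytes from different responses.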
The core problem HTTP/2 solves is that HTTP/1.1 treated each request/response as a sequential, blocking operation on a connection. If one request was slow, the whole connection was effectively blocked for subsequent requests on that same connection. This was especially painful with multiple asset fetches on a single page.
HTTP/2 introduces the concept of streams. When you make a request, it’s assigned a unique stream ID. The data for that request and its response is then chopped up into frames. These frames, each tagged with the stream ID, are sent over a single TCP connection. The HTTP/2 endpoints (client and server) then use these stream IDs to reassemble the frames into the correct request and response sequences. Because frames from different streams can be interleaved on the wire, a delay in one stream (e.g., a slow index.html response) doesn’t prevent frames from other streams (e.g., styles.css or script.js) from being sent and received. The TCP connection itself can still experience HOL blocking (where a lost TCP packet delays all subsequent packets on that connection, regardless of stream), but at the HTTP layer, this problem is gone.
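To make the stream tag concrete: every HTTP/2 frame begins with a fixed 9-byte header holding a 24-bit payload length, an 8-bit type, 8 bits of flags, and one reserved bit followed by a 31-bit stream identifier. The Python sketch below builds and parses that header. The function names are hypothetical, and the sketch deliberately skips payload handling and most validation.

```python
FRAME_HEADER_LEN = 9
TYPE_DATA = 0x0        # DATA frame type
FLAG_END_STREAM = 0x1  # END_STREAM flag on DATA frames


def build_frame_header(length, frame_type, flags, stream_id):
    """Pack the four header fields into the 9-byte wire format."""
    return (
        length.to_bytes(3, "big")
        + bytes([frame_type, flags])
        + (stream_id & 0x7FFFFFFF).to_bytes(4, "big")  # reserved top bit left as 0
    )


def parse_frame_header(header):
    """Return (length, frame_type, flags, stream_id) from a 9-byte header."""
    if len(header) != FRAME_HEADER_LEN:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")
    frame_type = header[3]
    flags = header[4]
    # Mask off the reserved bit to get the 31-bit stream identifier.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id


header = build_frame_header(1024, TYPE_DATA, FLAG_END_STREAM, stream_id=3)
print(parse_frame_header(header))  # (1024, 0, 1, 3)
```

The stream identifier in this header is what lets a receiver route each frame to the right request or response without looking at the payload.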
The key levers you control relate to how the client and server signal and act on stream priorities. When a client sends multiple requests, it can attach priority information to each one. In the original HTTP/2 specification (RFC 7540) this is expressed as stream dependencies and weights from 1 to 256 rather than as labels like HIGH or LOW: the client might, for example, give the HTML document a large weight and images a small one. The server can use these signals to decide which frames to send first when bandwidth is constrained, but they are advisory and the server is free to ignore them. How much control you get depends on your stack. Support and tuning options vary by server, so check its documentation. On the client side, browsers assign priorities themselves, and hints such as the fetchpriority attribute let a page nudge them.
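What most people don’t realize is that prioritization in HTTP/2 is advisory and can change over the life of a stream. A client can send a PRIORITY frame at any time to reprioritize a stream it has already opened. The server, which does the actual scheduling of response frames, may follow those signals, blend them with its own heuristics, or ignore them. When it works well, this ensures critical resources are delivered first even when many requests are active. In practice, implementations were inconsistent enough that the revised HTTP/2 specification (RFC 9113) deprecated this priority scheme, and RFC 9218 defines a simpler replacement based on a Priority header field.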
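One way a server might turn weights into a send order is weighted fair queuing: the stream that has sent the fewest bytes relative to its weight goes next. The Python sketch below is an assumption-laden illustration of that idea, not the algorithm of any specific server. The `schedule` function is hypothetical, and it ignores stream dependencies and flow control entirely.

```python
import heapq


def schedule(streams):
    """Return the send order for queued frames using weighted fair queuing.

    `streams` maps stream_id -> (weight, [frame_bytes, ...]).
    The result is a list of (stream_id, frame_index) pairs.
    """
    # Heap entries: (virtual_time, stream_id, index_of_next_frame).
    heap = [(0.0, sid, 0) for sid, (_, frames) in streams.items() if frames]
    heapq.heapify(heap)
    order = []
    while heap:
        vtime, sid, idx = heapq.heappop(heap)
        weight, frames = streams[sid]
        order.append((sid, idx))
        if idx + 1 < len(frames):
            # A heavier weight advances virtual time more slowly,
            # so that stream is picked more often.
            heapq.heappush(heap, (vtime + len(frames[idx]) / weight, sid, idx + 1))
    return order


# Hypothetical queue: stream 1 (weight 256) has four frames, stream 3 (weight 64) has two.
streams = {
    1: (256, [b"x" * 64] * 4),
    3: (64, [b"y" * 64] * 2),
}

order = schedule(streams)
print([sid for sid, _ in order])  # [1, 3, 1, 1, 1, 3]
```

The lighter stream is not starved: it gets a frame out early, then waits while the heavier stream uses its larger share.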
The next major hurdle to overcome is understanding the implications of TCP HOL blocking and how solutions like QUIC aim to address it.