Streaming with Suspense boundaries in Next.js is more than a way to show pretty loading states: it is a coordinated protocol between server and client for delivering a page in pieces.
Let’s see this in action. Imagine a page that fetches data from two different APIs, one for user profiles (fast) and another for product recommendations (slow).
    // app/page.tsx
    import UserProfile from './UserProfile';
    import ProductRecommendations from './ProductRecommendations';
    import { Suspense } from 'react';

    export default function HomePage() {
      return (
        <div>
          <h1>Welcome!</h1>
          <Suspense fallback={<UserProfile.Skeleton />}>
            <UserProfile userId="123" />
          </Suspense>
          <Suspense fallback={<ProductRecommendations.Skeleton />}>
            <ProductRecommendations userId="123" />
          </Suspense>
        </div>
      );
    }
Here, UserProfile and ProductRecommendations are async Server Components that fetch their own data, and each exposes a Skeleton placeholder as a static property. Each Suspense boundary tells React what to render while the data for its children is still being fetched.
Under the hood, Next.js builds on React Server Components (RSCs) and React’s streaming server renderer. When a user requests /, the server doesn’t wait for all data to be ready. It starts rendering immediately and flushes the static shell: the Welcome! heading plus the fallback for every boundary that is still suspended. The browser can paint that right away. Because the UserProfile data arrives quickly, its HTML follows almost at once and replaces its skeleton.
Meanwhile, the server keeps waiting on the data for ProductRecommendations. Once that data is ready, the server appends another chunk to the same HTTP response: the rendered HTML for ProductRecommendations plus a small inline script that swaps it into the place of its fallback. Alongside the HTML, Next.js inlines the matching RSC payload so React on the client can hydrate the new content. Only that region of the DOM changes, so the user sees the product recommendations appear without the whole page flashing or a generic loading spinner covering everything.
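The flush order can be modelled in a few lines of plain TypeScript. This is a minimal sketch, not Next.js or React code: streamPage, the Boundary type, the write callback, and the `<template>` chunk markup are all hypothetical stand-ins for what the real renderer emits.

```typescript
// Toy model of streaming with Suspense-like boundaries.
// Hypothetical names throughout; the real renderer emits different markup.

interface Boundary {
  id: string;                  // identifies the boundary in the output
  fallback: string;            // markup shown while the content is pending
  load: () => Promise<string>; // resolves to the final markup
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function streamPage(
  shell: string,
  boundaries: Boundary[],
  write: (chunk: string) => void,
): Promise<void> {
  // 1. Flush the static shell with every fallback in place.
  const fallbacks = boundaries
    .map((b) => `<div id="${b.id}">${b.fallback}</div>`)
    .join("");
  write(shell + fallbacks);

  // 2. Start every load at once, then emit each result as soon as it settles.
  const pending = new Map<string, Promise<{ id: string; html: string }>>();
  for (const b of boundaries) {
    pending.set(b.id, b.load().then((html) => ({ id: b.id, html })));
  }
  while (pending.size > 0) {
    const done = await Promise.race(Array.from(pending.values()));
    pending.delete(done.id);
    write(`<template data-for="${done.id}">${done.html}</template>`);
  }
}

// Usage: the fast boundary is flushed before the slow one, regardless of
// the order in which the boundaries are declared.
const demo: Boundary[] = [
  { id: "recs", fallback: "Loading recommendations", load: () => sleep(50).then(() => "<ul>recs</ul>") },
  { id: "profile", fallback: "Loading profile", load: () => sleep(5).then(() => "<p>profile</p>") },
];
void streamPage("<h1>Welcome!</h1>", demo, (chunk) => console.log(chunk));
```

The point of the sketch is the ordering: one chunk for the shell, then one chunk per boundary in completion order, all within a single response.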
The core problem this solves is perceived performance. Traditional server-rendered apps send all HTML at once, meaning the user waits for the slowest data fetch to complete before seeing anything. Client-side rendered apps typically show a loading spinner for the entire page until the JavaScript has executed and all data has been fetched. Streaming breaks the page into independent chunks, so the user can read and interact with the parts that have arrived while other parts are still loading.
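A quick back-of-the-envelope model makes the difference concrete. The latencies below are assumed numbers, not measurements, and the function names are hypothetical. The model ignores network transfer and render time and assumes the fetches run in parallel.

```typescript
// Hypothetical model: at what time (ms) does each section become visible?

function blockingVisibleAt(fetchMs: number[]): number[] {
  // Nothing is sent until the slowest fetch has finished.
  const slowest = Math.max(0, ...fetchMs);
  return fetchMs.map(() => slowest);
}

function streamingVisibleAt(fetchMs: number[]): number[] {
  // Each boundary is flushed as soon as its own fetch has finished.
  return fetchMs.slice();
}

const latencies = [100, 2000]; // assumed: profile API, recommendations API
console.log(blockingVisibleAt(latencies));  // both sections at 2000
console.log(streamingVisibleAt(latencies)); // profile at 100, recommendations at 2000
```

Total completion time is the same in both cases. What streaming changes is how early the fast sections, and the shell around them, become visible.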
The exact levers you control are the Suspense boundaries themselves and the data fetching logic within your Server Components. By wrapping components that fetch data in Suspense, you declare to React that these components might "suspend" (pause rendering) while waiting for data. Next.js’s renderer then knows to stream these components individually.
The real magic happens in how the server communicates these chunks. It’s not just sending HTML. The stream carries two kinds of content: HTML for the parts of the page that are ready, and a serialized description of the rendered Server Component tree for React to use on the client. That second part is the RSC payload, produced by what React internally calls "Flight." It is a compact, row-based text format with JSON-like entries, and it lets React reconcile the component tree on the client without re-rendering the whole page.
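Streaming also shapes how notFound() and redirect() behave inside Server Components. If either is called before anything has been flushed, the server can still answer with a real HTTP status: 404 for notFound(), and a 307 redirect for redirect(). Once streaming has begun, the 200 status line is already on the wire and cannot change. In that case Next.js renders the not-found UI inside the stream and adds a <meta name="robots" content="noindex"> tag, or emits a meta tag that performs the redirect on the client. Either way the user lands on the right page, but crawlers and monitoring tools will see a 200 status.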
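How these signals surface depends on whether any of the response has already been flushed, which a small decision function can make concrete. This is a hedged model of the documented behaviour, not Next.js source code: resolveSignal, Signal, and Outcome are hypothetical names, and the exact markup of the redirect tag is illustrative.

```typescript
// Hypothetical model of how notFound() / redirect() surface to the browser.

type Signal = { kind: "notFound" } | { kind: "redirect"; to: string };

interface Outcome {
  status: number;   // HTTP status the browser receives
  injected: string; // markup appended to the stream, if any
}

function resolveSignal(signal: Signal, headersSent: boolean): Outcome {
  if (!headersSent) {
    // Nothing flushed yet: a real status code is still possible.
    return signal.kind === "notFound"
      ? { status: 404, injected: "" }
      : { status: 307, injected: "" };
  }
  // The 200 status line is already on the wire; fall back to meta tags.
  if (signal.kind === "notFound") {
    return { status: 200, injected: '<meta name="robots" content="noindex">' };
  }
  return {
    status: 200,
    // Illustrative markup only; the real tag may differ.
    injected: `<meta http-equiv="refresh" content="0;url=${signal.to}">`,
  };
}

console.log(resolveSignal({ kind: "notFound" }, false).status); // 404
console.log(resolveSignal({ kind: "notFound" }, true).status);  // 200
```

A practical consequence: a check that must produce a true 404 or redirect status belongs above the first Suspense boundary, before anything is flushed.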
The most surprising thing is that the server doesn’t always send HTML at all. The initial page load is streamed as HTML, with the RSC payload inlined alongside it for hydration. On a client-side navigation, for example through a <Link>, Next.js requests only the RSC payload for the route segments that changed, and React renders it in the browser. Suspense boundaries work the same way there: fallbacks show first, and the suspended content streams in as part of that payload.
The next frontier here is understanding how to manage concurrent rendering and data fetching across multiple independent Suspense boundaries for truly granular loading states.