Stream Server Responses to gRPC Clients (2026)

A gRPC client never receives a stream unprompted: it calls a server-streaming RPC with a single request, and the server answers with a sequence of messages over the connection that is already open.

Let’s see this in action. Imagine you have a service that generates large reports, and you don’t want to wait for the whole thing to be ready before you start sending it.

// report_generator.proto
syntax = "proto3";

service ReportService {
  // "stream" before the response type makes this a server-streaming RPC.
  rpc GenerateReport(ReportRequest) returns (stream ReportChunk);
}

message ReportRequest {
  string report_id = 1;
}

message ReportChunk {
  bytes data = 1;
  int32 chunk_number = 2;
}

On the server side, you’d implement GenerateReport like this:

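// server.go
import (
	"fmt"
	"log"
	"time"

	pb "example.com/reports/reportpb" // hypothetical path to the generated package
)

// reportServer implements pb.ReportServiceServer.
type reportServer struct {
	pb.UnimplementedReportServiceServer
}

// GenerateReport streams ten chunks to the client, then ends the stream.
func (s *reportServer) GenerateReport(req *pb.ReportRequest, stream pb.ReportService_GenerateReportServer) error {
	log.Printf("Generating report for ID: %s", req.GetReportId())
	for i := 0; i < 10; i++ {
		// Simulate generating a chunk of the report.
		chunkData := []byte(fmt.Sprintf("This is chunk %d of report %s\n", i, req.GetReportId()))
		err := stream.Send(&pb.ReportChunk{
			Data:        chunkData,
			ChunkNumber: int32(i),
		})
		if err != nil {
			// Send fails once the client has gone away or the RPC was cancelled.
			log.Printf("Error sending chunk %d: %v", i, err)
			return err
		}
		time.Sleep(100 * time.Millisecond) // Simulate work
	}
	log.Printf("Finished generating report for ID: %s", req.GetReportId())
	return nil // returning nil ends the stream with an OK status
}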

And the client would consume it like this:

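// client.go
import (
	"bytes"
	"context"
	"io"
	"log"
	"time"

	pb "example.com/reports/reportpb" // hypothetical path to the generated package
)

// callGenerateReport requests a report and reads chunks until the server ends the stream.
// log.Fatalf keeps the example short; production code would return the error instead.
func callGenerateReport(client pb.ReportServiceClient, reportID string) {
	// The deadline covers the whole stream, not only the initial call.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	req := &pb.ReportRequest{ReportId: reportID}
	stream, err := client.GenerateReport(ctx, req)
	if err != nil {
		log.Fatalf("could not request report: %v", err)
	}

	log.Printf("Receiving report for ID: %s", reportID)
	var receivedData bytes.Buffer
	for {
		chunk, err := stream.Recv()
		if err == io.EOF {
			break // The server finished cleanly; all chunks received
		}
		if err != nil {
			log.Fatalf("failed to receive chunk: %v", err)
		}
		log.Printf("Received chunk %d, size: %d", chunk.GetChunkNumber(), len(chunk.GetData()))
		receivedData.Write(chunk.GetData())
	}
	log.Printf("Report %s received completely. Total size: %d bytes", reportID, receivedData.Len())
}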

The core problem server streaming solves is incremental delivery over a single, long-lived connection. Instead of making multiple round trips for discrete pieces of data or waiting for one massive payload, the server sends each message as soon as it is available. This matters for real-time analytics, large file transfers, and continuous data feeds, where time to first byte is critical or the total data size is not known up front.

The mental model is a one-way street in each direction: a single request travels from client to server, and then a stream of responses travels from server to client. The client opens the RPC by sending one request message. Once the server has received it, the handler can send back any number of response messages, one after another, on the same HTTP/2 stream. The client keeps calling Recv() until the stream ends. When the handler returns nil, the stream closes with an OK status and the client's next Recv() returns io.EOF; when the handler returns an error, Recv() returns that error as a gRPC status instead.
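The end-of-stream rule can be modeled without gRPC at all. The sketch below is a toy, in-memory stand-in and not the grpc-go API: `toyStream`, `serve`, `drain`, and `errBackend` are hypothetical names invented for illustration. It only shows the contract that a handler returning nil surfaces as io.EOF on the receiving side, while a handler error surfaces as a terminal error. In real gRPC that error reaches the client as a status rather than as the original Go value.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// chunk stands in for the generated pb.ReportChunk message.
type chunk struct{ Number int }

// item is what travels to the receiver: a message or the terminal error.
type item struct {
	msg *chunk
	err error
}

// toyStream is an in-memory stand-in for one server-streaming RPC.
type toyStream struct {
	ch   chan item
	done error // terminal error, once it has been seen
}

// serve runs handler in its own goroutine, roughly the way a gRPC server
// runs a service method, and forwards everything it sends to the stream.
func serve(handler func(send func(*chunk) error) error) *toyStream {
	s := &toyStream{ch: make(chan item)}
	go func() {
		err := handler(func(c *chunk) error {
			s.ch <- item{msg: c}
			return nil
		})
		if err == nil {
			err = io.EOF // a nil return means "finished cleanly"
		}
		s.ch <- item{err: err}
	}()
	return s
}

// Recv returns the next message, or the terminal error after the handler returned.
func (s *toyStream) Recv() (*chunk, error) {
	if s.done != nil {
		return nil, s.done
	}
	it := <-s.ch
	s.done = it.err
	return it.msg, it.err
}

// drain counts messages until the stream ends; io.EOF is reported as a nil error.
func drain(s *toyStream) (int, error) {
	n := 0
	for {
		_, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

// errBackend is a hypothetical failure used by the second handler.
var errBackend = errors.New("backend unavailable")

func main() {
	clean := serve(func(send func(*chunk) error) error {
		send(&chunk{0})
		send(&chunk{1})
		return nil // clean end of stream
	})
	fmt.Println(drain(clean))

	failed := serve(func(send func(*chunk) error) error {
		send(&chunk{0})
		return errBackend // stream ends with an error
	})
	fmt.Println(drain(failed))
}
```

The receiver never needs a separate "done" message: the terminal error carries that information, which is why the client loop checks for io.EOF before any other error.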

You opt in by placing the stream keyword before the response type in your .proto file. This tells gRPC that the server will send a sequence of messages back to the client rather than a single response. The generated client stub hides the framing and gives you an iterator-like method, stream.Recv(), that you call repeatedly until it returns io.EOF. On the server side, the handler receives a generated stream value (pb.ReportService_GenerateReportServer in the Go code above) whose Send() method pushes one message at a time to the client.
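Because the receive loop only ever calls Recv(), it can be written against a one-method interface and exercised without a network. The following is a minimal sketch under that assumption: `reportChunk`, `chunkReceiver`, `collectReport`, and `sliceReceiver` are hypothetical names, and `reportChunk` is a local stand-in for the generated pb.ReportChunk. In a real project the interface would use the generated message type, so that the generated client stream satisfies it.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// reportChunk stands in for the generated pb.ReportChunk (hypothetical local type).
type reportChunk struct {
	Data        []byte
	ChunkNumber int32
}

// chunkReceiver is the only capability the receive loop needs from a stream.
type chunkReceiver interface {
	Recv() (*reportChunk, error)
}

// collectReport drains r and returns the concatenated chunk data.
// io.EOF marks a clean end of stream; any other error aborts the read.
func collectReport(r chunkReceiver) ([]byte, error) {
	var buf bytes.Buffer
	for {
		chunk, err := r.Recv()
		if errors.Is(err, io.EOF) {
			return buf.Bytes(), nil
		}
		if err != nil {
			return nil, fmt.Errorf("receive chunk: %w", err)
		}
		buf.Write(chunk.Data)
	}
}

// sliceReceiver is a test double that replays a fixed list of chunks.
type sliceReceiver struct {
	chunks []*reportChunk
	next   int
}

// Recv returns the next stored chunk, then io.EOF once the list is exhausted.
func (s *sliceReceiver) Recv() (*reportChunk, error) {
	if s.next >= len(s.chunks) {
		return nil, io.EOF
	}
	c := s.chunks[s.next]
	s.next++
	return c, nil
}

func main() {
	fake := &sliceReceiver{chunks: []*reportChunk{
		{Data: []byte("alpha "), ChunkNumber: 0},
		{Data: []byte("beta"), ChunkNumber: 1},
	}}
	data, err := collectReport(fake)
	fmt.Printf("%q %v\n", data, err)
}
```

Returning the error instead of calling log.Fatalf is a deliberate choice here: it lets the caller decide whether a broken stream should be retried, reported, or ignored.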

The underlying HTTP/2 connection is multiplexed. Each RPC, streaming or not, runs on its own HTTP/2 stream, and many such streams share a single TCP connection, with their frames interleaved by HTTP/2. A long-running report stream therefore does not block other calls on the same channel, and no new connection has to be established for each piece of data.

The most surprising thing about gRPC streaming is that the "stream" is initiated by the client’s request. It’s not a bidirectional channel where both client and server can send independent messages at any time; it’s a server-streaming RPC where the client sends one request, and the server sends back a sequence of responses. This is distinct from bidirectional streaming RPCs, where both sides can send messages independently after the initial setup.

The next hurdle you’ll run into is handling errors gracefully within the stream, especially when the client needs to cancel the operation mid-stream.
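As a starting point for the cancellation case, a handler can watch its context between chunks and stop as soon as the client gives up. The sketch below uses only the standard library: `produceChunks` and `cancelAfter` are hypothetical helpers, and the `send` callback stands in for stream.Send. In a grpc-go handler the context to watch would be stream.Context(), which is cancelled when the client cancels the call or its deadline expires.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// produceChunks generates up to total chunks, spending interval of simulated
// work on each, and stops early once ctx is cancelled. It returns how many
// chunks were sent and the reason it stopped (nil if all were sent).
func produceChunks(ctx context.Context, total int, interval time.Duration, send func(n int) error) (int, error) {
	for i := 0; i < total; i++ {
		// Check first, so an already-cancelled context never starts new work.
		if err := ctx.Err(); err != nil {
			return i, err
		}
		// Do the (simulated) work, but give up if cancellation arrives meanwhile.
		select {
		case <-ctx.Done():
			return i, ctx.Err()
		case <-time.After(interval):
		}
		if err := send(i); err != nil {
			return i, err
		}
	}
	return total, nil
}

// cancelAfter simulates a client that cancels right after receiving n chunks
// (n >= 1) out of a ten-chunk report.
func cancelAfter(n int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return produceChunks(ctx, 10, time.Millisecond, func(i int) error {
		if i == n-1 {
			cancel() // the client walks away mid-stream
		}
		return nil
	})
}

func main() {
	sent, err := cancelAfter(3)
	fmt.Printf("sent %d chunks before stopping: %v\n", sent, err)
}
```

Compared with a bare time.Sleep between sends, the select lets the handler release its resources promptly instead of finishing work nobody will read.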