gRPC requests don’t just disappear into the ether; they leave a trail of breadcrumbs that, if you know where to look, can tell you the whole story of their journey.
Let’s see it in action. Imagine a simple gRPC service that greets users.
// proto/greeter.proto
syntax = "proto3";
package greeter;
service Greeter {
rpc SayHello (HelloRequest) returns (HelloReply) {}
}
message HelloRequest {
string name = 1;
}
message HelloReply {
string message = 1;
}
Here’s a basic Go server and client that uses this service:
Server (server/main.go):
package main
import (
"context"
"fmt"
"log"
"net"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
pb "your_module/proto" // Replace with your module path
)
type server struct {
pb.UnimplementedGreeterServer
}
func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
log.Printf("Received: %v", in.GetName())
if in.GetName() == "error_user" {
return nil, status.Errorf(codes.InvalidArgument, "cannot greet error_user")
}
return &pb.HelloReply{Message: fmt.Sprintf("Hello %s!", in.GetName())}, nil
}
func main() {
lis, err := net.Listen("tcp", ":50051")
if err != nil {
log.Fatalf("failed to listen: %v", err)
}
s := grpc.NewServer()
pb.RegisterGreeterServer(s, &server{})
log.Println("Server listening on :50051")
if err := s.Serve(lis); err != nil {
log.Fatalf("failed to serve: %v", err)
}
}
Client (client/main.go):
package main
import (
"context"
"log"
"time"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
pb "your_module/proto" // Replace with your module path
)
const (
address = "localhost:50051"
)
func main() {
conn, err := grpc.Dial(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
log.Fatalf("did not connect: %v", err)
}
defer conn.Close()
c := pb.NewGreeterClient(conn)
name := "World"
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
r, err := c.SayHello(ctx, &pb.HelloRequest{Name: name})
if err != nil {
log.Printf("could not greet: %v", err)
} else {
log.Printf("Greeting: %s", r.GetMessage())
}
// Example of an error case
nameErr := "error_user"
ctxErr, cancelErr := context.WithTimeout(context.Background(), time.Second)
defer cancelErr()
_, err = c.SayHello(ctxErr, &pb.HelloRequest{Name: nameErr})
if err != nil {
log.Printf("expected error for %s: %v", nameErr, err)
}
}
Running this, you’d see logs on the server like Received: World and on the client Greeting: Hello World!. The error case would show expected error for error_user: rpc error: code = InvalidArgument desc = cannot greet error_user.
The problem is that log.Printf is just a single point. To understand the flow, especially in complex systems with multiple services, you need to connect these dots. This is where distributed tracing comes in. gRPC has built-in support for tracing, which allows you to see the entire path of a request as it traverses your microservices.
The core idea is to propagate a unique trace ID and span IDs across service boundaries. When a service receives a request, it creates a new span representing its work and links it to the parent span from the incoming request. This creates a directed acyclic graph (DAG) of your request’s execution.
The key components you’ll interact with are:
- Trace ID: A unique identifier for an entire end-to-end request.
- Span ID: A unique identifier for an operation within a trace (e.g., a single gRPC call, a database query).
- Parent Span ID: The ID of the span that initiated the current span.
- Span Context: The combination of Trace ID, Span ID, and other metadata that gets propagated.
To enable this, you typically use a tracing backend like Jaeger, Zipkin, or OpenTelemetry. gRPC integrates with these via interceptors.
Let’s instrument our Go example with OpenTelemetry, a popular choice for distributed tracing.
First, add the necessary OpenTelemetry Go SDK packages:
go get go.opentelemetry.io/otel \
go.opentelemetry.io/otel/attribute \
go.opentelemetry.io/otel/exporters/stdout/stdouttrace \
go.opentelemetry.io/otel/propagation \
go.opentelemetry.io/otel/sdk/resource \
go.opentelemetry.io/otel/sdk/trace \
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
Server (server/main.go - modified):
package main
import (
"context"
"fmt"
"log"
"net"
"os"
"time"
"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
"go.opentelemetry.io/otel/trace"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
pb "your_module/proto" // Replace with your module path
)
var tracer trace.Tracer
func initTracer() (*sdktrace.TracerProvider, error) {
ctx := context.Background()
exporter, err := stdouttrace.New(
stdouttrace.WithPrettyPrint(),
stdouttrace.WithWriter(os.Stdout), // Export to stdout for demonstration
)
if err != nil {
return nil, err
}
tp := sdktrace.NewTracerProvider(
sdktrace.WithSampler(sdktrace.AlwaysSample()),
sdktrace.WithBatcher(exporter,
sdktrace.WithMaxExportBatchSize(sdktrace.DefaultMaxExportBatchSize)),
sdktrace.WithResource(resource.NewWithAttributes(
resource.SchemaURL,
attribute.String("service.name", "greeter-server"),
attribute.String("environment", "demo"),
)),
)
otel.SetTracerProvider(tp)
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
tracer = tp.Tracer("greeter-server-tracer")
return tp, nil
}
type server struct {
pb.UnimplementedGreeterServer
}
func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
ctx, span := tracer.Start(ctx, "SayHello",
trace.WithSpanKind(trace.SpanKindServer),
trace.WithAttributes(attribute.String("request.name", in.GetName())),
)
defer span.End()
log.Printf("Received: %v", in.GetName())
// Simulate some work
time.Sleep(10 * time.Millisecond)
if in.GetName() == "error_user" {
span.SetStatus(codes.Error, "invalid user name")
return nil, status.Errorf(codes.InvalidArgument, "cannot greet error_user")
}
message := fmt.Sprintf("Hello %s!", in.GetName())
span.SetAttributes(attribute.String("response.message", message))
return &pb.HelloReply{Message: message}, nil
}
func main() {
tp, err := initTracer()
if err != nil {
log.Fatalf("failed to initialize tracer: %v", err)
}
defer func() {
if err := tp.Shutdown(context.Background()); err != nil {
log.Printf("Error shutting down tracer provider: %v", err)
}
}()
lis, err := net.Listen("tcp", ":50051")
if err != nil {
log.Fatalf("failed to listen: %v", err)
}
s := grpc.NewServer(
grpc.StatsHandler(otelgrpc.NewServerStatsHandler()), // Add OpenTelemetry stats handler
grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()), // Add OpenTelemetry interceptor
)
pb.RegisterGreeterServer(s, &server{})
log.Println("Server listening on :50051")
if err := s.Serve(lis); err != nil {
log.Fatalf("failed to serve: %v", err)
}
}
Client (client/main.go - modified):
package main
import (
"context"
"log"
"os"
"time"
"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
"go.opentelemetry.io/otel/trace"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
pb "your_module/proto" // Replace with your module path
)
var tracer trace.Tracer
func initTracer() (*sdktrace.TracerProvider, error) {
ctx := context.Background()
exporter, err := stdouttrace.New(
stdouttrace.WithPrettyPrint(),
stdouttrace.WithWriter(os.Stdout), // Export to stdout for demonstration
)
if err != nil {
return nil, err
}
tp := sdktrace.NewTracerProvider(
sdktrace.WithSampler(sdktrace.AlwaysSample()),
sdktrace.WithBatcher(exporter,
sdktrace.WithMaxExportBatchSize(sdktrace.DefaultMaxExportBatchSize)),
sdktrace.WithResource(resource.NewWithAttributes(
resource.SchemaURL,
attribute.String("service.name", "greeter-client"),
attribute.String("environment", "demo"),
)),
)
otel.SetTracerProvider(tp)
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
tracer = tp.Tracer("greeter-client-tracer")
return tp, nil
}
const (
address = "localhost:50051"
)
func main() {
tp, err := initTracer()
if err != nil {
log.Fatalf("failed to initialize tracer: %v", err)
}
defer func() {
if err := tp.Shutdown(context.Background()); err != nil {
log.Printf("Error shutting down tracer provider: %v", err)
}
}()
// Set up a connection to the server.
conn, err := grpc.Dial(address,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithStatsHandler(otelgrpc.NewClientStatsHandler()), // Add OpenTelemetry stats handler
grpc.WithUnaryClientInterceptor(otelgrpc.UnaryClientInterceptor()), // Add OpenTelemetry interceptor
)
if err != nil {
log.Fatalf("did not connect: %v", err)
}
defer conn.Close()
c := pb.NewGreeterClient(conn)
name := "World"
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
// Start a span for the client's operation
ctx, span := tracer.Start(ctx, "SayHelloOperation",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(attribute.String("request.name", name)),
)
defer span.End()
r, err := c.SayHello(ctx, &pb.HelloRequest{Name: name})
if err != nil {
log.Printf("could not greet: %v", err)
span.RecordError(err) // Record the error in the span
} else {
log.Printf("Greeting: %s", r.GetMessage())
span.SetAttributes(attribute.String("response.message", r.GetMessage()))
}
// Example of an error case
nameErr := "error_user"
ctxErr, cancelErr := context.WithTimeout(context.Background(), time.Second)
defer cancelErr()
ctxErr, spanErr := tracer.Start(ctxErr, "SayHelloOperationError",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(attribute.String("request.name", nameErr)),
)
defer spanErr.End()
_, err = c.SayHello(ctxErr, &pb.HelloRequest{Name: nameErr})
if err != nil {
log.Printf("expected error for %s: %v", nameErr, err)
spanErr.RecordError(err) // Record the error in the span
}
}
When you run these, the stdouttrace exporter will print the trace data directly to your console. You’ll see JSON output representing the spans. Each request will have a traceId. The client’s SayHelloOperation span will have a spanId, and the server’s SayHello span will have a different spanId but will reference the client’s spanId as its parentSpanId. This linkage is what builds the trace.
The otelgrpc interceptors automatically handle injecting and extracting the SpanContext into/from the gRPC metadata. This means you don’t have to manually manage passing trace information. The trace.Context in Go is the mechanism by which this context is carried.
The resource.NewWithAttributes part is crucial for identifying your services. In a real-world scenario, you’d configure exporters to send data to a centralized tracing backend (like Jaeger or Zipkin) instead of stdouttrace. There, you could visualize these traces, filter by service, error codes, or duration, and pinpoint performance bottlenecks or failures across your distributed system.
The server’s otelgrpc.NewServerStatsHandler() and otelgrpc.UnaryServerInterceptor() are what hook into the gRPC server lifecycle to create and manage server-side spans. Similarly, the client uses otelgrpc.NewClientStatsHandler() and otelgrpc.UnaryClientInterceptor() for client-side tracing.
The trace.WithSpanKind(trace.SpanKindServer) and trace.WithSpanKind(trace.SpanKindClient) tell the tracing system whether this span represents work done on a server or a client. This is important for visualizing the flow correctly.
The otel.SetTextMapPropagator ensures that trace context information embedded in request headers (like traceparent) is correctly read by the server and injected into outgoing requests from the client.
You can now see the duration of each gRPC call, the attributes associated with it (like the name being greeted), and if an error occurred, that will be clearly marked on the span. This provides a much richer view than simple logs.
The next step is to explore how to configure different sampling strategies to avoid overwhelming your tracing backend with data, especially in high-throughput systems.