You can get distributed tracing for your GKE applications with Cloud Trace, but it’s not as simple as just flipping a switch; you’re actually building a pipeline that captures and visualizes requests as they hop between your services.
Let’s see this in action. Imagine a user request hitting a frontend service, which then calls an authentication service, and finally a user profile service. Without tracing, if the request is slow, you’re blind. With tracing, you’ll see a single trace in Cloud Trace, with each service call represented as a span, showing how long each hop took and how they relate.
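To make the trace and span vocabulary concrete, here is a minimal sketch in plain Python (no tracing library) that models the request above as one trace with three spans. The `Span` class, the span IDs, and the timings are invented for illustration, and it assumes the frontend calls the two backend services one after the other; real spans come from instrumentation, not from hand-built objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation inside a trace (illustrative, not an OpenTelemetry type)."""
    name: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render(spans: list[Span]) -> list[str]:
    """Return one indented line per span, with children listed under their parent."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit siblings in start order so the output reads like a timeline.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name}: {span.duration_ms} ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical timings for the frontend -> auth -> profile request.
trace_spans = [
    Span("frontend", "a1", None, 0, 480),
    Span("auth-service", "b2", "a1", 20, 95),
    Span("user-profile-service", "c3", "a1", 100, 460),
]

for line in render(trace_spans):
    print(line)
```

With these made-up numbers, the profile call accounts for 360 of the 480 ms, which is exactly the kind of question a trace view answers at a glance.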
Here’s the setup you’ll need:
- Enable the Cloud Trace API:
  - Go to the Google Cloud Console -> APIs & Services -> Library.
  - Search for "Cloud Trace API" and enable it for your project. This is the foundational step: it allows your Google Cloud project to receive trace data.
- Instrument Your Applications: This is where the magic happens. You need to add libraries to your application code that generate trace spans. The OpenTelemetry SDKs are the modern, vendor-neutral way to do this.
  - For Go: Add `go.opentelemetry.io/otel`, `go.opentelemetry.io/otel/sdk`, and `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` to your `go.mod`.

        import (
            "context"
            "log"
            "time"

            "go.opentelemetry.io/otel"
            "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
            "go.opentelemetry.io/otel/sdk/resource"
            "go.opentelemetry.io/otel/sdk/trace"
            semconv "go.opentelemetry.io/otel/semconv/v1.10.0"
        )

        // initTracer installs a global TracerProvider that exports spans over
        // OTLP/gRPC and returns a function that flushes and shuts it down.
        func initTracer(ctx context.Context) func() {
            // Replace with your GKE cluster's collector endpoint
            endpoint := "your-collector-service.your-namespace.svc.cluster.local:4317"

            res, err := resource.New(ctx,
                resource.WithAttributes(
                    semconv.ServiceNameKey.String("my-go-service"),
                    semconv.DeploymentEnvironmentKey.String("production"),
                    semconv.CloudPlatformKey.String("gke"),
                    semconv.CloudRegionKey.String("us-central1"), // Or your region
                ),
            )
            if err != nil {
                log.Fatalf("failed to create resource: %v", err)
            }

            clientOpts := []otlptracegrpc.Option{
                otlptracegrpc.WithEndpoint(endpoint),
                // Plaintext gRPC; assumes the in-cluster collector does not terminate TLS.
                otlptracegrpc.WithInsecure(),
                otlptracegrpc.WithReconnectionPeriod(5 * time.Second),
                otlptracegrpc.WithTimeout(10 * time.Second), // Per-export timeout
            }
            exporter, err := otlptracegrpc.New(ctx, clientOpts...)
            if err != nil {
                log.Fatalf("failed to create OTLP exporter: %v", err)
            }

            tp := trace.NewTracerProvider(
                trace.WithBatcher(exporter),
                trace.WithResource(res),
            )
            otel.SetTracerProvider(tp)

            return func() {
                // Flush any buffered spans before the process exits.
                if err := tp.Shutdown(context.Background()); err != nil {
                    log.Fatalf("failed to shutdown TracerProvider: %v", err)
                }
            }
        }

    This code initializes an OpenTelemetry tracer provider. It configures an OTLP (OpenTelemetry Protocol) exporter to send traces over gRPC to the specified endpoint (your collector) and sets resource attributes that identify your service within GKE and Google Cloud. The project ID is not set here; the collector’s exporter configuration decides which project receives the traces.
  - For Java (Spring Boot): Attach `io.opentelemetry.javaagent:opentelemetry-javaagent` to your JVM as a Java agent.

        java -javaagent:/path/to/opentelemetry-javaagent.jar \
          -Dotel.exporter.otlp.protocol=grpc \
          -Dotel.exporter.otlp.endpoint=http://your-collector-service.your-namespace.svc.cluster.local:4317 \
          -Dotel.resource.attributes=service.name=my-java-service,cloud.platform=gke,cloud.region=us-central1 \
          -jar my-app.jar

    The Java agent automatically instruments many common libraries (HTTP clients, JDBC, etc.) and can be configured to export OTLP traces to your collector. The endpoint must be a full URL including the `http://` or `https://` scheme, and the protocol is set to gRPC explicitly because port 4317 is the collector’s gRPC port.
  - For Python: Install `opentelemetry-api`, `opentelemetry-sdk`, `opentelemetry-exporter-otlp`, and `opentelemetry-instrumentation-requests`.

        from opentelemetry import trace
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
        from opentelemetry.instrumentation.requests import RequestsInstrumentor
        from opentelemetry.sdk.resources import Resource
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor

        # Replace with your GKE cluster's collector endpoint
        collector_endpoint = "your-collector-service.your-namespace.svc.cluster.local:4317"

        resource = Resource(attributes={
            "service.name": "my-python-service",
            "deployment.environment": "production",
            "cloud.platform": "gke",
            "cloud.region": "us-central1",  # Or your region
        })

        provider = TracerProvider(resource=resource)
        # insecure=True sends plaintext gRPC; assumes the in-cluster collector does not use TLS
        span_processor = BatchSpanProcessor(
            OTLPSpanExporter(endpoint=collector_endpoint, insecure=True)
        )
        provider.add_span_processor(span_processor)
        trace.set_tracer_provider(provider)

        # Trace outgoing HTTP calls made with the requests library
        RequestsInstrumentor().instrument()

    This Python code sets up the OpenTelemetry SDK with an OTLP exporter. The `RequestsInstrumentor` automatically adds tracing to outgoing HTTP requests made by the `requests` library.
- Deploy an OpenTelemetry Collector: This is the crucial intermediary that receives spans from your applications and forwards them to Cloud Trace.
  - Deployment: You’ll typically deploy the collector as a `Deployment` in GKE.

        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: otel-collector
          namespace: tracing # Or your preferred namespace
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: otel-collector
          template:
            metadata:
              labels:
                app: otel-collector
            spec:
              containers:
                - name: otel-collector
                  image: otel/opentelemetry-collector-contrib:0.70.0 # Use a recent version
                  ports:
                    - name: otlp-grpc
                      containerPort: 4317
                    - name: otlp-http
                      containerPort: 4318
                    - name: health
                      containerPort: 13133 # Default health_check port; only served if that extension is enabled
                  # Add a volumeMount for config if you're using a config file

  - Configuration (`otel-collector-config.yaml`): This tells the collector how to receive, process, and export data.

        receivers:
          otlp:
            protocols:
              grpc:
              http:

        processors:
          batch:
          memory_limiter:
            check_interval: 1s
            limit_mib: 200
            spike_limit_mib: 100
          resource:
            attributes:
              - key: cloud.provider
                value: gcp
                action: insert
              - key: cloud.platform
                value: gke
                action: insert
              - key: cloud.region
                value: us-central1 # Match your GKE region
                action: insert

        exporters:
          googlecloud:
            project: your-gcp-project-id # Replace with your Project ID
            timeout: 10s
            # If using a private GKE cluster, you might need to configure authentication,
            # e.g., using Workload Identity or a service account key.
            # For public clusters with default service accounts, it might work out-of-the-box.

        service:
          pipelines:
            traces:
              receivers: [otlp]
              processors: [memory_limiter, batch, resource]
              exporters: [googlecloud]

    This config defines OTLP receivers, a batch processor to group spans, and, crucially, a `googlecloud` exporter that sends traces to Cloud Trace. The `resource` processor adds common cloud attributes if your application didn’t already provide them (`insert` only writes a key that is missing).

  - Service and ServiceAccount: Expose the collector using a `Service` and ensure it has permissions to write to Cloud Trace.

        apiVersion: v1
        kind: Service
        metadata:
          name: otel-collector-service
          namespace: tracing
        spec:
          selector:
            app: otel-collector
          ports:
            - name: otlp-grpc
              port: 4317
              targetPort: 4317
            - name: otlp-http
              port: 4318
              targetPort: 4318
          type: ClusterIP # Or LoadBalancer if you need external access (not recommended for internal tracing)

    With these names, applications reach the collector at `otel-collector-service.tracing.svc.cluster.local:4317`.

    If your GKE nodes don’t have a default service account with the `cloud-platform` scope, or if you’re using Workload Identity, you’ll need to create a `ServiceAccount` for the collector and give it the `roles/cloudtrace.agent` IAM role.

        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: otel-collector-sa
          namespace: tracing
          annotations:
            # Workload Identity only. The Google service account name below is a placeholder.
            iam.gke.io/gcp-service-account: otel-collector@your-gcp-project-id.iam.gserviceaccount.com

    Note that `roles/cloudtrace.agent` is a Google Cloud IAM role, not a Kubernetes RBAC role, so a `ClusterRole` or `ClusterRoleBinding` cannot grant it. Grant the role in IAM to the Google service account the collector runs as; with Workload Identity, also allow `otel-collector-sa` to act as that Google service account by granting it `roles/iam.workloadIdentityUser`. Then, in your `otel-collector` Deployment, add `serviceAccountName: otel-collector-sa`.
- View Traces in Cloud Trace:
  - Go to the Google Cloud Console -> Trace -> Explorer.
  - You should start seeing traces appear within a minute or two. You can filter by service name, latency, and other attributes.
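If no traces show up, one thing worth ruling out is the collector address configured in the applications. The sketch below uses only the Python standard library to build the in-cluster DNS name from a Service name and namespace and to check that the OTLP gRPC port accepts TCP connections. The helper names `collector_endpoint` and `is_reachable` are hypothetical, the default cluster domain `cluster.local` is assumed, and a successful TCP connect only shows that the port is open, not that spans are being accepted.

```python
import socket


def collector_endpoint(service: str, namespace: str, port: int = 4317) -> str:
    """Build the cluster-internal host:port for a Kubernetes Service.

    Assumes the default cluster domain, cluster.local.
    """
    return f"{service}.{namespace}.svc.cluster.local:{port}"


def is_reachable(endpoint: str, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from a pod inside the cluster, `is_reachable(collector_endpoint("otel-collector-service", "tracing"))` should return `True` once the collector Service from the setup steps is up.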
The key insight is that Cloud Trace itself doesn’t generate the traces; it’s a backend for collecting and visualizing them. Your applications and the OpenTelemetry collector are the active participants in the tracing pipeline.
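Part of being an active participant is passing trace context from one service to the next, which is what lets the backend stitch spans from different services into a single trace. OpenTelemetry’s default propagator uses the W3C Trace Context `traceparent` HTTP header, and the instrumentation libraries handle it for you, so the following standard-library sketch is only meant to show what travels on the wire. The function and class names are hypothetical, and only version `00` of the header is handled.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "00", 16-byte trace ID, 8-byte parent span ID, 1-byte flags (lowercase hex)
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # All-zero IDs are invalid per the W3C spec
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"


# Example header value taken from the W3C Trace Context specification.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
print(outgoing)
```

The trace ID stays constant across every hop while each service contributes its own span ID, which is how the three services in the earlier example end up in one trace.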
Once you have tracing set up, the next logical step is to start correlating traces with logs. You’ll want to ensure your logs contain trace IDs and span IDs so you can jump directly from a slow trace segment to the detailed logs generated by that specific operation.
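As a rough sketch of what that can look like: when logs are written as JSON, Cloud Logging recognizes the special fields `logging.googleapis.com/trace` and `logging.googleapis.com/spanId` and uses them to link a log entry to a trace. The formatter below uses only the Python standard library; the class name, the `trace_id` and `span_id` record attributes, the logger name, and the project ID are placeholders, and in a real service the IDs would be read from the current span rather than passed by hand.

```python
import json
import logging
import sys


class TraceJsonFormatter(logging.Formatter):
    """Format records as one JSON object per line with Cloud Logging trace fields."""

    def __init__(self, project_id: str) -> None:
        super().__init__()
        self.project_id = project_id

    def format(self, record: logging.LogRecord) -> str:
        entry = {"severity": record.levelname, "message": record.getMessage()}
        # trace_id / span_id are expected to arrive via logging's `extra=` argument.
        trace_id = getattr(record, "trace_id", None)
        span_id = getattr(record, "span_id", None)
        if trace_id:
            entry["logging.googleapis.com/trace"] = (
                f"projects/{self.project_id}/traces/{trace_id}"
            )
        if span_id:
            entry["logging.googleapis.com/spanId"] = span_id
        return json.dumps(entry)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(TraceJsonFormatter("your-gcp-project-id"))
logger = logging.getLogger("user-profile-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "loaded user profile",
    extra={
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "00f067aa0ba902b7",
    },
)
```

Records logged without the two extra attributes are still emitted, just without the trace fields, so the formatter is safe to install before every code path is instrumented.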