You can get distributed tracing for your GKE applications with Cloud Trace, but it’s not as simple as just flipping a switch; you’re actually building a pipeline that captures and visualizes requests as they hop between your services.

Let’s see this in action. Imagine a user request hitting a frontend service, which then calls an authentication service, and finally a user profile service. Without tracing, if the request is slow, you’re blind. With tracing, you’ll see a single trace in Cloud Trace, with each service call represented as a span, showing how long each hop took and how they relate.
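To make the trace and span relationship concrete, here is a minimal, library-free sketch of the frontend -> auth -> profile request described above. The `Span` class, the span IDs, and all timings are hypothetical illustrations, not the actual Cloud Trace or OpenTelemetry data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One timed operation inside a trace (hypothetical, simplified model)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def slowest_child(spans: List[Span], parent_id: str) -> Span:
    """Return the direct child of `parent_id` that took the longest."""
    children = [s for s in spans if s.parent_id == parent_id]
    return max(children, key=lambda s: s.duration_ms)


# Illustrative data only: one trace, three spans sharing a trace ID.
TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"
spans = [
    Span(TRACE_ID, "a1", None, "frontend: GET /profile", 0, 480),
    Span(TRACE_ID, "b2", "a1", "auth: verify token", 10, 60),
    Span(TRACE_ID, "c3", "a1", "profile: load user", 65, 470),
]

culprit = slowest_child(spans, "a1")
print(f"{culprit.name} took {culprit.duration_ms} ms")
```

Every span carries the same trace ID, and the parent ID is what lets a backend draw the hops as a tree and attribute latency to a specific service.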

Here’s the setup you’ll need:

  1. Enable the Cloud Trace API:

    • Go to the Google Cloud Console -> APIs & Services -> Library.
    • Search for "Cloud Trace API" and enable it for your project. This is the foundational step, allowing your Google Cloud project to receive trace data.
  2. Instrument Your Applications: Add libraries to your application code that generate trace spans. The OpenTelemetry SDKs are the modern, vendor-neutral way to do this.

    • For Go: Add go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, and go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc to your go.mod.

      import (
          "context"
          "log"
          "time"

          "go.opentelemetry.io/otel"
          "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
          "go.opentelemetry.io/otel/sdk/resource"
          "go.opentelemetry.io/otel/sdk/trace"
          semconv "go.opentelemetry.io/otel/semconv/v1.10.0"
      )

      // initTracer registers a global TracerProvider that exports spans over
      // OTLP/gRPC and returns a function that flushes and shuts it down.
      func initTracer(ctx context.Context) func() {
          // Replace with your GKE cluster's collector endpoint.
          // The project ID is configured on the collector's exporter, not here.
          endpoint := "your-collector-service.your-namespace.svc.cluster.local:4317"

          res, err := resource.New(ctx,
              resource.WithAttributes(
                  semconv.ServiceNameKey.String("my-go-service"),
                  semconv.DeploymentEnvironmentKey.String("production"),
                  semconv.CloudPlatformKey.String("gcp_kubernetes_engine"),
                  semconv.CloudRegionKey.String("us-central1"), // Or your region
              ),
          )
          if err != nil {
              log.Fatalf("failed to create resource: %v", err)
          }

          clientOpts := []otlptracegrpc.Option{
              otlptracegrpc.WithEndpoint(endpoint),
              // The exporter uses TLS by default. This assumes the in-cluster
              // collector listens in plaintext; remove it if yours serves TLS.
              otlptracegrpc.WithInsecure(),
              otlptracegrpc.WithReconnectionPeriod(5 * time.Second),
              otlptracegrpc.WithTimeout(10 * time.Second), // Per-export timeout
          }

          exporter, err := otlptracegrpc.New(ctx, clientOpts...)
          if err != nil {
              log.Fatalf("failed to create OTLP exporter: %v", err)
          }

          tp := trace.NewTracerProvider(
              trace.WithBatcher(exporter),
              trace.WithResource(res),
          )
          otel.SetTracerProvider(tp)

          // Call the returned function on exit so buffered spans are flushed.
          return func() {
              shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
              defer cancel()
              if err := tp.Shutdown(shutdownCtx); err != nil {
                  log.Printf("failed to shutdown TracerProvider: %v", err)
              }
          }
      }
      

      This code initializes an OpenTelemetry tracer. It configures an OTLP (OpenTelemetry Protocol) exporter to send traces over gRPC to a specified endpoint (your collector). It also sets up resource attributes to identify your service within GKE and Google Cloud.

    • For Java (Spring Boot): Add io.opentelemetry.javaagent:opentelemetry-javaagent as a Java agent to your JVM.

      java -javaagent:/path/to/opentelemetry-javaagent.jar \
           -Dotel.exporter.otlp.protocol=grpc \
           -Dotel.exporter.otlp.endpoint=http://your-collector-service.your-namespace.svc.cluster.local:4317 \
           -Dotel.resource.attributes=service.name=my-java-service,cloud.platform=gcp_kubernetes_engine,cloud.region=us-central1 \
           -jar my-app.jar
      

      The Java agent automatically instruments many common libraries (HTTP clients, JDBC, etc.) and can be configured to export OTLP traces to your collector.

    • For Python: Install opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp, and opentelemetry-instrumentation-requests.

      from opentelemetry import trace
      from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
      from opentelemetry.instrumentation.requests import RequestsInstrumentor
      from opentelemetry.sdk.resources import Resource
      from opentelemetry.sdk.trace import TracerProvider
      from opentelemetry.sdk.trace.export import BatchSpanProcessor

      # Replace with your GKE cluster's collector endpoint.
      # The project ID is configured on the collector's exporter, not here.
      collector_endpoint = "your-collector-service.your-namespace.svc.cluster.local:4317"

      resource = Resource(attributes={
          "service.name": "my-python-service",
          "deployment.environment": "production",
          "cloud.platform": "gcp_kubernetes_engine",
          "cloud.region": "us-central1",  # Or your region
      })

      provider = TracerProvider(resource=resource)
      # insecure=True assumes the in-cluster collector listens in plaintext;
      # drop it if your collector serves TLS.
      exporter = OTLPSpanExporter(endpoint=collector_endpoint, insecure=True)
      provider.add_span_processor(BatchSpanProcessor(exporter))
      trace.set_tracer_provider(provider)

      # Trace outgoing HTTP calls made with the requests library
      RequestsInstrumentor().instrument()
      

      This Python code sets up the OpenTelemetry SDK with an OTLP exporter. The RequestsInstrumentor automatically adds tracing to outgoing HTTP requests made by the requests library.
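Instrumentation links spans across services by passing trace context along with each outgoing request; by default OpenTelemetry does this with the W3C Trace Context `traceparent` HTTP header. The sketch below illustrates that header format using only the standard library. The helper names (`parse_traceparent`, `child_traceparent`) and the sample header value are hypothetical; in a real service the SDK's propagators handle this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse a traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid according to the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"


# Sample header as a frontend might send it to the auth service.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
print(outgoing)
```

Because the trace ID survives every hop while the span ID changes, the backend can stitch spans from different services into one trace.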

  3. Deploy an OpenTelemetry Collector: This is the crucial intermediary that receives spans from your applications and forwards them to Cloud Trace.

    • Deployment: You’ll typically deploy the collector as a Deployment in GKE.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: otel-collector
        namespace: tracing # Or your preferred namespace
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: otel-collector
        template:
          metadata:
            labels:
              app: otel-collector
          spec:
            containers:
            - name: otel-collector
              image: otel/opentelemetry-collector-contrib:0.70.0 # Use a recent version
              ports:
              - name: otlp-grpc
                containerPort: 4317
              - name: otlp-http
                containerPort: 4318
              - name: health
                containerPort: 13133 # Default health_check port; only served if that extension is enabled in the config
              # Add a volumeMount for config if you're using a config file
      
    • Configuration (otel-collector-config.yaml): This tells the collector how to receive, process, and export data.

      receivers:
        otlp:
          protocols:
            grpc:
            http:
      
      processors:
        batch:
        memory_limiter:
          check_interval: 1s
          limit_mib: 200
          spike_limit_mib: 100
        resource:
          attributes:
          - key: cloud.provider
            value: gcp
            action: insert
          - key: cloud.platform
            value: gcp_kubernetes_engine # OpenTelemetry semantic-convention value for GKE
            action: insert
          - key: cloud.region
            value: us-central1 # Match your GKE region
            action: insert
      
      exporters:
        googlecloud:
          project: your-gcp-project-id # Replace with your Project ID
          timeout: 10s
          # The exporter authenticates with Application Default Credentials:
          # Workload Identity if it is set up for the collector's ServiceAccount,
          # otherwise the node's service account.
      
      service:
        pipelines:
          traces:
            receivers: [otlp]
            processors: [memory_limiter, batch, resource]
            exporters: [googlecloud]
      

      This config defines OTLP receivers, a memory limiter, a batch processor to group spans, and a googlecloud exporter that sends traces to Cloud Trace. The resource processor's insert action adds common cloud attributes only when your application didn’t already provide them.
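A common mistake in collector configs is referencing a component in a pipeline without defining it in the matching top-level section, which the collector rejects at startup. As a rough pre-deploy check, the sketch below walks a config that has already been loaded into a plain dict. The function name `undefined_components` and the inline dict are hypothetical; this is not a substitute for the collector's own validation.

```python
import copy
from typing import Any, Dict, List

SECTIONS = ("receivers", "processors", "exporters")


def undefined_components(config: Dict[str, Any]) -> List[str]:
    """List pipeline references that have no matching top-level definition."""
    problems: List[str] = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in SECTIONS:
            defined = config.get(section) or {}
            for component in pipeline.get(section, []):
                if component not in defined:
                    kind = section[:-1]  # "processors" -> "processor"
                    problems.append(
                        f"{pipeline_name}: {kind} '{component}' is not defined"
                    )
    return problems


# Dict mirror of a traces pipeline like the one in this step.
config = {
    "receivers": {"otlp": {"protocols": {"grpc": None, "http": None}}},
    "processors": {
        "batch": None,
        "memory_limiter": {"check_interval": "1s", "limit_mib": 200},
        "resource": {"attributes": []},
    },
    "exporters": {"googlecloud": {"project": "your-gcp-project-id"}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch", "resource"],
                "exporters": ["googlecloud"],
            }
        }
    },
}

# A variant that references a processor nobody configured.
broken = copy.deepcopy(config)
broken["service"]["pipelines"]["traces"]["processors"].append("k8sattributes")

print(undefined_components(config))
print(undefined_components(broken))
```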

    • Service and ServiceAccount: Expose the collector using a Service and ensure it has permissions to write to Cloud Trace.

      apiVersion: v1
      kind: Service
      metadata:
        name: otel-collector-service
        namespace: tracing
      spec:
        selector:
          app: otel-collector
        ports:
        - name: otlp-grpc
          port: 4317
          targetPort: 4317
        - name: otlp-http
          port: 4318
          targetPort: 4318
        type: ClusterIP # Or LoadBalancer if you need external access (not recommended for internal tracing)
      

      The collector authenticates to Cloud Trace through Google Cloud IAM, not Kubernetes RBAC, so no Role or ClusterRoleBinding is involved. If the node's service account already holds roles/cloudtrace.agent and the nodes have a sufficient access scope, nothing more is needed. If you’re using Workload Identity, create a Kubernetes ServiceAccount for the collector, annotate it with a Google service account that holds roles/cloudtrace.agent, and grant the Kubernetes ServiceAccount roles/iam.workloadIdentityUser on that Google service account.

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: otel-collector-sa
        namespace: tracing
        annotations:
          # Placeholder Google service account; it must hold roles/cloudtrace.agent
          iam.gke.io/gcp-service-account: otel-collector@your-gcp-project-id.iam.gserviceaccount.com
      

      Then, in your otel-collector Deployment, add serviceAccountName: otel-collector-sa.
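The Service manifest in this step is what gives applications a stable address for the collector. As a small illustration, the helper below derives the in-cluster DNS name that would replace the `your-collector-service.your-namespace` placeholder in the application snippets. The function name is hypothetical, and it assumes the default `cluster.local` cluster domain.

```python
def collector_endpoint(service: str, namespace: str, port: int = 4317) -> str:
    """Build the in-cluster host:port for a Kubernetes Service.

    Assumes the default cluster domain, cluster.local.
    """
    return f"{service}.{namespace}.svc.cluster.local:{port}"


# Values taken from the Service manifest: name, namespace, and OTLP ports.
grpc_endpoint = collector_endpoint("otel-collector-service", "tracing")
http_endpoint = collector_endpoint("otel-collector-service", "tracing", port=4318)
print(grpc_endpoint)
print(http_endpoint)
```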

  4. View Traces in Cloud Trace:

    • Go to the Google Cloud Console -> Trace -> Explorer.
    • You should start seeing traces appear within a minute or two. You can filter by service name, latency, and other attributes.

The key insight is that Cloud Trace itself doesn’t generate the traces; it’s a backend for collecting and visualizing them. Your applications and the OpenTelemetry collector are the active participants in the tracing pipeline.

Once you have tracing set up, the next logical step is to start correlating traces with logs. You’ll want to ensure your logs contain trace IDs and span IDs so you can jump directly from a slow trace segment to the detailed logs generated by that specific operation.
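As a starting point for that correlation, Cloud Logging recognizes a few special fields in structured JSON logs, including `logging.googleapis.com/trace` and `logging.googleapis.com/spanId`. The sketch below builds such a log line with the standard library only. The function name and sample IDs are hypothetical, and in a real service the trace and span IDs would come from the active span rather than being passed in by hand.

```python
import json


def correlated_log(message: str, severity: str, project_id: str,
                   trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Return a JSON log line carrying Cloud Logging's trace correlation fields."""
    entry = {
        "message": message,
        "severity": severity,
        # Cloud Logging expects the fully qualified trace resource name.
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
        "logging.googleapis.com/spanId": span_id,
        "logging.googleapis.com/trace_sampled": sampled,
    }
    return json.dumps(entry)


line = correlated_log(
    message="profile lookup took longer than expected",
    severity="WARNING",
    project_id="your-gcp-project-id",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
)
print(line)  # Write lines like this to stdout from your container
```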
