Java services can trace their requests across distributed systems using OpenTelemetry, but it’s not just about adding a JAR; it’s about understanding how the JVM and your application’s lifecycle interact with tracing instrumentation.
Let’s see this in action. Imagine a simple Java service that calls another service. We’ll instrument it with OpenTelemetry.
First, add the OpenTelemetry Java agent to your JVM startup. This is done via the -javaagent flag.
java -javaagent:./opentelemetry-javaagent.jar -jar my-java-service.jar
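When a request comes in, say an HTTP GET to /process, the agent automatically creates a server span. Because the request carries no incoming trace context, this span becomes the root of a new trace. If the service then makes an outbound HTTP call to another service at /external-data, the agent creates a client span as a child of the server span and injects the trace context into the outgoing request headers. The downstream service, if also instrumented, extracts this context and creates its own server span as a child of that client span, linking the two services into a single trace.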
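The ID bookkeeping behind this parent/child linking is simple. Below is a minimal, stdlib-only sketch; `SpanIds` and its methods are hypothetical names, not SDK classes. It assumes the W3C Trace Context sizes of 16 bytes for a trace ID and 8 bytes for a span ID, rendered as lowercase hex. A root span draws a fresh trace ID, and every child keeps that trace ID, draws a new span ID, and records its parent's span ID.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of span identity; not the OpenTelemetry SDK's classes.
final class SpanIds {
    final String traceId;      // 32 hex chars (16 bytes), shared by every span in a trace
    final String spanId;       // 16 hex chars (8 bytes), unique per span
    final String parentSpanId; // null for a root span

    private SpanIds(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // Random non-zero 64-bit value as 16 lowercase hex digits (all-zero IDs are invalid).
    private static String randomHex64() {
        long v;
        do {
            v = ThreadLocalRandom.current().nextLong();
        } while (v == 0L);
        return String.format("%016x", v);
    }

    // A root span starts a new trace: fresh trace ID, fresh span ID, no parent.
    static SpanIds newRoot() {
        return new SpanIds(randomHex64() + randomHex64(), randomHex64(), null);
    }

    // A child keeps the trace ID, gets its own span ID, and points back at this span.
    SpanIds newChild() {
        return new SpanIds(traceId, randomHex64(), spanId);
    }
}
```

Calling `SpanIds.newRoot().newChild()` yields a span whose `traceId` matches the root's and whose `parentSpanId` is the root's `spanId`. The agent performs the equivalent bookkeeping for the server and client spans described above.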
Here’s a snippet of what a trace might look like conceptually. Exact attribute names depend on the semantic-conventions version your agent emits.
Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736
Resource (shared by both spans):
  service.name: my-java-service
Span ID: 53995c3f42cd8ad8 (root span, kind SERVER: GET /process)
  Start Time: 2023-10-27T10:00:00Z
  Duration: 150ms
  Attributes:
    http.method: GET
    http.target: /process
  Span ID: 00f067aa0ba902b7 (child span, kind CLIENT: outbound HTTP call)
    Parent Span ID: 53995c3f42cd8ad8
    Start Time: 2023-10-27T10:00:00.050Z
    Duration: 100ms
    Attributes:
      http.method: GET
      http.url: http://external-service/external-data
    // Context injected into the outgoing request headers (parent-id is this client span’s ID):
    // traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
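A traceparent value packs four dash-separated, lowercase-hex fields: the version (00), a 32-character trace ID, a 16-character parent ID (the span ID of the calling span), and trace flags whose lowest bit marks the trace as sampled. The following is a hypothetical, stdlib-only parser and formatter for version 00 headers. It is a sketch of the W3C Trace Context format, not the agent's propagator. It rejects malformed values and the all-zero IDs the specification treats as invalid, and it keeps only the sampled bit of the flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the W3C traceparent layout, version 00 only:
// version-traceid-parentid-flags, lowercase hex.
final class TraceParent {
    private static final Pattern FORMAT =
        Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String parentId; // span ID of the caller's span (e.g. the client span)
    final boolean sampled;

    TraceParent(String traceId, String parentId, boolean sampled) {
        this.traceId = traceId;
        this.parentId = parentId;
        this.sampled = sampled;
    }

    // Returns null for malformed headers or all-zero IDs.
    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String traceId = m.group(1);
        String parentId = m.group(2);
        if (traceId.chars().allMatch(c -> c == '0') || parentId.chars().allMatch(c -> c == '0')) {
            return null;
        }
        int flags = Integer.parseInt(m.group(3), 16);
        return new TraceParent(traceId, parentId, (flags & 0x01) != 0);
    }

    // Formats the header a client would send; only the sampled bit is kept.
    String format() {
        return "00-" + traceId + "-" + parentId + "-" + (sampled ? "01" : "00");
    }
}
```

A value whose trace ID has only 16 hex digits fails to parse, because the field must be exactly 32 digits long.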
The core problem OpenTelemetry tracing solves is making the invisible visible. In a microservices architecture, a single user request can traverse dozens of services. Without tracing, debugging a slow or failing request means diving into logs of each service independently, trying to correlate timestamps and request IDs. Tracing stitches these individual service operations into a single, coherent view of the entire request flow, showing you exactly where time is spent and where errors originate.
Internally, the OpenTelemetry Java agent is a standard JVM agent. It hooks into class loading through the java.lang.instrument API and uses the Byte Buddy library to rewrite bytecode. As the classes of supported libraries load (for example HttpServlet, HTTP clients, or ExecutorService implementations), the agent injects code that starts and ends spans and propagates context. The OpenTelemetry SDK bundled with the agent then batches and exports the finished spans. For many common libraries, this requires no changes to your application’s source code. You can configure which instrumentations are enabled, which attributes are collected, and where traces are exported.
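The hook that makes this possible is part of the JDK itself. The skeleton below is a hypothetical illustration, not the OpenTelemetry agent's code. The JVM calls `premain` before the application's `main`, and a `ClassFileTransformer` registered there sees each class's bytes as the class loads. The target class names are illustrative assumptions. Where a real agent would return rewritten bytecode (produced with Byte Buddy in OpenTelemetry's case), this sketch only records the matches and returns `null`, which tells the JVM to leave the class unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical -javaagent skeleton: shows where instrumentation hooks in, not how spans are built.
final class TracingAgentSketch implements ClassFileTransformer {
    // Internal (slash-separated) names of classes a tracing agent might target; illustrative only.
    private static final List<String> TARGETS = List.of(
        "javax/servlet/http/HttpServlet",
        "java/util/concurrent/ThreadPoolExecutor");

    // Classes this transformer has seen that match a target (thread-safe: classes load concurrently).
    final List<String> matched = new CopyOnWriteArrayList<>();

    // Called by the JVM before main() when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new TracingAgentSketch());
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null && TARGETS.contains(className)) {
            matched.add(className);
        }
        return null; // null = keep original bytes; a real agent returns rewritten bytecode here
    }
}
```

To load it with -javaagent, package it in a JAR whose manifest declares `Premain-Class: TracingAgentSketch`.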
The mental model to build is one of nested, timed events. Each span represents a unit of work. Spans can be parented by other spans, forming a causal hierarchy. The root span is the entry point of the traced operation (e.g., an incoming HTTP request or a message from a queue). Child spans represent operations initiated by their parent, such as outgoing network calls, database queries, or a specific piece of business logic. For a synchronous request, the end-to-end latency is usually the duration of the root span (asynchronous children can outlive it). A span’s duration includes the time spent in its children, so the time a span spends in its own code, its self time, is its duration minus the time covered by its direct children.
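Self time is easy to get wrong when children overlap, for example when a service makes parallel downstream calls. Here is a minimal sketch, assuming span times are available as millisecond offsets; the `SelfTime` class and its signature are hypothetical. It clips each direct child to the parent's window, merges overlapping intervals so concurrent time is counted once, and subtracts the covered time from the parent's duration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: a span's self time is the part of [start, end) not covered
// by any direct child. Each child is a {start, end} pair in the same time unit.
final class SelfTime {
    static long selfTimeMillis(long start, long end, List<long[]> children) {
        // Clip each child to the parent's window and drop empty intervals.
        List<long[]> clipped = new ArrayList<>();
        for (long[] c : children) {
            long s = Math.max(c[0], start);
            long e = Math.min(c[1], end);
            if (e > s) {
                clipped.add(new long[] {s, e});
            }
        }
        clipped.sort(Comparator.comparingLong((long[] iv) -> iv[0]));

        // Merge overlapping children (e.g. parallel calls) so shared time counts once.
        long covered = 0;
        long runStart = 0;
        long runEnd = 0;
        boolean inRun = false;
        for (long[] c : clipped) {
            if (inRun && c[0] <= runEnd) {
                runEnd = Math.max(runEnd, c[1]); // overlaps the current run: extend it
            } else {
                if (inRun) {
                    covered += runEnd - runStart; // close the previous run
                }
                runStart = c[0];
                runEnd = c[1];
                inRun = true;
            }
        }
        if (inRun) {
            covered += runEnd - runStart;
        }
        return (end - start) - covered;
    }
}
```

For the example trace, the root span's self time is its 150 ms minus its 100 ms client child: `selfTimeMillis(0, 150, List.of(new long[] {50, 150}))` returns 50.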
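When configuring the agent, you’ll often set environment variables for the service name and the exporter. The agent exports over OTLP by default, and recent Jaeger versions accept OTLP directly (port 4318 for OTLP/HTTP, 4317 for OTLP/gRPC). To send traces over OTLP/HTTP to a Jaeger instance running on localhost, you’d set:
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_SERVICE_NAME="my-java-service"
OTEL_EXPORTER_OTLP_ENDPOINT is a base URL. For OTLP/HTTP the exporter appends /v1/traces itself, so if you need to give the full traces URL, use OTEL_EXPORTER_OTLP_TRACES_ENDPOINT instead.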
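The difference between the generic and per-signal endpoint variables is a common source of misrouted traces. The helper below is a hypothetical sketch of the rule the OpenTelemetry specification sets for OTLP/HTTP exporters, not the SDK's implementation. `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used verbatim. `OTEL_EXPORTER_OTLP_ENDPOINT` is a base URL to which `v1/traces` is appended, and `http://localhost:4318` is assumed when neither variable is set. The helper takes the environment as a `Map` so it can be exercised without changing real variables.

```java
import java.util.Map;

// Hypothetical sketch of how an OTLP/HTTP exporter derives its traces URL from
// environment variables; not the SDK's actual implementation.
final class OtlpTracesEndpoint {
    static final String DEFAULT_HTTP_BASE = "http://localhost:4318";

    static String resolve(Map<String, String> env) {
        // A signal-specific endpoint is a complete URL and is used as-is.
        String perSignal = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT");
        if (perSignal != null && !perSignal.isEmpty()) {
            return perSignal;
        }
        // The generic endpoint is a base URL; the signal path is appended to it.
        String base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT");
        if (base == null || base.isEmpty()) {
            base = DEFAULT_HTTP_BASE;
        }
        return base.endsWith("/") ? base + "v1/traces" : base + "/v1/traces";
    }
}
```

Passing `System.getenv()` applies the rule to the actual process environment. Under this rule, a base endpoint of `http://localhost:4318/v1/traces` would resolve to `http://localhost:4318/v1/traces/v1/traces`.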
The agent handles context propagation automatically for many protocols. For HTTP, it injects and extracts the W3C Trace Context headers (traceparent and tracestate) by default, along with W3C baggage. For other protocols or custom communication mechanisms, you may need to propagate the Context manually with the OpenTelemetry API. On the sending side, inject the current Context into the message’s headers or metadata with a TextMapPropagator. On the receiving side, extract a Context from those headers and call makeCurrent() on it. makeCurrent() returns a Scope, which you should close (typically with try-with-resources) when the work is done.
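The inject, extract, and makeCurrent sequence can be modeled without the library. The sketch below is stdlib-only and hypothetical. `MiniContext`, its `Scope` interface, and the header map stand in for the real `Context`, `Scope`, and a `TextMapPropagator` with its carrier. It carries only the raw traceparent value and assumes a single thread-local "current" slot per thread.

```java
import java.util.Map;

// Hypothetical, stdlib-only model of manual propagation. MiniContext stands in for
// OpenTelemetry's Context; the header map stands in for a propagator's carrier.
final class MiniContext {
    // Narrows AutoCloseable.close() so try-with-resources needs no catch block.
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    private static final ThreadLocal<MiniContext> CURRENT = new ThreadLocal<>();

    final String traceparent; // the W3C header value this context carries

    MiniContext(String traceparent) {
        this.traceparent = traceparent;
    }

    // The context active on the calling thread, or null if none.
    static MiniContext current() {
        return CURRENT.get();
    }

    // Sending side: write the context into the outgoing message's headers.
    void inject(Map<String, String> headers) {
        headers.put("traceparent", traceparent);
    }

    // Receiving side: rebuild a context from incoming headers; null if the header is absent.
    static MiniContext extract(Map<String, String> headers) {
        String value = headers.get("traceparent");
        return value == null ? null : new MiniContext(value);
    }

    // Make this context current on this thread; closing the Scope restores the previous one.
    Scope makeCurrent() {
        MiniContext previous = CURRENT.get();
        CURRENT.set(this);
        return () -> CURRENT.set(previous);
    }
}
```

Returning a closeable handle that restores the previous value keeps a request's context from leaking into later work on a pooled thread. This is why the real Scope should always be closed.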
A common point of confusion is how the agent handles asynchronous operations and thread pools. By default, the agent instruments ExecutorService implementations. This means that if you submit a task to an ExecutorService, the tracing context from the thread that submitted the task is automatically propagated to the thread that executes the task. However, if you manually create new threads or use certain low-level concurrency primitives without using an instrumented ExecutorService, you might lose context. In such cases, you would need to explicitly use Context.current().wrap(Runnable) or similar mechanisms to ensure context propagation.
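What `Context.current().wrap(Runnable)` does can be shown with a plain `ThreadLocal`. This is a hypothetical, stdlib-only sketch: `ContextWrapping` and `CURRENT_TRACE` are illustrative names, and the "context" is just a string. The wrapper captures the value on the submitting thread, installs it on the thread that runs the task, and restores that thread's previous value afterwards. A task run on a raw new `Thread` without the wrapper sees no context, because a `ThreadLocal` is not inherited by new threads.

```java
// Hypothetical, stdlib-only model of Context.current().wrap(Runnable). The "context"
// here is just a string held in a ThreadLocal.
final class ContextWrapping {
    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    static Runnable wrap(Runnable task) {
        String captured = CURRENT_TRACE.get(); // read on the submitting thread, at wrap time
        return () -> {
            String previous = CURRENT_TRACE.get(); // whatever the executing thread had
            CURRENT_TRACE.set(captured);
            try {
                task.run();
            } finally {
                CURRENT_TRACE.set(previous); // don't leak the context into later tasks
            }
        };
    }
}
```

Usage: `new Thread(ContextWrapping.wrap(task)).start();`. With the OpenTelemetry API, the equivalent is `new Thread(Context.current().wrap(task)).start();`.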
The next step after getting basic tracing set up is often to start enriching your spans with custom attributes and events to provide more detailed diagnostic information.