Trace every request path across microservices in Kubernetes

Distributed tracing with OpenTelemetry in production Kubernetes

14 min read

One user request can cross dozens of services before it returns. Logs and metrics alone cannot show where latency or errors appear in the chain. This guide deploys the OpenTelemetry Operator, agent and gateway collectors, and auto-instrumentation, propagates W3C Trace Context between services, and exports traces to Tempo.

Why logs and metrics cannot answer cross-service request questions

In Kubernetes microservice estates a single API call often traverses gateways, domain services, caches, and message queues. Metrics show aggregate latency and error rate; logs capture local events. Neither reconstructs the full hop-by-hop path or pinpoints where a timeout started. Without distributed tracing, on-call engineers grep timestamps across namespaces and guess which dependency broke. Tracing has its own failure modes: partial traces appear when one service forgets to propagate context or when services make inconsistent head-sampling decisions mid-chain, and storage cost explodes when span names carry high-cardinality values. Tracing succeeds when context propagation is uniform, instrumentation is consistent across languages, and the pipeline keeps errors while sampling routine traffic.

Production architecture: Operator, collectors, propagation, and Tempo

OpenTelemetry separates instrumentation from export. Applications emit OTLP spans and propagate W3C Trace Context (the traceparent and tracestate headers) over HTTP headers, gRPC metadata, and message attributes. The OpenTelemetry Operator manages Collector custom resources and injects auto-instrumentation agents into pods. A DaemonSet agent on each node receives local spans, applies the memory_limiter and batch processors, and forwards to a gateway Deployment. The gateway runs tail-based sampling so error and slow traces are kept while routine traffic is probabilistically reduced, then exports to Grafana Tempo or any OTLP-compatible backend. Head sampling at the SDK reduces volume early, but the gateway never sees a trace that head sampling dropped, so tail sampling can keep only the errors that actually arrive. Keep the head ratio high where error traces matter and let the gateway do most of the reduction.
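
To make the propagated header concrete: a version-00 traceparent value is four dash-separated lowercase-hex fields (version, trace ID, parent span ID, flags). The Go sketch below parses and validates one using only the standard library. The type and function names are illustrative, not part of any OpenTelemetry API, and real services should rely on the SDK propagators rather than hand parsing.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// TraceParent holds the four fields of a version-00 traceparent header.
// (Hypothetical type for illustration; not an OpenTelemetry type.)
type TraceParent struct {
	Version string // 2 hex chars; "00" in the current spec
	TraceID string // 32 hex chars, shared by every span in the trace
	SpanID  string // 16 hex chars, the caller's span (parent of the next span)
	Flags   string // 2 hex chars; lowest bit set means "sampled"
}

// isLowerHex reports whether s contains only 0-9 and a-f.
func isLowerHex(s string) bool {
	for _, c := range s {
		if (c < '0' || c > '9') && (c < 'a' || c > 'f') {
			return false
		}
	}
	return true
}

// ParseTraceParent validates and splits a version-00 header value.
// Other versions are rejected to keep the sketch short.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("traceparent: want 4 fields with version 00")
	}
	tp := TraceParent{Version: parts[0], TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}
	if len(tp.TraceID) != 32 || len(tp.SpanID) != 16 || len(tp.Flags) != 2 {
		return TraceParent{}, errors.New("traceparent: wrong field length")
	}
	if !isLowerHex(tp.TraceID + tp.SpanID + tp.Flags) {
		return TraceParent{}, errors.New("traceparent: fields must be lowercase hex")
	}
	if tp.TraceID == strings.Repeat("0", 32) || tp.SpanID == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("traceparent: all-zero IDs are invalid")
	}
	return tp, nil
}

// Sampled reports whether the caller marked the trace as sampled.
func (tp TraceParent) Sampled() bool {
	flags, err := strconv.ParseUint(tp.Flags, 16, 8)
	return err == nil && flags&1 == 1
}

func main() {
	// Example value taken from the W3C Trace Context specification.
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(tp.TraceID, tp.SpanID, tp.Sampled())
}
```

A service that forwards the same trace ID with its own span ID in the third field is what lets the backend stitch hops into one trace.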

Install the OpenTelemetry Operator and gateway collector

Install the operator from the official Helm chart and pin the contrib collector image tag. Define a gateway OpenTelemetryCollector custom resource with memory_limiter first in the pipeline, tail_sampling for errors and slow requests, attribute enrichment, and OTLP export to Tempo. Size gateway pods for the tail-sampling buffers: a memory limit of at least one gibibyte is a practical starting point for moderate trace volume, and the example below uses two.

Bash · install OpenTelemetry Operator via Helm
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

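# Assumes cert-manager is already installed; the chart's admission webhooks use it for certificates by default.
helm upgrade --install otel-operator open-telemetry/opentelemetry-operator \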
  --namespace observability --create-namespace \
  --set manager.collectorImage.repository=otel/opentelemetry-collector-contrib \
  --set manager.collectorImage.tag=0.106.1

YAML · gateway OpenTelemetryCollector with tail sampling
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
  namespace: observability
spec:
  mode: deployment
  replicas: 3
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 2Gi
  config: # structured YAML in v1beta1; the "|" string form belongs to v1alpha1
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 1536
        spike_limit_mib: 384
      tail_sampling:
        decision_wait: 10s
        num_traces: 100000
        policies:
          - name: errors
            type: status_code
            status_code: { status_codes: [ERROR] }
          - name: slow-requests
            type: latency
            latency: { threshold_ms: 2000 }
          - name: standard
            type: probabilistic
            probabilistic: { sampling_percentage: 10 }
      attributes:
        actions:
          - key: deployment.environment
            action: upsert
            value: production
      batch:
        timeout: 5s
        send_batch_size: 8192
    exporters:
      otlp/tempo:
        endpoint: tempo.observability.svc:4317
        tls:
          insecure: true # plaintext for an in-cluster demo; configure TLS for production
        sending_queue:
          enabled: true
        retry_on_failure:
          enabled: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, attributes, batch]
          exporters: [otlp/tempo]
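
The three policies in the gateway configuration combine as a logical OR: a trace is kept if any policy votes to keep it. The Go sketch below models that decision for a completed trace. It is a simplified illustration, not the Collector's implementation; TraceSummary, Policy, and Keep are hypothetical names, and hashing the trace ID for the probabilistic policy is an assumption made so the example is deterministic.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// slowThreshold mirrors threshold_ms: 2000 from the gateway config.
const slowThreshold = 2000 * time.Millisecond

// TraceSummary is a hypothetical digest of a trace once decision_wait has elapsed.
type TraceSummary struct {
	TraceID  string
	HasError bool          // at least one span ended with status ERROR
	Duration time.Duration // end-to-end duration of the trace
}

// Policy returns true when it wants the trace kept.
type Policy func(TraceSummary) bool

// errorPolicy corresponds to the status_code policy.
func errorPolicy(t TraceSummary) bool { return t.HasError }

// latencyPolicy corresponds to the latency policy.
func latencyPolicy(threshold time.Duration) Policy {
	return func(t TraceSummary) bool { return t.Duration > threshold }
}

// probabilisticPolicy keeps roughly percent% of traces, chosen by a hash of
// the trace ID so the same trace always gets the same answer.
func probabilisticPolicy(percent uint32) Policy {
	return func(t TraceSummary) bool {
		h := fnv.New32a()
		h.Write([]byte(t.TraceID))
		return h.Sum32()%100 < percent
	}
}

// Keep returns true if any policy votes to keep the trace.
func Keep(t TraceSummary, policies []Policy) bool {
	for _, p := range policies {
		if p(t) {
			return true
		}
	}
	return false
}

func main() {
	policies := []Policy{errorPolicy, latencyPolicy(slowThreshold), probabilisticPolicy(10)}
	traces := []TraceSummary{
		{TraceID: "a1", HasError: true, Duration: 40 * time.Millisecond},
		{TraceID: "b2", Duration: 3 * time.Second},
		{TraceID: "c3", Duration: 40 * time.Millisecond},
	}
	for _, t := range traces {
		fmt.Printf("%s keep=%v\n", t.TraceID, Keep(t, policies))
	}
}
```

The decision needs the whole trace, which is why the gateway buffers spans for decision_wait and why its memory limit is larger than the agent's.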

Deploy the agent DaemonSet and auto-instrumentation

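The agent OpenTelemetryCollector runs as a DaemonSet and forwards spans to the gateway. Because the gateway runs several replicas, every span of a trace has to reach the same replica for tail sampling to judge the whole trace; the contrib loadbalancing exporter, keyed on trace ID, is the usual way to arrange that. Create an Instrumentation custom resource with a parentbased_traceidratio head sampler (ten percent in this example), tracecontext and baggage propagators, and language-specific auto-instrumentation images. Traces dropped at the head never reach the gateway, so raise the ratio for services whose errors you cannot afford to miss. Annotate the pod template of each Deployment with inject-java, inject-nodejs, or inject-python set to the Instrumentation resource name. The inject-sdk annotation alone does not attach a language agent. Validate end to end with a test request and confirm a multi-span trace appears in Tempo before rolling out cluster-wide.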

YAML · agent DaemonSet collector
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent
  namespace: observability
spec:
  mode: daemonset
  config: # structured YAML in v1beta1; the "|" string form belongs to v1alpha1
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 500ms
        limit_mib: 300
        spike_limit_mib: 80
      batch:
        timeout: 2s
        send_batch_size: 4096
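    exporters:
      # Route by trace ID so every span of a trace reaches the same gateway
      # replica; tail sampling needs the whole trace in one place.
      loadbalancing:
        routing_key: traceID
        protocol:
          otlp:
            tls:
              insecure: true
        resolver:
          dns:
            # Headless Service the Operator creates for the gateway; port defaults to 4317.
            hostname: otel-gateway-collector-headless.observability.svc.cluster.local
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [loadbalancing]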

YAML · Instrumentation CR and Deployment annotations
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: order-service-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://otel-agent-collector.observability.svc:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.6.0
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.52.1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: order-service-instrumentation
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:v1.2.3
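
The parentbased_traceidratio sampler configured above makes one decision at the root of a trace and has every downstream service honor it, which is what keeps sampled traces complete. The Go sketch below models that logic with the standard library only. The threshold comparison follows the approach used by the Go SDK's ratio sampler as far as I understand it; other SDKs may compute it differently, and all names here are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ParentDecision describes what the incoming traceparent said, if anything.
type ParentDecision int

const (
	NoParent         ParentDecision = iota // this service starts the trace
	ParentSampled                          // caller set the sampled flag
	ParentNotSampled                       // caller cleared the sampled flag
)

// ratioSample makes a deterministic decision from the trace ID: the low
// 8 bytes are read as a 63-bit number and compared with ratio * 2^63.
func ratioSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	upperBound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < upperBound
}

// ShouldSample follows the parent's decision when there is one and falls
// back to the ratio only for root spans.
func ShouldSample(parent ParentDecision, traceID [16]byte, ratio float64) bool {
	switch parent {
	case ParentSampled:
		return true
	case ParentNotSampled:
		return false
	default:
		return ratioSample(traceID, ratio)
	}
}

func main() {
	var low, high [16]byte // low: trailing bytes all zero
	for i := 8; i < 16; i++ {
		high[i] = 0xff // high: trailing bytes all 0xff
	}
	fmt.Println(ShouldSample(NoParent, low, 0.1))          // root span, low ID
	fmt.Println(ShouldSample(NoParent, high, 0.1))         // root span, high ID
	fmt.Println(ShouldSample(ParentSampled, high, 0.1))    // follows the parent
	fmt.Println(ShouldSample(ParentNotSampled, low, 0.1))  // follows the parent
}
```

Because the decision depends only on the trace ID and the parent flag, two services with the same ratio agree on which traces to record.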

Manual context propagation when auto-instrumentation is not enough

Custom HTTP clients, legacy queues, or hand-rolled gRPC wrappers must inject W3C Trace Context manually. In Go, use the OpenTelemetry propagator API rather than vendor-specific injectors, and register the propagator once at startup, because the global default is a no-op. For gRPC, register the otelgrpc server and client stats handlers so metadata carries the active span context. Missing propagation on even one internal hop splits traces and defeats the purpose of the pipeline.

Go · W3C Trace Context on outbound HTTP and gRPC
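import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// initPropagation registers W3C Trace Context and Baggage as the global
// propagator. Call it once at startup: until a propagator is set, the
// global default is a no-op and Inject writes no headers.
func initPropagation() {
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
}

// callDownstream copies the span context in ctx into traceparent and
// tracestate headers. The caller must close the response body.
func callDownstream(ctx context.Context, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
	return http.DefaultClient.Do(req)
}

// newGRPCServer extracts incoming trace context from gRPC metadata.
func newGRPCServer() *grpc.Server {
	return grpc.NewServer(
		grpc.StatsHandler(otelgrpc.NewServerHandler()),
	)
}

// newGRPCClient injects the active span context into outgoing metadata.
// grpc.NewClient needs grpc-go 1.63 or later; older versions use grpc.Dial.
func newGRPCClient(target string, creds credentials.TransportCredentials) (*grpc.ClientConn, error) {
	return grpc.NewClient(target,
		grpc.WithTransportCredentials(creds),
		grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
	)
}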
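
For the legacy-queue case, the propagator needs a carrier that reads and writes message headers. In the Go API a carrier is any type with Get, Set, and Keys methods, so a thin adapter over the queue client's message type is usually enough. The sketch below shows such an adapter with the standard library only; Header and Message are hypothetical stand-ins for whatever your queue client provides.

```go
package main

import "fmt"

// Header and Message are hypothetical stand-ins for a queue client's types.
type Header struct {
	Key   string
	Value []byte
}

type Message struct {
	Headers []Header
	Body    []byte
}

// MessageCarrier adapts *Message to the Get/Set/Keys method set that
// OpenTelemetry's propagation.TextMapCarrier interface requires.
type MessageCarrier struct{ Msg *Message }

// Get returns the value for key, or "" when the header is absent.
func (c MessageCarrier) Get(key string) string {
	for _, h := range c.Msg.Headers {
		if h.Key == key {
			return string(h.Value)
		}
	}
	return ""
}

// Set overwrites an existing header or appends a new one.
func (c MessageCarrier) Set(key, value string) {
	for i := range c.Msg.Headers {
		if c.Msg.Headers[i].Key == key {
			c.Msg.Headers[i].Value = []byte(value)
			return
		}
	}
	c.Msg.Headers = append(c.Msg.Headers, Header{Key: key, Value: []byte(value)})
}

// Keys lists every header key currently on the message.
func (c MessageCarrier) Keys() []string {
	keys := make([]string, 0, len(c.Msg.Headers))
	for _, h := range c.Msg.Headers {
		keys = append(keys, h.Key)
	}
	return keys
}

func main() {
	msg := &Message{Body: []byte(`{"order":"example"}`)}
	carrier := MessageCarrier{Msg: msg}

	// In a real producer the propagator would call Set; it is called
	// directly here to keep the sketch free of external modules.
	carrier.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")

	fmt.Println(carrier.Keys(), carrier.Get("traceparent"))
}
```

With the OpenTelemetry module imported, the producer would pass MessageCarrier{Msg: msg} to otel.GetTextMapPropagator().Inject, and the consumer would pass it to Extract before starting its span.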

Operational practices: sampling, naming, cost, correlation, and security

Head sampling at the SDK limits volume, and tail sampling at the gateway keeps errors and slow traces, but only among traces that head sampling let through; raise the head ratio for services where every error must be captured. Name spans with route templates such as GET /api/v2/orders/{orderId}, never raw IDs, which create cardinality explosions. Set Tempo or backend retention to seven to fourteen days unless compliance requires longer. Inject trace_id and span_id into structured JSON logs so Grafana can jump from a Loki log line to the matching trace. Never attach PII to span attributes. Use TLS between collectors and backends; the insecure: true settings in the examples suit only an in-cluster demo. Monitor otelcol_exporter_send_failed_spans and otelcol_processor_refused_spans on the gateway. Before production cutover, verify DaemonSet agents on every node, gateway resource limits, tail sampling policies, auto-instrumentation on critical services, end-to-end W3C propagation, and trace-to-log links in your dashboard tool.

Tracing data flows through the same Collector tiers described in our OpenTelemetry Collector unified telemetry pipeline guide.

Before tuning sampling, decide which latency and error budgets matter; our guide to SLO, SLI, and error budget practices for platform teams covers how.