Trace every request path across microservices in Kubernetes
Distributed tracing with OpenTelemetry in production Kubernetes
One user request can cross dozens of services before it returns. Logs and metrics alone cannot show where latency or errors enter the chain. This guide deploys the OpenTelemetry Operator, agent and gateway collectors, and auto-instrumentation, and uses W3C context propagation to deliver complete traces to Tempo.
Why logs and metrics cannot answer cross-service request questions
In Kubernetes microservice estates a single API call often traverses gateways, domain services, caches, and message queues. Metrics show aggregate latency and error rate; logs capture local events. Neither reconstructs the full hop-by-hop path or pinpoints where a timeout started. Without distributed tracing, on-call engineers grep timestamps across namespaces and guess which dependency broke. Partial traces appear when one service forgets to propagate context or when services make independent head-sampling decisions and drop spans mid-chain; high-cardinality span names, meanwhile, inflate storage cost. Tracing succeeds when context propagation is uniform, instrumentation is consistent across languages, and the pipeline keeps errors while sampling routine traffic.
Production architecture: Operator, collectors, propagation, and Tempo
OpenTelemetry separates instrumentation from export. Applications export spans over OTLP and propagate context between services with the W3C Trace Context headers traceparent and tracestate, carried in HTTP headers, gRPC metadata, and message attributes. The OpenTelemetry Operator manages Collector custom resources and injects auto-instrumentation agents into pods. A DaemonSet agent on each node receives local spans, applies memory_limiter and batch, and forwards to a gateway Deployment. The gateway runs tail-based sampling so error and slow traces are kept while routine traffic is probabilistically reduced, then exports to Grafana Tempo or any OTLP-compatible backend. Head sampling at the SDK reduces volume early; tail sampling at the gateway then decides on complete traces, keeping errors and slow requests among the traces that head sampling let through. A trace dropped at the SDK never reaches the gateway, so raise the head-sampling ratio for services whose error traces must not be lost.
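To make the traceparent header mentioned above concrete, here is a minimal standard-library sketch that parses and re-serializes it. The type and function names (TraceParent, ParseTraceParent) are hypothetical and not part of any OpenTelemetry package, and the sketch only accepts version 00; in a real service the SDK propagator does this work.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent models a version-00 W3C traceparent header (hypothetical type).
type TraceParent struct {
	TraceID [16]byte
	SpanID  [8]byte
	Sampled bool
}

// ParseTraceParent decodes "00-<trace-id>-<parent-id>-<flags>".
// Simplification: any version other than 00 is rejected.
func ParseTraceParent(h string) (TraceParent, error) {
	var tp TraceParent
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("traceparent: unsupported version or field count")
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return TraceParent{}, errors.New("traceparent: wrong field length")
	}
	if h != strings.ToLower(h) {
		return TraceParent{}, errors.New("traceparent: hex must be lowercase")
	}
	if _, err := hex.Decode(tp.TraceID[:], []byte(parts[1])); err != nil {
		return TraceParent{}, err
	}
	if _, err := hex.Decode(tp.SpanID[:], []byte(parts[2])); err != nil {
		return TraceParent{}, err
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return TraceParent{}, err
	}
	// All-zero IDs are invalid and must not be propagated.
	if tp.TraceID == ([16]byte{}) || tp.SpanID == ([8]byte{}) {
		return TraceParent{}, errors.New("traceparent: all-zero trace or span ID")
	}
	tp.Sampled = flags[0]&0x01 == 0x01 // lowest flag bit is "sampled"
	return tp, nil
}

// String renders the header value again, keeping only the sampled flag.
func (tp TraceParent) String() string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return "00-" + hex.EncodeToString(tp.TraceID[:]) + "-" +
		hex.EncodeToString(tp.SpanID[:]) + "-" + flags
}

func main() {
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(tp.Sampled, tp.String())
}
```

Every hop copies the trace ID unchanged and replaces the parent ID with its own span ID, which is why a single service that drops the header splits the trace in two.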
Install the OpenTelemetry Operator and gateway collector
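Install the operator from the official Helm chart and pin the contrib collector image tag. Define a gateway OpenTelemetryCollector custom resource with memory_limiter first in the pipeline, tail_sampling for errors and slow requests, attribute enrichment, and OTLP export to Tempo. Size gateway pods for sampling buffers: a memory limit of at least one gibibyte is a practical starting point for moderate trace volume. Tail sampling only works when every span of a trace reaches the same collector instance, so running more than one gateway replica requires trace-ID-aware routing from the agents, for example with the contrib loadbalancing exporter; the manifests below keep a plain OTLP exporter for brevity.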
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm upgrade --install otel-operator open-telemetry/opentelemetry-operator \
  --namespace observability --create-namespace \
  --set manager.collectorImage.repository=otel/opentelemetry-collector-contrib \
  --set manager.collectorImage.tag=0.106.1

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
  namespace: observability
spec:
  mode: deployment
  replicas: 3
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 2Gi
  # v1beta1 expects config as structured YAML rather than a string block.
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      # Keep memory_limiter first so it can refuse data before buffers grow.
      memory_limiter:
        check_interval: 1s
        limit_mib: 1536
        spike_limit_mib: 384
      tail_sampling:
        decision_wait: 10s
        num_traces: 100000
        policies:
          - name: errors
            type: status_code
            status_code: { status_codes: [ERROR] }
          - name: slow-requests
            type: latency
            latency: { threshold_ms: 2000 }
          - name: standard
            type: probabilistic
            probabilistic: { sampling_percentage: 10 }
      attributes:
        actions:
          - key: deployment.environment
            action: upsert
            value: production
      batch:
        timeout: 5s
        send_batch_size: 8192
    exporters:
      otlp/tempo:
        endpoint: tempo.observability.svc:4317
        tls:
          insecure: true
        sending_queue:
          enabled: true
        retry_on_failure:
          enabled: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, attributes, batch]
          exporters: [otlp/tempo]

Deploy the agent DaemonSet and auto-instrumentation
The agent OpenTelemetryCollector runs as a DaemonSet and forwards spans to the gateway service DNS name. Create an Instrumentation custom resource with parentbased_traceidratio head sampling at ten percent for routine traffic, tracecontext and baggage propagators, and language-specific auto-instrumentation images. Annotate each Deployment's pod template with instrumentation.opentelemetry.io/inject-java, inject-nodejs, or inject-python, set to the Instrumentation resource name. Do not rely on inject-sdk alone: it configures the SDK environment but does not attach a language agent. Validate end to end with a test request and confirm a multi-span trace appears in Tempo before rolling out cluster-wide.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent
  namespace: observability
spec:
  mode: daemonset
  # v1beta1 expects config as structured YAML rather than a string block.
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 500ms
        limit_mib: 300
        spike_limit_mib: 80
      batch:
        timeout: 2s
        send_batch_size: 4096
    exporters:
      otlp:
        # The operator exposes each collector as a <name>-collector Service.
        endpoint: otel-gateway-collector.observability.svc:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: order-service-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://otel-agent-collector.observability.svc:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:2.6.0
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.52.1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: default
spec:
  # selector and labels are added here so the manifest is valid on its own.
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
      annotations:
        instrumentation.opentelemetry.io/inject-java: order-service-instrumentation
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:v1.2.3

Manual context propagation when auto-instrumentation is not enough
Custom HTTP clients, legacy queues, and hand-rolled gRPC wrappers must inject W3C Trace Context manually. Use the OpenTelemetry propagator API (shown here in Go) rather than vendor-specific injectors. For gRPC, register the otelgrpc server and client stats handlers, which replace the older interceptors, so metadata carries the active span context. Missing propagation on even one internal hop splits the trace and defeats the purpose of the pipeline.
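package tracing // package name is illustrative

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"google.golang.org/grpc"
)

// initPropagation registers the W3C propagators once at startup. Without it,
// otel.GetTextMapPropagator returns a no-op and Inject writes no headers.
func initPropagation() {
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
}

// callDownstream copies the span context from ctx into the outgoing headers.
func callDownstream(ctx context.Context, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
	return http.DefaultClient.Do(req)
}

// newGRPCServer extracts incoming context via the otelgrpc stats handler.
func newGRPCServer() *grpc.Server {
	return grpc.NewServer(grpc.StatsHandler(otelgrpc.NewServerHandler()))
}

// grpcDialOptions injects context on outgoing calls; pass it when dialing.
func grpcDialOptions() []grpc.DialOption {
	return []grpc.DialOption{grpc.WithStatsHandler(otelgrpc.NewClientHandler())}
}

Operational practices: sampling, naming, cost, correlation, and security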
Use head sampling at the SDK to limit volume and tail sampling at the gateway to keep errors and slow traces. Name spans with route templates such as GET /api/v2/orders/{orderId}, never raw IDs, which cause cardinality explosions. Set Tempo or backend retention to seven to fourteen days unless compliance requires longer. Inject trace_id and span_id into structured JSON logs so Grafana can jump from a Loki log line to the matching trace. Never attach PII to span attributes. Use TLS between collectors and backends; the insecure: true settings in the manifests above are only appropriate for in-cluster testing. Monitor otelcol_exporter_send_failed_spans and otelcol_processor_refused_spans on the gateway. Before production cutover, verify DaemonSet agents on every node, gateway resource limits, tail sampling policies, auto-instrumentation on critical services, end-to-end W3C propagation, and trace-to-log links in your dashboard tool.
Tracing data flows through the same Collector tiers described in our OpenTelemetry Collector unified telemetry pipeline guide.
Define which latency and error budgets matter before tuning sampling using SLO, SLI, and error budget practices for platform teams.
