switch production traffic atomically with instant rollback

Blue-green deployments on Kubernetes: zero-downtime releases without complexity

Rolling updates mix old and new versions under live traffic. Blue-green keeps two full stacks, validates the idle color through a preview Service, then switches the production selector in one step—with instant rollback by patching back.

Why rolling updates expose users before you can react

Default rolling updates are convenient, but they mix versions under production load. While old and new pods share traffic, a broken endpoint or memory leak reaches customers before you can abort. Rollback is slow because it is another rollout: pods are drained, images pulled, and readiness probes must pass again. Even thirty seconds of that at high traffic can mean failed payments and support tickets. Schema migrations make it worse, since rolling the app back does not undo a forward database migration.

Blue-green keeps two full stacks, blue on current production and green on the candidate, routes production through one Service selector, and validates green in isolation before the switch. If green fails its checks, production stays on blue. Rollback is a single patch back to the previous selector, not a redeploy race.

Kubernetes-native blue-green: two Deployments, preview Service, one production Service

Kubernetes has no built-in blue-green strategy, but Deployments, Services, and labels are enough, with no service mesh required. Run app-blue and app-green Deployments with distinct version labels and the same replica count in production. myapp-service selects the active color for user traffic. Add myapp-blue-preview and myapp-green-preview Services that always point at their own color, so smoke tests hit the idle stack without touching production. Promotion sets app and version together in the production Service selector so it always stays fully specified; a selector that loses the app key matches every pod carrying that version label, whichever application it belongs to. After the switch, keep the previous color running for a defined rollback window before scaling it down. Ingress controllers typically follow Endpoints updates within seconds; still verify health on the public URL after promotion. The Argo Rollouts blueGreen strategy automates the same pattern if you want a controller instead of scripts.

YAML · blue Deployment with readiness probe
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: myregistry/myapp:v1.4.2
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
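
YAML · production and preview Services
# Production Service: the version label decides which color gets user traffic.
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
    version: blue
  ports:
    - port: 80
      targetPort: 8080
---
# Preview Service pinned to green, used for smoke tests only.
apiVersion: v1
kind: Service
metadata:
  name: myapp-green-preview
spec:
  selector:
    app: myapp
    version: green
  ports:
    - port: 8080
      targetPort: 8080
---
# Preview Service pinned to blue, needed when blue is the idle color.
apiVersion: v1
kind: Service
metadata:
  name: myapp-blue-preview
spec:
  selector:
    app: myapp
    version: blue
  ports:
    - port: 8080
      targetPort: 8080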

Deploy green, validate through preview, switch atomically

Apply or update the green Deployment with an immutable image tag. Wait for the rollout to finish, then curl the preview Service from a one-off in-cluster pod, using the Service's real cluster DNS name rather than a guessed one. Patch the production Service with both selector keys. Confirm the public health endpoint, then leave blue running for a rollback window, for example two hours or until alerts stay quiet. The promotion script alternates colors, so you reuse the same Deployment names every release.

YAML · green Deployment with new image tag
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: app
          image: myregistry/myapp:v1.5.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
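
Bash · manual promotion workflow
# Roll out the candidate and wait until the green pods are ready.
kubectl apply -f app-green.yaml -n production
kubectl rollout status deployment/app-green -n production --timeout=120s

# Smoke-test green through its preview Service from a throwaway pod.
kubectl run smoke-green --rm -i --restart=Never -n production \
  --image=curlimages/curl:8.5.0 -- \
  curl -sf http://myapp-green-preview.production.svc.cluster.local:8080/healthz

# Switch production traffic: set both selector keys in one patch.
kubectl patch svc myapp-service -n production --type=merge -p \
  '{"spec":{"selector":{"app":"myapp","version":"green"}}}'

# Verify through the public URL after the switch.
curl -sf https://myapp.example.com/healthz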
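
A single curl right after the patch can race the Ingress controller while it picks up the new Endpoints. The sketch below is one possible way to poll the public health URL a few times before calling the switch good. The helper names retry and check_public_health, the attempt count, and the delay are illustrative assumptions, not part of the workflow above.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have failed.
retry() {
  local attempts delay n
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# check_public_health URL (hypothetical helper): one probe of the public
# endpoint; fails on HTTP errors or when the request exceeds five seconds.
check_public_health() {
  curl -sf --max-time 5 "$1" >/dev/null
}
```

For example, `retry 12 5 check_public_health https://myapp.example.com/healthz` would probe every five seconds for roughly a minute. Tune both numbers to how quickly your Ingress controller converges.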

Bash · promotion script with rollback-friendly selector patch
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE=production
NEW_VERSION=${1:?usage: promote.sh <image-tag>}

# Read the active color from the production Service and pick the idle one.
CURRENT=$(kubectl get svc myapp-service -n "$NAMESPACE" -o jsonpath='{.spec.selector.version}')
case "$CURRENT" in
  blue)  NEW_COLOR=green ;;
  green) NEW_COLOR=blue ;;
  *) echo "unexpected active color: '$CURRENT'" >&2; exit 1 ;;
esac

# Roll the new image onto the idle color and wait until it is ready.
kubectl set image deployment/app-"$NEW_COLOR" app=myregistry/myapp:"$NEW_VERSION" -n "$NAMESPACE"
kubectl rollout status deployment/app-"$NEW_COLOR" -n "$NAMESPACE" --timeout=180s

# Smoke-test the idle color through its preview Service; set -e aborts on failure.
kubectl run smoke-"$NEW_COLOR" --rm -i --restart=Never -n "$NAMESPACE" \
  --image=curlimages/curl:8.5.0 -- \
  curl -sf "http://myapp-${NEW_COLOR}-preview.${NAMESPACE}.svc.cluster.local:8080/healthz"

# Switch production traffic by setting both selector keys in one patch.
kubectl patch svc myapp-service -n "$NAMESPACE" --type=merge -p \
  "{\"spec\":{\"selector\":{\"app\":\"myapp\",\"version\":\"$NEW_COLOR\"}}}"

echo "Active color: $NEW_COLOR. Previous color $CURRENT kept for rollback."
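
The rollback path described earlier, a single patch back to the previous selector, could be sketched along these lines. This is a minimal sketch, not a definitive implementation: the function names other_color, selector_patch, and rollback are hypothetical, and the check that the previous color still has ready pods is an added safety assumption for the case where it was already scaled down.

```shell
# other_color COLOR: prints the opposite color, fails on anything else.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *) echo "unexpected color: '$1'" >&2; return 1 ;;
  esac
}

# selector_patch COLOR: prints the merge patch that points the
# production Service at COLOR, always with both selector keys.
selector_patch() {
  printf '{"spec":{"selector":{"app":"myapp","version":"%s"}}}' "$1"
}

# rollback NAMESPACE: switches myapp-service back to the other color,
# but only if that color still has at least one ready pod.
rollback() {
  local ns current previous ready
  ns=$1
  current=$(kubectl get svc myapp-service -n "$ns" \
    -o jsonpath='{.spec.selector.version}')
  previous=$(other_color "$current") || return 1
  # readyReplicas is omitted from the status when it is zero.
  ready=$(kubectl get deployment "app-$previous" -n "$ns" \
    -o jsonpath='{.status.readyReplicas}')
  if [ "${ready:-0}" -lt 1 ]; then
    echo "app-$previous has no ready pods; scale it up first" >&2
    return 1
  fi
  kubectl patch svc myapp-service -n "$ns" --type=merge \
    -p "$(selector_patch "$previous")"
  echo "Rolled back: $current -> $previous"
}
```

Calling `rollback production` inside the rollback window flips traffic back without touching either Deployment, which is what keeps the operation fast.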

Operational practices: capacity, health gates, data, and hybrid rollouts

Keep both colors at full replica count during promotion windows, because idle pods cost less than a switch onto a cold, under-provisioned stack. Pin image digests or immutable version tags; a mutable latest tag on both stacks destroys determinism. Extend /healthz beyond process-alive checks to cover a database ping, cache warmup, and reachability of critical dependencies. For stateful tiers, run expand-and-contract schema migrations as a separate forward-only step, and keep application code compatible with both schema versions during the overlap window. Scope dashboards and alerts to version labels so green is visible before it takes production traffic. For high-risk changes, combine blue-green with a short canary phase: route five percent of traffic through a weighted Ingress or Argo Rollouts, validate error rate and latency, then complete the selector switch. Blue-green is not universal, but two Deployments, preview Services, and one atomic selector change give most teams mesh-free, zero-downtime releases with rollback measured in seconds.
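
The image-pinning advice can be enforced before a promotion starts. The guard below is a sketch under stated assumptions: is_pinned_image is a hypothetical helper, and it treats any explicit tag other than latest as pinned. That only holds if your registry or release process keeps version tags immutable; a digest is the stricter guarantee.

```shell
# is_pinned_image REF: succeeds when REF carries a digest or an explicit
# tag other than "latest"; fails for untagged or :latest references.
is_pinned_image() {
  local name
  # Drop the registry host (and port) and the repository path, so a
  # colon in "localhost:5000/myapp" is not mistaken for a tag.
  name=${1##*/}
  case "$name" in
    *@sha256:*) return 0 ;;  # digest reference
    *:latest)   return 1 ;;  # mutable by convention
    *:?*)       return 0 ;;  # explicit version tag
    *)          return 1 ;;  # no tag, which resolves to latest
  esac
}
```

A promotion script could call `is_pinned_image "myregistry/myapp:$NEW_VERSION" || exit 1` before touching the idle Deployment.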

For gradual traffic shifts and metric gates instead of an instant switch, see progressive delivery with canary and feature flags in Kubernetes.

When schema changes must stay compatible across both colors, align rollout order with database schema migrations in CI/CD pipelines.