switch production traffic atomically with instant rollback
Blue-green deployments on Kubernetes: zero-downtime releases without complexity
12 min read
Rolling updates mix old and new versions under live traffic. Blue-green keeps two full stacks, validates the idle color through a preview Service, then switches the production selector in one step—with instant rollback by patching back.
Why rolling updates expose users before you can react
Default rolling updates are convenient but mix versions under production load. While old and new pods share traffic, a broken endpoint or memory leak already hurts customers before you abort. Rollback is slow: drain pods, pull images, wait for readiness probes—thirty seconds at high traffic can mean failed payments and support tickets. Schema migrations make it worse: rolling the app back does not undo a forward database migration. Blue-green keeps two full stacks—blue on current production, green on the candidate—routes production through one Service selector, and validates green in isolation before the switch. If green fails checks, production stays on blue. Rollback is a single patch back to the previous selector, not a redeploy race.
Kubernetes-native blue-green: two Deployments, preview Service, one production Service
Kubernetes has no blue-green CRD, but Deployments, Services, and labels are enough—no service mesh required. Run app-blue and app-green Deployments with distinct version labels and the same replica count in production. myapp-service selects the active color for user traffic. Add myapp-green-preview and myapp-blue-preview Services that always point at each color so smoke tests hit the idle stack without touching production. Promotion updates the production Service selector app and version together—partial patches that drop the app label strand traffic. After switch, keep the previous color running for a defined rollback window before scale-down. Ingress controllers follow Endpoints updates within seconds; still verify health on the public URL after promotion. Argo Rollouts blueGreen strategy automates the same pattern if you want a controller instead of scripts.
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-blue
labels:
app: myapp
version: blue
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: blue
template:
metadata:
labels:
app: myapp
version: blue
spec:
containers:
- name: app
image: myregistry/myapp:v1.4.2
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
periodSeconds: 5apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp
version: blue
ports:
- port: 80
targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: myapp-green-preview
spec:
selector:
app: myapp
version: green
ports:
- port: 8080
targetPort: 8080Deploy green, validate through preview, switch atomically
Apply or update the green Deployment with an immutable image tag. Wait for rollout status, then curl the preview Service from an in-cluster job—never guess DNS names that do not exist. Patch the production Service with both selector keys. Confirm the public health endpoint, then leave blue running for two hours or until alerts stay quiet. The promotion script alternates colors so you reuse the same Deployment names every release.
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-green
labels:
app: myapp
version: green
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: green
template:
metadata:
labels:
app: myapp
version: green
spec:
containers:
- name: app
image: myregistry/myapp:v1.5.0
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /healthz
port: 8080kubectl apply -f app-green.yaml
kubectl rollout status deployment/app-green --timeout=120s
kubectl run smoke-green --rm -i --restart=Never --image=curlimages/curl:8.5.0 -- \
curl -sf http://myapp-green-preview.production.svc.cluster.local:8080/healthz
kubectl patch svc myapp-service --type=merge -p \
'{"spec":{"selector":{"app":"myapp","version":"green"}}}'
curl -sf https://myapp.example.com/healthz#!/usr/bin/env bash
set -euo pipefail
NAMESPACE=production
NEW_VERSION=${1:?usage: promote.sh <image-tag>}
CURRENT=$(kubectl get svc myapp-service -n "$NAMESPACE" -o jsonpath='{.spec.selector.version}')
if [[ "$CURRENT" == "blue" ]]; then NEW_COLOR=green; else NEW_COLOR=blue; fi
kubectl set image deployment/app-"$NEW_COLOR" app=myregistry/myapp:"$NEW_VERSION" -n "$NAMESPACE"
kubectl rollout status deployment/app-"$NEW_COLOR" -n "$NAMESPACE" --timeout=180s
kubectl run smoke-"$NEW_COLOR" --rm -i --restart=Never -n "$NAMESPACE" \
--image=curlimages/curl:8.5.0 -- \
curl -sf "http://myapp-${NEW_COLOR}-preview.${NAMESPACE}.svc.cluster.local:8080/healthz"
kubectl patch svc myapp-service -n "$NAMESPACE" --type=merge -p \
"{\"spec\":{\"selector\":{\"app\":\"myapp\",\"version\":\"$NEW_COLOR\"}}}"
echo "Active color: $NEW_COLOR. Previous color $CURRENT kept for rollback."Operational practices: capacity, health gates, data, and hybrid rollouts
Keep both colors at full replica count during promotion windows—idle pods cost less than a failed switch under cold start. Pin immutable image digests or version tags; mutable latest on both stacks destroys determinism. Extend /healthz beyond process-alive checks to database ping, cache warmup, and critical dependency reachability. For stateful tiers, run expand-and-contract schema migrations as a separate forward-only step and keep application code compatible with both schema versions during the overlap window. Scope dashboards and alerts to version labels so green is visible before it takes production traffic. For high-risk changes, combine blue-green with a short canary phase—route five percent through weighted Ingress or Argo Rollouts, validate error rate and latency, then complete the selector switch. Blue-green is not universal, but two Deployments, preview Services, and one atomic selector change give most teams mesh-free zero-downtime releases with rollback measured in seconds.
For gradual traffic shifts and metric gates instead of an instant switch, see progressive delivery with canary and feature flags in Kubernetes.
When schema changes must stay compatible across both colors, align rollout order with database schema migrations in CI/CD pipelines.
