
Immutable infrastructure with Packer and Terraform: zero-downtime VM deployments at scale


SSH patches and config drift make mutable VMs unreliable. Bake application and OS state into images with Packer, provision with Terraform, and replace instances through an Auto Scaling rolling instance refresh, so that servers are never treated as pets.

Why mutable VMs drift and where immutability still matters

The classic mutable server lifecycle ends in archaeology: SSH in, install a package, tweak a config, restart a service. Weeks later another engineer repeats the process. Identical machines diverge, staging stops matching production, and security patching becomes roulette. Containers enforce immutability at the process layer, but not every workload fits a container: legacy OS dependencies, GPU or latency-sensitive jobs, databases on local disks, and regulated images with fixed OS baselines still run on VMs. The goal is container-like discipline (rebuild instead of patch) applied to machine images. When something changes, bake a new AMI or cloud image and replace instances; never mutate production in place.

Bake then deploy: separate image and infrastructure cadences

Phase one uses Packer to produce a machine image with OS packages, application artifacts, monitoring agents, and hardening scripts baked in. The artifact is immutable after build: updates mean a new image ID. Phase two uses Terraform to wire networking, Auto Scaling Groups, load balancers, and IAM around that image. Application releases change the image pipeline; capacity, subnets, and security groups change the infrastructure pipeline. Independent cadences keep each change small and testable. The flow: a code commit triggers CI, which builds the app artifacts; Packer bakes the image and the manifest records the AMI ID; Terraform updates the launch template; an instance refresh rolls replacements in behind the load balancer.

Packer template: provision, validate, emit manifest

Pin the Amazon plugin, parameterize app_version and base_ami, tag images with build provenance, and run self-tests before the image is finalized. Copy application tarballs from CI, install agents with version-pinned packages, and emit packer-manifest.json for downstream Terraform. Keep user_data in Terraform minimal: secrets and environment-specific values belong in SSM Parameter Store or instance roles, not in the golden image.

HCL · Packer web server AMI template

packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = "~> 1.3"
    }
  }
}

variable "app_version" { type = string }
variable "base_ami" { type = string }

source "amazon-ebs" "webserver" {
  ami_name      = "webserver-${var.app_version}-{{timestamp}}"
  instance_type = "t3.medium"
  region        = "us-east-1"
  source_ami    = var.base_ami
  ssh_username  = "ec2-user"

  tags = {
    Name       = "webserver-${var.app_version}"
    AppVersion = var.app_version
    BuildTime  = "{{timestamp}}"
    ManagedBy  = "packer"
  }
}

build {
  sources = ["source.amazon-ebs.webserver"]

  provisioner "shell" {
    inline = [
      "sudo dnf update -y",
      "sudo dnf install -y amazon-cloudwatch-agent awscli jq",
      "sudo systemctl enable amazon-cloudwatch-agent",
    ]
  }

  provisioner "file" {
    source      = "build/app-${var.app_version}.tar.gz"
    destination = "/tmp/app.tar.gz"
  }

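  provisioner "shell" {
    inline = [
      # Assumption: the base AMI may not ship the service account, so create it if missing.
      "id -u appuser >/dev/null 2>&1 || sudo useradd --system --user-group --shell /sbin/nologin appuser",
      "sudo mkdir -p /opt/app",
      "sudo tar -xzf /tmp/app.tar.gz -C /opt/app",
      "sudo chown -R appuser:appuser /opt/app",
      # Assumes an app-server.service unit is already installed on the image.
      "sudo systemctl enable app-server",
    ]
  }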

  provisioner "shell" {
    inline = ["sudo /opt/app/bin/healthcheck --self-test"]
  }

  post-processor "manifest" {
    output     = "packer-manifest.json"
    strip_path = true
  }
}
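
The manifest records each build's artifact_id as a region:ami-id pair, comma-separated when the image is copied to several regions. A minimal Bash sketch, assuming that format, for pulling out the AMI ID for one region; the function name ami_for_region and the sample IDs are made up for illustration and are not part of Packer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the AMI ID for one region from a Packer
# artifact_id string such as "us-east-1:ami-0abc,us-west-2:ami-0def".
ami_for_region() {
  local rest="$1," region="$2" pair
  while [ -n "$rest" ]; do
    pair="${rest%%,*}"   # first "region:ami" pair
    rest="${rest#*,}"    # drop it from the remainder
    if [ "${pair%%:*}" = "$region" ]; then
      printf '%s\n' "${pair#*:}"
      return 0
    fi
  done
  return 1  # region not present in the artifact
}
```

Calling ami_for_region with the artifact_id read by jq and the target region returns a non-zero status when the region is missing, which lets a pipeline fail early instead of passing an empty ami_id to Terraform.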

Terraform: launch template, Auto Scaling Group, and rolling instance refresh

Reference the baked ami_id in a launch template with create_before_destroy. Attach the Auto Scaling Group to an Application Load Balancer target group with ELB health checks. Configure an instance_refresh block on the Auto Scaling Group so that each launch template update starts a rolling replacement that respects min_healthy_percentage and instance_warmup. Point the group at the template's latest_version attribute rather than "$Latest", so Terraform sees the version change and starts the refresh. Ignore desired_capacity drift if autoscaling policies adjust capacity separately.

HCL · launch template and Auto Scaling Group

resource "aws_launch_template" "webserver" {
  name_prefix   = "webserver-"
  image_id      = var.ami_id
  instance_type = "t3.medium"

  iam_instance_profile {
    name = aws_iam_instance_profile.webserver.name
  }

  network_interfaces {
    security_groups             = [aws_security_group.webserver.id]
    associate_public_ip_address = false
  }

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name       = "webserver"
      AppVersion = var.app_version
      ManagedBy  = "terraform"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "webserver" {
  name                = "webserver-asg"
  desired_capacity    = var.desired_capacity
  min_size            = var.desired_capacity
  max_size            = var.desired_capacity + 2
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id = aws_launch_template.webserver.id
    # Track the concrete version so a new AMI changes this block and starts a refresh.
    version = aws_launch_template.webserver.latest_version
  }

  target_group_arns         = [aws_lb_target_group.webserver.arn]
  health_check_type         = "ELB"
  health_check_grace_period = 120

  # Rolling replacement, started whenever the launch template block above changes.
  instance_refresh {
    strategy = "Rolling"

    preferences {
      min_healthy_percentage = 75
      instance_warmup        = 120
    }
  }

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}

HCL · ALB target group health check

resource "aws_lb_target_group" "webserver" {
  name     = "webserver-tg"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path                = "/healthz"
    interval            = 15
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}
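
As a rough sanity check on these numbers (a back-of-the-envelope sketch, not the load balancer's exact algorithm): a target changes state after a threshold number of consecutive checks, one per interval, so the 120-second warm-up and grace period should comfortably exceed the time to healthy plus boot time. The function name checks_window is made up for illustration.

```shell
#!/usr/bin/env bash
# Approximate seconds for a target to change state: `threshold`
# consecutive checks spaced `interval` seconds apart. Boot time and
# check jitter are ignored, so treat the result as a rough estimate.
checks_window() {
  local interval="$1" threshold="$2"
  echo $(( interval * threshold ))
}

to_healthy=$(checks_window 15 2)     # interval and healthy_threshold from the target group
to_unhealthy=$(checks_window 15 3)   # interval and unhealthy_threshold
echo "healthy after ~${to_healthy}s, unhealthy after ~${to_unhealthy}s"
# prints: healthy after ~30s, unhealthy after ~45s
```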

CI pipeline: bake, apply, wait for refresh

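Chain the Packer build, extract the AMI ID from the manifest, run terraform apply with the new variables, then poll the instance refresh until its status is Successful, failing the job if it ends as Failed, Cancelled, or rolled back. There is no aws autoscaling wait instance-refresh command, so use describe-instance-refreshes in a loop. Store previous AMI IDs in SSM or a manifest registry for one-command rollback.

Bash · deploy script with refresh polling

#!/usr/bin/env bash
set -euo pipefail

APP_VERSION="${1:?Usage: deploy.sh <version>}"
# Assumption: the base image ID comes from the environment; the template requires it.
BASE_AMI="${BASE_AMI:?Set BASE_AMI to the source AMI ID}"
ASG_NAME="webserver-asg"

packer init packer/
packer build \
  -var "app_version=${APP_VERSION}" \
  -var "base_ami=${BASE_AMI}" \
  packer/web-server.pkr.hcl

# The manifest post-processor writes relative to the working directory.
# artifact_id looks like "us-east-1:ami-..."; keep the part after the colon.
AMI_ID=$(jq -r '.builds[-1].artifact_id' packer-manifest.json | cut -d: -f2)

cd infrastructure/
terraform init -input=false
terraform apply -auto-approve \
  -var "app_version=${APP_VERSION}" \
  -var "ami_id=${AMI_ID}"

# Refreshes are listed newest first, so index 0 is the one just started.
REFRESH_ID=$(aws autoscaling describe-instance-refreshes \
  --auto-scaling-group-name "$ASG_NAME" \
  --query 'InstanceRefreshes[0].InstanceRefreshId' --output text)

while true; do
  STATUS=$(aws autoscaling describe-instance-refreshes \
    --auto-scaling-group-name "$ASG_NAME" \
    --instance-refresh-ids "$REFRESH_ID" \
    --query 'InstanceRefreshes[0].Status' --output text)
  case "$STATUS" in
    Successful) break ;;
    Failed|Cancelled|RollbackSuccessful|RollbackFailed)
      # Stop instead of polling forever on a terminal failure state.
      echo "Instance refresh ${REFRESH_ID} ended as ${STATUS}" >&2
      exit 1 ;;
  esac
  sleep 15
done

echo "Deployed ${APP_VERSION} on AMI ${AMI_ID}"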
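
The rollback idea can be sketched with a plain text file standing in for the SSM parameter or manifest registry. The file name ami-history.txt and both function names are assumptions for illustration: each successful deploy appends its AMI ID, and a rollback looks up the entry before the newest one.

```shell
#!/usr/bin/env bash
# Hypothetical local registry; a real pipeline might use SSM instead.
HISTORY_FILE="${HISTORY_FILE:-ami-history.txt}"

# Append a deployed AMI ID, skipping an immediate duplicate.
record_ami() {
  local ami_id="$1"
  touch "$HISTORY_FILE"
  if [ "$(tail -n 1 "$HISTORY_FILE")" != "$ami_id" ]; then
    printf '%s\n' "$ami_id" >> "$HISTORY_FILE"
  fi
}

# Print the AMI ID deployed before the current one; fail if there is none.
previous_ami() {
  local count
  [ -f "$HISTORY_FILE" ] || return 1
  count=$(wc -l < "$HISTORY_FILE")
  [ "$count" -ge 2 ] || return 1
  tail -n 2 "$HISTORY_FILE" | head -n 1
}
```

Under these assumptions the deploy script would call record_ami "$AMI_ID" after a successful refresh, and a rollback would re-run terraform apply with ami_id set to the output of previous_ami, together with the matching app_version.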

Operational practices: provenance, no SSH, lean images, and rollback

Tag every image and instance with commit SHA, build ID, and template version. Block SSH to production with security groups; detect config drift with agents. Pin package versions in Packer provisioners. Split a quarterly hardened base image from frequent application layers. Run integration tests against a temporary instance before promoting AMI IDs. Keep the last three to five images for rollback by re-applying Terraform with the previous ami_id. Target application bake times under ten minutes through caching and smaller artifacts. Wire the full lineage in CI so auditors can trace commit to AMI to instance refresh without manual steps. Immutable VMs deliver reproducibility and safe rollback for workloads that still need the machine model, without the pet-server habit.
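
Image retention can be sketched as a filter over a list of images: sort by creation time, keep the newest N, and print the rest as deregistration candidates. The input format (creation date, then AMI ID, one image per line) and the function name prune_candidates are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Read "creation_date ami_id" lines on stdin and print the AMI IDs that
# fall outside the newest $1 images (default 5). ISO 8601 timestamps
# sort correctly as plain strings, so a reverse sort puts newest first.
prune_candidates() {
  local keep="${1:-5}"
  sort -r | tail -n "+$(( keep + 1 ))" | awk '{ print $2 }'
}
```

The input could come from aws ec2 describe-images --owners self with a --query of 'Images[].[CreationDate,ImageId]' and --output text. The function only prints candidates; review the list before deregistering anything, since an older image may still be referenced by a running launch template version.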

Validate Terraform modules before production rollout using patterns from our Terraform and Kitchen-Terraform testing guide.

Immutable deploys still need drift detection on the orchestration layer, covered in our infrastructure drift detection and remediation with Terraform guide.

Tags: terraform, packer, devops, infrastructure
