Reduce delivery friction through a standardized internal platform
Building an internal developer platform: from scattered CI/CD scripts to a unified deployment experience
When each team owns a different pipeline style, delivery slows and platform risk grows. This guide shows how to build an Internal Developer Platform with a deployment abstraction layer, service catalog, policy gates, and centralized secrets.
Why fragmented CI/CD slows delivery over time
Most teams start with practical scripts that solve immediate needs: a Jenkins job here, a GitHub Actions workflow there, and a custom shell deployment path maintained by two people. The problem is not any single tool. The problem is that each team evolves its own conventions for environments, secrets, approvals, and rollback steps. Over time this creates hidden deployment variance, knowledge silos, and policy gaps that are expensive during incidents. Developers spend time understanding platform differences instead of shipping product value.
IDP as an abstraction layer, not a full CI/CD replacement
An Internal Developer Platform (IDP) should wrap existing delivery systems, not force a big-bang migration. The platform exposes one stable deploy interface while mapping requests to the right underlying workflow, cluster, and approval chain. Developers express intent; the platform resolves operational details. This reduces cognitive load and lets platform teams enforce shared controls in one place.
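To make the "developers express intent; the platform resolves operational details" idea concrete, here is a minimal Go sketch of the resolution step. It assumes a hypothetical in-memory `Catalog` keyed by service and environment; the type names, the `Resolve` function, and the cluster and namespace values are illustrative and not part of any real IDP API.

```go
package main

import (
	"errors"
	"fmt"
)

// Target is where one service runs for one environment.
type Target struct {
	Cluster   string
	Namespace string
}

// Catalog maps service name -> environment name -> target.
// Hypothetical in-memory stand-in for a real service catalog.
type Catalog map[string]map[string]Target

// DeployRequest is the developer's intent: what to deploy and where.
type DeployRequest struct {
	Service     string
	Version     string
	Environment string
}

// Plan is the intent plus the operational details the platform resolved.
type Plan struct {
	DeployRequest
	Target
}

var (
	ErrUnknownService     = errors.New("unknown service")
	ErrUnknownEnvironment = errors.New("unknown environment")
)

// Resolve turns a deploy request into a concrete plan, or explains
// which part of the request the catalog does not know about.
func Resolve(c Catalog, req DeployRequest) (Plan, error) {
	envs, ok := c[req.Service]
	if !ok {
		return Plan{}, fmt.Errorf("%w: %q", ErrUnknownService, req.Service)
	}
	target, ok := envs[req.Environment]
	if !ok {
		return Plan{}, fmt.Errorf("%w: %q for service %q",
			ErrUnknownEnvironment, req.Environment, req.Service)
	}
	return Plan{DeployRequest: req, Target: target}, nil
}

func main() {
	catalog := Catalog{
		"api-service": {
			"staging": {Cluster: "example-staging-cluster", Namespace: "api-service-staging"},
		},
	}
	plan, err := Resolve(catalog, DeployRequest{
		Service: "api-service", Version: "v2.3.1", Environment: "staging",
	})
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	fmt.Printf("deploy %s@%s to %s/%s\n",
		plan.Service, plan.Version, plan.Cluster, plan.Namespace)
}
```

Keeping the lookup in one function means an unknown service or environment fails before any workflow is triggered, which is where a developer most wants the error.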
# Unified command used by developers
idp deploy api-service --version v2.3.1 --env staging
# Platform resolves behind the scenes:
# 1) environment -> cluster + namespace
# 2) service metadata from catalog
# 3) secret references from vault path
# 4) workflow trigger with validated inputs
# 5) status updates to platform dashboard
Service catalog: the contract between teams and platform
A useful IDP needs a canonical service definition. The catalog should include ownership, repository, deployment targets, required secrets, runtime constraints, and reliability expectations. This document becomes a contract: developers describe service intent and requirements, while the platform implements consistent execution against that contract. Keep the catalog in Git, review every change, and version it like application code.
apiVersion: idp.angri-tech.org/v1
kind: Service
metadata:
  name: api-gateway
  owner: platform-team
spec:
  repository:
    url: https://github.com/angri-tech/api-gateway
    branch: main
  deployment:
    targets:
      - name: staging
        cluster: eks-staging-us-east-1
        namespace: api-gateway-staging
        autoDeploy: true
      - name: production
        cluster: eks-prod-us-east-1
        namespace: api-gateway-prod
        approvalRequired: true
        approvers:
          - platform-team-leads
  secrets:
    - path: secret/api-gateway/database
      required: true
  resources:
    cpu:
      request: 500m
      limit: 2000m
    memory:
      request: 512Mi
      limit: 2Gi
  slo:
    availability: 99.9
Policy gate and centralized secrets are non-negotiable
Without enforcement, platform standards become guidelines that teams bypass under pressure. Add a policy gate that validates deployment requests before execution: recent security scan status, required resource limits, production SLO declarations, and dependency compatibility checks. In parallel, centralize secrets in Vault or cloud secret managers and inject them at runtime. Developers should reference secret dependencies, not handle raw secret values.
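The "reference, do not handle" rule for secrets can be sketched in Go as follows. This is an illustration under stated assumptions: `SecretStore` is a hypothetical interface standing in for Vault or a cloud secret manager, `mapStore` is an in-memory fake used only for the example, and `SecretRef` mirrors the `path` and `required` fields of a catalog entry.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretRef mirrors a catalog entry: a path plus whether it is mandatory.
type SecretRef struct {
	Path     string
	Required bool
}

// SecretStore is a hypothetical abstraction over Vault or a cloud
// secret manager. Only the platform holds an implementation of it.
type SecretStore interface {
	Get(path string) (value string, found bool)
}

// mapStore is an in-memory stand-in used only for this example.
type mapStore map[string]string

func (m mapStore) Get(path string) (string, bool) {
	v, ok := m[path]
	return v, ok
}

var ErrMissingSecret = errors.New("required secret not found")

// ResolveSecrets looks up each reference at deploy time. A missing
// required secret aborts the deployment; a missing optional one is skipped.
func ResolveSecrets(store SecretStore, refs []SecretRef) (map[string]string, error) {
	resolved := make(map[string]string, len(refs))
	for _, ref := range refs {
		value, found := store.Get(ref.Path)
		if !found {
			if ref.Required {
				return nil, fmt.Errorf("%w: %s", ErrMissingSecret, ref.Path)
			}
			continue
		}
		resolved[ref.Path] = value
	}
	return resolved, nil
}

func main() {
	store := mapStore{"secret/api-gateway/database": "example-value"}
	refs := []SecretRef{{Path: "secret/api-gateway/database", Required: true}}

	secrets, err := ResolveSecrets(store, refs)
	if err != nil {
		fmt.Println("cannot deploy:", err)
		return
	}
	// Log only the count, never the values.
	fmt.Printf("resolved %d secret(s)\n", len(secrets))
}
```

Because developers only ever write the path into the catalog, rotating a value or moving to a different secret backend changes the `SecretStore` implementation and nothing in the service repositories.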
// EvaluateDeployment is the policy gate. It collects every violation for a
// deployment request instead of stopping at the first failure.
// DeploymentRequest, PolicyResult, PolicyViolation, and getLatestScanStatus
// are assumed to be defined elsewhere in the platform codebase.
func EvaluateDeployment(ctx context.Context, req DeploymentRequest) PolicyResult {
	var result PolicyResult

	// A security scan must exist and be less than 24 hours old.
	scanStatus, err := getLatestScanStatus(ctx, req.ServiceName, req.Version)
	if err != nil || scanStatus.AgeHours > 24 {
		result.Errors = append(result.Errors, PolicyViolation{
			Policy:      "security-scan-required",
			Resource:    req.ServiceName,
			Message:     "No recent security scan found",
			Remediation: "Run: idp security-scan <service>",
		})
	}

	// Every service must declare CPU requests in its catalog entry.
	if req.ServiceCatalog.Spec.Resources.CPU.Request == "" {
		result.Errors = append(result.Errors, PolicyViolation{
			Policy:      "resource-requests-required",
			Resource:    req.ServiceName,
			Message:     "No CPU resource requests defined",
			Remediation: "Add resources.cpu.request to service-catalog.yaml",
		})
	}

	// Production deployments must declare an availability SLO.
	if req.Environment == "production" && req.ServiceCatalog.Spec.SLO.Availability == 0 {
		result.Errors = append(result.Errors, PolicyViolation{
			Policy:      "slo-required-production",
			Resource:    req.ServiceName,
			Message:     "Production services must define an SLO",
			Remediation: "Add slo.availability to service-catalog.yaml",
		})
	}

	result.Passed = len(result.Errors) == 0
	return result
}
Reference workflow: IDP-triggered deployment in GitHub Actions
The workflow should be thin and deterministic: validate inputs, authenticate to the cluster, fetch runtime secrets from your secrets manager, deploy with immutable image tags, and report status back to the platform API. Keep this flow reusable across services so teams do not rewrite deployment logic for each repository.
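On the platform side, triggering a `workflow_dispatch` workflow comes down to one authenticated POST to GitHub's REST endpoint for workflow dispatch events. The Go sketch below only builds that request. The organization, repository, workflow file name, and token are placeholders, and the helper names (`DispatchURL`, `DispatchPayload`, `NewDispatchRequest`) are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// dispatchBody is the JSON body GitHub expects: a git ref plus string inputs.
type dispatchBody struct {
	Ref    string            `json:"ref"`
	Inputs map[string]string `json:"inputs"`
}

// DispatchURL builds the workflow dispatch endpoint for one workflow file.
func DispatchURL(apiBase, owner, repo, workflowFile string) string {
	return fmt.Sprintf("%s/repos/%s/%s/actions/workflows/%s/dispatches",
		apiBase, url.PathEscape(owner), url.PathEscape(repo), url.PathEscape(workflowFile))
}

// DispatchPayload encodes the ref and the already validated inputs.
func DispatchPayload(ref string, inputs map[string]string) ([]byte, error) {
	return json.Marshal(dispatchBody{Ref: ref, Inputs: inputs})
}

// NewDispatchRequest assembles the HTTP request without sending it.
func NewDispatchRequest(apiBase, owner, repo, workflowFile, ref, token string,
	inputs map[string]string) (*http.Request, error) {
	payload, err := DispatchPayload(ref, inputs)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		DispatchURL(apiBase, owner, repo, workflowFile), bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// All values below are placeholders for illustration.
	req, err := NewDispatchRequest("https://api.github.com",
		"example-org", "example-repo", "idp-deploy.yml", "main", "<token>",
		map[string]string{
			"service":     "api-service",
			"version":     "v2.3.1",
			"environment": "staging",
			"cluster":     "example-staging-cluster",
			"namespace":   "api-service-staging",
		})
	if err != nil {
		fmt.Println("build request failed:", err)
		return
	}
	// Sending is left out of the sketch; http.DefaultClient.Do(req) would
	// submit it, and a 2xx status means GitHub accepted the dispatch.
	fmt.Println(req.Method, req.URL.String())
}
```

Building the request separately from sending it keeps the trigger easy to unit test and gives the platform one place to reject inputs that failed the policy gate.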
name: IDP Service Deployment
on:
  workflow_dispatch:
    inputs:
      service:
        required: true
      version:
        required: true
      environment:
        required: true
      cluster:
        required: true
      namespace:
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Pass inputs through env so they are never interpolated into shell source.
    env:
      SERVICE: ${{ inputs.service }}
      VERSION: ${{ inputs.version }}
      NAMESPACE: ${{ inputs.namespace }}
    steps:
      # The Helm step reads ./charts, so the repository must be checked out first.
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Validate inputs
        run: |
          test -n "$SERVICE"
          test -n "$VERSION"
      - name: Authenticate to Kubernetes
        uses: azure/k8s-set-context@v1
        with:
          kubeconfig: ${{ secrets.KUBECONFIG }}
          context: ${{ inputs.cluster }}
      - name: Fetch secrets from Vault
        uses: hashicorp/vault-action@v2
        with:
          method: kubernetes
          url: https://vault.internal.angri-tech.org
          secrets: |
            secret/data/${{ inputs.service }}/${{ inputs.environment }} DATABASE_URL
      - name: Deploy with Helm
        run: |
          helm upgrade --install "$SERVICE" "./charts/$SERVICE" \
            --namespace "$NAMESPACE" \
            --set image.tag="$VERSION" \
            --wait --timeout 5m
Operate the platform as an internal product
Treat platform engineering as product development for internal users. Track adoption and friction with metrics: deployment frequency, lead time, policy failure rate, and time spent on platform toil. Build a paved road that is easier than bypassing the platform. Detect drift between catalog intent and runtime state, but avoid surprise auto-remediation in production without clear ownership. Start with one service or team, prove reduced friction, then scale patterns incrementally. The goal is not a perfect platform in one quarter; the goal is a platform that makes each next deployment safer and faster.
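The drift guidance above can be sketched in Go as a comparison between catalog intent and observed runtime state, with production drift reported to the owner rather than auto-remediated. The `State` map, its field names such as `image.tag`, and the action labels are illustrative assumptions, not a real platform API.

```go
package main

import (
	"fmt"
	"sort"
)

// State is a flat view of a service: field name -> value.
// The field names used here are illustrative.
type State map[string]string

// Drift records one field where runtime differs from catalog intent.
type Drift struct {
	Field    string
	Desired  string
	Observed string
}

// DetectDrift compares every field the catalog declares against the
// observed state. Fields that exist only at runtime are ignored, and a
// field missing at runtime is reported with an empty Observed value.
func DetectDrift(desired, observed State) []Drift {
	var drifts []Drift
	for field, want := range desired {
		if got := observed[field]; got != want {
			drifts = append(drifts, Drift{Field: field, Desired: want, Observed: got})
		}
	}
	// Sort for stable output, since map iteration order is random.
	sort.Slice(drifts, func(i, j int) bool { return drifts[i].Field < drifts[j].Field })
	return drifts
}

// Action picks a response: production drift is only reported to the
// owning team, other environments may be reconciled automatically.
func Action(environment string, drifts []Drift) string {
	switch {
	case len(drifts) == 0:
		return "none"
	case environment == "production":
		return "notify-owner"
	default:
		return "auto-reconcile"
	}
}

func main() {
	desired := State{"image.tag": "v2.3.1", "cpu.limit": "2000m"}
	observed := State{"image.tag": "v2.3.0", "cpu.limit": "2000m"}

	drifts := DetectDrift(desired, observed)
	for _, d := range drifts {
		fmt.Printf("%s: want %s, got %s\n", d.Field, d.Desired, d.Observed)
	}
	fmt.Println("action:", Action("production", drifts))
}
```

Separating detection from the response keeps the ownership decision explicit: the same drift list can page an owner in production and trigger a reconcile in staging.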
If your deployment flow is already inconsistent across environments, start by mapping bottlenecks with the release pipeline bottlenecks framework.
Once the platform API is stable, declarative rollout control becomes much easier with GitOps workflows using Argo CD and Flux.
