Zero-Downtime Deployment

Automated canary rollback in under 90 seconds — no pager at 3 AM.

Kubestead validates every microservice deployment against your SLOs. When the error-budget burn rate crosses your threshold, it rolls back automatically, before your users notice.

Start Free Trial · Read the Docs

Controller v0.8.1 · Last deployed 3 minutes ago · 0 incidents

Traffic split: 12% canary

error-rate: 0.18% (threshold: 0.30%)

p99-latency: 312ms

error budget: 72% remaining

Used by platform teams running canary deployments across 50+ microservices in production

Corvo Systems · Halcyon Infra · Delvian Health · Meltwater Labs

The Problem

Your team gets paged.
Kubestead doesn't.

canary-error-rate: 0.3% → amber threshold crossed

p99-latency: 847ms → SLO breach imminent

rollback triggered at 12% traffic → 0 user impact

Rollout Spec

apiVersion: kubestead.io/v1alpha1
kind: Rollout
metadata:
  name: payments-service
spec:
  canarySteps:
    - setWeight: 5
    - pause: {duration: 2m}
    - setWeight: 12
    - pause: {duration: 5m}
  analysisTemplate:
    name: error-rate-analysis
  errorBudgetPolicy:
    burnRateThreshold: 2.0
  rollbackPolicy:
    autoRollback: true
    maxRollbackDuration: 90s

The Deploy Loop

Three steps. Zero 3 AM pages.

Canary Slice

1–20% traffic split

Kubestead routes a configurable percentage of live traffic to your new version. You define the steps; the controller executes them precisely.

Metric Validation

error rate / p99 / custom SLOs

Every analysis interval, the controller queries your metrics backend (Prometheus or Datadog, including custom PromQL queries) and evaluates the results against your SLO thresholds.

Automated Decision

promote or rollback, no human

Metrics pass? Traffic advances to the next step. Error budget exhausted? Rollback fires in under 90 seconds. No Slack notification required, no runbook to follow.
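The three steps above can be sketched as a small decision loop. This is an illustrative Python model, not Kubestead's controller code: the names `Metrics`, `Thresholds`, `run_rollout`, and `fetch_metrics` are hypothetical, and the 1000 ms latency limit in the usage note is an assumed value that does not come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Metrics:
    error_rate: float       # fraction of failed requests, e.g. 0.0018 for 0.18%
    p99_latency_ms: float   # 99th percentile latency in milliseconds


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float
    max_p99_latency_ms: float


def run_rollout(
    weights: Sequence[int],
    fetch_metrics: Callable[[int], Metrics],
    thresholds: Thresholds,
) -> Tuple[str, int]:
    """Walk the canary weights in order and return (verdict, last weight tried).

    fetch_metrics is a hypothetical callback standing in for the query
    against the metrics backend at the given canary weight.
    """
    if not weights:
        raise ValueError("at least one canary weight is required")
    for weight in weights:
        metrics = fetch_metrics(weight)
        healthy = (
            metrics.error_rate <= thresholds.max_error_rate
            and metrics.p99_latency_ms <= thresholds.max_p99_latency_ms
        )
        if not healthy:
            # Stop at the first failing step; traffic never advances further.
            return ("rollback", weight)
    return ("promote", weights[-1])
```

For example, with `Thresholds(max_error_rate=0.003, max_p99_latency_ms=1000.0)`, a rollout over weights `[5, 12]` that reports 0.18% errors and 312 ms at every step ends in `("promote", 12)`, while one that reports 0.4% errors at the 12% step ends in `("rollback", 12)`.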

See Full Architecture

By the Numbers

Numbers from teams where
every deploy is a production event.

< 90s

median rollback time from error budget exhaustion to full traffic reroute

user-facing incidents during canary validation window across active rollouts

50+

microservices managed per team, on average, through Kubestead rollout policies

12%

maximum canary traffic slice before automated promote/rollback verdict is issued

Configuration as Code

# kubestead-rollout.yaml
apiVersion: kubestead.io/v1alpha1
kind: Rollout
metadata:
  name: payments-service
spec:
  # canarySteps: progressive traffic slice
  canarySteps:
    - setWeight: 1
    - pause: {duration: 1m}
    - setWeight: 5
    - pause: {duration: 2m}
    - setWeight: 12
    - pause: {duration: 5m}

  # analysisTemplate: PromQL queries
  analysisTemplate:
    name: error-rate-analysis
    args:
      - name: service
        value: payments-service

  # errorBudgetPolicy: burn rate guard
  errorBudgetPolicy:
    burnRateThreshold: 2.0
    windowMinutes: 60

  # rollbackPolicy: automatic action
  rollbackPolicy:
    autoRollback: true
    maxRollbackDuration: 90s
    notifySlack: true
    notifyPagerDuty: false
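As a rough way to reason about the `canarySteps` schedule in the config above, the pause durations can be summed to get the shortest time a fully healthy rollout spends in analysis. The sketch below is plain Python with hypothetical helper names (`parse_duration`, `minimum_rollout_seconds`); it assumes durations are a whole number followed by `s`, `m`, or `h`, which matches the values shown here but may not be the full grammar the controller accepts.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '90s' or '5m' to seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def minimum_rollout_seconds(steps) -> int:
    """Sum the pause durations; setWeight steps contribute no waiting time."""
    return sum(
        parse_duration(step["pause"]["duration"])
        for step in steps
        if "pause" in step
    )


# The canarySteps from the config above, written as Python dicts.
CANARY_STEPS = [
    {"setWeight": 1},
    {"pause": {"duration": "1m"}},
    {"setWeight": 5},
    {"pause": {"duration": "2m"}},
    {"setWeight": 12},
    {"pause": {"duration": "5m"}},
]

print(minimum_rollout_seconds(CANARY_STEPS))  # 480 seconds, i.e. 8 minutes
```

Under these assumptions the example schedule spends at least 8 minutes in analysis pauses before the final verdict.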

Every field earns its place.

canarySteps

Define exactly how traffic advances. Each setWeight sets the percentage of traffic routed to the canary; each pause is the analysis window that follows it. The controller won't advance until metrics pass.

analysisTemplate

Points to a PromQL or Datadog query. The template evaluates error rate, p99 latency, or any custom SLO metric you define. Results are compared against your thresholds.

errorBudgetPolicy

Burn-rate detection catches fast-burning errors before they exhaust your 30-day budget. A burn rate of 2.0× or higher over the configured window (windowMinutes: 60 in the example above) triggers the rollback gate immediately.

rollbackPolicy

When the gate fires, the controller scales down the canary, reweights traffic back to stable, and completes the full rollback in under 90 seconds. No runbook. No human.
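To make the burn-rate gate concrete, here is a minimal Python sketch using the standard SRE definition: burn rate is the observed error rate divided by the error budget the SLO allows. The function names and the 0.1% budget in the usage note are illustrative assumptions, not values or APIs taken from Kubestead.

```python
def burn_rate(observed_error_rate: float, error_budget: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are accruing.

    error_budget is the allowed failure fraction, e.g. 0.001 for a 99.9% SLO.
    """
    if error_budget <= 0:
        raise ValueError("error_budget must be positive")
    return observed_error_rate / error_budget


def days_to_exhaustion(rate: float, budget_window_days: float = 30.0) -> float:
    """Days until the whole budget is spent if the current burn rate holds."""
    if rate <= 0:
        return float("inf")
    return budget_window_days / rate


def should_roll_back(rate: float, threshold: float = 2.0) -> bool:
    """Mirror of the gate described above: fire at or above the threshold."""
    return rate >= threshold
```

With an assumed 0.1% error budget, an observed error rate of 0.2% gives a burn rate of 2.0, which would spend a 30-day budget in 15 days and trips the gate. An error rate of 0.05% gives a burn rate of 0.5 and does not.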

Full Rollout Spec Reference

Works With Your Stack

Your metrics backend, your service mesh, your CI pipeline. Kubestead wraps around them.

Prometheus

Grafana

Datadog

Istio

Linkerd

Argo Rollouts

GitHub Actions

GitLab CI

FluxCD

From Platform Teams

The people who never sleep
now actually sleep.

"We had three 3 AM pages in six months related to bad deploys. Since Kubestead, zero. The error budget burn rate gate catches the kind of thing that used to wake me up at 3:17 AM and have me manually kubectl rollout undoing things in a panic."

Senior SRE, logistics platform (400+ microservices)

"The YAML-native config was the deciding factor. Our team already lives in Helm charts and kustomize — dropping a Rollout resource into the repo felt exactly right. No dashboard to maintain, no new UI to learn. Just a spec that the controller executes."

Platform Engineering Lead, fintech

Get Started

Start your first canary rollout
in 10 minutes.

Start Free Trial · View Pricing

14-day free trial. No credit card. Works with your existing Kubernetes clusters.
