Kubernetes Jan 15, 2025 Raj Mehta

Four canary rollout patterns for Kubernetes microservices

Replica-fraction splitting, mesh-based weight splitting, header-based targeting, and shadow traffic. Each pattern suits a different tolerance for failure. We explain when to use each.

[Figure: Kubernetes pod topology showing canary and stable replica sets]

The term "canary deployment" gets used loosely to describe any rollout that doesn't flip 100% of traffic at once. But there are meaningfully different patterns underneath that umbrella, and the differences matter when you're choosing between them for a specific service. Each pattern makes distinct tradeoffs around blast radius, observability fidelity, rollback speed, and infrastructure complexity. Understanding when each one fits is more useful than picking a default and applying it everywhere.

Pattern 1: Replica-fraction splitting

Replica-fraction splitting is the simplest canary pattern and the one most naturally supported by vanilla Kubernetes. You run two ReplicaSets (in practice, usually two Deployments) behind a single Service: a stable set with most of your replicas and a canary set with a small fraction. kube-proxy spreads connections roughly evenly across all ready pods behind the Service, so each set's share of traffic approximates its share of replicas. With 10 stable pods and 1 canary pod, roughly 9% of traffic (1 of 11 pods) routes to the canary.
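A minimal sketch of this layout, assuming two Deployments (each managing its own ReplicaSet) that share an app label and differ only in a track label. The service name report-api, the image references, the port numbers, and the track label are all hypothetical placeholders, not something the pattern prescribes.

```yaml
# Stable Deployment: 10 replicas of the current version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-api-stable
spec:
  replicas: 10
  selector:
    matchLabels:
      app: report-api
      track: stable
  template:
    metadata:
      labels:
        app: report-api
        track: stable
    spec:
      containers:
        - name: report-api
          image: registry.example.com/report-api:1.4.0  # hypothetical image
          ports:
            - containerPort: 8080
---
# Canary Deployment: 1 replica of the new version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-api-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: report-api
      track: canary
  template:
    metadata:
      labels:
        app: report-api
        track: canary
    spec:
      containers:
        - name: report-api
          image: registry.example.com/report-api:1.5.0  # hypothetical image
          ports:
            - containerPort: 8080
---
# The Service selects on app only, so it load-balances across all 11 pods.
apiVersion: v1
kind: Service
metadata:
  name: report-api
spec:
  selector:
    app: report-api
  ports:
    - port: 80
      targetPort: 8080
```

Because the Service selector omits the track label, changing the canary share means changing replica counts, and rolling back means scaling the canary Deployment to zero.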

The advantage is zero additional infrastructure. No service mesh required, no ingress controller weight configuration, no VirtualService objects. Any team running standard Kubernetes with a standard Service can implement this immediately.

The limitation is that traffic weight is coarse-grained and coupled to replica count. Suppose you want 5% canary traffic, but the canary needs at least 3 replicas for load or availability reasons: with 10 stable replicas you're stuck at approximately 23% canary weight (3 canary / 13 total), and getting down to 5% would take 57 stable replicas. Likewise, you can't express "1% canary" without running 99 stable replicas alongside a single canary pod, which is impractical for most services.

Replica-fraction splitting is appropriate for:

Teams without a service mesh who want minimal operational complexity.
Services where coarse-grained traffic percentages (10%, 20%, 33%) are sufficient for analysis confidence.
Internal or batch-adjacent services where sending 10% or more of traffic to the canary is an acceptable risk, so you don't need the precision of 1% exposure.

It's a poor fit for high-stakes user-facing services where you want to start at 1-2% exposure and advance slowly, since getting there requires either running a large number of stable replicas or accepting that you can't hold tight percentages.

Pattern 2: Mesh-based traffic weight splitting

With a service mesh (Istio, Linkerd, Cilium) or a configurable ingress (NGINX with canary annotations, AWS Load Balancer Controller with target group weights), you can separate traffic weight from replica count entirely. The mesh's data plane applies weights at the request level, regardless of how many pods back each destination.

In Istio, this looks like a VirtualService with two HTTPRouteDestination entries pointing at two subsets defined in a DestinationRule. At 5% canary weight, on average 95 out of every 100 requests go to stable pods and 5 go to canary pods, no matter how many replicas each has.

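apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-api
spec:
  hosts:
    - payment-api   # required field: the host(s) these routing rules apply to
  http:
    - route:
        # 95% of requests go to the stable subset
        - destination:
            host: payment-api
            subset: stable
          weight: 95
        # 5% of requests go to the canary subset
        - destination:
            host: payment-api
            subset: canary
          weight: 5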

This is the pattern used by Flagger, Argo Rollouts with Istio, and Kubestead when a mesh is present. It gives you precise traffic control at any percentage and is the right default for services where you want to validate 1-2% exposure before advancing.
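The stable and canary subsets referenced by the VirtualService have to be defined in a DestinationRule for the same host. A minimal sketch, assuming the two sets of pods are distinguished by a track label (the label key and values are an assumption; use whatever labels your Deployments actually carry):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-api
spec:
  host: payment-api
  subsets:
    # Each subset selects pods behind the host by label.
    - name: stable
      labels:
        track: stable   # assumed pod label
    - name: canary
      labels:
        track: canary   # assumed pod label
```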

The cost is operational: you need the mesh installed, maintained, and correctly configured for the service. A mesh adds latency (though sidecar-less designs such as Istio's ambient mode reduce this), debugging complexity, and a new failure domain. For organizations already running Istio or Linkerd, this is the obvious pattern. For organizations that would need to install a mesh solely for this purpose, the replica-fraction pattern may be a better starting point.

Pattern 3: Header-based request targeting

Rather than splitting by percentage, header-based targeting routes specific requests to the canary based on HTTP headers. A typical implementation routes any request with X-Canary: true or a specific X-User-Group: beta header to the canary Subset. All other traffic continues to stable.
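In Istio, one way to express this is a VirtualService with a header match rule placed ahead of the default route. This is a sketch under assumptions: it reuses the payment-api host from the other examples and presumes stable and canary subsets already exist in a DestinationRule. Header names are written in lowercase, as Istio expects.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-api
spec:
  hosts:
    - payment-api
  http:
    # Rule 1: requests matching either header condition go to the canary.
    # Multiple entries under match are ORed together.
    - match:
        - headers:
            x-canary:
              exact: "true"
        - headers:
            x-user-group:
              exact: beta
      route:
        - destination:
            host: payment-api
            subset: canary
    # Rule 2: everything else falls through to stable.
    - route:
        - destination:
            host: payment-api
            subset: stable
```

Rules are evaluated in order, so the catch-all stable route has to come last.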

This is not a replacement for percentage-based canary analysis — it's a different tool for a different goal. Header-based targeting is useful when:

You want to test a change with a specific user cohort (internal employees, beta opt-in users) before any general population exposure.
You're rolling out a feature that's gated by a feature flag, and you want the canary to serve that flag's enabled users specifically.
You need to reproduce a production bug with a specific request shape — you can route only requests matching certain header combinations to the canary.

The limitation is that header-based routing doesn't generate the request volume needed for statistically meaningful canary analysis unless your targeting covers a large enough population. A canary serving 200 requests per hour won't give you reliable p99 latency or meaningful error rate confidence intervals: over 200 requests, the p99 is determined by the two slowest requests, so a single outlier moves it substantially. Header-based targeting is best used as a pre-canary step or in combination with percentage splitting: first validate with internal users via header targeting, then open to a percentage of general traffic for metric-based analysis.

Pattern 4: Shadow traffic (mirroring)

Shadow traffic mirroring sends a copy of live requests to the canary pod in parallel with the primary request path. The canary receives and processes the request, but its response is discarded — the client only sees the stable pod's response. This is zero blast-radius testing: the canary can throw errors or return incorrect data without any user impact.

In Istio, this is configured via the mirror and mirrorPercentage fields on an HTTPRoute entry in a VirtualService:

http:
  - route:
      - destination:
          host: payment-api
          subset: stable
        weight: 100
    mirror:
      host: payment-api
      subset: canary
    mirrorPercentage:
      value: 25.0

Shadow traffic is powerful for services where any canary-induced error is unacceptable — payment processing, authentication, write-path operations that would need to be compensated if they fail. You can validate latency behavior and internal errors under real production load before exposing users to the canary at all.

The significant caveat is side effects. If your canary pod processes a mirrored payment request and writes to a database, you've now written a duplicate record. Shadow traffic is only safe for services where canary-side execution has no observable side effects, or where you've explicitly implemented shadow-safe handling (writing to a shadow database, suppressing external API calls). This makes it operationally complex to implement correctly for stateful services. Use shadow traffic for read-path or stateless services first; apply it carefully to write-path services only when you've explicitly handled side effect isolation.
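One way to approach shadow-safe handling is to give the canary its own configuration so that mirrored requests never touch production dependencies. The sketch below is illustrative only: the Secret name, the environment variable names, the image tag, and the track label are all hypothetical, and they only help if the application itself reads and honors them.

```yaml
# Canary Deployment wired to shadow-only dependencies (hypothetical names).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api-canary
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payment-api
      track: canary
  template:
    metadata:
      labels:
        app: payment-api
        track: canary
    spec:
      containers:
        - name: payment-api
          image: registry.example.com/payment-api:2.0.0-rc1  # hypothetical image
          env:
            # Point writes at a shadow database instead of production.
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: payment-api-shadow-db   # hypothetical Secret
                  key: url
            # Hypothetical application flag: skip calls to external providers.
            - name: EXTERNAL_CALLS_MODE
              value: "dry-run"
```

This configuration has to be swapped back to production dependencies before the same canary receives real user traffic through weight splitting.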

Choosing between patterns

A decision framework that handles most cases:

No service mesh, internal service, or low-stakes workload → replica-fraction splitting
Service mesh available, user-facing service, need fine-grained traffic control → mesh-based weight splitting
Want to pre-validate with a specific user group before general exposure → header-based targeting as a pre-step before weight splitting
Zero-blast-radius validation needed, stateless or read-path service → shadow mirroring as initial validation before any weight splitting

Most mature microservices platforms end up combining patterns: shadow mirroring to validate the initial build, header-based targeting to validate with internal users, then mesh-based weight splitting for the actual progressive rollout with metric-gated advancement. The patterns aren't mutually exclusive — they're sequential stages in a risk reduction workflow.

The common mistake is treating pattern selection as a one-time infrastructure decision ("we use Istio, so we do weight splitting") rather than a per-rollout risk judgment. A routine config change in a well-observed service might go straight to 10% weight splitting with a 5-minute analysis window. A major refactor of a payment processing path deserves shadow traffic validation first, header targeting second, and weight splitting only after both have passed.