Canary Analysis

Define metric queries (PromQL or Datadog), thresholds, and evaluation windows in AnalysisTemplate resources. The controller evaluates these templates at every canary step.

Canary Analysis Templates

AnalysisTemplate Resource

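An AnalysisTemplate defines which metrics to query, how to evaluate them, and what counts as a pass or a failure. Entries under args declare parameters, such as service, that are substituted into the queries wherever {{args.service}} appears.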

apiVersion: kubestead.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-analysis
spec:
  args:
  - name: service
  metrics:
  - name: error-rate
    interval: 30s
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{status=~"5..",service="{{args.service}}"}[2m]))
          /
          sum(rate(http_requests_total{service="{{args.service}}"}[2m]))
    successCondition: result[0] < 0.003
    failureCondition: result[0] > 0.010
    failureLimit: 1
  - name: p99-latency
    interval: 30s
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{
              service="{{args.service}}"
            }[2m])) by (le)
          )
    successCondition: result[0] < 0.5
    failureCondition: result[0] > 1.0
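
To make the thresholds concrete, the sketch below reproduces the arithmetic behind both queries in Python with made-up sample values. error_ratio, histogram_quantile, and check are hypothetical helpers, not part of the controller. histogram_quantile is a simplified re-implementation of the linear interpolation that Prometheus's histogram_quantile function performs over cumulative le buckets, assuming positive bucket bounds and 0 < q ≤ 1. check reads successCondition as value < success_below and failureCondition as value > failure_above; the "neither" label for values between the two thresholds is this sketch's own, since the template does not say how that band is treated.

```python
import math


def error_ratio(errors_per_s: float, total_per_s: float) -> float:
    """Ratio the error-rate query computes: 5xx request rate / total request rate."""
    if total_per_s == 0:
        return math.nan  # PromQL evaluates 0/0 to NaN; plain Python would raise
    return errors_per_s / total_per_s


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Simplified Prometheus-style quantile estimate from cumulative buckets.

    buckets: (upper bound `le`, cumulative rate) pairs sorted by bound, the
    last bound being +Inf. Assumes positive bounds and 0 < q <= 1.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Index of the first bucket whose cumulative value reaches the target rank.
    idx = next(i for i, (_, cumulative) in enumerate(buckets) if cumulative >= rank)
    if idx == len(buckets) - 1:
        # Rank lies in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    start, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    end, cumulative = buckets[idx]
    # Linear interpolation inside the selected bucket.
    return start + (end - start) * (rank - below) / (cumulative - below)


def check(value: float, success_below: float, failure_above: float) -> str:
    """Apply `result[0] < success_below` and `result[0] > failure_above`."""
    if value < success_below:
        return "success"
    if value > failure_above:
        return "failure"
    return "neither"  # between the thresholds, or NaN (all comparisons are False)


# Made-up sample values for one 2m window.
ratio = error_ratio(2.4, 1200.0)
print(ratio, check(ratio, 0.003, 0.010))

latency_buckets = [(0.1, 600.0), (0.25, 950.0), (0.5, 995.0), (1.0, 1000.0), (math.inf, 1000.0)]
p99 = histogram_quantile(0.99, latency_buckets)
print(round(p99, 3), check(p99, 0.5, 1.0))
```

With 2.4 errors/s out of 1,200 requests/s the error ratio is 0.002, below the 0.003 success threshold; the sample latency buckets give a p99 of about 0.472 s, just under the 0.5 s bound.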

Evaluation Logic

The controller runs each metric's query once per configured interval (every 30s in the example above). The analysis result is one of:

  • Successful: all metrics pass their successCondition
  • Failed: any metric meets its failureCondition more than failureLimit times
  • Inconclusive: the metrics backend is unavailable; what happens next is controlled by inconclusivePolicy
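
The three outcomes can be sketched as a per-round classifier. Everything in this sketch is a hypothetical model, not the controller's code: the Outcome and evaluate_round names, the precedence (Failed before Inconclusive before Successful), counting failures cumulatively rather than consecutively, treating an omitted failureLimit as 0 (p99-latency in the example sets none), and the "Continue" label for a round in which a failure is still within its limit, a case the list above does not name.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"                # successCondition held for this measurement
    FAIL = "fail"                # failureCondition held for this measurement
    UNAVAILABLE = "unavailable"  # the metrics backend could not be queried


def evaluate_round(
    outcomes: dict[str, Outcome],
    failure_counts: dict[str, int],
    failure_limits: dict[str, int],
) -> str:
    """Classify one evaluation round (hypothetical model of the rules above).

    outcomes:       metric name -> outcome of this round's measurement
    failure_counts: metric name -> failures so far; updated in place
    failure_limits: metric name -> failureLimit; missing entries assumed 0
    """
    for name, outcome in outcomes.items():
        if outcome is Outcome.FAIL:
            failure_counts[name] = failure_counts.get(name, 0) + 1

    # Failed: some metric has failed more than failureLimit times in total.
    if any(failure_counts.get(n, 0) > failure_limits.get(n, 0) for n in outcomes):
        return "Failed"
    # Inconclusive: the backend was unreachable; inconclusivePolicy decides what's next.
    if any(o is Outcome.UNAVAILABLE for o in outcomes.values()):
        return "Inconclusive"
    # Successful: every metric passed its successCondition this round.
    if all(o is Outcome.PASS for o in outcomes.values()):
        return "Successful"
    # A failure still within its failureLimit: keep evaluating.
    return "Continue"


limits = {"error-rate": 1}  # from the example template; p99-latency sets none
counts: dict[str, int] = {}
print(evaluate_round({"error-rate": Outcome.FAIL, "p99-latency": Outcome.PASS}, counts, limits))  # Continue
print(evaluate_round({"error-rate": Outcome.FAIL, "p99-latency": Outcome.PASS}, counts, limits))  # Failed
```

With failureLimit: 1 on error-rate, the first failing measurement is tolerated and the second fails the analysis, whereas a single p99-latency failure is enough under the assumed default of 0.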

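Datadog Integration

To read a metric from Datadog instead of Prometheus, replace the prometheus provider block with a datadog one: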

provider:
  datadog:
    query: "avg:http.request.errors{service:{{args.service}}} by {service}.as_rate()"
    apiVersion: v2
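
Both the PromQL and Datadog queries reference {{args.service}}, which is filled from the template's args. The sketch below shows one way that substitution could work; render_query and its regular expression are assumptions for illustration, not the controller's implementation.

```python
import re

# Hypothetical placeholder syntax: {{args.<name>}}, optional inner whitespace.
_ARG_PLACEHOLDER = re.compile(r"\{\{\s*args\.([A-Za-z0-9_-]+)\s*\}\}")


def render_query(template: str, args: dict[str, str]) -> str:
    """Replace every {{args.<name>}} in a query template with args[<name>]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in args:
            # Fail loudly instead of sending a query with an empty label value.
            raise KeyError(f"missing analysis arg: {name}")
        return args[name]  # inserted verbatim: no quoting or escaping

    return _ARG_PLACEHOLDER.sub(substitute, template)


datadog_query = "avg:http.request.errors{service:{{args.service}}} by {service}.as_rate()"
print(render_query(datadog_query, {"service": "checkout"}))
# avg:http.request.errors{service:checkout} by {service}.as_rate()
```

The single-brace {service} grouping in the Datadog query is left alone because only the double-brace args form matches. Values are inserted verbatim, so an argument containing a double quote would break the PromQL string; a real implementation would need to validate or escape them.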