
HPAScaleCapability

The Horizontal Pod Autoscaler (HPA) for a workload has reached its configured maxReplicas. The HPA will not add more pods until the cap is raised or load subsides. With the pod count capped, the existing pods absorb any further load, which can push other resources (CPU, memory) into saturation.

  • SLI: gitlab_component_saturation:ratio{component="kube_horizontalpodautoscaler_desired_replicas"}
  • Soft SLO: 90%. Hard SLO: 95% (alert fires above hard).
  • Dashboard: HPA desired replicas saturation
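As a rough illustration of how the thresholds above apply, the sketch below classifies a saturation ratio against the soft and hard SLOs. It assumes the ratio is desired replicas divided by maxReplicas, which this page does not state explicitly; the function names are hypothetical.

```python
SOFT_SLO = 0.90  # soft SLO from this runbook
HARD_SLO = 0.95  # hard SLO from this runbook; the alert fires above it


def hpa_saturation(desired_replicas: int, max_replicas: int) -> float:
    """Assumed definition: desired replicas as a fraction of maxReplicas."""
    if max_replicas <= 0:
        raise ValueError("max_replicas must be positive")
    return desired_replicas / max_replicas


def classify(ratio: float) -> str:
    """Map a saturation ratio onto the soft/hard SLO bands."""
    if ratio > HARD_SLO:
        return "alerting"
    if ratio > SOFT_SLO:
        return "above soft SLO"
    return "ok"
```

Under this assumption, an HPA with maxReplicas of 20 only alerts once it wants all 20 pods, because 19 of 20 is exactly 95% and the alert fires strictly above the hard SLO.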

HPAs are configured on the workload — typically in k8s-workloads/gitlab-com for the GitLab chart, or in argocd/apps for platform workloads.

Some HPAs are excluded from the alert via the metric selector: shard!~"database-throttled|elasticsearch|gitaly-throttled|urgent-authorized-projects", namespace!~"pubsubbeat".
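To check whether a given HPA is covered by the alert, the selector above can be mimicked locally. This is a minimal sketch, not the alerting rule itself: it relies on the fact that Prometheus regex matchers are fully anchored and that a missing label is treated as an empty string. The function name and the example label values in the usage note are hypothetical.

```python
import re

# Negative regex matchers copied from the alert's metric selector.
EXCLUDED_SHARDS = re.compile(
    r"database-throttled|elasticsearch|gitaly-throttled|urgent-authorized-projects"
)
EXCLUDED_NAMESPACES = re.compile(r"pubsubbeat")


def is_alertable(labels: dict[str, str]) -> bool:
    """Return True if a series with these labels passes both !~ matchers."""
    # Prometheus anchors regex matchers, so fullmatch is the right analogue.
    shard = labels.get("shard", "")
    namespace = labels.get("namespace", "")
    return (
        EXCLUDED_SHARDS.fullmatch(shard) is None
        and EXCLUDED_NAMESPACES.fullmatch(namespace) is None
    )
```

For example, `is_alertable({"shard": "elasticsearch", "namespace": "gitlab"})` returns False, while a shard named `elasticsearch-extra` would still alert because the match is anchored.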

  • Severity: S3.
  • Scope silences to (cluster, namespace, horizontalpodautoscaler).
  • S3 when the workload’s own SLIs (Apdex, errors) remain healthy — you have time to raise the cap.
  • Escalate to S2 if Apdex or error SLIs are violating on the same workload — the HPA cap is causing customer impact.
  1. Confirm the HPA is at maxReplicas:

    kubectl -n <namespace> get hpa
    kubectl -n <namespace> describe hpa <name>
  2. Check the workload’s service dashboard. If Apdex and errors are healthy this alert alone does not warrant urgent action.
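When a namespace has many HPAs, the first verification step can be scripted. The sketch below is one possible approach: it takes the output of `kubectl -n <namespace> get hpa -o json` as a string and lists the HPAs whose desired replica count has reached `spec.maxReplicas`. The function name is hypothetical; the field names are those of the Kubernetes HorizontalPodAutoscaler object.

```python
import json


def hpas_at_cap(hpa_list_json: str) -> list[str]:
    """Return names of HPAs whose desired replicas have reached spec.maxReplicas."""
    items = json.loads(hpa_list_json).get("items", [])
    at_cap = []
    for hpa in items:
        max_replicas = hpa["spec"]["maxReplicas"]
        # status may be absent or incomplete on a freshly created HPA.
        desired = hpa.get("status", {}).get("desiredReplicas", 0)
        if desired >= max_replicas:
            at_cap.append(hpa["metadata"]["name"])
    return at_cap
```

The function only reads the JSON it is given, so it can be run against saved output as well as a live cluster.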

  1. Determine what is driving the HPA. The output of kubectl -n <namespace> describe hpa <name> shows the current metric values and their targets.
  2. Look at related saturation alerts on the same workload: component_saturation_slo_out_of_bounds:kube_container_cpu_limit, :kube_container_throttling, :kube_container_memory_limit. If the HPA is at cap and containers are also throttled, the workload is short of capacity and raising maxReplicas should help.
  3. If SLIs are healthy and the HPA has been steadily at cap for weeks, raise maxReplicas in the workload’s chart values. Before raising, confirm the cluster has capacity — check GKENodeCountCritical / GKENodeCountHigh for the affected pool.
  4. If SLIs are violating, raise a P1/P2 issue, silence briefly, and coordinate a fix (either capacity or performance).
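To estimate how far past the cap the HPA wants to go, the metric values and targets from `describe hpa` can be fed into the scaling formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below is a simplification: it ignores the tolerance band, stabilization windows, and multi-metric HPAs (where the largest result wins), and the function names are hypothetical.

```python
import math


def uncapped_desired_replicas(
    current_replicas: int, current_value: float, target_value: float
) -> int:
    """Replica count the HPA formula asks for before min/max clamping."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Multiply before dividing to limit floating-point rounding before ceil.
    return math.ceil(current_replicas * current_value / target_value)


def replica_shortfall(
    current_replicas: int, current_value: float, target_value: float, max_replicas: int
) -> int:
    """How many replicas beyond maxReplicas the formula would request."""
    wanted = uncapped_desired_replicas(current_replicas, current_value, target_value)
    return max(0, wanted - max_replicas)
```

For instance, 10 replicas running at 90% CPU against a 60% target would want 15 replicas, so with maxReplicas of 10 the shortfall is 5. That number is a starting point for choosing a new cap, not a replacement for checking node pool capacity.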
  • Raise spec.maxReplicas in the workload’s chart values, or the corresponding HPA object.
  • Right-size resource requests so each pod handles more load, reducing HPA pressure.
  • Add capacity to the node pool (see GKENodeCountCritical) if scaling out the HPA would exhaust node capacity.
  • Node pool capacity — HPA scaling stalls if the cluster cannot schedule more pods.
  • Custom metrics adapter (for HPAs driven by non-CPU metrics).
  • Identify the workload; escalate to its owning team.
  • #g_fleet_management for cluster-wide issues.