HPAScaleCapability

Overview

The Horizontal Pod Autoscaler (HPA) for a workload has reached its configured maxReplicas. The HPA will not add more pods until the cap is raised or load subsides. While the HPA is capped, any additional load lands on the existing pods, which can push their CPU and memory into saturation.

Services

Kubernetes Service Overview
Owner: Fleet Management

Metrics

SLI: gitlab_component_saturation:ratio{component="kube_horizontalpodautoscaler_desired_replicas"}
Soft SLO: 90%. Hard SLO: 95% (alert fires above hard).
Dashboard: HPA desired replicas saturation
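
As a rough illustration of how the thresholds above apply, the sketch below classifies a single HPA from its replica counts. It assumes the saturation ratio is desired replicas divided by maxReplicas and that both thresholds are strict ("above"); neither detail is confirmed here, so check the saturation resource definition before relying on it. The function name is hypothetical.

```shell
# Classify one HPA against the soft (90%) and hard (95%) SLOs.
# Assumption: saturation = desiredReplicas / maxReplicas.
# Integer arithmetic avoids floating-point comparisons in the shell.
classify_hpa_saturation() {
  desired=$1
  max=$2
  if [ $((desired * 100)) -gt $((max * 95)) ]; then
    echo "hard"   # above the hard SLO: the alert fires
  elif [ $((desired * 100)) -gt $((max * 90)) ]; then
    echo "soft"   # above the soft SLO, below or at the hard SLO
  else
    echo "ok"
  fi
}

# Usage: classify_hpa_saturation <desired> <max>
```

An HPA sitting at its cap (desired equal to max) is at 100% and therefore always lands in the "hard" bucket under this assumption.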

HPAs are configured on the workload — typically in k8s-workloads/gitlab-com for the GitLab chart, or in argocd/apps for platform workloads.

Some HPAs are excluded from the alert via the metric selector: shard!~"database-throttled|elasticsearch|gitaly-throttled|urgent-authorized-projects", namespace!~"pubsubbeat".

Alert Behavior

Severity: S3.
Scope silences to (cluster, namespace, horizontalpodautoscaler).

Severities

S3 when the workload’s own SLIs (Apdex, errors) remain healthy — you have time to raise the cap.
Escalate to S2 if Apdex or error SLIs are violating on the same workload — the HPA cap is causing customer impact.

Verification

Confirm the HPA is at maxReplicas:

kubectl -n <namespace> get hpa
kubectl -n <namespace> describe hpa <name>
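
For a scriptable version of the check above, the following sketch reads the replica counts straight from the HPA object with a JSONPath template. The helper names are hypothetical; the fields used (status.currentReplicas, status.desiredReplicas, spec.maxReplicas) are the standard HPA fields.

```shell
# Print "current desired max" replica counts for one HPA.
hpa_replicas() {
  kubectl -n "$1" get hpa "$2" \
    -o jsonpath='{.status.currentReplicas} {.status.desiredReplicas} {.spec.maxReplicas}'
}

# Print the counts and succeed (exit 0) when desired has reached maxReplicas.
# Returns 2 if kubectl itself fails.
hpa_at_cap() {
  replicas=$(hpa_replicas "$1" "$2") || return 2
  set -- $replicas   # split into $1=current, $2=desired, $3=max
  echo "current=$1 desired=$2 max=$3"
  [ "$2" -ge "$3" ]
}

# Usage: hpa_at_cap <namespace> <name> && echo "HPA is at cap"
```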

Check the workload’s service dashboard. If Apdex and errors are healthy, this alert alone does not warrant urgent action.

Recent changes

Recent k8s-workloads/gitlab-com MRs — the GitLab chart HPAs live here.
Recent argocd/apps MRs — platform HPAs live here.

Troubleshooting

Determine what is driving the HPA. kubectl -n <ns> describe hpa <name> shows the current metric values and targets.
Look at related saturation alerts on the same workload: component_saturation_slo_out_of_bounds:kube_container_cpu_limit, :kube_container_throttling, :kube_container_memory_limit. If the HPA is at cap and containers are also throttled, the existing pods are overloaded and raising maxReplicas is likely to help.
If SLIs are healthy and the HPA has been steadily at cap for weeks, raise maxReplicas in the workload’s chart values. Before raising, confirm the cluster has capacity — check GKENodeCountCritical / GKENodeCountHigh for the affected pool.
If SLIs are violating, raise a P1/P2 issue, silence briefly, and coordinate a fix (either capacity or performance).
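
When working through the steps above, the HPA's status conditions can confirm that the cap, rather than a metrics problem, is what is holding scaling back. The sketch below assumes the upstream HPA controller behaviour of setting the ScalingLimited condition with reason TooManyReplicas when the desired count exceeds maxReplicas; verify against the describe output on your cluster. The helper names are hypothetical.

```shell
# Print the reason recorded on the HPA's ScalingLimited condition, if any.
hpa_scaling_limited_reason() {
  kubectl -n "$1" get hpa "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="ScalingLimited")].reason}'
}

# Succeed (exit 0) when the HPA reports it is limited by maxReplicas.
# Assumption: the controller uses the reason "TooManyReplicas" for this case.
hpa_limited_by_max() {
  [ "$(hpa_scaling_limited_reason "$1" "$2")" = "TooManyReplicas" ]
}

# Usage: hpa_limited_by_max <namespace> <name> && echo "limited by maxReplicas"
```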

Possible resolutions

Raise spec.maxReplicas in the workload’s chart values, or directly on the corresponding HPA object.
Right-size resource requests so each pod handles more load, reducing HPA pressure.
Add capacity to the node pool (see GKENodeCountCritical) if scaling out the HPA would exhaust node capacity.
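
As a sketch of the first resolution, the helpers below estimate a lower bound for the new cap and patch the HPA object directly. Both are illustrative and the function names are hypothetical. The lower bound only brings the current desired count back to or under the 90% soft SLO, assuming saturation is desired / maxReplicas; it does not account for growth. A direct patch is also likely to be reverted the next time the chart or Argo CD syncs, so treat it as a stopgap and make the lasting change in the chart values.

```shell
# Smallest maxReplicas such that desired / max is at or below 90%.
# Computes ceil(desired * 100 / 90) with integer arithmetic.
min_max_for_soft_slo() {
  echo $(( ($1 * 100 + 89) / 90 ))
}

# Patch spec.maxReplicas on the HPA object (temporary; see the note above).
raise_hpa_max() {
  kubectl -n "$1" patch hpa "$2" --type merge \
    -p "{\"spec\":{\"maxReplicas\":$3}}"
}

# Usage:
#   min_max_for_soft_slo <current desired replicas>
#   raise_hpa_max <namespace> <name> <new maxReplicas>
```

Confirm node pool capacity before applying a higher cap, since extra replicas that cannot be scheduled do not relieve the workload.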

Dependencies

Node pool capacity — HPA scaling stalls if the cluster cannot schedule more pods.
Custom metrics adapter (for HPAs driven by non-CPU metrics).

Escalation

Identify the workload; escalate to its owning team.
#g_fleet_management for cluster-wide issues.

Definitions

Saturation resource definition: libsonnet/saturation-monitoring/kube_horizontalpodautoscaler_desired_replicas.libsonnet
Generated rule: mimir-rules/gitlab-gprd/kube/autogenerated-gitlab-gprd-kube-saturation-alerts.yml
Tunable parameters: slos.soft and slos.hard in the resource libsonnet.

Related alerts

k8s-operations.md — Manually scale a Deployment
KubeSchedulingFailures — related when scale-up is blocked at the node level.
GKENodeCountCritical / GKENodeCountHigh — related capacity signals.