HPAScaleCapability
Overview
The Horizontal Pod Autoscaler (HPA) for a workload has reached its configured `maxReplicas`. The HPA will not add more pods until the cap is raised or load subsides. Downstream, this can push other resources (CPU, memory) into saturation.
Services
Metrics
- SLI: `gitlab_component_saturation:ratio{component="kube_horizontalpodautoscaler_desired_replicas"}`
- Soft SLO: 90%. Hard SLO: 95% (the alert fires above the hard SLO).
- Dashboard: HPA desired replicas saturation
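As a rough illustration of how the thresholds above apply, the sketch below classifies a saturation ratio against the soft and hard SLOs. It assumes the ratio is desired replicas divided by `maxReplicas`, which this page does not spell out, and it assumes the soft SLO is also breached only when the ratio is strictly above it. The function and constant names are hypothetical.

```python
SOFT_SLO = 0.90  # soft SLO from this runbook
HARD_SLO = 0.95  # hard SLO from this runbook; the alert fires above it


def saturation_ratio(desired_replicas: int, max_replicas: int) -> float:
    """Assumed definition: desired replicas as a fraction of maxReplicas."""
    if max_replicas <= 0:
        raise ValueError("max_replicas must be positive")
    return desired_replicas / max_replicas


def classify(ratio: float) -> str:
    """Label a ratio as 'alerting', 'soft-breach', or 'ok'."""
    if ratio > HARD_SLO:
        return "alerting"
    if ratio > SOFT_SLO:
        return "soft-breach"
    return "ok"
```

Under these assumptions an HPA with 20 desired replicas and `maxReplicas` of 20 would be alerting, while one at 46 of 50 would only be past the soft SLO.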
HPAs are configured on the workload — typically in k8s-workloads/gitlab-com for the GitLab chart, or in argocd/apps for platform workloads.
Some HPAs are excluded from the alert via the metric selector: shard!~"database-throttled|elasticsearch|gitaly-throttled|urgent-authorized-projects", namespace!~"pubsubbeat".
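To check whether a given HPA is covered by the alert, the exclusion selector can be reproduced locally. The sketch below is an approximation: it copies the two regexes from the selector above, treats a missing label as an empty string, and uses full-string matching because PromQL regex matchers are anchored. The helper name `is_alertable` is hypothetical.

```python
import re

# Regexes copied from the selector in this runbook.
EXCLUDED_SHARDS = r"database-throttled|elasticsearch|gitaly-throttled|urgent-authorized-projects"
EXCLUDED_NAMESPACES = r"pubsubbeat"


def is_alertable(labels: dict) -> bool:
    """Return True if a series with these labels passes both !~ matchers."""
    # A missing label behaves like an empty string in PromQL matchers.
    shard = labels.get("shard", "")
    namespace = labels.get("namespace", "")
    # PromQL regexes are fully anchored, hence fullmatch rather than search.
    return (
        re.fullmatch(EXCLUDED_SHARDS, shard) is None
        and re.fullmatch(EXCLUDED_NAMESPACES, namespace) is None
    )
```

Because the match is anchored, a shard named `elasticsearch-extra` (a made-up example) would not be excluded; only the exact names listed are.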
Alert Behavior
- Severity `s3`.
- Scope silences to `(cluster, namespace, horizontalpodautoscaler)`.
Severities
- `S3` when the workload's own SLIs (Apdex, errors) remain healthy: you have time to raise the cap.
- Escalate to `S2` if Apdex or error SLIs are violating on the same workload: the HPA cap is causing customer impact.
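The severity rule above reduces to a single check on the workload's SLIs. A minimal sketch, with a hypothetical function name and boolean inputs that an on-call engineer would read off the service dashboard:

```python
def alert_severity(apdex_violating: bool, errors_violating: bool) -> str:
    """Map workload SLI health to the severity described in this runbook."""
    if apdex_violating or errors_violating:
        return "S2"  # the HPA cap is causing customer impact
    return "S3"  # SLIs are healthy, so there is time to raise the cap
```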
Verification
- Confirm the HPA is at `maxReplicas`:

      kubectl -n <namespace> get hpa
      kubectl -n <namespace> describe hpa <name>

- Check the workload's service dashboard. If Apdex and errors are healthy, this alert alone does not warrant urgent action.
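When a namespace has many HPAs, the first check can be scripted. The sketch below is one possible approach: it reads JSON shaped like the output of `kubectl -n <namespace> get hpa -o json` and lists the HPAs whose `status.desiredReplicas` has reached `spec.maxReplicas`. The helper name and the sample data are hypothetical.

```python
import json


def hpas_at_cap(hpa_list_json: str) -> list:
    """Return names of HPAs whose desired replicas have reached maxReplicas."""
    at_cap = []
    for item in json.loads(hpa_list_json).get("items", []):
        max_replicas = item["spec"]["maxReplicas"]
        # status may be missing on a freshly created HPA; treat it as 0.
        desired = item.get("status", {}).get("desiredReplicas", 0)
        if desired >= max_replicas:
            at_cap.append(item["metadata"]["name"])
    return at_cap


# Hypothetical sample shaped like `kubectl get hpa -o json` output.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "web"}, "spec": {"maxReplicas": 10},
     "status": {"desiredReplicas": 10}},
    {"metadata": {"name": "api"}, "spec": {"maxReplicas": 20},
     "status": {"desiredReplicas": 7}},
]})

print(hpas_at_cap(SAMPLE))  # prints ['web']
```

In practice the JSON would come from the cluster rather than from `SAMPLE`.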
Recent changes
- Recent `k8s-workloads/gitlab-com` MRs: the GitLab chart HPAs live here.
- Recent `argocd/apps` MRs: platform HPAs live here.
Troubleshooting
- Determine what is driving the HPA. `kubectl -n <ns> describe hpa <name>` shows the current metric values and targets.
- Look at related saturation alerts on the same workload: `component_saturation_slo_out_of_bounds:kube_container_cpu_limit`, `:kube_container_throttling`, `:kube_container_memory_limit`. If the HPA is at cap and containers are throttled, raising `maxReplicas` will actually help.
- If SLIs are healthy and the HPA has been steadily at cap for weeks, raise `maxReplicas` in the workload's chart values. Before raising, confirm the cluster has capacity: check `GKENodeCountCritical` / `GKENodeCountHigh` for the affected pool.
- If SLIs are violating, raise a P1/P2 issue, silence briefly, and coordinate a fix (either capacity or performance).
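To reason about which metric is driving the HPA, it can help to redo the scaling arithmetic by hand from the values that `describe hpa` reports. The sketch below uses the formula from the Kubernetes HPA documentation, `ceil(currentReplicas * currentValue / targetValue)`, and picks the metric with the largest proposal. It ignores the tolerance band, stabilization windows, and the min/max clamp, and the function names and example numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float) -> int:
    """Replica count one metric would ask for, before any clamping."""
    return math.ceil(current_replicas * current_value / target_value)


def driving_metric(current_replicas: int, metrics: dict) -> tuple:
    """metrics maps name -> (current_value, target_value).

    Returns (name, proposal) for the metric asking for the most replicas,
    which is the one the HPA acts on when several metrics are configured.
    """
    proposals = {
        name: desired_replicas(current_replicas, current, target)
        for name, (current, target) in metrics.items()
    }
    name = max(proposals, key=proposals.get)
    return name, proposals[name]
```

For example, with 10 replicas, CPU at 90 against a target of 75, and a queue metric at 30 against a target of 30, `driving_metric(10, {"cpu": (90, 75), "queue": (30, 30)})` returns `("cpu", 12)`. If the proposal exceeds `maxReplicas`, the HPA is being held at the cap by that metric.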
Possible resolutions
- Raise `spec.maxReplicas` in the workload's chart values, or the corresponding HPA object.
- Right-size resource requests so each pod handles more load, reducing HPA pressure.
- Add capacity to the node pool (see `GKENodeCountCritical`) if scaling out the HPA would exhaust node capacity.
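Before raising `maxReplicas`, a back-of-the-envelope estimate can show whether the node pool has room for the extra pods. The sketch below is a simplification that considers CPU requests only and assumes free capacity can be packed perfectly; memory, per-node pod limits, and fragmentation are ignored. All names and figures are hypothetical.

```python
import math


def extra_nodes_needed(extra_pods: int, cpu_request_per_pod: float,
                       free_cpu_in_pool: float,
                       allocatable_cpu_per_node: float) -> int:
    """Nodes to add so extra_pods fit, judged by CPU requests only."""
    shortfall = extra_pods * cpu_request_per_pod - free_cpu_in_pool
    if shortfall <= 0:
        return 0  # the pool already has enough unrequested CPU
    return math.ceil(shortfall / allocatable_cpu_per_node)
```

For instance, adding 20 pods that each request 2 CPUs to a pool with 8 CPUs free and 16 allocatable CPUs per node gives `extra_nodes_needed(20, 2.0, 8.0, 16.0) == 2`. A non-zero result suggests checking the node pool's own maximum size before raising the HPA cap.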
Dependencies
- Node pool capacity: HPA scaling stalls if the cluster cannot schedule more pods.
- Custom metrics adapter (for HPAs driven by non-CPU metrics).
Escalation
- Identify the workload; escalate to its owning team.
- `#g_fleet_management` for cluster-wide issues.
Definitions
- Saturation resource definition: `libsonnet/saturation-monitoring/kube_horizontalpodautoscaler_desired_replicas.libsonnet`
- Generated rule: `mimir-rules/gitlab-gprd/kube/autogenerated-gitlab-gprd-kube-saturation-alerts.yml`
- Tunable parameters: `slos.soft` and `slos.hard` in the resource libsonnet.
Related Links
- Related alerts
- k8s-operations.md: Manually scale a Deployment
- `KubeSchedulingFailures`: related when scale-up is blocked at the node level.
- `GKENodeCountCritical` / `GKENodeCountHigh`: related capacity signals.