GitalyRequestsDropped
Overview
Gitaly is dropping incoming requests on a node because its concurrency queue is
full. When the per-RPC concurrency limiter (or the adaptive limiter) has reached
its maximum queue size, new requests are rejected immediately with
GRPC::ResourceExhausted. This is measured by the
gitaly_requests_dropped_total counter.
A dropped request surfaces to the user as an error.
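The admit-queue-reject behavior described above can be illustrated with a toy model. This is a minimal sketch and not Gitaly's implementation: the class `ToyLimiter`, its fields `max_concurrency` and `max_queue_size`, and the `ResourceExhausted` exception are hypothetical names chosen for illustration, and the adaptive limiter is not modeled.

```python
from dataclasses import dataclass


class ResourceExhausted(Exception):
    """Stand-in for the gRPC ResourceExhausted status (hypothetical type)."""


@dataclass
class ToyLimiter:
    max_concurrency: int    # requests allowed to run at once (hypothetical name)
    max_queue_size: int     # requests allowed to wait (hypothetical name)
    in_flight: int = 0
    queued: int = 0
    dropped_total: int = 0  # plays the role of gitaly_requests_dropped_total

    def admit(self) -> str:
        """Admit one request: run it, queue it, or reject it immediately."""
        if self.in_flight < self.max_concurrency:
            self.in_flight += 1
            return "running"
        if self.queued < self.max_queue_size:
            self.queued += 1
            return "queued"
        # Queue is full: count the drop and reject without waiting.
        self.dropped_total += 1
        raise ResourceExhausted("concurrency queue is full")

    def finish(self) -> None:
        """Complete one running request and promote a queued one, if any."""
        if self.in_flight == 0:
            return
        self.in_flight -= 1
        if self.queued > 0:
            self.queued -= 1
            self.in_flight += 1
```

In this model the drop counter only moves once both the running slots and the queue are full, which is why a non-zero drop rate points at sustained overload rather than a brief spike.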
Services
- Service Overview
- Team that owns the service: Tenant Scale:Gitaly Team
Metrics
The alert uses the following query, which looks for dropped requests:
sum by (fqdn, reason) (rate(gitaly_requests_dropped_total{env="gprd", fqdn="<gitaly-node-here>"}[5m]))
- The alert fires when rate(gitaly_requests_dropped_total[5m]) > 0 for 5 minutes on any node.
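The "greater than 0 for 5 minutes" condition can be sketched as a check over consecutive rule evaluations. This is a simplified model rather than the alerting engine itself: the function name `alert_fires` is hypothetical, and the 60-second evaluation interval is an assumption that is not stated in this runbook.

```python
import math
from typing import Sequence


def alert_fires(rates: Sequence[float],
                interval_s: float = 60.0,  # assumed evaluation interval
                for_s: float = 300.0) -> bool:
    """Return True if the newest evaluations were all > 0 for at least for_s.

    rates holds one per-node rate value per rule evaluation, oldest first.
    The condition turns pending at the first positive evaluation and fires
    at the first evaluation that is at least for_s seconds later, so
    ceil(for_s / interval_s) + 1 consecutive positive values are needed.
    """
    needed = math.ceil(for_s / interval_s) + 1
    if len(rates) < needed:
        return False
    return all(r > 0 for r in rates[-needed:])
```

A single evaluation at zero inside the window resets the count, so a burst of drops that clears within five minutes does not fire the alert under this model.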
Alert Behavior
- Under normal conditions this metric is zero. Any sustained non-zero rate means requests are actively being dropped.
Severities
- Currently configured as s3 to allow a period of monitoring and tuning before upgrading to s2.
- Dropped requests indicate active user impact and can escalate into a wider incident.
Verification
- Gitaly Service Overview dashboard
- Gitaly host detail dashboard
- See the “gitaly per-RPC metrics” and “Adaptive limit metrics” panels.
- Identify the affected nodes from the fqdn label and the affected RPCs from grpc_method.
- Check node saturation (CPU, memory, disk) on the affected node.
- Check whether a single repository or traffic source is driving the load, using Elasticsearch.
- Review whether the adaptive limiter is backing off (see GitalyAdaptiveLimiterBackoff), which usually precedes dropped requests.
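When identifying affected nodes and RPCs, it can help to rank the series returned by a Prometheus instant query. The sketch below parses the standard Prometheus HTTP API JSON response and sorts by value. It assumes the query result carries fqdn and grpc_method labels; the function name `top_droppers` and the node names in the usage example are hypothetical.

```python
import json
from typing import List, Tuple


def top_droppers(response_text: str, limit: int = 5) -> List[Tuple[str, str, float]]:
    """Rank (fqdn, grpc_method, rate) rows from an instant-query response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    rows = []
    for series in payload["data"]["result"]:
        labels = series["metric"]
        _, value = series["value"]  # [timestamp, "value-as-string"]
        rows.append((labels.get("fqdn", "?"),
                     labels.get("grpc_method", "?"),
                     float(value)))
    # Highest drop rate first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]
```

A usage example with a hand-written response (the hostnames and values are made up):

```python
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"fqdn": "gitaly-example-01", "grpc_method": "PostUploadPack"},
         "value": [1700000000, "0.4"]},
        {"metric": {"fqdn": "gitaly-example-02", "grpc_method": "FindCommit"},
         "value": [1700000000, "1.5"]},
    ]},
})
for fqdn, method, rate in top_droppers(sample):
    print(f"{fqdn} {method} {rate:.2f}/s")
```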
Escalation
Gitaly has a Tier 2 rotation. Follow the How to Escalate guidance to page the team.
Several Slack channels are also available: