GitalyRequestsDropped

Gitaly is dropping incoming requests on a node because its concurrency queue is full. When the per-RPC concurrency limiter (or the adaptive limiter) has reached its maximum queue size, new requests are rejected immediately with GRPC::ResourceExhausted. This is measured by the gitaly_requests_dropped_total counter.

Every dropped request results in a user-facing error.
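The drop behaviour described above can be illustrated with a minimal sketch. This is not Gitaly's implementation: the class `BoundedLimiter`, its fields, and the `ResourceExhausted` exception are hypothetical names chosen for illustration. It only assumes what the text states, namely a concurrency cap, a bounded queue, and immediate rejection once that queue is full.

```python
from collections import deque
from dataclasses import dataclass, field


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the gRPC ResourceExhausted status."""


@dataclass
class BoundedLimiter:
    max_concurrency: int    # requests allowed to run at once
    max_queue_size: int     # requests allowed to wait for a free slot
    in_flight: int = 0
    queue: deque = field(default_factory=deque)
    dropped_total: int = 0  # plays the role of gitaly_requests_dropped_total

    def admit(self, request_id: str) -> str:
        """Return "running" or "queued"; raise if the queue is already full."""
        if self.in_flight < self.max_concurrency:
            self.in_flight += 1
            return "running"
        if len(self.queue) < self.max_queue_size:
            self.queue.append(request_id)
            return "queued"
        # Queue is full: reject immediately and count the drop.
        self.dropped_total += 1
        raise ResourceExhausted(f"queue full, dropping {request_id}")

    def finish(self) -> None:
        """Release a slot; the oldest queued request, if any, takes it over."""
        if self.queue:
            self.queue.popleft()  # in_flight is unchanged: the slot is reused
        elif self.in_flight > 0:
            self.in_flight -= 1
```

With `max_concurrency=1` and `max_queue_size=1`, the first request runs, the second waits, and the third is rejected and increments the counter. In this sketch a drop is cheap for the server but always visible to the caller.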

The alert uses the following query, which looks for dropped requests:

sum by (fqdn, reason) (rate(gitaly_requests_dropped_total{env="gprd", fqdn="<gitaly-node-here>"}[5m]))
  • The alert fires when rate(gitaly_requests_dropped_total[5m]) > 0 for 5 minutes on any node.
  • Under normal conditions this metric is zero. Any sustained non-zero rate means requests are actively being dropped.
  • The alert is currently configured as s3 to allow a period of monitoring and tuning before it is upgraded to s2.
  • Dropped requests indicate active user impact and can escalate into a wider incident.
  • Gitaly Service Overview dashboard
  • Gitaly host detail dashboard
    • See the “gitaly per-RPC metrics” and “Adaptive limit metrics” panels.
  • Identify the affected nodes from the fqdn label and the affected RPCs from grpc_method.
  • Check node saturation (CPU, memory, disk) on the affected node.
  • Check whether a single repository or traffic source is driving the load, using Elasticsearch.
  • Review whether the adaptive limiter is backing off (see GitalyAdaptiveLimiterBackoff), which usually precedes dropped requests.
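The firing condition described above (a 5-minute rate greater than zero, sustained for 5 minutes) can be sketched as follows, to show why a brief burst of drops does not fire the alert but a sustained trickle does. This is a simplified model, not how Prometheus computes it: `simple_rate` skips Prometheus' extrapolation, and the function names, the 60-second evaluation step, and the sample format are assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample], now: float, window: float = 300.0) -> float:
    """Per-second increase over [now - window, now], without extrapolation."""
    in_window = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(in_window) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A decrease is treated as a counter reset (for example, a restart).
        increase += (cur - prev) if cur >= prev else cur
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def would_fire(samples: List[Sample], now: float, window: float = 300.0,
               hold: float = 300.0, step: float = 60.0) -> bool:
    """True if the rate was > 0 at every evaluation step in the last `hold` seconds."""
    steps = int(hold // step)
    for i in range(steps + 1):
        t = now - hold + i * step
        if simple_rate(samples, t, window) <= 0:
            return False
    return True
```

For example, a counter that grows by one every minute for ten minutes satisfies `would_fire`, while a counter that stays flat and then jumps once in the final minute does not, because the earlier evaluation steps still see a zero rate.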

Gitaly has a Tier 2 rotation. Follow the How to Escalate guidance to page the team.

Several Slack channels are also available: