Gitaly error rate is too high

  • Message in prometheus-alerts: “Gitaly error rate is too high”

1. Ensure that the same version of Gitaly is running across the entire fleet

  • Visit the Gitaly Version Tracker Grafana dashboard.
  • Ensure that the entire fleet is running the same major and minor versions of Gitaly. The build time tag on the version should be ignored until gitlab-org/gitaly#388 is resolved.
  • The only time that the fleet should be running mixed versions of Gitaly is during the deployment process.
    • During a deploy, it is important that the storage tier (NFS servers) is upgraded before the front-end tier.
    • Otherwise, it’s likely that front-end servers will make requests that the NFS servers are unable to fulfill.
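
The version check above can be sketched in code. This is a minimal illustration, not part of the dashboard: it assumes you have copied host names and version strings out of the dashboard by hand, and the function names, host names, and example version format are all hypothetical. It compares only the major and minor components, so the build time tag is ignored.

```python
import re
from collections import defaultdict

# Matches the leading "major.minor" of a version string, e.g. "0.33.0-<build tag>".
# The exact version format is an assumption; adjust the pattern if yours differs.
_MAJOR_MINOR = re.compile(r"^v?(\d+)\.(\d+)")


def group_by_major_minor(versions_by_host):
    """Group hosts by the (major, minor) part of their Gitaly version.

    `versions_by_host` is a hypothetical mapping of host name -> version string.
    Patch level and any build time tag are deliberately ignored.
    """
    groups = defaultdict(list)
    for host, version in versions_by_host.items():
        match = _MAJOR_MINOR.match(version.strip())
        if match is None:
            raise ValueError(f"unrecognised version for {host}: {version!r}")
        key = (int(match.group(1)), int(match.group(2)))
        groups[key].append(host)
    return dict(groups)


def fleet_is_consistent(versions_by_host):
    """Return True when every host reports the same major and minor version."""
    return len(group_by_major_minor(versions_by_host)) <= 1
```

If `fleet_is_consistent` returns False outside a deploy, the keys of `group_by_major_minor` show which hosts are on which version.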
sudo less /var/log/gitlab/gitaly/current
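
Paging through the log by hand is slow on a busy shard. As a rough aid, the sketch below tallies error-level entries per project. It assumes that Gitaly is writing structured JSON log lines (possibly with a timestamp prefix) and that the fields are named `level` and `grpc.request.glProjectPath`. Both field names are assumptions, so check a sample line from your own log and pass different names if needed.

```python
import json
from collections import Counter


def tally_errors(lines, project_field="grpc.request.glProjectPath", level_field="level"):
    """Count error-level log entries per project.

    Each line is expected to contain one JSON object, optionally preceded by
    a timestamp prefix. The default field names are assumptions about the
    log format; lines that are not valid JSON are skipped.
    """
    counts = Counter()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # not a structured entry
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        if entry.get(level_field) != "error":
            continue
        counts[entry.get(project_field, "<unknown>")] += 1
    return counts
```

Passing an open file handle for the log as `lines` and calling `.most_common(10)` on the result gives the ten projects with the most errors, which helps show whether the errors are concentrated in a few projects.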

The Prometheus alert should be specific to one Gitaly shard.

The following sections list some common causes of elevated Gitaly shard error rates and the steps to diagnose them.

  • Are the errors associated with only a few projects?
  • What is the origin of the Gitaly requests?
    • Check this pie chart and filter down to the relevant paths (/namespace/project).
  • If there are a lot of requests to RawController:
    • Using your admin account, take a look at the requested paths.
    • The RawController is a legitimate endpoint that we offer, but it can be quite expensive to serve, and we don’t expect a high request rate to it under normal use.
    • Consider making the relevant project private and/or blocking the owner, then engage support to contact the user.
    • If abuse is suspected (e.g. if the repository contains copyrighted media files) then engage the abuse team.
  • Check whether the repository cgroup is reaching its memory limit and getting OOM-killed.
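
For the cgroup check, one possible approach is to read the cgroup's `memory.events` file. This is a sketch that assumes cgroup v2; on cgroup v1 the files are different. The location of the repository cgroup on the host is not covered here and has to be found separately. In `memory.events`, `max` counts how often usage hit the memory limit and `oom_kill` counts processes in the cgroup killed by the OOM killer.

```python
def parse_memory_events(text):
    """Parse the contents of a cgroup v2 `memory.events` file into a dict.

    The file holds one "<name> <count>" pair per line; malformed lines are ignored.
    """
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def oom_kill_count(text):
    """Return the number of OOM kills recorded, or 0 if the counter is absent."""
    return parse_memory_events(text).get("oom_kill", 0)
```

A non-zero `oom_kill` value that grows between two reads suggests the cgroup is actively being OOM-killed, while a rising `max` with no kills suggests it is running close to its limit.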