Gitaly is down
Symptoms
- Message in prometheus-alerts: Gitaly is down on [hostname]
1. Ensure that the file server is running
- Is the NFS file server running and accessible? Can you access it via a shell session?
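Both questions can be answered from a workstation with one non-interactive SSH login. This is a minimal sketch, assuming key-based SSH access to the file server; the helper name `check_file_server` and any host name passed to it are hypothetical. It also prints the boot time reported by `uptime -s` (procps-ng), which hints at whether the server rebooted recently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a file server accepts a non-interactive
# SSH session and report when it last booted.
# Assumes key-based SSH access; the host name is a placeholder.
check_file_server() {
  local host="$1"
  local boot_time
  # BatchMode=yes fails instead of prompting for a password;
  # ConnectTimeout bounds how long an unreachable host can block the check.
  if boot_time=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" uptime -s 2>/dev/null); then
    echo "reachable: $host (up since $boot_time)"
  else
    echo "unreachable: $host"
    return 1
  fi
}
```

For example, `check_file_server <hostname>` prints either the boot time or `unreachable: <hostname>` and exits non-zero in the second case.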
If the server rebooted
- Try to find the reason for the reboot.
  - Have a look at the Stackdriver GCE VM instance logs for cloudaudit system events and the serial console output.
- Check for zero-size object files.
  - Necessary until this gets fixed.
  - Otherwise there will be errors with pushing, cloning, the web UI, …
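```shell
# List zero-size object files at reduced I/O priority (-mtime +1 skips recently written files)
cd /var/opt/gitlab/git-data/repositories/@hashed
ionice -n 5 find . -regextype sed -regex ".*/objects/.*" -size 0 -mtime +1 > /var/tmp/zerofiles.txt

# For each affected repository, open a shell as the git user and check it
sudo -u git -s
cd <repo>
git fsck
# Delete each invalid ref reported by git fsck
git update-ref -d <invalid_ref_found_by_git_fsck>
# Re-run a full check to confirm the repository is clean
git fsck --full
```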
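To find which `<repo>` directories need checking, the zero-size file list can be reduced to unique repository paths. A minimal sketch, assuming the list was written by the `find` command above (paths relative to the `@hashed` directory); `affected_repos` and `fsck_affected_repos` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: reduce the list of zero-size object files to the unique
# repository directories that contain them.
affected_repos() {
  local list="${1:-/var/tmp/zerofiles.txt}"
  # Drop everything from "/objects/" onward, leaving the repository path,
  # then de-duplicate.
  sed 's#/objects/.*##' "$list" | sort -u
}

# Hypothetical follow-up: run git fsck in each affected repository as the git user.
fsck_affected_repos() {
  local base="${1:-/var/opt/gitlab/git-data/repositories/@hashed}"
  local list="${2:-/var/tmp/zerofiles.txt}"
  local repo
  affected_repos "$list" | while read -r repo; do
    echo "== $repo"
    sudo -u git git -C "$base/$repo" fsck
  done
}
```

`fsck_affected_repos` only reports problems; invalid refs it finds still need the manual `git update-ref -d` step.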
2. Check the Gitaly Logs
- Check Sentry for unusual errors
- Check Kibana for increased error rates
- Check the Gitaly service logs on the affected host
  - grep for `SIGSEGV` or `SIGILL` in `/var/log/gitlab/gitaly/`
- Check the Grafana dashboards for a possible cause of this outage
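The `SIGSEGV`/`SIGILL` search on the affected host can be wrapped in a small function. A sketch; `find_gitaly_crashes` is a hypothetical name, and its directory argument defaults to `/var/log/gitlab/gitaly`.

```shell
#!/usr/bin/env bash
# Sketch: search the Gitaly log directory for crash signals.
find_gitaly_crashes() {
  local dir="${1:-/var/log/gitlab/gitaly}"
  # -r: recurse into the directory, -n: show line numbers,
  # -E: extended regex so "|" means alternation.
  grep -rnE 'SIGSEGV|SIGILL' "$dir"
}
```

grep exits non-zero when nothing matches. Compressed rotated log files, if the host has any, are not searched by this sketch.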
3. Ensure that the Gitaly server process is running
- Can you see the process in `ps aux | grep gitaly`?
- Is the Prometheus port responding? Does `curl http://localhost:9236/metrics` respond?
- Attempt to restart the Gitaly service: `sudo gitlab-ctl restart gitaly`
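The three checks can be combined into one triage function. A sketch, assuming the Gitaly binary runs as a process named `gitaly` and the metrics endpoint is plain HTTP on port 9236; `gitaly_triage`, `METRICS_URL` and `RESTART` are hypothetical names. It only restarts the service when `RESTART=1` is set, so running it without that variable changes nothing.

```shell
#!/usr/bin/env bash
# Sketch of the checks above, run in order. Restarting is opt-in.
gitaly_triage() {
  local url="${METRICS_URL:-http://localhost:9236/metrics}"
  local ok=0

  # 1. Is a process named exactly "gitaly" running? Unlike
  #    `ps aux | grep gitaly`, pgrep never matches its own process.
  if pgrep -x gitaly >/dev/null; then
    echo "process: OK"
  else
    echo "process: FAIL"
    ok=1
  fi

  # 2. Does the metrics endpoint answer? -f turns HTTP errors into a
  #    non-zero exit status; --max-time bounds a hung listener.
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "metrics: OK"
  else
    echo "metrics: FAIL"
    ok=1
  fi

  # 3. Restart only if a check failed and the caller opted in.
  if [ "$ok" -ne 0 ] && [ "${RESTART:-0}" = 1 ]; then
    sudo gitlab-ctl restart gitaly
  fi
  return "$ok"
}
```

A non-zero exit status from `gitaly_triage` means at least one check failed; `RESTART=1 gitaly_triage` additionally attempts the restart.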