Gitaly Snapshot Verification
Symptoms
GitalySnapshotVerificationDelayed is fired
When this alert is fired, it means that no verification pipeline was able to complete successfully. More precisely, the `report` job wasn't able to complete, or it couldn't publish the metrics properly.
Troubleshooting
Check the scheduled pipelines in the verification project and see whether there are any failures in the first 3 stages:
- For the `restore` stage, errors are mostly GCP-related. Check the service account used for restoring the snapshots, and check whether any quotas are being hit.
- For the `verify` stage, errors could be related to SSH-ing into the machine. Check whether the restored machine is reachable by `gcloud compute ssh ...`.
- For the `report` stage, errors could be related to SSH-ing into the machine, or to requests to the blackbox endpoint failing. Check whether the Ops blackbox instance is reachable from the runner.
- In general, the first 3 stages depend on Vault running successfully to get the required tokens. Check whether the correct permissions are in place.
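The SSH reachability check for the restored machine can be scripted. The following is a minimal sketch, not part of the verification project: the function names are hypothetical, and the instance, zone, and project values must be taken from the failing pipeline. It only uses the `--zone`, `--project`, and `--command` flags of `gcloud compute ssh`, and assumes `gcloud` is installed and authenticated where it runs.

```python
import subprocess
from typing import List


def build_ssh_check_command(instance: str, zone: str, project: str) -> List[str]:
    """Build a gcloud command that opens an SSH session and runs a no-op."""
    return [
        "gcloud", "compute", "ssh", instance,
        "--zone", zone,
        "--project", project,
        "--command", "true",  # exits immediately if the login succeeds
    ]


def is_reachable(instance: str, zone: str, project: str, timeout: int = 60) -> bool:
    """Return True if the SSH no-op succeeds within the timeout (assumed 60 s)."""
    cmd = build_ssh_check_command(instance, zone, project)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # The connection hung, or gcloud is not installed on this machine.
        return False
    return result.returncode == 0
```

For example, `is_reachable("restored-vm", "us-east1-b", "example-project")` (all three values are placeholders) returns `False` when the machine cannot be reached, which points at the `verify` or `report` stage failure modes described above.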
GitalySnapshotVerificationNoRecentChanges is fired
This alert means that a verification pipeline found no repository with a recent commit. Take note of the Gitaly instance and its project name for which this alert was fired.
Troubleshooting
- The Gitaly instance could be a low-traffic one that just didn't receive any new writes in the previous day. Try triggering a new pipeline for it using these instructions.
- The Gitaly instance may not have a recent snapshot created. Track the most recent pipeline for this instance and find the name of the snapshot used in the logs of the `restore` job. See whether it is really recent or a stale snapshot.
  - If it's a stale one, then there's a problem with the scheduled snapshotting that needs addressing.
- Trigger a new verification pipeline for the Gitaly instance using the same instructions. After the `verify` stage has finished, try SSH-ing into the machine and check whether the verification script is running (using `ps`) and whether lines are being printed in `/var/tmp/git-report.log`.
  - If not, try running the script (the highlighted lines only) yourself and note any errors.
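Two of the checks above, snapshot staleness and log activity, can be sketched as small helpers. This is an illustrative sketch with hypothetical function names. It assumes the snapshot's creation time is available as an RFC 3339 string (for example, copied from the snapshot's details in GCP), and the thresholds of one day and five minutes are assumptions to adjust, not values taken from the verification project.

```python
import os
import time
from datetime import datetime, timedelta, timezone
from typing import Optional


def snapshot_is_stale(creation_timestamp: str,
                      max_age: timedelta = timedelta(days=1),
                      now: Optional[datetime] = None) -> bool:
    """Return True if the snapshot is older than max_age (assumed: 1 day)."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    created = datetime.fromisoformat(creation_timestamp.replace("Z", "+00:00"))
    if created.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        created = created.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - created > max_age


def log_is_fresh(path: str = "/var/tmp/git-report.log",
                 max_idle_seconds: float = 300.0) -> bool:
    """Return True if the log exists and was modified recently (assumed: 5 min)."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Missing or unreadable file: treat as no recent output.
        return False
    return time.time() - mtime <= max_idle_seconds
```

Run on the restored machine, `log_is_fresh()` returning `False` suggests the verification script is not producing output, which is the case where running the script by hand is worthwhile.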