Gitaly version mismatch
Symptoms
- Multiple versions of Gitaly are running within a fleet.
1. Figure out which versions of Gitaly are running, and on which hosts
Visit the Gitaly Version Tracker dashboard to find out which versions are running on each host.
If a deployment is currently being carried out, two versions may run alongside one another for up to 30 minutes.
Longer periods indicate a problem with the deployment. Check:
- Have hosts been skipped from the deployment?
- Is the deployment stuck?
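If the dashboard is unavailable, the same check can be approximated in shell. This is only a sketch: both helper names are hypothetical, and the input format (one `<host> <version>` pair per line, collected however suits your environment) is an assumption rather than something the dashboard exports.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for summarising Gitaly versions across a fleet.
# Input (stdin): one "<host> <version>" pair per line (assumed format).

# Print one "<version> <host-count>" line per distinct version, sorted.
summarise_versions() {
  awk 'NF >= 2 { count[$2]++ } END { for (v in count) print v, count[v] }' | sort
}

# Exit 0 when at most one distinct version is present, 1 otherwise.
fleet_is_consistent() {
  awk 'NF >= 2 { seen[$2] = 1 } END { n = 0; for (v in seen) n++; exit (n > 1) }'
}
```

A fleet that still reports more than one version 30 minutes after a deployment started is the case to investigate.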
2. Figure out which version of Gitaly is the expected version
- Run /chatops run auto_deploy status
- Click on the production revision
- Click “Browse files”
- Open “GITALY_SERVER_VERSION”
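Once the expected version is known, comparing it with what a host reports can be scripted. The following is a minimal sketch with hypothetical helper names. It assumes that the `gitaly --version` output ends with the version string and that the content of GITALY_SERVER_VERSION can be compared literally; neither is confirmed here, and the file may contain a commit SHA rather than a release number.

```shell
#!/usr/bin/env bash
# Print the last whitespace-separated field of the first line on stdin.
# Assumption: `gitaly --version` output ends with the version string.
installed_version() {
  awk 'NR == 1 { print $NF }'
}

# Compare an expected version (argument 1) with `--version` output on stdin.
# A leading "v" is ignored on both sides. Returns 0 on match, 1 otherwise.
check_version() {
  local expected="${1#v}" actual
  actual="$(installed_version)"
  actual="${actual#v}"
  if [ "$actual" = "$expected" ]; then
    echo "OK: running expected version $expected"
  else
    echo "MISMATCH: expected $expected, found $actual"
    return 1
  fi
}
```

For example: `/opt/gitlab/embedded/bin/gitaly --version | check_version "<expected version>"`. Note that this reports the binary on disk, which is not necessarily the version of the process currently serving requests.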
Known scenarios
Race condition during upgrade
Symptoms:
- The old gitaly process has forked its child, but has not exited. Note that Gitaly processes spawn many gitaly-ruby workers; do not confuse these with the new main gitaly process.
- gitlab-ctl hup gitaly fails.
- The shard should be healthy, serving requests as normal, but subsequent gitaly deployments might fail. Some requests on the affected nodes will be served by outdated gitaly versions.
Resolution:
- Examine the process table and write down the PIDs of the “old” gitaly process (“the parent”) that refuses to exit, and of its main gitaly process child.
- Ensure that the gitaly binary has been replaced with the desired new version:
  - On the affected host: /opt/gitlab/embedded/bin/gitaly --version
- Follow /var/log/gitlab/gitaly/current
  - Look for logs with a “pid” field. Both parent and child PIDs should be serving requests successfully.
  - Keep following this log file throughout the resolution.
- Run kill -9 <parent PID>.
  - Note that this might interrupt in-flight requests, but there is currently no more graceful solution to this problem.
- Run gitlab-ctl hup gitaly. This should succeed. The process tree should then appear “normal”, with one main gitaly process and a set of gitaly-ruby worker children.
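While following the log, it can help to tally which PIDs are producing entries. This is a small sketch with a hypothetical helper name. It assumes the log lines are JSON objects carrying a numeric "pid" field written as `"pid":<number>`; adjust the pattern if your log format differs.

```shell
#!/usr/bin/env bash
# Count log lines per PID.
# Input (stdin): Gitaly log lines, assumed to contain "pid":<number>.
# Output: one "<pid> <line-count>" line per PID, sorted numerically by PID.
# Lines without a matching "pid" field are ignored.
count_lines_by_pid() {
  grep -oE '"pid": ?[0-9]+' \
    | grep -oE '[0-9]+$' \
    | sort -n \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

For example: `tail -n 1000 /var/log/gitlab/gitaly/current | count_lines_by_pid`. Before the kill, both the parent and child PIDs would be expected to appear; afterwards, only the surviving main process should.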
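Example process tree from an affected host, showing the old parent gitaly process (PID 13658) and its main gitaly child (PID 12827), each with its own gitaly-ruby workers: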
[email protected]:~# ps -ef --forest | grep gitaly
root 23333 23272 0 13:00 pts/0 00:00:00 \_ grep gitaly
root 2798 2771 0 Apr09 ? 00:00:02 \_ runsv gitaly
root 30136 2798 0 Jun02 ? 00:00:28     \_ svlogd /var/log/gitlab/gitaly
git 16705 2798 0 Jul23 ? 00:00:11     \_ /opt/gitlab/embedded/bin/gitaly-wrapper /opt/gitlab/embedded/bin/gitaly /var/opt/gitlab/gitaly/config.toml
git 13658 1 2 Jul23 ? 00:38:02 /opt/gitlab/embedded/bin/gitaly /var/opt/gitlab/gitaly/config.toml
git 13688 13658 0 Jul23 ? 00:02:30 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.3
git 13689 13658 0 Jul23 ? 00:02:33 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.5
git 13697 13658 0 Jul23 ? 00:02:39 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.4
git 13699 13658 0 Jul23 ? 00:02:32 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.1
git 13701 13658 0 Jul23 ? 00:02:32 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.6
git 13705 13658 0 Jul23 ? 00:02:33 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.2
git 13706 13658 0 Jul23 ? 00:02:41 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.15
git 13708 13658 0 Jul23 ? 00:02:32 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.0
git 13710 13658 0 Jul23 ? 00:02:40 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.7
git 13716 13658 0 Jul23 ? 00:02:37 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.10
git 13723 13658 0 Jul23 ? 00:02:32 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.8
git 13724 13658 0 Jul23 ? 00:02:35 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.9
git 13725 13658 0 Jul23 ? 00:02:32 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.11
git 13726 13658 0 Jul23 ? 00:02:33 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.12
git 13727 13658 0 Jul23 ? 00:02:33 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.13
git 13728 13658 0 Jul23 ? 00:02:33 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.18
git 13731 13658 0 Jul23 ? 00:02:37 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.16
git 13740 13658 0 Jul23 ? 00:02:33 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.17
git 13741 13658 0 Jul23 ? 00:02:34 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.14
git 13744 13658 0 Jul23 ? 00:02:34 \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 13658 /var/opt/gitlab/gitaly/internal_sockets/ruby.19
git 12827 13658 5 11:54 ? 00:03:33 \_ /opt/gitlab/embedded/bin/gitaly /var/opt/gitlab/gitaly/config.toml
git 12853 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.0
git 12855 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.2
git 12866 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.1
git 12867 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.15
git 12874 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.3
git 12881 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.5
git 12883 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.4
git 12884 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.13
git 12885 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.12
git 12888 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.14
git 12890 12827 0 11:54 ? 00:00:08     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.17
git 12893 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.16
git 12897 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.8
git 12899 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.18
git 12900 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.9
git 12901 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.19
git 12903 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.6
git 12910 12827 0 11:54 ? 00:00:08     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.7
git 12912 12827 0 11:54 ? 00:00:08     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.11
git 12915 12827 0 11:54 ? 00:00:09     \_ ruby /opt/gitlab/embedded/service/gitaly-ruby/bin/gitaly-ruby 12827 /var/opt/gitlab/gitaly/internal_sockets/ruby.10
git 23079 12827 77 13:00 ? 00:00:24     \_ /opt/gitlab/embedded/bin/git --git-dir /var/opt/gitlab/git-data/repositories/@hashed/fa/53/fa539965395b8382145f8370b34eab249cf610d2d6f2943c95b9b9d08a63d4a3.git fetch --prune ssh://gitaly/internal.git +refs/*:refs/* --end-of-options
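The parent/child pair in a tree like the one above can also be picked out mechanically rather than by eye. This is a sketch with a hypothetical helper name. It assumes input in the shape produced by `ps -eo pid,ppid,args`, and that a main process is one whose command is exactly the gitaly binary path used in this runbook.

```shell
#!/usr/bin/env bash
# Find main gitaly processes whose parent is itself a main gitaly process.
# Input (stdin): lines of "<pid> <ppid> <command...>", e.g. from
#   ps -eo pid,ppid,args
# Output: one "<parent-pid> <child-pid>" line for each such pair.
# gitaly-wrapper, gitaly-ruby workers and git subprocesses are ignored,
# because their command (field 3) is not the gitaly binary itself.
find_nested_gitaly() {
  awk -v bin="/opt/gitlab/embedded/bin/gitaly" '
    $3 == bin { ppid_of[$1] = $2 }
    END {
      for (pid in ppid_of)
        if (ppid_of[pid] in ppid_of)
          print ppid_of[pid], pid
    }'
}
```

For example: `ps -eo pid,ppid,args | find_nested_gitaly`. On the host in the example this would print `13658 12827`; empty output suggests there is a single main gitaly process, which is the expected state after the resolution steps.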
Example incident: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2452
Tracking issue: https://gitlab.com/gitlab-org/gitaly/-/issues/2988