Partial Gitaly Storage Rebalancing
To conserve space on Gitaly shards, we stop placing repositories on shards whose disk utilization is above a certain percentage (currently 80%). However, existing repositories keep growing organically, so disk utilization can still rise and potentially reach 100%.
To counteract this, we have a capacity warning (example) that is triggered whenever projected disk space utilization is expected to exceed a certain threshold. This document describes how to resolve this alert by moving large repositories from heavily-utilized shards to the least-utilized ones.
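The placement rule above can be modeled in a few lines of Ruby. This is a minimal sketch, not GitLab's placement code. The shard names and utilization fractions are hypothetical. Choosing the least-utilized eligible shard is an assumption, borrowed from how this runbook describes move destinations. The sketch also shows why the warning is still needed: a shard over the threshold is only excluded from new placements, while its existing repositories keep growing.

```ruby
# Minimal sketch of the placement rule; shard names and numbers are hypothetical.
PLACEMENT_THRESHOLD = 0.80 # 80%, the current cut-off quoted in this runbook

# Shards at or below the threshold can still receive repositories.
def eligible_shards(utilization, threshold: PLACEMENT_THRESHOLD)
  utilization.select { |_shard, used| used <= threshold }
end

# Assumption: pick the least-utilized eligible shard; nil if none qualify.
def least_utilized(utilization, threshold: PLACEMENT_THRESHOLD)
  shard, _used = eligible_shards(utilization, threshold: threshold).min_by { |_name, used| used }
  shard
end

usage = { "shard-a" => 0.91, "shard-b" => 0.62, "shard-c" => 0.79 }
p eligible_shards(usage).keys # => ["shard-b", "shard-c"]
p least_utilized(usage)       # => "shard-b"
# shard-a receives nothing new, but its existing repositories can keep growing.
```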
Procedure
- Identify the heavily-utilized shards. Go to the “gitaly service | disk_space resource” dashboard and sort the entries by “Last” in descending order. Any shard entry for the /dev/sdb disk that is listed above the “aggregated disk_space” entry is a candidate for rebalancing. If no entries are above it, consider the entries immediately after “aggregated disk_space”.
- On each identified shard, run the following:

      # ssh gitaly-01-stor-gprd.c.gitlab-gitaly-gprd-uxyz.internal
      tmux
      sudo du -hcd 3 /var/opt/gitlab/git-data/repositories/@hashed | grep G | grep \\.git | tee /tmp/du

  The du command lists repositories of 1 gigabyte or more. It takes some time to finish (usually under an hour), so we run it in a tmux session.
- Once du finishes, get the list of repositories to move out of the shard:

      sort -n /tmp/du | tail -n 50 | cut -d@ -f2 | cut -d. -f1 | sed -e 's/^/@/'

  This keeps the 50 largest entries and turns each path into its @hashed disk path. Copy the output.
- On the production Rails console, run the following script. Replace <repos> with the output copied in the previous step, and <shard> with the FQDN of the Gitaly shard being processed:

      # ssh console-01-sv-gprd.c.gitlab-production.internal
      # sudo gitlab-rails c
      # Disk paths copied in the previous step (whitespace-separated).
      hashes = '<repos>'
      # Matching repositories on this shard, without forks.
      projects = ProjectRepository.where(disk_path: hashes.split, shard: Shard.by_name('<shard>')).includes(:project).map(&:project).reject(&:forked?)
      # Schedule a storage move for each project; Rails picks the destination.
      projects.each { _1.repository_storage_moves.build(source_storage_name: _1.repository_storage).schedule }

  We exclude forks because moving a fork also copies its object pool to the destination. Since we don't specify a destination (Rails automatically picks a least-utilized shard), forks sharing one pool could land on different shards. Each of those shards would then receive its own copy of the pool, which increases disk usage instead of reducing it.
- Record the moved projects in an internal note on the warning issue for future bookkeeping.
- Repeat for the other identified shards.
- The issue is usually auto-closed within a day. If it is not, more shards need rebalancing.
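The shard-selection rule from the first step of the procedure can be sketched as a small function. The input is assumed to be the dashboard legend transcribed by hand: one “Last” value per /dev/sdb shard, plus the “aggregated disk_space” value. The runbook does not say how many entries to take when none sit above the aggregate. `FALLBACK_COUNT` is therefore a hypothetical parameter, and the shard names are made up too.

```ruby
# Hypothetical sketch of step 1; values are transcribed from the dashboard legend.
FALLBACK_COUNT = 2 # assumption: how many "immediate values" to consider

def rebalancing_candidates(shard_values, aggregated, fallback_count: FALLBACK_COUNT)
  # Sort by the "Last" column in descending order, as on the dashboard.
  sorted = shard_values.sort_by { |_shard, value| -value }
  # Shards whose value sits above the aggregated disk_space entry.
  above = sorted.select { |_shard, value| value > aggregated }
  return above.map(&:first) unless above.empty?

  # Nothing above the aggregate: fall back to the entries right after it.
  sorted.first(fallback_count).map(&:first)
end

values = { "gitaly-01" => 0.86, "gitaly-02" => 0.71, "gitaly-03" => 0.83 }
p rebalancing_candidates(values, 0.85) # => ["gitaly-01"]
p rebalancing_candidates(values, 0.90) # => ["gitaly-01", "gitaly-03"]
```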
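The `du`, `sort`, `tail`, `cut` and `sed` steps of the procedure can also be expressed in Ruby, which may help when checking what the shell pipeline keeps. This is a sketch with sample data; the helper names are illustrative and the hashes are shortened placeholders. It differs deliberately from the shell version in two ways:

- It parses sizes into bytes instead of relying on `grep G` and `sort -n`, so a repository that `du -h` reports in terabytes is not dropped.
- It removes duplicate disk paths. These would otherwise appear when both a project repository and its `.wiki.git` repository are large.

```ruby
# Hypothetical Ruby version of the du | grep | sort | tail | cut | sed steps.
UNITS = { "" => 1, "K" => 1024, "M" => 1024**2, "G" => 1024**3, "T" => 1024**4 }.freeze

# Converts a `du -h` size such as "5.2G" to bytes; nil if it cannot be parsed.
def size_in_bytes(human)
  m = human.to_s.match(/\A(\d+(?:\.\d+)?)([KMGT]?)\z/)
  m && (m[1].to_f * UNITS.fetch(m[2])).round
end

# Returns up to `limit` @hashed disk paths, largest repository first.
def largest_disk_paths(du_lines, limit: 50, min_bytes: UNITS["G"])
  entries = du_lines.filter_map do |line|
    size, path = line.chomp.split("\t", 2)
    next unless path&.end_with?(".git")     # skips the "total" line and directories
    bytes = size_in_bytes(size)
    next unless bytes && bytes >= min_bytes # keep repositories of 1 GiB or more
    disk_path = path[%r{@hashed/[^.]+}]     # drops ".git" / ".wiki.git", like cut -d. -f1
    [bytes, disk_path] if disk_path
  end
  entries.sort_by { |bytes, _| -bytes }.map(&:last).uniq.first(limit)
end

sample = [
  "3.4G\t/var/opt/gitlab/git-data/repositories/@hashed/aa/bb/aabb01.git",
  "512M\t/var/opt/gitlab/git-data/repositories/@hashed/cc/dd/ccdd02.git",
  "1.2T\t/var/opt/gitlab/git-data/repositories/@hashed/ee/ff/eeff03.git",
  "1.1G\t/var/opt/gitlab/git-data/repositories/@hashed/aa/bb/aabb01.wiki.git"
]
puts largest_disk_paths(sample).join(" ")
# prints: @hashed/ee/ff/eeff03 @hashed/aa/bb/aabb01
```

The printed list can be pasted in place of `<repos>` in the console script, because `hashes.split` splits on any whitespace.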
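The selection logic of the Rails console script can also be reasoned about with an in-memory model. `FakeProject` and `FakeProjectRepository` are hypothetical stand-ins, not GitLab classes. The sketch only mirrors the three filters the script applies: disk paths from the pasted list, records on the shard being processed, and non-forked projects. It does not schedule any moves.

```ruby
# Hypothetical stand-ins for GitLab's Project and ProjectRepository records.
FakeProject = Struct.new(:path, :forked, keyword_init: true) do
  def forked?
    forked
  end
end
FakeProjectRepository = Struct.new(:disk_path, :shard, :project, keyword_init: true)

# Mirrors the console script: match pasted disk paths on one shard, then drop forks.
def projects_to_move(records, pasted_hashes, shard)
  wanted = pasted_hashes.split # splits on any whitespace, like hashes.split
  records
    .select { |r| wanted.include?(r.disk_path) && r.shard == shard }
    .map(&:project)
    .reject(&:forked?)
end

repos = [
  FakeProjectRepository.new(disk_path: "@hashed/aa/bb/aabb01", shard: "gitaly-01",
                            project: FakeProject.new(path: "group/big-repo", forked: false)),
  FakeProjectRepository.new(disk_path: "@hashed/cc/dd/ccdd02", shard: "gitaly-01",
                            project: FakeProject.new(path: "user/big-fork", forked: true)),
  FakeProjectRepository.new(disk_path: "@hashed/ee/ff/eeff03", shard: "gitaly-02",
                            project: FakeProject.new(path: "group/elsewhere", forked: false))
]
pasted = "@hashed/aa/bb/aabb01\n@hashed/cc/dd/ccdd02\n@hashed/ee/ff/eeff03"
p projects_to_move(repos, pasted, "gitaly-01").map(&:path) # => ["group/big-repo"]
```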