Pull mirror overdue queue is too large
Symptoms
- Message in #alerts-gprd: Large number of overdue pull mirror jobs
Background
Mirroring repositories is executed as follows:
- Sidekiq runs an `UpdateAllMirrorsWorker` job every minute.
- `UpdateAllMirrorsWorker` schedules `ProjectImportScheduleWorker` jobs in bulk; the number depends on the available mirroring capacity.
  - In other words, each minute we schedule a number of `ProjectImportScheduleWorker` jobs equal to the available mirroring capacity.
- Each `ProjectImportScheduleWorker` job schedules a `RepositoryUpdateMirrorWorker`, in which the actual mirroring happens.
- When `RepositoryUpdateMirrorWorker` runs, it adds the project ID to a Redis set. When it finishes (or fails), it removes the project ID from the set.
  - That’s how we track the available mirroring capacity, which equals the maximum mirroring capacity minus the number of project IDs in the set.
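The capacity bookkeeping described above can be sketched in plain Ruby. This is an illustration only, not GitLab's implementation: `MirrorCapacityTracker` is a hypothetical name, and an in-memory `Set` stands in for the Redis set.

```ruby
require 'set'

# Hypothetical stand-in for the Redis set of project IDs that are
# currently being mirrored. Not GitLab's actual class.
class MirrorCapacityTracker
  attr_reader :max_capacity

  def initialize(max_capacity)
    @max_capacity = max_capacity
    @in_progress = Set.new
  end

  # A mirror job started: add the project ID to the set.
  def start(project_id)
    @in_progress.add(project_id)
  end

  # A mirror job finished or failed: remove the project ID from the set.
  def finish(project_id)
    @in_progress.delete(project_id)
  end

  # available capacity = maximum capacity - number of project IDs in the set
  def available
    max_capacity - @in_progress.size
  end
end

tracker = MirrorCapacityTracker.new(5)
[101, 102, 103].each { |id| tracker.start(id) }
tracker.finish(102)
puts "available capacity: #{tracker.available}"
```

Because the IDs live in a set, starting the same project twice does not consume extra capacity. It also suggests why jobs that never finish matter: each one holds a slot until its ID is removed.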
As GitLab.com grows, the number of mirrored projects is going to grow as well. We may need to adjust mirroring capacity accordingly.
Troubleshoot
- View the repository_update_mirror dashboard.
- View the catchall dashboard.
- View the Sidekiq Queue size graph.
- This alert may just be a symptom of slow Sidekiq jobs. If there are many jobs in the queue (i.e. over 10,000 and growing), you may want to investigate the state of PgBouncer.
- Under “Running Jobs”, pay attention to `UpdateAllMirrorsWorker`. If that has gone flat, then you may need to log the state of the pending pull mirror queue.
- Check Sentry for new 500 errors relating to `UpdateAllMirrorsWorker`.
- Check the logs to see whether a big upstream (e.g. bitbucket.org, github.com) is down or returning errors. Look for consistent hostnames, projects/repos, or errors. Note that there is a low-grade normal rate of failure here, so you’re looking for outliers.
- Check the top long-running jobs using the script below. It displays how many minutes each job has been running and its project ID. Check the projects (i.e. `Project.find(id)`) for a common pattern (e.g. they belong to the same user/group, they reside on the same shard, their upstream is the same, …).

  ```ruby
  # Collect [elapsed_seconds, job] for every job currently running on the
  # repository_update_mirror queue.
  jobs = []
  Sidekiq::Workers.new.each do |_process, _thread, msg|
    next unless msg['queue'] == 'repository_update_mirror'

    job = Sidekiq::Job.new(msg['payload'])
    jobs << [Time.now - Time.at(msg['run_at']), job]
  end

  # Print the 25 longest-running jobs as "minutes | args".
  jobs.sort_by { |(t, _job)| t }.reverse.first(25).each do |(t, job)|
    puts "#{t / 60} | #{job.args}"
  end; nil
  ```
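Spotting a common pattern among the long-running jobs can be done by counting how often each attribute value occurs. The sketch below uses made-up data and hypothetical names (`LongRunningJob`, `common_patterns`, and the `namespace`, `shard`, and `upstream_host` fields). In practice the values would come from looking up each project ID by hand, and the field names here are assumptions rather than GitLab model attributes.

```ruby
# Hypothetical record for one long-running mirror job; the fields are
# illustrative and would be filled in manually after inspecting each project.
LongRunningJob = Struct.new(:project_id, :minutes, :namespace, :shard, :upstream_host)

# For each attribute, count how many jobs share each value and keep the
# three most frequent values, most common first.
def common_patterns(jobs, keys = %i[namespace shard upstream_host])
  keys.to_h do |key|
    counts = jobs.map { |job| job[key] }.tally
    [key, counts.sort_by { |_value, count| -count }.first(3)]
  end
end

# Made-up sample data for illustration only.
jobs = [
  LongRunningJob.new(1, 95.0, 'group-a', 'shard-01', 'github.com'),
  LongRunningJob.new(2, 80.5, 'group-a', 'shard-02', 'github.com'),
  LongRunningJob.new(3, 62.0, 'group-b', 'shard-02', 'github.com'),
  LongRunningJob.new(4, 41.3, 'group-a', 'shard-02', 'bitbucket.org')
]

common_patterns(jobs).each do |key, top|
  puts "#{key}: #{top.map { |value, count| "#{value} (#{count})" }.join(', ')}"
end
```

A value that dominates one attribute (for example, most jobs sharing one upstream host) is a reasonable place to start investigating. An even spread suggests the slowdown is not specific to one group, shard, or upstream.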