Runner Managers
Overview
The Runner Manager fleet is responsible for creating the ephemeral runners that carry out CI jobs for GitLab.com.
The Runner Manager fleet uses a blue/green deployment strategy that can be leveraged to apply security patches to the set of instances that are not currently active without disruption to service.
Lead Time
Because there should always be an inactive set of runner instances, there should be minimal lead time required to begin patching these systems.
Process
See Linux Patching Overview for generic processes applied to all Linux systems.
See the Additional Automation Tooling section below for how to execute the runner-specific patching process on a given shard.
We will take advantage of the Runner Manager’s blue/green deployment to apply patches to the currently inactive color, make that color active, then apply patches to the color that was removed from active service.
- Identify the currently inactive color
  - Select the current shard on the ci-runners: Deployment overview dashboard. The active color will appear in the deployment column of the GitLab Runner Versions panel.
- Initiate package updates across the nodes of the inactive color.
- Reboot the patched nodes.
- Perform a deployment to make this color active.
- Wait for the now-inactive color to completely drain.
  - You can use this Prometheus query to help determine when the inactive color of a given shard is no longer processing jobs.
- Patch and reboot these instances.
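The ordering of the steps above can be sketched in a few lines of Python. This is only an illustration of the sequence, not tooling that exists in this runbook: the function names `other_color` and `patch_plan` and the action labels are hypothetical, and the sketch assumes the two colors are literally named "blue" and "green".

```python
# Illustrative sketch only: these helpers are hypothetical and not part of
# any GitLab tooling. They model the order of the blue/green patch steps.
COLORS = ("blue", "green")  # assumed color names


def other_color(color: str) -> str:
    """Return the opposite color of the blue/green pair."""
    if color not in COLORS:
        raise ValueError(f"unknown color: {color!r}")
    return COLORS[1 - COLORS.index(color)]


def patch_plan(active_color: str) -> list[tuple[str, str]]:
    """Return (action, color) steps in the order described in the process."""
    inactive = other_color(active_color)
    return [
        ("patch", inactive),       # update packages on the inactive color
        ("reboot", inactive),
        ("deploy", inactive),      # make the patched color active
        ("drain", active_color),   # wait for the old color to stop running jobs
        ("patch", active_color),
        ("reboot", active_color),
    ]


if __name__ == "__main__":
    for step, (action, color) in enumerate(patch_plan("blue"), start=1):
        print(f"{step}. {action} {color}")
```

The point of the sketch is that the currently active color is never touched until the freshly patched color has taken over and the old one has drained.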
CI COS runner images
The OS images deployed for ephemeral runner VMs are statically defined via Chef roles in chef-repo. Example. Updating these Chef attributes will change the deployed image used by the ephemeral runners for a given shard.
Additional Automation Tooling
There is a Slack command that can be executed from the #production Slack channel to initiate patching of the individual runner shards. To use this:
- Identify the shard and inactive color you want to patch.
- In the #production Slack channel:
  - Issue command: /runner run system-patch-dry-run <shard> <color>
  - Verify the upgraded package lists don’t contain anything unexpected.
  - Issue command: /runner run system-patch <shard> <color>
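To avoid typos when pasting into Slack, the two command strings can be generated from the shard and color. The following is a minimal sketch: the helper `patch_commands` is hypothetical, the assumption that `<color>` takes the literal values "blue" or "green" is not confirmed by this page, and only the `/runner run ...` command text itself comes from the steps above.

```python
# Illustrative sketch only: patch_commands is a hypothetical helper that
# builds the Slack command strings shown above. It does not talk to Slack.
VALID_COLORS = {"blue", "green"}  # assumed values for <color>


def patch_commands(shard: str, color: str) -> list[str]:
    """Return the dry-run command followed by the real patch command."""
    if not shard or any(ch.isspace() for ch in shard):
        raise ValueError("shard must be a single non-empty word")
    if color not in VALID_COLORS:
        raise ValueError(f"color must be one of {sorted(VALID_COLORS)}")
    return [
        f"/runner run system-patch-dry-run {shard} {color}",
        f"/runner run system-patch {shard} {color}",
    ]


if __name__ == "__main__":
    # "example-shard" is a placeholder, not a real shard name.
    for command in patch_commands("example-shard", "green"):
        print(command)
```

The dry-run command is always listed first so that the package list can be reviewed before the real patch run is issued.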