Hosted Runners Debugging Guide
Debugging a hosted runner involves two main steps:
- Verifying a runner-manager’s ability to spin up ephemeral VMs.
- Ensuring the ephemeral VMs can connect to GitLab.com or the CI Gateway.
Quick Overview
For a visual walkthrough, check out this video: Hosted Runners Testing.
Part 1: Testing Ephemeral VM Creation
The most challenging aspect of testing runner-managers is composing the docker-machine command with all the required custom options. These options vary by manager, so we’ve created handy scripts to automate this process.
Using generate-create-machine.sh
This script is typically located in the /tmp folder of runner-manager VMs. It generates a second script, create-machine.sh, from the configuration in each runner-manager’s /etc/gitlab-runner/config.toml file.
Steps to Run
    $ sudo su
    # cd /tmp
    # export VM_MACHINE=test1
    # less create-machine.sh   # Review the generated script
    # ./create-machine.sh      # Run the script
Example Output of a Successful Run
    tmp# ./create-machine.sh
    Running pre-create checks...
    (test1) Check that the project exists
    (test1) Check if the instance already exists
    Creating machine...
    (test1) Generating SSH Key
    (test1) Creating host...
    (test1) Opening firewall ports
    (test1) Creating instance
    (test1) Waiting for Instance
    (test1) Uploading SSH Key
    Waiting for machine to be running, this may take a few minutes...
    Detecting operating system of created instance...
    Waiting for SSH to be available...
    Detecting the provisioner...
    Provisioning with cos...
    Copying certs to the local machine directory...
    Copying certs to the remote machine...
    Setting Docker configuration on the remote daemon...
    Checking connection to Docker...
    Docker is up and running!
    To connect your Docker Client to the Docker Engine running on this VM, run: docker-machine env test1
Part 2: Testing Ephemeral VM Connectivity
Once the ephemeral VM is created successfully, you can verify its connectivity.
Steps to Test Connectivity
    # docker-machine ssh test1
    cos@test1 ~ $ curl -IL https://us-east1-c.ci-gateway.int.gprd.gitlab.net:8989
    cos@test1 ~ $ curl -IL https://gitlab.com
Expected Outcome
- A successful call will return a 200 status code.
- If any command times out, it may indicate a network misconfiguration.
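The check above can be wrapped in a small helper so that each endpoint yields a single status line. This is a minimal sketch, not part of the existing tooling: the function names `status_ok` and `check_endpoint` and the 10-second default timeout are assumptions made for illustration, and it assumes `curl` is available on the ephemeral VM, as in the steps above.

```shell
# Hypothetical helper: succeeds only for the status code the guide expects.
status_ok() {
  [ "$1" = "200" ]
}

# Hypothetical helper: prints "<url> <status>" and returns non-zero unless
# the final status (after redirects) is 200.
# --max-time bounds the wait; curl reports 000 when no HTTP response arrived.
check_endpoint() {
  local url="$1" timeout="${2:-10}" code
  code="$(curl -sIL -o /dev/null --max-time "$timeout" -w '%{http_code}' "$url")"
  echo "$url $code"
  status_ok "$code"
}
```

From the VM shell, `check_endpoint https://gitlab.com` would print the URL followed by its status code. A status of 000 means curl received no HTTP response at all, which is the case to investigate as a possible network misconfiguration.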
Part 3: Connecting to a running job
If there is a problem in an existing job that is still running, it is possible to connect to it directly. Note that this should only be done for our own workloads.
Get the runner-manager
The runner-manager is shown on the web page for the job logs, either at the top right or in the logs themselves.
It will look something like this:
    Running with gitlab-runner 18.4.0~pre.115.gb2218bab (b2218bab) on blue-4.saas-linux-small-amd64.runners-manager.gitlab.com/default J2nyww-sK, system ID: s_cf1798852952
This needs to be translated into the actual hostname, which in this case would be:
    runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal
This mapping is implicit, but can be discovered via:
    # Pick a node interactively from the Chef node list.
    host="$(cd ~/code/chef-repo && knife node list | grep -vE '^INFO:' | fzf -0 -1 | awk -F: '{print $1}')"
    # If the selection is not already an internal FQDN, look it up.
    if [[ -n $host && "$host" != *".internal" ]]
    then
      host="$(cd ~/code/chef-repo && knife node show "$host" | grep -vE '^INFO:' | yq '.FQDN')"
    fi
Or via:
    knife search 'roles:runners-manager' --attribute 'fqdn' --attribute 'cookbook-gitlab-runner.runners.default.global.name' --format json | grep -vE '^INFO:' | jq -r '.rows[].[]|[.fqdn, ."cookbook-gitlab-runner.runners.default.global.name"]|@tsv' | sort -n
Get runner (job VM) and container
This is also in the job logs and looks like this:
    Running on runner-j2nyww-sk-project-75050198-concurrent-0 via runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff...
The first part (after "Running on") is the container name on the job VM; the second part (after "via") is the job VM itself.
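Splitting that log line by hand is error-prone, so it can be done with shell parameter expansion. The following is a minimal sketch under the assumption that the line always has the shape `Running on <container> via <vm>...`; the function name `parse_running_on` is hypothetical. Note that in the `docker exec` example later in this guide the real container name carries an extra suffix, so the parsed container value is best treated as a prefix.

```shell
# Hypothetical helper: split a "Running on <container> via <vm>..." log line.
# Assumes exactly the format shown in the job log example above.
parse_running_on() {
  local line="$1" rest container vm
  rest="${line#Running on }"      # drop the leading "Running on "
  container="${rest%% via *}"     # everything before " via "
  vm="${rest#* via }"             # everything after " via "
  vm="${vm%...}"                  # drop the trailing literal "..."
  printf 'container=%s\nvm=%s\n' "$container" "$vm"
}
```

Passing the example log line to `parse_running_on` prints one `container=` line and one `vm=` line; the `vm` value is the name to hand to `docker-machine ssh`.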
SSH into the job
Now we have all the pieces to get a shell inside of the job.
First, SSH into the runner-manager:
    ssh runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal
Next up, SSH into the job VM. We do this through docker-machine.
    iwiedler@runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal:~# sudo -H docker-machine ssh runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff
The job VM runs a containerd-based Container-Optimized OS. It is possible to run a toolbox:
    cos@runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff ~ $ toolbox
Docker commands are also available, so we can now get a shell inside of the job container:
    cos@runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff ~ $ docker exec -it runner-j2nyww-sk-project-75050198-concurrent-0-d0c939fb2a356dee-predefined bash
Troubleshooting Tips
Common Issue: Network Misconfiguration
One frequent issue is that the ephemeral VMs’ network is missing from the CI Gateway configuration. Ensure that this network is allowed there.
If problems persist, verify the VM’s network settings and access permissions.
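When a `curl` check fails, its exit code helps narrow down which layer to look at first. The sketch below maps a few exit codes documented in the curl manual to short labels. The function name `explain_curl_exit` and the label wording are assumptions for illustration, and the labels are starting points for investigation rather than a diagnosis.

```shell
# Hypothetical helper: translate a curl exit code into a short hint.
# The numeric codes are the ones documented in the curl manual.
explain_curl_exit() {
  case "$1" in
    0)     echo "ok: request completed" ;;
    6)     echo "dns: could not resolve host" ;;
    7)     echo "connect: could not connect to host" ;;
    28)    echo "timeout: operation timed out" ;;
    35|60) echo "tls: handshake or certificate problem" ;;
    *)     echo "other: see the curl manual for exit code $1" ;;
  esac
}
```

For example, running `curl -sIL --max-time 10 https://gitlab.com >/dev/null; explain_curl_exit $?` on the ephemeral VM prints the hint for that attempt. A `timeout` or `connect` result points toward the network settings mentioned above, while a `dns` result suggests checking name resolution first.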