
Hosted Runners Debugging Guide

Debugging a hosted runner involves two main steps:

  1. Verifying a runner-manager’s ability to spin up ephemeral VMs.
  2. Ensuring the ephemeral VMs can connect to GitLab.com or the CI Gateway.

For a visual walkthrough, check out this video: Hosted Runners Testing.


The most challenging aspect of testing runner-managers is composing the docker-machine command with all the required custom options. These options vary by manager, so we’ve created a helper script, generate-create-machine.sh, to automate this process.

This script is typically located in the /tmp folder of runner-manager VMs. It generates another script based on the configurations in the /etc/gitlab-runner/config.toml file of each runner-manager.
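
For orientation, here is a minimal sketch of how such a command might be composed. It is not the actual generate-create-machine.sh; it assumes each entry of MachineOptions in config.toml is passed to docker-machine create with a leading "--". The function name compose_create_command and the option values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a docker-machine create command built from
# a driver name, a machine name, and a list of MachineOptions entries.
# Nothing is executed; the command is only printed for review.
compose_create_command() {
  local driver="$1" machine="$2"
  shift 2
  local cmd="docker-machine create --driver ${driver}"
  local opt
  for opt in "$@"; do
    # Assumption: each option is stored as "key=value" and becomes "--key=value".
    cmd+=" --${opt}"
  done
  printf '%s %s\n' "$cmd" "$machine"
}
```

Called as compose_create_command google test1 "google-project=example-project" "google-zone=us-east1-c", the sketch prints a single docker-machine create line ending in test1. The project and zone here are placeholders; the real values come from each manager's config.toml.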

$ sudo su
# cd /tmp
# export VM_MACHINE=test1
# ./generate-create-machine.sh # Generate create-machine.sh from config.toml
# less create-machine.sh # Review the generated script
# ./create-machine.sh # Run the script
tmp# ./create-machine.sh
Running pre-create checks...
(test1) Check that the project exists
(test1) Check if the instance already exists
Creating machine...
(test1) Generating SSH Key
(test1) Creating host...
(test1) Opening firewall ports
(test1) Creating instance
(test1) Waiting for Instance
(test1) Uploading SSH Key
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with cos...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To connect your Docker Client to the Docker Engine running on this VM, run: docker-machine env test1

Once the ephemeral VM is created successfully, you can verify its connectivity.

# docker-machine ssh test1
cos@test1 ~ $ curl -IL https://us-east1-c.ci-gateway.int.gprd.gitlab.net:8989
cos@test1 ~ $ curl -IL https://gitlab.com
  • A successful call will return a 200 status code.

  • If any command times out, it may indicate a network misconfiguration.
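
The two checks above can be wrapped in a small helper so that a hanging connection fails fast instead of blocking the shell. This is a sketch under assumptions: the function names and the 10-second timeout are illustrative choices, not part of the existing tooling.

```shell
#!/usr/bin/env bash
# Map an HTTP status code, as printed by curl's %{http_code}, to a verdict.
classify_status() {
  case "$1" in
    200) echo "ok" ;;
    000) echo "unreachable" ;;  # curl prints 000 when no HTTP response was received
    *)   echo "unexpected" ;;
  esac
}

# Send a HEAD request, follow redirects, and give up after 10 seconds.
# Prints: <url> <status code> <verdict>, separated by tabs.
check_endpoint() {
  local url="$1" code
  code="$(curl -sIL --max-time 10 -o /dev/null -w '%{http_code}' "$url")"
  printf '%s\t%s\t%s\n' "$url" "$code" "$(classify_status "$code")"
}
```

On the ephemeral VM, this could be run as check_endpoint https://gitlab.com and check_endpoint https://us-east1-c.ci-gateway.int.gprd.gitlab.net:8989. A verdict of "unreachable" corresponds to the timeout case described above.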


If there is a problem in an existing job that is still running, it is possible to connect to it directly. Note that this should only be done for our own workloads.

The name of the runner manager that picked up the job is visible on the job's log page, either at the top right or in the log output itself.

It will look something like this:

Running with gitlab-runner 18.4.0~pre.115.gb2218bab (b2218bab)
on blue-4.saas-linux-small-amd64.runners-manager.gitlab.com/default J2nyww-sK, system ID: s_cf1798852952

This needs to be translated into the actual hostname, which in this case would be:

runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal

This mapping is implicit, but can be discovered via:

# Pick a node interactively from the Chef node list
host="$(cd ~/code/chef-repo && knife node list | grep -vE '^INFO:' | fzf -0 -1 | awk -F: '{print $1}')"
# If the selected name is not already an internal FQDN, look it up
if [[ -n $host && "$host" != *".internal" ]]
then
  host="$(cd ~/code/chef-repo && knife node show "$host" | grep -vE '^INFO:' | yq '.FQDN')"
fi

Or via:

knife search 'roles:runners-manager' --attribute 'fqdn' --attribute 'cookbook-gitlab-runner.runners.default.global.name' --format json | grep -vE '^INFO:' | jq -r '.rows[][]|[.fqdn, ."cookbook-gitlab-runner.runners.default.global.name"]|@tsv' | sort -n
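
For runner names that follow the same pattern as the example above, the translation can also be sketched as plain string manipulation. This is an assumption drawn from that single example, not a documented rule: the function name runner_name_to_fqdn is hypothetical, and the .c.gitlab-ci-155816.internal suffix is copied from the hostname shown earlier. Prefer the knife lookups when in doubt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the internal FQDN from the runner name
# printed in the job log, assuming the pattern
#   <color>.<shard>.runners-manager.gitlab.com/<suffix>
# maps to
#   runners-manager-<shard>-<color>.c.gitlab-ci-155816.internal
runner_name_to_fqdn() {
  local name="${1%%/*}"                              # drop "/default"
  local short="${name%.runners-manager.gitlab.com}"  # drop the public domain
  local color="${short%%.*}"                         # e.g. blue-4
  local shard="${short#*.}"                          # e.g. saas-linux-small-amd64
  printf 'runners-manager-%s-%s.c.gitlab-ci-155816.internal\n' "$shard" "$color"
}
```

For the log line shown earlier, runner_name_to_fqdn "blue-4.saas-linux-small-amd64.runners-manager.gitlab.com/default" yields the hostname given above.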

The name of the job VM is also in the job logs and looks like this:

Running on runner-j2nyww-sk-project-75050198-concurrent-0 via runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff...

The second part, after "via", is the name of the job VM. The first part is the prefix of the container name on that job VM.
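
Splitting that log line by hand is error-prone, so here is a small sketch that extracts both names. The function name parse_running_on is hypothetical, and it assumes the line has exactly the "Running on ... via ......" shape shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a "Running on <container> via <vm>..." log line.
# Prints the container name prefix and the job VM name, separated by a tab.
parse_running_on() {
  local line="$1"
  local rest="${line#Running on }"   # drop the leading "Running on "
  local container="${rest%% via *}"  # everything before " via "
  local vm="${rest#* via }"          # everything after " via "
  vm="${vm%...}"                     # drop the trailing "..." if present
  printf '%s\t%s\n' "$container" "$vm"
}
```

The second field is the name to pass to docker-machine ssh on the runner-manager; the first is the prefix to look for among the containers on the job VM.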

Now we have all the pieces to get a shell inside of the job.

First, SSH into the runner-manager:

ssh runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal

Next up, SSH into the job VM. We do this through docker-machine.

iwiedler@runners-manager-saas-linux-small-amd64-blue-4.c.gitlab-ci-155816.internal:~# sudo -H docker-machine ssh runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff

This is a containerd-based container-optimized OS. It is possible to run a toolbox:

cos@runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff ~ $ toolbox

Docker commands are available as well, so we can now get a shell inside the job container:

cos@runner-j2nyww-sk-s-l-s-amd64-1759673243-5f16ceff ~ $ docker exec -it runner-j2nyww-sk-project-75050198-concurrent-0-d0c939fb2a356dee-predefined bash
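
The full container name carries a suffix that does not appear in the job log, so it has to be looked up on the job VM. A minimal sketch, assuming the names all start with the prefix from the log followed by a hyphen; the function name job_containers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read container names on stdin, one per line, and
# print only those that start with "<prefix>-".
job_containers() {
  local prefix="$1" name
  while IFS= read -r name; do
    case "$name" in
      "$prefix"-*) printf '%s\n' "$name" ;;
    esac
  done
}
```

On the job VM this could be fed from Docker, for example docker ps --format '{{.Names}}' | job_containers runner-j2nyww-sk-project-75050198-concurrent-0, and one of the printed names is then passed to docker exec as shown above.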

One frequent issue is a missing network configuration for the CI Gateway. Ensure that the network the ephemeral VMs run in is allowed in the CI Gateway configuration.

If problems persist, verify the VM’s network settings and access permissions.