
Network Info

For high capacity shards (like shared) we create dedicated projects for ephemeral VMs.

All these projects have the same networking structure:

| Network Name | Subnet Name | CIDR | Purpose |
| --- | --- | --- | --- |
| ephemeral-runners | ephemeral-runners | UNIQUE CIDR | Runner manager machines |
| runners-gke | runners-gke | 10.9.4.0/24 | Primary; GKE nodes range |
| runners-gke | runners-gke | 10.8.0.0/16 | Secondary; GKE pods range |
| runners-gke | runners-gke | 10.9.0.0/22 | Secondary; GKE services range |

Please read the GCP documentation about VPC-native clusters to understand how GKE uses the different ranges of the subnet.

VPC Implementation for CI SaaS Hosted Runners


The primary goal is to address scalability challenges due to the limitations of VPC peering and to improve performance, security, and manageability by implementing a shared VPC architecture. This new architecture enables greater flexibility and scalability for ephemeral projects, each linked to a corresponding shard within GCP.

Previously, the CI SaaS infrastructure employed individual VPC peering per project. Each project was paired directly with the central network through unique VPC peering links, which resulted in a highly complex and difficult-to-manage architecture. This approach also led to quickly reaching the VPC peering limit of 70 connections in GCP, which constrained our ability to scale and required re-evaluation of our networking strategy.

[Diagram: vpc-before]

The new architecture transitions from individual VPC peerings to a Shared VPC model. This Shared VPC is managed within a central ‘gitlab-ci’ hub project, with each shard configured as a service project.

The isolation between the shared VPC networks is logical: it occurs at the application level, where each runner-manager is configured to use its own isolated network.

By using a Shared VPC, multiple projects can securely communicate through a common network structure, reducing the need for direct VPC peerings and thereby addressing the peering limits.

[Diagram: vpc-after]

The architecture provides:

  • Scalability: Allowing for additional shards without reaching peering limits.
  • Isolation: Maintaining network security and separation between different shards.
  • Improved Management: Centralizing control through a single shared VPC with consistent firewall rules and policies.

The new VPC structure provides the following performance and scalability benefits:

  • Improved Scalability: The Shared VPC approach bypasses the previous 70-peer limit, enabling the CI infrastructure to scale without hitting GCP’s peering constraints.
  • Enhanced Isolation and Security: With each shard operating within its own VPC network, network policies and firewall rules provide stronger isolation, reducing potential security risks.

Because peering automatically adds routes, it can introduce a conflict when the network “in the middle” is peered with two different subnetworks that have overlapping CIDRs. Let’s consider a few simple examples.

graph LR
  classDef subnetwork          color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
  classDef subnetwork_conflict color:#fff,fill:#555,stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;
  classDef vpc                 color:#fff,fill:#555,stroke:#000,stroke-width:2px;

  subgraph Network A
    subnetwork_A_1("10.0.1.0/24"):::subnetwork -->|part of| network_A["Network A"]:::vpc
    subnetwork_A_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_A
  end

  subgraph Network B
    subnetwork_B_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_B["Network B"]:::vpc
    subnetwork_B_1("10.0.2.0/24"):::subnetwork -->|part of| network_B
  end

  network_B ===|peering| network_A

  subnetwork_A_2 -.-|direct conflict| subnetwork_B_2

  linkStyle 5 stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;

In this example we have two networks: Network A and Network B. Each has two subnetworks. One subnetwork in each network is unique (10.0.1.0/24 in Network A and 10.0.2.0/24 in Network B). Each network also contains a second subnetwork, and both of these have exactly the same CIDR: 10.0.0.0/24.

When we try to peer these two networks directly, we get a routing conflict, because it is impossible to decide where to route traffic for 10.0.0.0/24. In GCP, where a peering must be defined from both sides, the first side of the peering will be saved but not yet activated, and GCP will then refuse to create the second side.

Conclusion: Networks peered directly can’t have conflicting CIDRs.
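The direct-peering rule can be checked before any peering is attempted. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `overlapping_pairs` is a hypothetical helper, and the CIDRs are the ones from the example above.

```python
from ipaddress import ip_network


def overlapping_pairs(subnets_a, subnets_b):
    """Return every (a, b) pair of CIDRs whose address ranges overlap."""
    nets_a = [ip_network(cidr) for cidr in subnets_a]
    nets_b = [ip_network(cidr) for cidr in subnets_b]
    return [(str(a), str(b)) for a in nets_a for b in nets_b if a.overlaps(b)]


# Subnetworks of Network A and Network B from the example above.
network_a = ["10.0.1.0/24", "10.0.0.0/24"]
network_b = ["10.0.0.0/24", "10.0.2.0/24"]

conflicts = overlapping_pairs(network_a, network_b)
print(conflicts)  # [('10.0.0.0/24', '10.0.0.0/24')]
```

A non-empty result means that peering the two networks directly would be rejected.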

Peering conflicting networks with one hop between them
graph LR
  classDef subnetwork          color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
  classDef subnetwork_conflict color:#fff,fill:#555,stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;
  classDef vpc                 color:#fff,fill:#555,stroke:#000,stroke-width:2px;

  subgraph Network A
    subnetwork_A_1("10.0.1.0/24"):::subnetwork -->|part of| network_A["Network A"]:::vpc
    subnetwork_A_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_A
  end

  subgraph Network B
    subnetwork_B_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_B["Network B"]:::vpc
    subnetwork_B_1("10.0.2.0/24"):::subnetwork -->|part of| network_B
  end

  subgraph Network C
    subnetwork_C_1("10.0.3.0/24"):::subnetwork -->|part of| network_C["Network C"]:::vpc
  end

  network_B ===|peering| network_C
  network_C ===|peering| network_A

  subnetwork_A_2 -.-|conflict in Network C| subnetwork_B_2

  linkStyle 7 stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;

Here we extend the previous example with a new network: Network C. It has only one subnetwork, with a unique CIDR: 10.0.3.0/24. Instead of peering Network A and Network B directly, we try to peer them through Network C.

For Network A there is no problem - it knows only one 10.0.0.0/24 subnetwork - its own. The same goes for Network B.

However, when we try to connect them both to Network C, it will report a conflict, because it receives routes for the 10.0.0.0/24 CIDR from two different peers. When applying this in GCP, one peering will be created successfully. The second one will fail, just as when peering conflicting networks directly.

Conclusion: Two networks peered through a third, common network also can’t have conflicting CIDRs.

Peering conflicting networks with more than one hop between them
graph LR
  classDef subnetwork color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
  classDef vpc        color:#fff,fill:#555,stroke:#000,stroke-width:2px;

  subgraph Network A
    subnetwork_A_1("10.0.1.0/24"):::subnetwork -->|part of| network_A["Network A"]:::vpc
    subnetwork_A_2("10.0.0.0/24"):::subnetwork -->|part of| network_A
  end

  subgraph Network B
    subnetwork_B_2("10.0.0.0/24"):::subnetwork -->|part of| network_B["Network B"]:::vpc
    subnetwork_B_1("10.0.2.0/24"):::subnetwork -->|part of| network_B
  end

  subgraph Network C
    subnetwork_C_1("10.0.3.0/24"):::subnetwork -->|part of| network_C["Network C"]:::vpc
  end

  subgraph Network D
    subnetwork_D_1("10.0.4.0/24"):::subnetwork -->|part of| network_D["Network D"]:::vpc
  end

  network_B ===|peering| network_D
  network_C ===|peering| network_A
  network_D ===|peering| network_C

In this example we add a fourth network: Network D. It has only one subnetwork, with a unique CIDR: 10.0.4.0/24. We also extend the peering chain by inserting Network D in the middle.

With this layout, we finally have no conflicts. Network A and Network C, which are peered directly, have no overlapping subnetworks. Network C is now peered with Network D instead of Network B, so it no longer sees the conflict from the previous example.

Network D, in turn, is peered with Network B, again without any direct overlap.

The only two subnetworks with conflicting CIDRs are now separated by two hops. Because automatic routes are added only for directly connected networks, there is no network in which two different routes for 10.0.0.0/24 would show up.

Conclusion: If you need to define conflicting CIDRs, ensure that there are at least two hops between them when peering the VPC networks. In other words: if more than one hop separates two VPC networks in the peering chain, you don’t need to worry about CIDR conflicts between those edge networks.
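The three scenarios above can be reproduced with a small model. This is a sketch under one stated assumption taken from the text: each network sees routes for its own subnetworks plus those of its directly peered networks, and nothing further. The function name `find_conflicts` and the data layout are hypothetical.

```python
from ipaddress import ip_network
from itertools import combinations


def find_conflicts(subnets, peerings):
    """Map each network to the overlapping routes it would see.

    Assumes a network sees its own subnetworks and those of directly
    peered networks only (peering routes are not passed on).
    """
    neighbours = {name: set() for name in subnets}
    for left, right in peerings:
        neighbours[left].add(right)
        neighbours[right].add(left)

    conflicts = {}
    for name in subnets:
        visible = [
            (owner, ip_network(cidr))
            for owner in sorted({name} | neighbours[name])
            for cidr in subnets[owner]
        ]
        clashes = [
            (o1, str(n1), o2, str(n2))
            for (o1, n1), (o2, n2) in combinations(visible, 2)
            if o1 != o2 and n1.overlaps(n2)
        ]
        if clashes:
            conflicts[name] = clashes
    return conflicts


SUBNETS = {
    "A": ["10.0.1.0/24", "10.0.0.0/24"],
    "B": ["10.0.0.0/24", "10.0.2.0/24"],
    "C": ["10.0.3.0/24"],
    "D": ["10.0.4.0/24"],
}

direct = find_conflicts(SUBNETS, [("B", "A")])
one_hop = find_conflicts(SUBNETS, [("B", "C"), ("C", "A")])
two_hops = find_conflicts(SUBNETS, [("B", "D"), ("C", "A"), ("D", "C")])

print(sorted(direct))   # ['A', 'B']
print(sorted(one_hop))  # ['C']
print(two_hops)         # {}
```

With direct peering both sides see the clash, with one hop only Network C does, and with two hops no network sees both 10.0.0.0/24 routes.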

Let’s consider this example layout:

graph LR
  classDef subnetwork color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
  classDef vpc        color:#fff,fill:#555,stroke:#000,stroke-width:2px;

  subgraph gitlab-ci
    ci_ci[gitlab-ci/ci]:::vpc
    ci_ci_bastion(bastion subnetwork):::subnetwork
    ci_ci_runner_managers(runner-managers subnetwork):::subnetwork
    ci_ci_ep(ephemeral-runners-private subnetwork):::subnetwork
    ci_ci_esgo(ephemeral-runners-shared-gitlab-org subnetwork):::subnetwork

    ci_ci_gke[gitlab-ci/gke]:::vpc
    ci_ci_gke_gke(gke subnetwork):::subnetwork

    ci_ci_bastion --> ci_ci
    ci_ci_runner_managers --> ci_ci
    ci_ci_ep --> ci_ci
    ci_ci_esgo --> ci_ci
    ci_ci ===|peering| ci_ci_gke
    ci_ci_gke_gke --> ci_ci_gke
  end

  subgraph gitlab-production
    prd_gprd[gitlab-production/gprd]:::vpc
    prd_gprd_monitoring(monitoring-gprd subnetwork):::subnetwork

    prd_gprd_monitoring --> prd_gprd
  end

  subgraph gitlab-ci-plan-free-4
    ci_plan_free_4_ephemeral[gitlab-ci-plan-free-4/ephemeral-runners]:::vpc
    ci_plan_free_4_ephemeral_e(ephemeral-runners subnetwork):::subnetwork

    ci_plan_free_4_gke[gitlab-ci-plan-free-4/gke]:::vpc
    ci_plan_free_4_gke_gke(gke subnetwork):::subnetwork

    ci_plan_free_4_ephemeral_e --> ci_plan_free_4_ephemeral
    ci_plan_free_4_ephemeral ===|peering| ci_plan_free_4_gke
    ci_plan_free_4_gke_gke --> ci_plan_free_4_gke
  end

  subgraph gitlab-ci-plan-free-3
    ci_plan_free_3_ephemeral[gitlab-ci-plan-free-3/ephemeral-runners]:::vpc
    ci_plan_free_3_ephemeral_e(ephemeral-runners subnetwork):::subnetwork

    ci_plan_free_3_gke[gitlab-ci-plan-free-3/gke]:::vpc
    ci_plan_free_3_gke_gke(gke subnetwork):::subnetwork

    ci_plan_free_3_ephemeral_e --> ci_plan_free_3_ephemeral
    ci_plan_free_3_ephemeral ===|peering| ci_plan_free_3_gke
    ci_plan_free_3_gke_gke --> ci_plan_free_3_gke
  end

  ci_ci ===|temporary peering| prd_gprd
  ci_ci ===|peering| ci_plan_free_3_ephemeral
  ci_ci ===|peering| ci_plan_free_4_ephemeral

  linkStyle 13 stroke:#0a0,stroke-width:4px;

The gitlab-ci-plan-free-3 project has two networks that are peered: ephemeral-runners and gke. They are peered because Prometheus in the gke network needs to be able to scrape the node exporter on ephemeral VMs in the ephemeral-runners network.

As it’s a direct peering, the networks can’t have conflicting CIDRs.

The same goes for the gitlab-ci-plan-free-4 project.

The ephemeral-runners networks from gitlab-ci-plan-free-3 and gitlab-ci-plan-free-4 are also peered with the ci network in the gitlab-ci project. This is done because runner managers in the runner-managers subnetwork need to be able to communicate with ephemeral VMs created in the gitlab-ci-plan-free-X projects.

Here we have a mix of direct peering and peering with one hop:

  • gitlab-ci/ci and gitlab-ci-plan-free-3/ephemeral-runners are peered directly, so their subnetworks can’t have conflicting CIDRs.
  • gitlab-ci/ci and gitlab-ci-plan-free-3/gke are peered through gitlab-ci-plan-free-3/ephemeral-runners. Their networks also can’t have conflicting CIDRs, as this would create conflict in gitlab-ci-plan-free-3/ephemeral-runners.

Also, the gitlab-ci-plan-free-X/ephemeral-runners networks are connected to each other with only one hop (gitlab-ci/ci), which means that all ephemeral-runners subnetworks need to have unique CIDRs.

The gitlab-ci-plan-free-X/gke networks are connected with more than one hop (sibling ephemeral-runners network -> gitlab-ci/ci network -> other ephemeral-runners network -> other gitlab-ci-plan-free-X/gke network), so they may have exactly the same CIDRs.
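The hop counting used in this analysis can be sketched as a breadth-first search over the peerings in the diagram. A "hop" here means a network sitting strictly between two others, matching the usage in the text; the helper names are hypothetical.

```python
from collections import deque

# Peerings taken from the example layout diagram above.
PEERINGS = [
    ("gitlab-ci/ci", "gitlab-ci/gke"),
    ("gitlab-ci/ci", "gitlab-production/gprd"),
    ("gitlab-ci/ci", "gitlab-ci-plan-free-3/ephemeral-runners"),
    ("gitlab-ci/ci", "gitlab-ci-plan-free-4/ephemeral-runners"),
    ("gitlab-ci-plan-free-3/ephemeral-runners", "gitlab-ci-plan-free-3/gke"),
    ("gitlab-ci-plan-free-4/ephemeral-runners", "gitlab-ci-plan-free-4/gke"),
]


def intermediate_hops(peerings, source, target):
    """Count networks strictly between source and target on the shortest path.

    Returns None when the two networks are not connected at all.
    """
    graph = {}
    for left, right in peerings:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)

    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, edges = queue.popleft()
        if node == target:
            return max(edges - 1, 0)
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, edges + 1))
    return None


def needs_unique_cidrs(peerings, a, b):
    """Direct peering (0 hops) or one hop means CIDRs must not overlap."""
    hops = intermediate_hops(peerings, a, b)
    return hops is not None and hops <= 1


print(intermediate_hops(
    PEERINGS, "gitlab-ci-plan-free-3/gke", "gitlab-ci-plan-free-4/gke"))  # 3
print(needs_unique_cidrs(
    PEERINGS,
    "gitlab-ci-plan-free-3/ephemeral-runners",
    "gitlab-ci-plan-free-4/ephemeral-runners"))  # True
```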

With the peering rules in mind, we’ve designed the following networking layout:

  1. Each project used for CI runners will have a dedicated gke network with a gke subnetwork. As these are never connected directly or with one hop, they will all use exactly the same CIDR, following the philosophy of “convention over configuration”.

  2. The ephemeral-runners subnetworks would conflict if they shared a CIDR, as they all have a one-hop common point in gitlab-ci/ci. This means that we need to make them unique across the whole layout. For that we will maintain a list of unique CIDRs for ephemeral-runners subnetworks. The rule must be followed regardless of whether the network is created in a dedicated project (like the ci-plan-free-X ones) or in the main gitlab-ci project.

  3. Utility subnetworks like bastion or runner-managers must not conflict with any other subnetworks. As these two subnetworks exist only in the gitlab-ci/ci network, we’ve chosen static CIDRs for them and will not change them.

  4. Until we introduce dedicated Prometheus servers for our CI projects and integrate them with our Thanos cluster, we need to use our main Prometheus server in the gitlab-production project. For that we’ve created, and need to maintain, a temporary peering between the gitlab-ci/ci and gitlab-production/gprd networks. When creating this peering we resolved all CIDR conflicts, so all is good for now, and our ephemeral-runners CIDR creation rule should ensure we don’t introduce new conflicts. We will, however, need to carefully choose the CIDR for the gke subnetworks, as there is a one-hop peering between gitlab-production/gprd and gitlab-ci/gke.

For ephemeral-runners subnetworks we’ve decided to use subsequent CIDRs, starting from 10.10.0.0/21.

A /21 network gives us room for 2046 nodes per network. If we need more, we scale up the saturated shard.

Ideally, every new CIDR should start directly after the previously reserved one ends, although that’s not the case now.
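The arithmetic behind these two statements can be sketched with Python's `ipaddress` module. The count follows the conventional "all addresses minus network and broadcast" rule that yields the 2046 figure; the cloud provider may reserve further addresses per subnet. The helper names are hypothetical.

```python
from ipaddress import ip_network


def usable_hosts(cidr):
    """Addresses left after excluding the network and broadcast addresses."""
    return ip_network(cidr).num_addresses - 2


def next_adjacent_block(cidr):
    """Return the same-sized block that starts right after `cidr` ends."""
    net = ip_network(cidr)
    return str(ip_network(f"{net.broadcast_address + 1}/{net.prefixlen}"))


print(usable_hosts("10.10.0.0/21"))         # 2046
print(next_adjacent_block("10.10.0.0/21"))  # 10.10.8.0/21
```

The second function only computes where the adjacent block would start; whether that block is actually free still has to be checked against the reserved list.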

The list below is the SSOT of the CIDRs we should use!

Please check every new range against it and keep this list up to date!

When adding any new ephemeral-runners subnetwork don’t forget to update the ci-gateway firewall!

| Environment | Network {$PROJECT}/$VPC/$SUBNETWORK | CIDR |
| --- | --- | --- |
| GCP/gl-r-saas-l-m-amd64-gpu-1 | gitlab-ci/saas-l-m-gpu-s/p1 | 10.10.48.0/21 |
| GCP/gl-r-saas-l-m-amd64-gpu-2 | gitlab-ci/saas-l-m-gpu-s/p2 | 10.10.248.0/21 |
| GCP/gl-r-saas-l-m-amd64-gpu-3 | gitlab-ci/saas-l-m-gpu-s/p3 | 10.11.8.0/21 |
| GCP/gitlab-r-saas-l-s-arm64-1 | gitlab-ci/saas-l-s-arm64/p1 | 10.12.48.0/21 |
| GCP/gitlab-r-saas-l-s-arm64-2 | gitlab-ci/saas-l-s-arm64/p2 | 10.12.56.0/21 |
| GCP/gitlab-r-saas-l-s-arm64-3 | gitlab-ci/saas-l-s-arm64/p3 | 10.12.64.0/21 |
| GCP/gitlab-ci | gitlab-ci/saas-l-p-amd64/psc | 10.12.0.0/24 |
| GCP/gitlab-r-saas-l-p-amd64-1 | gitlab-ci/saas-l-p-amd64/p1 | 10.12.8.0/21 |
| GCP/gitlab-r-saas-l-p-amd64-2 | gitlab-ci/saas-l-p-amd64/p2 | 10.12.16.0/21 |
| GCP/gitlab-r-saas-l-p-amd64-3 | gitlab-ci/saas-l-p-amd64/p3 | 10.12.24.0/21 |
| GCP/gitlab-r-saas-l-p-amd64-4 | gitlab-ci/saas-l-p-amd64/p4 | 10.12.72.0/21 |
| GCP/gitlab-r-saas-l-p-amd64-5 | gitlab-ci/saas-l-p-amd64/p5 | 10.12.80.0/21 |
| GCP/gitlab-r-saas-l-p-amd64-6 | gitlab-ci/saas-l-p-amd64/p6 | 10.12.88.0/21 |
| GCP/gitlab-r-saas-l-p-amd64-7 | gitlab-ci/saas-l-p-amd64/p7 | 10.12.96.0/21 |
| GCP/gitlab-r-saas-l-p-amd64-8 | gitlab-ci/saas-l-p-amd64/p8 | 10.12.104.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-1 | gitlab-ci/saas-l-m-amd64/p1 | 10.13.64.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-2 | gitlab-ci/saas-l-m-amd64/p2 | 10.13.72.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-3 | gitlab-ci/saas-l-m-amd64/p3 | 10.13.80.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-4 | gitlab-ci/saas-l-m-amd64/p4 | 10.13.88.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-5 | gitlab-ci/saas-l-m-amd64/p5 | 10.13.96.0/21 |
| GCP/gitlab-r-saas-l-l-amd64-1 | gitlab-ci/saas-l-l-amd64/p1 | 10.13.104.0/21 |
| GCP/gitlab-r-saas-l-l-amd64-2 | gitlab-ci/saas-l-l-amd64/p2 | 10.13.112.0/21 |
| GCP/gitlab-r-saas-l-l-amd64-3 | gitlab-ci/saas-l-l-amd64/p3 | 10.13.120.0/21 |
| GCP/gitlab-r-saas-l-l-amd64-4 | gitlab-ci/saas-l-l-amd64/p4 | 10.13.128.0/21 |
| GCP/gitlab-r-saas-l-l-amd64-5 | gitlab-ci/saas-l-l-amd64/p5 | 10.13.136.0/21 |
| GCP/gitlab-r-saas-l-xl-amd64-1 | gitlab-ci/saas-l-xl-amd64/p1 | 10.13.144.0/21 |
| GCP/gitlab-r-saas-l-xl-amd64-2 | gitlab-ci/saas-l-xl-amd64/p2 | 10.13.152.0/21 |
| GCP/gitlab-r-saas-l-xl-amd64-3 | gitlab-ci/saas-l-xl-amd64/p3 | 10.13.160.0/21 |
| GCP/gitlab-r-saas-l-xl-amd64-4 | gitlab-ci/saas-l-xl-amd64/p4 | 10.13.168.0/21 |
| GCP/gitlab-r-saas-l-xl-amd64-5 | gitlab-ci/saas-l-xl-amd64/p5 | 10.13.176.0/21 |
| GCP/gitlab-r-saas-l-2xl-amd64-1 | gitlab-ci/saas-l-2xl-amd64/p1 | 10.13.184.0/21 |
| GCP/gitlab-r-saas-l-2xl-amd64-2 | gitlab-ci/saas-l-2xl-amd64/p2 | 10.13.192.0/21 |
| GCP/gitlab-r-saas-l-2xl-amd64-3 | gitlab-ci/saas-l-2xl-amd64/p3 | 10.13.200.0/21 |
| GCP/gitlab-r-saas-l-2xl-amd64-4 | gitlab-ci/saas-l-2xl-amd64/p4 | 10.13.208.0/21 |
| GCP/gitlab-r-saas-l-2xl-amd64-5 | gitlab-ci/saas-l-2xl-amd64/p5 | 10.13.216.0/21 |
| GCP/gitlab-r-saas-l-m-arm64-1 | gitlab-ci/saas-l-m-arm64/p1 | 10.13.224.0/21 |
| GCP/gitlab-r-saas-l-m-arm64-2 | gitlab-ci/saas-l-m-arm64/p2 | 10.13.232.0/21 |
| GCP/gitlab-r-saas-l-m-arm64-3 | gitlab-ci/saas-l-m-arm64/p3 | 10.13.240.0/21 |
| GCP/gitlab-r-saas-l-l-arm64-1 | gitlab-ci/saas-l-l-arm64/p1 | 10.13.248.0/21 |
| GCP/gitlab-r-saas-l-l-arm64-2 | gitlab-ci/saas-l-l-arm64/p2 | 10.14.0.0/21 |
| GCP/gitlab-r-saas-l-l-arm64-3 | gitlab-ci/saas-l-l-arm64/p3 | 10.14.8.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-org-1 | gitlab-ci/saas-l-m-amd64-org/p1 | 10.14.16.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-org-2 | gitlab-ci/saas-l-m-amd64-org/p2 | 10.14.24.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-org-3 | gitlab-ci/saas-l-m-amd64-org/p3 | 10.14.32.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-org-4 | gitlab-ci/saas-l-m-amd64-org/p4 | 10.14.40.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-org-5 | gitlab-ci/saas-l-m-amd64-org/p5 | 10.14.48.0/21 |
| GCP/gitlab-r-saas-l-m-amd64-org-6 | gitlab-ci/saas-l-m-amd64-org/p6 | 10.14.56.0/21 |
| GCP/gitlab-r-saas-l-s-amd64-1 | gitlab-ci/saas-l-s-amd64/p1 | 10.14.64.0/21 |
| GCP/gitlab-r-saas-l-s-amd64-2 | gitlab-ci/saas-l-s-amd64/p2 | 10.14.72.0/21 |
| GCP/gitlab-r-saas-l-s-amd64-3 | gitlab-ci/saas-l-s-amd64/p3 | 10.14.80.0/21 |
| GCP/gitlab-r-saas-l-s-amd64-4 | gitlab-ci/saas-l-s-amd64/p4 | 10.14.88.0/21 |
| GCP/gitlab-r-saas-l-s-amd64-5 | gitlab-ci/saas-l-s-amd64/p5 | 10.14.96.0/21 |
| GCP/gitlab-r-saas-l-s-amd64-6 | gitlab-ci/saas-l-s-amd64/p6 | 10.14.104.0/21 |
| AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-blue-1 | 10.20.0.0/21 |
| AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-blue-2 | 10.20.8.0/21 |
| AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-green-1 | 10.20.16.0/21 |
| AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-green-2 | 10.20.24.0/21 |
| AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-blue-1 | 10.30.0.0/21 |
| AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-blue-2 | 10.30.8.0/21 |
| AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-green-1 | 10.30.16.0/21 |
| AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-green-2 | 10.30.24.0/21 |
| AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-blue-1 | 10.40.0.0/21 |
| AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-blue-2 | 10.40.8.0/21 |
| AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-green-1 | 10.40.16.0/21 |
| AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-green-2 | 10.40.24.0/21 |
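Before reserving a new range, it can be checked against the list above. The following is a minimal sketch: `RESERVED` holds only a small sample of rows from the table (the full list remains the SSOT), and `conflicting_reservations` is a hypothetical helper name.

```python
from ipaddress import ip_network

# A small sample of rows from the table above, keyed by network name.
RESERVED = {
    "gitlab-ci/saas-l-p-amd64/psc": "10.12.0.0/24",
    "gitlab-ci/saas-l-s-amd64/p5": "10.14.96.0/21",
    "gitlab-ci/saas-l-s-amd64/p6": "10.14.104.0/21",
    "jobs-vpc/saas-macos-staging-blue-1": "10.20.0.0/21",
}


def conflicting_reservations(candidate, reserved):
    """Return the names of reserved networks that overlap the candidate CIDR."""
    wanted = ip_network(candidate)
    return sorted(
        name for name, cidr in reserved.items()
        if wanted.overlaps(ip_network(cidr))
    )


# A range inside an existing /21 is reported as a conflict.
print(conflicting_reservations("10.14.100.0/22", RESERVED))
# ['gitlab-ci/saas-l-s-amd64/p5']

# No overlap with the sampled rows (says nothing about rows not sampled).
print(conflicting_reservations("10.14.112.0/21", RESERVED))
# []
```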

When updating the ephemeral-runners CIDRs please remember to update the firewall rules for the ci-gateway ILBs.

The rules are managed with Terraform in GPRD and GSTG environments within the google_compute_firewall resource named ci-gateway-allow-runners.

The GPRD (GitLab.com) definition can be found here.

The GSTG (staging.gitlab.com) definition can be found here.

When making any changes related to ephemeral runners, make sure to check which GitLab environments that runner supports (for example, our private runners support both GPRD and GSTG, while shared runners support only GPRD) and update the firewall rules accordingly.
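The per-environment rule can be illustrated with a short sketch that derives, for each environment, the runner CIDRs its firewall rule should allow. The real rules live in Terraform; everything in this example (shard names, CIDRs, the `SHARDS` structure and the helper name) is a hypothetical placeholder.

```python
# Hypothetical inventory: names, CIDRs and environment support are
# illustrative placeholders, not the real reservations.
SHARDS = [
    {"name": "private-example", "cidr": "10.10.0.0/21", "environments": {"gprd", "gstg"}},
    {"name": "shared-example", "cidr": "10.10.8.0/21", "environments": {"gprd"}},
]


def firewall_source_ranges(shards, environment):
    """List the runner CIDRs that the given environment's rule should allow."""
    return sorted(s["cidr"] for s in shards if environment in s["environments"])


print(firewall_source_ranges(SHARDS, "gprd"))  # ['10.10.0.0/21', '10.10.8.0/21']
print(firewall_source_ranges(SHARDS, "gstg"))  # ['10.10.0.0/21']
```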

Here you can find details about networking in different projects used by CI Runners service.

| Network Name | Subnet Name | CIDR | Purpose |
| --- | --- | --- | --- |
| default | default | 10.142.0.0/20 | All non-runner machines (managers, prometheus, etc.). In us-east1 - we don’t use this subnetwork in any other region. |
| ci | sd-exporter-ci | 10.142.16.0/24 | Monitoring subnetwork |
| ci | bastion-ci | 10.1.4.0/24 | Bastion network |
| ci | runner-managers | 10.1.5.0/24 | Network for Runner Managers (new ones) |
| runners-gke | runners-gke | 10.9.4.0/24 | Primary; GKE nodes range |
| runners-gke | runners-gke | 10.8.0.0/16 | Secondary; GKE pods range |
| runners-gke | runners-gke | 10.9.0.0/22 | Secondary; GKE services range |

The default network will be removed once we move all of the runner managers to the new infrastructure, which is being tracked by this epic.

The ci network will be getting new subnetworks for ephemeral-runners-X while working on this epic.

The runners-gke network, at least for now, is in the expected state.

| Network Name | Subnet Name | CIDR | Purpose |
| --- | --- | --- | --- |
| windows-ci | manager-subnet | 10.1.0.0/16 | Runner manager machines |
| windows-ci | executor-subnet | 10.2.0.0/16 | Ephemeral runner machines |
| windows-ci | runner-windows-ci | 10.3.0.0/24 | Runner network for ansible/packer |
| windows-ci | bastion-windows-ci | 10.3.1.0/24 | Bastion network |

The Windows project will most probably get the runners-gke network and GKE-based monitoring in the future. This is, however, not yet scheduled.

To reduce the amount of traffic that goes through the public Internet (which incurs additional costs) and to gain a small performance improvement, Runner managers and Git are set to use internal load balancers, which route the traffic through GCP internal networking.

For that we’ve created a special VPC named ci-gateway. A dedicated VPC was added to avoid peering with the main VPC of the GitLab backend, both for security reasons and to reduce the number of possible CIDR collisions.

This configuration was first tested with the private runners shard and staging.gitlab.com. It was then replicated in GPRD, for gitlab.com and the three Linux runner shards we have.

graph LR
  subgraph project::GSTG
    subgraph VPC::gstg::gstg
      gstg_haproxy_b(HaProxy us-east1-b)
      gstg_haproxy_c(HaProxy us-east1-c)
      gstg_haproxy_d(HaProxy us-east1-d)

      gstg_gitlab_backends(GitLab backend services)

      gstg_haproxy_b -->|routes direct without peering| gstg_gitlab_backends
      gstg_haproxy_c -->|routes direct without peering| gstg_gitlab_backends
      gstg_haproxy_d -->|routes direct without peering| gstg_gitlab_backends
    end

    subgraph VPC::gstg::ci-gateway
      gstg_ci_gateway_ILB_b(ILB us-east1-b)
      gstg_ci_gateway_ILB_c(ILB us-east1-c)
      gstg_ci_gateway_ILB_d(ILB us-east1-d)
    end

    gstg_ci_gateway_ILB_b -->|routes direct without peering| gstg_haproxy_b
    gstg_ci_gateway_ILB_c -->|routes direct without peering| gstg_haproxy_c
    gstg_ci_gateway_ILB_d -->|routes direct without peering| gstg_haproxy_d
  end

  subgraph project::GPRD
    subgraph VPC::gprd::gprd
      gprd_haproxy_b(HaProxy us-east1-b)
      gprd_haproxy_c(HaProxy us-east1-c)
      gprd_haproxy_d(HaProxy us-east1-d)

      gprd_gitlab_backends(GitLab backend services)

      gprd_haproxy_b -->|routes direct without peering| gprd_gitlab_backends
      gprd_haproxy_c -->|routes direct without peering| gprd_gitlab_backends
      gprd_haproxy_d -->|routes direct without peering| gprd_gitlab_backends
    end

    subgraph VPC::gprd::ci-gateway
      gprd_ci_gateway_ILB_b(ILB us-east1-b)
      gprd_ci_gateway_ILB_c(ILB us-east1-c)
      gprd_ci_gateway_ILB_d(ILB us-east1-d)
    end

    gprd_ci_gateway_ILB_b -->|routes direct without peering| gprd_haproxy_b
    gprd_ci_gateway_ILB_c -->|routes direct without peering| gprd_haproxy_c
    gprd_ci_gateway_ILB_d -->|routes direct without peering| gprd_haproxy_d
  end

  subgraph project::gitlab-ci
    subgraph VPC::gitlab-ci::ci
      subgraph runner-managers
        runners_manager_private_1[runners-manager-private-1]
        runners_manager_private_2[runners-manager-private-2]

        runners_manager_shared_gitlab_org_X[runners-manager-shared-gitlab-org-X]

        runners_manager_shared_X[runners-manager-shared-X]
      end

      subgraph example-ephemeral-vms
        private_ephemeral_vm_1
        private_ephemeral_vm_2
        shared_gitlab_org_ephemeral_vm
      end

      runners_manager_private_1 -->|manages a job on| private_ephemeral_vm_1
      runners_manager_private_2 -->|manages a job on| private_ephemeral_vm_2

      runners_manager_shared_gitlab_org_X -->|manages a job on| shared_gitlab_org_ephemeral_vm
    end
  end

  subgraph project::gitlab-ci-plan-free-X
    subgraph VPC::gitlab-ci-plan-free-X::ephemeral-runners
      shared_ephemeral_vm
    end
  end

  runners_manager_shared_X --> shared_ephemeral_vm

  runners_manager_private_1 -->|connects through VPC peering| gstg_ci_gateway_ILB_c
  runners_manager_private_2 -->|connects through VPC peering| gstg_ci_gateway_ILB_d

  runners_manager_private_1 -->|connects through VPC peering| gprd_ci_gateway_ILB_c
  runners_manager_private_2 -->|connects through VPC peering| gprd_ci_gateway_ILB_d

  runners_manager_shared_gitlab_org_X -->|connects through VPC peering| gprd_ci_gateway_ILB_c

  runners_manager_shared_X -->|connects through VPC peering| gprd_ci_gateway_ILB_d

  private_ephemeral_vm_1 -->|connects through VPC peering| gstg_ci_gateway_ILB_c
  private_ephemeral_vm_2 -->|connects through VPC peering| gprd_ci_gateway_ILB_d

  shared_gitlab_org_ephemeral_vm -->|connects through VPC peering| gprd_ci_gateway_ILB_c

  shared_ephemeral_vm -->|connects through VPC peering| gprd_ci_gateway_ILB_d

The above diagram shows a general view of how this configuration is set up.

In both GSTG and GPRD projects we’ve created a dedicated VPC named ci-gateway. This VPC contains Internal Load Balancers (ILBs) available on defined FQDNs. The VPCs are peered with CI VPCs that contain runner managers and ephemeral VMs on which the jobs are executed.

As an ILB can route traffic only to nodes in the same VPC, we had to add a small change to our HaProxy configuration. We’ve created a dedicated cluster of new HaProxy nodes provisioned with two network interfaces: in gprd and in ci-gateway VPCs. The same configuration is created in GSTG.

HaProxy got a new frontend named https_git_ci_gateway, listening on port 8989. This frontend passes the detected git+https traffic and a limited set of API endpoints (purely for Runner communication, which includes requesting a job, sending trace updates and sending job updates) to GitLab backends. Other requests are redirected with a 307 HTTP response code to staging.gitlab.com or gitlab.com, depending on the requested resource.
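The frontend's decision can be sketched as a small routing function. This is an illustration only, not the actual HaProxy configuration: the path patterns below are assumptions based on the GitLab Runner API endpoints for requesting a job, patching a trace and updating a job, plus the usual Git smart-HTTP paths, and the function name `route` is hypothetical.

```python
import re

# Assumed patterns; the real HaProxy ACLs may differ.
GIT_HTTPS = re.compile(r"\.git/(info/refs|git-upload-pack|git-receive-pack)$")
RUNNER_API = [
    ("POST", re.compile(r"^/api/v4/jobs/request$")),    # request a job
    ("PATCH", re.compile(r"^/api/v4/jobs/\d+/trace$")),  # send a trace update
    ("PUT", re.compile(r"^/api/v4/jobs/\d+$")),          # send a job update
]


def route(method, path, public_host):
    """Return ("backend", None, None) or ("redirect", 307, location)."""
    is_git = GIT_HTTPS.search(path) is not None
    is_runner_api = any(
        method == allowed and pattern.match(path)
        for allowed, pattern in RUNNER_API
    )
    if is_git or is_runner_api:
        return ("backend", None, None)
    # Everything else is sent to the public endpoint with a 307 redirect.
    return ("redirect", 307, f"https://{public_host}{path}")


print(route("POST", "/api/v4/jobs/request", "gitlab.com"))
# ('backend', None, None)
print(route("GET", "/api/v4/jobs/42/artifacts", "gitlab.com"))
# ('redirect', 307, 'https://gitlab.com/api/v4/jobs/42/artifacts')
```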

To reduce the cost that is created by traffic made across availability zones, in each project we have two ILBs - one for each availability zone (us-east1-c and us-east1-d) used by the CI fleet in the us-east1 region. Each ILB is configured to target HaProxy nodes only in its availability zone.

For that, the following FQDNs were created:

  • git-us-east1-c.ci-gateway.int.gstg.gitlab.net
  • git-us-east1-d.ci-gateway.int.gstg.gitlab.net
  • git-us-east1-c.ci-gateway.int.gprd.gitlab.net
  • git-us-east1-d.ci-gateway.int.gprd.gitlab.net

Runner nodes are configured to point to the ILBs with the url and clone_url settings. As we set our runners to operate in a specific availability zone, each of them points to the relevant ILB FQDN.
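The FQDNs listed above follow a regular pattern, so the values for a given runner can be derived from its environment and availability zone. This is a sketch with hypothetical helper names; the `https` scheme is an assumption, and no port is included because the source does not state which port the ILB exposes.

```python
def ilb_fqdn(environment, zone):
    """Build the ci-gateway ILB FQDN for an environment (gstg/gprd) and zone."""
    return f"git-{zone}.ci-gateway.int.{environment}.gitlab.net"


def runner_endpoints(environment, zone):
    """Values for the runner's url and clone_url settings (scheme assumed)."""
    base = f"https://{ilb_fqdn(environment, zone)}"
    return {"url": base, "clone_url": base}


print(ilb_fqdn("gprd", "us-east1-c"))
# git-us-east1-c.ci-gateway.int.gprd.gitlab.net
```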

GitLab Runner is configured to talk to the dedicated ILB. Communication goes through the VPC peering and reaches one of the HaProxy nodes backing the ILB. The TLS certificate is verified and Runner saves this information to configure Git in the job environment.

When a job is received, Runner starts executing it on the ephemeral VM. It configures Git to use the CAChain resolved from the initial API request. The repo URL is configured to use the ILB as GitLab’s endpoint.

When the job reaches the step in which sources are updated, the git clone operation is executed against the ILB. Communication again goes through the VPC peering and reaches one of the HaProxy nodes. The TLS certificate is verified using the CAChain resolved earlier.

When the job reaches the step in which artifacts need to be downloaded or uploaded, it also tries to talk to the ILB. However, the HaProxy frontend detects that this communication is unsupported and redirects it to the public Internet gateway of the GitLab instance that the job belongs to.

In the meantime, Runner receives job logs and transfers them back, together with job status updates, to GitLab’s API. This communication also goes through the VPC peering and the dedicated ILB.