Network Info
Ephemeral runner VMs networking
Section titled “Ephemeral runner VMs networking”For high capacity shards (like shared) we create dedicated projects for ephemeral VMs.
All these projects have the same networking structure:
| Network Name | Subnet Name | CIDR | Purpose |
|---|---|---|---|
ephemeral-runners | ephemeral-runners | UNIQUE CIDR | Runner manager machines |
runners-gke | runners-gke | 10.9.4.0/24 | Primary; GKE nodes range |
runners-gke | runners-gke | 10.8.0.0/16 | Secondary; GKE pods range |
runners-gke | runners-gke | 10.9.0.0/22 | secondary; GKE services range |
Please read GCP documentation about VPC-native clusters to understand how the different
ranges of the subnet are being used by GKE.
VPC Implementation for CI SaaS Hosted Runners
Section titled “VPC Implementation for CI SaaS Hosted Runners”The primary goal is to address scalability challenges due to the limitations of VPC peering and to improve performance, security, and manageability by implementing a shared VPC architecture. This new architecture enables greater flexibility and scalability for ephemeral projects, each linked to a corresponding shard within GCP.
VPC Architecture
Section titled “VPC Architecture”Previous Architecture
Section titled “Previous Architecture”Previously, the CI SaaS infrastructure employed individual VPC peering per project. Each project was paired directly with the central network through unique VPC peering links, which resulted in a highly complex and difficult-to-manage architecture. This approach also led to quickly reaching the VPC peering limit of 70 connections in GCP, which constrained our ability to scale and required re-evaluation of our networking strategy.

New Architecture
Section titled “New Architecture”The new architecture transitions from individual VPC peerings to a Shared VPC model. This Shared VPC is managed within a central ‘gitlab-ci’ hub project, with each shard configured as a service project.
The isolation between the shared VPC networks is a logical one, occuring on the application-level, where each runner-manager is configured to use its own isolated network.
By using a Shared VPC, multiple projects can securely communicate through a common network structure, reducing the need for direct VPC peerings and thereby addressing the peering limits.

The architecture provides:
- Scalability: Allowing for additional shards without reaching peering limits.
- Isolation: Maintaining network security and separation between different shards.
- Improved Management: Centralizing control through a single shared VPC with consistent firewall rules and policies.
Expected Benefits
Section titled “Expected Benefits”The new VPC structure provides the following performance and scalability benefits:
- Improved Scalability: The Shared VPC approach bypasses the previous 70-peer limit, enabling the CI infrastructure to scale without hitting GCP’s peering constraints.
- Enhanced Isolation and Security: With each shard operating within its own VPC network, network policies and firewall rules provide stronger isolation, reducing potential security risks.
Network peering theory and issues
Section titled “Network peering theory and issues”As peering automatically adds routes, it may introduce a conflict if the network “in the middle” have two different subnetworks with overlapping CIDR peered. Let’s consider few simple examples.
Peering conflicting networks directly
Section titled “Peering conflicting networks directly”graph LR
classDef subnetwork color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
classDef subnetwork_conflict color:#fff,fill:#555,stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;
classDef vpc color:#fff,fill:#555,stroke:#000,stroke-width:2px;
subgraph Network A
subnetwork_A_1("10.0.1.0/24"):::subnetwork -->|part of| network_A["Network A"]:::vpc
subnetwork_A_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_A
end
subgraph Network B
subnetwork_B_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_B["Network B"]:::vpc
subnetwork_B_1("10.0.2.0/24"):::subnetwork -->|part of| network_B
end
network_B ===|peering| network_A
subnetwork_A_2 -.-|direct conflict| subnetwork_B_2
linkStyle 5 stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;
In this example we have two networks: Network A and Network B. Both have two subnetworks defined. One of the
subnetworks in each of the networks is unique (10.0.1.0/24 in Network A and 10.0.2.0/24 in Network B).
Both networks contain also a second subnetwork, which have exactly the same CIDR: 10.0.0.0/24.
When trying to peer these two networks directly, we will get a routing conflict, as it will be impossible to
define where to route traffic to 10.0.0.0/24. When defining this in GCP (which requires peering definition
to be specified from both sides), first side of peering will be saved. It will be however not activated yet
and GCP will fail and reject to create the second side of the peering.
Conclusion: Networks peered directly can’t have conflicting CIDRs.
Peering conflicting networks with one hop between them
Section titled “Peering conflicting networks with one hop between them”graph LR
classDef subnetwork color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
classDef subnetwork_conflict color:#fff,fill:#555,stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;
classDef vpc color:#fff,fill:#555,stroke:#000,stroke-width:2px;
subgraph Network A
subnetwork_A_1("10.0.1.0/24"):::subnetwork -->|part of| network_A["Network A"]:::vpc
subnetwork_A_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_A
end
subgraph Network B
subnetwork_B_2("10.0.0.0/24"):::subnetwork_conflict -->|part of| network_B["Network B"]:::vpc
subnetwork_B_1("10.0.2.0/24"):::subnetwork -->|part of| network_B
end
subgraph Network C
subnetwork_C_1("10.0.3.0/24"):::subnetwork -->|part of| network_C["Network C"]:::vpc
end
network_B ===|peering| network_C
network_C ===|peering| network_A
subnetwork_A_2 -.-|conflict in Network C| subnetwork_B_2
linkStyle 7 stroke:#a00,stroke-width:2px,stroke-dasharray: 5 5;
Here we extend the previous example with a new network: Network C. It has only one subnetwork with unique CIDR:
10.0.3.0/24. Instead of peering Network A and Network B directly, we try to peer them through Network C.
For Network A there is no problem - it knows only one 10.0.0.0/24 subnetwork - its own. The same goes for
Network B.
However, when we will try to connect them both to Network C, it will report a conflict as it gets routing to
10.0.0.0/24 CIDR from two different peers. When trying to apply this in GCP, one peering will be created
successfully. The second one will fail just like in the case of peering conflicting networks directly.
Conclusion: Two networks peered through third common network also can’t have conflicting CIDRs.
Peering conflicting networks with more than one hop between them
Section titled “Peering conflicting networks with more than one hop between them”graph LR
classDef subnetwork color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
classDef vpc color:#fff,fill:#555,stroke:#000,stroke-width:2px;
subgraph Network A
subnetwork_A_1("10.0.1.0/24"):::subnetwork -->|part of| network_A["Network A"]:::vpc
subnetwork_A_2("10.0.0.0/24"):::subnetwork -->|part of| network_A
end
subgraph Network B
subnetwork_B_2("10.0.0.0/24"):::subnetwork -->|part of| network_B["Network B"]:::vpc
subnetwork_B_1("10.0.2.0/24"):::subnetwork -->|part of| network_B
end
subgraph Network C
subnetwork_C_1("10.0.3.0/24"):::subnetwork -->|part of| network_C["Network C"]:::vpc
end
subgraph Network D
subnetwork_D_1("10.0.4.0/24"):::subnetwork -->|part of| network_D["Network D"]:::vpc
end
network_B ===|peering| network_D
network_C ===|peering| network_A
network_D ===|peering| network_C
In this example we add fourth network: Network D. It has only one subnetwork with unique CIDR: 10.0.4.0/24.
We also extend the peering chain, injecting Network D in the middle.
With this layout, we finally have no conflicts. Network A connected with Network C doesn’t have any directly
overlapping subnetworks. As Network C is connected now with Network D it doesn’t create conflict for Network C
as it was in the previous example.
Then we have Network D, which is connected with Network B and again without any direct overlapping.
The two only subnetworks that have conflicting CIDRs are now separated with two hops between them. As automatic
routing is being added only for directly connected networks, we have no place where two different routes for
10.0.0.0/24 would show up.
Conclusion: If you need to define conflicting CIDRs, ensure that you have at least two hops when peering the VPC networks. Or in other words: If you have more than two hops when peering VPC networks, you don’t need to worry about CIDR conflicts between the edge networks.
Networking layout design
Section titled “Networking layout design”Let’s consider this example layout:
graph LR
classDef subnetwork color:#fff,fill:#555,stroke:#000,stroke-dasharray: 5 5;
classDef vpc color:#fff,fill:#555,stroke:#000,stroke-width:2px;
subgraph gitlab-ci
ci_ci[gitlab-ci/ci]:::vpc
ci_ci_bastion(bastion subnetwork):::subnetwork
ci_ci_runner_managers(runner-managers subnetwork):::subnetwork
ci_ci_ep(ephemeral-runners-private subnetwork):::subnetwork
ci_ci_esgo(ephemeral-runners-shared-gitlab-org subnetwork):::subnetwork
ci_ci_gke[gitlab-ci/gke]:::vpc
ci_ci_gke_gke(gke subnetwork):::subnetwork
ci_ci_bastion --> ci_ci
ci_ci_runner_managers --> ci_ci
ci_ci_ep --> ci_ci
ci_ci_esgo --> ci_ci
ci_ci ===|peering| ci_ci_gke
ci_ci_gke_gke --> ci_ci_gke
end
subgraph gitlab-production
prd_gprd[gitlab-production/gprd]:::vpc
prd_gprd_monitoring(monitoring-gprd subnetwork):::subnetwork
prd_gprd_monitoring --> prd_gprd
end
subgraph gitlab-ci-plan-free-4
ci_plan_free_4_ephemeral[gitlab-ci-plan-free-4/ephemeral-runners]:::vpc
ci_plan_free_4_ephemeral_e(ephemeral-runners subnetwork):::subnetwork
ci_plan_free_4_gke[gitlab-ci-plan-free-4/gke]:::vpc
ci_plan_free_4_gke_gke(gke subnetwork):::subnetwork
ci_plan_free_4_ephemeral_e --> ci_plan_free_4_ephemeral
ci_plan_free_4_ephemeral ===|peering| ci_plan_free_4_gke
ci_plan_free_4_gke_gke --> ci_plan_free_4_gke
end
subgraph gitlab-ci-plan-free-3
ci_plan_free_3_ephemeral[gitlab-ci-plan-free-3/ephemeral-runners]:::vpc
ci_plan_free_3_ephemeral_e(ephemeral-runners subnetwork):::subnetwork
ci_plan_free_3_gke[gitlab-ci-plan-free-3/gke]:::vpc
ci_plan_free_3_gke_gke(gke subnetwork):::subnetwork
ci_plan_free_3_ephemeral_e --> ci_plan_free_3_ephemeral
ci_plan_free_3_ephemeral ===|peering| ci_plan_free_3_gke
ci_plan_free_3_gke_gke --> ci_plan_free_3_gke
end
ci_ci ===|temporary peering| prd_gprd
ci_ci ===|peering| ci_plan_free_3_ephemeral
ci_ci ===|peering| ci_plan_free_4_ephemeral
linkStyle 13 stroke:#0a0,stroke-width:4px;
gitlab-ci-plan-free-3 project have two networks that are peered: ephemeral-runners and gke. They are peered as
Prometheus in gke network needs to be able to scrape node exporter on ephemeral VMs in ephemeral-runners network.
As it’s a direct peering, the networks can’t have conflicting CIDRS.
The same goes for gitlab-ci-plan-free-4 project.
The ephemeral-runners networks from gitlab-ci-plan-free-3 and gitlab-ci-plan-free-4 are also peered with
ci network in gitlab-ci project. This is done because runner managers in runner-managers subnetwork
need to be able to communicate with ephemeral VMs created in the gitlab-ci-plan-free-X projects.
Here we have a mix of direct peering and peering with one hop:
gitlab-ci/ciandgitlab-ci-plan-free-3/ephemeral-runnersare peered directly, so their subnetworks can’t have conflicting CIDRs.gitlab-ci/ciandgitlab-ci-plan-free-3/gkeare peered throughgitlab-ci-plan-free-3/ephemeral-runners. Their networks also can’t have conflicting CIDRs, as this would create conflict ingitlab-ci-plan-free-3/ephemeral-runners.
Also gitlab-ci-plan-free-X/ephemeral-runners are connected between each other with only one hop (gitlab-ci/ci),
which means that all ephemeral-runners subnetwork need to have unique CIDRs.
gitlab-ci-plan-free-X/gke are connected with more than one hop (sibling ephemeral-runners network -> gitlab-ci/ci
network -> other ephemeral-runners network -> other gitlab-ci-plan-free-X/gke network), they may have exactly the
same CIDRs.
Having the peering rules in minds we’ve designed such networking layout:
-
Each project used for CI runners will have a dedicated
gkenetwork withgkesubnetwork. As these are never connected directly or with one hop, they all will use exactly the same CIDR, following the philosophy of “convention over configuration”. -
The
ephemeral-runnerssubnetworks will be conflicting, as they all will have a one-hop common point ingitlab-ci/ci. This means that we need to make them unique across whole layout. For that we will maintain a list of unique CIDRs forephemeral-runnerssubnetworks. The rule needs to be followed no matter if the network is created in a dedicated project (like theci-plan-free-Xones) or in the maingitlab-ciproject. -
Utility subnetworks like
bastionorrunner-managersneed to not conflict with any other subnetworks. As we will have just these two subnetworks only ingitlab-ci/cinetwork, we’ve chosen static CIDRs for them and will not change that. -
Until we will introduce dedicated Prometheus servers for our CI projects and integrate them with our Thanos cluster, we need to use our main Prometheus server in
gitlab-productionproject. For that we’ve created and need to maintain a temporary peering betweengitlab-ci/ciandgitlab-production/gprdnetworks. When creating this peering we’ve resolved all CIDR conflicts, so all is good for now and ourephemeral-runnersCIDR creation rule should ensure we will not introduce new conflicts. We will however need to carefully chose the CIDR for thegkesubnetworks, as there is one-hop peering betweengitlab-production/gprdandgitlab-ci/gke.
ephemeral-runners unique CIDRs list
Section titled “ephemeral-runners unique CIDRs list”For ephemeral-runners subnetworks we’ve decided to use subsequent CIDRs, starting from 10.10.0.0/21.
The /21 network gives use place for 2046 nodes per network. In case we need more, we scale up the saturated
shard.
Ideally, every new CIDR should start at directly after the previously reserved one ends, although that’s not the case now.
The list bellow is the SSOT of the CIDRs we should use!
Please consult every new range with it and keep this list up-to-date!
When adding any new ephemeral-runners subnetwork don’t forget to update the
ci-gateway firewall!
| Environment | Network {$PROJECT}/$VPC/$SUBNETWORK | CIDR |
|---|---|---|
GCP/gl-r-saas-l-m-amd64-gpu-1 | gitlab-ci/saas-l-m-gpu-s/p1 | 10.10.48.0/21 |
GCP/gl-r-saas-l-m-amd64-gpu-2 | gitlab-ci/saas-l-m-gpu-s/p2 | 10.10.248.0/21 |
GCP/gl-r-saas-l-m-amd64-gpu-3 | gitlab-ci/saas-l-m-gpu-s/p3 | 10.11.8.0/21 |
GCP/gitlab-r-saas-l-s-arm64-1 | gitlab-ci/saas-l-s-arm64/p1 | 10.12.48.0/21 |
GCP/gitlab-r-saas-l-s-arm64-2 | gitlab-ci/saas-l-s-arm64/p2 | 10.12.56.0/21 |
GCP/gitlab-r-saas-l-s-arm64-3 | gitlab-ci/saas-l-s-arm64/p3 | 10.12.64.0/21 |
GCP/gitlab-ci | gitlab-ci/saas-l-p-amd64/psc | 10.12.0.0/24 |
GCP/gitlab-r-saas-l-p-amd64-1 | gitlab-ci/saas-l-p-amd64/p1 | 10.12.8.0/21 |
GCP/gitlab-r-saas-l-p-amd64-2 | gitlab-ci/saas-l-p-amd64/p2 | 10.12.16.0/21 |
GCP/gitlab-r-saas-l-p-amd64-3 | gitlab-ci/saas-l-p-amd64/p3 | 10.12.24.0/21 |
GCP/gitlab-r-saas-l-p-amd64-4 | gitlab-ci/saas-l-p-amd64/p4 | 10.12.72.0/21 |
GCP/gitlab-r-saas-l-p-amd64-5 | gitlab-ci/saas-l-p-amd64/p5 | 10.12.80.0/21 |
GCP/gitlab-r-saas-l-p-amd64-6 | gitlab-ci/saas-l-p-amd64/p6 | 10.12.88.0/21 |
GCP/gitlab-r-saas-l-p-amd64-7 | gitlab-ci/saas-l-p-amd64/p7 | 10.12.96.0/21 |
GCP/gitlab-r-saas-l-p-amd64-8 | gitlab-ci/saas-l-p-amd64/p8 | 10.12.104.0/21 |
GCP/gitlab-r-saas-l-m-amd64-1 | gitlab-ci/saas-l-m-amd64/p1 | 10.13.64.0/21 |
GCP/gitlab-r-saas-l-m-amd64-2 | gitlab-ci/saas-l-m-amd64/p2 | 10.13.72.0/21 |
GCP/gitlab-r-saas-l-m-amd64-3 | gitlab-ci/saas-l-m-amd64/p3 | 10.13.80.0/21 |
GCP/gitlab-r-saas-l-m-amd64-4 | gitlab-ci/saas-l-m-amd64/p4 | 10.13.88.0/21 |
GCP/gitlab-r-saas-l-m-amd64-5 | gitlab-ci/saas-l-m-amd64/p5 | 10.13.96.0/21 |
GCP/gitlab-r-saas-l-l-amd64-1 | gitlab-ci/saas-l-l-amd64/p1 | 10.13.104.0/21 |
GCP/gitlab-r-saas-l-l-amd64-2 | gitlab-ci/saas-l-l-amd64/p2 | 10.13.112.0/21 |
GCP/gitlab-r-saas-l-l-amd64-3 | gitlab-ci/saas-l-l-amd64/p3 | 10.13.120.0/21 |
GCP/gitlab-r-saas-l-l-amd64-4 | gitlab-ci/saas-l-l-amd64/p4 | 10.13.128.0/21 |
GCP/gitlab-r-saas-l-l-amd64-5 | gitlab-ci/saas-l-l-amd64/p5 | 10.13.136.0/21 |
GCP/gitlab-r-saas-l-xl-amd64-1 | gitlab-ci/saas-l-xl-amd64/p1 | 10.13.144.0/21 |
GCP/gitlab-r-saas-l-xl-amd64-2 | gitlab-ci/saas-l-xl-amd64/p2 | 10.13.152.0/21 |
GCP/gitlab-r-saas-l-xl-amd64-3 | gitlab-ci/saas-l-xl-amd64/p3 | 10.13.160.0/21 |
GCP/gitlab-r-saas-l-xl-amd64-4 | gitlab-ci/saas-l-xl-amd64/p4 | 10.13.168.0/21 |
GCP/gitlab-r-saas-l-xl-amd64-5 | gitlab-ci/saas-l-xl-amd64/p5 | 10.13.176.0/21 |
GCP/gitlab-r-saas-l-2xl-amd64-1 | gitlab-ci/saas-l-2xl-amd64/p1 | 10.13.184.0/21 |
GCP/gitlab-r-saas-l-2xl-amd64-2 | gitlab-ci/saas-l-2xl-amd64/p2 | 10.13.192.0/21 |
GCP/gitlab-r-saas-l-2xl-amd64-3 | gitlab-ci/saas-l-2xl-amd64/p3 | 10.13.200.0/21 |
GCP/gitlab-r-saas-l-2xl-amd64-4 | gitlab-ci/saas-l-2xl-amd64/p4 | 10.13.208.0/21 |
GCP/gitlab-r-saas-l-2xl-amd64-5 | gitlab-ci/saas-l-2xl-amd64/p5 | 10.13.216.0/21 |
GCP/gitlab-r-saas-l-m-arm64-1 | gitlab-ci/saas-l-m-arm64/p1 | 10.13.224.0/21 |
GCP/gitlab-r-saas-l-m-arm64-2 | gitlab-ci/saas-l-m-arm64/p2 | 10.13.232.0/21 |
GCP/gitlab-r-saas-l-m-arm64-3 | gitlab-ci/saas-l-m-arm64/p3 | 10.13.240.0/21 |
GCP/gitlab-r-saas-l-l-arm64-1 | gitlab-ci/saas-l-l-arm64/p1 | 10.13.248.0/21 |
GCP/gitlab-r-saas-l-l-arm64-2 | gitlab-ci/saas-l-l-arm64/p2 | 10.14.0.0/21 |
GCP/gitlab-r-saas-l-l-arm64-3 | gitlab-ci/saas-l-l-arm64/p3 | 10.14.8.0/21 |
GCP/gitlab-r-saas-l-m-amd64-org-1 | gitlab-ci/saas-l-m-amd64-org/p1 | 10.14.16.0/21 |
GCP/gitlab-r-saas-l-m-amd64-org-2 | gitlab-ci/saas-l-m-amd64-org/p2 | 10.14.24.0/21 |
GCP/gitlab-r-saas-l-m-amd64-org-3 | gitlab-ci/saas-l-m-amd64-org/p3 | 10.14.32.0/21 |
GCP/gitlab-r-saas-l-m-amd64-org-4 | gitlab-ci/saas-l-m-amd64-org/p4 | 10.14.40.0/21 |
GCP/gitlab-r-saas-l-m-amd64-org-5 | gitlab-ci/saas-l-m-amd64-org/p5 | 10.14.48.0/21 |
GCP/gitlab-r-saas-l-m-amd64-org-6 | gitlab-ci/saas-l-m-amd64-org/p6 | 10.14.56.0/21 |
GCP/gitlab-r-saas-l-s-amd64-1 | gitlab-ci/saas-l-s-amd64/p1 | 10.14.64.0/21 |
GCP/gitlab-r-saas-l-s-amd64-2 | gitlab-ci/saas-l-s-amd64/p2 | 10.14.72.0/21 |
GCP/gitlab-r-saas-l-s-amd64-3 | gitlab-ci/saas-l-s-amd64/p3 | 10.14.80.0/21 |
GCP/gitlab-r-saas-l-s-amd64-4 | gitlab-ci/saas-l-s-amd64/p4 | 10.14.88.0/21 |
GCP/gitlab-r-saas-l-s-amd64-5 | gitlab-ci/saas-l-s-amd64/p5 | 10.14.96.0/21 |
GCP/gitlab-r-saas-l-s-amd64-6 | gitlab-ci/saas-l-s-amd64/p6 | 10.14.104.0/21 |
GCP/gitlab-ci | gitlab-ci/saas-l-m-amd64-org/gitlab-org-experimental | 10.14.112.0/21 |
GCP/gitlab-ci | gitlab-ci/saas-l-m-amd64-org/gitlab-org-experimental | 10.14.120.0/21 |
AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-blue-1 | 10.20.0.0/21 |
AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-blue-2 | 10.20.8.0/21 |
AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-green-1 | 10.20.16.0/21 |
AWS/r-saas-m-staging | jobs-vpc/saas-macos-staging-green-2 | 10.20.24.0/21 |
AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-blue-1 | 10.30.0.0/21 |
AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-blue-2 | 10.30.8.0/21 |
AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-green-1 | 10.30.16.0/21 |
AWS/r-saas-m-m1 | jobs-vpc/saas-macos-m1-green-2 | 10.30.24.0/21 |
AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-blue-1 | 10.40.0.0/21 |
AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-blue-2 | 10.40.8.0/21 |
AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-green-1 | 10.40.16.0/21 |
AWS/r-saas-m-l-m2pro | jobs-vpc/saas-macos-l-m2pro-green-2 | 10.40.24.0/21 |
ci-gateway ILB firewall
Section titled “ci-gateway ILB firewall”When updating the ephemeral-runners CIDRs please remember to update the firewall rules for
the ci-gateway ILBs.
The rules are managed with Terraform in GPRD and GSTG environments within the google_compute_firewall resource
named ci-gateway-allow-runners.
The GPRD (GitLab.com) definition can be found here.
The GSTG (staging.gitlab.com) definition can be found here
When doing any changes related to ephemeral runners make sure to check which GitLab environments that runner
supports (for example our private runners support both GPRD and GSTG while shared only GPRD) and update
the firewall rules respectively.
GCP projects
Section titled “GCP projects”Here you can find details about networking in different projects used by CI Runners service.
gitlab-ci project
Section titled “gitlab-ci project”| Network Name | Subnet Name | CIDR | Purpose |
|---|---|---|---|
default | default | 10.142.0.0/20 | all non-runner machines (managers, prometheus, etc.). In us-east1 - we don’t use this subnetwork in any other region. |
ci | sd-exporter-ci | 10.142.16.0/24 | Monitoring subnetwork |
ci | bastion-ci | 10.1.4.0/24 | Bastion network |
ci | runner-managers | 10.1.5.0/24 | Network for Runner Managers (new ones) |
runners-gke | runners-gke | 10.9.4.0/24 | Primary; GKE nodes range |
runners-gke | runners-gke | 10.8.0.0/16 | Secondary; GKE pods range |
runners-gke | runners-gke | 10.9.0.0/22 | secondary; GKE services range |
The default network will be removed once we will move all of the runner managers to a new
infrastructure, which is being tracked by this epic.
The ci network will be getting new subnetworks for ephemeral-runners-X while working on
this epic.
The runners-gke network, at least for now, is in the expected state.
gitlab-ci-windows project
Section titled “gitlab-ci-windows project”| Network Name | Subnet Name | CIDR | Purpose |
|---|---|---|---|
windows-ci | manager-subnet | 10.1.0.0/16 | Runner manager machines |
windows-ci | executor-subnet | 10.2.0.0/16 | Ephemeral runner machines |
windows-ci | runner-windows-ci | 10.3.0.0/24 | Runner network for ansible/packer |
windows-ci | bastion-windows-ci | 10.3.1.0/24 | bastion network |
Windows project will most probably get the runners-gke network and GKE based monitoring in the future. This
is however not yet scheduled.
ci-gateway Internal Load Balancers
Section titled “ci-gateway Internal Load Balancers”To reduce the amount of traffic that goes through the public Internet (which causes additional costs) and to add a little performance improvements, Runner managers and Git are set to use internal load balancers which routes the traffic through GCP internal networking.
For that we’ve created a special VPC named ci-gateway. Dedicated VPC was added to avoid
peering with the main VPC of GitLab backend - for security reasons and to reduce the number
of possible CIDRs collisions.
This configuration was first tested with private runners shard and staging.gitlab.com. And
next was replicated in GPRD - for gitlab.com and with the three Linux runners shards we have.
graph LR
subgraph project::GSTG
subgraph VPC::gstg::gstg
gstg_haproxy_b(HaProxy us-east1-b)
gstg_haproxy_c(HaProxy us-east1-c)
gstg_haproxy_d(HaProxy us-east1-d)
gstg_gitlab_backends(GitLab backend services)
gstg_haproxy_b -->|routes direct without peering| gstg_gitlab_backends
gstg_haproxy_c -->|routes direct without peering| gstg_gitlab_backends
gstg_haproxy_d -->|routes direct without peering| gstg_gitlab_backends
end
subgraph VPC::gstg::ci-gateway
gstg_ci_gateway_ILB_b(ILB us-east1-b)
gstg_ci_gateway_ILB_c(ILB us-east1-c)
gstg_ci_gateway_ILB_d(ILB us-east1-d)
end
gstg_ci_gateway_ILB_b -->|routes direct without peering| gstg_haproxy_b
gstg_ci_gateway_ILB_c -->|routes direct without peering| gstg_haproxy_c
gstg_ci_gateway_ILB_d -->|routes direct without peering| gstg_haproxy_d
end
subgraph project::GPRD
subgraph VPC::gprd::gprd
gprd_haproxy_b(HaProxy us-east1-b)
gprd_haproxy_c(HaProxy us-east1-c)
gprd_haproxy_d(HaProxy us-east1-d)
gprd_gitlab_backends(GitLab backend services)
gprd_haproxy_b -->|routes direct without peering| gprd_gitlab_backends
gprd_haproxy_c -->|routes direct without peering| gprd_gitlab_backends
gprd_haproxy_d -->|routes direct without peering| gprd_gitlab_backends
end
subgraph VPC::gprd::ci-gateway
gprd_ci_gateway_ILB_b(ILB us-east1-b)
gprd_ci_gateway_ILB_c(ILB us-east1-c)
gprd_ci_gateway_ILB_d(ILB us-east1-d)
end
gprd_ci_gateway_ILB_b -->|routes direct without peering| gprd_haproxy_b
gprd_ci_gateway_ILB_c -->|routes direct without peering| gprd_haproxy_c
gprd_ci_gateway_ILB_d -->|routes direct without peering| gprd_haproxy_d
end
subgraph project::gitlab-ci
subgraph VPC::gitlab-ci::ci
subgraph runner-managers
runners_manager_private_1[runners-manager-private-1]
runners_manager_private_2[runners-manager-private-2]
runners_manager_shared_gitlab_org_X[runners-manager-shared-gitlab-org-X]
runners_manager_shared_X[runners-manager-shared-X]
end
subgraph example-ephemeral-vms
private_ephemeral_vm_1
private_ephemeral_vm_2
shared_gitlab_org_ephemeral_vm
end
runners_manager_private_1 -->|manages a job on| private_ephemeral_vm_1
runners_manager_private_2 -->|manages a job on| private_ephemeral_vm_2
runners_manager_shared_gitlab_org_X -->|manages a job on| shared_gitlab_org_ephemeral_vm
end
end
subgraph project::gitlab-ci-plan-free-X
subgraph VPC::gitlab-ci-plan-free-X::ephemeral-runners
shared_ephemeral_vm
end
end
runners_manager_shared_X --> shared_ephemeral_vm
runners_manager_private_1 -->|connects through VPC peering| gstg_ci_gateway_ILB_c
runners_manager_private_2 -->|connects through VPC peering| gstg_ci_gateway_ILB_d
runners_manager_private_1 -->|connects through VPC peering| gprd_ci_gateway_ILB_c
runners_manager_private_2 -->|connects through VPC peering| gprd_ci_gateway_ILB_d
runners_manager_shared_gitlab_org_X -->|connects through VPC peering| gprd_ci_gateway_ILB_c
runners_manager_shared_X -->|connects through VPC peering| gprd_ci_gateway_ILB_d
private_ephemeral_vm_1 -->|connects through VPC peering| gstg_ci_gateway_ILB_c
private_ephemeral_vm_2 -->|connects through VPC peering| gprd_ci_gateway_ILB_d
shared_gitlab_org_ephemeral_vm -->|connects through VPC peering| gprd_ci_gateway_ILB_c
shared_ephemeral_vm -->|connects through VPC peering| gprd_ci_gateway_ILB_d
How it’s configured
Section titled “How it’s configured”The above diagram shows a general view of how this configuration is set up.
In both GSTG and GPRD projects we’ve created a dedicated VPC named ci-gateway. This VPC
contains Internal Load Balancers (ILBs) available on defined FQDNs. The VPCs are peered with
CI VPCs that contain runner managers and ephemeral VMs on which the jobs are executed.
As an ILB can route traffic only to nodes in the same VPC, we had to add a small change
to our HaProxy configuration. We’ve created a dedicated cluster of new HaProxy nodes
provisioned with two network interfaces: in gprd and in ci-gateway VPCs. The same
configuration is created in GSTG.
HaProxy got a new frontend named https_git_ci_gateway and listening on port 8989. This
fronted passes the detected git+https traffic and a limited amount of API endpoints (purely
for Runner communication, which includes requesting for a job, sending trace update and sending
job update) to GitLab backends. Other requests are redirected with 307 HTTP response code
to staging.gitlab.com or gitlab.com - depending on the requested resource.
To reduce the cost that is created by traffic made across availability zones, in each project
we have two ILBs - one for each availability zone (us-east1-c and us-east1-d) used by the CI
fleet in the us-east1 region. Each ILB is configured to target HaProxy nodes only in its
availability zone.
For that, the following FQDNs were created:
git-us-east1-c.ci-gateway.int.gstg.gitlab.netgit-us-east1-d.ci-gateway.int.gstg.gitlab.netgit-us-east1-c.ci-gateway.int.gprd.gitlab.netgit-us-east1-d.ci-gateway.int.gprd.gitlab.net
Runner nodes are configured to point the ILBs with the url and clone_url settings.
As we set our runners to operate in a specific availability zone, each of them
points the relevant ILB FQDN.
How it works
Section titled “How it works”GitLab Runner is configured to talk with the dedicated ILB. Communication goes through the VPC peering and reaches one of the HaProxy nodes backing the ILB. TLS certificate is verified and Runner saves this information to configure Git in the job environment.
When job is received, Runner starts executing it on the ephemeral VM. It configures Git to use the CAChain resolved from initial API request. Repo URL is configured to use the ILB as GitLab’s endpoint.
When job reaches the step in which sources are updated, git clone operation is
executed against the ILB. Communication again goes through the VPC peering and reaches
one of the HaProxy nodes. TLS certificate is verified using the CAChain resolved
earlier.
When job reaches the step when artifact needs to be downloaded or uploaded, it also tries to talk with the ILB. However, HaProxy frontend detects that this communication is unsupported and redirects it to the public Internet gateway of GitLab instance that the job belongs to.
In the meantime, Runner receives job logs and transfers them back - together with updating the status of the job - to GitLab’s API. For that the communication through VPC peering and the dedicaed ILB is used as well.