Creating a new GKE cluster
The following is a high-level guide to building out the pieces needed to add a GKE cluster and move GitLab components into Kubernetes.
Our application configuration currently lives in these repositories:
- https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com
- https://gitlab.com/gitlab-com/gl-infra/argocd/apps
- https://gitlab.com/gitlab-com/gl-infra/argocd/config
For GitLab.com, each environment is served by one regional cluster and multiple zonal clusters. This document covers how to build a new cluster. Note that this procedure is not currently automated and may take hours to complete.
Provision the cluster in Terraform
Three Terraform modules create the IP reservations needed for monitoring and ingress, the cluster and its node pools, and external DNS. See this example for one of the zonal clusters, or the full Terraform configuration for the regional cluster.
Set IAM user permissions on the cluster
- This step is manual and is documented here: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/blob/master/README.md#service-account
- A JSON authentication key file will be created for the service account; we'll need it for the next step.
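Before moving on, it can help to sanity-check the downloaded key file. The sketch below is a hypothetical helper, not part of our tooling; it assumes the standard Google Cloud service account key layout, in which type is "service_account" and fields such as project_id, private_key and client_email are present.

```python
import json
import sys
from pathlib import Path

# Fields present in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(raw: str) -> str:
    """Validate the JSON text of a service account key and return its client_email.

    Raises ValueError if required fields are missing or the key is not a
    service account key.
    """
    key = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["client_email"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_key.py /path/to/key.json
    print(check_service_account_key(Path(sys.argv[1]).read_text()))
```

Printing client_email gives a quick confirmation that the key belongs to the service account you just configured.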
Register the cluster with ArgoCD
ArgoCD manages workloads on the cluster from https://gitlab.com/gitlab-com/gl-infra/argocd/apps; it discovers clusters through Secrets stored in https://gitlab.com/gitlab-com/gl-infra/argocd/config.
To onboard the new cluster, follow the guide "How to onboard a GKE cluster into ArgoCD".
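For reference, upstream Argo CD's declarative setup represents each cluster as a Secret labelled argocd.argoproj.io/secret-type: cluster, with name, server and config keys. The sketch below builds such a manifest in Python to show the shape of that Secret. The cluster name, server address, argocd namespace and bearer-token authentication are all assumptions for illustration (bearer tokens are only one of the auth options the format supports); the onboarding guide remains the authoritative procedure.

```python
import json


def argocd_cluster_secret(name: str, server: str, bearer_token: str, ca_data_b64: str) -> dict:
    """Build an Argo CD declarative cluster Secret as a Python dict.

    The label tells Argo CD to treat the Secret as a cluster definition;
    `config` is a JSON string holding the connection and auth settings.
    """
    config = {
        "bearerToken": bearer_token,
        "tlsClientConfig": {"insecure": False, "caData": ca_data_b64},
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"{name}-cluster",
            "namespace": "argocd",  # assumption: Argo CD's default install namespace
            "labels": {"argocd.argoproj.io/secret-type": "cluster"},
        },
        "type": "Opaque",
        "stringData": {
            "name": name,
            "server": server,
            "config": json.dumps(config),
        },
    }


if __name__ == "__main__":
    # Hypothetical values; never commit real tokens in plain text.
    secret = argocd_cluster_secret(
        name="example-zonal-cluster",
        server="https://203.0.113.10",
        bearer_token="REDACTED",
        ca_data_b64="REDACTED",
    )
    print(json.dumps(secret, indent=2))
```

Since kubectl accepts JSON manifests as well as YAML, the printed output has the same structure as the YAML examples in the upstream documentation.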
Prometheus rules
New Clusters
- In our runbooks repo, add a configuration to .gitlab-ci.yml that deploys to the new cluster.
- Ensure the appropriate variables are added to the ops instance. Use this MR as a guideline: https://gitlab.com/gitlab-com/runbooks/merge_requests/1200
Replacement Clusters
- Go to the runbooks CI jobs, find the latest green pipeline on the default branch, then find the job associated with the existing cluster and retry it.
- Check the CI output; it should show the custom resources required by our Prometheus components being added.
  - These include service monitors, alert rules, and Prometheus rules.
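In the Prometheus Operator model, alerting and recording rules are packaged in PrometheusRule custom resources (API version monitoring.coreos.com/v1), alongside ServiceMonitor resources that describe scrape targets. As a rough guide to the kind of object the CI job should be creating, the sketch below builds a minimal PrometheusRule; the resource name, namespace, group name, rule names and expressions are hypothetical and are not taken from the runbooks repo.

```python
import json


def prometheus_rule(name: str, namespace: str, group: str, rules: list[dict]) -> dict:
    """Wrap a list of rules in a Prometheus Operator PrometheusRule resource."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"groups": [{"name": group, "rules": rules}]},
    }


# Hypothetical rules: one recording rule and one alerting rule.
EXAMPLE_RULES = [
    # Recording rule: precompute the number of up targets per job.
    {"record": "job:up:sum", "expr": "sum by (job) (up)"},
    # Alerting rule: fire when a scrape target has been down for 5 minutes.
    {
        "alert": "ExampleTargetDown",
        "expr": "up == 0",
        "for": "5m",
        "labels": {"severity": "warning"},
        "annotations": {"summary": "Scrape target {{ $labels.instance }} is down"},
    },
]

if __name__ == "__main__":
    manifest = prometheus_rule("example-rules", "monitoring", "example.rules", EXAMPLE_RULES)
    print(json.dumps(manifest, indent=2))
```

A rule entry with a record key is a recording rule and one with an alert key is an alerting rule; both kinds can live in the same group.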
Thanos configuration
Thanos Query needs to know about the Prometheus endpoints; these are set in the ops-base.json Chef role.
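Thanos Query is pointed at each Prometheus (through its Thanos sidecar's gRPC StoreAPI) with a repeated command-line flag: --store in older releases, while newer releases prefer --endpoint. The sketch below shows how a list of endpoints could be turned into those flags. The endpoint names, and the idea that they come from a plain list in ops-base.json, are assumptions, since the role's actual attribute layout isn't shown here; 10901 is the sidecar's default gRPC port.

```python
DEFAULT_SIDECAR_GRPC_PORT = 10901  # default port of the Thanos sidecar's --grpc-address


def thanos_query_args(endpoints: list[str], flag: str = "--store") -> list[str]:
    """Turn host or host:port strings into Thanos Query flags.

    Endpoints without a port get the sidecar's default gRPC port, then
    duplicates are dropped while preserving order. Assumes hostnames or
    IPv4 addresses (bare IPv6 addresses would need brackets).
    """
    normalized = [
        ep if ":" in ep else f"{ep}:{DEFAULT_SIDECAR_GRPC_PORT}" for ep in endpoints
    ]
    return [f"{flag}={host_port}" for host_port in dict.fromkeys(normalized)]


if __name__ == "__main__":
    # Hypothetical endpoints for a new zonal cluster's Prometheus sidecars.
    for arg in thanos_query_args(
        ["prometheus-0.example.internal", "prometheus-1.example.internal:10901"]
    ):
        print(arg)
```

Normalizing before de-duplicating means "host" and "host:10901" collapse into a single flag rather than registering the same store twice.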
Configure gitlab-com
See bootstrapping new clusters for how to apply the GitLab Helm chart to the cluster.