GitLab API Service

:warning: All links to code/configuration are permalinks to a specific snapshot in time. Keep that in mind and check the master branch for the most up-to-date view of the target configuration. :warning:

The API service enables internal and external clients to interact with GitLab.com via HTTP(S) endpoints without using the web UI. It is mainly used to drive automation against GitLab and is critical to the operation of GitLab.com.

The API directly depends on the DB (PgBouncer/Patroni), Redis, Redis Cache, Redis Sidekiq, and Gitaly.

```mermaid
graph TB
  Cloudflare
  subgraph GCP
    subgraph gitlab-production
      GCP-TCP-LB
      HAPROXY
      subgraph GKE-zonal-cluster
        GCP-TCP-LB-GKE
        subgraph Namespaces
          logging
          monitoring
          consul
          subgraph gitlab
            subgraph default-node-pool
              nginx-ingress-controller
            end
            subgraph api-node-pool
              subgraph API-pods
                gitlab-workhorse
                webservice
              end
            end
          end
        end
      end
    end
    DB
    Redis
    Gitaly
  end
  Cloudflare --> GCP-TCP-LB
  GCP-TCP-LB --> HAPROXY
  HAPROXY --> GCP-TCP-LB-GKE
  GCP-TCP-LB-GKE --> nginx-ingress-controller
  nginx-ingress-controller --> gitlab-workhorse
  webservice --> DB
  webservice --> Redis
  webservice --> Gitaly
  gitlab-workhorse --> Redis
  gitlab-workhorse --> webservice
  style GCP fill:#9BF;
  style gitlab-production fill:#1AD;
  style GKE-zonal-cluster fill:#EAD;
  style api-node-pool fill:#ECD;
  style default-node-pool fill:#ECD;
  style Namespaces fill:#FED;
  style API-pods fill:#3E7;
  style gitlab fill:#FAA;
  style logging fill:#E77;
  style monitoring fill:#F89;
  style consul fill:#FAB;
```

The API main stage is deployed into 3 zonal clusters to reduce the failure domain and traffic costs. The API canary stage resides on the regional cluster.

Most of the deployed objects are installed as part of the webservice chart of the GitLab Helm Chart: https://gitlab.com/gitlab-org/charts/gitlab/-/tree/master/charts/gitlab/charts/webservice

This chart can create multiple webservice deployments, depending on the configuration of the deployments key and the ingress path associated with each entry. Currently we create a single api deployment which uses an ingress path of /.
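
For illustration, a minimal sketch of what such a deployments entry might look like in the Helm values. The key names follow the webservice chart, but this is not our production configuration:

```yaml
# Illustrative sketch only -- not the production values.
# A single "api" webservice deployment serving the root ingress path.
gitlab:
  webservice:
    deployments:
      api:
        ingress:
          path: /
```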

```mermaid
graph TB
  a[webservice chart] --> b[api Deployment]
  b --> c[api ReplicaSet]
  a --> d
  d[api HPA] --> c
  c --> e[X number of Pods]
  a --> f[webservice ServiceAccount]
  a --> g[api Ingress]
  a --> h[api Service]
  h --> i[api Endpoints]
  i --> e
```

We currently (Feb 2021) serve between 4k and 5k API requests/s to workhorse on a business day, and 2.5k to 3.5k requests/s to puma.

Performance of the API service mainly depends on a number of factors; see https://gitlab.com/gitlab-com/gl-infra/delivery/-/issues/1592 for a detailed analysis.

It is important to always have enough API capacity so that rolling deployments or turning off canary do not affect the user experience.

In a K8s deployment we need to tune resource requests, autoscaling, and rollingUpdate settings so that the API is able to meet its SLOs and react to surges in traffic, and so that Pods are not evicted too frequently, since starting an API Pod takes a long time. Have a look at the following settings (a sketch of where these knobs live follows the list):

  • hpa target values
  • rollingUpdate strategy maxSurge
  • minReplicas and maxReplicas
  • resource requests and limits for CPU and memory
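
As a rough illustration, these knobs end up in Kubernetes objects along the following lines. This is a sketch only: the object names, numbers, and image are made up, not our production manifests, which are rendered from the Helm values linked below.

```yaml
# Sketch of the tuning knobs listed above; names and values are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-webservice-api        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-webservice-api
  minReplicas: 10                    # minReplicas / maxReplicas
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75     # HPA target value
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-webservice-api
spec:
  selector:
    matchLabels:
      app: webservice
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%                  # extra Pods allowed during a rollout
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: webservice
    spec:
      containers:
        - name: webservice
          image: gitlab-webservice:placeholder   # placeholder image
          resources:
            requests:                # requests drive scheduling and HPA math
              cpu: "4"
              memory: 5Gi
            limits:
              memory: 7Gi
```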

Requests are rate limited mainly by HAProxy (2000 requests/s per IP) and RackAttack. See the handbook for the published limits and ../rate-limiting/README.md for details of our rate limiting. A few customer IPs are still excluded from rate limiting.

The API service is stateless and mostly CPU bound. Scaling can easily be done horizontally by adding more pods. But this also means we open more connections to the database and Redis, which could shift scalability issues downstream. We have saturation alerts to cover us, but we should also regularly review the capacity planning forecast.

Scaling is currently handled automatically by the Horizontal Pod Autoscaler. Its configuration is defined here: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/8b9068833c57a1d986a6461e1f6b5aa61b79b7e4/releases/gitlab/values/gprd.yaml.gotmpl#L32-35

Scaling vertically by using a more powerful machine type can also be considered.

Healthchecks are defined here: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/blob/master/roles/gprd-base-haproxy-main-config.json

See our cookbook for how this configuration is built: https://gitlab.com/gitlab-cookbooks/gitlab-haproxy/

Healthchecks are performed by different systems for a variety of reasons.

  1. HAProxy needs to determine if the API service is healthy
  2. Kubernetes needs to determine if a Pod is healthy
  3. Pods that are about to be turned down need to respond accordingly, so that Kubernetes pulls them out of service while they finish cleaning up

The HAProxy healthcheck is sent to a single backend, the NGINX Ingress, which sits between the API Pods and the outside world. Note that HAProxy has no knowledge of how the backend is interconnected: a single healthcheck effectively lands on one randomly assigned Pod. For all traffic destined into Kubernetes, these healthchecks therefore depend on the health of both the NGINX Ingress and that one randomly chosen Pod. It is important to keep in mind that during deployments, healthchecks will hit both old and new Pods as the deployment cycle completes.

```mermaid
sequenceDiagram
  participant a as HAProxy
  participant b as NGINX Ingress
  participant c as Workhorse
  a->>b: HTTP GET `/-/readiness`
  b->>c: HTTP GET `/-/readiness`
  c->>a: HTTP GET `/-/readiness`
```

Kubernetes sends healthchecks to each container to determine whether the Pod is healthy. Our /-/liveness probe should always return an HTTP 200 as long as puma is running. Our /-/readiness endpoint should do the same, though kill signals modify this behavior: if we send a SIGTERM to the Pod, this endpoint begins returning an HTTP 503. This fails the readiness probe and stops the Pod from servicing any future requests.

Workhorse uses a healthcheck script embedded in its container, which is executed as the probe. Should a liveness probe fail for any Pod, Kubernetes will eventually restart that Pod.

```mermaid
sequenceDiagram
  participant a as Kubernetes
  participant b as Webservice
  participant c as Workhorse
  a->>b: Liveness Probe: HTTP GET `/-/liveness`
  a->>b: Readiness Probe: HTTP GET `/-/readiness`
  a->>c: Liveness Probe: exec /scripts/healthcheck
```
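
For reference, a minimal sketch of how such probes can be declared on the two containers. The paths match the prose and diagram above, but the ports, timings, and container names are assumptions for illustration, not our chart's rendered output:

```yaml
# Illustrative fragment of a Pod/Deployment spec -- timings and ports are made up.
containers:
  - name: webservice
    livenessProbe:
      httpGet:
        path: /-/liveness      # HTTP 200 as long as puma is running
        port: 8080             # hypothetical puma port
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /-/readiness     # starts returning HTTP 503 after SIGTERM
        port: 8080
      periodSeconds: 5
  - name: gitlab-workhorse
    livenessProbe:
      exec:
        command: ["/scripts/healthcheck"]   # script embedded in the image
      periodSeconds: 10
```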

To minimize interrupted requests between the client and the API service, we have a special workhorse configuration to prevent unnecessary HTTP 502 errors during deployments.

  1. We extend how long a Pod waits until it is SIGKILL’d by Kubernetes - https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/1f0a73923666f8bb50085362d2771653e8001308/releases/gitlab/values/values.yaml.gotmpl#L658
  2. We enable API Long Polling https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/1f0a73923666f8bb50085362d2771653e8001308/releases/gitlab/values/values.yaml.gotmpl#L651

With API Long Polling configured, requests destined for /api/v4/jobs/request sit in a lengthy poll, so that Runner clients are not unnecessarily pounding the API for jobs. This forces us to increase our terminationGracePeriodSeconds to something higher; otherwise, NGINX responds with a massive number of HTTP 502s as connections are severed when a Pod is removed before it has responded to the client.
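
A hedged sketch of how these two settings can sit in the Helm values; the real settings live in the values.yaml.gotmpl files linked above, and the key names below are assumptions rather than a copy of that file:

```yaml
# Illustrative values only -- see the linked values.yaml.gotmpl for the real settings.
gitlab:
  webservice:
    deployment:
      # Give in-flight (long-polling) requests time to finish before SIGKILL.
      terminationGracePeriodSeconds: 90
    workhorse:
      # Hold Runner job requests (/api/v4/jobs/request) in a long poll instead of
      # letting them hammer puma. The Workhorse flag is -apiCiLongPollingDuration;
      # how it is wired into the chart here is an assumption.
      extraArgs: "-apiCiLongPollingDuration 50s"
```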

A plan of action to make this better: https://gitlab.com/gitlab-org/gitlab/-/issues/331460

For our APDEX score latencies we have generally set satisfiedThreshold to 1s and toleratedThreshold to 10s, and we aim for an APDEX score threshold of 0.995 and an error ratio threshold of 0.999 - see the metrics catalog.

  1. Handbook architecture overview for GitLab.com
  2. Handbook K8s Cluster configuration
  3. General K8s design docs
  4. API K8s Migration Epic