# GitLab API Service
- Service Overview
- Alerts: https://alerts.gitlab.net/#/alerts?filter=%7Btype%3D%22api%22%2C%20tier%3D%22sv%22%7D
- Label: gitlab-com/gl-infra/production~"Service::API"
## Logging

## Summary

:warning: All links to code/configurations are permalinks that point to a specific snapshot in time. Keep that in mind and check the master branch for the most up-to-date view of the target configuration. :warning:
The API service enables internal and external clients to interact with GitLab.com via http(s) endpoints without using the web UI. It is mainly used to drive automation against GitLab and is critical for the function of GitLab.com.
The API directly depends on the DB (pgbouncer/patroni), Redis, Redis Cache, Redis Sidekiq, and Gitaly.
## Architecture

- Handbook architecture overview for GitLab.com
- Handbook K8s Cluster configuration
- Handbook K8s architecture overview
### Logical Architecture

```mermaid
graph TB
  Cloudflare
  subgraph GCP
    subgraph gitlab-production
      GCP-TCP-LB
      HAPROXY
      subgraph GKE-zonal-cluster
        GCP-TCP-LB-GKE
        subgraph Namespaces
          logging
          monitoring
          consul
          subgraph gitlab
            subgraph default-node-pool
              nginx-ingress-controller
            end
            subgraph api-node-pool
              subgraph API-pods
                gitlab-workhorse
                webservice
              end
            end
          end
        end
      end
    end
    DB
    Redis
    Gitaly
  end

  Cloudflare --> GCP-TCP-LB --> HAPROXY --> GCP-TCP-LB-GKE --> nginx-ingress-controller --> webservice
  webservice --> DB
  webservice --> Redis
  webservice --> Gitaly
  gitlab-workhorse --> Redis
  gitlab-workhorse --> webservice

  style GCP fill:#9BF;
  style gitlab-production fill:#1AD;
  style GKE-zonal-cluster fill:#EAD;
  style api-node-pool fill:#ECD;
  style default-node-pool fill:#ECD;
  style Namespaces fill:#FED;
  style API-pods fill:#3E7;
  style gitlab fill:#FAA;
  style logging fill:#E77;
  style monitoring fill:#F89;
  style consul fill:#FAB;
```
The API main stage is deployed across 3 zonal clusters to reduce the failure domain and traffic costs. The API canary stage resides on the regional cluster.
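To see how traffic is actually split across the zonal clusters and the canary stage, a query along these lines can help. This is a sketch: the metric and label names (`gitlab_workhorse_http_requests_total`, `cluster`, `stage`) follow our usual labeling conventions but are assumptions here, not taken from this page.

```promql
# Approximate API request rate per cluster and stage (main vs. cny).
# Metric and label names are assumptions based on our standard Prometheus labels.
sum by (cluster, stage) (
  rate(gitlab_workhorse_http_requests_total{env="gprd", type="api"}[5m])
)
```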
### Kubernetes Architecture

Most objects deployed are installed as part of the GitLab Helm Chart's Webservice chart: https://gitlab.com/gitlab-org/charts/gitlab/-/tree/master/charts/gitlab/charts/webservice

This chart leverages multiple `webservice` deployments depending on the configuration of the `deployments` key associated with the ingress path. Currently we create an `api` deployment which uses an ingress path of `/`.
```mermaid
graph TB
  a[webservice chart] --> b[api Deployment]
  b --> c[api Replicaset]
  a --> d
  d[api HPA] --> c
  c --> e[X number of Pods]
  a --> f[webservice ServiceAccount]
  a --> g[api Ingress]
  a --> h[api Service]
  h --> i[api Endpoints]
  i --> e
```
## Performance

We currently (Feb 2021) serve between 4k and 5k API requests/s to workhorse on a business day and 2.5k to 3.5k requests/s to puma.
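The figures above can be roughly reproduced with queries like the following. Treat this as a sketch: the workhorse counter `gitlab_workhorse_http_requests_total` and the Rails-side `http_requests_total` metric names are assumptions, not the canonical SLI definitions.

```promql
# Requests/s hitting workhorse on the API fleet (metric name is an assumption)
sum(rate(gitlab_workhorse_http_requests_total{env="gprd", type="api"}[5m]))

# Requests/s reaching puma (Rails) on the API fleet (metric name is an assumption)
sum(rate(http_requests_total{env="gprd", type="api"}[5m]))
```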
Performance of the API service mainly depends on these factors:
- Node Pool:
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/blob/6ac69bda99a722b0238c28fa348ddb85436c54f6/environments/gprd/gke-zonal.tf
- This configuration exists for EACH cluster
- Node type: https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/blob/6ac69bda99a722b0238c28fa348ddb85436c54f6/environments/gprd/variables.tf#L422
- Number of puma worker_processes: `sum(puma_workers{env="gprd", type="api"})`
- Number of puma threads: `sum(puma_max_threads{env="gprd", type="api"})`
See https://gitlab.com/gitlab-com/gl-infra/delivery/-/issues/1592 for a detailed analysis.
It is important to always have enough API capacity so that rolling deployments or turning off canary does not affect user experience.
In a K8s deployment we need to tune resource requests, autoscaling, and rollingUpdate settings to ensure the API is able to meet its SLOs and can react to surges in traffic, as well as to prevent frequent Pod eviction, since starting an API Pod takes a long time. Have a look at the following (a query sketch for checking current HPA headroom follows the list):
- hpa target values
- rollingUpdate strategy maxSurge
- minReplicas and maxReplicas
- resource requests and limits for CPU and memory
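As referenced above, one quick way to check whether the HPA still has headroom before a deployment is to compare current replicas against the configured maximum. This is a sketch using kube-state-metrics metric names; the namespace and the HPA name pattern are assumptions.

```promql
# Remaining HPA headroom for the api deployment (0 means we are pinned at maxReplicas).
# Metric names come from kube-state-metrics; the HPA name pattern is an assumption.
max by (horizontalpodautoscaler) (
  kube_horizontalpodautoscaler_spec_max_replicas{namespace="gitlab", horizontalpodautoscaler=~".*api.*"}
)
-
max by (horizontalpodautoscaler) (
  kube_horizontalpodautoscaler_status_current_replicas{namespace="gitlab", horizontalpodautoscaler=~".*api.*"}
)
```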
## Request Throttling

Requests are limited mainly by HAProxy (2000 rps per IP) and RackAttack. See the handbook for published limits and ../rate-limiting/README.md for details of our rate limiting. A few customer IPs are still excluded from rate limiting.
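To gauge how much traffic is actually being throttled, counting HTTP 429 responses is a reasonable proxy. The metric and `code` label below are assumptions based on workhorse's request counter; the rate-limiting runbook linked above has the authoritative views.

```promql
# Rate-limited (HTTP 429) responses served by the API fleet.
# Metric/label names are assumptions; see the rate-limiting runbook for the canonical dashboards.
sum(rate(gitlab_workhorse_http_requests_total{env="gprd", type="api", code="429"}[5m]))
```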
## Scalability

The API service is stateless and mostly CPU bound. Scaling can easily be done horizontally by adding more pods. But this also means we open more connections to the database or Redis and could shift scalability issues downstream. We have saturation alerts to cover us, but we should also take a regular look at the capacity planning forecast.
Scaling is currently handled automatically by the Horizontal Pod Autoscaler. The configuration for such is defined here: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/8b9068833c57a1d986a6461e1f6b5aa61b79b7e4/releases/gitlab/values/gprd.yaml.gotmpl#L32-35
Scaling vertically by using a more powerful machine type can also be considered.
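Because the service is mostly CPU bound, comparing CPU usage of the webservice containers against their requests is the most direct saturation signal. A minimal sketch, assuming cAdvisor and kube-state-metrics metrics and a `gitlab-webservice-api` pod naming pattern (the pod/container names are assumptions):

```promql
# Ratio of CPU cores consumed by api webservice containers to CPU they requested.
# Pod and container names are assumptions; adjust to the actual deployment.
sum(rate(container_cpu_usage_seconds_total{namespace="gitlab", pod=~"gitlab-webservice-api.*", container="webservice"}[5m]))
/
sum(kube_pod_container_resource_requests{namespace="gitlab", pod=~"gitlab-webservice-api.*", container="webservice", resource="cpu"})
```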
## Availability

Healthchecks are defined here: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/blob/master/roles/gprd-base-haproxy-main-config.json
See our cookbook for how this configuration is built: https://gitlab.com/gitlab-cookbooks/gitlab-haproxy/
### Healthcheck Flow

Healthchecks are performed for a variety of reasons from differing systems:
- HAProxy needs to determine if the API service is healthy
- Kubernetes needs to determine if a Pod is healthy
- Pods destined to be turned down need to respond accordingly, so that Kubernetes pulls the Pod out of service while it completes cleanup
#### HAProxy’s Healthcheck

This healthcheck is sent to a single backend, otherwise known as the NGINX Ingress that sits between the API Pods and the outside world. Note that HAProxy has no knowledge of how the backend is interconnected; a single healthcheck effectively lands on one randomly assigned Pod. These healthchecks therefore depend on the health of both the NGINX Ingress and the one randomly chosen Pod for all traffic that is destined for Kubernetes. Keep in mind that during deployments, healthchecks will hit both the old and new Pods as the deployment cycle completes.
```mermaid
sequenceDiagram
  participant a as HAProxy
  participant b as NGINX Ingress
  participant c as Workhorse
  a->>b: HTTP GET `/-/readiness`
  b->>c: HTTP GET `/-/readiness`
  c->>a: HTTP GET `/-/readiness`
```
#### Kubernetes

Kubernetes will send healthchecks to each container to determine if the Pod is healthy. Our `/-/liveness` probe should always return an HTTP 200 so long as puma is running. Our `/-/readiness` endpoint should do the same, though kill signals will modify this behavior. If we send a `SIGTERM` to the Pod, this endpoint will begin returning an HTTP 503. This fails the readiness probe and removes the Pod from servicing any future requests.

Workhorse uses a healthcheck script embedded into its container that is executed by the probe. Should a liveness probe fail for any Pod, Kubernetes will eventually restart that Pod.
```mermaid
sequenceDiagram
  participant a as Kubernetes
  participant b as Webservice
  participant c as Workhorse
  a->>b: Liveness Probe: HTTP GET `/-/liveness`
  a->>b: Readiness Probe: HTTP GET `/-/readiness`
  a->>c: Liveness Probe: exec /scripts/healthcheck
```
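Repeatedly failing liveness probes show up as container restarts. A quick sketch using the kube-state-metrics restart counter (the namespace and container names are assumptions):

```promql
# Containers in the gitlab namespace that restarted in the last hour,
# which is usually the visible symptom of failing liveness probes.
increase(kube_pod_container_status_restarts_total{namespace="gitlab", container=~"webservice|gitlab-workhorse"}[1h]) > 0
```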
## Durability

To minimize interrupted requests between the client and the API service, we have a special workhorse configuration to prevent unnecessary HTTP 502 errors from occurring during deployments.

- We extend how long a Pod waits until it is `SIGKILL`'d by Kubernetes: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/1f0a73923666f8bb50085362d2771653e8001308/releases/gitlab/values/values.yaml.gotmpl#L658
- We enable API Long Polling: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/1f0a73923666f8bb50085362d2771653e8001308/releases/gitlab/values/values.yaml.gotmpl#L651
With API Long Polling configured, we allow requests destined for `/api/v4/jobs/request` to sit in a lengthy poll so that Runner clients are not unnecessarily pounding the API for jobs. Doing so forces us to increase our `terminationGracePeriodSeconds` to something higher; otherwise, NGINX will respond with a massive amount of HTTP 502s as connections are severed when a Pod is removed before it has responded to the client.
A plan of action to make this better: https://gitlab.com/gitlab-org/gitlab/-/issues/331460
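During deployments the symptom described above is visible as a spike of 502s at the NGINX Ingress. A sketch, assuming the standard ingress-nginx request metric and that the api ingress name contains `api` (both assumptions):

```promql
# HTTP 502s returned by the NGINX Ingress for the api ingress, e.g. during a deployment window.
# Metric comes from ingress-nginx; the ingress label value is an assumption.
sum(rate(nginx_ingress_controller_requests{namespace="gitlab", ingress=~".*api.*", status="502"}[5m]))
```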
## Monitoring/Alerting

For our APDEX score latencies we generally set `satisfiedThreshold` to 1s and `toleratedThreshold` to 10s, and we aim for an APDEX score threshold of 0.995 and an error ratio threshold of 0.999 - see the metrics catalog.
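As a worked example of how such an apdex is computed from request latency histograms: requests completing within the satisfied threshold (1s) count fully, requests within the tolerated threshold (10s) count half, divided by all requests. The histogram name and bucket boundaries below are assumptions; the authoritative SLI definitions live in the metrics catalog.

```promql
# Apdex sketch: (requests <= 1s + requests <= 10s) / 2, divided by all requests.
# Histogram name and bucket boundaries are assumptions, not the canonical SLI definition.
(
  sum(rate(gitlab_workhorse_http_request_duration_seconds_bucket{env="gprd", type="api", le="1"}[5m]))
  +
  sum(rate(gitlab_workhorse_http_request_duration_seconds_bucket{env="gprd", type="api", le="10"}[5m]))
)
/ 2
/
sum(rate(gitlab_workhorse_http_request_duration_seconds_count{env="gprd", type="api"}[5m]))
```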