# GitLab API Service
- Service Overview
- Alerts: https://alerts.gitlab.net/#/alerts?filter=%7Btype%3D%22api%22%2C%20tier%3D%22sv%22%7D
- Label: gitlab-com/gl-infra/production~"Service::API"
## Logging

## Summary

:warning: All links to code/configurations are permalinks that point to a specific snapshot in time. Keep that in mind and check the master branch for the most up-to-date view of the target configuration. :warning:
The API service enables internal and external clients to interact with GitLab.com via http(s) endpoints without using the web UI. It is mainly used to drive automation against GitLab and is critical for the function of GitLab.com.
The API directly depends on the DB (pgbouncer/patroni), Redis, Redis Cache, Redis Sidekiq, and Gitaly.
## Architecture

- Handbook architecture overview for GitLab.com
- Handbook K8s Cluster configuration
- Handbook K8s architecture overview
### Logical Architecture

```mermaid
graph TB
  Cloudflare
  subgraph GCP
    subgraph gitlab-production
      GCP-TCP-LB
      HAPROXY
      subgraph GKE-zonal-cluster
        GCP-TCP-LB-GKE
        subgraph Namespaces
          logging
          monitoring
          consul
          subgraph gitlab
            subgraph default-node-pool
              nginx-ingress-controller
            end
            subgraph api-node-pool
              subgraph API-pods
                gitlab-workhorse
                webservice
              end
            end
          end
        end
      end
    end
    DB
    Redis
    Gitaly
  end

  Cloudflare --> GCP-TCP-LB --> HAPROXY --> GCP-TCP-LB-GKE --> nginx-ingress-controller --> webservice
  webservice --> DB
  webservice --> Redis
  webservice --> Gitaly
  gitlab-workhorse --> Redis
  gitlab-workhorse --> webservice

  style GCP fill:#9BF;
  style gitlab-production fill:#1AD;
  style GKE-zonal-cluster fill:#EAD;
  style api-node-pool fill:#ECD;
  style default-node-pool fill:#ECD;
  style Namespaces fill:#FED;
  style API-pods fill:#3E7;
  style gitlab fill:#FAA;
  style logging fill:#E77;
  style monitoring fill:#F89;
  style consul fill:#FAB;
```
The API main stage is deployed across 3 zonal clusters to reduce the failure domain and traffic costs. The API canary stage resides on the regional cluster.
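To see how traffic is actually split across the zonal clusters and the canary stage, a query along these lines can help. This is a sketch: the metric and label names (`gitlab_workhorse_http_requests_total`, `cluster`, `stage`) follow our usual labeling conventions but are assumptions here, not taken from this page.

```promql
# Approximate API request rate per cluster and stage (main vs. cny).
# Metric and label names are assumptions based on our standard Prometheus labels.
sum by (cluster, stage) (
  rate(gitlab_workhorse_http_requests_total{env="gprd", type="api"}[5m])
)
```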
### Kubernetes Architecture

Most objects deployed are installed as part of the GitLab Helm Chart's Webservice chart: https://gitlab.com/gitlab-org/charts/gitlab/-/tree/master/charts/gitlab/charts/webservice

This chart leverages multiple `webservice` deployments depending on the configuration of the `deployments` key associated with the ingress path. Currently we create an `api` deployment which uses an ingress path of `/`.
```mermaid
graph TB
  a[webservice chart] --> b[api Deployment]
  b --> c[api Replicaset]
  a --> d
  d[api HPA] --> c
  c --> e[X number of Pods]
  a --> f[webservice ServiceAccount]
  a --> g[api Ingress]
  a --> h[api Service]
  h --> i[api Endpoints]
  i --> e
```
## Performance

We currently (Feb 2021) serve between 4k and 5k API requests/s to workhorse on a business day and 2.5k to 3.5k requests/s to puma.
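The figures above can be roughly reproduced with queries like the following. Treat this as a sketch: the workhorse counter `gitlab_workhorse_http_requests_total` and the Rails-side `http_requests_total` metric names are assumptions, not the canonical SLI definitions.

```promql
# Requests/s hitting workhorse on the API fleet (metric name is an assumption)
sum(rate(gitlab_workhorse_http_requests_total{env="gprd", type="api"}[5m]))

# Requests/s reaching puma (Rails) on the API fleet (metric name is an assumption)
sum(rate(http_requests_total{env="gprd", type="api"}[5m]))
```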
Performance of the API service mainly depends on these factors:
- Node Pool:
- https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/blob/6ac69bda99a722b0238c28fa348ddb85436c54f6/environments/gprd/gke-zonal.tf
- This configuration exists for EACH cluster
- Node type: https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/blob/6ac69bda99a722b0238c28fa348ddb85436c54f6/environments/gprd/variables.tf#L422
- Number of puma worker_processes: `sum(puma_workers{env="gprd", type="api"})`
- Number of puma threads: `sum(puma_max_threads{env="gprd", type="api"})`
See https://gitlab.com/gitlab-com/gl-infra/delivery/-/issues/1592 for a detailed analysis.
It is important to always have enough API capacity so that rolling deployments or turning off canary does not affect user experience.
In a K8s deployment we need to tune resource requests, autoscaling, and rollingUpdate settings to ensure the API is able to meet its SLOs and can react to surges in traffic, as well as to prevent frequent Pod eviction, since starting an API Pod takes a long time. Have a look at the following (a query sketch for checking current HPA headroom follows the list):
- hpa target values
- rollingUpdate strategy maxSurge
- minReplicas and maxReplicas
- resource requests and limits for CPU and memory
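As referenced above, one quick way to check whether the HPA still has headroom before a deployment is to compare current replicas against the configured maximum. This is a sketch using kube-state-metrics metric names; the namespace and the HPA name pattern are assumptions.

```promql
# Remaining HPA headroom for the api deployment (0 means we are pinned at maxReplicas).
# Metric names come from kube-state-metrics; the HPA name pattern is an assumption.
max by (horizontalpodautoscaler) (
  kube_horizontalpodautoscaler_spec_max_replicas{namespace="gitlab", horizontalpodautoscaler=~".*api.*"}
)
-
max by (horizontalpodautoscaler) (
  kube_horizontalpodautoscaler_status_current_replicas{namespace="gitlab", horizontalpodautoscaler=~".*api.*"}
)
```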
## Request Throttling

Requests are limited mainly by HAProxy (2000 rps per IP) and RackAttack. See the handbook for published limits and ../rate-limiting/README.md for details of our rate limiting. A few customer IPs are still excluded from rate limiting.
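To gauge how much traffic is actually being throttled, counting HTTP 429 responses is a reasonable proxy. The metric and `code` label below are assumptions based on workhorse's request counter; the rate-limiting runbook linked above has the authoritative views.

```promql
# Rate-limited (HTTP 429) responses served by the API fleet.
# Metric/label names are assumptions; see the rate-limiting runbook for the canonical dashboards.
sum(rate(gitlab_workhorse_http_requests_total{env="gprd", type="api", code="429"}[5m]))
```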
## Scalability

The API service is stateless and mostly CPU bound. Scaling can easily be done horizontally by adding more pods. But this also means we open more connections to the database or Redis and could shift scalability issues downstream. We have saturation alerts to cover us, but we should also take a regular look at the capacity planning forecast.
Scaling is currently handled automatically by the Horizontal Pod Autoscaler. The configuration for such is defined here: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/8b9068833c57a1d986a6461e1f6b5aa61b79b7e4/releases/gitlab/values/gprd.yaml.gotmpl#L32-35
Scaling vertically by using a more powerful machine type can also be considered.
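Because the service is mostly CPU bound, comparing CPU usage of the webservice containers against their requests is the most direct saturation signal. A minimal sketch, assuming cAdvisor and kube-state-metrics metrics and a `gitlab-webservice-api` pod naming pattern (the pod/container names are assumptions):

```promql
# Ratio of CPU cores consumed by api webservice containers to CPU they requested.
# Pod and container names are assumptions; adjust to the actual deployment.
sum(rate(container_cpu_usage_seconds_total{namespace="gitlab", pod=~"gitlab-webservice-api.*", container="webservice"}[5m]))
/
sum(kube_pod_container_resource_requests{namespace="gitlab", pod=~"gitlab-webservice-api.*", container="webservice", resource="cpu"})
```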
## Availability

Healthchecks are defined here: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/blob/master/roles/gprd-base-haproxy-main-config.json
See our cookbook for how this configuration is built: https://gitlab.com/gitlab-cookbooks/gitlab-haproxy/
### Healthcheck Flow

Healthchecks are performed for a variety of reasons from differing systems:
- HAProxy needs to determine if the API service is healthy
- Kubernetes needs to determine if a Pod is healthy
- Pods destined to be turned down need to respond accordingly, so that Kubernetes pulls the Pod out of service while it completes cleanup
#### HAProxy’s Healthcheck

This healthcheck is sent to a single backend, otherwise known as the NGINX Ingress that sits between the API Pods and the outside world. Note that HAProxy has no knowledge of how the backend is interconnected; a single healthcheck effectively lands on one randomly assigned Pod. These healthchecks therefore depend on the health of both the NGINX Ingress and the one randomly chosen Pod for all traffic that is destined for Kubernetes. Keep in mind that during deployments, healthchecks will hit both the old and new Pods as the deployment cycle completes.
```mermaid
sequenceDiagram
  participant a as HAProxy
  participant b as NGINX Ingress
  participant c as Workhorse
  a->>b: HTTP GET `/-/readiness`
  b->>c: HTTP GET `/-/readiness`
  c->>a: HTTP GET `/-/readiness`
```
#### Kubernetes

Kubernetes will send healthchecks to each container to determine if the Pod is healthy. Our `/-/liveness` probe should always return an HTTP 200 so long as puma is running. Our `/-/readiness` endpoint should do the same, though kill signals will modify this behavior. If we send a `SIGTERM` to the Pod, this endpoint will begin returning an HTTP 503. This fails the readiness probe and removes the Pod from servicing any future requests.

Workhorse uses a healthcheck script embedded into its container that is executed by the probe. Should a liveness probe fail for any Pod, Kubernetes will eventually restart that Pod.
```mermaid
sequenceDiagram
  participant a as Kubernetes
  participant b as Webservice
  participant c as Workhorse
  a->>b: Liveness Probe: HTTP GET `/-/liveness`
  a->>b: Readiness Probe: HTTP GET `/-/readiness`
  a->>c: Liveness Probe: exec /scripts/healthcheck
```
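Repeatedly failing liveness probes show up as container restarts. A quick sketch using the kube-state-metrics restart counter (the namespace and container names are assumptions):

```promql
# Containers in the gitlab namespace that restarted in the last hour,
# which is usually the visible symptom of failing liveness probes.
increase(kube_pod_container_status_restarts_total{namespace="gitlab", container=~"webservice|gitlab-workhorse"}[1h]) > 0
```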
## Durability

To minimize interrupted requests between the client and the API service, we have a special workhorse configuration to prevent unnecessary HTTP 502 errors from occurring during deployments.

- We extend how long a Pod waits until it is `SIGKILL`'d by Kubernetes: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/1f0a73923666f8bb50085362d2771653e8001308/releases/gitlab/values/values.yaml.gotmpl#L658
- We enable API Long Polling: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/1f0a73923666f8bb50085362d2771653e8001308/releases/gitlab/values/values.yaml.gotmpl#L651
With API Long Polling configured, we allow requests destined for `/api/v4/jobs/request` to sit in a lengthy poll so that Runner clients are not unnecessarily pounding the API for jobs. Doing so forces us to increase our `terminationGracePeriodSeconds` to something higher; otherwise, NGINX will respond with a massive amount of HTTP 502s as connections are severed when a Pod is removed before it has responded to the client.
A plan of action to make this better: https://gitlab.com/gitlab-org/gitlab/-/issues/331460
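During deployments the symptom described above is visible as a spike of 502s at the NGINX Ingress. A sketch, assuming the standard ingress-nginx request metric and that the api ingress name contains `api` (both assumptions):

```promql
# HTTP 502s returned by the NGINX Ingress for the api ingress, e.g. during a deployment window.
# Metric comes from ingress-nginx; the ingress label value is an assumption.
sum(rate(nginx_ingress_controller_requests{namespace="gitlab", ingress=~".*api.*", status="502"}[5m]))
```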
## Monitoring/Alerting

For our APDEX score latencies we generally set `satisfiedThreshold` to 1s and `toleratedThreshold` to 10s, and we aim for an APDEX score threshold of 0.995 and an error ratio threshold of 0.999 - see the metrics catalog.
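As a worked example of how such an apdex is computed from request latency histograms: requests completing within the satisfied threshold (1s) count fully, requests within the tolerated threshold (10s) count half, divided by all requests. The histogram name and bucket boundaries below are assumptions; the authoritative SLI definitions live in the metrics catalog.

```promql
# Apdex sketch: (requests <= 1s + requests <= 10s) / 2, divided by all requests.
# Histogram name and bucket boundaries are assumptions, not the canonical SLI definition.
(
  sum(rate(gitlab_workhorse_http_request_duration_seconds_bucket{env="gprd", type="api", le="1"}[5m]))
  +
  sum(rate(gitlab_workhorse_http_request_duration_seconds_bucket{env="gprd", type="api", le="10"}[5m]))
)
/ 2
/
sum(rate(gitlab_workhorse_http_request_duration_seconds_count{env="gprd", type="api"}[5m]))
```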