index

This page contains information on Disaster Recovery for GitLab.com as it relates to testing, validation, and the current gaps that would prevent recovery.

GitLab backups are designed to tolerate both zonal and regional outages by storing data in global (multi-region) object storage.

The DR strategy for SaaS is based on our current backup strategy:

Validation of restores happens in CI pipelines for both the PostgreSQL database and disk snapshots:

If you suspect there is a regional or zonal outage (or degradation), please read through the recovery guide.

To test recovery of snapshots, use the dr-testing environment. It holds examples of different recovery types, including Gitaly snapshot recovery.

Denying network traffic to an availability zone

A helper script is available to simulate a zonal outage by setting up firewall rules that block both ingress and egress traffic. It can currently be run in our non-prod environments for the zones us-east1-b and us-east1-d. The zone us-east1-c has SPOFs such as the deploy and console nodes, so we should avoid running tests on this zone until those SPOFs have been resolved in the epic tracking critical work related to zonal failures.

Note: Run this script with care! All changes should go through change management, even for non-prod environments!

$ ./zone-denier -h
Usage: ./zone-denier [-e <environment> (gstg|pre) -a <action> (deny|allow) -z <zone> -d]
-e : Environment to target, must be a non-prod env
-a : deny or allow traffic for the specified zone
-z : availability zone to target
-d (optional): run in dry-run mode
Examples:
# Use the dry-run option to see what infra will be denied
./zone-denier -e pre -z us-east1-b -a deny -d
# Deny both ingress and egress traffic in us-east1-b in PreProd
./zone-denier -e pre -z us-east1-b -a deny
# Revert the deny to allow traffic
./zone-denier -e pre -z us-east1-b -a allow

The script is configured to exclude a static list of known SPOFs for each environment.
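
The zone and environment restrictions described above can also be checked before the script is ever invoked. The following is a minimal sketch, not part of the runbook: the function name build_zone_denier_cmd and the two allow-lists are hypothetical, and only the flags -e, -z, -a and -d come from the usage output above. It only prints the command line, so the result can be pasted into a change request for review rather than executed directly.

```shell
#!/usr/bin/env bash
# Hypothetical guard around ./zone-denier; it builds and prints the command, it does not run it.

# Assumption: these lists mirror the text above (non-prod envs, zones without known SPOFs).
ALLOWED_ENVS="gstg pre"
ALLOWED_ZONES="us-east1-b us-east1-d"

# Usage: build_zone_denier_cmd <env> <zone> <deny|allow> [any non-empty value for dry-run]
# Prints the zone-denier command line, or returns 1 if the combination should not be run.
build_zone_denier_cmd() {
  local env="$1" zone="$2" action="$3" dry_run="${4:-}"

  case " $ALLOWED_ENVS " in
    *" $env "*) ;;
    *) echo "refusing: '$env' is not an allowed non-prod environment" >&2; return 1 ;;
  esac

  case " $ALLOWED_ZONES " in
    *" $zone "*) ;;
    *) echo "refusing: zone '$zone' is not in the allowed list" >&2; return 1 ;;
  esac

  case "$action" in
    deny|allow) ;;
    *) echo "refusing: action must be 'deny' or 'allow'" >&2; return 1 ;;
  esac

  # Append -d only when a fourth argument was given.
  printf './zone-denier -e %s -z %s -a %s%s\n' "$env" "$zone" "$action" "${dry_run:+ -d}"
}

# Example: print the dry-run command first, then the real one.
build_zone_denier_cmd pre us-east1-b deny dry-run
build_zone_denier_cmd pre us-east1-b deny
```

A request for us-east1-c or for a production environment is rejected with a message on stderr and a non-zero exit status, which keeps the SPOF caveat enforceable in automation as well as in review.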