Backups
Verification
The backup and verification process uses the gitlab-restore pipeline to automate everything.
The pipeline fetches the ID of the latest CloudSQL backup for the GCP project it
runs against (e.g. gitlab-subscriptions-prod). A pre-provisioned CloudSQL
instance in the gitlab-restore project is used to restore the backup. Finally, a
verify job runs a set of queries to make sure that a recent database has been
restored.
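
The selection and freshness steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: it assumes backup runs are available as dictionaries with `id`, `status`, and `endTime` fields (the shape returned by the Cloud SQL Admin API's backup run listing), and the function names and the 24-hour threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2024-05-01T03:00:00Z'."""
    # Older Python versions do not accept a trailing 'Z', so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_successful_backup(runs: List[dict]) -> Optional[dict]:
    """Return the most recently finished successful backup run, if any."""
    successful = [run for run in runs if run.get("status") == "SUCCESSFUL"]
    if not successful:
        return None
    return max(successful, key=lambda run: parse_rfc3339(run["endTime"]))


def is_recent(
    timestamp: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # hypothetical threshold
) -> bool:
    """Return True if 'timestamp' is no older than 'max_age' relative to 'now'."""
    return now - timestamp <= max_age


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_runs = [
        {"id": "1", "status": "SUCCESSFUL", "endTime": "2024-05-01T03:00:00Z"},
        {"id": "2", "status": "FAILED", "endTime": "2024-05-03T03:00:00Z"},
        {"id": "3", "status": "SUCCESSFUL", "endTime": "2024-05-02T03:00:00Z"},
    ]
    backup = latest_successful_backup(sample_runs)
    print(backup["id"])
    print(is_recent(parse_rfc3339(backup["endTime"]), datetime.now(timezone.utc)))
```

A verify job could apply the same kind of freshness check to a timestamp read from the restored database, failing the pipeline when the newest row is older than expected.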
GitLab Analysis
The data team uses the backups as a data source to extract information without affecting the production database.
flowchart LR
  subgraph gitlab-restore[gitlab-restore VPC]
    customers-dot-proxy[customers-dot-proxy machine]
    restore-stgsub[restore-stgsub CloudSQL instance]
    restore-prdsub[restore-prdsub CloudSQL instance]
  end

  subgraph gitlab-analysis[gitlab-analysis VPC]
    extract-pipeline--query postgres-->customers-dot-proxy
  end

  gitlab-restore -.peering.-> gitlab-analysis
  gitlab-analysis -.peering.-> gitlab-restore
  customers-dot-proxy --> restore-stgsub
  customers-dot-proxy --> restore-prdsub

  style gitlab-restore fill:#88c
  style gitlab-analysis fill:#548
Connections from peered projects into CloudSQL instances cannot be established
directly, so a proxy is needed for the gitlab-analysis project to access the
databases. A proxy VM (named customers-dot-proxy) is created by the
gitlab-restore pipeline, and a CloudSQL Auth Proxy process runs on it.
Processes from gitlab-analysis connect to the CloudSQL Auth Proxy as if they
were connecting to a normal PostgreSQL database.
The proxy VM has a static private IP, and GCP firewall rules exist to allow
connections from known subnets on gitlab-analysis to the gitlab-restore project.
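
To illustrate what "known subnets" means in practice, the sketch below checks whether a client address falls inside an allow-list of CIDR ranges, which is the kind of source-range match a firewall rule performs. The CIDR ranges and the function name are hypothetical placeholders, not the real gitlab-analysis subnets.

```python
import ipaddress
from typing import Iterable

# Hypothetical allow-list; the real subnets are defined in the GCP firewall rules.
ALLOWED_SUBNETS = ["10.10.0.0/24", "10.20.0.0/24"]


def is_allowed(client_ip: str, subnets: Iterable[str] = ALLOWED_SUBNETS) -> bool:
    """Return True if client_ip belongs to any of the allowed subnets."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(subnet) for subnet in subnets)


if __name__ == "__main__":
    print(is_allowed("10.10.0.15"))
    print(is_allowed("192.168.1.5"))
```

A check like this can help when debugging a refused connection: if the client's address is outside every allowed range, the firewall rather than the proxy is the likely cause.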
The names of the PostgreSQL databases are CustomersDot_production and
CustomersDot_stg, for production and staging, respectively.
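
Because the CloudSQL Auth Proxy speaks the regular PostgreSQL protocol, a client in gitlab-analysis only needs an ordinary connection URI pointing at the proxy VM. The sketch below builds one for each environment. The database names come from this page; the host, port, and user are hypothetical placeholders, and the helper name is made up for illustration.

```python
from urllib.parse import quote

# Database names as documented above.
DATABASES = {
    "production": "CustomersDot_production",
    "staging": "CustomersDot_stg",
}


def proxy_uri(environment: str, host: str, port: int, user: str) -> str:
    """Build a PostgreSQL connection URI that targets the proxy VM.

    The password is left out on purpose; PostgreSQL clients can read it
    from the PGPASSWORD environment variable or a password file instead.
    """
    database = DATABASES[environment]
    return (
        f"postgresql://{quote(user, safe='')}@{host}:{port}"
        f"/{quote(database, safe='')}"
    )


if __name__ == "__main__":
    # Hypothetical host, port, and user, for illustration only.
    print(proxy_uri("production", "10.0.0.5", 5432, "analyst"))
    print(proxy_uri("staging", "10.0.0.5", 5433, "analyst"))
```

The resulting URI can be passed to any PostgreSQL client or driver. Which port maps to which CloudSQL instance depends on how the CloudSQL Auth Proxy is started on the VM, so the ports shown here are assumptions.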