
Pulp Backup and Restore

This runbook covers backup and restore procedures for the Pulp service. Pulp’s backup strategy consists of two main components:

  1. CloudSQL Database Backups - Automated backups of the PostgreSQL database
  2. Object Storage - Artifacts stored in GCS buckets with built-in redundancy

Note that we are not leveraging the native Pulp operator for backup and restoration; we rely solely on the strategies provided by our cloud provider. Refer to the Pulp Operator documentation for additional details on why.

Backups themselves are configured via Terraform. Before performing a restore, ensure you have:

  • Access to GCP Cloud SQL Console
  • Appropriate IAM permissions for database restoration
  • Pulp CLI configured and authenticated
  • Understanding that the application will be degraded during restore

Follow GCP’s documentation for restoring from backup.
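Before proceeding, you can confirm that automated backups exist and are recent. A minimal check, assuming the gcloud CLI is authenticated and using INSTANCE_NAME as a placeholder for the CloudSQL instance defined in Terraform:

# List the most recent backups for the Pulp CloudSQL instance
gcloud sql backups list --instance=INSTANCE_NAME --limit=5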

1. Document Current State (Recommended for Validation)

Record the current Pod counts so that we can scale down now and scale back up after restoration.

kubectl get deploy -n pulp

Document the desired Pod counts. Note that these Deployments do NOT use Horizontal Pod Autoscalers.
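One way to capture the counts in a reusable form (the output file name is illustrative):

# Record each Deployment's desired replica count for the post-restore scale-up
kubectl get deploy -n pulp \
  -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas \
  | tee pulp-replicas-pre-restore.txt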

What to validate depends on the scenario, so the commands below are only an example; decide in advance how you will confirm that the restoration was successful. Before restoring, document the current state for post-restore comparison:

# List current users (example verification)
pulp user list | jq '.[].username' 2>/dev/null || pulp user list
# Example output:
# "user1"
# "user2"
# "admin"
# Check system status
pulp status

Save this output for comparison after the restore completes.
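For example, the output can be redirected to files for a later diff (file names are illustrative; pulp-cli emits JSON by default):

# Snapshot users and system status before the restore
pulp user list > pulp-users-pre-restore.json
pulp status > pulp-status-pre-restore.json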

Scale down the Pods to prevent any interference while the database is being restored.

kubectl patch pulp pulp -n pulp --type='merge' \
  -p='{"spec":{
    "api":{"replicas":0},
    "content":{"replicas":0},
    "web":{"replicas":0},
    "worker":{"replicas":0}
  }}'
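Before starting the restore, confirm that all Pulp pods have terminated:

# Watch until the api, content, web, and worker pods have terminated
kubectl get pods -n pulp

Then restore the database via the GCP Console: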
  1. In the GCP Console, navigate to your CloudSQL instance
  2. Click on “Backups” in the left sidebar
  3. Select the backup you want to restore from (verify the timestamp)
  4. Click “Restore”
  5. Confirm the restoration
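Alternatively, the restore can be triggered from the CLI. A sketch, using BACKUP_ID and INSTANCE_NAME as placeholders:

# Restore the selected backup onto the same instance
gcloud sql backups restore BACKUP_ID --restore-instance=INSTANCE_NAME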

Note: Restoration time varies with database size. For small databases (under 1 GB), expect approximately 10 minutes; larger databases may take significantly longer.
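While the restore runs, progress can be monitored from the CLI (INSTANCE_NAME is again a placeholder):

# List recent operations; the restore shows as RUNNING, then DONE
gcloud sql operations list --instance=INSTANCE_NAME --limit=5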

Scale back up, substituting the replica counts below with those documented earlier:

kubectl patch pulp pulp -n pulp --type='merge' \
  -p='{"spec":{
    "api":{"replicas":1},
    "content":{"replicas":1},
    "web":{"replicas":1},
    "worker":{"replicas":1}
  }}'
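As a scriptable alternative to watching pods interactively (step 1 below), kubectl can block until every Deployment reports Available:

# Wait up to 10 minutes for all Deployments in the namespace to become Available
kubectl wait --for=condition=Available deployment --all -n pulp --timeout=10m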

Once the restore completes and pods are stable:

  1. Wait for all pods to reach Ready state:

    kubectl get pods -n pulp -w
  2. Verify database connectivity, using the pulp-cli:

    pulp status
  3. Verify data integrity by comparing with the pre-restore state (example only; see the diff sketch after this list):

    # Check that users match the backup's point in time
    pulp user list | jq '.[].username'
  4. Confirm the data matches the backup timestamp (data created after the backup should not exist)
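If the pre-restore output was saved to files as suggested earlier, a quick comparison might look like the following (bash process substitution; the file name matches the earlier illustrative example):

# No output from diff means the user list is unchanged
diff pulp-users-pre-restore.json <(pulp user list)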

After the restore is verified:

  1. Monitor application logs for any persistent errors
  2. Verify that all Pulp services are functioning correctly
  3. Test critical workflows, e.g., package uploads and downloads (see the smoke-test sketch after this list)
  4. Document the restore in an incident issue
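For step 3, a minimal smoke test might exercise an upload end to end. This is a sketch assuming the pulp_file plugin is installed; the repository name and file are throwaway placeholders:

# Create a scratch repository, upload a small file, then clean up
pulp file repository create --name restore-smoke-test
echo "smoke test" > /tmp/smoke.txt
pulp file content upload --file /tmp/smoke.txt --relative-path smoke.txt
pulp file repository destroy --name restore-smoke-test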

GCS buckets used by Pulp benefit from GCP’s built-in redundancy features. In the event of storage issues:

  1. Verify bucket configuration and replication settings (see the sketch below)
  2. Check GCS availability and durability documentation
  3. Review the Terraform configuration to identify backup bucket settings and replication configuration
  4. Contact GCP support if data loss is suspected
  5. If data loss is confirmed, coordinate with the infrastructure team to restore from replicated buckets
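For step 1, bucket settings can be inspected from the CLI. A sketch with a placeholder bucket name:

# Show bucket metadata, including location type and storage class
gsutil ls -L -b gs://PULP_BUCKET_NAME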