
Pulp Backup and Restore

This runbook covers backup and restore procedures for the Pulp service. Pulp’s backup strategy consists of two main components:

  1. CloudSQL Database Backups - Automated backups of the PostgreSQL database
  2. Object Storage - Artifacts stored in GCS buckets with built-in redundancy

Note that we are not leveraging the native Pulp operator for backup and restoration; we rely solely on the strategies provided by our cloud provider. Refer to the Pulp Operator documentation for additional details on why.

Backups themselves are configured via Terraform. Before performing a restore, ensure you have:

  • Access to GCP Cloud SQL Console
  • Appropriate IAM permissions for database restoration
  • Pulp CLI configured and authenticated
  • Understanding that the application will be degraded during restore

Follow GCP’s documentation for restoring from backup.
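Before proceeding, you can confirm that automated backups exist and are recent. A minimal check, assuming the gcloud CLI is authenticated and using INSTANCE_NAME as a placeholder for the CloudSQL instance defined in Terraform:

# List the most recent backups for the Pulp CloudSQL instance
gcloud sql backups list --instance=INSTANCE_NAME --limit=5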

1. Document Current State (Recommended for Validation)

Record the current Pod counts so that we can scale down now and scale back up after restoration.

kubectl get deploy -n pulp

Document the desired Pod counts. Note that these Deployments do NOT use Horizontal Pod Autoscalers.
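One way to capture the counts in a reusable form (the output file name is illustrative):

# Record each Deployment's desired replica count for the post-restore scale-up
kubectl get deploy -n pulp \
  -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas \
  | tee pulp-replicas-pre-restore.txt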

What to validate depends on the scenario, so the commands below are only an example; decide in advance how you will confirm that the restoration was successful. Before restoring, document the current state for post-restore comparison:

# List current users (example verification)
pulp user list | jq '.[].username' 2>/dev/null || pulp user list
# Example output:
# "user1"
# "user2"
# "admin"
# Check system status
pulp status

Save this output for comparison after the restore completes.
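For example, the output can be redirected to files for a later diff (file names are illustrative; pulp-cli emits JSON by default):

# Snapshot users and system status before the restore
pulp user list > pulp-users-pre-restore.json
pulp status > pulp-status-pre-restore.json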

Scale down the Pods to prevent any interference while the database is being restored.

kubectl patch pulp pulp -n pulp --type='merge' \
  -p='{"spec":{
    "api":{"replicas":0},
    "content":{"replicas":0},
    "web":{"replicas":0},
    "worker":{"replicas":0}
  }}'
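Before starting the restore, confirm that all Pulp pods have terminated:

# Watch until the api, content, web, and worker pods have terminated
kubectl get pods -n pulp

Then restore the database via the GCP Console: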
  1. In the GCP Console, navigate to your CloudSQL instance
  2. Click on “Backups” in the left sidebar
  3. Select the backup you want to restore from (verify the timestamp)
  4. Click “Restore”
  5. Confirm the restoration
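Alternatively, the restore can be triggered from the CLI. A sketch, using BACKUP_ID and INSTANCE_NAME as placeholders:

# Restore the selected backup onto the same instance
gcloud sql backups restore BACKUP_ID --restore-instance=INSTANCE_NAME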

Note: Restoration time varies with database size. For small databases (under 1 GB), expect approximately 10 minutes; larger databases may take significantly longer.
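While the restore runs, progress can be monitored from the CLI (INSTANCE_NAME is again a placeholder):

# List recent operations; the restore shows as RUNNING, then DONE
gcloud sql operations list --instance=INSTANCE_NAME --limit=5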

Scale back up, substituting the replica counts below with those documented earlier:

kubectl patch pulp pulp -n pulp --type='merge' \
  -p='{"spec":{
    "api":{"replicas":1},
    "content":{"replicas":1},
    "web":{"replicas":1},
    "worker":{"replicas":1}
  }}'
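As a scriptable alternative to watching pods interactively (step 1 below), kubectl can block until every Deployment reports Available:

# Wait up to 10 minutes for all Deployments in the namespace to become Available
kubectl wait --for=condition=Available deployment --all -n pulp --timeout=10m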

Once the restore completes and pods are stable:

  1. Wait for all pods to reach Ready state:

    kubectl get pods -n pulp -w
  2. Verify database connectivity, using the pulp-cli:

    pulp status
  3. Verify data integrity by comparing with the pre-restore state (example only; see the diff sketch after this list):

    # Check that users match the backup's point in time
    pulp user list | jq '.[].username'
  4. Confirm the data matches the backup timestamp (data created after the backup should not exist)
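If the pre-restore output was saved to files as suggested earlier, a quick comparison might look like the following (bash process substitution; the file name matches the earlier illustrative example):

# No output from diff means the user list is unchanged
diff pulp-users-pre-restore.json <(pulp user list)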

After the restore is verified:

  1. Monitor application logs for any persistent errors
  2. Verify that all Pulp services are functioning correctly
  3. Test critical workflows, e.g., package uploads and downloads (see the smoke-test sketch after this list)
  4. Document the restore in an incident issue
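For step 3, a minimal smoke test might exercise an upload end to end. This is a sketch assuming the pulp_file plugin is installed; the repository name and file are throwaway placeholders:

# Create a scratch repository, upload a small file, then clean up
pulp file repository create --name restore-smoke-test
echo "smoke test" > /tmp/smoke.txt
pulp file content upload --file /tmp/smoke.txt --relative-path smoke.txt
pulp file repository destroy --name restore-smoke-test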

GCS buckets used by Pulp benefit from GCP’s built-in redundancy features. In the event of storage issues:

  1. Verify bucket configuration and replication settings (see the sketch below)
  2. Check GCS availability and durability documentation
  3. Review the Terraform configuration to identify backup bucket settings and replication configuration
  4. Contact GCP support if data loss is suspected
  5. If data loss is confirmed, coordinate with the infrastructure team to restore from replicated buckets
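For step 1, bucket settings can be inspected from the CLI. A sketch with a placeholder bucket name:

# Show bucket metadata, including location type and storage class
gsutil ls -L -b gs://PULP_BUCKET_NAME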