# Zero Downtime Postgres Database Decomposition

## Overview
This runbook documents a zero-downtime Postgres database decomposition strategy using Logical Replication and PgBouncer reconfiguration. It enables the migration of a functional set of tables from a monolithic cluster to a dedicated Postgres cluster with a controlled, reversible cutover for both read and write traffic—ensuring continuous application availability.
This approach was successfully used to decompose the Security (Sec) application’s data from the Main cluster and now serves as a reusable framework for similar migrations.
## Architecture Summary
- Source Cluster: The original Postgres cluster containing all application data prior to decomposition. In the Security (Sec) migration, this was the Main cluster—accessible via the `master.patroni` service endpoint for reads and writes, and `db-replica.service.consul` for read-only traffic.
- Target Cluster: The new Postgres cluster hosting the decomposed tables. In the Sec migration example, this was the Sec cluster—accessible via the `master.patroni-sec` service endpoint for reads and writes, and `sec-db-replica.service.consul` for read-only traffic.

Initially, applications connect through PgBouncer for all database read-write traffic via the Source Cluster’s RW Consul endpoint (`master.patroni`). Read traffic is routed to the Source Cluster’s read endpoint (`db-replica.service.consul`).
|  | Source Cluster | Target Cluster |
| --- | --- | --- |
| Postgres Cluster | Main Cluster | Sec Cluster |
| PgBouncer Instances | `pgbouncer`, `pgbouncer-sidekiq` | `pgbouncer-sec`, `pgbouncer-sidekiq-sec` |
| RW Consul Endpoint | `master.patroni` | `master.patroni-sec` |
| Read Consul Endpoint | `db-replica.service.consul` | `sec-db-replica.service.consul` |
| Application Behavior | Connects through PgBouncer; RW via `master.patroni`, reads via `db-replica.service.consul` | Initially same as Source |
## Strategy Summary

This strategy uses Logical Replication to maintain real-time synchronization of selected tables during migration. It eliminates downtime by rerouting traffic through PgBouncer and validating at each stage.
## Initial Setup

This phase prepares the Target Cluster and related infrastructure before any traffic is switched to it. The goal is to ensure the new environment is fully provisioned, replicating data from the Source Cluster, and ready to accept traffic in future phases—without impacting the live application’s behavior or availability.
- Provision the Target Cluster (e.g., Sec cluster `patroni-sec`).
- Initialize it as a physical standby of the Source Cluster (e.g., Main cluster `patroni`); a health check for this replication link is sketched after this list.
- Provision the Target Cluster’s PgBouncer instances (`pgbouncer-sec`, `pgbouncer-sidekiq-sec`). Initially, these PgBouncer instances are configured to connect to the Source Cluster via the `master.patroni` endpoint.
- Configure the Target Cluster’s application to:
  - Route read traffic through the Source Cluster’s read endpoint: `db-replica.service.consul`.
  - Route read-write traffic through the Target Cluster’s PgBouncer instances, which are configured to connect to the Source Cluster via `master.patroni`.
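A minimal health-check sketch for the standby link, assuming PostgreSQL 10+ and direct `psql` access to both clusters:

```sql
-- On the Source Cluster primary: the Target Cluster should appear as a
-- streaming standby, and replay lag should be small.
SELECT application_name,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the Target Cluster: confirm it is still in recovery (acting as a standby).
SELECT pg_is_in_recovery();
```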
## Read Traffic Switchover

This phase transitions the Target Cluster’s application (e.g., Sec Cluster application) read traffic from the Source Cluster to the Target Cluster’s read replicas, allowing validation of the Target Cluster’s ability to serve production queries under real-world load. Importantly, write traffic continues to go through the Source Cluster, so this change is low-risk and reversible.
### 🔧 Preparation

- Prepare a merge request (MR) to update the application’s read endpoint from the Source Cluster (e.g., `db-replica.service.consul`) to the Target Cluster (e.g., `sec-db-replica.service.consul`).
### 🚀 Execution

- Validate that physical replication from the Source Cluster to the Target Cluster is healthy (e.g., replication lag is within acceptable limits).
- Merge the MR to update the application’s read endpoint from the Source Cluster (e.g., `db-replica.service.consul`) to the Target Cluster (e.g., `sec-db-replica.service.consul`).
- Validate that application read traffic is now routed to the Target Cluster’s read replicas (`sec-db-replica.service.consul`); a database-side spot check is sketched after this list.
- Monitor application-level metrics (e.g., tuple fetches, index usage) and PgBouncer performance on the Target Cluster.
- Conduct full QA and functional validation during a defined observation window to ensure application correctness and stability.
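A database-side spot check, as a minimal sketch that assumes the decomposed tables live in a user schema and the application sets `application_name` on its connections:

```sql
-- On a Target Cluster read replica: application connections should now be visible.
SELECT usename, application_name, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY usename, application_name;

-- Tuple-fetch counters on the decomposed tables should be climbing.
SELECT relname, seq_tup_read, idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY idx_tup_fetch DESC
LIMIT 20;
```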
### 🔁 Rollback

- Revert the MR to restore the application’s read endpoint to the Source Cluster (e.g., `db-replica.service.consul`).
- Confirm that read queries are once again routed to the Source Cluster, and validate application behavior for correctness.
## Full Cutover using Logical Replication

### 🔧 Preparation

- Convert the Target Cluster from physical to logical replication; a sketch of the resulting publication/subscription pair follows this list.
- Freeze DDL on all decomposed tables in the Source Cluster (e.g., Main cluster), if possible; otherwise, implement a full DDL freeze for the entire Source Cluster via a feature flag.
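The publication/subscription pair can be sketched as follows. This is a simplified illustration with placeholder table names (`security_findings`, `security_scans`), database name, and replication role; the real conversion must also promote the standby and align the subscription’s start position with the promotion LSN so that no change is lost or applied twice.

```sql
-- On the Source Cluster primary: publish the decomposed tables.
CREATE PUBLICATION decomposed_tables_pub
    FOR TABLE security_findings, security_scans;

-- On the promoted Target Cluster: subscribe without the initial table copy,
-- since the data already exists there from physical replication.
CREATE SUBSCRIPTION decomposed_tables_sub
    CONNECTION 'host=master.patroni port=5432 dbname=app_production user=replication_user'
    PUBLICATION decomposed_tables_pub
    WITH (copy_data = false);
```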
### 🔁 Read Traffic Switchover to Target Cluster (Logical Replicas)

- Follow the steps from Read Traffic Switchover to re-apply the read traffic migration, now targeting the Target Cluster, which is replicating data via logical replication.
- Confirm the application continues to read from the Target Cluster (`sec-db-replica.service.consul`).
- Observe application logs and Prometheus metrics to verify read traffic is routed to the Target Cluster’s read replicas and validate correctness.
### ✍️ Write Traffic Switchover using PgBouncer Reconfiguration

- Disable Chef and PAUSE PgBouncers on the Target Cluster (`pgbouncer-sec`, `pgbouncer-sidekiq-sec`).
- Sync Postgres sequences for decomposed tables from the Source Cluster to the Target Cluster.
- Ensure Logical Replication lag is 0 bytes.
- Drop the logical replication subscription on the Target Cluster.
- Apply write-lock triggers on the Source Cluster for all decomposed tables—ideally before creating reverse replication; if not feasible, apply them immediately after resuming PgBouncers. (A SQL sketch of the sequence sync, lag check, and write-lock steps follows this list.)
- Create reverse Logical Replication:
  - Publication on the Target Cluster
  - Subscription on the Source Cluster
- Update PgBouncer configs on the Target Cluster to point to its RW endpoint (`master.patroni-sec`) and reload PgBouncers.
- RESUME PgBouncers.
- Validate that application write traffic, in addition to previously switched read traffic, is now routed to the Target Cluster (`master.patroni-sec` and `sec-db-replica.service.consul`).
- Merge the Chef MR to persist PgBouncer configuration changes.
- Sequentially run `chef-client` on PgBouncers.
- Re-enable Chef on all affected nodes.
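A minimal SQL sketch of the database-side steps above, assuming PostgreSQL 11+ and placeholder names (`security_findings`, `security_findings_id_seq`, `decomposed_tables_sub`) standing in for the real decomposed tables, sequences, and subscription. PAUSE/RESUME are issued separately against the PgBouncer admin console.

```sql
-- PgBouncer PAUSE is issued against its admin console, e.g.:
--   psql -h <pgbouncer-sec host> -p 6432 -U pgbouncer pgbouncer -c 'PAUSE;'

-- On the Source Cluster primary: capture the current value of each
-- decomposed table's sequence (placeholder name).
SELECT last_value FROM security_findings_id_seq;

-- On the Target Cluster: advance the matching sequence past the value read
-- above (42000 here is illustrative), with a safety margin, since logical
-- replication does not carry sequence state.
SELECT setval('security_findings_id_seq', 42000 + 1000);

-- On the Source Cluster primary: confirm the logical slot is fully caught up
-- (0 bytes of lag) before cutting over.
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- On the Target Cluster: drop the forward subscription.
DROP SUBSCRIPTION decomposed_tables_sub;

-- On the Source Cluster: reject any further application writes to the
-- decomposed tables. Ordinary triggers do not fire for logical replication
-- apply workers, so reverse replication can still apply changes here.
CREATE OR REPLACE FUNCTION reject_writes() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'table % is write-locked: writes have moved to the new cluster',
        TG_TABLE_NAME;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER write_lock
    BEFORE INSERT OR UPDATE OR DELETE ON security_findings
    FOR EACH ROW EXECUTE FUNCTION reject_writes();
```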
Automation: Use the `switchover.yml` Ansible playbook to automate most of the Read and Write Traffic Switchover steps. See the References section for the playbook repo and the related CR.
### ✅ Post-Switchover Tasks

#### Post-Switchover QA Tests

- Run the full E2E QA test suite against the decomposed environment.
- You may proceed with wrapping up the migration while monitoring test results in parallel.
#### Wrapping Up

- Remove any alert silences.
- Set up the WAL-G daily restore schedule for the Target Cluster.
- Ensure the smoke tests (automated via the MR enabling `db_database_tasks` on k8s) and the full manual run have both passed.
### 🚪 Close Rollback Window

Rollback Window: After switchover, a rollback window (e.g., 4 hours) allows reverting to the previous state without data loss, because reverse logical replication continues replicating changes from the Target Cluster back to the Source Cluster. See the Rollback Plan for details.

- Run the Ansible playbook to stop reverse logical replication: `stop_reverse_replication.yml`.
- On the Source Cluster’s leader/writer node, drop the subscription for reverse logical replication, if it still exists.
- On the Target Cluster’s leader/writer node, drop the publication and the associated logical replication slot for reverse replication; a manual SQL sketch follows this list.
- Unfreeze DDL on all decomposed tables if possible; otherwise, unfreeze DDL for the entire cluster via a feature flag.
Automation: Use the `stop_reverse_replication.yml` Ansible playbook to automate the safe closure of the rollback window. See the References section for the playbook repo.
## 🔁 Rollback Plan: Full Traffic Reversal

### 📖 Read Rollback

- Revert the application configuration to use the Source Cluster’s read endpoint (`db-replica.service.consul`).
- Validate that application read traffic is once again routed to the Source Cluster’s read replicas (`db-replica.service.consul`) and that the application behaves correctly.
### ✍️ RW Rollback (Write Traffic)

- Disable Chef on PgBouncer hosts.
- Drop the write-lock triggers from the Source Cluster (Main).
- PAUSE PgBouncers.
- Sync Postgres sequences for all decomposed tables from the Target Cluster back to the Source Cluster.
- Wait for logical replication lag to reach 0 bytes.
- Drop the reverse logical replication subscription from the Source Cluster, and the corresponding publication from the Target Cluster.
- Apply write-lock triggers on the Target Cluster for all decomposed tables; if not feasible at this point, apply them immediately after resuming PgBouncers. (A trigger sketch follows this list.)
- Update the PgBouncer configuration to reconnect to the Source Cluster’s RW endpoint (`master.patroni`).
- RESUME PgBouncers.
- Validate that application write traffic, along with previously switched read traffic, is now routed to the Source Cluster (`master.patroni` and `db-replica.service.consul`).
- Merge the Chef MR to persist PgBouncer configuration changes.
- Sequentially run `chef-client` on PgBouncer hosts.
- Re-enable Chef on all affected nodes.
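The trigger handling mirrors the forward cutover. A minimal sketch, reusing the placeholder `reject_writes()` function and `security_findings` table from the cutover sketch above (the function must also exist on the Target Cluster):

```sql
-- On the Source Cluster: lift the write lock so application writes can resume.
DROP TRIGGER IF EXISTS write_lock ON security_findings;

-- On the Target Cluster: block stray writes now that traffic has moved back
-- to the Source Cluster.
CREATE TRIGGER write_lock
    BEFORE INSERT OR UPDATE OR DELETE ON security_findings
    FOR EACH ROW EXECUTE FUNCTION reject_writes();
```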
Automation: Use the `switchover_rollback.yml` Ansible playbook to automate most of the RW rollback steps. See the References section for the playbook repo.
### 🧹 Cleanup

- Merge the Terraform MR to decommission the Target Cluster, if it is no longer needed.
## ✅ Benefits of This Approach

- Zero-downtime cutover for both read and write traffic.
- Staged and reversible migration process.
- No Consul service endpoint manipulation required.
- Avoids split-brain scenarios by using one-directional Logical Replication and write-lock triggers.
## 📌 Future Enhancements

The following improvements are planned for a future iteration of this runbook:
- Add diagrams to visualize traffic flow, cluster roles, and switchover stages.
- Include application-specific steps tailored for the Sec application.
- Expand automation coverage for switchover steps and validation.