Adding a PostgreSQL replica
In order to add a PostgreSQL replica to the Patroni cluster, please refer to patroni-management.md.
Other examples here are largely based on taking a basebackup manually.
Using pg_basebackup to resync a replica
Section titled “Using pg_basebackup to resync a replica”If a replica gets out of sync or fails for some reason, you have several options to recover:
- Run pg_basebackup from the secondary
- Use WAL-G (or older WAL-E) to restore a backup and catch up from there
- Take a disk snapshot of the primary and clone it on the secondary. Be sure to drop replication slots on the secondary after it comes up.
Running pg_basebackup
Section titled “Running pg_basebackup”All commands are supposed to be run on the replica in question if not stated otherwise.
$upstream is the hostname of the upstream database box we want to use
to resync. This is not necessarily the primary as we can (and should!)
use another secondary for this purpose.
- Stop PostgreSQL:
gitlab-ctl stop postgresql - Backup
postgresql.auto.conffor later reference:cp /var/opt/gitlab/postgreql/data/postgresql.auto.conf /var/opt/gitlab/postgreql/postgresql.auto.conf.$(date +%F) - Remove old data directory
mv /var/opt/gitlab/postgreql/data{,.bak} - Connect to
$upstreamand create physical replication slot on agitlab-psqlsession:select pg_create_physical_replication_slot($slot)with$slotbeing a random string you define - Start
pg_basebackupprocess (preferably in tmux session as this takes a while):sudo -u gitlab-psql PGSSLMODE=disable /opt/gitlab/embedded/bin/pg_basebackup -D /var/opt/gitlab/postgresql/data --slot=$slot -c fast -X stream -P --host=$upstream -p 5432 --username=gitlab-replicator -R - Once finished, review
postgresql.auto.confand compare with backup in/var/opt/gitlab/postgreql/postgresql.auto.conf.$(date +%F). Check upstream is$upstream. - Start PostgreSQL:
gitlab-ctl start postgresql - Let the new replica catch up and become in-sync with
$upstream. - If
$upstreamwas a secondary, reconfigure to use the primary and also drop the replication slot created in (5) by connecting to$upstreamand perform aselect pg_drop_replication_slot($slot)there - Persist
postgresql.auto.confconfiguration in Chef usinggitlab-server::postgresql-standbyrecipe (example). - Make sure to clean up and remove the old data directory in
/var/opt/gitlab/postgreql/data.bak
After having started PostgreSQL again, there are the following phases:
- Crash recovery (during this time, the secondary is not accessible: FATAL: the database system is starting up)
- Catchup with upstream (secondary is accessible but lags behind)
Useful things to look at:
/var/log/gitlab/postgresql/currentselect * from pg_stat_replicationon$upstream
Here’s an example of a postgresql.auto.conf that has both streaming
replication and archive recovery enabled (note $upstream, $fqdn, $slot
need to be replaced):
primary_conninfo = 'user=gitlab-replicator password=REDACTED host='$upstream' port=5432 fallback_application_name=repmgr sslmode=prefer sslcompression=0 application_name='$fqdn''recovery_target_timeline = 'latest'primary_slot_name = $slot
restore_command = '/usr/bin/envdir /etc/wal-g.d/env /opt/wal-g/bin/wal-g wal-fetch -p 32 "%f" "%p"'