PITR and Streaming Replication environments
Recently, I had to address a support case where a user was struggling to perform Point-in-Time Recovery in his PostgreSQL cluster using streaming replication. We have already discussed how to achieve a successful recovery in our past webinars, but the question remains: what is the impact on the standby servers? In this specific case, the user was unsure whether he needed to apply the Point-in-Time Recovery steps on the standby as well, and after doing so, the standby refused to connect to the primary.
In this post, we will examine this specific use case, perform a recovery on the primary, and explore the best approach to re-synchronize the standby servers.
For the purpose of this post, we will use 2 nodes called pg1 and pg2. Both are running on Rocky Linux 9.
Installation
On both the pg1 and pg2 servers, first configure the PGDG yum repositories:
$ sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/\
EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
$ sudo dnf -qy module disable postgresql
Then, install PostgreSQL and create a basic PostgreSQL cluster on pg1:
$ sudo dnf install -y postgresql16-server
$ sudo PGSETUP_INITDB_OPTIONS="--data-checksums" \
    /usr/pgsql-16/bin/postgresql-16-setup initdb
$ sudo systemctl enable postgresql-16 --now
$ sudo -iu postgres ps -o pid,cmd fx
As we are talking about Point-in-Time Recovery, we need to store the WAL archives in a location shared across the nodes. In this example, we will create an archives directory inside a /shared NFS mount:
$ sudo -iu postgres mkdir /shared/archives
Configure the listening addresses and enable archiving in the postgresql(.auto).conf file:
listen_addresses = '*'
archive_mode = on
archive_command = 'test ! -f /shared/archives/%f && cp %p /shared/archives/%f'
Finally, restart the PostgreSQL service using:
$ sudo systemctl restart postgresql-16.service
Set up the streaming replication
On pg1, create a specific user for the replication:
$ sudo -iu postgres psql
postgres=# CREATE ROLE replic_user WITH LOGIN REPLICATION PASSWORD 'mypwd';
Configure pg_hba.conf:
host replication replic_user pg2 scram-sha-256
Reload configuration and allow the service in the firewall (if needed):
$ sudo systemctl reload postgresql-16.service
$ sudo firewall-cmd --permanent --add-service=postgresql
$ sudo firewall-cmd --reload
Configure ~postgres/.pgpass on pg2:
$ sudo su - postgres
$ echo "*:*:replication:replic_user:mypwd" >> ~postgres/.pgpass
$ chown postgres: ~postgres/.pgpass
$ chmod 0600 ~postgres/.pgpass
Initialize the standby server on pg2 by copying the data directory of pg1 using pg_basebackup:
$ sudo su - postgres
$ pg_basebackup -h pg1 -U replic_user -D /var/lib/pgsql/16/data \
    --write-recovery-conf --checkpoint=fast --slot='pg2' --create-slot --progress
23211/23211 kB (100%), 1/1 tablespace
$ cat /var/lib/pgsql/16/data/postgresql.auto.conf | grep primary_conninfo
primary_conninfo = '...'
The pg_basebackup command will create the pg2 physical replication slot, so the primary will keep all the WAL files needed for the standby to catch up.
Enable and start the PostgreSQL service:
$ sudo systemctl enable postgresql-16 --now
$ sudo -iu postgres ps -o pid,cmd fx
Check the streaming replication status using pg_stat_replication and pg_replication_slots on pg1, and pg_stat_wal_receiver on pg2:
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 2599
usesysid         | 16384
usename          | replic_user
application_name | walreceiver
client_addr      | 192.168.121.101
client_hostname  | pg2
client_port      | 46890
backend_start    | 2024-09-03 09:12:36.20348+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/3000060
write_lsn        | 0/3000060
flush_lsn        | 0/3000060
replay_lsn       | 0/3000060
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2024-09-03 09:12:55.094425+00
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+----------
slot_name           | pg2
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 2599
xmin                |
catalog_xmin        |
restart_lsn         | 0/3000060
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         |
postgres=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------
pid                   | 2289
status                | streaming
receive_start_lsn     | 0/3000000
receive_start_tli     | 1
written_lsn           | 0/3000060
flushed_lsn           | 0/3000000
received_tli          | 1
last_msg_send_time    | 2024-09-03 09:13:36.230958+00
last_msg_receipt_time | 2024-09-03 09:13:35.104889+00
latest_end_lsn        | 0/3000060
latest_end_time       | 2024-09-03 09:12:36.207722+00
slot_name             | pg2
sender_host           | pg1
sender_port           | 5432
conninfo              | ...
Database content initialization
Let’s now create some test data using pgbench:
$ sudo -iu postgres createdb test
$ sudo -iu postgres /usr/pgsql-16/bin/pgbench -i -s 600 test
Since we want to experiment with Point-in-Time Recovery, we also need to take an initial backup of pg1:
$ sudo su - postgres
$ mkdir /shared/backups
$ pg_basebackup -D "/shared/backups/$(date +'%F_%T')" \
    --format=plain --wal-method=none --checkpoint=fast --progress
NOTICE: all required WAL segments have been archived
9218540/9218540 kB (100%), 1/1 tablespace
Generate some activity using pgbench:
$ /usr/pgsql-16/bin/pgbench -c 4 -j 2 -T 300 test
Let’s dig into our database content to find out a possible restore time using the pgbench_history.mtime column:
test=# SELECT min(mtime),max(mtime) FROM pgbench_history;
            min             |            max
----------------------------+----------------------------
 2024-09-03 09:25:30.474732 | 2024-09-03 09:30:30.469797
(1 row)
test=# SELECT * FROM pgbench_history WHERE mtime BETWEEN '2024-09-03 09:27:00' AND '2024-09-03 09:28:00' ORDER BY mtime DESC LIMIT 10;
 tid  | bid |   aid    | delta |           mtime            | filler
------+-----+----------+-------+----------------------------+--------
  614 | 206 |  2388179 |  1489 | 2024-09-03 09:27:59.999669 |
 5278 | 569 | 38093646 |   235 | 2024-09-03 09:27:59.999645 |
 2470 | 160 |  6196059 | -1997 | 2024-09-03 09:27:59.99962  |
  598 | 466 | 27063999 |  4034 | 2024-09-03 09:27:59.999108 |
  542 | 438 | 50846207 |  2724 | 2024-09-03 09:27:59.997713 |
  568 | 452 | 17938862 |  1501 | 2024-09-03 09:27:59.996939 |
 1738 | 377 | 55042253 |  3608 | 2024-09-03 09:27:59.996903 |
 4002 | 497 | 16634903 | -2938 | 2024-09-03 09:27:59.996889 |
 5551 | 577 |   382773 |  4765 | 2024-09-03 09:27:59.995913 |
 2398 |  51 |  8128754 |  2714 | 2024-09-03 09:27:59.995004 |
(10 rows)
mtime is a timestamp without time zone, which could potentially create some issues, but for our example we will try to restore up to 2024-09-03 09:28:00.
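If there is any doubt about how that value will be interpreted, a quick sanity check is to look at the server time zone and, if needed, state the offset explicitly in the recovery target. A minimal sketch, assuming the cluster runs in UTC as the log timestamps above suggest:

test=# SHOW timezone;
 TimeZone
----------
 UTC
(1 row)

-- with the server in UTC, the offset can be made explicit in the target:
-- recovery_target_time = '2024-09-03 09:28:00+00'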
Perform Point-in-time Recovery on the primary
First, stop the PostgreSQL service and (if you have enough disk space) save the old data directory:
$ sudo systemctl stop postgresql-16
$ sudo mv /var/lib/pgsql/16/data /var/lib/pgsql/16/data.old
Restore the backup and create the recovery signal:
$ sudo su - postgres
$ cp -rfp /shared/backups/2024-09-03_09\:21\:46/ /var/lib/pgsql/16/data
$ touch /var/lib/pgsql/16/data/recovery.signal
Add the recovery settings to the postgresql(.auto).conf:
restore_command = 'cp /shared/archives/%f %p'
recovery_target_action = 'promote'
recovery_target_time = '2024-09-03 09:28:00'
recovery_target_inclusive = off
Start the PostgreSQL service and watch the recovery process in the logs:
LOG: starting point-in-time recovery to 2024-09-03 09:28:00+00
LOG: starting backup recovery with redo LSN 1/D0000028,...
LOG: restored log file "0000000100000001000000D0" from archive
LOG: restored log file "0000000100000001000000D1" from archive
LOG: completed backup recovery with redo LSN 1/D0000028 and end LSN 1/D0000138
LOG: consistent recovery state reached at 1/D0000138
...
LOG: recovery stopping before commit of transaction 259244, time 2024-09-03 09:28:00.000048+00
LOG: redo done at 2/B1F61158
LOG: last completed transaction was at log time 2024-09-03 09:27:59.998357+00
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: database system is ready to accept connections
Check the database content:
test=# SELECT min(mtime),max(mtime) FROM pgbench_history;
            min             |            max
----------------------------+----------------------------
 2024-09-03 09:25:30.474732 | 2024-09-03 09:27:59.997713
(1 row)
Obviously, since we left the standby server untouched, the streaming replication is broken. You should see messages like these in the PostgreSQL logs:
FATAL: could not start WAL streaming: ERROR: replication slot "pg2" does not exist
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 3/760000A0
LOG: waiting for WAL to become available at 3/760000B8
Standby server re-synchronization
There are several possibilities for re-synchronizing the standby server. The first approach is to reuse the same command that was used to build it the first time, using pg_basebackup.
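A minimal sketch of that first option, assuming we simply wipe the standby data directory and re-run the initial copy (the slot no longer exists on the restored primary, so --create-slot is still needed):

$ sudo systemctl stop postgresql-16
$ sudo -iu postgres rm -rf /var/lib/pgsql/16/data
$ sudo -iu postgres pg_basebackup -h pg1 -U replic_user -D /var/lib/pgsql/16/data \
    --write-recovery-conf --checkpoint=fast --slot='pg2' --create-slot --progress
$ sudo systemctl start postgresql-16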
After a successful recovery, a common best practice is to take a fresh full backup of the primary. Instead of using pg_basebackup to rebuild the standby server, we could use it to take a backup of the primary and then restore that backup on the standby!
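A rough sketch of that second option, reusing the backup layout from earlier in this post; the <backup_dir> placeholder stands for whatever directory the new backup creates:

# On pg1: take a fresh full backup after the recovery
$ sudo -iu postgres pg_basebackup -D "/shared/backups/$(date +'%F_%T')" \
    --format=plain --wal-method=none --checkpoint=fast --progress

# On pg2: restore that backup and turn it into a standby
$ sudo systemctl stop postgresql-16
$ sudo -iu postgres cp -rfp /shared/backups/<backup_dir>/ /var/lib/pgsql/16/data
$ sudo -iu postgres touch /var/lib/pgsql/16/data/standby.signal
# set restore_command, primary_conninfo (and primary_slot_name) in postgresql.auto.conf
$ sudo systemctl start postgresql-16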
Finally, the approach we will explore in this blog post is to rewind the standby server using the old backup. We will ask PostgreSQL to replay the WAL archives up to the current primary location, follow the new timeline, and then stop. Once PostgreSQL has been recovered to that location, we can remove the recovery settings and re-enable streaming replication.
To determine the current location of the primary, we will first create a physical replication slot on pg1. This ensures that the primary retains all the necessary WAL files needed for the standby to catch up.
postgres=# SELECT * FROM pg_create_physical_replication_slot(
             slot_name := 'pg2',
             immediately_reserve := TRUE);
 slot_name |    lsn
-----------+------------
 pg2       | 2/B2323178
On pg2, let’s now stop the PostgreSQL service and (if you have enough disk space) save the old data directory:
$ sudo systemctl stop postgresql-16
$ sudo mv /var/lib/pgsql/16/data /var/lib/pgsql/16/data.old
Restore the old backup and create the recovery signal:
$ sudo su - postgres
$ cp -rfp /shared/backups/2024-09-03_09\:21\:46/ /var/lib/pgsql/16/data
$ touch /var/lib/pgsql/16/data/recovery.signal
Add the recovery settings to the postgresql(.auto).conf:
archive_mode = off
restore_command = 'cp /shared/archives/%f %p'
recovery_target_action = 'shutdown'
recovery_target_lsn = '2/B2323178'
recovery_target_timeline = 'latest'
Start the PostgreSQL service and watch the recovery process in the logs:
LOG: starting point-in-time recovery to WAL location (LSN) "2/B2323178"
LOG: starting backup recovery with redo LSN 1/D0000028,...
LOG: restored log file "00000002.history" from archive
LOG: restored log file "0000000100000001000000D0" from archive
LOG: redo starts at 1/D0000028
LOG: restored log file "0000000100000001000000D1" from archive
LOG: completed backup recovery with redo LSN 1/D0000028 and end LSN 1/D0000138
LOG: consistent recovery state reached at 1/D0000138
LOG: database system is ready to accept read-only connections
...
LOG: restored log file "0000000100000002000000B0" from archive
LOG: restored log file "0000000200000002000000B1" from archive
LOG: restored log file "0000000200000002000000B2" from archive
LOG: recovery stopping after WAL location (LSN) "2/B2323178"
LOG: shutdown at recovery target
The standby server should now be stopped, so we can replace the recovery settings with the streaming replication settings in postgresql(.auto).conf:
primary_conninfo = 'user=replic_user host=pg1'
primary_slot_name = 'pg2'
Create the standby signal:
$ sudo -iu postgres mv 16/data/recovery.signal 16/data/standby.signal
Start the PostgreSQL service again and watch the logs:
LOG: database system was shut down in recovery
LOG: entering standby mode
LOG: redo starts at 2/B1F611D0
LOG: consistent recovery state reached at 2/B23231B0
LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 2/B3000000 on timeline 2
Check the streaming replication status using pg_stat_replication and pg_replication_slots on pg1, and pg_stat_wal_receiver on pg2:
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 4955
usesysid         | 16384
usename          | replic_user
application_name | walreceiver
client_addr      | 192.168.121.101
client_hostname  | pg2
client_port      | 35622
backend_start    | 2024-09-03 10:14:48.468542+00
backend_xmin     |
state            | streaming
sent_lsn         | 2/B3000148
write_lsn        | 2/B3000148
flush_lsn        | 2/B3000148
replay_lsn       | 2/B3000148
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2024-09-03 10:15:47.430308+00
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+-----------
slot_name           | pg2
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 4955
xmin                |
catalog_xmin        |
restart_lsn         | 2/B3000148
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         |
postgres=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------
pid                   | 4419
status                | streaming
receive_start_lsn     | 2/B3000000
receive_start_tli     | 2
written_lsn           | 2/B3000148
flushed_lsn           | 2/B3000148
received_tli          | 2
last_msg_send_time    | 2024-09-03 10:16:18.569324+00
last_msg_receipt_time | 2024-09-03 10:16:17.443183+00
latest_end_lsn        | 2/B3000148
latest_end_time       | 2024-09-03 10:14:48.533651+00
slot_name             | pg2
sender_host           | pg1
sender_port           | 5432
conninfo              | ...
And that’s it: the standby server is back online!
Conclusion
The fastest way to re-sync a standby server after PITR is usually to create a new base backup and set up the standby again. Rebuilding from a fresh backup is generally quicker than replaying WAL archives from the point of recovery.
And if you know me, you probably already know that I’d recommend using pgBackRest over pg_basebackup. A pgBackRest --delta restore would save a lot of time when restoring a backup to refresh a standby server.
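For reference, a delta restore of the standby would look roughly like this; the stanza name main is an assumption, and it presumes pgBackRest is already configured on both nodes, which is outside the scope of this post:

$ sudo systemctl stop postgresql-16
# rewrite only the files that differ from the backup, and create standby.signal
$ sudo -iu postgres pgbackrest --stanza=main --delta --type=standby restore
$ sudo systemctl start postgresql-16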