Preamble
We are just starting our PostgreSQL journey but we already have requirements for high availability. This can be achieved by a (simple) operating system cluster that fails over a virtual IP and takes the lead on shared disks on a remote server. In our particular case the disk array is kept up to date (synchronously or not) over a dark fiber link between our data centers.
As with Data Guard for Oracle, there is a pure database solution with PostgreSQL called Write-Ahead Log (WAL) shipping, or physical replication. Like Data Guard, this solution has the benefit of reducing the required bandwidth over the network as well as, for PostgreSQL, making the standby server available for free for read-only queries.
From the official documentation:
Warm and hot standby servers can be kept current by reading a stream of write-ahead log (WAL) records. If the main server fails, the standby contains almost all of the data of the main server, and can be quickly made the new primary database server. This can be synchronous or asynchronous and can only be done for the entire database server.
The overall picture of PostgreSQL physical replication is the following:
My test environment will be made of two virtual machines:
- server3.domain.com (192.168.56.103) will be my primary server
- server4.domain.com (192.168.56.104) will be my standby server
They are both running Oracle Enterprise Linux (OEL) 8.4 and PostgreSQL 14.
In a further post I will probably give Replication Manager for PostgreSQL clusters (repmgr) a try, but at the time of writing the tool is not yet compatible with PostgreSQL 14…
Primary server setup
Installed PostgreSQL directly from the official repository:
[root@server3 ~]# dnf -y install postgresql14-server
Created a dedicated mount point:
[root@server3 /]# ll -d /postgres
drwxr-xr-x 3 postgres postgres 20 Oct 29 12:18 /postgres
[root@server3 ~]# df -h /postgres
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/vg00-lvol30 1014M   40M  975M   4% /postgres
Customized the profile by adding a few shortcuts:
[postgres@server3 ~]$ cat /var/lib/pgsql/.pgsql_profile
PATH=/usr/pgsql-14/bin:$PATH
export PATH
MANPATH=/usr/local/pgsql/share/man:$MANPATH
export MANPATH
PGDATA=/postgres/data01
export PGDATA
alias psql="psql --port=5433"
alias pg_start="cd ${PGDATA};pg_ctl -l logfile start"
alias pg_restart="cd ${PGDATA};pg_ctl -l logfile restart"
alias pg_stop="cd ${PGDATA};pg_ctl -l logfile stop"
Initialized the starter instance with:
[postgres@server3 ~]$ pg_ctl initdb -D /postgres/data01
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /postgres/data01 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... CET
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/pgsql-14/bin/pg_ctl -D /postgres/data01 -l logfile start
Then updated the postgresql.conf parameter file a bit:
[postgres@server3 data01]$ cp postgresql.conf postgresql.conf.org
[postgres@server3 data01]$ egrep -v "(^\s*#|^#)" postgresql.conf.org | grep . > postgresql.conf
[postgres@server3 data01]$ vi postgresql.conf
[postgres@server3 data01]$ cat postgresql.conf
listen_addresses = 'server3.domain.com'
port = 5433
archive_mode = on
archive_command = 'test ! -f /postgres/arch/%f && cp %p /postgres/arch/%f'
max_connections = 100                   # (change requires restart)
shared_buffers = 128MB                  # min 128kB
dynamic_shared_memory_type = posix      # the default is the first option
max_wal_size = 1GB
min_wal_size = 80MB
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
log_directory = 'log'                   # directory where log files are written,
log_filename = 'postgresql-%a.log'      # log file name pattern,
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_truncate_on_rotation = on           # If on, an existing log file with the
log_line_prefix = '%m [%p] '            # special values:
log_timezone = 'CET'
datestyle = 'iso, mdy'
timezone = 'CET'
lc_messages = 'en_US.UTF-8'             # locale for system error message
lc_monetary = 'en_US.UTF-8'             # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'              # locale for number formatting
lc_time = 'en_US.UTF-8'                 # locale for time formatting
default_text_search_config = 'pg_catalog.english'
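Note that the archive_command above writes to /postgres/arch, so this directory must exist and be writable by the postgres user before archiving starts; as a quick sketch (same path as used in this post):

[postgres@server3 ~]$ mkdir -p /postgres/arch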
For administrative tools I created my own personal account:
postgres=# create role yjaquier with superuser login password 'secure_password';
CREATE ROLE
Added the following to pg_hba.conf to allow me to connect from anywhere:
host all yjaquier 0.0.0.0/0 md5
It is strongly recommended to create a dedicated replication user with:
postgres=# create role repuser with replication login password 'secure_password';
CREATE ROLE
Added the following to pg_hba.conf to allow replication-related connections from the standby node:
host replication all 192.168.56.0/24 md5
Remark:
You can further tighten security by using the exact IP address of your standby server with a /32 network mask, for example 192.168.56.104/32. You can also restrict the entry to the repuser account we have created for replication purposes, as sketched below…
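Under that stricter assumption, the pg_hba.conf entry could look like:

host replication repuser 192.168.56.104/32 md5

Keep in mind that pg_hba.conf changes only take effect after a configuration reload, for example (PGDATA being set in the profile above):

[postgres@server3 ~]$ pg_ctl reload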
Standby server setup for physical replication
Your standby server must be created from a copy of your primary instance or you will get this error message:
2021-11-01 11:52:07.697 CET [7416] FATAL: database system identifier differs between the primary and standby
2021-11-01 11:52:07.697 CET [7416] DETAIL: The primary's identifier is 7024430574144521833, the standby's identifier is 7024465441503407932.
I will use the standard PostgreSQL backup tool (https://blog.yannickjaquier.com/postgresql/postgresql-backup-and-restore-tools-comparison-for-pitr-recovery.html#pg_basebackup) called pg_basebackup. Full instance backup:
[postgres@server3 ~]$ pg_basebackup --pgdata=/postgres/backup/full_backup --format=t --compress=9 --progress --verbose --port=5433 --username=postgres
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_12536"
34891/34891 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/5000138
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
To simulate users’ activity on my primary database, after performing the full backup I create a new database and a test table with one row:
postgres=# create database testdb;
CREATE DATABASE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create table test01(id integer, descr varchar(20));
CREATE TABLE
testdb=# insert into test01 values(1,'One');
INSERT 0 1
Copy the backup files to your standby server (using scp):
[postgres@server3 data01]$ scp -rp /postgres/backup/full_backup server4.domain.com:/postgres/backup
postgres@server4.domain.com's password:
base.tar.gz                                   100% 4294KB  46.1MB/s   00:00
pg_wal.tar.gz                                 100%   17KB   9.4MB/s   00:00
backup_manifest
On the standby server restore your full backup in an empty PGDATA directory with something like:
[postgres@server4 data01]$ cd ${PGDATA}
[postgres@server4 data01]$ tar xf /postgres/backup/full_backup/base.tar.gz
[postgres@server4 data01]$ tar xf /postgres/backup/full_backup/pg_wal.tar.gz --directory pg_wal
From there you have two options to set up WAL shipping.
Without streaming replication
This implies that your standby server has access to the WAL archive directory of your primary server (NFS or similar), as explained in the official documentation:
The archive location should be accessible from the standby even when the primary is down, i.e., it should reside on the standby server itself or another trusted server, not on the primary server.
Or that you have a process to copy your primary's WAL files to the standby server. To test this (sub-optimal) method I have set the following in my standby postgresql.conf file (listen_addresses also modified to match my standby server name):
restore_command = 'cp /postgres/arch/%f %p'
Create the important standby.signal file:
[postgres@server4 data01]$ touch standby.signal
Start the standby PostgreSQL instance:
[postgres@server4 data01]$ pg_ctl -l logfile start
One cool feature of PostgreSQL is that your standby server is accessible in read-only mode by default (hot_standby = on by default). So for Oracle database folks you get Active Data Guard for free. If you connect to the standby instance you see everything is there:
postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
(4 rows)

postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# \dt
        List of relations
 Schema |  Name  | Type  |  Owner
--------+--------+-------+----------
 public | test01 | table | postgres
(1 row)

testdb=# select * from test01;
 id | descr
----+-------
  1 | One
(1 row)
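As a side note, you can confirm from a psql session that the instance you are connected to is indeed a hot standby running in recovery; the two standard statements below should return on and t respectively on the standby:

postgres=# show hot_standby;
postgres=# select pg_is_in_recovery();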
If you insert new rows in the test01 test table and copy the generated archive files to the standby (I have set archive_timeout = 1min on my primary server to have the archive command triggered more often; never do this on a production server), or alternatively copy (scp) the latest generated WAL files from the pg_wal directory of your primary to your standby server, your standby server will process them:
2021-11-05 17:33:09.883 CET [84126] LOG: restored log file "000000010000000000000011" from archive
2021-11-05 17:33:24.901 CET [84126] LOG: restored log file "000000010000000000000012" from archive
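For illustration, a minimal way to generate and ship such a file by hand could be to force a WAL switch on the primary and scp the archived segment to the standby's local archive directory (paths and segment name pattern taken from this post's setup):

[postgres@server3 ~]$ psql -c "select pg_switch_wal();"
[postgres@server3 ~]$ scp /postgres/arch/00000001* server4.domain.com:/postgres/arch/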
This is because PostgreSQL log-shipping replication works like this:
At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_wal directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_wal. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_wal, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file.
But of course there is a much more elegant solution with streaming replication…
With streaming replication
By far the best method. Update postgresql.conf: change listen_addresses to match the standby server name, add the same restore_command, and add the primary_conninfo parameter (instead of supplying the password in clear text in the postgresql.conf file you can also use a ~/.pgpass file):
restore_command = 'cp /postgres/arch/%f %p'
primary_conninfo = 'host=server3.domain.com port=5433 user=repuser password=secure_password options=''-c wal_sender_timeout=5000'''
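As a sketch of the ~/.pgpass alternative mentioned above: the file lives in the postgres home directory on the standby, must be chmod 600, and uses the keyword replication in the database field for streaming replication connections; the password entry can then be dropped from primary_conninfo:

[postgres@server4 ~]$ cat ~/.pgpass
server3.domain.com:5433:replication:repuser:secure_password
[postgres@server4 ~]$ chmod 600 ~/.pgpass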
Create the important standby.signal file:
[postgres@server4 data01]$ touch standby.signal
Start the standby PostgreSQL instance:
[postgres@server4 data01]$ pg_ctl -l logfile start
And magically the standby server recovers what has been done on the live server since your full backup. If you check the log you see output like:
2021-11-01 14:29:33.568 CET [17242] LOG: starting PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-11-01 14:29:33.568 CET [17242] LOG: listening on IPv4 address "192.168.56.104", port 5433
2021-11-01 14:29:33.570 CET [17242] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2021-11-01 14:29:33.571 CET [17242] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-11-01 14:29:33.575 CET [17244] LOG: database system was interrupted; last known up at 2021-11-01 12:25:59 CET
2021-11-01 14:29:33.835 CET [17244] LOG: entering standby mode
2021-11-01 14:29:33.838 CET [17244] LOG: redo starts at 0/5000028
2021-11-01 14:29:33.841 CET [17244] LOG: consistent recovery state reached at 0/5000138
2021-11-01 14:29:33.841 CET [17242] LOG: database system is ready to accept read-only connections
2021-11-01 14:29:33.885 CET [17248] LOG: started streaming WAL from primary at 0/6000000 on timeline 1
If I insert a new row in my primary database:
testdb=# insert into test01 values(2,'Two');
INSERT 0 1
I immediately see it in the hot standby node without any file copy this time:
testdb=# select * from test01;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)
Replication slots for physical replication
As the documentation says:
Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys. In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_size, or by storing the segments in an archive using archive_command.
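If you went for the wal_keep_size alternative instead of a slot, it is just a parameter in the primary's postgresql.conf; the value below is an arbitrary example, to be sized according to your WAL generation rate and the longest standby outage you want to survive:

wal_keep_size = 1GB

In this post I stick with a replication slot.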
Create a replication slot with:
postgres=# SELECT * FROM pg_create_physical_replication_slot('server3_slot');
  slot_name   | lsn
--------------+-----
 server3_slot |
(1 row)

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | f
(1 row)
Add it to the postgresql.conf file of your standby instance:
primary_slot_name = 'server3_slot'
Then on the primary the active column has changed:
postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | t
(1 row)
To drop the replication slot, either stop the standby (the clean way) or unset the primary_slot_name parameter on your standby instance, then:
postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | f
(1 row)

postgres=# select pg_drop_replication_slot('server3_slot');
 pg_drop_replication_slot
--------------------------

(1 row)
Promoting a standby node
Reading the official documentation I have not seen any mention of switchover (switching roles between primary and standby while both are still up). It only mentions failover (the primary is dead). Let’s simulate this by killing my primary server:
[postgres@server3 data01]$ ps -ef | grep pgsql-14 | grep -v grep
postgres  4485     1  0 10:37 ?        00:00:00 /usr/pgsql-14/bin/postgres
[postgres@server3 data01]$ kill -9 4485
Something must be done on the standby server as it is still in read-only mode:
testdb=# insert into test01 values(10,'Ten');
ERROR:  cannot execute INSERT in a read-only transaction
You can either run pg_ctl promote or execute pg_promote(); let’s try with pg_ctl promote:
[postgres@server4 data01]$ pg_ctl promote
waiting for server to promote.... done
server promoted
And that’s it (standby.signal file has been deleted from ${PGDATA} directory):
testdb=# insert into test01 values(10,'Ten');
INSERT 0 1
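For reference, the SQL-level alternative mentioned above is the pg_promote() function, to be called from a psql session on the standby (not used here, just shown as a sketch):

postgres=# select pg_promote();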
In this situation the standby is your new primary and the previous primary will have to be re-instantiated to become a standby. To help re-synchronize a big PostgreSQL cluster you might want to study the pg_rewind tool…
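A pg_rewind invocation to re-synchronize the old primary from the new one could look like the sketch below, run on the old primary with its instance stopped (connection values are the ones of this post; also note that pg_rewind requires wal_log_hints = on or data checksums, which were not enabled during our initdb):

[postgres@server3 ~]$ pg_rewind --target-pgdata=/postgres/data01 --source-server='host=server4.domain.com port=5433 user=postgres' --progress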
Physical replication monitoring
On the primary instance one very useful view is pg_stat_replication; we can see:
postgres=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 17763 |    16386 | repuser | walreceiver      | 192.168.56.104 |                 |       54096 | 2021-11-01 14:29:33.487621+01 |              | streaming | 0/601A308 | 0/601A308 | 0/601A308 | 0/601A308  |           |           |            |             0 | async      | 2021-11-01 14:59:02.010878+01
(1 row)
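From the same view you can also express the lag in bytes with pg_wal_lsn_diff(), for example:

postgres=# select application_name, client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes from pg_stat_replication;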
On the standby instance you can compute the apply lag with a query like:
postgres=# select last_msg_receipt_time, now(), now()-last_msg_receipt_time delay, to_char(current_timestamp-last_msg_receipt_time,'hh24:mi:ss:ms') delay from pg_stat_wal_receiver;
    last_msg_receipt_time     |              now              |      delay      |    delay
------------------------------+-------------------------------+-----------------+--------------
 2021-11-01 15:47:26.11948+01 | 2021-11-01 15:47:27.362229+01 | 00:00:01.242749 | 00:00:01:242
(1 row)
All-in-one pg_basebackup remote command
The de facto PostgreSQL backup tool (pg_basebackup) provides an all-in-one (complex) command to duplicate the primary instance to your standby server and configure the log-shipping replication at the same time (think RMAN duplicate for Oracle DBAs). From my standby node, in an empty PGDATA directory, issue something like:
[postgres@server4 data01]$ pg_basebackup --host=server3.domain.com --username=repuser --port=5433 --pgdata=/postgres/data01/ --format=p --wal-method=stream --progress --write-recovery-conf --create-slot --slot=server3_slot
Password:
34921/34921 kB (100%), 1/1 tablespace
The important options are:
- --format=p Backup in plain (non-tar) format
- --wal-method=stream Include the required WAL by streaming it during the backup
- --write-recovery-conf Automatically generate the replication configuration
- --create-slot Create the replication slot automatically
- --slot=server3_slot Provide the name of the replication slot
Checking my target /postgres/data01 directory, everything is there, even the standby.signal file:
[postgres@server4 data01]$ ll
total 252
-rw------- 1 postgres postgres    225 Nov  1 17:55 backup_label
-rw------- 1 postgres postgres 181149 Nov  1 17:55 backup_manifest
drwx------ 6 postgres postgres     54 Nov  1 17:55 base
-rw------- 1 postgres postgres     30 Nov  1 17:55 current_logfiles
drwx------ 2 postgres postgres   4096 Nov  1 17:55 global
drwx------ 2 postgres postgres     58 Nov  1 17:55 log
-rw------- 1 postgres postgres   1315 Nov  1 17:55 logfile
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_commit_ts
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_dynshmem
-rw------- 1 postgres postgres   4925 Nov  1 17:55 pg_hba.conf
-rw------- 1 postgres postgres   1636 Nov  1 17:55 pg_ident.conf
drwx------ 4 postgres postgres     68 Nov  1 17:55 pg_logical
drwx------ 4 postgres postgres     36 Nov  1 17:55 pg_multixact
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_notify
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_replslot
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_serial
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_snapshots
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_stat
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_stat_tmp
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_subtrans
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_tblspc
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_twophase
-rw------- 1 postgres postgres      3 Nov  1 17:55 PG_VERSION
drwx------ 3 postgres postgres     60 Nov  1 17:55 pg_wal
drwx------ 2 postgres postgres     18 Nov  1 17:55 pg_xact
-rw------- 1 postgres postgres    376 Nov  1 17:55 postgresql.auto.conf
-rw------- 1 postgres postgres   1190 Nov  1 17:55 postgresql.conf
-rw------- 1 postgres postgres  28750 Nov  1 17:55 postgresql.conf.org
-rw------- 1 postgres postgres      0 Nov  1 17:55 standby.signal
The replication configuration is in the postgresql.auto.conf file. The replication slot and so on are already there, and when I inserted a new row in my primary test table it was replicated with no issue:
[postgres@server4 data01]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repuser password=secure_password channel_binding=prefer host=server3.domain.com port=5433 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'server3_slot'
I just had to change the listen_addresses parameter to match my standby server name; all other tasks were done by pg_basebackup… Almost magical for a small server where you don’t have lots of data to transfer over the network.
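For completeness, that change is a single line in the standby's postgresql.conf (server name as used in this post):

listen_addresses = 'server4.domain.com'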
Ritesh Ramesh says:
What happens to schema changes, as PG doesn’t natively replicate schema, just data? Also, is there a way to deploy schema changes to all servers as part of our CI/CD without making the secondary servers primary? Any other strategy to deal with schema changes would be appreciated.
Yannick Jaquier says:
Have you tested it? Unless I’m completely wrong, DDL statements are stored in the WAL and are therefore replicated in the case of Write-Ahead Log shipping replication… Please give feedback once tested…
Damion says:
I was under the impression it’s only Logical Replication that does not propagate certain DDL/DML changes made on the PostgreSQL Primary to the PostgreSQL Standbys…
i.e. Logical Replication does not support commands like “ALTER TABLE RENAME COLUMN”, “CREATE INDEX”, “DROP INDEX” and others (I’m not sure of the exact others though).