PostgreSQL physical replication hands-on

Preamble

We are just starting our PostgreSQL journey but we already have high availability requirements. This can be achieved by a (simple) operating system cluster that balances a virtual IP and the ownership of shared disks on a remote server. In our particular case the disk array is kept up-to-date (synchronously or not) over a dark fiber link between our data centers.

As with Data Guard for Oracle, there is a pure database solution with PostgreSQL called Write-Ahead Log (WAL) shipping, or physical replication. As with Data Guard, this solution has the benefit of reducing the required network bandwidth and, with PostgreSQL, the standby server is available for free for read-only queries.

From official documentation:

Warm and hot standby servers can be kept current by reading a stream of write-ahead log (WAL) records. If the main server fails, the standby contains almost all of the data of the main server, and can be quickly made the new primary database server. This can be synchronous or asynchronous and can only be done for the entire database server.

The overall picture of PostgreSQL physical replication is the following:

[Figure postgresql_physical_replication01: overview of PostgreSQL physical replication]

My test environment will be made of two virtual machines:

  • server3.domain.com (192.168.56.103) will be my primary server
  • server4.domain.com (192.168.56.104) will be my standby server

They are both running Oracle Enterprise Linux (OEL) 8.4 and PostgreSQL 14.

In a future post I will probably give Replication Manager for PostgreSQL clusters (repmgr) a try, but at the time of writing this post the tool is not yet compatible with PostgreSQL 14…

Primary server setup

Installed PostgreSQL directly from the official repository:

[root@server3 ~]# dnf -y install postgresql14-server

Created a dedicated mount point:

[root@server3 /]# ll -d /postgres
drwxr-xr-x 3 postgres postgres 20 Oct 29 12:18 /postgres
[root@server3 ~]# df -h /postgres
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/vg00-lvol30 1014M   40M  975M   4% /postgres

Customized the profile by adding a few shortcuts:

[postgres@server3 ~]$ cat /var/lib/pgsql/.pgsql_profile
PATH=/usr/pgsql-14/bin:$PATH
export PATH
MANPATH=/usr/local/pgsql/share/man:$MANPATH
export MANPATH
PGDATA=/postgres/data01
export PGDATA
alias psql="psql --port=5433"
alias pg_start="cd ${PGDATA};pg_ctl -l logfile start"
alias pg_restart="cd ${PGDATA};pg_ctl -l logfile restart"
alias pg_stop="cd ${PGDATA};pg_ctl -l logfile stop"

Initialized the starter instance with:

[postgres@server3 ~]$ pg_ctl initdb -D /postgres/data01
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
 
The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
 
Data page checksums are disabled.
 
creating directory /postgres/data01 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... CET
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
 
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
 
Success. You can now start the database server using:
 
    /usr/pgsql-14/bin/pg_ctl -D /postgres/data01 -l logfile start

Then updated the postgresql.conf parameter file a bit:

[postgres@server3 data01]$ cp postgresql.conf postgresql.conf.org
[postgres@server3 data01]$ egrep -v "(^\s*#|^#)" postgresql.conf.org | grep . > postgresql.conf
[postgres@server3 data01]$ vi postgresql.conf
[postgres@server3 data01]$ cat postgresql.conf
listen_addresses = 'server3.domain.com'
port = 5433
archive_mode = on
archive_command = 'test ! -f /postgres/arch/%f && cp %p /postgres/arch/%f'
max_connections = 100                   # (change requires restart)
shared_buffers = 128MB                  # min 128kB
dynamic_shared_memory_type = posix      # the default is the first option
max_wal_size = 1GB
min_wal_size = 80MB
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
log_directory = 'log'                   # directory where log files are written,
log_filename = 'postgresql-%a.log'      # log file name pattern,
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_truncate_on_rotation = on           # If on, an existing log file with the
log_line_prefix = '%m [%p] '            # special values:
log_timezone = 'CET'
datestyle = 'iso, mdy'
timezone = 'CET'
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
default_text_search_config = 'pg_catalog.english'
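
Remark:
The archive_command above assumes the /postgres/arch directory already exists and is writable by the postgres account, so as a minimal preparation before starting the instance (pg_start being the alias defined in the profile above):

[postgres@server3 ~]$ mkdir -p /postgres/arch
[postgres@server3 ~]$ pg_start
waiting for server to start.... done
server started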

For administrative tasks I created my own personal account:

postgres=# create role yjaquier with superuser login password 'secure_password';
CREATE ROLE

Added in pg_hba.conf to allow me to connect from anywhere:

host    all             yjaquier             0.0.0.0/0            md5

It is strongly recommended to create a dedicated replication user with:

postgres=# create role repuser with replication login password 'secure_password';
CREATE ROLE

Added in pg_hba.conf to allow replication-related connections from the standby node:

host    replication     all             192.168.56.0/24            md5

Remark:
You can further increase security by putting the exact IP address of your standby server with a /32 network mask, for example 192.168.56.104/32. Instead of all you can also restrict the line to the repuser account we have created for replication purposes…
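
Note that pg_hba.conf changes are not picked up automatically; after editing the file you can reload the configuration without restarting the instance, either from the shell:

[postgres@server3 ~]$ pg_ctl reload
server signaled

Or from a psql session:

postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)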

Standby server setup for physical replication

Your standby server must be created from a copy of your primary instance or you will get this error message:

2021-11-01 11:52:07.697 CET [7416] FATAL:  database system identifier differs between the primary and standby
2021-11-01 11:52:07.697 CET [7416] DETAIL:  The primary's identifier is 7024430574144521833, the standby's identifier is 7024465441503407932.

I will use the default PostgreSQL backup tool (https://blog.yannickjaquier.com/postgresql/postgresql-backup-and-restore-tools-comparison-for-pitr-recovery.html#pg_basebackup) called pg_basebackup. Full instance backup:

[postgres@server3 ~]$ pg_basebackup --pgdata=/postgres/backup/full_backup --format=t --compress=9 --progress --verbose --port=5433 --username=postgres
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_12536"
34891/34891 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/5000138
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

To simulate user activity on my primary database, after performing the full backup I created a new database and a test table with one row:

postgres=# create database testdb;
CREATE DATABASE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create table test01(id integer, descr varchar(20));
CREATE TABLE
testdb=# insert into test01 values(1,'One');
INSERT 0 1

Copy the backup files to your standby server (using scp):

[postgres@server3 data01]$ scp -rp /postgres/backup/full_backup server4.domain.com:/postgres/backup
postgres@server4.domain.com's password:
base.tar.gz                                                                 100% 4294KB  46.1MB/s   00:00
pg_wal.tar.gz                                                               100%   17KB   9.4MB/s   00:00
backup_manifest

On the standby server restore your full backup in an empty PGDATA directory with something like:

[postgres@server4 data01]$ cd ${PGDATA}
[postgres@server4 data01]$ tar xf /postgres/backup/full_backup/base.tar.gz
[postgres@server4 data01]$ tar xf /postgres/backup/full_backup/pg_wal.tar.gz --directory pg_wal
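
PostgreSQL refuses to start if the data directory permissions are too permissive, so after the extraction it is worth double checking that the directory is still owned by postgres with 0700 permissions:

[postgres@server4 data01]$ chmod 700 /postgres/data01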

You then have two options to set up WAL shipping.

Without streaming replication

This option implies that your standby server has access to the WAL archive directory of your primary server (NFS or whatever), as explained in the official documentation:

The archive location should be accessible from the standby even when the primary is down, i.e., it should reside on the standby server itself or another trusted server, not on the primary server.

Or that you have a process to copy your primary's WAL files to the standby server. To test this (sub-optimal) method I have set the following in my standby postgresql.conf file (listen_addresses also modified to match my standby server name):

restore_command = 'cp /postgres/arch/%f %p'

Create the important standby.signal file:

[postgres@server4 data01]$ touch standby.signal

Start the standby PostgreSQL instance:

[postgres@server4 data01]$ pg_ctl -l logfile start

One cool feature of PostgreSQL is that your standby server is accessible in read-only mode by default (hot_standby = on by default). So for Oracle database guys you get Active Data Guard for free. If you connect to the standby instance you can see everything is there:

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
(4 rows)
 
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# \dt
         List of relations
 Schema |  Name  | Type  |  Owner
--------+--------+-------+----------
 public | test01 | table | postgres
(1 row)
 
testdb=# select * from test01;
 id | descr
----+-------
  1 | One
(1 row)
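
You can also confirm that the instance is really running as a standby (and not as a promoted primary) with the pg_is_in_recovery() function:

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)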

If you insert new rows in the test01 test table and copy the generated archive files to the standby archive directory, the standby will process them (I have set archive_timeout = 1min on my primary server to have the archive command triggered more often, never do this on a production server). Alternatively you can also copy (scp) the latest generated WAL files from the pg_wal directory of your primary to your standby server. Either way you will see the standby apply them:

2021-11-05 17:33:09.883 CET [84126] LOG:  restored log file "000000010000000000000011" from archive
2021-11-05 17:33:24.901 CET [84126] LOG:  restored log file "000000010000000000000012" from archive

This is because PostgreSQL log-shipping replication works like this (from the official documentation):

At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_wal directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_wal. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_wal, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file.
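
Rather than waiting for archive_timeout to expire you can also force the primary to close and archive its current WAL segment on demand with pg_switch_wal() (the returned LSN will obviously differ on your system):

postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/13000110
(1 row)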

But of course there is a much more elegant solution with streaming replication…

With streaming replication

By far the best method. In the standby postgresql.conf, change listen_addresses to match the standby server name, add the same restore_command and add the primary_conninfo parameter (instead of supplying the password in clear text in postgresql.conf you can also use a ~/.pgpass file):

restore_command = 'cp /postgres/arch/%f %p'
primary_conninfo = 'host=server3.domain.com port=5433 user=repuser password=secure_password options=''-c wal_sender_timeout=5000'''
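
If you prefer the ~/.pgpass option you can remove the password from primary_conninfo and create, on the standby, a file like the one below (the special replication database name matches streaming replication connections, and the file must be protected with 0600 permissions or it is ignored):

[postgres@server4 ~]$ cat ~/.pgpass
server3.domain.com:5433:replication:repuser:secure_password
[postgres@server4 ~]$ chmod 600 ~/.pgpass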

Create the important standby.signal file:

[postgres@server4 data01]$ touch standby.signal

Start the standby PostgreSQL instance:

[postgres@server4 data01]$ pg_ctl -l logfile start

And magically the standby server recovers everything that has been done on the live server since your full backup. If you check the log you see output like:

2021-11-01 14:29:33.568 CET [17242] LOG:  starting PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-11-01 14:29:33.568 CET [17242] LOG:  listening on IPv4 address "192.168.56.104", port 5433
2021-11-01 14:29:33.570 CET [17242] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2021-11-01 14:29:33.571 CET [17242] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-11-01 14:29:33.575 CET [17244] LOG:  database system was interrupted; last known up at 2021-11-01 12:25:59 CET
2021-11-01 14:29:33.835 CET [17244] LOG:  entering standby mode
2021-11-01 14:29:33.838 CET [17244] LOG:  redo starts at 0/5000028
2021-11-01 14:29:33.841 CET [17244] LOG:  consistent recovery state reached at 0/5000138
2021-11-01 14:29:33.841 CET [17242] LOG:  database system is ready to accept read-only connections
2021-11-01 14:29:33.885 CET [17248] LOG:  started streaming WAL from primary at 0/6000000 on timeline 1
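
On the standby you can also confirm that streaming is active by querying the pg_stat_wal_receiver view:

postgres=# select status, sender_host, sender_port from pg_stat_wal_receiver;
  status   |    sender_host     | sender_port
-----------+--------------------+-------------
 streaming | server3.domain.com |        5433
(1 row)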

If I insert a new row in my primary database:

testdb=# insert into test01 values(2,'Two');
INSERT 0 1

I immediately see it in the hot standby node without any file copy this time:

testdb=# select * from test01;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

Replication slots for physical replication

As the documentation says:

Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys. In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_size, or by storing the segments in an archive using archive_command.
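
For reference, the wal_keep_size alternative mentioned above is just a parameter in the primary postgresql.conf file (512MB being an arbitrary figure, size it according to the longest standby outage you want to survive):

wal_keep_size = 512MB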

Create a replication slot on the primary with:

postgres=# SELECT * FROM pg_create_physical_replication_slot('server3_slot');
  slot_name   | lsn
--------------+-----
 server3_slot |
(1 row)
 
postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | f
(1 row)

Add it in the postgresql.conf file of your standby instance:

primary_slot_name = 'server3_slot'

Then, after reloading the standby configuration, the active column changed on the primary:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | t
(1 row)
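
Keep in mind that a replication slot retains WAL forever if its standby goes away, so it is worth monitoring how much WAL each slot is holding back, for example with pg_wal_lsn_diff() (result in bytes):

postgres=# select slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) as retained_bytes from pg_replication_slots;
  slot_name   | retained_bytes
--------------+----------------
 server3_slot |              0
(1 row)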

To drop the replication slot, either stop the standby (clean shutdown) or unset the primary_slot_name parameter on your standby instance, then:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | f
(1 row)
 
postgres=# select pg_drop_replication_slot('server3_slot');
 pg_drop_replication_slot
--------------------------
 
(1 row)

Promoting a standby node

Reading the official documentation I have not seen any mention of switchover (swapping the primary and standby roles while both are still up). It only mentions failover (the primary is dead). Let’s simulate this by killing my primary server:

[postgres@server3 data01]$ ps -ef | grep pgsql-14 | grep -v grep
postgres    4485       1  0 10:37 ?        00:00:00 /usr/pgsql-14/bin/postgres
[postgres@server3 data01]$ kill -9 4485

Something must be done on the standby server as it is still in read only mode:

testdb=# insert into test01 values(10,'Ten');
ERROR:  cannot execute INSERT in a read-only transaction

You can either run pg_ctl promote or execute the pg_promote() function; let’s try with pg_ctl promote:

[postgres@server4 data01]$ pg_ctl promote
waiting for server to promote.... done
server promoted

And that’s it (the standby.signal file has been deleted from the ${PGDATA} directory):

testdb=# insert into test01 values(10,'Ten');
INSERT 0 1

In this situation the standby is your new primary and the previous primary will have to be re-instantiated to become a standby. To help re-synchronize a big PostgreSQL cluster you might want to study the pg_rewind tool…
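
As a rough sketch, and assuming the old primary was shut down cleanly and had wal_log_hints = on (or data checksums enabled, which is not the case with the default initdb above), re-synchronizing server3 from the new primary with pg_rewind could look like:

[postgres@server3 data01]$ pg_rewind --target-pgdata=/postgres/data01 --source-server="host=server4.domain.com port=5433 user=postgres" --progress

Then recreate the standby.signal file and point primary_conninfo to server4 before restarting server3 as a standby.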

Physical replication monitoring

On the primary instance one very useful view is pg_stat_replication:

postgres=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 17763 |    16386 | repuser | walreceiver      | 192.168.56.104 |                 |       54096 | 2021-11-01 14:29:33.487621+01 |              | streaming | 0/601A308 | 0/601A308 | 0/601A308 | 0/601A308  |           |           |            |             0 | async      | 2021-11-01 14:59:02.010878+01
(1 row)

On the standby instance you can compute the apply lag with a query like:

postgres=# select last_msg_receipt_time, now(), now()-last_msg_receipt_time delay,to_char(current_timestamp-last_msg_receipt_time,'hh24:mi:ss:ms') delay from pg_stat_wal_receiver;
    last_msg_receipt_time     |              now              |      delay      |    delay
------------------------------+-------------------------------+-----------------+--------------
 2021-11-01 15:47:26.11948+01 | 2021-11-01 15:47:27.362229+01 | 00:00:01.242749 | 00:00:01:242
(1 row)
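
The same kind of computation can be done on the primary side, in bytes this time, by comparing the current WAL write position with what the standby has already replayed:

postgres=# select application_name, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes from pg_stat_replication;
 application_name | replay_lag_bytes
------------------+------------------
 walreceiver      |                0
(1 row)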

All in one pg_basebackup remote command

The de facto PostgreSQL backup tool (pg_basebackup) provides an all-in-one (complex) command to duplicate the primary instance to your standby server and configure the log-shipping replication at the same time (think RMAN duplicate for Oracle DBAs). From my standby node, in an empty PGDATA directory, issue something like:

[postgres@server4 data01]$ pg_basebackup --host=server3.domain.com --username=repuser --port=5433 --pgdata=/postgres/data01/ --format=p --wal-method=stream --progress --write-recovery-conf --create-slot --slot=server3_slot
Password:
34921/34921 kB (100%), 1/1 tablespace

The key options are:

  • --format=p Backup in plain format
  • --wal-method=stream Include the WAL files via streaming
  • --write-recovery-conf Automatically generate the replication configuration
  • --create-slot Create the replication slot automatically
  • --slot=server3_slot Name of the replication slot to create

Checking my target /postgres/data01 directory, everything is there, even the standby.signal file:

[postgres@server4 data01]$ ll
total 252
-rw------- 1 postgres postgres    225 Nov  1 17:55 backup_label
-rw------- 1 postgres postgres 181149 Nov  1 17:55 backup_manifest
drwx------ 6 postgres postgres     54 Nov  1 17:55 base
-rw------- 1 postgres postgres     30 Nov  1 17:55 current_logfiles
drwx------ 2 postgres postgres   4096 Nov  1 17:55 global
drwx------ 2 postgres postgres     58 Nov  1 17:55 log
-rw------- 1 postgres postgres   1315 Nov  1 17:55 logfile
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_commit_ts
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_dynshmem
-rw------- 1 postgres postgres   4925 Nov  1 17:55 pg_hba.conf
-rw------- 1 postgres postgres   1636 Nov  1 17:55 pg_ident.conf
drwx------ 4 postgres postgres     68 Nov  1 17:55 pg_logical
drwx------ 4 postgres postgres     36 Nov  1 17:55 pg_multixact
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_notify
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_replslot
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_serial
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_snapshots
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_stat
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_stat_tmp
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_subtrans
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_tblspc
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_twophase
-rw------- 1 postgres postgres      3 Nov  1 17:55 PG_VERSION
drwx------ 3 postgres postgres     60 Nov  1 17:55 pg_wal
drwx------ 2 postgres postgres     18 Nov  1 17:55 pg_xact
-rw------- 1 postgres postgres    376 Nov  1 17:55 postgresql.auto.conf
-rw------- 1 postgres postgres   1190 Nov  1 17:55 postgresql.conf
-rw------- 1 postgres postgres  28750 Nov  1 17:55 postgresql.conf.org
-rw------- 1 postgres postgres      0 Nov  1 17:55 standby.signal

The replication configuration is in the postgresql.auto.conf file. The replication slot and so on are already configured, and when I inserted a new row in my primary test table it was replicated with no issue:

[postgres@server4 data01]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repuser password=secure_password channel_binding=prefer host=server3.domain.com port=5433 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'server3_slot'

I just had to change the listen_addresses parameter to match my standby server name; all other tasks were done by pg_basebackup… Almost magical for a small server where you don’t have lots of data to transfer over the network.
