Table of contents
PostgreSQL backup and restore
When you are a DBA the most important part, in my opinion, of your job is backup and restore. You can fail on any other parts of your job but not if you are not able to restore and recover a database after a disaster then you will be the only one to blame (not to say fire).
Obviously being able to recover a disaster scenario is your number one priority ! Those disasters could be hardware (power failure, Datacenter issues, ..) or software (accidently dropped a table, corruption, file deleted, …). I have also written restore because you have countless stories on Internet of DBAs that have not been able to restore their precious backup often by lack of testing.
Restore and recover must be done within the agreed Recovery Time Objective (RTO) and within the agreed (most probably as small as you can) Recovery Point Objective (RPO). The recovery is ideally as close as possible to the time of the problem (just before an important table drop for example) and are called Point In Time Recovery (PITR).
In this blog post I plan to test few famous PostgreSQL 13 backup tools and check if they are usable in a production environment…
My test PostgreSQL instance is release 13.3 running on RedHat 7.8 (Maipo). I will add new tool in future if I find interesting candidates…
My personal preference so far:
- PgBackRest
- Pg_basebackup
- Barman
- EDB Backup and Recovery Tool (BART)
Pg_basebackup
Pg_basebackup is the default tool coming directly with your PostgreSQL installation. We cannot expect miracle from it, but if it does a decent job why would we need more ?
I set the obvious below parameter to activate WAL and WAL archiving of my instance. As you can see I have decided to put my backup in /postgres/backup and archive WAL in /postgres/backup/archivedir:
archive_mode = on
archive_command = 'test ! -f /postgres/backup/archivedir/%f && cp %p /postgres/backup/archivedir/%f' |
I create my own database and my traditional test table. Idea is same as usual, insert one row, perform a backup, insert a second row and try to recover till last transaction:
postgres=# create database yjaquierdb; CREATE DATABASE postgres=# \c yjaquierdb You are now connected to database "yjaquierdb" as user "postgres". yjaquierdb=# create table test(id int,descr varchar(50)); CREATE TABLE yjaquierdb=# insert into test values(1,'One'); INSERT 0 1 |
I have only one, current, WAL file not yet archived obviously:
[postgres@server data]$ ll pg_wal total 16384 -rw------- 1 postgres postgres 16777216 Jul 20 12:17 000000010000000000000001 drwx------ 2 postgres postgres 96 Jul 20 12:16 archive_status [root@server backup]# ll /postgres/backup/archivedir/ |
I perform a backup:
[postgres@server data]$ mkdir -p /postgres/backup/full_20jul2021 [postgres@server data]$ pg_basebackup --pgdata=/postgres/backup/full_20jul2021 --format=t --compress=9 --progress --verbose --host=localhost --port=5433 --username=postgres pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/3000028 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: created temporary replication slot "pg_basebackup_19427" 24989/24989 kB (100%), 1/1 tablespace pg_basebackup: write-ahead log end point: 0/3000100 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed [postgres@server data]$ ll /postgres/backup/full_20jul2021 total 3124 -rw------- 1 postgres postgres 135690 Jul 16 17:38 backup_manifest -rw------- 1 postgres postgres 3044276 Jul 16 17:38 base.tar.gz -rw------- 1 postgres postgres 17643 Jul 16 17:38 pg_wal.tar.gz |
We can see that 2 WAL files have been archived and 000000010000000000000003 is current WAL file:
[postgres@server data]$ ll /postgres/backup/archivedir/ total 65537 -rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup [postgres@server data]$ ll pg_wal total 49154 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000003 drwx------ 2 postgres postgres 1024 Jul 20 12:19 archive_status |
I insert a second row in my test table (we can see it does not change the current WAL, so if you loose this file that is not included in full backup you would have data loss):
[postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# insert into test values(2,'Two'); INSERT 0 1 yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) yjaquierdb=# \q [postgres@server data]$ ll pg_wal total 49154 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 12:21 000000010000000000000003 drwx------ 2 postgres postgres 1024 Jul 20 12:19 archive_status |
To have this file archived I perform a WAL switch, we will see later that a more elegant option is available:
[postgres@server data]$ psql psql (13.3) Type "help" for help. postgres=# select pg_switch_wal(); pg_switch_wal --------------- 0/3000170 (1 row) postgres=# \q [postgres@server data]$ ll pg_wal total 71618 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003 -rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000004 drwx------ 2 postgres postgres 1024 Jul 20 12:22 archive_status [postgres@server data]$ ll /postgres/backup/archivedir/ total 81921 -rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003 |
I kill my PostgreSQL instance and clean my PGDATA directory:
[postgres@server data]$ kill -9 `ps -ef | grep /usr/pgsql-13/bin/postgres | grep -v grep | awk '{print $2}'` [postgres@server data]$ rm -rf * |
Pg_basebackup restore is simply two gtar commands:
[postgres@server data]$ ll /postgres/backup/full_20jul2021/ total 4148 -rw------- 1 postgres postgres 178001 Jul 20 12:19 backup_manifest -rw------- 1 postgres postgres 4050202 Jul 20 12:19 base.tar.gz -rw------- 1 postgres postgres 17662 Jul 20 12:19 pg_wal.tar.gz [postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021/base.tar.gz . . [postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021/pg_wal.tar.gz --directory pg_wal 000000010000000000000002 archive_status/000000010000000000000002.done |
I update restore_command in my postgresql.conf file:
restore_command = 'cp /postgres/backup/archivedir/%f %p' |
To start recover I touch the recovery.signal file and start the instance. We can see that latest inserted row is there:
[postgres@server data]$ touch recovery.signal [postgres@server data]$ pg_start waiting for server to start.... done server started [postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) |
Of course if you don’t execute the pg_switch_wal() procedure the WAL 000000010000000000000003 file contains le latest insert (the second row of my test table for example) and in case of crash you would need to cross your finger to recover this latest WAL file with your latest database transactions. Same as other product, by default, PostgreSQL offer a streaming of WAL files with pg_receivewal binary.
I execute pg_receivewal in the background:
[postgres@server data]$ nohup pg_receivewal --directory=/postgres/backup/archivedir/ --verbose --port=5433 --username=postgres & [1] 4661 [postgres@server data]$ ll -rt pg_wal total 49154 -rw------- 1 postgres postgres 16777216 Jul 20 12:29 000000020000000000000006 -rw------- 1 postgres postgres 41 Jul 20 12:29 00000002.history -rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004 drwx------ 2 postgres postgres 1024 Jul 20 15:21 archive_status -rw------- 1 postgres postgres 16777216 Jul 20 15:22 000000020000000000000005 [postgres@server data]$ ll /postgres/backup/archivedir/ total 105650 -rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003 -rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004 -rw------- 1 postgres postgres 16777216 Jul 20 15:26 000000020000000000000005.partial -rw------- 1 postgres postgres 41 Jul 20 12:29 00000002.history |
I perform a second full instance backup:
[postgres@server data]$ pg_basebackup --pgdata=/postgres/backup/full_20jul2021_2 --format=t --compress=9 --progress --verbose --host=localhost --port=5433 --username=postgres pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/6000028 on timeline 2 pg_basebackup: starting background WAL receiver pg_basebackup: created temporary replication slot "pg_basebackup_4923" 33131/33131 kB (100%), 1/1 tablespace pg_basebackup: write-ahead log end point: 0/6000138 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed [postgres@server data]$ ll -rt pg_wal total 49155 -rw------- 1 postgres postgres 41 Jul 20 12:29 00000002.history -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000005 -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006 drwx------ 2 postgres postgres 1024 Jul 20 15:28 archive_status -rw------- 1 postgres postgres 339 Jul 20 15:28 000000020000000000000006.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000007 [postgres@server data]$ ll /postgres/backup/archivedir/ total 171186 -rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003 -rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004 -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000005 -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006 -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000007.partial -rw------- 1 postgres postgres 41 Jul 20 12:29 00000002.history |
I insert a third row in my test table:
[postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# insert into test values(3,'Three'); INSERT 0 1 yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two 3 | Three (3 rows) |
I kill my PostgreSQL instance and purge PGDATA directory, I noticed that pg_receiwal command is surviving to instance shutdown: to be safe I have decided to kill it !
I restore my second full backup:
[postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021_2/base.tar.gz . . . [postgres@server data]$ gtar xvf /postgres/backup/full_20jul2021_2/pg_wal.tar.gz --directory pg_wal 00000002.history archive_status/00000002.history.done 000000020000000000000006 archive_status/000000020000000000000006.done |
You have to modify restore_command paramleter in postgresql.conf or you get below error (I’m not sure why this is required to do so):
restore_command = 'cp /postgres/backup/archivedir/%f %p' 2021-07-20 15:59:27.897 CEST [9188] FATAL: must specify restore_command when standby mode is not enabled |
You have to copy latest partial WAL file and rename it to make PostgreSQL understand it:
[postgres@server data]$ ll -rt pg_wal total 16385 -rw------- 1 postgres postgres 41 Jul 20 15:28 00000002.history -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006 drwx------ 2 postgres postgres 96 Jul 20 15:37 archive_status [postgres@server data]$ ll /postgres/backup/archivedir/ total 163842 -rw------- 1 postgres postgres 16777216 Jul 20 12:18 000000010000000000000001 -rw------- 1 postgres postgres 16777216 Jul 20 12:19 000000010000000000000002 -rw------- 1 postgres postgres 339 Jul 20 12:19 000000010000000000000002.00000028.backup -rw------- 1 postgres postgres 16777216 Jul 20 12:22 000000010000000000000003 -rw------- 1 postgres postgres 16777216 Jul 20 15:21 000000020000000000000004 -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000005 -rw------- 1 postgres postgres 16777216 Jul 20 15:28 000000020000000000000006 -rw------- 1 postgres postgres 16777216 Jul 20 15:30 000000020000000000000007.partial -rw------- 1 postgres postgres 41 Jul 20 12:29 00000002.history [postgres@server data]$ cp /postgres/backup/archivedir/000000020000000000000007.partial pg_wal/000000020000000000000007 |
Touch recovery.signal file and start the instance and you get latest transaction on your database:
[postgres@server data]$ touch recovery.signal [postgres@server data]$ pg_start waiting for server to start.... done server started [postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two 3 | Three (3 rows) |
Overall pg_basebackup associated, or not, with pg_receivewal are low level tools but as an old monkey I really enjoyed them. Obviously you need a better understanding of PostgreSQL to use them but if you have it there is no surprise: they do what they sell. What’s missing currently is incremental backup…
PgBackRest
On my RedHat test server installation og PgBackRest is as simple as (postgresql-libs is also required but I had it already):
yum install pgbackrest-2.34-1.rhel7.x86_64.rpm |
I set below parameter, the archive_command is coming from pgBackRest docuementation:
archive_mode = on
archive_command = 'pgbackrest --stanza=localhost archive-push %p' |
I have customized the /etc/pgbackrest/pgbackrest.conf file as:
[localhost] pg1-path=/postgres/13/data pg1-port=5433 [global] repo1-path=/postgres/backup repo1-retention-full=2 [global:archive-push] compress-level=3 |
I create the stanza with:
[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info stanza-create 2021-07-20 16:32:28.091 P00 INFO: stanza-create command begin 2.34: --exec-id=13449-af55e788 --log-level-console=info --pg1-path=/postgres/13/data --pg1-port=5433 --repo1-path=/postgres/backup --stanza=localhost 2021-07-20 16:32:28.695 P00 INFO: stanza-create for stanza 'localhost' on repo1 2021-07-20 16:32:28.706 P00 INFO: stanza-create command end: completed successfully (616ms) [postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info check 2021-07-20 16:34:08.856 P00 INFO: check command begin 2.34: --exec-id=13677-eadf8ac5 --log-level-console=info --pg1-path=/postgres/13/data --pg1-port=5433 --repo1-path=/postgres/backup --stanza=localhost 2021-07-20 16:34:09.466 P00 INFO: check repo1 configuration (primary) 2021-07-20 16:34:09.668 P00 INFO: check repo1 archive for WAL (primary) 2021-07-20 16:34:09.769 P00 INFO: WAL segment 000000030000000000000009 successfully archived to '/postgres/backup/archive/localhost/13-1/0000000300000000/000000030000000000000009-fbaa8cde71f2165ceb75365b34f3b515fe8675c6.gz' on repo1 2021-07-20 16:34:09.769 P00 INFO: check command end: completed successfully (914ms) |
I have my test table:
yjaquierdb=# select * from test; id | descr ----+------- 1 | One (1 row) |
Perform a backup with:
[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info backup 2021-07-20 16:39:43.658 P00 INFO: backup command begin 2.34: --exec-id=14313-31cfa7cf --log-level-console=info --pg1-path=/postgres/13/data --pg1-port=5433 --repo1-path=/postgres/backup --repo1-retention-full=2 --stanza=localhost WARN: no prior backup exists, incr backup has been changed to full 2021-07-20 16:39:44.364 P00 INFO: execute non-exclusive pg_start_backup(): backup begins after the next regular checkpoint completes 2021-07-20 16:39:44.864 P00 INFO: backup start archive = 00000003000000000000000B, lsn = 0/B000028 . . [postgres@server data]$ pgbackrest info stanza: localhost status: ok cipher: none db (current) wal archive min/max (13): 000000030000000000000007/00000003000000000000000B full backup: 20210720-163944F timestamp start/stop: 2021-07-20 16:39:44 / 2021-07-20 16:39:47 wal start/stop: 00000003000000000000000B / 00000003000000000000000B database size: 31.8MB, database backup size: 31.8MB repo1: backup set size: 3.9MB, backup size: 3.9MB |
I insert a second row in my test table and objective will be to rstore it with no data loss:
[postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) |
PgBackRest is currently not capable of WAL streaming like you would do with barman or low level with pg_receivewal so the only option not to loose anything is, for example, to perform an incremental backup:
[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info --type=incr backup . . [postgres@server data]$ pgbackrest info stanza: localhost status: ok cipher: none db (current) wal archive min/max (13): 00000003000000000000000C/000000030000000000000011 full backup: 20210720-164748F timestamp start/stop: 2021-07-20 16:47:48 / 2021-07-20 16:47:51 wal start/stop: 00000003000000000000000E / 00000003000000000000000E database size: 31.8MB, database backup size: 31.8MB repo1: backup set size: 3.9MB, backup size: 3.9MB incr backup: 20210720-164748F_20210720-170531I timestamp start/stop: 2021-07-20 17:05:31 / 2021-07-20 17:05:33 wal start/stop: 000000030000000000000010 / 000000030000000000000010 database size: 31.8MB, database backup size: 24.3KB repo1: backup set size: 3.9MB, backup size: 634B backup reference list: 20210720-164748F |
I kill and erase PGDATA directory to simulate a disaster !
Restore your instance with something like (PITR possible with multiple option):
[postgres@server data]$ pgbackrest --stanza=localhost --log-level-console=info --target-timeline=latest restore . . |
Simply start the instance (pgbackrest has create recovery.signal and set required option for you) and check everything is there:
[postgres@server data]$ pg_start waiting for server to start.... done server started [postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) |
PgBackRest is closer from what I know with Oracle RMAN. You can work local or remote. Except WAL streaming that is not there you can do full, incremental or differential backups. Ease of use is there and I had no pain in implementing it: my preferred backup tool so far !
EDB Backup and Recovery Tool
EDB Backup and Recovery Tool (BART) is the commercial product from EnterpriseDB, once you have configured the EDB repository with your (trial) account and password installation is as simple as:
yum -y install edb-bart |
[postgres@server ~]$ cp /usr/edb/bart/etc/bart.cfg.sample /postgres/backup/bart.cfg |
Cutomized the BART configuration to map your environment:
[BART] bart_host= postgres@127.0.0.1 backup_path = /postgres/backup pg_basebackup_path = /usr/bin/pg_basebackup logfile = /postgres/backup/bart.log scanner_logfile = /postgres/backup/bart_scanner.log thread_count = 5 [localhost] host = 127.0.0.1 port = 5433 user = postgres cluster_owner = postgres description = "PostgreSQL 13 Community" allow_incremental_backups = enabled |
I start my test PostgreSQL insatnce with:
[postgres@server log]$ pg_ctl -l logfile start waiting for server to start.... stopped waiting pg_ctl: could not start server |
I had a startup issue:
21-07-06 12:22:23.090 CEST [18106] LOG: invalid primary checkpoint record 2021-07-06 12:22:23.090 CEST [18106] PANIC: could not locate a valid checkpoint record 2021-07-06 12:22:23.090 CEST [18104] LOG: startup process (PID 18106) was terminated by signal 6: Aborted 2021-07-06 12:22:23.090 CEST [18104] LOG: aborting startup due to startup process failure 2021-07-06 12:22:23.092 CEST [18104] LOG: database system is shut down |
Which I solved with:
[postgres@server log]$ pg_resetwal /postgres/13/data Write-ahead log reset [postgres@server log]$ pg_ctl -l logfile start waiting for server to start.... done server started |
As asked you need to configure ssh passwordless access for connection with no password of the form of account@ip_address (this is because BART will configure archive_command by default to scp %p postgres@127.0.0.1:/postgres/backup/localhost/archived_wals/%f):
[postgres@server ~]$ ssh-keygen -t rsa Generating public/private rsa key pair. Enter file in which to save the key (/home/postgres/.ssh/id_rsa): Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /home/postgres/.ssh/id_rsa. Your public key has been saved in /home/postgres/.ssh/id_rsa.pub. The key fingerprint is: SHA256:vMu4U0BRmKPgFDigdQ570ehg6/7lzwFsYGY7VtQWEzo postgres@server The key's randomart image is: +---[RSA 2048]----+ |o.+.ooo*=o | |+.==.o*.o. | |.=.=BoEo | | o=o=.o | | . + +.S | | .. o ... | | . ..o | | . o.+ o | | .. ++= | +----[SHA256]-----+ [postgres@server ~]$ cd .ssh [postgres@server .ssh]$ ll total 12 -rw------- 1 postgres postgres 1679 Jul 7 12:02 id_rsa -rw-r----- 1 postgres postgres 400 Jul 7 12:02 id_rsa.pub -rw-r--r-- 1 postgres postgres 716 Jul 7 12:01 known_hosts [postgres@server .ssh]$ cat id_rsa.pub >> authorized_keys [postgres@server .ssh]$ chmod 600 authorized_keys [postgres@server .ssh]$ ssh postgres@127.0.0.1 Last failed login: Wed Jul 7 12:03:05 CEST 2021 from 127.0.0.1 on ssh:notty Last login: Wed Jul 7 11:44:05 2021 [postgres@server ~]$ |
As I have already configured my PostgreSQL cluster for continuous WAL archiving with below parameter I might use the –no-configure option of BART initialization. If you decide to keep your own configuration you must mimic BART way of working i.e. the target directory for WAL archiving will be located inside your BART root directory (/postgres/backup for me) and configured server name (localhost for me) sub-directory:
checkpoint_timeout = 30s max_wal_size = 1GB min_wal_size = 80MB archive_mode = on archive_command = 'test ! -f /postgres/backup/localhost/archived_wals/%f && cp %p /postgres/backup/localhost/archived_wals/%f' #restore_command = 'cp /postgres/backup/localhost/archived_wals/%f %p' |
Initialiaz BART with (my BART root directory is /postgres/backup). If you want to override your configuration you have the –override option:
[postgres@server backup]$ bart init --server=localhost --no-configure |
If you decide to let BART configure your PostgreSQL instanced for you check the postgresql.auto.conf file to see what has been changed. There is an interesting view called pg_file_settings to see from which file the setting is coming. To reset setting of postgresql.auto.conf file use ALETER SYSTEM SET parameter TO DEFAULT command:
postgres=# select sourcefile,count(*) from pg_file_settings where sourcefile is not null group by sourcefile; sourcefile | count ----------------------------------------+------- /postgres/13/data/postgresql.conf | 24 /postgres/13/data/postgresql.auto.conf | 2 (2 rows) |
Perform a backup with:
[postgres@server backup]$ bart backup --server=localhost --gzip --compress-level=9 --backup-name=full_6jul2021 --with-pg_basebackup --thread-count=4 INFO: DebugTarget - getVar(checkDiskSpace.bytesAvailable) INFO: creating full backup using pg_basebackup for server 'localhost' INFO: creating backup for server 'localhost' INFO: backup identifier: '1625567515978' INFO: backup completed successfully INFO: BART VERSION: 2.6.2 BACKUP DETAILS: BACKUP STATUS: active BACKUP IDENTIFIER: 1625567515978 BACKUP NAME: full_6jul2021 BACKUP PARENT: none BACKUP LOCATION: /postgres/backup/localhost/1625567515978 BACKUP SIZE: 4.19 MB BACKUP FORMAT: tar.gz BACKUP TIMEZONE: Europe/Paris XLOG METHOD: fetch BACKUP CHECKSUM(s): 0 TABLESPACE(s): 0 START WAL LOCATION: 00000003000000010000007F BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2021-07-06 12:31:55 CEST STOP TIME: 2021-07-06 12:32:01 CEST TOTAL DURATION: 6 sec(s) |
List existing backups:
[postgres@server backup]$ bart show-backups --server=localhost SERVER NAME BACKUP ID BACKUP NAME BACKUP PARENT BACKUP TIME BACKUP SIZE WAL(s) SIZE WAL FILES STATUS localhost 1625567515978 full_6jul2021 none 2021-07-06 12:32:01 CEST 4.19 MB 0.00 bytes 0 active |
Delete backups:
[postgres@server backup]$ bart delete --server=localhost --backupid=1625567515978 INFO: deleting backup '1625567515978' of server 'localhost' INFO: deleting backup '1625567515978' INFO: WALs of deleted backup(s) will belong to prior backup(if any), or will be marked unused WARNING: not marking any WALs as unused WALs, the WAL file '/postgres/backup/localhost/archived_wals/00000003000000010000007F' is required, yet not available in archived_wals directory INFO: backup(s) deleted [postgres@server backup]$ bart show-backups --server=localhost SERVER NAME BACKUP ID BACKUP NAME BACKUP PARENT BACKUP TIME BACKUP SIZE WAL(s) SIZE WAL FILES STATUS |
As usual I create a test table where I will insert rows after backup to see if I can do a PITR recovery and get the latest inserted rows:
yjaquierdb=# create table test(id int, descr varchar(50)); CREATE TABLE yjaquierdb=# insert into test values(1,'One'); INSERT 0 1 yjaquierdb=# select * from test; id | descr ----+------- 1 | One (1 row) |
I create a full backup with:
[postgres@server backup]$ bart backup --server=localhost --gzip --compress-level=9 --backup-name=full_7jul2021 --with-pg_basebackup --thread-count=4 . . . [postgres@server backup]$ bart show-backups --server=localhost SERVER NAME BACKUP ID BACKUP NAME BACKUP PARENT BACKUP TIME BACKUP SIZE WAL(s) SIZE WAL FILES STATUS localhost 1625662500032 full_7jul2021 none 2021-07-07 14:55:05 CEST 4.23 MB 0.00 bytes 0 active |
I insert a new row in my test table with:
yjaquierdb=# insert into test values(2,'Two'); INSERT 0 1 yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) |
I kill my PostgreSQL instance with:
[postgres@server ~]$ kill -9 `ps -ef | grep /usr/pgsql-13/bin/postgres | grep -v grep | awk '{print $2}'` |
In a disaster scenario you might not be able to copy the latest WAL file that have not been archived and so your recovery will be incomplete and you will have data loss. My test scenario is favorable and I can copy latest unarchived WAL file with:
[postgres@server data]$ cp pg_wal/* /postgres/backup/localhost/archived_wals/ cp: omitting directory ‘pg_wal/archive_status’ |
Then before BART restore I crash everything with:
[postgres@server ~]$ rm -rf /postgres/13/data [postgres@server ~]$ ll /postgres/13/data total 0 |
BART restore:
[postgres@server backup]$ bart restore --server=localhost --backupid=1625662500032 --restore-path=/postgres/13/data --target-tli=latest INFO: restoring backup '1625662500032' of server 'localhost' INFO: base backup restored INFO: writing recovery settings to postgresql.auto.conf INFO: WAL file(s) will be streamed from the BART host INFO: archiving is disabled INFO: permissions set on $PGDATA INFO: restore completed successfully |
Remark:
The –target-tli (or any other –target_xx parameters) is highly important because if you omit it then PostgreSQL will perform a recovery until the first consistent state of the database and so WILL NOT perform a PITR recovery and you desperately won’t see the latest inserted rows in your test table. –target-tli BART option is equivalent to PostgreSQL recovery_target_timeline parameter.
Everything restored and recovery.signal file automatically created for you:
[postgres@server data]$ ll /postgres/13/data/ total 62 -rw------- 1 postgres postgres 227 Jul 7 14:55 backup_label -rw------- 1 postgres postgres 227 Jul 6 18:00 backup_label.old drwx------ 6 postgres postgres 96 Jul 7 14:55 base -rw------- 1 postgres postgres 30 Jul 7 12:33 current_logfiles drwx------ 2 postgres postgres 2048 Jul 7 14:57 global drwx------ 2 postgres postgres 1024 Jul 7 12:06 log -rw------- 1 postgres postgres 1249 Jul 7 12:33 logfile drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_commit_ts drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_dynshmem -rw------- 1 postgres postgres 4831 Jun 17 11:54 pg_hba.conf -rw------- 1 postgres postgres 1636 Jun 14 16:42 pg_ident.conf drwx------ 4 postgres postgres 96 Jul 7 14:55 pg_logical drwx------ 4 postgres postgres 96 Jun 17 17:21 pg_multixact drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_notify drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_replslot drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_serial drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_snapshots drwx------ 2 postgres postgres 96 Jul 7 12:33 pg_stat drwx------ 2 postgres postgres 96 Jul 7 14:55 pg_stat_tmp drwx------ 2 postgres postgres 96 Jul 6 18:07 pg_subtrans drwxr-x--- 2 postgres postgres 96 Jul 7 14:57 pg_tblspc drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_twophase -rw------- 1 postgres postgres 3 Jun 14 16:42 PG_VERSION drwx------ 3 postgres postgres 1024 Jul 7 14:57 pg_wal drwx------ 2 postgres postgres 96 Jun 17 17:21 pg_xact -rw-r----- 1 postgres postgres 92 Jul 7 14:57 postgresql.auto.conf -rw------- 1 postgres postgres 28256 Jul 7 14:57 postgresql.conf -rw-r----- 1 postgres postgres 1 Jul 7 14:57 recovery.signal -rw-r----- 1 postgres postgres 4648 Jun 17 16:40 server.crt -rw------- 1 postgres postgres 1675 Jun 17 16:40 server.key -rw-r----- 1 postgres postgres 3610 Jun 17 16:38 server.req |
And the postgresql.auto.conf contains the required recovery commands:
[postgres@server data]$ cat postgresql.auto.conf # Do not edit this file manually! # It will be overwritten by the ALTER SYSTEM command. restore_command = 'cp /postgres/backup/localhost/archived_wals/%f %p' recovery_target_timeline = latest |
Once you have started your instance you can query your test table and see latest insertion is there:
[postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) yjaquierdb=# |
Please not that you need to edit postgresql.conf file and restart your instance to correct this strange behavior of BART:
The BART RESTORE operation stops WAL archiving by adding an archive_mode = off parameter at the very end of the postgresql.conf file. This last parameter in the file overrides any other previous setting of the same parameter in the file. Delete the last setting and restart the database server to start WAL archiving.
# Add settings for extensions here
archive_mode = off
Barman
With the Enterprise Linux (or EPEL) YUM repository configured on your server installation is as simple as:
[root@server ~]# yum install barman-2.12-1.el7.noarch.rpm . . [root@server ~]# su - barman -bash-4.2$ id uid=991(barman) gid=496(barman) groups=496(barman) -bash-4.2$ barman -v 2.12 Barman by 2ndQuadrant (www.2ndQuadrant.com) |
To test full feature (barman switch-wal) I have decided to create a PostgreSQL superuser on my test instance:
[postgres@server data]$ createuser --superuser --pwprompt --port=5433 barman Enter password for new role: Enter it again: [postgres@server data]$ createuser -P --replication --port=5433 streaming_barman Enter password for new role: Enter it again: |
Control everything is fine with:
postgres=# \du List of roles Role name | Attributes | Member of ------------------+------------------------------------------------------------+----------- barman | Superuser, Create role, Create DB | {} postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {} streaming_barman | Replication | {} |
In /etc/barman.conf main configuration file I have just changed (make sure this directory is accessible in read/write by barman Linux account):
barman_home = /postgres/backup/barman |
You need to create a configuration for your PostgreSQL server you plan to backup. I have decided to call it localhost as I have only one server for barman server and my PostgreSQL instance. Overall, same a Oralce RMAN, I am not a pure fan of this central backup server and I rather prefer to have one management per PostgreSQL server… So creating it from streaming template, check the official docuementation to determine if you prefer to do rsync/ssh or streaming:
[root@server ~]# cd /etc/barman.d/ [root@server barman.d]# ll total 12 -rw-r--r-- 1 root root 947 Nov 4 2020 passive-server.conf-template -rw-r--r-- 1 root root 1565 Nov 4 2020 ssh-server.conf-template -rw-r--r-- 1 root root 1492 Nov 4 2020 streaming-server.conf-template [root@server barman.d]# cp streaming-server.conf-template localhost.conf [root@server barman.d]# cat localhost.conf | grep -v "^;" [localhost] description = "My PostgreSQL 13 test instance" conninfo = host=localhost user=barman dbname=postgres port=5433 streaming_conninfo = host=localhost user=streaming_barman dbname=postgres port=5433 backup_method = postgres streaming_archiver = on slot_name = barman create_slot = auto archiver = on |
Remak:
archiver option is coming from a problem I had and automatic slot_name creation is to ease my life:
[barman@server ~]$ barman check localhost Server localhost: empty incoming directory: FAILED ('/postgres/backup/barman/localhost/incoming' must be empty when archiver=off) |
If you decide to manage slot by yourself, below caommands could be useful:
[barman@server ~]$ barman receive-wal --drop-slot localhost Dropping physical replication slot 'barman' on server 'localhost' Replication slot 'barman' dropped [barman@server ~]$ barman receive-wal --create-slot localhost Creating physical replication slot 'barman' on server 'localhost' Replication slot 'barman' created [barman@server ~]$ barman receive-wal --reset localhost Resetting receive-wal directory status Removing status file /postgres/backup/barman/localhost/streaming/00000001000000000000000C.partial Creating status file /postgres/backup/barman/localhost/streaming/00000001000000000000000E.partial |
Configure passwordless connection with barman and streaming_barman onto your test PostgreSQL instance with .pgpass file:
[barman@server ~]$ echo "localhost:5433:postgres:barman:barman" > ~/.pgpass [barman@server ~]$ echo "localhost:5433:postgres:streaming_barman:barman" > ~/.pgpass [barman@server ~]$ chmod 600 ~/.pgpass [barman@server ~]$ psql --host=localhost --port=5433 --dbname=postgres --username=barman psql (9.2.24, server 13.3) WARNING: psql version 9.2, server version 13.0. Some psql features might not work. Type "help" for help. postgres=# \q |
Check it works without prompting for password:
[barman@server ~]$ psql -U streaming_barman -h localhost --port=5433 -c "IDENTIFY_SYSTEM" replication=1 systemid | timeline | xlogpos | dbname ---------------------+----------+-----------+-------- 6982909595436191101 | 1 | 0/161E7E8 | (1 row) |
Configure authorized_keys for barman and postgres accounts… From barman you must be able to connect to your host passwordless and vice versa from postgres account…
Configure the PostgreSQL instance you plan to backup with barman with below option:
archive_mode = on
archive_command = 'barman-wal-archive localhost localhost %p' |
To get barman-wal-archive command you must install Barma CLI package with:
yum install barman-cli |
Remark:
The barmal-wal-archive command is a little weird, this is because I’m using the same host for barman and PostgreSQL instance. The command is: barman-wal-archive [-h] [-V] [-U USER] [-c CONFIG] [-t] BARMAN_HOST SERVER_NAME WAL_PATH. So the double localhost…
Activate streaming of WAL files, barman account must have access to pg_receivewal binary:
[barman@server .ssh]$ barman receive-wal localhost Starting receive-wal for server localhost ERROR: ArchiverFailure:pg_receivexlog not present in $PATH [barman@server ~]$ export PATH=$PATH:/usr/pgsql-13/bin/ [barman@server ~]$ nohup barman receive-wal localhost & [1] 22955 [barman@server ~]$ Starting receive-wal for server localhost localhost: pg_receivewal: starting log streaming at 0/1000000 (timeline 1) localhost: pg_receivewal: finished segment at 0/2000000 (timeline 1) |
If your barman account is superuser on your target PostgreSQL instance you can even remotely switch WAL file with:
[barman@server ~]$ barman switch-wal --force --archive localhost The WAL file 000000010000000000000006 has been closed on server 'localhost' Waiting for the WAL file 000000010000000000000006 from server 'localhost' (max: 30 seconds) localhost: pg_receivewal: finished segment at 0/7000000 (timeline 1) Processing xlog segments from streaming for localhost 000000010000000000000006 |
Perform a final check with:
[barman@server ~]$ barman check localhost Server localhost: PostgreSQL: OK superuser or standard user with backup privileges: OK PostgreSQL streaming: OK wal_level: OK replication slot: OK directories: OK retention policy settings: OK backup maximum age: OK (no last_backup_maximum_age provided) compression settings: OK failed backups: OK (there are 0 failed backups) minimum redundancy requirements: OK (have 0 backups, expected at least 0) pg_basebackup: OK pg_basebackup compatible: OK pg_basebackup supports tablespaces mapping: OK systemid coherence: OK (no system Id stored on disk) pg_receivexlog: OK pg_receivexlog compatible: OK receive-wal running: OK archiver errors: OK [barman@server ~]$ barman status localhost Server localhost: Description: My PostgreSQL 13 test instance Active: True Disabled: False PostgreSQL version: 13.3 Cluster state: in production pgespresso extension: Not available Current data size: 32.0 MiB PostgreSQL Data directory: /postgres/13/data Current WAL segment: 000000010000000000000006 PostgreSQL 'archive_command' setting: barman-wal-archive localhost localhost %p Last archived WAL: 000000010000000000000005, at Fri Jul 16 11:05:16 2021 Failures of WAL archiver: 0 Server WAL archiving rate: 11.54/hour Passive node: False Retention policies: not enforced No. of available backups: 1 First available backup: 20210716T110510 Last available backup: 20210716T110510 Minimum redundancy requirements: satisfied (1/0) |
As usual I create a yjaquierdb database with a test table inside that contains only one row:
yjaquierdb=# select * from test; id | descr ----+------- 1 | One (1 row) |
Time to perform our first backup:
[barman@server localhost]$ barman backup localhost --wait Starting backup using postgres method for server localhost in /postgres/backup/barman/localhost/base/20210716T125713 Backup start at LSN: 0/4012760 (000000010000000000000004, 00012760) Starting backup copy via pg_basebackup for 20210716T125713 Copy done (time: 4 seconds) Finalising the backup. This is the first backup for server localhost WAL segments preceding the current backup have been found: 000000010000000000000001 from server localhost has been removed 000000010000000000000002 from server localhost has been removed 000000010000000000000003 from server localhost has been removed Backup size: 31.9 MiB Backup end at LSN: 0/6000000 (000000010000000000000005, 00000000) Backup completed (start time: 2021-07-16 12:57:13.538788, elapsed time: 4 seconds) Waiting for the WAL file 000000010000000000000005 from server 'localhost' Processing xlog segments from streaming for localhost 000000010000000000000004 Processing xlog segments from file archival for localhost 000000010000000000000004 000000010000000000000005 000000010000000000000005.00000028.backup |
Confirm backup creation with:
[barman@server localhost]$ barman list-backup localhost localhost 20210716T125713 - Fri Jul 16 12:57:18 2021 - Size: 47.9 MiB - WAL Size: 0 B [barman@server localhost]$ barman show-backup localhost 20210716T125713 Backup 20210716T125713: Server Name : localhost System Id : 6985474264206038247 Status : DONE PostgreSQL Version : 130003 PGDATA directory : /postgres/13/data Base backup information: Disk usage : 31.9 MiB (47.9 MiB with WALs) Incremental size : 31.9 MiB (-0.00%) Timeline : 1 Begin WAL : 000000010000000000000005 End WAL : 000000010000000000000005 WAL number : 1 Begin time : 2021-07-16 12:57:16+02:00 End time : 2021-07-16 12:57:18.404710+02:00 Copy time : 4 seconds Estimated throughput : 6.6 MiB/s Begin Offset : 40 End Offset : 0 Begin LSN : 0/5000028 End LSN : 0/6000000 WAL information: No of files : 0 Disk usage : 0 B Last available : 000000010000000000000005 Catalog information: Retention Policy : not enforced Previous Backup : - (this is the oldest base backup) Next Backup : - (this is the latest base backup) |
I insert a new row in my test table and then I crash and delete entirely the PostgreSQL instance and objective will be to perform a PITR recovery and get the latest inserted row in the table.
yjaquierdb=# insert into test values(2,'Two'); INSERT 0 1 yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) |
Latest modification go inside the streaming directory of your barman backup (barman receive-wal command):
[barman@server localhost]$ pwd /postgres/backup/barman/localhost [barman@server localhost]$ ll streaming total 32768 -rw------- 1 barman barman 16777216 Jul 16 13:02 000000010000000000000006.partial [barman@server localhost]$ ll wals/0000000100000000/ total 16385 -rw-r----- 1 barman barman 16777216 Jul 16 12:57 000000010000000000000005 -rw-r----- 1 barman barman 339 Jul 16 12:57 000000010000000000000005.00000028.backup |
Confirmation in PostgreSQL instance:
[postgres@server data]$ ll -rt pg_wal/ total 49162 -rw------- 1 postgres postgres 16777216 Jul 16 12:57 000000010000000000000007 -rw------- 1 postgres postgres 16777216 Jul 16 12:57 000000010000000000000008 -rw------- 1 postgres postgres 339 Jul 16 12:57 000000010000000000000005.00000028.backup drwx------ 2 postgres postgres 1024 Jul 16 13:02 archive_status -rw------- 1 postgres postgres 16777216 Jul 16 13:02 000000010000000000000006 |
I have knocked my head on the wall for a long time before finding why the partial streamed WAL file are not included in restore command, this is a current (huge) limitation of barman:
IMPORTANT: A current limitation of Barman is that the recover command is not yet able to transparently manage .partial files. In such situations, users will need to manually copy the latest partial file from the server’s streaming_wals_directory of their Barman installation to the destination for recovery, making sure that the .partial suffix is removed. Restoring a server using the last partial file, reduces data loss, by bringing down recovery point objective to values around 0, or exactly 0 in case of synchronous replication.
Even if you move it to archive directory barman does not take it into account:
[barman@server localhost]$ barman get-wal --partial localhost 000000010000000000000006 --output-directory wals/0000000100000000/ Sending WAL '000000010000000000000006' for server 'localhost' into 'wals/0000000100000000/000000010000000000000006' file [barman@server localhost]$ ll wals/0000000100000000/ total 16385 -rw-r----- 1 barman barman 16777216 Jul 16 12:57 000000010000000000000005 -rw-r----- 1 barman barman 339 Jul 16 12:57 000000010000000000000005.00000028.backup -rw-r----- 1 barman barman 0 Jul 16 15:24 000000010000000000000006 |
I have killed the PostgreSQL instance, remove (rm -rf *) the PGDATA directory, now time to restore and recover.
Remark:
If you use the traditional “barman recover localhost 20210716T125713 /postgres/13/data” command then target directory will be owned by barman (and you might have error when barman will try to write to it). Once recovered, final directory is owned by barman account so you would have to change ownership with root account to postgres account before starting the instance. One nice trick is to use –remote-ssh-command “ssh postgres@localhost” option of barman recover and this will be postgres account that will write everything (and remember we configured passwordless access from barman).
To map PosgreSQL recovery I have first tried with:
[barman@server localhost]$ barman recover --remote-ssh-command "ssh postgres@localhost" --target-tli 'latest' localhost 20210716T110510 /postgres/13/data usage: barman recover [-h] [--target-tli TARGET_TLI] [--target-time TARGET_TIME] [--target-xid TARGET_XID] [--target-lsn TARGET_LSN] [--target-name TARGET_NAME] [--target-immediate] [--exclusive] [--tablespace NAME:LOCATION] [--remote-ssh-command SSH_COMMAND] [--bwlimit KBPS] [--retry-times RETRY_TIMES] [--retry-sleep RETRY_SLEEP] [--no-retry] [--jobs NJOBS] [--get-wal] [--no-get-wal] [--network-compression] [--no-network-compression] [--target-action TARGET_ACTION] [--standby-mode] server_name backup_id destination_directory barman recover: error: argument --target-tli: 'latest' is not a valid positive integer |
So finally decided to use below command. I don’t know by heart any recovery time or LSN number and I just want to recover with smallest RPO:
[barman@server localhost]$ barman recover --remote-ssh-command "ssh postgres@localhost" --target-tli 99999999 localhost 20210716T125713 /postgres/13/data Starting remote restore for server localhost using backup 20210716T125713 Destination directory: /postgres/13/data Remote command: ssh postgres@localhost Doing PITR. Recovery target timeline: 'True' Using safe horizon time for smart rsync copy: 2021-07-16 12:57:16+02:00 Copying the base backup. Copying required WAL segments. Generating recovery configuration Identify dangerous settings in destination directory. IMPORTANT These settings have been modified to prevent data losses postgresql.conf line 237: archive_command = false Recovery completed (start time: 2021-07-16 15:25:03.033741, elapsed time: 4 seconds) Your PostgreSQL server has been successfully prepared for recovery! |
Copy streamed .partial file to $PGDATA/barman_wal/ directory removing the .partial extention…
Change recovery_target_timeline from 99999999 to ‘latest’ in postgresql.auto.conf file:
[postgres@server data]$ cat postgresql.auto.conf # Do not edit this file manually! # It will be overwritten by the ALTER SYSTEM command. restore_command = 'cp barman_wal/%f %p' recovery_end_command = 'rm -fr barman_wal' recovery_target_timeline = latest |
Just start the PosgreSQL instance and all should be there (you must bounce it again to reconfigure archive_command parameter that has been overwritten). And restart barman receive-wal command:
[postgres@server data]$ psql --dbname=yjaquierdb psql (13.3) Type "help" for help. yjaquierdb=# select * from test; id | descr ----+------- 1 | One 2 | Two (2 rows) |
To be continued…
References
- PostgreSQL Backup Strategy for an Enterprise-Grade Environment
- Backup PostgreSQL databases with Barman