blog
How to Use pgBackRest to Backup PostgreSQL and TimescaleDB
Your data is probably the most valuable assets in the company, so you should have a Disaster Recovery Plan (DRP) to prevent data loss in the event of an accident or hardware failure. A backup is the simplest form of DR. It might not always be enough to guarantee an acceptable Recovery Point Objective (RPO) but is a good first approach. Also, you should define a Recovery Time Objective (RTO) according to your company requirements. There are many ways to reach the RTO value, it depends on the company goals.
In this blog, we’ll see how to use pgBackRest for backing up PostgreSQL and TimescaleDB and how to use one of the most important features of this backup tool, the combination of Full, Incremental and Differential backups, to minimize downtime.
What is pgBackRest?
There are different types of backups for databases:
- Logical: The backup is stored in a human-readable format like SQL.
- Physical: The backup contains binary data.
- Full/Incremental/Differential: The definition of these three types of backups is implicit in the name. The full backup is a full copy of all your data. Incremental backup only backs up the data that has changed since the previous backup and the differential backup only contains the data that has changed since the last full backup executed. The incremental and differential backups were introduced as a way to decrease the amount of time and disk space usage that it takes to perform a full backup.
pgBackRest is an open source backup tool that creates physical backups with some improvements compared to the classic pg_basebackup tool. We can use pgBackRest to perform an initial database copy for Streaming Replication by using an existing backup, or we can use the delta option to rebuild an old standby server.
Some of the most important pgBackRest features are:
- Parallel Backup & Restore
- Local or Remote Operation
- Full, Incremental and Differential Backups
- Backup Rotation and Archive Expiration
- Backup Integrity check
- Backup Resume
- Delta Restore
- Encryption
Now, let’s see how we can use pgBackRest to backup our PostgreSQL and TimescaleDB databases.
How to Use pgBackRest
For this test, we’ll use CentOS 7 as OS and PostgreSQL 11 as the database server. We’ll assume you have the database installed, if not you can follow these links to deploy both PostgreSQL or TimescaleDB in an easy way by using ClusterControl.
First, we need to install the pgbackrest package.
$ yum install pgbackrest
pgBackRest can be used from the command line, or from a configuration file located by default in /etc/pgbackrest.conf on CentOS7. This file contains the following lines:
[global]
repo1-path=/var/lib/pgbackrest
#[main]
#pg1-path=/var/lib/pgsql/10/data
You can check this link to see which parameter we can add in this configuration file.
We’ll add the following lines:
[testing]
pg1-path=/var/lib/pgsql/11/data
Make sure that you have the following configuration added in the postgresql.conf file (these changes require a service restart).
archive_mode = on
archive_command = 'pgbackrest --stanza=testing archive-push %p'
max_wal_senders = 3
wal_level = logical
Now, let’s take a basic backup. First, we need to create a “stanza”, that defines the backup configuration for a specific PostgreSQL or TimescaleDB database cluster. The stanza section must define the database cluster path and host/user if the database cluster is remote.
$ pgbackrest --stanza=testing --log-level-console=info stanza-create
2019-04-29 21:46:36.922 P00 INFO: stanza-create command begin 2.13: --log-level-console=info --pg1-path=/var/lib/pgsql/11/data --repo1-path=/var/lib/pgbackrest --stanza=testing
2019-04-29 21:46:37.475 P00 INFO: stanza-create command end: completed successfully (554ms)
And then, we can run the check command to validate the configuration.
$ pgbackrest --stanza=testing --log-level-console=info check
2019-04-29 21:51:09.893 P00 INFO: check command begin 2.13: --log-level-console=info --pg1-path=/var/lib/pgsql/11/data --repo1-path=/var/lib/pgbackrest --stanza=testing
2019-04-29 21:51:12.090 P00 INFO: WAL segment 000000010000000000000001 successfully stored in the archive at '/var/lib/pgbackrest/archive/testing/11-1/0000000100000000/000000010000000000000001-f29875cffe780f9e9d9debeb0b44d945a5165409.gz'
2019-04-29 21:51:12.090 P00 INFO: check command end: completed successfully (2197ms)
To take the backup, run the following command:
$ pgbackrest --stanza=testing --type=full --log-level-stderr=info backup
INFO: backup command begin 2.13: --log-level-stderr=info --pg1-path=/var/lib/pgsql/11/data --repo1-path=/var/lib/pgbackrest --stanza=testing --type=full
WARN: option repo1-retention-full is not set, the repository may run out of space
HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.
INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2019-04-30 15:43:21": backup begins after the next regular checkpoint completes
INFO: backup start archive = 000000010000000000000006, lsn = 0/6000028
WARN: aborted backup 20190429-215508F of same type exists, will be cleaned to remove invalid files and resumed
INFO: backup file /var/lib/pgsql/11/data/base/16384/1255 (608KB, 1%) checksum e560330eb5300f7e2bcf8260f37f36660ce3a2c1
INFO: backup file /var/lib/pgsql/11/data/base/13878/1255 (608KB, 3%) checksum e560330eb5300f7e2bcf8260f37f36660ce3a2c1
INFO: backup file /var/lib/pgsql/11/data/base/13877/1255 (608KB, 5%) checksum e560330eb5300f7e2bcf8260f37f36660ce3a2c1
. . .
INFO: full backup size = 31.8MB
INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
INFO: backup stop archive = 000000010000000000000006, lsn = 0/6000130
INFO: new backup label = 20190429-215508F
INFO: backup command end: completed successfully (12810ms)
INFO: expire command begin
INFO: option 'repo1-retention-archive' is not set - archive logs will not be expired
INFO: expire command end: completed successfully (10ms)
Now, we have the backup finished with the “completed successfully” output, so, let’s go to restore it. We’ll stop the postgresql-11 service.
$ service postgresql-11 stop
Redirecting to /bin/systemctl stop postgresql-11.service
And leave the datadir empty.
$ rm -rf /var/lib/pgsql/11/data/*
Now, run the following command:
$ pgbackrest --stanza=testing --log-level-stderr=info restore
INFO: restore command begin 2.13: --log-level-stderr=info --pg1-path=/var/lib/pgsql/11/data --repo1-path=/var/lib/pgbackrest --stanza=testing
INFO: restore backup set 20190429-215508F
INFO: restore file /var/lib/pgsql/11/data/base/16384/1255 (608KB, 1%) checksum e560330eb5300f7e2bcf8260f37f36660ce3a2c1
INFO: restore file /var/lib/pgsql/11/data/base/13878/1255 (608KB, 3%) checksum e560330eb5300f7e2bcf8260f37f36660ce3a2c1
INFO: restore file /var/lib/pgsql/11/data/base/13877/1255 (608KB, 5%) checksum e560330eb5300f7e2bcf8260f37f36660ce3a2c1
. . .
INFO: write /var/lib/pgsql/11/data/recovery.conf
INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
INFO: restore command end: completed successfully (10819ms)
Then, start the postgresql-11 service.
$ service postgresql-11 stop
And now we have our database up and running.
$ psql -U app_user world
world=> select * from city limit 5;
id | name | countrycode | district | population
----+----------------+-------------+---------------+------------
1 | Kabul | AFG | Kabol | 1780000
2 | Qandahar | AFG | Qandahar | 237500
3 | Herat | AFG | Herat | 186800
4 | Mazar-e-Sharif | AFG | Balkh | 127800
5 | Amsterdam | NLD | Noord-Holland | 731200
(5 rows)
Now, let’s see how we can take a differential backup.
$ pgbackrest --stanza=testing --type=diff --log-level-stderr=info backup
INFO: backup command begin 2.13: --log-level-stderr=info --pg1-path=/var/lib/pgsql/11/data --repo1-path=/var/lib/pgbackrest --stanza=testing --type=diff
WARN: option repo1-retention-full is not set, the repository may run out of space
HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.
INFO: last backup label = 20190429-215508F, version = 2.13
INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2019-04-30 21:22:58": backup begins after the next regular checkpoint completes
INFO: backup start archive = 00000002000000000000000B, lsn = 0/B000028
WARN: a timeline switch has occurred since the last backup, enabling delta checksum
INFO: backup file /var/lib/pgsql/11/data/base/16429/1255 (608KB, 1%) checksum e560330eb5300f7e2bcf8260f37f36660ce3a2c1
INFO: backup file /var/lib/pgsql/11/data/base/16429/2608 (448KB, 8%) checksum 53bd7995dc4d29226b1ad645995405e0a96a4a7b
. . .
INFO: diff backup size = 40.1MB
INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
INFO: backup stop archive = 00000002000000000000000B, lsn = 0/B000130
INFO: new backup label = 20190429-215508F_20190430-212258D
INFO: backup command end: completed successfully (23982ms)
INFO: expire command begin
INFO: option 'repo1-retention-archive' is not set - archive logs will not be expired
INFO: expire command end: completed successfully (14ms)
For more complex backups you can follow the pgBackRest user guide.
As we mentioned earlier, you can use the command line or the configuration files to manage your backups.
How to Use pgBackRest in ClusterControl
Since 1.7.2 version, ClusterControl added support for pgBackRest for backing up PostgreSQL and TimescaleDB databases, so let’s see how we can use it from ClusterControl.
Creating a Backup
For this task, go to ClusterControl -> Select Cluster -> Backup -> Create Backup.
We can create a new backup or configure a scheduled one. For our example, we will create a single backup instantly.
We must choose one method, the server from which the backup will be taken, and where we want to store the backup. We can also upload our backup to the cloud (AWS, Google or Azure) by enabling the corresponding button.
In this case, we’ll choose the pgbackrestfull method to take an initial full backup. When selecting this option, we’ll see the following red note:
“During first attempt of making pgBackRest backup, ClusterControl will re-configure the node (deploys and configures pgBackRest) and after that the db node needs to be restarted first.”
So, please, take it into account for the first backup attempt.
Then we specify the use of compression and the compression level for our backup.
On the backup section, we can see the progress of the backup, and information like the method, size, location, and more.
The steps are the same to create a differential of incremental backup. We only need to choose the wanted method during the backup creation.
Restoring a Backup
Once the backup is finished, we can restore it by using ClusterControl. For this, in our backup section (ClusterControl -> Select Cluster -> Backup), we can select “Restore Backup”, or directly “Restore” on the backup that we want to restore.
We have three options to restore the backup. We can restore the backup in an existing database node, restore and verify the backup on a standalone host or create a new cluster from the backup.
If we choose the Restore on Node option, we must specify the Master node, because it’s the only one writable in the cluster.
We can monitor the progress of our restore from the Activity section in our ClusterControl.
Automatic Backup Verification
A backup is not a backup if it’s not restorable. Verifying backups is something that is usually neglected by many. Let’s see how ClusterControl can automate the verification of PostgreSQL and TimescaleDB backups and help avoid any surprises.
In ClusterControl, select your cluster and go to the “Backup” section, then, select “Create Backup”.
The automatic verify backup feature is available for the scheduled backups. So, let’s choose the “Schedule Backup” option.
When scheduling a backup, in addition to selecting the common options like method or storage, we also need to specify schedule/frequency.
In the next step, we can compress our backup and enable the “Verify Backup” feature.
To use this feature, we need a dedicated host (or VM) that is not part of the cluster.
ClusterControl will install the software and it’ll restore the backup in this host. After restoring, we can see the verification icon in the ClusterControl Backup section.
Recommendations
There are also some tips that we can take into account when creating our backups:
- Store the backup on a remote location: We shouldn’t store the backup on the database server. In case of server failure, we could lose the database and the backup at the same time.
- Keep a copy of the latest backup on the database server: This could be useful for faster recovery.
- Use incremental/differential backups: To reduce the backup recovery time and disk space usage.
- Backup the WALs: If we need to restore a database from the last backup, if you only restore it, you’ll lose the changes since the backup was taken until the restore time, but if we have the WALs we can apply the changes and we can use PITR.
- Use both Logical and Physical backups: Both are necessary for different reasons, for example, if we want to restore only one database/table, we don’t need the physical backup, we only need the logical backup and it’ll be even faster that restoring the entire server.
- Take backups from standby nodes (if it’s possible): To avoid extra load on the primary node, it’s a good practice to take the backup from the standby server.
- Test your backups: The confirmation that the backup is done is not enough to ensure the backup is working. We should restore it on a standalone server and test it to avoid a surprise in case of failure.
Conclusion
As we could see, pgBackRest is a good option to improve our backup strategy. It helps you protect your data and it could be useful to reach the RTO by reducing the downtime in case of failure. Incremental backups can help reduce the amount of time and storage space used for the backup process. ClusterControl can help automate the backup process for your PostgreSQL and TimescaleDB databases and, in case of failure, restore it with a few clicks.