There are many ways to take backups of a PostgreSQL cluster, and several articles and blogs present the various technologies by which we can save our precious data in PostgreSQL. There are logical backup solutions, physical backups at the OS level, at the filesystem level, and so forth. In this blog we are not going to cover the theoretical part, which is adequately covered by various blogs and articles as well as the official documentation.
This blog focuses on the state of the various tools and solutions available and tries to present a thorough comparison based on real-life experiences. This article in no way tries to promote any specific product; I really like all the tools, solutions and technologies described in this blog. The aim here is to note down their strengths and weaknesses, and to guide the end user as to which tool would best fit his/her environment, infrastructure and specific requirements. Here is a nice article describing backup tools for PostgreSQL at various levels.
I will not describe how to use the various tools in this blog, since this info is documented in the above blog and also in the official docs, as well as in other resources over the net. But I will describe the pros and cons as I experienced them in practice. In this blog, we deal exclusively with classic PITR-based physical PostgreSQL backups, which depend on:
- pg_basebackup or pg_start_backup()/pg_stop_backup()
- physical copy
- archiving of WALs or streaming replication
There are several fine products and solutions, some are open source and free to use while others are commercial. To the best of my knowledge, those are:
- pgbarman by 2ndQuadrant (free)
- pgbackrest (free)
- pg_probackup by Postgres Professional (free)
- BART by EDB (commercial)
I did not have the chance to try out BART since it runs on flavors of Linux that I don’t use. In this blog, I will include my own thoughts and impressions from interacting with the respective authors/community/maintainers of each solution, since this is a very important aspect which is usually underestimated in the beginning. A little bit of terminology, in order to better understand the various terms used by each of the tools:
| | barman | pgbackrest | pg_probackup |
|---|---|---|---|
| name for backup site location | catalog | repository | catalog |
| name for cluster | server | stanza | instance |
Pgbarman (or just barman) is the oldest of these tools. The latest release is 2.6 (released while I had this blog in the works, which is great news).
Pgbarman supports base backup via two methods:
- pg_basebackup (backup_method=postgres)
- rsync (backup_method=rsync)
and WAL transfer via:
- WAL archiving
  - via rsync
  - via barman-wal-archive / put-wal
- WAL streaming replication with a replication slot
This gives us 8 out-of-the-box combinations by which we can use barman. Each has its pros and cons.
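As a rough illustration, a minimal barman.conf entry for the pg_basebackup-based method with WAL streaming might look like the following sketch (the server name pg, host names and user names are hypothetical):

```
[pg]
description = "our production cluster"
; connection used for backups and monitoring
conninfo = host=pgserver user=barman dbname=postgres
; base backup taken with pg_basebackup
backup_method = postgres
; WALs received via streaming replication into a slot
streaming_conninfo = host=pgserver user=streaming_barman
streaming_archiver = on
slot_name = barman
```

Switching `backup_method` to `rsync` (plus an `ssh_command`) selects the other base backup method discussed below.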
Base backup via pg_basebackup (backup_method = postgres)
- the newest/modern way
- relies on proven core PostgreSQL technology
- recommended by the official docs
- no incremental backup
- no parallel backup
- no network compression
- no data deduplication
- no network bandwidth limit
Base backup via rsync (backup_method = rsync)
- old and proven
- Incremental backup
- data deduplication
- network compression
- parallel backup
- network bandwidth limit
- not the recommended (by the authors) way
WAL transfer via WAL archiving (via rsync)
- simpler to set up
- No RPO=0 (zero data loss)
- no way to recover from long and persisting network failures
WAL transfer via WAL archiving (via barman-wal-archive / put-wal)
- the latest and recommended way (introduced in 2.6)
- more reliable/safe than rsync
- No RPO=0 (zero data loss)
- still no way to recover from long and persisting network failures
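For instance, on the PostgreSQL side the archiving setup with barman-wal-archive would then look something like this sketch (the barman host backup-srv and server name pg are hypothetical):

```
# postgresql.conf
archive_mode = on
archive_command = 'barman-wal-archive backup-srv pg %p'
```

Compared to plain rsync in archive_command, barman-wal-archive hands the WAL to barman's own put-wal machinery, which is the safer transfer path recommended since 2.6.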
WAL transfer via WAL streaming with replication slot (via pg_receivewal)
- more modern (and recommended)
- RPO=0 (zero data loss) in synchronous mode
- always associated with a replication slot, which could cause pg_wal on the primary to grow in case of network failures
So, while pg_basebackup (the postgres method) seems like the future for pgbarman, in reality all the fancy features come with the rsync method. So let us list all the features of barman in more detail:
- Remote operation (backups/restores)
- Incremental backups. One of the great features of barman, incremental backups are based on a file-level comparison of the database files against those of the last backup in the catalog. In barman, the term “differential” refers to a different concept: by barman terminology, a differential backup is the last backup plus the individual changes from the last backup, and the barman docs say that they provide differential backups via WALs. Barman incremental backups work at the file level, which means that if a file has changed, the whole file is transferred. This is like pgbackrest, and unlike some other offerings such as pg_probackup or BART, which support block-level differential/incremental backups. Barman incremental backups are specified via reuse_backup = link or copy. By defining “copy” we achieve a reduced backup time, since only the changed files are transferred and backed up, but still no reduction in space, since the unchanged files are copied from the previous backup. By defining “link”, the unchanged files are hard-linked (not copied) from the last backup; this way we achieve both time reduction and space reduction. I don’t want in any way to bring more confusion into this, but in reality barman incremental backups are directly comparable with pgbackrest incremental backups, since barman treats (via link or copy) an incremental backup effectively as a full backup. So in both systems, an incremental backup deals with the files which were changed since the last backup. However, a differential backup means a different thing in each of the aforementioned systems, as we’ll see below.
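As a sketch, incremental rsync-based backups with hard links are enabled like so (the server name pg is hypothetical):

```
[pg]
backup_method = rsync
ssh_command = ssh postgres@pgserver
; hard-link unchanged files from the previous backup:
; saves both backup time and disk space
reuse_backup = link
; rsync method also allows parallelism
parallel_jobs = 4
```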
- Backup from standby. Barman gives the option to perform the bulk of the base backup operations from a standby, thus freeing the primary from the added IO load. However, note that the WALs must still come from the primary. Whether you use archive_command or WAL streaming via replication slots, you cannot yet (as of this writing, with barman at version 2.6) offload this task to the standby.
- parallel jobs for backup and recovery
- A rich and comprehensive set of retention settings based on either:
- Redundancy (number of backups to keep)
- Recovery window (how far back in the past the backups should be kept)
In my opinion, from a user perspective the above is great. The user may define reuse_backup = link and a recovery window, and let barman (its cron job) deal with the rest: no diff/incr/full backups to worry about, schedule or manage. The system (barman) just does the right thing transparently.
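A sketch of such a configuration, assuming a four-week recovery window (server name hypothetical):

```
[pg]
reuse_backup = link
; barman's cron job expires anything not needed to
; recover to any point within the last 4 weeks
retention_policy = RECOVERY WINDOW OF 4 WEEKS
```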
- Programming your own pre/post event hook scripts.
- Tablespace remapping
These are barman's main strengths. And truly, this is almost more than the average DBA would ask of a backup and recovery tool. However, there are some points that could be improved:
- The mailing list is not so active and the maintainers rarely write or answer questions
- No feature to resume a failed/interrupted backup
- Replication slots, and the use of rsync/barman-wal-archive for archiving, are not forgiving in the case of network failures or other failures of the backup site. In either case, if the network outage lasts long enough, and the changes in the DB produce a lot of WAL files, then the primary will suffer from “no space left on device” and will eventually crash (not a good thing). What is promising here is that barman now provides an alternative (to rsync) way to transfer WALs, so additional protection against e.g. pg_wal space exhaustion might be implemented in the future; this, along with backup resume, would truly make barman perfect, at least for me.
Pgbackrest is the current trend among the open source backup tools, mainly because of its efficiency in coping with very large volumes of data and the extreme care its creators put into validation of backups via checksums. As of this writing it is at version v2.09, and the docs are to be found here. The User Guide might be slightly outdated, but the rest of the docs are very up to date and accurate. Pgbackrest relies on WAL archiving, using its own archive_command and its own file transfer mechanism, which is better and safer than rsync. So pgbackrest is pretty straightforward, since it does not offer the larger set of choices that barman provides. Since there is no synchronous mode involved, naturally pgbackrest does not guarantee RPO=0 (zero data loss). Let us describe pgbackrest’s concepts:
- A backup can be:
- Full. A full backup copies the entire database cluster.
- Differential (diff). A differential backup copies only the files that were changed since the last full backup. For a successful restore, both the differential backup and the previous full backup must be valid.
- Incremental (incr). An incremental backup copies only the files that were changed since the last backup (which may be a full backup, a differential or even an incremental one). Similarly to the differential backup, in order to do a successful restore, all previous required backups (including this backup, the latest diff and the previous full) must be valid.
- A stanza is the definition of all required parameters of a PostgreSQL cluster. A PostgreSQL server normally has its own stanza, whereas backup servers will have one stanza for every PostgreSQL cluster that they back up.
- A configuration is where information about stanzas is kept (usually /etc/pgbackrest.conf)
- A repository is where pgbackrest keeps WALs and backups
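A minimal sketch of such a configuration on the backup server, assuming a stanza named main and hypothetical paths/hosts (option names per pgbackrest v2):

```
# /etc/pgbackrest.conf
[global]
# the repository: where backups and archived WALs are kept
repo1-path=/var/lib/pgbackrest

# one stanza per PostgreSQL cluster being backed up
[main]
pg1-host=pgserver
pg1-path=/var/lib/postgresql/data
```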
The user is encouraged to follow the documentation as the documentation itself suggests, from the top to the bottom. The most important features of pgbackrest are:
- Parallel backup and restore
- No direct SQL access to the PostgreSQL server needed
- Local/Remote operation
- Retention based on:
- full backup retention (numbers of full backups to keep)
- diff backup retention (numbers of diff backups to keep)
Incremental backups don’t have their own retention and are expired as soon as a prior backup expires. So the user can define a schedule for taking full backups and a rolling set of diff backups between them.
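A sketch of such a retention setup, assuming we keep 2 full and 3 differential backups (option names per pgbackrest v2):

```
[global]
# keep the 2 most recent full backups (and their dependents)
repo1-retention-full=2
# keep the 3 most recent differential backups
repo1-retention-diff=3
```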
- Backup from standby. Some files still need to come from the primary but the bulk copy takes place on the standby. Still WALs must originate from the primary.
- Backup integrity. The people behind pgbackrest are extremely careful when it comes to the integrity of the backups. Each file is checksummed at backup time and also is checked after the restore to make sure that no problematic hardware or software bug may result in a faulty restore. Also if page level checksums are enabled on the PostgreSQL cluster then they are also computed for each file. In addition, checksums are computed for every WAL file.
- If compression is disabled and hard links are enabled it is possible to bring up the cluster directly from the catalog. This is extremely important for multi TB large databases.
- Resume of a failed/interrupted backup. Very handy in case of unreliable networks.
- Delta restore: Ultra fast restore for large databases, without cleaning the whole cluster.
- Asynchronous & Parallel WAL push to the backup server. This is one of the strongest points of pgbackrest. The PostgreSQL archiver only copies to the spool via archive-push and the heavy job of the transfer and the processing happens in a separate pgbackrest process. This allows for massive WAL transfers while ensuring low PostgreSQL response times.
- Tablespace remapping
- Amazon S3 support
- Support for a max WAL queue size. When the backup site is down or the network is failing, using this option will pretend that archiving was successful, allowing PostgreSQL to delete WALs, preventing pg_wal from filling up, and thus saving the pgsql server from a potential PANIC.
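A sketch combining asynchronous archiving with a bounded WAL queue (paths, stanza name and queue size are hypothetical; option names per pgbackrest v2), with `archive_command = 'pgbackrest --stanza=main archive-push %p'` on the PostgreSQL side:

```
# /etc/pgbackrest.conf on the PostgreSQL host
[global]
# archiver only copies WAL to the spool; a separate
# pgbackrest process does the heavy transfer work
archive-async=y
spool-path=/var/spool/pgbackrest
# if the repo is unreachable, pretend archiving succeeded and
# drop WALs once the queue exceeds this size, so pg_wal cannot fill up
archive-push-queue-max=4GiB

[main]
pg1-path=/var/lib/postgresql/data
```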
So, feature-wise, pgbackrest puts a lot of emphasis on data validation and performance; no surprise that it is used by the biggest and busiest PostgreSQL installations. However, there is one thing that could be improved:
- It would be really handy to have a more “liberal” option as far as retention is concerned, i.e. provide a way to declaratively specify some retention period and then let pgbackrest deal with full/diff/incr backups as needed.
Pg_probackup is another promising tool for backups. It is originally based on pg_arman. Its emphasis is on the performance of the backup. It is based on catalogs and instances, very similar to the rest of the tools. Its main features include:
- Rich backup-level support ranging from:
- Full backups
- Incremental of three types:
- PAGE backup. Page-level changes found via WAL scanning. Requires full access to the uninterrupted WAL sequence since the previous backup.
- DELTA backup. Only changed pages are copied to the backup. Independent of WAL archiving, puts a certain load on the server.
- PTRACK backup. Requires special pgsql core patching. Works by maintaining a bitmap on the fly as soon as pages are modified. Really fast backup with minimal load on the server.
- Backups can also be divided into:
- Autonomous backups. Those have no requirements on WAL outside the backup. No PITR.
- Archive backups. Those rely on continuous archiving, and support PITR.
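As an illustrative sketch (the catalog path, data directory and instance name are hypothetical), setting up and taking such backups might look like:

```
# one-time setup of the backup catalog and the instance
pg_probackup init -B /backup_catalog
pg_probackup add-instance -B /backup_catalog -D /var/lib/postgresql/data --instance main

# a full archive backup, then a DELTA incremental
pg_probackup backup -B /backup_catalog --instance main -b FULL
pg_probackup backup -B /backup_catalog --instance main -b DELTA

# an autonomous backup: WALs are streamed into the backup itself,
# no dependence on WAL archiving (and no PITR)
pg_probackup backup -B /backup_catalog --instance main -b FULL --stream
```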
- multithreaded model (in contrast to barman, pgbackrest and of course PostgreSQL itself which follow a multiprocess model)
- Data consistency and on demand validation without restore
- Backup from a standby without access to the primary.
- An expressive retention policy specification, where redundancy can be used in an AND fashion along with a window. Merging backups (via merge) is supported: previous incremental backups are converted to full, as a way to free space and to provide a method for smooth backup rotation.
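A sketch of applying such a policy (catalog path, instance name and values are hypothetical):

```
# keep at least 2 full backups AND everything needed
# to recover to any point within the last 7 days
pg_probackup delete -B /backup_catalog --instance main \
    --delete-expired --retention-redundancy=2 --retention-window=7
```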
So, pg_probackup provides a set of great features with emphasis on performance something which would benefit large installations. However, there are still some things missing, namely:
- No official release supports remote backups. This means that pg_probackup must run on the same host as the pgsql cluster. (There is a dev branch which deals with backup from a remote site as well as archiving to a remote backup server)
- No support for a failed backup resume.
We can summarize the above paragraphs in a feature matrix like the one below.

| | barman | pgbackrest | pg_probackup |
|---|---|---|---|
| Zero data loss | YES (sync WAL streaming) | NO | NO |
| WAL via archiving | YES | YES | YES |
| WAL archiving method | rsync / barman-wal-archive | pgbackrest over ssh | pg_probackup archive-push |
| Async WAL archiving | NO | YES | NO |
| WAL via streaming | YES | NO | YES (only for autonomous) |
| backup from standby | YES | YES | YES |
| WALs from standby | NO | NO | YES |
| backup exclusively from standby | NO | NO | YES |
| diff backups (from last full) | NO | YES | YES (via merge) |
| incr backups (from last backup) | YES | YES | YES (same as above) |
| transparent backups rotation | YES | NO | YES (via merge) |
| Resume failed backup | NO | YES | NO |
| Resilience during network/recovery site failure (pg_wal related) | NO | YES | N/A (remote operation not yet released) |
| retention based on | Redundancy OR Window | Full And/Or Diff Redundancy | Redundancy AND Window |
| Help from community/maintainers/authors | limited | responsive | responsive |
ClusterControl provides the functionality either to generate an immediate backup or to schedule one, and automate the task in a simple and fast way.
We can choose between two backup methods, pg_dump (logical) and pg_basebackup (binary). We can also specify where to store the backups (on the database server, on the ClusterControl server, or in the cloud), the compression level, and encryption.