Logical & Physical PostgreSQL Backups

Krzysztof Ksiazek

Taking backups and having a recovery plan is one of the most important tasks of a DBA - they are crucial to the availability and integrity of the data. You can build replication clusters spanning across multiple datacenters and this helps you handle failure of a node (one or more), but only a backup can help you after an incidental DELETE or DROP. In this blog post we will discuss some backup options available for PostgreSQL.

Backup Types

There are two main backup types - logical and physical backups. This is quite important to understand before you design your backup strategy. The backup type also determines which tools you will be using.

Logical Backups

Logical backups typically contain data stored in plain text, in human-readable format or it can be easily converted to such format. This is quite useful because, if needed, you can extract even a single row and restore it. On the other hand, restoring an entire logical backup can be quite slow - the database has to process all of the queries required to load the data and this will always be slower than to just copy the database files.

Physical Backups

Physical backups grab the data as it is - in the end, the data is just a bunch of files stored on some disk. You can copy the data files and have a copy of your database at a given point in time. In real life though, it is not as easy as running a cp command. When taking a physical backup you need to keep in mind that there might be database changes in the meantime. To ensure your backup can be safely used, any physical backup has to be consistent.

Physical backups are great for recovering whole data sets - the speed is limited by hardware: disk and sometimes network (if you copy the data over the network).

Backups in PostgreSQL

PostgreSQL, like many other databases, supports both physical and logical backups. Let’s take a look how you can setup them up.

Logical Backup in PostgreSQL

PostgreSQL comes with a tool called pg_dump - it’s a utility designed to perform logical backups on the PostgreSQL database. It supports several output formats: by default it stores data in plain text files but it can also produce tar file or a custom binary format, compressed, which can be restored using pg_restore (you can easily extract SQL contents from this format). Let’s see how we can use pg_dump to create a backup of our PostgreSQL server.

First, backup in default mode, plain text SQL:

[email protected]:~/c# pg_dump -d pgbench -U psql -h > pgbench.sql

Then, in the output file, you can see data stored like this:

COPY pgbench_accounts (aid, bid, abalance, filler) FROM stdin;
1       1       0
2       1       0
3       1       0
4       1       0
5       1       0
6       1       0
7       1       0
8       1       0
9       1       0
10      1       0
11      1       0
12      1       0
13      1       0
14      1       0

If you feel like using some other output format, you need to specify it explicitly:

[email protected]:~# pg_dump -d pgbench -U psql -h --format c  -f /root/c/pgdump.c

Restoring data is fairly simple. If you do have data in default, SQL format, just use psql to load it into the database. If you used one of the other formats, you can use pg_restore to load it. If you still have the old data and you want to recover the full database, you may want to drop the existing database and then recreate it. You can also use the following flats:

  -c, --clean                  clean (drop) database objects before recreating
  --if-exists                  use IF EXISTS when dropping objects

While dumping the data, this makes sure that all existing objects will be overwritten.

postgres=# drop database pgbench;
postgres=# create database pgbench;
[email protected]:~# pg_restore --format c /root/c/pgdump.c -d pgbench -U psql -W -h

Please keep in mind that pg_dump generates consistent backups - it uses a single transaction to dump the data so it is relying on MVCC to keep its session separated and not affected by transactions opened afterwards.

Physical Backup in PostgreSQL

Physical backups are all-or-nothing - you take a full copy of the database cluster and you can restore all of it. The challenge is to make the backup consistent. Of course, the easiest way to do that would be to stop PostgreSQL completely and then make a copy of the data using one of the typical Unix tools like cp, scp or rsync. Unfortunately, it is not often that you have the possibility to shutdown your database. Therefore you need to take some precautions to ensure your backup is consistent. To take manual, consistent, physical backups, following steps are required:

  1. You need to create a checkpoint using pg_start_backup(‘some label’)
  2. Copy contents of the data directory
  3. Stop backup using pg_stop_backup()

This can be simplified by running pg_basebackup:

[email protected]:~# pg_basebackup -U psql -h -x -D /pgbase/

If you want to have Point-In-Time-Recovery, you need to configure WAL archiving by enabling it:

wal_level = archive
archive_mode = on

You also want to define archive_command in a way it will copy WAL segments to a separate location.

Restoring base backup is easy - copy the data into clear PostgreSQL data directory and you are all set - once you start the PostgreSQL process, you will restore your data up to the time of the backup. To recover your PostgreSQL to point-in-time, using archived WAL logs, you still need to recover the base backup but then you need to create a recovery.conf file with, at least, a restore_command defined - you want to tell PostgreSQL how it can copy back archived WAL segments to be replayed at the start of the PostgreSQL process.

Download the Whitepaper Today
PostgreSQL Management & Automation with ClusterControl
Learn about what you need to know to deploy, monitor, manage and scale PostgreSQL

PostgreSQL Backups in ClusterControl

As of ClusterControl 1.4.2, only pg_dump backups are supported. As with other supported database types, you can create ad-hoc backups or create a backup schedule. Below you can see the dialog related to creating ad-hoc backups.

As you can see in the screenshot above, the backup can be streamed to the controller node using netcat. You can also pick a node to backup from or let ClusterControl to pick one automatically. There’s a also an option to enable compression.

If you’d like to create a backup schedule, it’s very similar to the screen above.

You can define when the backup should be taken and what to do if a backup node is not available - should ClusterControl pick another available node to take a backup from, and which node should it be?

Of course, if you have a backup, you need to be able to restore it.

As you can see in the screenshot above, you can pick the node on which the backup should be restored.

We hope you enjoyed this short blog on different backup scenarios for PostgreSQL. If you’d like to see how easy is to schedule backups of your PostgreSQL servers using ClusterControl, feel free to check it out.

The only management system you’ll ever need to take control of your open source database infrastructure.