
Backup PostgreSQL Using pg_dump and pg_dumpall

Joshua Otwell


Businesses and services deliver value based on data. Availability, consistent state, and durability are top priorities for keeping customers and end-users satisfied. Lost or inaccessible data can mean lost customers.

Database backups should be at the forefront of daily operations and tasks.

We should be prepared for the event that our data becomes corrupted or lost.

I’m a firm believer in an old saying I’ve heard: “It’s better to have it and not need it than to need it and not have it.”

That applies to database backups as well. Let’s face it, without them, you basically have nothing. Operating on the notion that nothing can happen to your data is a fallacy.

Most DBMSs provide some means of built-in backup utilities. PostgreSQL offers pg_dump and pg_dumpall out of the box.

Both present numerous customization and structuring options. Covering them all individually in one blog post would be next to impossible. Instead, I’ll look at the examples that apply best to my personal development/learning environment.

That being said, this blog post is not targeted at a production environment; rather, a single workstation/development environment is where it should prove most useful.

What are pg_dump and pg_dumpall?

The documentation describes pg_dump as: “pg_dump is a utility for backing up a PostgreSQL database”

And the pg_dumpall documentation: “pg_dumpall is a utility for writing out (“dumping”) all PostgreSQL databases of a cluster into one script file.”

Backing up a Database and/or Table(s)

To start, I’ll create a practice database and some tables to work with using the below SQL:

postgres=# CREATE DATABASE example_backups;
CREATE DATABASE
example_backups=# CREATE TABLE students(id INTEGER,
example_backups(# f_name VARCHAR(20),
example_backups(# l_name VARCHAR(20));
CREATE TABLE
example_backups=# CREATE TABLE classes(id INTEGER,
example_backups(# subject VARCHAR(20));
CREATE TABLE
example_backups=# INSERT INTO students(id, f_name, l_name)
example_backups-# VALUES (1, 'John', 'Thorn'), (2, 'Phil', 'Hampt'),
example_backups-# (3, 'Sue', 'Dean'), (4, 'Johnny', 'Rames');
INSERT 0 4
example_backups=# INSERT INTO classes(id, subject)
example_backups-# VALUES (1, 'Math'), (2, 'Science'),
example_backups-# (3, 'Biology');
INSERT 0 3
example_backups=# \dt
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | classes  | table | postgres
 public | students | table | postgres
(2 rows)
example_backups=# SELECT * FROM students;
id | f_name | l_name
----+--------+--------
 1 | John   | Thorn
 2 | Phil   | Hampt
 3 | Sue    | Dean
 4 | Johnny | Rames
(4 rows)
example_backups=# SELECT * FROM classes;
id | subject
----+---------
 1 | Math
 2 | Science
 3 | Biology
(3 rows)

Database and tables all set up.

To note:

In many of these examples, I’ll take advantage of psql’s \! meta-command, which lets you either drop into a shell (command-line) or execute whatever shell command follows it.

Just be aware that in a terminal or command-line session (denoted by a leading ‘$’ in this blog post), the \! meta-command should not be included in any of the pg_dump or pg_dumpall commands. Again, it is a convenience meta-command within psql.

Backing up a single table

In this first example, I’ll dump only the students table:

example_backups=# \! pg_dump -U postgres -t students example_backups > ~/Example_Dumps/students.sql

Listing out the directory’s contents, we see the file is there:

example_backups=# \! ls -a ~/Example_Dumps
.  .. students.sql

The command-line options for this individual command are:

  • -U postgres: the specified username
  • -t students: the table to dump
  • example_backups: the database

What’s in the students.sql file?

$ cat students.sql
--
-- PostgreSQL database dump
--
-- Dumped from database version 10.4 (Ubuntu 10.4-2.pgdg16.04+1)
-- Dumped by pg_dump version 10.4 (Ubuntu 10.4-2.pgdg16.04+1)
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
 
SET default_tablespace = '';
 
SET default_with_oids = false;
 
--
-- Name: students; Type: TABLE; Schema: public; Owner: postgres
--
CREATE TABLE public.students (
   id integer,
   f_name character varying(20),
   l_name character varying(20)
);
 
ALTER TABLE public.students OWNER TO postgres;
 
--
-- Data for Name: students; Type: TABLE DATA; Schema: public; Owner: postgres
--
COPY public.students (id, f_name, l_name) FROM stdin;
1 John Thorn
2 Phil Hampt
3 Sue Dean
4 Johnny Rames
\.
--
-- PostgreSQL database dump complete

We can see the file has the necessary SQL commands to re-create and re-populate table students.
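Before restoring anything, a quick grep can serve as a rough sanity check that a plain-text dump contains the statements you expect. The sketch below runs against a tiny stand-in file so it is self-contained; in practice, point grep at the real dump (e.g. ~/Example_Dumps/students.sql):

```shell
# Rough sanity check: count the DDL statements in a plain-text dump.
# A stand-in file is created here for illustration; replace "$dump"
# with the path to your actual dump file.
dump=$(mktemp)
cat > "$dump" <<'EOF'
CREATE TABLE public.students (
    id integer
);
COPY public.students (id, f_name, l_name) FROM stdin;
EOF
grep -c 'CREATE TABLE' "$dump"   # → 1
```

A count of zero would be an immediate red flag that the dump did not capture the table definition.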

But, is the backup good? Reliable and working?

We will test it out and see.

example_backups=# DROP TABLE students;
DROP TABLE
example_backups=# \dt
        List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | classes | table | postgres
(1 row)

It’s gone.

Then from the command-line pass the saved backup into psql:

$ psql -U postgres -W -d example_backups -f ~/Example_Dumps/students.sql
Password for user postgres:
SET
SET
SET
SET
SET
set_config
------------
(1 row)
 
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
COPY 4

Let’s verify in the database:

example_backups=# \dt
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | classes  | table | postgres
 public | students | table | postgres
(2 rows)
example_backups=# SELECT * FROM students;
id | f_name | l_name
----+--------+--------
 1 | John   | Thorn
 2 | Phil   | Hampt
 3 | Sue    | Dean
 4 | Johnny | Rames
(4 rows)

Table and data have been restored.

Backing up multiple tables

In this next example, we will back up both tables using this command:

example_backups=# \! pg_dump -U postgres -W -t classes -t students -d example_backups > ~/Example_Dumps/all_tables.sql
Password:

(Notice I needed to enter a password for this command due to the -W option, whereas I did not in the first example. More on this to come.)

Let’s again verify the file was created by listing out the directory contents:

example_backups=# \! ls -a ~/Example_Dumps
.  .. all_tables.sql  students.sql

Then drop the tables:

example_backups=# DROP TABLE classes;
DROP TABLE
example_backups=# DROP TABLE students;
DROP TABLE
example_backups=# \dt
Did not find any relations.

Then restore with the all_tables.sql backup file:

$ psql -U postgres -W -d example_backups -f ~/Example_Dumps/all_tables.sql
Password for user postgres:
SET
SET
SET
SET
SET
set_config
------------
(1 row)
 
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
COPY 3
COPY 4
example_backups=# \dt
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | classes  | table | postgres
 public | students | table | postgres
(2 rows)

Both tables have been restored.

As we can see with pg_dump, you can back up just one, or multiple tables within a specific database.

Backing up a database

Let’s now see how to back up the entire example_backups database with pg_dump.

example_backups=# \! pg_dump -U postgres -W -d example_backups > ~/Example_Dumps/ex_back_db.sql
Password:
 
example_backups=# \! ls -a ~/Example_Dumps
.  ..  all_tables.sql  ex_back_db.sql  students.sql

The ex_back_db.sql file is there.

I’ll connect to the postgres database in order to drop the example_backups database.

postgres=# DROP DATABASE example_backups;
DROP DATABASE

Then restore from the command-line:

$ psql -U postgres -W -d example_backups -f ~/Example_Dumps/ex_back_db.sql
Password for user postgres:
psql: FATAL:  database "example_backups" does not exist

It’s not there. Why not? And where is it?

We have to create it first.

postgres=# CREATE DATABASE example_backups;
CREATE DATABASE

Then restore with the same command:

$ psql -U postgres -W -d example_backups -f ~/Example_Dumps/ex_back_db.sql
Password for user postgres:
SET
SET
SET
SET
SET
set_config
------------
(1 row)
 
SET
SET
SET
CREATE EXTENSION
COMMENT
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
COPY 3
COPY 4
postgres=# \c example_backups
You are now connected to database "example_backups" as user "postgres".
example_backups=# \dt
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | classes  | table | postgres
 public | students | table | postgres
(2 rows)

Database and all tables present and accounted for.

We can avoid this scenario of having to create the target database first by including the -C option when taking the backup.

example_backups=# \! pg_dump -U postgres -W -C -d example_backups > ~/Example_Dumps/ex_back2_db.sql
Password:

I’ll reconnect to the postgres database and drop the example_backups database so we can see how the restore works now. (The connect and DROP commands are not shown, for brevity.)

Then on the command-line (notice no -d dbname option included):

$ psql -U postgres -W -f ~/Example_Dumps/ex_back2_db.sql
Password for user postgres:
(...partway through the output, we see:)
CREATE DATABASE
ALTER DATABASE
Password for user postgres:
You are now connected to database "example_backups" as user "postgres".
SET
SET
SET
SET
SET
set_config
------------
(1 row)
 
SET
SET
SET
CREATE EXTENSION
COMMENT
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
COPY 3
COPY 4

Using the -C option, we are prompted for a password to make a connection as mentioned in the documentation concerning the -C flag:

“Begin the output with a command to create the database itself and reconnect to the created database.”

Then in the psql session:

postgres=# \c example_backups
You are now connected to database "example_backups" as user "postgres".

Everything is restored, good to go, and without the need to create the target database prior to the restore.

pg_dumpall for the entire cluster

So far, we have backed up a single table, multiple tables, and a single database.

But if we want more than that, for instance backing up the entire PostgreSQL cluster, that’s where we need to use pg_dumpall.

So what are some notable differences between pg_dump and pg_dumpall?

For starters, here is an important distinction from the documentation:

“Since pg_dumpall reads tables from all databases, you will most likely have to connect as a database superuser in order to produce a complete dump. Also, you will need superuser privileges to execute the saved script in order to be allowed to add users and groups and to create databases.”

Using the below command, I’ll back up my entire PostgreSQL cluster and save it in the entire_cluster.sql file:

$ pg_dumpall -U postgres -W -f ~/Example_Dumps/Cluster_Dumps/entire_cluster.sql
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:

What on earth? Are you wondering whether I had to enter a password at each prompt?

Yep, sure did. 24 times.

Count ’em. (Hey, I like to explore and delve into different databases as I learn. What can I say?)

But why all the prompts?

First of all, after all that hard work, did pg_dumpall create the backup file?

postgres=# \! ls -a ~/Example_Dumps/Cluster_Dumps
.  .. entire_cluster.sql

Yep, the backup file is there.

Let’s shed some light on all that ‘typing practice’ by looking at this passage from the documentation:

“pg_dumpall needs to connect several times to the PostgreSQL server (once per database). If you use password authentication it will ask for a password each time.”

I know what you’re thinking.

This may not be ideal or even feasible. What about processes, scripts, or cron jobs that run in the middle of the night?

Is someone going to hover over the keyboard, waiting to type?

Probably not.

One effective measure to prevent facing those repeated password prompts is a ~/.pgpass file.

Here is the syntax the ~/.pgpass file requires to work (example from the documentation):

hostname:port:database:username:password
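As a sketch, creating such a file might look like the following. The hostname, port, and password here are placeholders, and a temp file stands in for the real ~/.pgpass; note that libpq ignores the file unless it is readable only by its owner:

```shell
# Sketch of creating a pgpass file (placeholder values; the real file
# lives at ~/.pgpass). Format: hostname:port:database:username:password,
# where '*' matches any value in that field.
pgpass_demo=$(mktemp)
cat > "$pgpass_demo" <<'EOF'
localhost:5432:*:postgres:secret_password
EOF
# libpq ignores the file if it is group- or world-readable:
chmod 600 "$pgpass_demo"
```

With the permissions set to 0600 and the file in place at ~/.pgpass, password-prompting connections pick up the matching credentials automatically.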

With a ~/.pgpass file present in my development environment, containing the necessary credentials for the postgres role, I can omit the -W (also -w) option and run pg_dumpall without manually authenticating with the password:

$ pg_dumpall -U postgres -f ~/Example_Dumps/Cluster_Dumps/entire_cluster2nd.sql

Listing out the directory contents:

postgres=# \! ls -a ~/Example_Dumps/Cluster_Dumps
.  .. entire_cluster2nd.sql  entire_cluster.sql

The file is created and no repeating password prompts.

The saved file can be reloaded with psql similar to pg_dump.

The connection database is less critical as well according to this passage from the documentation: ”It is not important to which database you connect here since the script file created by pg_dumpall will contain the appropriate commands to create and connect to the saved databases.”


pg_dump, pg_dumpall, and shell scripts – A Handy Combination

In this section, we will see a couple of examples of incorporating pg_dump and pg_dumpall into simple shell scripts.

To be clear, this is not a shell script tutorial, nor am I a shell script guru. I’ll mainly provide a couple of examples I use in my local development/learning environment.

Up first, let’s look at a simple shell script you can use to backup a single database:

#!/bin/bash
# This script performs a pg_dump, saving the file to the specified directory.
# The first arg ($1) is the database user to connect with.
# The second arg ($2) is the database to backup and is included in the file name.
# $(date +"%Y_%m_%d") includes the current system date into the actual file name.

pg_dump -U "$1" -W -C -d "$2" > ~/PG_dumps/Dump_Scripts/"$(date +"%Y_%m_%d")_$2".sql

As you can see, this script accepts 2 arguments: the first one is the user (or role) to connect with for the backup, while the second is the name of the database you want to back up.

Notice the -C option in the command, so that we can restore even if the database happens to be non-existent, without the need to create it manually beforehand.

Let’s call the script with the postgres role for the example_backups database (Don’t forget to make the script executable with at least chmod +x prior to calling for the first time):

$ ~/My_Scripts/pgd.sh postgres example_backups
Password:

And verify it’s there:

$ ls -a ~/PG_dumps/Dump_Scripts/
.  .. 2018_06_06_example_backups.sql

Restoration is performed with this backup script as in the previous examples.

A similar shell script can be used with pg_dumpall for backing up the entire PostgreSQL cluster.

This shell script will pipe (|) pg_dumpall into gzip, which is then directed to a designated file location:

#!/bin/bash
# This shell script calls pg_dumpall and pipes into the gzip utility, then directs to
# a directory for storage.
# $(date +"%Y_%m_%d") incorporates the current system date into the file name.
 
pg_dumpall -U postgres | gzip > ~/PG_dumps/Cluster_Dumps/$(date +"%Y_%m_%d")_pg_bck.gz

Unlike the previous example script, this one does not accept any arguments.

I’ll call this script on the command-line (no password prompt, since the postgres role utilizes the ~/.pgpass file – see the section above):

$ ~/My_Scripts/pgalldmp.sh

Once complete, I’ll list the directory contents, showing file sizes for comparison between the .sql and .gz files:

postgres=# \! ls -sh ~/PG_dumps/Cluster_Dumps
total 957M
 37M 2018_05_22_pg_bck.gz   32M 2018_06_06_pg_bck.gz  445M entire_cluster2nd.sql  445M entire_cluster.sql
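Restoring from one of these gzipped plain-SQL dumps does not require an intermediate file: gunzip -c streams the decompressed SQL to stdout, which can be piped straight into psql. A sketch (hypothetical file name, assuming ~/.pgpass handles authentication):

```shell
# Restore a gzipped plain-SQL dump by streaming it into psql
# (hypothetical file name; requires a running server):
#   gunzip -c ~/PG_dumps/Cluster_Dumps/2018_06_06_pg_bck.gz | psql -U postgres
# Demonstrating that the gzip round trip is lossless, with a stand-in file:
demo=$(mktemp)
printf 'SELECT 1;\n' | gzip > "$demo"
gunzip -c "$demo"   # → SELECT 1;
```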

A note on archive formats from the pg_dump docs (this applies to pg_dump’s own archive formats, such as the custom format, rather than to a gzip-compressed plain-SQL dump like the ones above, which psql can read once decompressed):

“The alternative archive file formats must be used with pg_restore to rebuild the database.”
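Since the ~/.pgpass file removes the password prompts, scripts like these also lend themselves to unattended scheduling. As a sketch, a crontab entry could run the cluster backup nightly at 2:00 AM (the path is from this post’s example; adjust it to your own script, and note that most cron implementations set HOME for the job):

```shell
# m h  dom mon dow  command
0 2 * * * "$HOME"/My_Scripts/pgalldmp.sh
```

Add an entry like this with crontab -e as the OS user that owns the ~/.pgpass file.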

Summary

I have assembled key points from the documentation on pg_dump and pg_dumpall, along with my observations, to close out this blog post:

Note: Points provided from the documentation are in quotes.

  • “pg_dump only dumps a single database”
  • The plain-text SQL file format is the default output for pg_dump.
  • A role needs the SELECT privilege to run pg_dump according to this line in the documentation: “pg_dump internally executes SELECT statements. If you have problems running pg_dump, make sure you are able to select information from the database using, for example, psql”
  • To include the necessary DDL CREATE DATABASE command and a connection in the backup file, include the -C option.
  • -W: This option forces pg_dump to prompt for a password. This flag is not necessary since if the server requires a password, you are prompted anyway. Nevertheless, this passage in the documentation caught my eye so I thought to include it here: “However, pg_dump will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.”
  • -d: Specifies the database to connect to. Also in the documentation: ”This is equivalent to specifying dbname as the first non-option argument on the command line.”
  • Utilizing flags such as -t (table) allows users to back up portions of a database, namely the tables they have access privileges for.
  • Backup file formats can vary. However, .sql files are a great choice among others. Backup files are read back in by psql for a restore.
  • pg_dump can back up a running, active database without interfering with other operations (i.e., other readers and writers).
  • One caveat: pg_dump does not dump roles or other cluster-wide objects such as tablespaces; it covers only a single database.
  • To take backups on your entire PostgreSQL cluster, pg_dumpall is the better choice.
  • pg_dumpall can handle the entire cluster, backing up information on roles, tablespaces, users, permissions, etc… where pg_dump cannot.
  • Chances are, a role with SUPERUSER privileges will have to perform the dump, since reading all tables in all databases requires that level of access; the same goes for executing the saved script through psql on restore, which must create roles and databases.

My hope is that through this blog post, I have provided adequate examples and details for a beginner-level overview of pg_dump and pg_dumpall in a single development/learning PostgreSQL environment.

Although all available options were not explored, the official documentation contains a wealth of information and examples for both utilities, so be sure to consult that resource for further study, questions, and reading.
