blog
Upgrading PostgreSQL 11 to PostgreSQL 13 with TimescaleDB and PostGIS in Linux using pg_upgrade
Businesses and enterprises using old versions of PostgreSQL (PG) face challenges when upgrading to at least the most recent stable version from PostgreSQL 12 or PostgreSQL 13. There are many reasons why upgrading to the latest version is a must. Some of the main reasons for this are to take advantage of its critical enhancements to its built-in functionalities, security updates, performance improvements, and new implementations beneficial for database management.
Upgrading PostgreSQL comes with a few challenges as it’s not as easy compared to other mainstream databases. If you are facing this type of problem, don’t worry. PostgreSQL does not lock you up on a specific version to use. In this blog, we’ll walk through an example of this challenge while having a TimescaleDB and PostGIS installed on an existing PostgreSQL 11 host.
Why pg_upgrade?
pg_upgrade has been around for a very long time as a tool for upgrading major versions of PostgreSQL. Using this tool is not required for minor version upgrades, which means that upgrading your current version of 11.9 to 11.13 is not necessary.
When upgrading your PostgreSQL to a major version with pg_upgrade, the tool works by allowing the data stored in PostgreSQL data files to be upgraded to a later PostgreSQL major version. This works without needing a data dump/reload, which can take some time if you have a big dataset.
Now, here comes the fuss. PostgreSQL, especially for major version releases, is known to have new features added that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files. If a future major release ever changes the data storage format in a way that makes the old data format unreadable, pg_upgrade will not be usable for such upgrades. (The community will attempt to avoid such situations.)
Some may consider pg_upgrade to be dangerous, especially for the production environment. Well, this tool has been widely used elsewhere from QA, to dev, to production environments. It does have its limitations or caveats, such as the known Unicode or character sets stored in your dataset. In that case, you might consider using pg_dump/pg_restore, but it can take some time to finish depending on how large your data is. For newer versions of PostgreSQL, such as PG 14.0, you can only take a dump/restore (or export/import) or logical replication, otherwise use pg_upgrade.
For larger datasets, using pg_upgrade requires you to run this on the same host, which by default applies a copy of all your physical files from your data directory. In that case, pg_upgrade supports the -k or –link option, which means it will use hard links instead of copying files to the new cluster.
pg_upgrade does its best to make sure the old and new clusters are binary-compatible, e.g., by checking for compatible compile-time settings, including 32/64-bit binaries. It is also important that any external modules are binary compatible, though this cannot be checked by pg_upgrade.
pg_upgrade supports upgrades from 8.4.X and later to the current major release of PostgreSQL, including snapshot and beta releases.
Here’s the situation…
In this setup, I used ClusterControl to deploy a PostgreSQL 11 database cluster for a single node. The following have been tested on Centos 7 and Ubuntu Focal (20.04.1):
$ /usr/pgsql-11/bin/postgres --version
postgres (PostgreSQL) 11.13
postgres=# dx
List of installed extensions
Name | Version | Schema | Description
------------------------+---------+------------+-------------------------------------------------------------------
fuzzystrmatch | 1.1 | public | determine similarities and distance between strings
pg_stat_statements | 1.6 | public | track execution statistics of all SQL statements executed
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgis | 3.1.4 | public | PostGIS geometry and geography spatial types and functions
postgis_raster | 3.1.4 | public | PostGIS raster types and functions
postgis_sfcgal | 3.1.4 | public | PostGIS SFCGAL functions
postgis_tiger_geocoder | 3.1.4 | tiger | PostGIS tiger geocoder and reverse geocoder
postgis_topology | 3.1.4 | topology | PostGIS topology spatial types and functions
timescaledb | 2.3.1 | public | Enables scalable inserts and complex queries for time-series data
(9 rows)
So I got the following,
PostgreSQL server version: 11.13
TimescaleDB version: 2.3.1
PostGIS version: 3.1.4
If you want to test this with ClusterControl, there are two ways to have TimescaleDB. You can deploy a TimescaleDB cluster or have PostgreSQL and enable the TimescaleDB plugin.
Setting up for your PostgreSQL 13
Using your package manager setup for the Linux environment with your PostgreSQL and TimescaleDB repository is easier. Here are the steps to do so:
Setup the required repositories
First, let’s add the PostgreSQL repository.
For CentOS/RHEL/Oracle Linux
You have to make sure that you have the right repository. For Enterprise Linux (EL) 7, you can do the following:
sudo yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
For other architecture, you can base here https://download.postgresql.org/pub/repos/yum/reporpms/ and replace the EL-7-x86_64 subdirectory.
Let’s add the TimescaleDB repository as well.
vi /etc/yum.repos.d/timescale_timescaledb.repo
Then add the following contents for this file,
[timescale_timescaledb]
name=timescale_timescaledb
baseurl=https://packagecloud.io/timescale/timescaledb/el/7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/timescale/timescaledb/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300
Just replace the baseurl accordingly if you are using a version other than EL 7.
For Ubuntu/Debian
Add the PG repository for Ubuntu Focal:
deb http://apt.postgresql.org/pub/repos/apt/ focal-pgdg main
For other Ubuntu/Debian distributions, just replace the focal accordingly, which you can find here http://apt.postgresql.org/pub/repos/apt/dists/. For example, replace focal-pgdg with buster-pgdg.
Now, let’s add the repository for TimescaleDB,
sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/timescale.keyring] https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -c -s) main' > /etc/apt/sources.list.d/timescaledb.list"
Import the keyring,
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/timescale.keyring
and update the package lists for upgrades for packages that need upgrading, as well as new packages that have just come to the repositories.
sudo apt-get update
You can replace the sub-directory in the URL if you are using Debian from ubuntu.
Now that we have the repository ready, we’re good to go.
Install PostgreSQL version 13 with TimescaleDB and PostGIS
Installing PostgreSQL 13 can be done on the same host. First, you must make sure things such as the database port are unique. In other words, it has to be different from the current PostgreSQL 11 installed on the same host.
For CentOS/RHEL/Oracle Linux
Run the command below to install PostgreSQL 13 and its dependent packages:
yum install postgresql13.x86_64 postgresql13-server.x86_64 postgresql13-contrib.x86_64 postgresql13-libs.x86_64
Then initialize the database cluster and its required collection of databases by running the command below:
$ /usr/pgsql-13/bin/postgresql-13-setup initdb
At this point, there should be two data directories both for PG 11 and PG 13:
[root@pupnode28 ~]# ls -alth /var/lib/pgsql/* -d
drwx------. 4 postgres postgres 51 Sep 22 14:19 /var/lib/pgsql/13
drwx------. 4 postgres postgres 33 Sep 21 18:53 /var/lib/pgsql/11
Now that we’re good with PostgreSQL 13 let’s install TimescaleDB. We need to make sure that the plugin to be installed is the same version on PostreSQL 11.
Take note that, in order to make sure that pg_upgrade will work smoothly, the plugins of your source and major version destination should be the same version. This is because pg_upgrade will look for its designated libraries linked to the plugins or extensions that have been loaded or used by your old or source database version of your PostgreSQL. You can verify this in your Enterprise Linux by running showduplicates or by verifying with info just like below either with dnf or yum:
$ yum --showduplicates list timescaledb_13.x86_64 timescaledb-2-postgresql-13.x86_64 timescaledb-2-oss-postgresql-13.x86_64 timescaledb-2-loader-postgresql-13.x86_64|grep '2.3.1'
Repository pgdg-common is listed more than once in the configuration
timescaledb-2-loader-postgresql-13.x86_64 2.3.1-0.el7 timescale_timescaledb
timescaledb-2-oss-postgresql-13.x86_64 2.3.1-0.el7 timescale_timescaledb
timescaledb-2-postgresql-13.x86_64 2.3.1-0.el7 timescale_timescaledb
Or verify it with the info option:
$ yum info timescaledb-2-loader-postgresql-13-2.3.1-0.el7.x86_64 timescaledb-2-postgresql-13-2.3.1-0.el7.x86_64
Now, we’re ready to install the TimescaleDB package for the PG 13 version.
$ yum install timescaledb-2-loader-postgresql-13-2.3.1-0.el7.x86_64 timescaledb-2-postgresql-13-2.3.1-0.el7.x86_64
After you’ve installed it, you can try to run timescaledb-tune tool to tune up your postgresql.conf configuration file. Just run the command below:
$ timescaledb-tune --pg-config=/usr/pgsql-13/bin/pg_config
Now, let’s install the PostGIS package for the PG 13 version as well.
$ yum install -y postgis31_13.x86_64
For Ubuntu/Debian
Simply run:
$ apt install postgresql-client-13 postgresql-13
The great thing with Ubuntu/Debian distributions is that there are tools for PostgreSQL that are very handy for managing your PostgreSQL clusters, such as pg_lsclusters, pg_ctlcluster, etc.
You can verify your available clusters installed.
$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
11 main 5432 online postgres /var/lib/postgresql/11/main log/postgresql-%Y-%m-%d_%H%M%S.log
13 main 5433 online postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
In Ubuntu/Debian, there’s no need to change the port since it will be handled during the installation phase and it detects and sets it uniquely, accordingly.
Now, let’s install TimescaleDB.
$ apt install timescaledb-2-loader-postgresql-13 timescaledb-2-postgresql-13
Optionally, you can run the timescaledb-tune tool to tune up your postgresql.conf configuration file by simply just invoking the tool as the following:
$ timescaledb-tune
Now, we’re ready to install the PostGIS package for PG 13.
$ apt install postgresql-13-postgis-3-scripts postgresql-13-postgis-3
Review your postgresql.conf
It’s always better to review your postgresql.conf configuration file. In Enterprise Linux versions, you can locate your postgresql.conf either in your data_directory or PGDATA path. Whereas for Ubuntu/Debian, you can locate it in /etc/postgresql/
shared_preload_libraries = 'pg_stat_statements,timescaledb' # pg_stat_statements is not required but if you are using ClusterControl, make sure this is appended.
port = 5532 # make sure that the port number is unique than the old version of your PostgreSQL
listen_address = * # depends on your setup but if you need to specify the available network interfaces to its IP addresses (IPv4 or IPv6) set it accordingly.
It’s best practice to compare your old and new versions of your PostgreSQL configuration files to ensure that your postgresql.conf is identical to what is needed and set.
Before proceeding with the next step, we also need to check that your PostgreSQL version 13 is loaded accordingly. Make sure that you have the latest version acquired or installed in your host. Start the database and make sure that it starts and runs correctly.
To start in EL distributions, run the command below:
$ sudo -iu postgres /usr/pgsql-13/bin/pg_ctl -D /var/lib/pgsql/13/data/ -o "-c config_file=/var/lib/pgsql/13/data/postgresql.conf" start
or for Ubuntu/Debian, run the command below:
$ sudo -iu postgres /usr/lib/postgresql/13/bin/pg_ctl -D /var/lib/postgresql/13/main/ -o "-c config_file=/etc/postgresql/13/main/postgresql.conf" start
or use the pg_ctlcluster tool to start, restart, or stop your PG Cluster.
When ready, run pg_upgrade…
Before moving on, first make sure that you always have your backup from your old server ready and available. Always take a logical backup and physical backup as good practice before proceeding with a major upgrade.
Now that you’re ready, you are good to run pg_upgrade. In practice, you have to initially run pg_upgrade with a check to determine the incompatibility and issues before proceeding to the main procedure of pg_upgrade. Before running the pg_upgrade, make sure that both PG 11 and PG 13 are down while doing this process. That just means downtime is needed for this process.
To do that, run the following command with –check option:
$ sudo -iu postgres /usr/lib/postgresql/13/bin/pg_upgrade -o "-c config_file=/etc/postgresql/11/main/postgresql.conf" --old-datadir=/var/lib/postgresql/11/main/ -O "-c config_file=/etc/postgresql/13/main/postgresql.conf" --new-datadir=/var/lib/postgresql/13/main/ --old-bindir=/usr/lib/postgresql/11/bin --new-bindir=/usr/lib/postgresql/13/bin --check
Replace the –old-datadir value or config_file accordingly if it’s different from your setup.
This shall run compatibility checks just like the result below:
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for tables WITH OIDS ok
Checking for invalid "sql_identifier" user columns ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok
*Clusters are compatible*
If all checks signal “ok,” that means a successful check, and the bottom message shows that clusters are compatible, then you should be good to go.
Finally, run the command again without the –check option:
$ sudo -iu postgres /usr/lib/postgresql/13/bin/pg_upgrade -o "-c config_file=/etc/postgresql/11/main/postgresql.conf" --old-datadir=/var/lib/postgresql/11/main/ -O "-c config_file=/etc/postgresql/13/main/postgresql.conf" --new-datadir=/var/lib/postgresql/13/main/ --old-bindir=/usr/lib/postgresql/11/bin --new-bindir=/usr/lib/postgresql/13/bin
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for tables WITH OIDS ok
Checking for invalid "sql_identifier" user columns ok
Creating dump of global objects ok
Creating dump of database schemas ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_xact to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluste ok
Copying user relation files ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to analyze new cluster ok
Creating script to delete old cluster ok
Checking for extension updates notice
Your installation contains extensions that should be updated
with the ALTER EXTENSION command. The file
update_extensions.sql
when executed by psql by the database superuser will update
these extensions.
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
./analyze_new_cluster.sh
Running this script will delete the old cluster's data files:
./delete_old_cluster.sh
Depending on the size of your dataset, it might take a bit of time. In the example command above, it does copy the transaction logs, which are of physical files. However, if your disk size is tight, then you can use the -k or –link option, which uses hard links. If all goes well, it should provide you with messages such as the above notifying you with complete upgrade and notices to update the extensions you might have. In this setup, it goes smoothly. The update_extensions.sql contains only the following:
$ cat update_extensions.sql
connect postgres
ALTER EXTENSION "pg_stat_statements" UPDATE;
As you noticed, pg_upgrade does a series of actions. It analyzes all rows from the source cluster, copying transaction metadata logs and its multi-transaction status data and the rest.
Once you figured out that everything looks good, then run the analyze_new_cluster.sh script, which shall run
vacuumdb --all --analyze-only.
$ sudo -iu postgres PGPORT=5433 ./analyze_new_cluster.sh
Take note that you should specify the port of your PostgreSQL 13 so that vacuumdb will run correctly to the right target server.
In this case, the upgrade outcome with TimescaleDB and PostGIS extensions enabled goes well. Once all appears in order, make sure to start the server and do a series of tests and checks.
Test the upgrade process
Always test and review the upgrade process. Here are a few tables and a user-defined database, which contains hypertables and PostGIS tables that rely on using geometry and geography spatial types and functions.
$ sudo -iu postgres psql -p5433
psql (13.4 (Ubuntu 13.4-1.pgdg20.04+1))
Type "help" for help.
postgres=# l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+---------+---------+-----------------------
mydb | postgres | UTF8 | C.UTF-8 | C.UTF-8 |
postgres | postgres | UTF8 | C.UTF-8 | C.UTF-8 |
template0 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | postgres=CTc/postgres+
| | | | | =c/postgres
(4 rows)
postgres=# c mydb
You are now connected to database "mydb" as user "postgres".
mydb=# set search_path="$user",public;
SET
mydb=# dt+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+-----------------+-------+----------+-------------+------------+-------------
public | conditions | table | postgres | permanent | 8192 bytes |
public | global_points | table | postgres | permanent | 16 kB |
public | roads | table | postgres | permanent | 16 kB |
public | sample_table | table | postgres | permanent | 8192 bytes |
public | spatial_ref_sys | table | postgres | permanent | 6968 kB |
(5 rows)
Checking some of my PostGIS and hypertables:
mydb=# d roads
Table "public.roads"
Column | Type | Collation | Nullable | Default
------------+----------------------+-----------+----------+---------
road_id | integer | | |
road_name | character varying | | |
roads_geom | geometry(LineString) | | |
mydb=# d sample_table
Table "public.sample_table"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+------------------------------------------
id | integer | | not null | nextval('sample_table_id_seq'::regclass)
time | timestamp with time zone | | not null |
name | character varying | | not null |
Indexes:
"sample_table_pkey" PRIMARY KEY, btree (id, "time")
"sample_table_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON sample_table FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Number of child tables: 371 (Use d+ to list them.)
mydb=# d conditions
Table "public.conditions"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------
time | timestamp with time zone | | not null |
location | text | | not null |
location2 | character(10) | | not null |
temperature | double precision | | |
humidity | double precision | | |
Indexes:
"conditions_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON conditions FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Number of child tables: 366 (Use d+ to list them.)
mydb=# select count(*) from sample_table;
count
-------
2588
(1 row)
mydb=# SELECT name FROM global_points WHERE ST_DWithin(location, 'SRID=4326;POINT(-110 29)'::geography, 1000000);
name
--------
Town
Forest
(2 rows)
mydb=# SELECT n, ST_AsEWKT(ST_GeometryN(the_geom, n)) As geomewkt
mydb-# FROM (
mydb(# VALUES (ST_GeomFromEWKT('MULTIPOINT(1 2 7, 3 4 7, 5 6 7, 8 9 10)') ),
mydb(# ( ST_GeomFromEWKT('MULTICURVE(CIRCULARSTRING(2.5 2.5,4.5 2.5, 3.5 3.5), (10 11, 12 11))') )
mydb(# )As foo(the_geom)
mydb-# CROSS JOIN generate_series(1,100) n
mydb-# WHERE n <= ST_NumGeometries(the_geom);
n | geomewkt
---+-----------------------------------------
1 | POINT(1 2 7)
1 | CIRCULARSTRING(2.5 2.5,4.5 2.5,3.5 3.5)
2 | POINT(3 4 7)
2 | LINESTRING(10 11,12 11)
3 | POINT(5 6 7)
4 | POINT(8 9 10)
(6 rows)
Now everything looks ready to serve as my new cluster.
Once you get things going, then you can run:
$ sudo -iu postgres ./delete_old_cluster.sh
Make sure you only run the script since this is a very dangerous script, as it will delete all files in your old PostgreSQL cluster.
Conclusion
pg_upgrade is such a great tool for managing and upgrading your PostgreSQL database server. It can handle complex setups such as with TimescaleDB or PostGIS extensions enabled. Although this tool comes with its limitations, there’s no stopping from using it, especially for your DBA work and on production servers apart from QA and development environments.