Database migration is the process of moving data from one or more source databases to one or more target databases using a service or tool. There are different types of migration: you can move to a different database technology, to a different datacenter or to the cloud, or simply to another machine running the same technology in the same place. The best tool for a PostgreSQL migration depends on the type of migration and on requirements such as availability and downtime tolerance, and it is sometimes hard to find the best tool for the job.
In this blog post, we will cover some open-source tools for PostgreSQL migrations, with a short overview of each option.
Backup and Restore Migration Tools
This could be the easiest way to perform a data migration, but also the slowest in terms of Recovery Time Objective (RTO). With this method, you take a backup of your current database (most probably a logical backup if you want to restore it into a different version or infrastructure), copy the dump, and restore it on the new server. Depending on the amount of data, it could take a long time, but it is the most basic way to migrate your database. Let’s look at some tools for this.
pg_dump is a utility for backing up a single PostgreSQL database. It makes consistent backups even if the database is being used concurrently, as it does not block other users. To back up an entire cluster, or to back up global objects that are common to all databases in a cluster (such as roles and tablespaces), you must use pg_dumpall instead.
Dumps can be output in script or archive file formats. Script dumps are plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time it was saved. A script dump can be used to reconstruct the database on other machines, other architectures, or even on other SQL database products.
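A typical dump-and-restore migration looks like the following. The hostnames, user, and database name are placeholders for your own environment:

```shell
# Dump a single database to a plain-text SQL script
pg_dump -h source-host -U postgres -d mydb -f mydb.sql

# Or use the custom archive format, which supports pg_restore and parallel restore
pg_dump -h source-host -U postgres -d mydb -Fc -f mydb.dump

# Dump global objects (roles, tablespaces) for the whole cluster
pg_dumpall -h source-host -U postgres --globals-only -f globals.sql

# Restore on the target server
psql -h target-host -U postgres -f globals.sql
pg_restore -h target-host -U postgres -d mydb mydb.dump
```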
pg_basebackup is used to take a base backup of a running PostgreSQL database cluster. The backup is taken without affecting other database clients and can be used both for PITR (Point-In-Time Recovery) and as the starting point of a Streaming Replication standby server. It makes an exact copy of the database cluster’s files while making sure the server is put into and out of backup mode automatically. Backups are always taken of the entire database cluster; it is not possible to back up individual databases or database objects.
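A sketch of taking a base backup, assuming a replication user named `replicator` already exists on the source server:

```shell
# Copy the whole cluster into a new data directory, streaming the WAL
# needed for consistency (-X stream) and showing progress (-P)
pg_basebackup -h source-host -U replicator -D /var/lib/pgsql/standby \
  -X stream -P

# Adding -R also writes the replication configuration, so the copy can
# start directly as a streaming replication standby of the source
pg_basebackup -h source-host -U replicator -D /var/lib/pgsql/standby \
  -X stream -R -P
```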
pgBackRest is an open-source backup tool that creates physical backups with some improvements compared to the classic pg_basebackup tool. You can use pgBackRest to perform an initial database copy for Streaming Replication by using an existing backup, or you can use the delta option to rebuild an old standby server.
Some of the most important pgBackRest features are:
- Parallel Backup & Restore
- Local or Remote Operation
- Full, Incremental, and Differential Backups
- Backup Rotation and Archive Expiration
- Backup Integrity check
- Backup Resume
- Delta Restore
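A minimal sketch of the backup and restore cycle, assuming a stanza named `main` has already been configured in `pgbackrest.conf`:

```shell
# Take backups of different types against the configured stanza
pgbackrest --stanza=main backup --type=full   # full backup
pgbackrest --stanza=main backup --type=incr   # incremental backup

# Restore into an empty data directory on the new server
pgbackrest --stanza=main restore

# Delta restore: reuse existing files and copy only what changed,
# which is useful to rebuild an old standby server quickly
pgbackrest --stanza=main restore --delta
```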
PostgreSQL Migration Tools
Instead of using a backup tool, there are different utilities that perform a migration in a faster way, using different methods. They can follow an ETL approach, or even configure replication between different database technologies using the same Extract – Transform – Load concept. Let’s look at some of these tools.
pg_chameleon is a MySQL-to-PostgreSQL replica system. It can connect to the MySQL replication protocol and replicate the data changes into PostgreSQL. Whether the user needs to set up a permanent replica between MySQL and PostgreSQL or to perform an engine migration, pg_chameleon is a good option for the task.
The main features are:
- Read from multiple MySQL schemas and restore them into a target PostgreSQL database
- Setup PostgreSQL to act as a MySQL slave
- Basic DDL Support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY/TRUNCATE, RENAME)
- Tables that generate errors are automatically excluded from the replica
- Possibility to refresh single tables or single schemas
- Detach replica from MySQL for migration support
- Data type override
- Daemonized init_replica process
- Daemonized replica process with two separate subprocesses, one for the read and one for the replay
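A sketch of a pg_chameleon workflow, assuming a configuration file (e.g. `default.yml`) that describes the MySQL source and the PostgreSQL target:

```shell
# Create the replica catalog schema in the target PostgreSQL database
chameleon create_replica_schema --config default

# Register the MySQL source defined in the configuration file
chameleon add_source --config default --source mysql

# Perform the initial data copy, then start the ongoing replica
chameleon init_replica --config default --source mysql
chameleon start_replica --config default --source mysql

# For a one-off migration, detach the replica once the data is in sync
chameleon detach_replica --config default --source mysql
```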
pgloader loads data from various sources into PostgreSQL. It can transform the data it reads on the fly and submit raw SQL before and after the loading. It uses the PostgreSQL COPY protocol to stream the data into the server and handles errors by filling a pair of reject.dat and reject.log files.
Because it can load data directly from a database source, pgloader also supports migrations from other products to PostgreSQL. In this mode of operation, pgloader handles both the schema and data parts of the migration in a single unattended command, allowing you to implement Continuous Migration.
The main features are:
- Many source formats supported, like CSV, db3, and IBM IXF files
- On the fly data transformation
- Full Field projections
- Reading from compressed files (zip, tar, and gzip)
- HTTP(S) support
- Target schema discovery
- On error stop/resume next options
- Pre/Post SQL commands
- One-command migration
- Schema discovery
- User-defined casting rules
- Partial Migrations including/excluding tables
- Schema or Data only
- Materialized Views, or schema rewrite on-the-fly
- Continuous Migration
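The one-command migration mode can be as simple as passing a source and a target connection string; the credentials and database names below are placeholders:

```shell
# pgloader discovers the MySQL schema, creates it in PostgreSQL,
# and streams the data over the COPY protocol
pgloader mysql://user:pass@mysql-host/sourcedb \
         postgresql://user:pass@pg-host/targetdb

# For finer control (casting rules, included/excluded tables, pre/post SQL),
# the same migration can be described in a load-command file
pgloader migration.load
```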
Ora2Pg is a free tool used to migrate an Oracle or MySQL database to a PostgreSQL-compatible schema. It connects to your current database, scans it, and extracts its structure or data; it then generates SQL scripts that you can load into your PostgreSQL database.
The main features are:
- Export full database schema (tables, views, sequences, indexes), with unique, primary, foreign key and check constraints
- Export grants/privileges for users and groups
- Export range/list partitions and subpartitions
- Export a table selection (by specifying the table names)
- Export predefined functions, triggers, procedures, packages, and package bodies
- Export full data or follow a WHERE clause
- Full support of Oracle BLOB objects as PG BYTEA
- Export Oracle views as PG tables
- Export Oracle user-defined types
- Provide some basic automatic conversion of PLSQL code to PLPGSQL
- Export Oracle tables as foreign data wrapper tables
- Export materialized views
- Show a detailed report of an Oracle database content
- Migration cost assessment of an Oracle database
- Migration difficulty level assessment of an Oracle database
- Migration cost assessment of PL/SQL code from a file
- Migration cost assessment of Oracle SQL queries stored in a file
- Generate XML ktr files to be used with Pentaho Data Integrator (Kettle)
- Export Oracle locator and spatial geometries into PostGIS
- Export DBLINK as Oracle FDW
- Export SYNONYMS as views
- Export DIRECTORY as an external table or directory for external_file extension
- Full MySQL export just like Oracle database
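A sketch of a basic Ora2Pg workflow; the config file is assumed to already contain the Oracle connection settings (ORACLE_DSN, ORACLE_USER, ORACLE_PWD):

```shell
# Bootstrap a migration project with a default configuration to edit
ora2pg --project_base /app/migration --init_project my_project

# Assess the migration cost and difficulty before starting
ora2pg -c ora2pg.conf -t SHOW_REPORT --estimate_cost

# Export the schema, then the data, as SQL scripts
ora2pg -c ora2pg.conf -t TABLE -o schema.sql
ora2pg -c ora2pg.conf -t COPY -o data.sql

# Load the generated scripts into PostgreSQL
psql -h target-host -U postgres -d mydb -f schema.sql
```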
There are more ETL options available, like Pentaho or Talend, but those are more oriented to data integration or data management than to migration, so we won’t describe them here to keep this blog post from getting too long. You can find more information here.
PostgreSQL Logical Replication for Migrations
Logical replication is a method of replicating data objects and their changes based upon their replication identity (usually a primary key). It is based on a publish-and-subscribe model, where one or more subscribers subscribe to one or more publications on a publisher node.
A publication is a set of changes generated from a table or a group of tables (also referred to as a replication set). The node where a publication is defined is referred to as the publisher. A subscription is the downstream side of logical replication. The node where a subscription is defined is referred to as the subscriber, and it defines the connection to another database and a set of publications (one or more) to which it wants to subscribe. Subscribers pull data from the publications they subscribe to.
This method can be used for migrating or upgrading your PostgreSQL database.
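Setting this up takes one statement on each side. The sketch below assumes `wal_level = logical` on the publisher, that the target schema has already been created (e.g. with `pg_dump -s`), and uses placeholder hosts, names, and a replication user:

```shell
# On the publisher (source), publish the tables to migrate
psql -h source-host -d mydb \
  -c "CREATE PUBLICATION mig_pub FOR ALL TABLES;"

# On the subscriber (target), subscribe to that publication;
# this performs the initial copy and then streams ongoing changes
psql -h target-host -d mydb \
  -c "CREATE SUBSCRIPTION mig_sub
        CONNECTION 'host=source-host dbname=mydb user=repuser'
        PUBLICATION mig_pub;"
```

Once the subscriber has caught up, you can switch your application over to the target and drop the subscription.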
For more information, you can refer to the corresponding blog post about upgrading PostgreSQL with zero downtime.
Migrations are a difficult and risky task, as you are moving or transforming data from one or more sources to one or more targets, and it is even harder if your target is a different engine. You will need a good plan with detailed steps, and of course a test environment to test every stage of the migration. In this blog post, we covered some tools to help with this task; the best choice will depend on your requirements regarding the availability of your system.