Database migration is the process of moving data from one or more source databases to one or more target databases using a service or tool. There are different types of migration: you can move to a different database technology, to a different datacenter or to the cloud, or simply to another machine running the same technology in the same place. The best tool for a PostgreSQL migration depends on the type of migration and on requirements such as availability and downtime tolerance, and it is sometimes hard to find the best tool for the job.
In this blog post, we will cover some open-source tools for PostgreSQL migrations, with a short overview of each option.
Backup and Restore Migration Tools
This could be the easiest way to perform a data migration, but also the slowest in terms of Recovery Time Objective (RTO). With this method, you take a backup of your current database (most probably a logical backup if you want to restore it into a different version or infrastructure), copy the dump, and restore it on the new server. Depending on the amount of data, it could take a long time, but it is the most basic way to migrate your database. Let’s look at some tools for this.
pg_dump is a utility for backing up a single PostgreSQL database. It makes consistent backups even if the database is being used concurrently, as it does not block other users. To back up an entire cluster, or to back up global objects that are common to all databases in a cluster (such as roles and tablespaces), you must use pg_dumpall instead.
Dumps can be output in script or archive file formats. Script dumps are plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time it was saved. A script dump can be used to reconstruct the database on other machines, other architectures, or even on other SQL database products.
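A typical dump-and-restore migration looks like the following. The hostnames, user, and database name are placeholders for your own environment:

```shell
# Dump a single database to a plain-text SQL script
pg_dump -h source-host -U postgres -d mydb -f mydb.sql

# Or use the custom archive format, which supports pg_restore and parallel restore
pg_dump -h source-host -U postgres -d mydb -Fc -f mydb.dump

# Dump global objects (roles, tablespaces) for the whole cluster
pg_dumpall -h source-host -U postgres --globals-only -f globals.sql

# Restore on the target server
psql -h target-host -U postgres -f globals.sql
pg_restore -h target-host -U postgres -d mydb mydb.dump
```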
pg_basebackup is used to take a base backup of a running PostgreSQL database cluster. The backup is taken without affecting other database clients and can be used both for PITR (Point-In-Time Recovery) and as the starting point of a Streaming Replication standby server. It makes an exact copy of the database cluster’s files while making sure the server is put into and out of backup mode automatically. Backups are always taken of the entire database cluster; it is not possible to back up individual databases or database objects.
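A sketch of taking a base backup, assuming a replication user named `replicator` already exists on the source server:

```shell
# Copy the whole cluster into a new data directory, streaming the WAL
# needed for consistency (-X stream) and showing progress (-P)
pg_basebackup -h source-host -U replicator -D /var/lib/pgsql/standby \
  -X stream -P

# Adding -R also writes the replication configuration, so the copy can
# start directly as a streaming replication standby of the source
pg_basebackup -h source-host -U replicator -D /var/lib/pgsql/standby \
  -X stream -R -P
```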
pgBackRest is an open-source backup tool that creates physical backups with some improvements compared to the classic pg_basebackup tool. You can use pgBackRest to perform an initial database copy for Streaming Replication by using an existing backup, or you can use the delta option to rebuild an old standby server.
Some of the most important pgBackRest features are:
- Parallel Backup & Restore
- Local or Remote Operation
- Full, Incremental, and Differential Backups
- Backup Rotation and Archive Expiration
- Backup Integrity check
- Backup Resume
- Delta Restore
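A minimal sketch of the backup and restore cycle, assuming a stanza named `main` has already been configured in `pgbackrest.conf`:

```shell
# Take backups of different types against the configured stanza
pgbackrest --stanza=main backup --type=full   # full backup
pgbackrest --stanza=main backup --type=incr   # incremental backup

# Restore into an empty data directory on the new server
pgbackrest --stanza=main restore

# Delta restore: reuse existing files and copy only what changed,
# which is useful to rebuild an old standby server quickly
pgbackrest --stanza=main restore --delta
```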
PostgreSQL Migration Tools
Instead of using a backup tool, there are different utilities that perform a migration in a faster way, using different methods. They can follow an ETL approach, or even configure replication between different database technologies using the same Extract – Transform – Load concept. Let’s look at some of these tools.
pg_chameleon is a MySQL-to-PostgreSQL replica system. It can connect to the MySQL replication protocol and replicate the data changes into PostgreSQL. Whether the user needs to set up a permanent replica between MySQL and PostgreSQL or to perform an engine migration, pg_chameleon is a good option for the task.
The main features are:
- Read from multiple MySQL schemas and restore them into a target PostgreSQL database
- Setup PostgreSQL to act as a MySQL slave
- Basic DDL Support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY/TRUNCATE, RENAME)
- Tables that generate errors are automatically excluded from the replica
- Possibility to refresh single tables or single schemas
- Detach replica from MySQL for migration support
- Data type override
- Daemonized init_replica process
- Daemonized replica process with two separate subprocesses, one for the read and one for the replay
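A sketch of a pg_chameleon workflow, assuming a configuration file (e.g. `default.yml`) that describes the MySQL source and the PostgreSQL target:

```shell
# Create the replica catalog schema in the target PostgreSQL database
chameleon create_replica_schema --config default

# Register the MySQL source defined in the configuration file
chameleon add_source --config default --source mysql

# Perform the initial data copy, then start the ongoing replica
chameleon init_replica --config default --source mysql
chameleon start_replica --config default --source mysql

# For a one-off migration, detach the replica once the data is in sync
chameleon detach_replica --config default --source mysql
```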
pgloader loads data from various sources into PostgreSQL. It can transform the data it reads on the fly and submit raw SQL before and after the loading. It uses the PostgreSQL COPY protocol to stream the data into the server and handles errors by filling a pair of reject.dat and reject.log files.
Because it can load data directly from a database source, pgloader also supports migrations from other products to PostgreSQL. In this mode of operation, pgloader handles both the schema and data parts of the migration in a single unattended command, allowing you to implement Continuous Migration.
The main features are:
- Many source formats supported, like CSV, db3, and IBM IXF files
- On the fly data transformation
- Full Field projections
- Reading from compressed files (zip, tar, and gzip)
- HTTP(S) support
- Target schema discovery
- On error stop/resume next options
- Pre/Post SQL commands
- One-command migration
- Schema discovery
- User-defined casting rules
- Partial Migrations including/excluding tables
- Schema or Data only
- Materialized Views, or schema rewrite on-the-fly
- Continuous Migration
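The one-command migration mode can be as simple as passing a source and a target connection string; the credentials and database names below are placeholders:

```shell
# pgloader discovers the MySQL schema, creates it in PostgreSQL,
# and streams the data over the COPY protocol
pgloader mysql://user:pass@mysql-host/sourcedb \
         postgresql://user:pass@pg-host/targetdb

# For finer control (casting rules, included/excluded tables, pre/post SQL),
# the same migration can be described in a load-command file
pgloader migration.load
```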
Ora2Pg is a free tool used to migrate an Oracle or MySQL database to a PostgreSQL-compatible schema. It connects to your current database, scans it, and extracts its structure or data; it then generates SQL scripts that you can load into your PostgreSQL database.
The main features are:
- Export full database schema (tables, views, sequences, indexes), with unique, primary, foreign key and check constraints
- Export grants/privileges for users and groups
- Export range/list partitions and subpartitions
- Export a table selection (by specifying the table names)
- Export predefined functions, triggers, procedures, packages, and package bodies
- Export full data or follow a WHERE clause
- Full support of Oracle BLOB objects as PG BYTEA
- Export Oracle views as PG tables
- Export Oracle user-defined types
- Provide some basic automatic conversion of PLSQL code to PLPGSQL
- Export Oracle tables as foreign data wrapper tables
- Export materialized views
- Show a detailed report of an Oracle database content
- Migration cost assessment of an Oracle database
- Migration difficulty level assessment of an Oracle database
- Migration cost assessment of PL/SQL code from a file
- Migration cost assessment of Oracle SQL queries stored in a file
- Generate XML ktr files to be used with Pentaho Data Integrator (Kettle)
- Export Oracle locator and spatial geometries into PostGIS
- Export DBLINK as Oracle FDW
- Export SYNONYMS as views
- Export DIRECTORY as an external table or directory for external_file extension
- Full MySQL export just like Oracle database
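A sketch of a basic Ora2Pg workflow; the config file is assumed to already contain the Oracle connection settings (ORACLE_DSN, ORACLE_USER, ORACLE_PWD):

```shell
# Bootstrap a migration project with a default configuration to edit
ora2pg --project_base /app/migration --init_project my_project

# Assess the migration cost and difficulty before starting
ora2pg -c ora2pg.conf -t SHOW_REPORT --estimate_cost

# Export the schema, then the data, as SQL scripts
ora2pg -c ora2pg.conf -t TABLE -o schema.sql
ora2pg -c ora2pg.conf -t COPY -o data.sql

# Load the generated scripts into PostgreSQL
psql -h target-host -U postgres -d mydb -f schema.sql
```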
There are more ETL options available, like Pentaho or Talend, but those are more oriented to data integration or data management than to migration, so we won’t describe them here to keep this blog post from getting too long. You can find more information here.
PostgreSQL Logical Replication for Migrations
Logical replication is a method of replicating data objects and their changes based upon their replication identity (usually a primary key). It is based on a publish-and-subscribe model, where one or more subscribers subscribe to one or more publications on a publisher node.
A publication is a set of changes generated from a table or a group of tables (also referred to as a replication set). The node where a publication is defined is referred to as the publisher. A subscription is the downstream side of logical replication. The node where a subscription is defined is referred to as the subscriber, and it defines the connection to another database and a set of publications (one or more) to which it wants to subscribe. Subscribers pull data from the publications they subscribe to.
This method can be used for migrating or upgrading your PostgreSQL database.
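Setting this up takes one statement on each side. The sketch below assumes `wal_level = logical` on the publisher, that the target schema has already been created (e.g. with `pg_dump -s`), and uses placeholder hosts, names, and a replication user:

```shell
# On the publisher (source), publish the tables to migrate
psql -h source-host -d mydb \
  -c "CREATE PUBLICATION mig_pub FOR ALL TABLES;"

# On the subscriber (target), subscribe to that publication;
# this performs the initial copy and then streams ongoing changes
psql -h target-host -d mydb \
  -c "CREATE SUBSCRIPTION mig_sub
        CONNECTION 'host=source-host dbname=mydb user=repuser'
        PUBLICATION mig_pub;"
```

Once the subscriber has caught up, you can switch your application over to the target and drop the subscription.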
For more information, you can refer to the corresponding blog post about upgrading PostgreSQL with zero downtime.
Migrations are a difficult and risky task, as you are moving or transforming data from one or more sources to one or more targets, and it is even harder if your target is a different engine. You will need a good plan with detailed steps, and of course a test environment to test every stage of the migration. In this blog post, we covered some tools to help with this task; the best choice will depend on your requirements regarding the availability of your system.