Migrating from MySQL to PostgreSQL – What You Should Know
Whether migrating a database or project from MySQL to PostgreSQL, or choosing PostgreSQL for a new project with only MySQL knowledge, there are a few things to know about PostgreSQL and the differences between the two database systems.
PostgreSQL is a fully open source database system released under its own license, the PostgreSQL License, which is described as “a liberal Open Source license, similar to the BSD or MIT licenses.” This has allowed The PostgreSQL Global Development Group (commonly referred to as PGDG), who develops and maintains the open source project, to improve the project with help from people around the world, turning it into one of the most stable and feature rich database solutions available. Today, PostgreSQL competes with the top proprietary and open source database systems for features, performance, and popularity.
PostgreSQL is a highly compliant Relational Database System that’s scalable, customizable, and has a thriving community of people improving it every day.
What PostgreSQL Needs
In a previous blog, we discussed setting up and optimizing PostgreSQL for a new project. It is a good introduction to PostgreSQL configuration and behavior, and can be found here: https://severalnines.com/blog/setting-optimal-environment-postgresql.
If migrating an application from MySQL to PostgreSQL, the best place to start would be to host it on similar hardware or hosting platform as the source MySQL database.
If hosting the database on premise, bare metal hosts (rather than Virtual Machines) are generally the best option for hosting PostgreSQL. Virtual Machines do add some helpful features at times, but they come at the cost of losing power and performance from the host in general, while bare metal allows the PostgreSQL software to have full access to performance with fewer layers between it and the hardware. On premise hosts would need an administrator to maintain the databases, whether it’s a full time employee or contractor, whichever makes more sense for the application needs.
In The Cloud
Cloud hosting has come a long way in the past few years, and countless companies across the world host their databases in cloud based servers. Since cloud hosts are highly configurable, the right size and power of host can be selected for the specific needs of the database, with a cost that matches.
Depending on the hosting option used, new hosts can be provisioned quickly, memory / cpu / disk can be tweaked quickly, and even additional backup methods can be available. When choosing a cloud host, look for whether a host is dedicated or shared, dedicated being better for extremely high load databases. Another key is to make sure the IOPS available for the cloud host is good enough for the database activity needs. Even with a large memory pool for PostgreSQL, there will always be disk operations to write data to disk, or fetch data when not in memory.
Since PostgreSQL is increasing in popularity, it’s being found available on many cloud database hosting services like Heroku, Amazon AWS, and others, and is quickly catching up to the popularity of MySQL. These services allow a third party to host and manage a PostgreSQL database easily, allowing focus to remain on the application.
Concepts / term comparisons
There are a few comparisons to cover when migrating from MySQL to PostgreSQL, common configuration parameters, terms, or concepts that operate similarly but have their differences.
Various database terms can have different meanings within different implementations of the technology. Between MySQL and PostgreSQL, there’s a few basic terms that are understood slightly differently, so a translation is sometimes needed.
In MySQL, a ‘cluster’ usually refers to multiple MySQL database hosts connected together to appear as a single database or set of databases to clients.
In PostgreSQL, when referencing a ‘cluster’, it is a single running instance of the database software and all its sub-processes, which then contains one or more databases.
In MySQL, queries can access tables from different databases at the same time (provided the user has permission to access each database).
SELECT * FROM customer_database.customer_table t1 JOIN orders_database.order_table t2 ON t1.customer_id = t2.customer_id WHERE name = ‘Bob’;
However in PostgreSQL this cannot happen unless using Foreign Data Wrappers (a topic for another time). Instead, a PostgreSQL database has the option for multiple ‘schemas’ which operate similarly to databases in MySQL. Schemas contain the tables, indexes, etc, and can be accessed simultaneously by the same connection to the database that houses them.
SELECT * FROM customer_schema.customer_table t1 JOIN orders_schema.order_table t2 ON t1.customer_id = t2.customer_id WHERE name = ‘Bob’;
Interfacing with the PostgreSQL
In the MySQL command line client (mysql), interfacing with the database uses key works like ‘DESCRIBE table’ or ‘SHOW TABLES’. The PostgreSQL command line client (psql) uses its own form of ‘backslash commands’. For example, instead of ‘SHOW TABLES’, PostgreSQL’s command is ‘dt’, and instead of ‘SHOW DATABASES;’, the command is ‘l’.
A full list of commands for ‘psql’ can be found by the backslash command ‘?’ within psql.
Like MySQL, PostgreSQL has libraries and plugins for all major languages, as well as ODBC drivers along the lines of MySQL and Oracle. Finding a great and stable library for any language needed is an easy task.
Unlike MySQL, PostgreSQL has a wide range of supported Procedural Languages to choose from. In the base install of PostgreSQL, the supported languages are PL/pgSQL (SQL Procedural Language), PL/Tcl (Tcl Procedural Language), PL/Perl (Perl Procedural Language), and PL/Python (Python Procedural Language). Third party developers may have more languages not officially supported by the main PostgreSQL group.
MySQL tunes this with key_buffer_size when using MyISAM, and with innodb_buffer_pool_size when using InnoDB.
PostgreSQL uses shared_buffers for the main memory block given to the database for caching data, and generally sticks around 1/4th of system memory unless certain scenarios require that to change. Queries using memory for sorting use the work_mem value, which should be increased cautiously.
Tools for migration
Migrating to PostgreSQL can take some work, but there are tools the community has developed to help with the process. Generally they will convert / migrate the data from MySQL to PostgreSQL, and recreate tables / indexes. Stored Procedures or functions, are a different story, and usually require manual re-writing either in part, or from the ground up.
Some example tools available are pgloader and FromMySqlToPostgreSql. Pgloader is a tool written in Common Lisp that imports data from MySQL into PostgreSQL using the COPY command, and loads data, indexes, foreign keys, and comments with data conversion to represent the data correctly in PostgreSQL as intended. FromMySqlToPostgreSql is a similar tool written in PHP, and can convert MySQL data types to PostgreSQL as well as foreign keys and indexes. Both tools are free, however many other tools (free and paid) exist and are newly developed as new versions of each database software are released.
Converting should always include in depth evaluation after the migration to make sure data was converted correctly and functionality works as expected. Testing beforehand is always encouraged for timings and data validation.
If coming from MySQL where replication has been used, or replication is needed at all for any reason, PostgreSQL has several options available, each with its own pros and cons, depending on what is needed through replication.
By default, PostgreSQL has its own built in replication mode for Point In Time Recovery (PITR). This can be set up using either file-based log shipping, where Write Ahead Log files are shipped to a standby server where they are read and replayed, or Streaming Replication, where a read only standby server fetches transaction logs over a database connection to replay them.
Either one of these built in options can be set up as either a ‘warm standby’ or ‘hot standby.’ A ‘warm standby’ doesn’t allow connections but is ready to become a master at any time to replace a master having issues. A ‘hot standby’ allows read-only connections to connect and issue queries, in addition to being ready to become a read/write master at any time as well if needed.
One of the oldest replication tools for PostgreSQL is Slony, which is a trigger based replication method that allows a high level of customization. Slony allows the setup of a Master node and any number of Replica nodes, and the ability to switch the Master to any node desired, and allows the administrator to choose which tables (if not wanting all tables) to replicate. It’s been used not just for replicating data in case of failure / load balancing, but shipping specific data to other services, or even minimal downtime upgrades, since replication can go across different versions of PostgreSQL.
Slony does have the main requirement that every table to be replicated have either a PRIMARY KEY, or a UNIQUE index without nullable columns.
When it comes to multi-master options, Bucardo is one of few for PostgreSQL. Like Slony, it’s a third party software package that sits on top of PostgreSQL. Bucardo calls itself “an asynchronous PostgreSQL replication system, allowing for both multi-master and multi-slave operations.” The main benefit is multi-master replication, that works fairly well, however it does lack conflict resolution, so applications should be aware of possible issues and fix accordingly.
There are many other replication tools as well, and finding the one that works best for an application depends on the specific needs.
PostgreSQL has a thriving community willing to help with any issues / info that may be needed.
An active IRC chatroom named #postgresql is available on freenode, as administrators and developers world wide chat about PostgreSQL and related projects / issues. There’s even smaller rooms for specifics like Slony, Bucardo, and more.
There are a handful of PostgreSQL mailing lists for ‘general’, ‘admin’, ‘performance’, and even ‘novice’ (a great place to start if new to PostgreSQL in general). The mailing lists are subscribed to by many around the world, and provide a very useful wealth of resources to answer any question that may need answering.
A full list of PostgreSQL mailing lists can be found at https://www.postgresql.org/list/
User groups are a great place to get involved and active in the community, and many large cities worldwide have a PostgreSQL User Group (PUG) available to join and attend, and if not, consider starting one. These groups are great for networking, learning new technologies, and even just asking questions in person to people from any level of experience.
Most Importantly, PostgreSQL is documented very well. Any information for configuration parameters, SQL functions, usage, all can be easily learned through the official documentation provided on PostgreSQL’s website. If at all anything is unclear, the community will help in the previous outlined options.
Subscribe to get our best and freshest content