Large organizations that run MySQL or MariaDB database platforms are often faced with the need to migrate a database from one place to another. Regardless of the platform or the type of database software involved (such as moving from an RDBMS to NoSQL, or from NoSQL back to an RDBMS), or whether it is just a data migration, performing a migration involves a huge amount of work and cost.
A database migration always involves moving data from one or more source databases to one or more target databases. This can involve a database migration service, or a custom set of tools that engineers have built and tailored to this kind of problem.
A database migration does not mean the target platform will end up as an exact mirror of the source platform; once a migration is finished, the dataset on the target databases may well be restructured. What matters most, once the migration is fully done, is that clients accessing the database are redirected to the new source databases. The new source database has to provide an exact copy of the data from the source, with no performance impact that could affect the overall user experience.
Moving your data from one platform to a target destination platform is a huge task. This is what a database migration covers when an organization or company decides to switch off the lights on its current platform, for any number of reasons. The common reasons for migrating data are the cost effectiveness of the target platform, or its flexibility for deployment and scalability. While the current platform hosting the production data incurs more costs for upgrades and scaling, it is also burdensome when deploying small changes that could easily be deployed on a microservices platform.
Backup Tools For Data Migration
The easiest path when performing a migration is to use database backup tools. We’ll look at what these tools are and how you can use them during migration.
mysqldump/mysqlpump
mysqldump is one of the most famous utilities for MySQL or MariaDB; a database admin or system admin will reach for this tool to migrate either a full database or a partial copy of it. For database admins not familiar with MySQL/MariaDB, this tool creates a logical copy of the data as a backup, which you can then load into the target database.
A common setup with this tool, whenever the target database is located somewhere else and hosted on a different platform than the source, is to have the target act as a slave or replica. mysqldump is commonly invoked with --single-transaction on a busy system, and with --master-data it will provide you the binary log coordinates needed to set up a slave on the target database, which will then serve as the host for the data migration. An alternative to mysqldump is mysqlpump, which has fewer features but can process databases, and objects within databases, in parallel to speed up the dump. The downside is that mysqlpump has no option equivalent to --master-data, which is very useful if you want to create a replica to serve as the target destination for the migration.
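A minimal sketch of this flow might look like the following; the hostnames, user names, and file names are placeholders, not values from any particular setup:

```shell
# On the source: take a consistent snapshot. --master-data=2 writes the
# binlog coordinates as a comment in the dump instead of an active statement.
mysqldump --single-transaction --master-data=2 \
  --routines --triggers --events --all-databases \
  -h source-db -u backup_user -p > full_dump.sql

# On the target: load the dump.
mysql -h target-db -u admin_user -p < full_dump.sql

# The coordinates recorded by --master-data=2 appear near the top of the
# dump and feed into the CHANGE MASTER TO statement for replica setup.
grep -m1 'CHANGE MASTER' full_dump.sql
```

On MySQL 8.0.26 and later the option is also available under the name --source-data.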
mysqlpump is advantageous if your data is mostly idle, or the database is in maintenance mode such that no writes or changes are ongoing on the source database. In that case it can be quicker than mysqldump.
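A hedged sketch of a parallel mysqlpump dump; the database names, hosts, and credentials below are placeholders:

```shell
# Dump two databases using four threads per parallel queue.
mysqlpump --default-parallelism=4 \
  --include-databases=app_db,reporting_db \
  -h source-db -u backup_user -p > pump_dump.sql

# Load on the target just as with mysqldump.
mysql -h target-db -u admin_user -p < pump_dump.sql
```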
mydumper/myloader
mydumper/myloader is a very nifty and efficient pair of tools for logical backups, especially for importing bulk data at speed: it offers parallelism, the ability to send data in chunks, rate control through the number of threads, rows per chunk, and statement size, and it can compress the result. It also records the binary log file and position, which is very helpful if you set up the target destination platform to act as a replica of the current source and production environment.
Basically, mydumper is the binary and the command you invoke on the command line to create the logical backup, whereas myloader is the binary and the command you use to load the data into the desired target destination. Their flexibility allows you to manage the processing rate, whether you are creating a backup or loading data. With mydumper, you can create a full backup or just a partial copy of your source database. This is very useful in case you need to move a large dataset or schema away from the current database host to another target destination, for example while setting up new database shards: one way to migrate large data is to pull a huge segment of the dataset and move it as a new shard node.
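A sketch of a dump-and-restore round trip with these tools; the hosts, credentials, and paths are placeholders:

```shell
# Parallel, compressed dump: 8 threads, ~100k rows per chunk.
# Binlog file/position are recorded for replica setup.
mydumper --host=source-db --user=backup_user --password='secret' \
  --threads=8 --rows=100000 --compress \
  --outputdir=/backups/migration

# The binlog coordinates land in the metadata file of the output directory.
cat /backups/migration/metadata

# Restore on the target, again with 8 parallel threads.
myloader --host=target-db --user=admin_user --password='secret' \
  --threads=8 --directory=/backups/migration
```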
mydumper/myloader also has its limitations. It stopped receiving updates from the original authors but was picked up by Max Bube, and the tool is still widely used, even in production environments.
Percona XtraBackup/MariaDB Backup
Percona’s XtraBackup is a gift for database administrators that do not want to spend money on the enterprise Oracle MySQL Enterprise Backup. MariaDB Backup is forked and derived from Percona XtraBackup; MariaDB also offers MariaDB Enterprise Backup.
Both of these tools share the same concepts when taking a backup. They are binary (physical) backup tools offering hot online backups, PITR, incremental and full backups, and partial backups. They are also useful for data recovery, since they record the binary log file and position, support GTIDs, and a lot more. Although MariaDB Backup and Percona XtraBackup are two different pieces of software nowadays, as each has since been architected to support its own target database, MariaDB Backup is the one to use if you intend to take backups from a MariaDB database source. Percona XtraBackup, on the other hand, applies to Oracle MySQL and Percona Server, as well as derived MySQL servers such as Percona XtraDB Cluster or Codership’s version of Galera Cluster for MySQL.
Both tools are very beneficial for database migrations. Performing a hot online backup is quick and produces a backup that you can load directly into your target database. Streaming backups are handy as well: you can perform an online backup and stream the binary data to the target database using socat or netcat. This shortens the migration time, since the data moves to the target destination as a direct stream.
The most common approach to data migration with these tools is to copy the data from the source and stream it to the target destination. Once on the target database destination, you prepare the binary backup with the --prepare option, which applies the logs recorded while the backup was being taken, so the data is exactly as it was at the point in time of the backup. Then set up the target database destination as a replica or slave of the existing source cluster, replicating all the changes and transactions that have occurred on the main cluster since.
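The stream-and-prepare flow can be sketched roughly as follows; hosts, ports, users, and paths are placeholders, and with MariaDB the mariabackup binary takes largely equivalent options:

```shell
# On the target: listen and unpack the incoming xbstream into the
# future data directory.
socat -u TCP-LISTEN:9999,reuseaddr STDOUT | xbstream -x -C /var/lib/mysql-migrate

# On the source: take a hot backup and stream it straight to the target.
xtrabackup --backup --stream=xbstream \
  --user=backup_user --password='secret' \
  | socat -u STDIN TCP:target-db:9999

# On the target: apply the redo logs so the data directory is consistent.
xtrabackup --prepare --target-dir=/var/lib/mysql-migrate

# Binlog coordinates for setting the target up as a replica:
cat /var/lib/mysql-migrate/xtrabackup_binlog_info
```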
Of course, these tools have limitations as well; database administrators must know how to use them and how to throttle and customize their usage for the intended purpose. You might not want to bog down your source database if it is handling heavy traffic or a large processing load at the time. Another limitation is that they require a homogeneous setup in which the target is a Linux-compatible system and not a Windows-type environment, since Percona XtraBackup and MariaDB Backup operate only on Linux.
Database Schema Migration Tools
Database migration is not just a matter of picking a specific tool, running a specific task, and calling it done. There are a lot of considerations and underlying subsequent tasks that have to be done to fulfill a complete database migration. One of these is the schema migration. A schema in MySQL/MariaDB is a collection of data consisting of a group of tables with their columns and rows, events, triggers, stored procedures or routines, and functions. There are occasions when you might only want to migrate a schema, or only a table. Say a specific table in a schema requires a change in its structure, which requires a DDL statement. The problem is that running a direct DDL statement such as ALTER TABLE … ENGINE=InnoDB blocks any incoming transactions or connections that refer to or use the target table. For huge tables with long data definitions, this adds a real challenge and complicates things further, especially if the table is a hot table. In a database migration, it can likewise be hard to copy an exact full copy of a table from the source without downtime. So let’s see what the schema migration tools are.
pt-online-schema-change
pt-online-schema-change is part of the famous Percona Toolkit, which originally derived from Maatkit and Aspersa. This tool is very useful when performing a table definition change, especially on a hot table holding a huge amount of data. A common yet naive approach to a table definition change is simply to run ALTER TABLE. Although that can suffice, an ALTER TABLE that cannot use ALGORITHM=INPLACE causes a full table copy and acquires a full metadata lock, which means your database connections can pile up and stay locked for a long period of time, especially if the table is huge. This tool was built to solve that problem. It is also very beneficial during a database migration when, say, an inconsistent copy of a hot table with very large data is detected on your already set-up target database destination. Instead of taking a fresh backup, either logical or binary/physical, pt-online-schema-change can be used to copy the rows from the source table to the target table chunk by chunk. You can customize the command with the proper parameters depending on your requirements.
Aside from copying in chunks, pt-online-schema-change also uses triggers. By using triggers, any subsequent or ongoing traffic that applies changes to the reference table is also copied to the target database, which acts as a replica of the current source database cluster. This copies all data exactly as the source database has it, down to your target database lying on a different platform, for example. Triggers are applicable to MySQL and MariaDB as long as the table's engine is InnoDB and the table has a primary key, which is a requirement. You may know that InnoDB uses a row-locking mechanism, which allows pt-online-schema-change to copy a chunk (a group of selected records) at a time and apply the INSERT statements to the target table. The target table is a dummy table that acts as the soon-to-be replacement of the existing source table. pt-online-schema-change allows the user to either remove the dummy table or leave it in place until the administrator is ready to remove it. Take note that dropping or removing a table acquires a metadata lock. Since it uses triggers, any subsequent changes are copied exactly to the target table, leaving no discrepancy on the target or dummy table.
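A hedged example of such a chunked online ALTER; the host, schema, table, and credentials are placeholders:

```shell
# Dry run first: checks prerequisites and builds the plan without
# copying any rows.
pt-online-schema-change --dry-run \
  --alter "ENGINE=InnoDB" \
  D=app_db,t=orders,h=source-db,u=admin_user,p=secret

# Real run: copy chunk by chunk, throttling so replicas do not lag
# more than 1 second behind.
pt-online-schema-change --execute \
  --alter "ENGINE=InnoDB" \
  --chunk-size=1000 --max-lag=1 \
  D=app_db,t=orders,h=source-db,u=admin_user,p=secret
```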
gh-ost
gh-ost shares the same goal as pt-online-schema-change but approaches it differently. I would say this schema migration tool targets the production impediments that can slow your database down and possibly wedge it, sending your database cluster into maintenance mode or down for an unknown period of time until the problem is solved. This problem is usually caused by triggers. If you have a busy or hot table undergoing a schema change or table definition change, triggers can cause your database to pile up due to lock contention. MySQL/MariaDB triggers allow your database to define triggers for INSERT, UPDATE, and DELETE. If the target table is a hotspot, this can end up nasty: your database gets slower and slower until it is stuck, unless you can kill the incoming queries or, better, remove the triggers, but that is not the ideal approach.
Because of those issues, gh-ost addresses that problem without triggers. It acts as if it were a binary log server, reading the incoming events or transactions from the binary logs, specifically in RBR (Row Based Replication) format. As a result, it is very safe and there is less impact to worry about. You also have the option to do a test or dry run (as with pt-online-schema-change), but testing directly on a replica or slave node. This is perfect if you want to play around and verify the exact copy on your target database during migration.
This tool is very flexible to your needs and gives assurance that your cluster will not get stuck, or worse, end up needing a failover or data recovery. For more information on this tool, I suggest reading this post on GitHub by Shlomi Noach.
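A sketch of a triggerless migration with gh-ost, first tested on a replica and then executed for real; the hosts, credentials, and table names are placeholders:

```shell
# Test the migration on a replica only; gh-ost stops replication there,
# runs the migration, and leaves the tables for you to compare.
gh-ost \
  --host=replica-db --user=admin_user --password='secret' \
  --database=app_db --table=orders \
  --alter "ENGINE=InnoDB" \
  --test-on-replica

# Real run: read row events from the replica's binary logs, apply changes
# on the master, and cut over once the ghost table has caught up.
gh-ost \
  --host=replica-db --user=admin_user --password='secret' \
  --database=app_db --table=orders \
  --alter "ENGINE=InnoDB" \
  --max-lag-millis=1500 \
  --execute
```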
Other OSC Tools
I would say those two tools are the more recommendable approach, but there are other alternatives you can try. Most of these tools apply MySQL/MariaDB triggers, so they somewhat share the same concept as pt-online-schema-change. Here’s the list:
- LHM – Rails style database migrations are a useful way to evolve your data schema in an agile manner. Most Rails projects start like this, and at first, making changes is fast and easy.
- OnlineSchemaChange – Created and initiated by Facebook. This tool is used for making schema changes for MySQL tables in a non-blocking way.
- TableMigrator – Initiated by Serious Business and ex-employees of Twitter. This tool shares the same principle of zero-downtime migrations of large tables in MySQL. It is implemented using Rails, so it can be useful if you have a Ruby-on-Rails application environment.
- oak-online-alter-table – an older tool created by Shlomi Noach. It takes roughly the same approach as pt-online-schema-change and performs a non-blocking ALTER TABLE operation.
Database Migration Wizard Tools
There are a few migration tools that offer free usage, which is very beneficial to some extent. What’s most advantageous about migration wizard tools is their GUI, which gives you the convenience of seeing the current structure or simply following the steps the UI provides during migration. There are numerous services and wizard tools, but many are neither open source nor free. Of course, a database migration is a complex yet systematic process, but in some cases it requires a lot of work and effort. Let’s take a look at the free tools.
MySQL Workbench
As the name suggests, it’s for MySQL and derivative databases, such as Percona Server, and can be useful when it comes to database migration. Since MariaDB has shifted to quite a different route, especially since version 10.2, there are some incompatibility issues you might encounter if you attempt to use Workbench with a MariaDB source or target. Workbench can be used for heterogeneous migrations, such as taking data coming from different source databases and dumping it into MySQL.
MySQL Workbench comes in community and enterprise versions; the community version is freely available under the GPL, which you can find here: https://github.com/mysql/mysql-workbench. As the documentation states, MySQL Workbench allows you to migrate tables, objects, and data from Microsoft SQL Server, Microsoft Access, Sybase ASE, SQLite, SQL Anywhere, PostgreSQL, and other RDBMSs to MySQL. Migration also supports moving from earlier versions of MySQL to the latest releases.
phpMyAdmin
For those working as web developers on the LAMP stack, this tool is unsurprisingly one of the Swiss army knives for database tasks. phpMyAdmin is a free software tool written in PHP, intended to handle the administration of MySQL over the web. phpMyAdmin supports a wide range of operations on MySQL and MariaDB. Frequently used operations (managing databases, tables, columns, relations, indexes, users, permissions, etc.) can be performed via the user interface, while you still have the ability to directly execute any SQL statement.
Its import and export features are quite simple, but they get the job done. For larger and more complex migrations, though, this might not suffice for your needs.
HeidiSQL
HeidiSQL is free software that aims to be easy to learn. “Heidi” lets you see and edit data and structures from computers running one of the database systems MariaDB, MySQL, Microsoft SQL Server, PostgreSQL, or SQLite. Invented in 2002 by Ansgar, HeidiSQL is among the most popular tools for MariaDB and MySQL worldwide.
For migration purposes, it allows you to export from one server/database directly to another server/database. It also has import features for text files such as CSV, and can export table rows into a wide range of supported file types such as CSV, HTML, XML, SQL, LaTeX, Wiki Markup, and PHP Array. Although it is built for database server administration, you can use it for simple migration work.
Percona Toolkit As Your Swiss Army Knife
Percona Toolkit is notable software distributed as open source under the GPL. Percona Toolkit is a collection of advanced command-line tools commonly used internally by Percona, but also applicable to any database-related work, especially on MySQL/MariaDB servers.
So how and why is it helpful for data migration, especially MySQL/MariaDB migrations? The toolkit contains a number of tools that are beneficial to use during and after migration.
As mentioned earlier, a common approach to data migration is to have the target destination server act as a replica of the main source database cluster, in a homogeneous setup. This means that if you are moving from on-prem to a public cloud provider, you can set up an elected node on that platform, and this node will replicate all transactions from the main cluster. Using backup tools, you may be able to achieve this type of migration setup. But it doesn’t end there. Percona Toolkit has the pt-table-checksum/pt-table-sync tools, for example, to help you identify data inconsistencies between the on-prem and target destination database servers. With pt-table-checksum, you can perform checksum calculations over a series of chunks for all databases, or selectively checksum particular databases, particular tables, or even a range of records of a table. pt-table-sync can then be used to perform data synchronization, so your target databases are refreshed with a fresh copy of the exact data from the main source cluster.
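This verify-then-repair cycle might be sketched as follows; the hosts, credentials, and checksum table location below are placeholders:

```shell
# Run on the source (master). Checksums propagate to replicas through
# replication; differing chunks show up in the DIFFS column.
pt-table-checksum h=source-db,u=checksum_user,p=secret \
  --databases=app_db --replicate=percona.checksums

# Report only the chunks that differ on the replicas.
pt-table-checksum h=source-db,u=checksum_user,p=secret \
  --replicate=percona.checksums --replicate-check-only

# Re-sync replicas from the master based on the recorded checksums.
# Review with --print first; switch to --execute to apply the changes.
pt-table-sync --replicate percona.checksums \
  h=source-db,u=checksum_user,p=secret --print
```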
On the other hand, pt-upgrade is very useful after the migration from backup tools is performed. With pt-upgrade, you can perform an analysis by running a set of queries, for example from a slow query log file, against both the source database and the target database server and comparing the results.
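A sketch of such a comparison run; the hosts, credentials, and log path are placeholders:

```shell
# Replay the queries from a slow log against both servers and report
# differences in results, errors, and query times.
pt-upgrade /var/log/mysql/slow.log \
  h=source-db,u=qa_user,p=secret \
  h=target-db,u=qa_user,p=secret > pt_upgrade_report.txt
```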
Database migration, especially in a heterogeneous setup, can be very complicated. In a homogeneous setup it can be quite straightforward, regardless of whether the data is large or small, as long as you are equipped with the proper tools and, of course, the correct systematic approach to determine that the migration is complete and the data consistent. There can be times when a migration requires consultation with experts, but it’s always a great start to try these open source tools to achieve your desired database migration task.