Severalnines Blog
The automation and management blog for open source databases

Online schema change for MySQL & MariaDB - comparing GitHub’s gh-ost vs pt-online-schema-change

Krzysztof Ksiazek
Posted in:

Database schema change is one of the most common activities that a MySQL DBA has to tackle. No matter if you use MySQL Replication or Galera Cluster, direct DDL’s are troublesome and, sometimes, not feasible to execute. Add the requirement to perform the change while all databases are online, and it can get pretty daunting.

Thankfully, online schema tools are there to help DBAs deal with this problem. Arguably, the most popular of them is Percona’s pt-online-schema-change, which is part of Percona Toolkit.

It has been used by MySQL DBAs for years and is proven as a flexible and reliable tool. Unfortunately, not without drawbacks.

To understand these, we need to understand how it works internally.

How does pt-online-schema-change work?

Pt-online-schema-change works in a very simple way. It creates a temporary table with the desired new schema - for instance, if we added an index, or removed a column from a table. Then, it creates triggers on the old table - those triggers are there to mirror changes that happen on the original table to the new table. Changes are mirrored during the schema change process. If a row is added to the original table, it is also added to the new one. Likewise if a row is modified or deleted on the old table, it is also applied on the new table. Then, a background process of copying data (using LOW_PRIORITY INSERT) between old and new table begins. Once data has been copied, RENAME TABLE is executed to rename “yourtable” into “yourtable_old” and “yourtable_new” into “yourtable”. This is an atomic operation and in case something goes wrong, it is possible to recover the old table.

The process described above has some limitations. For starters, it is not possible to reduce the overhead of the tool to 0. Pt-online-schema-change gives you an option to define the maximum allowed replication lag and, if that threshold is crossed, it stops to copy data between the old and new table. It is also possible to pause the background process entirely. The problem is that we are talking only about the background process of running INSERTs. It is not possible to reduce the overhead caused by the fact that every operation in “yourtable” is duplicated in “yourtable_new” through triggers. If you remove the triggers, the old and new table would go out of sync without any means to sync them again. Therefore, when you run pt-online-schema-change on your system, it always adds some overhead, even if it is paused or throttled. How big overhead depends on how many writes hit the table which is undergoing a schema change.

Another issue is caused again by triggers - precisely by the fact that, to create triggers, one has to acquire a lock on MySQL’s metadata. This can become a serious problem if you have highly concurrent traffic or if you use longer transactions. Under such load, it may be virtually impossible (and we’ve seen such databases) to use pt-online-schema-change due to the fact that it is not able to acquire metadata lock to create the required triggers. Additionally, the process of acquiring metadata can also lock further transactions, basically grinding all database operations to halt.

Yet another problem are foreign keys - unfortunately, there is no simple way of handling them. Pt-online-schema-change gives you two methods to approach this issue. Neither of those are really good. The main issue here is that a foreign key of a given name can only refer to a single table and it sticks to it - even if you rename the table referred to, the foreign key will follow this change. This leads to the problem: after RENAME TABLE, the foreign key will point to ‘yourtable_old’, not ‘yourtable’.

One workaround is to not use:

RENAME TABLE ‘yourtable’ TO ‘yourtable_old’, ‘yourtable_new’ TO ‘yourtable’;

Instead, use a two step approach:

DROP TABLE ‘yourtable’; RENAME TABLE ‘yourtable_new’ TO ‘yourtable’;

This poses a serious problem. If for some reason, RENAME TABLE won’t work, there’s no going back as the original table has been already dropped.

Another approach would be to create a second foreign key, under a different name, which refers to ‘yourtable_new’. After RENAME TABLE, it will point to ‘yourtable’, which is exactly what we want. Thing is, you need to execute a direct ALTER to create such foreign key - which kind of invalidates the point of using online schema change - to avoid direct alters. If the altered table is large, such operation is not feasible to execute on Galera Cluster (cluster-wide stall caused by TOI) and MySQL replication cluster (slave lag induced by serialized ALTER).

As you can see, while being a useful tool, pt-online-schema-change has serious limitations which you need to be aware of before you use it. If you use MySQL at scale, limitations may become a serious motivation to do something about it.

Introducing GitHub’s gh-ost

Motivation alone is not enough - you also need resources to create a new solution. GitHub recently released gh-ost, their take on online schema change. Let’s take a look at how it compares to Percona’s pt-online-schema-change and how it can be used to avoid some of its limitations.

To understand better what is the difference between both tools, let’s take a look at how gh-ost works.

Gh-ost creates a temporary table with the altered schema, just like pt-online-schema-change does - it uses “_yourtable_gho” pattern. It executes INSERT queries which use the following pattern to copy data from old to new table:

insert /* gh-ost `sbtest1`.`sbtest1` */ ignore into `sbtest1`.`_sbtest1_gho` (`id`, `k`, `c`, `pad`)
      (select `id`, `k`, `c`, `pad` from `sbtest1`.`sbtest1` force index (`PRIMARY`)
        where (((`id` > ?)) and ((`id` < ?) or ((`id` = ?)))) lock in share mode

As you can see, it is a variation of INSERT INTO new_table  SELECT * FROM old_table. It uses primary key to split data in chunks and then work on them.

In pt-online-schema-change, the current traffic was handled using triggers. Gh-ost uses a triggerless approach - it uses binary logs to track and apply changes which happened since gh-ost started to copy data. It connects to one of the hosts, by default it is one of the slaves, simulates that it is a slave itself and asks for binary logs.

This behavior has a couple of repercussions. First of all, network traffic is increased compared to pt-online-schema-change - not only gh-ost has to copy data but it also has to copy binary logs.

It also requires binary logs in row-based format for full data consistency - if you use statement or mixed replication, gh-ost won’t work in your setup. As a workaround, you can create a new slave, enable log_slave_updates and set it to store events in row format. Reading data from a slave is, actually, the default way in which gh-ost operates - it makes perfect sense as pulling binary logs adds some overhead and if you can avoid additional overhead on the master, you most likely want to do it. Of course, if your master uses row-based replication format, you can force gh-ost to connect to it and get binary logs.

What is good about this design is that you don’t have to create triggers, which, as we discussed, could become a serious problem or even a blocker. What is also great is that you can always stop parsing binary logs - it’s like you’d just run STOP SLAVE. You have the binlog coordinates so you can easily start in the same position later on. This makes it possible to stop practically all operations executed by gh-ost. Not only the background process of copying data from old to new table, but also any load related to keeping the new table in sync with the old one. This is a great feature in a production environment - pt-online-schema-change requires constant monitoring as you can only estimate the additional load on the system. Even if you paused it, it will still add some overhead and, under heavy load, this overhead may result in an unstable database. On the other hand, with gh-ost, you can just pause the whole process and the workload pattern will go back to what you are used to see - no additional load whatsoever related to the schema change. This is really great - it means you can start the migration at 9am, when you start your day, stop it at 5pm when you are leaving your office. You can be sure that you won’t get paged late at night because the paused schema change process is not actually 100% paused, and is causing problems to your production systems.

Unfortunately, gh-ost is not without drawbacks. For starters, foreign keys. Pt-online-schema-change does not provide any good way of altering tables which contain foreign keys. It is still way better than gh-ost as gh-ost does not support foreign keys at all. At the moment of writing, that is - it may change in the future. Triggers - gh-ost, at the moment of writing, does not support triggers at all. The same is true for pt-online-schema-change - it was a limitation of pre-5.7 MySQL where you couldn’t have more than one trigger of a given type defined in a table (and pt-online-schema-change had to create them for its own purposes). Even if the limitation is removed in MySQL 5.7, pt-online-schema-change still does not support tables with triggers.

One of the main limitations of gh-ost is, definitely, the fact that it does not support Galera Cluster. It is because of how gh-ost performs a table switch - it uses LOCK TABLE which do not work well with Galera - as of now there is no known fix or workaround for this issue and this leaves pt-online-schema-change as the only option for Galera Cluster.

These are probably the most important  limitations of gh-ost, but there are more of them. Minimal row image is not supported (which makes your binlogs grow larger), JSON and generated columns in 5.7 are not supported. Migration key must not contain NULL values, there are limitations when it comes to mixed cases in table names. You can find more details on all requirements and limitations of gh-ost in its documentation.

In our next blog post we will take a look at how gh-ost operates, how you can test your changes and how to perform it. We will also discuss throttling of gh-ost.

Related Post

MySQL on Docker: Composing the Stack

Docker 1.13 introduces a long-awaited feature called Compose-file support. Compose-file defines everything about an application - services, databases, volumes, networks, and dependencies can all be defined in one place.  In this blog, we’ll show you how to use Compose-file to simplify the Docker deployment of MySQL containers.

Posted in:

How to deploy and manage MySQL multi-master replication setups with ClusterControl 1.4

MySQL replication, while simple and popular, may come in different shapes and flavors. Master slave or master master topologies can be configured to suit your environment.  ClusterControl 1.4 brings a list of enhancements to deploy and manage different types of MySQL replication setups. This blog outlines the different topologies that can be deployed, the merits of each topology, and shows how each can be managed in a live environment.

Posted in:

Automatic failover of MySQL Replication - New in ClusterControl 1.4

MySQL replication setups are inevitably related to failovers - what do you do when your master fails and your applications are not able to write to the database anymore? Automated failover is required if you need to quickly recover an environment to keep your database up 24x7. This blog post discusses this new replication feature recently introduced in ClusterControl 1.4.

Posted in:

Automating MySQL Replication with ClusterControl 1.4.0 - what’s new

This blog post will go through new replication features in ClusterControl 1.4.0, including enhanced multi-master deployment, managing replication topology changes, automated failover and handling of replication errors.

Posted in: