Handling Replication Issues from non-GTID to GTID MariaDB Database Clusters

Krzysztof Ksiazek

We recently ran into an interesting customer support case involving a MariaDB replication setup. We spent a lot of time researching this problem and thought it would be worth sharing this with you in this blog post.

Customer’s Environment Description

The issue was as follows: an old (pre 10.x) MariaDB server was in use and an attempt was made to migrate data from it into more recent MariaDB replication setup. This resulted in issues with using Mariabackup to rebuild slaves in the new replication cluster. For the purpose of the tests we recreated this behavior in the following environment:

The data has been migrated from 5.5 to 10.4 using mysqldump:

mysqldump --single-transaction --master-data=2 --events --routines sbtest > /root/dump.sql

This allowed us to collect master binary log coordinates and the consistent dump. As a result, we were able to provision MariaDB 10.4 master node and set up the replication between old 5.5 master and new 10.4 node. The traffic was still running on 5.5 node. 10.4 master was generating GTID’s as it had to replicate data to 10.4 slave. Before we dig into details, let's take a quick look into how GTID’s work in MariaDB.

MariaDB and GTID

For starters, MariaDB uses a different format of the GTID than Oracle MySQL. It consists of three numbers separated by dashes:

0 - 1 - 345

First is a replication domain, which allows for multi-source replication to be properly handled. This is not relevant to our case as all the nodes are in the same replication domain. Second number is the server ID of the node that generated the GTID. Third one is the sequence number - it monotonically increases with every event stored in the binary logs.

MariaDB uses several variables to store the information about GTID’s executed on a given node. The most interesting for us are:

Gtid_binlog_pos - as per the documentation, this variable is the GTID of the last event group written to the binary log.

Gtid_slave_pos - as per the documentation, this system variable contains the GTID of the last transaction applied to the database by the server's slave threads.

Gtid_current_pos - as per the documentation, this system variable contains the GTID of the last transaction applied to the database. If the server_id of the corresponding GTID in gtid_binlog_pos is equal to the servers own server_id, and the sequence number is higher than the corresponding GTID in gtid_slave_pos, then the GTID from gtid_binlog_pos will be used. Otherwise the GTID from gtid_slave_pos will be used for that domain.

So, to make it clear, gtid_binlog_pos stores GTID of the last locally executed event. Gtid_slave_pos stores GTID of the event executed by the slave thread and gtid_current_pos shows either the value from gtid_binlog_pos, if it has the highest sequence number and it has server-id or gtid_slave_pos if it has the highest sequence. Please keep this in your mind.

An Overview of the Issue

The initial state of the relevant variables are on 10.4 master:

MariaDB [(none)]> show global variables like '%gtid%';

+-------------------------+----------+

| Variable_name           | Value |

+-------------------------+----------+

| gtid_binlog_pos         | 0-1001-1 |

| gtid_binlog_state       | 0-1001-1 |

| gtid_cleanup_batch_size | 64       |

| gtid_current_pos        | 0-1001-1 |

| gtid_domain_id          | 0 |

| gtid_ignore_duplicates  | ON |

| gtid_pos_auto_engines   | |

| gtid_slave_pos          | 0-1001-1 |

| gtid_strict_mode        | ON |

| wsrep_gtid_domain_id    | 0 |

| wsrep_gtid_mode         | OFF |

+-------------------------+----------+

11 rows in set (0.001 sec)

Please note gtid_slave_pos which, theoretically, doesn’t make sense - it came from the same node but via slave thread. This could happen if you make a master switch before. We did just that - having two 10.4 nodes we switched the masters from host with server ID of 1001 to host with server ID of 1002 and then back to 1001.

Afterwards we configured the replication from 5.5 to 10.4 and this is how things looked like:

MariaDB [(none)]> show global variables like '%gtid%';

+-------------------------+-------------------------+

| Variable_name           | Value |

+-------------------------+-------------------------+

| gtid_binlog_pos         | 0-55-117029 |

| gtid_binlog_state       | 0-1001-1537,0-55-117029 |

| gtid_cleanup_batch_size | 64                      |

| gtid_current_pos        | 0-1001-1 |

| gtid_domain_id          | 0 |

| gtid_ignore_duplicates  | ON |

| gtid_pos_auto_engines   | |

| gtid_slave_pos          | 0-1001-1 |

| gtid_strict_mode        | ON |

| wsrep_gtid_domain_id    | 0 |

| wsrep_gtid_mode         | OFF |

+-------------------------+-------------------------+

11 rows in set (0.000 sec)

As you can see, the events replicated from MariaDB 5.5, they all have been accounted for in gtid_binlog_pos variable: all events with server ID of 55. This results in a serious issue. As you may remember, gtid_binlog_pos should contain events executed locally on the host. Here it contains events replicated from another server with different server ID.

This makes things dicey when you want to rebuild the 10.4 slave, here’s why. Mariabackup, just like Xtrabackup, works in a simple way. It copies the files from the MariaDB server while scanning redo logs and storing any incoming transactions. When the files have been copied, Mariabackup would freeze the database using either FLUSH TABLES WITH READ LOCK or backup locks, depending on the MariaDB version and the availability of the backup locks. Then it reads the latest executed GTID and stores it alongside the backup. Then the lock is released and backup is completed. The GTID stored in the backup should be used as the latest executed GTID on a node. In case of rebuilding slaves it will be put as a gtid_slave_pos and then used to start the GTID replication. This GTID is taken from gtid_current_pos, which makes perfect sense - after all it is the “GTID of the last transaction applied to the database”. Acute reader can already see the problem. Let’s show the output of the variables when 10.4 replicates from the 5.5 master:

MariaDB [(none)]> show global variables like '%gtid%';

+-------------------------+-------------------------+

| Variable_name           | Value |

+-------------------------+-------------------------+

| gtid_binlog_pos         | 0-55-117029 |

| gtid_binlog_state       | 0-1001-1537,0-55-117029 |

| gtid_cleanup_batch_size | 64                      |

| gtid_current_pos        | 0-1001-1 |

| gtid_domain_id          | 0 |

| gtid_ignore_duplicates  | ON |

| gtid_pos_auto_engines   | |

| gtid_slave_pos          | 0-1001-1 |

| gtid_strict_mode        | ON |

| wsrep_gtid_domain_id    | 0 |

| wsrep_gtid_mode         | OFF |

+-------------------------+-------------------------+

11 rows in set (0.000 sec)

Gtid_current_pos is set to 0-1001-1. This is definitely not the correct moment in time, it’s taken from gtid_slave_pos while we have a bunch of transactions that came from 5.5 after that. The problem is that those transactions are stored as gtid_binlog_pos. On the other hand gtid_current_pos is calculated in a way that it requires local server ID for GTID’s in gitd_binlog_pos before they can be used as the gtid_current_pos. In our case they have the server ID of the 5.5 node so they will not be treated properly as events executed on the 10.4 master. After backup restore, if you’d set the slave according to the GTID state stored in the backup, it would end up re-applying all the events that came from 5.5. This, obviously, would break the replication.

The Solution

A solution to this problem is to take several additional steps:

  1. Stop the replication from 5.5 to 10.4. Run STOP SLAVE on 10.4 master
  2. Execute any transaction on 10.4 - CREATE SCHEMA IF NOT EXISTS bugfix - this will change the GTID situation like this:
MariaDB [(none)]> show global variables like '%gtid%';

+-------------------------+---------------------------+

| Variable_name           | Value   |

+-------------------------+---------------------------+

| gtid_binlog_pos         | 0-1001-117122   |

| gtid_binlog_state       | 0-55-117121,0-1001-117122 |

| gtid_cleanup_batch_size | 64                        |

| gtid_current_pos        | 0-1001-117122   |

| gtid_domain_id          | 0   |

| gtid_ignore_duplicates  | ON   |

| gtid_pos_auto_engines   |   |

| gtid_slave_pos          | 0-1001-1   |

| gtid_strict_mode        | ON   |

| wsrep_gtid_domain_id    | 0   |

| wsrep_gtid_mode         | OFF   |

+-------------------------+---------------------------+

11 rows in set (0.001 sec)

The latest GITD was executed locally, so it was stored as gtid_binlog_pos. As it has the local server ID, it’s picked as the gtid_current_pos. Now, you can take a backup and use it to rebuild slaves off 10.4 master. Once this is done, start the slave thread again.

MariaDB is aware that this kind of bug exists, one of the relevant bug report we found is: https://jira.mariadb.org/browse/MDEV-10279 Unfortunately, there’s no fix so far. What we found is that this issue affects MariaDB up to 5.5. Non-GTID events that come from MariaDB 10.0 are correctly accounted on 10.4 as coming from the slave thread and gtid_slave_pos is properly updated. MariaDB 5.5 is quite an old one (even though it still supported) so you still may see setups running on it and attempts to migrate from 5.5 to more recent, GTID-enabled MariaDB versions. What’s worse, according to the bug report we found, this also affects replication coming from non-MariaDB (one of the comments mentions issue showing up on Percona Server 5.6) servers into MariaDB. 

Anyway, we hope you found this blog post useful and hopefully you will not run into the problem we just described.

 
ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.