Severalnines Blog
The automation and management blog for open source databases

MySQL slave promotion with- and without using GTID

Severalnines
Posted in:

MySQL Replication can be fragile: whenever it encounters a connectivity error, it will retry and if it is a serious error, it will simply stop. Obviously in the latter case, you will need to repair the broken replication yourself.

The most common problems with replication are when it stops due to either master failure or network problems. In case the master fails, the whole topology becomes read-only and this means the write queries cannot be applied anymore. This is where normally you would promote one of the replicas to become the new master. To illustrate the difference in promotion between GTID and non-GTID cases, we will go through the manual promotion process below.

Most advanced slave without GTID

The first step in this promotion is to find the most advanced slave. As the master is no longer available, not all replicas may have copied and applied the same amount of transactions. So it is key to find the most advanced slave first.

We first iterate through all replicas to see which one is the furthest in the last binary log and elect this host to become the new master.

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.12.11
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 1447420
...
          Exec_Master_Log_Pos: 1447420

The next step is to advance the other replicas to the latest transactions on the candidate master. As the replicas are logging their slave updates in their own binary logs, they have a different numbering for their own transactions and thus it is very difficult to match this data. An automated tool like MySQL HA Master (MHA) is capable of doing this, so when you are failing over by hand, you generally scan through the binary logs or skip these transactions.

Once we have done this, we tell the replicas to start replicating from the designated points of the new master.

CHANGE MASTER TO 
  MASTER_HOST = 'new.master',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl',
  MASTER_LOG_FILE = 'binlog.000002',
  MASTER_LOG_POS = 1446089;

Most advanced slave with GTID

By far the greatest benefit of using GTIDs within replication is that within the replication topology, all we have to do is to find the most advanced slave, promote it to master, and point the others to this new master.

So the most advanced slave is the same as without GTID:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.12.14
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 1590
...
          Exec_Master_Log_Pos: 1590

Finding out which part of the binary logs the other hosts are missing is not necessary, as the new master’s binary logs already contain transactions with the GTIDs of the dead master and thus the slaves can realign with the new master automatically. This applies to both MariaDB and MySQL implementations of GTID.

CHANGE MASTER TO 
  MASTER_HOST = new.master',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl',
  MASTER_AUTO_POSITION = 1;

As you can see this is a far more reliable way of promoting a slave to a master without the chance of loss of transactions. Therefore GTID failover is the preferred way in ClusterControl.

To learn more about MySQL Replication and failover, check out our MySQL Replication Blueprint eBook  for a more comprehensive walkthrough. Apart from failover, it covers installation and configuration best practices, what you should monitor, how to make topology changes and repair broken setups, backup strategies, schema changes and load balancing with read-write splitting.