How to Recover MySQL Galera Cluster From an Asynchronous Slave

Sebastian Insausti

Introduction

When running Galera Cluster, it is a common practice to add one or more asynchronous slaves in the same or in a different datacenter. This gives us a contingency plan with a low RTO and a low operating cost. In the case of an unrecoverable problem in our cluster, we can quickly fail over to it so applications can continue to have access to the data.

With this type of setup, once the cluster fails we cannot simply rebuild it from a previous backup. Since the async slave is now the new source of truth, we need to rebuild the cluster from it.

This is not the only way to do it, and maybe there is even a better one! Feel free to give us your suggestions in the comments section at the end of this post.

Topology

ClusterControl Topology View Online

Above, we can see a sample topology with Galera Cluster and an asynchronous replica/slave.

Database Diagram 1

Next, we will see how we can recreate our cluster, starting from the slave, if we find ourselves with something like this:

Database Diagram 2

ClusterControl Topology View Offline

If we look at the previous image, we can see our 3 Galera nodes are down. Our slave is not able to connect to the Galera master, but it is in an “Up and running” state.

Promote Slave

As our slave is working properly, we can promote it to master and point our applications to it. For this, we must disable the read-only parameter in our slave and reset the slave configuration.

In our slave (mysql1):

mysql> SET GLOBAL read_only=0;
Query OK, 0 rows affected (0.00 sec)
mysql> STOP SLAVE;
Query OK, 0 rows affected (0.00 sec)
mysql> RESET SLAVE;
Query OK, 0 rows affected (0.18 sec)
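
Just as a quick sanity check (not strictly required), we can confirm that the node now accepts writes and that replication is stopped:

mysql> SELECT @@global.read_only;    # Should now return 0
mysql> SHOW SLAVE STATUS\G           # Slave_IO_Running and Slave_SQL_Running should be No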

Create new Cluster

Next, to start recovery of our failed cluster, we will create a new Galera Cluster. This can be easily done through ClusterControl; please scroll further down in this blog to see how.

Once we have deployed our new Galera cluster, we would have something like the following:

Database Diagram 3

Replication

We must ensure that we have the replication parameters configured.

For Galera nodes (galera1, galera2, galera3):

server_id=         # Different value in each node
binlog_format=ROW
log_bin = /var/lib/mysql-binlog/binlog
log_slave_updates = ON
gtid_mode = ON
enforce_gtid_consistency = true
relay_log = relay-bin
expire_logs_days = 7

For Master node (mysql1):

server_id=         # Different value in each node
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
gtid_mode=ON
enforce_gtid_consistency=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON
sync_binlog=1
report_host=    # Local server
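
Once these parameters are in place (some of them require a MySQL restart to take effect), a quick sanity check on each node could be, for example:

mysql> SHOW VARIABLES LIKE 'gtid_mode';
mysql> SHOW VARIABLES LIKE 'log_bin';
mysql> SHOW MASTER STATUS;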

In order for our new slave (galera1) to connect with our new master (mysql1), we must create a user with replication permissions in our master.

In our new master (mysql1):

mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'%' IDENTIFIED BY 'slave_password';

Note: We can replace the “%” with the IP of the Galera Cluster node that will be our slave, in our example, galera1.

Backup

If we do not already have one, we must create a consistent backup of our master (mysql1) and load it into our new Galera Cluster. For this, we can use the XtraBackup tool or mysqldump. Let's see both options.

In our example we use the sakila database available for testing.

XtraBackup tool

We generate the backup in the new master (mysql1). In our case we send it to the local directory /root/backup:

$ innobackupex /root/backup/

We should see the following message:

180705 22:08:14 completed OK!

We compress the backup and send it to the node that will be our slave (galera1):

$ cd /root/backup
$ tar zcvf 2018-07-05_22-08-07.tar.gz 2018-07-05_22-08-07
$ scp /root/backup/2018-07-05_22-08-07.tar.gz galera1:/root/backup/

In galera1, extract the backup:

$ tar zxvf /root/backup/2018-07-05_22-08-07.tar.gz

We stop the cluster (if it is running). To do this, we stop the MySQL service on all 3 nodes:

$ service mysql stop

In galera1, we rename the MySQL data directory and restore the backup:

$ mv /var/lib/mysql /var/lib/mysql.bak
$ innobackupex --copy-back /root/backup/2018-07-05_22-08-07

We should see the following message:

180705 23:00:01 completed OK!

We assign the correct permissions on the data directory:

$ chown -R mysql.mysql /var/lib/mysql

Then we must bootstrap the cluster from this node (galera1).
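
The exact bootstrap command depends on the distribution and version in use (it can also be done from ClusterControl), but it is usually something like one of the following, run only on galera1:

$ galera_new_cluster                          # systemd-based installations
$ service mysql bootstrap                     # some SysV init scripts
$ service mysql start --wsrep-new-cluster     # passing the option directly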

Once the first node is bootstrapped, we must start the MySQL service on the remaining nodes, removing any previous copy of the grastate.dat file first, and then verify that our data is up to date.

$ rm /var/lib/mysql/grastate.dat
$ service mysql start

Note: Verify that the user used by XtraBackup exists on the initialized node and is the same on every node.
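
If that user needs to be recreated, a minimal sketch could be the following (the user name and password here are just examples; the privileges listed are the ones XtraBackup typically requires):

mysql> CREATE USER 'backupuser'@'localhost' IDENTIFIED BY 'backuppassword';
mysql> GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'backupuser'@'localhost';
mysql> FLUSH PRIVILEGES;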

mysqldump

In general, we do not recommend using mysqldump, because it can be quite slow with a large volume of data, but it is an alternative way to perform the task.

We generate the backup in the new master (mysql1):

$ mysqldump -uroot -p --single-transaction --skip-add-locks --triggers --routines --events --databases sakila > /root/backup/sakila_dump.sql

We compress it and send it to our slave node (galera1):

$ gzip /root/backup/sakila_dump.sql
$ scp /root/backup/sakila_dump.sql.gz galera1:/root/backup/

We load the dump into galera1 (the node must be up and running to receive it):

$ gunzip /root/backup/sakila_dump.sql.gz
$ mysql -p < /root/backup/sakila_dump.sql

Once the dump is loaded in galera1, we must restart the MySQL service on the remaining nodes, removing the grastate.dat file first, and verify that our data is up to date.

$ rm /var/lib/mysql/grastate.dat
$ service mysql start

Start Replication Slave

Regardless of which option we choose, XtraBackup or mysqldump, if everything went well, at this point we can enable replication on the node that will be our slave (galera1).

mysql> CHANGE MASTER TO MASTER_HOST = 'mysql1', MASTER_PORT = 3306, MASTER_USER = 'slave_user', MASTER_PASSWORD = 'slave_password', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

We verify that the slave is working:

mysql> SHOW SLAVE STATUS\G
       Slave_IO_Running: Yes
       Slave_SQL_Running: Yes

At this point, we have something like the following:

Database Diagram 4

After NewGalera1 is up to date, we can re-point the application to our new Galera cluster and reconfigure the asynchronous replication.
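
Reconfiguring the asynchronous replication in the opposite direction could look roughly like this, assuming GTID-based replication, a replication user already created on the new cluster, and a new master reachable as newgalera1 (the names are illustrative). In mysql1, our old slave, which goes back to being an asynchronous slave:

mysql> SET GLOBAL read_only=1;
mysql> CHANGE MASTER TO MASTER_HOST = 'newgalera1', MASTER_PORT = 3306, MASTER_USER = 'slave_user', MASTER_PASSWORD = 'slave_password', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;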

ClusterControl

As we mentioned earlier, with ClusterControl we can do several of the tasks mentioned above in a few simple clicks. It also has automatic recovery options, for both the nodes and the cluster. Let's see some tasks that it can assist with.

ClusterControl Deployment 1

To perform a deployment, simply select the option “Deploy Database Cluster” and follow the instructions that appear.

ClusterControl Deployment 2

We can choose between different kinds of technologies and vendors. We must specify the user, key or password, and port to connect by SSH to our servers. We also need a name for our new cluster and to decide whether we want ClusterControl to install the corresponding software and configurations for us.

ClusterControl Deployment 3

After setting up the SSH access information, we must define the nodes of our cluster, that is, add the servers that will form the cluster we are going to create. We can also specify which repository to use.

We can monitor the status of the creation of our new cluster from the ClusterControl activity monitor.

We can also import an existing cluster or database following the same steps. In this case, ClusterControl won’t install the database software, because there is already a database running.

ClusterControl Add Replication Slave

To add a replication slave, you need to click on Cluster Actions, select Add Replication Slave, and add the SSH access information of the new server. ClusterControl will connect to the server to make the necessary configurations for this action.

ClusterControl Enable Binary Logging

To turn one or more Galera nodes into master servers (in the sense of producing binlogs), you can go to Node Actions and select Enable Binary Logging.

ClusterControl Backups

Backups can be configured with XtraBackup (full or incremental) and mysqldump, and you have other options such as uploading the backup to the cloud, encryption, compression, scheduling, and more.

ClusterControl Restore

To restore a backup, go to the Backup tab, choose the Restore option, and then select the server on which you want to restore it.

ClusterControl Change Replication Master

If you have a slave and want to change its master or rebuild the replication, you can go to Node Actions and select the corresponding option.

Conclusion

As we have seen, there are several ways to achieve our goal, some more complex and others more user-friendly, but with any of them you can recreate a cluster from an asynchronous slave. XtraBackup restores faster for larger data volumes. To guard against operator error (e.g., an erroneous DROP TABLE), you can also use a delayed slave, so you hopefully have time to stop the statement from propagating.
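
As a reference, a delayed slave is configured with the MASTER_DELAY option of CHANGE MASTER TO; a minimal sketch (one hour of delay, run on the slave) would be:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_DELAY = 3600;
mysql> START SLAVE;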

We hope that this information is useful, and that you never have to use it in production 😉
