Severalnines Blog
The automation and management blog for open source databases

From the Field - Live Migration from MMM to MariaDB Galera Cluster

Severalnines
Posted in:

This is a guest post from Igor Shevtsov, a MySQL DBA based in Twickenham, UK. Igor has had the opportunity to deploy and manage different types of high availability MySQL solutions, and we invited him to share his experience. Igor can be contacted by email or LinkedIn.

 

Migrating from MMM to MariaDB Galera Multi-Master Replication

Agenda:

  • MySQL HA with MMM
  • Why MMM did not work for us
  • Migration requirements
  • Migration

 

MySQL HA with MMM

Firstly I would like to say few words about our original MMM (Multi Master Management) MySQL setup and some reasons why we decided to start looking for another MySQL HA solution.

We had 2 Data Centres with Active master1 in DC1 and passive master2 and Management node in DC2, both masters were running MariaDB 5.5 server. The clients were unevenly distributed across both DC's. DC's were connected by a VPN that wasn't 100% reliable and that occasionally failed.

 

The simplicity of MMM setup and management as well as speed of failover let us to believe that it was a good choice for MySQL HA. As it usually happens, we did not have much time to test it properly and we quickly realized its limitations after the VPN failed. And of course, it had to fail at night :-( 

 

 

Why MMM did not work for us

The post-mortem:

  • VPN failure caused management node to assign VIP to Master2 and promoted Master2 to active master
  • At this point, we have two masters having the same VIP. Each master is serving requests from clients that are in the same DC.
  • After the VPN was re-established, the management node brought Master1 back into configuration and set it up into a READ ONLY mode but failed to drop network interface with VIP.
  • At this point, a number of clients returned and error “Failed to write to mysql read-only” and we were down at this point with an inconsistent database and a lot to think about.  

Thanks to Robert Hodges for very well explained behaviour of VIP

So, we needed a MySQL HA solution with:

  • no single point of failure
  • ability to read/write to any node
  • simple management
  • good performance via WAN
  • NO data integrity issues
  • automatic node provisioning
  • no need for manual failover or VIP needed

As you see the requirements are quite bold. After a bit of research we decided to move to a Galera Cluster solution as it looked like Galera ticked all the boxes plus it works with MariaDB which would simplify the migration process.

 

Migration requirements

Taking to consideration some Galera limitations, the dataset need to meet some requirements:

  • All tables must be InnoDB
  • All tables must have Primary Key
  • binlog format must be ROW

Thanks to Giuseppe Maxia for a nice check query that helps to identify tables with missing PK as well as tables with FULLTEXT or SPATIAL functionality:

SELECT DISTINCT
       CONCAT(t.table_schema,'.',t.table_name) as tbl,
       t.engine,
       IF(ISNULL(c.constraint_name),'NOPK','') AS nopk,
       IF(s.index_type = 'FULLTEXT','FULLTEXT','') as ftidx,
       IF(s.index_type = 'SPATIAL','SPATIAL','') as gisidx
  FROM information_schema.tables AS t
  LEFT JOIN information_schema.key_column_usage AS c
    ON (t.table_schema = c.constraint_schema AND t.table_name = c.table_name
        AND c.constraint_name = 'PRIMARY')
  LEFT JOIN information_schema.statistics AS s
    ON (t.table_schema = s.table_schema AND t.table_name = s.table_name
        AND s.index_type IN ('FULLTEXT','SPATIAL'))
  WHERE t.table_schema NOT IN ('information_schema','performance_schema','mysql')
    AND t.table_type = 'BASE TABLE'
    AND (t.engine <> 'InnoDB' OR c.constraint_name IS NULL OR s.index_type IN ('FULLTEXT','SPATIAL'))
  ORDER BY t.table_schema,t.table_name;

 

Once identified Primary Key could be added as follows:

To change binlog format to ROW:

Master2 (Passive Master):

STOP SLAVE;

Master1 (Active Master):

FLUSH TABLES WITH READ LOCK;
FLUSH LOGS;
SET GLOBAL binlog_format = 'ROW';
FLUSH LOGS;
UNLOCK TABLES;

Master2:

START SLAVE;
SET GLOBAL binlog_format = 'ROW';

Edit /etc/my.cnf:

binlog_format = ROW

At this point, we are ready to start the migration process.

Migration

Migration steps:

  1. Disable MMM
  2. Upgrade Master2 to Galera node1
  3. Add Galera node2
  4. Failover clients to Galera cluster
  5. Upgrade Master1 to Galera node3

Disable MMM

Here, we need to stop MMM management demon and MMM agents.

On the Management node:

$ /etc/init.d/mysql-mmm-monitor stop

On Master1 and Master2:

$ /etc/init.d/mysql-mmm-agent stop

Upgrade Master2 to first Galera node (Galera1)

On Master2:

1. Stop Replication:

mariadb> STOP SLAVE;

2. Remove existent MariaDB server with YUM:

$ sudo yum remove MariaDB-server MariaDB-client

 

3. Install Galera Cluster with YUM:

$ sudo yum install MariaDB-Galera-server MariaDB-client galera

4. Edit /etc/my.cnf; remove MMM replication specific entries and add Galera entries

REMOVE Master-Master entries:

auto_increment_offset = 2
auto_increment_increment = 2
sync_binlog = 1
read_only = 1

 

Add Galera entries:

wsrep_provider=/usr/lib64/galera/libgalera_smm.so
# wsrep_cluster_address=gcomm://   # disable this for now!!!
wsrep_node_name=node1
wsrep_provider_options="gcache.size=1G;gcs.fc_limit=256;evs.keepalive_period = PT1S; evs.inactive_check_period = PT3S; evs.suspect_timeout = PT10S; evs.inactive_timeout = PT30S; evs.install_timeout = PT30S"
wsrep_drupal_282555_workaround
wsrep_sst_method=rsync

Make sure rsync is installed and iptables open for ports 4567, 4568, 4444.

For MySQL asynchronous replication to work properly and propagate across all Galera nodes, log_slave_updates, binary logging need to be enabled and server_id must be the same across all Galera nodes:

[mysqld]
default_storage_engine         = InnoDB
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite             = 0
log_slave_updates              = 1
log_bin                        = bin-log
server_id                      = 1111
binlog_format                  = ROW

 

5. Restart Galera server:

$ service mysql start --wsrep_cluster_address=gcomm://

6. Start replication on Galera node1:

node1> START SLAVE;
node1> SHOW SLAVE STATUS\G

At this point, all Clients are connected to Master1 and Master2/Galera1 is using asynchronous MySQL replication to replicate data from Master1.

Add 2nd Galera node (Galera2)

In order to bring Galera2 online, we need:

  1. Install Galera Cluster with YUM.
  2. Edit /etc/my.cnf; add Galera entries, set wsrep_cluster_address to IP Galera node1:
    wsrep_cluster_address=gcomm://192.168.200.2,192.168.200.3
  3. Start the Galera node:
    $ service mysql start
  4. Edit /etc/my.cnf on Galera1:
    wsrep_cluster_address=gcomm://192.168.200.2,192.168.200.3

 

After the Galera node completes full Snapshot State Transfer (SST), check that MySQL server is running:

$ netstat -plant | grep mysql

Log into the Galera nodes and check their wsrep status:

node1> show status like 'wsrep%';
+----------------------------+-----------------------------------------+
| Variable_name              | Value                                   |
+----------------------------+-----------------------------------------+
| wsrep_local_state_uuid     | 2859f5b2-7119-11e2-0800-49d284b2aa61    |
| wsrep_last_committed       | 451                                     |
| wsrep_local_state          | 4                                       |
| wsrep_local_state_comment  | Synced                                  |
| wsrep_cluster_size         | 2                                       |
| wsrep_cluster_status       | Prim                                    |
| wsrep_connected            | ON                                      |
| wsrep_ready                | ON                                      |

 

Looks good.

 

Failover clients to Galera cluster

We use HAProxy Load Balancers running on a pair of nodes managed by Keepalived daemon.

I will skip the instructions of installations of keepalived and HAProxy daemons.

The main idea behind this setup is HAProxy is serving clients requests to the Galera cluster in load balancing leastconn mode. Keepalived daemon looks after availability of the HAProxy resource and takes care of failover in case the resource is unavailable.

 

1. Briefly stop the writing to Master1:

master1> FLUSH TABLES WITH READ LOCK;

2. Make sure Galera1 caught up with Master1:

master1> SHOW MASTER STATUS;
node1> SHOW SLAVE STATUS\G

3. Take a note of GTID on Galera1:

node1> SHOW STATUS LIKE 'wsrep_%';
+----------------------------+-----------------------------------------+
| Variable_name              | Value                                   |
+----------------------------+-----------------------------------------+
| wsrep_local_state_uuid     | 2859f5b2-7119-11e2-0800-49d284b2aa61    |
| wsrep_last_committed       | 39591                                   |

 

4. Direct the load to Galera through HAProxy.

Upgrade Master1 to Galera3

1. Remove existent MariaDB server with YUM.

2. Install Galera server with YUM.

3. Edit /etc/my.cnf; remove MMM replication specific entries, add Galera entries.

REMOVE Master-Master entries:

auto_increment_offset = 1
auto_increment_increment = 2
sync_binlog = 1
read_only = 1

 

ADD Galera entries:

wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.200.2,192.168.200.3,192.168.200.1
wsrep_node_name=node3
wsrep_provider_options="gcache.size=1G;gcs.fc_limit=256;evs.keepalive_period = PT1S; evs.inactive_check_period = PT3S; evs.suspect_timeout = PT10S; evs.inactive_timeout = PT30S; evs.install_timeout = PT30S"
wsrep_drupal_282555_workaround
wsrep_sst_method=rsync
 
[mysqld]
default_storage_engine         = InnoDB
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite             = 0
log_slave_updates              = 1
log_bin                        = bin-log
server_id                      = 1111
binlog_format                  = ROW

 

Copy grastate.dat file from Galera1. Set state information to the GTID we noted previously:

uuid: 2859f5b2-7119-11e2-0800-49d284b2aa61
seqno: 39591

Start Galera server on Master1/Galera3:

$ service mysql start

 

At this stage Master1 becomes Galera3 and joins the Galera cluster by Incremental State Transfer (IST)

Login to Galera3 and check wsrep status:

node1> show status like 'wsrep%';
+----------------------------+-----------------------------------------+
| Variable_name              | Value                                   |
+----------------------------+-----------------------------------------+
| wsrep_local_state_uuid     | 2859f5b2-7119-11e2-0800-49d284b2aa61    |
| wsrep_last_committed       | 10672332                                |
| wsrep_local_state          | 4                                       |
| wsrep_local_state_comment  | Synced                                  |
| wsrep_cluster_size         | 3                                       |
| wsrep_cluster_status       | Prim                                    |
| wsrep_connected            | ON                                      |
| wsrep_ready                | ON                                      |

 

Looks good.

 

Migration done. Hopefully this post will help to migrate from MMM to Galera if somebody needs it.

I have not covered load balancers and cluster management, but we use a pair of HAProxy instances mirroring each other in front of the Galera Cluster, and ClusterControl for monitoring and maintenance.