Live Database Migration from MMM to MariaDB Galera Cluster
This is a guest post from Igor Shevtsov, a MySQL DBA based in Twickenham, UK. Igor has had the opportunity to deploy and manage different types of high availability MySQL solutions, and we invited him to share his experience. Igor can be contacted by email or LinkedIn.
Migrating from MMM to MariaDB Galera Multi-Master Replication
Agenda:
- MySQL HA with MMM
- Why MMM did not work for us
- Migration requirements
- Migration
MySQL HA with MMM
Firstly, I would like to say a few words about our original MMM (Multi-Master Replication Manager) MySQL setup and some of the reasons why we decided to start looking for another MySQL HA solution.
We had two data centres, with the active Master1 in DC1, and the passive Master2 plus the management node in DC2; both masters were running MariaDB 5.5. The clients were unevenly distributed across both DCs, which were connected by a VPN that wasn't 100% reliable and occasionally failed.
The simplicity of the MMM setup and its management, as well as the speed of failover, led us to believe that it was a good choice for MySQL HA. As usually happens, we did not have much time to test it properly, and we quickly realised its limitations after the VPN failed. And of course, it had to fail at night.
Why MMM Did Not Work for Us
The post-mortem:
- The VPN failure caused the management node to assign the VIP to Master2 and promote Master2 to active master.
- At this point, we had two masters with the same VIP, each serving requests from the clients in its own DC.
- After the VPN was re-established, the management node brought Master1 back into the configuration and set it to READ ONLY mode, but failed to drop the network interface holding the VIP.
- At this point, a number of clients returned the error "Failed to write to mysql read-only", and we were down with an inconsistent database and a lot to think about.
Thanks to Robert Hodges for his very clear explanation of the VIP behaviour.
So, we needed a MySQL HA solution with:
- no single point of failure
- ability to read/write to any node
- simple management
- good performance via WAN
- NO data integrity issues
- automatic node provisioning
- no manual failover and no VIP needed
As you can see, the requirements are quite bold. After a bit of research, we decided to move to Galera Cluster, as it looked like Galera ticked all the boxes, and it works with MariaDB, which would simplify the migration process.
Migration Requirements
Taking into consideration some Galera limitations, the dataset needs to meet a few requirements:
- All tables must be InnoDB
- All tables must have Primary Key
- binlog format must be ROW
Thanks to Giuseppe Maxia for a nice check query that helps to identify tables with missing PK as well as tables with FULLTEXT or SPATIAL functionality:
SELECT DISTINCT
       CONCAT(t.table_schema, '.', t.table_name) AS tbl,
       t.engine,
       IF(ISNULL(c.constraint_name), 'NOPK', '') AS nopk,
       IF(s.index_type = 'FULLTEXT', 'FULLTEXT', '') AS ftidx,
       IF(s.index_type = 'SPATIAL', 'SPATIAL', '') AS gisidx
FROM information_schema.tables AS t
LEFT JOIN information_schema.key_column_usage AS c
       ON (t.table_schema = c.constraint_schema
           AND t.table_name = c.table_name
           AND c.constraint_name = 'PRIMARY')
LEFT JOIN information_schema.statistics AS s
       ON (t.table_schema = s.table_schema
           AND t.table_name = s.table_name
           AND s.index_type IN ('FULLTEXT', 'SPATIAL'))
WHERE t.table_schema NOT IN ('information_schema', 'performance_schema', 'mysql')
  AND t.table_type = 'BASE TABLE'
  AND (t.engine <> 'InnoDB'
       OR c.constraint_name IS NULL
       OR s.index_type IN ('FULLTEXT', 'SPATIAL'))
ORDER BY t.table_schema, t.table_name;
Once identified, a primary key can be added as follows:
ALTER TABLE t ADD id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY FIRST, ENGINE = InnoDB
To change binlog format to ROW:
Master2 (Passive Master):
mariadb> STOP SLAVE;
Master1 (Active Master):
mariadb> SET GLOBAL binlog_format = 'ROW';
Master2:
mariadb> SET GLOBAL binlog_format = 'ROW';
mariadb> START SLAVE;
On both masters, edit /etc/my.cnf so the setting survives a restart:
binlog_format = ROW
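Note that SET GLOBAL binlog_format only affects new connections; existing sessions keep their previous format. A quick check on both masters confirms the new default:
mariadb> SHOW GLOBAL VARIABLES LIKE 'binlog_format';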
At this point, we are ready to start the migration process.
Migration
Migration steps:
- Disable MMM
- Upgrade Master2 to Galera node1
- Add Galera node2
- Failover clients to Galera cluster
- Upgrade Master1 to Galera node3
Disable MMM
Here, we need to stop the MMM management daemon and the MMM agents.
On the Management node:
$ /etc/init.d/mysql-mmm-monitor stop
On Master1 and Master2:
$ /etc/init.d/mysql-mmm-agent stop
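Given the earlier incident with a leftover VIP, it is also worth confirming that neither master still has the virtual IP bound once the agents are stopped; a quick check (192.168.200.100 below is just a placeholder for your actual VIP):
$ ip addr show | grep 192.168.200.100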
Upgrade Master2 to first Galera node (Galera1)
On Master2:
1. Stop Replication:
mariadb> STOP SLAVE
2. Remove the existing MariaDB server with YUM:
$ sudo yum remove MariaDB-server MariaDB-client
3. Install Galera Cluster with YUM:
$ sudo yum install MariaDB-Galera-server MariaDB-client galera
4. Edit /etc/my.cnf
REMOVE Master-Master entries:
auto_increment_offset = 2
auto_increment_increment = 2
sync_binlog = 1
read_only = 1
Add Galera entries:
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
# wsrep_cluster_address=gcomm://   # disable this for now!!!
wsrep_node_name=node1
wsrep_provider_options="gcache.size=1G;gcs.fc_limit=256;evs.keepalive_period = PT1S; evs.inactive_check_period = PT3S; evs.suspect_timeout = PT10S; evs.inactive_timeout = PT30S; evs.install_timeout = PT30S"
wsrep_drupal_282555_workaround
wsrep_sst_method=rsync
Make sure rsync is installed and that iptables allows traffic on ports 4567, 4568 and 4444.
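As a sketch, assuming a CentOS/RHEL box where iptables is managed directly and MySQL port 3306 is already open, this could be done as follows:
$ sudo yum install rsync
$ sudo iptables -I INPUT -p tcp -m multiport --dports 4444,4567,4568 -j ACCEPT
$ sudo service iptables save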
For MySQL asynchronous replication to work properly and propagate across all Galera nodes, log_slave_updates and binary logging need to be enabled, and server_id must be the same across all Galera nodes:
[mysqld]
default_storage_engine = InnoDB
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = 0
log_slave_updates = 1
log_bin = bin-log
server_id = 1111
binlog_format = ROW
5. Restart Galera server:
$ service mysql start --wsrep_cluster_address=gcomm://
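Before hooking up replication, it is worth verifying that the freshly bootstrapped node has formed a one-node primary component:
node1> SHOW STATUS LIKE 'wsrep_cluster_size';
node1> SHOW STATUS LIKE 'wsrep_cluster_status';
wsrep_cluster_size should report 1 and wsrep_cluster_status should report Primary.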
6. Start replication on Galera node1:
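The exact command is not shown here; setting up the asynchronous slave on Galera1 would look roughly like the sketch below, where the replication user, password and binlog coordinates are assumptions to be replaced with your own values (taken from SHOW MASTER STATUS on Master1), and 192.168.200.1 is assumed to be Master1's address:
mariadb> CHANGE MASTER TO MASTER_HOST='192.168.200.1', MASTER_USER='repl', MASTER_PASSWORD='repl_password', MASTER_LOG_FILE='bin-log.000001', MASTER_LOG_POS=4;
mariadb> START SLAVE;
mariadb> SHOW SLAVE STATUS\G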
At this point, all clients are connected to Master1, and Master2/Galera1 is using asynchronous MySQL replication to replicate data from Master1.
Add 2nd Galera node (Galera2)
In order to bring Galera2 online, we need to:
- Install Galera Cluster with YUM.
- Edit /etc/my.cnf and point the cluster address at the existing node as well as itself:
wsrep_cluster_address=gcomm://192.168.200.2,192.168.200.3
- Start the MySQL server:
$ service mysql start
After the new Galera node completes a full State Snapshot Transfer (SST), check that the MySQL server is running:
$ netstat -plant | grep mysql
Log into the Galera nodes and check their wsrep status:
node1> SHOW STATUS LIKE 'wsrep%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 2859f5b2-7119-11e2-0800-49d284b2aa61 |
| wsrep_last_committed       | 451                                  |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_cluster_size         | 2                                    |
| wsrep_cluster_status       | Prim                                 |
| wsrep_connected            | ON                                   |
| wsrep_ready                | ON                                   |
+----------------------------+--------------------------------------+
Looks good.
Failover clients to Galera cluster
We use HAProxy load balancers running on a pair of nodes managed by the Keepalived daemon.
I will skip the installation instructions for the Keepalived and HAProxy daemons.
The main idea behind this setup is that HAProxy serves client requests to the Galera cluster in leastconn load-balancing mode, while the Keepalived daemon watches the availability of the HAProxy resource and takes care of failover in case the resource becomes unavailable.
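For reference, a minimal HAProxy listener for such a setup could look like the sketch below; the listener port, the mysql-check user and the backend addresses are assumptions rather than our production configuration, and the check user has to exist in MySQL for the health check to pass:
listen galera_cluster
    bind 0.0.0.0:3307
    mode tcp
    balance leastconn
    option tcpka
    option mysql-check user haproxy_check
    server galera1 192.168.200.2:3306 check
    server galera2 192.168.200.3:3306 check
    # galera3 (192.168.200.1) is added once Master1 has been converted
    server galera3 192.168.200.1:3306 check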
1. Briefly stop writes to Master1:
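One straightforward way to do this (assuming the application accounts do not hold the SUPER privilege, which bypasses read_only) is:
mariadb> SET GLOBAL read_only = ON;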
2. Make sure Galera1 has caught up with Master1:
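On Galera1, the asynchronous slave status shows whether everything from Master1 has been applied:
node1> SHOW SLAVE STATUS\G
Slave_IO_Running and Slave_SQL_Running should both report Yes, and Seconds_Behind_Master should be 0, before reading the GTID in the next step.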
3. Take note of the GTID on Galera1:
node1> SHOW STATUS LIKE 'wsrep_%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 2859f5b2-7119-11e2-0800-49d284b2aa61 |
| wsrep_last_committed       | 39591                                |
+----------------------------+--------------------------------------+
4. Direct the load to Galera through HAProxy.
Upgrade Master1 to Galera3
1. Remove the existing MariaDB server with YUM.
2. Install Galera server with YUM.
3. Edit /etc/my.cnf
REMOVE Master-Master entries:
auto_increment_offset = 1
auto_increment_increment = 2
sync_binlog = 1
read_only = 1
ADD Galera entries:
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.200.2,192.168.200.3,192.168.200.1
wsrep_node_name=node3
wsrep_provider_options="gcache.size=1G;gcs.fc_limit=256;evs.keepalive_period = PT1S; evs.inactive_check_period = PT3S; evs.suspect_timeout = PT10S; evs.inactive_timeout = PT30S; evs.install_timeout = PT30S"
wsrep_drupal_282555_workaround
wsrep_sst_method=rsync

[mysqld]
default_storage_engine = InnoDB
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = 0
log_slave_updates = 1
log_bin = bin-log
server_id = 1111
binlog_format = ROW
Copy the grastate.dat file from Galera1 and set its state information to the GTID we noted previously:
uuid:  2859f5b2-7119-11e2-0800-49d284b2aa61
seqno: 39591
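As a sketch, assuming the default datadir /var/lib/mysql on both machines, the copy and edit could be done like this before starting the server:
$ scp root@192.168.200.2:/var/lib/mysql/grastate.dat /var/lib/mysql/grastate.dat
$ vi /var/lib/mysql/grastate.dat    # set seqno to 39591
$ chown mysql:mysql /var/lib/mysql/grastate.dat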
Start Galera server on Master1/Galera3:
$ service mysql start
At this stage, Master1 becomes Galera3 and joins the Galera cluster via Incremental State Transfer (IST).
Log into Galera3 and check its wsrep status:
node3> SHOW STATUS LIKE 'wsrep%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 2859f5b2-7119-11e2-0800-49d284b2aa61 |
| wsrep_last_committed       | 10672332                             |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_cluster_size         | 3                                    |
| wsrep_cluster_status       | Prim                                 |
| wsrep_connected            | ON                                   |
| wsrep_ready                | ON                                   |
+----------------------------+--------------------------------------+
Looks good.
Migration done. Hopefully this post will help anyone who needs to migrate from MMM to Galera.
I have not covered load balancers and cluster management, but we use a pair of HAProxy instances mirroring each other in front of the Galera Cluster, and ClusterControl for monitoring and maintenance.