We have previously blogged about What’s New in MySQL Galera Cluster 4.0, Handling Large Transactions with Streaming Replication and MariaDB 10.4 and presented some guides about using the new Streaming Replication feature in a part 1 & part 2 series.
Moving your database technology from MySQL Replication to MySQL Galera Cluster requires you to have the right skills and an understanding of what you are doing to be successful. In this blog we’ll share some tips for migrating from a MySQL Replication setup to MySQL Galera Cluster 4.0 one.
The Differences Between MySQL Replication and Galera Cluster
If you’re not yet familiar with Galera, we suggest you to go over our Galera Cluster for MySQL Tutorial. Galera Cluster uses a whole different level of replication based on synchronous replication, in contrast to the MySQL Replication which uses asynchronous replication (but could be configured also to achieve a semi-synchronous replication).
Galera Cluster also supports multi-master replication. It is capable of unconstrained parallel applying (i.e., “parallel replication”), multicast replication, and automatic node provisioning.
The primary focus of Galera Cluster is data consistency, whereas with MySQL Replication, it’s prone to data inconsistency (which can be avoided with best practices and proper configuration such as enforcing read-only on the slaves to avoid unwanted writes within the slaves).
Although transactions received by Galera are either applied to every node or not at all, each of these nodes certifies the replicated write-set in the applier queue (transaction commits) which also includes information on all of the locks that were held by the database during the transaction. These write-set, once no conflicting locks identified, are applied. Up to this point, transactions are considered committed and continues to apply it to the tablespace. Unlike in asynchronous replication, this approach is also called virtually synchronous replication since the writes and commits happens in a logical synchronous mode but the actual writing and committing to the tablespace happens independently and goes asynchronous on each node.
Unlike MySQL Replication, a Galera Cluster is a true multi-master, multi-threaded slave, a pure hot-standby, with no need for master-failover or read-write splitting. However, migrating to Galera Cluster doesn’t mean an automatic answer to your problems. Galera Cluster supports only InnoDB, so there could be design modifications if you are using MyISAM or Memory storage engines.
Converting Non-InnoDB Tables to InnoDB
Galera Cluster does allow you to use MyISAM, but this is not what Galera Cluster was designed for. Galera Cluster is designed to strictly implement data consistency within all of the nodes within the Cluster and this requires a strong ACID compliant database engine. InnoDB is an engine that has this strong capabilities in this area and is recommended that you use InnoDB; especially when dealing with transactions.
If you’re using ClusterControl, you can benefit easily to determine your database instance(s) for any MyISAM tables which is provided by Performance Advisors. You can find this under Performance → Advisors tab. For example,
If you require MyISAM and MEMORY tables, you can still use it but make sure your data that does not need to be replicated. You can use your data stored for read-only and, use “START TRANSACTION READONLY” wherever appropriate.
Adding Primary Keys To your InnoDB Tables
Since Galera Cluster only supports InnoDB, it is very important that all of your tables must have a clustered index, (also called primary key or unique key). To get the best performance from queries, inserts, and other database operations, it is very important that you must define every table with a unique key(s) since InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table. This helps avoid long running queries within the cluster and possible can slow down write/read operations in the cluster.
In ClusterControl, there are advisors which can notify you of this. For example, in your MySQL Replication master/slave cluster, you’ll an alarm from the or view from the list of advisors. The example screenshot below reveals that you have no tables that has no primary key:
Identify a Master (or Active-Writer) Node
Galera Cluster is purely a true multi-master replication. However, it doesn’t mean that you’re all free to write whichever node you would like to target. One thing to identify is, when writing on a different node and a conflicting transaction will be detected, you’ll get into a deadlock issue just like below:
2019-11-14T21:14:03.797546Z 12 [Note] [MY-011825] [InnoDB] *** Priority TRANSACTION: TRANSACTION 728431, ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 MySQL thread id 12, OS thread handle 140504401893120, query id 1414279 Applying batch of row changes (update) 2019-11-14T21:14:03.797696Z 12 [Note] [MY-011825] [InnoDB] *** Victim TRANSACTION: TRANSACTION 728426, ACTIVE 3 sec updating or deleting mysql tables in use 1, locked 1 , undo log entries 11409 MySQL thread id 57, OS thread handle 140504353195776, query id 1414228 localhost root updating update sbtest1_success set k=k+1 where id > 1000 and id < 100000 2019-11-14T21:14:03.797709Z 12 [Note] [MY-011825] [InnoDB] *** WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 1663 page no 11 n bits 144 index PRIMARY of table `sbtest`.`sbtest1_success` trx id 728426 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;;
The problem with multiple nodes writing without identifying a current active-writer node, you'll end up with these issues which are very common problems I've seen when using Galera Cluster when writing on multiple nodes at the same time. In order to avoid this, you can use single-master setup approach:
From the documentation,
To relax flow control, you might use the settings below:
wsrep_provider_options = "gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_master_slave = YES"
The above requires a server restart since fc_master_slave is not dynamic.
Enable Debugging Mode For Logging Conflicts or Deadlocks
Debugging or tracing issues with your Galera Cluster is very important. Locks in Galera is implemented differently compared to MySQL Replication. It uses optimistic locking when dealing with transactions cluster-wide. Unlike the MySQL Replication, it has only pessimistic locking which doesn't know if there's such same or conflicting transaction being executed in a co-master on a multi-master setup. Galera still uses pessimistic locking but on the local node since it's managed by InnoDB, which is the storage engine supported. Galera uses optimistic locking when it goes to other nodes. This means that no checks are made with other nodes on the cluster when local locks are attained (pessimistic locking). Galera assumes that, once the transaction passes the commit phase within the storage engine and the other nodes are informed, everything will be okay and no conflicts will arise.
In practice, it's best to enable wsrep_logs_conflicts. This will log the details of conflicting MDL as well as InnoDB locks in the cluster. Enabling this variable can be set dynamically but caveat once this is enabled. It will verbosely populate your error-log file and can fill up your disk once your error-log file size is too large.
Be Careful With Your DDL Queries
Unlike MySQL Replication, running an ALTER statement can affect only incoming connections that requires to access or reference that table targeted by your ALTER statement. It can also affect slaves if the table is large and can bring slave lag. However, writes to your master won't be block as long as your queries does not conflict with the current ALTER. However, this is entirely not the case when running your DDL statements such as ALTER with Galera Cluster. ALTER statements can bring problems such as Galera Cluster stuck due to cluster-wide lock or flow control starts to relax the replication while some nodes are recovering from large writes.
In some situations, you might end up having downtime to your Galera Cluster if that table is too large and is a primary and vital table to your application. However, it can be achieved without downtime. As Rick James pointed out in his blog, you can follow the recommendations below:
RSU vs TOI
- Rolling Schema Upgrade = manually do one node (offline) at a time
- Total Order Isolation = Galera synchronizes so that it is done at the same time (in the replication sequence) on all nodes. RSU and TOI
Caution: Since there is no way to synchronize the clients with the DDL, you must make sure that the clients are happy with either the old or the new schema. Otherwise, you will probably need to take down the entire cluster while simultaneously switching over both the schema and the client code.
A "fast" DDL may as well be done via TOI. This is a tentative list of such:
- CREATE/DROP/RENAME DATABASE/TABLE
- ALTER to change DEFAULT
- ALTER to change definition of ENUM or SET (see caveats in manual)
- Certain PARTITION ALTERs that are fast.
- DROP INDEX (other than PRIMARY KEY)
- ADD INDEX?
- Other ALTERs on 'small' tables.
- With 5.6 and especially 5.7 having a lot of ALTER ALGORITHM=INPLACE cases, check which ALTERs should be done which way.
Otherwise, use RSU. Do the following separately for each node:
SET GLOBAL wsrep_OSU_method='RSU';
This also takes the node out of the cluster.
ALTER TABLE SET GLOBAL wsrep_OSU_method='TOI';
Puts back in, leading to resync (hopefully a quick IST, not a slow SST)
Preserve the Consistency Of Your Cluster
Galera Cluster does not support replication filters such as binlog_do_db or binlog_ignore_db since Galera does not rely with binary logging. It relies on the ring-buffer file also called GCache which stores write-sets that are replicated along the cluster. You cannot apply any inconsistent behavior or state of such database nodes.
Galera, on the other hand, strictly implements data consistency within the cluster. It's still possible that there can be inconsistency where rows or records cannot be found. For example, setting your variable wsrep_OSU_method either RSU or TOI for your DDL ALTER statements might bring inconsistent behavior. Check this external blog from Percona discussing about inconsistency with Galera with TOI vs RSU.
Setting wsrep_on=OFF and subsequently run DML or DDL queries can be dangerous to your cluster. You must also review your stored procedures, triggers, functions, events, or views if results are not dependent on a node's state or environment. When a certain node(s) can be inconsistent, it can potentially bring the entire cluster to go down. Once Galera detects an inconsistent behavior, Galera will attempt to leave the cluster and terminate that node. Hence, it's possible that all of the nodes can be inconsistent leaving you under a state of dilemma.
If a Galera Cluster node as well experiences a crash especially upon a high-traffic period, it's better not to start right away the node. Instead, perform a full SST or bring a new instance as soon as possible or once the traffic goes low. It can be possible that node can bring inconsistent behavior which might have corrupted data.
Segregate Large Transactions and Determine Whether to Use Streaming Replication
Let's get straight on this one. One of the biggest changes features especially on Galera Cluster 4.0 is the streaming replication. Past versions of Galera Cluster 4.0, it limits transactions < 2GiB which is typically controlled by variables wsrep_max_ws_rows and wsrep_max_ws_size. Since Galera Cluster 4.0, you can able to send > 2GiB of transactions but you must determine how large the fragments has to be processed during replication. It has to be set by session and the only variables you need to take care are wsrep_trx_fragment_unit and wsrep_trx_fragment_size. Disabling the Streaming Replication is simple as setting the wsrep_trx_fragment_size = 0 will do it. Take note that, replicating a large transaction also possess overhead on the slave nodes (nodes that are replicating against the current active-writer/master node) since logs will be written to wsrep_streaming_log table in the MySQL database.
Another thing to add, since you're dealing with large transaction, it's considerable that your transaction might take some time to finish so setting the variable innodb_lock_wait_timeout high must have to be taken into account. Set this via session depending on the time you estimate but larger than the time you estimate it to finish, otherwise raise a timeout.
We recommend you read this previous blog about streaming replication in action.
Replicating GRANTs Statements
If you're using GRANTs and related operations act on the MyISAM/Aria tables in the database `mysql`. The GRANT statements will be replicated, but the underlying tables will not. So this means, INSERT INTO mysql.user ... will not be replicated because the table is MyISAM.
However, the above might not be true anymore since Percona XtraDB Cluster(PXC) 8.0 (currently experimental) as mysql schema tables have been converted to InnoDB, whilst in MariaDB 10.4, some of the tables are still in Aria format but others are in CSV or InnoDB. You should determine what version and provider of Galera you have but best to avoid using DML statements referencing mysql schema. Otherwise, you might end up on unexpected results unless you're sure that this is PXC 8.0.
XA Transactions, LOCK/UNLOCK TABLES, GET_LOCK/RELEASE_LOCK are Not Supported
Galera Cluster does not support XA Transactions since XA Transactions handles rollback and commits differently. LOCK/UNLOCK or GET_LOCK/RELEASE_LOCK statements are dangerous to be applied or used with Galera. You might experience a crash or locks that are not killable and stay locked. For example,
---TRANSACTION 728448, ACTIVE (PREPARED) 13356 sec mysql tables in use 2, locked 2 3 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 5 MySQL thread id 67, OS thread handle 140504353195776, query id 1798932 localhost root wsrep: write set replicated and certified (13) insert into sbtest1(k,c,pad) select k,c,pad from sbtest1_success limit 5
This transaction has already been unlocked and even been killed but to no avail. We suggest that you have to redesign your application client and get rid of these functions when migrating to Galera Cluster.
Network Stability is a MUST!!!
Galera Cluster can work even with inter-WAN topology or inter-geo topology without any issues (check this blog about implementing inter-geo topology with Galera). However, if your network connectivity between each nodes is not stable or intermittently going down for an unsuspected time, it can be problematic for the cluster. It's best you have a cluster running in a private and local network where each of these nodes are connected. When designing a node as a disaster recovery, then plan to create a cluster if these are on a different region or geography. You may start reading our previous blog, Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part One as this could help you best to decide your Galera Cluster topology.
Another thing to add about investing your network hardware, it would be problematic if your network transfer rate provides you a lower speed during rebuilding of an instance during IST or worse at SST especially if your data set is massive. It can take long hours of network transfer and that might affect the stability of your cluster especially if you have a 3-node cluster while 2 nodes are not available where these 2 are a donor and a joiner. Take note that, during SST phase, the DONOR/JOINER nodes cannot be in-used until it's finally able to sync with the primary cluster.
In previous version of Galera, when it comes to donor node selection, the State Snapshot Transfer (SST) donor was selected at random. In Glera 4, it has much more improved and has the ability to choose the right donor within the cluster, as it will favour a donor that can provide an Incremental State Transfer (IST), or pick a donor in the same segment. Alternatively, you can set wsrep_sst_donor variable to the right donor you would like to always pick.
Backup Your Data and Do Rigid Testing During Migration and Before Production
Once you are suit up and has decided to try and migrate your data to Galera Cluster 4.0, make sure you always have your backup prepared. If you tried ClusterControl, taking backups shall be easier to do this.
Ensure that you are migrating to the right version of InnoDB and do not forget to always apply and run mysql_upgrade before doing the test. Ensure that all your test passes the desired result from which the MySQL Replication can offer you. Most likely, there's no difference with the InnoDB storage engine you're using in a MySQL Replication Cluster versus the MySQL Galera Cluster as long as the recommendations and tips have been applied and prepared beforehand.
Migrating to Galera Cluster 4.0 might not be your desired database technology solution. However, it is not pulling you away to utilize Galera Cluster 4.0 as long as its specific requirements can be prepared, setup, and provided. Galera Cluster 4.0 has now become a very powerful viable choice and option especially on a highly-available platform and solution. We also suggest that you read these external blogs about Galera Caveats or the Limitations of Galera Cluster or this manual from MariaDB.