Top Common Issues with MHA and How to Fix Them
In our previous blogs, we discussed MHA as a failover tool used in MySQL master-slave setups. Last month, we also blogged about how to handle MHA when it has crashed. Today, we will look at the top issues that DBAs usually encounter with MHA, and how you can fix them.
A Brief Intro To MHA (Master High Availability)
MHA (Master High Availability) is still relevant and widely used today, especially in master-slave setups based on non-GTID replication. MHA performs a failover or master switch well, but it does come with some pitfalls and limitations. Once MHA triggers a master failover and slave promotion, it can automatically complete the database failover operation within ~30 seconds, which can be acceptable in a production environment. MHA can ensure data consistency, all with zero performance degradation, and it requires no additional adjustments or changes to your existing deployment or setup. Apart from this, MHA is built on top of Perl and is an open-source HA solution, so it is relatively easy to create helpers or extend the tool in accordance with your desired setup. Check out this presentation for more information.
The MHA software consists of two components; you need to install one of the following packages according to the node's topology role:
MHA manager node = MHA Manager (manager)/MHA Node (data node)
Master/Slave nodes = MHA Node (data node)
MHA Manager is the software that manages the failover (automatic or manual), makes decisions on when and where to fail over, and manages slave recovery during promotion of the candidate master by applying differential relay logs. If the master database dies, MHA Manager will coordinate with the MHA Node agent to apply differential relay logs to the slaves that do not have the latest binlog events from the master. The MHA Node software is a local agent that monitors your MySQL instance and allows the MHA Manager to copy relay logs from the slaves. A typical scenario is that the candidate master for failover is currently lagging and MHA detects that it does not have the latest relay logs. In that case it waits for its mandate from MHA Manager, which searches for the latest slave that contains the binlog events; the missing events are then copied from that slave using scp and applied to the candidate master.
Note that MHA is currently not actively maintained, but the current version itself is stable and may be “good enough” for production. You can still raise issues or contribute patches to the software through GitHub.
Top Common Issues
Now let’s look at the most common issues that a DBA will encounter when using MHA.
Slave is lagging, non-interactive/automated failover failed!
This is a typical issue causing an automated failover to abort or fail. It might sound simple, but it does not point to only one specific problem. Slave lag can have different causes:
- Disk issues on the candidate master, causing it to be disk I/O bound while processing reads and writes. This can also lead to data corruption if not mitigated.
- Bad queries being replicated, especially against tables that have no primary keys or clustered indexes.
- High server load.
- Cold server that hasn’t warmed up yet.
- Not enough server resources. It is possible that your slave is too low on memory or other server resources while replicating write- or read-intensive workloads.
These can be mitigated in advance if you have proper monitoring of your database. One example of slave lag in MHA is running out of memory when dumping a big binary log file. In the example below, a master was marked as dead and MHA had to perform a non-interactive/automatic failover. However, the candidate master was lagging and had to apply the logs that weren’t yet executed by the replication threads, so MHA located the most up-to-date slave, since it attempts to recover the lagging slaves from the latest one. As you can see below, while it was performing the slave recovery, the node ran out of memory:
vagrant@testnode20:~$ masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf --ignore_last_failover
Mon May 6 08:43:46 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 6 08:43:46 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Mon May 6 08:43:46 2019 - [info] Reading server configuration from /etc/app1.cnf..
…
Mon May 6 08:43:57 2019 - [info] Checking master reachability via MySQL(double check)...
Mon May 6 08:43:57 2019 - [info] ok.
Mon May 6 08:43:57 2019 - [info] Alive Servers:
Mon May 6 08:43:57 2019 - [info] 192.168.10.50(192.168.10.50:3306)
Mon May 6 08:43:57 2019 - [info] 192.168.10.70(192.168.10.70:3306)
Mon May 6 08:43:57 2019 - [info] Alive Slaves:
Mon May 6 08:43:57 2019 - [info] 192.168.10.50(192.168.10.50:3306) Version=5.7.23-23-log (oldest major version between slaves) log-bin:enabled
Mon May 6 08:43:57 2019 - [info] Replicating from 192.168.10.60(192.168.10.60:3306)
Mon May 6 08:43:57 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Mon May 6 08:43:57 2019 - [info] 192.168.10.70(192.168.10.70:3306) Version=5.7.23-23-log (oldest major version between slaves) log-bin:enabled
Mon May 6 08:43:57 2019 - [info] Replicating from 192.168.10.60(192.168.10.60:3306)
Mon May 6 08:43:57 2019 - [info] Not candidate for the new Master (no_master is set)
Mon May 6 08:43:57 2019 - [info] Starting Non-GTID based failover.
….
Mon May 6 08:43:59 2019 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Mon May 6 08:43:59 2019 - [info]
Mon May 6 08:43:59 2019 - [info] Server 192.168.10.50 received relay logs up to: binlog.000004:106167341
Mon May 6 08:43:59 2019 - [info] Need to get diffs from the latest slave(192.168.10.70) up to: binlog.000005:240412 (using the latest slave's relay logs)
Mon May 6 08:43:59 2019 - [info] Connecting to the latest slave host 192.168.10.70, generating diff relay log files..
Mon May 6 08:43:59 2019 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=vagrant --scp_host=192.168.10.50 --latest_mlf=binlog.000005 --latest_rmlp=240412 --target_mlf=binlog.000004 --target_rmlp=106167341 --server_id=3 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.10.50_3306_20190506084355.binlog --workdir=/tmp --timestamp=20190506084355 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=relay-bin.000007
Mon May 6 08:44:00 2019 - [info]
Relay log found at /var/lib/mysql, up to relay-bin.000007
Fast relay log position search failed. Reading relay logs to find..
Reading relay-bin.000007
Binlog Checksum enabled
Master Version is 5.7.23-23-log
Binlog Checksum enabled
…
…...
Target relay log file/position found. start_file:relay-bin.000004, start_pos:106167468.
Concat binary/relay logs from relay-bin.000004 pos 106167468 to relay-bin.000007 EOF into /tmp/relay_from_read_to_latest_192.168.10.50_3306_20190506084355.binlog ..
Binlog Checksum enabled
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 361.. ok.
Dumping effective binlog data from /var/lib/mysql/relay-bin.000004 position 106167468 to tail(1074342689)..Out of memory!
Mon May 6 08:44:00 2019 - [error][/usr/local/share/perl/5.26.1/MHA/MasterFailover.pm, ln1090] Generating diff files failed with return code 1:0.
Mon May 6 08:44:00 2019 - [error][/usr/local/share/perl/5.26.1/MHA/MasterFailover.pm, ln1584] Recovering master server failed.
Mon May 6 08:44:00 2019 - [error][/usr/local/share/perl/5.26.1/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_manager line 65.
Mon May 6 08:44:00 2019 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.10.60(192.168.10.60:3306)
Master 192.168.10.60(192.168.10.60:3306) is down!
Check MHA Manager logs at testnode20 for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.10.60(192.168.10.60:3306)
The latest slave 192.168.10.70(192.168.10.70:3306) has all relay logs for recovery.
Selected 192.168.10.50(192.168.10.50:3306) as a new master.
Recovering master server failed.
Got Error so couldn't continue failover from here.
Thus, the failover failed. The example above shows that node 192.168.10.70 contains the most up-to-date relay logs. However, in this example scenario, node 192.168.10.70 is set as no_master because it is low on memory. As MHA tries to recover the slave 192.168.10.50, it fails!
Fixes/Resolution:
This scenario illustrates something very important: an advanced monitoring environment must be set up! For example, you can run a background or daemon script which monitors replication health, or add an entry through a cron job. For example, add an entry using the built-in script masterha_check_repl:
/usr/local/bin/masterha_check_repl --conf=/etc/app1.cnf
or create a background script which invokes this script and runs it at an interval, as sketched below. You can use the report_script option to set up an alert notification in case replication does not conform to your requirements, e.g. the slave is lagging by about 100 seconds during a peak load. You can also use monitoring platforms such as ClusterControl and set them up to send you notifications based on the metrics you want to monitor.
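A minimal sketch of such a check, assuming a hypothetical wrapper script path and alert address; adjust the schedule, paths, and recipient to your environment:

# Crontab entry (hypothetical): run the replication check every 5 minutes
*/5 * * * * /usr/local/bin/check_mha_repl.sh

# /usr/local/bin/check_mha_repl.sh (hypothetical wrapper)
#!/bin/bash
# Alert if the built-in replication check reports a problem
if ! /usr/local/bin/masterha_check_repl --conf=/etc/app1.cnf > /tmp/mha_repl_check.log 2>&1; then
    mail -s "MHA replication check failed on $(hostname)" ops@example.com < /tmp/mha_repl_check.log
fi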
In addition, take note that in the example scenario, the failover failed due to an out-of-memory error. You should ensure that all your nodes have enough memory, and appropriately sized binary logs, as they will need to dump and apply the binlog during the slave recovery phase.
Inconsistent Slave, Applying diffs failed!
Related to slave lag: since MHA will try to sync relay logs to a candidate master, make sure that your data is in sync. Take the example below:
...
Concat succeeded.
Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.10.50_3306_20190506054328.binlog .
scp testnode7:/tmp/relay_from_read_to_latest_192.168.10.50_3306_20190506054328.binlog to [email protected](22) succeeded.
Mon May 6 05:43:53 2019 - [info] Generating diff files succeeded.
Mon May 6 05:43:53 2019 - [info]
Mon May 6 05:43:53 2019 - [info] * Phase 3.5: Master Log Apply Phase..
Mon May 6 05:43:53 2019 - [info]
Mon May 6 05:43:53 2019 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon May 6 05:43:53 2019 - [info] Starting recovery on 192.168.10.50(192.168.10.50:3306)..
Mon May 6 05:43:53 2019 - [info] Generating diffs succeeded.
Mon May 6 05:43:53 2019 - [info] Waiting until all relay logs are applied.
Mon May 6 05:43:53 2019 - [info] done.
Mon May 6 05:43:53 2019 - [info] Getting slave status..
Mon May 6 05:43:53 2019 - [info] This slave(192.168.10.50)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(binlog.000010:161813650). No need to recover from Exec_Master_Log_Pos.
Mon May 6 05:43:53 2019 - [info] Connecting to the target slave host 192.168.10.50, running recover script..
Mon May 6 05:43:53 2019 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='cmon' --slave_host=192.168.10.50 --slave_ip=192.168.10.50 --slave_port=3306 --apply_files=/tmp/relay_from_read_to_latest_192.168.10.50_3306_20190506054328.binlog --workdir=/tmp --target_version=5.7.23-23-log --timestamp=20190506054328 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx
Mon May 6 05:43:53 2019 - [info]
MySQL client version is 5.7.23. Using --binary-mode.
Applying differential binary/relay log files /tmp/relay_from_read_to_latest_192.168.10.50_3306_20190506054328.binlog on 192.168.10.50:3306. This may take long time...
mysqlbinlog: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
FATAL: applying log files failed with rc 1:0!
Error logs from testnode5:/tmp/relay_log_apply_for_192.168.10.50_3306_20190506054328_err.log (the last 200 lines)..
ICwgMmM5MmEwZjkzY2M5MTU3YzAxM2NkZTk4ZGQ1ODM0NDEgLCAyYzkyYTBmOTNjYzkxNTdjMDEz
….
…..
M2QxMzc5OWE1NTExOTggLCAyYzkyYTBmOTNjZmI1YTdhMDEzZDE2NzhiNDc3NDIzNCAsIDJjOTJh
MGY5M2NmYjVhN2EwMTNkMTY3OGI0N2Q0MjMERROR 1062 (23000) at line 72: Duplicate entry '12583545' for key 'PRIMARY'
5ICwgMmM5MmEwZjkzY2ZiNWE3YTAxM2QxNjc4YjQ4
OTQyM2QgLCAyYzkyYTBmOTNjZmI1YTdhMDEzZDE2NzhiNDkxNDI1MSAsIDJjOTJhMGY5M2NmYjVh
N2EwMTNkMTczN2MzOWM3MDEzICwgMmM5MmEwZjkzY2ZiNWE3YTAxM2QxNzM3YzNhMzcwMTUgLCAy
…
--------------
Bye
at /usr/local/bin/apply_diff_relay_logs line 554.
eval {...} called at /usr/local/bin/apply_diff_relay_logs line 514
main::main() called at /usr/local/bin/apply_diff_relay_logs line 121
Mon May 6 05:43:53 2019 - [error][/usr/local/share/perl/5.26.1/MHA/MasterFailover.pm, ln1399] Applying diffs failed with return code 22:0.
Mon May 6 05:43:53 2019 - [error][/usr/local/share/perl/5.26.1/MHA/MasterFailover.pm, ln1584] Recovering master server failed.
Mon May 6 05:43:53 2019 - [error][/usr/local/share/perl/5.26.1/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_manager line 65.
Mon May 6 05:43:53 2019 - [info]
An inconsistent cluster is really bad, especially when automatic failover is enabled. In this case, the failover cannot proceed as it detects a duplicate entry for primary key ‘12583545‘.
Fixes/Resolution:
There are multiple things you can do here to avoid an inconsistent state of your cluster.
- Enable Lossless Semi-Synchronous Replication (a sample of the relevant settings is sketched after this list). Check out this external blog, which is a good way to learn why you should consider using semi-sync in a standard MySQL replication setup.
- Constantly run a checksum against your master-slave cluster. You can use pt-table-checksum and run it, say, once a week or once a month depending on how frequently your tables are updated. Take note that pt-table-checksum can add overhead to your database traffic.
- Use GTID-based replication. Although this won’t solve the problem per se, GTID-based replication helps you identify errant transactions, especially transactions that were run on the slave directly. Another advantage is that it’s easier to manage GTID-based replication when you need to switch the master host in replication.
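As a rough sketch (not an exhaustive guide), enabling lossless semi-sync on MySQL 5.7 boils down to loading the semi-sync plugins and setting the wait point to AFTER_SYNC; the checksum invocation below uses hypothetical host and user values:

-- On the master (MySQL 5.7): load the plugin and enable lossless semi-sync
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';

-- On each slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

# Periodic consistency check, run from the master (hypothetical credentials)
pt-table-checksum --replicate=percona.checksums h=192.168.10.60,u=checksum_user,p=xxxx

Remember to persist the semi-sync settings in my.cnf as well, so they survive a restart.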
Hardware Failure On The Master But Slaves Haven’t Caught Up Yet
One of the many reasons why you would invest in automatic failover is a hardware failure on the master. For some setups, it may be more ideal to perform automatic failover only when the master encounters a hardware failure. The typical alternative approach is to notify by sending an alarm, which might mean waking up the on-call ops person in the middle of the night and letting that person decide what to do. This type of approach is used at GitHub and even Facebook. A hardware failure, especially one affecting the volume where your binlogs and data directory reside, can mess with your failover, particularly if the binary logs are stored on the failed disk. By design, MHA will try to save binary logs from the crashed master, but this is not possible if the disk has failed or the server is not reachable via SSH. In that case MHA cannot save the binary logs and has to fail over without applying the binary log events that exist on the crashed master only. This will result in losing the latest data, especially if no slave has caught up with the master.
Fixes/Resolution
As one of the use cases for MHA, it’s recommended to use semi-synchronous replication, as it greatly reduces the risk of such data loss. With semi-sync, the master ensures that at least one slave has received the latest binary log events before the commit is acknowledged. MHA can then apply those events to all other slaves so they are consistent with each other.
Additionally, it’s a good idea to run a backup stream of your binary logs for disaster recovery in case the main disk volume fails. If the server is still accessible via SSH, pointing the binary log path to the backup copy of your binary logs can still work, so failover and slave recovery can still move forward. In this way, you can avoid data loss.
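A minimal sketch of such a binlog backup stream using mysqlbinlog’s remote raw mode (available since MySQL 5.6); the host, user, password, backup directory, and starting binlog name are placeholders:

# Continuously stream binlogs from the master to a separate backup volume
cd /backup/binlogs && \
mysqlbinlog --read-from-remote-server --host=192.168.10.60 --user=backup_user --password=xxxx \
    --raw --stop-never binlog.000001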
VIP (Virtual IP) Failover Causing Split-Brain
MHA, by default, does not handle any VIP management. However, it’s easy to incorporate this into MHA’s configuration and assign hooks according to what you want MHA to do during the failover. You can set up your own script and hook it in via the parameters master_ip_failover_script or master_ip_online_change_script. There are also sample scripts shipped with the MHA Manager source that you can use as a starting point.
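Hooking your VIP script in is just a matter of adding the parameters to your MHA configuration; the script paths below are placeholders for your own implementation:

[server default]
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change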
During an automatic failover, once your script with VIP management is invoked and executed, MHA will do the following: check the status, remove (or stop) the old VIP, and then re-assign the new VIP to the new master. A typical example of split-brain is when a master is identified as dead due to a network issue but, in fact, the slave nodes are still able to connect to the master. This is a false positive, and often leads to data inconsistency across the databases in the setup. Incoming client connections using the VIP will be sent to the new master, while, on the other hand, local connections may still be running on the old master, which is supposed to be dead. The local connections could be using the unix socket or localhost to lessen network hops. This can cause the data to drift against the new master and the rest of its slaves, as data from the old master won’t be replicated to the slaves.
Fixes/Resolution:
As stated earlier, some may prefer to avoid automatic failover unless the checks have determined that the master is totally down (like a hardware failure), i.e. even the slave nodes are not able to reach it. The idea is that a false positive could be caused by a network glitch between the MHA Manager node and the master, so a human may be better suited in this case to decide whether to fail over or not.
To deal with false alarms, MHA has a parameter called secondary_check_script. The value placed here can be your custom script, or you can use the built-in script /usr/local/bin/masterha_secondary_check, which is shipped with the MHA Manager package. This adds extra checks and is actually the recommended approach to avoid false positives. In the example below from my own setup, I am using the built-in script masterha_secondary_check:
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.10.50 --user=root --master_host=testnode6 --master_ip=192.168.10.60 --master_port=3306
In the above example, MHA Manager will loop over the list of slave nodes (specified by the -s argument) and check the connection against the MySQL master host (192.168.10.60). Take note that these slave nodes in the example can also be external remote nodes that can establish a connection to the database nodes within the cluster. This is a recommended approach, especially for setups where MHA Manager is running in a different datacenter or network than the database nodes. The following sequence illustrates how it proceeds with the checks:
- From the MHA host, check the TCP connection to the 1st slave node (IP: 192.168.10.50). Let’s call this Connection A. Then, from the slave node, check the TCP connection to the master node (192.168.10.60). Let’s call this Connection B.
Depending on the outcome, masterha_secondary_check exits as follows:
- If “Connection A” was successful but “Connection B” was unsuccessful in both routes, the script exits with return code 0 and MHA Manager decides that the MySQL master is really dead, and will start the failover.
- If “Connection A” was unsuccessful, the script exits with return code 2 and MHA Manager guesses that there is a network problem, and it does not start the failover.
- If “Connection B” was successful, the script exits with return code 3 and MHA Manager understands that the MySQL master server is actually alive, and does not start the failover.
Here is an example of how it reacts during a failover attempt, based on the log:
Tue May 7 05:31:57 2019 - [info] OK.
Tue May 7 05:31:57 2019 - [warning] shutdown_script is not defined.
Tue May 7 05:31:57 2019 - [info] Set master ping interval 1 seconds.
Tue May 7 05:31:57 2019 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s 192.168.10.50 -s 192.168.10.60 -s 192.168.10.70 --user=root --master_host=192.168.10.60 --master_ip=192.168.10.60 --master_port=3306
Tue May 7 05:31:57 2019 - [info] Starting ping health check on 192.168.10.60(192.168.10.60:3306)..
Tue May 7 05:31:58 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.60' (110))
Tue May 7 05:31:58 2019 - [warning] Connection failed 1 time(s)..
Tue May 7 05:31:58 2019 - [info] Executing SSH check script: exit 0
Tue May 7 05:31:58 2019 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.10.50 -s 192.168.10.60 -s 192.168.10.70 --user=root --master_host=192.168.10.60 --master_ip=192.168.10.60 --master_port=3306 --user=vagrant --master_host=192.168.10.60 --master_ip=192.168.10.60 --master_port=3306 --master_user=cmon --master_password=R00tP@55 --ping_type=SELECT
Master is reachable from 192.168.10.50!
Tue May 7 05:31:58 2019 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Tue May 7 05:31:59 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.60' (110))
Tue May 7 05:31:59 2019 - [warning] Connection failed 2 time(s)..
Tue May 7 05:32:00 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.60' (110))
Tue May 7 05:32:00 2019 - [warning] Connection failed 3 time(s)..
Tue May 7 05:32:01 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.60' (110))
Tue May 7 05:32:01 2019 - [warning] Connection failed 4 time(s)..
Tue May 7 05:32:03 2019 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.10.60! at /usr/local/share/perl/5.26.1/MHA/HealthCheck.pm line 343.
Tue May 7 05:32:03 2019 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Tue May 7 05:32:04 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.10.60' (110))
Tue May 7 05:32:04 2019 - [warning] Connection failed 1 time(s)..
Tue May 7 05:32:04 2019 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.10.50 -s 192.168.10.60 -s 192.168.10.70 --user=root --master_host=192.168.10.60 --master_ip=192.168.10.60 --master_port=3306 --user=vagrant --master_host=192.168.10.60 --master_ip=192.168.10.60 --master_port=3306 --master_user=cmon --master_password=R00tP@55 --ping_type=SELECT
Tue May 7 05:32:04 2019 - [info] Executing SSH check script: exit 0
Another thing to add is assigning a value to the parameter shutdown_script. This script is especially useful if you have to implement proper STONITH or node fencing so the dead master won’t rise from the dead, which helps avoid data inconsistency.
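As with the VIP hooks, wiring this in is a single configuration parameter; the fencing script path below is a placeholder for whatever power-off or fencing mechanism your environment provides:

[server default]
shutdown_script=/usr/local/bin/power_manager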
Lastly, ensure that the MHA Manager resides within the same local network as the cluster nodes, as this lessens the possibility of network outages, particularly on the connection from MHA Manager to the database nodes.
Avoiding SPOF in MHA
MHA can crash for various reasons, and unfortunately, there’s no built-in feature to fix this, i.e. High Availability for MHA. However, as we have discussed in our previous blog “Master High Availability Manager (MHA) Has Crashed! What Do I Do Now?”, there’s a way to avoid SPOF for MHA.
Fixes/Resolution:
You can leverage Pacemaker to create active/standby nodes handled by a cluster resource manager (crm). Alternatively, you can create a script to check the health of the MHA Manager node. For example, you can provision a stand-by node which actively checks the MHA Manager node by SSH’ing into it and running the built-in script masterha_check_status, just like below:
vagrant@testnode20:~$ /usr/local/bin/masterha_check_status --conf=/etc/app1.cnf
app1 is stopped(2:NOT_RUNNING).
then perform some node fencing if that controller is borked. You may also extend the MHA tool with a helper script that runs via a cron job, monitors the system process of the masterha_manager script, and re-spawns it if the process is dead, as sketched below.
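A minimal sketch of such a watchdog, assuming a hypothetical wrapper at /usr/local/bin/mha_watchdog.sh driven by cron; the cron user and log path are illustrative only:

# /etc/cron.d/mha-watchdog (hypothetical): check every minute
* * * * * mha /usr/local/bin/mha_watchdog.sh

# /usr/local/bin/mha_watchdog.sh (hypothetical)
#!/bin/bash
# Re-spawn masterha_manager if it is no longer running
if ! /usr/local/bin/masterha_check_status --conf=/etc/app1.cnf > /dev/null 2>&1; then
    nohup /usr/local/bin/masterha_manager --conf=/etc/app1.cnf \
        --remove_dead_master_conf --ignore_last_failover \
        >> /var/log/mha/app1.log 2>&1 &
fi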
Data Loss During Failover
MHA relies on traditional asynchronous replication. Although it does support semi-sync, semi-sync is still built on top of asynchronous replication. In this type of environment, data loss may happen after a failover. When your database is not set up properly and uses an old-fashioned approach to replication, it can be a pain, especially when dealing with data consistency and lost transactions.
Another important thing to note regarding data loss with MHA is when GTID is used with no semi-sync enabled. MHA with GTID will not connect via SSH to the master but will first try to sync the binary logs for node recovery from the slaves. This may potentially lead to more data loss compared to MHA non-GTID with semi-sync not enabled.
Fixes/Resolution
When performing automatic failover, build a list of scenarios in which you expect your MHA to fail over. Since MHA deals with master-slave replication, our advice to avoid data loss is the following:
- Enable lossless semi-sync replication (available since MySQL 5.7).
- Use GTID-based replication. Of course, you can use traditional replication with binlog file and position coordinates. However, it makes things more difficult and time-consuming when you need to locate a specific binary log entry that wasn’t applied on the slave. With GTID in MySQL, it’s easier to detect errant transactions.
- For ACID compliance of your MySQL master-slave replication, enable these specific variables: sync_binlog = 1, innodb_flush_log_at_trx_commit = 1 (see the sample settings after this list). This is expensive, as it requires more processing power when MySQL calls the fsync() function on commit, and performance can become disk bound under a high number of writes. However, using RAID with a battery-backed cache saves your disk I/O. Additionally, MySQL itself has improved with binary log group commit, but a battery-backed cache can still save some disk syncs.
- Leverage parallel replication or multi-threaded slave replication. This can help your slaves become more performant and avoid lagging behind the master. You don’t want your automatic failover to occur when the master is not reachable at all via SSH or TCP, or has encountered a disk failure, while your slaves are lagging behind. That could lead to data loss.
- When performing an online or manual failover, it’s best to do it during non-peak periods to avoid unexpected mishaps that could lead to data loss, or time-consuming searches grepping through your binary logs while there is a lot of activity going on.
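A rough my.cnf sketch of the durability and parallel-replication settings mentioned above (MySQL 5.7 variable names; the worker count is an assumption to tune for your workload):

[mysqld]
# Durability: flush the binary log and InnoDB redo log on every commit
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

# Multi-threaded slave applier (set on the slaves)
slave_parallel_type = LOGICAL_CLOCK
slave_parallel_workers = 4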
MHA Says APP is Not Running, or Failover Does Not Work. What Should I Do?
Running checks using the built-in script masterha_check_status will tell you whether the masterha_manager script is running. Otherwise, you’ll get an error like below:
vagrant@testnode20:~$ /usr/local/bin/masterha_check_status --conf=/etc/app1.cnf
app1 is stopped(2:NOT_RUNNING).
However, there are certain cases where you might get NOT_RUNNING even when masterha_manager is running. This can be due to the privileges of the ssh_user you set, running masterha_manager with a different system user, or the SSH user encountering a permission denied error.
Fixes/Resolution:
MHA will use the ssh_user defined in the configuration if specified. Otherwise, it will use the current system user that you use to invoke the MHA commands. When running the script masterha_check_status, for example, you need to ensure that masterha_manager runs with the same user that is specified as ssh_user in your configuration file, or the user that will interface with the other database nodes in the cluster. You also need to ensure that it has password-less, passphrase-less SSH keys so MHA won’t have any issues when establishing connections to the nodes that MHA is monitoring (a quick way to verify this is sketched after the list below).
Take note that you need the ssh_user to have access to the following:
- Can read the binary and relay logs of the MySQL nodes that MHA is monitoring
- Must have access to the MHA monitoring logs. Check out these parameters in MHA: master_binlog_dir, manager_workdir, and manager_log
- Must have access to the MHA configuration file. This is also very important. During a failover, once MHA finishes the failover, it will try to update the configuration file and remove the entry of the dead master. If the configuration file is not writable by the ssh_user or the OS user you are currently using, it won’t be updated, escalating the problem if disaster strikes again.
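To verify the SSH side of this, you can distribute the key of the user running MHA and then run the built-in SSH check; the user and host below are placeholders:

# Distribute the MHA user's key to every node (repeat per node)
ssh-copy-id vagrant@192.168.10.50

# Let MHA verify SSH connectivity between all nodes in the configuration
/usr/local/bin/masterha_check_ssh --conf=/etc/app1.cnf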
Candidate Master Lags, How to Force And Avoid Failed Failover
In reference to MHA’s wiki: by default, if a slave is behind the master by more than 100MB of relay logs (i.e., it needs to apply more than 100MB of relay logs), MHA does not choose that slave as the new master because it would take too long to recover.
Fixes/Resolution
In MHA, this can be overridden by setting the parameter check_repl_delay=0. During a failover, MHA then ignores replication delay when selecting a new master and will execute the missing transactions. This option is useful when you set candidate_master=1 on a specific host and you want to make sure that the host can become the new master.
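In the configuration file, the combination looks like the sketch below; the section name and IP follow the earlier examples and are illustrative:

[server1]
hostname=192.168.10.50
candidate_master=1
check_repl_delay=0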
You can also integrate pt-heartbeat to measure slave lag more accurately (see this post and this one). Slave lag can also be alleviated with parallel replication or multi-threaded slaves, available since MySQL 5.6, or with MariaDB 10, which claims up to a 10x improvement in parallel replication and multi-threaded slaves. This can help your slaves replicate faster.
MHA Passwords Are Exposed
Securing or encrypting the passwords isn’t something that is handled by MHA. The parameters password and repl_password are exposed in plain text in the configuration file, so your system administrator or security architect must evaluate the permissions of this file, as you don’t want to expose valuable database/SSH credentials.
Fixes/Resolution:
MHA has an optional parameter, init_conf_load_script. This parameter can be used to have a custom script load your MHA configuration values, interfacing with, e.g., a database, and retrieving the user/password credentials of your replication setup.
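A minimal sketch of such a script, assuming the behavior described in MHA’s wiki where the script prints name=value pairs on standard output; the script path and credential source (root-only files) are purely illustrative:

# /usr/local/bin/load_mha_secrets.sh (hypothetical), referenced as:
#   init_conf_load_script=/usr/local/bin/load_mha_secrets.sh
#!/bin/bash
# Print the sensitive parameters so they never live in /etc/app1.cnf
echo "password=$(cat /root/.mha/db_password)"
echo "repl_password=$(cat /root/.mha/repl_password)"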
Of course, you can also restrict the file permissions of the configuration file and the user you are using, and limit access to the specific ops/DBAs/engineers who will handle MHA.
MHA is Not My Choice, What Are the Alternatives for Replication Failover?
MHA is not a one-size-fits-all solution; it has its limitations and may not fit your desired setup. However, here’s a list of alternatives that you can try:
- PRM
- MaxScale with MariaDB Monitor or MariaDB Replication Manager (MRM)
- Orchestrator
- ClusterControl