Multi-Cloud Full Database Cluster Failover Options for MariaDB Cluster

Krzysztof Ksiazek

With high availability being paramount in today’s business reality, one of the most common problems users face is how to ensure that the database is always available to the application.

Every service provider comes with an inherent risk of service disruption, so one step you can take is to rely on multiple providers, which reduces that risk and adds redundancy.

Cloud service providers are no different - they can fail and you should plan for this in advance. What options are available for MariaDB Cluster? Let’s take a look in this blog post.

MariaDB Database Clustering in Multi-Cloud Environments

If the SLA offered by one cloud service provider is not enough, there’s always the option to create a disaster recovery site outside of that provider. Thanks to this, whenever one of the cloud providers experiences service degradation, you can switch to another provider and keep your database up and available.

One of the problems typical for multi-cloud setups is network latency, which is unavoidable over larger distances or, in general, across multiple geographically separated locations. The speed of light is high but finite, and every hop and every router adds some latency of its own.
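
To put rough numbers on the distance part of that latency, here is a minimal sketch. It assumes that signals in optical fibre travel at about two thirds of the speed of light (roughly 200,000 km/s) and it ignores routers, queuing and indirect cable routes, so real round-trip times will be higher. The function name and the example distances are illustrative only.

```python
# Rough lower bound on the network round-trip time between two sites.
# Assumption: light in optical fibre covers about 200,000 km per second.
FIBRE_SPEED_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Return the theoretical minimum round-trip time in milliseconds."""
    one_way_seconds = distance_km / FIBRE_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000.0


# Illustrative distances: two sites in one metro area vs. a long-haul link.
for label, km in [("same metro area", 50), ("long-haul link", 4000)]:
    print(f"{label}: at least {min_round_trip_ms(km):.1f} ms round trip")
```

Even this optimistic bound shows why distance matters: a long-haul link starts at tens of milliseconds before any equipment adds its own delay.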

MariaDB Cluster works great on low-latency networks. It is a quorum-based cluster where prompt communication between all nodes is required to keep operations smooth. An increase in network latency will impact cluster operations, especially write performance. There are several ways this problem can be addressed.

First, we have the option to use separate clusters connected using asynchronous replication links. This allows us to almost forget about latency because asynchronous replication is significantly better suited to high-latency environments.

Another option: given low-latency networks between datacenters, you might be perfectly fine running a MariaDB Cluster that spans several datacenters. After all, multiple datacenters don’t always mean vast geographical distances - you can just as well use multiple providers located within the same metropolitan area, connected with fast, low-latency networks. Then we are talking about a latency increase of tens of milliseconds at most, definitely not hundreds. It all depends on the application, but such an increase may be acceptable.

Asynchronous Replication Between MariaDB Clusters

Let’s take a quick look at the asynchronous approach. The idea is simple - two clusters connected with each other using asynchronous replication. 

This comes with several limitations. For starters, you have to decide whether you want to use multi-master or send all traffic to one datacenter only. We recommend staying away from writing to both datacenters and using master - master replication, as it may lead to serious issues if you do not exercise caution.

If you decide to use the active - passive setup, you will probably want to implement some sort of DNS-based routing for writes, to make sure that your application servers always connect to a set of proxies located in the active datacenter. This can be achieved either with a DNS entry that is changed when a failover is required, or through a service discovery solution like Consul or etcd.
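
The routing idea can be reduced to a single lookup: one piece of state names the active datacenter, and the application derives its proxy endpoints from it. The sketch below is only an illustration. The datacenter names, host names and the `writer_endpoints` helper are hypothetical, and how the active datacenter value is fetched (a DNS record, a Consul or etcd key) is deliberately left out.

```python
from typing import Dict, List

# Hypothetical proxy inventory, keyed by datacenter name.
PROXIES: Dict[str, List[str]] = {
    "dc1": ["proxy1.dc1.example.internal:3306", "proxy2.dc1.example.internal:3306"],
    "dc2": ["proxy1.dc2.example.internal:3306", "proxy2.dc2.example.internal:3306"],
}


def writer_endpoints(active_dc: str, proxies: Dict[str, List[str]]) -> List[str]:
    """Return the proxies the application should use for writes.

    `active_dc` is the value published by whatever drives the failover
    (an updated DNS entry or a service discovery key).
    """
    if active_dc not in proxies:
        raise ValueError(f"unknown datacenter: {active_dc!r}")
    return list(proxies[active_dc])


# A failover then amounts to changing the published value from "dc1" to "dc2".
print(writer_endpoints("dc1", PROXIES))
```

Keeping the decision in one published value means that application servers never have to agree among themselves which datacenter is active.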

The main downside of an environment built using asynchronous replication is its inability to deal with network splits between datacenters. This is inherited from the replication itself - no matter what you link with replication (single nodes, MariaDB Clusters), there is no way around the fact that replication is not quorum-aware. There is no mechanism to track the state of the nodes and understand the high-level picture of the whole topology. As a result, whenever the link between two datacenters goes down, you end up with two separate MariaDB Clusters that are not connected and that are both ready to accept traffic.

It will be up to the user to define what to do in such a case. It is possible to implement additional tools that monitor the state of the databases from outside (for example, from a third datacenter) and then take action (or not) based on that information. It is also possible to collocate tools that share the infrastructure with the databases but are cluster-aware, can track the state of the datacenter connectivity and can be used as the source of truth for the scripts that manage the environment. For example, ClusterControl can be deployed as a three-node cluster, one node per datacenter, that uses the Raft protocol to ensure quorum. If a node loses connectivity with the rest of the cluster, it can be assumed that its datacenter has experienced network partitioning.
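
The external-monitor idea described above can be sketched as a small decision function. This is a simplified illustration and not a description of how ClusterControl or any other tool works: the `decide` function and its return values are hypothetical, and a real monitor would also require several consecutive failed checks before acting.

```python
def decide(active_reachable: bool, passive_reachable: bool) -> str:
    """Decide what a witness in a third datacenter should do.

    Both arguments describe what the witness itself can reach,
    not what the two database datacenters can see of each other.
    """
    if active_reachable:
        # The active site still serves traffic; a broken link between the
        # two sites alone is not a reason to move writes.
        return "keep-active"
    if passive_reachable:
        # Only the passive site is visible: promote it and redirect writes.
        return "promote-passive"
    # The witness sees nothing; most likely it is the isolated party.
    return "no-action"


print(decide(active_reachable=False, passive_reachable=True))
```

The last branch matters most: a monitor that cannot reach either site should assume that it is the one cut off, and leave the topology alone.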

Multi-DC MariaDB Clusters

An alternative to asynchronous replication is an all-MariaDB Cluster solution that spans multiple datacenters.

As stated at the beginning of this blog, MariaDB Cluster, just like every Galera-based cluster, will be impacted by high latency. Having said that, it is perfectly acceptable to run it in “not-so-high” latency environments and expect it to behave properly, delivering acceptable performance. It all depends on the network throughput and design, the distance between datacenters and the application requirements. Such an approach works especially well if we use segments to differentiate separate datacenters. Segments allow MariaDB Cluster to optimize its intra-cluster connectivity and reduce cross-DC traffic to a minimum.
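
In Galera, the segment of a node is set with the `gmcast.segment` provider option, which takes an integer from 0 to 255. The sketch below renders the matching configuration line for each datacenter. The datacenter-to-segment mapping and the helper name are assumptions made for illustration, and a real configuration file usually carries further provider options in the same semicolon-separated string.

```python
from typing import Dict

# Hypothetical mapping: one Galera segment per datacenter.
SEGMENTS: Dict[str, int] = {"dc1": 0, "dc2": 1, "dc3": 2}


def provider_options_line(segment: int) -> str:
    """Render a my.cnf line that assigns a node to a Galera segment."""
    if not 0 <= segment <= 255:
        raise ValueError("gmcast.segment must be between 0 and 255")
    return f'wsrep_provider_options="gmcast.segment={segment}"'


for dc, segment in SEGMENTS.items():
    print(f"# nodes in {dc}")
    print(provider_options_line(segment))
```

Every node in the same datacenter gets the same segment number, which is how the cluster knows which nodes are close to each other.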

The main advantage of this setup is that it relies on MariaDB Cluster to handle failures. If you use three datacenters, you are pretty much covered against split-brain - as long as there is a majority, the cluster will continue to operate. A full-blown node in the third datacenter is not required - you can just as well use Galera Arbitrator, a daemon that acts as a part of the cluster but does not handle any database operations. It connects to the nodes, takes part in the quorum calculation and may be used to relay traffic should the direct connection between the two datacenters fail.
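
The majority rule can be illustrated with a few lines of Python. This is a simplified model that only compares weights against the full membership; Galera's actual quorum calculation also takes into account which nodes left gracefully. Node names are hypothetical, and each node, including the arbitrator, is assumed to have a weight of 1.

```python
from typing import Dict, Set


def has_quorum(reachable: Set[str], members: Dict[str, int]) -> bool:
    """True if the reachable nodes hold a strict majority of the total weight."""
    total = sum(members.values())
    seen = sum(weight for node, weight in members.items() if node in reachable)
    return 2 * seen > total


# Two data nodes in each of two datacenters plus an arbitrator in a third.
with_arbitrator = {"dc1-a": 1, "dc1-b": 1, "dc2-a": 1, "dc2-b": 1, "dc3-garbd": 1}
# The same cluster without the third site.
without_arbitrator = {"dc1-a": 1, "dc1-b": 1, "dc2-a": 1, "dc2-b": 1}

# dc1 is cut off: dc2 together with the arbitrator keeps the majority.
print(has_quorum({"dc2-a", "dc2-b", "dc3-garbd"}, with_arbitrator))
# Without the arbitrator, a dc1/dc2 split leaves neither half with a majority.
print(has_quorum({"dc2-a", "dc2-b"}, without_arbitrator))
```

In the second case both halves stop accepting writes, which is exactly the situation the third site is there to prevent.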

In that case the whole failover process can be described as: define all nodes in the load balancers (all of them with equal priority if the datacenters are close to each other; otherwise you may want to give priority to the nodes located closer to the load balancer) and that’s pretty much it. MariaDB Cluster nodes that form the majority will be reachable through any proxy.
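
The priority rule for distant datacenters can be sketched as a simple ordering: nodes in the load balancer's own datacenter come first, the rest follow. The node list and the `order_backends` helper are hypothetical, and translating the resulting order into weights or priorities is specific to the proxy you use.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (host, datacenter)


def order_backends(nodes: List[Node], local_dc: str) -> List[Node]:
    """Sort nodes so that those in the proxy's own datacenter come first."""
    return sorted(nodes, key=lambda node: (node[1] != local_dc, node[0]))


nodes = [("db3", "dc2"), ("db1", "dc1"), ("db4", "dc2"), ("db2", "dc1")]
# A proxy running in dc2 prefers db3 and db4 and falls back to dc1 nodes.
print(order_backends(nodes, "dc2"))
```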

Deploying a Multi-Cloud MariaDB Cluster Using ClusterControl

Let’s take a look at two options you can use to deploy multi-cloud MariaDB Clusters using ClusterControl. Please keep in mind that ClusterControl requires SSH connectivity to all of the nodes it will manage so it would be up to you to ensure network connectivity across multiple datacenters or cloud providers. As long as the connectivity is there, we can proceed with two methods.

Deploying MariaDB Clusters Using Asynchronous Replication

ClusterControl can help you deploy two clusters connected using asynchronous replication. Once you have a single MariaDB Cluster deployed, make sure that one of its nodes has binary logs enabled. This will allow you to use that node as a master for the second cluster that we will create shortly.
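
If you want to verify the binary log prerequisite yourself, a check along these lines can help. It is only a sketch: it works on a dictionary of server variables that you have already collected (for example, from `SHOW GLOBAL VARIABLES`), and the `binlog_problems` helper is hypothetical. Treating `log_slave_updates` as required is an assumption based on the usual setup for a Galera node acting as an asynchronous master, where writes applied from other cluster nodes also have to reach the binary log.

```python
from typing import Dict, List


def binlog_problems(variables: Dict[str, str]) -> List[str]:
    """Return a list of reasons why the node cannot act as a master yet."""
    problems = []
    if variables.get("log_bin", "OFF").upper() != "ON":
        problems.append("log_bin is not enabled")
    # Assumption: needed so that writes applied from other cluster nodes
    # are written to this node's binary log as well.
    if variables.get("log_slave_updates", "OFF").upper() != "ON":
        problems.append("log_slave_updates is not enabled")
    return problems


# Variable values as they might be collected from the candidate node.
print(binlog_problems({"log_bin": "ON", "log_slave_updates": "OFF"}))
```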

Once the binary log has been enabled, we can use the Create Slave Cluster job to start the deployment wizard.

We can either stream the data directly from the master or use one of the backups to provision the data.

Then you are presented with a standard cluster deployment wizard where you have to pass SSH connectivity details.

You will be asked to pick the vendor and version of the database and to provide the password for the root user.

Finally, you are asked to define the nodes you would like to add to the cluster, and you are all set.

Once deployed, the new cluster will appear on the list of clusters in the ClusterControl UI.

Deploying Multi-Cloud MariaDB Cluster

As we mentioned earlier, another option for deploying MariaDB Cluster is to use separate segments when adding nodes to the cluster. In the ClusterControl UI you will find an “Add Node” option.

When you use it, you will be presented with a dialog in which, among other settings, you can choose the segment for the new node.

The default segment is 0, so for nodes in another datacenter you will want to change it to a different value.

After the nodes have been added, you can check which segment they are located in by looking at the Overview tab.

Conclusion

We hope this short blog gave you a better understanding of the options you have for multi-cloud MariaDB Cluster deployments and how they can be used to ensure high availability of your database infrastructure.
