Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part Two

Krzysztof Ksiazek

In the previous blog in the series we discussed the pros and cons of using Galera Cluster to create geo-distributed cluster. In this post we will design a Galera-based geo-distributed cluster and we will show how you can deploy all the required pieces using ClusterControl.

Designing a Geo-Distributed Galera Cluster

We will start with explaining the environment we want to build. We will use three remote data centers, connected via Wide Area Network (WAN). Each datacenter will receive writes from local application servers. Reads will also be only local. This is intended to avoid unnecessary traffic crossing the WAN. 

For this setup the connectivity is in place and secured, but we won’t describe exactly how this can be achieved. There are numerous methods to secure the connectivity starting from proprietary hardware and software solutions through OpenVPN and ending up on SSH tunnels. 

We will use ProxySQL as a loadbalancer. ProxySQL will be deployed locally in each datacenter. It will also route traffic only to the local nodes. Remote nodes can always be added manually and we will explain cases where this might be a good solution. Application can be configured to connect to one of the local ProxySQL nodes using round-robin algorithm. We can as well use Keepalived and Virtual IP to route the traffic towards the single ProxySQL node, as long as a single ProxySQL node would be able to handle all of the traffic. 

Another possible solution is to collocate ProxySQL with application nodes and configure the application to connect to the proxy on the localhost. This approach works quite well under the assumption that it is unlikely that ProxySQL will not be available yet the application would work ok on the same node. Typically what we see is either node failure or network failure, which would affect both ProxySQL and application at the same time.

Geo-Distributed MySQL Galera Cluster with ProxySQL

The diagram above shows the version of the environment, where ProxySQL is collocated on the same node as the application. ProxySQL is configured to distribute the workload across all Galera nodes in the local datacenter. One of those nodes would be picked as a node to send the writes to while SELECTs would be distributed across all nodes. Having one dedicated writer node in a datacenter helps to reduce the number of possible certification conflicts, leading to, typically, better performance. To reduce this even further we would have to start sending the traffic over the WAN connection, which is not ideal as the bandwidth utilization would significantly increase. Right now, with segments in place, only two copies of the writeset are being sent across datacenters - one per DC.

The main concern with Galera Cluster geo-distributed deployments is latency. This is something you always have to test prior launching the environment. Am I ok with the commit time? At every commit certification has to happen so writesets have to be sent and certified on all nodes in the cluster, including remote ones. It may be that the high latency will deem the setup unsuitable for your application. In that case you may find multiple Galera clusters connected via asynchronous replication more suitable. This would be a topic for another blog post though.

Deploying a Geo-Distributed Galera Cluster Using ClusterControl

To clarify things, we will show here how a deployment may look like. We won’t use actual multi-DC setup, everything will be deployed in a local lab. We assume that the latency is acceptable and the whole setup is viable. What is great about ClusterControl is that it is infrastructure-agnostic. It doesn’t care if the nodes are close to each other, located in the same datacenter or if the nodes are distributed across multiple cloud providers. As long as there is SSH connectivity from ClusterControl instance to all of the nodes, the deployment process looks exactly the same. That’s why we can show it to you step by step using just local lab.

Installing ClusterControl

First, you have to install ClusterControl. You can download it for free. After registering, you should access the page with guide to download and install ClusterControl. It is as simple as running a shell script. Once you have ClusterControl installed, you will be presented with a form to create an administrative user:

Installing ClusterControl

Once you fill it, you will be presented with a Welcome screen and access to deployment wizards:

ClusterControl Welcome Screen

We’ll go with deploy. This will open a deployment wizard:

ClusterControl Deployment Wizard

We will pick MySQL Galera. We have to pass SSH connectivity details - either root user or sudo user are supported. On the next step we are to define servers in the cluster.

Deploy Database Cluster

We are going to deploy three nodes in one of the data centers. Then we will be able to extend the cluster, configuring new nodes in different segments. For now all we have to do is to click on “Deploy” and watch ClusterControl deploying the Galera cluster.

Cluster List - ClusterControl

Our first three nodes are up and running, we can now proceed to adding additional nodes in other datacenters.

Add a Database Node - ClusterControl

You can do that from the action menu, as shown on the screenshot above.

Add a Database Node - ClusterControl

Here we can add additional nodes, one at a time. What is important, you should change the Galera segment to non-zero (0 is used for the initial three nodes).

After a while we end up with all nine nodes, distributed across three segments.

ClusterControl Geo-Distributed Database Nodes

Now, we have to deploy proxy layer. We will use ProxySQL for that. You can deploy it in ClusterControl via Manage -> Load Balancer:

Add a Load Balancer - ClusterControl

This opens a deployment field:

Deploy Load Balancer - ClusterControl

First, we have to decide where to deploy ProxySQL. We will use existing Galera nodes but you can type anything in the field so it is perfectly possible to deploy ProxySQL on top of the application nodes. In addition, you have to pass access credentials for the administrative and monitoring user.

Deploy Load Balancer - ClusterControl

Then we have to either pick one of existing users in MySQL or create one right now. We also want to ensure that the ProxySQL is configured to use Galera nodes located only in the same datacenter.

When you have one ProxySQL ready in the datacenter, you can use it as a source of the configuration:

Deploy ProxySQL - ClusterControl

This has to be repeated for every application server that you have in all datacenters. Then the application has to be configured to connect to the local ProxySQL instance, ideally over the Unix socket. This comes with the best performance and the lowest latency.

Reducing Latency - ClusterControl

After the last ProxySQL is deployed, our environment is ready. Application nodes connect to local ProxySQL. Each ProxySQL is configured to work with Galera nodes in the same datacenter:

ProxySQL Server Setup - ClusterControl

Conclusion

We hope this two-part series helped you to understand the strengths and weaknesses of geo-distributed Galera Clusters and how ClusterControl makes it very easy to deploy and manage such cluster.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.