Comparing Galera Cluster Cloud Offerings: Part Two Google Cloud Platform (GCP)

Paul Namuag

In our last blog we discussed the offerings available within Amazon Web Services (AWS) when running a MySQL Galera Cluster. In this blog, we'll continue the discussion by looking further at what the offerings are for running the same clustering technology, but this time on the Google Cloud Platform (GCP)

GCP, as an alternative to AWS, has been continuously attracting applications suited for DevOps by offering support for a wide array of full-stack technologies, containerized applications, and large production database systems. Google Cloud is a full-blown, battle-tested environment which powers its own hardware infrastructure at Google for products like YouTube and Gmail.

GCP has gained traction largely because of its ever-growing list of capabilities. It offers support for platforms like Visual Studio, Android Studio, Eclipse, Powershell and many others. GCP has one of the largest and most advanced computer networks and it provides access to numerous tools that help you focus on building your application. 

Another thing that attracts customers to migrate, import, or use Google Cloud is their strong support and solutions for containerization. Kubernetes (GKE: Google Kubernetes Engine) is built on their platform. 

GCP has also recently launched a new solution called Anthos. This product is designed to let organizations manage workloads using the same interface on the Google Cloud Platform (GCP) or on-premises using GKE On-Prem, and even on rival clouds such as Amazon Web Services (AWS) or Azure. 

In addition to these technologies, GCP offers sophisticated and powerful, compute-optimized machine types like the C2 family in GCE which is built on the latest generation Intel Scalable Processors (Cascade Lake).

GCP is continuing to support open source as well, which benefits users by providing well-supported and a straightforward framework that makes it easy to deliver a final product in a timely manner. Despite this support of open source technology, GCP does not provide native support for the deployment or configuration of a MySQL Galera Cluster. In this blog we will show you the only option available to you if you wish to use this technology, deployment via a compute instance which you have to manage yourself.

The Google Compute Engine (GCE)

GCE has a sophisticated and powerful set of compute nodes available for your consumption. Unlike AWS, GCE has the most powerful compute node available on the market (n1-ultramem-160 having 160 vCPU and 3.75 TB of memory). GCE also just recently introduced a new type of compute instance family called C2 machine-type. Built on the latest generation of Intel Scalable Processors (Cascade Lake), C2 machine types offer up to 3.8 GHz sustained all-core turbo and provide full transparency into the architecture of the underlying server platforms; letting you fine-tune the performance. C2 machine types offer much more computing power, run on a newer platform, and are generally more robust for compute-intensive workloads than the N1 high-CPU machine types. C2 family offerings are limited (as of the time of writing) and it’s not available in all regions and zones. C2 also does not support regional persistent disks though it would be a great add-on for stateful database services that requires redundancy and high availability. The resources of a C2 instance is too much for a Galera node, so we'll focus on the compute nodes instead, which are ideal.

GCE also uses KVM as its virtualization technology software, whereas Amazon is using Xen. Let's take a look at the compute nodes available in GCE which are suitable for running Galera alongside its equivalence in AWS EC2. Prices differs based on region, but for this chart, we use us-east region using on-demand pricing type for AWS.

 

Machine/Instance Type

Google Compute Engine

AWS EC2

Shared

f1-micro

G1-small

 

Prices starts at $0.006 -  $0.019 hourly

t2.nano – t3.2xlarge'

 

Price starts at $0.0058 - $0.3328 hourly

Standard

n1-standard-1 – n1-standard-96

 

Prices starts at $0.034  - $3.193 hourly

m4.large – m4.16xlarge

m5.large – m5d.metal

 

Prices starts at $0.1 - $5.424  hourly

High Memory/ Memory Optimized

n1-highmem-2 – n1-highmem-96

n1-megamem-96

n1-ultramem-40 – n1-ultramem-160

 

Prices starts at $0.083  - $17.651 hourly

r4.large – r4.16xlarge

x1.16xlarge – x1.32xlarge

x1e.xlarge – x1e.32xlarge

 

Prices starts at $0.133  - $26.688 hourly

High CPU/Storage Optimized

n1-highcpu-2 – n1-highcpu-32

 

Prices starts at $0.05 - $2.383 hourly

h1.2xlarge – h1.16xlarge

i3.large – i3.metal

I3en.large - i3en.metal

d2.xlarge – d2.8xlarge

 

Prices starts at $0.156 - $10.848  hourly

GCE has a fewer number of available predefined types of compute nodes to choose from, unlike AWS. When it comes to the type of node, however, it has more granularity. This makes it easier to setup and choose what kind of instance you want to use. For example, you can add a disk and set its physical block size (4 is default) to 16 or you can set its mode either read/write or read-only. This allows you to offer the right type of machine or compute instance ready to manage your Galera node. You may also instantiate your compute nodes using Cloud SDK, or by using Cloud APIs, to automate or integrate it to your Continuous Integration, Delivery, or Deployment (CI/CD). 

Pricing (Compute Instance, Disk, vCPU, Memory, and Network)

The price as well depends on the region where its located, the type of OS or licensing (RHEL vs Suse Linux Enterprise), and also the type of disk storage you're using. 

GCP also offers discounts which allows you to economize your resource consumption. For Compute Engine, it provides different discounts to avail. 

Sustained use discounts apply to the following resources:

Take note that sustained use discounts do not apply to VMs created using App Engine Flexible Environment and Cloud Dataflow.

You can also use Committed Use Discounts when you purchase a VMS which is bound to a contract. This type of choice is ideal for predictable workloads and resource needs. When you purchase a committed use contract you purchase a certain amount of vCPUs, memory, GPUs, and local SSDs at a discounted price in return for committing to paying for those resources for 1 year or 3 years. The discount is up to 57% for most resources like machine types or GPUs. The discount is up to 70% for memory-optimized machine types. Once purchased, you are billed monthly for the resources you purchased for the duration of the term you selected (whether you use the services or not). 

A preemptible VM is an instance that you can create and run at a much lower price than normal instances. Compute Engine may, however, terminate (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances use excess Compute Engine capacity, so their availability varies with usage.

If your applications are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances. If some of those instances terminate during processing, the job slows but does not completely stop. Preemptible instances complete your batch processing tasks without placing additional workload on your existing instances, and without requiring you to pay full price for additional normal instances.

For Compute Engine, disk size, machine type memory, and network usage are calculated in gigabytes (GB), where 1 GB is 230 bytes. This unit of measurement is also known as a gibibyte (GiB). This means that GCP offers you to only pay based on the resource consumption you have allocated. 

Now, if you have a high-grade, production database application, it's recommendable (and ideal) to attach or add a separate persistent disk. You would then use that disk as your database volume, as it offers you reliable and consistent disk performance in GCE. The higher the size you setup, the higher the IOPS it offers you.  Checkout their list of persistent disk pricing to determine the price you would get. In addition to this, GCE has regional persistent disk which is suitable in case you require more solid and sustainable high-availability within your database cluster. Regional persistent disk adds more redundancy in the case that your instance terminates or crashes or becomes corrupted. It provides synchronous replication of data between two zones in one region which happens transparently in the VM instance. In the unlikely event of  zone failure, your workload can fail-over to another VM instance in the same, or a secondary, zone. You can then force-attach your regional persistent disk to that instance. Force-attach time is estimated in less than one minute.

If you store backups as part of your disaster recovery solution, and requires a volume that is cluster-wide, GCP offers Cloud Filestore, NetApp Cloud Volumes, and some other alternative file-sharing solutions. These are fully-managed services that offers standard and premium services. You can checkout NetApp's pricing page here and Filestore pricing here.

Galera Encryption on GCP

GCP does not include specific support for the type of encryption available for Galera. GCP, however, encrypts customer data stored at rest by default, with no additional action required from you. GCP also offers another option to encrypt your data using Customer-managed encryption keys (CMEK) with Cloud KMS as well as with Customer-supplied encryption keys (CSEK). GCP also uses SSL/TLS encryption for all communications intercepted as data moves between your site and the cloud provider or between two services. This protection is achieved by encrypting the data before transmission; authenticating the endpoints; and decrypting and verifying the data on arrival.

Because Galera uses MySQL under the hood (Percona, MariaDB, or Codership build), you can take advantage of the File Key Management Encryption Plugin by MariaDB or by using the MySQL Keyring plugins. Here's an external blog by Percona which is a good resource on how you can implement this.

Galera Cluster Multi-AZ/Multi-Region/Multi-Cloud Deployments with GCP

Similarly to AWS, GCP does not offer direct support to deploy a Galera cluster on a Multi-AZ/-Region/-Cloud.

Galera Cluster High Availability, Scalability, and Redundancy on GCP

One of the primary reasons to use a Galera node cluster is the high-availability, redundancy, and it's ability to scale. If you are serving traffic globally, it's best that you cater your traffic based by regions with your architectural design including a geo-distribution of your database nodes. In order to achieve this, multi-AZ and multi-region or multi-cloud/multi-datacenter deployment is recommendable and achievable. This prevents the cluster from going down or a cluster malfunction due to lack of quorum. 

To help you more with your scalability design, GCP also has an autoscaler you can set up with an autoscaling group. This will work as long as you created your cluster as managed instance groups. For example, you can monitor the CPU utilization or relying on the metrics from Stackdriver defined in your autoscaling policy. This allows you to provision and automate instances when a certain threshold is reached, or terminate the instances when it goes back to its normal state.

For multi-region or multi-cloud deployment, Galera has its own parameter called gmcast.segment for which you can set this upon server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments. This includes writeset relaying and IST and SST donor selection. This type of setup allows you to deploy multiple nodes in different regions. Aside from that, you can also deploy your Galera nodes on a different cloud vendor routing from GCP, AWS, Microsoft Azure, or within on-premise. 

We recommend you to check out our blog Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Galera Cluster Database Performance on GCP

Since there's no available support for Galera in GCP your choices depend on the requirements and design of your application’s traffic and resource demands. For queries that are high on memory consumption, you can start with n1-highmem-2 instance. High CPU instances (n1-highcpu* family) can be a good fit if this is a high-transactional database, or a good fit for gaming applications.

Choosing the right storage and required IOPS for your database volume is a must. Generally, SSD-based persistent disk is your choice here. It depends on the volume of traffic is required, you might have to checkout the GCP storage options so you can determine the right size for your application.

We also recommend you to check and read our blog How to Improve Performance of Galera Cluster for MySQL or MariaDB to learn more about optimizing your Galera Cluster.

Galera Data Backups on GCP

Not only does your MySQL Galera data has to be backed-up, you should also backup the entire tier which comprises your database application. This includes log files (logical or binary), external files, temporary files, dump files, etc. Google recommends that you always create a snapshot of your persistent disks volumes which are being used by your GCE instances. You can easily create and schedule snapshots. GCP Snapshots are stored in Cloud Storage and you can select your desired location or region where the backup will be located. You can also setup a schedule for your snapshots as well as set a snapshot retention policy.

You can also use external services like, ClusterControl, which provides you both monitoring and backup solutions. Check this out if you want to know more.

Galera Cluster Database Monitoring on GCP

GCP does not offer database monitoring when using GCE. Monitoring your of instance health can be done through Stackdriver. For the database, though, you will need to grab an external monitoring tool which has advanced, highly-granular database metrics. There are a lot of choices you can choose from such as PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (Monitoring is FREE with ClusterControl Community.)

Galera Cluster Database Security on GCP

As discussed in our previous blog, you can take the same approach for securing your database in the public cloud. In GCP you can setup a private subnet, firewall rules to only allow the ports required for running Galera (particularly ports 3306, 4444, 4567, 4568). You can use NAT Gateway or setup a bastion host to access your private database nodes. When these nodes are encapsulated they cannot be accessed from the outside of the GCP premises. You can read our previous blog Deploying Secure Multicloud MySQL Replication on AWS and GCP with VPN on how we set this up.

In addition to this, you can secure your data-in-transit by using a TLS/SSL connection or by encrypting your data when it's at rest. If you're using ClusterControl, deploying a secure data in-transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try out. For data at-rest, you can follow the discussion I have stated earlier in the Encryption section of this blog.

Galera Cluster Troubleshooting 

GCP offers Stackdriver Logging which you can leverage to help you with observability, monitoring, and notification requirements. The great thing about Stackdriver Logging is that it offers integration with AWS. With it you can catch the events selectively and then raise an alert based on that event. This can keep you in the loop on certain issues which may arise and help you during troubleshooting. GCP also has Cloud Audit Logs which provide you more traceable information from inside the GCP environment, from admin activity, data access, and system events. 

If you're using ClusterControl, going to Logs -> System Logs, and you'll be able to browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that would amplify your alarm and notification system in case an emergency or if your MySQL Galera node(s) is kaput.

Conclusion

The Google Cloud Platform offers a wide-variety of efficient and powerful services that you can leverage. There are indeed pros and cons for each of public cloud platforms, but GCP proves that AWS doesn’t have a lock on the cloud. 

It's interesting that big companies such as Vimeo are moving to GCP coming from on-premise and they experienced some interesting results in their technology stack. Bloomberg as well is happy with GCP and is using Percona XtraDB Cluster (a Galera variant). Let us know what you think about using GCP for MySQL Galera setups in the comments below.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.