Driving Performance in a Hybrid Cloud Setup

Ashraf Sharif

A hybrid cloud is a mixed computing, storage, and services environment made up of on-premises infrastructure, private cloud services, and a public cloud, with orchestration among the various platforms. If you use a combination of public clouds, on-premises computing, and private clouds in your data center, you have a hybrid cloud infrastructure.

Performance often takes a lower priority in a hybrid cloud, since the main reasons for adopting a hybrid cloud infrastructure are usually disaster recovery, availability, and scalability. In this blog post, we are going to cover some general tips to drive the performance of applications, workloads, and clusters running on a hybrid cloud setup.

Dedicated Hosts/Instances/Resources

The cost of cloud services is claimed to be lower due to extensive resource sharing. However, a higher degree of sharing also means weaker performance guarantees, since your workloads compete with other tenants for the same hardware.

Cloud instances are prone to unpredictable performance, but at some additional cost we can reduce this risk with dedicated resources. Dedicated instances run on hardware that is dedicated to a single tenant, and they are commonly isolated at the host hardware level from instances that belong to other tenants. This guarantees adequate resources for the service and helps stabilize the performance of your workloads in the long run. Depending on your budget, you have multiple options for dedicated resources, such as dedicated hosts, dedicated instances, or dedicated bandwidth.

There are also many offerings and discounts if you plan to run instances for a longer period. For example, by committing to AWS EC2 Reserved Instances, users can save up to 70% of the instance cost compared to standard on-demand pricing. Once your applications or workloads are tested and ready for production, it is highly recommended to opt for a long-term contract, provided you size the instance with enough resources to cover your needs for the whole contract period.
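Whether a long-term commitment actually pays off depends on how much of the term the instance really runs, because a reservation is billed for every hour of the term whether or not the instance is in use. The Python sketch below compares the two pricing models under stated assumptions: the hourly rates are hypothetical placeholders rather than real AWS prices, and `commitment_summary` is a hypothetical helper, not part of any AWS SDK.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def commitment_summary(on_demand_hourly, reserved_hourly, years, utilization=1.0):
    """Compare on-demand and reserved cost over a commitment term.

    on_demand_hourly: price per hour when paying on demand (hypothetical).
    reserved_hourly:  effective hourly price of the reservation, i.e. any
                      upfront payment amortized over the term (hypothetical).
    utilization:      fraction of the term the instance actually runs.
    """
    if on_demand_hourly <= 0 or not 0 < utilization <= 1:
        raise ValueError("need a positive on-demand rate and 0 < utilization <= 1")
    hours = HOURS_PER_YEAR * years
    # On demand, you only pay for the hours the instance runs.
    on_demand_cost = on_demand_hourly * hours * utilization
    # A reservation is paid for every hour of the term, used or not.
    reserved_cost = reserved_hourly * hours
    return {
        "on_demand_cost": on_demand_cost,
        "reserved_cost": reserved_cost,
        "savings_pct": 100.0 * (1 - reserved_cost / on_demand_cost),
        # Below this utilization, on demand is cheaper than the reservation.
        "break_even_utilization": reserved_hourly / on_demand_hourly,
    }


if __name__ == "__main__":
    # Hypothetical prices for illustration only, not real AWS rates.
    for util in (1.0, 0.3):
        s = commitment_summary(0.10, 0.04, years=3, utilization=util)
        print(f"utilization={util:.0%}: savings {s['savings_pct']:.0f}%, "
              f"break-even at {s['break_even_utilization']:.0%}")
```

With these made-up rates the break-even point is 40% utilization: an instance running around the clock saves about 60%, while one running only 30% of the time would cost more on the reservation than on demand.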

Bandwidth Management

Bandwidth is expensive, or more precisely, the infrastructure that carries bandwidth from one place to another is. Laying fiber, carrier-grade and provider-grade routing hardware, the monitoring and maintenance overhead to keep it all running, data center suite rental, and staffing a 24/7 Network Operations Center (NOC) with engineers all contribute to the high price of reliable bandwidth. On top of that, the pace of technology, user demands, and vendor product lifetimes often mean that a large chunk of this investment is written off and replaced every 7 to 10 years, and in some cases every 5 years.

Most public cloud providers allow data exchange with other cloud service providers (CSPs), which can be achieved in multiple ways:

  • Transfer of data through public IP addresses over the internet.

  • Using a managed VPN service between the on-premises network and the CSP network.

  • A direct connection from the on-premises network or private cloud networks to the other CSP, such as Partner Interconnect for Google Cloud or AWS Direct Connect for AWS.

  • Transfer data to the other CSP through a common point of presence (POP).

  • Network peering with private cloud networks and the CSP network.

These options differ in transfer speed, latency, reliability, service level agreements (SLAs), complexity, and cost. Regardless of the option, the idea is the same: the less data you transfer, the lower the cost.

To reduce bandwidth usage, compression is the first thing to consider. Most replication services now support connection compression, which can greatly reduce the size of data transferred between sites. For instance, enabling connection compression for MySQL master-slave replication can easily reduce bandwidth usage by a factor of about 1.5, without any additional tuning of the compression level or algorithm. This is a lossless compression technique: the receiving side reconstructs the data exactly. You may be able to reach an even higher compression ratio with a higher compression level, at the cost of more processing power for compression and decompression on both endpoints.
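To get a feel for the ratio-versus-CPU tradeoff, here is a minimal Python sketch using the standard `zlib` module (zlib is the algorithm MySQL's compressed client/server protocol has traditionally used). The payload is a hypothetical, synthetic stream of INSERT statements standing in for replication traffic; it is far more repetitive than real binary log events, so expect much higher ratios here than the 1.5x mentioned above.

```python
import time
import zlib


def make_sample_payload(rows=5000):
    """Build a synthetic, replication-like payload (hypothetical data)."""
    lines = [
        f"INSERT INTO orders (id, customer_id, status) VALUES ({i}, {i % 97}, 'shipped');"
        for i in range(rows)
    ]
    return "\n".join(lines).encode("utf-8")


def measure(payload, level):
    """Compress payload at a zlib level; return (compression ratio, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Lossless: decompressing must give back exactly the original bytes.
    if zlib.decompress(compressed) != payload:
        raise ValueError("round trip failed")
    return len(payload) / len(compressed), elapsed


if __name__ == "__main__":
    data = make_sample_payload()
    # Level 1 favors speed, level 9 favors ratio; 6 is zlib's default.
    for level in (1, 6, 9):
        ratio, secs = measure(data, level)
        print(f"level={level} ratio={ratio:.1f}x time={secs * 1000:.2f} ms")
```

Higher levels usually spend extra CPU time for a smaller output, and that cost is paid on every replicated byte, so it is worth measuring on your own traffic before raising the level.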

Workload placement is also important. In a hybrid cloud setup, applications and workloads may run on private or public clouds. In-house applications are better placed in the private cloud, closer to the on-premises systems, for lower network latency. To improve the performance of public-facing applications, serve their static content from the edge servers of a Content Delivery Network (CDN). This greatly reduces the burden on the main server, which then only handles dynamic requests, while static content is delivered by multiple edge servers that are geographically closer to the end users.
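Placement decisions are best based on measured latency rather than assumptions. A minimal Python sketch, assuming each candidate site exposes a TCP service you can connect to (for example a database port): it times TCP handshakes with the standard `socket` module and picks the site with the lowest median. The site names, addresses, and the `pick_closest` helper are hypothetical, and handshake time is only a rough proxy for application-level latency.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time a single TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def median_latency_ms(host, port, samples=5):
    """Median of several handshakes, to smooth out one-off spikes."""
    return statistics.median(tcp_connect_ms(host, port) for _ in range(samples))


def pick_closest(candidates, samples=5):
    """Return (site_name, median_ms) for the lowest-latency reachable site.

    candidates: dict mapping a site name to a (host, port) tuple.
    """
    results = {}
    for name, (host, port) in candidates.items():
        try:
            results[name] = median_latency_ms(host, port, samples)
        except OSError:
            continue  # site unreachable: leave it out of the comparison
    if not results:
        raise RuntimeError("no candidate site was reachable")
    best = min(results, key=results.get)
    return best, results[best]


if __name__ == "__main__":
    # Hypothetical sites; 203.0.113.0/24 is a documentation address range.
    sites = {
        "private-cloud": ("10.0.0.10", 3306),
        "public-cloud": ("203.0.113.20", 3306),
    }
    print(pick_closest(sites))
```

Running the measurement from where the clients actually are (the on-premises network for in-house applications) matters more than the exact method.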

Faster Encryption

In-transit and at-rest encryption is mandatory in a hybrid cloud setup, since we only own a fraction of the infrastructure. We don't want prying eyes looking at our data while it is being transmitted, nor do we want to risk data breaches through theft or by outsiders who have physical access to our storage. In simple words, all data in transit, and all data at rest on hardware we cannot physically control, must be encrypted, period. However, some encryption ciphers may compromise the speed and performance of the workloads.

A common fallacy is to assume that a message encrypted using AES256 is always more difficult to crack than the same information protected using AES128. It makes logical sense that a larger key size introduces greater complexity, but as with any system, implementations are subject to weaknesses. AES256 has a known weakness in its key expansion function (the key schedule) that enables related-key attacks, and in that attack model the complexity of breaking AES256 drops below that of AES128. AES128 also uses 10 rounds per block versus 14 for AES256, so it is faster as well, which matters on high-throughput links.

Some tunneling tools, like WireGuard, are well known for their fast encryption and are fairly simple to set up when tunneling between multiple sites. Similar to SSH, WireGuard peers authenticate each other with public/private key pairs (asymmetric cryptography), while the traffic itself is encrypted with a fast symmetric cipher (ChaCha20-Poly1305). According to this research, WireGuard is on average 58% faster than OpenVPN across all the tested locations. Faster encryption means less time spent encrypting and decrypting data, which can significantly improve data exchange performance between sites.

If you are wondering how to set up a WireGuard VPN for a hybrid cloud environment, check out this blog post: Multi-Cloud Deployment for MariaDB Replication Using WireGuard.
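As a rough illustration of how little configuration a site-to-site WireGuard tunnel needs, the Python sketch below renders a `wg-quick` style configuration file for one site with a single remote peer. The tunnel addresses, endpoint, key placeholders, and the `Peer`/`render_wg_quick` names are hypothetical; real keys would come from `wg genkey` and `wg pubkey`, and the resulting file would typically be saved as `/etc/wireguard/wg0.conf` and brought up with `wg-quick up wg0`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    """One remote site in the tunnel (hypothetical helper type)."""
    public_key: str
    allowed_ips: List[str]
    endpoint: Optional[str] = None              # "host:port" of the remote site
    persistent_keepalive: Optional[int] = None  # seconds; useful behind NAT


def render_wg_quick(private_key: str, address: str, listen_port: int,
                    peers: List[Peer]) -> str:
    """Render a wg-quick configuration with one [Interface] and N [Peer] sections."""
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {address}",
        f"ListenPort = {listen_port}",
    ]
    for peer in peers:
        lines += [
            "",
            "[Peer]",
            f"PublicKey = {peer.public_key}",
            f"AllowedIPs = {', '.join(peer.allowed_ips)}",
        ]
        if peer.endpoint:
            lines.append(f"Endpoint = {peer.endpoint}")
        if peer.persistent_keepalive:
            lines.append(f"PersistentKeepalive = {peer.persistent_keepalive}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholders only: substitute keys generated with `wg genkey` / `wg pubkey`.
    print(render_wg_quick(
        private_key="<on-prem-private-key>",
        address="10.10.10.1/24",
        listen_port=51820,
        peers=[Peer(
            public_key="<cloud-public-key>",
            allowed_ips=["10.10.10.2/32"],
            endpoint="203.0.113.20:51820",
            persistent_keepalive=25,
        )],
    ))
```

`AllowedIPs` works as both a routing table and an access list: only traffic for those addresses is sent to, and accepted from, that peer. `PersistentKeepalive` is only needed when a side sits behind NAT or a stateful firewall.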

Monitor Everything

Cloud-based environments rely on a complicated set of resources, and identifying the availability and performance issues that most affect business services is challenging. The operations team needs to be able to monitor application health holistically, including the accompanying cloud infrastructure components.

Performance improvement on a hybrid cloud can't happen without broad visibility into all resources at all times. Data such as instance and network utilization, application performance, user experience, latency, and log files should be collected and sampled so that we can proactively troubleshoot performance and availability problems before they reach end users or get worse. In a poorly provisioned environment, resources are inevitably misallocated, which eventually leads to poor capacity planning and wasted money and resources.

Most public cloud providers offer in-depth monitoring services covering multiple layers and components of the subscribed cloud services. However, what is commonly missing is unified monitoring across different cloud platforms, providers, and environments. Open-source all-in-one monitoring tools like Icinga, Nagios Core, and Zabbix can be configured to monitor almost everything involved in a hybrid cloud, including cloud instances, networks, services, and applications.
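Nagios Core and Icinga can both run custom check scripts, as long as they follow the Nagios plugin conventions: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output carries a status message, optionally followed by `|` and performance data. The Python sketch below applies those conventions to disk usage on an instance; the 80% and 90% thresholds and the `check_disk` name are illustrative assumptions, and the same pattern can wrap any other metric you collect across sites.

```python
import shutil
import sys

# Exit codes defined by the Nagios plugin conventions (also used by Icinga).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a metric to a state, where a higher value is worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, output_line) for disk usage at path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"UNKNOWN - {path} reports zero capacity"
    pct = 100.0 * usage.used / usage.total
    state = classify(pct, warn, crit)
    # Text before "|" is for humans; after it is performance data for graphing:
    # label=value[unit];warn;crit;min;max
    line = (f"{LABELS[state]} - {path} is {pct:.1f}% used"
            f" | used_pct={pct:.1f}%;{warn:g};{crit:g};0;100")
    return state, line


if __name__ == "__main__":
    code, output = check_disk()
    print(output)
    sys.exit(code)
```

Because the state travels in the exit code, the same script can be scheduled by either tool on on-premises hosts and cloud instances alike, which helps keep alerting consistent across environments.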

For performance monitoring of database servers in a hybrid cloud environment, the following resources might help:

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.