Avoiding Cloud Lock-in with a DBaaS Provider

Agus Syafaat

As companies move more of their infrastructure to the cloud, they become increasingly reliant on cloud vendors. From an engineering perspective, IT’s focus is on building solutions for the business and ensuring a fast rollout of applications. Underlying infrastructure such as servers, networking, storage, security mechanisms and databases is consumed rather than built. The lock-in problem arises when the services that you are using are not available on other clouds.

Cloud IaaS has little lock-in, as components like compute, storage and networking can generally be exchanged between cloud providers. Moving beyond basic IaaS, a value-added service like DBaaS is quite different between providers. For instance, AWS, Google, Microsoft Azure, Oracle and IBM all have cloud database services that work differently, and are proprietary in nature. Some of the database services even have specific APIs and data models. 

Take a DBaaS that provides MySQL as a service as an example: the service is not just about getting an instance of MySQL. There is a considerable amount of under-the-hood automation that is invisible to the user, but crucial in ensuring ease of use, performance, availability and scalability.

Data is also difficult to move, as it is stateful and constantly being updated by applications. Businesses cannot pause their services while major upgrades take place, so any interruption, whether planned or unplanned, needs to be kept to a minimum. A five 9’s service level agreement allows a maximum of about five minutes of downtime per year. Four 9’s availability allows about 52 minutes per year, or about 4 minutes and 22 seconds per month. So moving data is not something that an organization looks forward to, as it will most probably mean loss of service for a period of time. The bigger the data size, the harder and riskier it is to move to another place.
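The downtime figures above follow from simple arithmetic. As a minimal sketch, assuming a 365-day year and treating an availability of N nines as an allowed unavailability of 10^-N (the function name is just for illustration):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Maximum yearly downtime, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


for nines in (3, 4, 5):
    per_year = downtime_budget_minutes(nines)
    per_month = per_year / 12
    print(f"{nines} nines: {per_year:.2f} min/year, {per_month:.2f} min/month")
```

A migration window that needs hours of read-only or offline time would consume many years' worth of a four or five 9’s budget, which is why the size of the data set matters so much.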

In this blog, we will discuss how to avoid cloud database lock-in with Database as a Service providers.

Why Should You Avoid Cloud Lock-in?

When assessing which cloud provider suits your requirements, it is important to understand the cloud lock-in aspects so you know what your company is getting into. You don’t want to get stuck with a solution that no longer fits your requirements or your budget. It is hard to predict the future. Businesses change, their requirements change, and the world around them is constantly changing too. So what seems like a good decision today, based on the information at hand, may no longer be one after a couple of years.

For instance, all the major cloud providers offer DBaaS, and all of these vendors are based in the USA:

  • Amazon Web Services, with RDS & Aurora, DynamoDB, and ElastiCache.
  • Google Cloud Platform, with Cloud SQL, Spanner, and Bigtable.
  • Microsoft Azure, with Azure SQL Database and Azure Database for MySQL, MariaDB, and PostgreSQL.

These services are hugely popular, not just in the USA, but in Europe and other continents. But what happens to EU-based companies when something like the EU-US Privacy Shield is invalidated, especially when data protection authorities start putting pressure on companies to store their data in a particular location?

So the idea of avoiding cloud lock-in is to give yourself some options, or at least a plan B in case plan A doesn't work out.

Alignment with Your Requirements

Choosing the right DBaaS platform for your datastore is not easy, and it is certainly not a decision that should be taken lightly, because it is usually easy to get in but not out. Even if there are tools to help you get out, the pressures of the business might not allow time for a migration. So expect that you’ll be in there for a while, and make sure that whatever you end up choosing will stand the test of time.

Therefore, you need to understand your data requirements, such as data model, access APIs, geolocation, encryption, and regulatory compliance. Here are some questions to answer before you decide on a DBaaS provider:

  • Does the cloud provider support your data model, or the specific database you are after?
  • Are the access APIs proprietary, or are they ‘industry standard’?
  • Is the under-the-hood DBaaS automation proprietary, or available in other clouds or on-premises?
  • Do they follow data protection standards and regulations? Are these already aligned with your needs?
  • Is the cloud provider available in your country? Do you need your data stored within the country and aligned with local regulations?
  • What availability and service level agreements do they offer?
  • Support - who do you call if the database is inaccessible?
  • What are the costs?

DBaaS Based on Open Source Databases

MySQL, PostgreSQL and MongoDB are the most popular open source databases. Using a DBaaS that is based on vanilla MySQL or PostgreSQL means that the database clients are the same as when accessing your self-managed instance. MongoDB provides its own Atlas service, while Amazon, for instance, provides a MongoDB-compatible DBaaS (Amazon DocumentDB). Being able to run applications without any changes against the DBaaS or your own instance is a big plus.

However, from an operational point of view, there will be differences. In a DBaaS, instantiating a database instance might be as simple as pressing a button. In the self-managed scenario, it is up to you to set it up with the right configuration and security. High availability and failover might just be a checkbox option in a DBaaS, whereas in a self-managed setup you need to build this yourself and manage failover and recovery. Other things include monitoring, backup management, security, scaling, and so on. Having all that management outsourced to the DBaaS vendor is convenient, and once you get used to it, it is hard to move. It should not come as a surprise that DBaaS prices have increased significantly over the past years.

There are other differentiated DBaaS services that are entirely proprietary, e.g. Amazon DynamoDB and Redshift, Google Spanner and Azure Cosmos DB. These are not databases that you could run on your own infrastructure.

Database Migration: Both In and Out

DBaaS vendors usually provide tools to migrate data into their cloud. For instance, Amazon provides AWS Database Migration Service to migrate data from your on-prem database into e.g. RDS or Aurora. It is also low cost - migrating a terabyte-size database can be done for as little as $3. Amazon does not charge for inbound traffic.

Getting your data out of the DBaaS is also an important factor, for instance if you want to pump the data into a data warehouse for analytics. Here, we might find that cloud vendors are not as interested in helping you get out. For instance, you can schedule backups from AWS RDS, but you are not able to download the backup files. To get data out of RDS, you would have to use something like mysqldump (or why not mydumper & myloader). Based on the current list price of $0.15/GB for outbound traffic, moving a terabyte would cost about $150. The outbound cost might not be as much of an issue as the time it would take to run the logical backup and restore it; we’re looking at a process that would probably take days. But hopefully, you won’t have to migrate a terabyte of data.
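To get a feel for these numbers before committing to a migration, a rough estimate can be scripted. This is only a sketch: the $0.15/GB price comes from the paragraph above, while the throughput value and the function names are assumptions for illustration and should be replaced with figures measured on your own systems.

```python
def egress_cost_usd(size_gb: float, price_per_gb: float = 0.15) -> float:
    """Outbound traffic cost for moving `size_gb` gigabytes out of the cloud."""
    return size_gb * price_per_gb


def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move `size_gb` at a sustained throughput (decimal units)."""
    seconds = size_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


size_gb = 1000   # one terabyte
assumed_mb_s = 10  # assumed effective dump/restore throughput, not a measured value

print(f"Egress cost: ${egress_cost_usd(size_gb):.2f}")
print(f"One pass at {assumed_mb_s} MB/s: {transfer_hours(size_gb, assumed_mb_s):.1f} hours")
```

Note that a logical backup has to be both dumped and restored, so the elapsed time is at least the sum of both passes, plus any index rebuilds on the target.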

Using a DBaaS that builds upon a proprietary database and format means that the data will need to go through a transformation process. A migration would require reading the data out of the DBaaS, writing out to text files, reformatting the data so it can be loaded into the target database - a significant effort, in other words.
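The transformation step can be pictured with a small sketch. Everything here is hypothetical: the exported documents, the `users` table and its columns are made up for illustration, and an in-memory SQLite database stands in for whatever target database you would actually load into.

```python
import json
import sqlite3

# Hypothetical export: one JSON document per line, as read out of a document store.
EXPORTED_LINES = [
    '{"id": "u1", "name": "Ana", "address": {"city": "Lisbon"}}',
    '{"id": "u2", "name": "Budi", "address": {"city": "Jakarta"}, "tags": ["admin"]}',
]


def flatten(doc: dict) -> tuple:
    """Map one nested document onto the flat column layout of the target table."""
    return (
        doc["id"],
        doc.get("name"),
        doc.get("address", {}).get("city"),
        json.dumps(doc.get("tags", [])),  # kept as JSON text; no relational mapping chosen
    )


def load(lines, conn) -> int:
    """Create the target table if needed and insert one row per exported document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id TEXT PRIMARY KEY, name TEXT, city TEXT, tags TEXT)"
    )
    rows = [flatten(json.loads(line)) for line in lines]
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the real target database
loaded = load(EXPORTED_LINES, conn)
print(loaded, conn.execute("SELECT id, city FROM users ORDER BY id").fetchall())
```

Even in this toy case, decisions such as how to represent nested fields and lists have to be made per field, which is where most of the migration effort tends to go.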
