
3 principles of Sovereign DBaaS and how ClusterControl supports them

Krzysztof Ksiazek



In today’s era, where decision making has to be rapid and data-driven, the data that your business aggregates is increasingly critical. And as more and more regulation is passed, data sovereignty is now a significant area to address.

The concept of data sovereignty is important to incorporate into any organization’s data architecture design. Concerns over data privacy, regulatory requirements, and the ever-increasing threat of cybercrime mean you need to properly safeguard your data.

However, development speed is also considered essential to gain, or maintain, a competitive advantage. This, along with easy access to compute resources, pushes many businesses towards managed services like DBaaS (Database as a Service).

This type of offering, from the hyperscalers and others, allows you to build out data pipelines quickly. But that speed benefit can come at the cost of losing control over your data.

How to recover that control? One option is to consider a Sovereign DBaaS implementation.

Let’s explore the three core principles of Sovereign DBaaS. In doing so, we will also discuss the major differences between a traditional DBaaS and a Sovereign DBaaS approach.

3 principles of Sovereign DBaaS

The concept of a Sovereign DBaaS rests upon three core principles:

  • End-user independence
  • Environment agnosticism
  • Open-source and source-available software

Ultimately, the goal of implementing a Sovereign DBaaS is to regain more control over your data stack. Each principle supports this goal in its own way.

1. End-user independence

A Sovereign DBaaS user wants to be independent when it comes to the visibility and control of the database layer.

In traditional DBaaS models, organizations often have limited visibility and control over the environment. You are presented with a self-service portal where you can perform only the actions that the DBaaS provider has chosen to expose.

For example, you have specific options around deployment, configuration, and some particular changes in the database topology.

But you don’t know what kind of environment is going to be used, how the hardware is configured, or where exactly the database nodes are located.

You have to take some things for granted. And this severely limits your visibility.

For example, how are you going to perform a root cause analysis if you do not have access to the underlying infrastructure?

How will you diagnose performance issues if you don’t know how the virtual machines and their hypervisors are acting?

You can check what kind of network traffic you generate. However, you have no means to see and verify how this traffic is affected by tens of other services that do not belong to you but still reside in that segment of the network.

Sure, the network traffic is encrypted and separated using VLANs or other means. But it still goes through the same network cables and network infrastructure, and as such, it may interfere with the traffic that you generated.

Building a Sovereign DBaaS allows you to take full control over all of the aspects of the infrastructure. You get to decide how you want to connect servers, whether you use a dedicated or shared network, and if you’ll have direct connections between servers.

With more control over the infrastructure, you receive more insight into what is happening under the hood. You can monitor every aspect of the hardware and software that you use – debugging gets easier as you have more control over the environment.

You will be the one who is going to deploy and manage all hardware and software (ideally, using some software to help you with that, but it is still your responsibility).

This gives you full control over the location of your data. You know precisely where the data is located – not only where the server is located but even on which disk array in your SAN (Storage Area Network) the database directory is stored. There is no way data can slip through the gaps and end up in a different location or country.

2. Environment agnosticism

You may not realize it at first, but with a traditional DBaaS provider you can eventually find yourself locked into a particular environment or ecosystem.

For example, let’s look at AWS. Let’s say you are using AWS Aurora and you are perfectly happy with the service. But new legal restrictions are introduced that dictate you must keep control over your data.

Can you use Aurora in your local data center? Well, no. You cannot.

You could explore AWS Outposts. But firstly, it does not support Aurora, which is too tightly integrated with AWS’ own infrastructure. And secondly, AWS Outposts is pretty much a black box that is installed in your data center.

To what extent it would comply with the legal restrictions that you are now obligated to respect is a longer discussion.

Also, you have to consider how much control you have when using solutions like AWS Outposts. If you don’t fully understand how your data is transferred and where exactly it’s located, then arguably you don’t have control. And for regulatory purposes, can you ever be fully confident that your data is only located and processed in a local data center?

Make no mistake, this does not mean that AWS is malicious and sends your local data overseas.

It’s just that Outposts can be integrated quite easily with your cloud infrastructure. Then you are just a couple of keystrokes from writing some lines of code that utilize, let’s say, AWS Lambda to process the data.

This is a problem because Lambda does not run on premises (unless you specifically configure it like that) and you may not notice that your data is leaking out to the public AWS regions.

Even if you are happily using Outposts or its equivalent, are you fully in control of your data? Can you move it quickly and easily out of the AWS ecosystem if you wanted to?

The answer is – no. There is no easy way out of the AWS ecosystem.

Can you take a backup and export it to any MySQL installation that you have?

Well, technically you can use mysqldump and other logical backup tools. But for larger data sets (even those at one hundred gigabytes, not to mention larger) this method is so tedious and problematic that, while it can be done, it’s far from easy.
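To make the logical-backup route concrete, here is a minimal sketch of how such an export could be scripted. The helper name `build_mysqldump_command`, the host, the user, and the database name are all hypothetical; the flags themselves are standard mysqldump options. The function only assembles the command so that it can be reviewed before anything is run, and it assumes credentials come from an option file or a prompt rather than the command line.

```python
from typing import List


def build_mysqldump_command(host: str, user: str, databases: List[str],
                            result_file: str) -> List[str]:
    """Assemble (but do not run) a mysqldump command for a logical export."""
    if not databases:
        raise ValueError("at least one database name is required")
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--quick",               # stream rows instead of buffering whole tables
        "--routines",            # include stored procedures and functions
        "--triggers",            # include triggers
        "--events",              # include scheduled events
        f"--result-file={result_file}",
        "--databases",
        *databases,
    ]


if __name__ == "__main__":
    # Hypothetical host, user, and schema names, for illustration only.
    cmd = build_mysqldump_command(
        "db.example.internal", "backup_user", ["shop"], "shop.sql"
    )
    print(" ".join(cmd))
```

The result is one SQL file that has to be replayed statement by statement on the target server, which is a large part of why this route gets painful as the data set grows.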

By comparison, Sovereign DBaaS is environment-agnostic. It doesn’t matter where you want to have your database infrastructure deployed – locally, in the cloud, or a mix of both.

You are not tied to a single cloud service provider (CSP) or any particular ecosystem.

Sure, establishing and maintaining connectivity across multiple CSPs and on-prem infrastructure might not be trivial. But if you manage to accomplish that, then a Sovereign DBaaS implementation supported by the right tools will give you a full management solution. One that provides you with a single pane of glass to see and manage all of your infrastructure.

3. Embrace open-source and source-available software

The software that you choose to run is an important factor to consider, one that could lead to data sovereignty and portability issues down the road.

For example, ask yourself the following questions:

  1. Is MySQL RDS or MySQL Aurora really MySQL?
  2. Is PostgreSQL Aurora really PostgreSQL?
  3. Can you take data in a binary format from Aurora and deploy it on MySQL that you installed from the packages available for your Linux distribution?

In each case, the answer is no.

Amazon states that both MySQL and PostgreSQL Aurora are “wire-compatible” with MySQL and PostgreSQL.

What this really means is that your application can talk to the database using MySQL or PostgreSQL protocol but the underlying database itself may not have anything to do with standard open source MySQL or PostgreSQL deployments.

In fact, nobody can promise 3x performance without having reengineered the database code to take advantage of the AWS infrastructure.

This is a perfectly valid solution and Aurora is a great piece of software. But from an external standpoint, you have no control over the knobs and gears that are turning inside. You don’t even know what knobs and gears are there to begin with.

You just hope for the best and let AWS run your data.

If you’re not happy with those limitations, or if you need more control over your data, it may be time to explore other options available to you.

One of which is to build your own Sovereign DBaaS.

In such a setup, you are free to utilize truly open-source database software. And since these are freely available, this can help to reduce your costs.

As well as being free, these technologies are pretty popular. This means if you need to find people familiar with MySQL or PostgreSQL, it shouldn’t be too difficult or lengthy a process.

The benefits of open-source databases go well beyond cost savings, though.

For example, open source databases are the same no matter where you deploy them. It is the same code that is running. You can easily migrate from a Redis running on one CSP to a Redis running on another cloud, or locally in your datacenter.

The same is true for MySQL, PostgreSQL, MongoDB, and so on.

This gives you flexibility to create multi-cloud environments that span across multiple CSPs. It also lets you swap one environment to another. If you need to move data between different clouds, or from one cloud to an on-prem data center, this is perfectly doable.

Compatibility is important not only for moving your data to another location but also for interacting with it.

In many of the managed DBaaS services offered by CSPs, compatibility becomes a problem because the software they offer is typically modified to fit their needs. PostgreSQL is not PostgreSQL anymore. It is a black box that provides a PostgreSQL API.

The problem is – this API may differ, even if slightly, across providers, making it tricky to utilize the same code to communicate with, theoretically, the same database. A bunch of ‘if X then Y’ is needed to work around those unexpected differences.
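As an illustration of the kind of branching this leads to, here is a small sketch built around one well-known difference: managed PostgreSQL services do not hand out a true superuser, and each exposes its own administrative role instead. The provider keys (`aws_rds`, `gcp_cloudsql`, `self_managed`) and the function name are hypothetical labels chosen for this example; the role names `rds_superuser` and `cloudsqlsuperuser` are the ones those services document.

```python
def admin_grant_statement(provider: str, user: str) -> str:
    """Return the SQL needed to give `user` administrative rights.

    The provider keys are hypothetical labels for this sketch. `user` is
    assumed to be a trusted identifier, not unvalidated input.
    """
    if provider == "aws_rds":
        # Amazon RDS for PostgreSQL exposes the rds_superuser role.
        return f'GRANT rds_superuser TO "{user}";'
    if provider == "gcp_cloudsql":
        # Cloud SQL for PostgreSQL exposes the cloudsqlsuperuser role.
        return f'GRANT cloudsqlsuperuser TO "{user}";'
    if provider == "self_managed":
        # Stock PostgreSQL: the SUPERUSER attribute is available directly.
        return f'ALTER ROLE "{user}" WITH SUPERUSER;'
    raise ValueError(f"unknown provider: {provider}")


if __name__ == "__main__":
    for name in ("aws_rds", "gcp_cloudsql", "self_managed"):
        print(name, "->", admin_grant_statement(name, "dba"))
```

Every such branch is code that has to be written, tested, and kept up to date for each provider you support.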

In a Sovereign DBaaS implementation that uses open-source or source-available database software, you do not have to worry about these slight differences in behavior between CSPs.

How ClusterControl supports a Sovereign DBaaS implementation

ClusterControl is a full lifecycle database ops automation platform for open-source and source-available databases.

The software supports a Sovereign DBaaS implementation by providing you with a single management console for your clusters, no matter where they are deployed.

Let’s take a look at the role ClusterControl plays within a Sovereign DBaaS setup.

Monitoring

ClusterControl provides you with a set of dashboards that are intended to show the most important metrics. But it is you who are behind the steering wheel.

If you want to see more, or configure the dashboards in different ways, all you need to do is follow the available online guides to install external software like Grafana. Once installed, ClusterControl will continue to store the metrics in Prometheus, and you will be able to visualize them in the way you want to see them.

You can take the metrics directly from the database node or you can plug into the time-series datastore that ClusterControl uses – Prometheus, another open source solution, widely used across the industry.
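If you plug into Prometheus yourself, the metrics are available over its standard HTTP API. The sketch below builds an instant-query URL and parses the response format that Prometheus returns for vector results. The base URL, the metric name (`mysql_global_status_threads_connected`, as exposed by the common mysqld_exporter), and the function names are assumptions for illustration; check which exporters and ports your own installation uses.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-query JSON response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value-as-string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


if __name__ == "__main__":
    # Assumed address and metric name; adjust to your environment.
    print(instant_query_url("http://localhost:9090",
                            "mysql_global_status_threads_connected"))
    sample = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"instance": "db1:9104"}, "value": [1700000000.0, "12"]},
        ]},
    })
    print(parse_instant_vector(sample))
```

Fetching the URL with any HTTP client and passing the body to `parse_instant_vector` gives you plain Python values that can feed your own dashboards or alerting.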

If something fails, and we all know that eventually something will fail, you have full control over the whole environment.

It is your data center, your network gear, your servers, your hardware in those servers, and your Storage Area Network. It is also your responsibility to set up proper monitoring for that environment. After all, with great power comes great responsibility. There’s no one else who will be taking actions on any faults in your setup.

The best part is, though, if you did the preparations properly and you collect all metrics, you can perform the Root Cause Analysis down to the port in the switch or a NIC that might be having a bad day.

ClusterControl will assist you with the database part, collecting database logs and alerting on detected anomalies. But if that’s not enough, you can dig down layer after layer and see exactly why that network timed out, which resulted in a loss of connectivity between database nodes and a degradation of the database cluster.

Environment and vendor lock-in

Whether we are talking about environment lock-in or DBaaS vendor lock-in, the risk is significantly reduced with a Sovereign DBaaS implementation supported by ClusterControl.

When you choose to go with a traditional DBaaS provider, you’re limited to the database flavors that are available from their inventory.

A Sovereign DBaaS implementation with ClusterControl provides you with a wider range of open-source and source-available database options.

You can deploy, monitor, and manage different flavors for specific use cases. And you’re free to run your clusters in whichever environment makes the most sense for your needs – on-premises, in the cloud, or in a hybrid setup.

This also grants you two key elements that are not present in traditional DBaaS:

1. Access to deeper levels of the database and the infrastructure layer, so that you can make changes that align your systems with your use case.

2. Portability via open-source databases and tooling that can be implemented anywhere, so that you can easily migrate to another environment if it’s ever required.

Let’s take a look at an example.

ClusterControl can be used to plan and execute a backup schedule, including features like backup verification. While doing so, ClusterControl relies on industry-standard tools such as pgBackRest, XtraBackup, Mariabackup, or Percona Backup for MongoDB.

This means you can take every backup that you have created with ClusterControl and restore it manually on a database node. The node can be running in any kind of environment, as long as the database is the same version that you are managing using ClusterControl.

Using PostgreSQL 14? Take a backup and use it to provision data on any PostgreSQL node you may have. On prem or in any cloud that provides you with compute resources.

Are you running MariaDB 10.6? Take the backup and transfer that data to any other MariaDB 10.6 node where you can restore it just like that. There is no need for those database nodes to be managed by ClusterControl. You have the freedom to install them from scratch, by hand or using Ansible, Chef, Puppet, or a bash script that you wrote.
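The "same version" condition is easy to check before you start copying files around. The following sketch compares the release series of the source and target nodes. It assumes that the series is the first number for PostgreSQL (version 10 and later) and the first two numbers for MariaDB and MySQL; the function names and the table of engines are ours, and the backup tool you use may impose stricter rules, so treat this as a pre-flight sanity check only.

```python
from typing import Tuple

# How many leading version components identify a release series.
# Assumption for this sketch: PostgreSQL 10+ uses one, MariaDB/MySQL use two.
SERIES_COMPONENTS = {"postgresql": 1, "mariadb": 2, "mysql": 2}


def release_series(engine: str, version: str) -> Tuple[int, ...]:
    """Extract the release series, e.g. ('mariadb', '10.6.12') -> (10, 6)."""
    count = SERIES_COMPONENTS[engine]
    parts = version.split(".")
    if len(parts) < count:
        raise ValueError(f"version {version!r} is too short for {engine}")
    return tuple(int(p) for p in parts[:count])


def same_series(engine: str, source_version: str, target_version: str) -> bool:
    """True when both nodes run the same release series of the engine."""
    return release_series(engine, source_version) == release_series(
        engine, target_version
    )


if __name__ == "__main__":
    print(same_series("postgresql", "14.9", "14.2"))
    print(same_series("mariadb", "10.6.12", "10.11.2"))
```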

Environment lock-in becomes less of an issue with ClusterControl because, as a matter of fact, ClusterControl does not care where your resources are located. As long as it can connect to a database instance using SSH connectivity, it will happily manage that node.

Cluster nodes can be located wherever you choose. As long as there is network connectivity and an SSH connection can be made, that’s all that’s needed.
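A quick way to confirm the network half of that requirement is to test whether the SSH port on a node accepts TCP connections. The sketch below does only that; it does not authenticate, so a node that passes may still need keys and user access set up. The function name and the example hostname are hypothetical, and port 22 is simply the SSH default.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks reachability only; it does not attempt an SSH login.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical node name, for illustration only.
    node = "db-node-1.example.internal"
    state = "reachable" if ssh_port_reachable(node) else "not reachable"
    print(f"{node}: SSH port {state}")
```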

Wrapping up

Traditional DBaaS services provide a lot of value, but you may find that they start to cause issues around the access and portability of your data. Ultimately, this leads to a lack of control.

An alternative approach is to build your own Sovereign DBaaS solution, supported by ClusterControl. This is cloud-agnostic and fully controlled by you – from the infrastructure to the database access management. It makes it easy to deploy, manage and operate a whole set of data stores, both open source and proprietary.

Your data can stay wherever you want it to, utilizing single or multiple cloud providers (or even on-prem). No vendor lock-in will limit your options and you are free to use any environment.

