Things to Consider When Building an Internal or Private DBaaS

Krzysztof Ksiazek

In one of the previous blogs we discussed a possible DBaaS environment built with ClusterControl as the deployment and management platform for the databases. Today we would like to take a closer look at that setup and discuss some of the important bits you need to consider when designing and building your own DBaaS.

Pieces of the DBaaS Puzzle

First, let’s clarify which elements we have to incorporate into the solution we plan to build.

We definitely need a pool of nodes to work with. This can be a local cloud, an external cloud provider or a mix of both. We will use those resources to spin up virtual machines which will then be managed by ClusterControl (or a hand-made solution if you want to do things on your own). Depending on which resources you plan to utilize, you may come up with different approaches to spinning up new VMs and managing them - stop, start, destroy.

Then we have the management platform; for the purpose of this blog, let’s assume that it will be ClusterControl. Such a platform will take the virtual machines created earlier and provision them with the software and, if needed, data. It can also be used for the life cycle management of the databases. ClusterControl provides tools to perform management operations, delivers detailed monitoring and alerting, and much more.

Finally, we want to have a platform that users will use to interact with our DBaaS. Most likely it would take the form of a UI - a graphical interface that allows users to spin up new database clusters, or maybe even provision whole stacks - both the application and the database backend.

DBaaS Connectivity

For starters, we have to think about the scope. Are we going to use an internal datacenter or are we going to use some of the cloud providers? Do we want to mix on-prem and cloud resources? These questions have to be answered in order to plan what the connectivity will look like.

The baseline is quite obvious - all your resources should be connected in one way or another. The simplest scenario would be to use just a single datacenter, with a couple of servers acting as hypervisors, and run your cloud on them. Virtual machines will be started and stopped, and connectivity will be provided by virtual networks.

This is the simplest scenario where everything is running locally and the access is handled on the virtualization level.

Another level of complexity would be mixed setups - an on-prem DC and a cloud, or even multiple clouds. Here we will have to take care of the connectivity between all of the elements in the setup. There are multiple ways you can approach it. For starters, cloud providers tend to offer some way of integrating with on-prem setups. For example, both Amazon Web Services and Google Cloud Platform come with VPN services that can be used to integrate your cloud infrastructure with whatever you have on-prem. As an alternative, if your cloud provider doesn’t have such a solution or it is not suitable for you, you can always fall back on software solutions like WireGuard or OpenVPN, which allow you to create secure connections between all pieces of the infrastructure, no matter where they are located. One way or the other, you have to make sure that secure connectivity is in place and that you can manage the whole infrastructure from a single place.

Database Provisioning

You will have to make sure that you have an automated way of provisioning new nodes into your infrastructure. ClusterControl will take care of the software part, but it is up to you to make sure that the nodes are up before the software is installed.

There are many ways you can do that, some more solution- or hardware-agnostic than others. Most cloud providers come with some sort of CLI and automation tools that allow you to spin up instances, load balancers and other pieces of the infrastructure. In most cases you can also manage the infrastructure through additional tools that work as an abstraction layer over the tools provided by the cloud vendors - Terraform could be an example here. Combine this with infrastructure orchestration tools like Ansible and you will be able to quickly deploy even complex setups with just a single command. Then add ClusterControl on top of that and you have a process where you can easily deploy complex database clusters and, what’s great, you don’t have to own the whole deployment process. You only have to make sure that you can spin up new nodes, and ClusterControl will take care of the rest.
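As a minimal sketch of the "nodes are up before the software is installed" check, the Python snippet below polls each new VM until its SSH port accepts TCP connections. The function names, the choice of port 22 and the timeout values are assumptions made for illustration; they are not part of ClusterControl or of any cloud provider's tooling.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: a reachable SSH port is used here as a rough
    signal that a freshly created VM is ready to be provisioned.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt waits at most `interval` seconds for the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            # Sleep until the next attempt, but never past the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))


def all_nodes_ready(hosts, port=22, timeout=300.0, interval=5.0):
    """Check the hosts one by one; stop at the first one that never comes up."""
    return all(wait_for_port(host, port, timeout, interval) for host in hosts)
```

A provisioning script could call `all_nodes_ready(["10.0.0.151", "10.0.0.152"])` after the VMs have been created and start the database deployment only if it returns `True`. An open SSH port does not guarantee that the operating system has finished booting, so a real setup may want a stricter check.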

There are several ways to interact with ClusterControl. The graphical user interface does not really work well for automation; the two other methods are more useful in that regard. The first is the command line interface - it is possible to execute jobs, including cluster or node provisioning, via commands executed in the shell. Some examples of this would be:

  1. Create MariaDB cluster 
    s9s cluster --create --cluster-type=mysqlreplication --nodes="10.0.0.151?master;10.0.0.152?slave" --vendor=mariadb --cluster-name=MariaDB_replication --provider-version=10.3 --os-user=root --os-key-file=/root/.ssh/id_rsa
  2. Scale up cluster by adding node 
    s9s cluster --add-node --cluster-id=2 --nodes=10.0.0.155
  3. Add loadbalancer to the cluster 
    s9s cluster --add-node --cluster-id=2 --nodes="maxscale://10.0.0.151"

This is something that you can easily integrate into a script or a playbook that can be used to automatically provision database clusters.
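As an illustration of such a script, here is a minimal Python sketch that assembles the cluster creation command shown above and either prints it or runs it. The function names and the `dry_run` switch are hypothetical; only the `s9s` flags that appear in the example above are used, and actually executing the command assumes a host where the `s9s` client is installed and configured.

```python
import shlex
import subprocess


def build_create_cluster_args(nodes, cluster_name, vendor="mariadb",
                              version="10.3", os_user="root",
                              key_file="/root/.ssh/id_rsa"):
    """Build the argument list for the `s9s cluster --create` example.

    `nodes` is a list of (address, role) pairs, e.g. ("10.0.0.151", "master").
    The default values are copied from the example command.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    # s9s expects "address?role" entries separated by semicolons.
    node_spec = ";".join(f"{address}?{role}" for address, role in nodes)
    return [
        "s9s", "cluster", "--create",
        "--cluster-type=mysqlreplication",
        f"--nodes={node_spec}",
        f"--vendor={vendor}",
        f"--cluster-name={cluster_name}",
        f"--provider-version={version}",
        f"--os-user={os_user}",
        f"--os-key-file={key_file}",
    ]


def create_cluster(nodes, cluster_name, dry_run=True):
    """Print the command (dry run) or execute it via the s9s client."""
    args = build_create_cluster_args(nodes, cluster_name)
    if dry_run:
        print(shlex.join(args))
    else:
        # Raises CalledProcessError if s9s exits with a non-zero status.
        subprocess.run(args, check=True)
    return args
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with the `?` and `;` characters in the node specification.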

The second method is to use the RPC, which also allows you to define jobs that will be executed by ClusterControl. The example below is a job that deploys an HAProxy load balancer.

    {
      "operation": "createJobInstance",
      "job": {
        "command": "haproxy",
        "group_id": 1,
        "group_name": "admins",
        "job_data": {
          "action": "setupHaProxy",
          "admin_password": "admin",
          "admin_port": 9600,
          "admin_user": "admin",
          "backend_name": "haproxy_10.0.0.143_5433",
          "build_from_source": false,
          "clusterId": "2",
          "disable_firewall": true,
          "disable_selinux": true,
          "haproxy_address": "10.0.0.143",
          "lb_rw_splitting": true,
          "listen_port": 5433,
          "listen_port_ro": 5434,
          "max_conn_be": 64,
          "max_conn_fe": 8192,
          "node_addresses": "",
          "overwrite_mysqlchk": true,
          "policy": "leastconn",
          "stats_socket": "/var/run/haproxy.socket",
          "timeout_client": 10800,
          "timeout_server": 10800,
          "xinetd_allow_from": "0.0.0.0/0"
        },
        "user_id": 1,
        "user_name": "[email protected]"
      }
    }
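A payload like the one above is easier to maintain when it is generated rather than written by hand. The Python sketch below builds it from a handful of parameters, using the values from the example as defaults. The function name is hypothetical, the `backend_name` pattern is only inferred from the example (address plus listen port), and the example does not say which fields the RPC actually requires, so all of them are kept. How the request is sent (endpoint, authentication) is deliberately left out, as it is not covered here.

```python
import json

# Defaults copied from the example payload above.
DEFAULT_JOB_DATA = {
    "action": "setupHaProxy",
    "admin_port": 9600,
    "admin_user": "admin",
    "build_from_source": False,
    "disable_firewall": True,
    "disable_selinux": True,
    "lb_rw_splitting": True,
    "listen_port": 5433,
    "listen_port_ro": 5434,
    "max_conn_be": 64,
    "max_conn_fe": 8192,
    "node_addresses": "",
    "overwrite_mysqlchk": True,
    "policy": "leastconn",
    "stats_socket": "/var/run/haproxy.socket",
    "timeout_client": 10800,
    "timeout_server": 10800,
    "xinetd_allow_from": "0.0.0.0/0",
}


def build_haproxy_job(cluster_id, haproxy_address, admin_password,
                      user_id, user_name, group_id=1, group_name="admins",
                      **overrides):
    """Return the createJobInstance request as a Python dict.

    Keyword `overrides` replace individual job_data defaults,
    e.g. listen_port=3307 or policy="roundrobin".
    """
    job_data = dict(DEFAULT_JOB_DATA)
    job_data.update(overrides)
    job_data.update({
        "admin_password": admin_password,
        # Assumed naming pattern, inferred from the example payload.
        "backend_name": f"haproxy_{haproxy_address}_{job_data['listen_port']}",
        # The example passes the cluster ID as a string.
        "clusterId": str(cluster_id),
        "haproxy_address": haproxy_address,
    })
    return {
        "operation": "createJobInstance",
        "job": {
            "command": "haproxy",
            "group_id": group_id,
            "group_name": group_name,
            "job_data": job_data,
            "user_id": user_id,
            "user_name": user_name,
        },
    }
```

Calling `json.dumps(build_haproxy_job(2, "10.0.0.143", "admin", 1, "some_user"), indent=2)` produces a request body with the same structure as the example, where `"some_user"` is a placeholder for a real ClusterControl user name.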

Self-Service Panel

The final piece of the puzzle - we have connectivity and we have the means to deploy database clusters. We should now hide the complexity of the solution and present users with an easy way to deploy their databases. One way to do it would be to build a UI that will be used to manage the whole solution. The questions to answer will be: what is it that I want my users to be able to accomplish? What would the DBaaS be used for? Maybe the use case would be to spin up some particular applications? In that case, maybe it would be good to let users deploy the whole bundle - both the application, including load balancing for high availability, and the highly available database cluster? Would it make sense to allow users to scale out their deployment? With ClusterControl, scaling the database tier is pretty simple. Some code would have to be written to scale out the web part. As you can see, depending on the use cases, there are a couple of things to think about when building the user-facing part of the application.

Other DBaaS Considerations

On top of what we have already covered, there are a couple of other aspects we’d like to briefly discuss.

DBaaS Cost

While building your own DBaaS will always give you more flexibility, you should double-check the development cost. You will have to use some sort of infrastructure, and it will cost you. Whether it is your own datacenter or an external cloud, that’s an additional cost. Add development and maintenance costs to that and the bill could be a surprise - make sure it is a feasible solution.

DBaaS Maintenance

There will be code that you will have to write and maintain. There will be even more code to write if you plan to create the management part from scratch, on your own. This will tie up your resources. You may consider using a tool like ClusterControl that can be integrated into your environment and that will let you reduce the number of man-hours spent on the maintenance of your DBaaS.

We hope this short blog will be food for thought and will help you approach the topic of building your own DBaaS with a good amount of consideration.
