How Database Automation Lowers The Total Cost of Ownership of Infrastructure Services

Krzysztof Ksiazek

With the pandemic scorching the world for more than a year already, businesses are looking for a reduction in the cost of operating a business. In many cases, IT is one of the most expensive parts of running a business. Infrastructure in particular often comes with a steep price. And, people working in IT are not the cheapest ones out there. What can you do to reduce expenses? In this blog post we will discuss one particular aspect of the whole picture - database automation. How can it impact your business?

Automation, in general, is a good source of cost reduction - if you automate a process, it will typically take less time for your employees to accomplish. It also reduces the chance of something going wrong while performing it. Automated processes can be properly tested and verified to perform exactly in the way they were designed. The database world is no different. Let’s take a look at a couple of elements that you may want to automate.

What can be automated?

This won’t be an exhaustive list but we will cover the most important components. 

Deployment

To start, one of the processes you can automate is the deployment of database clusters. The idea behind automation is to reduce the time spent on a task that is being performed over and over again. Deployment is definitely one  such task. It is not just about the deployment of production databases, but it is also about maintaining them through the whole lifecycle. Adding new nodes to replace old ones, upgrading hardware, automated backup verification, and so much more - all those activities require a new database to be deployed and set up properly. Quality control, testing, dev environments - this is another separate set of tasks that will constantly be performed by your team. It includes refreshing them using data from production, building new, temporary setups to test for upgrades, or to catch some particularly evasive bug. Being able to deploy whole environments by running a set of scripts (or even just one) can be a great help to your operations team. Not to mention that following the infrastructure as a code principle helps with disaster recovery scenarios: if you can deploy every bit of your infra by running well-tested scripts, you can quickly recover from situations in which your existing infrastructure is essentially vaporized.

Scaling

From deployment it’s just a small step forward and we can start discussing scaling. Of course, scaling  is based on deployment as new database nodes are created and provisioned. Generally speaking, scaling is simply a process of adding or removing capacity to your infrastructure (database clusters in particular) based on the level of CPU, memory, I/O, or network capacity utilization. If you lack resources, you can scale up by adding more nodes (horizontal scaling) or by increasing the size of the node, to the extent that it is possible (vertical scaling). You can also scale down, if the resources are underutilized - removing nodes from the cluster or reducing their size. In short, you want this process to be as automated as possible. It is quite unrealistic to expect the database engineering team to sit and watch how the load fluctuates. They are busy doing their jobs. On the other hand, you don’t want to wait till the last moment with bells going off.

The ideal scenario would be to  have a set of software: scripts, daemons, whatever you like, that monitors the state of the database cluster. Based on predefined conditions (the more detailed, the better), an action will be triggered to increase or decrease capacity.

High Availability

High availability follows the same principle we discussed above: you don’t want to wait for your staff to react to the failure of one of your database nodes. You want such failures to be handled automatically: perform the failover or move the traffic to another node, and distribute it across live database servers. Again, you want something to watch your database infrastructure and take actions whenever it is needed to assess the health of the nodes, detect the ones that are not available, and take actions required to restore the functionality of the database tier.

Schema changes

Like it or not, very few databases are designed perfectly from the start. Most likely, you will have to implement some sort of change within the schema. It can be a new feature that requires such change, or it can be a schema change that is going to introduce performance improvements. One way or the other, schema changes are a real thing and you have to deal with them.

The old-fashioned way of doing it was to have your database administrators assess the proposed schema changes, working with the developers to define the change, and finally execute the change on the database infrastructure. 

In the past this was acceptable, but the way software is being developed has changed over time. It’s not a once-per-couple-of-weeks release, but the deployment process is continuous. Such an approach causes database administrators to become a bottleneck - they cannot review and execute all required schema changes fast enough. Thus the need for the automation arises. How to do it is a topic for a separate blog, or even a whitepaper, but the process may involve using tools like gh-ost or pt-online-schema-change that allow for managing the schema change process (pause it, start again and so on). You may want to develop a set of scripts and code that will integrate with the development process and run the schema changes on the fly. On the database side, you want to monitor the load on the system to ensure that you do not overload the databases with schema changes. Every schema change should be tested, so you may want to have separate replicas running the production schema, dedicated to testing the schema change process. All of this served with an automation sauce should let you efficiently run the schema changes at the velocity required by your development team.

Why do I want database automation?

We have discussed some of the areas (definitely not all of them) where the automation of some processes may help significantly. What are the gains? Why spend time developing and maintaining code that runs those processes? Let’s take a quick look.

Reduce the load on staff

Obviously, the fewer the tasks to handle, the less the load and burnout faced by your database engineers. This is especially true if we are talking about the tedious, regular activities that you perform over and over again, on a daily basis. Time that otherwise would have been spent on such activities can be reused. Let your database engineers do what they are great at - optimizing database performance or perfecting the code managing database tasks. They will be happy because it is obviously more exciting to create a new functionality or spend more time tweaking the database ensuring that all is working just fine than creating a fifth index or modifying the tenth column today.

Reduce the chance for human error

To err is human. We’ve all made mistakes and we all will make them in the future. The only person who does not make any mistakes is someone who’s doing nothing. Automation is an ultimate runbook - you execute a code and that’s all; it will start doing exactly what it was programmed to do. Sure, there could be bugs and incompatibilities, but for the most part, automation works reliably. Especially if you put some time into testing the process and verifying it on a regular basis, that all works fine. What is great is that this also lets you reduce the learning curve - it’s easier to learn how to execute a command than learning how to perform all the actions that this execution triggers under the hood. Of course, as we mentioned, you have to invest some time in the maintenance of the automation tools, but ultimately, it is a very good investment.

Reduction in infrastructure expenses

We mentioned previously that scaling is a task that can be automated. Having the ability to automate this process may lead to significant savings. Infrastructure costs are typically one of the highest of your expenses. Managing the size of an infrastructure on the go is quite complex, and it is unlikely that you can react to all of the fluctuations in the load. Sure, if you expect an increase in activity, then obviously you can and you should plan accordingly and size your database tier properly. For the smaller, less predictable changes in the workload, automation may be able to act way faster than your staff can..

In this blog post we have just scratched the surface of the topic of automation, but we hope it is clear even at this stage that automation of the database infrastructure is something desirable. This is a way to reduce the burden on your engineers, free them to work on projects that are intended to bring in money instead of working on maintenance.

Let us know what you think about automating database tier. If you have any insights, feel free to share them in the comments below.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.