Charting the uncharted – the role of DevOps & Platform Engineering in Sovereign DBaaS
Business experts emphasize the importance of cross-departmental collaboration and advise leaders to encourage their teams to leave their silos and work together, as that’s the only way to meet customers at every step of their journey.
But what happens when one team ”threatens” to take over the other team’s job? We’ll try to answer this question in this episode of Sovereign DBaaS Decoded.
Our guest is Josef Pullicino, who leads the core DevOps team at MeDirect. Throughout the conversation, Josef shares his take on whether DevOps will take over the DBA role, the effects of such a shift, the importance of automation in database management, and how DevOps can support Sovereign DBaaS initiatives.
Key Insights
⚡If a DBA hands the responsibilities to DevOps, does database management change? According to our guest, it has already changed and will continue to. But he also adds: ”The dynamics are changing gradually, bit by bit, and the role of the DBA is now taking the shape of an architect because you need to design the infrastructure to make it operational in the correct way and in a performant way. So before going to use automated tools to have your database fully up and running, you need to see the best way you’re going to design it from an architecture point of view of your database. […]
You need to see it more from the logical perspective and connect all the components and all the dots.”
⚡With automation, we can stop worrying about issues caused by human error. Despite the ongoing discussion of whether AI will replace the human workforce, experts say tools that automate specific processes are critical for efficiency as they allow teams to focus more on strategic aspects of the business. Also, these solutions come with additional benefits. ”So you are a hundred percent sure that if you are going to create a user or set a password, you know that it was done right or that something was not missed or forgotten because the human element is not there anymore. In the traditional way of doing things, a human being is human. Everyone would be subject to mistakes here or there. But with automation, you have peace of mind because there is standardization and a pattern of how it is done. So, in addition to the values that you get for the time to delivery from a project point of view, these values, I think, would need to be also taken into consideration as part of the overall investment that you are getting from these tools.”
⚡Plan for the best-case scenario, and prepare for the worst-case scenario. Backups are a key piece of database management. But unfortunately, the horror stories around this process make companies question whether it’s worth their time. Here’s what Josef thinks. ”For a database to run, it’s not only enough to have peace of mind from a data point of view, but you also need to consider the infrastructure. So if a database is hosted on a server, that would mean that you need to have backups and restores of the operating system, storage, and network configuration which is in place, and of the actual platform. However, one has to keep in mind the disaster recovery scenario. It is something that many companies don’t put enough investment into or enough resources and attention to, this concept or philosophy, I would say. But when the sh*t [hits the fan], it would be too late because the time to recover and get back up and running could also result in a company going bankrupt or going into default.”
Episode Highlights
Is DevOps Taking Over the DBA Role?
”I would say partially. One of the main focuses of DevOps is automation. So it’ll facilitate the time to deliver your projects, which means that from a technical point of view, you need to deliver fast and with a structure. It’s not something like, okay, let’s build a database or an infrastructure, switch it on, and that’s it. No, that’s part of DevOps. I would classify the main activities into two categories: the day-to-day operations that would be done by the DBAs or any other role is one.
The other category would be projects, which would be more long-term in terms of effort, in terms of focus, in terms of attention that you would require, coordination between different teams, and making use of standardized tools, which helps you to deliver fast and with a pattern.
However, there is still the responsibility of a DBA, systems administrator, or engineer who would need to master the tool used to make it operational from a database point of view. So like anything you can use in your life, it has advantages and disadvantages. But speaking from a time-to-deliver point of view, yes, it would be much more beneficial than the opposite.”
Automation Frees us From Repetitive Tasks
”I would classify these operations: failing over a database, starting a database, updating a database, monitoring it, or security hardening; these I would categorize as the day-to-day operations. These requirements need to be automated as much as possible.
So the focus, from a DBA point of view, would be more on the project perspective rather than the day-to-day; the day-to-day is something that is repetitive and always constant. That is something that can be done thanks to automation tools.
I can recall, for example, when using these automated tools. One of them is Severalnines’. It takes away the burden or hassle of trying to define the procedure of how you’re going to fail over a database, making sure that the replicas are always in sync, whether they are in synchronous mode or async mode, especially when they are in different data centers.”
How DevOps Supports Sovereign DBaaS Initiatives
”The same concepts that apply when having systems on the cloud, like platform as a service, infrastructure as a service, or software as a service, would still apply when having systems on-premise.
However, with the introduction of DevOps to these systems being on-premise, it’s a case of building up a platform, providing dashboards for internal users so that requirements or use cases that arise from day to day, like the creation of users, the resetting of passwords, maybe extending some storage, for example, table spaces, adding or removing columns, whatever it may be, can be handled there. So the role of DevOps is crucial in this case because it is providing those tools.”
What We Do When We Manage Databases
”I would classify it into different levels: the infrastructure, the platform, and the service. From an infrastructure point of view, you need to manage the number of hosts that are going to host your database.
The architecture that you are going to employ. Whether it is active-active, active-passive, whether you are going to host or use different data centers geographically dispersed, what synchronization are you going to use, synchronous versus asynchronous? […]
Security nowadays is something where it’s not a question of if it is going to happen, but when it is going to happen. […]
It’s very important to keep your database as secure as possible. Not only secure from a platform point of view but even the ecosystem of it. […]
Monitoring is very important because it gives you the right picture of where you are from a system health check status. […]
In addition to monitoring, one has to keep in mind also the alerting because it’s useless to have a good monitoring system for your database or any component if you are not getting alerted when there is an issue. […]
Another area that you need to take into account when managing databases is the day-to-day running of operations. Unless you are using certain tools such as Ansible, where you provide particular dashboards to helpdesk users or even application developers or engineers, it’s important to make sure operations like creating users, creating schemas, and adding columns, whatever you need, are done and verified properly.”
Here’s the full transcript:
Vinay: Hello, and welcome to another episode of Sovereign DBaaS Decoded, brought to you by Severalnines. I’m Vinay Joosery, Co-founder and CEO of Severalnines. Our guest today is Josef Pullicino, who leads the core DevOps team at MeDirect. Thanks for joining us today.
Josef: Thank you, thanks for the invitation and pleased to meet you.
Vinay: Excellent, so can you tell us a little bit about who you are and what you do?
Josef: I’m working with MeDirect Bank at the moment, managing a team of DevOps engineers. It’s one of the leading banks in Malta. However, my experience also includes roles with other leading telecom companies in Malta, based mainly on applications and databases.
That brings me to a whole duration of around 13 years of experience between technical and management roles. From an academic point of view, I graduated from the University of Malta in IT and then from University of Cambridge from a management point of view.
Vinay: Excellent, so Sovereign DBaaS relies on a platform-centric approach, where the database is operationalized for non-expert users in application teams, so that these application teams do not have to create the operating platforms for themselves.
And the advantages, well, consistent governance across distributed teams and really no redundant tools or processes or parallel efforts by multiple teams. So all that sounds good, but the level of competence required for the role is ever increasing.
Ops folks, they need to understand Linux, security, networking, stuff like Ansible, Terraform, cloud CLIs, you know, from AWS, GCP, Azure, or actually whatever cloud they’re using.
So, Josef, are DevOps taking over the DBA role?
Josef: I would say partially, so one of the main focuses of DevOps is having automation, so basically it will facilitate the time to deliver your projects which means that from a technical point of view you need to deliver fast and with a structure.
It’s not something that, okay, let’s build a database, let’s build an infrastructure, and let’s switch it on and that’s it. No, that’s part of DevOps; in addition to the efficiency and productivity that it will entail, it will also bring you a standard method or a defined pattern of how you are building your infrastructure, your setup.
So, I would say that I would classify the main role activities between day-to-day, the usual operations that would be done by the DBAs or any other role, so that’s one category. The other category would be projects, which would be more long-term in terms of effort, in terms of focus, in terms of attention that you would require, and coordination between different teams, and making use of standardized tools, which help you to deliver fast and with a pattern.
I think, yes, partially speaking it takes a little bit of that role, however, there still remains the responsibility of the administrator or an engineer who would need to master that tool that is being used to make it operational from a database point of view.
So, I mean like anything you can use in your life it has always advantages and disadvantages, but speaking from a time to deliver point of view, I think, yes, it would be much more beneficial rather than the opposite.
Vinay: Okay, all right, so the jobs might be shifting, but as you said, you know, the job still needs to be done, whether it’s done by classic DBAs or DevOps, you know, folks who are doing a lot of other things as well. So when it’s transitioning to DevOps, when the responsibility for the databases is actually being handed over to DevOps, how is this changing the way that we manage databases? Is DevOps really changing the way that we manage these databases, especially production databases?
Josef: I would say the dynamics are changing gradually, bit by bit, and the role of a DBA is now taking the shape of an architect, because you need to design the infrastructure, what you need to do in order to make it operational in the correct way and in a performant way.
So before going to use automated tools to have your database fully up and running, you need to see the best way that you are going to design it from an architecture point of view of your database. And the good thing about using such DevOps tools is that you will have a different abstraction layer. You need to see it more from the logical perspective, the way you are going to connect all the components, all the dots, all together, rather than going to the infrastructure level in order to create the storage, in order to create users, in order to create the operating system and assign the correct OS parameters, and so on.
Those are things which nowadays, through the infrastructure as a service or infrastructure as code mentality, are being fully automated; it’s like the prerequisites are being done fully automated.
Nowadays, the role is like, okay, once you have the architecture set up already, the next step would be to go to the automated tool, whatever it is, and basically you need to get it up and running. So, yeah, I think over time the role is gradually changing, and I guess that over time it will continue further to change and it will become more abstract from a technical point of view.
Vinay: Yeah, and actually that’s my own experience, is sometimes talking to actual DBAs, production DBAs, and I mean you know at Severalnines we build a platform to help people automate their databases and one of the things it does is it automates failover. And, actually you would be surprised, but we have some companies, you know, where some DBAs say “That’s my job, to do the failover. I would never trust any tool or platform to do that for me.”
But I mean you know if you’re managing hundreds of databases, I mean, it’s impossible to do manual failovers. We do employ pretty experienced production DBAs, ex-production DBAs. And they say, you know, if you get paged in the middle of the night because of a production problem, you might be sleeping or whatever, I mean, the time to get to the laptop, to log in, to SSH into the enterprise, you know, sort of corporate network and look at what’s going on, the logs, to understand, I mean, you’ve already burned minutes. So, you know, forget five nines, forget four nines. So I guess automation is one of the key factors when DevOps teams are managing databases – everything has to be automated really, well, most of it. And the role of the DBA actually is moving more towards an architect as you mentioned.
Josef: Yes that is correct, if I may I would classify these kinds of operations failing over a database, starting a database, updating a database, monitoring it or security, these I would classify actually into the day-to-day operations.
And these are things, requirements, which need to be automated as much as possible so that the focus from a DBA’s point of view would be more from a project perspective rather than day-to-day. The day-to-day is something which is repetitive, something which is always constant, staying the same, not changing at all. That is something which can be done thanks to the automation tools.
I can recall, for example, when using these automated tools, one of them is Severalnines’, because I tested it out personally myself. Basically, it takes away the burden or hassle of trying to define the procedure of how you’re going to failover a database, making sure that the replicas are always in sync whether they are in synchronous mode or in async mode.
Especially when there are different data centers, geographically speaking. For example, other use cases would be to just update the operating systems, from a Linux point of view for example or from a Windows point of view, depending on the platform you are using.
So these are use cases where you need to do them, I would say on a monthly basis, sometimes even they go as far as weekly depending on the paradigm and the business model that your company would be operating on. So these are something which can be automated, can be done fast and can be resilient, because just in case of any issues we would already have immediate feedback, immediate assistance of what the problem might be. And what might be the solution to it.
So, the time to think of a resolution is something which is significantly decreased, if not eliminated in some instances. So, yes, such tools are definitely worth investing in. I can understand that from a licensing point of view there may be some challenges, there may be some tricky situations in order to get approvals; however, you need to get the big picture out of it, because when you see the return on investment, the value that you are going to get out of those tools as opposed to not having them, I think the weight balance would definitely go to one side.
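For illustration, a minimal sketch of the kind of day-to-day replication check Josef describes might look like the following. It assumes a PostgreSQL primary and the psycopg2 driver; the connection string and the 30-second lag threshold are placeholders, not details from the episode.

```python
# Minimal sketch: check replication lag on a PostgreSQL primary.
# Assumptions: psycopg2 is installed, the monitoring user can read
# pg_stat_replication, and 30 seconds of lag is the alerting threshold.
import psycopg2

LAG_THRESHOLD_SECONDS = 30  # hypothetical threshold

def check_replication_lag(dsn: str) -> list[str]:
    """Return warnings for replicas that are not streaming or are lagging."""
    warnings = []
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT application_name, state,
                   COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) AS lag_seconds
            FROM pg_stat_replication
            """
        )
        for name, state, lag_seconds in cur.fetchall():
            if state != "streaming" or lag_seconds > LAG_THRESHOLD_SECONDS:
                warnings.append(f"replica {name}: state={state}, lag={lag_seconds:.0f}s")
    return warnings

if __name__ == "__main__":
    # Hypothetical connection string for the primary node.
    for w in check_replication_lag("host=primary.example.internal dbname=postgres user=monitor"):
        print("ALERT:", w)
```

In a real setup this check would run on a schedule and feed whatever alerting channel the team already uses, rather than printing to the console.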
Vinay: Yeah, and actually in one of our previous episodes, you know, we talked in more detail about what it actually means to build a database service within your own enterprise.
Let me switch a little bit to the bigger picture. So how does DevOps support Sovereign DBaaS initiatives? Because I guess, you know, when we say database as a service, we’re used to thinking, oh, we’re going to use maybe Amazon RDS or Google Cloud SQL or something. But actually, you know, enterprises are also running their own databases as a service in a way, so how does DevOps, you know, support such an initiative?
Josef: Yes, I think the same concepts when having systems on the cloud, like platform as a service or infrastructure as a service, I think those would still apply when having systems on-premise.
However, with the introduction of DevOps to these systems being on-premise, I think it’s a case of building up a platform, basically providing dashboards for internal users so that requirements or use cases that would arise from day to day, like creation of users, resetting of passwords, maybe extending some storage, for example tablespaces, Oracle-speaking, or, for example, adding columns or removing some foreign keys, whatever it may be, can be served there. The role of DevOps is crucial in this case.
Because it is providing those tools, maybe from an API point of view or maybe through a script point of view, by using well-known tools such as Ansible, such as Terraform, in order to automate those kinds of operations, requests and responses. And that would mean that all these kinds of requests that would be raised would be eliminated, or basically avoided from having to be handled by the actual DBA themselves.
So it is something that you have the functionality for; it’s there, you can use it as much as you want, whenever and wherever, basically at any time of the day or night. Because as we all know, it’s not always during the day; actually, much of the work happens during the night.
Vinay: Yeah, and that’s true. I mean, you know, especially large organizations, maybe distributed teams, time zones—definitely self-service if you can actually empower your, you know, your sort of application developers and get them to actually do things on their own. So you basically set up the infrastructure. I guess in the DevOps world, this is what’s being called platform engineering, right?
Where you can, as you mentioned, through an API, you know, through sort of self-service APIs with plenty of documentation or even, you know, sort of in enterprise portals, you can actually go and, you know, get your own database and actually ask the system, you know, ask the platform for things so that you don’t have to worry about them. But then that platform itself—that’s kind of where DevOps has an important role, to help build that platform and run that platform reliably, so to speak.
Josef: And if I may add something to this, apart—or actually in addition to these operations, these operations from a DevOps point of view come with their own validations and verification checks. So you are actually pretty 100% sure that if you are going to create a user or you’re setting a password, you know that it was done right. Or you know that something was not missed or forgotten because the human element is not there anymore.
And I mean, in that traditional way of doing things, a human being is a human being. Everyone would be subject to mistakes here or there. But with automating things with automation tools, that is something which you have peace of mind because there is standardization, there is a pattern of how it is being done. So yes, in addition to the values that you get for the time to delivery from a project point of view, these kinds of values, I think, would need to be also taken into consideration as part of the overall investment that you are getting from these tools.
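As a concrete illustration of that point, a standardized user-creation routine could look roughly like the sketch below. It assumes PostgreSQL and psycopg2; the naming policy, the generated-password approach, and the connection details are illustrative assumptions, not the tooling discussed in the episode.

```python
# Minimal sketch: create a database user through a standardized, validated
# routine instead of by hand, so nothing is missed or mistyped.
import re
import secrets
import psycopg2
from psycopg2 import sql

USERNAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]{2,30}$")  # hypothetical naming policy

def create_user(dsn: str, username: str) -> str:
    """Create a login role with a generated password; return the password."""
    if not USERNAME_PATTERN.match(username):
        raise ValueError(f"username {username!r} violates the naming policy")
    password = secrets.token_urlsafe(24)  # generated, never chosen by a human
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # sql.Identifier quotes the role name safely; the password is bound as a value.
        cur.execute(
            sql.SQL("CREATE ROLE {} WITH LOGIN PASSWORD %s").format(sql.Identifier(username)),
            (password,),
        )
    return password

if __name__ == "__main__":
    # Hypothetical DSN; in practice the password would go to a secrets store.
    pw = create_user("host=db.example.internal dbname=postgres user=dba", "report_reader")
    print("user created; password stored out of band", len(pw))
```

The point of such a wrapper is exactly the one Josef makes: every request goes through the same validations, so the result is predictable regardless of who triggered it.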
Vinay: Yeah, so removing this human error element out of this.
So, you know, DevOps is about processes, right? We talk about people, processes, and tooling, right? It’s a lot about tooling as well. We often see companies using tools like Chef, Puppet, Ansible, you know, Terraform, to deploy and manage even database structure, right? So what are the challenges with that, especially, you know, when we talk about distributed cloud deployment models? Can you manage a database with the current tooling available?
Josef: I think the, I mean, database technologies exist in different shapes and forms, like any other service you are building. And one of the challenges is to be well on top of the technology from a knowledge point of view, from an experience point of view. So that is something which I think it’s an immediate risk that you have to consider.
You need to assess your team well in order to see what the gaps are for the technology. And once that is done, it is something that you need to see which tool you are going to use because there might be some tools which might be supported, for example, or compatible with Oracle technologies or Microsoft SQL Server only.
However, the truth of it nowadays—we all know it—that a company would be diversifying its database technologies and would not be reliant on a single database technology only. Different reasons exist for that, mostly from a licensing point of view, from technology point of view, and from a knowledge and experience point of view.
The facility or the easiness of finding the appropriate people with the required expertise, that is something which sometimes companies might consider, but sometimes not. However, when you’re replacing people—because people do change and do move—you need to take that into consideration.
In my opinion, it would add more value to have your business rely on standard or commonly popular database technologies such as Postgres, MongoDB, MySQL, MariaDB, SQL Server, or Oracle. Why not, if there is the budget approved for that? That would make it easier to find the required skills from the external pool. So yes, specifying or actually trying to find the right tool in order to manage that database on the cloud—that is something which is challenging. However, if there is enough thought to it, I think it would be much more beneficial in the end.
Vinay: Yeah, I mean, the thing is, you know, what you see is you have things like, you know, Ansible. You go on the internet, you find so many recipes out there to actually deploy databases in different topologies and everything, right?
And then, obviously, deployment—you do it once, and then there’s a lot of other things that go with it. But you can still use Ansible to actually even help with, you know, the day-to-day stuff, as you mentioned. So one piece of technology which is pretty hot, you know, and it’s being heavily used now by a lot of enterprises, is Kubernetes, right?
So why not use Kubernetes to manage your databases and just let it handle the databases?
Josef: Yes, so Kubernetes, or actually the concept of containerization at the moment, is being used more for stateless applications, which means that, basically and in general terms, the applications don’t keep important data in traditional files the way databases do.
However, there are use cases where you can deploy databases over Kubernetes, such as Postgres and even SQL Server nowadays. However, it’s always a challenging task to make sure that it’s running fine, that files are being persisted appropriately, that storage is being consumed appropriately, monitoring, security point of view as well, and do take—or actually must be taken into account as well.
So I think at the moment there is not enough confidence yet in deploying these database technologies over Kubernetes when the concept of it is for applications to be segregated into different bits and pieces, and it’s very different from this traditional way of deploying applications. I mean, an application consists of the actual core executables that are important, very much useful to make it run.
However, it has other components like the configuration files, like the credentials being used, like the logs. So these components are all exploded and segregated into different areas over Kubernetes so that they can be managed separately from the application.
Many advantages would be taken into consideration when having such circumstances. First and foremost, it would be easier to update an application because if nothing else is being updated or being changed, then it’s only a matter of deploying or updating the application—the new release of it through a pipeline.
So everything will be again automated from pulling of the new release from the source code repository such as Git, all the way down the infrastructure platform being used to host that application—in this case, it’s Kubernetes.
So that is one of the advantages. Another advantage would be that restarting applications, resiliency, monitoring, failing over—these are concepts which are embedded automatically when deploying Kubernetes clusters or OpenShift or other containerization technologies.
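For readers who want to see what the kind of confidence check Josef’s caution implies might look like, here is a minimal sketch using the official kubernetes Python client. The StatefulSet name, the namespace, and the idea of treating unbound volume claims as a failure are assumptions for illustration only.

```python
# Minimal sketch: verify that a database StatefulSet has all replicas ready
# and that its persistent volume claims are actually bound.
from kubernetes import client, config

def database_statefulset_ok(name: str = "postgres", namespace: str = "databases") -> bool:
    """Return True only if all replicas are ready and every volume claim is bound."""
    config.load_kube_config()  # use config.load_incluster_config() when run inside a pod
    apps = client.AppsV1Api()
    core = client.CoreV1Api()

    sts = apps.read_namespaced_stateful_set(name, namespace)
    desired = sts.spec.replicas or 0
    ready = sts.status.ready_replicas or 0
    if ready < desired:
        print(f"{name}: only {ready}/{desired} replicas ready")
        return False

    # A stateful database lives or dies by its persistent volumes.
    pvcs = core.list_namespaced_persistent_volume_claim(namespace)
    unbound = [p.metadata.name for p in pvcs.items if p.status.phase != "Bound"]
    if unbound:
        print(f"unbound persistent volume claims: {unbound}")
        return False
    return True

if __name__ == "__main__":
    print("database looks healthy" if database_statefulset_ok() else "database needs attention")
```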
Vinay: Yeah, so I guess there is a bit of work to do there before we can put databases into these Kubernetes clusters. But yeah, obviously, pretty much all the database vendors out there have operators to help people run their databases.
But if you look at what it really does today, I mean, the operators, many of them, they do a couple of things. They help you deploy, they help you maybe maintain, like, you know, the number of replicas in your database cluster. It might help you with some backups, maybe some upgrades, right? But managing a database actually is much more than that.
So let’s talk a bit about that. When we talk about management of databases, what exactly are we talking about? What tasks, what activities? What is it that we do when we manage a database?
Josef: I would classify it into different levels, mainly the infrastructure, the platform itself, and the service in itself.
So from an infrastructure point of view, you need to manage the number of hosts that are going to host your database, the architecture that you are going to employ, whether it is active-active, active-passive, whether you are going to host or using actually different data centers geographically dispersed, what kind of synchronization you are going to use—some synchronous versus asynchronous—and the advantages and disadvantages that come along with each option.
What else? Failing over from one side to another, for example, when doing OS updates. Is it a matter of updating both or actually all the nodes in one go? Or is it something where you would go more gradually, updating the secondary nodes first, and then the master node afterward?
Security, from that point of view, is nowadays becoming, or actually shifting to be, a concern within the top management of a company, because security nowadays is something where it’s not if it’s going to happen but when it’s going to happen. When are you going to have, for example, a security breach, either from a data point of view or from the application point of view or from a network layer point of view?
So it’s very important to keep your database as secure as possible, but not only secure from a platform point of view, but I think even the ecosystem of it. So the perimeter systems such as the network, such as the applications which are directly hitting the database, or scripting which is directly hitting the database in order to get or deposit data. So all these peripheral systems need to be as secure as possible, just the same.
Monitoring is a very important concept to employ because it basically gives you the right picture of where you are from a system health check status. Basically, you need to see each individual component—how it is, whether it is up and running, whether it is under maintenance, and to plan maintenance mode, whether there is an issue in whatever component it is.
It might be storage point of view, might be OS point of view. However, when managing different clusters of databases, you don’t need to have too much detail in your eye because then it’s a matter of not giving enough attention to that monitoring dashboard that you are having.
It’s something which you need to see, okay, what is the health check status of this database? Is it up and running? Yes? No? Is it in a healthy state, performance state? Yes? No? Levels—whether it is low, medium, high performance.
So yes, that is something which needs to be taken into consideration. In addition to monitoring, one has to keep in mind also the alerting because it’s useless to have a good monitoring system for your database or for any component, basically, but you are not getting alerted when there is an issue.
And again, issues do happen. It’s not a question of if but when. And alerting has to go to the right channels, to the right people, at the right time. So getting alerted with multiple notifications for nothing is useless, but receiving the right amount of messages at the right place at the right time is very important because that would mean that you have peace of mind, you have trust, and you have confidence in that monitoring and alerting system that it is doing its job properly and very well.
So yes, that’s another area that you need to take into consideration when managing databases. I would say the day-to-day running of operations, I mean, unless you are using certain tools such as Ansible, where you are providing particular dashboards to help desk users or to even application developers or engineers, I think it’s very important to make sure that those operations like creation of users, creation of schemas, columns, rows—whatever—you need to take that into consideration as well. And in the absence of automation, you need to have a way of how you can verify that those changes have been applied well.
So yes, I would say those are some of the major considerations that you need to take into your mind.
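One small, concrete example of the ”right amount of messages at the right place” idea is alert deduplication. The sketch below is a minimal illustration in Python; the suppression window and the print-based notification channel are placeholders for whatever paging or chat tool a team actually uses.

```python
# Minimal sketch of alert deduplication: forward an alert only if the same
# (component, condition) pair has not fired within a suppression window.
import time

SUPPRESSION_WINDOW_SECONDS = 600  # hypothetical: at most one page per 10 minutes
_last_sent: dict[tuple[str, str], float] = {}

def notify(component: str, condition: str, message: str) -> bool:
    """Send the alert unless an identical one was sent recently."""
    key = (component, condition)
    now = time.time()
    if now - _last_sent.get(key, 0.0) < SUPPRESSION_WINDOW_SECONDS:
        return False  # suppressed: the same alert already paged someone recently
    _last_sent[key] = now
    print(f"[ALERT] {component}/{condition}: {message}")  # stand-in for a real channel
    return True

if __name__ == "__main__":
    notify("orders-db", "replication_lag", "replica lagging by 120s")  # sent
    notify("orders-db", "replication_lag", "replica lagging by 130s")  # suppressed
```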
Vinay: So if we look at, you know, some of these things that you mentioned, let’s take deployments, for example, right? I mean, you know, in terms of deployment, especially when you talk about distributed setups with, you know, multiple instances with high availability and, you know, some kind of failover mechanism, how hard is it, right, to standardize this high-availability stack?
We often see companies doing this manually. You know, you have this one good person who is very good at, you know, setting up the databases and making sure, you know, the files are in the right place, you know, the configuration files, the data files, what ports are open and what, you know, and all that.
And you’ve already locked it, you know, locked it down. But when you do that manually, you know, you sort of end up with snowflakes. That’s kind of the problem, right? Because rather than having a sort of standardized, you know, topology where you know what port you’re using, what users are defined, whether actually the vendor defaults are being taken out, right?
I mean, if you don’t do that, I mean, if you do this manually, then it’s hard to know whether these things are standardized, right? And if everybody is going to follow some kind of enterprise guideline, you know, it’s quite hard to know whether it is enforced, right, if people are doing it manually. But how do you see DevOps teams, you know, doing these deployments?
Josef: So I think, first of all, we have to define what is a deployment, what does it involve, and when it is being done, what changes are being part of the deployment. I would classify that deployment would be business-related, would be infrastructure-related, and there is a significant difference in between them.
Business deployments, I would say, which would mostly include changing of rates or new offers, removal of offers depending on the competition out there—that is something which can be standardized because the usual change is being done.
However, if one takes a look at infrastructure, what is being done, I think that is more tricky, I would say, because it’s not always the same changes being applied. Sometimes you are tweaking a database in order to become more performant, more efficient. So you are touching some major configuration files of the database.
Sometimes you are tweaking the operating system in order to make it more performant or more secure. And sometimes you are, for example, changing the monitoring system in order to get more access to the database, in order to have a more complete picture of what is happening in some areas of the database, in order to see, for example, about the data, about the backups, about the resources, and so on.
So one has to start from the fact: what is a deployment, when is it being done, where and by who? Once that is done, I think there should be a distinction between an administrator and an operator. Operator, which is basically executing something which is already defined and well documented.
And an administrator is that kind of guy who would administer, manage, control the database from an infrastructure point of view. Looking from an operator point of view, he or she needs to have one point of contact to basically connect to the database, do the changes, and do whatever is required over there.
And then it will be up to the underlying infrastructure, from a platform point of view, to replicate those changes from the master or from the main node towards all its replicas. And that way, it would mean that data is synchronized across all nodes automatically, without the need or without the requirement for the actual operator to make sure that all nodes are updated just the same.
So you eliminate the human element, you eliminate the dependency again on the human to make sure that all database replicas are in sync. So you have to segregate what can be automated from an infrastructure point of view, to have a robust system, and basically from what can be applied and changed from an operator point of view, having the peace of mind, the trust, and confidence in the system that whatever is being applied, it will be applied across the board.
Vinay: And I guess, you know, very closely after deployment, there’s monitoring, right? So I mean, monitoring is one of those aspects that, you know, you need to know what’s going on. And I mean, sometimes if things are running, you don’t really look at it, right? But when something goes wrong, then you need to know what’s going on under the hood, right? When you get alerted, right?
So, because, well, you need to have visibility; otherwise, you’re flying blind, right? So what types of data, right, typically are you looking at? I mean, you know, and how are clustered, replicated setups, high-availability setups, sort of different from maybe more single-node setups, right? I mean, from a monitoring system, you know, what are you looking at, so to speak?
Josef: So again, I think monitoring has to be at different levels. One of them is the infrastructure, to make sure that the databases are running fine from clusters at a point of view, whether the data is in sync between the master and the secondary nodes. And if not, by how much time it is lacking or it is falling behind, whether there are, for example, any unused indexes which are there, basically occupying unnecessary space storage, or whether there are any missing indexes, which basically, while having or doing some analysis, you would notice that there are some queries—select statements—that would become much more performant if we add this, this, and that in terms of indexes.
So yes, from an infrastructure point of view, you need to have different looks at where to look in order to make sure the databases are running fine. And not only from a database point of view. And sometimes you have to go even lower level, even from an operating system point of view: how much memory is being used, how much CPU, how much storage is being used, what kind of hits, reads, and writes over the storage system. So these are all metrics that you need to have in order to have an enriched picture of where you are.
Not only that, I would say you need to pay extra attention at different times of the day and night. What kind of operations would be going on? Because we all know that during the day, or at least the majority of the companies, their business would be going during the day. However, during the night, there are still critical processes going on, such as backups.
Again, I mean, we have to define what kind of backup you are taking. Whether it is a full backup every day, whether it is one full backup a week, and then having the incrementals along the week.
So processes related to backups, again, would have a significant overhead from a database point of view, from an OS point of view, from a storage point of view. What kind of locks over the database would you be having on different tables, or maybe, I mean, some locks which would be happening due to concurrent SQL statements? Where are they coming from? Are they originating from the same application or from different applications? What kind of accesses would you be having towards your database?
For example, one issue, or actually one error that might be encountered, from Oracle’s point of view: with Oracle database, the concept of users and schemas and objects is different from users and schemas on SQL Server or on Postgres. So on Oracle, if you lock a user, it doesn’t mean that the objects underneath that user are locked or inaccessible just the same. They are still accessible.
So switching off access to a particular user—for example, user1—and having tables from 1 to 10, having another application accessing the database through user2, and this user2 having access to those tables of user1, it’s an indirect access towards the database just the same.
And I mean, not having this properly in your thoughts when doing analysis and troubleshooting, one could easily cause some locks or some performance deadlocks or whatever on the database that would significantly impair the operations of the database. So yes, these intricacies and this low-level troubleshooting need to be handled carefully because you wouldn’t know which or what can cause an issue.
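To make the Oracle example concrete, the following sketch (using the python-oracledb driver) lists which other grantees still hold privileges on a locked account’s tables. The connection details are placeholders, and it assumes the monitoring user can read the DBA_* dictionary views.

```python
# Minimal sketch: even if an Oracle account is locked, other grantees may still
# reach its objects through their own privileges. List who holds what.
import oracledb

def indirect_access_report(dsn: str, user: str, password: str, owner: str) -> None:
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT account_status FROM dba_users WHERE username = :owner",
                owner=owner,
            )
            row = cur.fetchone()
            print(f"Account {owner}: {row[0] if row else 'UNKNOWN'}")

            cur.execute(
                """
                SELECT grantee, table_name, privilege
                FROM dba_tab_privs
                WHERE owner = :owner
                ORDER BY grantee, table_name
                """,
                owner=owner,
            )
            for grantee, table_name, privilege in cur:
                print(f"  {grantee} has {privilege} on {owner}.{table_name}")

if __name__ == "__main__":
    # Hypothetical connection details and schema name.
    indirect_access_report("dbhost.example.internal/orclpdb1", "monitor", "secret", "USER1")
```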
Vinay: Usually, a lot of the data you don’t really need until you do need it. You mentioned backups, and actually, that’s one of the key aspects when you operate a database. So we had, in a previous episode, we had Christian Contop from Booking, and he would rather speak about restore, actually. He doesn’t talk about backups. He said, you know, backups are useless. You know, restore is what we talk about, right?
And actually, you know, because that’s also, you know, sometimes you read that companies, you know, they had crashes, they tried to restore, and actually, well, the backups were not really working, right?
How are we doing there? Are we doing a good job at, you know, really testing those databases that we back up to make sure that, you know, that insurance policy we have, that actually it’s worth something.
Josef: Like Chris was saying, I mean, it’s useless to have backups being done on a daily basis without them being used, without them being restored at least from time to time.
So I think it has to be written everywhere in the DBA’s mind that any kind of backup you are taking from a database, it has to be restored. And we have to define the restore operation, how it’s going to take place, and when and where. Database refreshes, for example, in between different environments, are quite common and quite usual, I would say, as requested by different developers. So in terms of data, in terms of logical structure of a particular schema or a user, I think that is something which is being tackled quite regularly, I would say, in general.
However, the problems come when you mention the infrastructure. A database to run, I mean, it’s not only enough to have peace of mind from a data point of view, but you need to take also into consideration the infrastructure.
So if a database is hosted on a server, that would mean that you need to have backups and restores of the operating system, of the storage, of the network configuration which is in place, of the actual platform, how it’s installed, and the security levels and releases that are attributed to it. So yes, you need to take care of all the levels that would be attributed to the database.
However, one has to keep in mind also the disaster recovery scenario. It is something which many companies don’t put enough investment or put enough resources and attention to, this kind of concept or philosophy, I would say. But when the shit hits the fan, then it would be too late because the time to recover, the time to get back up and running—I think it could also result in a company going bankrupt or going into default. You wouldn’t know what kind of impact or damages would be applied just in case you have a database failure or a network failure. I mean, it doesn’t have to be only a database in this case.
So there has to be a business impact assessment for each service, for each system that is related, and you would need to see what are the impacts, the damages, and what needs to be done in order to recover that system. Case in point, I mean, speaking from a database point of view, it’s very important to at least practice or simulate a disaster recovery restore at least once every six months, at least. The ideal scenario, I think, would be at least once a month, but it depends on whether this is something automated, whether it is something which is manually done until now.
And then there comes the concept of DevOps once again. How are we automating the backups and restores? Where are we automating them? Are we using the same set of servers? Are we provisioning servers on the fly when we are restoring the data? What kind of servers are we provisioning, whether it’s virtualized servers using VMware or Hyper-V or Oracle or IBM?
So you need to take into consideration the whole process of restoring and not just, okay, we have backups, data backups. Okay, but that’s only 20-30% of the system in order to get back up and running.
And you need to take into consideration also the network accesses, which are the ultimate means of how users are going to interact with your database. It’s useless to have a database fully up and running, even from a networking point of view internally, if your end users or the application don’t have access to the database; then it’s just the same. It’s like you don’t have anything up and running.
Because having a database working in isolation when it should be working or coordinating with so many containers or applications going around, that is like a total failure just the same.
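A minimal version of the restore drill Josef argues for might look like the sketch below, assuming PostgreSQL client tools, psycopg2, a custom-format dump, and a dedicated scratch host; the hostnames, dump path, and the sanity-check table are hypothetical.

```python
# Minimal sketch of a restore drill: load last night's dump into a scratch
# database and run a sanity query, because a restore that "succeeds" but
# contains no data is still a failure.
import subprocess
import psycopg2

SCRATCH_DSN = "host=restore-test.example.internal dbname=restore_check user=dba"
DUMP_PATH = "/backups/nightly/app_db.dump"  # hypothetical location

def restore_drill() -> None:
    # Recreate the scratch database contents from the custom-format dump.
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists",
         "--host=restore-test.example.internal",
         "--dbname=restore_check", DUMP_PATH],
        check=True,
    )
    with psycopg2.connect(SCRATCH_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM customers")  # hypothetical key table
        (rows,) = cur.fetchone()
        assert rows > 0, "restore drill failed: key table is empty"
        print(f"Restore drill OK: customers table has {rows} rows")

if __name__ == "__main__":
    restore_drill()
```

As Josef notes, this only covers the data layer; a full disaster recovery exercise would also rebuild the OS, storage, network, and platform configuration around it.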
Vinay: Yeah, and that’s why, usually, you know, at Severalnines, when we actually help people deploy databases, we also make sure that, you know, a load balancing layer is also deployed, and it’s also highly available with, you know, sort of floating virtual IPs and multiple instances. Yeah, because you need to have a route to the data, so to speak.
So moving on here, you know, you work for a bank, and security is sort of what you do, right? So your customers need to be able to trust the bank, right? I mean, they trust the bank with their money. So what are the challenges you see, right, in terms of security, you know, securing databases? Banks, well, you know, but also even compliance, right? Because it’s a very regulated industry.
Josef: So first of all, you need to have—or actually build up—a careful picture of where you are in terms of releases of the database platform itself, whether there are any new service packs going on, whether there are any minor packages that you need to deploy. And these are areas, components that you need to take care of quite significantly.
Apart from that, you need to make sure that any kind of access to the database is either secured from a network point of view, or it’s happening in isolation. I mean, application to database only, or whenever there is the need to have end users connecting to the database directly themselves, either through some reporting systems or BI systems, data engineering tasks, or whatever, then again, you need to make sure that the relationship between a database and the device a user is having is secure, and there is no man-in-the-middle attacks along the way, for example.
That would also mean that you have the right access to the right resources. I mean, there is the concept of having minimum access actually always and everywhere, and you always ask for the kind of access you need upon request, upon demand.
How are you hardening your backups from a security point of view? Are your backup files in plain text when it comes to the files being saved or stored? Where are they being stored? Are they being stored on local file systems, I mean, or on shared storage? Are these file systems being secured?
So again, these are concepts that you need to take into consideration, and you need to make sure that you don’t leave any loopholes along the way. However, there are use cases where, for example, there would be a single vulnerability that would be encountered or exploited very heavily, with significant impact and damages, where a patch would need to be applied and deployed as early as possible. And that is one of the challenges which you need to carefully go through, because you have to first understand what that vulnerability is, whether it applies to your database or not, in this case.
And if it is, you need to understand what is that vulnerability, what needs to be done in order to resolve the vulnerability or actually mitigate, putting controls along the way for that vulnerability. And I think one of the challenges would be to test it out. So is it something that you would need to apply on all database replicas or only on the master? When are you going to apply it? When is the best time to apply it?
And following that, how are you going to test the application to make sure that the database is still up and running and fully functioning as it was before? Or maybe having deployed a package which would then yield another issue.
So you need to do not only UAT testing from a user point of view, but you would also need to do regression testing to make sure that all the functionality that was present before wasn’t compromised. So yes, I think it’s one of the main hurdles being on a day-to-day basis, to make sure that you have your database systems up and running.
Having tools such as Severalnines’ ClusterControl to take over, I would say, from a security point of view, that I think would provide peace of mind, or actually help ensure that your databases are up and running, healthy, and secure. So yes, using these tools is a great way to turn these challenges into good opportunities to keep your systems up to date.
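As one small illustration of the backup-hardening questions above, the sketch below flags backup files that are world-readable or do not follow an ”encrypted file” naming convention. The backup directory and the .gpg/.enc convention are assumptions for the example; real checks would follow whatever encryption and storage policy the organization uses.

```python
# Minimal sketch: audit backup files for loose permissions and for files
# that do not appear to be encrypted.
import stat
from pathlib import Path

BACKUP_DIR = Path("/backups/nightly")   # hypothetical location
ENCRYPTED_SUFFIXES = {".gpg", ".enc"}   # hypothetical convention: encrypted files only

def audit_backups() -> list[str]:
    findings = []
    for path in BACKUP_DIR.iterdir():
        if not path.is_file():
            continue
        mode = path.stat().st_mode
        if mode & (stat.S_IROTH | stat.S_IWOTH):
            findings.append(f"{path.name}: readable/writable by 'other'")
        if path.suffix not in ENCRYPTED_SUFFIXES:
            findings.append(f"{path.name}: does not appear to be encrypted")
    return findings

if __name__ == "__main__":
    for f in audit_backups():
        print("FINDING:", f)
```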
Vinay: Yeah, I mean, you know, database vendors, I mean, they do come up with patches, right? That’s quite often, right? I mean, I think for MySQL, typically you have every, you know, every month, every five, six weeks. And then enterprises are moving towards a more, you know, polyglot persistence kind of topology.
So it means, you know, multiple database types, and each database type has a number of patches. And then, yes, it is a pain in a way to do upgrades. Upgrading systems makes them unstable in a way, but then you need to upgrade and patch up security holes, right?
So let me tell you this, you know, application developers, they are measured on code, right? How much code they write, you know, how fast they churn new features, right? And that’s coming from the business. There’s a pressure to add more features and to continue, maybe, innovation. But then DevOps is measured in terms of uptime. You know, you have to have systems stable; you have to have them running, right? And there, the less changes, the less disruptions, the better, right?
So how do you balance these conflicting goals, in a way, where you have, you know, on one end, somebody being measured on adding more features, whereas, you know, on the other end, you have more stability, resiliency, and as little downtime as possible?
Josef: Yes, so I think I can relate that, or these kinds of changes, to deployments in general. Deployments coming from the business, I think, would need to be done very frequently and as much as early as possible. So then, there it comes the concept of containerization, to have the pipelines in order to fully automate those kinds of changes. And more often than not, actually, the preferred way is not to have any downtime over there due to high availability setups and so on.
However, from an infrastructure point of view, I think there has to be a standard way of how you are applying changes. For example, you have to define the day and time when these changes are happening, and whether changes are happening from an OS point of view, network point of view, database point of view, and so on. So there again comes the concept of standardization, a pattern, and then, implicitly, comes the concept of automation, as well as the concept of using DevOps tools or DevOps operations in order to automate those kinds of changes as much as possible.
So I think the solution, or one of the answers to the business changes, would be to have high availability clusters as well, so that one downtime on one server wouldn’t impact the overall service, and similarly to the database as well. So if you are doing a security change, for example, applying a service pack on the database on an active-passive cluster, you would basically first start applying the changes on the secondary node in order to make sure that everything goes well from a procedure and documentation point of view.
Then you would flip the service towards that secondary node, leave it up for a couple of days, for example, or weeks, depending on what kind of change would have been deployed before. And following that, then the change would be replicated back to the other remaining node. So that is a way of making sure that from a stability point of view, the change would have been successful, stable, and would have reached its overall objectives.
Automating this can also be done through tools. Sometimes you can go big bang, I mean, updating all the nodes in a database cluster accordingly. However, that also depends on what kind of change would be deployed at that point in time, whether it’s a stable release or whether it’s something which just came out yesterday. So that makes a difference. What is it going to impact? Whether it’s impacting the overall stability of the service, or the overall stability of a particular component—for example, the caching system, which we can do away with for a little bit of time, as long as the database would still be up and running.
So yeah, these concepts, these areas would be definitely taken into consideration when doing such changes. I believe that you have to use different metrics to gauge availability. I mean, the MTBF, the mean time between failures, for example, the MTTR, the mean time to repair, and the MTTF as well, although these metrics are usually engaged during an incident. However, they’re also part and parcel when planning out planned maintenance as well.
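The secondary-first patching pattern Josef outlines can be sketched as a simple orchestration skeleton. The helpers patch_node, is_healthy, and promote below are hypothetical stand-ins for whatever tooling (Ansible playbooks, ClusterControl jobs, in-house scripts) actually performs each step.

```python
# Sketch of the secondary-first patching pattern for an active-passive pair.
import time

def patch_node(host: str) -> None:
    """Hypothetical stand-in: apply the service pack / OS patch to one host."""
    print(f"patching {host} ...")

def is_healthy(host: str) -> bool:
    """Hypothetical stand-in: post-patch health check (service up, replication ok)."""
    print(f"health-checking {host} ...")
    return True

def promote(host: str) -> None:
    """Hypothetical stand-in: fail the database service over to this host."""
    print(f"promoting {host} to primary ...")

def rolling_patch(primary: str, secondary: str, soak_seconds: int) -> None:
    # 1. Patch the passive node first, where a failure does not hit the service.
    patch_node(secondary)
    if not is_healthy(secondary):
        raise RuntimeError(f"{secondary} unhealthy after patching; aborting")
    # 2. Flip traffic onto the freshly patched node and let it soak.
    promote(secondary)
    time.sleep(soak_seconds)  # in practice: days of observation, not seconds
    # 3. Only then patch the former primary, restoring full redundancy.
    patch_node(primary)
    if not is_healthy(primary):
        raise RuntimeError(f"{primary} unhealthy after patching; investigate before failback")

if __name__ == "__main__":
    rolling_patch("db-node-a", "db-node-b", soak_seconds=5)
```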
Vinay: Okay, so Josef, we’re coming towards the end of this episode here. So I mean, there are probably, you know, many other DevOps teams out there in the same boat as your own team, right? Having to take responsibility for running infrastructure, for running even databases, right? And perhaps, you know, having to develop tools for automation of different parts of the system. What would be your best tips for these teams?
Josef: I think all the roles are equally important in IT. DevOps is just a single slice out of that. And I think one of the good skills that you need to have is organization. I think that’s something which helps you out in coordinating and scheduling and leveraging items with different teams, departments, and different people, and different kinds of people, different characters with different skill sets, with different gaps, and whatever.
So organization is very much important to basically have a picture of where you are, where you need to go, where you would like to go in one, two, three years down the line, short, medium, and long-term strategies, basically. You need to sharpen up your technical skills in order to make your technical setups as performant as possible.
You need to have the coordination skills when dealing up with different users. I mean, sometimes you would need to get engaged with a technical support guy for an issue you are encountering at the moment. Sometimes you would need to coordinate with a business person coming your way, telling you, “I need this report to include this, design this, and much more data,” and so on.
So you need to be able to understand, listen, understand the requirements, so that you would be able to be in a better position to provide your answers, your resolutions to these requirements. I think you need to be aware of all—I mean, all your external environment when it comes to laws.
Sometimes there may be changes here and there, for example, GDPR, which basically, I think, it shook up the majority of the companies. I mean, from just a simple application, from a simple website, all the way down to databases, operating systems, whatever.
So yes, that is another point which you need to be on top of it. And I think you need to see what automation tools nowadays you can employ. Actually, you have to employ—not you can—you have to employ in order to bring up the efficiency and the production to the next higher level.
Nowadays, we all know that competition is there—actually, it was there already—and competition will be there even higher, I would say, from other companies doing different offers, different products and services, and so on.
So in order to stay alive, you need to either react in a worst-case scenario towards new products and services, or better, you need to be proactive. You need to be leading; you need to be the leader in your industry. So you need to be creative; you need to be innovative. And this doesn’t start and stop from a business point of view. This, I mean, includes other departments, IT being one of them. And you need to be on top of what is happening, what other tools, technologies you can use in order to make your time to deliver your projects even shorter and even shorter, and sometimes getting it to a very minimum.
One other thing you need to consider, in my opinion, is what kind of investments you can actually make from a business point of view in relation to the technical setups that you can implement. Nowadays, there is the concept, I mean, of open source as much as possible, yes. But sometimes, some enterprise features would only exist if you make the necessary or the right investment. And sometimes that would mean many thousands, sometimes even millions. However, in the absence of them, you would be prone to some security risks or infrastructure issues that would impact your highly available clusters. So yes, I think all in all, these are all areas that you have to consider, in my opinion.
Vinay: Okay, well, thanks, Josef. Thank you for that. You know, DevOps and databases, I guess, you know, we looked at the different areas, the different components that go into building a database service, really, right?
Whether it’s sovereign or not. I mean, you know, it’s, as you know, when we talk about enterprises managing their databases, we see that DevOps teams are increasingly being asked, right, to take responsibility for the actual database itself. You know, the DBA role is changing more into an architect’s role to actually think about the requirements and basically design the infrastructure, design, you know, the logical aspects of the database.
You know, we talked about deployments, monitoring, notifications, restore, the fact that maybe you want to do, you know, sort of try out the restore and see that it actually works on a more frequent basis. Yeah, there’s lots of things there. So thank you for joining us. And that’s it for today, folks.
See you all for the next episode. Thank you. Thanks for checking out this episode of Sovereign DBaaS Decoded, brought to you by Severalnines, the leader in intelligent multi-cloud database orchestration solutions for open-source and source-available databases. Like what you’ve heard in this episode? Then just follow the Sovereign DBaaS Decoded podcast wherever you listen to your favorite podcasts or visit severalnines.com/podcast to get access to all the latest episodes.