Running data infrastructure at enterprise scale

February 10, 2023
Kristian Köhntopp

Running and growing a robust database is a complex and time-consuming process. But, if managed properly, it sets the business up for long-term success. Therefore, as a manager, you must first ask yourself what you want from a database and whether you have the resources and people to create and own it.

In this episode of Sovereign DBaaS Decoded, we discuss the database methodology at Booking.com. Our guest is Kristian Köhntopp, the company’s principal system engineer. He and our host Vinay Joosery discuss the complexity of managing databases and the importance of having the right tools, processes, and people who understand their purpose and effect.

Kristian also shares his thoughts on cloud migration and reveals the pros and cons of using cloud database services. Finally, he explains why ‘managing’ is not the best word for the database department’s responsibilities and suggests ‘automation’ in its stead.

Key Insights

The Public Cloud Cost Paradox: While public cloud services solve the “day one” problem of provisioning, they often fall short on “day two” operational requirements like transparent failover and custom discovery. For a massive enterprise, running automated bare-metal infrastructure can be six to eight times cheaper than public cloud list prices.

Automation as a Limited Resource: Every company has a finite number of “innovation coins.” Using these to build internal database automation means diverting senior engineers away from business features. However, neglecting this leads to “institutionalized lying,” where database work is performed inefficiently across various business units without central visibility.

The Continuous Data Strategy: Booking.com uses MySQL for memory-resident reads, effectively eliminating the need for separate caching layers like Redis or Memcached. This allows developers to treat data as a “continuum,” evolving from flexible JSON blobs into structured, normalized databases without switching technology stacks.

Episode Highlights

00:01:42 — The 400x Growth Phase: Kristian discusses the evolution of Booking.com from a small travel agency to a tech giant.

00:04:52 — The Persistence Problem: Why the database department carries the heaviest burden by managing the state that stateless applications offload.

00:07:34 — Defining Automation: Why a team of 12 people is able to manage thousands of instances by replacing manual “management” with code.

00:10:26 — The Operational Checklist: A deep dive into the requirements of enterprise DBaaS, including cyclic instance remaking and automated restore tests.

00:27:24 — Killing the Caching Layer: How Booking.com avoids the complexity of Redis and Memcached by optimizing MySQL to run from memory.

00:31:16 — The Maturity Gap in Cloud Services: A critique of why public cloud DBaaS offerings often provide only 30% of the automation required by large-scale enterprises.

00:54:54 — Innovation Coins and Staffing: The strategic trade-offs between using senior talent for infrastructure versus business-level innovation.

Here’s the full transcript:

Vinay Joosery: Hello and welcome to this episode of Sovereign DBaaS Decoded. I’m Vinay Joosery, CEO of Severalnines, and this episode is brought to you by Severalnines. We build enterprise automation software to orchestrate your high availability open source database operations in any environment while maintaining total control. So our guest today is Kristian Köhntopp. Welcome, Kristian. Thank you for joining us.

Kristian Köhntopp: Yeah, thanks for having me.

Vinay Joosery: So this is pretty exciting. You know, I’ve seen your rooms back in the days when MySQL had conferences. They were always full and they were pretty exciting. And you were never afraid to speak your mind.

Kristian Köhntopp: Yeah, German is German, I think.

Vinay Joosery: Yes, yes. So, Kristian, can you tell us a bit about your background and your current role at Booking?

Kristian Köhntopp: Well, once upon a time, I worked for MySQL AB. At that time I had a recurring customer, a small travel agency in Amsterdam. And when MySQL was bought by a big scary enterprise, I didn’t like the prospect of having 35,000 colleagues. So I signed on with that small travel agency, which is now a big scary enterprise for some reason. But that is, let me count, 16 years later or something. I was there until 2014. That was a growth phase of approximately 400x in size: in people, in equipment, and in the total number of databases. Then it became hard to do all of that living in Berlin, working remotely, flying over every other week. So I signed on with a German company doing OpenStack. We built an OpenStack from scratch for two years. Then Booking came back and made me an offer that I could not really reject. And well, I started again, but not in databases. That was my condition. And I moved to the Netherlands. That was their condition.

I built data centers, selected hardware, helped automate hardware provisioning, did sizing, things like that, for three years. Then the database principal left, and data center land went too. And because I knew a thing about databases, in 2019, three years ago, I got dragged back into database land. I’m now dealing with database automation, provisioning a few thousand databases and a few hundred replication hierarchies and things like that. I’m also dealing with a cloud migration. We will see how long that takes and how complete it will be. There are a lot of factors that point in both directions at once, so right now it’s very confusing.

Vinay Joosery: I like that, with the factors pointing both directions at once. So, you know, in Europe, Booking.com is a huge name, and I guess globally it is a massive name. It’s a tech giant. Can you give us an idea of the scale of the tech infrastructure?

Kristian Köhntopp: Well, I can’t give absolute numbers, but let’s put it like this. There are 2 million Xeon CPUs made every year. 85% go to fewer than 10 companies. Of the remaining 15%, 12% go to about 150 customers. And we are one of them, which means that we talk directly to Intel for hardware purchases, or did until COVID struck, and then things changed a lot again. How it continues now is, again, complicated, especially for the travel sector. But on the other hand, this year has been a lot better than the two before. I think I can just point to the numbers we officially provided without spoiling any secrets. So overall, the company is rather happy with how this year went.

Vinay Joosery: Well, that’s a lot of servers and I guess that’s, you know, thousands of databases being automated. So that’s great. So Kristian, first thing, why are databases hard to manage?

Kristian Köhntopp: So when you talk to developers that write web applications, what we have is basically a web application. You could say that Booking, for a living, replaces variables in templates with their values, and the values come out of databases. We do string copy at scale, like any web job. Now, these front-end developers, application developers, they push all the state, the contents of the variables, the things that need to live longer than a web request, to backend systems that do persistence, for anything that is structured and needs to be consistent. That’s a database, and at Booking, that’s MySQL. That is also why we have so many. So you can go and kill any web server and make it new. Nothing is ever lost, because all of that state, all of these variable values, they are kept in MySQL systems.

And that also means that all the hard work, like upgrading systems without losing state, changing replication hierarchies without losing data and without ever interrupting reads and writes, like over the last 25 years, that is the job of the database department. If they make a mistake in the stateless systems, they just roll out again and get new instances. If I make a mistake of the evil kind, then I drop data. And even if I correct that and make the instance new, the data on the volume isn’t coming back. So I better have a copy. And for MySQL replication, it is not enough to just have a copy. You also need an uninterrupted replication timeline. That means the binlogs all need to be continuous, clicking together, because otherwise there is a gap in between.
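The continuity requirement Kristian describes can be sketched as a check over an archived binlog sequence. The file names and sequence-number fields below are hypothetical stand-ins for whatever positions or GTID sets a real archiver would record:

```python
# Minimal sketch: every archived binlog must pick up exactly where the
# previous one ended, otherwise there is a gap in the replication timeline.

def check_binlog_continuity(binlogs):
    """binlogs: list of dicts with 'name', 'first_seq', 'last_seq'
    (transaction sequence numbers). Returns the pairs of files between
    which a gap was found."""
    gaps = []
    ordered = sorted(binlogs, key=lambda b: b["first_seq"])
    for prev, cur in zip(ordered, ordered[1:]):
        # The next file must start at exactly last_seq + 1; anything else
        # means transactions were lost between the two files.
        if cur["first_seq"] != prev["last_seq"] + 1:
            gaps.append((prev["name"], cur["name"]))
    return gaps

archive = [
    {"name": "binlog.000001", "first_seq": 1,    "last_seq": 500},
    {"name": "binlog.000002", "first_seq": 501,  "last_seq": 990},
    {"name": "binlog.000004", "first_seq": 1203, "last_seq": 1800},  # gap!
]
print(check_binlog_continuity(archive))
```

A real implementation would compare GTID sets or binlog positions rather than invented sequence numbers, but the invariant is the same: the timeline must click together with no holes.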

And that is not just transactions lost; it is also a compliance, finance, and whatever-liability problem. It also means, in our scenario, that somebody will sleep under a bridge because their booking got lost. And that probably was their vacation. So they will not think of us very nicely. So it is our job to make sure that this never happens, none of these things, at scale, with a few thousand instances. So yeah, it not just has to be always available for reads and for writes, never lose data, never lose the thread of the binlog replication; it also needs to do that at sufficient capacity for a few million web pages served per second. That’s the job. And that is somehow hard.

Vinay Joosery: Yeah. Yeah. The fact that you can’t really lose the data and everything has to just work pretty much. It has to be there.

Kristian Köhntopp: Yeah. And I have to change all of this: version upgrades, capacity changes, data corrections, everything. I have to do that while maintaining availability at sufficient capacity, at least in the customer-facing systems.

Vinay Joosery: Yeah. And that’s one thing, right? I mean, when we talk about why databases are hard to manage, what do we actually mean by ‘manage’? Because manage means different things to different people.

Kristian Köhntopp: Yeah, that’s a stupid word. I’m not using that. I say automation. What I want is, well, I have to provide these several hundred replication hierarchies on a few thousand servers. I have to provide that with a team. I think we are somewhere between 2,000 and 3,000 people overall. So let’s say it’s 15 or maybe 18, depending on whether you count management overhead, or people like me; I’m not doing much operation, so I’m also kind of management overhead. And then it’s like 10, 12 people maybe running all of these databases, providing the service. And of course, nobody can ever log into a database to look things up. And that means that we have to write code, basically, to make that happen.

That’s what we did. We wrote our automation ourselves, accidentally, because when we needed it, there was none. It started with a collection of shared scripts that got bundled into an RPM and then rolled out to every database server. It then became a command-line Python program with subcommands, like Git, called dba or b.admin, ‘badmin’, because at the beginning it was also very bad code that did this. And it’s now becoming a web service, so we are no longer root on the database machine; it’s the agent that is running. And then we can also open the API to application business units so that they no longer ask us. They can go break their legs by asking the automation directly.

And then the automation does whatever they ask, even if it is one of the less smart things they ask for. But that is an education problem that also needs to be addressed. Basically, when we’re finished with this project, we will have handed over all of the databases to the ABUs. They can do whatever they want, and we no longer care, or where we do, it’s at a different level. On the one hand, we operate the infrastructure. On the other hand, we consult for them, and we care for them before, during, and after they break the database. The smart ones ask us before they break the database, but there are all kinds of people.

Vinay Joosery: And I guess your automation, I mean, it’s everything around deploying clusters, managing failovers, backup and restore?

Kristian Köhntopp: I made a list somewhere. Let me check. We make instances automatically, from bare metal in this case, also from OpenStack now, and soon also from AWS. We provision the instances with a database version and with data automatically. And we put the instances automatically into the replication hierarchy they belong to. We automatically measure replication capacity and load, and also catch-up speed, because we have test instances that we stop for an hour and then see how fast they catch up. We try to measure, in a non-intrusive way, the working set size so that we can advise people about the correct instance size; that is really hard to do with current MySQL, and I am currently dabbling a bit in the server source to make that easier. Schema changes are handled automatically. That means all the compliance is done automatically, and the schema change is then performed heuristically, depending on the database, with an ALTER TABLE or with Percona Toolkit and an online schema change.
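The heuristic choice between a direct ALTER TABLE and Percona’s pt-online-schema-change could look roughly like this. The thresholds and inputs are invented for illustration, not Booking.com’s actual rules:

```python
def plan_schema_change(table_rows, writes_per_sec,
                       max_direct_rows=1_000_000, max_direct_wps=50):
    """Pick a schema-change strategy per table.

    Small, quiet tables can take a blocking ALTER TABLE; large or busy
    tables go through Percona Toolkit's copy-and-swap online schema change
    so the table stays writable throughout."""
    if table_rows <= max_direct_rows and writes_per_sec <= max_direct_wps:
        return "ALTER TABLE"
    return "pt-online-schema-change"

print(plan_schema_change(table_rows=50_000, writes_per_sec=3))         # direct
print(plan_schema_change(table_rows=800_000_000, writes_per_sec=900))  # online
```

In practice the decision would also look at replication lag budget and disk headroom for the table copy, but the shape of the heuristic is the same.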

But in the end, the developer doesn’t have to care. The automation also fills in the compliance tickets, so that we automatically ask the data owner whether they approve. We make sure that the person asking for the change is not the person acknowledging the change. And we also match the business requirement ticket with the change ticket, so that the compliance people are happy that this is not a random change but has a business reason behind it. We do restore tests. And in order to pass a restore test, we have to have current backups. But a backup only proves its value in the restore. So I say we have restore tests, and that implies backups. If you think about that any differently, you’re going to fail.
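The “restore tests imply backups” logic can be sketched like this: the test fails outright if no sufficiently fresh backup exists, and otherwise verifies that a restore reproduces the checksum recorded at backup time. Field names and thresholds here are made up for the sketch:

```python
import hashlib

def run_restore_test(backups, now_h, max_age_h=24):
    """backups: dicts with 'taken_at_h' (timestamp in hours), 'data' (bytes),
    and 'checksum' recorded when the backup was taken."""
    fresh = [b for b in backups if now_h - b["taken_at_h"] <= max_age_h]
    if not fresh:
        return "FAIL: no current backup"   # no backup means no passing restore test
    newest = max(fresh, key=lambda b: b["taken_at_h"])
    restored = newest["data"]              # stand-in for restoring to a scratch host
    if hashlib.sha256(restored).hexdigest() != newest["checksum"]:
        return "FAIL: restored data does not match"
    return "PASS"

payload = b"snapshot+binlogs"
good = {"taken_at_h": 100, "data": payload,
        "checksum": hashlib.sha256(payload).hexdigest()}
print(run_restore_test([good], now_h=110))  # PASS
print(run_restore_test([], now_h=110))      # FAIL: no current backup
```

The key design point is that backup freshness is checked inside the restore test, so “we have backups” is never asserted separately from “we can restore them”.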

There’s a MySQL-specific problem: we monitor primary key exhaustion when the key is defined with an auto-increment. We remake all instances cyclically, now every 30 days, so that we have fresh operating system images with no open CVEs, for security compliance. That also means that all the instances have non-fixed IPs. And that means we also manage a discovery mechanism that the applications use to actually find an instance they can write to, and to get assigned a random instance close to them that they can read from.
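The primary-key exhaustion check he mentions boils down to comparing the next AUTO_INCREMENT value (in real MySQL, readable from the `AUTO_INCREMENT` column of `information_schema.TABLES`) against the ceiling of the column type. A hedged sketch, with the alert threshold invented:

```python
# Ceilings for common MySQL auto-increment column types.
INT_CEILING = {
    "smallint": 2**15 - 1,
    "int": 2**31 - 1,
    "int unsigned": 2**32 - 1,
    "bigint": 2**63 - 1,
}

def pk_headroom(column_type, next_auto_increment, alert_at=0.75):
    """Return the used fraction of the key space and whether to alert."""
    used = next_auto_increment / INT_CEILING[column_type]
    return round(used, 4), used >= alert_at

print(pk_headroom("int", 1_700_000_000))  # close to the signed 32-bit ceiling
```

Running this periodically against every auto-increment column catches the classic failure mode where inserts suddenly start erroring because a signed INT key silently ran out.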

And we also hand out secrets to applications, and they are rotated automatically. So every few days we make new users and new passwords for the application. We let them use the old password, monitor whether the old password and the old account are still used, and once the old account has been idle sufficiently long, we kill it. If it is not idle for too long, we alert the application business unit that they have done things wrongly and need to check the application: why are the old credentials still in use? And that way, all things are completely liquid, if you will. Because the instances are fresh, even if they are several terabytes big, it’s the right amount of capacity.
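The rotation lifecycle he walks through, issue new credentials, watch the old account, and only drop it once it has been idle long enough, can be sketched as a small decision function. The grace period and action names are assumptions for the sketch, not Booking.com’s actual values:

```python
def old_account_action(idle_hours, grace_hours=72):
    """Decide what the automation does with the superseded account.

    idle_hours: how long the old credentials have gone unused."""
    if idle_hours >= grace_hours:
        return "drop_old_account"   # fully abandoned: safe to kill
    if idle_hours == 0:
        return "alert_abu"          # app still uses the old handle: tell the ABU
    return "keep_watching"          # recently idle: wait out the grace period

print(old_account_action(0))    # alert_abu
print(old_account_action(12))   # keep_watching
print(old_account_action(96))   # drop_old_account
```

Evaluating this on every rotation cycle makes the rotation self-verifying: credentials are never revoked while still in use, and lingering old credentials surface as an alert instead of silently living forever.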

We manage the load on the databases. We make more if necessary. We also provide a lot of metrics. So we collect what the database does and hand out standard Grafana panels and VividCortex panels to application business units. And we refer to these panels in the training that we also provide, so that everybody is on the same page and has the same screens when they look at problems. And that means we also speak the same language. And that then works. All we needed was only 12 people. Well, in 2019, we were decidedly above 10,000 instances.

But with COVID, we have had a big contraction. We also had basically a year that we could mostly spend on optimization, because there was no travel, so we didn’t have much business. And that means we are now a lot smaller, but still… well, the number of replication hierarchies only grows. We shrunk. We come from a single schema. There was one single schema in 2006. Everything was in there, every single table. It was 40 gigabytes of data with replication. That was the entire company. It is sufficient to say that it’s now a lot more. Part of that is new functionality. A lot of that is also identifying functionally dependent tables that model one aspect of the business; as everything grew, we basically split those off into their own replication hierarchies. So for a very long time, you could still tell one user story, one service if you so wish.

You talked to only one set of database handles, a read handle and a write handle. We still have a few of these shared-ownership databases, where multiple applications talk to a single database. They are a compliance problem, because you have data owners per table, not data owners per schema or per instance. So we try to get rid of that. And getting rid of that means, of course, that you get more schemas, and when the schemas are backed by their own replication, you also get more replication hierarchies. They are often small, much smaller than a single standard piece of hardware. That is where OpenStack enters the picture, because we can now buy much bigger hardware and then slice and dice it into bite-sized pieces.

But if you run on pure bare metal, you have basically all instances the same size, so that you can leverage economies of scale. And then you have all this shared stuff. That is okay if you are a smaller company, but once you are at the highest PCI level, or even ask for a banking license, if you’re under SOX control, if you are under a GDPR regime, or even higher, then it is a lot easier if you just pull these things apart and put them into independently controlled and managed entities, their own replication hierarchies, because that simplifies a lot of the governance. That suddenly becomes a lot harsher than it is for smaller shops.

We have been growing into this slowly since 2006. So it was still a bumpy ride, but manageable. There have been two or three big breakpoints where we basically changed liability classes, also one where we did not actually shine. And that was always followed by a big compliance push and much harsher, more rigidly defined regimes for managing all of these things. When I finally get rid of the last 18 root users on the boxes, the last DBA accounts on databases, so that there are only machine accounts left, that will be something to celebrate.

Right now we still have DBA accounts on the boxes, only for DBAs. So out of the several tens of thousands of people, there is like a dozen or two that can actually log into a box, but it is still too many people and too many boxes, and that has to go. These accounts are hardly used, because of the automation. If they are used, they are also very strictly monitored; there is a full recording, smile, you’re on camera. But they have to be there for emergency maintenance. And basically I can make a ticket every time somebody logs into a box, for whatever reason: either they look something up, so the monitoring is broken, or they actually make a change, so the automation is broken. So if you actually use SSH, that’s a ticket-worthy event in my book, in my environment.

Vinay Joosery: Yeah. Yeah.

Kristian Köhntopp: It always happens, of course, but it is the vision, the idea to strive for: to never use SSH.

Vinay Joosery: Yeah. So management has… that’s a long list of things you have there from lots of deployment.

Kristian Köhntopp: Databases are complicated things.

Vinay Joosery: Yeah, yeah, yeah.

Kristian Köhntopp: So, yes.

Vinay Joosery: Yeah.

Kristian Köhntopp: All the complexity that the web developer of a stateless application offloads to the state storage ends up in our department, together with the requirement to never be broken. So I have redundancy, I have capacity, I have seamless procedures for everything. And because I use replication, I have this demand for the history of the database to be unbroken: across the splits, across version changes, across upgrades, across capacity changes, across restores, every time. So I have an unbroken history. If I didn’t have to throw away the binlogs, I would have an unbroken history that goes back 25 years and basically documents every data change that happened in the company. Of course, I don’t. I have about enough storage to do that for seven days or so. But that history still clicked together. If I could store it, it would still lead to the images of the databases I have today. And I can prove that.

Vinay Joosery: Yeah. So, Kristian, the cloud has been around for like 15 years. We’ve seen the rise of database services. I understand that Booking.com has built a lot of automation. But looking today at the cloud and what it offers, why not use the cloud?

Kristian Köhntopp: Well, we do. Every enterprise does all of the things all the time. So for everything that I say Booking does, the opposite is also true somewhere in the company. You can’t be that large and not be in that situation. There are a lot of pointers that direct enterprises to the cloud. If you are a platform and need to interface with others, if you need to execute other people’s code, then doing that in your own data center is a very hard thing, because you basically become a hoster, with hoster-level security: you have code from a completely different administrative domain running in your environment. And that’s all kinds of security and liability and compliance nightmares.

If you do that in the cloud, it’s Amazon’s problem, not yours. I mean, you’re a customer there, they are a customer there, and Amazon’s job, the only job they really have, is to separate these domains and then provide controlled interfaces where you interact with these other entities. So that is the easiest thing. Buying and running hardware for your own private data center is also becoming harder. We have a standard blade that is about the smallest kind of machine you can reasonably buy as a piece of hardware: it has 32 threads, 128 gigs of memory, two or four terabytes of NVMe storage, and a 10-gig network interface. And that’s a standalone piece of hardware. In Amazon terms, that’s maybe an M5 8XL, but that has remote storage, or approximately an i4 4XL or 8XL. That’s a bit low on the CPU side, but otherwise it might match that description.

And for a lot of the newer, split-off replication hierarchies and applications, that’s already way too large. And this kind of hardware is going out of service; it’s no longer being made. So what I would like is a way to take a really big computer and slice and dice it into the appropriate sizes. That means I either need virtualization, OpenStack, which is another thing with a certain complexity in my data center, or Kubernetes, which is another thing with a different kind of complexity, again in my data center, and I need the people there to run it. Or I need to buy this as an external service, so that somebody else manages that for me. Then I can buy machines that don’t run on two $400 CPUs but on two $2,000 CPUs. They have about six to eight times the capacity. And I can slice and dice them into, say, 6 to 12 to maybe 20 pieces that run databases, except that I wouldn’t want to put 20 databases on a single piece of hardware.

So if I’m so small that I actually fit on a single piece of hardware, I still want at least three, so that I also have hardware redundancy. And that means I have to have a certain minimum size to qualify for running my own hardware. And then I need data center space and hardware; that’s actually not the problem. I mean, whether you buy Amazon reserved instances for three years or buy hardware and write it off over three or five years, there is no difference. OPEX or CAPEX is just a classification; either way I have a commitment for three to five years. And for the business spend, it makes no difference.

But I also need people that understand these things. They are hard to find. Hiring a DBA is really hard. Hiring a DBA that also knows Python and Go is close to impossible. Hiring a Python and Go developer that understands databases and understands OpenStack or Kubernetes, that’s also a tall order. So we’re also looking at a lot of education and at least one senior person that knows how things are supposed to be done and what capabilities you need to have as an enterprise in order to pull that off. Yeah.

Or you need to choose products and partners externally. That’s your make-or-buy decision then, and then you’re not pulling it off alone. Support is mostly a trust question: an external party that works with you to develop these capabilities. And that also means that you don’t leave it all with that external support; you also strive to build the capability internally. Even if you do not exercise it, you need to be able to have a conversation with these external people about what they do. There must be at least one person, if not three, inside your company that understands these things, at least at a level where they can have a meaningful conversation with that external supplier. Because otherwise, they are capable of pulling you around without any controls, and over time that is destined to go wrong. So you still need to know all these things, even if somebody else is doing them for you.

And then you might as well try to actually have the capability of doing them and then decide if you actually want to do them or not. I mean, that is what infrastructure is about, right? When I talk to my management as an infrastructure person, they always say things about caring about the customer and understanding what they want. And as an infrastructure person, I can say that is entirely wrong. I need to provide you with certain capabilities, the ability to go into different directions, even if you choose to not go into all of them. At least I have to provide you with the means to make a meaningful decision so that you have the capability to go somewhere. And then I can, on my end, specialize and with the next step, give you new opportunities to further refine that or whatever.

But the infrastructure I build has to be able to run whatever the business wants tomorrow. And I can’t know what they want tomorrow. So when I, as an infrastructure developer, start to care about the customer, I’m actually partially doing it wrong, because I need to be building infrastructure where I don’t care about what the customer is about to do. It always has to work. That is also not entirely true; the opposite is also true. I have, of course, to care about what the customer wants, how they expect it to be presented, and what the requirements are that come in from legal, liability, compliance, and so on. So I have to be within all of these constraints, but I still can’t care too much about what the customer does, because that changes. I have to provide opportunities, spaces, so that they can actually make decisions and are not forced into a certain decision by technical debt. That way I give management the freedom to actually move the business in a direction. That’s what infrastructure does. I don’t know if that makes sense.

Vinay Joosery: No, it does, it does. I mean, I think definitely, you know, what you said was, in a way, cloud is also used. Public cloud is also used at some parts. But the big portion is actually, you know, stuff that you’ve built yourself in a way, the company has built. But I’m just thinking, you know, if we look at the rise of the cloud, you know, how database operations have evolved, if you look at the database market, I mean, the fastest growing segment is the public database as a service, services, right? So just wondering, you know, in a way, I guess you guys must have looked at that and said, well, maybe for some things, but maybe not for the core stuff.

Kristian Köhntopp: So for a very long time, we actually didn’t. Booking in 2002 to 2004 or 2005, just before I arrived, was a Postgres shop. And it ran on a single database using trigger-based replication, I think it was Slony back then. But that was extremely fragile. It also was extremely synchronous. It didn’t have any potential for growth. And the entire business was, long before I arrived, moved live from Postgres to MySQL, and then used MySQL replication to grow. That is why, as a business, we have so much MySQL and why we are so replication-focused. We have used that for scale-out.

We at some point in time also discovered that we don’t need Redis or Memcached. I mean, a hardware blade is a hardware blade. It always costs the same no matter what runs on it. And when MySQL does what Memcached does, a primary key lookup for a single row from memory, not from disk, it takes, I measured, about 150 microseconds, 0.15 milliseconds or something. And that was fast enough. Memcached might be faster, but I don’t care. Or the applications didn’t care. So we eliminated basically all caching layers. And that means that we run all customer-facing reads from MySQL, from memory. We use replication to feed these caches, and we use the restart mechanisms, saving the buffer pool page numbers and reloading them on start, to never be cold in the caches after a restart. Memcached would be cold. We know, because we have been in that situation. I can persist writes; Memcached cannot do that.

Otherwise I still have to do a separate database write. And I can smoothly migrate from a single get to a multi-get, a WHERE … IN clause, to a join, or to arbitrarily complicated queries, depending on what the application requires. So for the application developer, it’s no longer ‘I talk to the cache’ or ‘I talk to the database’. It’s a continuum in expressiveness as well. And that gives me, as an application developer, also a lot of flexibility. And that is how we ended up at the stage we were in until 2014. From 2014 to 2016, I wasn’t there; I was elsewhere. And then we get the enterprise Booking after my return, after 2016, maybe 2018, where you suddenly have multiple database products and also polyglot persistence, with Cassandra, with Elasticsearch, and with all the other things in there. And with commercial products coming in that require Postgres as well, and things like that. And a much, much wider base for the thing that is now called the application data service. That is everything that persists, except Hadoop, which is big enough to justify its own department, the data track.
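The continuum from cache-style get to relational query can be made concrete with a small query builder; the `kv` table and column names are invented for the example:

```python
def build_get(table, key_col, keys):
    """Build the same lookup as a cache get (one key) or multi-get (many keys).

    The result is a parameterized SQL string; past this point, the same table
    can be queried with joins and arbitrary predicates, with no separate
    cache API to switch to."""
    if len(keys) == 1:
        return f"SELECT v FROM {table} WHERE {key_col} = %s"
    placeholders = ", ".join(["%s"] * len(keys))
    return f"SELECT {key_col}, v FROM {table} WHERE {key_col} IN ({placeholders})"

print(build_get("kv", "k", ["a"]))
print(build_get("kv", "k", ["a", "b", "c"]))
```

The design point is that the single-key form is just the degenerate case of the multi-key form, and both are ordinary SQL, so “cache lookup” and “database query” stop being different code paths.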

But that is how you arrive at the current stage. Basically, the need for scale in that rapid growth phase, 400x, which I personally went through with the entire company, guided that design. And that also was… that was not planning. That was just making changes in a way that doesn’t interrupt business and doesn’t hinder growth. So the challenge was to stay alive. Well, there was a lot of architecture done, but not in the way we do architecture today; more like: how good a solution can I get that does the job in time, before we die from growth? How can I get to the next stage of necessity in time? Making it pretty was entirely secondary. First, you have to provide the service, and then you can try to think about making it pretty.

And that is very different from what we do after 2016, where we are in a slower growth phase. Already being a market leader means there’s not much market left unless you grow the market. And it is, of course, entirely different in terms of what you have to do in terms of paperwork, spend control, and compliance requirements when you make changes to systems. So you need a lot more planning. It’s the same deliverable, provide a new system, but it’s done entirely differently. That is what always happens when you solve the same problem at a different scale.

And of course, there are solutions that you simply outgrow. There are things that basically create pick lists, for example, for support or for translators, things like that. And they have been done, I think, or redone at every power of 10. I have personally witnessed this being redone four times over, because there was like a factor of 1,000 or 10,000 growth: from a very simple and naive ‘select the data from the database, find the most burning topics, create the top 10, and offer them to somebody who wants to do the work’ to something that does that in parallel, at scale, multi-threaded, I don’t know what.

And while the deliverable is very simple and always the same, ten urgent tasks that match your skills, the way it’s done is entirely different. A lot of subsystems in the company have gone through that. So whenever you grow by a power of 10, the deliverables stay the same, but the solution is entirely different.

Vinay Joosery: But with all this rapid growth and all these changes, we see, for example, the cloud being a way to go faster, because there are services you can just use. You don’t need to buy hardware. You don’t need to figure out, okay, how am I going to manage some number of servers and databases?

Kristian Köhntopp: So adding capacity never was the problem. If you look at these cloud examples, the hello world of infrastructure automation is: start a WordPress. And then you scale the WordPress by adding more frontends. But they all talk to the single MySQL backend of the WordPress. And that means you are not actually scaling up. There is an absolute scalability limit for this thing, which is whatever the read and write capacity of your database and the storage underneath is. When you run below that level, you have scaled down to the capacity you currently need. You’re not scaling up; you’re scaling down from this absolute scalability limit, which is a hard wall, down to what you need, and you can then scale up again to that limit. But if you want to go past that, you need a very different architecture, something much more asynchronous, that works in the background.
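The scaling ceiling Kristian describes can be sketched as a toy model; all the numbers here are made up for illustration:

```python
def throughput(frontends, per_frontend_rps, db_read_rps):
    """Requests/s the stack can actually serve: frontends add up
    linearly, but the single database backend is a hard ceiling."""
    return min(frontends * per_frontend_rps, db_read_rps)

# Say each frontend handles 500 req/s and the database tops out
# at 4,000 reads/s.
print(throughput(2, 500, 4000))   # 1000: frontends are the limit
print(throughput(20, 500, 4000))  # 4000: hit the database wall
```

Adding frontends past the wall changes nothing; only a different architecture moves the ceiling.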

Again, all of this is a journey. It was very different in the beginning, and as we grew, additional requirements had to be onboarded. There was change of architecture for growth, and there was also change of architecture because of requirements evolution. That’s another discussion that I sometimes have with bright people. If you talk to Honeycomb, for example, Charity Majors, I have an ongoing discussion with her about schema evolution. For her, it’s metrics schema: what data do I collect and keep a history of? But then there is schema evolution. I collect more metrics, and I somehow need to provide the old metrics and the new metrics, but when I go back in time, certain things are missing.

So that’s like a data warehouse problem, where I upgrade the facts that I collect, and as I go back into the past, more and more facts are missing. We have the same in database land, where we have schema evolution in various ways. We are running schema changes: the developers require new tables, new indexes, new columns, dropped columns. Except we never drop columns; we rename them, and then drop them a few weeks later, automatically, of course. But there’s also schema evolution that developers do. They start with a JSON table: there is an ID and there is a big blob of the MySQL JSON type. There is a check constraint that gradually becomes more complicated, because it requires certain attributes in the JSON to be present.

If you require these attributes to be present and to have a certain type, you can define virtual columns. You can materialize the virtual columns and extract the data from the JSON. These now defined things are no longer in that formless bag in the JSON. And in the end, you end up with something that actually looks like a normalized database. But it was a conversation between the DBA, as a development consultant, and the developers exploring the business requirements, and the business people looking at the data, seeing what is there, discovering patterns, and asking for changes again. So you have this NoSQL, schema-less thing. Nothing is ever schema-less.

But a fluid-schema JSON bag that gradually evolves into something much more structured, 3NF, using MySQL schema changes, check constraints, and the JSON data type as tools to instrument and facilitate that journey without ever being offline. I mean, we earn money along the way, all of the time, because the business is running and the business evolves, and the business now understands better which data is necessary to model the processes they want to perform. And as that happens, we solidify the schema and get more strict there as well. And we have the tools to do that.
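The journey from JSON bag to typed, indexed columns can be shown in miniature. This sketch uses SQLite’s JSON functions and generated columns as a stand-in for the MySQL features Kristian names; the table and attribute names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Stage 1: the "formless bag", an id plus a JSON blob, with a
# check constraint requiring one attribute to be present.
cur.execute("""
    CREATE TABLE hotel (
        id  INTEGER PRIMARY KEY,
        doc TEXT NOT NULL,
        CHECK (json_extract(doc, '$.name') IS NOT NULL)
    )
""")
cur.execute("INSERT INTO hotel (doc) VALUES (?)",
            ('{"name": "Seaside", "rooms": 42}',))

# Stage 2: promote an attribute out of the bag into a virtual
# generated column, and index it.
cur.execute("""
    ALTER TABLE hotel ADD COLUMN rooms INTEGER
        GENERATED ALWAYS AS (json_extract(doc, '$.rooms')) VIRTUAL
""")
cur.execute("CREATE INDEX hotel_rooms ON hotel (rooms)")

rows = cur.execute(
    "SELECT id, rooms FROM hotel WHERE rooms > 10").fetchall()
print(rows)  # [(1, 42)]
```

Stage 3, not shown, would move `rooms` into a real column (or its own table) once the shape has stabilized, all without taking the service offline.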

That is of course only possible if you get operations out of the way. That needs to be a given. I need more capacity? Fine, you get more capacity. I need a restore? Yeah, we have backups, we have tested restores. All of these things need to be there. Only then can you, as a DBA department, have a meaningful conversation with the ABUs, the application business units, about business processes and schema evolution and data modeling and performance and the other aspects of the entire thing. Because if you don’t have solid operations, there’s no business, and then you can’t have that conversation. How would you finance that?

Vinay Joosery: So, Kristian, using the cloud, can you get rid of the operations part? Because then, as a DBA, you can bring more value to these application business units.

Kristian Köhntopp: So the only cloud that I know a bit is Amazon. I don’t know about the others. And the only other cloud thing that I know is Trove, the non-working database service of OpenStack. They both provide a certain automation. Basically, they make you an instance. And that is day one. They also provide you a way to make snapshots that count as backups, but that’s not a restore test. They also provide you with static replication hierarchies, in no way as fluid as ours. There are IP addresses or hostnames hard-coded. There is no formal AWS-provided discovery mechanism. There is the proxy, which is a bit experimental, but it is not there primarily for discovery; it is for other things, like providing read-write splits for people who are too stupid to code that into the application. So this is all maybe 30% of what I need at work. It’s not even 50%.

The other big flaw is that it is a service that has hardly evolved since 2010, RDS specifically. It provides file-system-based replication and resilience for arbitrary databases. So it works the same for MS SQL, for DB2, for Blue Oracle, MySQL, MariaDB, Postgres, because it does shenanigans at the disk level, with, I believe, a very largely modified DRBD that talks to EBS and to S3 in the backend, instead of to hardware iSCSI interfaces or things like that. And that’s one way to do it. We did the same in 2010, when we ran databases on NetApp filers, coming off DRBD.

But with the advent of global transaction IDs in MySQL, for example, and with the advent of Shlomi Noach’s Orchestrator, which is an awesome tool, we have actually moved to unraided local storage. First spinning disks, now NVMe. So we store one copy physically per database instance, but of course we have replicas for read scale-out. And with GTID, we can promote any replica that has a proper binlog to primary and hang the rest of the replication hierarchy off it. And Orchestrator does that for us. So we basically take web frontend blades, instead of special database machines, with the memory and a local NVMe, and we use them as databases, successfully, which makes these things incredibly cheap. I pay 120 euros per month for a small blade and 150 for a big blade. That is roughly an m5.4xlarge and an m5.8xlarge in EC2. And then I put my own service on top of that. At that price, I think it’s six to eight times cheaper than the list price. I would have to check.
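The promotion step Kristian credits to GTID and Orchestrator can be caricatured in a few lines. This is a toy, not Orchestrator’s actual algorithm: GTID sets are modeled as plain sets of transaction sequence numbers, and the names are invented:

```python
def pick_promotion_candidate(replicas):
    """Toy failover: given {name: executed_gtid_set}, where each GTID
    set is modeled as a set of transaction sequence numbers, pick the
    most up-to-date replica as the new primary."""
    # The best candidate has executed the most transactions. In a
    # real system, anything it is missing must first be fetched from
    # a peer, which is part of what Orchestrator manages.
    return max(replicas, key=lambda name: len(replicas[name]))

replicas = {
    "db-101": {1, 2, 3, 4, 5},
    "db-102": {1, 2, 3},
    "db-103": {1, 2, 3, 4},
}
print(pick_promotion_candidate(replicas))  # db-101
```

With GTID, any such candidate with an intact binlog can become primary, and the remaining replicas simply repoint to it.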

And it always costs the same. Like an EC2 instance, I can do anything with it. I can put MS SQL on it or MySQL; I have that choice, and I can get away with a MySQL doing the work of an MS SQL. I of course take MySQL, because it provides me much more opportunity, and I can eliminate the caching layer completely. So it also gives me architectural opportunities that are unique to being able to run your own stuff on bare metal or in a virtual machine. So yes, you can use services in the cloud if they fit the bill and if they actually complete the task. With RDS, for example, you get maintenance windows that you must schedule, but there is no mechanism that automatically and transparently fails you over to a failover instance. That is still a thing you have to somehow code or accommodate when the RDS instance goes out for a maintenance window. And it is supposedly possible to automate that with the message queues that announce these things.

And then you have functions that consume these messages and reconfigure your AWS environment: take the instance that needs maintenance out, schedule the maintenance window automatically, and all the while talk to another copy of the database to keep the business running. But for some reason, RDS doesn’t provide you with that. A lot of people have probably automated it badly, halfway, one way or the other. And Amazon also doesn’t provide you with an open source repository that does it for you. When you talk to them, they’re always very proud of how much they contribute to open source and how many projects they forked off other open source projects because those suddenly changed their license or something. But if you look at it, the development in these projects is not actually done by Amazon; it’s done by other people, into Amazon’s repositories.
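The consume-and-reconfigure loop he says is missing could look roughly like this. Every name here is hypothetical; a real version would consume the cloud provider’s actual maintenance events and reconfigure real endpoints:

```python
def handle_event(event, topology):
    """React to a (simplified) maintenance notification by promoting
    the standby and taking the affected instance out of rotation,
    so the business keeps talking to a live copy of the database."""
    if event["type"] != "maintenance-scheduled":
        return topology
    victim = event["instance"]
    standby = topology["standby_for"][victim]
    topology["writer"] = standby            # repoint the writer endpoint
    topology["out_of_rotation"].add(victim) # park the victim for its window
    return topology

topology = {
    "writer": "db-main",
    "standby_for": {"db-main": "db-standby"},
    "out_of_rotation": set(),
}
event = {"type": "maintenance-scheduled", "instance": "db-main"}
topology = handle_event(event, topology)
print(topology["writer"])  # db-standby
```

The point of the sketch: the logic is small, which is exactly why its absence as a provided, reusable service is so conspicuous.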

And if you look at whether Amazon provides code that would actually improve their own infrastructure, there is nothing. If I ask Amazon, do you have any Step Functions integrations that automate RDS even more, that I can then fork and customize for my own operating model, for my own enterprise? There’s nothing. I know, because I have asked, and there were crickets. I’ve asked repeatedly at different levels of the hierarchy, and there were always crickets. And I wonder why that is. Because if I were running such a company, well, I was running such a company in those two years when I did the Opus.

One of the things that we immediately did was provide a bunch of repositories that show people how to automate even more on top of the automation we already provide. But Amazon doesn’t do that, and I don’t know why. I don’t understand why. So everybody starts from scratch with RDS, a service that does 30% of what we are currently doing with our database service, for example. That’s quite frustrating. It’s also not really helpful. I am in an environment that looks more at the cloud now, because it needs to interface with it as a platform, and I’m supposed to run databases in the cloud, I’m supposed to use services. And I expected Amazon to actually have solved that. That’s a service that’s over a decade old. It’s supposed to be very mature. But there is no such thing, and nobody can explain to me why. That’s really weird.

Vinay Joosery: Well, as you said, it is a mature service. Well, RDS is. So from that perspective, it solves certain problems. But the flexibility is not there, and I guess the automation is not there to fit the requirements.

Kristian Köhntopp: Amazon has handbooks. They say things about being well-architected, how to run things operationally, but they don’t seem to have that for RDS, or I have overlooked it so far. I gave you the list before of what I expect from database automation, what problems I have to solve. They don’t have that automation because they never wrote that book for databases. Supposedly they’re not using a lot of SQL databases in-house, and if that is true, I would say it shows. Because there is no playbook from Amazon about how to run a database successfully, and because RDS is database agnostic, it solves the day one problem, make me an instance, and it gives you a few forced, half-assed attempts at solving certain day two problems somewhat. But actually running a database completely, so that I as a DBA can have the business conversation with the ABU? That’s not there. That’s not there at all. That’s very, very far away.

And when you look at other services… When we looked around in 2010, we had four people and a very rapidly growing number of databases. That is when we took our shell scripts, put them in the trash can, and converted them into this Python thing that then grew on us. And it was called badmin for a reason; it was not pretty. Some people were DBAs; they knew everything about databases and not a lot about coding. Other people knew more about coding, but not a lot about databases. So it was funny back then.

And now it’s a thing with a growing REST interface, growing rootless, SSH-free database administration. And it has a box for operations: these are the things we need to do, this is how we do them, these are the error conditions, these are the known bad interactions. If you do an instance change here while another change is already running over there, then maybe you should not even start. This needs to be locked, because if you make a schema change to that table while at the same time you do storage manipulations underneath, you don’t have the IOPS for both. All of these things are learned from experience, and they are all coded somewhere into badmin. Not always perfectly modular; it is a very grown and then badly refactored architecture as well. So we are not going to share it with anybody, because, yeah, it’s called badmin for a reason. But it’s also b.admin, the Booking way of doing database operations.
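Those interaction guards can be sketched minimally. The conflict pair and change names below are invented; badmin’s real rules are far richer:

```python
# Pairs of change types known (from experience) to interact badly
# when run on the same instance at the same time.
CONFLICTS = {("schema_change", "storage_migration")}

def may_start(change, running):
    """True unless `change` conflicts with any change already
    running on the same instance."""
    for other in running:
        pair = tuple(sorted((change, other)))
        if pair in CONFLICTS:
            return False
    return True

print(may_start("schema_change", {"storage_migration"}))  # False
print(may_start("schema_change", set()))                  # True
```

Each incident teaches a new pair, which goes into the table, which is how failures stop repeating.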

And that packages these things, and that is why we can run so many instances with so few people. We are easily the department with the most outages, but we are also by far the department with the most complex environment and the most instances. And we can show how we discover new mistakes and how we avoid making the same mistakes again, because we put them into code, we test against them, and we make sure that we always fail in new and interesting ways and discover new edge cases. Running databases is not easy; I would never claim that. But you can become pretty good at it if you find a way to learn, and put that learning in a place that survives individual contributors. That would be code. And documentation.

Vinay Joosery: So, Kristian, you mentioned something interesting there. Going back to the discussion around using services from the cloud providers: they automate very little compared to your requirements, because you gave a long list of requirements at the beginning, the sort of things you expect to automate from an enterprise perspective. And you also mentioned that, yes, the day one operation can be done, you can get instances, but then you need to actually take care of the day two operations. And that’s interesting, because by using cloud services you’re supposed to be outsourcing the problem, right? But we still see these DevOps teams; you need DevOps teams to actually take care of things. And I guess…

Kristian Köhntopp: It’s not outsourced, right? For Amazon, the sale is made the day you make an instance, day one. Then you have an instance and they get money. And I totally understand that they have a lot less interest in solving the hard problems, which are all the other days. They would rather sell you a bigger instance, if that makes the problem go away, or a standby instance, or stuff like that. And I would be the last person to say you shouldn’t have a standby instance. If you have MySQL and you don’t have one replica more than you need at all times, you’re holding it wrong. This is how MySQL works: you always have one or two replicas more than you need, so that you have clone sources, hot additional capacity, something to play with, something you can wreck to try things out. But that is also not how RDS works. There is this DRBD-based file system replication, and then a bit of non-GTID-based, very outdated, static replication set up on top. If you actually want modern replication: I think group replication is not even available, and GTID-based replication can be turned on, but it’s not on by default. Running Orchestrator against RDS is maybe possible, but they don’t do that, and you can’t just click it.

So that is also not a thing. And they of course limit you, to something like five managed replicas per layer, and two layers. So this is all geared at tiny things, decidedly not enterprises. But if you are a little web shop that needs a database instance in front of a Shopware or something, RDS is for you. Very much so. It solves your problem. You have a database, you have backups. Maybe you care to test them; that would be your job. And Amazon then says, yeah, well, that is the shared responsibility model: we do the things we do, and all the other things are still your problem. That’s why you need a DevOps team. So it’s your job to test your restores.

But why is there not a restore service that tells me every day that the backup I made actually restores, then connects as a replica and catches up? And then I throw the replica away the moment it catches up, because I have proven that the binlog connects, that my history of changes is unbroken. That would be the consistency check for a backup, because that is what I want from a backup: I want to make a new instance from a backup and have it connect. That’s my restore test. But I have to write that service. Everybody needs it. Nobody needs backup; everybody needs restore. But there is no restore as a service. There is also no metric that tells me what the ideal instance size is. How big is my working set? How many InnoDB pages did I read and write in the last hour, as a sliding window, plus the overhead of the instance? That would be my minimum instance size to be able to run from memory.
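The daily restore test he describes, restore, attach as replica, catch up, discard, fits in a few lines once the mechanics are stubbed out. Every function here is a hypothetical stand-in for real backup and replication tooling:

```python
def restore_test(backup, primary, restore, attach_as_replica,
                 replication_lag, drop):
    """Restore a backup, attach it as a replica of `primary`, wait
    until it catches up, then throw it away. Surviving this loop is
    the proof that the binlog history is unbroken."""
    instance = restore(backup)
    attach_as_replica(instance, primary)
    while replication_lag(instance) > 0:  # catch-up phase
        pass
    drop(instance)  # we only needed the proof, not the replica
    return True

# Exercise the loop with stubs; the lag "measurements" are canned.
lags = [3, 1, 0]
ok = restore_test(
    backup="s3://backups/db-2023-02-10",      # hypothetical name
    primary="db-main",
    restore=lambda b: "restore-test-1",
    attach_as_replica=lambda inst, prim: None,
    replication_lag=lambda inst: lags.pop(0),
    drop=lambda inst: None,
)
print(ok)  # True
```

The value is not the code but running it every day, unattended, against every backup.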

And then I have my actual instance size, which is larger, so I can go to a smaller one, or which is smaller, and I can see that this means I have a hit rate of so much, and that with the next larger instance size I would have a much higher hit rate, which would speed me up. My P99 or P95 could be estimated from that number. But this number is not collected. It is not even collected by MySQL, which is a defect in the product at the Blue Oracle level. But open source giant Amazon could easily modify the MySQL they are running, which they have already modified far more than that, collect these numbers, make them available, and then provide an instance sizing service.
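A back-of-the-envelope version of the missing sizing metric: distinct InnoDB pages touched in the sliding window, times the page size, plus per-instance overhead. The inputs below are illustrative, not real measurements:

```python
INNODB_PAGE_SIZE = 16 * 1024  # InnoDB's default 16 KiB page

def min_instance_bytes(pages_touched_last_hour, overhead_bytes):
    """Smallest memory size that keeps the hot working set resident:
    the distinct pages read or written in the window, plus overhead."""
    working_set = len(set(pages_touched_last_hour)) * INNODB_PAGE_SIZE
    return working_set + overhead_bytes

# 3 distinct page ids touched, 1 GiB of per-instance overhead:
need = min_instance_bytes([7, 7, 9, 12], 1 << 30)
print(need - (1 << 30))  # 49152 bytes: 3 pages x 16 KiB
```

Compare that number against the configured buffer pool and you know whether the next instance size up buys you hit rate or just cost.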

But they don’t do that. I have to patch my MySQLs myself and write C++ code to collect these numbers, so that I get good guesstimates about how big my working set is, how much memory I actually need to survive, and all these other things. So this is work done that is good enough to make the sale; it’s not work done that is good enough to simplify my DevOps. The metrics I need are not there, and they don’t even understand the problem. The services I need are only partially there; they don’t think them through to the end. The services that are there could be grown with the tools that they have, because the message notifications, the change events from the service, the Step Functions, everything is there.

But they have no repos that use their own tools to the advantage of their customers and provide unified, uniform automation. They could easily do that, but it hasn’t happened, not in over ten years. So I’m impressed by Amazon. They do operations pretty well, much better than most people in the world; you could think of them as an operations company. I’m also not impressed, because what they deliver, at least on the database side, is incredibly incomplete and also outdated. So I have a lot of criticism, and I think I can reason about why it’s actually valid. I’m still impressed by how far they have come, but they have been very, very lazy for a very long time.

Vinay Joosery: Yeah, yeah. So you mentioned DevOps there, and we’re talking about database management. You also mentioned that it’s very hard to find a good DBA. And if you’re in a polyglot persistence model, where you have multiple databases, then you’re in trouble, because you’re looking for, I don’t know, Superman, to know all this…

Kristian Köhntopp: Well, that is not a single person, obviously. And that is rapidly becoming a staffing problem. There is this concept of innovation coins. These are imaginary coins that you have as a startup; you have maybe three of them. And that means you should be boring. You should be using tried and tested technology to achieve what you want to do as a startup. As a startup that explores a new business model, you should not also build new technology; that would be spending too many innovation coins at once, and you don’t have that many. Think of innovation coins as being materialized in very senior, principal-staff-engineer-level persons; each of them represents one coin, or maybe half.

That means that for every innovation coin you want to spend, you are keeping one or two very senior people busy. And if that is not with business-level stuff, then as a startup you’re basically dead in the water before you even have a business. That is why you need to be very, very careful with these innovation coins. You should be choosing boring technology. There’s nothing wrong with MySQL. There’s nothing wrong with PHP. There’s nothing wrong with Postgres, or with Python, or even with Ruby and Mastodon; that is not the bottleneck. Even if it’s very slow, the bottleneck is usually somewhere else. So you need to be very careful with the stack that you’re building. It is almost, though not quite, arbitrary what stack you choose. There are saner and less sane choices, of course, but I’m not going there.

It’s important that you actually have people for each piece of the stack you choose who understand how things actually work. Computer science is zeros and ones; it never gets any more complicated than that. But computer science piles abstractions on top of each other, and these abstractions are leaky, and the behavioral changes are nonlinear. So you do something somewhere way up in the stack. Blue Oracle changes and improves the optimizer in MySQL. The algorithm is now much better, and everything becomes slower. And that is because the total optimizer code being run suddenly no longer fit into the L2 cache of the CPU. The inner loop was too big, so you were running from a much, much slower cache and memory level of the CPU. And it doesn’t matter how much better you make your algorithm, because suddenly it does not fit anymore for fast execution.

So you’re running a lot slower. And they had to actually make this piece of the inner loop smaller again, less code, in order to get back to the old performance levels, so that they could then apply those other optimizations. So you have two completely unrelated abstraction levels, with probably five or six or seven layers in between, and they interacted in a bad way. The hard thing in computer science is not the zeros and ones; that is trivial at every level, always. It is to have these layers of the stack, vertically dependent on each other, open in your mind at all times. And to see what you do to the cache, what you do to the machine language, when you write this thing in C or in Python or whatever, and how that changes.

This is hard. People who understand that, ten, twenty, thirty layers deep, and can tell you immediately by looking at it, yeah, this is because this other thing down below is broken by what you do here, so these two seemingly equivalent solutions are not equivalent at all at the lowest level of execution: try to find those people. Good luck. That’s really, really hard. And you do all of this cloud and services stuff also because then you don’t need these people in these areas. So you free up the coins. But you might as well do that on premises with your own thing, if you have a good partner there and maybe better automation; that would also work. I mean, that is what works for us. We did it ourselves.

But that has been a long journey. Just finding the list of requirements, what database automation is about operationally, was a long process. We just did these things, and suddenly, through external pressure, we were forced to put them into an itemized list: what is it that you people do all day long with all your databases, and why are there so many? And then we had to write down: here we had this caching layer, this is what happened, and this is why it is economical to not have one and just use MySQL as the cache. And these are the things badmin does; this is what we think database automation is about. And if you move that elsewhere, to RDS, to Aurora, or whatever, you will have to implement these things. And half of them are Booking-specific. The way we do compliance tickets, probably nobody else does it that way; these compliance requirements are probably unique to our size, and yours is a hundred times smaller, so you don’t have that. And of course, we never built it to be modular. If you wanted to productize it, it would have to be modular in a way that you can parametrize it and elect to use this feature and not that one. But there are interactions.

Vinay Joosery: So you mentioned DevOps earlier, and we’re looking at these day-to-day operations. Many companies are using DevOps to manage databases, right? But DevOps and databases, are these friends or foes?

Kristian Köhntopp: So that is again a wide, wide field. Most companies have very few databases. That also means they have a lot less pressure to automate things. And when they try to automate things, they will find that it is hard. They will also find that it is slow, that data has weight. When I copy data at 400 megabytes a second, it takes about 45 minutes for a terabyte, and then there’s replication catch-up. So what I teach people is: a terabyte takes an hour; ten terabytes take eight hours, a working day, though replication catch-up for the big databases is probably faster, because they have a lower churn rate. And if your database is bigger than ten terabytes, it’s not done in a working day. I don’t make a new instance in a working day; I don’t make a change in a working day. So everything takes time. And that means automation also means you iterate when you create it, because it goes wrong and you try again.
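His "data has weight" rule of thumb, in code: bulk copy time at a given throughput, ignoring replication catch-up. Decimal units (1 TB = 10^6 MB) are assumed, matching the round numbers he quotes:

```python
def copy_hours(terabytes, mb_per_second=400):
    """Hours to copy `terabytes` of data at `mb_per_second`,
    not counting replication catch-up afterwards."""
    seconds = terabytes * 1_000_000 / mb_per_second  # 1 TB = 1e6 MB
    return seconds / 3600

print(round(copy_hours(1), 1))   # 0.7 -> roughly 45 minutes
print(round(copy_hours(10), 1))  # 6.9 -> most of a working day
```

Past ten terabytes, a single copy no longer fits in a working day, which is exactly why iterating on database automation is so slow.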

But if you do that with databases, you are basically waiting a lot. And if you don’t have to automate databases, because you have very few of them, then you’re not doing it. That is also why most database automation is so crappy: it is always only done as far as people have been pressured to do it. And with more size and fewer people, you need more automation, so the pressure is higher. And we are probably pretty far along. We are also an environment where the databases are all different, because they have different schemas and different use cases. If you look at Facebook, they have a much larger database environment than we have, probably by one or two orders of magnitude.

But I believe they have uniform schemas, so they basically have much less complexity in different use cases, at least for the bulk of it. I bet they also have snowflakes, databases that are different from the other 95%. I do know that Slack, for example, also uses MySQL, Vitess in their case, but they store chats in databases, so the schema is always the same. Maybe it’s split differently, using Vitess at different levels. If I look at chat, it obviously partitions by customer, by channel, or by time. For small customers you wouldn’t partition; for big customers you would partition more. These are all obvious dividing lines, if you think about it. But it’s still the same schema. And that is not the case in our environment. The schema that keeps availability data, the schema that keeps hotel descriptions, the schema that keeps the metadata for images.

These are all different applications with different schemas, different access patterns, different requirements. So we had a requirement to go all the way: to support databases automatically, with very few people, for arbitrary use cases. And that is how we ended up with what we have. And that is also how, despite being smaller than maybe Slack, and for sure smaller than Facebook, we probably went further in what we did with automation. We consciously filtered all requests through a DBA for a long time, because we wanted to protect the ABUs from hurting themselves. And that is how we got away for so long with running the DBA program as root on an arbitrary database machine, which then calls out to the other instances and performs the changes, rather than providing a REST interface. We should have made that change a lot earlier, and we’re only doing it now.

But on the other hand, until we reached the size where we could no longer talk to each customer who wanted a change and actually advise them, is this smart?, that was good, because it prevented a lot of wrong choices by simply forcing people to talk to us. We are now past that stage, and that is the current bottleneck, for example. So, DevOps: yes. Databases: not done a lot, because there is not enough need. Most people are too small. Most people have few instances, and that also means rare changes. And people who have very many instances sometimes get away with simplifications, because the instances are all alike. Neither of those things is true for us.

Vinay Joosery: So looking a little bit at the infrastructure tooling space, there are a lot of tool sets available. Well, you mentioned languages, where you’re probably doing things from scratch, but there are things like Ansible, Puppet, Salt, Chef, Terraform. I mean, how, you know…

Kristian Köhntopp: Yeah, we use none of them for databases, but we use all of them in the frontend area for the web applications, and they work very well. But of course, none of them are useful in database land. They’re not useful for Elasticsearch either, or for Cassandra, because these tools treat hosts as entities that can be changed independently. Puppet does things to a single host. If you try to set up a ZooKeeper with Puppet, you end up with the problem of knowing whether the ZooKeeper cluster you’re working on is already a cluster or whether you are building a new one. And you always need to be aware: do I still have a cluster, do I still have quorum, can I take the machine I’m changing out of the cluster to perform the change? Do I maintain quorum if I take this one down? But there isn’t even syntax for this in Puppet. I cannot formulate the check that is necessary to express: is it safe to make a change on this node of a ZooKeeper cluster without losing write availability, without losing quorum? The same is true for a replication hierarchy. I need to make updates from the bottom of the replication hierarchy to the root, because the leaves always need to be newer, to have more features, than the replication root. Otherwise I replicate downstream things that the leaves, being the old version, don’t understand yet.

Yeah, but there's no way in Puppet or in Ansible or in any of these other tools to formulate the rules, the invariants, that always need to be true for the replication tree to be functional. I can't say: in this replication hierarchy I need at least 12 nodes to have the capacity, they form a tree, and for each member the following invariant on the version number needs to hold, namely that the node I'm changing ends up with a version lower than or equal to that of all of its replicas. So I have to iterate over the replicas recursively to make sure this invariant holds, and only if it does can I perform the change. There are orchestration rules here.
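The replication-tree invariant above can also be sketched. This is a hypothetical illustration, not Booking.com's actual tooling: the `Node` structure and function names are invented, and versions are simplified to `(major, minor)` tuples.

```python
# Illustrative only: the "leaves must be at least as new as their source"
# invariant on a replication tree, checked recursively, plus the
# upgrade-safety check that must pass before changing any one node.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str
    version: Tuple[int, int]               # e.g. (5, 7) for MySQL 5.7
    replicas: List["Node"] = field(default_factory=list)

def invariant_holds(node: Node) -> bool:
    """Every replica runs a version >= its source, all the way down."""
    return all(r.version >= node.version and invariant_holds(r)
               for r in node.replicas)

def safe_to_upgrade(node: Node, new_version: Tuple[int, int]) -> bool:
    """Upgrading is safe only if no direct replica would end up older than
    this node. If the invariant already holds, deeper replicas are at least
    as new as the direct ones, so checking one level suffices."""
    return all(r.version >= new_version for r in node.replicas)

# A 5.7 hierarchy: upgrading a leaf to 8.0 is fine, the root is not, yet.
root = Node("root", (5, 7), [Node("leaf1", (5, 7)), Node("leaf2", (5, 7))])
safe_to_upgrade(root.replicas[0], (8, 0))   # leaf has no replicas -> safe
safe_to_upgrade(root, (8, 0))               # leaves still on 5.7 -> unsafe
```

This is exactly the kind of cross-host, tree-shaped precondition that has no natural expression in a tool whose unit of convergence is a single host.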

There are tools that can do that. SaltStack has facilities for it, but they're very rarely used. Kubernetes operators do that, but they have other limitations. For example, an operator only sees one Kubernetes cluster, but a resilient replication hierarchy goes across multiple clusters, multiple zones, whatever, in order to be available even in the face of a cluster failure. So I can't write a Kubernetes operator for MySQL that is actually resilient. I can write a Kubernetes operator for MySQL, obviously, because it has been done, but that is always managing one MySQL, one MySQL hierarchy, in one Kubernetes cluster. So if I have a cluster failure, I have lost the entire… If I want to write something in Kubernetes that is bigger than that, I probably have operators that manage a fragment of a replication hierarchy in each cluster, and then I write a controller that manages my operators in the various clusters. Nobody has done that yet. So there is no DevOps for MySQL, because nobody has had enough pressure to solve the full problem.

And of course, I can't just take a Kubernetes operator that runs MySQL and expect it to work, because the operator is a program with code in it that implements a certain way of running MySQL. And that certain way of running MySQL might or might not be applicable to my environment. So I should always expect to maintain my own fork of an operator that somebody else developed, and I need to make it so that I can carry my changes forward alongside the primary source of that operator, because I always have certain business rules in my fork. I operate MySQL differently than the original operator does, for reasons.

That means I need a development team of two or three people that actually understand how operators work, how Go works, how this particular operator works, how MySQL works, in order to be able to modify that operator and move forward alongside the evolution of the actual upstream operator. Because we don't run MySQL the way they envision. Maybe most of it, but not all of it. And that means, as an organization, I need the capability to maintain that and to develop it forward. And that means I have a dependency on education and knowledge and staff that I need to maintain in order to retain that capability, in order to be able to run my databases. So even if I had full DevOps for databases, and even if that leveraged automation available from outside, I would still need rather highly qualified people inside, because they build on these things and they need to make modifications to them. And I need to retain that capability, which means I need to find and then retain these people. That's hard.

Vinay Joosery: Yeah.

Kristian Köhntopp: That’s very specialized. And that means most people are winging it and you see a lot of badly administered databases.

Vinay Joosery: Yes. But, you know, Kristian, if you look at enterprises with lots of databases, traditionally they've kind of run them themselves. There is a push to the cloud, in a way: if you can use services, use them. But as we've seen, that only really automates a small portion of what enterprises actually need to automate, right? So what would be your recommendation here, as we come towards the end of this podcast? Do you build the automation, or do you buy it?

Kristian Köhntopp: Understand what your problem is. What size are you? What capabilities do you need as an organization? What do you expect from the database? How much do you need? What are the operations your operations people perform on the database? What are the requirements of your dev people? Then look at the products and make an informed decision. Don't just go to the shelf and pick RDS because nobody has ever been fired for going to Amazon, because you will then end up with a lot of local DevOps need. What Amazon provides is, as we've tried to explain, incomplete.

Then you have a database instance, and maybe a second one, or maybe replication. Maybe that is sufficient for you, because you're a small web shop with some purchased PHP software that runs for you, though you will most likely still need external consultancy to make that fly. Maybe you are bigger. You have more instances, maybe you want to be bigger, or maybe you want to be more resilient and want facilities that handle that for you. Maybe you have to be more compliant. And that means you either assemble a team of really good developers that write infrastructure code, which is what we did.

But these are really good developers that are then not developing feature code, because you spend them on infrastructure. These are your innovation tokens going to infrastructure and not to the business. Maybe you find a solution that you can buy that is better than what you can get pre-made in any cloud. Maybe you find an actual partner that develops it with you. Or maybe you find other people that have the same problem you have, and you fund an open source project where, instead of doing this with three developers of your own, you contribute one, and the one or two partners you have found also contribute one each, and you do it together. You will probably need five instead of three, because you cross two administrative borders, so there will be a lot of synchronization overhead.

Or maybe you fund somebody, like people should have been funding Shlomi Noach for Orchestrator a lot more, so that he would still maintain that project. Because that, for example, solves a large part of, say, replication automation: maintaining the hierarchy, reassembling it, and monitoring all of that. Instead, it's now an orphan project, which is really sad. So there are a lot of options, and I think most of them will be wrong for your company because of whatever unique requirements it has. But seeing an incomplete data service in the cloud as a solution that actually solves DevOps is wrong. What you will see is that you have a lot of application business units that all use RDS in some way, and that all come up with their own incomplete automation around RDS. And you say, hey, we have no DBA DevOps team anymore.

That is true, but you have probably lost twice that amount of productivity, because developers in each of these teams are redoing it from scratch in their own environment. You have just lost the visibility, because it's no longer labeled as a DBA DevOps team. It's labeled as this ABU and that ABU, and somebody in there is then doing the DBA automation work, unattributed and unaccounted for, under the label of, I don't know, selling rental cars or hotel rooms or something like that. So you have institutionalized lying to yourself as a company, because you lost that metric. That overhead vanishes into fractional overheads in all of the different ABUs, and you no longer have visibility.

Vinay Joosery: Yeah. So that's kind of the paradox, in a way. There's a lot of pressure, even on upper management, to use the cloud, to go to the cloud. There's so much attention on the cloud, right? It's very shiny. And we've come to a situation where there's no real question anymore; it's like, well, you should really move to the cloud, because there's no other way. But then we found out that, well, you still need DevOps teams. You still need to do a lot of work, even though you're using the services. People have large teams who actually know how things work with Amazon, or with Google, and those people are not easy to find either. So in a way, it's a bit of a grim picture. You're kind of between a rock and a hard place. You can't really outsource it to the hyperscalers, as perhaps they would like you to believe. But even building it in-house means spending some of those innovation coins you mentioned, which is a cost to the company.

Kristian Köhntopp: Yeah, let me be a bit cynical. Hiring got a lot easier since yesterday: there are 11,000 ex-Facebook people available. So you could seize that opportunity and do something yourself, if you have the money to spend on staff. A lot of other companies are also looking, so there's that. I do believe that a lot of companies do not understand how they work; the management doesn't understand how the sausage is made. There's that other company, Twitter, for example, that is currently demonstrating quite well how clueless a CEO can be. But at a smaller scale this is also visible in many other organizations that I have a bit of visibility into. If you are a company that is on the internet, doing stuff on the internet, and you are a C-level management person that prides itself on being non-technical, then you're holding it wrong. The thing you're holding wrong is the future of your company, as software eats the world. And that means you need to be literate.

You need to understand, at as deep a level as possible, how things operate, in order to understand what is possible, what could be possible, with your organization. You need to be able to map the technical capability and the technical debt onto the business requirements somehow, in order to understand what is easy and what is hard. And frankly, most companies that I have visibility into are not able to do that at the upper management level either. That is where a lot of bad decisions come from.

Vinay Joosery: This has been a very fascinating conversation, Kristian. There's a lot that goes into automation, as we have seen. And from the perspective of using the hyperscalers' database services, there's actually very little automation that fits the operational models of the enterprise. And that's the hard thing to understand, because, as you mentioned, you really need to understand how the sausage is made in order to see that.

Kristian Köhntopp: Yeah. So you have a business, and that business has processes. Processes are things that manipulate data, data that represents the entities you're doing business with. That's your data model. And the processes are your code that modifies the data, and that is a transaction, a business transaction, which somehow gets mapped into actual data structures that end up in the database, and into actual database transactions that you run against the database. And from the database, and from the other components of your system, come constraints: absolute scalability limits, absolute limits to the things that you can do.

There are also other constraints from the business, from the legal side, from the compliance side, from the financial side, and so on. And as a C-level person, it is your job to have a model of all of this that is as detailed as possible. This is your business in your mind. Then you need to be able to explain to your people, the various departments, that you understand these constraints, and make sure they understand them too: where you want to go, where you expect the challenges to be, where do we hit these constraints, and what you expect them to deliver in order for this not to be a problem. Then they will come back with proposed solutions, not all of them good. And you need to be able to either judge them, or be part of the conversation where all of this is criticized, and in the end lead the discussion between the subject matter experts from the various departments to some kind of consensus. That is your leadership model. That's what I expect from C-level management. And if I apply that yardstick to various companies, there's often a lot of disappointment. But that is probably just me being a sysadmin and a German, and that is not a background that brings a lot of optimism. Yeah.

Vinay Joosery: All right. Well, with those words, thank you very much, Kristian. It was really great to have you join us. A thousand thanks.

Kristian Köhntopp: Thanks for having me.

Vinay Joosery: In our next episode, folks, we will welcome Kristian again. And this time, we will look at how Booking.com built their own private database as a service. And behind that, there’s a lot of automation. And we’ll have a peek at that, all right? Thank you, all.

Guest-at-a-Glance

Name: Kristian Köhntopp
What he does: Kristian is the principal system engineer at Booking.com.
Website: Booking.com
Noteworthy: Kristian is an architect with years of experience in databases, Linux/Unix, data center planning and design, and security management systems in enterprise and startup environments. In his current role, Kristian focuses on database automation, provisioning a few thousand databases and a few hundred replication hierarchies, and cloud migration.
You can find Kristian Köhntopp on LinkedIn