Putting the brakes on database downtime
Simplifying database cluster operations for the world’s leading manufacturer of braking systems.
Background
The Knorr-Bremse Group is the world’s leading manufacturer of braking systems for rail and commercial vehicles. For more than 110 years, the company has pioneered the development, production, and marketing of state-of-the-art braking systems. Other lines of business include intelligent entrance systems, HVAC systems, power conversion systems, and driver assistance systems.
Knorr-Bremse is headquartered in Munich, Germany. The group has a global workforce of 24,000 employees in 27 countries, with a revenue of 5.2 billion euros in 2014.
Challenge
Knorr-Bremse’s locations in Austria run a number of internally developed applications that support production. Work is organized in three shifts per day so that manufacturing runs 24 hours a day without interruption. Production processes rely on continuous availability of data, and any disruption in the databases would impact the output of the plants.
Strong data consistency is another imperative, as these processes need to have data that is real-time and accurate – even if the data is distributed between servers.
Knorr-Bremse’s MySQL databases used master-slave replication to achieve redundancy, but that was not satisfactory. The operating system and other software components on the database servers received updates and security patches about twice a month on average, and it was important to apply them without database downtime. Failover of the regular master-slave setups was manual, and because replication was asynchronous, there was a risk of data loss.
The company already had a virtualized infrastructure in place, with a mixture of Ubuntu 12 and 14 running on VMware hosts. Any new solution had to fit the existing environment. Finally, the systems were operated by a team of system administrators (sysadmins). There was no full-time database administrator (DBA) specialized in managing and maintaining the database systems, so it was important to have a system that could be managed by the existing team, rather than having to recruit a DBA.
Solution
Galera Cluster with HAProxy looked like a very good alternative to the standard asynchronous MySQL replication. It automatically managed failures of single nodes and would resync them when they came back online. At least on paper!
The team decided to evaluate the technology. A test cluster was quickly deployed using the online Severalnines cluster configurator. The existing database was easily migrated to the Galera cluster, as the tables were mostly InnoDB. The process to evaluate, test, and then go live took about 3 months. As the team was getting up to speed, there were a number of questions that came up. They leveraged the Severalnines support team to get these resolved so that the team could quickly get to a fully working solution.
“We are very glad to have moved to Galera Cluster with ClusterControl, as we now have a highly available and stable database solution for our applications. This would not have been possible without the tools from Severalnines, which helped us get productive in a very short time. The support team was also fast and competent, so we could quickly resolve any issues that arose.”
Juergen Mayer, Manager IT District Austria of Knorr-Bremse
Outcome
The solution went live in December 2014, and there has not been any downtime incident since. The sysadmin team is also able to take out individual servers in order to do their regular maintenance work without affecting applications that need access to the data.
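To make the maintenance scenario concrete, here is a minimal sketch of the kind of check that could precede taking one server out of a Galera cluster. It is not the procedure used by Knorr-Bremse or by ClusterControl: the function names, the `min_remaining` threshold, and the node names `db1` to `db3` are hypothetical. The dictionary keys are the standard Galera status variables returned by `SHOW GLOBAL STATUS LIKE 'wsrep_%'`.

```python
from typing import Dict

# Status per node, as it might be read from SHOW GLOBAL STATUS LIKE 'wsrep_%'.
NodeStatus = Dict[str, str]


def is_healthy(status: NodeStatus) -> bool:
    """Treat a node as healthy when it is synced, ready, and in the primary component."""
    return (
        status.get("wsrep_local_state_comment") == "Synced"
        and status.get("wsrep_ready") == "ON"
        and status.get("wsrep_cluster_status") == "Primary"
    )


def safe_to_remove(node: str, cluster: Dict[str, NodeStatus], min_remaining: int = 2) -> bool:
    """Hypothetical policy: allow maintenance only if enough healthy nodes remain."""
    if node not in cluster:
        raise KeyError(f"unknown node: {node}")
    remaining_healthy = sum(
        1 for name, status in cluster.items() if name != node and is_healthy(status)
    )
    return remaining_healthy >= min_remaining


# Illustrative sample data (assumed values, not taken from the case study).
synced = {
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
    "wsrep_cluster_status": "Primary",
}
joining = {
    "wsrep_local_state_comment": "Joining",
    "wsrep_ready": "OFF",
    "wsrep_cluster_status": "Primary",
}

cluster = {"db1": dict(synced), "db2": dict(synced), "db3": dict(synced)}
print(safe_to_remove("db1", cluster))   # two healthy nodes would remain

cluster["db3"] = dict(joining)          # db3 is still resyncing
print(safe_to_remove("db1", cluster))   # only one healthy node would remain
```

With all three nodes synced, the check allows `db1` to be removed. While `db3` is still rejoining, it refuses, because only one healthy node would be left to serve the applications.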
Since replication is synchronous, all database servers have exactly the same data. So all applications are able to access any database server via a redundant HAProxy load balancer, with the certainty that the data is consistent across the whole cluster.
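The idea of routing applications to any consistent node can be sketched as follows. This illustrates the concept only, not how HAProxy is implemented or how it was configured at Knorr-Bremse: the function names and node names are hypothetical, and the state strings are the values Galera reports in `wsrep_local_state_comment`.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def eligible_backends(states: Dict[str, str]) -> List[str]:
    """Return the nodes whose reported state is 'Synced', in sorted order."""
    return sorted(name for name, state in states.items() if state == "Synced")


def round_robin(backends: List[str]) -> Iterator[str]:
    """Cycle through the eligible nodes; fail loudly if there are none."""
    if not backends:
        raise RuntimeError("no synced database node available")
    return cycle(backends)


# Assumed sample states: db2 is temporarily out of rotation.
states = {"db1": "Synced", "db2": "Donor/Desynced", "db3": "Synced"}
picker = round_robin(eligible_backends(states))
print([next(picker) for _ in range(4)])  # ['db1', 'db3', 'db1', 'db3']
```

Because every synced node holds the same data, it does not matter which of the eligible nodes a given connection lands on.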
Finally, management is automated using Severalnines ClusterControl, from scheduling backups to removing nodes from the cluster for maintenance and returning them to production. A visual representation of the Galera cluster, covering database load, SQL queries, and host metrics, gives the team a good view of what is happening.
Summary
Automated database management
Severalnines’ ClusterControl automates backups, the removal of nodes from the cluster for maintenance, and their return to production.
Your virtual DBA
With no full-time DBA, Knorr-Bremse needed a system that could be managed by the existing team without specialized database management skills.
Updates and security patches with zero downtime
With synchronous Galera replication managed through ClusterControl, individual servers can be taken out for updates and patched with no downtime and no risk of data loss.