How to Design Highly Available Open Source Database Environments

These days high availability is a must for any serious deployment. Long gone are days when you could schedule a downtime of your database for several hours to perform a maintenance. Making a database environment highly available is one of the highest priorities nowadays alongside data integrity. For a database, which is often considered the single source of truth, compromised data integrity can have catastrophic consequences. This whitepaper discusses the requirements for high availability in database setups, and how to design the system from the ground up for continuous data integrity.

Table of contents

  • 1. Introduction - couple of words on “High Availability”
  • 2. High Availability basics
    • 2.1. Measuring High Availability
      • 2.1.1 What is High Availability?
      • 2.1.2. SLA’s
        • 2.1.2.1 Nines
      • 2.1.3 Measuring availability
    • 2.2. Magic number: “three”
    • 2.3. Single Points of Failure
  • 3. How to design your environment for High Availability?
    • 3.1. Identify Single Points of Failure
    • 3.2. Decide what availability level you want to achieve
    • 3.3. Which failures you can tolerate?
      • 3.3.1. Overall setup
      • 3.3.2. Hardware failures
      • 3.3.3. Network failures
      • 3.3.4. Proxy layer failures
      • 3.3.5. Database tier failures
        • 3.3.5.1. MySQL crash on slave
        • 3.3.5.2. MySQL crash on master
        • 3.3.5.3. Partial data loss
        • 3.3.5.4. Complete data loss
        • 3.3.5.5. Temporary load spike
        • 3.3.5.6. Increased load due to bad query
      • 3.3.6. Availability zone or a datacenter failure
      • 3.3.7. What issues cannot be tolerated?
    • 3.4. Remove SPOF’s and reduce the impact of issues with high severity
      • 3.4.1. Identify the culprit of the issues
        • 3.4.1.1. Hardware issues
        • 3.4.1.2. Network issues
        • 3.4.1.3. Proxy layer issues
        • 3.4.1.4. Database tier issues
        • 3.4.1.5. Infrastructure issues
      • 3.4.2. How to minimize the impact of the issues?
        • 3.4.2.1. Not enough resources to handle failure of a single node
        • 3.4.2.2. Failover is not fast enough
        • 3.4.2.3. No redundancy in the proxy layer
        • 3.4.2.4. Long backup recovery time
        • 3.4.2.5. No redundancy in terms of the infrastructure
    • 3.5. Design the environment
      • 3.5.1. Database tier design
      • 3.5.2. Proxy tier design
        • 3.5.2.1. Deploy ProxySQL with Keepalived for VIP failover
        • 3.5.2.2. Deploy ProxySQL on application hosts
        • 3.5.2.3. Synchronization of the ProxySQL configuration
      • 3.5.3. Backup redesign
      • 3.5.4. Deployment
    • 3.6. Test your design
  • 4. Examples of the highly available setups
    • 4.1. Single datacenter, replication
    • 4.2. Single datacenter, Galera cluster
    • 4.3. Multiple datacenter, replication

Introduction - couple of words on “High Availability”.

These days high availability is a must for any serious deployment. Long gone are days when you could schedule a downtime of your database for several hours to perform a maintenance. If your services are not available, you are losing customers and money. Therefore making a database environment highly available has typically one of the highest priorities.

This poses a significant challenge to database administrators. First of all, how do you tell if your environment is highly available or not? How would you measure it? What are the steps you need to take in order to improve availability? How to design your setup to make it highly available from the beginning?

There are many many HA solutions available in the MySQL (and MariaDB) ecosystem, but how do we know which ones we can trust? Some solutions might work under certain specific conditions, but might cause more trouble when applied outside of these conditions. Even a basic functionality like MySQL replication, which can be configured in many ways, can cause significant harm - for instance, circular replication with multiple writeable masters. Although it is easy to set up a ‘multi-master setup’ using replication, it can very easily break and leave us with diverging datasets on different servers. For a database, which is often considered the single source of truth, compromised data integrity can have catastrophic consequences.

In the following chapters, we’ll discuss the requirements for high availability in database setups, and how to design the system from the ground up.

Want to read the rest?

Download the full whitepaper for free