Disaster Recovery Planning for MySQL & MariaDB

Database outages are almost inevitable, and understanding the timeline of an outage can help us better prepare for, diagnose and recover from one. To mitigate the impact of downtime, organizations need an appropriate disaster recovery (DR) plan. This white paper provides essential insights into how to build such a plan, discussing the database mechanisms involved as well as how these mechanisms can be fully automated with ClusterControl, a management platform for open source database systems.

Table of contents

  • Introduction
  • Business considerations for Disaster Recovery
    • Is 100% Uptime Possible?
    • Analysing risk
    • Assessing business impact
  • Defining Disaster Recovery
    • Outage Timeline
    • Recovery Time Objective
    • Recovery Point Objective
    • RPO + RTO = 0 ?
  • Disaster Recovery Tiers
    • 1. No Offsite Data
    • 2. Database Backup with no Hot Site
    • 3. Database Backup with Hot Site
    • 4. Asynchronous Replication to Hot Site
    • 5. Synchronous Replication to Hot Site
  • In Conclusion

Introduction

The cost of downtime can vary significantly between organizations, and in some cases it may be enough to put a company out of business. To mitigate the impact of downtime, organizations need an appropriate disaster recovery plan in place. But how much should a business invest? Designing a highly available system comes at a cost, and not all businesses, and certainly not all applications, need five-nines availability.

The best disaster recovery strategy for an application largely depends on its importance to the business and, more specifically, on its RTO (Recovery Time Objective) and RPO (Recovery Point Objective). RTO is the maximum period of time within which an application must be restored after a disruption. RPO is the maximum period of time, counted back from the disruption, for which data loss can be tolerated. Can the business afford to lose 5 hours of data, or no more than 5 minutes? Can it be down for 4 hours, or at most 15 minutes? Knowing these numbers will go a long way in helping IT determine a disaster recovery strategy, as well as the best database solution to support it.
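To make the two objectives concrete, the following is a minimal Python sketch that compares a single outage against an RTO and an RPO. The names `Objectives` and `assess_outage` are hypothetical, and the timestamps are illustrative values chosen to match the 5-hour and 4-hour figures above; none of this comes from ClusterControl or any other tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    """Hypothetical container for the two business targets."""
    rto: timedelta  # longest acceptable time to restore the application
    rpo: timedelta  # longest acceptable window of lost data


def assess_outage(last_recoverable: datetime,
                  failure: datetime,
                  restored: datetime,
                  objectives: Objectives) -> dict:
    """Compare one outage against the RTO and RPO.

    last_recoverable: newest point in time the data can be restored to
                      (for example, the end of the last good backup).
    failure:          when the disruption happened.
    restored:         when the application was back in service.
    """
    data_loss = failure - last_recoverable  # data written in this window is lost
    downtime = restored - failure           # time the application was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


if __name__ == "__main__":
    # Illustrative scenario: nightly backup at 02:00, failure at 07:00,
    # service restored at 11:00.
    backup = datetime(2024, 1, 10, 2, 0)
    failure = datetime(2024, 1, 10, 7, 0)
    restored = datetime(2024, 1, 10, 11, 0)

    strict = Objectives(rto=timedelta(minutes=15), rpo=timedelta(minutes=5))
    relaxed = Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=5))

    for name, objectives in (("strict", strict), ("relaxed", relaxed)):
        result = assess_outage(backup, failure, restored, objectives)
        print(name, result["rpo_met"], result["rto_met"])
```

In this scenario the same outage satisfies the relaxed objectives but fails the strict ones, which is why the numbers need to be agreed with the business before a strategy is chosen.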

Disaster recovery can therefore be implemented at different levels, ranging from periodic full backups that are archived offsite to multi-datacenter setups with synchronous data replication. What is right for the business depends on how mission-critical the application is.

As we will see in this white paper, outages are inevitable, but understanding the timeline of an outage can help us better prepare for, diagnose and recover from one. With regard to the database, different mechanisms can be implemented as part of a DR plan in order to prepare for and respond to an outage. Higher levels of DR require planning for an increasing number of eventualities. We will look at the different levels, and specifically at the database mechanisms required for each level. Finally, we will see how these mechanisms can be fully automated with ClusterControl, a management platform for open source database systems.
