blog

Why Did My MySQL Database Crash? Get Insights with the New MySQL Freeze Frame

Paul Namuag

Published: January 16, 2020
Last Updated: August 15, 2023

In case you haven’t seen it, we just released ClusterControl 1.7.5 with major improvements and new useful features. Some of the features include Cluster Wide Maintenance, support for version CentOS 8 and Debian 10, PostgreSQL 12 Support, MongoDB 4.2 and Percona MongoDB v4.0 support, as well as the new MySQL Freeze Frame.

Wait, but What is a MySQL Freeze Frame? Is This Something New to MySQL?

Well it’s not something new within the MySQL Kernel itself. It’s a new feature we added to ClusterControl 1.7.5 that is specific to MySQL databases. The MySQL Freeze Frame in ClusterControl 1.7.5 will cover these following things:

Snapshot MySQL status before cluster failure.
Snapshot MySQL process list before cluster failure (coming soon).
Inspect cluster incidents in operational reports or from the s9s command line tool.

These are valuable sets of information that can help trace bugs and fix your MySQL/MariaDB clusters when things go south. In the future, we are planning to include also snapshots of the SHOW ENGINE InnoDB status values as well. So please stay tuned to our future releases.

Note that this feature is still in beta state, we expect to collect more datasets as we work with our users. In this blog, we will show you how to leverage this feature, especially when you need further information when diagnosing your MySQL/MariaDB cluster.

ClusterControl on Handling Cluster Failure

For cluster failures, ClusterControl does nothing unless Auto Recovery (Cluster/Node) is enabled just like below:

Once enabled, ClusterControl will try to recover a node or recover the cluster by bringing up the entire cluster topology.

For MySQL, for example in a master-slave replication, it must have at least one master alive at any given time, regardless of the number of available slave/s. ClusterControl attempts to correct the topology at least once for replication clusters, but provides more retries for multi-master replication like NDB Cluster and Galera Cluster. Node recovery attempts to recover a failing database node, e.g. when the process was killed (abnormal shutdown), or the process suffered an OOM (Out-of-Memory). ClusterControl will connect to the node via SSH and try to bring up MySQL. We have previously blogged about How ClusterControl Performs Automatic Database Recovery and Failover, so please visit that article to learn more about the scheme for ClusterControl auto recovery.

In the previous version of ClusterControl < 1.7.5, those attempted recoveries triggered alarms. But one thing our customers missed was a more complete incident report with state information just before the cluster failure. Until we realized this shortfall and added this feature in ClusterControl 1.7.5. We called it the “MySQL Freeze Frame”. The MySQL Freeze Frame, as of this writing, offers a brief summary of incidents leading to cluster state changes just before the crash. Most importantly, it includes at the end of the report the list of hosts and their MySQL Global Status variables and values.

How Does MySQL Freeze Frame Differs With Auto Recovery?

The MySQL Freeze Frame is not part of the auto recovery of ClusterControl. Whether Auto Recovery is disabled or enabled, the MySQL Freeze Frame will always do its work as long as a cluster or node failure has been detected.

How Does MySQL Freeze Frame Work?

In ClusterControl, there are certain states that we classify as different types of Cluster Status. MySQL Freeze Frame will generate an incident report when these two states are triggered:

CLUSTER_DEGRADED
CLUSTER_FAILURE

In ClusterControl, a CLUSTER_DEGRADED is when you can write to a cluster, but one or more nodes are down. When this happens, ClusterControl will generate the incident report.

For CLUSTER_FAILURE, though its nomenclature explains itself, it is the state where your cluster fails and is no longer able to process reads or writes. Then that is a CLUSTER_FAILURE state. Regardless of whether an auto-recovery process is attempting to fix the problem or whether it’s disabled, ClusterControl will generate the incident report.

How Do You Enable MySQL Freeze Frame?

ClusterControl’s MySQL Freeze Frame is enabled by default and only generates an incident report only when the states CLUSTER_DEGRADED or CLUSTER_FAILURE are triggered or encountered. So there’s no need on the user end to set any ClusterControl configuration setting, ClusterControl will do it for you automagically.

Locating the MySQL Freeze Frame Incident Report

As of this writing, there are 4-ways you can locate the incident report. These can be found by doing the following sections below.

Using the Operational Reports Tab

The Operational Reports from the previous versions are used only to create, schedule, or list the operational reports that have been generated by users. Since version 1.7.5, we included the incident report generated by our MySQL Freeze Frame feature. See the example below:

The checked items or items with Report type == incident_report, are the incident reports generated by MySQL Freeze Frame feature in ClusterControl.

Using Error Reports

By selecting the cluster and generating an error report, i.e. going through this process:

Subscribe to our newsletter

You’ll get two emails every month full of fresh database ops tips and strategic considerations.

Migration and upgrades: achieving near zero-downtime in PostgreSQL

Comparing DevOps tooling approaches: Terraform, Ansible, Chef, Puppet, and DIY scripting

Why Cloud Repatriation Matters Now More Than Ever

Automating Day 2 operations: Scaling, upgrades and maintenance