Severalnines Blog
The automation and management blog for open source databases

Become a ClusterControl DBA: Safeguarding your Data


In the past four posts of the blog series, we covered deployment of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, and, in the last post, how to make your setup highly available through HAProxy and MaxScale.

So now that you have your databases up and running and highly available, how do you ensure that you have backups of your data?

You can use backups for multiple things: disaster recovery, providing production data to test against in development, or even provisioning a slave node. This last case is already covered by ClusterControl. When you add a new (replica) node to your replication setup, ClusterControl will make a backup/snapshot of the master node and use it to build the replica. After the backup has been extracted and prepared, and the database is up and running, ClusterControl will automatically set up replication.

Creating an instant backup

In essence creating a backup is the same for Galera, MySQL replication, Postgres and MongoDB. You can find the backup section under ClusterControl > Backup and by default it should open the scheduling overview. From here you can also press the “Backup” button to make an instant backup.


As all these various databases have different backup tools, there is obviously some difference in the options you can choose. For instance, with MySQL you get to choose between mysqldump and xtrabackup. If in doubt about which one to choose (for MySQL), check out this blog about the differences and use cases for mysqldump and xtrabackup.

On this very same screen, you can also create a backup schedule that allows you to run the backup at a set interval, for instance, during off-peak hours.


Backing up MySQL and Galera

As mentioned in the previous paragraph, you can make MySQL backups using either mysqldump or xtrabackup. Using mysqldump you can make backups of individual schemas or a selected set of schemas while xtrabackup will always make a full backup of your database.
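
To make the difference between the two methods concrete, here is a minimal Python sketch that builds the command lines for each tool. The helper names (mysqldump_command, xtrabackup_command) and the example schema names and paths are hypothetical. The post does not show how ClusterControl invokes these tools, so treat this as an illustration of a per-schema dump versus a full backup, not as ClusterControl's actual invocation.

```python
from typing import List, Optional, Sequence


def mysqldump_command(schemas: Optional[Sequence[str]] = None) -> List[str]:
    """Build a mysqldump argument list (hypothetical helper).

    With a list of schemas, only those schemas are dumped;
    without one, every schema on the server is dumped.
    """
    cmd = ["mysqldump", "--single-transaction"]
    if schemas:
        cmd += ["--databases", *schemas]
    else:
        cmd.append("--all-databases")
    return cmd


def xtrabackup_command(target_dir: str) -> List[str]:
    """Build an xtrabackup argument list for a full backup (hypothetical helper)."""
    return ["xtrabackup", "--backup", f"--target-dir={target_dir}"]


if __name__ == "__main__":
    # Example schema names and path are made up for illustration.
    print(" ".join(mysqldump_command(["shop", "crm"])))
    print(" ".join(xtrabackup_command("/backups/full")))
```

The argument lists can be handed to subprocess.run as-is, which avoids shell quoting problems with schema names.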

In the Backup Wizard, you can choose which host you want to run the backup on, the location and directory where you want to store the backup files, and which specific schemas to include.


If the node you are backing up is receiving (production) traffic, and you are afraid the extra disk writes will become intrusive, it is advised to send the backups to the ClusterControl host. The backup files are then streamed over the network to the ClusterControl host, so you have to make sure there is enough space available on that host.
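
A quick way to verify the available space before streaming a backup is to compare the free space on the destination with the expected backup size. The sketch below uses Python's standard library. The function name, the 20% safety margin and the example path are assumptions made for illustration and are not part of ClusterControl.

```python
import shutil


def has_room_for_backup(path: str, expected_bytes: int, safety_factor: float = 1.2) -> bool:
    """Return True if the filesystem holding `path` can fit the backup.

    `safety_factor` is an assumed margin (20% by default) on top of
    the expected backup size.
    """
    free = shutil.disk_usage(path).free
    return free >= expected_bytes * safety_factor


if __name__ == "__main__":
    # Hypothetical example: a 50 GiB backup stored in the current directory.
    print(has_room_for_backup(".", 50 * 1024**3))
```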

If you choose xtrabackup as the backup method, extra options become available: desync, compression and xtrabackup parallel threads/gzip. The desync option only applies when the node you back up is part of a Galera cluster.


After scheduling an instant backup, you can keep track of the progress of the backup job under Settings > Cluster Jobs. After it has finished, you should be able to see the backup file in the configured location.


Backing up PostgreSQL

Similar to the instant backups of MySQL, you can run a backup on your Postgres database. With Postgres backups there are fewer options to fill in, as there is only one backup method: pg_dump.


Backing up MongoDB

Similar to PostgreSQL, there is only one backup method: mongodump. In contrast to PostgreSQL, the node that we take the backup from can be desynced in the case of MongoDB.


Scheduling backups

Now that we have played around with creating instant backups, we can extend that by scheduling the backups.
The scheduling is very easy to do: you can select on which days the backup has to be made and at what time it needs to run.
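
The "which days, at what time" model is simple enough to sketch in a few lines of Python. The function below computes when a schedule would fire next. Its name and signature are hypothetical and it only illustrates the idea. It is not how ClusterControl implements its scheduler.

```python
from datetime import datetime, time, timedelta
from typing import Set


def next_backup_run(now: datetime, weekdays: Set[int], run_at: time) -> datetime:
    """Return the first scheduled run strictly after `now`.

    `weekdays` uses datetime.weekday() numbering: Monday=0 ... Sunday=6.
    """
    if not weekdays or not all(0 <= day <= 6 for day in weekdays):
        raise ValueError("weekdays must be a non-empty set of values 0..6")
    # Checking today plus the next 7 days always covers a full week.
    for offset in range(8):
        candidate = datetime.combine(now.date() + timedelta(days=offset), run_at)
        if candidate.weekday() in weekdays and candidate > now:
            return candidate
    raise RuntimeError("no run found within a week")  # not reachable with valid input


if __name__ == "__main__":
    # Back up on Mondays and Thursdays at 03:00, during off-peak hours.
    print(next_backup_run(datetime.now(), {0, 3}, time(3, 0)))
```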

For xtrabackup there is an additional feature: incremental backups. An incremental backup only backs up the data that changed since the last backup. Of course, incremental backups are useless without a full backup as a starting point. Between two full backups, you can have as many incremental backups as you like, but the more incremental backups have to be applied, the longer a restore will take.

Once scheduled, the job(s) should become visible under the “Current Backup Schedule” and you can edit them by double clicking on them. As with the instant backups, these jobs will schedule the creation of a backup and you can keep track of the progress via the Cluster Jobs overview if necessary.

Backup reports

You can find the Backup Reports under ClusterControl > Backup and this will give you a cluster level overview of all backups made. Also from this interface you can directly restore a backup to a host in the master-slave setup or an entire Galera cluster. 


A nice feature of ClusterControl is that it is able to restore a node/cluster using the full+incremental backups, as it keeps track of the last (full) backup made and starts the incremental backup from there. It then groups each full backup together with all incremental backups made until the next full backup. This allows you to restore starting from the full backup and then apply the incremental backups on top of it.
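
The grouping described above can be sketched as follows. This is a minimal Python illustration that assumes the backups are listed in chronological order. The Backup record and the group_restore_chains function are hypothetical names and do not reflect ClusterControl's internal data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    backup_id: int
    kind: str  # "full" or "incremental"


def group_restore_chains(backups: List[Backup]) -> List[List[Backup]]:
    """Group chronologically ordered backups into restore chains.

    Each chain starts with a full backup, followed by every incremental
    backup taken before the next full backup.
    """
    chains: List[List[Backup]] = []
    for backup in backups:
        if backup.kind == "full":
            chains.append([backup])
        elif backup.kind == "incremental":
            # An incremental without a preceding full backup cannot be
            # restored, so it is skipped.
            if chains:
                chains[-1].append(backup)
        else:
            raise ValueError(f"unknown backup kind: {backup.kind!r}")
    return chains


if __name__ == "__main__":
    history = [
        Backup(1, "full"), Backup(2, "incremental"), Backup(3, "incremental"),
        Backup(4, "full"), Backup(5, "incremental"),
    ]
    for chain in group_restore_chains(history):
        print([b.backup_id for b in chain])
```

Restoring a chain means restoring its first element and then applying the remaining elements in order, which is why longer chains take longer to restore.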

Offsite backup in Amazon S3 or Glacier

Since we now have a lot of backups stored on either the database hosts or the ClusterControl host, we also want to ensure they don’t get lost in case we face a total infrastructure outage (e.g. the data center is on fire or flooded). Therefore, ClusterControl allows you to copy your backups offsite to Amazon S3 or Glacier.

To enable offsite backups with Amazon, you need to add your AWS credentials and keypair in the Service Providers dialogue (Settings > Service Providers).


Once this is set up, you can copy your backups offsite:


This process will take some time, as the backup is sent encrypted and the Glacier service is, in contrast to S3, not a fast storage solution.

After copying your backups to Amazon S3 or Glacier, you can get them back easily by selecting the backup in the S3/Glacier tab and clicking on retrieve. You can also remove existing backups from Amazon S3 and Glacier here.

An alternative to Amazon S3 or Glacier would be to send your backups to another data center (if available). You can do this with a sync tool like BitTorrent Sync. We wrote a blog article on how to set up BitTorrent Sync for backups within ClusterControl.

Final thoughts

We showed you how to get your data backed up and how to store it safely offsite. Recovery is always a different thing. ClusterControl can automatically recover your databases from backups that are stored on premises or copied back from S3 or Glacier. Recovering from backups that have been moved to any other offsite storage will involve manual intervention, though.
Obviously there is more to securing your data, especially on the side of securing your connections. We will cover this in the next blog post!