MongoDB Backup Management Tips for Sharded Clusters

Agus Syafaat

Taking proper database backups is a critical task. Besides setting up a highly available architecture for your MongoDB services, you also need backups to ensure your data remains available in case of disaster. For example, if you accidentally delete data from a production database, the only way to recover it from the database point of view is to restore from a backup.

Recently, ClusterControl started to support a new backup method, called Percona Backup for MongoDB, developed by Percona. It can run consistent backups for MongoDB Replica Sets and Sharded Clusters.

In this blog, we will have a look at backup management for MongoDB Replica Sets and Sharded Clusters.

MongoDB Backup in Highly Available Architecture

ClusterControl supports three backup methods: mongodump, mongodb consistent, and Percona Backup for MongoDB. The mongodb consistent method uses the mongodump utility to take the backup, which can be restored using mongorestore.

The most recently supported method is Percona Backup for MongoDB, which provides consistent and point in time backups of Replica Sets and Sharded Clusters. It requires an agent running on every node: every replica set node or, for sharded clusters, every shard node and management node.

Configuring and scheduling a consistent backup using Percona Backup for MongoDB in ClusterControl is straightforward. Go to the Backup page, and then configure Percona Backup for MongoDB. The prerequisite is to have the Percona Backup for MongoDB agent running on each node, which can also be installed from ClusterControl.

We need to install the Percona Backup for MongoDB agent first before we can use Schedule Backup.

Then configure the backup directory. Please note that the backup directory has to be a shared disk mounted on all nodes with exactly the same mount path.

If you do not have a shared disk ready in the system, you can use NFS to accomplish this. To configure the NFS server, we need a dedicated server or virtual machine with enough free space to store the backups. Install the nfs-utils and nfs-utils-lib packages on the server as below (assuming a CentOS-based system):

# yum install nfs-utils nfs-utils-lib

# yum install portmap

Then start the portmap and nfs services. On newer CentOS releases, rpcbind replaces portmap, so the package and service names may differ.

# /etc/init.d/portmap start

# /etc/init.d/nfs start

After that, add a new entry in /etc/exports as shown below:

# vi /etc/exports

/backup 10.10.10.11(rw,sync,no_root_squash)

On each database node, we just need to mount the exported directory as shared storage.
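The client-side mount could look roughly like the sketch below. It assumes the NFS server is reachable at 10.10.10.10 (a hypothetical address; the article only gives the client address 10.10.10.11 in /etc/exports) and that /backup is used as the mount point on every node. The run helper is illustrative: it only prints each command unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: mount the NFS export on a database node.
# NFS_SERVER is a hypothetical address; replace it with your NFS server.
NFS_SERVER="${NFS_SERVER:-10.10.10.10}"
EXPORT_PATH="${EXPORT_PATH:-/backup}"   # directory exported in /etc/exports
MOUNT_POINT="${MOUNT_POINT:-/backup}"   # must be identical on every node

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

mount_backup_share() {
  run mkdir -p "$MOUNT_POINT" &&
  run mount -t nfs "${NFS_SERVER}:${EXPORT_PATH}" "$MOUNT_POINT"
}

mount_backup_share
```

Run it once as a dry run to review the commands, then again as root with APPLY=1 on each database node. To keep the mount across reboots, an entry in /etc/fstab would also be needed.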

Finally, click the Install button; this triggers a new job that configures the agent on each node.

After the PBM agent is installed on all nodes, we can configure the backup method for the cluster.

Physical vs Logical Backup

MongoDB supports both logical and physical backups. Logical backup uses the mongodump utility, which is included when you install the MongoDB package. Mongodump needs access to your MongoDB database, so it requires credentials for a user with the backup role, which grants the find action needed to back up the database.

Mongodump produces dumps in BSON format. It connects to your database with the credentials provided, reads all the data, and writes it to files. Because all the data is read through the database, the backup takes longer, especially for large databases. Mongodump does not maintain the atomicity of transactions across shards, which is why it cannot be used as a backup strategy for sharded clusters on MongoDB 4.2 and above. Percona Backup for MongoDB is also a logical backup, but it supports consistent backups of clusters.
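As an illustration of a logical backup, here is a minimal sketch of a mongodump invocation. The host reuses the database node address from the NFS example; the user name backupuser and the output directory are hypothetical, and the user is assumed to hold the built-in backup role. The run helper only prints the command unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: logical backup with mongodump.
# User name and output directory are hypothetical placeholders.
MONGO_HOST="${MONGO_HOST:-10.10.10.11}"
MONGO_PORT="${MONGO_PORT:-27017}"
BACKUP_USER="${BACKUP_USER:-backupuser}"   # assumed to hold the backup role
OUT_DIR="${OUT_DIR:-/backup/dump}"

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

dump_all() {
  # --gzip compresses the BSON output files. No --password is passed,
  # so mongodump prompts for it instead of exposing it in the process list.
  run mongodump --host "$MONGO_HOST" --port "$MONGO_PORT" \
      --username "$BACKUP_USER" --authenticationDatabase admin \
      --gzip --out "$OUT_DIR"
}

dump_all
```

A dump taken this way can be restored with mongorestore, passing --gzip again so the compressed files are read correctly.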

Physical backup in MongoDB works through snapshots of the file system holding the MongoDB data: the underlying data files are copied to another location as a base backup of your database. File system snapshots are taken at the operating system level if you use LVM (Logical Volume Manager) to manage your disk layout and devices, or by backup software or storage appliances, e.g. Veritas or NetApp. You must enable journaling, MongoDB's log of change activity, before running the file system snapshot so that the backup is consistent.
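For the LVM case, the snapshot workflow might be sketched as follows. The volume group vg0, the logical volume mongodb, the snapshot name, the snapshot size, and all paths are hypothetical and need to match your own disk layout. The sketch assumes the data files and the journal live on the same logical volume. The run helper only prints each command unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: LVM snapshot of the MongoDB data volume, archived to the backup share.
# Volume group, volume, snapshot size, and paths are hypothetical placeholders.
VG="${VG:-vg0}"
LV="${LV:-mongodb}"
SNAP="${SNAP:-mdb-snap01}"
SNAP_SIZE="${SNAP_SIZE:-1G}"   # space for writes made while the snapshot exists
SNAP_MOUNT="${SNAP_MOUNT:-/mnt/mdb-snap01}"
ARCHIVE="${ARCHIVE:-/backup/mdb-snap01.tar.gz}"

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

snapshot_backup() {
  run lvcreate --size "$SNAP_SIZE" --snapshot --name "$SNAP" "/dev/$VG/$LV" &&
  run mkdir -p "$SNAP_MOUNT" &&
  # On XFS, mounting a snapshot next to its origin may also need -o nouuid.
  run mount -o ro "/dev/$VG/$SNAP" "$SNAP_MOUNT" &&
  run tar -czf "$ARCHIVE" -C "$SNAP_MOUNT" . &&
  run umount "$SNAP_MOUNT" &&
  run lvremove -f "/dev/$VG/$SNAP"
}

snapshot_backup
```

Archiving the snapshot and then removing it keeps the snapshot short-lived, which matters because an LVM snapshot that runs out of its allocated space becomes unusable.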

Besides file system snapshots, you can also use the cp or rsync command to copy the MongoDB data files, but you need to stop writes to MongoDB first, because copying data files is not an atomic operation. Such a backup cannot be used for Point in Time Recovery in Replica Set or Sharded Cluster architectures.

Percona Backup for MongoDB consists of two components: pbm-agent, which needs to be installed on each node, and pbm, the command line interface used to interact with the agents and run backups. The pbm-agent processes coordinate between the database nodes and run the backup and restore processes. They also decide which node is best for taking the backup.
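When driving PBM by hand rather than through ClusterControl, the pbm CLI workflow might look like the sketch below. The connection URI, its credentials, and the file name pbm_config.yaml are hypothetical placeholders; the storage path is the shared /backup directory described earlier. The run helper only prints each command unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: point PBM at the shared backup directory and start a backup.
# The connection URI and its credentials are hypothetical placeholders.
export PBM_MONGODB_URI="${PBM_MONGODB_URI:-mongodb://pbmuser:secret@10.10.10.11:27017/?authSource=admin}"
BACKUP_DIR="${BACKUP_DIR:-/backup}"   # shared directory mounted on all nodes

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Emit a PBM storage configuration that uses the shared filesystem.
pbm_storage_config() {
  cat <<EOF
storage:
  type: filesystem
  filesystem:
    path: ${BACKUP_DIR}
EOF
}

pbm_backup_plan() {
  run pbm config --file pbm_config.yaml &&   # upload the storage settings
  run pbm backup &&                          # agents pick a node and take the backup
  run pbm list                               # show the available backups
}

pbm_storage_config
pbm_backup_plan
```

To use it, save the configuration first with pbm_storage_config > pbm_config.yaml, then run the plan with APPLY=1 on a host where the pbm CLI is installed.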

PITR Backup

In many database systems, it is common to use checkpoints to flush data to disk. MongoDB uses WiredTiger as its default storage engine, which also uses checkpoints to provide a consistent view of the data. In addition, MongoDB can recover from the last checkpoint. Journaling works between checkpoints: it is required to recover from unexpected outages that happen at any time between them. Journaling guarantees that write operations are logged to disk; MongoDB creates a journal entry for every change, including the bytes that changed and the disk location.

Mongodump and mongorestore can be used for point in time recovery by leveraging the oplog. The oplog is a capped collection in MongoDB that tracks all changes to collections for every write operation (e.g. insert, update, delete). So, if you want to do point in time recovery, you need to restore from the last full backup and then use the oplog file to apply the changes up to the exact time you want to recover. Another tool that can be used is Percona Backup for MongoDB; the process is similar to mongodump: restore from the backup and then apply the oplog.
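The restore-then-replay idea can be sketched with both tools. The host, the dump directory, and the target time are hypothetical. Note that the oplog.bson written by mongodump --oplog only covers writes made while the dump was running, so recovering to a later point needs additional oplog entries from elsewhere. The run helper only prints each command unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: point in time recovery with mongodump/mongorestore and with PBM.
# Host, paths, and the target time are hypothetical placeholders.
MONGO_HOST="${MONGO_HOST:-10.10.10.11}"
DUMP_DIR="${DUMP_DIR:-/backup/dump}"
TARGET_EPOCH="${TARGET_EPOCH:-1609459200}"          # 2021-01-01 00:00:00 UTC, in seconds
TARGET_TIME="${TARGET_TIME:-2021-01-01T00:00:00}"   # the same moment, as passed to pbm

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Full dump that also records oplog entries written while the dump runs.
# --oplog works against a replica set member, not through mongos.
dump_with_oplog() {
  run mongodump --host "$MONGO_HOST" --oplog --out "$DUMP_DIR"
}

# Restore the dump, then replay oplog entries older than the target time.
restore_to_point() {
  run mongorestore --host "$MONGO_HOST" --oplogReplay \
      --oplogLimit "${TARGET_EPOCH}:0" "$DUMP_DIR"
}

# PBM variant: restore to the target time from a backup plus saved oplog.
pbm_restore_to_point() {
  run pbm restore --time "$TARGET_TIME"
}

dump_with_oplog
restore_to_point
pbm_restore_to_point
```

For the PBM variant to work, point in time recovery has to be enabled beforehand (pbm config --set pitr.enabled=true) so that the agents keep saving oplog slices after each backup.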

Conclusion

Taking a consistent backup is important, especially in clustered MongoDB setups (replica set or sharded cluster). ClusterControl provides an easy way to configure the Percona Backup for MongoDB in your cluster and schedule your backups.
