blog

A Review of MongoDB Backup Options

Akash Kathiriya

Published:

Database backup is nothing but a way to protect or restore data. It is the process of storing the operational state, architecture, and data of your database. It can be very useful in situations of technical outage or disaster. So it is essential to keep the backup of your database and that your database has a good and easy process for backup.

MongoDB provides several tools/techniques to backup your databases easily.

In this article, we will discuss some of the top MongoDB backup and restore workflows.

Generally, there are three most common options to backup your MongoDB server/cluster.

  • Mongodump/Mongorestore
  • MongoDB Cloud Manager
  • Database Snapshots

Apart from these general options, there are other ways to backup your MongoDB. We will discuss all these options as well in this article. Let’s get started.

MongoDump/MongoRestore

If you have a small database (<100GB) and you want to have full control of your backups, then Mongodump and Mongorestore are your best options. These are mongo shell commands which can be used to manually backup your database or collections. Mongodump dumps all the data in Binary JSON(BSON) format to the specified location. Mongorestore can use this BSON files to restore your database.

Backup a Whole Database

$ sudo mongodump --db mydb --out /var/backups/mongo

Output:

2018-08-20T10:11:57.685-0500    writing mydb.users to /var/backups/mongo/mydb/users.bson
2018-08-20T10:11:57.907-0500    writing mydb.users metadata to /var/backups/mongo/mydb/users.metadata.json
2018-08-20T10:11:57.911-0500    done dumping mydb.users (25000 documents)
2018-08-20T10:11:57.911-0500    writing mydb.system.indexes to /var/backups/mongo/mydb/system.indexes.bson

In this command, the most important argument is –db. It specifies the name of the database that you want to backup. If you don’t specify this argument then the Mongodump command will backup all your databases which can be very intensive process.

Backup a Single Collection

$ mongodump -d mydb -o /var/backups/mongo --collection users

This command will backup only users collection in mydb database. If you don’t give this option then, it will backup all the collection in the database by default.

Taking Regular Backups Using Mongodump/Mongorestore

As a standard practice, you should be making regular backups of your MongoDB database. Suppose you want to take a backup every day at 3:03 AM, then in a Linux system you can do this by adding a cron entry in crontab.

$ sudo crontab -e

Add this line in crontab:

3 3 * * * mongodump --out /var/backups/mongo

Restore a Whole Database

For restoring the database, we can use Mongorestore command with –db option. It will read the BSON files created by Mongodump and restore your database.

$ sudo mongorestore --db mydb /var/backups/mongo/mydb

Output

2018-07-20T12:44:30.876-0500    building a list of collections to restore from /var/backups/mongo/mydb/ dir
2018-07-20T12:44:30.908-0500    reading metadata file from /var/backups/mongo/mydb/users.metadata.json
2018-07-20T12:44:30.909-0500    restoring mydb.users from file /var/backups/mongo/mydb/users.bson
2018-07-20T12:45:01.591-0500    restoring indexes for collection mydb.users from metadata
2018-07-20T12:45:01.592-0500    finished restoring mydb.users (25000 documents)
2018-07-20T12:45:01.592-0500    done

Restore a whole collection

To restore just a single collection from db, you can use the following command:

$ mongorestore -d mydb -c users mydb/users.bson

If your collection is backed up in JSON format instead of BSON then you can use the following command:

$ mongoimport --db mydb --collection users --file users.json --jsonArray

Advantages

  • Very simple to use
  • You have full access to your backup
  • You can put your backups at any location like NFS shares, AWS S3 etc.

Disadvantages

  • Every time it will take a full backup of the database, not just the difference.
  • For large databases, it can take hours to backup and restore the database.
  • It’s not point-in-time by default, which means that if your data changes while backing it up then your backup may result in inconsistency. You can use –oplog option to resolve this problem. It will take a snapshot of the database at the end of mongodump process.

MongoDB Ops Manager

Ops Manager is a management application for MongoDB which runs in your data center. It continuously backs up your data and provides point-in-time restore processes for your database. Within this application, there is an agent which connects to your MongoDB instances. It will first perform an initial sync to backup the current state of the database. The agent will keep sending the compressed and encrypted oplog data to Ops Manager so that you can have a continuous backup. Using this data, Ops Manager will create database snapshots. It will create a snapshot of your database every 6 hours and oplog data will be stored for 24 hours. You can configure the snapshot schedule anytime using the Ops Manager.

Advantages

  • It’s point-in-time by default
  • Doesn’t impact the production performance except for initial sync
  • Support for consistent snapshots of sharded clusters
  • Flexibility to exclude non-critical collections

Disadvantages

  • Network latency increases with the snapshot size while restoring the database.

MongoDB Cloud Manager

MongoDB Cloud Manager is cloud-based backup solution which provides point-in-time restore, continuous and online backup solution as a fully managed service. You can simply install the Cloud Manager agent to manage backup and restore of your database. It will store your backup data in MongoDB cloud.

Advantages

  • Very simple to use. Good GUI.
  • Continuous backup of queries and oplog.

Disadvantages

  • No control on backup data. It is stored in MongoDB cloud.
  • Cost depends on the size of the data and the amount of oplog changes.
  • Restore process is slow.

Snapshot Database Files

This is the simplest solution to backup your database. You can copy all the underlying files (content of data/ directory) and place it to any secure location. Before copying all the files, you should stop all the ongoing write operations to a database to ensure the data consistency. You can use db.fsyncLock() command to stop all the write operations.

There are two types of snapshots: one is cloud level snapshots and another is OS level snapshots.

If you are storing database data with a cloud service provider like AWS then you have to take AWS EBS snapshots for backup. In contrast, if you are storing DB files in native OS like Linux then you have to take LVM snapshots. LVM snapshots are not portable to other machines. So cloud bases snapshots are better than OS based snapshots.

Advantages

  • Easy to use.
  • Full control over snapshots. You can move it to any data center.
  • These snapshots are diff snapshots which store only the differences from previous snapshots.
  • No need to download the snapshots for restoring your database. You can just create a new volume from your snapshot.

Disadvantages

  • Using this method, you can only restore your database at breakup points.
  • Maintenance becomes very complex sometimes.
  • To coordinate backups across all the replica sets (in sharded system), you need a special devops team.

MongoDB Consistent Backup tool

MongoDB consistent backup is a tool for performing consistent backups of MongoDB clusters. It can backup a cluster with one or many shards to a single point of the database. It uses Mongodump as a default backup method. Run the following command to take backup using this tool.

$ mongodb-consistent-backup -H localhost -P 27017 -u USERNAME -p PASSWORD -l /var/backups/mongo

All the backups generated by this commands are MongoRestore compatible. You can user mongorestore command with –oplogReplay option to ensure consistency.

$ mongorestore --host localhost --port 27017 -u USERNAME -p PASSWORD --oplogReplay --dir /var/backups/mongo/mydb/dump

Advantages

  • Fully open source
  • Works with sharded cluster
  • Provides an option for remote backup such as Amazon S3
  • Auto-scaling available
  • Very easy to install and run

Disadvantage

  • Not fully mature product
  • Very few remote upload options
  • Doesn’t support data encryption before saving to disk
  • Official code repository lacks proper testing

ClusterControl Backup

ClusterControl is an all in one automated database management system. It lets you monitor, deploy, manage & scale your database clusters with ease. It supports MySQL, MongoDB, PostgreSQL, Percona XtraDB and Galera Cluster. This software automates almost all the database operations like deploying a cluster, adding or removing a node from any cluster, continuous backups, scaling the cluster etc. All these things, you can do from one single GUI provided by the ClusterControl system.

ClusterControl provides a very nice GUI for MongoDB backup management with support for scheduling and creative reports. It gives you two options for backup methods.

  1. Mongodump
  2. Mongodb consistent backup

So users can choose any option according to their needs. This tool assigns a unique ID to all the backups and stores it under this path: ClusterControl > Settings > Backup > BackupID. If the specified node is not live while taking the backup then the tool will automatically find the live node from the cluster and carry on the backup process on that node. This tool also provides an option for scheduling the backups using any of the above backup methods. You can enable/disable any scheduling job by just toggling a button. ClusterControl runs the backup process in background so it won’t affect the other jobs in the queue.

Advantages

  • Easy installation and very simple to use
  • Multiple options for backup methods
  • Backup scheduling is very easy using a simple GUI form
  • Automated backup verification
  • Backup reports with status

Disadvantage

  • Both backup methods internally use mongodump, which has some issues with handling very large databases.

Conclusion

A good backup strategy is a critical part of any database management system. MongoDB offers many options for backups and recovery/restore. Along with a good backup method, it is very important to have multiple replicas of the database. This helps to restore the database without having the downtime of even one second. Sometimes for larger databases, the backup process can be very resource intensive. So your server should be equipped with good CPU, RAM, and more disk space to handle this kind of load. The backup process can increase the load on the server because of these reasons so you should run the backup process during the nights or non-peak hours.

Subscribe below to be notified of fresh posts