How to Prevent Rollbacks in MongoDB

Onyancha Brian Henry

Replication in MongoDB is built on replica sets, which consist of a primary member, one or more secondary members and, at times, a non-data-bearing member called an arbiter. Whenever data is written to the primary node, the changes are recorded in the oplog, from which the secondary members apply the same changes. Read operations can be served by any data-bearing member, which is part of what makes a replica set highly available.

However, in some cases the secondary members may fail to keep up with the primary. If the primary node fails before its latest changes have been applied to the secondaries, the members are left in different data states and must be brought back to the same state before the replica set is consistent again.
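The lag described above can be estimated from the members array that rs.status() returns (its output is shown later in this article). The following is a minimal sketch, not an official tool: the function name replicationLagSeconds and the sample host names are hypothetical, and the sample object only imitates the fields of rs.status() that the function reads.

```javascript
// Sketch: estimate how far each secondary is behind the primary.
// `status` is assumed to have the same shape as the rs.status() output.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    return null; // no primary visible, for example during an election
  }
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      // optimeDate is the time of the last oplog entry applied by the member
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

// Hypothetical sample data with only the fields used above.
const sampleStatus = {
  members: [
    { name: "a.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-07-15T15:25:11Z") },
    { name: "b.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-07-15T15:25:11Z") },
    { name: "c.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-07-15T15:25:05Z") },
  ],
};
const sampleLag = replicationLagSeconds(sampleStatus);
```

In the mongo shell, the same function should accept the live result directly, as in replicationLagSeconds(rs.status()). A secondary whose lag keeps growing is the one most at risk of missing writes if the primary fails.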

What is a Rollback?

A rollback is part of MongoDB's automatic failover handling. It occurs when the primary node in a replica set fails after accepting writes that were not replicated from the oplog to the secondary members in time, so the state of that node has to be reverted to the one before the changes were made.

Rollbacks are therefore necessary only when the primary has accepted write operations that were not replicated to the secondary members before the primary stepped down for some reason, such as a network partition. If the write operations were replicated to a member that remains available and accessible to a majority of the replica set, a rollback will not happen.

The main reason for rollbacks in MongoDB is to keep data consistent across all members. Therefore, when the former primary rejoins the replica set, any of its changes that were not applied to the secondary members are reverted, returning it to the state before the failure.

However, rollbacks should be rare, or avoided altogether, because they can result in significant data loss and consequently disrupt the applications connected to the database.

MongoDB Rollback Process

Let us consider a three-member replica set with A as the primary and B and C as the secondary members. We will populate A with data and, at the same time, trigger a network partition that separates it from B and C. We use MongoDB version 4.2 and Atlas in this test.

First, we get the status of the replica set by running the rs.status() command in the mongo shell:

MongoDB Enterprise Cluster0-shard-0:PRIMARY> rs.status()

Looking at the members attribute, you can see something like this:

"members" : [

    {
        "_id" : 0,
        "name" : "cluster0-shard-00-00-sc27x.mongodb.net:27017",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 1891079,
        "optime" : {
            "ts" : Timestamp(1594826711, 1),
            "t" : NumberLong(27)
        },
        "optimeDurable" : {
            "ts" : Timestamp(1594826711, 1),
            "t" : NumberLong(27)
        },
        "optimeDate" : ISODate("2020-07-15T15:25:11Z"),
        "optimeDurableDate" : ISODate("2020-07-15T15:25:11Z"),
        "lastHeartbeat" : ISODate("2020-07-15T15:25:19.509Z"),
        "lastHeartbeatRecv" : ISODate("2020-07-15T15:25:18.532Z"),
        "pingMs" : NumberLong(0),
        "lastHeartbeatMessage" : "",
        "syncingTo" : "cluster0-shard-00-02-sc27x.mongodb.net:27017",
        "syncSourceHost" : "cluster0-shard-00-02-sc27x.mongodb.net:27017",
        "syncSourceId" : 2,
        "infoMessage" : "",
        "configVersion" : 4
    },
    {
        "_id" : 1,
        "name" : "cluster0-shard-00-01-sc27x.mongodb.net:27017",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 1891055,
        "optime" : {
            "ts" : Timestamp(1594826711, 1),
            "t" : NumberLong(27)
        },
        "optimeDurable" : {
            "ts" : Timestamp(1594826711, 1),
            "t" : NumberLong(27)
        },
        "optimeDate" : ISODate("2020-07-15T15:25:11Z"),
        "optimeDurableDate" : ISODate("2020-07-15T15:25:11Z"),
        "lastHeartbeat" : ISODate("2020-07-15T15:25:17.914Z"),
        "lastHeartbeatRecv" : ISODate("2020-07-15T15:25:19.403Z"),
        "pingMs" : NumberLong(0),
        "lastHeartbeatMessage" : "",
        "syncingTo" : "cluster0-shard-00-02-sc27x.mongodb.net:27017",
        "syncSourceHost" : "cluster0-shard-00-02-sc27x.mongodb.net:27017",
        "syncSourceId" : 2,
        "infoMessage" : "",
        "configVersion" : 4
    },
    {
        "_id" : 2,
        "name" : "cluster0-shard-00-02-sc27x.mongodb.net:27017",
        "health" : 1,
        "state" : 1,
        "stateStr" : "PRIMARY",
        "uptime" : 1891089,
        "optime" : {
            "ts" : Timestamp(1594826711, 1),
            "t" : NumberLong(27)
        },
        "optimeDate" : ISODate("2020-07-15T15:25:11Z"),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "infoMessage" : "",
        "electionTime" : Timestamp(1592935644, 1),
        "electionDate" : ISODate("2020-06-23T18:07:24Z"),
        "configVersion" : 4,
        "self" : true,
        "lastHeartbeatMessage" : ""
    }

],

This shows the status of each member of your replica set. Next, we opened a new terminal for node A and populated it with 20000 records:

MongoDB Enterprise Cluster0-shard-0:PRIMARY> for (var y = 20000; y > 0; y--) {
    db.mytest.insert( { record : y } )
}
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.mytest
2020-07-15T21:28:40.436+2128 I NETWORK  [thread1] trying reconnect to 127.0.0.1:3001 (127.0.0.1) failed
2020-07-15T21:28:41.436+2128 I NETWORK  [thread1] reconnect 127.0.0.1:3001 (127.0.0.1) ok
MongoDB Enterprise Cluster0-shard-0:SECONDARY> rs.slaveOk()
MongoDB Enterprise Cluster0-shard-0:SECONDARY> db.mytest.count()
20000

During the network partition, A is unavailable to B and C, so one of them is elected primary: B in our case. When A rejoins, it is added as a secondary, which you can check using the rs.status() command. However, only some of the records were replicated to member B before the network partition, as seen below (remember that B is now the primary):

MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.mytest.find({}).count()

12480    

This is the number of documents that were replicated to B before A went down.

If we write some data to B and then allow A to rejoin the network, we can see A change state:

connecting to: 127.0.0.1:3001/admin

MongoDB Enterprise Cluster0-shard-0:ROLLBACK> 

MongoDB Enterprise Cluster0-shard-0:RECOVERING> 

MongoDB Enterprise Cluster0-shard-0:SECONDARY> 

MongoDB Enterprise Cluster0-shard-0:SECONDARY> 

MongoDB Enterprise Cluster0-shard-0:PRIMARY>

Secondary members use an oplogFetcher to sync oplog entries from their syncSource. The oplogFetcher issues a find command against the source's oplog, followed by a series of getMore commands on the resulting cursor. When A rejoins as a secondary, the same approach is applied: the query returns documents whose timestamp is greater than or equal to the predicate timestamp, which is that of A's last oplog entry. If the first document returned by B does not match A's last oplog entry, A is forced into a rollback.
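The comparison of the two oplogs can be illustrated with a small simulation. This is only a sketch of the idea and not MongoDB's actual implementation: the function findRollbackOps is hypothetical, oplogs are plain arrays ordered from oldest to newest, and the ts field is simplified to a number rather than a BSON Timestamp. The t field stands for the election term, as in the rs.status() output.

```javascript
// Sketch: find the last entry that the rejoining member (local) shares with
// its sync source, and the local entries written after it.
function findRollbackOps(localOplog, sourceOplog) {
  const key = (e) => `${e.t}:${e.ts}`;
  const onSource = new Set(sourceOplog.map(key));
  // Walk the local oplog backwards until an entry also exists on the source.
  let i = localOplog.length - 1;
  while (i >= 0 && !onSource.has(key(localOplog[i]))) {
    i--;
  }
  return {
    commonPoint: i >= 0 ? localOplog[i] : null,
    toRollBack: localOplog.slice(i + 1), // entries that only the local member has
  };
}

// Hypothetical data: A accepted entries 4 and 5 in term 27, but only entries
// 1 to 3 reached B before the partition. B then wrote 6 and 7 in term 28.
const oplogA = [
  { ts: 1, t: 27 }, { ts: 2, t: 27 }, { ts: 3, t: 27 },
  { ts: 4, t: 27 }, { ts: 5, t: 27 },
];
const oplogB = [
  { ts: 1, t: 27 }, { ts: 2, t: 27 }, { ts: 3, t: 27 },
  { ts: 6, t: 28 }, { ts: 7, t: 28 },
];
const rollbackPlan = findRollbackOps(oplogA, oplogB);
```

With this sample data, the common point is the entry with ts 3, and the two entries that only A holds are the ones that would be reverted. If A's last entry were present on B, the list of entries to roll back would be empty and A could simply continue syncing.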

Recovering Rollback Data in MongoDB

A rollback is not a bad thing in MongoDB in itself, but one should try as much as possible to ensure rollbacks do not happen often. It is an automatic safety measure that ensures data consistency between the members of a replica set. If a rollback does happen, here are some steps to address the situation:

Rollback Data Collection

You need to collect the member data affected by the rollback. This is done by ensuring that rollback files are created, which is controlled by the createRollbackDataFiles parameter (available from MongoDB version 4.0). By default this option is set to true, so rollback files are always created.

The rollback files are placed under the path <dbpath>/rollback/<db>.<collection>. They contain BSON data, which can be converted to a human-readable format using the bsondump tool.

Loading the Rollback Files Data in a Separate Database or Server

mongorestore is the tool used to recover the contents of rollback data files. First copy the rollback files to a new server, then use mongorestore to load the files into that server. The mongorestore command is shown below.

mongorestore -u <> -p <> -h 127.0.0.1 -d <rollbackrestoretestdb> -c <rollbackrestoretestc> <path to the .bson file> --authenticationDatabase=<database of user>

Cleaning Data That is Not Needed and Sifting Through Data

This step requires discretion in choosing which data from the rollback files to keep and which to throw away. It is advisable to import all of the rollback file data first and then decide. This decision point makes it the most difficult step in data recovery.
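One way to make that decision easier is to sort the rolled-back documents by how they relate to the data currently in production. The sketch below assumes both sets of documents have already been read into plain arrays; the function classifyRollbackDocs and the sample documents are hypothetical. Comparing with JSON.stringify is a simplification that is sensitive to field order and does not handle every BSON type.

```javascript
// Sketch: classify rolled-back documents against the current collection.
function classifyRollbackDocs(rolledBackDocs, currentDocs) {
  const currentById = new Map(currentDocs.map((d) => [String(d._id), d]));
  const result = { missing: [], conflicting: [], identical: [] };
  for (const doc of rolledBackDocs) {
    const current = currentById.get(String(doc._id));
    if (current === undefined) {
      result.missing.push(doc); // lost in the rollback, candidate for re-import
    } else if (JSON.stringify(current) === JSON.stringify(doc)) {
      result.identical.push(doc); // already present, safe to discard
    } else {
      result.conflicting.push(doc); // changed on both sides, needs manual review
    }
  }
  return result;
}

// Hypothetical sample data.
const rolledBack = [
  { _id: 1, record: 1 },
  { _id: 2, record: 2 },
  { _id: 3, record: 3 },
];
const inProduction = [
  { _id: 1, record: 1 },
  { _id: 2, record: 99 },
];
const triage = classifyRollbackDocs(rolledBack, inProduction);
```

Documents in the missing group are the usual candidates for re-import, while the conflicting group is where human judgement is needed.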

Using the Primary as a Cluster to Import Data 

Start the final step by exporting the cleansed data with mongodump, then re-import it into the original production cluster with mongorestore.

Preventing MongoDB Rollbacks

To prevent rollbacks of data from happening when using MongoDB, one can do the following.

Running All Voting Members ‘MAJORITY’

This can be done by using the w: majority write concern, which requests acknowledgement that a write operation has propagated to a majority of the voting mongod instances. A write concern is specified with the w option followed by a <value>, which can be a number of instances, majority, or the name of a custom tag set. To prevent rollbacks of acknowledged data, run all voting members with journaling enabled and use the w: majority write concern. This ensures that a write operation has propagated to a majority of the replica set nodes before the client receives the acknowledgement, so a write that has been acknowledged will not be rolled back.
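As an illustration, the write concern can be passed with each write. The helper below is a minimal sketch: insertWithMajority and the 5000 ms timeout are hypothetical choices, while insertOne with a writeConcern option of w, j and wtimeout is the standard way to request this acknowledgement. A stub collection stands in for a real one so the snippet runs without a database.

```javascript
// Sketch: insert a document and wait until a majority of voting members
// have acknowledged the write and recorded it in their journals.
function insertWithMajority(collection, doc, timeoutMs = 5000) {
  return collection.insertOne(doc, {
    writeConcern: { w: "majority", j: true, wtimeout: timeoutMs },
  });
}

// Stand-in for a real collection; it only records what it was asked to do.
const recordedCalls = [];
const stubCollection = {
  insertOne(doc, options) {
    recordedCalls.push({ doc, options });
    return { acknowledged: true };
  },
};
insertWithMajority(stubCollection, { record: 1 });
```

In the mongo shell, the same helper would be called with a real collection, for example insertWithMajority(db.mytest, { record: 1 }). The wtimeout value keeps the client from blocking indefinitely if a majority cannot be reached. A timeout does not undo the write on the primary, so such a write is still at risk of being rolled back.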

User Operations  

Starting in version 4.2, MongoDB kills all in-progress user operations when a member enters the ROLLBACK state.

Index Builds 

With feature compatibility version (fcv) "4.2", MongoDB 4.2 waits for all in-progress index builds to finish before a rollback takes place. With fcv "4.0", it waits only for in-progress background index builds to finish.

Size and Limitations

Starting in version 4.0, MongoDB has no limit on the amount of data that can be rolled back. In earlier versions, a mongod instance would not roll back more than 300 megabytes of data.

Conclusion 

Rollbacks are a common phenomenon for those who use MongoDB without knowing how to prevent them. They are preventable if one follows the safe practices described above. All in all, it is advisable to upgrade to the newest version of MongoDB to avoid some preventable hiccups.
