
MongoDB Chain Replication Basics

David Wayne

Published: May 17, 2018
Last Updated: May 4, 2022

What is Chain Replication?

When we talk about replication, we are referring to the process of making redundant copies of data in order to meet design criteria on data availability. Chain replication refers to the linear ordering of MongoDB servers to form a synchronized chain. The chain contains a primary node, followed by secondary servers arranged linearly. As the word chain suggests, the secondary closest to the primary replicates from it, while every other secondary replicates from the secondary that precedes it. This is the main difference between chained replication and normal replication, in which every secondary replicates directly from the primary. Chained replication occurs when a secondary selects its sync source based on ping time and the closest member is another secondary rather than the primary. Although chained replication reduces load on the primary node, it may increase replication lag.
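The chain described above can be modeled as a map from each member to the member it replicates from (its sync source). The JavaScript sketch below walks that map from a secondary back to the primary. The member names and the hard-coded map are hypothetical. In a real replica set, each member's current sync source is reported by rs.status().

```javascript
// Hypothetical sync-source map: each member points to the member it replicates from.
const syncSources = {
  secondaryA: "primary",
  secondaryB: "secondaryA",
  secondaryC: "secondaryB",
};

// Walk from a member back to the primary. Returns the member first, primary last.
function chainToPrimary(member, sources) {
  const path = [member];
  const seen = new Set([member]);
  let current = member;
  while (sources[current] !== undefined) {
    current = sources[current];
    if (seen.has(current)) {
      throw new Error(`Sync-source cycle detected at ${current}`);
    }
    seen.add(current);
    path.push(current);
  }
  return path;
}

// "X <- Y" reads as "X replicates from Y".
console.log(chainToPrimary("secondaryC", syncSources).join(" <- "));
// secondaryC <- secondaryB <- secondaryA <- primary
```

Every extra hop in the path is another point where lag can build up. This is why members at the end of a long chain can fall further behind the primary.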

Why Use Chain Replication?

System infrastructures sometimes suffer unpredictable failures that take a server offline and therefore affect availability. Replication ensures that such failures do not affect availability. It also allows recovery from hardware failure and service interruption. Both chained and unchained replication serve this purpose of keeping data available despite system failures. Having established that replication is important, you may ask why use chained replication in particular. In terms of availability, there is no difference between chained and unchained replication in MongoDB. In both cases, when the primary node fails, the secondaries vote to elect a new primary, and reads and writes continue once the election completes. Chained replication is, however, enabled by default in MongoDB, and it can reduce the replication load on the primary.
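The election mentioned above only succeeds when a strict majority of the replica set's voting members can take part. A minimal JavaScript sketch of that majority rule follows. The function name canElectPrimary is hypothetical. The sketch deliberately ignores member priorities, how up to date each member is, and the other criteria a real election considers.

```javascript
// Majority rule only: a new primary needs votes from a strict majority of voting members.
function canElectPrimary(votingMembers, reachableVoters) {
  if (reachableVoters > votingMembers) {
    throw new RangeError("reachableVoters cannot exceed votingMembers");
  }
  return reachableVoters > Math.floor(votingMembers / 2);
}

// Three-member set that lost its primary: the two remaining members form a majority.
console.log(canElectPrimary(3, 2)); // true
// Same set with two members down: the last member cannot elect itself.
console.log(canElectPrimary(3, 1)); // false
```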

How to Set Up a Chain Replica

By default, chained replication is enabled in MongoDB, so this section mainly covers how to disable it. The main reason to disable chained replication is that it is causing replication lag. In most cases, however, the benefits of chained replication outweigh the added lag, and disabling it is unnecessary. If chained replication has been disabled on your replica set, run the following commands from the mongo shell while connected to the primary to re-enable it:

// Load the current replica set configuration, allow chaining, and apply it
cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)

The change is reversible. If you do need to disable chained replication, set chainingAllowed to false instead:

// Same steps, but disallow chaining so secondaries replicate only from the primary
cfg = rs.config()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
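To show what the shell commands above change, the following JavaScript sketch applies the same edit to a plain configuration object instead of a live server. The helper setChainingAllowed and the trimmed sample configuration (hosts such as db0.example.net) are hypothetical. The shell version edits cfg in place before calling rs.reconfig(). This helper instead returns a modified copy.

```javascript
// Hypothetical helper: return a copy of a replica set config with chaining on or off.
function setChainingAllowed(config, allowed) {
  // Deep copy via JSON; adequate for a plain config object like the sample below.
  const updated = JSON.parse(JSON.stringify(config));
  // Guard against a configuration that has no settings subdocument.
  updated.settings = updated.settings ?? {};
  updated.settings.chainingAllowed = allowed;
  return updated;
}

// Trimmed, hypothetical configuration document.
const cfg = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "db0.example.net:27017" },
    { _id: 1, host: "db1.example.net:27017" },
    { _id: 2, host: "db2.example.net:27017" },
  ],
  settings: { chainingAllowed: true },
};

const disabled = setChainingAllowed(cfg, false);
console.log(disabled.settings.chainingAllowed); // false
console.log(cfg.settings.chainingAllowed);      // true: original is unchanged
```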

Tips & Tricks for Chain Replication

The most significant limitation of chained replication is replication lag. Replication lag is the delay between the time an operation is applied on the primary and the time the same operation is applied on a secondary. Zero lag is impossible in practice, but the goal is to keep it as close to zero as possible. To minimize replication lag, it is prudent to use primary and secondary hosts with the same specifications in terms of CPU, RAM, disk I/O, and network.
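The definition of replication lag above reduces to subtracting two timestamps. This is a minimal JavaScript sketch. It assumes you already have the time of the last operation applied on the primary and on a given secondary. The function name and sample timestamps are hypothetical.

```javascript
// Lag = time of last op applied on the primary - time of last op applied on the secondary.
function replicationLagSeconds(primaryLastOpTime, secondaryLastOpTime) {
  const lagMs = primaryLastOpTime.getTime() - secondaryLastOpTime.getTime();
  // A secondary should not be ahead of the primary; clamp negative values
  // (for example, from samples taken at slightly different moments) to zero.
  return Math.max(0, lagMs / 1000);
}

const primaryOp = new Date("2013-03-05T15:48:19Z");
const secondaryOp = new Date("2013-03-05T15:48:07Z");
console.log(replicationLagSeconds(primaryOp, secondaryOp)); // 12
```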

Chained replication ensures data availability, and it can be combined with journaling for durability. Journaling provides data safety by writing each operation to a journal that is regularly flushed to disk, so a node can recover recent writes after a crash. Together, the two protect a write both across servers (replication) and within each server (journaling).

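Another important tip is to use the w option (write concern) with replication. The w parameter controls how many servers must acknowledge a write before success is returned. When w is set, the getLastError command checks the servers' oplogs and waits until the given number of servers have applied the operation. In newer MongoDB versions, getLastError is deprecated. Instead, the write concern is passed with the write operation itself, for example writeConcern: { w: 2 }.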
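The acknowledgement rule behind w can be sketched in a few lines of JavaScript. This is a simplified model, not MongoDB's implementation. Optimes are plain increasing integers here, whereas real optimes combine a timestamp and an election term. The member names and values are hypothetical.

```javascript
// A write is acknowledged once at least w members (primary included) have applied it,
// i.e. their last applied optime is at or beyond the write's optime.
function isWriteAcknowledged(writeOptime, memberOptimes, w) {
  const applied = Object.values(memberOptimes).filter((t) => t >= writeOptime).length;
  return applied >= w;
}

const optimes = { primary: 105, secondaryA: 105, secondaryB: 98 };
console.log(isWriteAcknowledged(105, optimes, 2)); // true: primary + secondaryA
console.log(isWriteAcknowledged(105, optimes, 3)); // false: secondaryB is behind
```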

Using a monitoring tool like MongoDB Monitoring Service (MMS) or ClusterControl allows you to obtain the status of your replica nodes and visualize changes over time. For instance, in MMS, you can find replica lag graphs of the secondary nodes showing the variation in replication lag time.

Measuring Chain Replication Performance

As discussed above, the most important performance metric for chained replication is replication lag. This section shows how to measure it from the MongoDB shell. The idea is to compare the time of the last operation in the primary's oplog with the time of the last operation each secondary has synced.

To see how far each secondary has synced, run the following command on the primary:

db.printSlaveReplicationInfo()

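The above command lists each secondary (source) along with the time of the last oplog entry it has synced. In MongoDB 4.4.1 and later, this helper is deprecated in favor of db.printSecondaryReplicationInfo(). The results should look like this: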

rs-ds046297:PRIMARY> db.printSlaveReplicationInfo()
source: ds046297-a1.mongolab.com:46297
synced To: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
      = 7475 secs ago (2.08hrs)
source: ds046297-a2.mongolab.com:46297
synced To: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
      = 7475 secs ago (2.08hrs)

Next, we need the time of the most recent operation in the primary's own oplog. Running the following command on the primary reports it:

db.printReplicationInfo()

This command reports the configured oplog size, the time span covered by the oplog (log length), the times of the first and last oplog events, and the current time. The results appear as below:

rs-ds046297:PRIMARY> db.printReplicationInfo()
configured oplog size:   1024MB
log length start to end: 5589 secs (1.55hrs)
oplog first event time:  Tue Mar 05 2013 06:15:19 GMT-0800 (PST)
oplog last event time:   Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
now:                     Tue Mar 05 2013 09:53:07 GMT-0800 (PST)

According to the sync status, both secondaries are synced to Tue Mar 05 2013 07:48:19 GMT-0800 (PST). According to the primary's oplog, its last operation also occurred at Tue Mar 05 2013 07:48:19 GMT-0800 (PST). The replication lag is therefore zero, and the chained replica set is operating correctly. The "7475 secs ago" figure is not lag. It is the time elapsed since the last synced operation, and it is large here only because no writes occurred after 07:48:19. Replication lag may, however, vary depending on the volume of changes that need to be replicated.
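To make this comparison repeatable, the following JavaScript sketch parses the two outputs shown above. It computes each secondary's lag as the primary's "oplog last event time" minus that secondary's "synced To" time. It assumes the exact 2013-era text format reproduced here, and newer shells word this output differently. The names parseShellDate and secondaryLags are hypothetical. For production monitoring, reading member optimes from rs.status() is more robust than parsing printed text.

```javascript
// Month abbreviations as printed in the shell output.
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Parse e.g. "Tue Mar 05 2013 07:48:19 GMT-0800 (PST)" into epoch milliseconds (UTC).
function parseShellDate(text) {
  const m = /(\w{3}) (\d{2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT([+-])(\d{2})(\d{2})/.exec(text);
  if (!m || !(m[1] in MONTHS)) throw new Error(`Unrecognized date: ${text}`);
  const [, mon, day, year, hh, mi, ss, sign, offH, offM] = m;
  const offsetMs = (sign === "-" ? -1 : 1) * (Number(offH) * 60 + Number(offM)) * 60000;
  // Treat the fields as UTC, then subtract the UTC offset to get the true instant.
  return Date.UTC(Number(year), MONTHS[mon], Number(day),
                  Number(hh), Number(mi), Number(ss)) - offsetMs;
}

// Lag per secondary, in seconds: primary's last oplog event minus the secondary's "synced To".
function secondaryLags(slaveInfoText, replInfoText) {
  const lastLine = replInfoText.split("\n").map((l) => l.trim())
    .find((l) => l.startsWith("oplog last event time:"));
  if (!lastLine) throw new Error("No 'oplog last event time' line found");
  const primaryLast = parseShellDate(lastLine);

  const lags = {};
  let source = null;
  for (const line of slaveInfoText.split("\n").map((l) => l.trim())) {
    if (line.startsWith("source:")) {
      source = line.slice("source:".length).trim();
    } else if (line.startsWith("synced To:") && source !== null) {
      lags[source] = (primaryLast - parseShellDate(line)) / 1000;
    }
  }
  return lags;
}

const slaveInfo = `source: ds046297-a1.mongolab.com:46297
synced To: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
      = 7475 secs ago (2.08hrs)
source: ds046297-a2.mongolab.com:46297
synced To: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
      = 7475 secs ago (2.08hrs)`;

const replInfo = `configured oplog size:   1024MB
log length start to end: 5589 secs (1.55hrs)
oplog first event time:  Tue Mar 05 2013 06:15:19 GMT-0800 (PST)
oplog last event time:   Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
now:                     Tue Mar 05 2013 09:53:07 GMT-0800 (PST)`;

// Both secondaries report 0 seconds of lag for this sample.
console.log(secondaryLags(slaveInfo, replInfo));
```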
