Maintaining MongoDB Replica Sets in the Cloud Using Ansible
Replication is widely used in database systems to ensure high availability of data through redundancy. The strategy is to keep copies of the same data on several running servers, possibly on different machines, so that if the main server fails, another one can take over and continue serving requests.
A replica set is a group of MongoDB instances that maintain the same data set, and it is the basis of production deployments. The advantage of replication is that the data remains available from another server if the main server fails. It can also improve read throughput, because a client can send read requests to different servers and get a response from the nearest one.
A replica set consists of several data-bearing nodes, which can be hosted on different machines, and optionally an arbiter node. One of the data-bearing nodes is the primary and the others are secondaries. The primary receives all write operations and records the changes in its oplog; the secondaries then replicate those changes and apply them to their own copies of the data.
An arbiter is an additional instance that does not maintain a data set but helps provide a quorum by responding to heartbeat and election requests from the other replica set members. An arbiter is therefore cheaper to run than a fully functional replica set member with a data set.
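For a mongod instance to take part in a replica set, it has to be started with the name of that set. A minimal sketch of the relevant part of a mongod configuration file is shown below. The set name replica0, the host name mongodb1 and the port are assumptions chosen to match the examples later in this post, not required values.

```yaml
# mongod.conf (fragment) for one data-bearing member.
# replica0, mongodb1 and 27017 are illustrative values.
net:
  port: 27017
  # Listen on localhost and on the name other members use to reach this node
  bindIp: localhost,mongodb1
replication:
  # Every member of the same replica set must use the same name
  replSetName: replica0
```

An arbiter is started with the same replSetName. What makes it an arbiter is the way it is added to the set, not a separate option in this file.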
Automatic Failover
A primary node may fail for reasons such as a power outage or a network disconnection, leaving it unable to communicate with the other members. If communication is cut off for longer than the configured electionTimeoutMillis period, one of the secondaries calls for an election to nominate itself as the new primary. Once the election completes successfully, the cluster resumes normal operation. No write operations can be carried out until then. Read queries, however, can be configured to continue on the secondaries while the primary is offline.
With the default replication configuration settings, the median time before a cluster elects a new primary should not typically exceed 12 seconds. Factors such as network latency can extend this time, so take the cluster's architecture into account to keep failover from taking too long.
The value of electionTimeoutMillis can be lowered from the default of 10000 (10 seconds) so that a primary failure is detected faster. However, a low value may trigger elections frequently for minor causes such as temporary network latency, even though the primary node is healthy. This can lead to problems such as rollbacks of write operations.
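To see which member is the primary before and after a failover, each node can be asked directly. The sketch below does this as Ansible tasks (Ansible is introduced in the next section). It assumes that the mongosh shell is installed on every target host and that the local mongod listens on port 27017; the variable name primary_check is made up for this example.

```yaml
# Sketch only: assumes mongosh is installed and mongod listens on 27017.
- name: Ask the local mongod whether it is currently the primary
  command:
    argv:
      - mongosh
      - --quiet
      - --port
      - "27017"
      - --eval
      - "db.hello().isWritablePrimary"
  register: primary_check   # hypothetical variable name
  changed_when: false       # a read-only check never changes the host

- name: Report the answer for each host
  debug:
    msg: "{{ inventory_hostname }} reports primary: {{ primary_check.stdout | trim }}"
```

Running these tasks again after stopping the primary should show a different host answering as primary once the election has finished.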
Ansible for Replica Sets
As mentioned, a replica set may have members on different host machines, which makes the cluster more complex to maintain. We need a single platform from which the replica set can be maintained with ease. Ansible is one of the tools that makes it easier to configure and manage a replica set from one place. If you are new to Ansible, take some time to review the basics first, such as how to create a playbook.
Configuration Parameters
- arbiter_at_index: defines the position of the arbiter in the replica set members list. Remember that an arbiter holds no data, unlike the other members, and cannot become the primary. It exists only to help form a quorum during an election. For example, if you have an even number of data-bearing members, adding an arbiter provides the extra vote that breaks a tie. Takes an integer value.
- chaining_allowed: takes a boolean value. If chaining_allowed = true, secondary members may replicate from other secondary members. If chaining_allowed = false, secondary members can replicate only from the primary. The default value is true.
- election_timeout_secs: takes an integer value; the default is 10. It is the time limit in seconds for detecting that the primary node is unreachable, that is, not communicating with the other members, after which an election is triggered. Note that this parameter is given in seconds, while the underlying MongoDB setting, electionTimeoutMillis, is in milliseconds and defaults to 10000. With the default, the median time to elect a new primary should not typically exceed 12 seconds. If the value is set too high, it takes longer to detect a primary failure and therefore longer to hold an election. Since write operations are not accepted in the meantime, the application may fail a lot of writes during that period. If the value is set too low, elections are triggered frequently even when the primary is healthy and still reachable. As a result, you may see many rollbacks of write operations, which can eventually lead to poor data integrity or inconsistency.
- heartbeat_timeout_secs: replica set members communicate with each other by sending a signal referred to as a heartbeat. heartbeat_timeout_secs is the number of seconds that members wait for a successful heartbeat from each other; the default is 10 seconds. A member that does not respond in time is marked as inaccessible. This setting applies only to protocol version 0. Takes an integer value.
- login_host: the host running the database to log in to. The default is localhost.
- login_database: the database where the login credentials are stored. The default is admin. (takes a string value)
- login_user: the username to authenticate with. (takes a string value)
- login_password: the password to authenticate the user with. (takes a string value)
- login_port: the MongoDB port on the host to log in to. (takes an integer value)
- members: defines the list of replica set members. It is either a comma-separated string or a YAML list, e.g. mongodb0:27017,mongodb2:27018,mongodb3:27019… If a port number is omitted, 27017 is assumed.
- protocol_version: takes an integer that defines the version of the replication protocol, either 0 or 1.
- replica_set: this is a string value that defines the name of the replica set.
- ssl: boolean value that defines whether to use an SSL connection when connecting to the database.
- ssl_cert_reqs: specifies whether a certificate is required from the other side of the connection and whether it is validated if provided. The choices are CERT_NONE, CERT_OPTIONAL and CERT_REQUIRED. The default is CERT_REQUIRED.
- validate: takes a boolean value that defines whether to do any basic validation on the provided replica set config. The default value is true.
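Several of the parameters above can be combined in a single task. The following is a minimal sketch rather than a recommended configuration. The host names and ports are illustrative, mongodb_admin_password is a hypothetical variable (for example one loaded from an Ansible Vault file), and election_timeout_secs is given in seconds, as the parameter name suggests.

```yaml
- name: Ensure replicaset replica0 exists with explicit settings
  mongodb_replicaset:
    login_host: localhost
    login_port: 27017
    login_user: admin
    # Hypothetical variable, e.g. loaded from an Ansible Vault file
    login_password: "{{ mongodb_admin_password }}"
    login_database: admin
    replica_set: replica0
    protocol_version: 1
    # Secondaries may sync only from the primary
    chaining_allowed: no
    # Seconds before an unreachable primary triggers an election
    election_timeout_secs: 10
    members:
      - mongodb1:27017
      - mongodb2:27018
      - mongodb3:27019
    # The third entry in the members list (index 2) becomes the arbiter
    arbiter_at_index: 2
    validate: yes
```

Keeping the password in a variable instead of writing it into the task avoids committing credentials together with the playbook.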
Creating a MongoDB Replica Set Using Ansible
Here is a simple example of tasks for setting up a replica set in Ansible. Let's call this file tasks.yaml:
# Create a replicaset called 'replica0' with the 3 provided members
- name: Ensure replicaset replica0 exists
  mongodb_replicaset:
    login_host: localhost
    login_user: admin
    login_password: root
    replica_set: replica0
    # The third member in the list (index 2) is the arbiter
    arbiter_at_index: 2
    # This parameter is given in seconds, not milliseconds
    election_timeout_secs: 12
    members:
      - mongodb1:27017
      - mongodb2:27018
      - mongodb3:27019
  # Run only on the first host of the 'mongod' inventory group
  when: groups.mongod.index(inventory_hostname) == 0
# Create two single-node replicasets on the localhost for testing
- name: Ensure replicaset replica0 exists
  mongodb_replicaset:
    login_host: localhost
    login_port: 3001
    login_user: admin
    login_password: root
    login_database: admin
    replica_set: replica0
    # The only member is the instance we log in to
    members: localhost:3001
    # Skip validation: a one-member set is likely to fail the basic member-count check
    validate: no

- name: Ensure replicaset replica1 exists
  mongodb_replicaset:
    login_host: localhost
    login_port: 3002
    login_user: admin
    login_password: secret
    login_database: admin
    replica_set: replica1
    members: localhost:3002
    validate: no
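The when condition in the first task looks up a group called mongod in the inventory. A minimal sketch of a YAML inventory that defines such a group is shown below. The file name inventory.yaml and the host names are assumptions for illustration; an existing inventory in another format works just as well as long as it contains a mongod group.

```yaml
# inventory.yaml (hypothetical file name and host names)
all:
  children:
    mongod:
      hosts:
        mongodb1:
        mongodb2:
        mongodb3:
```

With this inventory, groups.mongod.index(inventory_hostname) == 0 should hold only on mongodb1, so the replica set is initiated from a single host instead of once per member. The play that runs the tasks has to target hosts that belong to this group.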
In our playbook we can call the tasks like this:
---
- hosts: ansible-test
  remote_user: root
  become: yes
  tasks:
    - include_tasks: tasks.yaml
Run the playbook with ansible-playbook -i inventory.txt -c ssh mongodbcreateReplcaSet.yaml and Ansible reports whether or not the replica set has been created. If the task reports success and returns the key mongodb_replicaset describing the replica set that has been created, then you are good to go.
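To look at what the module actually returned, the result of the task can be stored and printed. This is a small sketch that reuses the first task from tasks.yaml; the variable name rs_result is made up for the example.

```yaml
- name: Ensure replicaset replica0 exists
  mongodb_replicaset:
    login_host: localhost
    login_user: admin
    login_password: root
    replica_set: replica0
    members:
      - mongodb1:27017
      - mongodb2:27018
      - mongodb3:27019
  register: rs_result   # hypothetical variable name

- name: Print everything the module returned
  debug:
    var: rs_result
```

The exact keys in the output can differ between module versions, so printing the whole variable is safer than relying on one specific field.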
Conclusion
Configuring a MongoDB replica set by hand is tedious when the mongod instances are hosted on different machines. Ansible provides a simple platform for doing the same by defining just a few parameters, as discussed above. Replication is one of the mechanisms that keeps an application running continuously, so a production replica set should be configured carefully and with multiple members. An arbiter helps form a quorum during the election process; if you use one, include it in the configuration by defining its position in the members list.