
The Basics of Deploying a MongoDB Replica Set and Shards Using Puppet

Onyancha Brian Henry

Database systems perform best when they are built around well-defined approaches that support both read and write throughput. MongoDB went the extra mile by embracing replication and sharding to enable both horizontal and vertical scaling, whereas relational DBMSs have traditionally relied mainly on vertical scaling.

Sharding distributes load among the members of the database cluster so that read operations are carried out with little latency. Without sharding, a single database server holding a large data set and handling high-throughput operations can be pushed past its capacity, and it may fail if the necessary measures are not taken. For example, if the query rate is very high, the server's CPU will be overwhelmed.

Replication, on the other hand, is a concept whereby different database servers house the same data. It ensures high availability of data besides enhancing data integrity. Take the example of a high-performing social media application: if the main serving database system fails, as in the case of a power blackout, we should have another system serving the same data. A good replica set should have at least three members, an odd number of voting members (an arbiter can be added to achieve this) and a well-tuned electionTimeoutMillis. In replication, we have a primary node where all the write operations are made and then recorded in an oplog. From the oplog, the changes are then applied to the other members, which are referred to as secondary nodes. If the primary node does not communicate within a set time, electionTimeoutMillis, the other nodes are signaled to hold an election. The electionTimeoutMillis should be set neither too high nor too low. If it is too high, the system will be without a primary for a long time and a lot of data may be lost. If it is too low, elections will be triggered frequently, even by temporary network latency, which can lead to data inconsistency. An arbiter does not carry any data like the other members; it only adds a vote so that a member can win the election and become primary when there would otherwise be a draw.

Why Use Puppet to Deploy a MongoDB Replica Set

More often than not, sharding is used hand in hand with replication. Configuring and maintaining a replica set by hand is not easy because of:

  1. High chances of human error
  2. Inability to carry out repetitive tasks automatically
  3. The time consumed, especially when a large number of members is involved
  4. Possibility of job dissatisfaction
  5. Overwhelming complexity that may emerge

To overcome these setbacks, we settle on an automated system like Puppet, which has plenty of resources to help us work with ease.

In our previous blog, we learnt how to install and configure MongoDB with Puppet. It is important to understand the basic Puppet resources, since we will be using them to configure our replica set and shards. In case you missed it, this is the manifest file for installing and running MongoDB on the machine you created:

  package { 'mongodb':
    ensure => 'installed',
  }

  service { 'mongodb':
    ensure => 'running',
    enable => true,
  }

We can put the content above in a file called runMongoDB.pp and run it with the command:

$ sudo puppet apply runMongoDB.pp

Using the 'mongodb' module and its functions, we can set up our replica set with the corresponding parameters for each mongodb resource.

MongoDB Connection

We need to establish a mongodb connection between a node and the mongodb server. The main aim is to prevent configuration changes from being applied if the mongodb server cannot be reached, but the check can also be used for other purposes such as database monitoring. For this we use the mongodb_conn_validator resource:

  mongodb_conn_validator { 'mongodb_validator':
    ensure  => present,
    server  => '127.0.0.1',
    port    => 27017,
    timeout => 40,
  }

name: in this case the name mongodb_validator defines the identity of the resource. It can also be used as the connection string to test.

server: a string or an array of strings containing the DNS names or IP addresses of the server where mongodb should be running.

port: the port on which the mongodb server should be listening.

timeout: the maximum number of seconds the validator should wait before deciding that mongodb is not running.

tcp_port: this is the provider for the resource rather than a parameter. It validates the mongodb connection by attempting an https connection to the mongodb server, using the Puppet SSL certificate setup from the local Puppet environment for authentication.
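To make the "precondition" idea concrete, here is a minimal sketch of how the validator could gate another resource. It assumes the validator accepts a port parameter and uses Puppet's standard require metaparameter; the database name is the placeholder used in the next section.

```puppet
# Sketch only: the `port` parameter is an assumption about the
# mongodb module; check it against the version you have installed.
mongodb_conn_validator { 'mongodb_validator':
  ensure  => present,
  server  => '127.0.0.1',
  port    => 27017,
  timeout => 40,
}

# The database is only managed after the validator succeeds.
# If the validator fails, Puppet skips this dependent resource.
mongodb_database { 'databaseName':
  ensure  => present,
  tries   => 10,
  require => Mongodb_conn_validator['mongodb_validator'],
}
```

With this ordering, a server that is down produces one failed validator and a skipped database resource, rather than a half-applied configuration.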

Creating the Database

  mongodb_database { 'databaseName':
    ensure => present,
    tries  => 10,
  }

This resource takes three parameters:

ensure: whether the database should be present or absent.

name: in this case the title databaseName defines the name of the database we are creating. It could also have been declared explicitly as name => 'databaseName'.

tries: the maximum number of two-second tries to wait for MongoDB to start up.

Creating MongoDB User

The mongodb_user resource type enables one to create and manage users for a given database from Puppet.

  mongodb_user { 'userprod':
    ensure        => present,
    username      => 'prodUser',
    password_hash => mongodb_password('prodUser', 'passProdser'),
    database      => 'prodUser',
    roles         => ['readWrite', 'dbAdmin'],
    tries         => 10,
  }

Properties

username: defines the name of the user.

password_hash: the password hash of the user. The function mongodb_password(), available on MongoDB 3.0 and later, is used to create the hash.

roles: the roles granted to the user on the target database.

password: the user's password in plain text.

database: defines the user's target database.
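Since a user belongs to a database, it can help to state that ordering explicitly. The sketch below reuses the values from the listing above and adds Puppet's standard require metaparameter; declaring the prodUser database alongside the user is an assumption made for illustration.

```puppet
# Sketch only: the database declaration is added for illustration.
mongodb_database { 'prodUser':
  ensure => present,
  tries  => 10,
}

mongodb_user { 'userprod':
  ensure        => present,
  username      => 'prodUser',
  password_hash => mongodb_password('prodUser', 'passProdser'),
  database      => 'prodUser',
  roles         => ['readWrite', 'dbAdmin'],
  tries         => 10,
  # Manage the user only after its target database exists.
  require       => Mongodb_database['prodUser'],
}
```

In a real deployment the plain-text password passed to mongodb_password() would normally come from a secured data source rather than sit in the manifest.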

Creating a Replica Set

We use the mongodb_replset resource type to create a replica set.

  mongodb_replset { 'replicaset1':
    ensure          => present,
    arbiter         => 'host0:27017',
    members         => ['host0:27017', 'host1:27017', 'host2:27017', 'host3:27017'],
    initialize_host => 'host1:27017',
  }

name: defines the name of the replica set.

members: an array of the members the replica set will hold.

initialize_host: host to be used in initialization of the replica set

arbiter: defines the replica set member that will be used as an arbiter.
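The replica set can also be described at the server-class level instead of as a standalone resource. The following is a sketch under the assumption that the mongodb module provides a mongodb::server class with replset and replset_members parameters; treat both names as something to verify against your installed module.

```puppet
# Sketch only: `mongodb::server`, `replset` and `replset_members`
# are assumed names from the mongodb module, not confirmed by this post.
class { 'mongodb::server':
  replset         => 'replicaset1',
  replset_members => ['host1:27017', 'host2:27017', 'host3:27017'],
}
```

Applied on each host, a declaration like this keeps the replica set name and the member list in one place, which is where most hand-configuration mistakes tend to creep in.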

Creating a MongoDB Shard

  mongodb_shard { 'shard1':
    ensure  => present,
    members => ['shard1/host1:27017', 'shard1/host2:27017', 'shard1/host3:27017'],
    keys    => 'price',
  }

name: defines the name of the shard.

members: the array of members the shard will hold.

keys: defines the key to be used in sharding, or an array of keys that can be used to create a compound shard key.
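Some versions of the mongodb module appear to declare a shard with a single member string and with keys scoped to a namespace. The sketch below shows that alternative shape. The member parameter, the hash form of keys and the shop.products namespace are all assumptions for illustration, so compare them with the type reference of the module you have installed.

```puppet
# Sketch only: `member` (singular) and the hash-style `keys` are
# assumptions about the mongodb module; 'shop.products' is a
# hypothetical database.collection namespace.
mongodb_shard { 'shard1':
  ensure => present,
  # Replica set name, then one host of that replica set.
  member => 'shard1/host1:27017',
  # Shard the collection on the price field, ascending.
  keys   => [
    { 'shop.products' => { 'price' => 1 } },
  ],
}
```

Adding further field entries to the inner hash would describe a compound shard key.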
