
MongoDB Sharding – On-Premises Deployment and Monitoring of MongoDB Sharded Clusters

Art van Scheppingen


Last year, we did a survey asking our users about any other databases they were using alongside MySQL. A clear majority were interested in using other databases alongside MySQL; these included (in order of popularity) MongoDB, PostgreSQL, Cassandra, Hadoop, and Redis.

Today, we are glad to announce the availability of ClusterControl for MongoDB, which includes a MongoDB Configurator (no longer available) to easily deploy MongoDB Sharded Clusters, as well as on-premises monitoring and cluster management. Setting up a sharded cluster is not a trivial task, and this is probably the area where most users need help. Sharding allows MongoDB to distribute data across a number of nodes to maximise use of disk space and dynamically load balance queries. Each shard consists of a replica set, which provides automated failover and redundancy to ensure that data exists on at least 3 servers.
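
To make that concrete: once a sharded cluster is running, distributing a collection across the shards comes down to choosing a shard key. Here is a minimal sketch, run against a mongos via the mongo shell; the database mydb, collection users and key userid are placeholder names, not something the Configurator sets up for you:

    # Enable sharding for a database, then shard a collection on a key;
    # chunks of data are then distributed and balanced across the shards:
    mongo --port 27017 --eval 'sh.enableSharding("mydb")'
    mongo --port 27017 --eval 'sh.shardCollection("mydb.users", { userid: 1 })'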

MongoDB Cluster Setup

Using the Configurator, you can set up a MongoDB cluster with auto-sharding and full failover support using replica sets. The setup looks like this:

  • N shards, each shard consisting of 3 shard servers, the mongod instances (started with the --shardsvr parameter)
  • Three config servers – these are mongod instances (started with the --configsvr parameter) that store the metadata for the shards. As per the documentation, “a production shard cluster will have three config server processes, each existing on a separate machine. Writes to config servers use a two-phase commit to ensure an atomic and replicated transaction of the shard cluster’s metadata”.
  • Mongo query routers (mongos) – clients connect to a mongos, which routes queries to the appropriate shards. mongos processes are self-contained and will usually run on each of your application servers.

We use the following defaults:

  • Config servers have their data directory in /var/lib/mongodb/config and listen on port 27019
  • mongod shard servers have their data directory in /var/lib/mongodb and listen on port 27018
  • mongos listens on port 27017
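
For reference, the processes behind these defaults would be started along the following lines. This is a sketch only – the deployment package generates the actual configuration, and the hostnames cfg1 to cfg3 and the replica set name rs0 are placeholders:

    # Config server (one of three), metadata in /var/lib/mongodb/config:
    mongod --configsvr --dbpath /var/lib/mongodb/config --port 27019

    # Shard server, a member of one shard's replica set:
    mongod --shardsvr --replSet rs0 --dbpath /var/lib/mongodb --port 27018

    # Query router, pointing at the three config servers:
    mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 --port 27017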

We also have a ClusterControl server, our on-premises tool to manage and monitor the MongoDB cluster. ClusterControl collects monitoring data from all the various servers, and stores it in a local monitoring database (cmondb). The admin can visualise their shards and drill down into individual nodes using the web interface.

ClusterControl also manages all the MongoDB nodes, and will restart any nodes that fail.

Using the MongoDB Configurator

** No longer available

The wizard is very straightforward, and we would recommend you stick with the default values. You will however need to key in the IP addresses of the servers you are deploying on. At the end, the wizard will generate a deployment package with your unique settings. You can use this package and run one command (./deploy.sh) to deploy your entire cluster.
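
Deployment then looks roughly like this on the ClusterControl server. The package file and directory names below are assumptions for illustration – use the names of the package the wizard actually generated:

    # Unpack the generated deployment package and run the deploy script
    # (file and directory names are placeholders):
    tar xzf mongodb-deployment.tar.gz
    cd mongodb-deployment
    ./deploy.sh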

If you have small servers (with < 10GB of free disk space) for testing, then you can also use "smallfiles", which will make the config servers and shard servers use less disk space.
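
For context, this corresponds to mongod's smallfiles option, which reduces the default size of data files. Shown here purely for illustration – the deployment scripts set it for you:

    # Example: a shard server started with smaller data files:
    mongod --shardsvr --smallfiles --dbpath /var/lib/mongodb --port 27018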

One small note: we do require that you can ssh from the ClusterControl server to the other servers using key-based authentication, so you don’t have to type in ssh passwords. If you have not set up key-based authentication, the deploy.sh script will ask you if you can ssh without typing in passwords. If you can’t, it will be set up for you.
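
If you prefer to set up the keys yourself beforehand, it is the standard ssh procedure (a minimal sketch; replace the user and hostname with your own):

    # Generate a key pair on the ClusterControl server (accept the defaults):
    ssh-keygen -t rsa
    # Copy the public key to each server in the cluster:
    ssh-copy-id user@shard-server-1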

Monitoring and Management

The ClusterControl server samples quite a lot of data from all the MongoDB nodes as well as from the underlying OS/HW, making it a useful tool for finding out what is going on in your cluster. The default sampling interval is 10 seconds, but this can be changed. The collected data is also used to build up a logical topology of the cluster, from which ClusterControl can derive the general health of the cluster.

Failed processes are automatically restarted, and users are alerted if processes fail too often or too fast. Other alerts are created if replication between a PRIMARY and a SECONDARY is lagging or completely broken, and users will get advice on what to do next.
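
You can also inspect replication lag by hand on any replica set member, for example (illustrative, using the default shard server port):

    # Show how far each SECONDARY is behind the PRIMARY:
    mongo --port 27018 --eval "rs.printSlaveReplicationInfo()"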

A command line tool, s9s_mongodb_admin, allows you to remove blocking lock files, stop the entire cluster if needed, and start it again.
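
For background, the blocking lock file is mongod's mongod.lock in the data directory, which can be left behind after an unclean shutdown. The manual equivalent would look like this – a sketch of what the tool automates, not its actual invocation:

    # Remove a stale lock file left by an unclean shutdown
    # (only once the node is stopped and any repair has completed):
    rm /var/lib/mongodb/mongod.lock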

Future Work – More Management Functionality

We are currently working on the following:

  • cluster backups, storage to a centralized location
  • restaging failed servers with data from a healthy server
  • adding nodes to a shard
  • adding shards
