Severalnines Blog
The automation and management blog for open source databases

MongoDB Sharding - On-premises Deployment and Monitoring of MongoDB Sharded Clusters

Severalnines

Last year, we ran a survey asking our users which other databases they were using alongside MySQL. A clear majority were interested in using other databases alongside MySQL; the most popular were (in order) MongoDB, PostgreSQL, Cassandra, Hadoop and Redis.

Today, we are glad to announce the availability of ClusterControl for MongoDB, which includes a MongoDB Configurator (no longer available) to easily deploy MongoDB Sharded Clusters, as well as on-premise monitoring and cluster management. Setting up a sharded cluster is not a trivial task, and this is probably the area where most users need help. Sharding allows MongoDB to distribute data across a number of nodes to maximise use of disk space and dynamically load balance queries. Each shard consists of a replica set, which provides automated failover and redundancy to ensure that data exists on at least three servers.
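As a quick illustration of what sharding looks like in practice, once a cluster is up you enable sharding for a database and choose a shard key per collection via a mongos; the database, collection and key names below are only examples:

    # Run against a mongos (default port 27017); names are illustrative
    mongo --port 27017 --eval 'sh.enableSharding("mydb")'
    mongo --port 27017 --eval 'sh.shardCollection("mydb.users", { user_id: 1 })'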

MongoDB Cluster Setup

Using the configurator, you can set up a MongoDB cluster with auto-sharding and full failover support using replica sets. The setup looks like this:

  • N shards, each consisting of 3 shard servers, i.e. mongod instances (started with the --shardsvr parameter)
  • Three config servers - these are mongod instances (started with the --configsvr parameter) that store the metadata for the shards. As per the documentation, "a production shard cluster will have three config server processes, each existing on a separate machine. Writes to config servers use a two-phase commit to ensure atomic and replicated transaction of the shard cluster's metadata".
  • Mongo query routers (mongos) - clients connect to a mongos, which routes queries to the appropriate shards. The routers are self-contained and will usually run on each of your application servers.

We use the following defaults (example startup commands are shown after the list):

  • Config servers have their data directory in /var/lib/mongodb/config and listen on port 27019
  • mongod shard servers have their data directory in /var/lib/mongodb and listen on port 27018
  • mongos listens on port 27017
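To make the roles concrete, the processes started on each host correspond roughly to the following. This is a hand-written sketch using the default ports and data directories above; the replica set name, config server hostnames and log paths are only placeholders, and the deployment package generates its own configuration:

    # Shard server (one mongod per replica set member of a shard)
    mongod --shardsvr --replSet rs0 --dbpath /var/lib/mongodb --port 27018 \
           --fork --logpath /var/log/mongodb/shardsvr.log

    # Config server (three in total, each on a separate machine)
    mongod --configsvr --dbpath /var/lib/mongodb/config --port 27019 \
           --fork --logpath /var/log/mongodb/configsvr.log

    # Query router, typically one per application server
    mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 --port 27017 \
           --fork --logpath /var/log/mongodb/mongos.log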

We also have a ClusterControl server, our on-premise tool to manage and monitor the MongoDB cluster. ClusterControl collects monitoring data from all the various servers and stores it in a local monitoring database (cmondb). The admin can visualise their shards and drill down into nodes using the web interface.

ClusterControl also manages all the MongoDB nodes, and will restart any nodes that fail.

Using the MongoDB Configurator

** No longer available

The wizard is very straightforward, and we recommend you stick with the default values. You will, however, need to key in the IP addresses of the servers you are deploying on. At the end, the wizard will generate a deployment package with your unique settings. You can use this package and run one command (./deploy.sh) to deploy your entire cluster.
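If you have not used one of our deployment packages before, the workflow is roughly the following; the archive name and directory layout here are only placeholders:

    # On the ClusterControl server: unpack the generated package and deploy
    tar xzf mongodb-deployment-package.tar.gz    # archive name is illustrative
    cd mongodb-deployment-package
    ./deploy.sh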

If you have small servers (with < 10GB of free disk space) for testing, then you can also use "smallfiles", which will make the config servers and shard servers use less disk space.
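This presumably maps to mongod's --smallfiles option (also settable as smallfiles = true in a configuration file), which reduces the size of the preallocated data and journal files; started by hand, a test-sized shard server would look something like this:

    # Example only: a shard server with smaller preallocated files for testing
    mongod --shardsvr --smallfiles --dbpath /var/lib/mongodb --port 27018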

One small note: we do require that you can ssh from the ClusterControl server to the other servers using key-based authentication, so you don't have to type in ssh passwords. If you have not set up key-based authentication, the deploy.sh script will ask you whether you can ssh without typing in passwords; if you can't, it will be set up for you.
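If you prefer to prepare key-based authentication yourself beforehand, the usual steps look like this (the user and hostnames are placeholders):

    # On the ClusterControl server: generate a key pair if you don't already have one
    ssh-keygen -t rsa
    # Copy the public key to each of the database servers
    ssh-copy-id root@server1
    ssh-copy-id root@server2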

Monitoring and Management

The ClusterControl server samples quite a lot of data from all the MongoDB nodes as well as from the underlying OS/hardware, and is a good tool to find out what is going on in your cluster. The default sampling interval is 10 seconds, but this can be changed. The collected data is also used to build up a logical topology of the cluster, from which ClusterControl can derive the general health of the cluster.

Failed processes are automatically restarted, and users are alerted if processes fail too often or too fast. Other alerts are raised if replication between a PRIMARY and a SECONDARY is lagging, or completely broken, and users will get advice on what to do next.
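You can also check replication lag by hand against one of a shard's mongod instances (the port below is the default shard server port):

    # Show how far each SECONDARY is behind the PRIMARY of this replica set
    mongo --port 27018 --eval 'rs.printSlaveReplicationInfo()'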

A command-line tool, s9s_mongodb_admin, allows you to remove blocking lock files, stop the entire cluster if needed, and start it again.

Future Work - more management functionality

We are currently working on the following:

  • cluster backups, stored in a centralized location
  • restaging failed servers with data from a healthy server
  • adding nodes to a shard
  • adding shards

Related Posts

Video: The Difference Between MongoDB Sharding and a MongoDB ReplicaSet

This video details the difference between a MongoDB ReplicaSet and a MongoDB Sharded Cluster in ClusterControl.


MongoDB tools from the community that complement ClusterControl

MongoDB is a popular database among developers, and the developer community has produced a number of tools to help you manage your MongoDB data and servers. We have made a selection of MongoDB community tools that complement ClusterControl.


Pre-emptive security with audit logging for MongoDB

The MongoDB ransom hack took a lot of people by surprise; having access to an audit log would have allowed them to take pre-emptive measures. The audit log is one of the most underrated features of MongoDB Enterprise and Percona Server for MongoDB. We uncover its secrets in this blog post.


MongoDB Webinar - How to Secure MongoDB with ClusterControl

With authentication disabled by default in MongoDB, learning how to secure MongoDB becomes essential. In this webinar we will explain how you can improve your MongoDB security and demonstrate how this is automatically done by ClusterControl.