blog

Troubleshooting a MongoDB Sharded Cluster

Onyancha Brian Henry

Published: June 12, 2020
Last Updated: May 4, 2022

In MongoDB, large data sets involve high throughput operations and this may overwhelm the capacity of a single server. Large working data sets implicate more stress on the I/O capacity of disk devices and may lead to a problem such as Page Faults.

There are mainly two ways of solving this…

Vertical Scaling: increasing single server capacity. Achieved by adding more CPU, RAM and storage space but with a limitation that: available technology may restrict a single machine from being sufficiently powerful for some workload. Practically, there is a maximum for vertical scaling.
Horizontal Scaling Through Sharding: This involves dividing system dataset over multiple servers hence reducing the overall workload for a single server. Expanding the capacity of the deployment only requires adding more servers to lower overall cost than high-end hardware for a single machine. However, this comes with a trade off that there will be a lot of complexity in infrastructure and maintenance for the deployment. The complexity gets more sophisticated when troubleshooting the sharded cluster in an event of disaster. In this blog, we provide some of the troubleshooting possibilities that may help:
Selecting Shard Keys and Cluster Availability
Mongos instance becoming unavailable
A member becomes absent from the shard replica set
All members of a replica set are absent
Stale config data leads to Cursor Fails
Config server becomes unavailable
Fixing Database String Error
Avoiding Downtime when moving config servers

Selecting Shard Keys and Cluster Availability

Sharding involves dividing data into small groups called shards so as to reduce the overall workload for a given throughput operation. This grouping is achieved through selecting an optimal key which is mainly the most important part before sharding. An optimal key should ensure:

Mongos can isolate most queries to a specific mongod. If for example more operations are subjected to a single shard, failure of that shard will only render data associated with it being absent at that time. It is advisable to select a shard key that will give more shards to reduce the amount of data unavailability in case the shard crashes.
MongoDB will be able to divide the data evenly among the chunks. This ensures that throughput operations will also be distributed evenly reducing the chances of any failing due more workload stress.
Write scalability across the cluster for ensuring high availability. Each shard should be a replica set in that if a certain mongod instance fails, the remaining replica set members are capable of electing another member to be a primary hence ensuring operational continuity.

If in any case a given shard has the tendency of failing, start by checking how many throughput operations is it subjected to and consider selecting a better sharding key to have more shards.

What If? Mongos Instance Becomes Absent

First check if you are connecting to the right port since you might have changed unknowingly. For instance, deployment with the AWS platform, there is a likelihood of this issue because of the security groups that may not allow connections on that port. For an immediate test, try specifying the full host:port to make sure you are using a loopback interface. The good thing is, if each application server has its own mongos instance, the application servers may continue accessing the database. Besides, mongos instances have their states altered with time and can restart without necessarily losing data. When the instance is reconnected it will retrieve a copy of the config database and begin routing queries.

Ensure the port you are trying to reconnect on is also not occupied by another process.

What If? A Member Becomes Absent From the Shard Replica Set

Start by checking the status of the shard by running the command sh.status(). If the returned result does not have the clusterId then the shard is indeed unavailable. Always investigate availability interruptions and failures and if you are unable to recover it in the shortest time possible, create a new member to replace it as soon as possible so as to avoid more data loss.

If a secondary member becomes unavailable but with current oplog entries, when reconnected it can catch up to the latest set state by reading current data from the oplog as normal replication process. If it fails to replicate the data you need to do an initial sync using either of these two options…

Restart mongod with an empty data directory and let MongoDB’s normal initial syncing feature restore the data. However, this approach takes long to copy the data but quite straight forward.
Restart the host machine with a copy of a recent data directory from another member in the replica set. Quick process but with complicated steps

The initial sync will enable MongoDB to…

Clone all the available databases except the local database. Ensure that the target member has enough disk space in the local database to temporarily store the oplog records for a duration the data is being copied.
Apply all changes to the data set using the oplog from the source. The process will be complete only if the status of the replica transitions from STARTUP2 to SECONDARY.

What If? All Members of a Replica Set are Absent

Data held in a shard will be unavailable if all members of a replica set shard become absent. Since the other shards remain available, read and write operations are still possible except that the application will be served with partial data. You will need to investigate the cause of the interruptions and attempt reactivating the shard as soon as possible. Check which query profiler or the explain method what might have led to that problem.

What If? Stale Config Data Leads to Cursor Fails

Sometimes a mongos instance may take long to update metadata cache from the config database leading to a query returning the warning:

could not initialize cursor across all shards because : stale config detected

This error will always be presented until the mongos instances refresh their caches. This should not propagate back to the application. To fix this you need to force the instance to refresh by running fluRouterConfig.

To flush the cache for a specific collection run

db.adminCommand({ flushRouterConfig: "" } )

To flush cache for a specific database run

db.adminCommand({ flushRouterConfig: "" } )

To flush cache for all databases and their collections run:

db.adminCommand("flushRouterConfig")

db.adminCommand( { flushRouterConfig: 1 } )

What If? Config Server Becomes Unavailable

Config server in this case can be considered as the primary member from which secondary nodes replicate their data. If it becomes absent, the available secondary nodes will have to elect one among their members to become the primary. To avoid getting into a situation where you may not have a config server, consider distributing the replica set members across two data centers since…

If one data center goes down, data will still be available for reads rather than no operations if you used a single data center.
If the data center that entails minority members goes down, the replica set can still serve both write and read operations.

It is advisable to distribute members across at least three data centers.

Another distribution possibility is to evenly distribute the data bearing members across the two data centers and remaining members in the cloud.

Fixing Database String Error

As from MongoDB 3.4, SCCC config servers are not supported for mirrored mongod instances. If you need to upgrade your sharded cluster to version 3.4, you need to convert config servers from SCCC to CSRS.

Avoiding Downtime When Moving Config Servers

Downtime may happen as a result of some factors such as power outage or network frequencies hence resulting in the failure of a config server to the cluster. Use CNAME to identify that server for renaming or renumbering during reconnection. If the moveChunk commit command fails during migration process, MongoDB will report the error:

ERROR: moveChunk commit failed: version is at | instead of

|" and "ERROR: TERMINATING"

This means the shard has also not been connected to the config database hence the primary will terminate this member to avoid data inconsistency.You need to resolve the chunk migration failure independently by consulting the MongoDB support. Also ensure to provide some stable resources like network and power to the cluster.

Conclusion

A MongoDB sharded cluster reduces workload over which a single server would have been subjected to hence improving on performance of throughput operations. However, failure to configure some params correctly like selecting an optimal shard key may create a load imbalance hence some shards end up failing.

Assuming the configuration is done correctly some unavoidable setbacks such as power outages may also strike. To continue supporting your application with minimal downtime, consider using at least 3 data centers. If one fails the others will be available to support read operations if the primary is among the affected members. Also upgrade your system to at least version 3.4 as it supports more features.

ClickHouse Schema Design and Data Modeling

Building a Modern Analytics Stack Around ClickHouse

Managing ClickHouse Resources in Multi-Tenant Environments

How to Migrate from SQL Server to PostgreSQL HA for Dockerized Apps