Roughly 80% of database operations management consists of reading and interpreting the output of your monitoring systems. Hundreds of metrics can be interpreted and combined in various ways to give you deep insight into your database systems and how to optimize them. When you run multiple database systems, monitoring them can become quite a chore. If interpreting and combining metrics takes a lot of time, wouldn't it be great if this could be automated?
This is why we created database advisors in ClusterControl: small scripts that interpret and combine metrics for you, and give you advice when applicable. For MySQL we have created an extensive library of the most commonly used MySQL monitoring checks, and for MongoDB we also have a broad library of advisors at your disposal. For this blog post, we have picked the nine most important MongoDB advisors and will describe each of them in detail.
The nine MongoDB advisors we will cover in this blog post are:
- Disk mount options check
- NUMA check
- Collection lock percentage (MMAP)
- Replication lag
- Replication window
- Un-sharded databases and collections (sharded cluster only)
- Authentication enabled check
- Authentication/authorization sanity check
- Error detection (new advisor)
Disk Mount Options Advisor
It is very important to have your disks mounted optimally. The ClusterControl disk mount options advisor takes a closer look at your data disk on a daily basis, investigating the file system used, the mount options and the I/O scheduler settings of the operating system.
We check whether your disks have been mounted with noatime and nodiratime. Without these options, every access to a file or directory causes its access time to be written to disk, which decreases disk performance. Since databases access their files continuously, setting these options is a good performance measure and also reduces wear on your SSDs.
For file systems, we recommend modern file systems like xfs, zfs, ext4 or btrfs, which are designed with performance in mind. The I/O scheduler should be set to either noop or deadline. Deadline has been the default for databases for years, but with faster storage like SSDs the noop scheduler makes more sense nowadays.
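A minimal sketch of such a check in Python, assuming the data disk's line from /proc/mounts and the contents of /sys/block/<device>/queue/scheduler have already been read as strings. The function names and accepted sets are illustrative, not ClusterControl's actual implementation. Newer multi-queue kernels name the equivalent schedulers none and mq-deadline, so the sketch accepts those as well.

```python
import re

# Illustrative sets based on the recommendations above (not ClusterControl's code).
RECOMMENDED_FS = {"xfs", "zfs", "ext4", "btrfs"}
# Multi-queue (blk-mq) kernels name the equivalent schedulers "none" and "mq-deadline".
RECOMMENDED_SCHEDULERS = {"noop", "deadline", "none", "mq-deadline"}


def parse_mount_line(line):
    """Split one /proc/mounts line into (device, mount point, fs type, options)."""
    device, mount_point, fs_type, options = line.split()[:4]
    return device, mount_point, fs_type, set(options.split(","))


def active_scheduler(scheduler_text):
    """Return the active scheduler, which sysfs shows in [brackets]."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    return match.group(1) if match else scheduler_text.strip()


def check_data_disk(mount_line, scheduler_text):
    """Return a list of warnings for the database data disk."""
    _, mount_point, fs_type, options = parse_mount_line(mount_line)
    warnings = []
    if "noatime" not in options:
        warnings.append(f"{mount_point}: not mounted with noatime")
    # On Linux, noatime already implies nodiratime.
    if "noatime" not in options and "nodiratime" not in options:
        warnings.append(f"{mount_point}: not mounted with nodiratime")
    if fs_type not in RECOMMENDED_FS:
        warnings.append(f"{mount_point}: file system {fs_type} is not recommended")
    scheduler = active_scheduler(scheduler_text)
    if scheduler not in RECOMMENDED_SCHEDULERS:
        warnings.append(f"{mount_point}: I/O scheduler {scheduler} is not recommended")
    return warnings


if __name__ == "__main__":
    line = "/dev/sdb1 /var/lib/mongodb ext4 rw,relatime 0 0"
    for warning in check_data_disk(line, "noop deadline [cfq]"):
        print(warning)
```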
NUMA Check Advisor
For MongoDB we enable our NUMA check advisor. This advisor checks whether NUMA (Non-Uniform Memory Access) has been enabled on your system and, if so, advises you to switch it off.
With NUMA enabled, each CPU in the server has its own local memory, which it can access directly and faster than memory attached to the other CPUs in the machine. Depending on the memory allocation policy, a process may be limited to allocating from its local memory, and allocating beyond that can result in swap usage even while memory attached to other CPUs is still free. This architecture benefits multi-processor applications designed to make use of all CPUs and their local memory, but as MongoDB is not NUMA-aware, NUMA can greatly decrease its performance and could lead to huge swap usage.
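A minimal Python sketch of how NUMA could be detected, assuming that a Linux machine exposing more than one nodeN directory under /sys/devices/system/node has NUMA in effect. The function names and advice text are hypothetical and only illustrate the idea.

```python
import os
import re

NODE_DIR = "/sys/devices/system/node"  # Linux sysfs location of memory nodes


def count_numa_nodes(entries):
    """Count entries named node0, node1, ... in a listing of NODE_DIR."""
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


def numa_advice(entries):
    """Return hypothetical advisor output for a NODE_DIR listing."""
    nodes = count_numa_nodes(entries)
    if nodes > 1:
        return f"WARNING: NUMA enabled with {nodes} nodes, consider switching it off"
    return "OK: single memory node"


if __name__ == "__main__":
    # Non-Linux systems have no NODE_DIR; treat them as a single node.
    listing = os.listdir(NODE_DIR) if os.path.isdir(NODE_DIR) else []
    print(numa_advice(listing))
```

The directory also contains files such as online and possible, which is why the sketch only counts names matching nodeN.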
Collection Lock Percentage (MMAP)
As MMAP is a storage engine based on memory-mapped files, it doesn't support the document-level locking found in WiredTiger and RocksDB. Instead, the lowest level of locking for MMAP is the collection-level lock. This means any write to a collection (insert, update or delete) locks the entire collection. If the percentage of time spent on collection locks gets too high, this indicates contention problems on the collection. If not addressed properly, this could bring your write throughput to a grinding halt, so an advisor that warns you up front is very helpful.
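The exact metric the advisor uses is not described here, so the following Python sketch assumes a cumulative counter of microseconds spent on collection locks (for instance one of the timeAcquiringMicros values in the locks section of serverStatus), sampled twice. The thresholds are hypothetical.

```python
WARNING_PCT = 30.0   # hypothetical warning threshold
CRITICAL_PCT = 60.0  # hypothetical critical threshold


def lock_percentage(prev_lock_us, curr_lock_us, interval_s):
    """Share of the sampling interval spent on collection locks, in percent.

    If the counter sums waits across concurrent operations, values above
    100% are possible under heavy contention.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # A counter that went down means mongod restarted: count no lock time.
    delta_us = max(0, curr_lock_us - prev_lock_us)
    return 100.0 * delta_us / (interval_s * 1_000_000)


def classify(pct):
    """Map a lock percentage to an advisor status."""
    if pct >= CRITICAL_PCT:
        return "CRITICAL"
    if pct >= WARNING_PCT:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    pct = lock_percentage(0, 500_000, 1)
    print(f"{pct:.1f}% {classify(pct)}")
```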
MongoDB Replication Lag Advisor
If you are scaling out MongoDB reads via secondaries, the replication lag is very important to keep an eye on. Ideally, MongoDB client drivers should only read from secondaries that don't lag too far behind (this can be enforced with the maxStalenessSeconds read preference option); otherwise you risk serving stale data.
Inside MongoDB, the primary keeps track of the replication status of its secondaries. The advisor fetches this replication information and monitors the replication lag. If the lag becomes too high, it sends out a warning or critical status message.
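A minimal Python sketch of the lag calculation, assuming input shaped like the replSetGetStatus output (rs.status() in the shell), where each member carries name, stateStr and optimeDate. With PyMongo such a document could come from client.admin.command("replSetGetStatus"). The thresholds are hypothetical.

```python
from datetime import datetime, timedelta

LAG_WARNING_S = 60    # hypothetical warning threshold in seconds
LAG_CRITICAL_S = 300  # hypothetical critical threshold in seconds


def replication_lag(status):
    """Map each secondary's name to its lag (seconds) behind the primary."""
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    if primary is None:
        return {}  # no primary elected: lag cannot be computed
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m.get("stateStr") == "SECONDARY"
    }


def lag_alerts(status):
    """Return {member: "WARNING" | "CRITICAL"} for secondaries lagging too far."""
    alerts = {}
    for name, lag in replication_lag(status).items():
        if lag >= LAG_CRITICAL_S:
            alerts[name] = "CRITICAL"
        elif lag >= LAG_WARNING_S:
            alerts[name] = "WARNING"
    return alerts


if __name__ == "__main__":
    now = datetime(2017, 1, 1, 12, 0, 0)
    status = {"members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=90)},
    ]}
    print(lag_alerts(status))
```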
MongoDB Replication Window Advisor
Next to replication lag, the replication window is an important metric to watch. The MongoDB oplog is a single capped collection with a preset size. Once the oplog is full and a new operation needs to be stored, the oldest operation is evicted to make room for it. The replication window is the number of seconds between the oldest and newest operations in the oplog.
This metric is very important, as it tells you how long you can take a secondary out of the replica set before it can no longer catch up with the primary because it has fallen too far behind in replication. Also, if a secondary starts lagging behind, it is good to know how long this can be tolerated before the secondary is no longer able to catch up.
The MongoDB shell provides a function to calculate the replication window, and this ClusterControl advisor performs the same calculation. The benefit is that you can now be alerted when the replication window becomes too short.
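The calculation itself is simple; a minimal Python sketch follows, assuming the oldest and newest oplog timestamps are given in seconds. The shell's db.getReplicationInfo() reports the same difference as timeDiff. The threshold and the headroom helper are hypothetical additions for illustration.

```python
# Hypothetical alert threshold: warn when the oplog covers less than one day.
WINDOW_WARNING_S = 24 * 3600


def replication_window(first_ts, last_ts):
    """Seconds between the oldest and the newest entry in the oplog."""
    return last_ts - first_ts


def catch_up_headroom(window_s, lag_s):
    """Seconds a lagging or stopped secondary has left before it falls off the oplog.

    Assumes a constant write rate; a burst of writes shrinks the window.
    """
    return window_s - lag_s


def window_status(window_s):
    """Map a replication window to an advisor status."""
    return "WARNING" if window_s < WINDOW_WARNING_S else "OK"


if __name__ == "__main__":
    window = replication_window(1_000, 87_400)
    print(window, window_status(window), catch_up_headroom(window, 3_600))
```

With PyMongo, the two timestamps could be read from the ts field (a bson Timestamp, whose time attribute holds seconds) of the first and last documents in local.oplog.rs, sorting on $natural.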
MongoDB Un-Sharded Databases and Collections Advisor
In a sharded MongoDB cluster, each database has a primary shard that holds all of its un-sharded collections. The MongoDB shard router assigns the primary shard when the database is created, so it can vary between databases; in general, it will be the shard holding the least data at that time.
Having an un-sharded database or collection doesn't immediately pose a risk for your cluster. However, if an application or user starts to write large volumes of data to one of them, the primary shard could fill up quickly and cause an outage on that shard. As the database or collection is not sharded, it cannot make use of the other shards.
For this reason, we have created an advisor to help prevent this from happening. It scans all databases and collections and warns you about any that have not been sharded.
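A minimal Python sketch of the scan, assuming a mapping of database names to their collection names and a set of sharded namespaces (for example, the _id values of the config.collections documents). Skipping the admin, config and local databases and system.* collections is an assumption of this sketch.

```python
# Assumed to be excluded from the scan: internal databases and collections.
SYSTEM_DATABASES = {"admin", "config", "local"}


def unsharded_namespaces(collections_by_db, sharded_namespaces):
    """Return sorted "db.collection" names that are not sharded."""
    result = []
    for db, collections in collections_by_db.items():
        if db in SYSTEM_DATABASES:
            continue
        for coll in collections:
            if coll.startswith("system."):
                continue
            namespace = f"{db}.{coll}"
            if namespace not in sharded_namespaces:
                result.append(namespace)
    return sorted(result)


def fully_unsharded_databases(collections_by_db, sharded_namespaces):
    """Return databases in which no collection is sharded at all."""
    return sorted(
        db
        for db, collections in collections_by_db.items()
        if db not in SYSTEM_DATABASES
        and not any(f"{db}.{c}" in sharded_namespaces for c in collections)
    )


if __name__ == "__main__":
    data = {"shop": ["orders", "users"], "logs": ["events"]}
    print(unsharded_namespaces(data, {"shop.orders"}))
    print(fully_unsharded_databases(data, {"shop.orders"}))
```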
Authentication Enabled Check
Without authentication enabled in MongoDB, any user connecting is treated as an admin. This is a serious risk, as admin tasks such as creating users or making backups become available to anyone. Combined with publicly exposed MongoDB servers, this resulted in the recent MongoDB ransom hacks, while simply enabling authentication would have prevented most of these cases.
We have implemented an advisor that verifies whether your MongoDB servers have authentication enabled. Authentication can be enabled explicitly in the configuration, or implicitly by enabling the replication keyfile. If this advisor fails to detect that authentication has been enabled, you should act on this immediately, as your server is vulnerable to being compromised.
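A minimal Python sketch of the configuration side of this check, assuming mongod.conf is in YAML format and has already been parsed into a dictionary (for instance with PyYAML's yaml.safe_load). It looks at the security.authorization and security.keyFile settings; options passed on the mongod command line, such as --auth, are not covered by this sketch.

```python
def auth_enabled(config):
    """True if a parsed mongod.conf enables authentication.

    Explicitly: security.authorization is "enabled".
    Implicitly: security.keyFile is set (replication keyfile).
    """
    # An empty "security:" section parses as None, hence the "or {}".
    security = config.get("security") or {}
    explicit = security.get("authorization") == "enabled"
    implicit = bool(security.get("keyFile"))
    return explicit or implicit


if __name__ == "__main__":
    example = {"net": {"port": 27017}, "security": {"keyFile": "/etc/mongo.key"}}
    print("authentication enabled:", auth_enabled(example))
```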
Authentication/Authorization Sanity Check
In addition to the authentication enabled advisor, we have also built an advisor that performs a sanity check of both authentication and authorization in MongoDB.
In MongoDB, authentication and authorization are not handled in a central location, but are performed and stored at the database level. Typically, users connect by authenticating against the database they intend to use. However, with the correct grants, it is also possible to authenticate against one (unrelated) database and still make use of another. Normally this is perfectly fine, unless a user has excessive rights (like the admin role) over another database.
This advisor verifies whether such excessive roles are present and whether they could pose a threat. At the same time, it also checks for weak and easy-to-guess passwords.
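A minimal Python sketch of the role part of this check, assuming user documents shaped like the users array returned by the usersInfo command (each with user, db and a list of roles carrying role and db). Which built-in roles count as "excessive" is a hypothetical choice here, and the weak-password check is not covered.

```python
# Hypothetical selection of powerful MongoDB built-in roles.
EXCESSIVE_ROLES = {
    "root", "dbOwner", "userAdmin", "userAdminAnyDatabase",
    "dbAdminAnyDatabase", "readWriteAnyDatabase", "clusterAdmin",
}


def excessive_grants(users):
    """Return (user@authdb, role, target db) for powerful roles on other databases."""
    findings = []
    for user in users:
        for grant in user.get("roles", []):
            # Only flag powerful roles granted on a database other than the
            # one the user authenticates against.
            if grant["role"] in EXCESSIVE_ROLES and grant["db"] != user["db"]:
                findings.append((f'{user["user"]}@{user["db"]}', grant["role"], grant["db"]))
    return findings


if __name__ == "__main__":
    users = [
        {"user": "report", "db": "reporting",
         "roles": [{"role": "read", "db": "shop"}, {"role": "dbOwner", "db": "shop"}]},
    ]
    for finding in excessive_grants(users):
        print("excessive grant:", finding)
```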
Error Detection (new Advisor)
In MongoDB, any error encountered is counted or logged. MongoDB distinguishes a wide variety of errors: user asserts, regular asserts, warnings and even internal server exceptions. If there are trends in these errors, it is likely that there is either a misconfiguration or an application issue.
This advisor looks at the statistics of MongoDB errors (asserts) and makes sense of them. We interpret the trends found and advise on how to decrease errors in your MongoDB environment!
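A minimal Python sketch of trend detection on these counters, assuming two samples of the asserts section of serverStatus (fields regular, warning, msg and user) taken interval_s seconds apart. The per-second thresholds are hypothetical; user asserts often stem from application errors such as duplicate keys, so they get a more lenient limit here.

```python
ASSERT_TYPES = ("regular", "warning", "msg", "user")
# Hypothetical per-second thresholds for each assert type.
RATE_THRESHOLDS = {"regular": 0.0, "warning": 0.1, "msg": 0.1, "user": 1.0}


def assert_rates(prev, curr, interval_s):
    """Per-second increase of each assert counter between two samples."""
    rates = {}
    for kind in ASSERT_TYPES:
        delta = curr.get(kind, 0) - prev.get(kind, 0)
        if delta < 0:
            continue  # counter rolled over or mongod restarted: skip this sample
        rates[kind] = delta / interval_s
    return rates


def assert_advice(prev, curr, interval_s):
    """Return the assert types whose rate exceeds its threshold."""
    rates = assert_rates(prev, curr, interval_s)
    return sorted(kind for kind, rate in rates.items() if rate > RATE_THRESHOLDS[kind])


if __name__ == "__main__":
    before = {"regular": 0, "warning": 0, "msg": 0, "user": 100}
    after = {"regular": 2, "warning": 0, "msg": 0, "user": 160}
    print(assert_advice(before, after, 60))
```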