MongoDB performance depends on how it uses the underlying resources. It stores data on disk as well as in memory, uses CPU resources to perform operations, and relies on a network to communicate with its clients. These resources must be adequate to keep the database healthy and responsive. In this article we discuss the resource requirements of the MongoDB database system and how to optimize them for maximum performance.
Requirements for MongoDB
Besides providing ample hardware resources such as RAM and CPU to the database, tuning the operating system can also improve performance to some extent. The key resources required for a MongoDB environment include:
- Enough disk space
- Adequate memory
- Excellent network connection.
The most common operating system for MongoDB is Linux, so we’ll look at how to optimize it for the database.
There are many tuning techniques that can be applied to Linux. Although some changes take effect without rebooting the host, it is good practice to reboot after making them to confirm that they persist. In this section, the tuning areas we are going to discuss are:
- Network Stack
- NTP Daemon
- Linux User Limit
- File system and Options
- Virtual Memory
Network Stack
As with any other server software, a good network connection provides a better interface for exchanging requests and responses with the server. However, the Linux kernel's default network tunings are not well suited to MongoDB. As the name suggests, the network stack is an arrangement of many layers that can be grouped into 3 main areas: the user area, the kernel area and the device area. The user area and kernel area are together referred to as the host, since their tasks are carried out by the CPU. The device area is responsible for sending and receiving packets through the Network Interface Card (NIC). Several of the defaults can limit a typical host with a 1Gbps (or faster) network interface. What we need to tune are the relatively low throughput settings, which should be increased, and the timeout and keepalive settings, which should be reduced:
- net.core.somaxconn (increase the value)
- net.ipv4.tcp_max_syn_backlog (increase the value)
- net.ipv4.tcp_fin_timeout (reduce the value)
- net.ipv4.tcp_keepalive_intvl (reduce the value)
- net.ipv4.tcp_keepalive_time (reduce the value)
To make these changes permanent, create a new file /etc/sysctl.d/mongodb-sysctl.conf if it does not exist and add these lines to it.
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_max_syn_backlog = 4096
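Then, as the root user, run /sbin/sysctl -p /etc/sysctl.d/mongodb-sysctl.conf (or /sbin/sysctl --system) to load the new values immediately. A plain /sbin/sysctl -p reads only /etc/sysctl.conf and would skip this file. Because the settings are stored in a file under /etc/sysctl.d, they are also applied again at every boot.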
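To see whether a host already meets these suggestions, the live values can be read back from /proc/sys. The following is a minimal sketch, not an official MongoDB tool: the function names and the SYSCTL_ROOT variable are assumptions made for this example (SYSCTL_ROOT exists only so the functions can be pointed at a test directory), and the thresholds are simply the values listed above.

```shell
#!/usr/bin/env bash
# Sketch: compare live kernel settings with the values suggested in this article.
# SYSCTL_ROOT is a hypothetical override used only by this script; it defaults
# to /proc/sys, where the kernel exposes sysctl keys as files.

# read_sysctl KEY: print the current value of a dotted key such as net.core.somaxconn.
read_sysctl() {
    local root="${SYSCTL_ROOT:-/proc/sys}"
    # net.core.somaxconn is stored in <root>/net/core/somaxconn
    cat "${root}/${1//.//}" 2>/dev/null
}

# check_sysctl KEY OP WANT: OP is "ge" (value should be at least WANT)
# or "le" (value should be at most WANT).
check_sysctl() {
    local key="$1" op="$2" want="$3" have
    have="$(read_sysctl "$key")"
    if [ -z "$have" ]; then
        echo "SKIP $key (not readable)"
        return 2
    fi
    if [ "$have" "-$op" "$want" ]; then
        echo "OK $key = $have"
    else
        echo "TUNE $key = $have (suggested: $want)"
        return 1
    fi
}

# check_network_tuning: run every check from the list above.
check_network_tuning() {
    check_sysctl net.core.somaxconn ge 4096
    check_sysctl net.ipv4.tcp_max_syn_backlog ge 4096
    check_sysctl net.ipv4.tcp_fin_timeout le 30
    check_sysctl net.ipv4.tcp_keepalive_intvl le 30
    check_sysctl net.ipv4.tcp_keepalive_time le 120
}
```

Calling check_network_tuning prints one line per setting. A TUNE line means the live value has not yet reached the suggested one, and a SKIP line means the key could not be read on this host.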
NTP Daemon
Network Time Protocol (NTP) is a protocol by which the software clock of a Linux system is synchronized with internet time servers. MongoDB, when deployed as a cluster, depends on time consistency across nodes. For this reason, it is important that the NTP daemon runs permanently on MongoDB hosts. A permanently running daemon keeps the clock close to the correct time, even for some time after a network disconnection. To install the NTP daemon on a Debian/Ubuntu flavored Linux system, run the command:
$ sudo apt-get install ntp
You can consult the ntp.conf documentation to see how the NTP daemon is configured on different operating systems.
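Once the daemon is running, its synchronization state can be inspected with ntpq -pn, a query tool that is normally installed together with the daemon. In that output the peer currently selected for synchronization is marked with a leading asterisk, and the ninth column is the clock offset in milliseconds. The helper below is a small sketch that extracts this offset; the function name selected_peer_offset is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: read `ntpq -pn` output on stdin and print the offset (in ms)
# of the peer the daemon has selected, which ntpq marks with a leading "*".
# Returns non-zero when no peer is selected yet (clock not synchronized).
selected_peer_offset() {
    awk '/^\*/ { print $9; found = 1; exit }
         END   { exit (found ? 0 : 1) }'
}

# Typical use on a MongoDB host:
#   ntpq -pn | selected_peer_offset || echo "no NTP peer selected yet"
```

A non-zero exit status is a hint that the host has not synchronized with any time server, which is worth resolving before the node joins a cluster.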
Linux User Limit
Sometimes a user-side fault can end up impacting the entire server and host system. To prevent this, Linux enforces limits on the system resources that processes may consume, on a per-user basis. The default limits are too low for MongoDB, which usually needs more resources than they allow. Besides, MongoDB is often the main process using the underlying hardware, so it makes sense to optimize the Linux system for such dedicated usage. The database can then fully exploit the available resources.
However, it is not advisable to disable these limits or set them to unlimited. For example, if the host runs short of CPU, storage or RAM, a small fault can escalate into a huge problem and cause other services to fail, such as SSH, which is vital for solving the initial problem.
To arrive at good estimates, you should understand the requirements at the database level, for instance the number of users that will make requests to the database and the expected processing time. You can refer to Key things to Monitor for MongoDB. A recommended limit for both max-user-processes (nproc) and open-files (nofile) is 64000. To set these values, create a new file in the /etc/security/limits.d directory if one does not already exist, for example /etc/security/limits.d/mongod.conf (the file name is only an illustration, but it must end in .conf), and add these lines, in which the first column is the user that runs the database:
mongod soft nofile 64000
mongod hard nofile 64000
mongod soft nproc 64000
mongod hard nproc 64000
To apply these changes, restart mongod from a new login session, since the new limits apply only to sessions started after the change.
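Whether the new limits actually reached the running database can be verified from /proc/<pid>/limits, which lists the soft and hard limits of a live process. A minimal sketch follows; the helper name proc_limit is an assumption of this example, and the commented usage assumes the process is called mongod.

```shell
#!/usr/bin/env bash
# Sketch: print the soft and hard value of one limit from a /proc/<pid>/limits file.
# Usage: proc_limit LIMITS_FILE "Max open files"
proc_limit() {
    awk -v name="$2" '
        index($0, name) == 1 {
            # Drop the limit name, then split the rest into: soft, hard, units.
            n = split(substr($0, length(name) + 1), f, " ")
            if (n >= 2) print f[1], f[2]
        }' "$1"
}

# Example for a running mongod (pidof may print several PIDs; use the first):
#   pid="$(pidof mongod | cut -d' ' -f1)"
#   proc_limit "/proc/$pid/limits" "Max open files"
#   proc_limit "/proc/$pid/limits" "Max processes"
```

If the two numbers printed are lower than 64000, the process was most likely started before the limits file was in place, or by a mechanism that does not read it.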
File System and Options
MongoDB supports 3 types of filesystem for on-disk database data: ext3, ext4, and XFS. For the WiredTiger storage engine, available in MongoDB 3.0 and later, XFS is preferred over ext4, which is considered to cause some stability issues, while ext3 is best avoided due to its poor pre-allocation performance. Unlike some other software, MongoDB does not rely on the access-time metadata that filesystems update by default whenever a file is read. You can therefore disable access-time updates to save the small amount of disk IO they consume.
This can be done by adding the noatime flag to the filesystem options field in the file /etc/fstab for the disk serving MongoDB data. Once the filesystem has been remounted, you can confirm that the option is active:
$ grep "/var/lib/mongo" /proc/mounts
/dev/mapper/data-mongodb /var/lib/mongo ext4 rw,seclabel,noatime,data=ordered 0 0
The change only takes effect when the filesystem is remounted, for example after a reboot. Restarting MongoDB alone does not remount the disk.
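The check against /proc/mounts can also be scripted, which is handy when several data volumes need to be verified. The sketch below reads a mounts table in the /proc/mounts format and reports whether a mount point carries a given option. The function name has_mount_option and its optional third argument are assumptions made for this example, and the /etc/fstab entry in the first comment is illustrative apart from the noatime flag.

```shell
#!/usr/bin/env bash
# Illustrative /etc/fstab entry (fields other than noatime are assumptions):
#   /dev/mapper/data-mongodb  /var/lib/mongo  ext4  defaults,noatime  0  0

# has_mount_option MOUNTPOINT OPTION [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is mounted with OPTION. MOUNTS_FILE defaults to /proc/mounts.
has_mount_option() {
    local mountpoint="$1" option="$2" mounts="${3:-/proc/mounts}"
    awk -v mp="$mountpoint" -v opt="$option" '
        $2 == mp {
            n = split($4, opts, ",")   # 4th field: comma-separated mount options
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }' "$mounts"
}

if has_mount_option /var/lib/mongo noatime; then
    echo "noatime is active on /var/lib/mongo"
else
    echo "noatime is NOT active on /var/lib/mongo (or it is not a mount point)"
fi
```

Comparing whole option names, rather than grepping for a substring, avoids false matches between similar options such as noatime and relatime.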
Security-Enhanced Linux
Among the security features a Linux system offers at kernel level is Security-Enhanced Linux (SELinux). This is an implementation of fine-grained Mandatory Access Control. It consults a security policy to determine whether an operation should proceed. Unfortunately, many Linux users set this access control module to warn only (Permissive mode) or disable it entirely. This is often due to associated setbacks such as unexpected permission denied errors. Although many people ignore it, this module plays a major role in reducing local attacks on the server. With SELinux enabled and in Enforcing mode, it provides a more secure base for your MongoDB. You should therefore enable SELinux and apply Enforcing mode, especially at the beginning of your installation. To change the SELinux mode to Enforcing, run the command:
$ sudo setenforce Enforcing
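Note that setenforce changes only the running mode; to keep it across reboots, set SELINUX=enforcing in /etc/selinux/config. You can check the running SELinux mode by running: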
$ sudo getenforce
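Because setenforce affects only the running system, it is worth comparing the running mode with the one that will be applied at boot, which is read from the SELINUX= line in /etc/selinux/config. A small sketch, in which the function name configured_selinux_mode is invented for this example:

```shell
#!/usr/bin/env bash
# configured_selinux_mode [CONFIG_FILE]
# Print the boot-time SELinux mode (enforcing, permissive or disabled)
# from the SELINUX= line of the config file (default: /etc/selinux/config).
configured_selinux_mode() {
    local config="${1:-/etc/selinux/config}"
    # Commented lines and the separate SELINUXTYPE= line do not match.
    sed -n 's/^SELINUX=\([a-z]*\).*$/\1/p' "$config" 2>/dev/null | tail -n 1
}

# Compare with the running mode reported by getenforce, if the SELinux tools exist.
if command -v getenforce >/dev/null 2>&1; then
    echo "running:    $(getenforce)"
    echo "configured: $(configured_selinux_mode)"
fi
```

If the two lines disagree, the host will change mode at the next reboot, which is an easy way for an Enforcing setup to silently fall back to Permissive.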
Virtual Memory
Dirty Ratio
MongoDB relies on caching to fetch data quickly. As data is modified, dirty pages (pages changed in memory but not yet written to disk) are created, and some memory is required to hold them. The dirty ratio (vm.dirty_ratio) is the percentage of total system memory that can hold dirty pages. In most cases, the default value is between 25% and 35%. If this value is exceeded, the pages are committed to disk, which has the effect of creating a hard pause. To avoid this, you can have the kernel start flushing data to disk in the background earlier, through another ratio called dirty_background_ratio, whose value typically ranges between 10% and 15%, without causing the hard pause.
The aim here is to ensure good query performance. You can therefore reduce the background ratio if your database system has a large amount of memory. If a hard pause is allowed to happen, you might end up with duplicated data, or some data may fail to be recorded during that time. Lowering these ratios also causes data to be written to disk in smaller batches more often, rather than in large bursts, which helps keep disk throughput steady. To check the currently running values, run this command:
$ sysctl -a | grep -E "vm.dirty.*_ratio"
and you will be presented with something like this.
vm.dirty_background_ratio = 10
vm.dirty_ratio = 20
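To get a feel for what these percentages mean on a given host, they can be converted into an approximate amount of memory. The sketch below multiplies a memory size by a ratio. It is only an estimate, because the kernel applies the ratios to the memory that is currently available (free plus reclaimable pages) rather than to all installed RAM. The function name ratio_to_mb is an assumption of this example.

```shell
#!/usr/bin/env bash
# ratio_to_mb MEMORY_MB RATIO_PERCENT
# Print RATIO_PERCENT of MEMORY_MB, rounded down to whole megabytes.
ratio_to_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Rough estimate for this host, using MemTotal from /proc/meminfo (reported in kB).
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/dirty_ratio ] && [ -r /proc/sys/vm/dirty_background_ratio ]; then
    mem_kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
    total_mb=$(( ${mem_kb:-0} / 1024 ))
    hard="$(cat /proc/sys/vm/dirty_ratio)"
    background="$(cat /proc/sys/vm/dirty_background_ratio)"
    echo "background flush starts at roughly $(ratio_to_mb "$total_mb" "$background") MB of dirty pages"
    echo "hard pause starts at roughly $(ratio_to_mb "$total_mb" "$hard") MB of dirty pages"
fi
```

For a host with 64 GB of RAM and a dirty ratio of 20, ratio_to_mb 65536 20 prints 13107, so about 13 GB of dirty pages could build up before writers are paused. This is why lower ratios are usually preferred on large-memory database hosts.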
Swappiness
Swappiness (vm.swappiness) is a value ranging from 0 to 100 that influences the behaviour of the Virtual Memory manager. Setting it to 100 tells the kernel to swap aggressively to disk, while setting it to 0 directs the kernel to swap only to avoid out-of-memory problems. The default on Linux is usually around 60, which is not appropriate for database systems. In my own tests, a value between 0 and 10 is optimal. You can set this value in /etc/sysctl.conf:
vm.swappiness = 5
You can then check this value by running the command
$ sysctl vm.swappiness
To apply these changes, run the command /sbin/sysctl -p as root, or reboot your system.