Preparing a MongoDB Server for Production

Onyancha Brian Henry

After developing your application and database model, and before moving the environment into production, there are a few things that need to be done first. Developers often overlook important MongoDB preparation steps before deploying the database into production. As a result, they run into problems in production that never appeared in development, and by then it may be too late: a lot of data could be lost if disaster strikes. Some of the steps discussed here also let you gauge the database's health so that you can take the necessary measures before disaster strikes.

Use the Current Version and Latest Drivers

Generally, the latest versions of any technology improve on the underlying functionality of their predecessors. MongoDB's latest versions are more robust than their predecessors and improve on them in performance, scalability and memory usage. The same applies to the drivers, which are developed by the core database engineers and are updated even more frequently than the database itself.

Installing the native extensions for your language lays the groundwork for a quick, standard procedure for testing, approving and upgrading new drivers. Automation tools such as Ansible, Puppet, SaltStack and Chef can also be used to upgrade MongoDB on all your nodes without much manual effort or cost.

Also consider using the WiredTiger storage engine, the default since MongoDB 3.2, as it is the most actively developed engine and offers the features expected of a modern database.

Subscribe to a MongoDB mailing list to stay informed about new versions, driver changes and bug fixes.

Use a 64-bit System to Run MongoDB

On 32-bit systems, MongoDB processes are limited to about 2.5GB of data because the database uses memory-mapped files for performance. Processes that exceed this limit can crash. The main impact is that, after such an error, you cannot restart the server until you remove data or migrate the database to a 64-bit system, which means longer downtime for your application.

If you must keep using a 32-bit system, keep your application logic simple to reduce the number of bugs and the latency of high-throughput operations.

However, for more complex workloads such as aggregation pipelines and geospatial data, use a 64-bit system.
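Before installing, it helps to confirm that the target host is 64-bit. Below is a minimal Python sketch that relies on platform.machine(); the KNOWN_64BIT_MACHINES set is an illustrative, non-exhaustive assumption, and both helper names are hypothetical.

```python
import platform

# Names that platform.machine() commonly reports on 64-bit hardware.
# Illustrative assumption only: this set is not exhaustive.
KNOWN_64BIT_MACHINES = {"x86_64", "amd64", "aarch64", "arm64", "ppc64le", "s390x"}


def is_64bit_machine(machine: str) -> bool:
    """True if the machine name looks like a 64-bit architecture (case-insensitive)."""
    return machine.lower() in KNOWN_64BIT_MACHINES


def current_host_is_64bit() -> bool:
    """Check the machine this script is running on."""
    return is_64bit_machine(platform.machine())
```

Run current_host_is_64bit() on each node you plan to deploy to. An unrecognised name does not prove the host is 32-bit, only that it deserves a manual check.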

Ensure Documents are Bounded to 16MB Size

MongoDB documents are limited to 16MB, but you should not get close to this limit because very large documents degrade performance. In practice, most documents are a few KB or less. Document size depends on the data modelling choice between embedding and referencing. Embedding is preferred where the document is not expected to grow much. For instance, in a social media application where users publish posts that receive comments, the best practice is to use two collections: one to hold post information.

  {
    _id: 1,
    post: 'What is in your mind?',
    datePosted: '12-06-2019',
    postedBy: 'xyz',
    likes: 10,
    comments: 30
  }

and the other to hold comments for that post.

  {
    _id: ObjectId('5d00f7a8c2b3a41e9c8b4567'),
    postId: 1,
    dateCommented: '12-06-2019',
    commentedBy: 'ABCD',
    comment: 'When will we get better again'
  }

With this data model, comments are stored in a separate collection from the post. This prevents a post document from growing without bound when the post attracts many comments. Avoid application patterns that allow documents to grow unbounded.
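To make the referencing pattern concrete, here is a minimal Python sketch that uses an in-memory dict and list as stand-ins for the posts and comments collections. The helper names add_comment and comments_for are hypothetical. With PyMongo, the same two steps would be an insert_one on the comments collection and an update_one with $inc on the matching post.

```python
from datetime import date


def add_comment(posts, comments, post_id, author, text, when=None):
    """Store a comment in its own collection and bump the post's counter.

    posts    -- dict of post _id -> post document (stand-in for the posts collection)
    comments -- list of comment documents (stand-in for the comments collection)
    """
    comment = {
        "postId": post_id,  # reference back to the post instead of embedding
        "dateCommented": (when or date.today()).isoformat(),
        "commentedBy": author,
        "comment": text,
    }
    comments.append(comment)
    # Only a fixed-size counter changes on the post, so the post document
    # stays the same size however many comments arrive.
    posts[post_id]["comments"] += 1
    return comment


def comments_for(comments, post_id):
    """Return the comments that reference a post (in MongoDB, back this with an index on postId)."""
    return [c for c in comments if c["postId"] == post_id]
```

After a thousand calls to add_comment, the post still has exactly the same fields; only its comments counter has changed, while the comments list holds the growing data.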

Ensure Working Set Fits in Memory

If the data an operation needs is not in RAM, MongoDB has to fetch it from disk, which causes page faults. Reading from disk increases latency and consequently slows down the whole application. Page faults become frequent when the working set, the data and indexes your application accesses most often, is too large to fit in memory. This may happen because some documents grow without bound or because of a poor sharding strategy. Remedies for page faults include:

  • Ensuring documents stay well within the 16MB limit.
  • Choosing a good sharding strategy with an optimal shard key that limits the number of documents each operation has to touch.
  • Increasing the memory of the MongoDB instance so that it can hold a larger working set.
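One rough way to judge whether the working set fits is to add the index size to the share of data you expect to be read regularly, then compare the total with the memory available. The Python sketch below assumes you take the sizes from collection statistics (for example the size and totalIndexSize fields of collStats) and that you estimate the hot fraction yourself. working_set_fits is a hypothetical helper, and its answer is only a heuristic.

```python
def working_set_fits(index_bytes, data_bytes, hot_fraction, cache_bytes):
    """Rough check: do all indexes plus the frequently read share of data fit in cache?

    index_bytes  -- e.g. totalIndexSize from collection stats
    data_bytes   -- e.g. size (uncompressed data size) from collection stats
    hot_fraction -- assumed share of data read regularly (0.0-1.0); you must estimate this
    cache_bytes  -- memory available for the working set, e.g. the WiredTiger cache size
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    # Indexes are counted in full because most queries traverse them.
    working_set = index_bytes + data_bytes * hot_fraction
    return working_set <= cache_bytes, working_set
```

For example, 2GB of indexes plus 10% of 20GB of data gives roughly a 4GB working set, which would not fit in a 1.5GB cache and would point towards more memory or better sharding.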

Ensure you Have Replica Sets in Place

In the database world, it is unwise to rely on a single database server, since catastrophe may strike. You should also expect the number of users to grow, so you need to ensure high availability of data. Replication is the key approach to high availability in case of failover. MongoDB replica set members can also be distributed geographically, so that users in different locations are served by the nearest member, which reduces request latency.

If the primary node fails, the remaining members elect one of the secondaries as the new primary, so write operations can resume after a brief election instead of the application suffering extended downtime. In fact, some cloud hosting platforms that take replication seriously do not support non-replicated MongoDB deployments in production.
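Elections need votes from a strict majority of voting members, which is why replica sets are usually deployed with an odd number of members. This small Python sketch (the helper names are hypothetical) computes that majority and how many voting members can be lost while a primary can still be elected.

```python
def election_majority(voting_members: int) -> int:
    """Votes a candidate needs to become primary: a strict majority of voting members."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can fail while a primary can still be elected."""
    return voting_members - election_majority(voting_members)
```

For example, three members tolerate one failure, and adding a fourth still tolerates only one; a fifth member raises the tolerance to two.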

Enable Journaling

Although journaling adds some performance overhead, it is important. Journaling provides write-ahead logging: MongoDB records write operations in an on-disk journal, so if the database crashes in the middle of an update, the journaled writes can be replayed when it restarts. Because journaling makes crash recovery straightforward, it should stay enabled; it is on by default in 64-bit builds.
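The write-ahead idea behind journaling can be illustrated with a toy store: each change is appended to a journal before it is applied, and recovery replays the journal. This Python sketch is a simplified illustration only, not how WiredTiger's journal actually works; the ToyStore class is hypothetical.

```python
import json


class ToyStore:
    """A toy key-value store that journals each write before applying it.

    Illustrates write-ahead logging only; not MongoDB's implementation.
    """

    def __init__(self):
        self.journal = []  # stands in for the on-disk journal file
        self.data = {}     # stands in for the data files

    def write(self, key, value, crash_before_apply=False):
        # 1. Record the intended change in the journal first.
        self.journal.append(json.dumps({"key": key, "value": value}))
        if crash_before_apply:
            return  # simulate a crash after journaling but before applying
        # 2. Only then apply the change to the data.
        self.data[key] = value

    def recover(self):
        """Replay the journal so every journaled write is reflected in the data."""
        for entry in self.journal:
            record = json.loads(entry)
            self.data[record["key"]] = record["value"]
```

Because replaying a journaled write simply sets the same value again, running recover() more than once leaves the data unchanged.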

Ensure you Setup a Backup Strategy

Many businesses fail to survive data loss because they had no backup system or a poor one. Before deploying your database into production, make sure you have adopted one of these backup strategies:

  • Mongodump: optimal for small deployments and for producing backups filtered to specific needs.
  • Copying the underlying data files: optimal for large deployments and an efficient approach for taking and restoring full backups.
  • MongoDB Management Service (MMS), now MongoDB Cloud Manager: provides continuous online backup for MongoDB as a fully managed service. Suited to sharded clusters and replica sets.

Backup files should also not be stored with the same hosting provider as the database. Backup Ninja is a service that can be used for this.
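A mongodump backup is often scripted so that each run lands in its own timestamped directory, which can then be copied to storage at a different provider. The Python sketch below only builds the argument list; the helper name, URI and paths are hypothetical. --uri, --out, --db, --gzip and --oplog are standard mongodump options. --oplog, which captures a point-in-time snapshot on replica sets, only works on full dumps, so the sketch rejects it together with --db.

```python
from datetime import datetime, timezone


def build_mongodump_cmd(uri, backup_root, db=None, gzip=True, oplog=False, now=None):
    """Build a mongodump argument list that writes into a timestamped directory."""
    if db and oplog:
        raise ValueError("--oplog only works on full dumps, not with --db")
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    cmd = ["mongodump", f"--uri={uri}", f"--out={backup_root}/{stamp}"]
    if db:
        cmd.append(f"--db={db}")  # limit the dump to one database
    if gzip:
        cmd.append("--gzip")      # compress the output files
    if oplog:
        cmd.append("--oplog")     # include oplog entries for a consistent snapshot
    return cmd
```

On a host where mongodump is installed, pass the list to subprocess.run(cmd, check=True), then ship the new directory off-site.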

Be Prepared for Slow Queries

Slow queries are hard to spot in the development environment because little data is involved. In production, however, you will have many more users and much more data. Slow queries can arise if you did not create indexes or chose a suboptimal index key. You therefore need a way to find out why a query is slow.

One way to do this is to enable the MongoDB database profiler. Although profiling can degrade performance, it helps expose performance issues. Before deploying your database, enable the profiler on the databases holding the collections you suspect might have slow queries, especially collections whose documents involve a lot of embedding.
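The profiler is configured per database. In mongosh, db.setProfilingLevel(1, { slowms: 100 }) records operations slower than 100 ms in that database's system.profile collection. Once entries accumulate, a short script can rank them. The Python sketch below assumes the documents have already been read from system.profile (for instance with a driver's find) and relies only on the ns, op, millis and planSummary fields; summarize_slow_ops is a hypothetical helper.

```python
def summarize_slow_ops(profile_docs, slow_ms=100):
    """Pick out slow operations and collection scans from system.profile documents.

    Returns (slow operations sorted slowest first, slow operations that used a COLLSCAN).
    """
    slow = [d for d in profile_docs if d.get("millis", 0) >= slow_ms]
    # A COLLSCAN plan means no index was used, a common cause of slow queries.
    collscans = [d for d in slow if d.get("planSummary", "").startswith("COLLSCAN")]
    slow.sort(key=lambda d: d["millis"], reverse=True)
    return slow, collscans
```

Operations that show up in the collscans list are the first candidates for a new or better index.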

Connect to a Monitoring Tool

Capacity planning is an essential undertaking in MongoDB, and you also need to know the health of your database at any given time. Connecting your database to a monitoring tool saves time in identifying what needs improving as the database grows. For instance, a graph showing CPU performance degrading as the query load increases tells you to add more hardware resources to your system.

Monitoring tools also offer alerting by email or text message, notifying you of issues before they escalate into a catastrophe. In production, therefore, make sure your database is connected to a monitoring tool.
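As an illustration of the kind of rule a monitoring tool evaluates, the Python sketch below checks connection usage from serverStatus output, which reports connections.current and connections.available. The connection_alerts helper and its 80% threshold are hypothetical examples, not MongoDB defaults.

```python
def connection_alerts(server_status, warn_ratio=0.8):
    """Return alert messages based on serverStatus-style connection counts.

    The warn_ratio threshold is an arbitrary example, not a MongoDB recommendation.
    """
    conns = server_status["connections"]
    total = conns["current"] + conns["available"]  # incoming connections the server allows
    usage = conns["current"] / total if total else 0.0
    alerts = []
    if usage >= warn_ratio:
        alerts.append(f"connections at {usage:.0%} of capacity ({conns['current']}/{total})")
    return alerts
```

A real tool would evaluate many such rules on a schedule and route the messages to email or SMS.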

ClusterControl provides free MongoDB monitoring in the Community Edition.

Implement Security Measures

Database security is another aspect that must be taken seriously. Protect the MongoDB installation in production by working through a pre-production security checklist. Some of the considerations are:

  • Configuring role-based access control
  • Enabling access control and enforcing authentication
  • Encrypting incoming and outgoing connections (TLS/SSL)
  • Limiting network exposure
  • Encrypting and protecting data
  • Tracking access and changes to database configurations

Reduce exposure to injection attacks by running MongoDB with secure configuration options. For example, disable server-side scripting if you do not use server-side JavaScript operations such as mapReduce and $where. Validate collection data, either with MongoDB's built-in JSON Schema validation or with a module such as Mongoose, so that stored documents match the structure you expect.
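Several of these measures map to mongod configuration file options: security.authorization, security.javascriptEnabled, net.bindIp and, from MongoDB 4.2, net.tls.mode. The Python sketch below is a hypothetical pre-deployment check over a configuration already parsed into a nested dict (for example with a YAML parser). It covers only a few checklist items and is no substitute for the full security checklist.

```python
def insecure_settings(config):
    """Flag risky choices in a mongod configuration expressed as a nested dict.

    Keys mirror mongod.conf YAML options; the checks are an illustrative subset.
    """
    security = config.get("security", {})
    net = config.get("net", {})
    findings = []
    if security.get("authorization") != "enabled":
        findings.append("access control is off: set security.authorization: enabled")
    # Server-side JavaScript is enabled unless explicitly turned off.
    if security.get("javascriptEnabled", True):
        findings.append("server-side JavaScript is on: set security.javascriptEnabled: false if unused")
    if "0.0.0.0" in str(net.get("bindIp", "")) or net.get("bindIpAll"):
        findings.append("mongod listens on all interfaces: restrict net.bindIp")
    if net.get("tls", {}).get("mode") != "requireTLS":
        findings.append("connections are not required to use TLS: set net.tls.mode: requireTLS")
    return findings
```

An empty result means only that these particular checks passed.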

Hardware and Software Considerations

MongoDB has few hardware prerequisites because it is designed to run on commodity hardware. The following are the main hardware considerations for MongoDB before deploying into production.

  • Assign adequate RAM and CPU.
  • Use the WiredTiger storage engine. It uses both the filesystem cache and its own internal cache, which improves performance. By default, the WiredTiger internal cache takes the larger of 50% of (RAM - 1GB) or 256MB. For instance, on a system with 4GB of RAM the WiredTiger cache uses 1.5GB (0.5 * (4GB - 1GB) = 1.5GB), while on a system with 1.2GB of RAM, 0.5 * (1.2GB - 1GB) is only about 100MB, so the WiredTiger cache uses the 256MB minimum.
  • NUMA hardware. Running MongoDB on NUMA hardware can cause operational issues, including slow performance and high system process usage, so configure a memory interleave policy (for example, by starting mongod with numactl --interleave=all).
  • Disk and storage system: use solid-state drives (SSDs). MongoDB shows a good price-performance ratio with SATA SSDs.
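The default cache rule above can be written as a one-line function. This is a minimal Python sketch, with a hypothetical function name, that reproduces the two worked examples. Note that the storage.wiredTiger.engineConfig.cacheSizeGB setting overrides this default.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def default_wiredtiger_cache_bytes(ram_bytes):
    """Default WiredTiger internal cache: the larger of 50% of (RAM - 1GB) or 256MB."""
    return int(max(0.5 * (ram_bytes - GB), 256 * MB))
```

With 4GB of RAM this gives 1.5GB, and with 1.2GB of RAM the 256MB floor applies.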

Conclusion

Databases in production are crucial to the smooth running of a business and should be prepared with care. Put procedures in place that help reduce errors or make them easy to find. It is also advisable to set up an alerting system that shows the database's health over time, both for capacity planning and for detecting issues before they escalate into a catastrophe.

 