MongoDB Schema Planning Tips
One of the most advertised features of MongoDB is that it is “schemaless”: MongoDB does not impose any fixed schema on the documents stored in a collection. Under the hood, documents are stored in BSON, a binary JSON-like format, so each document in a collection can have a different structure. This is convenient in the early stages of development, but later on you may want to enforce schema validation on newly inserted documents for better performance and scalability. In short, “schemaless” doesn’t mean you don’t need to design your schema. In this article, I will discuss some general tips for planning your MongoDB schema.
Figuring out the schema design that best suits your application can be tedious. Here are some points to consider while designing your schema.
Avoid Growing Documents
If your schema allows documents to grow in size continuously, you should take steps to avoid this, because it degrades database and disk I/O performance. MongoDB limits each document to 16 MB. If a document keeps growing toward that limit over time, it is a sign of bad schema design, and queries against it can eventually fail. You can use document bucketing or document pre-allocation techniques to avoid this situation. If your application genuinely needs to store payloads larger than 16 MB, consider using MongoDB’s GridFS API.
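The bucketing idea can be sketched as follows. This is a minimal illustration, not production code: the sensor-readings use case, the field names, and the bucket size of 100 are all assumptions chosen for the example. Instead of pushing every reading onto one ever-growing array, readings are grouped into fixed-size bucket documents.

```python
# Sketch of the "bucket" pattern for time-series readings (field names and
# bucket size are illustrative). Each bucket document holds at most
# BUCKET_SIZE readings, so no single document grows without bound.

BUCKET_SIZE = 100  # max readings per bucket document

def bucket_id(sensor_id, reading_index):
    """Return the _id of the bucket a given reading belongs to."""
    return f"{sensor_id}:{reading_index // BUCKET_SIZE}"

def make_buckets(sensor_id, readings):
    """Group readings into bucket documents of at most BUCKET_SIZE each."""
    buckets = {}
    for i, value in enumerate(readings):
        _id = bucket_id(sensor_id, i)
        bucket = buckets.setdefault(
            _id, {"_id": _id, "sensor": sensor_id, "readings": []}
        )
        bucket["readings"].append(value)
    return list(buckets.values())

# 250 readings end up in 3 bounded documents instead of 1 unbounded one.
docs = make_buckets("s1", list(range(250)))
```

In a real deployment each bucket would be upserted (for example with pymongo’s `update_one(..., upsert=True)` and `$push`), but the document shape is the point here.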
Avoid Updating Whole Documents
If you replace a whole document, MongoDB may have to rewrite the entire document elsewhere on disk, which can drastically degrade write performance. Instead of replacing the whole document, use update operators (field modifiers) such as $set to update only the fields that changed. This sends less data over the wire and lets the server apply a smaller, cheaper change, improving performance.
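A small sketch of the difference, with the actual pymongo call left as a comment since it needs a live server (the `user_id` filter and field names are illustrative):

```python
# Building a targeted update with the $set operator instead of replacing the
# whole document. With pymongo this would be passed to
#   collection.update_one({"_id": user_id}, update)
# (call shown for context only).

def set_fields(**changes):
    """Build a MongoDB update document that touches only the given fields."""
    return {"$set": dict(changes)}

# Only "email" and "updated_at" travel over the wire; every other field of
# the stored document is left untouched by the server.
update = set_fields(email="alice@example.com", updated_at="2024-01-01")
```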
Try to Avoid Application-Level Joins
MongoDB does not support joins the way relational databases do (the $lookup aggregation stage exists, but it is limited compared to SQL joins). Therefore we often have to fetch data from several collections and join it at the application level. If you are retrieving and joining a large amount of data from multiple collections, you have to call the database several times, and every round trip adds network latency. If your application relies heavily on such joins, denormalizing the schema makes more sense: with embedded documents you can get all the required data in a single query.
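Here is a sketch of such a denormalized document, using a hypothetical orders collection (the field names are assumptions for the example). The line items are embedded with the product fields the order page needs, so one `find_one` call returns everything and no application-level join against a products collection is required.

```python
# Denormalized order document: line items are embedded, so a single query
# (e.g. collection.find_one({"_id": order_id}) in pymongo) returns
# everything needed to render the order. Field names are illustrative.

def build_order(order_id, customer, items):
    """Embed line items (with copies of the product fields the order needs)."""
    return {
        "_id": order_id,
        "customer": customer,
        "items": [
            {"sku": sku, "name": name, "price": price, "qty": qty}
            for (sku, name, price, qty) in items
        ],
        "total": sum(price * qty for (_, _, price, qty) in items),
    }

order = build_order("o1", "alice", [("sku1", "Pen", 2.5, 4), ("sku2", "Pad", 5.0, 1)])
```

The trade-off of this design is duplicated product data: if a product name changes, every order that embeds it keeps the old copy, which is usually acceptable (even desirable) for historical records like orders.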
Use Proper Indexing
Searches and aggregations often sort data. Even if the sort is the last stage of a pipeline, you still need an index to cover it; if there is no index on the sort field, MongoDB is forced to sort in memory. In-memory sorts are limited to 32 MB of total document data, and if MongoDB hits that limit the operation fails with an error.
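The core rule can be sketched in a few lines: a sort can use an index when the sort keys form a prefix of the index key pattern. This is a deliberate simplification of MongoDB’s actual rules (it ignores refinements such as reversed sort directions and equality prefixes), but it captures the idea.

```python
# Simplified check of whether a sort can be served by an index: the sort
# keys must form a prefix of the index key pattern. This ignores some real
# MongoDB refinements (reversed directions, equality prefixes).

def sort_covered_by_index(index_keys, sort_keys):
    """index_keys and sort_keys are lists of (field, direction) pairs."""
    if len(sort_keys) > len(index_keys):
        return False
    return index_keys[: len(sort_keys)] == sort_keys

idx = [("timestamp", 1), ("sensor", 1)]
sort_covered_by_index(idx, [("timestamp", 1)])  # prefix -> index can serve the sort
sort_covered_by_index(idx, [("sensor", 1)])     # not a prefix -> in-memory sort
```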
Having discussed adding indexes, it is also important not to add unnecessary ones. Every index must be maintained: each time you insert or update documents in the collection, all of its indexes are updated too, which degrades write performance. Each index also consumes disk space and memory, so too many indexes can lead to storage problems.
One more way to optimize index usage is overriding the default _id field. Its only purpose is to provide one unique key per document, so if your data already contains a suitable unique value of its own, you can store that value in _id and save one extra index.
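A tiny sketch of the idea, using an assumed countries collection with ISO codes as the natural key. This is only safe when the value is truly unique and immutable, since _id can never be changed after insert.

```python
# Using a natural unique key (an ISO country code here; illustrative) as _id
# avoids maintaining a second unique index alongside the automatic _id index.
# Only do this when the value is truly unique and never changes.

def country_doc(code, name):
    return {"_id": code, "name": name}  # _id doubles as the lookup key

doc = country_doc("DE", "Germany")
# collection.insert_one(doc)  # pymongo call, shown for context only
```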
Read vs. Write Ratio
Schema design hugely depends on whether an application is read-heavy or write-heavy. For example, if you are building a dashboard that displays time-series data, you should design your schema to maximize write throughput. If you are building an e-commerce application, most operations will be reads, since most users spend their time browsing products and catalogs; in that case a denormalized schema reduces the number of database calls needed to fetch the relevant data.
BSON Data Types
Make sure you settle on BSON data types for all fields while designing your schema, and keep them consistent. When a field’s BSON type changes in a way that changes its size, MongoDB may have to rewrite the whole document in a new memory location. For example, if you store (int) 0 in a field that previously held (float) 0.0, the BSON type changes from double to 32-bit integer, and the document may be rewritten at a new address.
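One pragmatic way to keep types consistent is to coerce numeric fields before insert. This is a sketch under assumed field names; the list of float fields is an example, not an API.

```python
# Normalizing numeric fields to float before insert, so a value that is
# sometimes written as 0 and sometimes as 0.0 is always stored as a BSON
# double and the document's on-disk size stays stable. Field names are
# illustrative.

FLOAT_FIELDS = {"balance", "rating"}

def normalize_types(doc):
    """Coerce the declared numeric fields to float, in place."""
    for field in FLOAT_FIELDS & doc.keys():
        doc[field] = float(doc[field])
    return doc

doc = normalize_types({"_id": 1, "balance": 0, "rating": 4})
# doc["balance"] and doc["rating"] are now floats, ready for insert_one(doc)
```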
Conclusion
In a nutshell, it is wise to design a schema for your MongoDB database, as it will only improve the performance of your application. Starting with version 3.2, MongoDB supports document validation, which lets you define which fields are required when inserting a new document. Version 3.6 introduced a more elegant way of enforcing schema validation: JSON Schema validation, which can enforce data types as well as required fields. You can use these approaches to ensure that all documents in a collection follow the same schema.
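As a closing sketch, here is what a $jsonSchema validator (MongoDB 3.6+) looks like. The collection and field names are illustrative; with pymongo the dict would be applied via, for example, `db.create_collection("users", validator=validator)` or a `collMod` command.

```python
# A $jsonSchema validator enforcing required fields and BSON types.
# Applied with e.g. db.create_collection("users", validator=validator)
# in pymongo (call shown for context only). Field names are illustrative.

validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}
```

Once this validator is in place, inserts that omit `name` or `email`, or that store `age` as anything other than a non-negative integer, are rejected by the server.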