What's New in MongoDB 4.2

Onyancha Brian Henry

Database updates come with improved features for performance, security, and with new integrated features. It is always advisable to test a new version before deploying it into production, just to ensure that it suits your needs and there is no possibility of crashes. 

Considering many products, those preceding the first minor versions of a new major version have the most important fixes. For instance I would prefer to have MongoDB version 4.2.1 in production a few days after release than I would for version 4.2.0. 

In this blog we are going to discuss what has been included and what improvements have been made to MongoDB version 4.2

What’s New in MongoDB 4.2

  1. Distributed transactions
  2. Wildcard indexes
  3. Retryable reads and writes
  4. Automatic Client-Side Field-level encryption.
  5. Improved query language for expressive updates
  6. On-demand materialized Views
  7. Modern maintenance operations

Distributed Transactions

Transactions are important database features that ensure data consistency and integrity especially those that guarantees the ACID procedures. MongoDB version 4.2  now supports multi-document transactions on replica sets and a sharded cluster through the distributed transaction approach. The same syntax for using transactions has been maintained as the previous 4.0 version.

However, the client driver specs have changed a bit hence if one intends to use transactions in MongoDB 4.2, you must  upgrade the drivers to versions that are compatible with 4.2 servers.

This version does not limit the size of a transaction in terms of memory usage but only dependent on the size of your hardware and the hardware handling capability.

Global cluster locale reassignment is now possible with version 4.2. This is to say, for a geo zone sharding implementation, if a user residing in region A moves to region B, by changing the value of their location field, the data can be automatically moved from region A to B through a transaction.

The sharding system now allows one to change a shard key contrary to the previous version. Literally, when a shard key is changed, it is an equivalent to moving the document to another shard. In this version, MongoDB wraps this update and if the document needs to be moved from one shard to another, the update will be executed inside a transaction in a background manner.

Using transactions is not an advisable approach since they degrade database performance especially if they occur multiple times. During a transaction period,  there is a stretched window for operations that may cause conflicts when making writes to a document that is affected. As much as a transaction can be retried, there might be an update made to the document before this retry, and whenever the retry happens, it may deal with the old rather than the latest document version. Retries obviously exert more processing cost besides increasing the application downtime through growing latency. 

A good practice around using transactions include:

  1. Avoid using unindexed queries inside a transaction as a way of ensuring the Op will not be slow.
  2. Your transaction should involve a few documents.

With the MongoDB dynamic schema format and embedding feature, you can opt to put all fields in the same collection to avoid the need to use transaction as a first measure.

Wildcard Indexes

Wildcard indexes were introduced in MongoDB version 4.2 to enhance queries against arbitrary fields or fields whose names are not known in advance, by indexing the entire document or subdocument.They are not intended to replace the workload based indexes but suit working with data involving polymorphic pattern. A polymorphic pattern is where all documents in a collection are similar but not having an identical structure. Polymorphic data patterns can be generated from application involving, product catalogs or social data. Below is an example of Polymorphic collection data

{

Sport: ‘Chess’,

playerName: ‘John Mah’,

Career_earning: {amount: NumberDecimal(“3000”), currency: “EUR”},

gamesPlayed:25,

career_titles:10

},

{

Sport: Tennis,

playerName: ‘Semenya Jones,

Career_earning: {amount: NumberDecimal(“34545”), currency: “USD”},

Event: {

name:”Olympics”,

career_titles:10,

career_tournaments:14

}

By indexing the entire document using Wildcard indexes, you can make a query using any arbitrary field as an index.

To create a Wildcard index

$db.collection.createIndex({“fieldA.$**”: 1})

If the selected field is a nested document or an array, the Wildcard index recurses into the document and stores the value for all the fields in the document or array.

Retryable Reads and Writes

Normally a database may incur some frequent network transient outages that may result in a query being partially or fully unexecuted. These network errors may not be that serious hence offer a chance for a retry of these queries once reconnected. Starting with MongoDB 4.2, the retry configuration is enabled by default. The MongoDB drivers can retry failed reads and writes for certain transactions whenever they encounter minor network errors or rather when they are unable to find some healthy primary in the sharded cluster/ replica set. However, if you don’t want the retryable writes, you can explicitly disable them in your configurations but I don’t find a compelling reason why one should disable them.

This feature is to ensure that in any case MongoDB infrastructure changes, the application code shouldn’t be affected. Regarding an example explained by Eliot Horowitz, the Co-founder of MongoDB, for a webpage that does 20 different database operations, instead of reloading the entire thing or having to wrap the entire web page in some sort of loop, the driver under the covers can just decide to retry the operation. Whenever a write fails, it will retry automatically and will have a contract with the server to guarantee that every write happens only once.

The retryable writes only makes a single retry attempt which helps to address replica set elections and  transient network errors but not for persistent network errors.

Retryable writes do not address instances where the failover period exceeds serverSelectionTimoutMs value in the parameter configurations.

With this MongoDB version, one can update document shard key values (except if the shardkey is the immutable _id field) by issuing a single-document findAndModify/update operations either in a transaction or as a retryable write.

MongoDB version 4.2 can now retry a single-document upsert operation (i.e upsert: true and multi: false) which may have failed because of duplicate key error if the operation meets these key conditions:

  1. Target collection contains a unique index that made the duplicate key error.
  2. The update operation will not modify any of the fields in the query predicate.
  3. The update match condition is either a single equality predicate {field: “value”} or a logical AND of equality predicates {filed: “value”, field0: “value0”}
  4. Set of fields in the unique index key pattern matches the set of fields in the update query predicate.

Automatic Client-Side Field-level Encryption

MongoDB version 4.2 comes with the Automatic Client-Side Field Level encryption (CSFLE), a feature that allows  developers to selectively encrypt individual fields of a document on the client side before it is sent to the server. The encrypted data is thus kept private from the providers hosting the database and any user that may have direct access to the database.

Only applications with the access to the correct encryption keys can decrypt and read the protected data. In case the encryption key is deleted, all data that was encrypted will be rendered unreadable.

Note: this feature is only available with MongoDB enterprise only.

Improved query language for expressive updates

MongoDB version 4.2 provides a richer query language than its predecessors. It now supports aggregations and modern use-case operations along the lines of geo-based searches, graph search and text search. It has integrated a third-party search engine which makes searches faster considering that the search engine is running on a different process/server. This generally improves on database performance contrary to if all searches were made to the mongod process which would rather make the database operation latency volatile whenever the search engine reindexes.

With this version, you can now handle arrays, do sums and other maths operations directly through an update statement.

On-Demand Materialized Views

The data aggregation pipeline framework in MongoDB is a great feature with different stages of transforming a document to some desired state. The MongoDB version 4.2 introduces a new stage $merge which for me I will say it saved me some time working with the final output that needed to be stored in a collection. Initially, the $out stage allows creating a new collection based on aggregation and populates the collection with the results obtained. If the collection already exist, it would overwrite the collection with the new results contrary to the $merge stage which only incorporates the pipeline results into an existing output rather than fully replacing the collection. Regenerating an entire collection everytime with the $out stage consumes a lot of CPU and IO which may degrade the database performance. The output content will therefore be timely updated enabling users to create on-demand materialized views

Modern Maintenance Operations

Developers can now have a great operational experience with MongoDB version 4.2 with integrated features that enhance high availability, cloud managed backup strategy,  improve the monitoring power and alerting systems. MongoDB Atlas and MongoDB Ops Manager are the providing platforms for these features. The latter has been labeled as the best for running MongoDB on-enterprise. It has also been integrated with Kubernetes operator for on-premise users who are moving to private cloud. This interface enables one to directly control Ops Manager.

There are some internal changes made to MongoDB version 4.2 which include:

  1. Listing open cursors.
  2. Removal of the MMAPv1 storage engine.
  3. Improvement on the WiredTiger data file repair.
  4. Diagnostic fields can now have queryHash
  5. Auto-splitting thread for mongos nodes has been removed.

Conclusion

MongoDB version 4.2 comes with some improvements along the lines of security and database performance. It has included an Automatic Client-Side Field Level Encryption that ensures data is protected from the client angle. More features like a Third party search engine and inclusion of the $merge stage in the aggregation framework render some improvement in the database performance. Before putting this version in production, please ensure that all your needs are fully addressed.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.