Severalnines Blog
The automation and management blog for open source databases

An Overview of WiredTiger Storage Engine for MongoDB

Every database system has a component responsible for how data is stored and served, both in memory and on disk. This component is referred to as the storage engine. When evaluating the architecture of an operational database, developers tend to focus on first-hand factors such as data modeling, latency, throughput, data consistency, ease of scaling, and fault tolerance. Delivering on those factors efficiently, however, also requires a detailed understanding of the underlying storage engine so that it can be tuned properly.

A simple request cycle between an application and a database system is illustrated below.

Example of common application architecture

WiredTiger Storage Engine

MongoDB has mainly supported three storage engines, whose performance differs depending on the workload. The storage engines are:

  1. WiredTiger Storage Engine
  2. In-Memory Storage Engine
  3. MMAPv1 Storage Engine (deprecated in MongoDB 4.0 and removed in 4.2)

The WiredTiger storage engine can be configured either as a B-Tree based engine or as a Log Structured Merge Tree based engine.

B-Tree Based Engine

The B-Tree is one of the oldest storage engine designs, and many more sophisticated engines are derived from it. It is a self-balancing tree data structure that keeps data sorted and supports searches, sequential access, insertions, and deletions in logarithmic time. Storage is row-based: each row is treated as a single record in the database.

Merits of a B-Tree Storage Engine

  • High throughput and low latency reads. B-Trees have a tendency to grow wide and shallow, so very few nodes are traversed per lookup.
  • Keeps keys in sorted order for sequential traversal, and indexes are kept balanced with a recursive algorithm.
  • The interior storage nodes are always kept at least half full, which in general reduces wasted space.
  • Easy to handle a large number of insertions and deletions within a short duration.
  • Hierarchical indexing is employed with the aim of reducing disk reads.
  • Speeds up insertions and deletions through usage of partially full blocks.
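
The "wide and shallow" property can be made concrete with a little arithmetic. The sketch below is an illustration only: the function name `btree_levels` and the fanout values are assumptions chosen for the example, not WiredTiger settings, and it assumes every node is completely full.

```python
def btree_levels(num_keys: int, fanout: int) -> int:
    """Levels needed to address num_keys keys when every node has `fanout`
    children. Simplification: real nodes are only guaranteed to be at
    least half full, so real trees can be slightly deeper."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("need at least one key and a fanout of 2 or more")
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout  # each extra level multiplies the reachable keys
        levels += 1
    return levels


# One billion keys: a binary tree needs 30 levels, a wide tree only a handful.
for fanout in (2, 100, 1000):
    print(fanout, btree_levels(1_000_000_000, fanout))
# 2 30
# 100 5
# 1000 3
```

Each level costs at most one node read, so a higher fanout translates directly into fewer disk reads per lookup.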

Limitations of a B-Tree storage engine

  • Poor write performance, because keeping the data structure well ordered requires random writes, which are more expensive than sequential writes to storage.
  • Read-modify-write penalty: an entire block must be read, modified, and written back even for a minor update to a single row in that block.

Log Structured Merge Tree Based Engine

Because of the poor write performance of the B-Tree based engine, developers needed a way for a DBMS to cope with larger, write-heavy datasets. The Log Structured Merge Tree (LSM Tree) based engine was therefore introduced to improve performance for indexed access to files with a high write volume over an extended period. In this design, random writes are first absorbed by an in-memory component and later turned into sequential writes to the first disk-based component.
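
The following is a minimal sketch of that idea, not WiredTiger's implementation. The class `ToyLSM`, its method names, and the `memtable_limit` parameter are hypothetical, and plain Python lists stand in for the sorted files a real engine would write sequentially to disk.

```python
import bisect


class ToyLSM:
    """Toy LSM tree: random writes land in an in-memory table, which is
    flushed as one sorted run (standing in for a sequential disk write)."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # in-memory component, absorbs random writes
        self.runs = []       # sorted runs, oldest first, newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            # Sorting once and appending models a single sequential write.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        merged = {}
        for run in self.runs:            # oldest first, newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = ToyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 10), ("c", 3), ("d", 4)]:
    db.put(key, value)
print(db.get("a"), len(db.runs))  # 10 2
db.compact()
print(db.get("a"), len(db.runs))  # 10 1
```

Note that a read may have to inspect several runs before finding a key, which is the read amplification that compaction is meant to limit.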

Merits of an LSM Tree Storage Engine

  • Fast sequential writes make it well suited to large, fast-growing datasets.
  • Well suited for tiered storage, giving organizations a better choice in terms of cost and performance. Flash-based SSDs provide great performance in this case.
  • Better compression and storage efficiency, which saves storage space and lets storage be filled almost to capacity.
  • Data is always available for query immediately.
  • Insertions are very fast.

Limitations of an LSM Tree storage engine

LSM Trees consume more memory than B-Trees during read operations because of read and space amplification. However, approaches such as Bloom filters mitigate this effect in practice by reducing the number of files that must be checked during a point query.
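
A Bloom filter can be sketched in a few lines. The version below is a simplified illustration: the class name, the default sizes, and the use of SHA-256 as the hash function are assumptions made for the example, not what any particular engine uses. The key property is that a "no" answer is always correct, so a file whose filter says "no" can be skipped during a point query.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter()
for key in ("user:1", "user:2"):
    bf.add(key)
print(bf.might_contain("user:1"))  # True: added keys are never reported absent
```

In an LSM engine, one such filter would typically be kept per on-disk file, so only files that answer "possibly present" need to be opened.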

WiredTiger was designed to draw on the advantages of both B-Tree and LSM Tree designs, making it sophisticated and, for most workloads, the best storage engine for MongoDB. It is MongoDB’s default storage engine.

WiredTiger Storage Engine Architecture

As mentioned above, WiredTiger draws on two basic storage engine designs, the B-Tree and the LSM Tree. It is also a multiversion concurrency control (MVCC) storage engine, so an operation sees a snapshot of the database as of the time it accesses a collection. Checkpoints are taken periodically so that a consistent view of the data is recorded to disk at each checkpoint. After a crash, the database can be recovered from the last checkpoint, and changes made since that checkpoint can be recovered from the on-disk journal files.
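
The snapshot behaviour of MVCC can be illustrated with a toy versioned store. This is a conceptual sketch only: `ToyMVCCStore` and its methods are hypothetical names, and a simple counter stands in for real transaction timestamps.

```python
class ToyMVCCStore:
    """Keeps every committed version of a key so that a reader can keep
    seeing the data as it was when the reader's snapshot was taken."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending
        self.clock = 0      # logical commit timestamp

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A snapshot is just the timestamp of the latest commit it may see.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = ToyMVCCStore()
store.write("doc", {"qty": 1})
snap = store.snapshot()
store.write("doc", {"qty": 2})           # a later writer does not block readers
print(store.read("doc", snap))              # {'qty': 1}
print(store.read("doc", store.snapshot()))  # {'qty': 2}
```

Because writers add new versions instead of overwriting old ones, readers and writers do not have to wait for each other.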

WiredTiger makes extensive use of cache rather than disk to keep latency low. It relies heavily on the OS page cache, so compressed data can often be fetched without going to disk. The least recently used data is evicted from RAM to free up space in the cache.

The B-Tree storage concept offers highly efficient reads and good write performance with low CPU utilization. WiredTiger also implements document-level locking, which enables highly concurrent workloads and lets the server take advantage of multi-core CPUs. Together, these properties improve the scalability of the database.

The Enterprise edition supports on-disk encryption for the WiredTiger storage engine, a feature that greatly improves data security.

The WiredTiger storage engine uses write-ahead logging, which makes writes durable and enables automatic crash recovery.
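
The principle behind write-ahead logging is that a change is appended to a log on disk before it is applied in memory, so the log can be replayed after a crash. The sketch below is a toy illustration under that assumption: `ToyJournaledStore`, the JSON-lines log format, and the file name are hypothetical and unrelated to WiredTiger's actual journal format.

```python
import json
import os
import tempfile


class ToyJournaledStore:
    """Appends each write to a journal file before applying it in memory."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()  # crash recovery: rebuild state from the journal
        self.journal = open(journal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # a torn final record means the write never completed
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Log first and force it to disk. 2. Only then apply in memory.
        self.journal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.journal.flush()
        os.fsync(self.journal.fileno())
        self.data[key] = value

    def close(self):
        self.journal.close()


path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = ToyJournaledStore(path)
store.put("a", 1)
store.put("a", 2)
store.close()                         # the in-memory dict is now "lost"
recovered = ToyJournaledStore(path)   # replaying the journal restores it
print(recovered.data)                 # {'a': 2}
recovered.close()
```

A real engine pairs the log with checkpoints so that only the records written since the last checkpoint need to be replayed.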

Advantages of the WiredTiger Storage Engine

  • Efficient storage thanks to multiple compression technologies such as snappy, gzip, and prefix compression.
  • It is highly scalable with concurrent reads and writes, which improves throughput and general database performance.
  • Ensures data durability through the write-ahead log and checkpoints.
  • Optimal memory usage. WiredTiger uses both its internal cache and the filesystem cache.
  • With the filesystem cache, MongoDB can easily use the free memory that is not used by the WiredTiger cache.
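
To give a feel for the compression techniques listed above, the sketch below shows a simplified form of prefix compression on sorted keys, followed by general-purpose block compression using Python's `zlib` module (the DEFLATE algorithm that gzip is built on). The function names and sample data are assumptions for illustration, and WiredTiger's on-disk formats differ in detail.

```python
import os
import zlib


def prefix_compress(sorted_keys):
    """Store each key as (length of prefix shared with previous key, suffix)."""
    out, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decompress(entries):
    keys, prev = [], ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["user:1001", "user:1002", "user:1010", "video:7"]
packed = prefix_compress(keys)
print(packed)
# [(0, 'user:1001'), (8, '2'), (7, '10'), (0, 'video:7')]

# Block compression: repetitive documents shrink considerably.
block = b'{"status": "active", "plan": "free"}' * 100
compressed = zlib.compress(block)
print(len(compressed) < len(block))  # True
```

Prefix compression pays off because keys in an index are sorted and neighbours tend to share long prefixes, while block compression exploits repetition in the stored data itself.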

WiredTiger Storage Engine Setbacks

Updating data is expensive. The concurrency scheme prevents in-place updates, so updating a single field value in a document rewrites the entire document.

Conclusion

The WiredTiger storage engine integrates concepts from two major storage engine designs, the B-Tree and the LSM Tree, to achieve strong all-round performance. Combining the advantages of both makes WiredTiger a general-purpose storage engine, and for this reason it is the default storage engine in current versions of MongoDB. Unless you have a strong reason to avoid it, it is the best choice for your data. The right storage engine still depends on your use case, so consider the alternatives where WiredTiger cannot meet your expectations.