The Battle of the NoSQL Databases - Comparing MongoDB & Cassandra

Mani Yangkatisal

Introduction to MongoDB

MongoDB was introduced in 2009 by a company named 10gen, which was later renamed MongoDB Inc. The company is responsible for developing the software and sells the enterprise version of the database. MongoDB Inc. also provides enterprise-grade support around the clock. It is committed to lifetime support, which means customers can stay on whichever version of MongoDB they choose and are still supported whenever they decide to upgrade. It also gives them the opportunity to stay in sync with the security fixes the company releases.

MongoDB is one of the best-known NoSQL databases, and its use has spread widely over the last decade or so, fueled by the explosive growth of web and mobile applications running in the cloud. This new breed of internet-connected applications demands fast, fault-tolerant, and scalable schema-less data storage, which NoSQL databases can offer. MongoDB stores data as JSON-like documents (encoded internally as BSON) that can vary in structure, offering a dynamic, flexible schema. It is designed for high availability and for scalability through auto-sharding. As one of the popular open-source databases in the NoSQL family, it is often used for high-volume data storage. Where a relational database has rows, MongoDB has documents, which do not require a schema to be defined up front because fields are created on the fly. The MongoDB data model can represent hierarchical relationships, arrays, and other more complex structures efficiently.

Introduction to Cassandra

Apache Cassandra is another well-known NoSQL database: a free and open-source, distributed, wide-column store. Cassandra was created by a couple of developers at Facebook and released as an open-source project in 2008. It is now maintained by the Apache Software Foundation, which oversees any further enhancements to the project.

Cassandra is a NoSQL database management system designed to handle large amounts of data across many commodity servers and to provide high availability with no single point of failure. It offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients. Cassandra combines the distribution design of Amazon's Dynamo with the data model of Google's Bigtable.

Similarities between MongoDB and Cassandra

With that brief introduction to the two NoSQL databases, let us review some of the similarities between them:

  • Both MongoDB and Cassandra are NoSQL databases with open-source distributions.
  • Neither database is a drop-in replacement for a traditional RDBMS.
  • Neither database was designed around full ACID compliance (Atomicity, Consistency, Isolation, Durability), the set of properties that guarantee database transactions are processed reliably. MongoDB has since added multi-document transactions (from version 4.0), but neither system offers RDBMS-style ACID guarantees by default.
  • Both databases support sharding (horizontal partitioning).
  • Neither database enforces strict consistency or normalization, two concepts that belong more to the RDBMS world.

MongoDB vs. Cassandra: Features

Both technologies play a vital role in their fields. The similarities above show what MongoDB and Cassandra have in common, while the differences below show what makes each technology unique.

Figure 1 MongoDB vs. Cassandra – 8 Major Factors of Difference

Expressive Data Model

MongoDB provides a rich and expressive data model that is described as 'object-oriented' or 'data-oriented.' This data model can represent almost any data structure in the user's domain: data can have properties and can be nested inside other data to multiple levels. Cassandra follows a more traditional data model, with tables, rows, and columns of specific data types, where each column's type is defined when the table is created. Comparing the two, MongoDB tends to provide the richer data model. The figure below shows typical high-level architectures of both databases in terms of their storage and replication layers.

Figure 2: Architecture diagram MongoDB vs. Cassandra
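
To make the nesting described above concrete, here is a minimal sketch in plain JavaScript. The `address` and `orders` fields and all values are hypothetical, chosen only for illustration, and `flatRow` is just one way a table-style layout might look.

```javascript
// Hypothetical customer document; only custid, branch and status appear
// elsewhere in this article, the remaining fields are assumptions.
const customer = {
  custid: 'appl01',
  branch: 'headquarters',
  status: 'A',
  address: { city: 'Austin', geo: { lat: 30.27, lon: -97.74 } }, // nested two levels deep
  orders: [
    { sku: 'A-100', qty: 2 },
    { sku: 'B-200', qty: 1 }
  ]
};

// Nested values are read with plain property access, without a join.
const city = customer.address.city;
const totalQty = customer.orders.reduce((sum, order) => sum + order.qty, 0);

// A table-style model would usually spread the same data over typed columns.
const flatRow = {
  custid: customer.custid,
  branch: customer.branch,
  status: customer.status,
  city: city
};
```

The whole object can be stored as a single MongoDB document, whereas a table-oriented design generally has to decide up front which of these values become columns.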

High Availability Master Node

MongoDB supports a single master (primary) node in a cluster, which controls a set of slave (secondary) nodes. If the master node goes down, one of the slaves is elected as the new master, a process that can take about 20-30 seconds. During this window the cluster cannot accept writes. Cassandra, by contrast, supports multiple master nodes in a cluster: if one of them goes offline, another node takes its place. In comparison, Cassandra offers higher availability than MongoDB, because the loss of a single node does not interrupt the cluster, which remains available throughout.

Secondary Indexes

MongoDB has the advantage over Cassandra if an application requires secondary indexes along with flexibility in the data model. In MongoDB it is easy to index any property of the data stored in the database, which in turn makes the data easy to query. Cassandra has only cursory support for secondary indexes, which are limited to single columns and equality comparisons.
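
As a hedged illustration, secondary indexes in MongoDB might be declared as follows. The sketch assumes `db` is a mongosh-style database handle with a `customer` collection. The helper name `createCustomerIndexes` and the `address.city` field are hypothetical.

```javascript
// Sketch only: `db` is assumed to behave like the `db` object in mongosh.
function createCustomerIndexes(db) {
  return [
    // Single-field secondary index.
    db.customer.createIndex({ branch: 1 }),
    // Index on a field nested inside a sub-document (hypothetical field).
    db.customer.createIndex({ 'address.city': 1 }),
    // Compound index: status ascending, custage descending.
    db.customer.createIndex({ status: 1, custage: -1 })
  ];
}
```

In an interactive shell the same three `createIndex` calls could simply be typed one by one; wrapping them in a function only makes the example self-contained.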

Write Scalability

MongoDB supports only one master node. Only this master node accepts input (writes), while the remaining nodes are used for output (reads); any data that is to reach the slave nodes must therefore pass through the master node first. Cassandra supports multiple master nodes in a cluster, each of which can accept writes, which makes it better suited to scaling write throughput.
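
As a rough illustration of why a masterless design spreads writes across the cluster, the sketch below assigns each row to a node by hashing its partition key. It is a deliberately simplified toy: `toyHash`, `ownerNode`, and the node names are hypothetical, and real Cassandra uses a token ring with the Murmur3 partitioner plus replication rather than a plain modulo.

```javascript
// Toy string hash, NOT the Murmur3 hash used by Cassandra's default partitioner.
function toyHash(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value within unsigned 32 bits
  }
  return h;
}

// Pick the node responsible for a partition key (simplified: hash modulo node count).
function ownerNode(partitionKey, nodes) {
  return nodes[toyHash(partitionKey) % nodes.length];
}

// Hypothetical three-node cluster.
const nodes = ['node-a', 'node-b', 'node-c'];
const owner = ownerNode('appl01', nodes);
```

Because the placement depends only on the key, any node can work out where a write belongs, so no single master has to funnel all writes.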

Query Language Support

Currently, MongoDB does not support a SQL-like query language; queries in MongoDB are structured as JSON fragments. In contrast, Cassandra has a user-friendly query language known as CQL (Cassandra Query Language), which is easily picked up by developers with prior knowledge of SQL. How do their queries differ?

Selecting records from the customer table:

 Cassandra:

SELECT * FROM customer;

 MongoDB:

db.customer.find()

Inserting records into the customer table:

 Cassandra:

INSERT INTO customer (custid, branch, status) VALUES('appl01', 'headquarters', 'A');

 MongoDB:

db.customer.insertOne({ custid: 'appl01', branch: 'headquarters', status: 'A' })

Updating records in the customer table:

Cassandra:

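-- CQL requires the primary key in the WHERE clause of an UPDATE (custid is assumed to be the key here).
UPDATE customer SET branch = 'headquarters' WHERE custid = 'appl01';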

MongoDB:

db.customer.updateMany({ custage: { $gt: 2 } }, { $set: { branch: 'headquarters' } })

Native Aggregation

MongoDB has a built-in aggregation framework that can run an ETL-style pipeline to transform the data stored in the database; it is well suited to small and medium data volumes. As pipelines grow more complex, they also become harder to debug. Cassandra, by contrast, does not have an integrated aggregation framework and relies on external tools such as Hadoop and Apache Spark. MongoDB therefore has the edge over Cassandra when built-in aggregation is needed.
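
A minimal sketch of what such a pipeline can look like is shown below, together with a plain JavaScript function that computes the same result over an in-memory array. The pipeline uses the standard `$match`, `$group`, and `$sort` stages. The `customersPerBranch` helper and the sample data are hypothetical and exist only to show what each stage does.

```javascript
// Count active customers per branch, largest branch first.
// In mongosh this could be run as: db.customer.aggregate(pipeline)
const pipeline = [
  { $match: { status: 'A' } },
  { $group: { _id: '$branch', customers: { $sum: 1 } } },
  { $sort: { customers: -1 } }
];

// Plain JavaScript equivalent, for illustration only.
function customersPerBranch(docs) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.status !== 'A') continue;                            // $match
    counts.set(doc.branch, (counts.get(doc.branch) || 0) + 1);   // $group with $sum: 1
  }
  return [...counts]
    .map(([branch, customers]) => ({ _id: branch, customers }))
    .sort((a, b) => b.customers - a.customers);                  // $sort descending
}

// Hypothetical sample data.
const sample = [
  { branch: 'headquarters', status: 'A' },
  { branch: 'headquarters', status: 'A' },
  { branch: 'east', status: 'A' },
  { branch: 'east', status: 'D' }
];
const perBranch = customersPerBranch(sample);
```

With Cassandra, a grouping of this kind would typically be pushed to an external engine such as Spark, as noted above.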

Schema-less Model

MongoDB lets the user decide whether any schema is enforced on the database. Each document can have a different structure, and it is up to the program or application to interpret the data. Cassandra, by contrast, does not offer this flexibility: it uses static typing, and the user must define the type of each column up front.
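
The following sketch shows, in plain JavaScript, what "the application interprets the data" can mean in practice. The three documents, the `contacts` and `loyalty` fields, and the `describeCustomer` helper are all hypothetical.

```javascript
// Three documents that could live in the same collection despite different shapes.
const customers = [
  { custid: 'appl01', branch: 'headquarters', status: 'A' },
  { custid: 'appl02', status: 'A', contacts: [{ type: 'email', value: 'ops@example.com' }] },
  { custid: 'appl03', branch: 'east', loyalty: { tier: 'gold' } }
];

// The application supplies defaults for fields a document does not have.
function describeCustomer(doc) {
  const branch = doc.branch ?? 'unassigned';
  const tier = doc.loyalty?.tier ?? 'none';
  return `${doc.custid}: branch=${branch}, tier=${tier}`;
}

const descriptions = customers.map(describeCustomer);
```

The trade-off is that checks a statically typed table performs at write time move into application code like `describeCustomer`.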

Performance Benchmark

Cassandra is generally considered to perform better in applications with heavy data loads, since it supports multiple master nodes in a cluster, whereas MongoDB's single master node limits how far it can scale under such loads. Benchmark results do not all point the same way, however. In tests based on YCSB, the industry-standard benchmark created by Yahoo!, MongoDB was reported to outperform Cassandra in every test executed, in some use cases by as much as 25x. When both databases were tuned for a balance of throughput and durability, MongoDB delivered over 50% greater throughput in mixed workloads and 2.5x greater throughput in read-dominant workloads.

MongoDB also provides the most flexibility in ensuring durability: users can opt into a durability-optimized configuration for specific operations that are deemed critical and for which the additional latency is acceptable. In Cassandra, the equivalent change requires editing a server configuration file and fully restarting the database.
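
In MongoDB, this per-operation choice is usually expressed through a write concern. The sketch below assumes `db` is a mongosh-style database handle. The `insertCustomer` helper and its `critical` flag are hypothetical names used only for illustration.

```javascript
// Sketch only: choose the write concern per operation instead of per server.
function insertCustomer(db, doc, critical) {
  const writeConcern = critical
    ? { w: 'majority', j: true } // wait for a majority of members and the on-disk journal
    : { w: 1 };                  // acknowledge once the primary has applied the write
  return db.customer.insertOne(doc, { writeConcern });
}
```

A call such as `insertCustomer(db, order, true)` would accept the extra latency for a critical write, while routine writes keep the faster setting.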

Conclusion

MongoDB is best known for workloads with lots of highly unstructured data. Depending on the scale and types of data you will be working with, MongoDB's flexible data structures may suit you better than Cassandra. To use MongoDB effectively, you will have to be able to live with the possibility of some downtime if the master node fails, as well as with limited write speeds. And don't forget, you will also have to learn a new query language. In MongoDB, complex data can be managed easily thanks to its JSON-style document support, which is a key differentiator when you compare it with Cassandra. Cassandra, in turn, can be the best database to implement in situations that involve large amounts of data, speed optimization, and query execution. Comparing Cassandra and MongoDB, we find that each has its respective advantages, depending on the implementation requirements and the volume of data to be dealt with.
