NoSQL (“not only SQL”) is an approach to database design that enables the storage and querying of data outside the traditional structures found in relational databases. It was created primarily to deal with unstructured data generated from numerous sources such as documents, audio, video, and social networks. NoSQL databases are well suited to modern applications where data models evolve and scalability is essential, and they have gained popularity in recent years because companies now have to deal with more unstructured data than ever before. Rather than spreading data across relational tables, a NoSQL model lets related data be kept together in a single data structure. NoSQL databases are commonly divided into four categories: document databases, key-value stores, wide-column stores, and graph databases.
NoSQL databases are often used in agile projects because they offer flexible data models. This allows developers to focus on business logic and algorithms instead of dealing with schema updates. If you anticipate that your application’s data model needs to remain flexible to accommodate changes over time, the flexible schema approach of NoSQL databases may be a fit for your needs.
According to DB-Engines, the top two NoSQL databases (July 2021) are MongoDB (ranked 5th overall) and Redis (ranked 6th). Interestingly, neither had been released before 2009. How they came into existence, gained traction and popularity, and changed the landscape of database management systems is the primary focus of this blog post.
MongoDB is an open-source, document-oriented database whose initial release came in February 2009. Document databases contrast strongly with traditional relational databases. They store all information for a given object in a single document, and every stored document can differ in structure from every other. This removes much of the need for object-relational mapping and allows a schemaless structure, giving application developers the agility to evolve their applications quickly thanks to the flexible data model. Rather than fitting an application to meet schema requirements, developers write the application and the schema follows.
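To make the contrast concrete, here is a minimal Python sketch that uses plain dictionaries (no MongoDB driver) to fold relational-style rows into a single document. The user/address data and the `to_document` helper are hypothetical illustrations of two properties described above: all information about a user lives in one document, and documents in the same collection can have different fields.

```python
import json

# Hypothetical relational-style rows: one row per user, plus one row per
# address in a separate table linked by user_id.
user_rows = [{"id": 1, "name": "Alice"}]
address_rows = [
    {"user_id": 1, "city": "Lisbon", "type": "home"},
    {"user_id": 1, "city": "Porto", "type": "work"},
]

def to_document(user, addresses):
    """Fold a user row and its related address rows into one document,
    so reading the user no longer requires a join."""
    return {
        "_id": user["id"],
        "name": user["name"],
        "addresses": [
            {"city": a["city"], "type": a["type"]}
            for a in addresses
            if a["user_id"] == user["id"]
        ],
    }

# A document-style "collection": each document is self-contained, and
# documents in the same collection need not share the same fields.
collection = [
    to_document(user_rows[0], address_rows),
    {"_id": 2, "name": "Bob", "phone": "+1-555-0100"},  # no addresses, extra field
]

for doc in collection:
    print(json.dumps(doc))
```

In a relational design, reading Alice with her addresses would take a join across two tables; here a single lookup returns the whole object.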
MongoDB is very popular due to its flexibility, ease of learning, and low cost of ownership to get started. Application developers love MongoDB because they can change the data model on the go, and because MongoDB records data as JSON-like documents (stored internally in a binary form called BSON). JSON is everywhere and can be considered the de facto format for sending rich data between web applications and endpoints. Its simple design and flexibility make it easy to read and understand and, in most cases, easy to manipulate in the programming language of your choice.
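As a small illustration of changing the data model on the go, the sketch below (plain Python `json`, no MongoDB driver) reads documents written by two hypothetical versions of an application. The `tags` field and the `load_post` helper are assumptions for the example: older documents simply lack the field, and the reading code supplies a default instead of requiring a schema migration.

```python
import json

# Two stored JSON documents from different (hypothetical) app versions:
# v2 started recording "tags"; the v1 document was never migrated.
stored = [
    '{"_id": 1, "title": "Intro to NoSQL"}',
    '{"_id": 2, "title": "Redis basics", "tags": ["redis", "cache"]}',
]

def load_post(raw):
    """Parse a stored JSON document and fill in defaults for fields that
    older documents may lack, so no up-front schema migration is needed."""
    post = json.loads(raw)
    post.setdefault("tags", [])
    return post

posts = [load_post(raw) for raw in stored]
for p in posts:
    print(p["_id"], p["title"], p["tags"])
```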
MongoDB was created by 10gen (later renamed MongoDB Inc.), which began developing it in 2007 and shipped its first GA release in February 2009. Since then, MongoDB has evolved rapidly and is considered one of the most exciting database projects for modern applications. According to the Stack Overflow Developer Survey 2020, MongoDB remains the database technology that developers most want to learn. At the time of this writing, MongoDB Inc. had just released version 5.0 (July 13th, 2021), which comes with many notable features suited to multi-cloud environments, such as live resharding, native time-series data support, and the Versioned API.
Another significant strength of MongoDB is its built-in support for high availability and scale-out: replica sets provide replication and automatic failover, while sharding lets it scale horizontally, spreading the workload across multiple servers so your deployment can grow with your business. Replication uses a homegrown consensus protocol that draws inspiration from Raft, and a sharded cluster distributes data across shards, with its query router, mongos, directing each operation to the shards that hold the relevant data. You can use ClusterControl to deploy a MongoDB replica set or sharded cluster with ease.
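To illustrate how a router can direct operations in a range-sharded cluster, here is a simplified Python sketch. The split points and shard names are hypothetical, and the code only models the idea behind mongos routing (each chunk covers a half-open range of shard-key values and lives on exactly one shard); it is not how mongos is actually implemented.

```python
from bisect import bisect_right

# Hypothetical routing table: chunk boundaries (split points) in ascending
# order, and the shard that owns each resulting chunk. With 3 split points
# there are 4 chunks: [MinKey, "g"), ["g", "n"), ["n", "t"), ["t", MaxKey).
SPLIT_POINTS = ["g", "n", "t"]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]

def chunk_index(shard_key_value):
    """Index of the chunk whose half-open range contains the value."""
    return bisect_right(SPLIT_POINTS, shard_key_value)

def route(shard_key_value):
    """Shard that holds the document with this shard-key value."""
    return CHUNK_OWNERS[chunk_index(shard_key_value)]

def route_range(low, high):
    """Shards a range query on the shard key (low..high, inclusive) must visit."""
    first, last = chunk_index(low), chunk_index(high)
    return sorted(set(CHUNK_OWNERS[first:last + 1]))

print(route("alice"))         # shard0
print(route("g"))             # shard1: "g" starts the ["g", "n") chunk
print(route_range("h", "p"))  # ['shard1', 'shard2']
```

A query that includes the shard key can be sent to a single shard, and a range query only touches the shards owning the overlapping chunks; a query without the shard key has to be broadcast to all shards.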
MongoDB also drew public attention and criticism for its weak default security configuration, which allowed anyone to gain full access to the database. Data from tens of thousands of MongoDB installations was stolen, and many MongoDB servers were held for ransom. This exposure led us to write a handful of security-related blog posts about MongoDB, such as “Secure MongoDB and Protect Yourself From the Ransom Hack” and “How to Secure MongoDB From Ransomware – Ten Tips”. MongoDB has since made its default configuration more secure in MongoDB 3 and later.
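The exposed instances were typically reachable from the network with access control disabled. The hypothetical helper below sketches a quick audit of a mongod configuration, represented as a Python dict that mirrors the YAML config file layout. `net.bindIp`, `net.bindIpAll`, and `security.authorization` are real mongod configuration options, but the helper itself is only an illustration, not a complete hardening checklist.

```python
def audit_mongod_config(cfg):
    """Hypothetical audit helper. `cfg` mirrors the layout of a mongod YAML
    config file (e.g. after loading it with a YAML parser). Returns a list of
    human-readable findings; an empty list means these two checks passed."""
    net = cfg.get("net", {})
    security = cfg.get("security", {})
    findings = []

    # Listening on every interface exposes the instance to the network.
    bind_ips = [ip.strip() for ip in str(net.get("bindIp", "")).split(",")]
    if net.get("bindIpAll") or "0.0.0.0" in bind_ips:
        findings.append("mongod listens on all interfaces; restrict net.bindIp")

    # Without access control, any client that can connect has full access.
    if security.get("authorization") != "enabled":
        findings.append("access control is off; set security.authorization: enabled")

    return findings

insecure = {"net": {"bindIp": "0.0.0.0"}}
hardened = {
    "net": {"bindIp": "127.0.0.1,10.0.0.5"},
    "security": {"authorization": "enabled"},
}
print(audit_mongod_config(insecure))  # two findings
print(audit_mongod_config(hardened))  # []
```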
Large companies such as Forbes, Toyota, SAP, Cisco, eBay and Adobe rely heavily on MongoDB as their data store. MongoDB is considered a game-changer in the database world and has become one of the most important database platforms of the internet era.
ClusterControl has supported MongoDB since July 2013 (v1.2.3) and has continuously improved that support since then. ClusterControl even supported TokuMX (MongoDB with Tokutek’s fractal tree indexes) back then, before that support was deprecated with MongoDB 3 due to core design changes upstream. Among recent notable enhancements, ClusterControl introduced support for Percona Backup for MongoDB, a distributed, low-impact solution for taking consistent backups of MongoDB sharded clusters and replica sets. Percona Backup for MongoDB is derived from, and replaces, mongodb_consistent_backup, which is no longer actively developed or supported.
Redis is another highly popular NoSQL database technology, focused on frequent, high-speed access to the same chunks of data, even when those chunks are large. In May 2009, Salvatore Sanfilippo released the initial version of Redis (short for REmote DIctionary Server), and it caught everyone’s attention because it offered richer features than Memcached, the established open-source in-memory solution at that time.
Redis is very fast because it keeps its data structures in memory and because it is written in C (performance is also one of the reasons Memcached was rewritten in C). Because of this high performance, developers have turned to Redis for data caching when the volume of read and write operations exceeds what traditional databases can handle. Frequently accessed data can be cached and served from an in-memory key-value store, minimizing reads and writes to the slower, disk-based systems responsible for persistent storage.
Traditionally, database management systems are designed to provide robust data functionality rather than speed at scale. An application cache is therefore often used to store copies of lookup tables and the results of expensive queries to the DBMS, both to improve the application’s performance and to reduce the load on the data source. Sometimes an application’s workflow requires generating resource-intensive results that can later be reused, such as partial aggregates. A cache is an ideal intermediate medium for retaining such results between requests, and this is where Redis shines.
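The pattern described above is commonly called cache-aside. Below is a minimal Python sketch of it; to keep it self-contained, a small in-process `TTLCache` class stands in for Redis, and `expensive_report` is a hypothetical slow query. With a real client such as redis-py, the cache calls would roughly become `r.get(key)` and `r.set(key, value, ex=ttl_seconds)`, with the value serialized (for example to JSON) first.

```python
import time

class TTLCache:
    """In-process stand-in for a Redis cache (NOT a Redis client): each entry
    expires after a TTL, similar in spirit to SET key value EX seconds."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the expired entry on access
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

def expensive_report(region):
    """Hypothetical slow aggregate query against the primary database."""
    expensive_report.calls += 1
    return {"region": region, "total_sales": 1234}

expensive_report.calls = 0

def get_report(cache, region, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the database on a miss,
    then store the result so later requests are served from memory."""
    key = f"report:{region}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = expensive_report(region)
    cache.set(key, result, ttl_seconds)
    return result

cache = TTLCache()
get_report(cache, "eu")
get_report(cache, "eu")
print(expensive_report.calls)  # 1: the second call was a cache hit
```

The TTL bounds how stale a cached result can get; choosing it is a trade-off between freshness and how much load the cache takes off the database.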
Redis has evolved from a very fast, simple key-value store into persistent data storage that is also used as a message broker and queuing system. It enables truly stateless application processes while reducing duplicated data and repeated requests to external data sources. According to the Stack Overflow Developer Survey 2020, Redis remains the database developers most want to continue working with. At the time of this writing, Redis 6 is the latest version, adding a new, more sophisticated user-based ACL implementation, built-in SSL/TLS encryption of traffic, and multi-threaded I/O, although commands are still executed in a single thread.
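Redis lists are a common way to build the simple queues mentioned above: producers push jobs with LPUSH and consumers take them with RPOP, or BRPOP to block until a job arrives. The sketch below models those list semantics with a Python `deque` so it runs without a Redis server; `ListQueue` is a hypothetical name, and with redis-py the calls would look like `r.lpush("jobs", job)` and `r.brpop("jobs", timeout=5)`.

```python
from collections import deque

class ListQueue:
    """In-process model of a Redis list used as a FIFO work queue.
    This is NOT a Redis client; it only mimics the list semantics."""

    def __init__(self):
        self._items = deque()

    def lpush(self, *values):
        """Like LPUSH: insert each value at the head, return the new length.
        LPUSH a b c leaves the list as [c, b, a]."""
        for value in values:
            self._items.appendleft(value)
        return len(self._items)

    def rpop(self):
        """Like RPOP: remove and return the tail element, or None if empty.
        (BRPOP would block until an element arrives; this model does not.)"""
        return self._items.pop() if self._items else None

# Producer side: enqueue jobs at the head of the list.
queue = ListQueue()
queue.lpush("job-1", "job-2")
queue.lpush("job-3")

# Consumer side: pop from the tail, so jobs come out in arrival order.
processed = [queue.rpop() for _ in range(4)]
print(processed)  # ['job-1', 'job-2', 'job-3', None]
```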
ClusterControl 1.9.0 supports Redis deployment through our new, next-generation ClusterControl GUI, which is available as a separate installation package. At the time of this writing, we refer to it as ClusterControl v2, tagged as a Technology Preview; it only supports deploying Redis replication setups of up to 5 nodes with Redis Sentinel, along with backup management for AOF and RDB. If you are interested, please refer to this guide on how to install it.
MongoDB and Redis are, hands down, two of the best NoSQL database solutions on the market right now, and they are likely to hold their positions in the top 10 of database rankings for a long time. That is why ClusterControl supports both database technologies.