A key-value database is a type of non-relational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects. Key-value databases are highly partitionable and allow horizontal scaling at scales that other types of databases cannot achieve.
In this blog post, we are going to cover some of the most important criteria that may help us in choosing the best key-value store for our applications. There are basically two types of data storage for key-value stores – in-memory or disk-based. Most of the datastores support both data storage types but commonly only excel only in either one of those.
Many of the key-value databases keep data in-memory (RAM) for fast data caching capabilities. Thanks to the simple data format that gives it its name, a key-value store can be very fast for read and write operations. It doesn’t require a predefined schema structure, only supports most common data types and often uses far less memory or disk space to store the same database.
Primarily, key-value stores are used to improve OLTP performance with the highest possible throughput for many simple requests. Frequently accessed data can be cached and served by in-memory key-value datastores and minimizing reads and writes to slower disk-based systems focusing on persistent storage. In-memory data caching is the best strategy to handle workload dominated by fetching data (reads), especially if you have a variety of data sources.
It is pretty common to see applications’ sessions being handled by the key-value store since they are required to handle lots of small continuous reads and writes, which may be volatile. If the key-value store is configured with redundancy in mind, often we see it is being utilized as a centralized session’s repository and a dedicated caching tier for the applications tier. Other than that, key-value stores are very commonly used to store basic information, such as customer details, security tokens, shopping-cart contents, e-commerce product details or managing each player’s session in massively multiplayer online games. It is also common to see key-value stores to be used as a publish/subscribe messaging system, especially for datastores that support persistent storage.
Hazelcast, AeroSpike, Apache Ignite, Redis, Memcached, and Tarantool are examples of popular datastores that provide an in-memory key-value store for fast data retrieval. Facebook, the world’s largest social network company, is able to achieve enormous scalability using multithreaded Memcached to handle billions of requests per second and holds trillions of items to deliver a rich experience for over a billion users around the world.
If your key-value dataset is growing too fast, data partitioning is the best strategy to handle the growth across time. Partitioning in a key-value store allows a subset of data to be served only by a subset pool of resources, which implies that a dataset is partitioned and distributed over multiple servers. There are many partitioning strategies implemented by different datastores such as consistent hashing, logical slots, or the conventional way of using a shard key as a partitioning condition for an even distribution among multiple shards.
If you need horizontal scaling which may empower your data store by adding more nodes, there are some datastores that come with horizontal scaling capabilities out-of-the-box:
Redis Cluster – Automatic data partitioning by storing keys in 16384 hash slots, or manual data partitioning using consistent hashing.
Tarantool – Automatic data partitioning using a shard key.
Memcached – Data partitioning using consistent hashing.
Aerospike – Automatic data partitioning using a deterministic hash process over 4096 logical partitions per namespace.
Hazelcast – Automatic data partitioning using consistent hashing to over 271 logical partitions.
For every shard or partition, there is a way to have one or more replicas as redundancy in case the server that hosts the primary partition goes down. It depends on the replication schemes offered by the data stores – master-slave replication, replica set, or shared-nothing with a replication factor.
Almost all key-value datastores that support partitioning are able to distribute the data quite evenly, allowing you to scale horizontally without having to worry about what to move in data rebalancing, resharding or cluster reprovisioning. Thanks to the smart and powerful partitioning algorithms used, the majority of scaling operations can be performed live with almost zero disruption to the running database service.
In theory, strong consistency requires that a read always returns the latest value written. This is a common characteristic of relational disk-based databases systems like MySQL, PostgreSQL, Microsoft SQL Server and Oracle. If your application requires a strong key-value data consistency across multiple database nodes, you may want to use disk-based key-value datastores like Apache ZooKeeper, etcd, Consul and LevelDB. Commonly, these datastores are used for configuration information, naming, synchronization and group services over large clusters in distributed systems. This is the primary reason these data stores are very popular for distributed system coordination, health check and metadata storage. For instance, etcd is the primary data backbone for Kubernetes, the popular container orchestration platform.
Strong consistency characteristics include that if a write precedes a read, the read should return a value no older than the write. If a write and a read are concurrent, the read operation can either return the value before the concurrent write or the value of the concurrent write. If a read returns the value of a concurrent write, that means the concurrent write has taken effect, regardless of whether the write operation has returned or not. All subsequent reads should not return values older than the write.
Eventual consistency is a theoretical guarantee that, provided no new updates to an entity are made, all reads of the entity will eventually return the last updated value. The great advantage of eventual consistency is the datastore system honors availability over network partitioning as in the CAP theorem, allowing you to have uninterrupted database service even when the cluster is partitioned and the database nodes are not able to “see” each other.
Cassandra, DynamoDB, Riak KV are among key-value datastores commonly used for eventual consistency. They all have different ways in handling conflict reconciliation. Riak KV uses logical clocks, DynamoDB uses vector timestamps and Cassandra uses wall clock timestamps to determine which concurrent write to store. Understanding these differences is important because all of these differences can influence how data is written after a network partitioning issue is resolved.
Synchronous vs Asynchronous Distribution
This aspect is important if you want to have high availability with data redundancy or distribute your data over multiple locations for backup, or bringing the data closer to the end-users. Most synchronous replication datastores write data to the primary instance and the replica simultaneously. Hence, the primary copy and the replica should always remain synchronized. In contrast, asynchronous replication datastores copy the data to the replica after the data is already written to the primary host, which happens at a later time.
For disk-based datastores like etcd, it replicates all data within a single consistent replication group. Each modification of a cluster state, which may change multiple keys, is assigned a globally unique ID, called a revision in etcd, from a monotonically increasing counter for reasoning over-ordering. Since there’s only a single replication group, the modification request only needs to go through the Raft Consensus protocol to commit. By limiting consensus to one replication group, etcd gets distributed consistency with a simple protocol while achieving low latency and high throughput.
For in-memory datastores, Tarantool supports synchronous replication (only per space), Redis supports only asynchronous replication, while Memcached does not support replication at all due to its shared-nothing architecture. Redis lets you create multiple replicas of a Redis primary. This allows you to scale database reads and helps you have highly available clusters.
Choosing the most suitable key-value datastore for your applications requires a bit of study and understanding on several aspects as mentioned above. In spite of them all storing a very similar key-value pair data type, the features powering the chosen datastore can make a huge difference in the long run.