Installing Redis Cluster (cluster mode enabled) with auto failover

Sebastian Insausti

Published: July 6, 2021
Last Updated: May 4, 2022

Redis is an open-source in-memory datastore used as a database or cache. It has built-in replication and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster. In this blog post, we will see what Redis Cluster is and how to install it.

What is Redis Cluster?

Redis Cluster is a built-in Redis feature that offers automatic sharding, replication, and high availability, the last of which previously had to be implemented with Sentinel. It can automatically split your dataset among multiple nodes and continue operating when a subset of the nodes is failing or unable to communicate with the rest of the cluster.

The Redis Cluster goals are:

High performance and linear scalability for up to 1,000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.
An acceptable degree of write safety. The system tries to retain all the writes originating from clients connected with the majority of the master nodes. Usually, there are small windows of time where acknowledged writes can be lost.
Availability. Redis Cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable.

Now that we know what it is, let’s see how to install it.

How to install Redis Cluster

According to the official documentation, a minimal cluster that works as expected must contain at least three master nodes, but the recommendation is a six-node cluster with three masters and three slaves, so let’s do that.

For this example, we will install Redis Cluster on CentOS 8 using the following topology:

Master 1: 10.10.10.121
Master 2: 10.10.10.122
Master 3: 10.10.10.123
Slave 1: 10.10.10.124
Slave 2: 10.10.10.125
Slave 3: 10.10.10.126

The following commands must be run on all the nodes, both masters and slaves.

At the time of writing, the default Redis version available on CentOS 8 is 5.0.3, so let’s use the Remi repository to get the current stable version, 6.2:

$ dnf install https://rpms.remirepo.net/enterprise/remi-release-8.rpm -y
$ dnf module install redis:remi-6.2 -y

Enable the Redis Service:

$ systemctl enable redis.service

To configure your Redis Cluster you need to edit the Redis configuration file /etc/redis.conf and change the following parameters:

$ vi /etc/redis.conf
# Replace this IP address with the local IP address of each node
bind 10.10.10.121
protected-mode no
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000
appendonly yes

These parameters are:

bind: By default, if it is not specified, Redis listens for connections from all available network interfaces on the server. It is possible to listen to just one or multiple selected interfaces.
protected-mode: Protected mode is a layer of security protection designed to prevent Redis instances left open on the internet from being accessed and exploited. By default, protected mode is enabled.
port: Accept connections on the specified port, default is 6379. If port 0 is specified Redis will not listen on a TCP socket.
cluster-enabled: Enables/Disables Redis Cluster support on a specific Redis node. If it is disabled, the instance starts as a stand-alone instance as usual.
cluster-config-file: The file where a Redis Cluster node automatically persists the cluster configuration every time there is a change, in order to be able to re-read it at startup.
cluster-node-timeout: The maximum amount of time (in milliseconds) a Redis Cluster node can be unavailable, without it being considered as failing. If a master node is not reachable for more than the specified amount of time, it will be failed over by its slaves.
appendonly: The Append Only File is an alternative persistence mode that provides much better durability. For instance, using the default data fsync policy, Redis can lose just one second of writes in a server failure such as a power outage, or a single write if something goes wrong with the Redis process itself while the operating system is still running correctly.
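After editing the file, it can help to print only the directives changed above and compare them across nodes. The following is a minimal sketch: the function name show_cluster_settings is a hypothetical helper introduced here for illustration, and the default path assumes the /etc/redis.conf location used in this post.

```shell
# Hypothetical helper: print the cluster-related directives from a Redis config file.
# Only active (uncommented) directives at the start of a line are matched.
show_cluster_settings() {
    local conf_file="${1:-/etc/redis.conf}"
    grep -E '^(bind|protected-mode|port|cluster-enabled|cluster-config-file|cluster-node-timeout|appendonly)[[:space:]]' "$conf_file"
}
```

Running show_cluster_settings /etc/redis.conf on each node should list the seven directives, with only the bind address differing between nodes.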

Every Redis Cluster node requires two open TCP ports: the normal Redis TCP port used to serve clients (6379 by default) and the cluster bus port, obtained by adding 10000 to the client port (16379 by default). With port 7000 as configured above, the cluster bus port is 17000. The cluster bus is used by nodes for failure detection, configuration updates, failover authorization, and more, so both ports must be reachable between all the nodes.
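The port arithmetic can be sketched in a few lines of shell. This example assumes firewalld is active on the CentOS 8 nodes, which this post does not cover, so it only prints the firewall-cmd commands for review instead of running them. The function names bus_port and print_firewall_cmds are illustrative helpers, not part of Redis.

```shell
# Derive the cluster bus port from the client port (client port + 10000).
bus_port() {
    echo $(( $1 + 10000 ))
}

# Print (not run) the firewalld commands that would open both ports.
# Assumption: firewalld is the firewall in use on the node.
print_firewall_cmds() {
    local client_port="$1"
    echo "firewall-cmd --permanent --add-port=${client_port}/tcp"
    echo "firewall-cmd --permanent --add-port=$(bus_port "$client_port")/tcp"
    echo "firewall-cmd --reload"
}

print_firewall_cmds 7000
```

For port 7000 this prints the commands for 7000/tcp and 17000/tcp, which matches the @17000 suffix shown later in the cluster nodes output.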

Now, you can start the Redis Service:

$ systemctl start redis.service

In the Redis log file, by default /var/log/redis/redis.log, you will see this:

76:M 02 Jul 2021 18:06:17.658 * Ready to accept connections

Now that everything is ready, you need to create the cluster using the redis-cli tool. To do this, run the following command on only one node:

$ redis-cli --cluster create 10.10.10.121:7000 10.10.10.122:7000 10.10.10.123:7000 10.10.10.124:7000 10.10.10.125:7000 10.10.10.126:7000 --cluster-replicas 1

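In this command, you need to list the IP address and Redis port of each node. The first three nodes will be the master nodes, and the remaining ones will be the slaves. The --cluster-replicas 1 option means one slave node for each master; redis-cli decides which slave is attached to which master, so the pairing does not necessarily follow the order of the list. The output of this command will look something like this: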
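If you prefer not to type the node list by hand, the command can be assembled from a list of IP addresses. This is only a sketch: build_create_cmd is a hypothetical helper that prints the command instead of running it, and it also checks the three-master minimum mentioned earlier by dividing the node count by (replicas + 1).

```shell
# Hypothetical helper: print the cluster create command for a list of node IPs.
# Usage: build_create_cmd PORT REPLICAS IP [IP...]
build_create_cmd() {
    local port="$1" replicas="$2"
    shift 2
    # Each master needs "replicas" slaves, so masters = nodes / (replicas + 1).
    local masters=$(( $# / (replicas + 1) ))
    if [ "$masters" -lt 3 ]; then
        echo "need at least 3 master nodes, got $masters" >&2
        return 1
    fi
    local nodes="" ip
    for ip in "$@"; do
        nodes="${nodes} ${ip}:${port}"
    done
    echo "redis-cli --cluster create${nodes} --cluster-replicas ${replicas}"
}

build_create_cmd 7000 1 10.10.10.121 10.10.10.122 10.10.10.123 \
    10.10.10.124 10.10.10.125 10.10.10.126
```

With the six nodes of this topology, the printed command is the same one shown above.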

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.10.10.125:7000 to 10.10.10.121:7000
Adding replica 10.10.10.126:7000 to 10.10.10.122:7000
Adding replica 10.10.10.124:7000 to 10.10.10.123:7000
M: 4394d8eb03de1f524b56cb385f0eb9052ce65283 10.10.10.121:7000
   slots:[0-5460] (5461 slots) master
M: 5cc0f693985913c553c6901e102ea3cb8d6678bd 10.10.10.122:7000
   slots:[5461-10922] (5462 slots) master
M: 22de56650b3714c1c42fc0d120f80c66c24d8795 10.10.10.123:7000
   slots:[10923-16383] (5461 slots) master
S: 8675cd30fdd4efa088634e50fbd5c0675238a35e 10.10.10.124:7000
   replicates 22de56650b3714c1c42fc0d120f80c66c24d8795
S: ad0f5210dda1736a1b5467cd6e797f011a192097 10.10.10.125:7000
   replicates 4394d8eb03de1f524b56cb385f0eb9052ce65283
S: 184ada329264e994781412f3986c425a248f386e 10.10.10.126:7000
   replicates 5cc0f693985913c553c6901e102ea3cb8d6678bd
Can I set the above configuration? (type 'yes' to accept):

After accepting the configuration, the cluster will be created:

>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 10.10.10.121:7000)
M: 4394d8eb03de1f524b56cb385f0eb9052ce65283 10.10.10.121:7000
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 184ada329264e994781412f3986c425a248f386e 10.10.10.126:7000
   slots: (0 slots) slave
   replicates 5cc0f693985913c553c6901e102ea3cb8d6678bd
M: 5cc0f693985913c553c6901e102ea3cb8d6678bd 10.10.10.122:7000
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 22de56650b3714c1c42fc0d120f80c66c24d8795 10.10.10.123:7000
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: ad0f5210dda1736a1b5467cd6e797f011a192097 10.10.10.125:7000
   slots: (0 slots) slave
   replicates 4394d8eb03de1f524b56cb385f0eb9052ce65283
S: 8675cd30fdd4efa088634e50fbd5c0675238a35e 10.10.10.124:7000
   slots: (0 slots) slave
   replicates 22de56650b3714c1c42fc0d120f80c66c24d8795
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
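Besides reading this output, the overall state can be confirmed at any time with the CLUSTER INFO command, which reports a cluster_state field. The snippet below is a small sketch for extracting that field; cluster_state is a hypothetical helper name, and the sample input stands in for a live node so the function can be tried without a running cluster.

```shell
# Extract the cluster_state value from CLUSTER INFO output read on stdin.
# The reply may use CRLF line endings, so carriage returns are stripped first.
cluster_state() {
    tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# On a live node (address and port are the ones used in this post):
#   redis-cli -h 10.10.10.121 -p 7000 cluster info | cluster_state

# Sample input, so the function can be tried without a running cluster.
printf 'cluster_state:ok\r\ncluster_slots_assigned:16384\r\n' | cluster_state
```

A value of ok means the node considers all 16384 slots covered; fail means the node is not able to serve queries in its current view of the cluster.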

If you take a look at the master log file, you will see:

3543:M 02 Jul 2021 19:40:23.250 # configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
3543:M 02 Jul 2021 19:40:23.258 # IP address for this node updated to 10.10.10.121
3543:M 02 Jul 2021 19:40:25.281 * Replica 10.10.10.125:7000 asks for synchronization
3543:M 02 Jul 2021 19:40:25.281 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '1f42a85e22d8a19817844aeac14fbb8201a6fc88', my replication IDs are '9f8db08a36207c17800f75487b193a624f17f091' and '0000000000000000000000000000000000000000')
3543:M 02 Jul 2021 19:40:25.281 * Replication backlog created, my new replication IDs are '21abfca3b9405356569b2684c6d68c0d2ec19b3b' and '0000000000000000000000000000000000000000'
3543:M 02 Jul 2021 19:40:25.281 * Starting BGSAVE for SYNC with target: disk
3543:M 02 Jul 2021 19:40:25.284 * Background saving started by pid 3289
3289:C 02 Jul 2021 19:40:25.312 * DB saved on disk
3289:C 02 Jul 2021 19:40:25.313 * RDB: 0 MB of memory used by copy-on-write
3543:M 02 Jul 2021 19:40:25.369 * Background saving terminated with success
3543:M 02 Jul 2021 19:40:25.369 * Synchronization with replica 10.10.10.125:7000 succeeded
3543:M 02 Jul 2021 19:40:28.180 # Cluster state changed: ok

And the replica’s log file:

11531:M 02 Jul 2021 19:40:23.253 # configEpoch set to 4 via CLUSTER SET-CONFIG-EPOCH
11531:M 02 Jul 2021 19:40:23.357 # IP address for this node updated to 10.10.10.124
11531:S 02 Jul 2021 19:40:25.277 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
11531:S 02 Jul 2021 19:40:25.277 * Connecting to MASTER 10.10.10.123:7000
11531:S 02 Jul 2021 19:40:25.277 * MASTER <-> REPLICA sync started
11531:S 02 Jul 2021 19:40:25.277 # Cluster state changed: ok
11531:S 02 Jul 2021 19:40:25.277 * Non blocking connect for SYNC fired the event.
11531:S 02 Jul 2021 19:40:25.278 * Master replied to PING, replication can continue...
11531:S 02 Jul 2021 19:40:25.278 * Trying a partial resynchronization (request 7d8da986c7e699fe33002d10415f98e91203de01:1).
11531:S 02 Jul 2021 19:40:25.279 * Full resync from master: 99a8defc35b459b7b73277933aa526d3f72ae76e:0
11531:S 02 Jul 2021 19:40:25.279 * Discarding previously cached master state.
11531:S 02 Jul 2021 19:40:25.299 * MASTER <-> REPLICA sync: receiving 175 bytes from master to disk
11531:S 02 Jul 2021 19:40:25.299 * MASTER <-> REPLICA sync: Flushing old data
11531:S 02 Jul 2021 19:40:25.300 * MASTER <-> REPLICA sync: Loading DB in memory
11531:S 02 Jul 2021 19:40:25.306 * Loading RDB produced by version 6.2.4
11531:S 02 Jul 2021 19:40:25.306 * RDB age 0 seconds
11531:S 02 Jul 2021 19:40:25.306 * RDB memory usage when created 2.60 Mb
11531:S 02 Jul 2021 19:40:25.306 * MASTER <-> REPLICA sync: Finished with success
11531:S 02 Jul 2021 19:40:25.308 * Background append only file rewriting started by pid 2487
11531:S 02 Jul 2021 19:40:25.342 * AOF rewrite child asks to stop sending diffs.
2487:C 02 Jul 2021 19:40:25.342 * Parent agreed to stop sending diffs. Finalizing AOF...
2487:C 02 Jul 2021 19:40:25.342 * Concatenating 0.00 MB of AOF diff received from parent.
2487:C 02 Jul 2021 19:40:25.343 * SYNC append only file rewrite performed
2487:C 02 Jul 2021 19:40:25.343 * AOF rewrite: 0 MB of memory used by copy-on-write
11531:S 02 Jul 2021 19:40:25.411 * Background AOF rewrite terminated with success
11531:S 02 Jul 2021 19:40:25.411 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
11531:S 02 Jul 2021 19:40:25.411 * Background AOF rewrite finished successfully

Monitoring Redis Cluster Nodes

To know the status of each Redis node, you can use the following command:

$ redis-cli -h 10.10.10.121 -p 7000 cluster nodes

184ada329264e994781412f3986c425a248f386e 10.10.10.126:7000@17000 slave 5cc0f693985913c553c6901e102ea3cb8d6678bd 0 1625255155519 2 connected
5cc0f693985913c553c6901e102ea3cb8d6678bd 10.10.10.122:7000@17000 master - 0 1625255153513 2 connected 5461-10922
22de56650b3714c1c42fc0d120f80c66c24d8795 10.10.10.123:7000@17000 master - 0 1625255151000 3 connected 10923-16383
ad0f5210dda1736a1b5467cd6e797f011a192097 10.10.10.125:7000@17000 slave 4394d8eb03de1f524b56cb385f0eb9052ce65283 0 1625255153000 1 connected
8675cd30fdd4efa088634e50fbd5c0675238a35e 10.10.10.124:7000@17000 slave 22de56650b3714c1c42fc0d120f80c66c24d8795 0 1625255154515 3 connected
4394d8eb03de1f524b56cb385f0eb9052ce65283 10.10.10.121:7000@17000 myself,master - 0 1625255152000 1 connected 0-5460

You can also filter the output with the Linux grep command to show only the master nodes:

$ redis-cli -h 10.10.10.121 -p 7000 cluster nodes  | grep master

5cc0f693985913c553c6901e102ea3cb8d6678bd 10.10.10.122:7000@17000 master - 0 1625255389768 2 connected 5461-10922
22de56650b3714c1c42fc0d120f80c66c24d8795 10.10.10.123:7000@17000 master - 0 1625255387000 3 connected 10923-16383
4394d8eb03de1f524b56cb385f0eb9052ce65283 10.10.10.121:7000@17000 myself,master - 0 1625255387000 1 connected 0-5460

Or only the slave nodes:

$ redis-cli -h 10.10.10.121 -p 7000 cluster nodes  | grep slave

184ada329264e994781412f3986c425a248f386e 10.10.10.126:7000@17000 slave 5cc0f693985913c553c6901e102ea3cb8d6678bd 0 1625255395795 2 connected
ad0f5210dda1736a1b5467cd6e797f011a192097 10.10.10.125:7000@17000 slave 4394d8eb03de1f524b56cb385f0eb9052ce65283 0 1625255395000 1 connected
8675cd30fdd4efa088634e50fbd5c0675238a35e 10.10.10.124:7000@17000 slave 22de56650b3714c1c42fc0d120f80c66c24d8795 0 1625255393000 3 connected
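The full lines are long, so a compact view can be easier to read. Each line of the cluster nodes output holds the node ID, the address with the bus port after the @, the flags, the master ID, two timestamps, the config epoch, the link state, and then the slot ranges for masters. As a rough sketch, the hypothetical helper summarize_nodes below reduces each line to address, role, and slots; it is fed one sample line from above so it can be tried without a cluster.

```shell
# Print "address role slots" for each line of CLUSTER NODES output read on stdin.
summarize_nodes() {
    awk '{
        split($2, addr, "@")                       # drop the @bus-port suffix
        role = ($3 ~ /master/) ? "master" : "slave"
        slots = ""
        for (i = 9; i <= NF; i++) slots = slots " " $i   # slot ranges start at field 9
        print addr[1], role slots
    }'
}

# On a live node:
#   redis-cli -h 10.10.10.121 -p 7000 cluster nodes | summarize_nodes

# Sample line taken from the output above.
echo "5cc0f693985913c553c6901e102ea3cb8d6678bd 10.10.10.122:7000@17000 master - 0 1625255153513 2 connected 5461-10922" | summarize_nodes
```

For the sample line, the function prints the address 10.10.10.122:7000, the role master, and the slot range 5461-10922.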

Redis Cluster Auto Failover

Let’s test the auto failover feature in Redis Cluster. To do this, we are going to stop the Redis service on one master node and see what happens.

On Master 2 – 10.10.10.122:

$ systemctl stop redis
$ systemctl status redis |grep Active
   Active: inactive (dead) since Fri 2021-07-02 19:53:41 UTC; 1h 4min ago

Now, let’s check the output of the command that we used in the previous section to monitor the Redis nodes:

$ redis-cli -h 10.10.10.121 -p 7000 cluster nodes

184ada329264e994781412f3986c425a248f386e 10.10.10.126:7000@17000 master - 0 1625255654350 7 connected 5461-10922
5cc0f693985913c553c6901e102ea3cb8d6678bd 10.10.10.122:7000@17000 master,fail - 1625255622147 1625255621143 2 disconnected
22de56650b3714c1c42fc0d120f80c66c24d8795 10.10.10.123:7000@17000 master - 0 1625255654000 3 connected 10923-16383
ad0f5210dda1736a1b5467cd6e797f011a192097 10.10.10.125:7000@17000 slave 4394d8eb03de1f524b56cb385f0eb9052ce65283 0 1625255656366 1 connected
8675cd30fdd4efa088634e50fbd5c0675238a35e 10.10.10.124:7000@17000 slave 22de56650b3714c1c42fc0d120f80c66c24d8795 0 1625255655360 3 connected
4394d8eb03de1f524b56cb385f0eb9052ce65283 10.10.10.121:7000@17000 myself,master - 0 1625255653000 1 connected 0-5460

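As you can see, one of the slave nodes was promoted to master: in this case, Slave 3 (10.10.10.126) now serves slots 5461-10922, while the old master 10.10.10.122 is flagged as master,fail and disconnected. The auto failover worked as expected.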
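To spot failed nodes without reading every line, the flags column can be checked for the fail flag. The following is a minimal sketch; failed_nodes is a hypothetical helper, and it matches the flag exactly so that fail? (a failure that is only suspected so far) is not reported.

```shell
# Print the address of every node whose flags include "fail" (confirmed failure).
# The "fail?" flag (failure only suspected) is deliberately not matched.
failed_nodes() {
    awk '{
        n = split($3, flags, ",")
        for (i = 1; i <= n; i++) {
            if (flags[i] == "fail") {
                split($2, addr, "@")   # drop the @bus-port suffix
                print addr[1]
            }
        }
    }'
}

# On a live node:
#   redis-cli -h 10.10.10.121 -p 7000 cluster nodes | failed_nodes

# Sample line taken from the output above.
echo "5cc0f693985913c553c6901e102ea3cb8d6678bd 10.10.10.122:7000@17000 master,fail - 1625255622147 1625255621143 2 disconnected" | failed_nodes
```

For the sample line, the function prints 10.10.10.122:7000, the master that was stopped in this test.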

Conclusion

Redis is a good option in case you want to use an in-memory datastore. As you can see in this blog post, the installation is not rocket science and the usage of Redis Cluster is explained in its official documentation. This blog just covers the basic installation and test steps, but you can also improve this by, for example, adding authentication in the Redis configuration, or even running a benchmark using the redis-benchmark tool to check performance.
