As with any database, monitoring is a continuous practice that makes sure your databases keep delivering the performance you expect from them. Redis is no different in how it has to be maintained and managed, and monitoring is one of the best practices you need to adopt.
There are various tools you can use, both open source and enterprise services. However, it is important to determine which key areas you need to keep your eyes on. Interestingly, not only DBAs but also DevOps engineers are very interested in Redis, and even developers are jumping on the bandwagon. With more than 50k GitHub stars, 19.6k forks, and 512 contributors, the numbers for Redis are growing at a rapid pace. It is an incredibly popular open source project supported by a vibrant community.
Initially employed as a caching layer, Redis is now used by virtually every large enterprise, startup, and government organization to power use cases such as e-commerce, AI/ML, search, fraud detection, real-time inventory management, user session stores, and much more. Knowing your most important use cases and their weak points makes it much easier to identify and troubleshoot the problems you might encounter while using Redis. This blog covers the areas that need to be watched and constantly monitored to avoid problems when managing and maintaining your Redis database clusters.
Monitoring Redis performance
It doesn’t matter whether you run a single Redis node, a Redis master-replica setup, or a Redis Cluster: it requires monitoring nonetheless. There is no single area in Redis that you can look at in isolation. To ensure performance, we have to look at memory usage, throughput, network connectivity (such as client connections and replication), and the cache hit ratio and cache evictions.
Redis has a powerful command called INFO, which exposes most of the information you need as monitoring metrics. We’ll go through the key areas for monitoring the performance of your Redis instances one at a time.
Redis gets its speed from storing data in memory, so memory is a critical resource for its performance. If memory has issues, the performance of your Redis instances will suffer, although that doesn’t mean memory is the only thing we should look at.
If high memory utilization goes unnoticed, it may lead to serious performance degradation. When monitoring memory, it is common to track the following:
Used Memory represents the total number of bytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc). Monitoring this metric can help prevent “Out of memory” errors in the database.
Used Memory RSS is the resident set size, that is, the number of bytes that the operating system has allocated to Redis. Comparing it with Used Memory helps identify memory fragmentation.
Used Memory Peak is the peak memory consumed by Redis (in bytes).
You can collect all memory utilization metrics data for a Redis instance by running the INFO MEMORY command. See below:
192.168.40.170:6001> INFO MEMORY
# Memory
used_memory:2721208
used_memory_human:2.60M
used_memory_rss:9629696
used_memory_rss_human:9.18M
used_memory_peak:2740096
used_memory_peak_human:2.61M
used_memory_peak_perc:99.31%
used_memory_overhead:2571376
used_memory_startup:1481664
used_memory_dataset:149832
used_memory_dataset_perc:12.09%
allocator_allocated:2734528
allocator_active:3047424
allocator_resident:5447680
total_system_memory:2084089856
total_system_memory_human:1.94G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.11
allocator_frag_bytes:312896
allocator_rss_ratio:1.79
allocator_rss_bytes:2400256
rss_overhead_ratio:1.77
rss_overhead_bytes:4182016
mem_fragmentation_ratio:3.65
mem_fragmentation_bytes:6990456
mem_not_counted_for_evict:4
mem_replication_backlog:1048576
mem_clients_slaves:20512
mem_clients_normal:20504
mem_aof_buffer:8
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:16
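To turn this output into numbers you can alert on, a small parser is usually enough. The following is a minimal sketch, not an official client: it assumes you already have the raw INFO MEMORY text (for example from redis-cli), and the helper names parse_info and memory_summary are made up for illustration. The sample uses a few of the values shown above.

```python
def parse_info(text):
    """Parse the `key:value` lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Memory"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_summary(info):
    """Derive a few ratios from the raw byte counters."""
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    peak = int(info["used_memory_peak"])
    maxmemory = int(info.get("maxmemory", "0"))
    return {
        # RSS well above used_memory hints at fragmentation
        "rss_to_used_ratio": rss / used,
        # how close current usage is to the historical peak
        "peak_pct": 100.0 * used / peak,
        # maxmemory of 0 means no limit is configured, so no percentage
        "maxmemory_pct": 100.0 * used / maxmemory if maxmemory else None,
    }


SAMPLE = """\
# Memory
used_memory:2721208
used_memory_rss:9629696
used_memory_peak:2740096
maxmemory:0
"""

summary = memory_summary(parse_info(SAMPLE))
print(summary)
```

The ratio computed here can differ slightly from the mem_fragmentation_ratio that Redis reports, because the counters are not necessarily sampled at the same instant.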
It is very important to know how your database performs during peak load or a high-traffic period. These are the most crucial moments, especially when you have just rolled out the first version of your application or pushed changes that attract a lot of traffic.
Throughput covers database operations or calls (collecting info, client calls, config, syncs, pings, publish/subscribe). This is the area where you can determine how your server is performing at a particular time. The impact varies over time because it depends on the workload and on the complexity of your business logic. By looking at the history of throughput, you can infer the load pattern on a server, e.g. peak load, the frequency of peak load, the time frames of peak load, average load, etc.
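One simple way to build such a history is to sample a cumulative counter at a fixed interval and convert the difference into a rate. The sketch below assumes you poll total_commands_processed from INFO STATS yourself; the function name and the sample numbers are hypothetical.

```python
def ops_per_second(prev_total, curr_total, interval_seconds):
    """Average commands per second between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_total - prev_total
    if delta < 0:
        # The counter went backwards, e.g. after a restart or CONFIG RESETSTAT,
        # so only the commands counted since the reset are known.
        delta = curr_total
    return delta / interval_seconds


# Two hypothetical samples taken 60 seconds apart
rate = ops_per_second(408959, 414959, 60)
print(rate)
```

Storing one such value per interval gives you the time series from which peak and average load can be read.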
You can collect throughput metric values for all the commands run on the Redis server by executing INFO COMMANDSTATS. See below:
127.0.0.1:6379> INFO COMMANDSTATS
# Commandstats
cmdstat_info:calls=5841,usec=374959,usec_per_call=64.19
cmdstat_client:calls=8,usec=6,usec_per_call=0.75
cmdstat_config:calls=1949,usec=117679,usec_per_call=60.38
cmdstat_psync:calls=2,usec=973,usec_per_call=486.50
cmdstat_auth:calls=5858,usec=25937,usec_per_call=4.43
cmdstat_publish:calls=9524,usec=81760,usec_per_call=8.58
cmdstat_replconf:calls=12420,usec=17190,usec_per_call=1.38
cmdstat_subscribe:calls=4,usec=10,usec_per_call=2.50
cmdstat_ping:calls=18637,usec=28044,usec_per_call=1.50
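To see which commands dominate, you can parse these lines and rank them. This is a rough sketch that assumes the cmdstat_<command>:calls=...,usec=...,usec_per_call=... line format that Redis emits; parse_commandstats is an illustrative name, and the sample is a subset of the figures above.

```python
def parse_commandstats(text):
    """Return {command: {"calls", "usec", "usec_per_call"}} from INFO COMMANDSTATS text."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the prompt, the section header, and blank lines
        name, _, payload = line.partition(":")
        fields = dict(pair.split("=", 1) for pair in payload.split(","))
        stats[name[len("cmdstat_"):]] = {
            "calls": int(fields["calls"]),
            "usec": int(fields["usec"]),
            "usec_per_call": float(fields["usec_per_call"]),
        }
    return stats


SAMPLE = """\
# Commandstats
cmdstat_info:calls=5841,usec=374959,usec_per_call=64.19
cmdstat_psync:calls=2,usec=973,usec_per_call=486.50
cmdstat_publish:calls=9524,usec=81760,usec_per_call=8.58
cmdstat_ping:calls=18637,usec=28044,usec_per_call=1.50
"""

stats = parse_commandstats(SAMPLE)
# Commands ordered from slowest to fastest per call
by_latency = sorted(stats, key=lambda c: stats[c]["usec_per_call"], reverse=True)
# Commands ordered from most to least frequently called
by_calls = sorted(stats, key=lambda c: stats[c]["calls"], reverse=True)
print(by_latency)
print(by_calls)
```

Ranking by latency and by call count separately is deliberate: a rarely used but slow command and a cheap but extremely frequent one point to different problems.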
Cache hit ratio and evicted cache data
Developers and anyone familiar with caching services such as Memcached know that Redis can evict data using an (approximated) LRU (Least Recently Used) policy. Starting with Redis 4.0, LFU (Least Frequently Used) was introduced as another eviction mode. LFU mode may work better (provide a better hits/misses ratio) in certain cases, because Redis tracks how frequently items are accessed, so that rarely used items are evicted while frequently used ones have a higher chance of remaining in memory.
When Redis is used as a cache, it is often handy to let it automatically evict old data as you add new data. This behavior is well known in the developer community, since it is the default behavior of the popular Memcached system.
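The difference between the two policies is easiest to see on a toy access log. The sketch below is purely conceptual: Redis itself uses sampled, approximated versions of both policies, and the function names here are invented for illustration.

```python
from collections import Counter


def lru_victim(accesses):
    """Key whose most recent access lies furthest in the past."""
    last_seen = {}
    for position, key in enumerate(accesses):
        last_seen[key] = position  # later accesses overwrite earlier positions
    return min(last_seen, key=last_seen.get)


def lfu_victim(accesses):
    """Key with the fewest accesses overall."""
    counts = Counter(accesses)
    return min(counts, key=counts.get)


# "a" is popular but has not been touched recently; "b" was touched once.
accesses = ["a", "a", "a", "a", "b", "c", "c"]
print(lru_victim(accesses))
print(lfu_victim(accesses))
```

Here LRU would evict the popular key "a" simply because it was not touched lately, while LFU would evict "b", which is the kind of case where LFU can yield a better hit ratio.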
What this means for monitoring is that tracking the cache hit ratio and evicted data lets you determine whether your business logic and usage of Redis match your expectations. A low cache hit ratio usually results in higher latency, because more requests miss the cache and have to be served from the slower backing data store (typically on disk) instead of from memory. There are several actions you can take, such as scaling up the host, or scaling horizontally and sharding the data so that keys are distributed efficiently.
You can run the command INFO STATS to gather metrics from this. For example,
192.168.40.190:6001> INFO STATS
# Stats
total_connections_received:1005
total_commands_processed:408959
instantaneous_ops_per_sec:0
total_net_input_bytes:10442468
total_net_output_bytes:7328626
instantaneous_input_kbps:0.02
instantaneous_output_kbps:0.01
rejected_connections:0
sync_full:1
sync_partial_ok:0
sync_partial_err:1
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:116
evicted_keys:0
keyspace_hits:100004
keyspace_misses:2
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:258
total_forks:3
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:202
dump_payload_sanitizations:0
total_reads_processed:410165
total_writes_processed:405437
io_threaded_reads_processed:0
io_threaded_writes_processed:0
The metrics to look at here are evicted_keys, which counts the keys evicted because of the memory limit, and keyspace_hits and keyspace_misses, which count successful and failed key lookups.
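Redis does not report the hit ratio directly, but it can be derived as keyspace_hits / (keyspace_hits + keyspace_misses). A minimal sketch, using the two counters from the output above; the function name is an assumption for illustration:

```python
def cache_hit_ratio(hits, misses):
    """Fraction of key lookups that were hits, or None if there were no lookups."""
    total = hits + misses
    return hits / total if total else None


SAMPLE = """\
# Stats
evicted_keys:0
keyspace_hits:100004
keyspace_misses:2
"""

# Keep only `key:value` lines and convert the counters to integers
counters = {
    key: int(value)
    for key, _, value in (line.partition(":") for line in SAMPLE.splitlines())
    if value and not key.startswith("#")
}

ratio = cache_hit_ratio(counters["keyspace_hits"], counters["keyspace_misses"])
print(f"hit ratio: {ratio:.5f}, evicted keys: {counters['evicted_keys']}")
```

Because these counters are cumulative since the last restart or statistics reset, computing the ratio over the difference between two samples gives a better picture of current behavior than the lifetime value.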
Number of client connections
The number of connections is a limited resource, capped either by the operating system or by the Redis configuration. Monitoring the active connections helps you ensure that you have sufficient connections to serve all your requests at peak time.
You can check the number of connections with the CLIENT LIST command, which prints one line per connected client. For example:
root@pupnode41:~# echo "CLIENT LIST" | redis-cli -a password -p 6379 2>/dev/null | wc -l
60
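Counting lines tells you how many clients are connected, but the same output also shows where they come from and how long they have been idle. The sketch below parses CLIENT LIST text into a small summary. The sample lines are hypothetical and abbreviated (real lines carry more fields), and summarize_clients and the 300-second idle threshold are assumptions made for illustration.

```python
from collections import Counter


def summarize_clients(text, idle_threshold=300):
    """Summarize CLIENT LIST output: total clients, long-idle clients, clients per host."""
    total = 0
    idle = 0
    per_host = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is a series of space-separated key=value fields
        fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
        total += 1
        if int(fields.get("idle", "0")) >= idle_threshold:
            idle += 1
        # addr is host:port; strip the port to group by host
        per_host[fields["addr"].rsplit(":", 1)[0]] += 1
    return {"total": total, "idle": idle, "per_host": dict(per_host)}


SAMPLE = """\
id=3 addr=192.168.40.10:52555 fd=8 name= age=855 idle=0 flags=N db=0 cmd=client
id=4 addr=192.168.40.10:52556 fd=9 name=worker age=900 idle=620 flags=N db=0 cmd=get
id=7 addr=192.168.40.11:40112 fd=10 name= age=30 idle=2 flags=N db=0 cmd=ping
"""

summary = summarize_clients(SAMPLE)
print(summary)
```

A host that holds many long-idle connections is often a sign of a connection pool that is larger than it needs to be, or of clients that never close their connections.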
Whether Redis runs in a master-replica setup or as a Redis Cluster, replication provides data redundancy and supports a high availability setup. Redis is a feature-rich, open source NoSQL database (a dictionary server) that comes with high availability tooling, including health monitoring that triggers a failover when the primary (master) fails.
Redis databases engage in replication, in which replica (slave) servers copy and sync data from the master. This ensures data is never lost entirely if the primary server breaks down, as a failover node will take over the role of the primary. To assess how your replication is performing, two commands are helpful: INFO REPLICATION and ROLE. Both can be used, but INFO REPLICATION provides more informative metrics. These help you verify data integrity and confirm that the replicas are doing their job.
127.0.0.1:6379> INFO REPLICATION
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.40.242,port=6379,state=online,offset=259597555,lag=1
slave1:ip=192.168.40.243,port=6379,state=online,offset=259597555,lag=1
master_replid:e566832595b879c530c1d229c046d8f7ec8a48cf
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:259597569
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:258548994
repl_backlog_histlen:1048576
The connected_slaves field reports how many replicas are currently connected to the master. If it is lower than the number of replicas you expect, a replica is unreachable, which could lead to higher read latency depending on the read load on the remaining servers.
On a replica, the output is different and also provides information about its connectivity with the master, for example:
127.0.0.1:6379> INFO REPLICATION
# Replication
role:slave
master_host:192.168.40.241
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:259953389
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:e566832595b879c530c1d229c046d8f7ec8a48cf
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:259953389
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:258904814
repl_backlog_histlen:1048576
The master_last_io_seconds_ago field reports the number of seconds since the replica last communicated with the master. A high value can indicate issues on the master or the replica, or network problems, and it means the replica may be serving stale data.
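These fields can be combined into two simple checks: how many bytes each replica is behind the master, and whether a replica's link to the master looks healthy. The sketch below assumes the slaveN:ip=...,port=...,offset=... line format that Redis emits on a master; the helper names and the 10-second threshold are assumptions chosen for illustration.

```python
import re

REPLICA_KEY = re.compile(r"^slave\d+$")


def parse_info(text):
    """Parse the `key:value` lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replica_byte_lag(master_info):
    """On a master: bytes each replica is behind master_repl_offset."""
    master_offset = int(master_info["master_repl_offset"])
    lags = {}
    for key, value in master_info.items():
        if not REPLICA_KEY.match(key):
            continue
        fields = dict(pair.split("=", 1) for pair in value.split(","))
        lags[f"{fields['ip']}:{fields['port']}"] = master_offset - int(fields["offset"])
    return lags


def replica_link_healthy(replica_info, max_idle_seconds=10):
    """On a replica: link is up and the master was heard from recently."""
    return (
        replica_info.get("master_link_status") == "up"
        and int(replica_info["master_last_io_seconds_ago"]) <= max_idle_seconds
    )


MASTER_SAMPLE = """\
role:master
connected_slaves:2
slave0:ip=192.168.40.242,port=6379,state=online,offset=259597555,lag=1
slave1:ip=192.168.40.243,port=6379,state=online,offset=259597555,lag=1
master_repl_offset:259597569
"""
REPLICA_SAMPLE = """\
role:slave
master_link_status:up
master_last_io_seconds_ago:1
"""

lags = replica_byte_lag(parse_info(MASTER_SAMPLE))
healthy = replica_link_healthy(parse_info(REPLICA_SAMPLE))
print(lags, healthy)
```

A small, stable byte lag is normal on a busy master; a lag that keeps growing, or a link that is reported as down, is what deserves an alert.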
Tools for Monitoring Redis
Obviously, we shouldn’t rely on the INFO command alone. There are plenty of tools that can help you monitor Redis, for example RedisInsight. It is a free GUI for Redis that is available on all platforms (Windows, Mac, Linux, and Docker) and works with all variants of Redis.
Redis also supplies the MONITOR command, which streams every command processed by the server. Although its output is human readable, it is hard to follow and there are no graphs to rely upon. It is helpful when diagnosing performance issues, but it has limitations: AUTH, EXEC, HELLO, and QUIT are not logged by MONITOR.
Redis Memory Analyzer (RMA) is also a great tool. RMA is a console tool that scans the Redis key space in real time and aggregates memory usage statistics by key pattern. You can use it on production servers without a maintenance window. You can scan all Redis types or only selected ones, such as “string”, “hash”, “list”, “set”, and “zset”, and use matching patterns as you like. RMA tries to discern key names by pattern: for example, if you have keys like ‘user:100’ and ‘user:101’, it picks out the common pattern ‘user:*’ in its output, so you can analyze which data consumes the most memory in your instance.
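The idea of grouping keys by pattern can be sketched in a few lines. This is not RMA's actual algorithm, only a simplified illustration that collapses runs of digits into a wildcard; the key names and byte sizes are hypothetical (in practice the sizes could come from something like the MEMORY USAGE command).

```python
import re
from collections import defaultdict


def key_pattern(key):
    """Collapse every run of digits into '*', so 'user:100' becomes 'user:*'."""
    return re.sub(r"\d+", "*", key)


def memory_by_pattern(key_sizes):
    """Sum per-key byte sizes under their shared key pattern."""
    totals = defaultdict(int)
    for key, size in key_sizes.items():
        totals[key_pattern(key)] += size
    return dict(totals)


# Hypothetical keys with their memory footprint in bytes
SIZES = {
    "user:100": 120,
    "user:101": 80,
    "cart:42:items": 300,
    "session:abc": 64,
}

totals = memory_by_pattern(SIZES)
# Print patterns from largest to smallest total footprint
for pattern, size in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(pattern, size)
```

Sorting the totals makes it immediately visible which family of keys is responsible for most of the memory.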
Redis Toolkit is a toolkit for actively monitoring, analyzing and reporting your Redis database. The toolkit has 2 types of reporting:
hit rate – actively monitors a Redis database using the redis-cli monitor command, stores locally the commands that Redis is running, and then generates a report.
memory – dumps the contents of the Redis database locally and analyzes the memory distribution per key.
Other options are available in some paid enterprise tools. Severalnines is also in the process of adding this feature to ClusterControl to provide more practical and efficient monitoring of your Redis database clusters.
This blog covered the key areas that need to be monitored in Redis. Compared to an RDBMS, Redis is not complicated to monitor, as there are only a handful of areas to watch. Redis is straightforward and lightweight when it comes to monitoring. Identify the important factors and you should be well on your way to successfully running and maintaining your Redis cluster.