Troubleshoot Azure Cache for Redis server issues
This section discusses troubleshooting issues caused by conditions on an Azure Cache for Redis server or any of the virtual machines hosting it.
Note
Several of the troubleshooting steps in this guide include instructions to run Redis commands and monitor various performance metrics. For more information and instructions, see the articles in the Additional information section.
High server load
High server load means the Redis server is busy and can't keep up with requests, which leads to timeouts. To check the Server Load metric on your cache, select Monitoring from the Resource menu on the left. The Server Load graph appears in the working pane under Insights. Alternatively, add a metric set to Server Load under Metrics.
Consider the following options for high server load.
Scale up or scale out
Scale out to add more shards, so that load is distributed across multiple Redis processes. Also, consider scaling up to a larger cache size with more CPU cores. For more information, see Azure Cache for Redis planning FAQs.
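Scaling out helps only if keys, and therefore requests, actually spread across shards. The following sketch illustrates the hash-slot model from the open-source Redis Cluster specification, which clustering in Azure Cache for Redis is assumed to follow. Each key maps to one of 16,384 slots through CRC16(key) mod 16384, and each shard owns a set of slots. The function names are hypothetical. The even, contiguous split of slots across shards in shard_for_slot is a simplifying assumption.

```python
"""Sketch: how keys map to hash slots and shards in the Redis Cluster model.

Assumption: slots are split into equal contiguous ranges per shard. That is a
simplification; a real cluster reports its actual slot ranges (for example,
through the CLUSTER SLOTS command in open-source Redis).
"""
from collections import Counter

HASH_SLOTS = 16384  # fixed slot count in the Redis Cluster specification


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Return the hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Hash only a non-empty tag, e.g. b"{user1}.cart" hashes b"user1".
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % HASH_SLOTS


def shard_for_slot(slot: int, shard_count: int) -> int:
    """Hypothetical even split of slots into contiguous per-shard ranges."""
    return slot * shard_count // HASH_SLOTS


def keys_per_shard(keys, shard_count: int) -> Counter:
    """Count how many of the given keys land on each shard."""
    return Counter(shard_for_slot(key_slot(k), shard_count) for k in keys)


if __name__ == "__main__":
    sample_keys = [f"session:{i}".encode() for i in range(10_000)]
    for shards in (1, 3):
        counts = keys_per_shard(sample_keys, shards)
        print(shards, "shard(s):", dict(sorted(counts.items())))
```

Keys that share a {hash tag} always land in the same slot. A workload concentrated on one tag, or on a single hot key, therefore gains little from additional shards.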
Rapid changes in number of client connections
For more information, see Avoid client connection spikes.
Long running or expensive commands
This section was moved. For more information, see Long running commands.
Scaling
Scaling operations are CPU and memory intensive because they can involve moving data between nodes and changing the cluster topology. For more information, see Scaling.
Server maintenance
If your Azure Cache for Redis underwent a failover, all client connections from the node that went down are transferred to the node that is still running. The server load can spike because of the additional connections. Try rebooting your client applications so that all client connections are re-created and redistributed between the two nodes.
High memory usage
Memory pressure on the server can lead to various performance problems that delay processing of requests. When memory pressure hits, the system pages data to disk, which causes the system to slow down significantly.
Here are some possible causes of memory pressure:
- The cache is filled with data near its maximum capacity
- The Redis server is experiencing high memory fragmentation
Fragmentation is likely when a workload stores values that vary widely in size. For example, fragmentation can occur when data sizes range from 1 KB to 1 MB. When a 1-KB key is deleted, a 1-MB key can't fit into the freed space, which causes fragmentation. Similarly, if a 1-MB key is deleted and a 1.5-MB key is added, the new key can't fit into the reclaimed memory. The freed memory stays unused, and fragmentation increases.
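The following toy model makes this effect concrete. It replays the 1-KB, 1-MB, and 1.5-MB scenario with a first-fit heap that never splits, merges, or releases blocks. This is a deliberate simplification and does not reflect how Redis's real allocator (jemalloc by default on Linux) behaves. The class names are hypothetical. Reserved memory plays the role of used_memory_rss, and stored data plays the role of used_memory.

```python
"""Toy first-fit heap illustrating fragmentation from mixed value sizes.

Simplifications (assumptions, not Redis behavior): blocks are never split,
merged, or returned to the OS, and sizes are tracked in KB.
"""
from dataclasses import dataclass, field


@dataclass
class Block:
    capacity_kb: int    # space reserved for this block
    stored_kb: int = 0  # size of the value currently stored (0 means free)


@dataclass
class ToyHeap:
    blocks: list = field(default_factory=list)

    def alloc(self, size_kb: int) -> Block:
        # First fit: reuse the first free block that is large enough.
        for block in self.blocks:
            if block.stored_kb == 0 and block.capacity_kb >= size_kb:
                block.stored_kb = size_kb
                return block
        # No free block fits, so reserve new memory.
        block = Block(capacity_kb=size_kb, stored_kb=size_kb)
        self.blocks.append(block)
        return block

    def free(self, block: Block) -> None:
        block.stored_kb = 0  # the space stays reserved

    def reserved_kb(self) -> int:  # roughly analogous to used_memory_rss
        return sum(b.capacity_kb for b in self.blocks)

    def used_kb(self) -> int:  # roughly analogous to used_memory
        return sum(b.stored_kb for b in self.blocks)


heap = ToyHeap()
small = heap.alloc(1)     # 1-KB key
large = heap.alloc(1024)  # 1-MB key
heap.free(small)
heap.alloc(1024)          # a new 1-MB key can't reuse the 1-KB hole
heap.free(large)
heap.alloc(1536)          # a 1.5-MB key can't reuse the 1-MB hole
print(heap.reserved_kb(), heap.used_kb())        # prints: 3585 2560
print(round(heap.reserved_kb() / heap.used_kb(), 2))  # prints: 1.4
```

In this model, 3,585 KB is reserved to hold 2,560 KB of live data. The freed 1-KB and 1-MB holes are too small for the values that follow them.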
If the used_memory_rss value is higher than 1.5 times the used_memory metric, memory is fragmented. The fragmentation can cause issues when:
- Memory usage is close to the max memory limit for the cache, or
- used_memory_rss is higher than the Max Memory limit, potentially resulting in page faults.
If a cache is fragmented and is running under high memory pressure, the system does a failover to try to recover Resident Set Size (RSS) memory.
Two stats that Redis exposes through the INFO command, used_memory and used_memory_rss, can help you identify this issue. You can also view these metrics in the portal.
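The following minimal sketch implements this check. It parses the text that INFO memory returns and flags fragmentation when used_memory_rss exceeds 1.5 times used_memory. The function names and sample values are hypothetical. If you use a client library such as redis-py, its info("memory") call already returns a parsed dictionary, so you only need the comparison. Redis also reports the same ratio directly as mem_fragmentation_ratio.

```python
"""Sketch: flag memory fragmentation from INFO memory output.

Assumption: info_text is the raw text returned by the Redis INFO memory
command, made of "field:value" lines plus "# Memory" section headers.
"""

FRAGMENTATION_THRESHOLD = 1.5  # used_memory_rss / used_memory, per the guidance above


def parse_info(info_text: str) -> dict:
    """Turn "field:value" lines into a dict, skipping headers and blank lines."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        fields[name] = value
    return fields


def check_fragmentation(info_text: str) -> tuple:
    """Return (ratio, is_fragmented) from used_memory_rss and used_memory (bytes)."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    rss = int(fields["used_memory_rss"])
    ratio = rss / used
    return ratio, ratio > FRAGMENTATION_THRESHOLD


# Hypothetical sample values, not taken from a real cache.
sample = """# Memory
used_memory:1000000000
used_memory_human:953.67M
used_memory_rss:1800000000
used_memory_rss_human:1.68G
"""

ratio, fragmented = check_fragmentation(sample)
print(f"ratio={ratio:.2f} fragmented={fragmented}")  # prints: ratio=1.80 fragmented=True
```

A high ratio matters most when memory usage is also close to the cache's limit, as described above.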
Validate that the maxmemory-reserved and maxfragmentationmemory-reserved values are set appropriately.
There are several possible changes you can make to help keep memory usage healthy:
- Configure a memory policy and set expiration times on your keys. This policy may not be sufficient if you have fragmentation.
- Configure a maxmemory-reserved value that is large enough to compensate for memory fragmentation.
- Create alerts on metrics like used memory to be notified early about potential impacts.
- Scale to a larger cache size with more memory capacity. For more information, see Azure Cache for Redis planning FAQs.
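The first item can be sketched in client code. The following example assumes the redis-py client, since this article doesn't prescribe a client library, and a hypothetical one-hour TTL. The cache_put helper writes values with an expiration time. The add_missing_ttls helper uses SCAN to give an expiry to existing keys that have none. Both helpers are illustrative, not an Azure-provided tool.

```python
"""Sketch: give cache entries expiration times so they don't accumulate forever.

Assumption: client behaves like a redis-py client (redis.Redis): set() accepts
ex (seconds), ttl() returns -1 for a key without an expiry, and scan_iter()
walks keys incrementally with SCAN. DEFAULT_TTL_SECONDS is hypothetical.
"""

DEFAULT_TTL_SECONDS = 3600  # hypothetical: expire entries after one hour


def cache_put(client, key, value, ttl_seconds=DEFAULT_TTL_SECONDS):
    """Write a value that Redis removes automatically after ttl_seconds."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    client.set(key, value, ex=ttl_seconds)


def add_missing_ttls(client, pattern="*", ttl_seconds=DEFAULT_TTL_SECONDS):
    """Set an expiry on keys matching pattern that have none; return the count.

    SCAN walks the keyspace in batches instead of blocking the server like KEYS,
    but it still adds load, so run it during a quiet period.
    """
    updated = 0
    for key in client.scan_iter(match=pattern, count=500):
        # ttl() and expire() are separate round trips, so a concurrent writer
        # can change the key in between; this sketch accepts that race.
        if client.ttl(key) == -1:  # -1 means the key exists but never expires
            client.expire(key, ttl_seconds)
            updated += 1
    return updated
```

Expiration times help the memory policy reclaim space. As noted above, they don't fix fragmentation on their own.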
For recommendations on memory management, see Best practices for memory management.
Long-running commands
This section was moved. For more information, see Long running commands.
Server-side bandwidth limitation
This section was moved. For more information, see Network bandwidth limitation.