Azure Managed Redis OOM for two hours

Question

Théo Mouchabac 20 Reputation points

My Azure Managed Redis instance went down for at least two hours, and during that time I had no way to restore, restart, or otherwise act on the service.

Clients could no longer write to the Redis instance and received OOM (out of memory) errors. maxmemory could be neither read nor set from Redis clients.

On the monitoring dashboards in the Azure portal, memory consumption looked fine (well below the instance capacity). Every metric was normal until a sudden spike in reads, writes, and latency. We cannot find anything coming from the clients connected to the service that would explain this spike.

The incident was "automatically" resolved a few hours later, but that was far too long, and we had to create a new instance to keep our own service available.
However, the resolution came with no explanation at all. Because I don't want the issue to recur, I need to understand exactly what happened.

Here is a screenshot of the health event: Capture d’écran 2025-06-17 à 20.49.08

At the time, I could not even contact support to find out what was happening.

Please explain what happened in this specific case so that I can avoid the issue in the future.

Regards,

Sai Raghunadh M 4,640 Reputation points Microsoft External Staff Moderator

2025-06-17T19:14:53.49+00:00

Hi Théo Mouchabac

Could you please share the details requested in the private message?

Théo Mouchabac 20 Reputation points

2025-06-20T09:37:52.68+00:00

My question is: why was a scaling operation initiated without any action on my side?

1 answer

Answer 1

Hi Théo Mouchabac,

The outage was caused by a scaling operation on your Azure Redis cache in the France South region that did not complete as expected. The system attempted to scale the cache, but the process got stuck and eventually timed out. This left the cache in an unresponsive state that behaved almost as if it had run out of memory, even though actual usage was within limits. As a result, the service was unavailable for about two hours. The Azure Redis team identified the problem, applied the necessary fixes, retried the scaling operation, and brought the cache back online.

In addition, because no valid Account Admin email was configured in Azure, you likely did not receive automatic notifications about this incident. Azure does not send default alerts about service outages, incidents, or maintenance unless a valid email is provided or alerting is set up. To keep your team informed, especially about region-specific issues, configure Service Health alerts manually and link them to action groups with valid email addresses, phone numbers, or other preferred notification methods.
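As a rough illustration of the Service Health alert setup described above, here is a minimal Python sketch that builds an ARM template for an activity log alert filtered on the ServiceHealth category and linked to an existing action group. The alert name, subscription ID, resource group, and action group ID are hypothetical placeholders, and the API version is an assumption you should check against the current Azure documentation before deploying.

```python
import json


def service_health_alert_template(alert_name, subscription_id, action_group_id):
    """Return an ARM template (as a dict) for a Service Health activity log alert.

    All three arguments are placeholders supplied by the caller; nothing here
    is taken from the incident discussed in this thread.
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Insights/activityLogAlerts",
                "apiVersion": "2020-10-01",  # assumption: verify before use
                "name": alert_name,
                "location": "Global",
                "properties": {
                    "enabled": True,
                    # Watch the whole subscription for Service Health events.
                    "scopes": [f"/subscriptions/{subscription_id}"],
                    "condition": {
                        "allOf": [
                            {"field": "category", "equals": "ServiceHealth"}
                        ]
                    },
                    # The action group holds the emails / phone numbers to notify.
                    "actions": {
                        "actionGroups": [{"actionGroupId": action_group_id}]
                    },
                },
            }
        ],
    }


# Hypothetical example values; replace with your own.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
ACTION_GROUP_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/my-rg"
    "/providers/Microsoft.Insights/actionGroups/my-action-group"
)

template = service_health_alert_template(
    "redis-service-health-alert", SUBSCRIPTION_ID, ACTION_GROUP_ID
)
print(json.dumps(template, indent=2))
```

The printed JSON can be saved to a file and deployed, for example with `az deployment group create --resource-group <your-rg> --template-file <file>.json`. This sketch alerts on every Service Health event in the subscription; narrowing it to specific services or regions requires additional conditions that are not shown here.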

If this answers your query, please click Accept Answer and Yes. If you have any further questions, do let us know.
