Managing Redis Connection Loss on Linux App Service

Florian Pierrel 20 Reputation points
2023-11-09T17:12:57.13+00:00

We are experiencing undetected connection losses with Azure Cache for Redis once or twice a month in our .NET 6 application, which runs in Docker on a Linux App Service. This causes RedisTimeoutExceptions, and the number of clients connected to the Redis server drops sharply. The problem occurs at the same time on about half the instances, and all applications hosted on the same instance are affected at the same moment.

The problem resolves itself after about 15 minutes. We believe this is caused by the TCP parameter net.ipv4.tcp_retries2, as described in the documentation.

Is this the case? Is this frequency normal? Could the problem be on the Azure Redis side?

How can we mitigate or solve this issue? We've tried using ForceReconnect and updating the StackExchange.Redis library to the latest version, but neither has helped. Unfortunately, we can't modify the TCP parameter net.ipv4.tcp_retries2 on a Linux App Service plan.


Accepted answer
  GeethaThatipatri-MSFT 29,482 Reputation points Microsoft Employee
    2023-11-13T14:35:35.6433333+00:00

    Hi @Florian Pierrel, the undetected connection losses you describe sound consistent with our monthly patching. When we patch the cache, all clients are disconnected, but they can then reconnect immediately. The linked article explains the reasoning behind that.
    As you've found, Linux's default TCP values are too generous, which can cause reconnects to fail for about 15 minutes.
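The roughly 15-minute window can be reproduced with a little arithmetic. The sketch below models how long Linux keeps retransmitting unacknowledged data for a given net.ipv4.tcp_retries2, using the formula behind the "924.6 seconds" figure in the kernel's ip-sysctl documentation: exponential backoff starting at the 200 ms minimum RTO until it reaches the 120 s maximum, then linear. It assumes the minimum RTO as the starting point; real connections derive the RTO from measured round-trip time, so the actual timeout is usually somewhat longer.

```python
import math

TCP_RTO_MIN = 0.2    # seconds: Linux minimum retransmission timeout
TCP_RTO_MAX = 120.0  # seconds: Linux maximum retransmission timeout


def model_retransmit_timeout(retries: int, rto_base: float = TCP_RTO_MIN) -> float:
    """Approximate seconds before Linux abandons unacknowledged data.

    The RTO doubles from rto_base until it would exceed TCP_RTO_MAX,
    after which each further retry adds TCP_RTO_MAX.
    """
    # Number of doublings before the RTO hits the cap (floor of log2).
    linear_backoff_thresh = int(math.log2(TCP_RTO_MAX / rto_base))
    if retries <= linear_backoff_thresh:
        # Pure exponential phase: rto_base * (1 + 2 + ... + 2**retries).
        return ((2 << retries) - 1) * rto_base
    # Exponential phase up to the cap, then linear retries at TCP_RTO_MAX.
    return (((2 << linear_backoff_thresh) - 1) * rto_base
            + (retries - linear_backoff_thresh) * TCP_RTO_MAX)


for retries in (15, 8, 5):
    seconds = model_retransmit_timeout(retries)
    print(f"tcp_retries2={retries}: ~{seconds:.1f} s ({seconds / 60:.1f} min)")
```

With the default of 15 the model gives about 924.6 seconds, roughly 15.4 minutes, which lines up with the recovery time reported in the question.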

    Since it doesn't appear you can set the OS-level TCP parameter net.ipv4.tcp_retries2 in App Service, here is a snippet from the comment linked above that offers a few client-side options you could configure in StackExchange.Redis, such as adjusting keepalive settings and using the ForceReconnect pattern.

    As you found, there are TCP settings you can change on the client machine to make it time out the connection sooner and allow it to reconnect. In addition to tcp_retries2, you can try tuning the keepalive settings as discussed here: lettuce-io/lettuce-core#1428 (comment). It should be safe to reduce these timeouts to more realistic values machine-wide unless you have systems that actually depend on unusually long retransmission periods.
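As a rough illustration of the socket-level knobs involved, here is a Python sketch, not StackExchange.Redis code, that tightens timeouts on a single Linux TCP socket instead of machine-wide. Keepalive probes detect a dead peer on an idle connection, while TCP_USER_TIMEOUT (see tcp(7)) bounds how long sent data may remain unacknowledged, which is the situation tcp_retries2 otherwise governs. The helper name and the example values are assumptions chosen for illustration, not Azure recommendations, and whether your Redis client exposes its underlying socket is something to verify for your library.

```python
import socket


def tune_tcp_socket(sock: socket.socket,
                    keepidle: int = 60,
                    keepintvl: int = 10,
                    keepcnt: int = 3,
                    user_timeout_ms: int = 30_000) -> None:
    """Hypothetical helper: apply per-socket dead-peer timeouts on Linux.

    All default values are illustrative assumptions.
    """
    # Enable keepalive probes so an idle connection to a dead peer is noticed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    keepalive_opts = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if all(hasattr(socket, name) for name in keepalive_opts):
        # Idle seconds before the first probe, seconds between probes,
        # and unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, keepidle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, keepintvl)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, keepcnt)
    if hasattr(socket, "TCP_USER_TIMEOUT"):
        # Max milliseconds transmitted data may stay unacknowledged before
        # the kernel closes the connection; applies to this socket only,
        # instead of the system-wide tcp_retries2 retransmission limit.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT,
                        user_timeout_ms)
```

The options are set before calling connect(), for example on `socket.socket(socket.AF_INET, socket.SOCK_STREAM)`. Doing this per socket avoids touching net.ipv4 sysctls, which is the constraint on App Service.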

    An additional approach is the ForceReconnect pattern recommended in the Azure best practices. If you're seeing issues like this, it's perfectly appropriate to trigger a reconnect on RedisTimeoutExceptions in addition to RedisConnectionExceptions. Just don't be too aggressive with it, because an overloaded server can also produce persistent RedisTimeoutExceptions, and recreating connections in that situation adds server load and can cause a cascading failure.
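The "don't be too aggressive" advice amounts to throttling. Below is a language-neutral sketch in Python, loosely modeled on the idea behind ForceReconnect but not the Azure sample itself: reconnect only after errors have persisted for an error threshold, and never more than once per minimum interval. The class name, `connect`/`dispose` callables, and the 60 s / 30 s defaults are hypothetical; `connect` stands in for whatever creates your client connection, and `report_error` would be called on both connection and timeout errors.

```python
import threading
import time
from typing import Callable, Optional


class ThrottledReconnector:
    """Hypothetical helper: recreate a connection only when errors persist
    for error_threshold seconds, and at most once per min_interval seconds."""

    def __init__(self, connect: Callable[[], object],
                 min_interval: float = 60.0,
                 error_threshold: float = 30.0,
                 dispose: Optional[Callable[[object], None]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._dispose = dispose
        self._min_interval = min_interval
        self._error_threshold = error_threshold
        self._clock = clock
        self._lock = threading.Lock()
        self.connection = connect()
        self._last_reconnect = clock()  # initial connect counts as a reconnect
        self._first_error: Optional[float] = None
        self._last_error: Optional[float] = None

    def report_error(self) -> bool:
        """Call after a connection or timeout error.

        Returns True if the connection was recreated.
        """
        with self._lock:
            now = self._clock()
            # Throttle: never rebuild more often than min_interval.
            if now - self._last_reconnect < self._min_interval:
                return False
            # Start a new error streak if none is active or the last error is stale.
            if (self._first_error is None
                    or now - self._last_error > self._error_threshold):
                self._first_error = now
            self._last_error = now
            # Reconnect only once errors have persisted for error_threshold.
            if now - self._first_error < self._error_threshold:
                return False
            old = self.connection
            self.connection = self._connect()
            if self._dispose is not None:
                self._dispose(old)  # release the old connection's resources
            self._last_reconnect = now
            self._first_error = None
            self._last_error = None
            return True
```

A single isolated timeout never triggers a rebuild; only a sustained run of errors does, and the minimum interval caps how often a struggling server sees new connections. Because the initial connect is recorded as a reconnect, errors in the first `min_interval` seconds are ignored, a conservative choice you may want to adjust.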

    I hope this information helps.

    Regards

    Geetha


1 additional answer

  SAMITSARKAR_MSFT 791 Reputation points Microsoft Employee
    2023-11-09T17:30:24.2666667+00:00

    Hello Florian,

    Welcome to the Microsoft Q&A platform. Thank you for reaching out; I hope you are doing well.

    I understand that your application, hosted in Docker on Linux App Service, is getting timeouts.

    Can you please share more details about the Redis timeout exception, including the complete stack trace, so we can identify the issue?

    You can also refer to the article https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-troubleshoot-timeouts for more insights.

    Thanks

