Hi @Florian Pierrel, the undetected connection losses you are experiencing sound consistent with our monthly patching. When we patch the cache, all clients are disconnected, but they can reconnect immediately. The linked article explains the reasoning behind that.
As you've found, the default Linux TCP settings are too generous. The kernel keeps retransmitting on the broken connection for roughly 15 minutes before dropping it, and reconnects fail until that happens.
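The ~15 minutes comes from how long Linux keeps retransmitting unacknowledged data before it declares the connection dead. That limit is controlled by net.ipv4.tcp_retries2 (default 15). The following is a rough model, not kernel code. It assumes the retransmission timeout starts at its 200 ms floor (the kernel's TCP_RTO_MIN), doubles on each retry, and is capped at 120 s (TCP_RTO_MAX). Real timeouts also depend on the measured round-trip time, so treat the numbers as approximate.

```python
# Approximate model of when Linux abandons a connection with unacknowledged data.
# Assumptions: initial RTO = 200 ms (TCP_RTO_MIN), exponential backoff,
# each interval capped at 120 s (TCP_RTO_MAX).
RTO_MIN_S = 0.2
RTO_MAX_S = 120.0


def retransmit_timeout_seconds(tcp_retries2: int, rto_initial: float = RTO_MIN_S) -> float:
    """Sum the backoff intervals that elapse before the connection is dropped."""
    total = 0.0
    rto = rto_initial
    # The original send and each of the tcp_retries2 retransmissions wait one
    # RTO interval, so tcp_retries2 + 1 intervals elapse in total.
    for _ in range(tcp_retries2 + 1):
        total += rto
        rto = min(rto * 2, RTO_MAX_S)
    return total


for retries in (15, 8, 5):
    print(f"tcp_retries2={retries}: ~{retransmit_timeout_seconds(retries):.1f} s")
```

With the default of 15, this gives about 924.6 s (roughly 15.4 minutes), which matches the window described above. A value of 5 would give about 12.6 s.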
Since it doesn't appear that you can set the OS-level TCP parameter net.ipv4.tcp_retries2 in App Service, here is a snippet from the comment linked above. It offers a few client-side options you could configure with StackExchange.Redis, such as adjusting keepalive settings and using the ForceReconnect pattern.
As you found, there are TCP settings you can change on the client machine to force it to time out the connection sooner and allow it to reconnect. In addition to tcp_retries2, you can try tuning the keepalive settings as discussed here: lettuce-io/lettuce-core#1428 (comment). It should be safe to reduce these timeouts machine-wide to more realistic durations, unless you have systems that actually depend on the unusually long retransmits.
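When the machine-wide sysctl can't be changed, Linux also lets an application tune these limits per socket. The sketch below uses Python's socket module only to show which kernel options are involved. StackExchange.Redis manages its own sockets, so these calls are not its API; check its documentation for what it exposes. The function name tune_socket_timeouts and the values (30 s idle, 5 s interval, 3 probes, 30 s user timeout) are illustrative assumptions, not recommendations from the linked comment.

```python
import socket


def tune_socket_timeouts(sock: socket.socket,
                         idle_s: int = 30,
                         interval_s: int = 5,
                         probes: int = 3,
                         user_timeout_ms: int = 30_000) -> None:
    """Hypothetical helper: shorten dead-peer detection on one socket (Linux options).

    The default values are illustrative assumptions, not tuned recommendations.
    """
    # Enable TCP keepalive so that an idle connection is probed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe, seconds between probes,
    # and how many unanswered probes mark the peer as dead.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    # Keepalive only probes idle connections. While sent data is unacknowledged,
    # retransmission (governed by tcp_retries2) applies instead; TCP_USER_TIMEOUT
    # caps how long that data may stay unacknowledged on this socket.
    if hasattr(socket, "TCP_USER_TIMEOUT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        tune_socket_timeouts(s)
        print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
    finally:
        s.close()
```

Keepalive alone does not shorten the ~15-minute window when requests are in flight, because that case is governed by retransmission. This is why the per-socket user timeout is included alongside the keepalive options.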
An additional approach is using the ForceReconnect pattern recommended in the Azure best practices. If you're seeing issues like this, it's perfectly appropriate to trigger a reconnect on RedisTimeoutExceptions in addition to RedisConnectionExceptions. Just don't be too aggressive with it, because an overloaded server can also result in persistent RedisTimeoutExceptions. Recreating connections in that situation can add server load and cause a cascading failure.
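The "don't be too aggressive" advice can be sketched as a throttled reconnect. This is a simplified, hypothetical Python illustration of the idea behind ForceReconnect, not StackExchange.Redis code. The names ReconnectGuard, on_error, connect, min_interval_s and error_threshold_s are invented here, and the 60 s / 30 s defaults are assumptions you would tune. The guard reconnects only when errors have persisted for a while and no reconnect happened recently. A briefly overloaded server therefore does not trigger a storm of new connections.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ReconnectGuard(Generic[T]):
    """Hypothetical throttled-reconnect helper illustrating the ForceReconnect idea."""

    def __init__(self, connect: Callable[[], T],
                 min_interval_s: float = 60.0,
                 error_threshold_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._min_interval_s = min_interval_s
        self._error_threshold_s = error_threshold_s
        self._clock = clock
        self._lock = threading.Lock()
        self._last_reconnect = clock()
        self._first_error: Optional[float] = None
        self._previous_error: Optional[float] = None
        self.connection: T = connect()

    def on_error(self) -> bool:
        """Call on connection or timeout errors; returns True if it reconnected."""
        with self._lock:
            now = self._clock()
            # Never reconnect more often than min_interval_s.
            if now - self._last_reconnect < self._min_interval_s:
                return False
            # Start tracking a new burst if none is active or the last one ended.
            if self._first_error is None or now - self._previous_error > self._error_threshold_s:
                self._first_error = self._previous_error = now
                return False
            self._previous_error = now
            # Only reconnect once errors have persisted for the threshold.
            if now - self._first_error < self._error_threshold_s:
                return False
            self._first_error = self._previous_error = None
            self._last_reconnect = now
            # A real client would also dispose of the old connection here.
            self.connection = self._connect()
            return True
```

In the .NET client, the same idea would wrap the ConnectionMultiplexer and be called from the catch blocks for RedisConnectionException and, more sparingly, RedisTimeoutException.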
I hope this information helps.
Regards
Geetha