Dear all,
We are seeing connection timeouts to Redis Cache (PaaS) from our AKS workloads.
Before moving to the Redis PaaS solution, we were running our own Redis deployment (K8s pods).
We had the same issue there: connection timeouts.
One might think the issue is caused by our application and not by Redis itself.
However, we have many pods in separate namespaces with different configurations, and they all experience Redis disconnections within the same time frame (approx. 1 hour).
At this point, my guess is that the underlying issue comes from time synchronization on the AKS node.
One clue supporting this assumption is that all the pods affected by the issue are on the same node, even though we have many replicas on other nodes of the pool.
Another clue is that while the issue is ongoing, we see no other problems in the cluster or on the node; all metrics are fine: CPU usage, IO, memory, network bandwidth...
My questions are:
1- Is there any evidence that AKS has time sync issues in the current node OS version for K8s version 1.21.2?
2- How can I check on my own whether a time sync problem is occurring at the moment the Redis timeouts happen?
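To make question 2 more concrete, here is a rough sketch of the kind of check I have in mind, to be run in a pod scheduled on the affected node. It compares the wall clock against the monotonic clock and logs whenever the two disagree, so the log timestamps can be matched against the Redis timeouts. The function names, the 0.5 s threshold and the 1 s interval are my own placeholders, and this only catches clock steps (sudden jumps); a slow slew applied by the time sync daemon would not show up, so it is a first test and not a proof either way.

```python
import time


def detect_steps(samples, threshold=0.5):
    """Find wall-clock steps in a list of (wall, mono) samples.

    For each consecutive pair, compare how far the wall clock moved with
    how far the monotonic clock moved. A difference above `threshold`
    seconds means the wall clock was stepped between the two samples.
    Returns a list of (sample_index, drift_seconds).
    """
    jumps = []
    for i in range(1, len(samples)):
        wall_delta = samples[i][0] - samples[i - 1][0]
        mono_delta = samples[i][1] - samples[i - 1][1]
        drift = wall_delta - mono_delta
        if abs(drift) > threshold:
            jumps.append((i, drift))
    return jumps


def watch(interval=1.0, threshold=0.5, iterations=None):
    """Sample both clocks every `interval` seconds and print any step.

    `iterations=None` means run until interrupted.
    """
    prev = (time.time(), time.monotonic())
    count = 0
    while iterations is None or count < iterations:
        time.sleep(interval)
        cur = (time.time(), time.monotonic())
        for _, drift in detect_steps([prev, cur], threshold):
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            print(f"{stamp} wall clock stepped by {drift:+.3f}s", flush=True)
        prev = cur
        count += 1
```

Calling `watch()` in a small debug pod pinned to the suspect node (and in a second one on a healthy node for comparison) should print a line only when the wall clock jumps. If nothing is logged during a timeout window, a clock step is probably not the cause.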
Thanks,
Marco