How to handle connection timeouts with Azure Managed Redis + OSS clustering policy + redis-py?

Question

Daniel 0

Hello everyone,

I was using this code (#1) to set up my Redis cluster connection (OSS clustering policy) and automatically handle "MOVED" errors. It worked: there were no more "MOVED" errors, unlike with a single Redis() setup.

Given that Azure Managed Redis only provides a single endpoint, I thought this was enough, until I got a timeout error message containing the IP and port of one node. So I'm assuming this setup does not handle connection errors or proxy requests to the replica node, which is critical, as the service is already configured with "high availability".

Our fix was to use the EnterpriseCluster policy and go back to the single Redis() connection. In this mode, connection errors seem to be handled at the Azure level, so the "high availability" option makes sense. With the OSS clustering policy, I assume I need to implement the "high availability" handling on my side, right? That would mean catching errors, listing the nodes and reconfiguring the client to use an available node. The problem is that Azure doesn't provide the addresses of the nodes, only a single endpoint. Should I use the "CLUSTER NODES" command?

To anyone using the same setup with the OSS clustering policy: what do you think? Is my code correct? I think EnterpriseCluster is the clearest, simplest and most reliable solution (at an increased cost). My only concern is that OSS is enabled by default together with the "high availability" option: does it require additional setup, or is this perhaps an error in the configuration page?

Thank you!

#1

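    import os

    from redis.backoff import ExponentialBackoff, NoBackoff
    from redis.cluster import ClusterNode, RedisCluster
    # redis-py raises its own ConnectionError/TimeoutError, not the builtins.
    from redis.exceptions import ConnectionError, TimeoutError
    from redis.retry import Retry


    # The enclosing function name and the default for `retries` are assumed;
    # the original snippet only showed the body.
    def create_cluster_client(retries: int = 3) -> RedisCluster:
        # Back off exponentially only when more than one retry is allowed.
        backoff = ExponentialBackoff() if retries > 1 else NoBackoff()
        retry = Retry(backoff, retries)

        host = os.environ.get("REDISHOST", "localhost")
        port = os.environ.get("REDISPORT", "6379")
        user = os.environ.get("REDISUSER", "")
        password = os.environ.get("REDISPASSWORD", "")
        # The single Azure endpoint is the only startup node.
        startup_nodes = [ClusterNode(host=host, port=int(port))]

        return RedisCluster(  # type: ignore
            startup_nodes=startup_nodes,
            decode_responses=True,
            retry=retry,
            retry_on_error=[ConnectionError, TimeoutError],
            health_check_interval=30,
            require_full_coverage=False,
            read_from_replicas=False,
            username=user if user else None,
            password=password if password else None,
            ssl=True,
        )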

Vijayalaxmi Kattimani 3,250 Reputation points Microsoft External Staff Moderator

2025-04-03T10:16:06.63+00:00

Hi @Daniel,

We haven’t heard from you on the last response and wanted to follow up to check if your issue has been resolved.

If you have found a solution, we would appreciate it if you could share it with the community, as it may be helpful to others. Otherwise, please provide more details, and we will do our best to assist you further.

Daniel 0 Reputation points

2025-04-03T10:28:14.4333333+00:00

Hello @Vijayalaxmi Kattimani ,

Thanks for the suggestion. We are testing with "Force Reconnect" and a "Retry" strategy, and we are continuing to use the Enterprise cluster policy to avoid implementing node discovery and error handling on our side. I'll let you know next week whether it worked.

Vijayalaxmi Kattimani 3,250 Reputation points Microsoft External Staff Moderator

2025-04-03T10:33:22.99+00:00

Hi @Daniel,

Thank you for the update.

If my last response answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?". This can be beneficial to others in the community.

Answer 1

Hi Daniel,

Greetings!

Managing connection timeouts with Azure Managed Redis and the OSS clustering policy can be challenging because Azure exposes only a single endpoint.

Here are some insights and suggestions to help you handle this setup effectively:

Retry Logic: Your current retry logic using ExponentialBackoff and Retry is a good start. Ensure that your retry settings are appropriately configured to handle transient errors.
Health Checks: Setting health_check_interval=30 is beneficial. You might want to adjust this interval based on your application's tolerance for downtime and the expected frequency of node failures.
Error Handling: Since Azure Managed Redis provides a single endpoint, handling connection errors and proxying to replica nodes manually is necessary. You can use the CLUSTER NODES command to get the list of nodes and their statuses. This will help you identify available nodes and reconfigure your client accordingly.
Timeout Settings: Configure appropriate connect and command timeouts. A connect timeout of around 5 seconds is recommended.
Force Reconnect: Implement a mechanism to force reconnect in case of persistent connection issues. This can be done using a singleton pattern for your Redis connection and periodically forcing a reconnect.
Monitoring and Alerts: Set up monitoring and alerts to detect and respond to connection issues promptly. Azure provides various metrics and logs that can help you track the health of your Redis instances.
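
The "Force Reconnect" suggestion above can be sketched as follows. This is a minimal illustration, not an official Azure or redis-py implementation: the class name `ReconnectingClient`, the `min_interval` default of 60 seconds and the injected `factory` are all assumptions made for the example. The holder keeps one shared client, rebuilds it on request, and refuses to rebuild more often than once per interval so that a burst of errors does not cause a reconnect storm.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ReconnectingClient(Generic[T]):
    """Hypothetical holder: one shared client, rebuilt at most once per interval."""

    def __init__(
        self,
        factory: Callable[[], T],
        min_interval: float = 60.0,  # assumed value, tune for your workload
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._min_interval = min_interval
        self._clock = clock
        self._lock = threading.Lock()
        self._client: Optional[T] = None
        self._last_reconnect = float("-inf")

    def get(self) -> T:
        """Return the shared client, creating it on first use."""
        with self._lock:
            if self._client is None:
                self._client = self._factory()
                self._last_reconnect = self._clock()
            return self._client

    def force_reconnect(self) -> bool:
        """Replace the client unless one was created too recently."""
        with self._lock:
            now = self._clock()
            if now - self._last_reconnect < self._min_interval:
                return False  # throttled: keep the current client
            old, self._client = self._client, self._factory()
            self._last_reconnect = now
        # Simplification: the old client is closed immediately. Other threads
        # that still hold it may see errors; a real service might delay this.
        close = getattr(old, "close", None)
        if callable(close):
            close()
        return True
```

With redis-py, `factory` would be whatever function builds the client (for example the one in snippet #1), and `force_reconnect()` would be called from the handler that catches persistent `ConnectionError` or `TimeoutError`. For the "Timeout Settings" item, the redis-py parameters that correspond to connect and command timeouts are `socket_connect_timeout` and `socket_timeout`.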

Azure Managed Redis offers two clustering policies: OSS and Enterprise. The OSS cluster policy is recommended for most applications because it supports higher maximum throughput, but each policy has advantages and disadvantages. Please refer to this document: https://learn.microsoft.com/en-us/azure/redis/architecture
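
With the OSS policy, node addresses are not listed in the portal, but they appear in the reply to the CLUSTER NODES command. The sketch below only shows how that reply could be read to find nodes that look healthy. It parses the standard text format (`<id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> <slots...>`). The sample reply, node IDs, IP addresses and ports are made up for illustration, and the `NodeInfo` and `parse_cluster_nodes` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Flags that mark a node as unusable for new connections.
UNHEALTHY_FLAGS = {"fail", "fail?", "noaddr", "handshake"}


@dataclass
class NodeInfo:
    node_id: str
    host: str
    port: int
    flags: List[str]
    link_state: str

    @property
    def is_available(self) -> bool:
        # Usable when the cluster bus link is up and no failure flag is set.
        return self.link_state == "connected" and UNHEALTHY_FLAGS.isdisjoint(self.flags)


def parse_cluster_nodes(raw: str) -> List[NodeInfo]:
    """Parse the raw text of a CLUSTER NODES reply into NodeInfo records."""
    nodes = []
    for line in raw.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete node line
        # The address looks like ip:port@cport[,hostname]; keep ip and port.
        address = fields[1].split("@", 1)[0]
        host, _, port = address.rpartition(":")
        nodes.append(
            NodeInfo(
                node_id=fields[0],
                host=host,
                port=int(port),
                flags=fields[2].split(","),
                link_state=fields[7],
            )
        )
    return nodes


# Made-up reply: one healthy primary, one failed primary, one replica.
SAMPLE_REPLY = """\
07c3 10.0.0.4:8500@8501 myself,master - 0 1700000000000 1 connected 0-8191
9a1f 10.0.0.5:8500@8501 master,fail - 1700000000000 1700000000500 2 disconnected 8192-16383
e2b7 10.0.0.6:8500@8501 slave 9a1f 0 1700000000000 2 connected
"""

available = [n for n in parse_cluster_nodes(SAMPLE_REPLY) if n.is_available]
```

A cluster-aware client such as redis-py's `RedisCluster` already performs this discovery internally, so a parser like this is mainly useful for diagnostics and logging.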

Note: Azure Managed Redis is a preview feature, so there may be certain bugs or limitations that do not exist in the stable version.

Please refer to the links below for more information.

https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/quickstart-create-redis-enterprise

https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-enterprise-tiers

https://learn.microsoft.com/en-us/azure/redis/best-practices-connection

https://learn.microsoft.com/en-us/azure/redis/troubleshoot-timeouts

I hope this information helps. Please do let us know if you have any further queries.

If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?".

Vijayalaxmi Kattimani 3,250 Reputation points Microsoft External Staff Moderator

2025-04-04T08:57:00.83+00:00

Hi @Daniel,

We haven’t heard from you on the last response and wanted to follow up to check if your issue has been resolved.

If you have found a solution, we would appreciate it if you could share it with the community, as it may be helpful to others. Otherwise, please provide more details, and we will do our best to assist you further.
