How to mitigate Azure SignalR server connections dropping during production hours

81327804 1 Reputation point
2023-10-11T22:59:32.3533333+00:00

We have been using Azure SignalR Service for a few years now and it has been a great part of our application solution. But for the last 2 weeks or so, our logs have shown the server-side connection being dropped, which causes all the associated user connections to drop as well. Our app handles the reconnection fine, and the server happily accepts the massive influx of all the agents reconnecting at the same time, but it is undesirable to have the connection drop and impact all our users during business hours. For the past 4 days (Oct 8-11) the drop has occurred at approximately the same time (11:34 am NSW time, 00:34 UTC) and our users have noticed this pattern too.

My server app is running in Azure (we are in the australia-east region, and so is the SignalR service). I ran diagnostic checks on the WebApp and it is not suffering from high CPU or memory constraints, nor is it restarting. From what I can tell in the logs, "Connection to '(Primary)https://...' was dropped, probably caused by network instability or service restart." is a result of the Azure SignalR Service closing the connection - my best guess is because of restarts, maintenance or upgrades...?

I understand from this page (https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-howto-troubleshoot-guide#server-connection-drops) that these kinds of events will happen, but having them occur almost daily and during business hours is having a negative impact on our application's perceived reliability.

From other reading I've done, there is no way to prevent the server connection drops from affecting client connections - am I correct in this assumption?

And can anyone confirm that these drops are initiated by Azure SignalR Service and, if so, what the cause is? If it's just maintenance, can it be moved to outside local business hours? (00:34 UTC makes me suspicious that it should be local time midnight, not UTC midnight...)

Thanks in advance for your help.

Azure SignalR Service
An Azure service that is used for adding real-time communications to web applications.

1 answer

  1. brtrach-MSFT 15,356 Reputation points Microsoft Employee
    2023-10-12T00:33:19.29+00:00

    @81327804 Thank you for reaching out. Before going forward, I want to stress that without access to your service's logs, it will be difficult to determine exactly why you are receiving these errors and what can be done. I should also stress that PaaS products have traditionally offered less configurability. This is changing, though, as some products now give customers more input into when their service is upgraded or maintained.

    One item worth trying, which should require little to no effort on your side, is using Azure SignalR with availability zones. You can read more about that here. Australia East should support availability zones according to the latest documentation I've read. The first thing to verify is which tier your Azure SignalR service is running on. Availability zones are only provided in the premium tier, where they are enabled by default. Upgrading your service to the premium tier should not cause any downtime and would give you high availability and fault tolerance.

    If enabling availability zones does not help to reduce the impact to your service, one option is to implement a retry mechanism on the client side to automatically reconnect to the server when the connection is dropped. This can help minimize the impact of the drops on your users.
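
As a rough illustration of the client-side retry idea, the sketch below builds a reconnect policy with exponential backoff and random jitter, so that agents dropped at the same moment do not all reconnect in the same instant. It assumes a JavaScript/TypeScript client; the two interfaces are local stand-ins that mirror the shape of the retry policy accepted by `withAutomaticReconnect` in the `@microsoft/signalr` package, and the function name, parameter names, and default values are hypothetical choices for this example, not recommendations from the answer.

```typescript
// Local stand-ins (assumption): they mirror the fields of the retry context
// and retry policy used by the @microsoft/signalr client, so this sketch
// compiles and runs without the package installed.
interface RetryContextLike {
  previousRetryCount: number;   // retries already attempted since the drop
  elapsedMilliseconds: number;  // time spent reconnecting so far
}

interface RetryPolicyLike {
  // Returns the delay before the next attempt, or null to stop retrying.
  nextRetryDelayInMilliseconds(ctx: RetryContextLike): number | null;
}

// Hypothetical helper: exponential backoff with "full jitter".
// All default values are illustrative only.
function createJitteredRetryPolicy(
  baseDelayMs: number = 1000,
  maxDelayMs: number = 30000,
  maxElapsedMs: number = 5 * 60 * 1000,
  random: () => number = Math.random,
): RetryPolicyLike {
  return {
    nextRetryDelayInMilliseconds(ctx: RetryContextLike): number | null {
      // Give up once the overall reconnect budget is used up.
      if (ctx.elapsedMilliseconds >= maxElapsedMs) {
        return null;
      }
      // Double the ceiling on every attempt, up to maxDelayMs.
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** ctx.previousRetryCount);
      // Pick a random delay below the ceiling so clients spread out.
      return Math.floor(random() * ceiling);
    },
  };
}

// Possible wiring with the real client (not executed here; "/hub" is a
// placeholder URL):
//
//   import { HubConnectionBuilder } from "@microsoft/signalr";
//   const connection = new HubConnectionBuilder()
//     .withUrl("/hub")
//     .withAutomaticReconnect(createJitteredRetryPolicy())
//     .build();
```

The jitter is the part that targets the "massive influx" described in the question: each client waits a different random time below a growing ceiling, instead of every client retrying on the same fixed schedule.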

    Let me know the results of the above suggestions. There is one final item we can look into, but let's start with the above items first.