Architecture advice for distributed load testing Azure SignalR to 1 million concurrent users with custom scenarios

Benyamin Radmard 0 Reputation points
2025-12-05T08:17:31.7366667+00:00

I am architecting a large-scale real-time application using Azure SignalR Service (Premium Tier) and need to validate our system's performance scaling from 50k up to 1 million concurrent users.

The Challenge: My requirement is to run custom "bot" scenarios where clients are active, not just idle. Each simulated user needs to:

  1. Negotiate with our backend (HTTP request).

  2. Connect to SignalR.

  3. Join a specific group.

  4. Send/Receive messages at a specific interval (e.g., 1 message every 5 seconds); a rough sketch of one such bot follows this list.
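
To make the scenario concrete, here is a rough sketch of what I have in mind for a single bot, using the @microsoft/signalr client on Node.js. The hub method names (JoinGroup, SendToGroup, Broadcast) and the negotiate path are placeholders, not our real API:

```typescript
import * as signalR from "@microsoft/signalr";

// Rough sketch of one "active" bot: negotiate, connect, join a group,
// then send a message every 5 seconds. All method names are placeholders.
async function runBot(backendUrl: string, userId: string, group: string): Promise<void> {
  // 1. Negotiate with our backend (standard SignalR negotiate response shape assumed:
  //    an Azure SignalR app server returns a redirect { url, accessToken }).
  const res = await fetch(`${backendUrl}/hub/negotiate`, { method: "POST" });
  const { url, accessToken } = await res.json();

  // 2. Connect to the Azure SignalR client endpoint returned by negotiation.
  const connection = new signalR.HubConnectionBuilder()
    .withUrl(url, { accessTokenFactory: () => accessToken })
    .withAutomaticReconnect()
    .build();
  connection.on("Broadcast", () => { /* count received messages, measure latency, etc. */ });
  await connection.start();

  // 3. Join a specific group (hypothetical hub method).
  await connection.invoke("JoinGroup", group);

  // 4. Send one message every 5 seconds to stay "active".
  setInterval(() => {
    connection.invoke("SendToGroup", group, `msg from ${userId}`).catch(() => { /* count errors */ });
  }, 5000);
}

runBot("https://our-backend.example.com", "bot-0", "group-0").catch(console.error);
```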

I am concerned that standard load testing approaches (like a simple JMeter script on a few VMs) will hit client-side bottlenecks long before we reach the SignalR Service limits. Specifically:

Ephemeral Port Exhaustion: A single load agent IP is limited to ~65k ports.

CPU Context Switching: Managing 1M active connections requires massive client-side resources.

Negotiation Bottleneck: Ramping up 1M users creates a "thundering herd" on our backend API /negotiate endpoint.

My Questions:

Distributed Architecture: What is the recommended Azure architecture for generating this level of distributed load? Is there a standard pattern using AKS (Kubernetes) to orchestrate thousands of lightweight clients to avoid port exhaustion?

Tooling: Are there specific tools or SDKs recommended by Microsoft for orchestrating custom SignalR scenarios at this scale? (Standard tools often struggle to simulate "smart" client logic without consuming excessive resources).

Ramp-up Strategy: How should we handle the load on the backend negotiation endpoint during the test? Is it common practice to mock the negotiation step during load tests to isolate the SignalR Service performance?

Any advice on the "Right Way" to architect this test bench on Azure would be appreciated.

Azure SignalR Service
An Azure service that is used for adding real-time communications to web applications.

1 answer

  1. Q&A Assist
    2025-12-05T08:17:56.0866667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    To architect a distributed load testing solution for Azure SignalR Service that scales from 50,000 to 1 million concurrent users, consider the following recommendations:

    Distributed Architecture

    1. Use Azure Kubernetes Service (AKS): Deploy your load testing agents in AKS to orchestrate thousands of lightweight clients. This setup helps manage resources effectively and can mitigate port exhaustion issues by distributing connections across multiple pods.
    2. Horizontal Scaling: Scale your load testing agents horizontally by deploying multiple instances of your testing application within AKS. This allows you to handle more connections without hitting the ephemeral port limit.
    3. Client-Side Optimization: Design your clients to be lightweight and efficient, for example by running many connections per process and reusing expensive resources (HTTP handlers, buffers, timers) across bots to reduce per-connection overhead. A rough sketch of such a per-pod worker is shown below.
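
    As a rough illustration rather than an official pattern, a per-pod worker could read its share of the load from environment variables and open many lightweight connections in one Node.js process; the replica count in the AKS Deployment or Job then controls total scale, and each pod IP brings its own ephemeral port range. Names such as CONNECTIONS_PER_POD, JoinGroup, and SendToGroup are assumptions for this sketch:

```typescript
import * as signalR from "@microsoft/signalr";

// Hypothetical per-pod load worker: each AKS pod runs this process and opens
// CONNECTIONS_PER_POD client connections. Scaling out pods spreads connections
// across many source IPs, avoiding the ~65k ephemeral-port limit of a single agent.
const connectionsPerPod = Number(process.env.CONNECTIONS_PER_POD ?? "1000");
const podIndex = Number(process.env.POD_INDEX ?? "0");               // e.g. StatefulSet ordinal
const hubUrl = process.env.HUB_URL ?? "http://localhost:5000/hub";   // backend hub or service client URL
const accessToken = process.env.ACCESS_TOKEN ?? "";                  // one shared token for simplicity;
                                                                     // a real run would use per-user tokens
const rampDelayMs = Number(process.env.RAMP_DELAY_MS ?? "50");       // stagger connection starts

async function main(): Promise<void> {
  const connections: signalR.HubConnection[] = [];
  for (let i = 0; i < connectionsPerPod; i++) {
    const connection = new signalR.HubConnectionBuilder()
      .withUrl(hubUrl, { accessTokenFactory: () => accessToken })
      .withAutomaticReconnect()
      .build();
    connection.on("Broadcast", () => { /* record receive count / latency here */ });
    await connection.start();
    await connection.invoke("JoinGroup", `group-${podIndex}`);       // placeholder hub method
    connections.push(connection);
    await new Promise((r) => setTimeout(r, rampDelayMs));            // per-pod staggered ramp-up
  }

  // Keep every connection "active": one message per connection every 5 seconds.
  setInterval(() => {
    connections.forEach((c, i) =>
      c.invoke("SendToGroup", `group-${podIndex}`, `msg ${i}`).catch(() => { /* count send errors */ }));
  }, 5000);
}

main().catch(console.error);
```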

    Tooling

    1. Custom Load Testing Tools: While standard tools like JMeter may not suffice for simulating complex client logic, consider building a custom load testing tool using .NET or Node.js that can handle the SignalR negotiation and messaging logic. This tool can be designed to simulate active users effectively.
    2. Microsoft SDKs: Utilize the Azure SignalR Service SDK to manage connections and messaging efficiently. This SDK can help streamline the process of connecting clients and sending messages; a sketch of per-message latency measurement built on the client SDK follows below.
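
    As one example of the "smart" client logic a custom tool can layer on top of the client SDK, the sketch below timestamps each message it sends and records round-trip latency when its own echo comes back. The Echo hub method, its payload shape, and the reporting interval are assumptions for illustration:

```typescript
import * as signalR from "@microsoft/signalr";

// Sketch: per-connection round-trip latency measurement for one bot.
// Assumes a hub method that echoes the payload back to callers (names are placeholders).
export function instrument(connection: signalR.HubConnection, userId: string): void {
  const latencies: number[] = [];

  connection.on("Echo", (sender: string, sentAtMs: number) => {
    if (sender === userId) {
      latencies.push(Date.now() - sentAtMs);    // round-trip time, measured on one clock
    }
  });

  // Send a timestamped message every 5 seconds.
  setInterval(() => {
    connection.invoke("Echo", userId, Date.now()).catch(() => { /* count send failures */ });
  }, 5000);

  // Report a simple p95 every 30 seconds; a real tool would push this to
  // Application Insights or another metrics sink instead of logging it.
  setInterval(() => {
    if (latencies.length === 0) return;
    const sorted = [...latencies].sort((a, b) => a - b);
    const p95 = sorted[Math.floor(sorted.length * 0.95)];
    console.log(`user=${userId} samples=${latencies.length} p95=${p95}ms`);
  }, 30000);
}
```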

    Ramp-up Strategy

    1. Staggered Ramp-up: Implement a staggered ramp-up strategy for your load tests to avoid overwhelming the backend negotiation endpoint. Gradually increase the number of connections over time rather than starting all at once.
    2. Mocking Negotiation: It is common practice to mock the negotiation step during load tests to isolate the performance of the SignalR Service. This approach allows you to focus on the scalability of SignalR without the additional load on your backend API; a sketch of one way to do this appears after this list.
    3. Monitoring and Metrics: Continuously monitor the performance of both the SignalR Service and your backend during tests. Use Azure Monitor and Application Insights to gather metrics and identify bottlenecks.
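
    If you do mock negotiation, one approach (sketched below, not an official sample) is to mint client access tokens directly from the service's access key and point clients straight at the service's client endpoint, bypassing the backend entirely. The connection-string parsing and claim layout follow the documented Azure SignalR authentication scheme (an HS256-signed JWT whose audience is the client URL), but verify the details against the current docs before relying on this:

```typescript
import * as jwt from "jsonwebtoken";

// Sketch: generate a client access token directly against Azure SignalR Service so
// load-test clients can skip the backend /negotiate call entirely.
// Connection string format: "Endpoint=https://<name>.service.signalr.net;AccessKey=<key>;Version=1.0;"
export function mockNegotiate(connectionString: string, hub: string) {
  const endpointMatch = /Endpoint=([^;]+)/.exec(connectionString);
  const keyMatch = /AccessKey=([^;]+)/.exec(connectionString);
  if (!endpointMatch || !keyMatch) {
    throw new Error("Connection string must contain Endpoint and AccessKey");
  }
  const clientUrl = `${endpointMatch[1]}/client/?hub=${hub}`;

  // Audience must match the client URL; the token is signed with the access key (HS256).
  // Per-user identity claims (if needed) must follow the claim names the service expects.
  const accessToken = jwt.sign({}, keyMatch[1], {
    audience: clientUrl,
    expiresIn: "2h",
    algorithm: "HS256",
  });

  // Same { url, accessToken } shape a real negotiate response would return, so it can be
  // dropped into HubConnectionBuilder().withUrl(url, { accessTokenFactory: () => accessToken }).
  return { url: clientUrl, accessToken };
}
```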

    By following these strategies, you can effectively architect a robust load testing environment for your Azure SignalR Service application, ensuring it can handle the desired scale of concurrent users while minimizing potential bottlenecks.


