Load balancer for LLM models in Azure AI Foundry

Guy Aronson 20 Reputation points
2025-02-24T09:59:36.02+00:00

Is load balancing possible for multiple LLM deployments in Azure AI Foundry? My app's LLMs are overloaded despite quota increases.

A load balancer distributing traffic across the LLM deployments would help.

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. Prashanth Veeragoni 4,930 Reputation points Microsoft External Staff Moderator
    2025-02-25T08:19:53.45+00:00

    Hi Guy Aronson,

    Welcome to Microsoft Q&A forum. Thank you for posting your query.

    Yes, load balancing across multiple LLM deployments in Azure AI Foundry is possible, but Azure does not provide an out-of-the-box load balancer specifically for LLM models in AI Foundry. You can, however, implement a custom load-balancing approach using other Azure services. Here's how to handle this issue effectively.

    Possible Reasons for Overloading Despite Quota Increases:

    Uneven distribution of requests: if all requests hit a single deployment, it becomes overloaded while the others sit underutilized.

    Rate limits per instance: even with a quota increase, each individual deployment has its own request rate limits (see the snippet after this list for a quick way to confirm this).

    High latency: some deployments may respond slowly, causing request backlogs.

    Lack of request routing: if your app does not distribute requests effectively, it can overwhelm specific deployments.
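    To confirm that per-deployment throttling is the culprit, check for HTTP 429 responses. Below is a minimal sketch using the openai Python SDK (v1.x); the endpoint, key, and deployment name are placeholders, not values from your environment:

    ```python
    # Quick check for per-deployment throttling (HTTP 429) with the openai
    # Python SDK (v1.x). Endpoint, key, and deployment name are placeholders.
    from openai import AzureOpenAI, RateLimitError

    client = AzureOpenAI(
        azure_endpoint="https://my-foundry-resource.openai.azure.com",  # placeholder
        api_key="<your-api-key>",                                       # placeholder
        api_version="2024-06-01",
    )

    try:
        response = client.chat.completions.create(
            model="gpt-4o-deployment",  # your deployment name
            messages=[{"role": "user", "content": "ping"}],
        )
        print(response.choices[0].message.content)
    except RateLimitError as exc:
        # A 429 here means this single deployment is throttled even though
        # the overall quota was raised -- a signal to spread the load.
        print("Deployment throttled:", exc)
    ```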

    Implementing Load Balancing for LLMs in Azure AI Foundry:

    Since Azure AI Foundry does not natively support LLM load balancing, consider these three approaches:

    Approach 1: Use Azure Front Door (Recommended)

    Approach 2: Use Azure Application Gateway with API Management

    Approach 3: Implement a Custom Load Balancer in Code. Rotate requests across your deployments, monitor endpoint health, and retry against another endpoint if one is unresponsive (a minimal sketch follows this list).
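    As an illustration of Approach 3, here is a minimal round-robin sketch calling the Azure OpenAI chat completions REST API directly. The two endpoints, keys, and the deployment name are hypothetical placeholders:

    ```python
    # Minimal round-robin load balancer over two Azure OpenAI deployments.
    # Endpoints, keys, and the deployment name are hypothetical placeholders.
    import itertools
    import requests

    ENDPOINTS = [
        {"url": "https://eastus-resource.openai.azure.com", "key": "<key-1>"},
        {"url": "https://westus-resource.openai.azure.com", "key": "<key-2>"},
    ]
    DEPLOYMENT = "gpt-4o-deployment"
    API_VERSION = "2024-06-01"

    _rotation = itertools.cycle(range(len(ENDPOINTS)))

    def chat(messages):
        """Send a chat request, rotating endpoints; on a throttled (429),
        failed (5xx), or unreachable endpoint, retry against the next one."""
        for _ in range(len(ENDPOINTS)):
            ep = ENDPOINTS[next(_rotation)]
            url = (f"{ep['url']}/openai/deployments/{DEPLOYMENT}"
                   f"/chat/completions?api-version={API_VERSION}")
            try:
                resp = requests.post(
                    url,
                    headers={"api-key": ep["key"]},
                    json={"messages": messages},
                    timeout=30,
                )
                if resp.status_code == 200:
                    return resp.json()
                # 429 or 5xx: fall through and try the next endpoint
            except requests.RequestException:
                pass  # endpoint unresponsive -- try the next one
        raise RuntimeError("All deployments are throttled or unreachable")

    print(chat([{"role": "user", "content": "Hello"}]))
    ```

    Skipping straight to the next endpoint on a 429 keeps latency low; under sustained load you would also want exponential backoff once all endpoints are throttled.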

    For enterprise-scale deployments, the best approach is Azure Front Door + API Management. If you want a quick, cost-effective solution, try custom code-based load balancing.

    Since Approach 1 is the recommended option, let's walk through its implementation.

    Azure Front Door can act as a global load balancer and distribute requests across multiple LLM deployments.

     How to set up:

     Deploy multiple instances of the same LLM model in different regions.

     Configure Azure Front Door to route traffic between different endpoints.

     Use weighted or priority-based routing (illustrated in the sketch after this list):

     Weighted routing: distributes traffic across deployments in proportion to assigned weights.

     Priority-based routing: redirects traffic to a lower-priority deployment if a higher-priority one fails.
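    Front Door applies these policies for you at the edge, but to make the behavior concrete, here is a small Python sketch of how weighted and priority-based selection interact (backend names, weights, and priorities are made up for illustration):

    ```python
    # Conceptual illustration of Front Door's routing methods: traffic only
    # reaches the lowest (healthy) priority tier, and within that tier it is
    # split in proportion to weight. Names and numbers are made up.
    import random

    BACKENDS = [
        {"name": "eastus",  "weight": 70,  "priority": 1, "healthy": True},
        {"name": "westus",  "weight": 30,  "priority": 1, "healthy": True},
        {"name": "standby", "weight": 100, "priority": 2, "healthy": True},
    ]

    def pick_backend():
        healthy = [b for b in BACKENDS if b["healthy"]]
        if not healthy:
            raise RuntimeError("No healthy backends")
        top = min(b["priority"] for b in healthy)           # priority-based routing
        tier = [b for b in healthy if b["priority"] == top]
        return random.choices(                              # weighted routing
            tier, weights=[b["weight"] for b in tier]
        )[0]

    # ~70% of requests go to eastus and ~30% to westus; standby only
    # receives traffic if both priority-1 backends become unhealthy.
    print(pick_backend()["name"])
    ```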

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

    Thank you.

    1 person found this answer helpful.

1 additional answer

  1. Guy Aronson 20 Reputation points
    2025-02-26T07:07:46.39+00:00

    Hi @Prashanth Veeragoni,

    Yes, the answer was helpful.

    Many thanks!

