How to load balance in Azure AI Foundry Agent Service?

takeolexus 180 Reputation points
2025-06-27T08:02:06.7133333+00:00

I am creating a Blazor server app for a chatbot using the Azure AI Foundry Agent Service.

With the conventional AOAI (Azure OpenAI) API alone, I was able to load balance API requests using services like Azure API Management or YARP. However, with the new Agent Service, the Agent itself calls other tools (for example, BingGroundingSearch). The connection information for that tool needs to be determined at the time of the Agent's instantiation, which makes load balancing difficult.

How should I design the architecture?

Example code is below.

using Azure.AI.Agents.Persistent;
using Azure.Identity;

var endpoint = _config["AgentEndpoint"]!;
var bingConnectionId = _config["BingConnectionId"]!;
var credential = new AzureCliCredential();

// You need to set the region-specific connection ID here
BingGroundingSearchConfiguration bingSearchConfig = new(bingConnectionId);
BingGroundingToolDefinition bingGroundingTool = new(
    new BingGroundingSearchToolParameters(
        [
            bingSearchConfig
        ]
    )
);

PersistentAgentsClient agentClient = new(endpoint, credential);
PersistentAgent definition = agentClient.Administration.CreateAgent(
    model: "gpt-4.1-mini",
    name: "BingSearchPlugin",
    instructions: "Use the Bing grounding tool to answer questions.",
    tools: [bingGroundingTool]    // <--- Agent region and Bing region are paired
);

Azure AI services

Accepted answer
  1. Manas Mohanty 6,690 Reputation points Microsoft External Staff Moderator
    2025-07-03T05:01:56.28+00:00

    Hi takeolexus

    Sorry for the late response.

    At that time, the BingGroundingSearch connection ID to connect to is:

    /subscriptions/<subscription id>/resourceGroups/<rg name>/providers/Microsoft.CognitiveServices/accounts/<eastus2 region's AI agent name>/projects/<project name>/connections/bingopenai

    Even if eastus2 is load balanced and the request is routed to westus3, the BingGroundingSearch connection ID remains that of the original region, eastus2.

    Yes, the respective resources stay in their native region during load balancing; only the requests get routed to different endpoints through APIM (tested and enterprise ready) or a load balancer.
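    One way to preserve that pairing is to load balance at the point where the agent is created: pick the region first, then use that region's endpoint together with its same-region Bing connection ID. A minimal sketch, assuming hypothetical configuration keys for two regions (`EastUs2:*` and `WestUs3:*`); the random pick stands in for whatever routing strategy you actually use:

    ```csharp
    using Azure.AI.Agents.Persistent;
    using Azure.Identity;

    // Hypothetical per-region pairs: each Foundry project endpoint stays
    // together with the Bing connection ID that lives in the same region.
    var regions = new (string Endpoint, string BingConnectionId)[]
    {
        (_config["EastUs2:AgentEndpoint"]!, _config["EastUs2:BingConnectionId"]!),
        (_config["WestUs3:AgentEndpoint"]!, _config["WestUs3:BingConnectionId"]!),
    };

    // Pick a region per agent creation (round robin, health checks, etc.);
    // a random pick keeps the sketch short.
    var (endpoint, bingConnectionId) = regions[Random.Shared.Next(regions.Length)];

    BingGroundingToolDefinition bingTool = new(
        new BingGroundingSearchToolParameters(
            [new BingGroundingSearchConfiguration(bingConnectionId)]));

    PersistentAgentsClient agentClient = new(endpoint, new AzureCliCredential());
    ```

    Because both values come from the same region entry, the agent endpoint and the Bing connection can never be mismatched.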

    I had suggested the MCP protocol as a viable solution, since multiple MCP servers can be load balanced through APIM or a load balancer.

    Please go through the documentation below.

    https://techcommunity.microsoft.com/blog/integrationsonazureblog/azure-api-management-your-auth-gateway-for-mcp-servers/4402690

    The idea mentioned in the document above is that requests are routed from the AI agent or copilot over the MCP protocol to the APIM gateway, which routes them to multiple MCP servers connected to "Grounding with Bing Search" services; the processed information eventually gets back to the AI agents or copilots.
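    The APIM leg of that routing can be expressed as a policy. An illustrative fragment, assuming a load-balanced backend pool named `mcp-pool` has already been configured in APIM (the pool name, retry count, and status-code threshold are all hypothetical choices, not values from the document above):

    ```xml
    <policies>
        <inbound>
            <base />
            <!-- Send requests to a load-balanced pool of MCP server backends.
                 "mcp-pool" is a hypothetical backend-id configured in APIM. -->
            <set-backend-service backend-id="mcp-pool" />
        </inbound>
        <backend>
            <!-- On server errors, retry so another pool member can serve the call. -->
            <retry condition="@(context.Response.StatusCode >= 500)" count="2" interval="1">
                <forward-request buffer-request-body="true" />
            </retry>
        </backend>
        <outbound>
            <base />
        </outbound>
    </policies>
    ```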


    You can connect to APIM or the load balancer with a Logic App or an Azure Function.

    Reference - https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/overview

    Here is another possible solution.

    Multi-agent approach

    You can connect multiple agents to one central agent, and specify in the connected agents or the central agent that it should fail over to other agents.


    https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/connected-agents?pivots=portal

    https://techcommunity.microsoft.com/blog/azure-ai-services-blog/building-a-digital-workforce-with-multi-agents-in-azure-ai-foundry-agent-service/4414671
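    The failover idea above can be sketched with the connected-agents tool in the same SDK. The agent IDs, names, and instructions below are illustrative placeholders, and the sketch assumes the two regional agents already exist:

    ```csharp
    using Azure.AI.Agents.Persistent;
    using Azure.Identity;

    PersistentAgentsClient client = new(_config["AgentEndpoint"]!, new AzureCliCredential());

    // Wrap each pre-existing regional agent as a connected-agent tool.
    // The IDs here are placeholders for real agent IDs.
    ConnectedAgentToolDefinition eastAgent = new(
        new ConnectedAgentDetails(
            id: "asst_eastus2_placeholder",
            name: "bing_search_eastus2",
            description: "Answers questions with Bing grounding in eastus2."));
    ConnectedAgentToolDefinition westAgent = new(
        new ConnectedAgentDetails(
            id: "asst_westus3_placeholder",
            name: "bing_search_westus3",
            description: "Fallback Bing grounding agent in westus3."));

    // The central agent's instructions encode the failover preference.
    PersistentAgent central = client.Administration.CreateAgent(
        model: "gpt-4.1-mini",
        name: "central-router",
        instructions: "Use bing_search_eastus2 first; if it fails or times out, use bing_search_westus3.",
        tools: [eastAgent, westAgent]);
    ```

    Note that the failover here is best-effort, since it is driven by the central agent's instructions rather than by an infrastructure-level health check.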

    Routing to Multiple Grounding with Bing Search

    You can connect a Logic App and route the requests to another Grounding with Bing Search resource or agent in case of internal server errors or latency.

    Please let us know if you can accept this answer.

    Thank you.

    1 person found this answer helpful.

1 additional answer

  1. Obinna Ejidike 2,850 Reputation points
    2025-06-27T10:34:08.4666667+00:00

    Hi takeolexus

    Thanks for using the Q&A platform.

    Azure AI Foundry doesn't offer native LLM load balancing. Instead, you need to build it using Azure networking services.

    Microsoft suggests combining Azure Front Door and API Management, then routing traffic to multiple agent endpoints.

    You can find the community thread on this here: https://learn.microsoft.com/en-us/answers/questions/2181647/load-balancer-for-llm-models-in-azure-ai-foundry

    If the response was helpful, please feel free to mark it as “Accepted Answer” and consider giving it an upvote. This also benefits others in the community.

    Regards,

    Obinna.

