Model router for Azure AI Foundry is a deployable AI chat model that selects the best large language model (LLM) to respond to a given prompt in real time. By routing among a set of preexisting models, it delivers high performance while saving on compute costs, all behind a single model deployment. To learn more about how model router works, including its advantages and limitations, see the Model router concepts guide.
Use model router through the chat completions API just as you'd use a single base model such as GPT-4. Follow the same steps as in the Chat completions guide.
## Deploy a model router model
Model router is packaged as a single Azure AI Foundry model that you deploy. Follow the steps in the resource deployment guide. In the Create new deployment step, find model-router in the Models list. Select it, and then complete the rest of the deployment steps.
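If you'd rather script the deployment than use the portal, here's a minimal sketch using the azure-mgmt-cognitiveservices Python package. The subscription, resource group, account name, model version, and SKU below are placeholder assumptions; confirm the current values in the resource deployment guide or the portal.

```python
# A minimal sketch of deploying model router with the Azure SDK for Python.
# Assumes: pip install azure-identity azure-mgmt-cognitiveservices
# The subscription, resource group, account name, model version, and SKU
# are placeholders; check the portal for the values that apply to you.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment, DeploymentModel, DeploymentProperties, Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<foundry-resource-name>",
    deployment_name="model-router",
    deployment=Deployment(
        sku=Sku(name="GlobalStandard", capacity=1),  # placeholder SKU and capacity
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                name="model-router",
                version="<model-version>",  # use the latest available version
            ),
        ),
    ),
)
print(poller.result().name)
```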
Note
Your deployment settings apply to all underlying chat models that model router uses.
- Don't deploy the underlying chat models separately. Model router works independently of your other deployed models.
- Select a content filter when you deploy the model router model or apply a filter later. The content filter applies to all content passed to and from the model router; don't set content filters for each underlying chat model.
- Your tokens-per-minute rate limit setting applies to all activity to and from the model router; don't set rate limits for each underlying chat model.
## Use model router in chats
You can use model router through the chat completions API in the same way you'd use other OpenAI chat models: set the model parameter to the name of your model router deployment, and set the messages parameter to the messages you want to send to the model.
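Here's a minimal sketch using the openai Python package against an Azure endpoint; the endpoint, API version, and deployment name are placeholders for your own values.

```python
# Minimal chat completion call against a model router deployment.
# Assumes: pip install openai, and that AZURE_OPENAI_API_KEY is set.
# The endpoint, API version, and deployment name are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource-name>.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # placeholder; use a version your resource supports
)

response = client.chat.completions.create(
    model="<your-model-router-deployment-name>",  # the deployment, not a base model
    messages=[
        {"role": "user", "content": "How are you today?"},
    ],
)

print(response.choices[0].message.content)
print("Routed to:", response.model)  # the underlying model that answered
```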
In the Azure AI Foundry portal, go to your model router deployment on the Models + endpoints page and select it to open the model playground. In the playground, enter messages and see the model's responses. Each response shows which underlying model the router selected.
Important
You can set the Temperature and Top_P parameters to the values you prefer (see the concepts guide), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the Temperature and Top_P input parameters.
The stop, presence_penalty, frequency_penalty, logit_bias, and logprobs parameters are likewise dropped when a reasoning model is selected but applied otherwise.
Important
The reasoning_effort parameter (see the Reasoning models guide) isn't supported in model router. If the model router selects a reasoning model for your prompt, it also selects a reasoning_effort input value based on the complexity of the prompt.
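As an illustration, the following sketch passes sampling parameters that take effect only when the router picks a non-reasoning model. It reuses the client from the earlier example; the deployment name is again a placeholder.

```python
# Sampling parameters are honored only when model router selects a
# non-reasoning model; for o-series selections they're silently ignored.
# Reuses `client` from the earlier example.
response = client.chat.completions.create(
    model="<your-model-router-deployment-name>",  # placeholder
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
    temperature=0.7,  # ignored if a reasoning model is selected
    top_p=0.9,        # ignored if a reasoning model is selected
)

# Check which underlying model actually handled the request.
print(response.model)
```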
### Output format
The JSON response you receive from a model router model is identical to the standard chat completions API response. Note that the "model" field reveals which underlying model was selected to respond to the prompt.
```json
{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "detected": false,
          "filtered": false
        },
        "protected_material_text": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I'm doing well, thank you! How can I assist you today?",
        "refusal": null,
        "role": "assistant"
      }
    }
  ],
  "created": 1745308617,
  "id": "xxxx-yyyy-zzzz",
  "model": "gpt-4.1-nano-2025-04-14",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "prompt_index": 0
    }
  ],
  "system_fingerprint": "xxxx",
  "usage": {
    "completion_tokens": 15,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens": 21,
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    },
    "total_tokens": 36
  }
}
```
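Because the routing decision surfaces in the model field, you can log it per request to see how the router behaves across prompt types. A small sketch, reusing the client from the earlier example:

```python
# Log which underlying model handled each request; useful when comparing
# routing behavior across prompts. Reuses `client` from the earlier example.
prompts = ["What's 2 + 2?", "Draft a 500-word essay on tidal energy."]

for prompt in prompts:
    response = client.chat.completions.create(
        model="<your-model-router-deployment-name>",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"{response.model}: {response.choices[0].message.content[:60]}...")
```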
## Monitor model router metrics
### Monitor performance
Monitor the performance of your model router deployment with Azure Monitor in the Azure portal. For a programmatic alternative, see the sketch after the following steps.
1. Go to the Monitoring > Metrics page for your Azure OpenAI resource in the Azure portal.
1. Filter by the deployment name of your model router model.
1. Optionally, split the metrics by underlying model.
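Here's a hedged sketch of the programmatic route, using the azure-monitor-query package. The resource ID, metric name, and dimension name are assumptions; confirm them against what the Metrics page shows for your resource.

```python
# Query deployment metrics with the azure-monitor-query package.
# Assumes: pip install azure-identity azure-monitor-query
# The resource ID, metric name, and dimension name below are assumptions;
# confirm them on the portal's Metrics page for your resource.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())

resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<foundry-resource-name>"
)

result = client.query_resource(
    resource_id,
    metric_names=["AzureOpenAIRequests"],  # assumed metric name; verify in the portal
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=["Total"],
    # Assumed dimension name for splitting by deployment; verify in the portal.
    filter="ModelDeploymentName eq '<your-model-router-deployment-name>'",
)

for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.total)
```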
### Monitor costs
You can monitor the cost of model router, which is the sum of the costs incurred by the underlying models. A programmatic sketch follows the steps below.
1. Go to the Resource Management > Cost analysis page in the Azure portal.
1. If needed, filter by Azure resource.
1. Filter by deployment name: add a Tag filter, set the tag type to Deployment, and select your model router deployment name as the value.
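For a scripted equivalent, here's a heavily hedged sketch using the azure-mgmt-costmanagement package. The scope and the Deployment tag name mirror the portal steps above but are assumptions; adjust them to your subscription and tagging scheme.

```python
# Query cost grouped by the Deployment tag with azure-mgmt-costmanagement.
# Assumes: pip install azure-identity azure-mgmt-costmanagement
# The scope and tag name mirror the portal steps above and are assumptions;
# adjust them to your subscription and tagging scheme.
from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient
from azure.mgmt.costmanagement.models import (
    QueryAggregation, QueryComparisonExpression, QueryDataset,
    QueryDefinition, QueryFilter,
)

client = CostManagementClient(DefaultAzureCredential())

scope = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"

query = QueryDefinition(
    type="ActualCost",
    timeframe="MonthToDate",
    dataset=QueryDataset(
        granularity="Daily",
        aggregation={"totalCost": QueryAggregation(name="Cost", function="Sum")},
        filter=QueryFilter(
            tags=QueryComparisonExpression(
                name="Deployment",  # the tag type selected in the portal steps
                operator="In",
                values=["<your-model-router-deployment-name>"],
            )
        ),
    ),
)

result = client.query.usage(scope=scope, parameters=query)
for row in result.rows:
    print(row)
```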