You can decide and configure which models are available for inference in the model inference endpoint of your Azure AI Services resource. When a given model is configured, you can then generate predictions from it by indicating its model name or deployment name in your requests. No further changes are required in your code to use it.
In this article, you learn how to add a new model from Azure AI Foundry Models.
Prerequisites
To complete this article, you need:
- An Azure subscription. If you're using GitHub Models, you can upgrade your experience and create an Azure subscription in the process. If that's your case, read Upgrade from GitHub Models to Azure AI Foundry Models.
- An Azure AI services resource. For more information, see Create an Azure AI Foundry resource.
Add a model
Unlike GitHub Models, where all the models are already configured, the Azure AI Services resource lets you control which models are available in your endpoint and under which configuration.
You can add all the models you need in the endpoint by using Azure AI Foundry for GitHub. In the following example, we add a Mistral-Large model to the service:
Go to the Model catalog section in Azure AI Foundry for GitHub.
Scroll to the model you're interested in and select it.
You can review the details of the model in the model card.
Select Deploy.
For model providers that require additional contract terms, you're asked to accept those terms; this is the case for Mistral models, for instance. In those cases, accept the terms by selecting Subscribe and deploy.
You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the model parameter for requests to route to this particular model deployment. This also lets you configure specific names for your models when you attach specific configurations, for instance o1-preview-safe for a model with a strict content filter. For third-party models like Mistral, you can also configure the deployment to use a specific version of the model.
Tip
Each model can support different deployment types, providing different data residency or throughput guarantees. See deployment types for more details.
Use the Customize option if you need to change settings like the content filter.
Select Deploy.
Once the deployment completes, the new model is listed on the page and is ready to be used.
Use the model
Deployed models in Azure AI Foundry Models can be consumed using the Azure AI model inference endpoint for the resource.
To use it:
Get the Azure AI model inference endpoint URL and keys from the deployment page or the Overview page. If you're using Microsoft Entra ID authentication, you don't need a key.
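For example, you can create a client for this endpoint with the azure-ai-inference package for Python. This is a minimal sketch: the endpoint URL and key below are placeholders that you replace with the values from your resource.

```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# Placeholder values: copy the endpoint URL and key from your resource.
# With Microsoft Entra ID authentication, pass a token credential
# (for example, azure.identity.DefaultAzureCredential) instead of a key.
client = ChatCompletionsClient(
    endpoint="https://<your-resource-name>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<your-api-key>"),
)
```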
When constructing your request, indicate the model parameter and insert the model deployment name you created:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large",
)

print(response.choices[0].message.content)
```
When using the endpoint, you can change the model parameter to any available model deployment in your resource.
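For instance, the same request can be routed to a different deployment by changing only that value. The following sketch assumes another deployment exists in your resource, here named o1-preview-safe as in the earlier example.

```python
# Same client, same request shape: only the model value changes.
# "o1-preview-safe" is an example name; use one of your own deployments.
response = client.complete(
    messages=[UserMessage(content="Explain Riemann's conjecture in 1 paragraph")],
    model="o1-preview-safe",
)
```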
Additionally, Azure OpenAI models can be consumed using the Azure OpenAI in Azure AI Foundry Models endpoint in the resource. This endpoint is exclusive for each model deployment and has its own URL.
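As an illustration, the following sketch uses the openai Python package against that endpoint. The endpoint URL, key, API version, and deployment name are placeholders and assumptions, not values from this article.

```python
from openai import AzureOpenAI

# All values below are placeholders: use your resource's Azure OpenAI endpoint,
# your key (or an Entra ID token provider), and your own deployment name.
openai_client = AzureOpenAI(
    azure_endpoint="https://<your-resource-name>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-10-21",
)

completion = openai_client.chat.completions.create(
    model="<your-azure-openai-deployment-name>",
    messages=[{"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}],
)

print(completion.choices[0].message.content)
```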
Model deployment customization
When creating model deployments, you can configure additional settings, including content filtering and rate limits. Select the Customize option in the deployment wizard to configure them.
Note
Configurations may vary depending on the model you're deploying.