Add and configure models from Azure AI Foundry Models

You can choose and configure which models are available for inference on the Azure AI Services resource's model inference endpoint. Once a model is configured, you can generate predictions from it by indicating its model name or deployment name in your requests. No further changes are required in your code to use it.

In this article, you learn how to add a new model from Azure AI Foundry Models.

Prerequisites

To complete this article, you need:

Add a model

Unlike GitHub Models, where all the models are already configured, the Azure AI Services resource lets you control which models are available in your endpoint and under which configuration.

You can add all the models you need to the endpoint by using Azure AI Foundry for GitHub. In the following example, we add a Mistral-Large model to the service:

  1. Go to the Model catalog section in Azure AI Foundry for GitHub.

  2. Scroll to the model you're interested in and select it.

    An animation showing how to search models in the model catalog and select one for viewing its details.

  3. You can review the details of the model in the model card.

  4. Select Deploy.

  5. For model providers that require additional contract terms, you're asked to accept those terms. This is the case for Mistral models, for instance. In those cases, accept the terms by selecting Subscribe and deploy.

    Screenshot showing how to agree to the terms and conditions of a Mistral-Large model.

  6. You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the model parameter of requests to route to this particular model deployment. This also lets you configure specific names for your models when you attach specific configurations, for instance, o1-preview-safe for a model with a strict content filter. For third-party models like Mistral, you can also configure the deployment to use a specific version of the model.

Tip

Each model can support different deployments types, providing different data residency or throughput guarantees. See deployment types for more details.

  7. Use the Customize option if you need to change settings like content filtering.

    Screenshot showing how to customize the deployment if needed.

  8. Select Deploy.

  9. Once the deployment completes, the new model is listed on the page and it's ready to be used.
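The portal steps above can also be scripted with the Azure CLI. The following is a minimal sketch, not a definitive command for this scenario: the account name, resource group, deployment name, model version, model format, and SKU are all placeholder assumptions that you would replace with the values shown for your model in the catalog.

```shell
# Hypothetical values throughout: replace the account, resource group,
# deployment name, model version, format, and SKU with the ones that apply
# to your resource. Exact accepted values vary by model provider.
az cognitiveservices account deployment create \
    --name my-ai-services-resource \
    --resource-group my-resource-group \
    --deployment-name mistral-large \
    --model-name Mistral-Large \
    --model-version "2407" \
    --model-format "Mistral AI" \
    --sku-name "GlobalStandard" \
    --sku-capacity 1
```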

Use the model

Deployed models in Azure AI Foundry Models can be consumed using the Azure AI model inference endpoint for the resource.

To use it:

  1. Get the Azure AI model inference endpoint URL and keys from the deployment page or the Overview page. If you're using Microsoft Entra ID authentication, you don't need a key.

  2. When constructing your request, set the model parameter to the name of the model deployment you created.

    import os
    
    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import SystemMessage, UserMessage
    from azure.core.credentials import AzureKeyCredential
    
    # Endpoint URL and key from the deployment page or the Overview page
    client = ChatCompletionsClient(
        endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
    )
    
    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
        ],
        model="mistral-large"
    )
    
    print(response.choices[0].message.content)
    
    
  3. When using the endpoint, you can change the model parameter to any available model deployment in your resource.
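Because routing is driven entirely by the model parameter, switching deployments is a one-field change in the request body. The following sketch uses plain Python with no SDK, and the deployment names ("mistral-large" and "o1-preview-safe") are hypothetical examples, not values tied to your resource:

```python
def build_chat_request(deployment_name, user_prompt):
    """Build the JSON body of a chat completions request.

    Only the "model" field differs between deployments; the rest of the
    payload is unchanged, which is why no other code changes are needed
    to target a different model.
    """
    return {
        "model": deployment_name,  # the deployment name configured on the resource
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }

# Hypothetical deployment names: the same prompt routed to two deployments
a = build_chat_request("mistral-large", "Explain Riemann's conjecture in 1 paragraph")
b = build_chat_request("o1-preview-safe", "Explain Riemann's conjecture in 1 paragraph")
print(a["model"], b["model"])
```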

Additionally, Azure OpenAI models can be consumed using the Azure OpenAI in Azure AI Foundry Models endpoint in the resource. This endpoint is exclusive to each model deployment and has its own URL.
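In contrast with the shared model inference endpoint, the Azure OpenAI endpoint embeds the deployment name in the URL path rather than in the request body. A minimal sketch of that URL structure follows; the resource name, deployment name, and API version used below are illustrative assumptions:

```python
def azure_openai_url(resource, deployment, api_version):
    """Build the per-deployment Azure OpenAI chat completions URL.

    Each Azure OpenAI deployment gets its own URL, so the deployment
    name appears in the path instead of in the request body.
    """
    return (
        f"https://{resource}.openai.azure.com"
        f"/openai/deployments/{deployment}/chat/completions"
        f"?api-version={api_version}"
    )

# Hypothetical resource and deployment names
print(azure_openai_url("my-resource", "gpt-4o-mini", "2024-10-21"))
```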

Model deployment customization

When creating model deployments, you can configure additional settings, including content filtering and rate limits. Select the Customize option in the deployment wizard to configure them.

Note

Configurations may vary depending on the model you're deploying.
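When a deployment's rate limit is exceeded, requests typically fail with HTTP 429 and a Retry-After header. A generic backoff sketch, not specific to any Azure SDK, is shown below; the send_request callable is a hypothetical stand-in for whatever performs your HTTP call:

```python
import time

def call_with_retry(send_request, max_retries=3):
    """Retry a callable that returns (status_code, retry_after_seconds, body).

    send_request is a hypothetical stand-in for the function performing
    the HTTP call; on HTTP 429, wait for the advertised Retry-After
    interval before retrying, up to max_retries times.
    """
    for attempt in range(max_retries + 1):
        status, retry_after, body = send_request()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(retry_after)
    return status, body

# Simulated responses: two throttled replies, then success
responses = iter([(429, 0.01, None), (429, 0.01, None), (200, 0, "ok")])
print(call_with_retry(lambda: next(responses)))  # → (200, 'ok')
```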