Configure your AI project to use Azure AI model inference

Article
01/23/2025

If you already have an AI project in Azure AI Foundry, the model catalog deploys models from third-party model providers as stand-alone endpoints in your project by default. Each model deployment has its own set of URI and credentials to access it. On the other hand, Azure OpenAI models are deployed to Azure AI Services resource or to the Azure OpenAI Service resource.

You can change this behavior and deploy both types of models to Azure AI Services resources using Azure AI model inference. Once configured, deployments of Models as a Service models supporting pay-as-you-go billing happen to the connected Azure AI Services resource instead to the project itself, giving you a single set of endpoint and credential to access all the models deployed in Azure AI Foundry. You can manage Azure OpenAI and third-party model providers models in the same way.

Additionally, deploying models to Azure AI model inference brings the extra benefits of:

Routing capability.
Custom content filters.
Global capacity deployment type.
Key-less authentication with role-based access control.

In this article, you learn how to configure your project to use models deployed in Azure AI model inference in Azure AI services.

Prerequisites

To complete this tutorial, you need:

An Azure subscription. If you're using GitHub Models, you can upgrade your experience and create an Azure subscription in the process. Read Upgrade from GitHub Models to Azure AI model inference if it's your case.
An Azure AI services resource. For more information, see Create an Azure AI Services resource.
An Azure AI project and Azure AI Hub.

Tip

When your AI hub is provisioned, an Azure AI services resource is created with it and the two resources connected. To see which Azure AI services resource is connected to your project, go to the Azure AI Foundry portal > Management center > Connected resources, and find the connections of type AI Services.

Configure the project to use Azure AI model inference

To configure the project to use the Azure AI model inference capability in Azure AI Services, follow these steps:

Go to Azure AI Foundry portal.
At the top navigation bar, over the right corner, select the Preview features icon. A contextual blade shows up at the right of the screen.
Turn the feature Deploy models to Azure AI model inference service on.
Close the panel.
In the landing page of your project, identify the Azure AI Services resource connected to your project. Use the drop-down to change the resource you're connected if you need to.
If no resource is listed in the drop-down, your AI Hub doesn't have an Azure AI Services resource connected to it. Create a new connection by:
1. In the lower left corner of the screen, select Management center.
2. In the section Connections select New connection.
3. Select Azure AI services.
4. In the browser, look for an existing Azure AI Services resource in your subscription.
5. Select Add connection.
6. The new connection is added to your Hub.
7. Return to the project's landing page to continue and now select the new created connection. Refresh the page if it doesn't show up immediately.
Under Included capabilities, ensure you select Azure AI Inference. The Azure AI model inference endpoint URI is displayed along with the credentials to get access to it.

Tip

Each Azure AI services resource has a single Azure AI model inference endpoint which can be used to access any model deployment on it. The same endpoint serves multiple models depending on which ones are configured. Learn about how the endpoint works.
Take note of the endpoint URL and credentials.

Create the model deployment in Azure AI model inference

For each model you want to deploy under Azure AI model inference, follow these steps:

Go to Model catalog section in Azure AI Foundry portal.
Scroll to the model you're interested in and select it.
You can review the details of the model in the model card.
Select Deploy.
For models providers that require more terms of contract, you're asked to accept those terms. Accept the terms on those cases by selecting Subscribe and deploy.
You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the model parameter for request to route to this particular model deployment. It allows you to configure specific names for your models when you attach specific configurations. For instance, o1-preview-safe for a model with a strict content safety content filter.
We automatically select an Azure AI Services connection depending on your project because you turned on the feature Deploy models to Azure AI model inference service. Use the Customize option to change the connection based on your needs. If you're deploying under the Standard deployment type, the models need to be available in the region of the Azure AI Services resource.
Select Deploy.
Once the deployment finishes, you see the endpoint URL and credentials to get access to the model. Notice that now the provided URL and credentials are the same as displayed in the landing page of the project for the Azure AI model inference endpoint.
You can view all the models available under the resource by going to Models + endpoints section and locating the group for the connection to your AI Services resource:

Upgrade your code with the new endpoint

Once your Azure AI Services resource is configured, you can start consuming it from your code. You need the endpoint URL and key for it, which can be found in the Overview section:

You can use any of the supported SDKs to get predictions out from the endpoint. The following SDKs are officially supported:

OpenAI SDK
Azure OpenAI SDK
Azure AI Inference package
Azure AI Projects package

See the supported languages and SDKs section for more details and examples. The following example shows how to use the Azure AI Inference package with the newly deployed model:

Install the package azure-ai-inference using your package manager, like pip:

Bash

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

Python

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

Explore our samples and read the API reference documentation to get yourself started.

Install the package @azure-rest/ai-inference using npm:

Bash

npm install @azure-rest/ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

JavaScript

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    "https://<resource>.services.ai.azure.com/models", 
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);

Explore our samples and read the API reference documentation to get yourself started.

Install the Azure AI inference library with the following command:

.NET CLI

dotnet add package Azure.AI.Inference --prerelease

Import the following namespaces:

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

Explore our samples and read the API reference documentation to get yourself started.

Add the package to your project:

XML

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-inference</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

Java

ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Explore our samples and read the API reference documentation to get yourself started.

Use the reference section to explore the API design and which parameters are available. For example, the reference section for Chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions. Notice that the path /models is included to the root of the URL:

Request

HTTP/1.1

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

Generate your first chat completion:

Python

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large"
)

print(response.choices[0].message.content)

JavaScript

var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
];

var response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
        model: "mistral-large"
    }
});

console.log(response.choices[0].message.content)

requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph")
    },
    Model = "mistral-large"
};

response = client.Complete(requestOptions);
Console.WriteLine($"Response: {response.Value.Content}");

Java

List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));

ChatCompletions chatCompletions = client.complete(new ChatCompletionsOptions(chatMessages));

for (ChatChoice choice : chatCompletions.getChoices()) {
    ChatResponseMessage message = choice.getMessage();
    System.out.println("Response:" + message.getContent());
}

Request

HTTP/1.1

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

JSON

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "Explain Riemann's conjecture in 1 paragraph"
        }
    ],
    "model": "mistral-large"
}

Use the parameter model="<deployment-name> to route your request to this deployment. Deployments work as an alias of a given model under certain configurations. See Routing concept page to learn how Azure AI Services route deployments.

Move from Serverless API Endpoints to Azure AI model inference

Although you configured the project to use the Azure AI model inference, existing model deployments continue to exist within the project as Serverless API Endpoints. Those deployments aren't moved for you. Hence, you can progressively upgrade any existing code that reference previous model deployments. To start moving the model deployments, we recommend the following workflow:

Recreate the model deployment in Azure AI model inference. This model deployment is accessible under the Azure AI model inference endpoint.
Upgrade your code to use the new endpoint.
Clean up the project by removing the Serverless API Endpoint.

Upgrade your code with the new endpoint

Once the models are deployed under Azure AI Services, you can upgrade your code to use the Azure AI model inference endpoint. The main difference between how Serverless API endpoints and Azure AI model inference works reside in the endpoint URL and model parameter. While Serverless API Endpoints have a set of URI and key per each model deployment, Azure AI model inference has only one for all of them.

The following table summarizes the changes you have to introduce:

Property	Serverless API Endpoints	Azure AI Model Inference
Endpoint	`https://<endpoint-name>.<region>.inference.ai.azure.com`	`https://<ai-resource>.services.ai.azure.com/models`
Credentials	One per model/endpoint.	One per Azure AI Services resource. You can use Microsoft Entra ID too.
Model parameter	None.	Required. Use the name of the model deployment.

Clean-up existing Serverless API endpoints from your project

After you refactored your code, you might want to delete the existing Serverless API endpoints inside of the project (if any).

For each model deployed as Serverless API Endpoints, follow these steps:

Go to Azure AI Foundry portal.
Select Models + endpoints.
Identify the endpoints of type Serverless and select the one you want to delete.
Select the option Delete.

Warning

This operation can't be reverted. Ensure that the endpoint isn't currently used by any other user or piece of code.
Confirm the operation by selecting Delete.
If you created a Serverless API connection to this endpoint from other projects, such connections aren't removed and continue to point to the inexistent endpoint. Delete any of those connections for avoiding errors.

Limitations

Consider the following limitations when configuring your project to use Azure AI model inference:

Only models supporting pay-as-you-go billing (Models as a Service) are available for deployment to Azure AI model inference. Models requiring compute quota from your subscription (Managed Compute), including custom models, can only be deployed within a given project as Managed Online Endpoints and continue to be accessible using their own set of endpoint URI and credentials.
Models available as both pay-as-you-go billing and managed compute offerings are, by default, deployed to Azure AI model inference in Azure AI services resources. Azure AI Foundry portal doesn't offer a way to deploy them to Managed Online Endpoints. You have to turn off the feature mentioned at Configure the project to use Azure AI model inference or use the Azure CLI/Azure ML SDK/ARM templates to perform the deployment.

Next steps

Add more models to your endpoint.

Additional resources

Documentation

Add and configure models to Azure AI services - Azure AI Foundry

Learn how to add and configure new models to the Azure AI model's inference endpoint in Azure AI services.
What is Azure AI model inference? - Azure AI Foundry

Apply advanced language models to variety of use cases with Azure AI model inference.
Create and configure resources for Azure AI model inference - Azure AI Foundry

Learn how to get your environment ready with the Azure AI model inference
Models available in Azure AI model inference - Azure AI Foundry

Explore the models available via the Azure AI model inference and their capabilities.
Supported programming languages for models in Azure AI Model Inference - Azure AI Foundry

Learn about supported programming languages for models in Azure AI Model Inference
How to use reasoning models with Azure AI model inference - Azure AI Foundry

Learn how to use reasoning capabilities from models with Azure AI model inference

Training

Module

Deploy a model to a managed online endpoint - Training

Learn how to deploy models to a managed online endpoint for real-time inferencing.

Certification

Microsoft Certified: Azure AI Engineer Associate - Certifications

Design and implement an Azure AI solution using Azure AI services, Azure AI Search, and Azure Open AI.

Share via

Configure your AI project to use Azure AI model inference

Prerequisites

Configure the project to use Azure AI model inference

Create the model deployment in Azure AI model inference

Upgrade your code with the new endpoint

Move from Serverless API Endpoints to Azure AI model inference

Upgrade your code with the new endpoint

Clean-up existing Serverless API endpoints from your project

Limitations

Next steps

Feedback

Additional resources