Note
This document refers to the Microsoft Foundry (classic) portal.
🔍 View the Microsoft Foundry (new) documentation to learn about the new portal.
The model catalog in Microsoft Foundry is the hub to discover and use a wide range of Foundry Models for building generative AI applications. You need to deploy models to make them available for receiving inference requests. Foundry offers a comprehensive suite of deployment options for Foundry Models, depending on your needs and model requirements.
Deployment options
Foundry provides several deployment options depending on the type of models and resources you need to provision. The following deployment options are available:
- Standard deployment in Foundry resources
- Deployment to serverless API endpoints
- Deployment to managed computes
Foundry portal might automatically pick a deployment option based on your environment and configuration. Use Foundry resources for deployment whenever possible; models that support multiple deployment options default to them. To access the other deployment options, use the Azure CLI or the Azure Machine Learning SDK.
Standard deployment in Foundry resources
Standard deployment in Foundry resources (formerly known as Azure AI Services resources) is the preferred deployment option in Foundry. It offers the widest range of capabilities, including regional, data zone, or global processing, and it offers both standard and provisioned throughput (PTU) options. Flagship models in Foundry Models support this deployment option.
This deployment option is available in:
- Foundry resources
- Azure OpenAI resources¹
- Azure AI hub, when connected to a Foundry resource
¹ If you use Azure OpenAI resources, the model catalog shows only Azure OpenAI in Foundry Models for deployment. You can get the full list of Foundry Models by upgrading to a Foundry resource.
To get started with standard deployment in Foundry resources, see How-to: Deploy models to Foundry Models.
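As a minimal sketch of what this looks like from the command line, the following Azure CLI call creates a standard deployment on an existing Foundry resource. The resource group, resource name, deployment name, model version, and SKU are illustrative placeholders; substitute values available in your region:

```bash
# Create a standard deployment of a catalog model on an existing Foundry
# resource. All names, the model version, and the SKU are placeholders.
az cognitiveservices account deployment create \
  --resource-group my-rg \
  --name my-foundry-resource \
  --deployment-name my-gpt-4o-mini \
  --model-name gpt-4o-mini \
  --model-version "2024-07-18" \
  --model-format OpenAI \
  --sku-name GlobalStandard \
  --sku-capacity 1
```

The `--sku-name` value selects the processing option (for example, `Standard` for regional, `DataZoneStandard` for data zone, or `GlobalStandard` for global processing).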
Serverless API endpoint
This deployment option is available only in Azure AI hub resources. It allows you to create dedicated endpoints to host a model, accessible through an API. Foundry Models support serverless API endpoints with pay-as-you-go billing, and this option supports only regional deployments.
To get started with deployment to a serverless API endpoint, see Deploy models as serverless API deployments.
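For illustration, the sketch below uses the Azure ML CLI extension to create a serverless API endpoint in a hub-based project from a short YAML definition. The endpoint name, model ID, and resource names are assumed placeholders for a model that supports serverless deployment:

```bash
# Define the serverless endpoint; the endpoint name and model ID are
# illustrative placeholders.
cat > endpoint.yml <<'EOF'
name: my-serverless-endpoint
model_id: azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct
EOF

# Create the endpoint in the AI project that backs the Azure AI hub.
az ml serverless-endpoint create --file endpoint.yml \
  --resource-group my-rg \
  --workspace-name my-ai-project
```

After creation, `az ml serverless-endpoint get-credentials` returns the endpoint keys you use to send inference requests.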
Managed compute
This deployment option is available only in Azure AI hub resources. It allows you to create a dedicated endpoint that hosts the model on dedicated compute. You need compute quota in your subscription to host the model, and you're billed for compute uptime.
Managed compute deployment is required for model collections that include:
- Hugging Face
- NVIDIA inference microservices (NIMs)
- Industry models (Saifr, Rockwell, Bayer, Cerence, Sight Machine, Paige AI, SDAIA)
- Databricks
- Custom models
To get started, see How to deploy and inference a managed compute deployment and Deploy Foundry Models to managed compute with pay-as-you-go billing.
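As a rough sketch with the same CLI, a managed compute deployment takes two steps: create a managed online endpoint, then create a deployment that pins a model to dedicated compute. The endpoint name, model ID, and VM size below are placeholders, and your subscription needs quota for the chosen instance type:

```bash
# Step 1: create the managed online endpoint (name is a placeholder).
az ml online-endpoint create --name my-managed-endpoint \
  --resource-group my-rg \
  --workspace-name my-ai-project

# Step 2: define and create the deployment that hosts the model on a
# dedicated VM. The model ID and instance type are illustrative placeholders.
cat > deployment.yml <<'EOF'
name: default
endpoint_name: my-managed-endpoint
model: azureml://registries/azureml/models/Phi-3-mini-4k-instruct/labels/latest
instance_type: Standard_NC24ads_A100_v4
instance_count: 1
EOF

az ml online-deployment create --file deployment.yml \
  --resource-group my-rg \
  --workspace-name my-ai-project \
  --all-traffic
```

Because billing is based on compute uptime, delete the endpoint when you no longer need it to stop charges.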
Capabilities for the deployment options
Use standard deployment in Foundry resources whenever possible. It provides the most capabilities among the available deployment options. The following table lists details about specific capabilities for each deployment option:
| Capability | Standard deployment in Foundry resources | Serverless API Endpoint | Managed compute |
|---|---|---|---|
| Which models can be deployed? | Foundry Models | Foundry Models with pay-as-you-go billing | Open and custom models |
| Deployment resource | Foundry resource | AI project (in AI hub resource) | AI project (in AI hub resource) |
| Requires AI hub | No | Yes | Yes |
| Data processing options | Regional, data zone, and global | Regional | Regional |
| Private networking | Yes | Yes | Yes |
| Content filtering | Yes | Yes | No |
| Custom content filtering | Yes | No | No |
| Key-less authentication | Yes | No | No |
| Billing basis | Token usage & provisioned throughput units | Token usage² | Compute core hours³ |
² A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in serverless deployments. After you delete the endpoint, no further charges accrue.
³ Billing is per minute, based on the product tier and the number of instances used in the deployment, starting from the moment of creation. After you delete the endpoint, no further charges accrue.