Edit

Deployment overview for Microsoft Foundry Models

Microsoft Foundry Models is the hub for discovering and deploying a wide range of AI models for generative AI applications. To make a model available for inference requests, you deploy it. Foundry offers two deployment options depending on the model type and your infrastructure needs.

Deployment options

Foundry provides two deployment options:

  • Standard deployment in Foundry resources — For Foundry Models, including Foundry Models sold by Azure (also known as Azure Direct Models, or ADM) and select Models from partners and community. This option is the preferred and most capable deployment path.
  • Managed compute deployment — Available for all non-ADM models, including some models from partner and community, and custom models.

The Foundry portal automatically selects the appropriate deployment option based on the model you choose. Foundry Models deploy through Foundry resources. All other models deploy on managed compute.

Standard deployment in Foundry resources Managed compute
Models ADM models (Azure OpenAI + partner models billed through Azure) and select Models from partners and community Other models in the model catalog from partners and custom models. For example, models from Hugging Face, NVIDIA NIMs, industry models, and Databricks.
Billing Token usage or provisioned throughput units (PTU) Compute core hours (per-minute, per-instance)
Data processing Regional, data zone, or global Regional only
Content filtering Built-in and customizable Via Azure AI Content Safety APIs

Standard deployment in Foundry resources

Standard deployment in Foundry resources is the preferred deployment option in Foundry. It supports the widest range of capabilities and deployment types.

Which models use standard deployment?

All Foundry Models, including Foundry Models sold by Azure and select Models from partners and community use standard deployment. Foundry Models sold by Azure include all Azure OpenAI models and selected models from top providers that are billed through your Azure subscription, covered by Azure service-level agreements, and supported by Microsoft. Select Models from partners and community that use standard deployment include Anthropic models, and specific models from partners like Mistral, Cohere, and Meta.

Capabilities

Standard deployment supports:

  • Multiple deployment types — Global Standard, Data Zone Standard, Regional Standard, Provisioned, Batch, and more. Each type controls where data is processed and how you pay. For details, see Deployment types for Microsoft Foundry Models.
  • Data processing flexibility — Choose regional, data zone (US or EU), or global processing based on your compliance requirements.
  • Content filtering — Built-in Azure AI Content Safety filters with customizable configurations.
  • Keyless authentication — Microsoft Entra ID (recommended) and key-based authentication.
  • Private networking — Virtual network integration for secure access.
  • Provisioned throughput — Reserve capacity with PTUs for predictable, low-latency performance. For details, see Provisioned throughput.

Resource requirements

Standard deployment is available in:

  • Foundry resources — The primary resource type for new Foundry projects. No AI Hub required.
  • Azure OpenAI resources — If you use Azure OpenAI resources, the model catalog shows only Azure OpenAI models for deployment. Upgrade to a Foundry resource for access to the full set of Foundry Models.

To get started with deployment, see Deploy Microsoft Foundry Models in the Foundry portal or Deploy models using Azure CLI and Bicep.

Managed compute deployment

Managed compute deployment creates a dedicated endpoint that hosts the model on dedicated compute resources. This option is required for all non-ADM models.

Important

Managed compute deployment creates a dedicated endpoint that hosts the model on dedicated compute resources. This option is required for models that don't belong to the category of Foundry Models sold by Azure and select Models from partners and community, such as custom models and industry models.

Which models use managed compute?

Examples of model collections that require managed compute include:

  • Hugging Face
  • Some Meta models
  • Some Mistral models
  • NVIDIA inference microservices (NIMs)
  • Industry models (Saifr, Rockwell, Bayer, Cerence, Sight Machine, Page AI, SDAIA)
  • Databricks
  • Custom models

Capabilities

Managed compute supports:

  • Dedicated compute resources — Model weights are deployed to dedicated virtual machines. A managed compute endpoint can host one or more deployments and exposes a REST API for inference.
  • Private networking — Virtual network integration for secure access.
  • Key and Microsoft Entra authentication — Secure access to your deployed endpoint.
  • Content safety — Use the Azure AI Content Safety service APIs to screen model responses. Content safety is billed separately.

Billing and quota

Managed compute billing is based on compute core hours. You're billed per minute depending on the product tier and the number of instances in the deployment. After you delete the endpoint, no further charges accrue.

You need compute quota in your Azure subscription for the specific virtual machine products required to run the model. Some models allow deployment to a temporarily shared quota for testing.

Get started

Deployment option comparison

Use Standard deployment in Foundry resources whenever possible. The following table compares capabilities across the two deployment options:

Capability Standard deployment in Foundry resources Managed compute
Which models can be deployed? All Foundry Models, including Foundry Models sold by Azure and select Models from partners and community Custom models, industry models, and some partner models
Deployment resource Foundry resource AI project (hub-based, classic portal)
Requires AI Hub No Yes
Data processing options Regional, data zone, global Regional
Private networking Yes Yes
Content filtering Built-in and customizable Via Azure AI Content Safety APIs
Keyless authentication Yes (Microsoft Entra ID) Key-based and Microsoft Entra
Billing Token usage or provisioned throughput units Compute core hours

Tip

For detailed pricing information, see Plan and manage costs for Foundry Tools.