Microsoft Foundry Models is your one-stop destination for discovering, evaluating, and deploying powerful AI models. Whether you're building a custom copilot, creating an agent, enhancing an existing application, or exploring new AI capabilities, Foundry Models has you covered.
By using Foundry Models, you can:
- Explore a rich catalog of cutting-edge models from Microsoft, OpenAI, DeepSeek, Hugging Face, Meta, and more.
- Compare and evaluate models side-by-side by using real-world tasks and your own data.
- Deploy with confidence, thanks to built-in tools for fine-tuning, observability, and responsible AI.
- Choose your path - bring your own model, use a hosted one, or integrate seamlessly with Azure services.
Whether you're a developer, data scientist, or enterprise architect, Foundry Models gives you the flexibility and control to build AI solutions that scale - securely, responsibly, and fast.
Foundry offers a comprehensive catalog of more than 1,900 AI models, including foundation models, reasoning models, small language models, multimodal models, domain-specific models, industry models, and more.
The catalog is organized into two main categories: models sold directly by Azure, and models from partners and the community. Understanding the distinction between these categories helps you choose the right models for your requirements and strategic goals.
Models sold directly by Azure
Microsoft hosts and sells these models under Microsoft Product Terms. These models undergo rigorous evaluation and are deeply integrated into Azure's AI ecosystem. They come from various top providers and offer enhanced integration, optimized performance, and direct Microsoft support, including enterprise-grade service level agreements (SLAs).
Characteristics of these direct models:
- Official first-party support from Microsoft
- High level of integration with Azure services and infrastructure
- Extensive performance benchmarking and validation
- Adherence to Microsoft's Responsible AI standards
- Enterprise-grade scalability, reliability, and security
These models also have the benefit of fungible provisioned throughput, meaning you can flexibly use your quota and reservations across any of these models.
Models from partners and community
Most of the models in Foundry Models fall into this category. Trusted organizations, partners, research labs, and community contributors provide these models, which offer specialized and diverse AI capabilities covering a wide array of scenarios, industries, and innovations.
Characteristics of models from partners and community:
- Developed and supported by external partners and community contributors
- Diverse range of specialized models catering to niche or broad use cases
- Typically validated by providers themselves, with integration guidelines provided by Azure
- Community-driven innovation and rapid availability of cutting-edge models
- Standard Azure AI integration, with support and maintenance managed by the respective providers
You can deploy these models by using managed compute or standard (pay-as-you-go) deployment options. The model provider determines which deployment options are available for each model.
Choosing between direct models and partner and community models
When selecting models from Foundry Models, consider the following factors:
- Use case and requirements: Models sold directly by Azure are ideal for scenarios that require deep Azure integration, guaranteed support, and enterprise SLAs. Models from partners and community excel in specialized use cases and innovation-led scenarios.
- Support expectations: Models sold directly by Azure come with robust Microsoft-provided support and maintenance. Models from partners and community are supported by their providers, with varying levels of SLA and support structures.
- Innovation and specialization: Models from partners and community offer rapid access to specialized innovations and niche capabilities, often developed by leading research labs and emerging AI providers.
Model collections
The model catalog organizes models into different collections:
Azure OpenAI models exclusively available on Azure: Flagship Azure OpenAI models available through an integration with Azure OpenAI in Foundry Models. Microsoft supports these models and their use according to the product terms and SLA for Azure OpenAI in Foundry Models.
Open models from the Hugging Face hub: Hundreds of models from the Hugging Face hub for real-time inference with managed compute. Hugging Face creates and maintains models listed in this collection. For help, use the Hugging Face forum or Hugging Face support. Learn more in Deploy open models with Foundry.
To request adding a model to the model catalog, use this form.
Overview of model catalog capabilities
The model catalog in Foundry portal is the hub to discover and use a wide range of models for building generative AI applications. The model catalog features hundreds of models across model providers such as Azure OpenAI, Mistral, Meta, Cohere, NVIDIA, and Hugging Face, including models that Microsoft trained. Models from providers other than Microsoft are Non-Microsoft Products as defined in Microsoft Product Terms and are subject to the terms provided with the models.
You can search for and discover models that meet your needs through keyword search and filters. The model catalog also offers a model performance leaderboard and benchmark metrics for select models. You can access them by selecting Browse leaderboard and Compare Models. Benchmark data is also accessible from the model card's Benchmark tab.
On the model catalog filters, you find:
- Collection: filter models based on the model provider collection.
- Industry: filter for models that are trained on industry-specific datasets.
- Capabilities: filter for unique model features such as reasoning and tool calling.
- Deployment options: filter for models that support specific deployment options.
- Standard: pay per API call.
- Provisioned: best suited for real-time scoring of large, consistent volumes.
- Batch: best suited for cost-optimized batch jobs, not latency-sensitive workloads. No playground support is provided for batch deployments.
- Managed compute: deploy a model on an Azure virtual machine. You're billed for hosting and inferencing.
- Inference tasks: filter models based on the inference task type.
- Fine-tune tasks: filter models based on the fine-tuned task type.
- Licenses: filter models based on the license type.
On the model card, you find:
- Quick facts: key information about the model at a quick glance.
- Details: detailed information about the model, including the description, version info, supported data types, and more.
- Benchmarks: performance benchmark metrics for select models.
- Existing deployments: if you already deployed the model, you can find it on the Existing deployments tab.
- License: legal information related to model licensing.
- Artifacts: displayed for open models only. You can see the model assets and download them through the user interface.
Model deployment: Managed compute and serverless deployments
In addition to Azure OpenAI models, the model catalog offers two distinct ways to deploy models for your use: managed compute and serverless deployments.
The deployment options and features available for each model vary, as described in the following tables. Learn more about data processing with the deployment options.
Capabilities of model deployment options
| Features | Managed compute | Serverless deployments |
|---|---|---|
| Deployment experience and billing | Deploy model weights to dedicated virtual machines by using managed compute. A managed compute can have one or more deployments and makes available a REST API for inference. You're billed for the virtual machine core hours that the deployments use. | Access models through a deployment that provisions an API to access the model. The API provides access to the model that Microsoft hosts and manages, for inference. You're billed for inputs and outputs to the APIs, typically in tokens. Pricing information is provided before you deploy. |
| API authentication | Keys and Microsoft Entra authentication. | Keys and Microsoft Entra authentication. |
| Content safety | Use Azure AI Content Safety service APIs. | Azure AI Content Safety filters are available integrated with inference APIs. Azure AI Content Safety filters are billed separately. |
| Network isolation | Configure managed networks for Foundry hubs. | Serverless deployments follow your hub's public network access (PNA) flag setting. For more information, see the Network isolation for models deployed through standard deployments section later in this article. |
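Both deployment options in the table above accept keys and Microsoft Entra authentication. As a minimal sketch, the following stdlib-only Python builds an inference request with either scheme. The endpoint URL, header names, and request body follow the common OpenAI-compatible chat-completions convention; the URL and credential values are hypothetical placeholders, so substitute the values shown on your deployment's details page.

```python
import json
import urllib.request

# Hypothetical endpoint URL for illustration; copy the real URL from your
# deployment's details page in the Foundry portal.
ENDPOINT = "https://my-deployment.eastus2.models.ai.azure.com/chat/completions"

def build_request(api_key=None, entra_token=None):
    """Build a chat-completions request using key or Microsoft Entra auth."""
    if api_key is not None:
        # Key-based auth: the key travels in an api-key header.
        headers = {"Content-Type": "application/json", "api-key": api_key}
    elif entra_token is not None:
        # Entra auth: a bearer token, for example one obtained with
        # azure-identity's DefaultAzureCredential (token acquisition not shown).
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {entra_token}",
        }
    else:
        raise ValueError("Provide an API key or a Microsoft Entra token.")
    body = json.dumps({"messages": [{"role": "user", "content": "Hello"}]})
    return urllib.request.Request(
        ENDPOINT, data=body.encode("utf-8"), headers=headers, method="POST"
    )

req = build_request(api_key="<your-key>")
# To send the request against a live endpoint: urllib.request.urlopen(req)
```

The same pattern works for either option; only the header carrying the credential changes.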
Managed compute
The capability to deploy models by using managed compute builds on the platform capabilities of Azure Machine Learning to enable seamless integration of the model catalog's wide collection of models across the entire GenAIOps (sometimes called LLMOps) lifecycle.
Availability of models for deployment as managed compute
You can access these models through Azure Machine Learning registries. These registries enable an ML-first approach to hosting and distributing machine learning assets such as model weights, container runtimes for running the models, pipelines for evaluating and fine-tuning the models, and datasets for benchmarks and samples. The registries build on highly scalable and enterprise-ready infrastructure that:
- Delivers low-latency access to model artifacts in all Azure regions with built-in geo-replication.
- Supports enterprise security requirements such as limiting access to models with Azure Policy and secure deployment with managed virtual networks.
Deployment of models for inference with managed compute
Deploy models available for deployment with managed compute to Azure Machine Learning online endpoints for real-time inference, or use them for Azure Machine Learning batch inference to batch process your data. Deploying to managed compute requires virtual machine quota in your Azure subscription for the specific SKUs needed to optimally run the model. Some models allow you to deploy to temporarily shared quota for testing the model.
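Once a model is deployed to an online endpoint, it exposes a REST API for real-time inference. As a hedged sketch, the stdlib-only snippet below builds a scoring call: online endpoints authenticate with a bearer key, and the JSON body schema depends entirely on the deployed model. The scoring URI, key, and `input_data` payload shape here are hypothetical placeholders; copy the real values from your endpoint's Consume tab.

```python
import json
import urllib.request

# Hypothetical scoring URI and key for illustration; use the values from
# your own online endpoint.
SCORING_URI = "https://my-endpoint.eastus2.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

def build_scoring_request(payload: dict) -> urllib.request.Request:
    """Build a POST request against a managed online endpoint.

    The endpoint key goes in an Authorization: Bearer header; the body
    schema is model-specific.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    return urllib.request.Request(
        SCORING_URI,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

# Example payload shape (illustrative only; check your model's schema).
req = build_scoring_request({"input_data": {"input_string": ["Hello"]}})
# To send it against a live endpoint: urllib.request.urlopen(req)
```

Because you're billed for the virtual machine core hours the deployment uses, the endpoint stays available (and billable) whether or not requests are flowing.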
Building generative AI apps with managed compute
Prompt flow offers capabilities for prototyping, experimenting, iterating, and deploying your AI applications. You can use models deployed with managed compute in prompt flow with the LLM tool, which supports all models available through the Azure AI model inference API. You can also use the REST API exposed by managed computes in popular LLM tools like LangChain with the langchain-azure-ai package.
Content safety for models deployed as managed compute
Azure AI Content Safety is available for use with models deployed to managed compute to screen for various categories of harmful content such as sexual content, violence, hate, and self-harm, and advanced threats such as jailbreak risk detection and protected material detection. Use the Content Safety (Text) tool in prompt flow to pass model responses to Azure AI Content Safety for screening, or integrate directly using the Azure AI Content Safety APIs. You're billed separately as per Azure AI Content Safety pricing for such use.
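For the direct-integration path described above, a minimal sketch of the Azure AI Content Safety text-moderation call looks like the following. The `text:analyze` route and `Ocp-Apim-Subscription-Key` header follow the Content Safety REST API; the resource endpoint and key are hypothetical placeholders, and the `api-version` shown may not be the latest, so check the current API reference before use.

```python
import json
import urllib.request

# Hypothetical Content Safety resource endpoint; substitute your own.
CS_ENDPOINT = "https://my-content-safety.cognitiveservices.azure.com"

def build_analyze_request(text: str, key: str) -> urllib.request.Request:
    """Build a text-moderation request that screens a model response for
    harmful content (hate, self-harm, sexual, and violence categories)."""
    # api-version is illustrative; verify the current version before use.
    url = f"{CS_ENDPOINT}/contentsafety/text:analyze?api-version=2023-10-01"
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Pass the model's response text through the screen before showing it to users.
req = build_analyze_request("model response to screen", key="<your-key>")
```

The response (not shown) reports a severity score per category, which your application can use to block or redact the model output.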
Serverless deployments
You can deploy certain models in the model catalog as serverless deployments. Microsoft hosts the models in managed infrastructure, which enables API-based access to the model provider's model. API-based access can dramatically reduce the cost of accessing a model and significantly simplify the provisioning experience. Most serverless deployments use token-based pricing.
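Because most serverless deployments bill per token, the `usage` block of each response tells you what you're paying for. The snippet below reads input and output token counts from a representative chat-completions response; the response shape follows the common OpenAI-compatible convention, and the literal values are illustrative.

```python
# A representative chat-completions response from a serverless deployment
# (field names follow the OpenAI-compatible convention; values are
# illustrative).
response = {
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12},
}

def billable_tokens(resp: dict) -> tuple:
    """Return (input, output) token counts; token-priced serverless
    deployments are typically billed on these two numbers."""
    usage = resp.get("usage", {})
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)

prompt_toks, completion_toks = billable_tokens(response)
# prompt_toks and completion_toks can feed cost tracking or logging.
```

Logging these counts per request is a simple way to reconcile your bill against actual usage.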
How are non-Microsoft models made available as serverless deployments?
Model providers offer models for serverless deployment, but Microsoft hosts these models in Azure infrastructure and provides access through APIs. Model providers set the license terms and price for their models. The Azure Machine Learning service manages the hosting infrastructure, makes the inference APIs available, and acts as the data processor for prompts submitted and content output by models deployed as serverless deployments. For more information about data processing, see the data privacy article.
Note
Cloud Solution Provider (CSP) subscriptions can't purchase standard deployment models.
Billing
Foundry portal and Azure Machine Learning studio provide the discovery, subscription, and consumption experience for models deployed as serverless deployments. Users accept license terms for use of the models. Pricing information for consumption is provided during deployment.
You pay for models from non-Microsoft providers through Azure Marketplace, in accordance with the Microsoft Commercial Marketplace Terms of Use.
You pay for models from Microsoft via Azure meters as First Party Consumption Services. As described in the Product Terms, you purchase First Party Consumption Services by using Azure meters, but these services aren't subject to Azure service terms. Use of these models is subject to the provided license terms.
Fine-tuning models
For models available as serverless deployments that support fine-tuning, you can take advantage of hosted fine-tuning to tailor models using your own data. For more information, see fine-tune models in Foundry portal.
RAG with models deployed as standard deployments
Foundry enables you to use vector indexes and retrieval augmented generation (RAG). You can use models deployed as standard deployments to generate embeddings and run inference over your custom data, producing answers specific to your use case. For more information, see Retrieval augmented generation and indexes.
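The embedding step of a RAG pipeline can be sketched as follows with a standard deployment: document chunks go to an `/embeddings` route in a batch, and the returned vectors populate the vector index. The endpoint URL is a hypothetical placeholder, and the `{"input": [...]}` body follows the common OpenAI-compatible embeddings convention.

```python
import json
import urllib.request

# Hypothetical embeddings deployment URL; substitute your own standard
# deployment's endpoint.
EMBED_ENDPOINT = "https://my-embeddings.eastus2.models.ai.azure.com/embeddings"

def build_embeddings_request(chunks: list, api_key: str):
    """Build a request that embeds document chunks for a vector index.

    The response (not shown) contains one embedding vector per input chunk.
    """
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = json.dumps({"input": chunks}).encode("utf-8")
    return urllib.request.Request(
        EMBED_ENDPOINT, data=body, headers=headers, method="POST"
    )

req = build_embeddings_request(
    ["First document chunk.", "Second document chunk."], api_key="<key>"
)
# To send it against a live deployment: urllib.request.urlopen(req)
```

At query time, the same endpoint embeds the user's question so it can be matched against the index before the chat model generates a grounded answer.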
Regional availability of offers and models
Users can access serverless deployments only if their Azure subscription belongs to a billing account in a country or region where the model provider makes the offer available. If the offer is available in the relevant region, the user must have a Hub or Project in the Azure region where the model is available for deployment or fine-tuning, as applicable. For detailed information, see Region availability for models in standard deployments.
Content safety for models deployed through standard deployments
Important
This feature is currently in public preview. This preview version is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
For language models deployed via serverless API, Azure AI implements a default configuration of Azure AI Content Safety text moderation filters that detect harmful content such as hate, self-harm, sexual, and violent content. To learn more about content filtering, see Guardrails & controls for Models Sold Directly by Azure.
Tip
Content filtering is not available for certain model types that are deployed via serverless API. These model types include embedding models and time series models.
Content filtering occurs synchronously as the service processes prompts to generate content. You might be billed separately according to Azure AI Content Safety pricing for such use. You can disable content filtering for individual serverless endpoints either:
- At the time when you first deploy a language model
- Later, by selecting the content filtering toggle on the deployment details page
If you use an API other than the Model Inference API to work with a model that's deployed via a serverless API, content filtering isn't enabled unless you implement it separately by using Azure AI Content Safety.
To get started with Azure AI Content Safety, see Quickstart: Analyze text content. If you don't use content filtering when working with models that are deployed via serverless API, you run a higher risk of exposing users to harmful content.
Network isolation for models deployed through standard deployments
Endpoints for models deployed as standard deployments follow the public network access (PNA) flag setting of the workspace in which the deployment exists. To secure your serverless deployment endpoint, disable the PNA flag on your workspace. You can secure inbound communication from a client to your endpoint by using a private endpoint for the workspace.
To set the PNA flag for the workspace:
1. Go to the Azure portal.
2. Search for Azure Machine Learning, and select your workspace from the list of workspaces.
3. On the Overview page, use the left pane to go to Settings > Networking.
4. On the Public access tab, configure settings for the public network access flag.
5. Save your changes. It might take up to five minutes for your changes to propagate.
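The portal steps above can also be scripted. As a sketch, the stdlib-only snippet below builds the Azure Resource Manager PATCH that flips the workspace's `publicNetworkAccess` property. The ARM route for `Microsoft.MachineLearningServices/workspaces` and the property name are real, but the subscription/resource identifiers are placeholders and the `api-version` shown may not be the latest, so verify it before use.

```python
import json
import urllib.request

# Hypothetical identifiers; substitute your own subscription, resource
# group, and workspace names.
SUB, RG, WS = "<subscription-id>", "<resource-group>", "<workspace-name>"
URL = (
    "https://management.azure.com"
    f"/subscriptions/{SUB}/resourceGroups/{RG}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{WS}"
    "?api-version=2024-04-01"  # illustrative; check the current api-version
)

def build_pna_update(arm_token: str, enabled: bool) -> urllib.request.Request:
    """Build an ARM PATCH that sets the workspace PNA flag."""
    body = json.dumps({
        "properties": {
            "publicNetworkAccess": "Enabled" if enabled else "Disabled"
        }
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # An ARM bearer token, e.g. from `az account get-access-token`.
        "Authorization": f"Bearer {arm_token}",
    }
    return urllib.request.Request(URL, data=body, headers=headers, method="PATCH")

req = build_pna_update("<arm-token>", enabled=False)
# To apply it against a live workspace: urllib.request.urlopen(req)
```

As with the portal flow, allow up to five minutes for the change to propagate after the PATCH succeeds.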
Limitations
- If you have a workspace with a private endpoint created before July 11, 2024, new serverless deployment endpoints added to this workspace don't follow its networking configuration. Instead, you need to create a new private endpoint for the workspace and create new serverless deployments in the workspace so that the new deployments can follow the workspace's networking configuration.
- If you have a workspace with serverless deployments created before July 11, 2024, and you enable a private endpoint on this workspace, the existing serverless deployments don't follow the workspace's networking configuration. To bring serverless deployments in the workspace into networking compliance, you need to create the deployments again.
- Currently, On Your Data support isn't available for serverless deployments in private workspaces, because private workspaces have the PNA flag disabled.
- Any network configuration change (for example, enabling or disabling the PNA flag) might take up to five minutes to propagate.