Azure AI Foundry Models includes a comprehensive catalog of models organized into two categories: Models sold directly by Azure, and Models from partners and community. The models from partners and community, which are available for deployment on managed compute, are either open or protected models. In this article, you learn how to use protected models from partners and community, offered via Azure Marketplace for deployment on managed compute with pay-as-you-go billing.
Prerequisites
An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.
An Azure AI Foundry hub-based project. If you don't have one, create a hub-based project.
Azure Marketplace purchases enabled for your Azure subscription.
Azure role-based access control (Azure RBAC) is used to grant access to operations in the Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned a custom role with the following permissions. User accounts assigned the Owner or Contributor role for the Azure subscription can also create deployments. For more information on permissions, see Role-based access control in the Azure AI Foundry portal.
On the Azure subscription, to subscribe the workspace/project to the Azure Marketplace offering:
- Microsoft.MarketplaceOrdering/agreements/offers/plans/read
- Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
- Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
- Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
- Microsoft.SaaS/register/action
On the resource group, to create and use the SaaS resource:
- Microsoft.SaaS/resources/read
- Microsoft.SaaS/resources/write
On the workspace, to deploy endpoints:
- Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*
- Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*
Subscription scope and unit of measure for Azure Marketplace offer
Azure AI Foundry enables a seamless subscription and transaction experience for protected models as you create and consume your dedicated model deployments at scale. The deployment of protected models on managed compute involves pay-as-you-go billing for the customer in two dimensions:
- Per-hour Azure Machine Learning compute billing for the virtual machines employed in the deployment.
- Surcharge billing for the model as set by the model publisher on the Azure Marketplace offer.
Pay-as-you-go billing of Azure compute and the model surcharge is pro-rated per minute, based on the uptime of the managed online deployments. The surcharge for a model is a per-GPU-hour price, set by the partner (the model's publisher) on Azure Marketplace, that applies to all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.
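As a quick illustration of how the two billing dimensions combine, consider the following sketch; all prices are hypothetical and chosen only for the example:

```python
# Hypothetical example: hourly cost of a managed compute deployment with a
# pay-as-you-go model surcharge. All prices below are illustrative, not real rates.
vm_price_per_hour = 6.00        # Azure compute price for the selected VM SKU
gpus_per_vm = 2                 # number of GPUs in the VM SKU
surcharge_per_gpu_hour = 1.50   # set by the model publisher on Azure Marketplace
instance_count = 1
uptime_minutes = 90             # billing is pro-rated per minute of uptime

hourly_cost = instance_count * (vm_price_per_hour + gpus_per_vm * surcharge_per_gpu_hour)
total_cost = hourly_cost * uptime_minutes / 60
print(f"Hourly cost: ${hourly_cost:.2f}; cost for {uptime_minutes} min: ${total_cost:.2f}")
```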
A user's subscription to an Azure Marketplace offer is scoped to the project resource within Azure AI Foundry. If a subscription to the Azure Marketplace offer for a particular model already exists within the project, the deployment wizard informs the user that the subscription already exists for the project.
Note
For NVIDIA inference microservices (NIM), multiple models are associated with a single marketplace offer, so you only have to subscribe to the NIM offer once within a project to be able to deploy all NIMs offered by NVIDIA in the AI Foundry model catalog. If you want to deploy NIMs in a different project with no existing SaaS subscription, you need to resubscribe to the offer.
To find all the SaaS subscriptions that exist in an Azure subscription:
1. Sign in to the Azure portal.
2. Select Subscriptions, and then select your Azure subscription to open its overview page.
3. Select Settings > Resources to see the list of resources.
4. Use the Type filter to select the SaaS resource type.
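If you'd rather script this lookup, here's a minimal sketch using the azure-identity and azure-mgmt-resource packages; the subscription ID is a placeholder:

```python
# Sketch: list SaaS resources (Marketplace SaaS subscriptions) in an Azure
# subscription with the azure-mgmt-resource package. Requires azure-identity.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-azure-subscription-id>"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Filter the resource list down to the SaaS resource type.
for res in client.resources.list(filter="resourceType eq 'Microsoft.SaaS/resources'"):
    print(res.name, res.id)
```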
The consumption-based surcharge is accrued to the associated SaaS subscription and billed to a user via Azure Marketplace. You can view the invoice in the Overview tab of the respective SaaS subscription.
Subscribe and deploy on managed compute
Tip
Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.
1. Sign in to Azure AI Foundry.
2. If you're not already in your project, select it.
3. Select Model catalog from the left pane.
4. Select the Deployment options filter in the model catalog and choose Managed compute.
5. Filter the list further by selecting the Collection and model of your choice. In this article, we use Cohere Command A from the list of supported models for illustration.
6. From the model's page, select Use this model to open the deployment wizard.
7. Choose one of the supported VM SKUs for the model. You need Azure Machine Learning compute quota for that SKU in your Azure subscription.
8. Select Customize to specify your deployment configuration for parameters such as the instance count. You can also select an existing endpoint for the deployment or create a new one. For this example, we specify an instance count of 1 and create a new endpoint for the deployment.
9. Select Next to proceed to the pricing breakdown page.
10. Review the pricing breakdown for the deployment, the terms of use, and the license agreement associated with the model's offer on Azure Marketplace. The pricing breakdown shows the aggregated pricing for the deployed model, where the surcharge for the model is a function of the number of GPUs in the VM instance selected in the previous steps. In addition to the applicable surcharge for the model, Azure compute charges also apply, based on your deployment configuration. If you have existing reservations or an Azure savings plan, the invoice for the compute charges honors and reflects the discounted VM pricing.
11. Select the checkbox to acknowledge that you understand and agree to the terms of use. Then, select Deploy. Azure AI Foundry creates your subscription to the marketplace offer and then creates the deployment of the model on managed compute. It takes about 15 to 20 minutes for the deployment to complete.
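The portal wizard is the path this article documents. If you script your deployments instead, a rough sketch with the azure-ai-ml SDK might look like the following; the model asset ID, VM SKU, and names are placeholders, and this assumes the project is already subscribed to the model's Azure Marketplace offer:

```python
# Sketch: create a managed online endpoint and deployment with the azure-ai-ml
# SDK. The model asset ID, SKU, and names are placeholders; the portal wizard
# described above also handles the Azure Marketplace subscription for you.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

endpoint = ManagedOnlineEndpoint(name="my-protected-model-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name=endpoint.name,
    model="azureml://registries/<registry>/models/<model-name>/versions/<version>",
    instance_type="<supported-vm-sku>",  # you must have compute quota for this SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```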
Consume deployments
After your deployment is successfully created, you can follow these steps to consume it:
- Select Models + Endpoints under My assets in your Azure AI Foundry project.
- Select your deployment from the Model deployments tab.
- Go to the Test tab to run sample inference against the endpoint.
- Return to the Details tab to copy the deployment's "Target URI", which you can use to run inference with code, as shown in the sketch after this list.
- Go to the Consume tab of the deployment to find code samples for consumption.
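As an illustration, here's a minimal sketch that calls the endpoint over HTTP with the requests library. The target URI, key, and payload shape are placeholders; use the exact samples from the Consume tab for your model:

```python
# Sketch: call a managed online endpoint using its Target URI and key.
# The URI, key, and payload shape below are placeholders; check the Consume
# tab of your deployment for the exact samples for your model.
import requests

target_uri = "<target-uri-from-details-tab>"
api_key = "<endpoint-key>"

payload = {"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 128}
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

response = requests.post(target_uri, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json())
```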
Network isolation of deployments
Collections in the model catalog can be deployed within your isolated networks by using a workspace managed virtual network. For more information on how to configure your workspace managed networks, see Configure a managed virtual network to allow internet outbound.
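As a rough sketch of what that configuration can look like with the azure-ai-ml SDK, assuming the allow-internet-outbound isolation mode (names and workspace details are placeholders; follow the linked article for the supported steps):

```python
# Sketch: configure a managed virtual network (allow internet outbound) on the
# workspace with the azure-ai-ml SDK. Names are placeholders; see the linked
# article for the complete, supported configuration.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace, ManagedNetwork
from azure.ai.ml.constants import IsolationMode

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

ws = Workspace(name="<workspace-name>", location="<region>")
ws.managed_network = ManagedNetwork(isolation_mode=IsolationMode.ALLOW_INTERNET_OUTBOUND)
ml_client.workspaces.begin_create(ws).result()
```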
Limitation
An Azure AI Foundry project with ingress Public Network Access disabled can support only a single active deployment of one of the protected models from the catalog. Attempts to create more active deployments result in deployment creation failures.
Supported models
The following sections list the supported models for managed compute deployment with pay-as-you-go billing, grouped by collection.
Paige AI
| Model | Task |
| --- | --- |
| Virchow2G | Image Feature Extraction |
| Virchow2G-Mini | Image Feature Extraction |
Cohere
| Model | Task |
| --- | --- |
| Command A | Chat completion |
| Embed v4 | Embeddings |
| Rerank v3.5 | Text classification |
NVIDIA
NVIDIA inference microservices (NIM) are containers built by NVIDIA for optimized serving of pretrained and customized AI models on NVIDIA GPUs. NVIDIA NIMs available in the Azure AI Foundry model catalog can be deployed with a Standard subscription to the NVIDIA NIM SaaS offer on Azure Marketplace.
Some things to note about NIMs:
- NIMs include a 90-day trial. The trial applies to all NIMs associated with a particular SaaS subscription, and starts from the time the SaaS subscription is created.
- SaaS subscriptions are scoped to an Azure AI Foundry project. Because multiple models are associated with a single Azure Marketplace offer, you only need to subscribe once to the NIM offer within a project, and then you can deploy all the NIMs offered by NVIDIA in the AI Foundry model catalog. If you want to deploy NIMs in a different project with no existing SaaS subscription, you need to resubscribe to the offer.
| Model | Task |
| --- | --- |
| Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice | Chat completion |
| Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice | Chat completion |
| Deepseek-R1-Distill-Llama-8B-NIM-microservice | Chat completion |
| Llama-3.3-70B-Instruct-NIM-microservice | Chat completion |
| Llama-3.1-8B-Instruct-NIM-microservice | Chat completion |
| Mistral-7B-Instruct-v0.3-NIM-microservice | Chat completion |
| Mixtral-8x7B-Instruct-v0.1-NIM-microservice | Chat completion |
| Llama-3.2-NV-embedqa-1b-v2-NIM-microservice | Embeddings |
| Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice | Text classification |
| Openfold2-NIM-microservice | Protein Binder |
| ProteinMPNN-NIM-microservice | Protein Binder |
| MSA-search-NIM-microservice | Protein Binder |
| Rfdiffusion-NIM-microservice | Protein Binder |
Consume NVIDIA NIM deployments
After your deployment is successfully created, you can follow the steps in Consume deployments to consume it.
NVIDIA NIMs on Azure AI Foundry expose an OpenAI-compatible API. See the API reference to learn more about the supported payload. The model parameter for NIMs on Azure AI Foundry is set to a default value within the container, so you don't need to pass it in the request payload to your online endpoint. The Consume tab of the NIM deployment on Azure AI Foundry includes code samples for inference with the target URL of your deployment.
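For example, because the API is OpenAI compatible, a chat completions call can omit the model field entirely. The endpoint name, region, and key below are placeholders:

```python
# Sketch: call a NIM deployment's OpenAI-compatible chat completions route.
# Endpoint name, region, and key are placeholders. Per the note above, the
# "model" field can be omitted; the container uses its default model.
import requests

base_url = "https://<endpoint-name>.<region>.inference.ml.azure.com/v1"
api_key = "<endpoint-key>"

payload = {
    "messages": [{"role": "user", "content": "Summarize what a NIM is."}],
    "max_tokens": 200,
}
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

r = requests.post(f"{base_url}/chat/completions", json=payload, headers=headers, timeout=60)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```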
You can also consume NIM deployments using the Azure AI Foundry Models SDK, with limitations that include:
- No support for creating and authenticating clients using `load_client`.
- To retrieve model information, call the client method `get_model_info`.
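Here's a minimal sketch under those constraints: the client is constructed directly rather than via `load_client`, and `get_model_info` queries the endpoint. The endpoint URL and key are placeholders, and authentication details can vary by endpoint type:

```python
# Sketch: use the Azure AI Inference SDK against a NIM endpoint. Construct the
# client directly instead of using load_client, per the limitation above.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<endpoint-name>.<region>.inference.ml.azure.com/v1",
    credential=AzureKeyCredential("<endpoint-key>"),
)

# Retrieve model information from the endpoint, as recommended above.
info = client.get_model_info()
print(info.model_name)

response = client.complete(messages=[UserMessage(content="Hello!")])
print(response.choices[0].message.content)
```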
Develop and run agents with NIM endpoints
The following NVIDIA NIMs of the chat completion task type in the model catalog can be used to create and run agents with Agent Service and its supported tools, subject to two extra requirements (see the sketch after the table below):
- Create a serverless connection to the project using the NIM endpoint and key. The target URL for the NIM endpoint in the connection should be `https://<endpoint-name>.region.inference.ml.azure.com/v1/`.
- When creating and running agents, set the model parameter in the request body to the form `https://<endpoint>.region.inference.ml.azure.com/v1/@<parameter value per table below>`.
| NVIDIA NIM | model parameter value |
| --- | --- |
| Llama-3.3-70B-Instruct-NIM-microservice | meta/llama-3.3-70b-instruct |
| Llama-3.1-8B-Instruct-NIM-microservice | meta/llama-3.1-8b-instruct |
| Mistral-7B-Instruct-v0.3-NIM-microservice | mistralai/mistral-7b-instruct-v0.3 |
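For illustration, here's a rough sketch of creating such an agent with the azure-ai-projects SDK. The project endpoint, NIM endpoint, and agent name are placeholders, and the exact client constructor varies across azure-ai-projects versions (older releases use a connection string instead):

```python
# Sketch: create an agent backed by a NIM endpoint with the azure-ai-projects
# SDK. The project endpoint and the model value are placeholders; the model
# string follows the "<NIM target URL>@<parameter value>" form described above.
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project = AIProjectClient(
    endpoint="<your-project-endpoint>",
    credential=DefaultAzureCredential(),
)

# Model value per the table above, appended to the NIM target URL after "@".
model = "https://<endpoint-name>.<region>.inference.ml.azure.com/v1/@meta/llama-3.1-8b-instruct"

agent = project.agents.create_agent(
    model=model,
    name="nim-agent",
    instructions="You are a helpful assistant.",
)
print(agent.id)
```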
Security scanning
NVIDIA ensures the security and reliability of NVIDIA NIM container images through best-in-class vulnerability scanning, rigorous patch management, and transparent processes. To learn more about security scanning, see the security page. Microsoft works with NVIDIA to get the latest patches of the NIMs to deliver secure, stable, and reliable production-grade software within Azure AI Foundry.
You can see the last updated time for the NIM on the right pane of the model's overview page. Redeploy to consume the latest version of the NIM from NVIDIA on Azure AI Foundry.