Azure AI Foundry Models includes a comprehensive catalog of models organized into two categories: Models sold directly by Azure, and Models from partners and community. The models from partners and community, which are available for deployment on managed compute, are either open or protected models. In this article, you learn how to use protected models from partners and community, offered via Azure Marketplace for deployment on managed compute with pay-as-you-go billing.
Prerequisites
An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.
An Azure AI Foundry hub-based project. If you don't have one, create a hub-based project.
Azure Marketplace purchases enabled for your Azure subscription.
Azure role-based access control (Azure RBAC) is used to grant access to operations in the Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned a custom role with the following permissions. User accounts assigned the Owner or Contributor role for the Azure subscription can also create deployments. For more information on permissions, see Role-based access control in the Azure AI Foundry portal.
On the Azure subscription, to subscribe the workspace/project to the Azure Marketplace offering:
- Microsoft.MarketplaceOrdering/agreements/offers/plans/read
- Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
- Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
- Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
- Microsoft.SaaS/register/action
On the resource group, to create and use the SaaS resource:
- Microsoft.SaaS/resources/read
- Microsoft.SaaS/resources/write
On the workspace, to deploy endpoints:
- Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*
- Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*
Subscription scope and unit of measure for Azure Marketplace offer
Azure AI Foundry enables a seamless subscription and transaction experience for protected models as you create and consume your dedicated model deployments at scale. The deployment of protected models on managed compute involves pay-as-you-go billing for the customer in two dimensions:
- Per-hour Azure Machine Learning compute billing for the virtual machines employed in the deployment.
- Surcharge billing for the model as set by the model publisher on the Azure Marketplace offer.
Pay-as-you-go billing of Azure compute and the model surcharge is pro-rated per minute, based on the uptime of the managed online deployments. The surcharge for a model is a per-GPU-hour price, set by the partner (the model's publisher) on Azure Marketplace, that applies to all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.
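As a quick illustration of how the two billing dimensions combine, consider the following sketch; all prices are hypothetical and chosen only for the example:

```python
# Hypothetical example: hourly cost of a managed compute deployment with a
# pay-as-you-go model surcharge. All prices below are illustrative, not real rates.
vm_price_per_hour = 6.00        # Azure compute price for the selected VM SKU
gpus_per_vm = 2                 # number of GPUs in the VM SKU
surcharge_per_gpu_hour = 1.50   # set by the model publisher on Azure Marketplace
instance_count = 1
uptime_minutes = 90             # billing is pro-rated per minute of uptime

hourly_cost = instance_count * (vm_price_per_hour + gpus_per_vm * surcharge_per_gpu_hour)
total_cost = hourly_cost * uptime_minutes / 60
print(f"Hourly cost: ${hourly_cost:.2f}; cost for {uptime_minutes} min: ${total_cost:.2f}")
```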
A user's subscription to an Azure Marketplace offer is scoped to the project resource within Azure AI Foundry. If a subscription to the Azure Marketplace offer for a particular model already exists within the project, the deployment wizard informs the user that the subscription already exists for the project.
Note
For NVIDIA inference microservices (NIM), multiple models are associated with a single marketplace offer, so you only have to subscribe to the NIM offer once within a project to be able to deploy all NIMs offered by NVIDIA in the AI Foundry model catalog. If you want to deploy NIMs in a different project with no existing SaaS subscription, you need to resubscribe to the offer.
To find all the SaaS subscriptions that exist in an Azure subscription:
1. Sign in to the Azure portal.
2. Select Subscriptions, and then select your Azure subscription to open its overview page.
3. Select Settings > Resources to see the list of resources.
4. Use the Type filter to select the SaaS resource type.
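If you'd rather script this lookup, here's a minimal sketch using the azure-identity and azure-mgmt-resource packages; the subscription ID is a placeholder:

```python
# Sketch: list SaaS resources (Marketplace SaaS subscriptions) in an Azure
# subscription with the azure-mgmt-resource package. Requires azure-identity.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-azure-subscription-id>"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Filter the resource list down to the SaaS resource type.
for res in client.resources.list(filter="resourceType eq 'Microsoft.SaaS/resources'"):
    print(res.name, res.id)
```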
The consumption-based surcharge is accrued to the associated SaaS subscription and billed to a user via Azure Marketplace. You can view the invoice in the Overview tab of the respective SaaS subscription.
Subscribe and deploy on managed compute
Tip
Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.
1. Sign in to Azure AI Foundry.
2. If you're not already in your project, select it.
3. Select Model catalog from the left pane.
4. Select the Deployment options filter in the model catalog and choose Managed compute.
5. Filter the list further by selecting the Collection and model of your choice. In this article, we use Cohere Command A from the list of supported models for illustration.
6. From the model's page, select Use this model to open the deployment wizard.
7. Choose one of the supported VM SKUs for the model. You need Azure Machine Learning compute quota for that SKU in your Azure subscription.
8. Select Customize to specify your deployment configuration for parameters such as the instance count. You can also select an existing endpoint for the deployment or create a new one. For this example, we specify an instance count of 1 and create a new endpoint for the deployment.
9. Select Next to proceed to the pricing breakdown page.
10. Review the pricing breakdown for the deployment, the terms of use, and the license agreement associated with the model's offer on Azure Marketplace. The pricing breakdown shows the aggregated pricing for the deployed model, where the surcharge for the model is a function of the number of GPUs in the VM instance selected in the previous steps. In addition to the applicable surcharge for the model, Azure compute charges also apply, based on your deployment configuration. If you have existing reservations or an Azure savings plan, the invoice for the compute charges honors and reflects the discounted VM pricing.
11. Select the checkbox to acknowledge that you understand and agree to the terms of use. Then, select Deploy. Azure AI Foundry creates your subscription to the marketplace offer and then creates the deployment of the model on managed compute. It takes about 15 to 20 minutes for the deployment to complete.
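The portal wizard is the path this article documents. If you script your deployments instead, a rough sketch with the azure-ai-ml SDK might look like the following; the model asset ID, VM SKU, and names are placeholders, and this assumes the project is already subscribed to the model's Azure Marketplace offer:

```python
# Sketch: create a managed online endpoint and deployment with the azure-ai-ml
# SDK. The model asset ID, SKU, and names are placeholders; the portal wizard
# described above also handles the Azure Marketplace subscription for you.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

endpoint = ManagedOnlineEndpoint(name="my-protected-model-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name=endpoint.name,
    model="azureml://registries/<registry>/models/<model-name>/versions/<version>",
    instance_type="<supported-vm-sku>",  # you must have compute quota for this SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```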
Consume deployments
After your deployment is successfully created, you can follow these steps to consume it:
- Select Models + Endpoints under My assets in your Azure AI Foundry project.
- Select your deployment from the Model deployments tab.
- Go to the Test tab to run sample inference against the endpoint.
- Return to the Details tab to copy the deployment's "Target URI", which you can use to run inference with code, as shown in the sketch after this list.
- Go to the Consume tab of the deployment to find code samples for consumption.
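As an illustration, here's a minimal sketch that calls the endpoint over HTTP with the requests library. The target URI, key, and payload shape are placeholders; use the exact samples from the Consume tab for your model:

```python
# Sketch: call a managed online endpoint using its Target URI and key.
# The URI, key, and payload shape below are placeholders; check the Consume
# tab of your deployment for the exact samples for your model.
import requests

target_uri = "<target-uri-from-details-tab>"
api_key = "<endpoint-key>"

payload = {"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 128}
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

response = requests.post(target_uri, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json())
```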
Network isolation of deployments
Collections in the model catalog can be deployed within your isolated networks by using a workspace managed virtual network. For more information on how to configure your workspace managed networks, see Configure a managed virtual network to allow internet outbound.
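As a rough sketch of what that configuration can look like with the azure-ai-ml SDK, assuming the allow-internet-outbound isolation mode (names and workspace details are placeholders; follow the linked article for the supported steps):

```python
# Sketch: configure a managed virtual network (allow internet outbound) on the
# workspace with the azure-ai-ml SDK. Names are placeholders; see the linked
# article for the complete, supported configuration.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace, ManagedNetwork
from azure.ai.ml.constants import IsolationMode

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

ws = Workspace(name="<workspace-name>", location="<region>")
ws.managed_network = ManagedNetwork(isolation_mode=IsolationMode.ALLOW_INTERNET_OUTBOUND)
ml_client.workspaces.begin_create(ws).result()
```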
Limitation
An Azure AI Foundry project with ingress Public Network Access disabled can support only a single active deployment of one of the protected models from the catalog. Attempts to create more active deployments result in deployment creation failures.
Supported models
The following sections list the supported models for managed compute deployment with pay-as-you-go billing, grouped by collection.
Paige AI
| Model | Task |
| --- | --- |
| Virchow2G | Image Feature Extraction |
| Virchow2G-Mini | Image Feature Extraction |
Cohere
| Model | Task |
| --- | --- |
| Command A | Chat completion |
| Embed v4 | Embeddings |
| Rerank v3.5 | Text classification |
NVIDIA
NVIDIA inference microservices (NIM) are containers built by NVIDIA for optimized serving of pretrained and customized AI models on NVIDIA GPUs. NVIDIA NIMs available in the Azure AI Foundry model catalog can be deployed with a Standard subscription to the NVIDIA NIM SaaS offer on Azure Marketplace.
Some things to note about NIMs:
- NIMs include a 90-day trial. The trial applies to all NIMs associated with a particular SaaS subscription, and starts from the time the SaaS subscription is created.
- SaaS subscriptions are scoped to an Azure AI Foundry project. Because multiple models are associated with a single Azure Marketplace offer, you only need to subscribe once to the NIM offer within a project, and then you can deploy all the NIMs offered by NVIDIA in the AI Foundry model catalog. If you want to deploy NIMs in a different project with no existing SaaS subscription, you need to resubscribe to the offer.
| Model | Task |
| --- | --- |
| Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice | Chat completion |
| Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice | Chat completion |
| Deepseek-R1-Distill-Llama-8B-NIM-microservice | Chat completion |
| Llama-3.3-70B-Instruct-NIM-microservice | Chat completion |
| Llama-3.1-8B-Instruct-NIM-microservice | Chat completion |
| Mistral-7B-Instruct-v0.3-NIM-microservice | Chat completion |
| Mixtral-8x7B-Instruct-v0.1-NIM-microservice | Chat completion |
| Llama-3.2-NV-embedqa-1b-v2-NIM-microservice | Embeddings |
| Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice | Text classification |
| Openfold2-NIM-microservice | Protein Binder |
| ProteinMPNN-NIM-microservice | Protein Binder |
| MSA-search-NIM-microservice | Protein Binder |
| Rfdiffusion-NIM-microservice | Protein Binder |
Consume NVIDIA NIM deployments
After your deployment is successfully created, you can follow the steps in Consume deployments to consume it.
NVIDIA NIMs on Azure AI Foundry expose an OpenAI-compatible API. See the API reference to learn more about the supported payload. The model parameter for NIMs on Azure AI Foundry is set to a default value within the container, so you don't need to pass it in the request payload to your online endpoint. The Consume tab of the NIM deployment on Azure AI Foundry includes code samples for inference with the target URL of your deployment.
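For example, because the API is OpenAI compatible, a chat completions call can omit the model field entirely. The endpoint name, region, and key below are placeholders:

```python
# Sketch: call a NIM deployment's OpenAI-compatible chat completions route.
# Endpoint name, region, and key are placeholders. Per the note above, the
# "model" field can be omitted; the container uses its default model.
import requests

base_url = "https://<endpoint-name>.<region>.inference.ml.azure.com/v1"
api_key = "<endpoint-key>"

payload = {
    "messages": [{"role": "user", "content": "Summarize what a NIM is."}],
    "max_tokens": 200,
}
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

r = requests.post(f"{base_url}/chat/completions", json=payload, headers=headers, timeout=60)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```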
You can also consume NIM deployments using the Azure AI Foundry Models SDK, with limitations that include:
- No support for creating and authenticating clients using `load_client`.
- To retrieve model information, call the client method `get_model_info`.
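Here's a minimal sketch under those constraints: the client is constructed directly rather than via `load_client`, and `get_model_info` queries the endpoint. The endpoint URL and key are placeholders, and authentication details can vary by endpoint type:

```python
# Sketch: use the Azure AI Inference SDK against a NIM endpoint. Construct the
# client directly instead of using load_client, per the limitation above.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<endpoint-name>.<region>.inference.ml.azure.com/v1",
    credential=AzureKeyCredential("<endpoint-key>"),
)

# Retrieve model information from the endpoint, as recommended above.
info = client.get_model_info()
print(info.model_name)

response = client.complete(messages=[UserMessage(content="Hello!")])
print(response.choices[0].message.content)
```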
Develop and run agents with NIM endpoints
The following NVIDIA NIMs of the chat completion task type in the model catalog can be used to create and run agents with Agent Service and its supported tools, subject to two extra requirements (see the sketch after the table below):
- Create a serverless connection to the project using the NIM endpoint and key. The target URL for the NIM endpoint in the connection should be `https://<endpoint-name>.region.inference.ml.azure.com/v1/`.
- When creating and running agents, set the model parameter in the request body to the form `https://<endpoint>.region.inference.ml.azure.com/v1/@<parameter value per table below>`.
| NVIDIA NIM | model parameter value |
| --- | --- |
| Llama-3.3-70B-Instruct-NIM-microservice | meta/llama-3.3-70b-instruct |
| Llama-3.1-8B-Instruct-NIM-microservice | meta/llama-3.1-8b-instruct |
| Mistral-7B-Instruct-v0.3-NIM-microservice | mistralai/mistral-7b-instruct-v0.3 |
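For illustration, here's a rough sketch of creating such an agent with the azure-ai-projects SDK. The project endpoint, NIM endpoint, and agent name are placeholders, and the exact client constructor varies across azure-ai-projects versions (older releases use a connection string instead):

```python
# Sketch: create an agent backed by a NIM endpoint with the azure-ai-projects
# SDK. The project endpoint and the model value are placeholders; the model
# string follows the "<NIM target URL>@<parameter value>" form described above.
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project = AIProjectClient(
    endpoint="<your-project-endpoint>",
    credential=DefaultAzureCredential(),
)

# Model value per the table above, appended to the NIM target URL after "@".
model = "https://<endpoint-name>.<region>.inference.ml.azure.com/v1/@meta/llama-3.1-8b-instruct"

agent = project.agents.create_agent(
    model=model,
    name="nim-agent",
    instructions="You are a helpful assistant.",
)
print(agent.id)
```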
Security scanning
NVIDIA ensures the security and reliability of NVIDIA NIM container images through best-in-class vulnerability scanning, rigorous patch management, and transparent processes. To learn more about security scanning, see the security page. Microsoft works with NVIDIA to get the latest patches of the NIMs to deliver secure, stable, and reliable production-grade software within Azure AI Foundry.
You can see the last updated time for the NIM on the right pane of the model's overview page. Redeploy to consume the latest version of the NIM from NVIDIA on Azure AI Foundry.