Deploy Microsoft Foundry Models in the Foundry portal

Note

This article applies to both the Microsoft Foundry (classic) portal and the Microsoft Foundry (new) portal. Steps that differ between the two experiences are called out below.

In this article, you learn how to use the Foundry portal to deploy a Foundry Model in a Foundry resource for inferencing tasks. Foundry Models include models such as Azure OpenAI models, Meta Llama models, and more. After you deploy a Foundry Model, you can interact with it by using the Foundry Playground and perform inferencing against it with code.

This article uses a Foundry Model from partners and community, Llama-3.2-90B-Vision-Instruct, for illustration. Models from partners and community require that you subscribe to Azure Marketplace before deployment. In contrast, Foundry Models sold directly by Azure, such as Azure OpenAI in Foundry Models, don't have this requirement. For more information about Foundry Models, including the regions where they're available for deployment, see Foundry Models sold directly by Azure and Foundry Models from partners and community.

Prerequisites

To complete this article, you need:

  • An Azure subscription. If you don't have one, create a free Azure account before you begin.
  • A Foundry project with an associated Foundry resource.

Deploy a model

Deploy a model by following these steps in Foundry (classic):

  1. Sign in to Microsoft Foundry. Make sure the New Foundry toggle is off.

  2. Go to the Model catalog section in the Foundry portal.

  3. Select a model and review its details in the model card. This article uses Llama-3.2-90B-Vision-Instruct for illustration.

  4. Select Use this model.

  5. For Foundry Models from partners and community, you need to subscribe to Azure Marketplace. This requirement applies to Llama-3.2-90B-Vision-Instruct, for example. Read the terms of use and select Agree and Proceed to accept the terms.

    Note

    For Foundry Models sold directly by Azure, such as the Azure OpenAI model gpt-4o-mini, you don't subscribe to Azure Marketplace.

  6. Configure the deployment settings. By default, the deployment receives the name of the model you're deploying, but you can modify the name as needed before deploying the model. Later, during inferencing, the deployment name is used in the model parameter to route requests to this particular model deployment. This convention lets you configure specific names for your model deployments; the code sketch after these steps shows how the name is used.

    Tip

    Each model supports different deployment types, providing different data residency or throughput guarantees. See deployment types for more details. In this example, the model supports the Global Standard deployment type.

  7. The Foundry portal automatically selects the Foundry resource associated with your project as the Connected AI resource. Select Customize to change the connection if needed. If you're deploying under the Serverless API deployment type, the project and resource must be in one of the supported regions of deployment for the model.

    Screenshot showing how to customize the deployment if needed.

  8. Select Deploy. The model's deployment details page opens up while the deployment is being created.

  9. When the deployment completes, the model is ready for use. You can also use the Foundry Playgrounds to interactively test the model.
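After the deployment exists, the deployment name is what you pass as the model parameter when calling the endpoint. The following is a minimal, illustrative sketch using the azure-ai-inference Python package; the endpoint URL, API key, and deployment name shown are placeholders to replace with your own values:

```python
# Minimal sketch: call a Foundry model deployment by its deployment name.
# Assumes `pip install azure-ai-inference`; the endpoint, key, and deployment
# name below are placeholders to replace with your own values.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),  # placeholder
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Describe this model family in one sentence."),
    ],
    # `model` is the deployment name you chose in step 6; it routes the
    # request to that specific deployment.
    model="Llama-3.2-90B-Vision-Instruct",
)
print(response.choices[0].message.content)
```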

Deploy a model by following these steps in Foundry (new):

  1. Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on.

  2. From the Foundry portal homepage, select Discover in the upper-right navigation, then Models in the left pane.

  3. Select a model and review its details in the model card. This article uses Llama-3.2-90B-Vision-Instruct for illustration.

  4. Select Deploy > Custom settings to customize your deployment. Alternatively, you can use the default deployment settings by selecting Deploy > Default settings.

  5. For Foundry Models from partners and community, you need to subscribe to Azure Marketplace. This requirement applies to Llama-3.2-90B-Vision-Instruct, for example. Read the terms of use and select Agree and Proceed to accept the terms.

    Note

    For Foundry Models sold directly by Azure, such as the Azure OpenAI model gpt-4o-mini, you don't subscribe to Azure Marketplace.

  6. Configure the deployment settings. By default, the deployment receives the name of the model you're deploying, but you can modify the name as needed before deploying the model. Later, during inferencing, the deployment name is used in the model parameter to route requests to this particular model deployment. This convention lets you configure specific names for your model deployments.

    Tip

    Each model supports different deployment types, providing different data residency or throughput guarantees. See deployment types for more details. In this example, the model supports the Global Standard deployment type.

  7. The Foundry portal automatically deploys your model in the Foundry resource associated with your project. Your project and resource must be in one of the supported regions of deployment for the model.

  8. Select Deploy. When the deployment completes, you land on the Foundry Playgrounds, where you can interactively test the model.

Manage models

You can manage the existing model deployments in the resource by using the Foundry portal.

In Foundry (classic):

  1. Go to the Models + Endpoints section in Foundry portal.

  2. The portal groups and displays model deployments per resource. Select the Llama-3.2-90B-Vision-Instruct model deployment from the section for your Foundry resource. This action opens the model's deployment page.

    Screenshot showing the list of models available under a given connection.

In Foundry (new):

  1. Select Build in the upper-right navigation.

  2. Select Models in the left pane to see the list of deployments in the resource.
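If you prefer to enumerate deployments programmatically rather than in the portal, one option is the Azure management SDK: a Foundry resource is a Microsoft.CognitiveServices account under the hood. The following sketch assumes the azure-mgmt-cognitiveservices and azure-identity packages, with placeholder subscription, resource group, and resource names:

```python
# Sketch: list model deployments on a Foundry resource via the Azure
# management SDK. Assumes `pip install azure-mgmt-cognitiveservices
# azure-identity`; subscription, group, and account names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

for deployment in client.deployments.list(
    resource_group_name="<resource-group>",  # placeholder
    account_name="<foundry-resource-name>",  # placeholder
):
    model = deployment.properties.model
    print(deployment.name, "->", model.name, model.version)
```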

Test the deployment in the playground

You can interact with the new model in the Foundry portal by using the playground. The playground is a web-based interface that lets you interact with the model in real time. Use the playground to test the model with different prompts and see its responses.

In Foundry (classic):

  1. From the model's deployment page, select Open in playground. This action opens the chat playground with the name of your deployment already selected.

    Screenshot showing how to select a model deployment to use in playground.

  2. Type your prompt and see the outputs.

  3. Use View code to see details about how to access the model deployment programmatically.

In Foundry (new):

  1. From the list of deployments, select the Llama-3.2-90B-Vision-Instruct deployment to open up the playground page.

  2. Type your prompt and see the outputs.

  3. Select the Code tab to see details about how to access the model deployment programmatically.

Inference the model with code

To perform inferencing on the deployed model with code, see the code samples for your preferred language and SDK.
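As a quick illustration of the general pattern, the following sketch streams a chat response from the deployment by using the azure-ai-inference Python package. The endpoint, key, and deployment name are placeholders, as in the earlier sketch:

```python
# Sketch: stream tokens from a Foundry model deployment as they're generated.
# Same placeholder endpoint/key/deployment assumptions as the earlier sketch.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),  # placeholder
)

# stream=True returns incremental updates instead of one final message.
response = client.complete(
    messages=[UserMessage(content="Summarize what a model deployment is.")],
    model="Llama-3.2-90B-Vision-Instruct",  # your deployment name
    stream=True,
)

for update in response:
    if update.choices and update.choices[0].delta.content:
        print(update.choices[0].delta.content, end="", flush=True)
```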

Regional availability and quota limits of a model

For Foundry Models, the default quota varies by model and region. Certain models might only be available in some regions. For more information on availability and quota limits, see Azure OpenAI in Microsoft Foundry Models quotas and limits and Microsoft Foundry Models quotas and limits.

Quota for deploying and inferencing a model

For Foundry Models, deploying and inferencing consume quota that Azure assigns to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). When you sign up for Foundry, you receive default quota for most of the available models. Then, you assign TPM to each deployment as you create it, which reduces the available quota for that model. You can continue to create deployments and assign them TPMs until you reach your quota limit.
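As a hypothetical illustration of the bookkeeping (the numbers here are made up, not actual quotas):

```python
# Hypothetical TPM bookkeeping: the numbers are illustrative only.
regional_quota_tpm = 450_000          # default quota for this model in this region
deployments_tpm = [100_000, 200_000]  # TPM already assigned to existing deployments

available_tpm = regional_quota_tpm - sum(deployments_tpm)
print(f"Available for new deployments: {available_tpm} TPM")  # 150000 TPM

# A new deployment can be assigned at most `available_tpm`; beyond that you
# must request a quota increase or reduce TPM on existing deployments.
```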

When you reach your quota limit, you can only create new deployments of that model if you:

  • Request more quota by submitting a quota increase form.
  • Adjust the allocated quota on other model deployments in the Foundry portal to free up tokens for new deployments (see the sketch after this list).
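
If you manage deployments programmatically, the allocated quota corresponds to the SKU capacity on the deployment object. The following sketch uses the azure-mgmt-cognitiveservices package with placeholder names to lower the capacity of an existing deployment; note that capacity units vary by model (for many models, one unit corresponds to 1,000 TPM):

```python
# Sketch: lower the allocated capacity (TPM) of an existing deployment to free
# quota. Placeholder names; capacity units vary by model.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

# Read the current deployment, reduce its capacity, and write it back.
deployment = client.deployments.get(
    resource_group_name="<resource-group>",   # placeholder
    account_name="<foundry-resource-name>",   # placeholder
    deployment_name="<deployment-name>",      # placeholder
)
deployment.sku.capacity = 50  # new, smaller allocation

client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<foundry-resource-name>",
    deployment_name="<deployment-name>",
    deployment=deployment,
).result()
```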

For more information about quota, see Microsoft Foundry Models quotas and limits and Manage Azure OpenAI quota.