Deploy models as serverless APIs

Important

Some of the features described in this article might only be available in preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you learn how to deploy a model from the model catalog as a serverless API with pay-as-you-go token based billing.

Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.

Prerequisites

  • An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.

  • An Azure AI Studio hub.

  • An Azure AI Studio project.

  • Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the Azure AI Developer role on the resource group. For more information on permissions, see Role-based access control in Azure AI Studio.

  • You need to install the following software to work with Azure AI Studio:

    You can use any compatible web browser to navigate Azure AI Studio.

Subscribe your project to the model offering

For models offered through the Azure Marketplace, you can deploy them to serverless API endpoints to consume their predictions. If it's your first time deploying the model in the project, you have to subscribe your project for the particular model offering from the Azure Marketplace. Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending.

Note

Models offered through the Azure Marketplace are available for deployment to serverless API endpoints in specific regions. Check Model and region availability for Serverless API deployments to verify which models and regions are available. If the one you need is not listed, you can deploy to a workspace in a supported region and then consume serverless API endpoints from a different workspace.

  1. Sign in to Azure AI Studio.

  2. Ensure your account has the Azure AI Developer role permissions on the resource group, or that you meet the permissions required to subscribe to model offerings.

  3. Select Model catalog from the left sidebar and find the model card of the model you want to deploy. In this article, you select a Meta-Llama-3-8B-Instruct model.

    1. If you're deploying the model using Azure CLI, Python, or ARM, copy the Model ID.

      Important

      Do not include the version when copying the Model ID. Serverless API endpoints always deploy the model's latest version available. For example, for the model ID azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct/versions/3, copy azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct.

    A screenshot showing a model's details page.

  4. Create the model's marketplace subscription. When you create a subscription, you accept the terms and conditions associated with the model offer.

    1. On the model's Details page, select Deploy and then select Serverless API to open the deployment wizard.

    2. Select the project in which you want to deploy your models. Notice that not all the regions are supported.

      A screenshot showing how to deploy a model with the serverless API option.

    3. If you see the note You already have an Azure Marketplace subscription for this project, you don't need to create the subscription since you already have one. You can proceed to Deploy the model to a serverless API endpoint.

    4. In the deployment wizard, select the link to Azure Marketplace Terms to learn more about the terms of use. You can also select the Pricing and terms tab to learn about pricing for the selected model.

    5. Select Subscribe and Deploy.

  5. Once you sign up the project for the particular Azure Marketplace offering, subsequent deployments of the same offering in the same project don't require subscribing again.

  6. At any point, you can see the model offers to which your project is currently subscribed:

    1. Go to the Azure portal.

    2. Navigate to the resource group where the project belongs.

    3. On the Type filter, select SaaS.

    4. You see all the offerings to which you're currently subscribed.

    5. Select any resource to see the details.

Deploy the model to a serverless API endpoint

Once you've created a model's subscription, you can deploy the associated model to a serverless API endpoint. The serverless API endpoint provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance organizations need. This deployment option doesn't require quota from your subscription.

In this article, you create an endpoint with name meta-llama3-8b-qwerty.

  1. Create the serverless endpoint

    1. From the previous wizard, select Deploy (if you've just subscribed the project to the model offer in the previous section), or select Continue to deploy (if your deployment wizard had the note You already have an Azure Marketplace subscription for this project).

      A screenshot showing a project that is already subscribed to the offering.

    2. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.

      A screenshot showing how to specify the name of the deployment you want to create.

    3. Select Deploy. Wait until the deployment is ready and you're redirected to the Deployments page.

  2. At any point, you can see the endpoints deployed to your project:

    1. Go to your project.

    2. Select the section Deployments

    3. Serverless API endpoints are displayed.

  3. The created endpoint uses key authentication for authorization. Use the following steps to get the keys associated with a given endpoint.

    You can return to the Deployments page, select the deployment, and note the endpoint's Target URI and Key. Use them to call the deployment and generate predictions.

    Note

    When using the Azure portal, serverless API endpoints aren't displayed by default on the resource group. Use the Show hidden types option to display them on the resource group.

  4. At this point, your endpoint is ready to be used.

  5. If you need to consume this deployment from a different project or hub, or you plan to use prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API endpoint on a new project or hub, see Consume deployed serverless API endpoints from a different project or from Prompt flow.

    Tip

    If you're using prompt flow in the same project or hub where the deployment was deployed, you still need to create the connection.

Using the serverless API endpoint

Models deployed in Azure Machine Learning and Azure AI studio in Serverless API endpoints support the Azure AI Model Inference API that exposes a common set of capabilities for foundational models and that can be used by developers to consume predictions from a diverse set of models in a uniform and consistent way.

Read more about the capabilities of this API and how you can leverage it when building applications.

Delete endpoints and subscriptions

You can delete model subscriptions and endpoints. Deleting a model subscription makes any associated endpoint become Unhealthy and unusable.

To delete a serverless API endpoint:

  1. Go to the Azure AI Studio.

  2. Go to Components > Deployments.

  3. Open the deployment you want to delete.

  4. Select Delete.

To delete the associated model subscription:

  1. Go to the Azure portal

  2. Navigate to the resource group where the project belongs.

  3. On the Type filter, select SaaS.

  4. Select the subscription you want to delete.

  5. Select Delete.

Cost and quota considerations for models deployed as serverless API endpoints

Models deployed as serverless API endpoints are offered through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying or fine-tuning the models.

Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently.

For more information on how to track costs, see Monitor costs for models offered through the Azure Marketplace.

A screenshot showing different resources corresponding to different model offers and their associated meters.

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.

Permissions required to subscribe to model offerings

Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the Owner, Contributor, or Azure AI Developer role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:

  • On the Azure subscription—to subscribe the workspace to the Azure Marketplace offering, once for each workspace, per offering:

    • Microsoft.MarketplaceOrdering/agreements/offers/plans/read
    • Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
    • Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
    • Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
    • Microsoft.SaaS/register/action
  • On the resource group—to create and use the SaaS resource:

    • Microsoft.SaaS/resources/read
    • Microsoft.SaaS/resources/write
  • On the workspace—to deploy endpoints (the Azure Machine Learning data scientist role contains these permissions already):

    • Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*
    • Microsoft.MachineLearningServices/workspaces/serverlessEndpoints/*

For more information on permissions, see Role-based access control in Azure AI Studio.

Next step