Consume serverless API endpoints from a different workspace

In this article, you learn how to configure an existing serverless API endpoint in a different workspace than the one where it was deployed.

Certain models in the model catalog can be deployed as serverless APIs. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.

The need to consume a serverless API endpoint in a different workspace than the one that was used to create the deployment might arise in situations such as these:

  • You want to centralize your deployments in a given workspace and consume them from different workspaces in your organization.
  • You need to deploy a model in a workspace in a particular Azure region where serverless deployment for that model is available. However, you need to consume it from another region, where serverless deployment isn't available for the particular models.

Prerequisites

  • An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.

  • An Azure Machine Learning workspace where you want to consume the existing deployment.

  • A model deployed to a serverless API endpoint. This article assumes that you previously deployed the Meta-Llama-3-8B-Instruct model. To learn how to deploy this model as a serverless API, see Deploy models as serverless APIs.

  • You need to install the following software to work with Azure Machine Learning:

    You can use any compatible web browser to navigate Azure Machine Learning studio.

Create a serverless API endpoint connection

Follow these steps to create a connection:

  1. Connect to the workspace where the endpoint is deployed:

    Go to Azure Machine Learning studio and navigate to the workspace where the endpoint you want to connect to is deployed.

  2. Get the endpoint's URL and credentials for the endpoint you want to connect to. In this example, you get the details for an endpoint name meta-llama3-8b-qwerty.

    1. Select Endpoints from the left sidebar.

    2. Select the Serverless endpoints tab to display the serverless API endpoints.

    3. Select the endpoint you want to connect to.

    4. On the endpoint's Details tab, copy the values for Target URI and Key.

  3. Now, connect to the workspace where you want to create the connection and consume the endpoint.

  4. Create the connection in the workspace:

    1. Go to the workspace where the connection needs to be created to.

    2. Go to the Manage section in the left navigation bar and select Connections.

    3. Select Create.

    4. Select Serverless Model.

    5. For the Target URI, paste the value you copied previously.

    6. For the Key, paste the value you copied previously.

    7. Give the connection a name, in this case meta-llama3-8b-connection.

    8. Select Add connection.

  5. At this point, the connection is available for consumption.

  6. To validate that the connection is working:

    1. From the left navigation bar of Azure Machine Learning studio, go to Authoring > Prompt flow.

    2. Select Create to create a new flow.

    3. Select Create in the Chat flow box.

    4. Give your Prompt flow a name and select Create.

    5. Select the chat node from the graph to go to the chat section.

    6. For Connection, open the dropdown list to select the connection you just created, in this case meta-llama3-8b-connection.

    7. Select Start compute session from the top navigation bar, to start a prompt flow automatic runtime.

    8. Select the Chat option. You can now send messages and get responses.