Run OpenAI models in batch endpoints to compute embeddings

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Batch Endpoints can deploy models to run inference over large amounts of data, including OpenAI models. In this example, you learn how to create a batch endpoint to deploy the ADA-002 model from OpenAI to compute embeddings at scale, but you can use the same approach for completions and chat completions models. The example uses Microsoft Entra authentication to grant access to the Azure OpenAI resource.

About this example

In this example, we compute embeddings over a dataset using the ADA-002 model from OpenAI. We register the model in MLflow format using the OpenAI flavor, which supports orchestrating all the calls to the OpenAI service at scale.

The example in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, first clone the repo and then change directories to the folder:

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli

The files for this example are in:

cd endpoints/batch/deploy-models/openai-embeddings

Follow along in Jupyter Notebooks

You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook deploy-and-test.ipynb.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.

  • An Azure Machine Learning workspace. If you don't have one, use the steps in the Manage Azure Machine Learning workspaces article to create one.

  • Ensure that you have the following permissions in the workspace:

    • Create or manage batch endpoints and deployments: Use an Owner, Contributor, or Custom role that allows Microsoft.MachineLearningServices/workspaces/batchEndpoints/*.

    • Create ARM deployments in the workspace resource group: Use an Owner, Contributor, or Custom role that allows Microsoft.Resources/deployments/write in the resource group where the workspace is deployed.

  • You need to install the following software to work with Azure Machine Learning:

    The Azure CLI and the ml extension for Azure Machine Learning.

    az extension add -n ml
    

    Note

Pipeline component deployments for Batch Endpoints were introduced in version 2.7 of the ml extension for the Azure CLI. Use az extension update --name ml to get the latest version.

Connect to your workspace

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, we'll connect to the workspace in which you'll perform deployment tasks.

Pass in the values for your subscription ID, workspace, location, and resource group in the following code:

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

Ensure you have an OpenAI deployment

This example shows how to run OpenAI models hosted in the Azure OpenAI service. To follow it, you need an Azure OpenAI resource correctly deployed in Azure and a deployment for the model you want to use.

A screenshot showing the Azure OpenAI Studio with the list of available model deployments.

Take note of the name of the OpenAI resource being used; we use the name to construct the URL of the resource. Save the URL for later use in this tutorial.

OPENAI_API_BASE="https://<your-azure-openai-resource-name>.openai.azure.com"

Ensure you have a compute cluster on which to deploy the endpoint

Batch endpoints use a compute cluster to run the models. In this example, we use a compute cluster called batch-cluster. We create the compute cluster here, but you can skip this step if you already have one:

COMPUTE_NAME="batch-cluster"
az ml compute create -n $COMPUTE_NAME --type amlcompute --min-instances 0 --max-instances 5

Decide on the authentication mode

You can access the Azure OpenAI resource in two ways:

  • Using Microsoft Entra authentication (recommended).
  • Using an access key.

Using Microsoft Entra is recommended because it helps you avoid managing secrets in the deployments.

You can configure the compute's identity to have access to the Azure OpenAI deployment to get predictions. In this way, you don't need to manage permissions for each user of the endpoint. To give the compute cluster's identity access to the Azure OpenAI resource, follow these steps:

  1. Ensure or assign an identity to the compute cluster your deployment uses. In this example, we use a compute cluster called batch-cluster and assign a system-assigned managed identity, but you can use other alternatives.

    COMPUTE_NAME="batch-cluster"
    az ml compute update --name $COMPUTE_NAME --identity-type system_assigned
    
  2. Get the managed identity principal ID assigned to the compute cluster you plan to use.

    PRINCIPAL_ID=$(az ml compute show -n $COMPUTE_NAME --query identity.principal_id -o tsv)
    
  3. Get the resource ID of the resource group where the Azure OpenAI resource is deployed:

    RG="<openai-resource-group-name>"
    RESOURCE_ID=$(az group show -g $RG --query "id" -o tsv)
    
  4. Grant the role Cognitive Services User to the managed identity:

    az role assignment create --role "Cognitive Services User" --assignee $PRINCIPAL_ID --scope $RESOURCE_ID
    

Register the OpenAI model

Model deployments in batch endpoints can only deploy registered models. You can use an MLflow model with the OpenAI flavor to create a model in your workspace that references a deployment in Azure OpenAI.

  1. Create an MLflow model in the workspace's model registry that points to your OpenAI deployment with the model you want to use. Use the MLflow SDK to create the model. A quick local test of the saved model is sketched after these steps:

    Tip

    In the cloned repository, the folder model already contains an MLflow model to generate embeddings based on the ADA-002 model, in case you want to skip this step.

    import mlflow
    import openai
    
    # Retrieve the model from the service. Its ID is used as the engine
    # (deployment) that the MLflow model calls at inference time.
    engine = openai.Model.retrieve("text-embedding-ada-002")
    
    # Save an MLflow model with the OpenAI flavor to the local folder "model".
    model_info = mlflow.openai.save_model(
        path="model",
        model="text-embedding-ada-002",
        engine=engine.id,
        task=openai.Embedding,
    )
    
  2. Register the model in the workspace:

    MODEL_NAME='text-embedding-ada-002'
    az ml model create --name $MODEL_NAME --path "model"
    
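
Before using the registered model in the endpoint, you can sanity check the saved model locally. This is a minimal sketch, not part of the example files: it assumes valid Azure OpenAI credentials are already configured in your environment, and the column name text is illustrative for a single-input model.

import mlflow
import pandas as pd

# Load the MLflow model saved in the local "model" folder as a generic pyfunc.
model = mlflow.pyfunc.load_model("model")

# Score a tiny DataFrame. With a single-input model, the OpenAI flavor reads
# the text column provided; the column name "text" here is illustrative.
sample = pd.DataFrame({"text": ["Batch endpoints can compute embeddings at scale."]})
print(model.predict(sample))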

Create a deployment for an OpenAI model

  1. First, let's create the endpoint that hosts the model. Decide on the name of the endpoint. Here, the name matches the one used in the YAML configuration files:

    ENDPOINT_NAME="text-embedding-ada-qwerty"
    
  2. Configure the endpoint:

    The following YAML file defines a batch endpoint:

    endpoint.yml

    $schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
    name: text-embedding-ada-qwerty
    description: An endpoint to generate embeddings in batch for the ADA-002 model from OpenAI
    auth_mode: aad_token
    
  3. Create the endpoint resource:

    az ml batch-endpoint create -n $ENDPOINT_NAME -f endpoint.yml
    
  4. Our scoring script uses some specific libraries that aren't part of the standard OpenAI SDK, so we need to create an environment that has them. Here, we configure an environment with a base image and a conda YAML file.

    environment/environment.yml

    $schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
    name: batch-openai-mlflow
    image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
    conda_file: conda.yaml
    

    The conda YAML looks as follows:

    conda.yaml

    channels:
    - conda-forge
    dependencies:
    - python=3.8.5
    - pip<=23.2.1
    - pip:
      - openai==0.27.8
      - requests==2.31.0
      - tenacity==8.2.2
      - tiktoken==0.4.0
      - azureml-core
      - azure-identity
      - datasets
      - mlflow
    
  5. Let's create a scoring script that performs the execution. In Batch Endpoints, MLflow models don't require a scoring script. However, in this case we extend the capabilities of batch endpoints a bit by:

    • Allowing the endpoint to read multiple data types, including csv, tsv, parquet, json, jsonl, arrow, and txt.
    • Adding some validations to ensure the MLflow model has an OpenAI flavor.
    • Formatting the output in jsonl format.
    • Adding an environment variable AZUREML_BI_TEXT_COLUMN to optionally control which input field you want to generate embeddings for.

    Tip

    By default, MLflow uses the first text column available in the input data to generate embeddings from. If needed, set the environment variable AZUREML_BI_TEXT_COLUMN to the name of an existing column in the input dataset to change the column. Leave it blank if the default behavior works for you.

    The scoring script looks as follows:

    code/batch_driver.py

    import os
    import glob
    import mlflow
    import pandas as pd
    import numpy as np
    from pathlib import Path
    from typing import List
    from datasets import load_dataset
    
    DATA_READERS = {
        ".csv": "csv",
        ".tsv": "tsv",
        ".parquet": "parquet",
        ".json": "json",
        ".jsonl": "json",
        ".arrow": "arrow",
        ".txt": "text",
    }
    
    
    def init():
        global model
        global output_file
        global task_name
        global text_column
    
        # AZUREML_MODEL_DIR is the path where the model is located.
        # If the model is MLFlow, you don't need to indicate further.
        model_path = glob.glob(os.environ["AZUREML_MODEL_DIR"] + "/*/")[0]
        # AZUREML_BI_TEXT_COLUMN is an environment variable you can use
        # to indicate which column you want to run the model on. It can be
        # used only if the model has one single input.
        text_column = os.environ.get("AZUREML_BI_TEXT_COLUMN", None)
    
        model = mlflow.pyfunc.load_model(model_path)
        model_info = mlflow.models.get_model_info(model_path)
    
        if mlflow.openai.FLAVOR_NAME not in model_info.flavors:
            raise ValueError(
                "The indicated model doesn't have an OpenAI flavor on it. Use "
                "``mlflow.openai.log_model`` to log OpenAI models."
            )
    
        if text_column:
            if (
                model.metadata
                and model.metadata.signature
                and len(model.metadata.signature.inputs) > 1
            ):
                raise ValueError(
                    "The model requires more than 1 input column to run. You can't use "
                    "AZUREML_BI_TEXT_COLUMN to indicate which column to send to the model. Format your "
                    f"data with columns {model.metadata.signature.inputs.input_names()} instead."
                )
    
        task_name = model._model_impl.model["task"]
        output_path = os.environ["AZUREML_BI_OUTPUT_PATH"]
        output_file = os.path.join(output_path, f"{task_name}.jsonl")
    
    
    def run(mini_batch: List[str]):
        if mini_batch:
            filtered_files = filter(lambda x: Path(x).suffix in DATA_READERS, mini_batch)
            results = []
    
            for file in filtered_files:
                data_format = Path(file).suffix
                data = load_dataset(DATA_READERS[data_format], data_files={"data": file})[
                    "data"
                ].data.to_pandas()
                if text_column:
                    data = data.loc[:, [text_column]]
                scores = model.predict(data)
                results.append(
                    pd.DataFrame(
                        {
                            "file": np.repeat(Path(file).name, len(scores)),
                            "row": range(0, len(scores)),
                            task_name: scores,
                        }
                    )
                )
    
            pd.concat(results, axis="rows").to_json(
                output_file, orient="records", mode="a", lines=True
            )
    
        return mini_batch
    
  6. Once the scoring script is created, it's time to create a batch deployment for it. We use environment variables to configure the OpenAI deployment. In particular, we use the following keys:

    • OPENAI_API_BASE is the URL of the Azure OpenAI resource to use.
    • OPENAI_API_VERSION is the version of the API you plan to use.
    • OPENAI_API_TYPE is the type of API and authentication you want to use.

    The environment variable OPENAI_API_TYPE="azure_ad" instructs OpenAI to use Microsoft Entra authentication, so no key is required to invoke the OpenAI deployment. The identity of the cluster is used instead. A minimal sketch of this authentication pattern is shown after these steps.

  7. Once we've decided on the authentication mode and the environment variables, we can use them in the deployment. The following example shows specifically how to use Microsoft Entra authentication:

    deployment.yml

    $schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
    endpoint_name: text-embedding-ada-qwerty
    name: default
    description: The default deployment for generating embeddings
    type: model
    model: azureml:text-embedding-ada-002@latest
    environment:
      name: batch-openai-mlflow
      image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
      conda_file: environment/conda.yaml
    code_configuration:
      code: code
      scoring_script: batch_driver.py
    compute: azureml:batch-cluster
    resources:
      instance_count: 1
    settings:
      max_concurrency_per_instance: 1
      mini_batch_size: 1
      output_action: summary_only
      retry_settings:
        max_retries: 1
        timeout: 9999
      logging_level: info
      environment_variables:
        OPENAI_API_TYPE: azure_ad
        OPENAI_API_BASE: $OPENAI_API_BASE
        OPENAI_API_VERSION: 2023-03-15-preview
    

    Tip

    Notice the environment_variables section, where we indicate the configuration for the OpenAI deployment. The value for OPENAI_API_BASE is set later in the creation command, so you don't have to edit the YAML configuration file.

  8. Now, let's create the deployment.

    az ml batch-deployment create --file deployment.yml \
                                  --endpoint-name $ENDPOINT_NAME \
                                  --set-default \
                                  --set settings.environment_variables.OPENAI_API_BASE=$OPENAI_API_BASE
    
  9. At this point, our batch endpoint is ready to be used.
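
To see what OPENAI_API_TYPE="azure_ad" implies, here's a minimal sketch of how a client authenticates against Azure OpenAI with Microsoft Entra instead of a key, using the openai 0.27.x SDK pinned in the environment. It assumes azure-identity is installed (it's listed in conda.yaml) and that the identity running the code has the Cognitive Services User role:

import os

import openai
from azure.identity import DefaultAzureCredential

# Acquire a Microsoft Entra token for the Cognitive Services scope. On the
# compute cluster, DefaultAzureCredential resolves to the managed identity.
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# Configure the openai 0.27.x SDK to authenticate with the token instead of a
# key. Tokens expire after about an hour; long-running code must refresh them.
openai.api_type = "azure_ad"
openai.api_key = token.token
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_version = os.environ.get("OPENAI_API_VERSION", "2023-03-15-preview")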

Test the deployment

To test our endpoint, we use a sample of the dataset BillSum: A Corpus for Automatic Summarization of US Legislation. This sample is included in the repository in the folder data.
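
If you want to peek at the sample data first, you can load one of the files with pandas. This is a quick sketch; the file name billsum-0.csv is taken from the sample output at the end of this section and might differ in your copy of the repository:

import pandas as pd

# Inspect the first rows of one sample file (the file name is illustrative).
df = pd.read_csv("data/billsum-0.csv")
print(df.head())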

  1. Invoke the endpoint. The sample data in the folder data is passed directly as the input:

    JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input data --query name -o tsv)
    
  2. Track the progress:

    az ml job show -n $JOB_NAME --web
    
  3. Once the job is finished, download the predictions:

    az ml job download --name $JOB_NAME --output-name score --download-path ./
    
  4. The output predictions look like the following. A short sketch that consumes them appears after these steps.

    import pandas as pd 
    
    embeddings = pd.read_json("named-outputs/score/embeddings.jsonl", lines=True)
    embeddings
    

    embeddings.jsonl

    {"file": "billsum-0.csv", "row": 0, "embeddings": [[0, 0, 0, 0, 0, 0, 0]]}
    {"file": "billsum-0.csv", "row": 1, "embeddings": [[0, 0, 0, 0, 0, 0, 0]]}
    
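
As a quick example of consuming these results, the following sketch compares the first two embeddings with cosine similarity using NumPy, assuming the predictions were downloaded to named-outputs/score/ as in the previous steps:

import numpy as np
import pandas as pd

embeddings = pd.read_json("named-outputs/score/embeddings.jsonl", lines=True)

# Each row stores a list containing one embedding vector; unwrap the vectors
# for the first two rows and compute their cosine similarity.
a = np.array(embeddings.loc[0, "embeddings"][0], dtype=float)
b = np.array(embeddings.loc[1, "embeddings"][0], dtype=float)
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity between rows 0 and 1: {cosine:.4f}")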

Next steps