Run Azure OpenAI models in batch endpoints to compute embeddings
APPLIES TO:
Azure CLI ml extension v2 (current)
Python SDK azure-ai-ml v2 (current)
To run inference over large amounts of data, you can use batch endpoints to deploy models, including Azure OpenAI models. In this article, you see how to create a batch endpoint to deploy the text-embedding-ada-002
model from Azure OpenAI to compute embeddings at scale. You can use the same approach for completions and chat completions models.
The example in this article uses Microsoft Entra authentication to grant access to an Azure OpenAI Service resource, but you can also use an access key. The model is registered in MLflow format. It uses the Azure OpenAI flavor, which provides support for calling the Azure OpenAI service at scale.
To follow along with the example steps, see the Jupyter notebook Score OpenAI models in batch using Batch Endpoints.
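For context, an MLflow model with the Azure OpenAI flavor is typically produced with mlflow.openai.log_model. The following is a minimal sketch, assuming mlflow 2.4 or later and the openai 0.27 SDK used elsewhere in this example. You don't need to run it if you use the prebuilt model folder from the examples repository:

import mlflow
import openai

# Sketch: log an embeddings model with the MLflow OpenAI flavor.
# "text-embedding-ada-002" must match the name of your Azure OpenAI deployment.
with mlflow.start_run():
    mlflow.openai.log_model(
        model="text-embedding-ada-002",
        task=openai.Embedding,  # the embeddings task from the openai SDK
        artifact_path="model",
    )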
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
An Azure Machine Learning workspace. To create a workspace, see Manage Azure Machine Learning workspaces.
The following permissions in the Azure Machine Learning workspace:
- For creating or managing batch endpoints and deployments: use an Owner, Contributor, or custom role that has been assigned the Microsoft.MachineLearningServices/workspaces/batchEndpoints/* permissions.
- For creating Azure Resource Manager deployments in the workspace resource group: use an Owner, Contributor, or custom role that has been assigned the Microsoft.Resources/deployments/write permission in the resource group where the workspace is deployed.
The Azure Machine Learning CLI or the Azure Machine Learning SDK for Python.

Run the following command to install the Azure CLI and the ml extension for Azure Machine Learning:

az extension add -n ml

Pipeline component deployments for batch endpoints were introduced in version 2.7 of the ml extension for the Azure CLI. Use the following command to get the latest version:

az extension update --name ml
Connect to your workspace
The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all artifacts you create when you use Azure Machine Learning. In this section, you connect to the workspace where you perform your deployment tasks.
In the following commands, enter your subscription ID, workspace name, resource group name, and location:
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
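If you use the Python SDK instead, the equivalent connection is a minimal sketch like the following (the placeholder values are the same as in the CLI commands above):

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace by using the default credential chain.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)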
Clone the examples repository
The example in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy or paste YAML and other files, use the following commands to clone the repository and go to the folder for your coding language:
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli
Use the following command to go to the folder for this example:
cd endpoints/batch/deploy-models/openai-embeddings
Create an Azure OpenAI resource
This article shows you how to run OpenAI models hosted in Azure OpenAI. To begin, you need an Azure OpenAI resource that's deployed in Azure. For information about creating an Azure OpenAI resource, see Create a resource.
The name of your Azure OpenAI resource forms part of the resource URL. Use the following command to save that URL for use in later steps.
OPENAI_API_BASE="https://<your-azure-openai-resource-name>.openai.azure.com"
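To double-check the URL, you can query the resource endpoint directly. The following command is a sketch that assumes your Azure OpenAI resource lives in the resource group you use later in this article:

az cognitiveservices account show \
    --name <your-azure-openai-resource-name> \
    --resource-group <openai-resource-group-name> \
    --query properties.endpoint -o tsv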
In this article, you see how to create a deployment for an Azure OpenAI model. (Image: a deployed Azure OpenAI model, with the Azure OpenAI resource that it's deployed to highlighted.)
For information about managing Azure OpenAI models in Azure OpenAI, see Focus on Azure OpenAI Service.
Create a compute cluster
Batch endpoints use a compute cluster to run models. Use the following code to create a compute cluster called batch-cluster-lp. If you already have a compute cluster, you can skip this step.
COMPUTE_NAME="batch-cluster-lp"
az ml compute create -n $COMPUTE_NAME --type amlcompute --min-instances 0 --max-instances 5
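With the Python SDK, a sketch of the same cluster creation looks like this (assuming the ml_client connection from earlier; the VM size is an assumption, so pick one available in your region):

from azure.ai.ml.entities import AmlCompute

# Create a CPU cluster that scales between 0 and 5 nodes.
compute = AmlCompute(
    name="batch-cluster-lp",
    size="STANDARD_DS3_v2",  # assumed size; adjust to your region and quota
    min_instances=0,
    max_instances=5,
)
ml_client.compute.begin_create_or_update(compute).result()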
Choose an authentication mode
You can access the Azure OpenAI resource in two ways:
- Microsoft Entra authentication (recommended)
- An access key
Using Microsoft Entra is recommended because it helps you avoid managing secrets in deployments.
You can configure the identity of the compute cluster to have access to the Azure OpenAI deployment to get predictions. In this way, you don't need to manage permissions for each endpoint user. To give the identity of the compute cluster access to the Azure OpenAI resource, follow these steps:
Assign an identity to the compute cluster that your deployment uses. This example uses a compute cluster called batch-cluster-lp and a system-assigned managed identity, but you can use other options. If your compute cluster already has an assigned identity, you can skip this step.
COMPUTE_NAME="batch-cluster-lp"
az ml compute update --name $COMPUTE_NAME --identity-type system_assigned
Get the managed identity principal ID that's assigned to the compute cluster you plan to use.
PRINCIPAL_ID=$(az ml compute show -n $COMPUTE_NAME --query identity.principal_id -o tsv)
Get the unique ID of the resource group where the Azure OpenAI resource is deployed:
RG="<openai-resource-group-name>"
RESOURCE_ID=$(az group show -g $RG --query "id" -o tsv)
Assign the Cognitive Services User role to the managed identity:
az role assignment create --role "Cognitive Services User" --assignee $PRINCIPAL_ID --scope $RESOURCE_ID
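You can confirm the assignment before you continue. This check reuses the variables defined in the previous steps:

az role assignment list --assignee $PRINCIPAL_ID --scope $RESOURCE_ID -o table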
Register the Azure OpenAI model
Model deployments in batch endpoints can deploy only registered models. You can use MLflow models with the Azure OpenAI flavor to create a model in your workspace that references a deployment in Azure OpenAI.
In the cloned repository, the model folder contains an MLflow model that generates embeddings based on the text-embedding-ada-002
model.
Register the model in the workspace:
MODEL_NAME='text-embedding-ada-002'
az ml model create --name $MODEL_NAME --path "model"
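With the Python SDK, the equivalent registration is a minimal sketch like the following:

from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model

# Register the local MLflow model folder as a workspace model asset.
model = Model(
    name="text-embedding-ada-002",
    path="model",
    type=AssetTypes.MLFLOW_MODEL,
)
ml_client.models.create_or_update(model)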
Create a deployment for an Azure OpenAI model
To deploy the Azure OpenAI model, you need to create an endpoint, an environment, a scoring script, and a batch deployment. The following sections show you how to set up these components.
Create an endpoint
An endpoint is needed to host the model. To create an endpoint, take the following steps:
Set up a variable to store your endpoint name. Replace the name in the following code with one that's unique within the region of your resource group.
ENDPOINT_NAME="text-embedding-ada-qwerty"
Configure the endpoint:
Create a YAML file called endpoint.yml that contains the following lines. Replace the name value with your endpoint name.

$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: text-embedding-ada-qwerty
description: An endpoint to generate embeddings in batch for the ADA-002 model from OpenAI
auth_mode: aad_token
Create the endpoint resource:
az ml batch-endpoint create -n $ENDPOINT_NAME -f endpoint.yml
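The Python SDK equivalent is a sketch like the following:

from azure.ai.ml.entities import BatchEndpoint

endpoint = BatchEndpoint(
    name="text-embedding-ada-qwerty",  # replace with your unique endpoint name
    description="An endpoint to generate embeddings in batch for the ADA-002 model from OpenAI",
)
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()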
Configure an environment
The scoring script in this example uses some libraries that aren't part of the standard OpenAI SDK. Create an environment that contains a base image and also a conda YAML file to capture those dependencies:
The environment definition consists of the following lines, which are included in the deployment definition.
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: batch-openai-mlflow
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: conda.yaml
The conda YAML file, conda.yaml, contains the following lines:
channels:
- conda-forge
dependencies:
- python=3.8.5
- pip<=23.2.1
- pip:
- openai==0.27.8
- requests==2.31.0
- tenacity==8.2.2
- tiktoken==0.4.0
- azureml-core
- azure-identity
- datasets
- mlflow
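In the Python SDK, the same environment can be declared inline as a sketch like this (the conda file path assumes the repository's environment folder):

from azure.ai.ml.entities import Environment

environment = Environment(
    name="batch-openai-mlflow",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    conda_file="environment/conda.yaml",
)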
Create a scoring script
This example uses a scoring script to control how input files are read and scored. In batch endpoints, MLflow models don't require a scoring script, but this example extends the default capabilities of batch endpoints by:
- Allowing the endpoint to read multiple data types, including CSV, TSV, Parquet, JSON, JSON Lines, Arrow, and text formats.
- Adding some validations to ensure the MLflow model has an Azure OpenAI flavor.
- Formatting the output in JSON Lines format.
- Optionally using the AZUREML_BI_TEXT_COLUMN environment variable to control which input field you want to generate embeddings for.
Tip
By default, MLflow generates embeddings from the first text column that's available in the input data. If you want to use a different column, set the AZUREML_BI_TEXT_COLUMN environment variable to the name of your preferred column. Leave that variable unset if the default behavior works for you.
The scoring script, code/batch_driver.py, contains the following lines:
import os
import glob
import mlflow
import pandas as pd
import numpy as np
from pathlib import Path
from typing import List
from datasets import load_dataset

DATA_READERS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".parquet": "parquet",
    ".json": "json",
    ".jsonl": "json",
    ".arrow": "arrow",
    ".txt": "text",
}


def init():
    global model
    global output_file
    global task_name
    global text_column

    # AZUREML_MODEL_DIR is the path where the model is located.
    # If the model is MLflow, you don't need to indicate further.
    model_path = glob.glob(os.environ["AZUREML_MODEL_DIR"] + "/*/")[0]

    # AZUREML_BI_TEXT_COLUMN is an environment variable you can use
    # to indicate which column you want to run the model on. It can
    # be used only if the model has one single input.
    text_column = os.environ.get("AZUREML_BI_TEXT_COLUMN", None)

    model = mlflow.pyfunc.load_model(model_path)
    model_info = mlflow.models.get_model_info(model_path)

    if mlflow.openai.FLAVOR_NAME not in model_info.flavors:
        raise ValueError(
            "The indicated model doesn't have an OpenAI flavor on it. Use "
            "``mlflow.openai.log_model`` to log OpenAI models."
        )

    if text_column:
        if (
            model.metadata
            and model.metadata.signature
            and len(model.metadata.signature.inputs) > 1
        ):
            raise ValueError(
                "The model requires more than 1 input column to run. You can't use "
                "AZUREML_BI_TEXT_COLUMN to indicate which column to send to the model. Format your "
                f"data with columns {model.metadata.signature.inputs.input_names()} instead."
            )

    task_name = model._model_impl.model["task"]
    output_path = os.environ["AZUREML_BI_OUTPUT_PATH"]
    output_file = os.path.join(output_path, f"{task_name}.jsonl")


def run(mini_batch: List[str]):
    if mini_batch:
        filtered_files = filter(lambda x: Path(x).suffix in DATA_READERS, mini_batch)
        results = []

        for file in filtered_files:
            data_format = Path(file).suffix
            data = load_dataset(DATA_READERS[data_format], data_files={"data": file})[
                "data"
            ].data.to_pandas()
            if text_column:
                # Select only the requested text column (column selection, not row lookup).
                data = data.loc[:, [text_column]]
            scores = model.predict(data)
            results.append(
                pd.DataFrame(
                    {
                        "file": np.repeat(Path(file).name, len(scores)),
                        "row": range(0, len(scores)),
                        task_name: scores,
                    }
                )
            )

        pd.concat(results, axis="rows").to_json(
            output_file, orient="records", mode="a", lines=True
        )

    return mini_batch
Create a batch deployment
To configure the Azure OpenAI deployment, you use environment variables. Specifically, you use the following keys:

- OPENAI_API_TYPE: the type of API and authentication that you want to use.
- OPENAI_API_BASE: the URL of your Azure OpenAI resource.
- OPENAI_API_VERSION: the version of the API that you plan to use.

If you set the OPENAI_API_TYPE environment variable to azure_ad, Azure OpenAI uses Microsoft Entra authentication. No key is required to invoke the Azure OpenAI deployment. Instead, the identity of the cluster is used.
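If you choose the access key option mentioned earlier instead, set OPENAI_API_TYPE to azure and supply the key. The following is a hypothetical sketch of the corresponding environment_variables section; keep in mind that storing keys in plain configuration is exactly what the Microsoft Entra approach avoids:

environment_variables:
  OPENAI_API_TYPE: azure
  OPENAI_API_BASE: $OPENAI_API_BASE
  OPENAI_API_VERSION: 2023-03-15-preview
  OPENAI_API_KEY: <your-azure-openai-access-key>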
Update the values of the authentication and environment variables in the deployment configuration. The following example uses Microsoft Entra authentication:
The deployment.yml file configures the deployment:
$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
endpoint_name: text-embedding-ada-qwerty
name: default
description: The default deployment for generating embeddings
type: model
model: azureml:text-embedding-ada-002@latest
environment:
  name: batch-openai-mlflow
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment/conda.yaml
code_configuration:
  code: code
  scoring_script: batch_driver.py
compute: azureml:batch-cluster-lp
resources:
  instance_count: 1
settings:
  max_concurrency_per_instance: 1
  mini_batch_size: 1
  output_action: summary_only
  retry_settings:
    max_retries: 1
    timeout: 9999
  logging_level: info
  environment_variables:
    OPENAI_API_TYPE: azure_ad
    OPENAI_API_BASE: $OPENAI_API_BASE
    OPENAI_API_VERSION: 2023-03-15-preview
Tip
The environment_variables section provides the configuration for the Azure OpenAI deployment. The OPENAI_API_BASE value is set when the deployment is created, so you don't have to edit that value in the YAML configuration file.

Create the deployment:
az ml batch-deployment create --file deployment.yml \
    --endpoint-name $ENDPOINT_NAME \
    --set-default \
    --set settings.environment_variables.OPENAI_API_BASE=$OPENAI_API_BASE
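With the Python SDK, a sketch of the same deployment looks like the following. It assumes the ml_client and environment objects from earlier, and the class and constant names reflect the azure-ai-ml v2 SDK:

from azure.ai.ml.constants import BatchDeploymentOutputAction
from azure.ai.ml.entities import (
    BatchRetrySettings,
    CodeConfiguration,
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
)

deployment = ModelBatchDeployment(
    name="default",
    endpoint_name="text-embedding-ada-qwerty",
    model="azureml:text-embedding-ada-002@latest",
    environment=environment,
    code_configuration=CodeConfiguration(code="code", scoring_script="batch_driver.py"),
    compute="batch-cluster-lp",
    settings=ModelBatchDeploymentSettings(
        max_concurrency_per_instance=1,
        mini_batch_size=1,
        output_action=BatchDeploymentOutputAction.SUMMARY_ONLY,
        retry_settings=BatchRetrySettings(max_retries=1, timeout=9999),
        environment_variables={
            "OPENAI_API_TYPE": "azure_ad",
            "OPENAI_API_BASE": "https://<your-azure-openai-resource-name>.openai.azure.com",
            "OPENAI_API_VERSION": "2023-03-15-preview",
        },
    ),
)
ml_client.batch_deployments.begin_create_or_update(deployment).result()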
The batch endpoint is ready for use.
Test the deployment
To test the endpoint, you use a sample of the dataset BillSum: A Corpus for Automatic Summarization of US Legislation. This sample is included in the data folder of the cloned repository.
Set up the input data:
In the commands in this section, use data as the name of the folder that contains the input data.
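If you first want a smaller smoke test than the BillSum sample, a hypothetical one-row input file could look like the following sketch (the folder and file names are made up; the scoring script generates embeddings from the first text column by default). You would then pass --input data-smoke-test instead of --input data in the next step:

mkdir -p data-smoke-test
cat > data-smoke-test/sample.csv <<'EOF'
text
"Medicare and Medicaid are federal health programs in the United States."
EOF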
Invoke the endpoint:
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input data --query name -o tsv)
Track the progress:
az ml job show -n $JOB_NAME --web
After the deployment is finished, download the predictions:
az ml job download --name $JOB_NAME --output-name score --download-path ./
Use the following code to view the output predictions:
import pandas as pd
from io import StringIO

# Read the output data into an object.
with open('embeddings.jsonl', 'r') as f:
    json_lines = f.readlines()

string_io = StringIO()
for line in json_lines:
    string_io.write(line)
string_io.seek(0)

# Read the data into a data frame.
embeddings = pd.read_json(string_io, lines=True)

# Print the data frame.
print(embeddings)
You can also open the output file, embeddings.jsonl, to see the predictions:
{"file": "billsum-0.csv", "row": 0, "embeddings": [[0, 0, 0, 0, 0, 0, 0]]} {"file": "billsum-0.csv", "row": 1, "embeddings": [[0, 0, 0, 0, 0, 0, 0]]}