Run OpenAI models in batch endpoints to compute embeddings
APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)
Batch Endpoints can deploy models to run inference over large amounts of data, including OpenAI models. In this example, you learn how to create a batch endpoint to deploy the ADA-002 model from OpenAI to compute embeddings at scale, but you can use the same approach for completions and chat completions models. The example uses Microsoft Entra authentication to grant access to the Azure OpenAI resource.
About this example
In this example, we compute embeddings over a dataset using the ADA-002 model from OpenAI. We register the model in MLflow format using the OpenAI flavor, which supports orchestrating calls to the OpenAI service at scale.
The example in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, first clone the repo and then change directories to the folder:
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli
The files for this example are in:
cd endpoints/batch/deploy-models/openai-embeddings
Follow along in Jupyter Notebooks
You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook: deploy-and-test.ipynb.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
An Azure Machine Learning workspace. To create a workspace, see Manage Azure Machine Learning workspaces.
Ensure that you have the following permissions in the Machine Learning workspace:
- Create or manage batch endpoints and deployments: Use an Owner, Contributor, or custom role that allows Microsoft.MachineLearningServices/workspaces/batchEndpoints/*.
- Create Azure Resource Manager deployments in the workspace resource group: Use an Owner, Contributor, or custom role that allows Microsoft.Resources/deployments/write in the resource group where the workspace is deployed.
Install the following software to work with Machine Learning.

Run the following command to install the Azure CLI and the ml extension for Azure Machine Learning:

az extension add -n ml

Pipeline component deployments for Batch Endpoints are introduced in version 2.7 of the ml extension for the Azure CLI. Use the az extension update --name ml command to get the latest version.
Connect to your workspace
The workspace is the top-level resource for Machine Learning. It provides a centralized place to work with all artifacts you create when you use Machine Learning. In this section, you connect to the workspace where you perform your deployment tasks.
In the following command, enter the values for your subscription ID, workspace, location, and resource group:
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
Ensure you have an OpenAI deployment
The example shows how to run OpenAI models hosted in Azure OpenAI Service. To run them successfully, you need an Azure OpenAI resource correctly deployed in Azure and a deployment for the model you want to use.
Take note of the name of the OpenAI resource being used. We use the name to construct the URL of the resource. Save the URL for later use in the tutorial.
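For instance, you can keep the URL in an environment variable for later steps (the resource name is a placeholder you need to replace):

# Placeholder: replace <your-openai-resource-name> with your Azure OpenAI resource name
OPENAI_API_BASE="https://<your-openai-resource-name>.openai.azure.com"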
Ensure you have a compute cluster where you can deploy the endpoint
Batch endpoints use a compute cluster to run the models. In this example, we use a compute cluster called batch-cluster. We create the compute cluster here, but you can skip this step if you already have one:
COMPUTE_NAME="batch-cluster"
az ml compute create -n $COMPUTE_NAME --type amlcompute --min-instances 0 --max-instances 5
Decide on the authentication mode
You can access the Azure OpenAI resource in two ways:
- Using Microsoft Entra authentication (recommended).
- Using an access key.
Using Microsoft Entra is recommended because it helps you avoid managing secrets in the deployments.
You can configure the identity of the compute cluster so that it has access to the Azure OpenAI deployment to get predictions. That way, you don't need to manage permissions for each user of the endpoint. To give the compute cluster's identity access to the Azure OpenAI resource, follow these steps:
Assign an identity to the compute cluster your deployment uses, or verify that it already has one. In this example, we use the compute cluster called batch-cluster and assign a system-assigned managed identity, but you can use other alternatives.
COMPUTE_NAME="batch-cluster"
az ml compute update --name $COMPUTE_NAME --identity-type system_assigned
Get the managed identity principal ID assigned to the compute cluster you plan to use.
PRINCIPAL_ID=$(az ml compute show -n $COMPUTE_NAME --query identity.principal_id -o tsv)
Get the unique ID of the resource group where the Azure OpenAI resource is deployed:
RG="<openai-resource-group-name>"
RESOURCE_ID=$(az group show -g $RG --query "id" -o tsv)
Grant the role Cognitive Services User to the managed identity:
az role assignment create --role "Cognitive Services User" --assignee $PRINCIPAL_ID --scope $RESOURCE_ID
Register the OpenAI model
Model deployments in batch endpoints can only deploy registered models. You can use an MLflow model with the OpenAI flavor to create a model in your workspace that references a deployment in Azure OpenAI.
Create an MLflow model in the workspace's model registry pointing to your OpenAI deployment with the model you want to use. Use the MLflow SDK to create the model:
Tip
The cloned repository already contains an MLflow model to generate embeddings based on the ADA-002 model in the folder model, in case you want to skip this step.
import mlflow
import openai

engine = openai.Model.retrieve("text-embedding-ada-002")

model_info = mlflow.openai.save_model(
    path="model",
    model="text-embedding-ada-002",
    engine=engine.id,
    task=openai.Embedding,
)
Register the model in the workspace:
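A minimal sketch of the registration command, assuming the MLflow model is in the local folder model and using the name the deployment YAML references later:

MODEL_NAME="text-embedding-ada-002"
az ml model create --name $MODEL_NAME --path "model" --type mlflow_model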
Create a deployment for an OpenAI model
First, let's create the endpoint that hosts the model. Decide on the name of the endpoint:
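For example, matching the name used in the endpoint YAML that follows:

ENDPOINT_NAME="text-embedding-ada-qwerty"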
Configure the endpoint:
The following YAML file defines a batch endpoint:
endpoint.yml
$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: text-embedding-ada-qwerty
description: An endpoint to generate embeddings in batch for the ADA-002 model from OpenAI
auth_mode: aad_token
Create the endpoint resource:
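A sketch of the command, assuming the YAML above is saved as endpoint.yml:

az ml batch-endpoint create -n $ENDPOINT_NAME -f endpoint.yml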
Our scoring script uses some specific libraries that are not part of the standard OpenAI SDK, so we need to create an environment that has them. Here, we configure an environment with a base image and a conda YAML.
environment/environment.yml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: batch-openai-mlflow
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: conda.yaml
The conda YAML looks as follows:
conda.yaml
channels:
  - conda-forge
dependencies:
  - python=3.8.5
  - pip<=23.2.1
  - pip:
      - openai==0.27.8
      - requests==2.31.0
      - tenacity==8.2.2
      - tiktoken==0.4.0
      - azureml-core
      - azure-identity
      - datasets
      - mlflow
Let's create a scoring script that performs the execution. In Batch Endpoints, MLflow models don't require a scoring script. However, in this case we want to extend the capabilities of batch endpoints to:
- Allow the endpoint to read multiple data types, including csv, tsv, parquet, json, jsonl, arrow, and txt.
- Add some validations to ensure the MLflow model used has an OpenAI flavor on it.
- Format the output in jsonl format.
- Add an environment variable AZUREML_BI_TEXT_COLUMN to optionally control which input field you want to generate embeddings for.
Tip
By default, MLflow uses the first text column available in the input data to generate embeddings from. Use the environment variable AZUREML_BI_TEXT_COLUMN with the name of an existing column in the input dataset to change the column if needed. Leave it blank if the default behavior works for you.

The scoring script looks as follows:
code/batch_driver.py
import os
import glob
import mlflow
import pandas as pd
import numpy as np
from pathlib import Path
from typing import List
from datasets import load_dataset

DATA_READERS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".parquet": "parquet",
    ".json": "json",
    ".jsonl": "json",
    ".arrow": "arrow",
    ".txt": "text",
}


def init():
    global model
    global output_file
    global task_name
    global text_column

    # AZUREML_MODEL_DIR is the path where the model is located.
    # If the model is MLflow, you don't need to indicate further.
    model_path = glob.glob(os.environ["AZUREML_MODEL_DIR"] + "/*/")[0]

    # AZUREML_BI_TEXT_COLUMN is an environment variable you can use
    # to indicate which column you want to run the model on. It can
    # be used only if the model has one single input.
    text_column = os.environ.get("AZUREML_BI_TEXT_COLUMN", None)

    model = mlflow.pyfunc.load_model(model_path)
    model_info = mlflow.models.get_model_info(model_path)

    if mlflow.openai.FLAVOR_NAME not in model_info.flavors:
        raise ValueError(
            "The indicated model doesn't have an OpenAI flavor on it. Use "
            "``mlflow.openai.log_model`` to log OpenAI models."
        )

    if text_column:
        if (
            model.metadata
            and model.metadata.signature
            and len(model.metadata.signature.inputs) > 1
        ):
            raise ValueError(
                "The model requires more than 1 input column to run. You can't use "
                "AZUREML_BI_TEXT_COLUMN to indicate which column to send to the model. Format your "
                f"data with columns {model.metadata.signature.inputs.input_names()} instead."
            )

    task_name = model._model_impl.model["task"]
    output_path = os.environ["AZUREML_BI_OUTPUT_PATH"]
    output_file = os.path.join(output_path, f"{task_name}.jsonl")


def run(mini_batch: List[str]):
    if mini_batch:
        filtered_files = filter(lambda x: Path(x).suffix in DATA_READERS, mini_batch)
        results = []

        for file in filtered_files:
            data_format = Path(file).suffix
            data = load_dataset(DATA_READERS[data_format], data_files={"data": file})[
                "data"
            ].data.to_pandas()
            if text_column:
                # Keep only the column indicated by AZUREML_BI_TEXT_COLUMN
                data = data[[text_column]]
            scores = model.predict(data)
            results.append(
                pd.DataFrame(
                    {
                        "file": np.repeat(Path(file).name, len(scores)),
                        "row": range(0, len(scores)),
                        task_name: scores,
                    }
                )
            )

        pd.concat(results, axis="rows").to_json(
            output_file, orient="records", mode="a", lines=True
        )

    return mini_batch
Once the scoring script is created, it's time to create a batch deployment for it. We use environment variables to configure the OpenAI deployment. In particular, we use the following keys:
- OPENAI_API_BASE is the URL of the Azure OpenAI resource to use.
- OPENAI_API_VERSION is the version of the API you plan to use.
- OPENAI_API_TYPE is the type of API and authentication you want to use.
The environment variable OPENAI_API_TYPE="azure_ad" instructs OpenAI to use Microsoft Entra (formerly Azure Active Directory) authentication, and hence no key is required to invoke the OpenAI deployment. The identity of the cluster is used instead.

Once we've decided on the authentication mode and the environment variables, we can use them in the deployment. The following example shows how to use Microsoft Entra authentication:
deployment.yml
$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
endpoint_name: text-embedding-ada-qwerty
name: default
description: The default deployment for generating embeddings
type: model
model: azureml:text-embedding-ada-002@latest
environment:
  name: batch-openai-mlflow
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment/conda.yaml
code_configuration:
  code: code
  scoring_script: batch_driver.py
compute: azureml:batch-cluster
resources:
  instance_count: 1
settings:
  max_concurrency_per_instance: 1
  mini_batch_size: 1
  output_action: summary_only
  retry_settings:
    max_retries: 1
    timeout: 9999
  logging_level: info
  environment_variables:
    OPENAI_API_TYPE: azure_ad
    OPENAI_API_BASE: $OPENAI_API_BASE
    OPENAI_API_VERSION: 2023-03-15-preview
Tip
Notice the environment_variables section where we indicate the configuration for the OpenAI deployment. The value for OPENAI_API_BASE will be set later in the creation command, so you don't have to edit the YAML configuration file.

Now, let's create the deployment.
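A sketch of the creation command, assuming the YAML above is saved as deployment.yml and that $OPENAI_API_BASE holds the resource URL saved earlier:

az ml batch-deployment create --file deployment.yml \
    --endpoint-name $ENDPOINT_NAME \
    --set-default \
    --set settings.environment_variables.OPENAI_API_BASE=$OPENAI_API_BASE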
At this point, our batch endpoint is ready to be used.
Test the deployment
To test our endpoint, we use a sample of the dataset BillSum: A Corpus for Automatic Summarization of US Legislation. This sample is included in the repository, in the folder data.
Create a data input for this model:
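One way to do it, sketched here, is to register the sample folder as a data asset (the asset name billsum-sample is an assumption made for this example):

# Illustrative name for the data asset; adjust as needed
az ml data create --name billsum-sample --type uri_folder --path data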
Invoke the endpoint:
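A minimal sketch of the invocation, assuming the data asset registered above is passed as input and the generated job name is captured for later steps:

JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME \
    --input azureml:billsum-sample@latest \
    --query name -o tsv)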
Track the progress:
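For instance, you can stream the job logs until it finishes:

az ml job stream --name $JOB_NAME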
Once the job is finished, we can download the predictions:
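A sketch using the job download command; the output named score is where batch deployments write the scoring results, and the files land under named-outputs/score in the download path:

az ml job download --name $JOB_NAME --output-name score --download-path .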
The output predictions look like the following.
import pandas as pd

embeddings = pd.read_json("named-outputs/score/embeddings.jsonl", lines=True)
embeddings
embeddings.jsonl
{ "file": "billsum-0.csv", "row": 0, "embeddings": [ [0, 0, 0 ,0 , 0, 0, 0 ] ] }, { "file": "billsum-0.csv", "row": 1, "embeddings": [ [0, 0, 0 ,0 , 0, 0, 0 ] ] },