Run OpenAI models in batch endpoints to compute embeddings

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Batch Endpoints can deploy models to run inference over large amounts of data, including OpenAI models. In this example, you learn how to create a batch endpoint to deploy the ADA-002 model from OpenAI to compute embeddings at scale, but you can use the same approach for completions and chat completions models. The example uses Microsoft Entra authentication to grant access to the Azure OpenAI resource.

About this example

In this example, we compute embeddings over a dataset using the ADA-002 model from OpenAI. We register the model in MLflow format using the OpenAI flavor, which supports orchestrating all the calls to the OpenAI service at scale.

The example in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, first clone the repo and then change directories to the folder:

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli

The files for this example are in:

cd endpoints/batch/deploy-models/openai-embeddings

Follow along in Jupyter Notebooks

You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook deploy-and-test.ipynb.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.

  • An Azure Machine Learning workspace. If you don't have one, use the steps in the Manage Azure Machine Learning workspaces article to create one.

  • Ensure that you have the following permissions in the workspace:

    • Create or manage batch endpoints and deployments: Use an Owner, Contributor, or Custom role that allows Microsoft.MachineLearningServices/workspaces/batchEndpoints/*.

    • Create ARM deployments in the workspace resource group: Use an Owner, Contributor, or Custom role that allows Microsoft.Resources/deployments/write in the resource group where the workspace is deployed.

  • You need to install the following software to work with Azure Machine Learning:

    The Azure CLI and the ml extension for Azure Machine Learning.

    az extension add -n ml
    

    Note

Pipeline component deployments for Batch Endpoints were introduced in version 2.7 of the ml extension for the Azure CLI. Use az extension update --name ml to get the latest version.

Connect to your workspace

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, we'll connect to the workspace in which you'll perform deployment tasks.

Pass in the values for your subscription ID, workspace, location, and resource group in the following code:

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

Ensure you have an OpenAI deployment

This example shows how to run OpenAI models hosted in the Azure OpenAI service. To follow it, you need an Azure OpenAI resource correctly deployed in Azure and a deployment for the model you want to use.

A screenshot showing the Azure OpenAI Studio with the list of available model deployments.

Take note of the name of the OpenAI resource being used; we use the name to construct the URL of the resource. Save the URL for later use in this tutorial.

OPENAI_API_BASE="https://<your-azure-openai-resource-name>.openai.azure.com"

Ensure you have a compute cluster on which to deploy the endpoint

Batch endpoints use a compute cluster to run the models. In this example, we use a compute cluster called batch-cluster. We create the compute cluster here, but you can skip this step if you already have one:

COMPUTE_NAME="batch-cluster"
az ml compute create -n $COMPUTE_NAME --type amlcompute --min-instances 0 --max-instances 5

Decide on the authentication mode

You can access the Azure OpenAI resource in two ways:

  • Using Microsoft Entra authentication (recommended).
  • Using an access key.

Using Microsoft Entra is recommended because it helps you avoid managing secrets in the deployments.

You can configure the compute's identity to have access to the Azure OpenAI deployment to get predictions. In this way, you don't need to manage permissions for each user of the endpoint. To give the compute cluster's identity access to the Azure OpenAI resource, follow these steps:

  1. Ensure or assign an identity to the compute cluster your deployment uses. In this example, we use a compute cluster called batch-cluster and assign a system-assigned managed identity, but you can use other alternatives.

    COMPUTE_NAME="batch-cluster"
    az ml compute update --name $COMPUTE_NAME --identity-type system_assigned
    
  2. Get the managed identity principal ID assigned to the compute cluster you plan to use.

    PRINCIPAL_ID=$(az ml compute show -n $COMPUTE_NAME --query identity.principal_id -o tsv)
    
  3. Get the resource ID of the resource group where the Azure OpenAI resource is deployed:

    RG="<openai-resource-group-name>"
    RESOURCE_ID=$(az group show -g $RG --query "id" -o tsv)
    
  4. Grant the role Cognitive Services User to the managed identity:

    az role assignment create --role "Cognitive Services User" --assignee $PRINCIPAL_ID --scope $RESOURCE_ID
    

Register the OpenAI model

Model deployments in batch endpoints can only deploy registered models. You can use an MLflow model with the OpenAI flavor to create a model in your workspace that references a deployment in Azure OpenAI.

  1. Create an MLflow model in the workspace's model registry that points to your OpenAI deployment with the model you want to use. Use the MLflow SDK to create the model. A quick local test of the saved model is sketched after these steps:

    Tip

    In the cloned repository, the folder model already contains an MLflow model to generate embeddings based on the ADA-002 model, in case you want to skip this step.

    import mlflow
    import openai
    
    # Retrieve the model from the service. Its ID is used as the engine
    # (deployment) that the MLflow model calls at inference time.
    engine = openai.Model.retrieve("text-embedding-ada-002")
    
    # Save an MLflow model with the OpenAI flavor to the local folder "model".
    model_info = mlflow.openai.save_model(
        path="model",
        model="text-embedding-ada-002",
        engine=engine.id,
        task=openai.Embedding,
    )
    
  2. Register the model in the workspace:

    MODEL_NAME='text-embedding-ada-002'
    az ml model create --name $MODEL_NAME --path "model"
    
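
Before using the registered model in the endpoint, you can sanity check the saved model locally. This is a minimal sketch, not part of the example files: it assumes valid Azure OpenAI credentials are already configured in your environment, and the column name text is illustrative for a single-input model.

import mlflow
import pandas as pd

# Load the MLflow model saved in the local "model" folder as a generic pyfunc.
model = mlflow.pyfunc.load_model("model")

# Score a tiny DataFrame. With a single-input model, the OpenAI flavor reads
# the text column provided; the column name "text" here is illustrative.
sample = pd.DataFrame({"text": ["Batch endpoints can compute embeddings at scale."]})
print(model.predict(sample))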

Create a deployment for an OpenAI model

  1. First, let's create the endpoint that hosts the model. Decide on the name of the endpoint. Here, the name matches the one used in the YAML configuration files:

    ENDPOINT_NAME="text-embedding-ada-qwerty"
    
  2. Configure the endpoint:

    The following YAML file defines a batch endpoint:

    endpoint.yml

    $schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
    name: text-embedding-ada-qwerty
    description: An endpoint to generate embeddings in batch for the ADA-002 model from OpenAI
    auth_mode: aad_token
    
  3. Create the endpoint resource:

    az ml batch-endpoint create -n $ENDPOINT_NAME -f endpoint.yml
    
  4. Our scoring script uses some specific libraries that aren't part of the standard OpenAI SDK, so we need to create an environment that has them. Here, we configure an environment with a base image and a conda YAML file.

    environment/environment.yml

    $schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
    name: batch-openai-mlflow
    image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
    conda_file: conda.yaml
    

    The conda YAML looks as follows:

    conda.yaml

    channels:
    - conda-forge
    dependencies:
    - python=3.8.5
    - pip<=23.2.1
    - pip:
      - openai==0.27.8
      - requests==2.31.0
      - tenacity==8.2.2
      - tiktoken==0.4.0
      - azureml-core
      - azure-identity
      - datasets
      - mlflow
    
  5. Let's create a scoring script that performs the execution. In Batch Endpoints, MLflow models don't require a scoring script. However, in this case we extend the capabilities of batch endpoints a bit by:

    • Allowing the endpoint to read multiple data types, including csv, tsv, parquet, json, jsonl, arrow, and txt.
    • Adding some validations to ensure the MLflow model has an OpenAI flavor.
    • Formatting the output in jsonl format.
    • Adding an environment variable AZUREML_BI_TEXT_COLUMN to optionally control which input field you want to generate embeddings for.

    Tip

    By default, MLflow uses the first text column available in the input data to generate embeddings from. If needed, set the environment variable AZUREML_BI_TEXT_COLUMN to the name of an existing column in the input dataset to change the column. Leave it blank if the default behavior works for you.

    The scoring script looks as follows:

    code/batch_driver.py

    import os
    import glob
    import mlflow
    import pandas as pd
    import numpy as np
    from pathlib import Path
    from typing import List
    from datasets import load_dataset
    
    DATA_READERS = {
        ".csv": "csv",
        ".tsv": "tsv",
        ".parquet": "parquet",
        ".json": "json",
        ".jsonl": "json",
        ".arrow": "arrow",
        ".txt": "text",
    }
    
    
    def init():
        global model
        global output_file
        global task_name
        global text_column
    
        # AZUREML_MODEL_DIR is the path where the model is located.
        # If the model is MLFlow, you don't need to indicate further.
        model_path = glob.glob(os.environ["AZUREML_MODEL_DIR"] + "/*/")[0]
        # AZUREML_BI_TEXT_COLUMN is an environment variable you can use
        # to indicate which column you want to run the model on. It can be
        # used only if the model has one single input.
        text_column = os.environ.get("AZUREML_BI_TEXT_COLUMN", None)
    
        model = mlflow.pyfunc.load_model(model_path)
        model_info = mlflow.models.get_model_info(model_path)
    
        if mlflow.openai.FLAVOR_NAME not in model_info.flavors:
            raise ValueError(
                "The indicated model doesn't have an OpenAI flavor on it. Use "
                "``mlflow.openai.log_model`` to log OpenAI models."
            )
    
        if text_column:
            if (
                model.metadata
                and model.metadata.signature
                and len(model.metadata.signature.inputs) > 1
            ):
                raise ValueError(
                    "The model requires more than 1 input column to run. You can't use "
                    "AZUREML_BI_TEXT_COLUMN to indicate which column to send to the model. Format your "
                    f"data with columns {model.metadata.signature.inputs.input_names()} instead."
                )
    
        task_name = model._model_impl.model["task"]
        output_path = os.environ["AZUREML_BI_OUTPUT_PATH"]
        output_file = os.path.join(output_path, f"{task_name}.jsonl")
    
    
    def run(mini_batch: List[str]):
        if mini_batch:
            filtered_files = filter(lambda x: Path(x).suffix in DATA_READERS, mini_batch)
            results = []
    
            for file in filtered_files:
                data_format = Path(file).suffix
                data = load_dataset(DATA_READERS[data_format], data_files={"data": file})[
                    "data"
                ].data.to_pandas()
                if text_column:
                    data = data.loc[:, [text_column]]
                scores = model.predict(data)
                results.append(
                    pd.DataFrame(
                        {
                            "file": np.repeat(Path(file).name, len(scores)),
                            "row": range(0, len(scores)),
                            task_name: scores,
                        }
                    )
                )
    
            pd.concat(results, axis="rows").to_json(
                output_file, orient="records", mode="a", lines=True
            )
    
        return mini_batch
    
  6. Once the scoring script is created, it's time to create a batch deployment for it. We use environment variables to configure the OpenAI deployment. In particular, we use the following keys:

    • OPENAI_API_BASE is the URL of the Azure OpenAI resource to use.
    • OPENAI_API_VERSION is the version of the API you plan to use.
    • OPENAI_API_TYPE is the type of API and authentication you want to use.

    The environment variable OPENAI_API_TYPE="azure_ad" instructs OpenAI to use Microsoft Entra authentication, so no key is required to invoke the OpenAI deployment. The identity of the cluster is used instead. A minimal sketch of this authentication pattern is shown after these steps.

  7. Once we've decided on the authentication mode and the environment variables, we can use them in the deployment. The following example shows specifically how to use Microsoft Entra authentication:

    deployment.yml

    $schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
    endpoint_name: text-embedding-ada-qwerty
    name: default
    description: The default deployment for generating embeddings
    type: model
    model: azureml:text-embedding-ada-002@latest
    environment:
      name: batch-openai-mlflow
      image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
      conda_file: environment/conda.yaml
    code_configuration:
      code: code
      scoring_script: batch_driver.py
    compute: azureml:batch-cluster
    resources:
      instance_count: 1
    settings:
      max_concurrency_per_instance: 1
      mini_batch_size: 1
      output_action: summary_only
      retry_settings:
        max_retries: 1
        timeout: 9999
      logging_level: info
      environment_variables:
        OPENAI_API_TYPE: azure_ad
        OPENAI_API_BASE: $OPENAI_API_BASE
        OPENAI_API_VERSION: 2023-03-15-preview
    

    Tip

    Notice the environment_variables section, where we indicate the configuration for the OpenAI deployment. The value for OPENAI_API_BASE is set later in the creation command, so you don't have to edit the YAML configuration file.

  8. Now, let's create the deployment.

    az ml batch-deployment create --file deployment.yml \
                                  --endpoint-name $ENDPOINT_NAME \
                                  --set-default \
                                  --set settings.environment_variables.OPENAI_API_BASE=$OPENAI_API_BASE
    
  9. At this point, our batch endpoint is ready to be used.
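
To see what OPENAI_API_TYPE="azure_ad" implies, here's a minimal sketch of how a client authenticates against Azure OpenAI with Microsoft Entra instead of a key, using the openai 0.27.x SDK pinned in the environment. It assumes azure-identity is installed (it's listed in conda.yaml) and that the identity running the code has the Cognitive Services User role:

import os

import openai
from azure.identity import DefaultAzureCredential

# Acquire a Microsoft Entra token for the Cognitive Services scope. On the
# compute cluster, DefaultAzureCredential resolves to the managed identity.
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# Configure the openai 0.27.x SDK to authenticate with the token instead of a
# key. Tokens expire after about an hour; long-running code must refresh them.
openai.api_type = "azure_ad"
openai.api_key = token.token
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_version = os.environ.get("OPENAI_API_VERSION", "2023-03-15-preview")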

Test the deployment

To test our endpoint, we use a sample of the dataset BillSum: A Corpus for Automatic Summarization of US Legislation. This sample is included in the repository in the folder data.
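
If you want to peek at the sample data first, you can load one of the files with pandas. This is a quick sketch; the file name billsum-0.csv is taken from the sample output at the end of this section and might differ in your copy of the repository:

import pandas as pd

# Inspect the first rows of one sample file (the file name is illustrative).
df = pd.read_csv("data/billsum-0.csv")
print(df.head())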

  1. Invoke the endpoint. The sample data in the folder data is passed directly as the input:

    JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input data --query name -o tsv)
    
  2. Track the progress:

    az ml job show -n $JOB_NAME --web
    
  3. Once the job is finished, download the predictions:

    az ml job download --name $JOB_NAME --output-name score --download-path ./
    
  4. The output predictions look like the following. A short sketch that consumes them appears after these steps.

    import pandas as pd 
    
    embeddings = pd.read_json("named-outputs/score/embeddings.jsonl", lines=True)
    embeddings
    

    embeddings.jsonl

    {"file": "billsum-0.csv", "row": 0, "embeddings": [[0, 0, 0, 0, 0, 0, 0]]}
    {"file": "billsum-0.csv", "row": 1, "embeddings": [[0, 0, 0, 0, 0, 0, 0]]}
    
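
As a quick example of consuming these results, the following sketch compares the first two embeddings with cosine similarity using NumPy, assuming the predictions were downloaded to named-outputs/score/ as in the previous steps:

import numpy as np
import pandas as pd

embeddings = pd.read_json("named-outputs/score/embeddings.jsonl", lines=True)

# Each row stores a list containing one embedding vector; unwrap the vectors
# for the first two rows and compute their cosine similarity.
a = np.array(embeddings.loc[0, "embeddings"][0], dtype=float)
b = np.array(embeddings.loc[1, "embeddings"][0], dtype=float)
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity between rows 0 and 1: {cosine:.4f}")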

Next steps