External models in Databricks Model Serving

Article
04/08/2024

Important

The code examples in this article demonstrate usage of the Public Preview MLflow Deployments CRUD API.

This article describes external models in Databricks Model Serving including its supported model providers and limitations.

What are external models?

External models are third-party models hosted outside of Databricks. Supported by Model Serving, external models allow you to streamline the usage and management of various large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. You can also use Databricks Model Serving as a provider to serve custom models, which offers rate limits for those endpoints. As part of this support, Model Serving offers a high-level interface that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM-related requests.

In addition, Azure Databricks support for external models provides centralized credential management. By storing API keys in one secure location, organizations can enhance their security posture by minimizing the exposure of sensitive API keys throughout the system. It also helps to prevent exposing these keys within code or requiring end users to manage keys safely.

See Tutorial: Create external model endpoints to query OpenAI models for step-by-step guidance on external model endpoint creation and querying supported models served by those endpoints using the MLflow Deployments SDK. See the following guides for instructions on how to use the Serving UI and the REST API:

Requirements

API key for the model provider.
Databricks workspace in External models supported regions.

Model providers

External models in Model Serving is designed to support a variety of model providers. A provider represents the source of the machine learning models, such as OpenAI, Anthropic, and so on. Each provider has its specific characteristics and configurations that are encapsulated within the external_model field of the external model endpoint configuration.

The following providers are supported:

openai: For models offered by OpenAI and the Azure integrations for Azure OpenAI and Azure OpenAI with AAD.
anthropic: For models offered by Anthropic.
cohere: For models offered by Cohere.
amazon-bedrock: For models offered by Amazon Bedrock.
ai21labs: For models offered by AI21Labs.
google-cloud-vertex-ai: For models offered by Google Cloud Vertex AI.
databricks-model-serving: For Databricks Model Serving endpoints with compatible schemas. See Endpoint configuration.

To request support for a provider not listed here, reach out to your Databricks account team.

Supported models

The model you choose directly affects the results of the responses you get from the API calls. Therefore, choose a model that fits your use-case requirements. For instance, for generating conversational responses, you can choose a chat model. Conversely, for generating embeddings of text, you can choose an embedding model.

The table below presents a non-exhaustive list of supported models and corresponding endpoint types. Model associations listed below can be used as a helpful guide when configuring an endpoint for any newly released model types as they become available with a given provider. Customers are responsible for ensuring compliance with applicable model licenses.

Note

With the rapid development of LLMs, there is no guarantee that this list is up to date at all times.

Model provider	llm/v1/completions	llm/v1/chat	llm/v1/embeddings
OpenAI**	* gpt-3.5-turbo-instruct * babbage-002 * davinci-002	* gpt-3.5-turbo * gpt-4 * gpt-3.5-turbo-0125 * gpt-3.5-turbo-1106 * gpt-4-0125-preview * gpt-4-turbo-preview * gpt-4-1106-preview * gpt-4-vision-preview * gpt-4-1106-vision-preview	* text-embedding-ada-002 * text-embedding-3-large * text-embedding-3-small
Azure OpenAI**	* text-davinci-003 * gpt-35-turbo-instruct	* gpt-35-turbo * gpt-35-turbo-16k * gpt-4 * gpt-4-32k	* text-embedding-ada-002 * text-embedding-3-large * text-embedding-3-small
Anthropic	* claude-1 * claude-1.3-100k * claude-2 * claude-2.1 * claude-2.0 * claude-instant-1.2	* claude-3-opus-20240229 * claude-3-sonnet-20240229 * claude-2.1 * claude-2.0 * claude-instant-1.2
Cohere**	* command * command-light-nightly * command-light * command-nightly		* embed-english-v2.0 * embed-multilingual-v2.0 * embed-english-light-v2.0 * embed-english-v3.0 * embed-english-light-v3.0 * embed-multilingual-v3.0 * embed-multilingual-light-v3.0
Databricks Model Serving	Databricks serving endpoint	Databricks serving endpoint	Databricks serving endpoint
Amazon Bedrock	Anthropic: * claude-instant-v1 * claude-v1 * claude-v2 Cohere: * command-text-v14 * command-text-v14:7:4k * command-light-text-v14 * command-light-text-v14:7:4k AI21 Labs: * j2-grande-instruct * j2-jumbo-instruct * j2-mid * j2-mid-v1 * j2-ultra j2-ultra-v1	Anthropic: * claude-instant-v1:2:100k * claude-v2 * claude-v2:0:18k * claude-v2:0:100k * claude-v2:1 * claude-v2:1:18k * claude-v2:1:200k * claude-3-sonnet-20240229-v1:0	Amazon: * titan-embed-text-v1 * titan-embed-g1-text-02 * titan-embed-text-v1:2:8k
AI21 Labs†	* j2-mid * j2-light * j2-ultra
Google Cloud Vertex AI	text-bison	* chat-bison * gemini-pro	textembedding-gecko

** Model provider supports fine-tuned completion and chat models. To query a fine-tuned model, populate the name field of the external model configuration with the name of your fine-tuned model.

† Model provider supports custom completion models.

Use models served on Databricks Model Serving endpoints

Databricks Model Serving endpoints as a provider is supported for the llm/v1/completions, llm/v1/chat, and llm/v1/embeddings endpoint types. These endpoints must accept the standard query parameters marked as required, while other parameters might be ignored depending on whether or not the Databricks Model Serving endpoint supports them.

See POST /serving-endpoints/{name}/invocations in the API reference for standard query parameters.

These endpoints must produce responses in the following OpenAI format.

For completions tasks:

{
"id": "123", # Not Required
"model": "test_databricks_model",
"choices": [
  {
    "text": "Hello World!",
    "index": 0,
    "logprobs": null, # Not Required
    "finish_reason": "length" # Not Required
  }
],
"usage": {
  "prompt_tokens": 8,
  "total_tokens": 8
  }
}

For chat tasks:

{
  "id": "123", # Not Required
  "model": "test_chat_model",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "finish_reason": "stop"
  },
  {
    "index": 1,
    "message": {
      "role": "human",
      "content": "\n\nWhat is the weather in San Francisco?",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For embeddings tasks:

{
  "data": [
    {
      "embedding": [
       0.0023064255,
        -0.009327292,
        .... # (1536 floats total for ada-002)
        -0.0028842222,
      ],
      "index": 0
    },
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... #(1536 floats total for ada-002)
        -0.0028842222,
      ],
      "index": 0
    }
  ],
  "model": "test_embedding_model",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Endpoint configuration

To serve and query external models you need to configure a serving endpoint. See Create custom model serving endpoints

For an external model serving endpoint you must include the external_model field and its parameters in the served_entities section of the endpoint configuration.

The external_model field defines the model to which this endpoint forwards requests. When specifying a model, it is critical that the provider supports the model you are requesting. For instance, openai as a provider supports models like text-embedding-ada-002, but other providers might not. If the model is not supported by the provider, Databricks returns an HTTP 4xx error when trying to route requests to that model.

The below table summarizes the external_model field parameters. See POST /api/2.0/serving-endpoints for endpoint configuration parameters.

Parameter	Descriptions
`name`	The name of the model to use. For example, `gpt-3.5-turbo` for OpenAI’s `GPT-3.5-Turbo` model.
`provider`	Specifies the name of the provider for this model. This string value must correspond to a supported external model provider. For example, `openai` for OpenAI’s `GPT-3.5` models.
`task`	The task corresponds to the type of language model interaction you desire. Supported tasks are “llm/v1/completions”, “llm/v1/chat”, “llm/v1/embeddings”.
`<provider>_config`	Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key. See Configure the provider for an endpoint.

The following is an example of creating an external model endpoint using the create_endpoint() API. In this example, a request sent to the completion endpoint is forwarded to the claude-2 model provided by anthropic.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "claude-2",
                    "provider": "anthropic",
                    "task": "llm/v1/completions",
                    "anthropic_config": {
                        "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
                    }
                }
            }
        ]
    }
)

Configure the provider for an endpoint

When you create an endpoint, you must supply the required configurations for the specified model provider. The following sections summarize the available endpoint configuration parameters for each model provider.

OpenAI

Configuration Parameter	Description	Required	Default
`openai_api_key`	The API key for the OpenAI service.	Yes
`openai_api_type`	An optional field to specify the type of OpenAI API to use.	No
`openai_api_base`	The base URL for the OpenAI API.	No	`https://api.openai.com/v1`
`openai_api_version`	An optional field to specify the OpenAI API version.	No
`openai_organization`	An optional field to specify the organization in OpenAI.	No

Cohere

Configuration Parameter	Description	Required	Default
`cohere_api_key`	The API key for the Cohere service.	Yes

Anthropic

Configuration Parameter	Description	Required	Default
`anthropic_api_key`	The API key for the Anthropic service.	Yes

Azure OpenAI

Azure OpenAI has distinct features as compared with the direct OpenAI service. For an overview, please see the comparison documentation.

Configuration Parameter	Description	Required
`openai_api_key`	The API key for the Azure OpenAI service.	Yes
`openai_api_type`	Adjust this parameter to represent the preferred security access validation protocol. For access token validation, use `azure`. For authentication using Azure Active Directory (Azure AD) use, `azuread`.	Yes
`openai_api_base`	The base URL for the Azure OpenAI API service provided by Azure.	Yes
`openai_api_version`	The version of the Azure OpenAI service to utilize, specified by a date.	Yes
`openai_deployment_name`	The name of the deployment resource for the Azure OpenAI service.	Yes
`openai_organization`	An optional field to specify the organization in OpenAI.	No

The following example demonstrates how to create an endpoint with Azure OpenAI:

client.create_endpoint(
    name="openai-chat-endpoint",
    config={
        "served_entities": [{
            "external_model": {
                "name": "gpt-3.5-turbo",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {
                    "openai_api_type": "azure",
                    "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}",
                    "openai_api_base": "https://my-azure-openai-endpoint.openai.azure.com",
                    "openai_deployment_name": "my-gpt-35-turbo-deployment",
                    "openai_api_version": "2023-05-15"
                }
            }
        }]
    }
)

Google Cloud Vertex AI

Configuration Parameter	Description	Required
`private_key`	This is the private key for the service account which has access to the Google Cloud Vertex AI Service. See Best practices for managing service account keys.	Yes
`region`	This is the region for the Google Cloud Vertex AI Service. See supported regions for more details. Some models are only available in specific regions.	Yes
`project_id`	This is the Google Cloud project id that the service account is associated with.	Yes

Amazon Bedrock

To use Amazon Bedrock as an external model provider, customers need to make sure Bedrock is enabled in the specified AWS region, and the specified AWS key pair have the appropriate permissions to interact with Bedrock services. For more information, see AWS Identity and Access Management.

Configuration Parameter	Description	Required
`aws_region`	The AWS region to use. Bedrock has to be enabled there.	Yes
`aws_access_key_id`	An AWS access key ID with permissions to interact with Bedrock services.	Yes
`aws_secret_access_key`	An AWS secret access key paired with the access key ID, with permissions to interact with Bedrock services.	Yes
`bedrock_provider`	The underlying provider in Amazon Bedrock. Supported values (case insensitive) include: Anthropic, Cohere, AI21Labs, Amazon	Yes

The following example demonstrates how to create an endpoint with Amazon Bedrock.

client.create_endpoint(
    name="bedrock-anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "claude-v2",
                    "provider": "amazon-bedrock",
                    "task": "llm/v1/completions",
                    "amazon_bedrock_config": {
                        "aws_region": "<YOUR_AWS_REGION>",
                        "aws_access_key_id": "{{secrets/my_amazon_bedrock_secret_scope/aws_access_key_id}}",
                        "aws_secret_access_key": "{{secrets/my_amazon_bedrock_secret_scope/aws_secret_access_key}}",
                        "bedrock_provider": "anthropic",
                    },
                }
            }
        ]
    },
)

If there are AWS permission issues, Databricks recommends that you verify the credentials directly with the Amazon Bedrock API.

AI21 Labs

Configuration Parameter	Description	Required	Default
`ai21labs_api_key`	This is the API key for the AI21 Labs service.	Yes

Query an external model endpoint

After you create an external model endpoint, it is ready to receive traffic from users.

You can send scoring requests to the endpoint using the OpenAI client, the REST API or the MLflow Deployments SDK.

See the standard query parameters for a scoring request in POST /serving-endpoints/{name}/invocations.
Query foundation models

The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client. To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.

import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

completion = client.completions.create(
  model="anthropic-completions-endpoint",
  prompt="what is databricks",
  temperature=1.0
)
print(completion)

Expected output response format:

{
"id": "123", # Not Required
"model": "anthropic-completions-endpoint",
"choices": [
  {
    "text": "Hello World!",
    "index": 0,
    "logprobs": null, # Not Required
    "finish_reason": "length" # Not Required
  }
],
"usage": {
  "prompt_tokens": 8,
  "total_tokens": 8
  }
}

Additional query parameters

You can pass any additional parameters supported by the endpoint’s provider as part of your query.

For example:

logit_bias (supported by OpenAI, Cohere).
top_k (supported by Anthropic, Cohere).
frequency_penalty (supported by OpenAI, Cohere).
presence_penalty (supported by OpenAI, Cohere).
stream (supported by OpenAI, Anthropic, Cohere, Amazon Bedrock for Anthropic). This is only available for chat and completions requests.

Limitations

Depending on the external model you choose, your configuration might cause your data to be processed outside of the region where your data originated.

See Model Serving limits and regions.