Deploy an agent for generative AI application

Important

This feature is in Public Preview.

This article shows how to deploy your AI agent using the deploy() API from databricks.agents.

Requirements

  • Before you can deploy your agent you must register it to Unity Catalog. Only agents registered in Unity Catalog are able to be deployed using deploy(). See Create and log AI agents. When you register your agent to Unity Catalog it is packaged in the form of a model.

  • MLflow 2.13.1 or above to deploy agents using the the deploy() API from databricks.agents.

  • Install the the databricks-agents SDK.

    %pip install databricks-agents
    dbutils.library.restartPython()
    

Deploy an agent using deploy()

The deploy() API does the following:

  • Creates CPU model serving endpoints for your agent that can be integrated into your user-facing application.
  • Enables the Review App for your agent. The Review App allows your stakeholders to chat with the agent and give feedback using the Review App UI.
  • Logs every request to the Review App or REST API to an inference table. The data logged includes query requests, responses, and intermediate trace data from MLflow Tracing.
  • Creates a feedback model with the same catalog and schema as the agent you are trying to deploy. This feedback model is the mechanism that makes it possible to accept feedback from the Review App and log it to an inference table. This model is served in the same CPU model serving endpoint as your deployed agent. Because this serving endpoint has inference tables enabled, it is possible to log feedback from the Review App to an inference table.

Note

Deployments can take up to 15 minutes to complete. Raw JSON payloads take 10 - 30 minutes to arrive, and the formatted logs are processed from the raw payloads about every hour.


from databricks.agents import deploy
from mlflow.utils import databricks_utils as du

deployment = deploy(model_fqn, uc_model_info.version)

# query_endpoint is the URL that can be used to make queries to the app
deployment.query_endpoint

# Copy deployment.rag_app_url to browser and start interacting with your RAG application.
deployment.rag_app_url

Agent-enhanced inference tables

The deploy() creates three inference tables for each deployment to log requests and responses to and from the agent serving endpoint. Users can expect the data to be in the payload table within an hour of interacting with their deployment.

Payload request logs and assessment logs might take longer to populate, but are ultimately derived from the raw payload table. You can extract request and assessment logs from the payload table yourself. Deletions and updates to the payload table are not reflected in the payload request logs or the payload assessment logs.

Note

If you have Azure Storage Firewall enabled, reach out to your Databricks account team to enable inference tables for your endpoints.

Table Example Unity Catalog table name What is in each table
Payload {catalog_name}.{schema_name}.{model_name}_payload Raw JSON request and response payloads
Payload request logs {catalog_name}.{schema_name}.{model_name}_payload_request_logs Formatted request and responses, MLflow traces
Payload assessment logs {catalog_name}.{schema_name}.{model_name}_payload_assessment_logs Formatted feedback, as provided in the Review App, for each request

The following shows the schema for the request logs table.

Column name Type Description
client_request_id String Client request ID, usually null.
databricks_request_id String Databricks request ID.
date Date Date of request.
timestamp_ms Long Timestamp in milliseconds.
timestamp Timestamp Timestamp of the request.
status_code Integer Status code of endpoint.
execution_time_ms Long Total execution milliseconds.
conversation_id String Conversation id extracted from request logs.
request String The last user query from the user’s conversation. This is extracted from the RAG request.
response String The last response to the user. This is extracted from the RAG request.
request_raw String String representation of request.
response_raw String String representation of response.
trace String String representation of trace extracted from the databricks_options of response Struct.
sampling_fraction Double Sampling fraction.
request_metadata Map[String, String] A map of metadata related to the model serving endpoint associated with the request. This map contains the endpoint name, model name, and model version used for your endpoint.
schema_version String Integer for the schema version.

The following is the schema for the assessment logs table.

Column name Type Description
request_id String Databricks request ID.
step_id String Derived from retrieval assessment.
source Struct A struct field containing the information on who created the assessment.
timestamp Timestamp Timestamp of request.
text_assessment Struct A struct field containing the data for any feedback on the agent’s responses from the review app.
retrieval_assessment Struct A struct field containing the data for any feedback on the documents retrieved for a response.

Authentication for dependent resources

When creating the model serving endpoint for agent deployment, Databricks verifies that the creator of the deployment has the necessary permissions on the dependent resources and enables automatic authentication passthrough.

The following table lists MLflow version requirements based on the features the agent authentication passthrough is for:

Feature Minimum mlflow version
Vector search indexes Requires mlflow 2.13.1 or above
Model Serving endpoints Requires mlflow 2.13.1 or above
SQL warehouses Requires mlflow 2.16.1 or above
Unity Catalog Functions Requires mlflow 2.16.1 or above

For LangChain flavored agents, dependent resources are automatically inferred during agent creation and logging. Those resources are logged in the resources.yaml file in the logged model artifact. During deployment, databricks.agents.deploy automatically creates the M2M OAuth tokens required to access and communicate with these inferred resource dependencies.

For PyFunc flavored agents, you must manually specify any resource dependencies during logging of the deployed agent in the resources parameter. See Specify resources for PyFunc agent. During deployment, databricks.agents.deploy creates an M2M OAuth token with access to the resources specified in the resources parameter, and deploys it to the deployed agent.

Get deployed applications

The following shows how to get your deployed agents.

from databricks.agents import list_deployments, get_deployments

# Get the deployment for specific model_fqn and version
deployment = get_deployments(model_name=model_fqn, model_version=model_version.version)

deployments = list_deployments()
# Print all the current deployments
deployments

Additional resources