Deploy an agent for generative AI application

Artikkel
01/29/2025

Important

This article shows how to deploy your AI agent using the deploy() function from the Agent Framework Python API.

Requirements

MLflow 2.13.1 or above to deploy agents using the the deploy() API from databricks.agents.
Register an AI agent to Unity Catalog. See Register the agent to Unity Catalog.
Deploying agents from outside a Databricks notebook requires databricks-agents SDK version 0.12.0 or above.

Install the the databricks-agents SDK.

%pip install databricks-agents
dbutils.library.restartPython()

Deploy an agent using `deploy()`

The deploy() function does the following:

Creates CPU model serving endpoints for your agent that can be integrated into your user-facing application.
- To reduce cost for idle endpoints (at the expense of increased time to serve initial queries), you can enable scale to zero for your serving endpoint by passing scale_to_zero_enabled=True to deploy(). See Endpoint scaling expectations.
- Enables inference tables with AI Gateway on the model serving endpoint. See Monitor served models using AI Gateway-enabled inference tables.
Note

For streaming response logs, only ChatCompletion-compatible fields and traces are aggregated.
- Databricks automatically provides short-lived service principal credentials to agent code running in the endpoint. The credentials have the minimum permissions to access Databricks-managed resources as defined during model logging. Before generating these credentials, Databricks ensures that the endpoint owner has the appropriate permissions to prevent privilege escalation and unauthorized access. See Authentication for dependent resources.
  - If you have resource dependencies that are not Databricks-managed, for example, using Pinecone, you can pass in environment variables with secrets to the deploy() API. See Configure access to resources from model serving endpoints.
Enables the Review App for your agent. The Review App lets stakeholders chat with the agent and give feedback using the Review App UI.
Logs every request to the Review App or REST API to an inference table. The data logged includes query requests, responses, and intermediate trace data from MLflow Tracing.
Creates a feedback model with the same catalog and schema as the agent you are trying to deploy. This feedback model is the mechanism that makes it possible to accept feedback from the Review App and log it to an inference table. This model is served in the same CPU model serving endpoint as your deployed agent. Because this serving endpoint has inference tables enabled, it is possible to log feedback from the Review App to an inference table.

Note

Deployments can take up to 15 minutes to complete. Raw JSON payloads take 10 - 30 minutes to arrive, and the formatted logs are processed from the raw payloads about every hour.


from databricks.agents import deploy
from mlflow.utils import databricks_utils as du

deployment = deploy(model_fqn, uc_model_info.version)

# query_endpoint is the URL that can be used to make queries to the app
deployment.query_endpoint

# Copy deployment.rag_app_url to browser and start interacting with your RAG application.
deployment.rag_app_url

Agent-enhanced inference tables

The deploy() creates three inference tables for each deployment to log requests and responses to and from the agent serving endpoint. Users can expect the data to be in the payload table within an hour of interacting with their deployment.

Payload request logs and assessment logs might take longer to populate, but are ultimately derived from the raw payload table. You can extract request and assessment logs from the payload table yourself. Deletions and updates to the payload table are not reflected in the payload request logs or the payload assessment logs.

Note

If you have Azure Storage Firewall enabled, reach out to your Databricks account team to enable inference tables for your endpoints.

Table	Example Unity Catalog table name	What is in each table
Payload	`{catalog_name}.{schema_name}.{model_name}_payload`	Raw JSON request and response payloads
Payload request logs	`{catalog_name}.{schema_name}.{model_name}_payload_request_logs`	Formatted request and responses, MLflow traces
Payload assessment logs	`{catalog_name}.{schema_name}.{model_name}_payload_assessment_logs`	Formatted feedback, as provided in the Review App, for each request

The following shows the schema for the request logs table.

Column name	Type	Description
`client_request_id`	String	Client request ID, usually `null`.
`databricks_request_id`	String	Databricks request ID.
`date`	Date	Date of request.
`timestamp_ms`	Long	Timestamp in milliseconds.
`timestamp`	Timestamp	Timestamp of the request.
`status_code`	Integer	Status code of endpoint.
`execution_time_ms`	Long	Total execution milliseconds.
`conversation_id`	String	Conversation id extracted from request logs.
`request`	String	The last user query from the user’s conversation. This is extracted from the RAG request.
`response`	String	The last response to the user. This is extracted from the RAG request.
`request_raw`	String	String representation of request.
`response_raw`	String	String representation of response.
`trace`	String	String representation of trace extracted from the `databricks_options` of response Struct.
`sampling_fraction`	Double	Sampling fraction.
`request_metadata`	Map[String, String]	A map of metadata related to the model serving endpoint associated with the request. This map contains the endpoint name, model name, and model version used for your endpoint.
`schema_version`	String	Integer for the schema version.

The following is the schema for the assessment logs table.

Column name	Type	Description
`request_id`	String	Databricks request ID.
`step_id`	String	Derived from retrieval assessment.
`source`	Struct	A struct field containing the information on who created the assessment.
`timestamp`	Timestamp	Timestamp of request.
`text_assessment`	Struct	A struct field containing the data for any feedback on the agent’s responses from the review app.
`retrieval_assessment`	Struct	A struct field containing the data for any feedback on the documents retrieved for a response.

Authentication for dependent resources

AI agents often need to authenticate to other resources to complete tasks. For example, an agent may need to access a Vector Search index to query unstructured data.

Your agent can use one of the following methods to authenticate to dependent resources when you serve it behind a Model Serving endpoint:

Automatic authentication passthrough: Declare Databricks resource dependencies for your agent during logging. Databricks can automatically provision, rotate, and manage short-lived credentials when your agent is deployed to securely access resources. Databricks recommends using automatic authentication passthrough where possible.
Manual authentication: Manually specify long-lived credentials during agent deployment. Use manual authentication for Databricks resources that do not support automatic authentication passthrough, or for external API access.

Automatic authentication passthrough

Model Serving supports automatic authentication passthrough for the most common types of Databricks resources used by agents.

To enable automatic authentication passthrough, you must specify dependencies during agent logging.

Then, when you serve the agent behind an endpoint, Databricks performs the following steps:

Permission verification: Databricks verifies that the endpoint creator can access all dependencies specified during agent logging.
Service principal creation and grants: A service principal is created for the agent model version and is automatically granted read access to agent resources.

Note

The system-generated service principal does not appear in API or UI listings. If the agent model version is removed from the endpoint, the service principal is also deleted.
Credential provisioning and rotation: Short-lived credentials (an M2M OAuth token) for the service principal are injected into the endpoint, allowing agent code to access Databricks resources. Databricks also rotates the credentials, ensuring that your agent has continued, secure access to dependent resources.

This authentication behavior is similar to the “Run as owner” behavior for Databricks dashboards - downstream resources like Unity Catalog tables are accessed using the credentials of a service principal with least-privilege access to dependent resources.

The following table lists the Databricks resources that support automatic authentication passthrough and the permissions the endpoint creator must have when deploying the agent.

Note

Unity Catalog resources also require USE SCHEMA on the parent schema and USE CATALOG on the parent catalog.

Resource type	Permission
SQL Warehouse	Use Endpoint
Model Serving endpoint	Can Query
Unity Catalog Function	EXECUTE
Genie space	Can Run
Vector Search index	Can Use
Unity Catalog Table	SELECT

Manual authentication

You can also manually provide credentials using secrets-based environment variables. Manual authentication can be helpful in the following scenarios:

The dependent resource does not support automatic authentication passthrough.
The agent is accessing an external resource or API.
The agent needs to use credentials other than those of the agent deployer.

For example, to use the Databricks SDK in your agent to access other dependent resources, you can set the environment variables described in Databricks client unified authentication.

Get deployed applications

The following shows how to get your deployed agents.

from databricks.agents import list_deployments, get_deployments

# Get the deployment for specific model_fqn and version
deployment = get_deployments(model_name=model_fqn, model_version=model_version.version)

deployments = list_deployments()
# Print all the current deployments
deployments

Provide feedback on a deployed agent (experimental)

When you deploy your agent with agents.deploy(), agent framework also creates and deploys a “feedback” model version within the same endpoint, which you can query to provide feedback on your agent application. Feedback entries appear as request rows within the inference table associated with your agent serving endpoint.

Note that this behavior is experimental: Databricks may provide a first-class API for providing feedback on a deployed agent in the future, and future functionality may require migrating to this API.

Limitations of this API include:

The feedback API lacks input validation - it always responds successfully, even if passed invalid input.
The feedback API requires passing in the Databricks-generated request_id of the agent endpoint request on which you wish to provide feedback. To get the databricks_request_id, include {"databricks_options": {"return_trace": True}} in your original request to the agent serving endpoint. The agent endpoint response will then include the databricks_request_id associated with the request so that you can pass that request ID back to the feedback API when providing feedback on the agent response.
Feedback is collected using inference tables. See inference table limitations.

The following example request provides feedback on the agent endpoint named “your-agent-endpoint-name”, and assumes that the DATABRICKS_TOKEN environment variable is set to a Databricks REST API token.

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '
      {
          "dataframe_records": [
              {
                  "source": {
                      "id": "user@company.com",
                      "type": "human"
                  },
                  "request_id": "573d4a61-4adb-41bd-96db-0ec8cebc3744",
                  "text_assessments": [
                      {
                          "ratings": {
                              "answer_correct": {
                                  "value": "positive"
                              },
                              "accurate": {
                                  "value": "positive"
                              }
                          },
                          "free_text_comment": "The answer used the provided context to talk about Delta Live Tables"
                      }
                  ],
                  "retrieval_assessments": [
                      {
                          "ratings": {
                              "groundedness": {
                                  "value": "positive"
                              }
                          }
                      }
                  ]
              }
          ]
      }' \
https://<workspace-host>.databricks.com/serving-endpoints/<your-agent-endpoint-name>/served-models/feedback/invocations

You can pass additional or different key-value pairs in the text_assessments.ratings and retrieval_assessments.ratings fields to provide different types of feedback. In the example, the feedback payload indicates that the agent’s response to the request with ID 573d4a61-4adb-41bd-96db-0ec8cebc3744 was correct, accurate, and grounded in context fetched by a retriever tool.

Del via

Deploy an agent for generative AI application

Requirements

Deploy an agent using `deploy()`

Agent-enhanced inference tables

Authentication for dependent resources

Automatic authentication passthrough

Manual authentication

Get deployed applications

Provide feedback on a deployed agent (experimental)

Additional resources

Tilbakemeldinger

Flere ressurser

Del via

Deploy an agent for generative AI application

Requirements

Deploy an agent using deploy()

Agent-enhanced inference tables

Authentication for dependent resources

Automatic authentication passthrough

Manual authentication

Get deployed applications

Provide feedback on a deployed agent (experimental)

Additional resources

Tilbakemeldinger

Flere ressurser

Deploy an agent using `deploy()`