Deploy an agent for generative AI application

आलेख
07/29/2024

Important

This feature is in Public Preview.

This article shows how to deploy your agent using the deploy() API from databricks.agents.

Requirements

Before you can deploy your agent you must register it to Unity Catalog. Only agents registered in Unity Catalog are able to be deployed using deploy(). See Create and log AI agents. When you register your agent to Unity Catalog it is packaged in the form of a model.
MLflow 2.13.1 or above to deploy agents using the the deploy() API from databricks.agents.

Deploy an agent using `deploy()`

The deploy() API does the following:

Creates CPU model serving endpoints for your agent that can be integrated into your user-facing application.
- Inference tables are enabled on these model serving endpoints. See Inference tables for monitoring and debugging models.
  - Authentication credentials are automatically passed to all Databricks-managed resources required by the agent.
  - If you have resource dependencies that are not Databricks-managed, for example using Pinecone, you can pass in environment variables with secrets to the deploy() API. See Configure access to resources from model serving endpoints.
Enables the Review App for your agent. The Review App allows your stakeholders to chat with the agent and give feedback using the Review App UI.
Logs every request to the Review App or REST API to an inference table. The data logged includes query requests, responses, and intermediate trace data from MLflow Tracing.
Creates a feedback model with the same catalog and schema as the agent you are trying to deploy. This feedback model is the mechanism that makes it possible to accept feedback from the Review App and log it to an inference table. This model is served in the same CPU model serving endpoint as your deployed agent. Because this serving endpoint has inference tables enabled, it is possible to log feedback from the Review App to an inference table.

Note

Deployments can take up to 15 minutes to complete. Raw JSON payloads take 10 - 30 minutes to arrive, and the formatted logs are processed from the raw payloads about every hour.


from databricks.agents import deploy
from mlflow.utils import databricks_utils as du

deployment = deploy(model_fqn, uc_model_info.version)

# query_endpoint is the URL that can be used to make queries to the app
deployment.query_endpoint

# Copy deployment.rag_app_url to browser and start interacting with your RAG application.
deployment.rag_app_url

Agent-enhanced inference tables

The deploy() creates three inference tables for each deployment to log requests and responses to and from the agent serving endpoint.

Note

If you have Azure Storage Firewall enabled, reach out to your Databricks account team to enable inference tables for your endpoints.

Table	Example Unity Catalog table name	What is in each table
Payload	`{catalog_name}.{schema_name}.{model_name}_payload`	Raw JSON payloads
Payload request logs	`{catalog_name}.{schema_name}.{model_name}_payload_request_logs`	Formatted request and responses, MLflow traces
Payload assessment logs	`{catalog_name}.{schema_name}.{model_name}_payload_assessment_logs`	Formatted feedback, as provided in the Review App, for each request

Request log and assessment log tables

Two additional tables are generated automatically from the above payload inference tables: request logs and assessment logs. Users can expect the data to be in these tables within an hour of interacting with their deployment.

The following shows the schema for the request logs table.

Column name	Type	Description
`client_request_id`	String	Client request ID, usually `null`.
`databricks_request_id`	String	Databricks request ID.
`date`	Date	Date of request.
`timestamp_ms`	Long	Timestamp in milliseconds.
`timestamp`	Timestamp	Timestamp of the request.
`status_code`	Integer	Status code of endpoint.
`execution_time_ms`	Long	Total execution milliseconds.
`conversation_id`	String	Conversation id extracted from request logs.
`request`	String	The last user query from the user’s conversation. This is extracted from the RAG request.
`response`	String	The last response to the user. This is extracted from the RAG request.
`request_raw`	String	String representation of request.
`response_raw`	String	String representation of response.
`trace`	String	String representation of trace extracted from the `databricks_options` of response Struct.
`sampling_fraction`	Double	Sampling fraction.
`request_metadata`	Map[String, String]	A map of metadata related to the model serving endpoint associated with the request. This map contains the endpoint name, model name, and model version used for your endpoint.
`schema_version`	String	Integer for the schema version.

The following is the schema for assessment logs.

Column name	Type	Description
`request_id`	String	Databricks request ID.
`step_id`	String	Derived from retrieval assessment.
`source`	Struct	A struct field containing the information on who created the assessment.
`timestamp`	Timestamp	Timestamp of request.
`text_assessment`	Struct	A struct field containing the data for any feedback on the agent’s responses from the review app.
`retrieval_assessment`	Struct	A struct field containing the data for any feedback on the documents retrieved for a response.

Get deployed applications

The following shows how to get your deployed agents.

from databricks.agents import list_deployments, get_deployments

# Get the deployment for specific model_fqn and version
deployment = get_deployments(model_name=model_fqn, model_version=model_version.version)

deployments = list_deployments()
# Print all the current deployments
deployments

इसके माध्यम से साझा किया गया

Deploy an agent for generative AI application

Requirements

Deploy an agent using `deploy()`

Agent-enhanced inference tables

Request log and assessment log tables

Get deployed applications

Additional resources

प्रतिक्रिया

प्रतिक्रिया

अतिरिक्त संसाधन

इसके माध्यम से साझा किया गया

Deploy an agent for generative AI application

Requirements

Deploy an agent using deploy()

Agent-enhanced inference tables

Request log and assessment log tables

Get deployed applications

Additional resources

प्रतिक्रिया

प्रतिक्रिया

अतिरिक्त संसाधन

Deploy an agent using `deploy()`