Note
Access to this page requires authorization. You can try signing in or changing directories.
This page covers how to debug common issues with AI agents deployed on Azure Databricks.
Go to:
- Best practices
- Local development
- Configuration issues
- Deployment issues
- Runtime errors
- Authentication errors
- Memory and storage
Most debugging sections on this page apply to agents deployed to Databricks Apps. However, this page also includes debugging information for agents deployed on Model Serving (legacy) in the corresponding sections.
Author agents using best practices
Use the following best practices when authoring agents:
- Enable MLflow tracing: Follow the best practices in Author an AI agent and deploy it on Databricks Apps. Enable MLflow trace autologging to make your agents easier to debug.
- Document tools clearly: Clear tool and parameter descriptions ensure your agent understands your tools and uses them appropriately. See Improve tool-calling with clear documentation.
- Add timeouts and token limits to LLM calls: Add timeouts and token limits to the LLM calls in your code to avoid delays caused by long-running steps.
- If your agent uses the OpenAI client to query an Azure Databricks LLM serving endpoint, set custom timeouts on the serving endpoint calls as needed.
- Validate configuration before deployment: Run `databricks bundle validate` before you deploy to catch YAML configuration issues early. This helps identify mismatched resource references, invalid permissions, and syntax errors.
- Test locally first: Use local development to catch issues before you deploy. Start your agent server locally, test with sample requests, and verify that MLflow traces appear correctly before you deploy to Databricks Apps.
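To make the timeout advice above concrete: if your agent calls an LLM, you can bound each call with a wall-clock timeout so one hung request doesn't stall the whole agent. The sketch below is illustrative, not a Databricks API: `call_llm` is a hypothetical stand-in for your OpenAI-client invocation, and you should additionally set `max_tokens` on the request itself.

```python
from concurrent.futures import ThreadPoolExecutor

def call_with_timeout(fn, timeout_seconds, *args, **kwargs):
    """Run fn(*args, **kwargs); raise concurrent.futures.TimeoutError if it
    takes longer than timeout_seconds. The worker thread is abandoned, not
    killed, so this bounds the caller's wait, not the underlying work."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_seconds)
    finally:
        pool.shutdown(wait=False)

# Hypothetical LLM call: replace with your OpenAI-client invocation, and also
# pass max_tokens on that request to cap response length.
def call_llm(prompt: str) -> str:
    return f"echo: {prompt}"

print(call_with_timeout(call_llm, 30, "hello"))
```

If your client library supports a native timeout parameter (the OpenAI client does), prefer that; this wrapper is a fallback for calls that don't.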
Debug local development issues
Test your agent locally to identify issues before deployment.
Before you run your agent locally, verify that your environment is configured correctly:
- Check the Databricks CLI version: Run `databricks -v` to verify that you have version 0.283.0 or later.
- Verify CLI profiles: Run `databricks auth profiles` to see the configured authentication profiles.
- Validate environment configuration: Check that your `.env` file contains the required variables, especially `MLFLOW_TRACKING_URI`, which must use the format `databricks://PROFILE_NAME` to include your CLI profile.
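A quick way to catch the `MLFLOW_TRACKING_URI` format issue described above is to check the value before starting your agent server. A minimal sketch, assuming the `databricks://PROFILE_NAME` format this page requires (the validation helper itself is illustrative):

```python
import os
import re

def validate_tracking_uri(uri) -> bool:
    """True only if the URI has the databricks://PROFILE_NAME form."""
    return bool(uri) and re.fullmatch(r"databricks://[\w.-]+", uri) is not None

uri = os.environ.get("MLFLOW_TRACKING_URI")
if not validate_tracking_uri(uri):
    print(f"MLFLOW_TRACKING_URI is missing or malformed: {uri!r} "
          "(expected databricks://PROFILE_NAME)")
```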
Common local development errors
| Error | Cause | Solution |
|---|---|---|
| `The provided MLFLOW_EXPERIMENT_ID does not exist` | Wrong tracking URI format, or the experiment was deleted | Verify that `MLFLOW_TRACKING_URI` uses the `databricks://PROFILE_NAME` format with your CLI profile name |
| `Module not found` | Dependencies not installed | Run `uv sync` to install dependencies |
| `Port already in use` | Another process is using the port | Use the `--port` flag to specify a different port (for example, `uv run start-app --port 8001`) |
| Authentication errors when running locally | The environment is not configured | Run the quickstart script or manually configure the `.env` file with your CLI profile |
Test the agent locally
To test your agent before deployment:
Start the agent server locally:

uv run start-app

In another terminal, send a test request:

curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{"input": [{"role": "user", "content": "hello"}]}'

View MLflow traces in the Azure Databricks UI to verify that your agent is logging traces correctly.
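The same test request can be scripted for repeated local runs. A minimal stdlib-only sketch, assuming the server from `uv run start-app` listens on port 8000 as in the curl example above:

```python
import json
import urllib.request

def build_invocation_request(content: str, base_url: str = "http://localhost:8000"):
    """Build the POST /invocations request used for local agent testing."""
    body = json.dumps({"input": [{"role": "user", "content": content}]}).encode()
    return urllib.request.Request(
        f"{base_url}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it (requires the local server to be running):
#   with urllib.request.urlopen(build_invocation_request("hello"), timeout=30) as resp:
#       print(resp.read().decode())
```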
Debug configuration issues
Configuration errors in databricks.yml and app.yaml are common sources of deployment failures.
Validate the Databricks Asset Bundles configuration
Validate the Databricks Asset Bundles configuration before deploying the app:
databricks bundle validate
This command checks your configuration for:
- YAML syntax errors
- Missing required fields
- Invalid resource references
- Permission configuration issues
Common configuration mismatches
| Configuration point | Rule | How to debug |
|---|---|---|
| `valueFrom` references in `app.yaml` | Must exactly match a resource name in `databricks.yml` | Search for the exact string in both files to verify they match |
| App name | Must start with the `agent-` prefix (for example, `agent-data-analyst`) | Check the `name` field under `resources.apps` in `databricks.yml` |
| Genie space ID | Must be the 32-character hex string from the Genie URL | Extract it from the URL path: `https://workspace.cloud.databricks.com/genie/rooms/{SPACE_ID}` |
| Unity Catalog function reference | Must use the format `catalog.schema.function_name` | Verify that the function exists using `databricks unity-catalog functions list` |
| Lakebase instance reference | Must use `value` (not `valueFrom`) in the `app.yaml` file | The instance name is a literal string, not a resource reference |
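For the Genie space ID rule above, extraction from the URL can be automated. A small sketch; the URL shape follows the table, and the helper itself is illustrative:

```python
import re

def extract_genie_space_id(url: str):
    """Return the 32-character hex space ID from a Genie room URL, or None."""
    match = re.search(r"/genie/rooms/([0-9a-f]{32})", url)
    return match.group(1) if match else None

url = "https://workspace.cloud.databricks.com/genie/rooms/01234567890abcdef01234567890abcd"
print(extract_genie_space_id(url))
```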
Debug deployment issues
Agents deployed to Apps
App already exists error
If you see Error: failed to create app - An app with the same name already exists, you have two options:
Option 1: Bind to existing app (recommended)
# Get existing app configuration
databricks apps get <app-name> --output json
# Sync the configuration to your databricks.yml, then bind
databricks bundle deployment bind <bundle-name> <app-name> --auto-approve
# Deploy
databricks bundle deploy
databricks bundle run <bundle-name>
Option 2: Delete and recreate
databricks apps delete <app-name>
databricks bundle deploy
databricks bundle run <bundle-name>
App not updating after deployment
databricks bundle deploy only uploads files to the workspace. You must also run databricks bundle run <bundle-name> to restart the app with the new code.
Always deploy using both commands:
databricks bundle deploy && databricks bundle run <bundle-name>
View deployment status and logs
To check your app's deployment status:
databricks apps get <app-name>
To view app logs in real-time:
databricks apps logs <app-name> --follow
Agents on Model Serving (legacy)
If you deployed your agent using agents.deploy() to a Model Serving endpoint, review Debugging guide for Model Serving for deployment-specific issues.
To debug runtime issues such as slow or failing requests, see Debug runtime errors.
Debug runtime errors
Agents deployed to Apps
Use app logs and request testing to identify issues with your deployed agent.
Analyze app logs
View real-time logs from your deployed app:
databricks apps logs <app-name> --follow
Look for:
- Stack traces indicating code errors
- Permission denied messages for resources
- Connection errors to external services
- Timeout messages
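The checklist above can be partially automated by scanning log output for the error classes it lists. A hedged sketch (the regex patterns are illustrative; adjust them to your app's actual log format, for example by piping `databricks apps logs <app-name>` output through this script):

```python
import re

# Patterns mirroring the checklist above; tune to your app's log format.
LOG_PATTERNS = {
    "stack trace": re.compile(r"Traceback \(most recent call last\)"),
    "permission denied": re.compile(r"permission denied", re.IGNORECASE),
    "connection error": re.compile(r"connection (refused|reset|error)", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
}

def scan_logs(lines):
    """Return (category, line) pairs for log lines matching known error patterns."""
    hits = []
    for line in lines:
        for category, pattern in LOG_PATTERNS.items():
            if pattern.search(line):
                hits.append((category, line))
    return hits
```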
Common runtime errors
| Error | Cause | Solution |
|---|---|---|
| 302 redirect when querying the app | Using a personal access token instead of OAuth | Get an OAuth token with `databricks auth token` |
| Agent not using available tools | Tools not returned from the MCP client | Verify that the MCP server URL is correct and the resource has proper permissions in `databricks.yml` |
| Streaming response breaks mid-response | Connection timeout | Increase the `CHAT_PROXY_TIMEOUT_SECONDS` environment variable in `app.yaml` |
| Agent returns "Memory not available" | Missing `user_id` in the request | Pass `custom_inputs.user_id` in the request payload |
| Empty or error responses despite a 200 status | Error occurred within the streamed response | Check the actual stream content and app logs, not just the HTTP status code |
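The last row above, errors hidden inside a 200 streamed response, can be caught client-side by scanning the streamed chunks instead of trusting the status code. A toy sketch (the `error` key and JSON-lines chunk shape are assumptions; match them to your agent's actual stream format):

```python
import json

def find_stream_error(chunks):
    """Return the first error payload found in a stream of JSON-line chunks, else None."""
    for chunk in chunks:
        try:
            event = json.loads(chunk)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as keep-alives
        if isinstance(event, dict) and "error" in event:
            return event["error"]
    return None

stream = ['{"delta": "Hello"}', '{"error": {"message": "upstream timeout"}}']
print(find_stream_error(stream))
```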
Agents on Model Serving (legacy)
Use inference tables and MLflow traces to identify issues with agents deployed to Model Serving endpoints.
Identify problematic requests
If you enabled MLflow trace autologging while authoring your agent, traces are automatically logged in inference tables. Use these traces to identify agent components that are slow or failing.
- In your workspace, go to the Serving tab and select your deployment name.
- In the Inference tables section, find the inference table's fully qualified name, for example, `my-catalog.my-schema.my-table`.
- Run the following in a Databricks notebook:

  %sql SELECT * FROM my-catalog.my-schema.my-table

- Inspect the Response column for detailed trace information.
- Filter on `request_time`, `databricks_request_id`, or `status_code` to narrow down the results:

  %sql SELECT * FROM my-catalog.my-schema.my-table WHERE status_code != 200
Analyze root cause issues
After identifying failing or slow requests, use the mlflow.models.validate_serving_input API to invoke your agent against the failed input request. View the resulting trace and perform root cause analysis on the failed response.
For a faster development loop, update your agent code directly and iterate by invoking your agent against the failed input example.
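That replay loop can be sketched as follows. Hedged: `mlflow.models.validate_serving_input` is the MLflow API this page names, but the payload key (`messages`) depends on your agent's signature, and the model URI and failed request below are placeholders you take from the inference table:

```python
import json

def make_serving_input(messages):
    """Build a serving-input JSON string from a failed request's messages."""
    return json.dumps({"messages": messages})

def replay_failed_request(model_uri: str, serving_input: str):
    """Invoke the logged agent against a failed input and return its response."""
    from mlflow.models import validate_serving_input  # requires mlflow[databricks]

    return validate_serving_input(model_uri, serving_input)

# Example with placeholder values:
# response = replay_failed_request(
#     "models:/my_catalog.my_schema.my_agent/1",
#     make_serving_input([{"role": "user", "content": "the failing prompt"}]),
# )
```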
Debug authentication errors
Agents deployed to Apps
OAuth token authentication required
You must use a Databricks OAuth token to query agents deployed to Apps. Using a Personal Access Token (PAT) results in a 302 redirect error.
To get an OAuth token:
databricks auth token
Use the token in requests to your deployed app:
TOKEN=$(databricks auth token | jq -r '.access_token')
curl -X POST <app-url>/invocations \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"input": [{"role": "user", "content": "hello"}]}'
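If you'd rather not depend on `jq`, the token extraction above can be done in Python. A minimal sketch, assuming (as the shell example does) that `databricks auth token` prints JSON containing an `access_token` field:

```python
import json
import subprocess

def parse_token(cli_json: str) -> str:
    """Extract the access token from the CLI's JSON output."""
    return json.loads(cli_json)["access_token"]

def get_oauth_token() -> str:
    """Return the OAuth access token from the Databricks CLI."""
    out = subprocess.run(
        ["databricks", "auth", "token"], capture_output=True, check=True, text=True
    ).stdout
    return parse_token(out)

def auth_headers(token: str) -> dict:
    """Build the headers for querying the deployed app."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```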
Resource permission errors
When your agent cannot access workspace resources, verify the resource is properly configured in databricks.yml. Each resource type requires specific permissions:
| Error | Cause | Solution |
|---|---|---|
| Permission denied on Genie space | Missing `genie_space` resource | Add a `genie_space` resource with `permission: 'CAN_RUN'` |
| Vector search index not accessible | Missing `uc_securable` resource for the index | Add a `uc_securable` resource with `securable_type: 'TABLE'` and `permission: 'SELECT'` |
| Unity Catalog function execution denied | Missing `uc_securable` resource for the function | Add a `uc_securable` resource with `securable_type: 'FUNCTION'` and `permission: 'EXECUTE'` |
| Serving endpoint access denied | Missing `serving_endpoint` resource | Add a `serving_endpoint` resource with `permission: 'CAN_QUERY'` |
| SQL warehouse access denied | Missing `sql_warehouse` resource | Add a `sql_warehouse` resource with `permission: 'CAN_USE'` |
Example resource configuration in databricks.yml:
resources:
apps:
my_agent:
name: 'agent-my-app'
resources:
- name: 'my_genie_space'
genie_space:
space_id: '01234567890abcdef01234567890abcd'
permission: 'CAN_RUN'
- name: 'my_vector_index'
uc_securable:
securable_full_name: 'catalog.schema.index_name'
securable_type: 'TABLE'
permission: 'SELECT'
Custom MCP server permissions
If your agent connects to a custom MCP server running as a Databricks app, you must manually grant permissions since apps are not yet supported as resource dependencies in databricks.yml.
# Get your agent app's service principal
AGENT_SP=$(databricks apps get <agent-app-name> --output json | jq -r '.service_principal_name')
# Grant permission on the MCP server app
databricks apps update-permissions <mcp-server-app-name> \
--json "{\"access_control_list\": [{\"service_principal_name\": \"$AGENT_SP\", \"permission_level\": \"CAN_USE\"}]}"
Agents on Model Serving (legacy)
If your deployed agent encounters authentication errors while accessing resources such as vector search indexes or LLM endpoints, verify that it was logged with the necessary resources for automatic authentication passthrough. See Automatic authentication passthrough.
To inspect the logged resources, run the following in a notebook:
%pip install -U mlflow[databricks]
%restart_python
import mlflow
mlflow.set_registry_uri("databricks-uc")
# Replace with the model name and version of your deployed agent
agent_registered_model_name = ...
agent_model_version = ...
model_uri = f"models:/{agent_registered_model_name}/{agent_model_version}"
agent_info = mlflow.models.Model.load(model_uri)
print(f"Resources logged for agent model {model_uri}:", agent_info.resources)
To re-add missing or incorrect resources, log the agent and deploy it again.
If you use manual authentication for resources, verify that environment variables are correctly set. Manual settings override any automatic authentication configurations. See Manual authentication.
Debug memory and storage issues
For agents using Lakebase for memory storage, the following issues are common:
| Error | Cause | Solution |
|---|---|---|
| `relation 'store' does not exist` | Memory tables not initialized | Run `await store.setup()` locally before deploying to create the required tables |
| `Unable to resolve Lakebase instance` | Wrong instance name or incorrect configuration | Verify that `LAKEBASE_INSTANCE_NAME` uses `value` (not `valueFrom`) in `app.yaml` and matches the `instance_name` in `databricks.yml` |
| `permission denied for table store` | Missing Lakebase permissions | Add a `database` resource in `databricks.yml` with `permission: 'CAN_CONNECT_AND_CREATE'` |
| Memory not persisting across conversations | Different `user_id` per request | Ensure you pass a consistent `user_id` in `custom_inputs` for each user |
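The last row above, memory that fails to persist, usually comes down to keying storage by an inconsistent `user_id`. A toy in-memory illustration of why the key must be stable across requests (the real store is Lakebase; this dict merely stands in for it):

```python
# Stand-in for the Lakebase-backed store: memory is namespaced by user_id.
memory_store = {}

def remember(user_id: str, fact: str) -> None:
    """Append a fact to the memory namespace keyed by user_id."""
    memory_store.setdefault(user_id, []).append(fact)

def recall(user_id: str):
    """Return everything stored under user_id."""
    return memory_store.get(user_id, [])

# Same user_id across requests -> memory accumulates.
remember("user-123", "prefers SQL examples")
remember("user-123", "works in finance")
print(recall("user-123"))

# A different user_id (for example, a per-session random ID) -> memory appears lost.
print(recall("session-9f2c"))  # []
```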
Example Lakebase resource configuration:
resources:
apps:
my_agent:
resources:
- name: 'memory_database'
database:
instance_name: '<lakebase-instance-name>'
database_name: 'postgres'
permission: 'CAN_CONNECT_AND_CREATE'
Before deploying an agent with memory, initialize the tables locally:
import asyncio
from databricks_langchain import AsyncDatabricksStore
async def setup_memory():
async with AsyncDatabricksStore(
instance_name='your-lakebase-instance',
embedding_endpoint='databricks-gte-large-en',
embedding_dims=1024,
) as store:
await store.setup()
asyncio.run(setup_memory())