Debug a deployed AI agent

This page covers how to debug common issues with AI agents deployed on Azure Databricks.

Most debugging sections on this page apply to agents deployed to Databricks Apps. However, you can also find debugging information for agents deployed on Model Serving (legacy) using the tab selectors.

Author agents using best practices

Use the following best practices when authoring agents:

  • Enable MLflow tracing: Follow the best practices in Author an AI agent and deploy it on Databricks Apps. Enable MLflow trace autologging to make your agents easier to debug.
  • Document tools clearly: Clear tool and parameter descriptions ensure your agent understands your tools and uses them appropriately. See Improve tool-calling with clear documentation.
  • Add timeouts and token limits to LLM calls: Add timeouts and token limits to the LLM calls in your code to avoid delays caused by long-running steps.
    • If your agent uses the OpenAI client to query an Azure Databricks LLM serving endpoint, set custom timeouts on the serving endpoint calls as needed.
  • Validate configuration before deployment: Run databricks bundle validate before you deploy to catch YAML configuration issues early. This helps identify mismatched resource references, invalid permissions, and syntax errors.
  • Test locally first: Use local development to catch issues before you deploy. Start your agent server locally, test with sample requests, and verify that MLflow traces appear correctly before you deploy to Databricks Apps.
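The timeout advice above can be applied generically. The following is a minimal sketch, not a Databricks API: a wrapper that puts a hard deadline on any blocking call, such as an LLM request made with a client that lacks a native timeout option.

```python
import concurrent.futures

def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run a blocking call (for example, an LLM request) with a hard deadline.

    Raises concurrent.futures.TimeoutError if fn does not return in time,
    so one slow serving-endpoint call cannot stall the whole agent step.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args, **kwargs)
        return future.result(timeout=timeout_s)
```

If your client library supports a native timeout parameter (the OpenAI client does), prefer that over a wrapper like this one.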

Debug local development issues

Test your agent locally to identify issues before deployment.

Before you run your agent locally, verify that your environment is configured correctly:

  1. Check Databricks CLI version: Run databricks -v to verify that you have version 0.283.0 or later.

  2. Verify CLI profiles: Run databricks auth profiles to see the configured authentication profiles.

  3. Validate environment configuration: Check that your .env file contains the required variables, especially MLFLOW_TRACKING_URI, which must use the format databricks://PROFILE_NAME to include your CLI profile.
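A minimal .env for local development might look like the following. The profile name and experiment ID are illustrative placeholders; use your own values.

```ini
# .env — values are illustrative placeholders
MLFLOW_TRACKING_URI=databricks://DEFAULT
MLFLOW_EXPERIMENT_ID=<your-experiment-id>
```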

Common local development errors

| Error | Cause | Solution |
| --- | --- | --- |
| The provided MLFLOW_EXPERIMENT_ID does not exist | Wrong tracking URI format, or the experiment was deleted | Verify that MLFLOW_TRACKING_URI uses the databricks://PROFILE_NAME format with your CLI profile name |
| Module not found | Dependencies not installed | Run uv sync to install dependencies |
| Port already in use | Another process is using the port | Use the --port flag to specify a different port (for example, uv run start-app --port 8001) |
| Authentication errors when running locally | The environment is not configured | Run the quickstart script or manually configure the .env file with your CLI profile |
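The first error in the table is usually a malformed MLFLOW_TRACKING_URI. A quick sanity check for the expected databricks://PROFILE_NAME format (an illustrative helper, not part of MLflow):

```python
import re

def is_valid_tracking_uri(uri: str) -> bool:
    """Check that the URI names a CLI profile: databricks://PROFILE_NAME."""
    return re.fullmatch(r"databricks://[\w.-]+", uri) is not None
```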

Test the agent locally

To test your agent before deployment:

  1. Start the agent server locally:

    uv run start-app
    
  2. In another terminal, send a test request:

    curl -X POST http://localhost:8000/invocations \
      -H "Content-Type: application/json" \
      -d '{"input": [{"role": "user", "content": "hello"}]}'
    
  3. View MLflow traces in the Azure Databricks UI to verify your agent is logging traces correctly.
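The curl request above can also be scripted for repeated local testing. The payload shape matches the example request; the helper itself is an illustrative stdlib-only sketch, not a Databricks SDK call.

```python
import json
import urllib.request

def invoke_agent(url, messages, timeout=30):
    """POST an invocations-style payload and return the parsed JSON response."""
    payload = json.dumps({"input": messages}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, invoke_agent("http://localhost:8000/invocations", [{"role": "user", "content": "hello"}]) sends the same request as the curl command.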

Debug configuration issues

Configuration errors in databricks.yml and app.yaml are common sources of deployment failures.

Validate the Databricks Asset Bundles configuration

Validate the Databricks Asset Bundles configuration before deploying the app:

databricks bundle validate

This command checks your configuration for:

  • YAML syntax errors
  • Missing required fields
  • Invalid resource references
  • Permission configuration issues

Common configuration mismatches

| Configuration point | Rule | How to debug |
| --- | --- | --- |
| valueFrom references in app.yaml | Must exactly match a resource name in databricks.yml | Search for the exact string in both files to verify they match |
| App name | Must start with the agent- prefix (for example, agent-data-analyst) | Check the name field under resources.apps in databricks.yml |
| Genie space ID | Must be the 32-character hex string from the Genie URL | Extract it from the URL path: https://workspace.cloud.databricks.com/genie/rooms/{SPACE_ID} |
| Unity Catalog function reference | Must use the format catalog.schema.function_name | Verify the function exists using databricks unity-catalog functions list |
| Lakebase instance reference | Must use value (not valueFrom) in the app.yaml file | The instance name is a literal string, not a resource reference |
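The Genie space ID check from the table can be automated with a small helper (illustrative, not part of the Databricks CLI), which pulls the 32-character hex ID out of a Genie room URL:

```python
import re

def extract_genie_space_id(url: str) -> str:
    """Pull the 32-character hex space ID out of a Genie room URL."""
    match = re.search(r"/genie/rooms/([0-9a-f]{32})", url)
    if not match:
        raise ValueError(f"No Genie space ID found in {url!r}")
    return match.group(1)
```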

Debug deployment issues

Agents deployed to Apps

App already exists error

If you see Error: failed to create app - An app with the same name already exists, you have two options:

Option 1: Bind to existing app (recommended)

# Get existing app configuration
databricks apps get <app-name> --output json

# Sync the configuration to your databricks.yml, then bind
databricks bundle deployment bind <bundle-name> <app-name> --auto-approve

# Deploy
databricks bundle deploy
databricks bundle run <bundle-name>

Option 2: Delete and recreate

databricks apps delete <app-name>
databricks bundle deploy
databricks bundle run <bundle-name>

App not updating after deployment

databricks bundle deploy only uploads files to the workspace. You must also run databricks bundle run <bundle-name> to restart the app with the new code.

Always deploy using both commands:

databricks bundle deploy && databricks bundle run <bundle-name>

View deployment status and logs

To check your app's deployment status:

databricks apps get <app-name>

To view app logs in real-time:

databricks apps logs <app-name> --follow

Agents on Model Serving (legacy)

If you deployed your agent using agents.deploy() to a Model Serving endpoint, review Debugging guide for Model Serving for deployment-specific issues.

To debug runtime issues such as slow or failing requests, see Debug runtime errors.

Debug runtime errors

Agents deployed to Apps

Use app logs and request testing to identify issues with your deployed agent.

Analyze app logs

View real-time logs from your deployed app:

databricks apps logs <app-name> --follow

Look for:

  • Stack traces indicating code errors
  • Permission denied messages for resources
  • Connection errors to external services
  • Timeout messages

Common runtime errors

| Error | Cause | Solution |
| --- | --- | --- |
| 302 redirect when querying the app | Using a personal access token instead of OAuth | Get an OAuth token with databricks auth token |
| Agent not using available tools | Tools not returned from the MCP client | Verify that the MCP server URL is correct and that the resource has the proper permissions in databricks.yml |
| Streaming response breaks mid-response | Connection timeout | Increase the CHAT_PROXY_TIMEOUT_SECONDS environment variable in app.yaml |
| Agent returns "Memory not available" | Missing user_id in the request | Pass custom_inputs.user_id in the request payload |
| Empty or error responses despite a 200 status | An error occurred within the streamed response | Check the actual stream content and app logs, not just the HTTP status code |
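The last row in the table is easy to miss: a 200 status does not guarantee a clean stream. A small sketch of scanning streamed chunks for embedded error payloads follows; the exact chunk schema depends on your agent framework, so this assumes newline-delimited JSON with optional SSE "data:" framing and a top-level "error" key.

```python
import json

def find_stream_errors(lines):
    """Scan newline-delimited JSON stream chunks for embedded error payloads."""
    errors = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            line = line[len("data:"):].strip()  # strip SSE framing, if present
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON chunk; ignore
        if isinstance(chunk, dict) and "error" in chunk:
            errors.append(chunk["error"])
    return errors
```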

Agents on Model Serving (legacy)

Use inference tables and MLflow traces to identify issues with agents deployed to Model Serving endpoints.

Identify problematic requests

If you enabled MLflow trace autologging while authoring your agent, traces are automatically logged in inference tables. Use these traces to identify agent components that are slow or failing.

  1. In your workspace, go to the Serving tab and select your deployment name.
  2. In the Inference tables section, find the inference table's fully-qualified name. For example, my-catalog.my-schema.my-table.
  3. Run the following in a Databricks notebook:
    %sql
    SELECT * FROM `my-catalog`.`my-schema`.`my-table`
    
  4. Inspect the Response column for detailed trace information.
  5. Filter on request_time, databricks_request_id, or status_code to narrow down the results.
    %sql
    SELECT * FROM `my-catalog`.`my-schema`.`my-table`
    WHERE status_code != 200
    

Analyze root cause issues

After identifying failing or slow requests, use the mlflow.models.validate_serving_input API to invoke your agent against the failed input request. View the resulting trace and perform root cause analysis on the failed response.

For a faster development loop, update your agent code directly and iterate by invoking your agent against the failed input example.

Debug authentication errors

Agents deployed to Apps

OAuth token authentication required

You must use a Databricks OAuth token to query agents deployed to Apps. Using a Personal Access Token (PAT) results in a 302 redirect error.

To get an OAuth token:

databricks auth token

Use the token in requests to your deployed app:

TOKEN=$(databricks auth token | jq -r '.access_token')
curl -X POST <app-url>/invocations \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input": [{"role": "user", "content": "hello"}]}'

Resource permission errors

When your agent cannot access workspace resources, verify the resource is properly configured in databricks.yml. Each resource type requires specific permissions:

| Error | Cause | Solution |
| --- | --- | --- |
| Permission denied on a Genie space | Missing genie_space resource | Add a genie_space resource with permission: 'CAN_RUN' |
| Vector search index not accessible | Missing uc_securable resource for the index | Add a uc_securable resource with securable_type: 'TABLE' and permission: 'SELECT' |
| Unity Catalog function execution denied | Missing uc_securable resource for the function | Add a uc_securable resource with securable_type: 'FUNCTION' and permission: 'EXECUTE' |
| Serving endpoint access denied | Missing serving_endpoint resource | Add a serving_endpoint resource with permission: 'CAN_QUERY' |
| SQL warehouse access denied | Missing sql_warehouse resource | Add a sql_warehouse resource with permission: 'CAN_USE' |

Example resource configuration in databricks.yml:

resources:
  apps:
    my_agent:
      name: 'agent-my-app'
      resources:
        - name: 'my_genie_space'
          genie_space:
            space_id: '01234567890abcdef01234567890abcd'
            permission: 'CAN_RUN'
        - name: 'my_vector_index'
          uc_securable:
            securable_full_name: 'catalog.schema.index_name'
            securable_type: 'TABLE'
            permission: 'SELECT'

Custom MCP server permissions

If your agent connects to a custom MCP server running as a Databricks app, you must manually grant permissions since apps are not yet supported as resource dependencies in databricks.yml.

# Get your agent app's service principal
AGENT_SP=$(databricks apps get <agent-app-name> --output json | jq -r '.service_principal_name')

# Grant permission on the MCP server app
databricks apps update-permissions <mcp-server-app-name> \
  --json "{\"access_control_list\": [{\"service_principal_name\": \"$AGENT_SP\", \"permission_level\": \"CAN_USE\"}]}"

Agents on Model Serving (legacy)

If your deployed agent encounters authentication errors while accessing resources such as vector search indexes or LLM endpoints, verify that it was logged with the necessary resources for automatic authentication passthrough. See Automatic authentication passthrough.

To inspect the logged resources, run the following in a notebook:

%pip install -U mlflow[databricks]
%restart_python

import mlflow
mlflow.set_registry_uri("databricks-uc")

# Replace with the model name and version of your deployed agent
agent_registered_model_name = ...
agent_model_version = ...

model_uri = f"models:/{agent_registered_model_name}/{agent_model_version}"
agent_info = mlflow.models.Model.load(model_uri)
print(f"Resources logged for agent model {model_uri}:", agent_info.resources)

To re-add missing or incorrect resources, log the agent and deploy it again.

If you use manual authentication for resources, verify that environment variables are correctly set. Manual settings override any automatic authentication configurations. See Manual authentication.

Debug memory and storage issues

For agents using Lakebase for memory storage, the following issues are common:

| Error | Cause | Solution |
| --- | --- | --- |
| relation 'store' does not exist | Memory tables not initialized | Run await store.setup() locally before deploying to create the required tables |
| Unable to resolve Lakebase instance | Wrong instance name or incorrect configuration | Verify that LAKEBASE_INSTANCE_NAME uses value (not valueFrom) in app.yaml and matches the instance_name in databricks.yml |
| permission denied for table store | Missing Lakebase permissions | Add a database resource in databricks.yml with permission: 'CAN_CONNECT_AND_CREATE' |
| Memory not persisting across conversations | Different user_id per request | Ensure that you pass a consistent user_id in custom_inputs for each user |
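The last row in the table matters in practice: memory lookups key off user_id, so every request for the same user should carry the same value in custom_inputs. The payload shape below matches the request examples on this page; the ID value is a placeholder.

```json
{
  "input": [{"role": "user", "content": "What did we discuss last time?"}],
  "custom_inputs": {"user_id": "user-123"}
}
```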

Example Lakebase resource configuration:

resources:
  apps:
    my_agent:
      resources:
        - name: 'memory_database'
          database:
            instance_name: '<lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

Before deploying an agent with memory, initialize the tables locally:

import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup_memory():
    async with AsyncDatabricksStore(
        instance_name='your-lakebase-instance',
        embedding_endpoint='databricks-gte-large-en',
        embedding_dims=1024,
    ) as store:
        await store.setup()

asyncio.run(setup_memory())