Migrate an agent from Model Serving to Databricks Apps

Migrate an existing AI agent from a Model Serving endpoint to Databricks Apps.

Databricks recommends authoring agents on Databricks Apps since it provides the following advantages over Model Serving:

  • Rapid iteration: Iterate on agent code and deployment configuration in seconds, with local debugging and full transparency into logs and agent behavior.
  • Git-based versioning and CI/CD: Package and version modular Python agent code with Git, and deploy with Databricks Asset Bundles.
  • AI coding assistant support: Use AI coding assistants to develop and migrate your agent locally.
  • Scalable async agents: Build asynchronous agents with native Python async patterns for higher concurrency.
  • Flexible server customization: Use any framework or stack, add custom routes and middleware, and configure user and agent authentication to LLM endpoints and tools.
  • MLflow tracing: Use MLflow git-based logged models and real-time tracing to monitor agent behavior.

Requirements

Clone the migration template

The migration template provides the scaffolding for developing and deploying an agent on Databricks Apps, along with skill files that teach AI coding assistants how to perform each migration step.

Clone the template and go to the folder:

git clone https://github.com/databricks/app-templates.git
cd app-templates/agent-migration-from-model-serving

The template folder contains:

  • AGENTS.md: Instructions for AI coding assistants describing the migration workflow
  • skills/: Skill files for each migration step, executed in sequence by the assistant
  • agent_server/: The scaffolding of the target Databricks Apps agent with placeholder code for the @invoke() and @stream() handlers
  • databricks.yml: A Databricks Asset Bundles configuration template with placeholder resource declarations

AI-assisted migration

AI-assisted migration is the recommended way to use this template. An AI coding assistant reads AGENTS.md and the skill files and automatically handles the code and configuration changes.

  1. Open the template folder in an AI coding assistant such as Cursor, GitHub Copilot, or Claude.
  2. Ask the assistant to perform the migration by providing your endpoint name:
"Migrate my Model Serving endpoint `my-agent-endpoint` to a Databricks App"
  3. The assistant generates a migration plan and executes each step:

Screenshot of an AI coding assistant displaying a step-by-step TODO list for migrating an agent from Model Serving to Databricks Apps.

Manual migration

Databricks recommends using AI coding assistants to perform the migration. If you prefer to migrate without an AI coding assistant, the following high-level steps describe the process.

Important

These steps are a high-level overview and do not cover all migration scenarios, such as stateful agents, async vs. sync tradeoffs, Unity Catalog artifact access, or complex resource configurations.

Use an AI coding assistant to help with migration or see the migrate-from-model-serving skill in the template for more detailed information.

Step 1. Download agent artifacts

  1. Get the model name and version from your endpoint:
databricks serving-endpoints get <endpoint-name> --output json
  2. Find served_entities[0].entity_name (model name) and entity_version in the response, then download the artifacts:
DATABRICKS_CONFIG_PROFILE=<profile> uv run --no-project \
  --with "mlflow[databricks]>=2.15.0" \
  python3 << 'EOF'
import mlflow
mlflow.set_tracking_uri("databricks")
mlflow.artifacts.download_artifacts(
    artifact_uri="models:/<model-name>/<version>",
    dst_path="./original_mlflow_model"
)
EOF

The downloaded folder contains:

  • MLmodel: Resource declarations for the original agent
  • code/: The agent's Python source files
  • artifacts/: Optional configuration files and prompts
  • input_example.json: A sample request for testing

Step 2. Migrate agent code

Copy all Python files from code/ to agent_server/ and any artifacts from artifacts/ to agent_server/artifacts/.

After moving files, update any relative imports and hardcoded file paths to reflect the new folder structure. Then rewrite agent_server/agent.py to use the pattern shown in Step 3.
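
For example, assuming the folder layout from Step 1 and an app folder created from the template, the copy can look like the following (folder names are placeholders):

# Copy the agent's Python source files into the app scaffold
cp original_mlflow_model/code/*.py <app-name>/agent_server/

# Copy optional prompts and configuration files, if the original model bundled any
mkdir -p <app-name>/agent_server/artifacts
cp -r original_mlflow_model/artifacts/. <app-name>/agent_server/artifacts/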

Step 3. Transform agent code

On Model Serving, agents use a class-based ResponsesAgent with predict() and predict_stream() methods. On Databricks Apps, the MLflow AgentServer serves module-level functions decorated with @invoke() and @stream().

When you migrate, choose one of the following patterns:

  • Async (recommended): Uses Python async def and await to handle multiple requests concurrently. While one request waits for an LLM response, the server processes other requests.
  • Sync: Keeps synchronous Python patterns from your Model Serving agent. Choose this for a minimal migration or if your code relies on synchronous-only libraries.

Model Serving (before)

The original class-based agent structure.

from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
  ResponsesAgentRequest,
  ResponsesAgentResponse,
  ResponsesAgentStreamEvent,
)

class MyAgent(ResponsesAgent):
  def predict(self, request: ResponsesAgentRequest, params=None) -> ResponsesAgentResponse:
    # Synchronous implementation
    ...
    return ResponsesAgentResponse(output=outputs)

  def predict_stream(self, request: ResponsesAgentRequest, params=None):
    # Synchronous generator
    for chunk in ...:
      yield ResponsesAgentStreamEvent(...)

Apps — async (recommended)

The primary agent logic lives in streaming(). The non_streaming() function collects its output and returns it as a single response.

from typing import AsyncGenerator

from mlflow.genai.agent_server import invoke, stream
from mlflow.types.responses import (
  ResponsesAgentRequest,
  ResponsesAgentResponse,
  ResponsesAgentStreamEvent,
)

@invoke()
async def non_streaming(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
  # Async implementation - typically calls streaming() and collects results
  outputs = [
    event.item
    async for event in streaming(request)
    if event.type == "response.output_item.done"
  ]
  return ResponsesAgentResponse(output=outputs)

@stream()
async def streaming(request: ResponsesAgentRequest) -> AsyncGenerator[ResponsesAgentStreamEvent, None]:
  # Async generator
  async for event in ...:
    yield event

Apps — sync

Extract the class methods into decorated module-level functions with minimal structural changes.

from mlflow.genai.agent_server import invoke, stream
from mlflow.types.responses import (
  ResponsesAgentRequest,
  ResponsesAgentResponse,
  ResponsesAgentStreamEvent,
)

@invoke()
def non_streaming(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
  # Same sync logic from original predict(), extracted from the class
  ...
  return ResponsesAgentResponse(output=outputs)

@stream()
def streaming(request: ResponsesAgentRequest):
  # Same sync generator from original predict_stream(), extracted from the class
  for chunk in ...:
    yield ResponsesAgentStreamEvent(...)

Step 4. Set up the app

Install dependencies and run the quickstart script to configure authentication, create the MLflow experiment, and generate the .env file.
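
A minimal sketch of this step, assuming uv is installed; the quickstart script name is hypothetical, so check the template's AGENTS.md for the exact command:

# Install the app's Python dependencies from pyproject.toml
uv sync

# Run the template's quickstart script (hypothetical name) to configure
# authentication, create the MLflow experiment, and generate the .env file
uv run python quickstart.py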

Step 5. Test locally

Start the app server and verify the agent responds correctly before deploying.

Test with your original input_example.json using curl. Once the agent responds as expected, you're ready to deploy.
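
For example, assuming the local server listens on port 8000 and exposes an /invocations route (the template documents the exact start command and port):

# Start the agent server locally (hypothetical entry point; see the template's AGENTS.md)
uv run python -m agent_server

# In another terminal, send the sample request saved with the original model
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d @original_mlflow_model/input_example.json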

Step 6. Configure resources

Model Serving agents declare resources in an MLmodel file. Databricks Apps agents declare resources in the databricks.yml configuration file using Databricks Asset Bundles.

See Authentication for AI agents.

Map your resource declarations to the equivalent Databricks Asset Bundles format:

MLmodel resource type   | databricks.yml equivalent                  | Permission
serving_endpoint        | serving_endpoint                           | CAN_QUERY
lakebase                | database                                   | CAN_CONNECT_AND_CREATE
vector_search_index     | uc_securable (securable_type: TABLE)       | SELECT
function                | uc_securable (securable_type: FUNCTION)    | EXECUTE
table                   | uc_securable (securable_type: TABLE)       | SELECT or MODIFY
uc_connection           | uc_securable (securable_type: CONNECTION)  | USE_CONNECTION
sql_warehouse           | sql_warehouse                              | CAN_USE
genie_space             | genie_space                                | CAN_RUN
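
The following is a minimal sketch of a mapped declaration in databricks.yml, assuming an app named my-agent-app and placeholder resource names; the template's databricks.yml shows the complete schema, including uc_securable entries:

resources:
  apps:
    my_agent_app:
      name: my-agent-app
      source_code_path: ./
      resources:
        # Grant the app CAN_QUERY on the LLM endpoint the agent calls
        - name: llm-endpoint
          serving_endpoint:
            name: <llm-endpoint-name>
            permission: CAN_QUERY
        # Grant the app CAN_USE on a SQL warehouse used by agent tools
        - name: warehouse
          sql_warehouse:
            id: <warehouse-id>
            permission: CAN_USE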

Step 7. Deploy the agent using Databricks Asset Bundles

Deploy your agent to Databricks Apps using Databricks Asset Bundles.

Before deploying, verify your folder structure looks like the following:

<working-directory>/
├── original_mlflow_model/    # Downloaded artifacts from Model Serving
│   ├── MLmodel
│   ├── code/
│   │   └── agent.py
│   ├── input_example.json
│   └── requirements.txt
│
└── <app-name>/               # New Databricks App (ready to deploy)
    ├── agent_server/
    │   ├── agent.py          # Migrated agent code
    │   └── ...
    ├── app.yaml
    ├── databricks.yml        # Bundle config with resources
    ├── pyproject.toml
    ├── requirements.txt
    └── ...

Note

The requirements.txt file in your app folder must contain uv so that Databricks Apps can install dependencies from pyproject.toml.
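
For example, a minimal requirements.txt can contain only uv, assuming every other dependency is declared in pyproject.toml:

# requirements.txt
uv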

  1. Validate the bundle configuration:

    databricks bundle validate
    
  2. Deploy the bundle to your workspace (bundle deploy uploads the files but doesn't start the app):

    databricks bundle deploy
    
  3. Start the app:

    databricks bundle run <app-resource-name>
    

Next steps

After migrating your agent, see: