Migrate from dbx to bundles

Article
10/09/2024

Important

Databricks recommends that you use Databricks Asset Bundles instead of dbx by Databricks Labs. Related articles about dbx have been retired and might not be updated.

This article describes how to migrate projects for dbx by Databricks Labs over to Databricks Asset Bundles. See Introduction to dbx by Databricks Labs and What are Databricks Asset Bundles?.

Before you migrate, note the following limitations and feature comparisons between dbx by Databricks Labs and Databricks Asset Bundles.

Limitations

The following functionality supported in dbx by Databricks Labs is limited, does not exist, or requires workarounds in Databricks Asset Bundles.

Building JAR artifacts is not supported in bundles.
FUSE notation for workspace paths is not supported in bundles (for example, /Workspace/<path>/<filename>). However, you can instruct bundles to generate FUSE-style workspace paths during deployments by using notation such as /Workspace/${bundle.file_path}/<filename>.

Feature comparisons

Before you migrate, note how the following features for dbx by Databricks Labs are implemented in Databricks Asset Bundles.

Templates and projects

dbx provides support for Jinja templating. You can include Jinja templates in the deployment configuration and pass environment variables either inline or through a variables file. Although not recommended, dbx also provides experimental support for custom user functions.

Bundles provide support for Go templates for configuration reuse. Users can create bundles based on prebuilt templates. There is almost full parity for templating, except for custom user functions.

Build management

dbx provides build support through pip wheel, Poetry, and Flit. Users can specify the build option in the build section of a project’s deployment.yml file.

Bundles enable users to build, deploy, and run Python wheel files. Users can leverage the built-in whl entry in a bundle’s databricks.yml file.

Sync, deploy, and run code

dbx enables uploading code separately from generating workspace resources such as Azure Databricks jobs.

Bundles always upload code and create or update workspace resources at the same time. This simplifies deployments and avoids blocking conditions for jobs that are already in progress.

Migrate a dbx project to a bundle

After you note the preceding limitations and feature comparisons between dbx by Databricks Labs and Databricks Asset Bundles, you are ready to migrate from dbx to bundles.

To begin a dbx project migration, Databricks recommends that you keep your dbx project in its original folder and copy its contents into a separate, blank folder. This separate folder becomes your new bundle. If you instead convert the dbx project in its original folder and then make mistakes or want to start over from the beginning, you could encounter unexpected issues.
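The copy step can be scripted so that the original project is never modified. The following is a minimal Python sketch, not part of dbx or the Databricks CLI; the function name `copy_project_for_migration` and the folder names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def copy_project_for_migration(dbx_project, bundle_dir) -> Path:
    """Copy a dbx project into a separate folder, leaving the original untouched."""
    dbx_project = Path(dbx_project)
    bundle_dir = Path(bundle_dir)
    # Refuse to copy into a folder that already has content, so every
    # migration attempt starts from a clean copy of the original project.
    if bundle_dir.exists() and any(bundle_dir.iterdir()):
        raise FileExistsError(f"{bundle_dir} is not empty")
    # dirs_exist_ok (Python 3.8+) allows the blank target folder to exist already.
    shutil.copytree(dbx_project, bundle_dir, dirs_exist_ok=True)
    return bundle_dir
```

For example, `copy_project_for_migration("my_dbx_project", "my_bundle")` copies everything, including the `.dbx` and `conf` folders that the later conversion steps read from. To start over, delete the new folder and copy again.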

Step 1: Install and set up the Databricks CLI

Databricks Asset Bundles are generally available in Databricks CLI version 0.218.0 and above. If you have already installed and set up Databricks CLI version 0.218.0 or above, skip ahead to Step 2.

Note

Bundles are not compatible with Databricks CLI versions 0.18 and below.

Install or update to Databricks CLI version 0.218.0 or above. See Install or update the Databricks CLI.
Set up the Databricks CLI for authentication with your target Azure Databricks workspaces, for example by using Azure Databricks personal access token authentication. For other Azure Databricks authentication types, see Authentication for the Databricks CLI.
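If you script your setup, you might want to check the installed CLI version against the 0.218.0 minimum before continuing. The following Python sketch only parses a version string that you pass in, for example the text printed by `databricks --version`. The helper names and the sample strings in the usage note are illustrative assumptions, not an official API or guaranteed output format.

```python
import re

# Minimum Databricks CLI version in which bundles are generally available.
MINIMUM_CLI_VERSION = (0, 218, 0)


def parse_version(text: str) -> tuple:
    """Extract the first major.minor.patch triple found in a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_bundles(version_text: str) -> bool:
    """Return True if the version is at least MINIMUM_CLI_VERSION."""
    # Tuples compare element by element, so (0, 18, 0) < (0, 218, 0).
    return parse_version(version_text) >= MINIMUM_CLI_VERSION
```

Comparing integer tuples rather than raw strings matters here: as text, `0.18.0` and `0.218.0` are easy to misorder. With this sketch, `supports_bundles("v0.218.0")` is true and `supports_bundles("0.18.0")` is false.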

Step 2: Create the bundle configuration file

If you are using an IDE such as Visual Studio Code, PyCharm Professional, or IntelliJ IDEA Ultimate that supports YAML files and JSON schema files, you can use your IDE not only to create the bundle configuration file but also to check the file’s syntax and formatting and to provide code completion hints, as follows.

Visual Studio Code

Add YAML language server support to Visual Studio Code, for example by installing the YAML extension from the Visual Studio Code Marketplace.
Generate the Databricks Asset Bundle configuration JSON schema file by using the Databricks CLI to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:
```
databricks bundle schema > bundle_config_schema.json
```
Use Visual Studio Code to create or open a bundle configuration file within the current directory. By convention, this file is named databricks.yml.
Add the following comment to the beginning of your bundle configuration file:
```
# yaml-language-server: $schema=bundle_config_schema.json
```
Note

In the preceding comment, if your Databricks Asset Bundle configuration JSON schema file is in a different path, replace bundle_config_schema.json with the full path to your schema file.
Use the YAML language server features that you added earlier. For more information, see your YAML language server’s documentation.
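If you maintain several bundles, adding the schema comment by hand can be automated. The following Python sketch is a plain string helper and is not part of the Databricks CLI; the function name `with_schema_comment` is hypothetical. It assumes that the comment belongs on the first line of the file and that the schema file name matches the one generated earlier.

```python
def with_schema_comment(config_text: str,
                        schema_path: str = "bundle_config_schema.json") -> str:
    """Return the bundle configuration text with the schema comment as line one."""
    comment = f"# yaml-language-server: $schema={schema_path}"
    lines = config_text.splitlines()
    if lines and lines[0].startswith("# yaml-language-server:"):
        # Replace an existing schema comment instead of stacking a second one.
        lines[0] = comment
    else:
        lines.insert(0, comment)
    return "\n".join(lines) + "\n"
```

Because an existing comment is replaced rather than duplicated, running the helper on the same file more than once leaves the file unchanged after the first run.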

PyCharm Professional

Generate the Databricks Asset Bundle configuration JSON schema file by using the Databricks CLI to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:
```
databricks bundle schema > bundle_config_schema.json
```
Configure PyCharm to recognize the bundle configuration JSON schema file, and then complete the JSON schema mapping, by following the instructions in Configure a custom JSON schema.
Use PyCharm to create or open a bundle configuration file. By convention, this file is named databricks.yml. As you type, PyCharm checks for JSON schema syntax and formatting and provides code completion hints.

IntelliJ IDEA Ultimate

Generate the Databricks Asset Bundle configuration JSON schema file by using the Databricks CLI to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:
```
databricks bundle schema > bundle_config_schema.json
```
Configure IntelliJ IDEA to recognize the bundle configuration JSON schema file, and then complete the JSON schema mapping, by following the instructions in Configure a custom JSON schema.
Use IntelliJ IDEA to create or open a bundle configuration file. By convention, this file is named databricks.yml. As you type, IntelliJ IDEA checks for JSON schema syntax and formatting and provides code completion hints.

Step 3: Convert dbx project settings to databricks.yml

Convert the settings in your dbx project’s .dbx/project.json file to the equivalent settings in your bundle’s databricks.yml file. For details, see Converting dbx project settings to databricks.yml.

Step 4: Convert dbx deployment settings to databricks.yml

Convert the settings in your dbx project’s conf folder to the equivalent settings in your bundle’s databricks.yml file. For details, see Converting dbx deployment settings to databricks.yml.

Step 5: Validate the bundle

Before you deploy artifacts or run an Azure Databricks job, a Delta Live Tables pipeline, or an MLOps pipeline, make sure that your bundle configuration file is syntactically correct. To do this, run the bundle validate command from the bundle root:

```
databricks bundle validate
```

For information about bundle validate, see Validate a bundle.

Step 6: Deploy the bundle

To deploy any specified local artifacts to the remote workspace, run the bundle deploy command from the bundle root. If no command options are specified, the default target declared in the bundle configuration file is used:

```
databricks bundle deploy
```

To deploy the artifacts within the context of a specific target, specify the -t (or --target) option along with the target’s name as declared within the bundle configuration file. For example, for a target declared with the name development:

```
databricks bundle deploy -t development
```

For information about bundle deploy, see Deploy a bundle.

Tip

You can link bundle-defined jobs and pipelines to existing jobs and pipelines in the Azure Databricks workspace to keep them in sync. See Bind bundle resources.

Step 7: Run the bundle

To run a specific job or pipeline, run the bundle run command from the bundle root. You must specify the job or pipeline declared within the bundle configuration file. If the -t option is not specified, the default target as declared within the bundle configuration file is used. For example, to run a job named hello_job within the context of the default target:

```
databricks bundle run hello_job
```

To run a job named hello_job within the context of a target declared with the name development:

```
databricks bundle run -t development hello_job
```

For information about bundle run, see Run a bundle.
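Steps 5 through 7 can be chained in a small script so that a deployment never runs against an invalid configuration. The following Python sketch builds the same command lines shown above and runs them in order. The helper names are hypothetical, and it assumes that bundle validate accepts the same -t option as bundle deploy and bundle run.

```python
import subprocess
from typing import List, Optional


def bundle_command(action: str, key: Optional[str] = None,
                   target: Optional[str] = None) -> List[str]:
    """Build the argument list for a validate, deploy, or run bundle command."""
    if action not in ("validate", "deploy", "run"):
        raise ValueError(f"unsupported bundle action: {action}")
    if action == "run" and not key:
        raise ValueError("bundle run needs the job or pipeline to run")
    argv = ["databricks", "bundle", action]
    if target:
        # Without -t, the default target from the bundle configuration is used.
        argv += ["-t", target]
    if action == "run":
        argv.append(key)
    return argv


def validate_deploy_run(key: str, target: Optional[str] = None) -> None:
    """Run validate, deploy, and run in order, stopping at the first failure."""
    for action in ("validate", "deploy", "run"):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(bundle_command(action, key, target), check=True)
```

Call `validate_deploy_run("hello_job", target="development")` from the bundle root. Only `bundle_command` is free of side effects; `validate_deploy_run` requires an installed and authenticated Databricks CLI.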

(Optional) Step 8: Configure the bundle for CI/CD with GitHub

If you use GitHub for CI/CD, you can use GitHub Actions to run the databricks bundle deploy and databricks bundle run commands automatically, based on specific GitHub workflow events and other criteria. See Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions.

Converting dbx project settings to databricks.yml

For dbx, project settings are by default in a file named project.json in the project’s .dbx folder. See Project file reference.

For bundles, bundle configurations are by default in a file named databricks.yml within the bundle’s root folder. See Databricks Asset Bundle configuration.

For a .dbx/project.json file with the following example content:

```
{
  "environments": {
    "default": {
      "profile": "charming-aurora",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Workspace/Shared/dbx/charming_aurora",
        "artifact_location": "/Workspace/Shared/dbx/projects/charming_aurora"
      }
    }
  },
  "inplace_jinja_support": true
}
```

The corresponding databricks.yml file is as follows:

```
bundle:
  name: <some-unique-bundle-name>

targets:
  default:
    workspace:
      profile: charming-aurora
      root_path: /Shared/dbx/charming_aurora
      artifact_path: /Shared/dbx/projects/charming_aurora
    resources:
      # See an example "resources" mapping in the following section.
```

The following objects in this example’s preceding .dbx/project.json file are not supported in databricks.yml files and have no workarounds:

inplace_jinja_support
storage_type

The following additional allowed objects in .dbx/project.json files are not supported in databricks.yml files and have no workarounds:

enable-context-based-upload-for-execute
enable-failsafe-cluster-reuse-with-assets

Converting dbx deployment settings to databricks.yml

For dbx, deployment settings are by default in a file within the project’s conf folder. See Deployment file reference. The deployment settings file by default has one of the following file names:

deployment.yml
deployment.yaml
deployment.json
deployment.yml.j2
deployment.yaml.j2
deployment.json.j2
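When you migrate several projects, it can help to locate each project's deployment file automatically and to flag the ones that use Jinja, because those templates need manual conversion. The following Python sketch checks the default file names listed above. The search order and the helper names are assumptions made for illustration and might not match how dbx resolves the file.

```python
from pathlib import Path
from typing import Optional

# Default dbx deployment file names, in the order listed above.
DEFAULT_DEPLOYMENT_NAMES = (
    "deployment.yml",
    "deployment.yaml",
    "deployment.json",
    "deployment.yml.j2",
    "deployment.yaml.j2",
    "deployment.json.j2",
)


def find_deployment_file(project_root) -> Optional[Path]:
    """Return the first default deployment file found in <project_root>/conf."""
    conf_dir = Path(project_root) / "conf"
    for name in DEFAULT_DEPLOYMENT_NAMES:
        candidate = conf_dir / name
        if candidate.is_file():
            return candidate
    return None


def uses_jinja(deployment_file: Path) -> bool:
    """Return True for the .j2 variants, which contain Jinja templating."""
    return deployment_file.suffix == ".j2"
```

A return value of `None` means that the project stores its deployment settings under a non-default name, which you then need to locate by hand.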

For bundles, deployment settings are by default in a file named databricks.yml within the bundle’s root folder. See Databricks Asset Bundle configuration.

For a conf/deployment.yml file with the following example content:

build:
  python: "pip"

environments:
  default:
    workflows:
      - name: "workflow1"
        tasks:
          - task_key: "task1"
            python_wheel_task:
              package_name: "some-pkg"
              entry_point: "some-ep"