Azure Databricks CI/CD Pipeline using Databricks Asset Bundles (DAB) + Azure DevOps — Branching Strategy & Deployment Flow

Shubhangi Nannware 160 Reputation points
2026-05-21T18:17:27.21+00:00

We have only Dev and Prod environments.

Problem Statement:
I am setting up a CI/CD pipeline for Azure Databricks using Databricks Asset Bundles (DAB) and Azure DevOps Pipelines. I have created the bundle locally and pushed it to a feature branch (feature_databricks_bundle) in Azure DevOps. Currently, the repo has only a main branch and short-lived feature branches. I want to validate my proposed branching strategy and deployment flow before proceeding.

Questions:

Q1 — Branching Strategy Validation

Is the following branching strategy correct for DAB-based CI/CD?

  • Create a permanent dev branch from main
  • Merge feature_databricks_bundle into dev (then delete the feature branch)
  • Developers create future feature branches from dev (not main)
  • On PR approval into dev → auto-deploy to Dev Databricks workspace
  • On PR approval into main → auto-deploy to Prod Databricks workspace

Is this the recommended approach, or should feature branches still be based off main?

Q2 — Auto-discovery of new Notebooks

If a developer adds a new notebook to the repository under the bundle's configured src path, will DAB automatically include it in the next deployment without requiring any manual changes to databricks.yml? Or does each new notebook/job need to be explicitly declared in the bundle configuration?

Q3 — Azure DevOps YML Pipeline Triggers

For the Azure DevOps .yml pipeline, what is the recommended way to configure branch-based triggers for multi-environment deployments?

Example setup I am thinking:

# Trigger deploy to Dev on merge into dev branch
trigger:
  branches:
    include:
      - dev

# Trigger deploy to Prod on merge into main
trigger:
  branches:
    include:
      - main

Is it better to have two separate pipeline YML files (one per environment) or a single pipeline with conditional stage execution based on the target branch?

Q4 — DAB Target Configuration

In databricks.yml, I plan to define two targets as shown below. Is this the right way to map targets to workspaces and environments?

targets:
  dev:
    mode: development
    workspace:
      host: https://<dev-workspace>.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://<prod-workspace>.azuredatabricks.net

Should service principal authentication be configured at the target level, and if so, what is the recommended way to pass secrets securely via Azure DevOps pipeline variables?

Q5 — PR Gate: Deploy-before-Merge vs Merge-then-Deploy

My current plan is:

  1. Developer raises PR: feature → dev
  2. CI pipeline runs bundle deployment to Dev workspace as a PR validation gate
  3. If deployment succeeds, PR is approved and merged
  4. Merge into main triggers deployment to Prod


Answer accepted by question author

Pilladi Padma Sai Manisha 9,860 Reputation points Microsoft External Staff Moderator
2026-05-25T05:28:08.55+00:00

Hey Shubhangi Nannware,

thanks for laying out your scenario so thoroughly. I’ll run through each of your questions in turn.

Q1 — Branching Strategy

Your proposed flow (main → permanent dev branch → short-lived feature branches off dev → PR into dev auto-deploys to Dev workspace → PR into main auto-deploys to Prod) is a perfectly valid “environment-branch” approach, similar in spirit to the branch-based CI/CD guidance in the Azure Synapse and ADF docs.

That said, the Databricks Asset Bundles best-practice doc shows the simplest workflow as feature branches off main, merge into main → CI deploys to a staging workspace → then promotion to Prod. If you prefer a dedicated dev branch (so that every merge to dev lands in a long-lived Dev environment), your plan is fine—just be consistent:

• Create and protect a dev branch.

• Base your feature branches off dev.

• Merge to dev triggers your Dev-workspace deploy.

• Merge dev into main (or merge feature branches directly into main) triggers your Prod deploy.

Q2 — Auto-discovery of New Notebooks

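By default, databricks bundle deploy syncs the files in the bundle folder (including everything under the configured src folder, unless excluded by .gitignore or the bundle's sync settings) to the target workspace, so any new notebooks you drop in there will be picked up on the next deploy. The files are stored as workspace files under the bundle's workspace root path, not in DBFS. However, Databricks Asset Bundles will not create new Jobs for you unless you explicitly declare them in your bundle configuration. In practice: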

• Notebooks, libraries, Python wheels, etc. are auto-collected under src/… and sent up on each deploy.

• Any jobs (or pipelines) you want created/updated must have an entry in your bundle config.
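
As an illustration of the second point, a job declaration for a newly added notebook might look roughly like the sketch below. The resource key new_notebook_job, the job name, the task key, and the path src/new_notebook.ipynb are hypothetical names chosen for this example. Compute settings are left out, which assumes serverless jobs compute is enabled in the workspace.

```yaml
# databricks.yml (excerpt): hypothetical job that runs a newly added notebook
resources:
  jobs:
    new_notebook_job:                  # hypothetical resource key
      name: new-notebook-job           # hypothetical job name
      tasks:
        - task_key: run_new_notebook   # hypothetical task key
          notebook_task:
            # Relative path, resolved from the file that declares the job
            notebook_path: ./src/new_notebook.ipynb
          # No cluster is specified here: this assumes serverless jobs compute.
          # Otherwise, add a cluster specification for the task.
```

Without an entry like this, the notebook file is still uploaded on deploy, but nothing in the workspace schedules or runs it.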

Q3 — Azure DevOps YAML Pipeline Triggers

Azure DevOps only allows one top-level trigger: block per YAML, so your two separate trigger: sections won’t work as-is. Two common patterns emerge:

  1. Single multi-stage pipeline

– Trigger on both branches:

  trigger:
    branches:
      include:
        - dev
        - main

– Have two stages (DeployToDev, DeployToProd) that use conditions:

  condition: eq(variables['Build.SourceBranchName'], 'dev')
  condition: eq(variables['Build.SourceBranchName'], 'main')

  2. Two YAML files

azure-pipelines-dev.yml with trigger: dev → Dev stage only

azure-pipelines-prod.yml with trigger: main → Prod stage only

Most teams lean toward option 1 (one YAML, branch-conditional stages) to reduce duplication, but if you want completely separate approvals/gates per environment file, option 2 is fine.
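
A minimal sketch of option 1 could look like the following. It assumes a Microsoft-hosted Ubuntu agent and installs the Databricks CLI with the public setup-cli install script. The job names are hypothetical, and authentication environment variables are deliberately left out. The conditions compare the full ref in Build.SourceBranch rather than Build.SourceBranchName, because the latter holds only the last path segment and would also match a branch such as feature/dev.

```yaml
# azure-pipelines.yml: sketch of one pipeline with branch-conditional stages
trigger:
  branches:
    include:
      - dev
      - main

pool:
  vmImage: ubuntu-latest

stages:
  - stage: DeployToDev
    # Runs only for builds of the dev branch
    condition: eq(variables['Build.SourceBranch'], 'refs/heads/dev')
    jobs:
      - job: deploy_dev   # hypothetical job name
        steps:
          - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
            displayName: Install Databricks CLI
          - script: databricks bundle deploy --target dev
            displayName: Deploy bundle to Dev
            # Authentication environment variables are omitted in this sketch

  - stage: DeployToProd
    dependsOn: []   # evaluated independently of DeployToDev
    # Runs only for builds of the main branch
    condition: eq(variables['Build.SourceBranch'], 'refs/heads/main')
    jobs:
      - job: deploy_prod   # hypothetical job name
        steps:
          - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
            displayName: Install Databricks CLI
          - script: databricks bundle deploy --target prod
            displayName: Deploy bundle to Prod
            # Authentication environment variables are omitted in this sketch
```

If Prod needs a manual approval, one common choice is to turn the Prod job into a deployment job bound to an Azure DevOps environment that has an approval check.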

Q4 — DAB Target Configuration

Yes—defining two targets in databricks.yml is the right approach:

targets:
  dev:
    mode: development
    workspace:
      host: https://<dev-workspace>.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://<prod-workspace>.azuredatabricks.net

For authentication you have two choices:

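• Environment variables (recommended): set the Databricks unified authentication variables for the service principal in each environment. For OAuth machine-to-machine authentication these are DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET; for a Microsoft Entra ID service principal they are DATABRICKS_HOST, ARM_TENANT_ID, ARM_CLIENT_ID, and ARM_CLIENT_SECRET. The Databricks CLI picks them up automatically when it runs bundle commands.

• In databricks.yml: non-secret settings such as the host or a profile name can go under each target's workspace mapping, but never commit secrets (client secrets or tokens) there.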

In Azure DevOps, store service principal secrets in a Key Vault–backed variable group or secure pipeline variables and map them to the environment variables above.
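
A sketch of that mapping is shown below, assuming OAuth machine-to-machine authentication and that the Databricks CLI was installed in an earlier step. The variable group name and the variable names inside $(...) are hypothetical. The workspace host is taken from the target in databricks.yml, so it is not set here.

```yaml
# Pipeline excerpt: map secrets from a variable group into a bundle step
variables:
  - group: databricks-dev-credentials   # hypothetical Key Vault-backed variable group

steps:
  - script: databricks bundle deploy --target dev
    displayName: Deploy bundle to Dev
    env:
      # Secret variables are not exposed to scripts automatically,
      # so each one is mapped explicitly.
      DATABRICKS_CLIENT_ID: $(databricks-client-id)           # hypothetical variable name
      DATABRICKS_CLIENT_SECRET: $(databricks-client-secret)   # hypothetical variable name
```

Using one variable group per environment keeps the Dev and Prod credentials separate while the step itself stays the same.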

Q5 — PR Gate: Deploy-before-Merge vs Merge-then-Deploy

Your idea of using the Dev deploy as a PR validation gate is on the right track. A couple of refinements:

  1. For true CI, you can run databricks bundle validate --target dev in the PR pipeline instead of a full deploy. That catches configuration errors without touching shared Dev workspace state.
  2. Once the bundle validate succeeds, merge the PR.
  3. Then your merge-into-dev (or merge-into-main) pipeline runs the real databricks bundle deploy.

If you need an actual deploy to Dev as part of PR validation, consider spinning up an isolated Dev workspace per PR (or run against a “sandbox” branch) to avoid stepping on other developers’ changes.
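
A validate-only PR pipeline could be sketched as follows. With Azure Repos Git, a pipeline is attached to pull requests through a build validation branch policy on the target branch, not through a pr: trigger in YAML, so the file itself disables CI triggers. The file name is hypothetical, and authentication environment variables are omitted.

```yaml
# azure-pipelines-pr.yml (hypothetical file name): validate-only PR pipeline
trigger: none   # not started by pushes; intended to run as a build validation policy

pool:
  vmImage: ubuntu-latest

steps:
  - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
    displayName: Install Databricks CLI
  - script: databricks bundle validate --target dev
    displayName: Validate bundle against the dev target
    # Authentication environment variables are omitted in this sketch
```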

Hope that helps—let me know if you need any clarification!

Reference List

  1. Databricks Asset Bundles best-practices (branching, artifacts) https://learn.microsoft.com/azure/databricks/dev-tools/ci-cd/best-practices
  2. Azure DevOps + Databricks CI/CD example https://learn.microsoft.com/azure/databricks/dev-tools/ci-cd/azure-devops
  3. Azure Synapse Analytics CI/CD guidance (branching, triggers, approvals) https://learn.microsoft.com/azure/synapse-analytics/cicd/continuous-integration-delivery
  4. Azure Data Factory CI/CD best practices (environment branches) https://learn.microsoft.com/azure/data-factory/continuous-integration-deployment


Answer accepted by question author

AI answer

2026-05-21T18:18:01.2+00:00

Q1 — Branching strategy

The proposed strategy is aligned with the recommended patterns, with a small adjustment:

  • Recommended flow from the Databricks CI/CD guidance:
    • Develop locally or in the workspace and deploy to a Databricks development workspace to test changes.
    • Create a feature branch to version control updates and regularly sync local/workspace changes.
    • When testing is finished, merge the feature branch into main.
    • CI/CD automatically deploys main to a staging workspace and runs automated tests.
    • When staging checks pass, CI/CD deploys main to production.
    This is explicitly described as the “simplest best practice” branching strategy for bundles.
  • For Azure DevOps more generally, the guidance is also to keep a simple strategy: main as the continuous development branch, plus release branches as needed, with CI triggered on every check-in to main and release branches.

Given that, two viable options are:

  1. Simple (recommended for most teams):
    • Feature branches → merge into main.
    • CI on main deploys to Dev (and optionally to Staging → Prod via promotion).
    • Use environments/approvals to control promotion to Prod.
  2. Two long‑lived branches (what is proposed):
    • Feature branches from dev.
    • Merge to dev → deploy to Dev workspace.
    • Merge to main → deploy to Prod workspace.

Both are compatible with Databricks bundles and Azure DevOps. The documentation, however, describes feature branches merging into main and then using CI/CD to move from staging to production, so feature branches based off main with environment-based promotion is closer to the documented best practice.

Q2 — Auto-discovery of new notebooks

From the bundles best practices and library-dependency guidance:

  • Bundles require resources (jobs, pipelines, libraries) to be defined in source and referenced in databricks.yml.
  • The recommendation is to “reference the uploaded compiled library in databricks.yml” and to define resources declaratively.

Implication:

  • Adding a new notebook under the src path does not automatically create or wire up a new job/pipeline in the bundle.
  • Any new workflow (job, pipeline, etc.) must be explicitly declared in the bundle configuration so that databricks bundle validate and databricks bundle deploy know what to deploy.

Q3 — Azure DevOps YAML pipeline triggers

From the Azure DevOps + Databricks CI/CD guidance:

  • Pipelines are defined in YAML (azure-pipelines.yml) and can be customized per branch using the Git branch selector.
  • Best practice is not to do production work directly in main, and to use a dedicated branch (for example release) for production deployment.
  • Builds are typically triggered with every check-in to the relevant branch, and release pipelines or stages handle promotion across environments (Dev → QA → UAT → Staging → Prod).

For multi-environment deployments with bundles, two common patterns are supported by the docs and DevOps guidance:

  1. Single multi-stage YAML pipeline
    • One azure-pipelines.yml with stages like Build, Deploy_Dev, Deploy_Prod.
    • Use trigger and/or condition on stages to run Dev on one branch and Prod on another, for example:
      • Trigger on both dev and main.
      • Deploy_Dev stage runs when Build.SourceBranch is refs/heads/dev.
      • Deploy_Prod stage runs when Build.SourceBranch is refs/heads/main.
    • This aligns with the “simple branching strategy” and “deploy multiple branches to different stages” guidance, where different branches feed different stages/environments.
  2. Separate YAML pipelines per environment
    • One YAML file bound to dev branch (deploys to Dev workspace).
    • Another YAML file bound to main (deploys to Prod workspace).
    • Azure DevOps supports customizing the build process per branch via the branch selector in the pipeline editor.

Both are supported. The documentation leans toward a single pipeline with multiple stages and branch-based routing to stages when managing multiple environments, because it keeps the flow centralized and easier to reason about.

Q4 — DAB target configuration

The target configuration shown matches the documented pattern for bundles:

targets:
  dev:
    mode: development
    workspace:
      host: https://<dev-workspace>.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://<prod-workspace>.azuredatabricks.net
  • Bundles explicitly support multiple targets (for example dev, prod) that map to different workspaces and modes.
  • The recommended workflow is to validate and deploy the bundle per target using databricks bundle validate and databricks bundle deploy for the appropriate target.
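
For context, a fuller databricks.yml built around those two targets might look like the sketch below. The bundle name and the resources folder are hypothetical, and the placeholders in angle brackets must be filled in. Running the prod target as a service principal and deploying it to a shared path are assumptions about a typical production setup, not requirements stated in this thread.

```yaml
# databricks.yml: sketch only; names marked hypothetical are illustrative
bundle:
  name: my_bundle          # hypothetical bundle name

include:
  - resources/*.yml        # hypothetical folder holding job and pipeline definitions

targets:
  dev:
    mode: development
    default: true          # used when no --target is passed
    workspace:
      host: https://<dev-workspace>.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://<prod-workspace>.azuredatabricks.net
      # Shared deployment path that is not tied to a single user
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: <service-principal-application-id>
```

Each target is then validated and deployed with databricks bundle validate --target <name> and databricks bundle deploy --target <name>.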

For authentication:

  • The GitHub Actions example in the documentation uses a service principal and sets the DATABRICKS_TOKEN environment variable from a secret (SP_TOKEN).
  • The same pattern applies to Azure DevOps:
    • Use a service principal associated with the Databricks workspace.
    • Generate a Databricks access token for that principal.
    • Store the token as a secret variable in Azure DevOps (variable group or pipeline variable marked secret).
    • In the pipeline, set DATABRICKS_TOKEN (and any other required unified auth environment variables) from that secret before running bundle commands.

Authentication details are typically not stored in databricks.yml targets; instead, targets define workspace hosts and modes, while credentials are injected at runtime via environment variables or service connections.

Q5 — PR gate: deploy-before-merge vs merge-then-deploy

From the documented CI/CD flow for bundles:

  • The recommended sequence is:
    • Develop and test in a development workspace.
    • Use feature branches and merge into main after testing.
    • CI/CD then deploys main to staging and, after tests, to production.

This maps well to:

  • PR from feature → integration branch (dev or main).
  • CI validation on PR (build, databricks bundle validate, possibly a test deployment to a Dev workspace).
  • After PR approval and merge, the main pipeline deploys to the appropriate environment(s).

The proposed plan:

  1. PR: feature → dev.
  2. CI pipeline deploys bundle to Dev workspace as a PR validation gate.
  3. If deployment succeeds, PR is approved and merged.
  4. Merge into main triggers deployment to Prod.

This is compatible with the documented guidance, as long as:

  • The Dev deployment used as a gate is against a non-production workspace (which matches the “develop in dev workspace” recommendation).
  • Production deployment is only triggered from a stable branch (for example main or release) after tests and checks pass.

The alternative, also supported by the docs, is to:

  • Run validation (including databricks bundle validate) on PR without deploying, then deploy to Dev only after merge.

Both are valid; using a Dev deployment as a PR gate is a stricter form of validation and fits within the recommended CI/CD model for bundles.

