Hey Shubhangi Nannware,
thanks for laying out your scenario so thoroughly. I’ll run through each of your questions in turn.
Q1 — Branching Strategy
Your proposed flow (main → permanent dev branch → short-lived feature branches off dev → PR into dev auto-deploys to Dev workspace → PR into main auto-deploys to Prod) is a perfectly valid “environment-branch” approach and aligns with guidance you’ll see in the Azure Synapse CI/CD and ADF docs (they recommend protected branches per environment).
That said, the Databricks Asset Bundles best-practice doc shows the simplest workflow as feature branches off main, merge into main → CI deploys to a staging workspace → then promotion to Prod. If you prefer a dedicated dev branch (so that every merge to dev lands in a long-lived Dev environment), your plan is fine—just be consistent:
• Create and protect a dev branch.
• Base your feature branches off dev.
• Merge to dev triggers your Dev-workspace deploy.
• Merge dev into main (or merge feature branches directly into main) triggers your Prod deploy.
Q2 — Auto-discovery of New Notebooks
By default, a Databricks bundle uploads the files under the bundle root (which includes your src folder), minus anything excluded by .gitignore or your sync settings, so any new notebooks you drop in there are picked up and synced to the bundle's path in the target workspace (workspace files, not DBFS). However, Databricks Asset Bundles will not create new Jobs for you unless you explicitly declare them in your databricks.yml. In practice:
• Notebooks, libraries, Python wheels, etc. are auto-collected under src/… and sent up on each deploy.
• Any jobs (or pipelines) you want created/updated must have an entry in your bundle config.
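As a rough illustration of the second bullet, a job declaration in databricks.yml might look like the sketch below. The bundle name, job key, task key, and notebook path (src/ingest.py) are hypothetical placeholders, and compute settings are deliberately left out, so treat it as a starting point rather than a complete config.

```yaml
# databricks.yml (sketch; every name below is a hypothetical placeholder)
bundle:
  name: my_project

resources:
  jobs:
    ingest_job:                      # resource key used inside the bundle
      name: ingest-job               # display name in the Jobs UI
      tasks:
        - task_key: run_ingest
          notebook_task:
            # Path is relative to this config file; the notebook itself is
            # uploaded together with the rest of the bundle files.
            notebook_path: ./src/ingest.py
          # Compute is omitted here: add job_cluster_key, new_cluster, or
          # existing_cluster_id unless the workspace runs jobs on serverless.
```

A notebook added under src without a matching entry like this is still uploaded on deploy, but no job is created to run it.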
Q3 — Azure DevOps YAML Pipeline Triggers
Azure DevOps only allows one top-level trigger: block per YAML, so your two separate trigger: sections won’t work as-is. Two common patterns emerge:
1. Single multi-stage pipeline
   – Trigger on both branches:
       trigger:
         branches:
           include:
             - dev
             - main
   – Have two stages (DeployToDev, DeployToProd) that use conditions:
       condition: eq(variables['Build.SourceBranchName'], 'dev')    (on DeployToDev)
       condition: eq(variables['Build.SourceBranchName'], 'main')   (on DeployToProd)
2. Two YAML files
   – azure-pipelines-dev.yml with trigger: dev → Dev stage only
   – azure-pipelines-prod.yml with trigger: main → Prod stage only
Most teams lean toward option 1 (one YAML, branch-conditional stages) to reduce duplication, but if you want completely separate approvals/gates per environment file, option 2 is fine.
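For reference, the single-file variant could be sketched roughly as follows. The stage names come from the description above; the job names, display names, and agent image are illustrative, and the sketch assumes the Databricks CLI is already installed and authenticated on the agent.

```yaml
# azure-pipelines.yml (sketch; job names and pool are illustrative)
trigger:
  branches:
    include:
      - dev
      - main

pool:
  vmImage: ubuntu-latest

stages:
  - stage: DeployToDev
    # Runs only for builds of the dev branch.
    condition: eq(variables['Build.SourceBranchName'], 'dev')
    jobs:
      - job: deploy_dev
        steps:
          # Assumes the Databricks CLI is installed and authenticated.
          - script: databricks bundle deploy --target dev
            displayName: Deploy bundle to Dev

  - stage: DeployToProd
    dependsOn: []   # do not wait on (or depend on) the Dev stage
    # Runs only for builds of the main branch.
    condition: eq(variables['Build.SourceBranchName'], 'main')
    jobs:
      - job: deploy_prod
        steps:
          - script: databricks bundle deploy --target prod
            displayName: Deploy bundle to Prod
```

If you want manual approvals on Prod, a deployment job bound to an Azure DevOps environment is the usual place to attach them.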
Q4 — DAB Target Configuration
Yes—defining two targets in databricks.yml is the right approach:
    targets:
      dev:
        mode: development
        workspace:
          host: https://<dev-workspace>.azuredatabricks.net
      prod:
        mode: production
        workspace:
          host: https://<prod-workspace>.azuredatabricks.net
For authentication you have two choices:
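• Environment variables (recommended): set DATABRICKS_HOST per environment, plus either DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET (Databricks OAuth machine-to-machine auth) or ARM_CLIENT_ID, ARM_CLIENT_SECRET, and ARM_TENANT_ID (Microsoft Entra ID service principal). The Databricks CLI then picks them up automatically.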
• Inline in databricks.yml: under each target's workspace mapping you can set non-secret authentication fields (for example profile or azure_client_id), but never commit secrets.
In Azure DevOps, store service principal secrets in a Key Vault–backed variable group or secure pipeline variables and map them to the environment variables above.
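The mapping described above might look something like this in a pipeline. The variable group name and the variable names inside it are hypothetical, and the sketch assumes OAuth machine-to-machine auth with a service principal and a Databricks CLI already present on the agent.

```yaml
# Fragment of a pipeline YAML (sketch; group and variable names are hypothetical)
variables:
  - group: databricks-dev          # e.g. a Key Vault-backed variable group

steps:
  - script: databricks bundle deploy --target dev
    displayName: Deploy bundle to Dev
    env:
      # Secret variables are not exposed to scripts automatically,
      # so each one is mapped to an environment variable explicitly.
      DATABRICKS_HOST: $(databricks-host)
      DATABRICKS_CLIENT_ID: $(databricks-client-id)
      DATABRICKS_CLIENT_SECRET: $(databricks-client-secret)
```

Using one variable group per environment keeps the step itself identical between Dev and Prod; only the group reference and the --target value change.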
Q5 — PR Gate: Deploy-before-Merge vs Merge-then-Deploy
Your idea—use the Dev deploy as a PR validation gate—is on the right track. A couple refinements:
- For true CI, you can run databricks bundle validate --target dev in the PR pipeline instead of a full deploy. That catches config errors without touching shared Dev workspace state.
- Once the bundle validate succeeds, merge the PR.
- Then your merge-into-dev (or merge-into-main) pipeline runs the real databricks bundle deploy.
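A validate-only PR check along those lines could be sketched as below. The file name is hypothetical, and the sketch assumes the Databricks CLI is installed on the agent and that credentials are supplied through environment variables (omitted here). With Azure Repos Git, the pipeline would typically be attached to the dev branch as a build validation branch policy rather than through a trigger in the YAML.

```yaml
# azure-pipelines-pr.yml (sketch; file name is hypothetical)
trigger: none   # no CI trigger; intended to run only as PR build validation

pool:
  vmImage: ubuntu-latest

steps:
  # Checks the bundle configuration against the dev target without deploying.
  # Authentication env vars (DATABRICKS_HOST etc.) are assumed to be mapped
  # on this step in a real pipeline.
  - script: databricks bundle validate --target dev
    displayName: Validate bundle (dev target)
```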
If you need an actual deploy to Dev as part of PR validation, consider spinning up an isolated Dev workspace per PR (or run against a “sandbox” branch) to avoid stepping on other developers’ changes.
Hope that helps—let me know if you need any clarification!
Reference List
- Databricks Asset Bundles best-practices (branching, artifacts) https://learn.microsoft.com/azure/databricks/dev-tools/ci-cd/best-practices
- Azure DevOps + Databricks CI/CD example https://learn.microsoft.com/azure/databricks/dev-tools/ci-cd/azure-devops
- Azure Synapse Analytics CI/CD guidance (branching, triggers, approvals) https://learn.microsoft.com/azure/synapse-analytics/cicd/continuous-integration-delivery
- Azure Data Factory CI/CD best practices (environment branches) https://learn.microsoft.com/azure/data-factory/continuous-integration-deployment