This article is an introduction to CI/CD on Azure Databricks. Continuous integration and continuous delivery (CI/CD) refers to the process of developing and delivering software in short, frequent cycles through the use of automation pipelines. CI/CD is common in software development and is becoming increasingly necessary in data engineering and data science. By automating the building, testing, and deployment of code, development teams can deliver releases more reliably than with the manual processes still common to data engineering and data science teams.
Azure Databricks recommends using Databricks Asset Bundles for CI/CD, which enable the development and deployment of complex data, analytics, and ML projects for the Azure Databricks platform. Bundles allow you to easily manage many custom configurations and automate builds, tests, and deployments of your projects to Azure Databricks development, staging, and production workspaces.
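For example, a bundle is defined by a `databricks.yml` file at the root of the project. The following is a minimal sketch rather than a complete configuration: the project name, job, notebook path, and workspace URLs are placeholders, and real bundles typically also declare compute for each task.

```yaml
# databricks.yml - minimal illustrative sketch; all names and URLs are placeholders
bundle:
  name: my_project

resources:
  jobs:
    nightly_etl:                      # hypothetical job resource key
      name: nightly_etl_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/etl.ipynb
          # A real task would also specify compute (for example, a job
          # cluster or serverless environment), omitted here for brevity.

# Each target maps to a workspace, so the same source can be deployed
# to development, staging, and production.
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net
  staging:
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-3333333333333333.33.azuredatabricks.net
```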
For an overview of CI/CD for machine learning projects on Azure Databricks, see How does Databricks support CI/CD for machine learning?.
You can use Databricks Asset Bundles to define and programmatically manage your Azure Databricks CI/CD implementation, which usually includes notebooks, libraries, workflows, and data pipelines.

A typical flow for an Azure Databricks CI/CD pipeline stores code in a Git repository, develops and unit tests it, builds and deploys artifacts to a staging workspace for integration testing, and then promotes the validated release to production, where the deployed jobs and pipelines run and are monitored; the results feed back into the next development iteration.
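As a simplified sketch of the validate-deploy-run portion of that flow, the Databricks CLI's bundle commands can be invoked locally or from any CI system. The target and resource names below carry over the assumptions from the sample `databricks.yml` above:

```bash
# Check the bundle configuration for errors before deploying.
databricks bundle validate

# Deploy the bundle's artifacts and resources to the staging workspace
# ("staging" is the hypothetical target from the sample configuration).
databricks bundle deploy -t staging

# Trigger the deployed job; "nightly_etl" is the hypothetical resource
# key declared under resources.jobs in the sample bundle.
databricks bundle run nightly_etl -t staging
```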
For more information on managing the lifecycle of Azure Databricks assets and data, see the following documentation about CI/CD and data pipeline tools.
| Area | Use these tools when you want to… |
|---|---|
| Databricks Asset Bundles | Programmatically define, deploy, and run Azure Databricks jobs, Delta Live Tables pipelines, and MLOps Stacks by using CI/CD best practices and workflows. |
| Databricks Terraform provider | Provision and manage Databricks infrastructure and resources by using Terraform. |
| CI/CD workflows with Git and Databricks Git folders | Use GitHub and Databricks Git folders for source control and CI/CD workflows. |
| Authenticate with Azure DevOps on Databricks | Authenticate with Azure DevOps. |
| Use a Microsoft Entra service principal to authenticate access to Azure Databricks Git folders | Use a Microsoft Entra service principal to authenticate access to Databricks Git folders. |
| Continuous integration and delivery on Azure Databricks using Azure DevOps | Develop a CI/CD pipeline for Azure Databricks that uses Azure DevOps. |
| Continuous integration and delivery using GitHub Actions | Develop a CI/CD workflow on GitHub that uses GitHub Actions developed for Azure Databricks (see the sketch after this table). |
| CI/CD with Jenkins on Azure Databricks | Develop a CI/CD pipeline for Azure Databricks that uses Jenkins. |
| Orchestrate Azure Databricks jobs with Apache Airflow | Manage and schedule a data pipeline that uses Apache Airflow. |
| Service principals for CI/CD | Use service principals, instead of users, with CI/CD systems. |
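To tie these pieces together, the following is a minimal, illustrative GitHub Actions workflow that deploys a bundle on every push to `main`. It is a sketch rather than an official template: the secret names and the `staging` target are assumptions carried over from the earlier examples.

```yaml
# .github/workflows/deploy-bundle.yml - illustrative sketch
name: deploy-bundle

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Fetch the repository containing databricks.yml.
      - uses: actions/checkout@v4

      # Install the Databricks CLI (official setup action).
      - uses: databricks/setup-cli@main

      # Deploy to the staging target defined in databricks.yml.
      # DATABRICKS_HOST and DATABRICKS_TOKEN are assumed repository
      # secrets; a service principal token is preferable to a user token.
      - run: databricks bundle deploy -t staging
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

In production setups, deployment to the production target is typically gated behind approvals and runs under a service principal rather than a user identity, as described in the authentication articles in the table above.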