Deploy and maintain data pipelines and workloads with Azure Databricks

Learning Path
4 Modules

At a glance

Level: Intermediate
Product: Azure Databricks
Role: Data Engineer
Subject: Data engineering

Master the complete lifecycle of building, deploying, and maintaining production-ready data pipelines in Azure Databricks, from design and orchestration to monitoring and optimization.

By the end of this learning path, you'll be able to:

Design and implement robust data pipelines using notebooks and Lakeflow Spark Declarative Pipelines
Create and orchestrate Lakeflow Jobs with triggers, schedules, and error handling
Apply version control and deploy pipelines across environments using Git and Declarative Automation Bundles
Monitor, troubleshoot, and optimize data workloads for reliability and performance

Prerequisites

Good understanding of Azure Databricks workspaces
Familiarity with data engineering concepts and SQL
Experience with Python programming and notebooks
Knowledge of Git version control fundamentals

Modules in this learning path

Design and implement data pipelines with Azure Databricks

Learn to design and implement robust data pipelines in Azure Databricks using notebooks and Lakeflow Spark Declarative Pipelines, covering orchestration, error handling, and task logic.

Implement Lakeflow Jobs with Azure Databricks

This module shows you how to implement Lakeflow Jobs in Azure Databricks. You'll learn how to create jobs, configure triggers and schedules, set up alerts, and manage automatic restarts so that data pipelines run reliably.

Implement development lifecycle processes in Azure Databricks

Azure Databricks integrates with established development practices through Git folders for version control and Declarative Automation Bundles for infrastructure-as-code deployments. This module explores Git version control best practices, branching and pull request workflows, comprehensive testing strategies, and CLI-based bundle deployment across environments.

Monitor, troubleshoot and optimize workloads in Azure Databricks

Monitoring and optimization are essential for running reliable, cost-effective data workloads in Azure Databricks. This module explores cluster consumption metrics, Lakeflow Jobs troubleshooting, Spark job diagnostics, performance optimization for caching, skew, spill, and shuffle issues, and log streaming to Azure Log Analytics.
