Prepare and process data with Azure Databricks
Master the essential skills to build robust, scalable data engineering solutions with Azure Databricks and Unity Catalog. Learn to design effective data models, ingest data from diverse sources, transform raw data into analytics-ready formats, and ensure data quality across your lakehouse architecture.
In this learning path, you'll learn how to build a data engineering workflow using Azure Databricks and Unity Catalog. Starting with foundational data modeling concepts, you'll design schemas and partitioning strategies optimized for analytical workloads. You'll then explore multiple ingestion patterns—from managed connectors to streaming pipelines—to bring data into your lakehouse. Next, you'll apply transformation techniques to cleanse and reshape data for business use. Finally, you'll implement quality controls to maintain data integrity throughout your pipelines. By the end, you'll have the practical skills to architect and build production-ready data solutions in Unity Catalog.
Prerequisites
- Good understanding of Azure Databricks workspaces and Unity Catalog concepts
- Familiarity with SQL and Python programming
- Knowledge of fundamental data engineering and data warehouse concepts
Modules in this learning path
Effective data modeling forms the foundation of a performant and maintainable data platform. This module explores how to design ingestion logic, select appropriate tools and table formats, implement partitioning schemes, manage slowly changing dimensions, choose appropriate data granularity, and optimize table performance through clustering strategies in Azure Databricks with Unity Catalog.
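As a flavor of the modeling decisions covered in this module, the sketch below creates two Delta tables in Unity Catalog: one partitioned by a date column and one using liquid clustering. It assumes a Databricks notebook (where `spark` is predefined) and hypothetical catalog, schema, and table names (`main.sales.fact_orders_*`); it is illustrative, not the module's exact lab.

```python
# A minimal sketch, assuming a Databricks notebook (where `spark` is predefined)
# and a hypothetical Unity Catalog schema named `main.sales`.

# Partitioning: suited to low-cardinality columns with predictable query filters.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.fact_orders_partitioned (
        order_id    BIGINT,
        customer_id BIGINT,
        order_date  DATE,
        amount      DECIMAL(18, 2)
    )
    USING DELTA
    PARTITIONED BY (order_date)
""")

# Liquid clustering: a more flexible alternative when filter columns may evolve.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.fact_orders_clustered (
        order_id    BIGINT,
        customer_id BIGINT,
        order_date  DATE,
        amount      DECIMAL(18, 2)
    )
    CLUSTER BY (customer_id, order_date)
""")
```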
Data ingestion is a fundamental capability for any data platform. This module explores the comprehensive set of techniques available in Azure Databricks for loading data into Unity Catalog tables. You'll learn how to use managed connectors with Lakeflow Connect, write custom ingestion code in notebooks, apply SQL commands for batch file loading, process change data capture feeds, configure streaming ingestion from message buses, set up Auto Loader for automatic file detection, and orchestrate ingestion workflows with Lakeflow Declarative Pipelines.
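To give a sense of one ingestion pattern from this module, the following sketch uses Auto Loader to incrementally load files into a Unity Catalog table. It assumes a Databricks notebook, a hypothetical landing path (`abfss://...`), and hypothetical checkpoint and table names; adjust these to your environment.

```python
# A minimal Auto Loader sketch, assuming `spark` is predefined in a Databricks
# notebook; paths and table names below are hypothetical placeholders.

raw_path = "abfss://landing@<storage-account>.dfs.core.windows.net/orders/"
checkpoint_path = "/Volumes/main/sales/checkpoints/orders"

(spark.readStream
    .format("cloudFiles")                                    # Auto Loader source
    .option("cloudFiles.format", "json")                     # format of incoming files
    .option("cloudFiles.schemaLocation", checkpoint_path)    # schema inference/evolution state
    .load(raw_path)
    .writeStream
    .option("checkpointLocation", checkpoint_path)           # exactly-once progress tracking
    .trigger(availableNow=True)                              # process available files, then stop
    .toTable("main.sales.bronze_orders"))                    # Unity Catalog target table
```

Running with `availableNow=True` gives batch-style incremental loads; removing the trigger keeps the stream running continuously.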
Data engineering requires transforming raw data into clean, well-structured formats ready for analysis. This module explores techniques for profiling data quality, selecting appropriate column types, resolving duplicates and null values, applying filtering and aggregation transformations, combining datasets with joins and set operators, reshaping data through pivoting and denormalization, and loading transformed data using append, overwrite, and merge strategies.
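The sketch below illustrates the kind of cleanse-and-load step this module works toward: deduplicating, handling nulls, casting a column type, and merging into a target table. Table names (`main.sales.bronze_orders`, `main.sales.silver_orders`) are hypothetical, and the target table is assumed to already exist.

```python
# A minimal cleanse-and-merge sketch, assuming a Databricks notebook and
# hypothetical bronze/silver tables in Unity Catalog.
from pyspark.sql import functions as F
from delta.tables import DeltaTable

bronze = spark.table("main.sales.bronze_orders")

cleaned = (bronze
    .dropDuplicates(["order_id"])                                   # resolve duplicate records
    .filter(F.col("order_id").isNotNull())                          # drop rows missing the key
    .withColumn("amount", F.col("amount").cast("decimal(18,2)")))   # enforce an appropriate type

# Merge (upsert) the cleaned rows into the silver table on the business key.
silver = DeltaTable.forName(spark, "main.sales.silver_orders")
(silver.alias("t")
    .merge(cleaned.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()        # update rows that already exist
    .whenNotMatchedInsertAll()     # insert rows that are new
    .execute())
```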
This module explores strategies for maintaining high data quality in Azure Databricks. You will learn how to implement validation checks, enforce schemas, manage schema drift, and use pipeline expectations to ensure data integrity throughout your data pipelines.
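As a small preview of pipeline expectations, the sketch below declares quality rules on a streaming table. It assumes the code runs inside a Lakeflow Declarative Pipelines (Delta Live Tables) pipeline rather than a plain notebook, and the table and rule names are hypothetical.

```python
# A minimal expectations sketch, assuming execution within a Lakeflow
# Declarative Pipelines (Delta Live Tables) pipeline; names are hypothetical.
import dlt

@dlt.table(comment="Orders that pass basic quality checks")
@dlt.expect_or_drop("non_null_key", "order_id IS NOT NULL")   # drop rows violating the rule
@dlt.expect("positive_amount", "amount > 0")                  # record violations, keep the rows
def clean_orders():
    return spark.readStream.table("main.sales.bronze_orders")
```

Stricter variants such as `expect_or_fail` stop the pipeline on violation, which is one of the trade-offs this module examines.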