Ingest data into Unity Catalog
Data ingestion is a fundamental capability of any data platform. This module surveys the techniques available in Azure Databricks for loading data into Unity Catalog tables. You'll learn how to use managed connectors with Lakeflow Connect, write custom ingestion code in notebooks, apply SQL commands for batch file loading, process change data capture feeds, configure streaming ingestion from message buses, set up Auto Loader to detect and load new files automatically, and orchestrate ingestion workflows with Lakeflow Spark Declarative Pipelines.
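As a taste of what's ahead, the SQL-based batch loading technique mentioned above can be sketched with `COPY INTO`. This is an illustrative example only: the catalog, schema, table, and storage path names are placeholders, not part of the module.

```sql
-- Load new Parquet files from cloud storage into a Unity Catalog table.
-- COPY INTO is idempotent: files that were already loaded are skipped on re-run.
COPY INTO main.sales.raw_orders   -- catalog.schema.table (illustrative names)
FROM 'abfss://landing@mystorageaccount.dfs.core.windows.net/orders/'
FILEFORMAT = PARQUET
COPY_OPTIONS ('mergeSchema' = 'true');  -- allow new columns to evolve the table schema
```

Later units in this module cover this command, along with the notebook, streaming, and pipeline-based approaches, in detail.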
Learning objectives
By the end of this module, you'll be able to:
- Configure Lakeflow Connect to ingest data from external sources using managed connectors
- Ingest batch and streaming data using notebooks with DataFrames and Structured Streaming
- Use SQL commands like COPY INTO and CREATE TABLE AS SELECT for file-based ingestion
- Process change data capture feeds with the AUTO CDC API
- Configure Spark Structured Streaming for real-time data ingestion from Kafka and Event Hubs
- Set up Auto Loader to automatically detect and process new files with schema evolution
- Orchestrate data ingestion workflows using Lakeflow Spark Declarative Pipelines
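To preview the Auto Loader and declarative pipeline objectives above, here is a minimal sketch of a streaming table that uses Auto Loader (via `read_files`) inside a Lakeflow Spark Declarative Pipeline. The table name and storage path are illustrative placeholders.

```sql
-- Streaming table that picks up new JSON files as they land in cloud storage.
-- Auto Loader infers the schema and can evolve it as new columns appear.
CREATE OR REFRESH STREAMING TABLE main.sales.orders_bronze
AS SELECT *
FROM STREAM read_files(
  'abfss://landing@mystorageaccount.dfs.core.windows.net/orders/',
  format => 'json'
);
```

You'll build and orchestrate pipelines like this in the later units of this module.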
Prerequisites
Before starting this module, you should have:
- Basic understanding of Azure Databricks and Unity Catalog concepts
- Familiarity with SQL and Python programming
- Knowledge of data engineering concepts such as batch processing and streaming