Design and implement data modeling with Azure Databricks

Intermediate
Data Engineer
Azure Databricks

Effective data modeling forms the foundation of a performant and maintainable data platform. This module explores how to design ingestion logic, select appropriate tools and table formats, implement partitioning schemes, manage slowly changing dimensions, choose appropriate data granularity, and optimize table performance through clustering strategies in Azure Databricks with Unity Catalog.
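One of the topics above, data partitioning, can be pictured with a small pure-Python sketch. It shows the Hive-style `column=value` directory layout that partitioned Delta and Parquet tables use on storage; the function name `partition_path` and the `/mnt/sales` root are illustrative assumptions, not a Databricks API.

```python
from datetime import date

def partition_path(table_root: str, event_date: date) -> str:
    # Hypothetical helper: builds the Hive-style partition directory
    # (column=value) that partitioned Delta/Parquet tables use on storage.
    return f"{table_root}/event_date={event_date.isoformat()}"

# A row with event_date 2024-06-01 lands under this directory:
print(partition_path("/mnt/sales", date(2024, 6, 1)))
# -> /mnt/sales/event_date=2024-06-01
```

In practice the engine derives these paths for you when you declare a partition column; the sketch only illustrates why queries that filter on the partition column can skip entire directories.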

Learning objectives

By the end of this module, you'll be able to:

  • Design data ingestion logic and configure data source connections
  • Select the appropriate data ingestion tool for your scenario
  • Choose between Delta Lake, Apache Iceberg, and other table formats
  • Design and implement effective data partitioning schemes
  • Select and implement slowly changing dimension types
  • Design and implement temporal tables for change tracking and auditing
  • Choose appropriate data granularity for fact and dimension tables
  • Design and implement clustering strategies for query optimization
  • Evaluate when to use managed versus external tables

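As a preview of the slowly changing dimension objective above, here is a minimal pure-Python sketch of the Type 2 pattern: rather than overwriting a changed attribute, the current row is expired and a new version is appended. The `DimRow` shape and `apply_scd2` helper are illustrative assumptions; in Azure Databricks you would typically express this with a Delta Lake `MERGE INTO` statement instead.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

@dataclass
class DimRow:
    # Hypothetical customer dimension row with validity tracking
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]   # None means "still current"
    is_current: bool

def apply_scd2(dim: list, customer_id: int, new_city: str,
               change_date: date) -> list:
    """SCD Type 2: expire the current row and append a new version."""
    out = []
    for row in dim:
        if row.customer_id == customer_id and row.is_current and row.city != new_city:
            # Close the old version instead of overwriting it...
            out.append(replace(row, valid_to=change_date, is_current=False))
            # ...and append the new version as the current row.
            out.append(DimRow(customer_id, new_city, change_date, None, True))
        else:
            out.append(row)
    return out

dim = [DimRow(1, "Seattle", date(2023, 1, 1), None, True)]
dim = apply_scd2(dim, 1, "Portland", date(2024, 6, 1))
# dim now holds two versions: the expired Seattle row and the current Portland row.
```

The key design point is that history is preserved: queries can reconstruct what the dimension looked like on any date by filtering on `valid_from`/`valid_to`.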
Prerequisites

Before starting this module, you should have:

  • Basic understanding of Azure Databricks workspaces and Unity Catalog
  • Familiarity with SQL and data warehouse concepts
  • Knowledge of Delta Lake fundamentals