Design and implement data modeling with Azure Databricks
Intermediate
Data Engineer
Azure Databricks
Effective data modeling forms the foundation of a performant and maintainable data platform. This module explores how to design ingestion logic, select ingestion tools and table formats, implement partitioning schemes, manage slowly changing dimensions, choose the right data granularity, and optimize table performance through clustering strategies in Azure Databricks with Unity Catalog.
Learning objectives
By the end of this module, you'll be able to:
- Design data ingestion logic and configure data source connections
- Select the appropriate data ingestion tool for your scenario
- Choose between Delta Lake, Apache Iceberg, and other table formats
- Design and implement effective data partitioning schemes
- Select and implement slowly changing dimension types
- Design and implement temporal tables for change tracking and auditing
- Choose appropriate data granularity for fact and dimension tables
- Design and implement clustering strategies for query optimization
- Evaluate when to use managed versus external tables
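As a preview of the slowly changing dimension work in this module, the Type 2 pattern (expire the current row, then insert a new version with fresh validity dates) can be sketched in plain Python, independent of any Databricks API. The `DimRow` fields and `apply_scd2` helper below are illustrative assumptions for this sketch, not module code; in Databricks you would typically express the same logic as a `MERGE INTO` on a Delta table.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

@dataclass
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None marks the open-ended current row
    is_current: bool

def apply_scd2(rows: List[DimRow], customer_id: int,
               new_city: str, change_date: date) -> List[DimRow]:
    """Type 2 update: close the current row and append a new version,
    preserving full history instead of overwriting in place."""
    out = []
    for r in rows:
        if r.customer_id == customer_id and r.is_current and r.city != new_city:
            # Expire the existing version rather than mutating it
            out.append(replace(r, valid_to=change_date, is_current=False))
            # Append the new current version
            out.append(DimRow(customer_id, new_city, change_date, None, True))
        else:
            out.append(r)
    return out
```

After an update, both the old and new versions remain queryable by date range, which is the property that distinguishes Type 2 from the overwrite-in-place Type 1 approach.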
Prerequisites
Before starting this module, you should have:
- Basic understanding of Azure Databricks workspaces and Unity Catalog
- Familiarity with SQL and data warehouse concepts
- Knowledge of Delta Lake fundamentals