Design a medallion lakehouse with Azure Data Factory

Azure Data Factory

The medallion lakehouse architecture is a frequently used enterprise data design pattern. You can use this design pattern to logically organize raw data in its native format within a large and centralized repository. Incrementally enrich data as it flows through each layer of the architecture. This process improves the structure, quality, and insight that you can derive from the data.
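The layered enrichment described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a Data Factory or Azure Databricks implementation. The layer names bronze, silver, and gold are the names commonly used for medallion layers, and the sample records, field names, and function names (`RAW_ORDERS`, `to_bronze`, `to_silver`, `to_gold`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": "1", "region": "west", "amount": "10.50"},
    {"order_id": "2", "region": "WEST ", "amount": "4.50"},
    {"order_id": "2", "region": "WEST ", "amount": "4.50"},  # duplicate row
    {"order_id": "3", "region": "east", "amount": "n/a"},    # unparseable amount
    {"order_id": "4", "region": "East", "amount": "20"},
]


def to_bronze(raw_records):
    """Bronze: keep the data as received, without changing any values."""
    return [dict(record) for record in raw_records]


def to_silver(bronze_records):
    """Silver: validate rows, standardize types and text, and remove duplicates."""
    seen_ids = set()
    silver = []
    for record in bronze_records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip rows whose amount can't be parsed
        order_id = int(record["order_id"])
        if order_id in seen_ids:
            continue  # skip rows that repeat an order already kept
        seen_ids.add(order_id)
        silver.append(
            {
                "order_id": order_id,
                "region": record["region"].strip().lower(),
                "amount": amount,
            }
        )
    return silver


def to_gold(silver_records):
    """Gold: aggregate cleaned rows into a shape that's ready for reporting."""
    totals = defaultdict(float)
    for record in silver_records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    bronze = to_bronze(RAW_ORDERS)
    silver = to_silver(bronze)
    gold = to_gold(silver)
    print(len(bronze), len(silver), gold)
```

Each function takes only the output of the previous layer, so structure and quality improve step by step while the bronze copy of the raw data stays unchanged. In a real lakehouse, each layer would typically be persisted to storage instead of held in memory.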

Azure Data Factory is an Azure platform-as-a-service solution for scale-out serverless data integration and data transformation. Within the medallion lakehouse, Data Factory performs the extract, transform, and load (ETL) processes that move data between the various components and generate value from the raw data source.

This article provides a series of designs that typically progress from an initial implementation to enterprise-wide adoption, and ultimately to mission-critical expansion for specific solutions. This guidance supports customers on a similar cloud adoption journey.

Get started

If you're embarking on your cloud adoption journey with the medallion lakehouse architecture, start with these training modules.

Learn how to design and build secure, scalable, and high-performing solutions in Azure by using the pillars of the Azure Well-Architected Framework. This free online resource provides interactive training that includes knowledge checks to evaluate your learning.

For product documentation, see the following resources:

Baseline implementation

The baseline implementation builds on three skills: deploying Data Factory for data ingestion, developing your medallion lakehouse architecture for data processing by using Azure Databricks, and serving that data to Power BI by using Azure SQL as the persisted store. After you learn these skills, you can apply them to design and establish a simple solution by using an on-premises data source.

Refer to the baseline architecture that deploys Data Factory instances for data ingestion, Azure Databricks for data processing, and Azure SQL for storing the processed data, all within a single zone-redundant region.

Enterprise adoption and hardening

To comply with common enterprise security and governance nonfunctional requirements (NFRs) for production workloads, you should add enterprise hardening patterns to the baseline architecture. For example, an NFR might require the solution to use federated resources that central teams manage. To avoid service disruptions, it's crucial to communicate your requirements effectively with those teams.

Refer to this architecture that deploys an enterprise-hardened implementation. This implementation extends the hub-and-spoke topology according to Azure landing zone principles.

Mission-critical uplift

The last step in this path is to expand the infrastructure and processes of an individual solution to support a mission-critical service-level agreement (SLA). Mission-critical refers to solutions that cause business-critical or safety-critical problems when they underperform or are unavailable.

The solution must ensure high availability, quick responsiveness to operational problems, consistent performance, and robust security. Mission-critical architectures must balance performance and resiliency requirements and targets with cost optimization.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Other contributors:


Next steps