Implement medallion architecture with materialized lake views

This tutorial outlines the steps and considerations for implementing a medallion architecture using materialized lake views. By the end of this tutorial, you'll learn the key features and capabilities of materialized lake views and be able to create an automated data transformation workflow. This tutorial isn't intended to be a reference architecture, an exhaustive list of features and functionality, or a recommendation of specific best practices.

Prerequisites

As prerequisites to this tutorial, complete the following steps:

  1. Sign in to your Power BI account, or if you don't have an account yet, sign up for a free trial.
  2. Enable Microsoft Fabric in your tenant. Select the default Power BI icon at the bottom left of the screen and select Fabric.
  3. Create a Microsoft Fabric-enabled workspace.
  4. Select a workspace from the Workspaces tab, then select + New item, and choose Data pipeline. Provide a name for your pipeline and select Create.
  5. Create a Lakehouse with schemas enabled. Name it SalesLakehouse and load sample data files into the Lakehouse. For more information, see Lakehouse tutorial.

Scenario overview

In this tutorial, you take the example of a fictional retail organization, Contoso, which uses a medallion architecture for data analytics to gain actionable insights into its retail sales operations. Contoso aims to streamline the analysis process and generate deeper insights into business performance by organizing its data into three layers: bronze (raw data), silver (cleaned and enriched data), and gold (aggregated and analyzed data).

The following diagram represents different entities in each layer of medallion architecture in SalesLakehouse:

Screenshot showing medallion architecture.

Entities

  1. Orders: This entity includes details about each customer order, such as order date, shipment details, product category, and subcategory. Insights can be drawn to optimize shipping strategies, identify popular product categories, and improve order management.

  2. Sales: By analyzing sales data, Contoso can assess key metrics like total revenue, profit margins, order priorities, and discounts. Correlations between these factors provide a clearer understanding of customer purchasing behaviors and the efficiency of discount strategies.

  3. Location: This captures the geographical dimension of sales and orders, including cities, states, regions, and customer segments. It helps Contoso identify high-performing regions, address low-performing areas, and personalize strategies for specific customer segments.

  4. Agent performance: With details on agents managing transactions, their commissions, and sales data, Contoso can evaluate individual agent performance, incentivize top performers, and design effective commission structures.

  5. Agent commissions: Incorporating commission data ensures transparency and enables better cost management. Understanding the correlation between commission rates and agent performance helps refine incentive systems.

Sample dataset

Contoso maintains the raw data for its retail operations in CSV format within ADLS Gen2. You use this data to create the bronze layer, and then build materialized lake views on top of it to form the silver and gold layers of the medallion architecture. First, download the sample CSV files from the Fabric samples repo.

Create the pipeline

The high-level steps are as follows:

  1. Bronze Layer: Ingest raw data in the form of CSV files into the lakehouse.
  2. Silver Layer: Cleanse data using materialized lake views.
  3. Gold Layer: Curate data for analytics and reporting using materialized lake views.
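
Both the silver and gold layers are defined in Spark SQL with the CREATE MATERIALIZED LAKE VIEW statement. The following is a minimal sketch of the pattern; the schema, table, and column names are illustrative, not the exact names used in the sample dataset:

```sql
-- Silver layer: a materialized lake view that cleanses raw bronze data.
-- Table and column names are illustrative assumptions.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_cleaned
AS
SELECT
    order_id,
    CAST(order_date AS DATE) AS order_date,
    TRIM(product_category) AS product_category,
    TRIM(product_subcategory) AS product_subcategory
FROM bronze.orders
WHERE order_id IS NOT NULL;
```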

Create bronze layer of sales analytics medallion architecture

  1. Load the CSV files corresponding to different entities from the downloaded data into the lakehouse. To do so, navigate to your lakehouse and upload the downloaded data into the Files section of the lakehouse. The upload creates a folder named tutorial.

  2. Next, create a shortcut to it from the Tables section. Select ... next to the Tables section, select New schema shortcut, and then select Microsoft OneLake. Choose SalesLakehouse from the data source types. Expand the Files section, choose the tutorial folder, and select Create. You can also use other options to get data into the lakehouse, as shown in the sketch after these steps.

    Screenshot showing how to create a shortcut to get the data into tables.

  3. From the Tables section, rename the tutorial folder to bronze.

    Screenshot showing creating bronze layer.
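
As an alternative to the shortcut approach in step 2, you could register bronze tables over the uploaded CSV files directly from a notebook. A minimal Spark SQL sketch, assuming one of the uploaded files is Files/tutorial/orders.csv (the file name is an assumption):

```sql
-- Register a bronze table over an uploaded CSV file. The relative
-- Files/ path resolves against the lakehouse attached to the notebook;
-- the file name is an assumption.
CREATE TABLE IF NOT EXISTS bronze.orders
USING CSV
OPTIONS (
    path 'Files/tutorial/orders.csv',
    header 'true',
    inferSchema 'true'
);
```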

Create silver and gold layers of medallion architecture

  1. Upload the downloaded notebook file to your workspace.

    Screenshot showing silver materialized lake view creation.

  2. Open the Notebook from the Lakehouse. For more information, see Explore the lakehouse data with a notebook.

  3. Run all cells of the notebook, which uses Spark SQL to create materialized lake views with data quality constraints. Once all cells execute successfully, refresh the SalesLakehouse source to view the newly created materialized lake views in the silver and gold schemas. The sketch after these steps illustrates the kind of statements the notebook runs.

    Screenshot showing run notebook.
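
The notebook's statements follow this general shape: a silver materialized lake view that enforces a data quality constraint, and a gold materialized lake view that aggregates it for reporting. This is a hedged sketch with illustrative names, not the notebook's exact code; the CONSTRAINT ... CHECK ... ON MISMATCH DROP clause drops rows that fail the check during refresh instead of failing the run.

```sql
-- Silver: cleanse bronze sales data with a data quality constraint.
-- Rows with a NULL order_id are dropped during refresh rather than
-- failing the run. All names here are illustrative assumptions.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.sales_cleaned
(
    CONSTRAINT non_null_order_id CHECK (order_id IS NOT NULL) ON MISMATCH DROP
)
AS
SELECT order_id, region, revenue, discount
FROM bronze.sales;

-- Gold: aggregate the silver view for analytics and reporting.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.revenue_by_region
AS
SELECT
    region,
    SUM(revenue) AS total_revenue,
    AVG(discount) AS avg_discount
FROM silver.sales_cleaned
GROUP BY region;
```

Because gold.revenue_by_region reads from silver.sales_cleaned, that dependency is detected automatically and appears as an edge in the lineage view described in the next section.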

Schedule the pipeline

  1. Once the materialized lake views for the silver and gold layers are created, navigate to the lakehouse and select Managed materialized lake view to see the lineage view. The lineage is autogenerated based on dependencies; each dependent materialized lake view forms a node in the lineage.

    Screenshot showing materialized lake view.

    Screenshot showing creation of lineage.

  2. Select Schedule from the navigation ribbon. Turn on the refresh and configure the schedule.

    Screenshot showing scheduling a run of the materialized lake views.

Monitoring and troubleshooting

  1. The dropdown menu lists the current and historical runs.

    Screenshot showing scheduling execution.

  2. By selecting any of the runs, you can find the materialized lake view details in the right side panel. The bottom activity panel provides a high-level overview of node execution status.

    Screenshot showing execution details.

  3. Select any node in the lineage to see the node execution details and a link to detailed logs. If the node status is Failed, an error message is also displayed.

    Screenshot showing execution detail logs.

  4. Selecting the Detailed logs link redirects you to the Monitor Hub, where you can access Spark error logs for further troubleshooting.

    Screenshot showing spark logs.

  5. Select the Data quality report button on the ribbon of the materialized lake views page to create or view an autogenerated data quality report.