In this tutorial, you ingest more dimensional and fact tables from the Wide World Importers (WWI) sample dataset into the lakehouse.
In this section, you use the Copy data activity of the Data Factory pipeline to ingest sample data from an Azure storage account to the Files section of the lakehouse you created earlier.
Select Workspaces in the left navigation pane, and then select your new workspace from the Workspaces menu. The items view of your workspace appears.
From the New item option in the workspace ribbon, select Data pipeline.
In the New pipeline dialog box, specify the name as IngestDataFromSourceToLakehouse and select Create. A new data factory pipeline is created and opened.
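If you prefer to script this step instead of using the portal, the Fabric REST API can create the same empty pipeline item. The sketch below is an assumption-laden illustration: the Items endpoint path, the DataPipeline item type name, and the token acquisition are not part of this tutorial.

```python
# Hypothetical sketch: create the same empty data pipeline through the Fabric REST API.
# The endpoint path, item type name, and token handling are assumptions, not tutorial steps.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<your-workspace-guid>"        # assumption: you already know the workspace ID
token = "<access-token-for-Fabric>"           # assumption: acquired elsewhere (for example, azure-identity)

response = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/items",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "displayName": "IngestDataFromSourceToLakehouse",
        "type": "DataPipeline",               # assumption: item type used for Data Factory pipelines
    },
)
response.raise_for_status()
print(response.status_code, response.json())
```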
Next, set up an HTTP connection to import the sample Wide World Importers data into the lakehouse. From the list of New sources, select View more, search for Http, and select it.
In the Connect to data source window, enter the details from the table below and select Next.
| Property | Value |
|---|---|
| URL | https://assetsprod.microsoft.com/en-us/wwi-sample-dataset.zip |
| Connection | Create a new connection |
| Connection name | wwisampledata |
| Data gateway | None |
| Authentication kind | Anonymous |
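As an optional aside (not part of the pipeline setup), you can confirm that the sample dataset URL from the table responds before creating the connection. A minimal check in Python might look like this; a streamed GET is used in case the server doesn't answer HEAD requests.

```python
# Optional check: confirm the WWI sample dataset URL responds before creating the connection.
import requests

url = "https://assetsprod.microsoft.com/en-us/wwi-sample-dataset.zip"
resp = requests.get(url, stream=True, timeout=30)   # stream=True avoids downloading the whole file
print(resp.status_code, resp.headers.get("Content-Length"), resp.headers.get("Content-Type"))
resp.close()
```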
In the next step, enable Binary copy and choose ZipDeflate (.zip) as the Compression type, since the source is a .zip file. Keep the other fields at their default values and select Next.
In the Connect to data destination window, specify the Root folder as Files and select Next. This writes the data to the Files section of the lakehouse.
Choose Binary as the File format for the destination. Select Next and then Save + Run. You can schedule pipelines to refresh data periodically; in this tutorial, you run the pipeline only once. The data copy process takes approximately 10-15 minutes to complete.
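Conceptually, the Copy activity performs an HTTP download, inflates the zip archive, and writes the contents to the Files area of the lakehouse. If you ever need to reproduce that manually from a Fabric notebook with wwilakehouse attached as the default lakehouse, a rough sketch follows; the /lakehouse/default/Files mount path and the target folder name are assumptions, and the tutorial itself uses the pipeline.

```python
# Rough manual equivalent of the Copy activity, run from a Fabric notebook with
# wwilakehouse attached as the default lakehouse (assumed to be mounted at /lakehouse/default).
import io
import zipfile
import requests

SOURCE_URL = "https://assetsprod.microsoft.com/en-us/wwi-sample-dataset.zip"
TARGET_DIR = "/lakehouse/default/Files/wwi-raw-data"   # lands in the Files section of the lakehouse

payload = requests.get(SOURCE_URL, timeout=600)
payload.raise_for_status()

# ZipDeflate in the Copy activity corresponds to plain zip extraction here.
with zipfile.ZipFile(io.BytesIO(payload.content)) as archive:
    archive.extractall(TARGET_DIR)
    print("Extracted", len(archive.namelist()), "entries to", TARGET_DIR)
```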
You can monitor the pipeline execution and activity in the Output tab. You can also view detailed data transfer information by selecting the glasses icon next to the pipeline name, which appears when you hover over the name.
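If you trigger pipeline runs programmatically rather than from the portal, you can also poll the run status over the Fabric REST API. The loop below is a hypothetical sketch: the job instances endpoint path and the status values are assumptions, not taken from this tutorial.

```python
# Hypothetical polling loop against a Fabric job instances endpoint; the endpoint path
# and status field values are assumptions, not documented steps from this tutorial.
import time
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<your-workspace-guid>"
pipeline_item_id = "<pipeline-item-guid>"
job_instance_id = "<job-instance-guid>"     # assumed to be returned when the run was started
token = "<access-token-for-Fabric>"

while True:
    resp = requests.get(
        f"{FABRIC_API}/workspaces/{workspace_id}/items/{pipeline_item_id}"
        f"/jobs/instances/{job_instance_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    status = resp.json().get("status")
    print("Pipeline run status:", status)
    if status in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(60)   # the copy typically takes 10-15 minutes, so poll sparingly
```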
After the successful execution of the pipeline, go to your lakehouse (wwilakehouse) and open the explorer to see the imported data.
Verify that the folder WideWorldImportersDW is present in the Explorer view and contains data for all tables.
The data is created under the Files section of the lakehouse explorer. A new folder with a GUID name contains all the needed data. Rename the GUID folder to wwi-raw-data.
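You can also do this check and rename from a Fabric notebook attached to the lakehouse using mssparkutils, which addresses the Files section through relative "Files/" paths. The folder name below is a placeholder you would copy from the listing.

```python
# Optional notebook-based check and rename, assuming wwilakehouse is the attached default lakehouse.
from notebookutils import mssparkutils   # available by default in Fabric notebooks

# List what the pipeline wrote into the Files section.
for entry in mssparkutils.fs.ls("Files/"):
    print(entry.name, entry.size)

# If the data landed in a GUID-named folder, rename it to wwi-raw-data.
guid_folder = "<guid-folder-name>"        # placeholder: copy the name from the listing above
mssparkutils.fs.mv(f"Files/{guid_folder}", "Files/wwi-raw-data")
```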
To load incremental data into a lakehouse, see Incrementally load data from a data warehouse to a lakehouse.