Copy sample data into Lakehouse and transform with a dataflow with Data Factory in Microsoft Fabric
In this tutorial, we provide end-to-end steps for a common scenario: use a pipeline to load source data into a Lakehouse with a high-performance copy, and then transform the data with a dataflow, so you can easily both load and transform your data.
Prerequisites
A Microsoft Fabric-enabled workspace. If you don't already have one, refer to the article Create a workspace.
Create a data pipeline
Switch to the Data Factory experience.
Select New, then Data pipeline, and then enter a name for your pipeline.
Use a pipeline to load sample data into Lakehouse
Use the following steps to load sample data into Lakehouse.
Step 1: Start with the Copy assistant
Select Copy data on the canvas to open the Copy assistant tool and get started.
Step 2: Configure your source
Choose Public Holidays from the Sample data options as your data source, and then select Next.
In the Connect to data source section of the Copy data assistant, a preview of the sample data is displayed. Select Next to move on to the data destination.
Step 3: Configure your destination
Select the Workspace tab and choose Lakehouse. Then select Next.
Select Create new Lakehouse and enter LHDemo for the name, then select Next.
Configure and map your source data to the destination Lakehouse table by entering Table name, then select Next one more time.
Step 4: Review and create your copy activity
Review the copy activity settings you configured in the previous steps, and select Start data transfer immediately. Then select Save + Run to run the new pipeline.
Once finished, the copy activity is added to your new data pipeline canvas, and the pipeline automatically runs to load data into Lakehouse.
You can monitor the run and check the results on the Output tab below the pipeline canvas. Hover over the activity name in the output row and select the Run details button (a glasses icon) to view the run details.
The run details show that 69,557 rows were read and written, along with other information about the run, including a breakdown of its duration.
Use a Dataflow Gen2 to transform data in the Lakehouse
You now have a Lakehouse with sample data loaded. Next, you'll use a dataflow to transform the data. Dataflows are a code-free way to transform data at scale.
Select New and then Dataflow Gen2.
Select the Get data dropdown and choose More....
Search for Lakehouse and select Lakehouse in Microsoft Fabric.
Sign in, and then select Next to continue.
Select the table you created in the previous step, and then select Create.
Review the data preview in the editor.
Apply a filter to the dataflow to include only rows where the countryOrRegion column equals Belgium.
Add a data destination to the query by selecting Add data destination and then Lakehouse in Microsoft Fabric.
Sign in, and then select Next to continue.
Create a new table called BelgiumPublicHolidays, and then select Next.
Review the settings and select Save settings.
Publish the dataflow by selecting Publish.
After the dataflow is published, select Refresh now to run the dataflow.
After the refresh is complete, you can view the data in the Lakehouse table. You can also use this data now to create reports, dashboards, and more.
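Under the hood, Dataflow Gen2 expresses the steps above as a Power Query M query. The sketch below is for illustration only: the workspace and lakehouse navigation fields and the column name are assumptions, so inspect the query generated in the dataflow editor for the exact names in your environment.

```powerquery-m
let
    // Connect to the Fabric Lakehouses available to the signed-in user
    Source = Lakehouse.Contents(null),
    // Navigate to the workspace, the LHDemo lakehouse, and the sample table
    // (navigation field names here are illustrative)
    Workspace = Source{[workspaceName = "My workspace"]}[Data],
    LHDemo = Workspace{[lakehouseName = "LHDemo"]}[Data],
    PublicHolidays = LHDemo{[Id = "PublicHolidays", ItemKind = "Table"]}[Data],
    // Keep only the rows for Belgium, matching the filter applied in the UI
    // (the column name is assumed to be countryOrRegion)
    BelgiumOnly = Table.SelectRows(PublicHolidays, each [countryOrRegion] = "Belgium")
in
    BelgiumOnly
```

The filtered query is then written to the BelgiumPublicHolidays table by the data destination you configured, rather than by the query itself.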
This tutorial showed you how to copy sample data into a Lakehouse and transform it with a dataflow using Data Factory in Microsoft Fabric. You learned how to:
- Create a data pipeline.
- Use the pipeline to load sample data into Lakehouse.
- Use a dataflow to transform data in the Lakehouse.
Next, advance to learn more about monitoring your pipeline runs.