Copy sample data into a Lakehouse and transform it with a dataflow using Data Factory in Microsoft Fabric

In this tutorial, we provide end-to-end steps for a common scenario: use a pipeline to load source data into a Lakehouse with a high-performance copy, and then transform the data with a dataflow, so that users can easily load and transform data.

Prerequisites

A Microsoft Fabric enabled workspace. If you don't already have one, refer to the article Create a workspace.

Create a data pipeline

  1. Switch to the Data Factory experience.

    Screenshot showing the selection of the Data Factory experience.

  2. Select New, then Data pipeline, and enter a name for your pipeline.

    Screenshot showing the new Data pipeline button.

    Screenshot showing the pipeline name dialog.

Use a pipeline to load sample data into a Lakehouse

Use the following steps to load sample data into a new Lakehouse.

Step 1: Start with the Copy assistant

Select Copy data on the canvas to open the Copy assistant tool and get started.

Screenshot showing the Copy data button on a new pipeline.

Step 2: Configure your source

  1. Choose Public Holidays from the Sample data options for your data source, and then select Next.

    Screenshot showing the Public Holidays sample data selection in the Copy data assistant.

  2. In the Connect to data source section of the Copy data assistant, a preview of the sample data is displayed. Select Next to move on to the data destination.

    Screenshot showing a preview of the Public Holiday sample data.

Step 3: Configure your destination

  1. Select the Workspace tab and choose Lakehouse. Then select Next.

    Screenshot showing the selection of the Lakehouse destination.

  2. Select Create new Lakehouse and enter LHDemo for the name, then select Next.

    Screenshot showing the Create new lakehouse option with the name LHDemo specified for the new Lakehouse.

  3. Configure and map your source data to the destination Lakehouse table by entering a Table name, and then select Next one more time.

    Screenshot showing the table name to create in the Lakehouse destination.

Step 4: Review and create your copy activity

  1. Review the copy activity settings you chose in the previous steps, select Start data transfer immediately, and then select Save + Run to run the new pipeline.

    Screenshot showing the Review + save window of the copy data assistant with the Start data transfer immediately checkbox checked.

  2. When the assistant finishes, the copy activity is added to your new data pipeline canvas, and the pipeline runs automatically to load the data into the Lakehouse.

    Screenshot showing the created pipeline with Copy activity and the current run in progress.

  3. You can monitor the run and check the results on the Output tab below the pipeline canvas. Hover over the pipeline name in the output row to show the Run details button (an icon of a pair of glasses), and select it to view the run details.

    Screenshot showing the run details button on the pipeline Output tab.

  4. The run details show that 69,557 rows were read and written, along with various other details about the run, including a breakdown of the duration. If you want, you can also verify the copied table from a notebook, as shown in the sketch after this list.

    Screenshot showing the run details for the successful pipeline run.
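If you'd like to double-check the copied data outside the pipeline UI, you can read the new table from a Microsoft Fabric notebook. The following is a minimal sketch, not a required tutorial step: it assumes the built-in spark session that Fabric notebooks provide, that LHDemo is the notebook's default Lakehouse, and that you entered PublicHolidays as the table name in the Copy assistant (substitute the name you chose).

```python
# Optional check from a Fabric notebook with LHDemo set as the default Lakehouse.
# "PublicHolidays" is an assumed table name -- use the name you entered in the Copy assistant.
df = spark.read.table("PublicHolidays")

print(df.count())           # should match the rows written by the copy activity (69,557 here)
df.show(5, truncate=False)  # quick preview of the first few rows
```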

Use a Dataflow Gen2 to transform data in the Lakehouse

You now have a Lakehouse with sample data loaded. Next, you'll use a dataflow to transform the data. Dataflows are a code-free way to transform data at scale.

  1. Select New and then Dataflow Gen2.

    Screenshot showing the new Dataflow button.

  2. Select the Get data dropdown, and then select More....

    Screenshot showing the get data dropdown.

  3. Search for Lakehouse and select Lakehouse in Microsoft Fabric.

    Screenshot showing the Lakehouse in Microsoft Fabric option.

  4. Sign in and select Next to continue.

    Screenshot showing the sign-in dialog.

  5. Choose the table you created in the previous step and select Create.

    Screenshot showing the selection of the table created in the previous step.

  6. Review the data preview in the editor.

    Screenshot showing the data preview in the dataflow editor.

  7. Apply a filter to the dataflow to include only rows where the Countryorregion column equals Belgium. An equivalent notebook sketch of this filter appears after this list.

    Screenshot showing the filter applied to the dataflow.

  8. Add a data destination to the query by selecting Add data destination and then Lakehouse in Microsoft Fabric.

    Screenshot showing the add data destination button.

  9. Sign in and select Next to continue.

    Screenshot showing the sign-in dialog.

  10. Create a new table called BelgiumPublicHolidays and select Next.

    Screenshot showing the create new table dialog.

  11. Review the settings and select Save settings.

    Screenshot showing the review settings dialog.

  12. Publish the dataflow by selecting Publish.

    Screenshot showing the publish button.

  13. After the dataflow is published, select Refresh now to run the dataflow.

    Screenshot showing the refresh now button.
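The dataflow editor builds the filter step for you, so no code is required. If you'd like to verify the same transformation in a notebook, an equivalent filter can be sketched in PySpark as shown below. This isn't the dataflow's own Power Query step; it assumes a Fabric notebook with LHDemo as the default Lakehouse, a source table named PublicHolidays, and a column named Countryorregion (the exact names may differ in your data).

```python
from pyspark.sql import functions as F

# PySpark equivalent of the dataflow's filter step (assumed table and column names).
holidays = spark.read.table("PublicHolidays")
belgium_only = holidays.filter(F.col("Countryorregion") == "Belgium")

belgium_only.show(10, truncate=False)
```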

After the refresh is complete, you can view the data in the Lakehouse table. You can also use this data to create reports, dashboards, and more.
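For example, you could read the table the dataflow produced from a notebook and use it as the basis for further analysis or reporting. A minimal sketch, again assuming a Fabric notebook with LHDemo as its default Lakehouse and the BelgiumPublicHolidays table name from the previous steps:

```python
# Read the table written by the dataflow (assumes LHDemo is the default Lakehouse).
belgium = spark.read.table("BelgiumPublicHolidays")

print(belgium.count())            # number of rows kept by the Belgium filter
belgium.show(10, truncate=False)  # preview the transformed data
```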

This sample showed you how to copy sample data into a Lakehouse and transform the data with a dataflow using Data Factory in Microsoft Fabric. You learned how to:

  • Create a data pipeline.
  • Use a pipeline to load sample data into the Lakehouse.
  • Use a dataflow to transform data in the Lakehouse.

Next, advance to learn more about monitoring your pipeline runs.