Copy sample data into Lakehouse and transform with a dataflow with Data Factory in Microsoft Fabric

In this tutorial, we provide end-to-end steps for a common scenario: use a pipeline to load source data into a Lakehouse with a high-performance copy, and then transform the data with a dataflow, so that you can easily load and transform data.

Prerequisites

A Microsoft Fabric-enabled workspace. If you don't already have one, refer to the article Create a workspace.

Create a data pipeline

  1. Switch to the Data Factory experience.

  2. Select New and then Data pipeline, and then input a name for your pipeline.

    Screenshot showing the new Data pipeline button.

    Screenshot showing the pipeline name dialog.

Use a pipeline to load sample data into Lakehouse

Use the following steps to load sample data into Lakehouse.

Step 1: Start with the Copy assistant

Select Copy data assistant on the canvas to open the copy assistant tool and get started. Alternatively, select Use copy assistant from the Copy data drop-down list under the Activities tab on the ribbon.

Screenshot showing the Copy data button on a new pipeline.

Step 2: Configure your source

  1. Choose Public Holidays from the Sample data options for your data source, and then select Next.

    Screenshot showing the Public Holidays sample data selection in the Copy data assistant.

  2. In the Connect to data source section of the Copy data assistant, a preview of the sample data is displayed. Select Next to move on to the data destination.

    Screenshot showing a preview of the Public Holiday sample data.

Step 3: Configure your destination

  1. Select Lakehouse.

    Screenshot showing the selection of the Lakehouse destination.

  2. Enter LHDemo for the Lakehouse name, then select Create and connect.

    Screenshot showing the specified name for the new Lakehouse.

  3. Configure and map your source data to the destination Lakehouse table. Select Tables for the Root folder and Load to new table for Load settings. Provide a Table name and select Next.

    Screenshot showing the table name to create in the Lakehouse destination.

Step 4: Review and create your copy activity

  1. Review the copy activity settings you configured in the previous steps, and select Start data transfer immediately. Then select Save + Run to run the new pipeline.

    Screenshot showing the Review + save window of the copy data assistant with the Start data transfer immediately checkbox checked.

  2. When you finish the assistant, the copy activity is added to your new data pipeline canvas, and the pipeline runs automatically to load the data into the Lakehouse.

    Screenshot showing the created pipeline with Copy activity and the current run in progress.

  3. You can monitor the run and check the results on the Output tab below the pipeline canvas. Hover over the name in the output row and select the Run details button (the glasses icon) to view the run details.

    Screenshot showing the run details button on the pipeline Output tab.

  4. The run details show that 69,557 rows were read and written, along with other details about the run, including a breakdown of the duration.

    Screenshot showing the run details for the successful pipeline run.
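
If you'd like to confirm the copied data from code, the following is a minimal sketch you could run in a Fabric notebook with PySpark. It assumes the LHDemo Lakehouse is attached as the notebook's default lakehouse and that the destination table was named PublicHolidays in the earlier mapping step (a hypothetical name; use whatever table name you provided).

```python
# Minimal sketch: verify the copied table from a Fabric notebook (PySpark).
# Assumes LHDemo is the notebook's default lakehouse and the destination table
# was named "PublicHolidays" (hypothetical; use the table name you provided).
df = spark.read.table("PublicHolidays")

# The pipeline run details reported 69,557 rows read and written,
# so this count should match.
print(df.count())
df.show(5)
```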

Use a Dataflow Gen2 to transform data in the Lakehouse

You now have a Lakehouse with sample data loaded. Next, you'll use a dataflow to transform the data. Dataflows are a code-free way to transform data at scale.

  1. Select your workspace, then select New item > Dataflow Gen2.

    Screenshot showing the new Dataflow button.

  2. Select the Get data drop-down, and then select More....

    Screenshot showing the get data dropdown.

  3. Search for Lakehouse and select Lakehouse.

    Screenshot showing the Lakehouse in Microsoft Fabric option.

  4. Sign in and select Next to continue.

    Screenshot showing the sign-in dialog.

  5. Select the table you created in the previous step and select Create.

    Screenshot showing the selection of the table created in the previous step.

  6. Review the data preview in the editor.

    Screenshot showing the data preview in the dataflow editor.

  7. Apply a filter to the dataflow to include only rows where the Countryorregion column is equal to Belgium. (A code-based equivalent of this filter is sketched after this list.)

    Screenshot showing the filter applied to the dataflow.

  8. Add a data destination to the query by selecting Add data destination and then Lakehouse in Microsoft Fabric.

    Screenshot showing the add data destination button.

  9. Sign in and select Next to continue.

    Screenshot showing the sign-in dialog.

  10. Create a new table called BelgiumPublicHolidays and select Next.

    Screenshot showing the create new table dialog.

  11. Review the settings and select Save settings.

    Screenshot showing the review settings dialog.

  12. Publish the dataflow by selecting Publish.

    Screenshot showing the publish button.

  13. After the dataflow is published, select Refresh now to run the dataflow.

    Screenshot showing the refresh now button.
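
The dataflow steps above are entirely code-free, but if you prefer to express the same transformation in code, here is a minimal sketch of an equivalent PySpark approach you could run in a Fabric notebook. It isn't part of the tutorial's dataflow; it assumes LHDemo is the notebook's default lakehouse, the source table is named PublicHolidays (hypothetical), and the country/region column is named countryOrRegion (adjust both to match your table).

```python
from pyspark.sql import functions as F

# Minimal sketch: a notebook-based equivalent of the dataflow's filter and
# destination steps, not the dataflow itself. Assumes LHDemo is the default
# lakehouse, the source table is "PublicHolidays" (hypothetical), and the
# country/region column is "countryOrRegion" (adjust to your table's schema).
holidays = spark.read.table("PublicHolidays")

# Keep only the rows for Belgium, mirroring the filter applied in the dataflow.
belgium = holidays.filter(F.col("countryOrRegion") == "Belgium")

# Write the result as a new Delta table, equivalent to the dataflow's
# Lakehouse destination table BelgiumPublicHolidays.
belgium.write.mode("overwrite").saveAsTable("BelgiumPublicHolidays")
```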

After the refresh is complete, you can view the data in the Lakehouse table. You can also use this data now to create reports, dashboards, and more.
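
For example, a quick way to preview the transformed table from a notebook (again assuming LHDemo is the notebook's default lakehouse) is a simple Spark SQL query:

```python
# Preview the transformed table before building reports on it.
spark.sql("SELECT * FROM BelgiumPublicHolidays LIMIT 10").show()
```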

This sample shows you how to copy sample data to Lakehouse and transform the data with a dataflow using Data Factory in Microsoft Fabric. You learned how to:

  • Create a data pipeline.
  • Use the pipeline to load sample data into Lakehouse.
  • Use a dataflow to transform data in the Lakehouse.

Next, advance to learn more about monitoring your pipeline runs.