Load sample data into Data Warehouse

In this tutorial, you build a data pipeline to move a sample dataset into a Data Warehouse. This experience gives you a quick demonstration of how to use the pipeline Copy activity to load data into a Data Warehouse.

Prerequisites

To get started, you must complete the following prerequisites:

  • A Microsoft Fabric tenant account with an active subscription. Create an account for free.
  • Make sure you have a Microsoft Fabric-enabled workspace: Create a workspace.
  • Make sure you have already created a Data Warehouse. To create one, refer to Create a Data Warehouse.

Create a data pipeline

  1. Navigate to Power BI.

  2. Select the Power BI icon in the bottom left of the screen, then select Data factory to open the Data Factory home page.

  3. Navigate to your Microsoft Fabric workspace. If you created a new workspace in the Prerequisites section, use that one.

    Screenshot of the workspaces window where you navigate to your workspace.

  4. Select Data pipeline, and then enter a pipeline name to create a new pipeline. A sketch of a programmatic alternative follows these steps.

    Screenshot showing the new data pipeline button in the newly created workspace.

    Screenshot showing the dialog for naming the new pipeline.
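
    If you prefer to create the pipeline from code rather than through the UI, the Microsoft Fabric REST API exposes an items endpoint that can create a data pipeline item. The following is a minimal sketch, assuming you already have a Fabric API access token and your workspace ID; the placeholders are hypothetical and the payload shape should be verified against the current Fabric REST API reference.

```python
import requests

# Assumptions: you already acquired an access token for the Fabric API
# (for example, with azure-identity) and know your workspace ID.
access_token = "<your-access-token>"   # hypothetical placeholder
workspace_id = "<your-workspace-id>"   # hypothetical placeholder

# Create a new data pipeline item in the workspace.
# Endpoint shape based on the Fabric REST API "Items - Create Item" operation;
# verify against the current API reference before relying on it.
response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"displayName": "Load sample data tutorial", "type": "DataPipeline"},
)
response.raise_for_status()
print(response.json())  # returns the created item's ID and metadata
```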

Copy data using pipeline

In this section, you start to build your pipeline by following the steps below to copy data from a sample dataset provided by the pipeline into a Data Warehouse.

Step 1: Start with the Copy assistant

  1. Select Copy data assistant on the canvas to open the copy assistant tool and get started. Alternatively, select Use copy assistant from the Copy data drop-down list on the Activities tab of the ribbon.

    Screenshot showing the Copy data button on a new pipeline.

Step 2: Configure your source

  1. Choose NYC Taxi - Green from the Sample data options for your data source.

    Screenshot showing the NYC Taxi - Green sample data selection in the Copy data assistant.

  2. In the Connect to data source section of the Copy data assistant, a preview of the NYC Taxi - Green sample data is displayed. Select Next to move on to the data destination.

    Screenshot showing a preview of the NYC Taxi - Green sample data.

Step 3: Configure your destination

  1. Select the OneLake tab and choose an existing Warehouse.

    Screenshot showing the selection of the Warehouse destination.

  2. Configure and map your source data to the destination Warehouse table by entering a Table name, and then select Next one more time.

    Screenshot showing the table name to create in the Warehouse destination.

  3. Configure other settings on the Settings page. For this tutorial, select Next directly, since you don't need to use staging or the COPY command.

    Screenshot showing the destination settings.

Step 4: Review and run your copy activity

  1. Review your copy activity settings from the previous steps and select Save + Run to start the activity. You can also revisit the previous steps in the tool to edit your settings, if needed.

    Screenshot of the Review + create page of the Copy data assistant highlighting source and destination.

  2. The Copy activity is added to your new data pipeline canvas. All settings, including advanced settings for the activity, are available in the tabs below the pipeline canvas when the Copy activity is selected.

    Screenshot showing the completed Copy activity in pipeline canvas.
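
    To see how the assistant's choices come together, it can help to picture the pipeline's underlying definition: the Copy activity pairs the sample data source with the Warehouse table you chose as the sink. The Python sketch below is a rough illustration of that shape only; the property names and table name are assumptions, and the real definition is generated and managed by Data Factory in Fabric.

```python
# Rough, illustrative sketch of the shape of a Copy activity definition.
# Property names here are assumptions for illustration only; the real schema
# is generated and managed by Data Factory in Fabric and may differ.
copy_activity = {
    "name": "Copy sample data to Warehouse",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "SampleDataSource", "dataset": "NYC Taxi - Green"},
        "sink": {
            "type": "DataWarehouseSink",
            "table": "dbo.NYC_Taxi_Green",  # hypothetical name entered in Step 3
        },
    },
}

pipeline_definition = {
    "name": "Load sample data tutorial",
    "activities": [copy_activity],
}
print(pipeline_definition)
```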

Run and schedule your data pipeline

  1. You can monitor the running process and check the results on the Output tab below the pipeline canvas. Select the run details button (the glasses icon) to view the run details.

    Screenshot showing the Output tab of the pipeline run in-progress with the Details button highlighted in the run status.

  2. The run details show how much data was read and written and various other details about the run.

    Screenshot showing the run details window.
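
    After the run completes, you can confirm that the copied rows landed in the Warehouse by querying the destination table over its SQL connection string. The sketch below uses pyodbc; the server, database, and table names are hypothetical placeholders, and it assumes the ODBC Driver 18 for SQL Server with Microsoft Entra interactive authentication, which may differ in your environment.

```python
import pyodbc

# Assumptions: replace the placeholders with your Warehouse's SQL connection
# details; the authentication method may differ in your tenant.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-warehouse-sql-endpoint>;"   # hypothetical placeholder
    "Database=<your-warehouse-name>;"         # hypothetical placeholder
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

# Count the rows written by the Copy activity. The table name is the one you
# entered in the destination step (shown here as a hypothetical example).
row_count = conn.execute("SELECT COUNT(*) FROM dbo.[NYC_Taxi_Green]").fetchval()
print(f"Rows in destination table: {row_count}")
conn.close()
```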

  3. You can also schedule the pipeline to run at a specific frequency as required. The following example schedules the pipeline to run every 15 minutes. You can also specify a Start time and End time for your schedule. If you don't specify a start time, the schedule starts at the time it's applied. If you don't specify an end time, the pipeline run keeps recurring every 15 minutes.

    Screenshot showing the schedule dialog for the pipeline with a 15-minute recurring schedule.
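
    To make the recurrence behavior concrete, the short sketch below enumerates the run times a 15-minute schedule would produce between an explicit start and end time. It's a plain illustration of the frequency described above, not how Fabric evaluates schedules internally, and the times are hypothetical.

```python
from datetime import datetime, timedelta

# Illustrative only: list the occurrences of a 15-minute recurrence
# between a hypothetical start time and end time.
start = datetime(2024, 1, 1, 9, 0)
end = datetime(2024, 1, 1, 11, 0)
interval = timedelta(minutes=15)

run_time = start
while run_time <= end:
    print(run_time.isoformat())  # 09:00, 09:15, 09:30, ... up to 11:00
    run_time += interval
```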

This sample showed you how to load sample data into a Data Warehouse using Data Factory in Microsoft Fabric. You learned how to:

  • Create a data pipeline.
  • Copy data using your pipeline.
  • Run and schedule your data pipeline.
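
Besides the scheduler, you can also trigger the pipeline on demand from code. The sketch below uses the Fabric REST API job scheduler endpoint to start a pipeline run; the token, IDs, and exact endpoint behavior are assumptions to verify against the current Fabric REST API reference.

```python
import requests

# Assumptions: a valid Fabric API access token, plus the workspace and
# pipeline (item) IDs; verify the endpoint against the Fabric REST API docs.
access_token = "<your-access-token>"   # hypothetical placeholder
workspace_id = "<your-workspace-id>"   # hypothetical placeholder
pipeline_id = "<your-pipeline-id>"     # hypothetical placeholder

# Start an on-demand run of the pipeline (job type "Pipeline").
response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline",
    headers={"Authorization": f"Bearer {access_token}"},
)
response.raise_for_status()
# The job instance URL for polling run status is typically returned
# in the Location response header.
print(response.headers.get("Location"))
```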

Next, advance to learn more about monitoring your pipeline runs.