Use a dataflow in a pipeline

In this tutorial, you build a data pipeline to move data from a Northwind OData source to a lakehouse destination and send an email notification when the pipeline completes.

Prerequisites

To get started, you must complete the following prerequisite:

  • A workspace with Microsoft Fabric enabled.

Create a lakehouse

First, you need to create a lakehouse. A lakehouse is a data lake that's optimized for analytics. In this tutorial, you create a lakehouse to use as the destination for the dataflow.

  1. Go to your Fabric enabled workspace.

    Screenshot of the Fabric enabled workspace emphasized.

  2. Select Lakehouse in the create menu.

    Screenshot of the create menu with Create Lakehouse emphasized.

  3. Enter a Name for the lakehouse.

  4. Select Create.

Now that you've created a lakehouse, you can set up the dataflow.
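The portal steps above are all you need for this tutorial. If you'd rather script lakehouse creation, the Fabric REST API also exposes an endpoint for it. The following Python sketch is a minimal example, assuming you already have a Microsoft Entra access token with Fabric API permissions and your workspace ID; the lakehouse name and placeholder values are yours to supply.

```python
import requests

# Assumptions (replace with your own values):
# - FABRIC_TOKEN is a Microsoft Entra access token authorized for the Fabric REST API.
# - WORKSPACE_ID is the GUID of your Fabric enabled workspace.
FABRIC_TOKEN = "<access-token>"
WORKSPACE_ID = "<workspace-id>"


def create_lakehouse(name: str) -> dict:
    """Create a lakehouse in the workspace through the Fabric REST API."""
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/lakehouses"
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {FABRIC_TOKEN}"},
        json={"displayName": name},
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Hypothetical name for this tutorial's lakehouse.
    lakehouse = create_lakehouse("TutorialLakehouse")
    print(lakehouse.get("id"), lakehouse.get("displayName"))
```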

Create a dataflow

A dataflow is a reusable data transformation that can be used in a pipeline. In this tutorial, you create a dataflow that gets data from an OData source and writes the data to a lakehouse destination.

  1. Go to your Fabric enabled workspace.

    Screenshot of the Fabric enabled workspace.

  2. Select Dataflow Gen2 in the create menu.

    Screenshot of the Dataflow Gen2 selection under the new menu.

  3. Ingest the data from the OData source.

    1. Select Get data, and then select More.

      Screenshot of the Get data menu with More emphasized.

    2. From Choose data source, search for OData, and then select the OData connector.

      Screenshot of the Get data menu with OData emphasized.

    3. Enter the URL of the OData source. For this tutorial, use the OData sample service.

    4. Select Next.

    5. Select the Entity that you want to ingest. In this tutorial, use the Orders entity.

      Screenshot of the OData preview.

    6. Select Create.
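If you want to preview the same data outside the dataflow editor, you can query the OData feed directly. The sketch below assumes the tutorial's sample source is the public Northwind OData service at services.odata.org and that you kept the Orders entity; adjust the URL and fields if your source differs.

```python
import requests

# Assumption: the "OData sample service" used in this tutorial is the public
# Northwind feed. Swap in your own OData URL if you used a different source.
ODATA_SERVICE = "https://services.odata.org/V4/Northwind/Northwind.svc"

# Fetch a small sample of the Orders entity as JSON, using standard OData
# query options to limit the rows and columns returned.
response = requests.get(
    f"{ODATA_SERVICE}/Orders",
    params={"$top": 5, "$select": "OrderID,CustomerID,OrderDate"},
    headers={"Accept": "application/json"},
)
response.raise_for_status()

for order in response.json()["value"]:
    print(order["OrderID"], order["CustomerID"], order["OrderDate"])
```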

Now that you've ingested the data from the OData source, you can set up the lakehouse destination.

To ingest the data into the lakehouse destination:

  1. Select Add data destination.

  2. Select Lakehouse.

    Screenshot of the Add output destination menu with lakehouse emphasized.

  3. Configure the connection to use for the lakehouse. The default settings are fine for this tutorial.

  4. Select Next.

  5. Navigate to the workspace where you created the lakehouse.

  6. Select the lakehouse that you created in the previous steps.

    Screenshot of the selected lakehouse.

  7. Confirm the table name.

  8. Select Next.

  9. Confirm the update method and select Save settings.

    Screenshot of the update methods, with replace selected.

  10. Publish the dataflow.

    Important

    When the first Dataflow Gen2 is created in a workspace, Lakehouse and Warehouse items are provisioned along with their related SQL analytics endpoint and semantic models. These items are shared by all dataflows in the workspace, are required for Dataflow Gen2 to operate, shouldn't be deleted, and aren't intended for direct use. They're an implementation detail of Dataflow Gen2. The items aren't visible in the workspace, but might be accessible in other experiences such as the Notebook, SQL analytics endpoint, Lakehouse, and Warehouse experiences. You can recognize these items by the `DataflowsStaging` prefix in their names.
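Before wiring the dataflow into a pipeline, you can confirm the data landed by querying the lakehouse from a notebook. This is a minimal sketch, assuming a Fabric notebook with your tutorial lakehouse attached as the default lakehouse and a destination table that kept the default name Orders:

```python
# Run inside a Microsoft Fabric notebook with the tutorial lakehouse attached
# as the default lakehouse; the notebook runtime provides the `spark` session.

# Assumption: the dataflow destination table kept the default name "Orders".
orders_df = spark.read.table("Orders")

print(f"Row count: {orders_df.count()}")
orders_df.select("OrderID", "CustomerID", "OrderDate").show(5)
```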

Now that you've ingested the data into the lakehouse destination, you can set up your data pipeline.

Create a data pipeline

A data pipeline is a workflow that can be used to automate data processing. In this tutorial, you create a data pipeline that runs the Dataflow Gen2 that you created in the previous procedure.

  1. Navigate back to the workspace overview page and select Data pipeline in the create menu.

    Screenshot of the Data Pipeline selection.

  2. Provide a Name for the data pipeline.

  3. Select the Dataflow activity.

    Screenshot of the dataflow activity emphasized.

  4. Select the Dataflow that you created in the previous procedure in the Dataflow dropdown list under Settings.

    Screenshot of the dataflow dropdown list.

  5. Add an Office 365 Outlook activity.

    Screenshot emphasizing how to select an Office 365 Outlook activity.

  6. Configure the Office 365 Outlook activity to send an email notification.

    1. Authenticate with your Office 365 account.

    2. Select the Email address that you want to send the notification to.

    3. Enter a Subject for the email.

    4. Enter a Body for the email.

      Screenshot showing the Office 365 Outlook activity settings.

Run and schedule the data pipeline

In this section, you run the data pipeline and set up a schedule so that it runs automatically.

  1. Go to your workspace.

  2. Open the dropdown menu of the data pipeline that you created in the previous procedure, and then select Schedule.

    Screenshot of the pipeline menu with schedule emphasized.

  3. In Scheduled run, select On.

    Screenshot of scheduled run set to On.

  4. Provide the schedule you want to use to run the data pipeline.

    1. Repeat, for example, every Day or every Minute.
    2. If you select Daily, you can also select the Time.
    3. Start On a specific Date.
    4. End On a specific Date.
    5. Select the Timezone.
  5. Select Apply to apply the changes.

You've now created a data pipeline that runs on a schedule, refreshes the data in the lakehouse, and sends you an email notification. You can check the status of the data pipeline in the Monitor hub, or by opening the data pipeline's dropdown menu and selecting the Run history tab.
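If you also want to start a run outside the scheduler, for example from a script or another orchestrator, the Fabric REST API's job scheduler can run the pipeline on demand. The following Python sketch is an assumption-heavy example: the access token, workspace ID, and pipeline item ID are placeholders, and the endpoint shape reflects the public Fabric job scheduler API, which may evolve.

```python
import requests

# Assumptions (replace with your own values):
# - FABRIC_TOKEN is a Microsoft Entra access token authorized for the Fabric REST API.
# - WORKSPACE_ID and PIPELINE_ID are the GUIDs of the workspace and the data pipeline item.
FABRIC_TOKEN = "<access-token>"
WORKSPACE_ID = "<workspace-id>"
PIPELINE_ID = "<pipeline-item-id>"


def run_pipeline_on_demand() -> str:
    """Start an on-demand run of the data pipeline and return the job instance URL."""
    url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
        f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
    )
    response = requests.post(url, headers={"Authorization": f"Bearer {FABRIC_TOKEN}"})
    response.raise_for_status()
    # The service accepts the request asynchronously; the Location header points at
    # the job instance you can poll to check the run status.
    return response.headers.get("Location", "")


if __name__ == "__main__":
    print("Poll this URL for run status:", run_pipeline_on_demand())
```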

This tutorial showed you how to use a dataflow in a pipeline with Data Factory in Microsoft Fabric. You learned how to:

  • Create a dataflow.
  • Create a pipeline invoking your dataflow.
  • Run and schedule your data pipeline.

Next, advance to learn more about monitoring your pipeline runs.