Use a dataflow in a pipeline

In this tutorial, you build a data pipeline to move data from a Northwind OData source to a lakehouse destination and send an email notification when the pipeline completes.

Prerequisites

To get started, you must complete the following prerequisite:

  • A workspace with Microsoft Fabric enabled.

Create a lakehouse

First, you need to create a lakehouse. A lakehouse is a data lake that's optimized for analytics. In this tutorial, you create a lakehouse to use as the destination for the dataflow.

  1. Go to your Fabric enabled workspace.

    Screenshot of the Fabric enabled workspace emphasized.

  2. Select Lakehouse in the create menu.

    Screenshot of the create menu with Create Lakehouse emphasized.

  3. Enter a Name for the lakehouse.

  4. Select Create.

Now that you've created a lakehouse, you can set up the dataflow.
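The portal steps above are all you need for this tutorial. If you'd rather script lakehouse creation, the Fabric REST API also exposes an endpoint for it. The following Python sketch is a minimal example, assuming you already have a Microsoft Entra access token with Fabric API permissions and your workspace ID; the lakehouse name and placeholder values are yours to supply.

```python
import requests

# Assumptions (replace with your own values):
# - FABRIC_TOKEN is a Microsoft Entra access token authorized for the Fabric REST API.
# - WORKSPACE_ID is the GUID of your Fabric enabled workspace.
FABRIC_TOKEN = "<access-token>"
WORKSPACE_ID = "<workspace-id>"


def create_lakehouse(name: str) -> dict:
    """Create a lakehouse in the workspace through the Fabric REST API."""
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/lakehouses"
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {FABRIC_TOKEN}"},
        json={"displayName": name},
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Hypothetical name for this tutorial's lakehouse.
    lakehouse = create_lakehouse("TutorialLakehouse")
    print(lakehouse.get("id"), lakehouse.get("displayName"))
```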

Create a dataflow

A dataflow is a reusable data transformation that can be used in a pipeline. In this tutorial, you create a dataflow that gets data from an OData source and writes the data to a lakehouse destination.

  1. Go to your Fabric enabled workspace.

    Screenshot of the Fabric enabled workspace.

  2. Select Dataflow Gen2 in the create menu.

    Screenshot of the Dataflow Gen2 selection under the new menu.

  3. Ingest the data from the OData source.

    1. Select Get data, and then select More.

      Screenshot of the Get data menu with More emphasized.

    2. From Choose data source, search for OData, and then select the OData connector.

      Screenshot of the Get data menu with OData emphasized.

    3. Enter the URL of the OData source. For this tutorial, use the OData sample service.

    4. Select Next.

    5. Select the Entity that you want to ingest. In this tutorial, use the Orders entity.

      Screenshot of the OData preview.

    6. Select Create.
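If you want to preview the same data outside the dataflow editor, you can query the OData feed directly. The sketch below assumes the tutorial's sample source is the public Northwind OData service at services.odata.org and that you kept the Orders entity; adjust the URL and fields if your source differs.

```python
import requests

# Assumption: the "OData sample service" used in this tutorial is the public
# Northwind feed. Swap in your own OData URL if you used a different source.
ODATA_SERVICE = "https://services.odata.org/V4/Northwind/Northwind.svc"

# Fetch a small sample of the Orders entity as JSON, using standard OData
# query options to limit the rows and columns returned.
response = requests.get(
    f"{ODATA_SERVICE}/Orders",
    params={"$top": 5, "$select": "OrderID,CustomerID,OrderDate"},
    headers={"Accept": "application/json"},
)
response.raise_for_status()

for order in response.json()["value"]:
    print(order["OrderID"], order["CustomerID"], order["OrderDate"])
```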

Now that you've ingested the data from the OData source, you can set up the lakehouse destination.

To ingest the data into the lakehouse destination:

  1. Select Add data destination.

  2. Select Lakehouse.

    Screenshot of the Add output destination menu with lakehouse emphasized.

  3. Configure the connection to use for the lakehouse. The default settings are fine for this tutorial.

  4. Select Next.

  5. Navigate to the workspace where you created the lakehouse.

  6. Select the lakehouse that you created in the previous steps.

    Screenshot of the selected lakehouse.

  7. Confirm the table name.

  8. Select Next.

  9. Confirm the update method and select Save settings.

    Screenshot of the update methods, with replace selected.

  10. Publish the dataflow.

    Important

    When the first Dataflow Gen2 is created in a workspace, Lakehouse and Warehouse items are provisioned along with their related SQL analytics endpoint and semantic models. These items are shared by all dataflows in the workspace, are required for Dataflow Gen2 to operate, shouldn't be deleted, and aren't intended for direct use. They're an implementation detail of Dataflow Gen2. The items aren't visible in the workspace, but might be accessible in other experiences such as the Notebook, SQL analytics endpoint, Lakehouse, and Warehouse experiences. You can recognize these items by the `DataflowsStaging` prefix in their names.
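Before wiring the dataflow into a pipeline, you can confirm the data landed by querying the lakehouse from a notebook. This is a minimal sketch, assuming a Fabric notebook with your tutorial lakehouse attached as the default lakehouse and a destination table that kept the default name Orders:

```python
# Run inside a Microsoft Fabric notebook with the tutorial lakehouse attached
# as the default lakehouse; the notebook runtime provides the `spark` session.

# Assumption: the dataflow destination table kept the default name "Orders".
orders_df = spark.read.table("Orders")

print(f"Row count: {orders_df.count()}")
orders_df.select("OrderID", "CustomerID", "OrderDate").show(5)
```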

Now that you've ingested the data into the lakehouse destination, you can set up your data pipeline.

Create a data pipeline

A data pipeline is a workflow that can be used to automate data processing. In this tutorial, you create a data pipeline that runs the Dataflow Gen2 that you created in the previous procedure.

  1. Navigate back to the workspace overview page and select Data pipeline in the create menu.

    Screenshot of the Data Pipeline selection.

  2. Provide a Name for the data pipeline.

  3. Select the Dataflow activity.

    Screenshot of the dataflow activity emphasized.

  4. Select the Dataflow that you created in the previous procedure in the Dataflow dropdown list under Settings.

    Screenshot of the dataflow dropdown list.

  5. Add an Office 365 Outlook activity.

    Screenshot emphasizing how to select an Office 365 Outlook activity.

  6. Configure the Office 365 Outlook activity to send an email notification.

    1. Authenticate with your Office 365 account.

    2. Select the Email address that you want to send the notification to.

    3. Enter a Subject for the email.

    4. Enter a Body for the email.

      Screenshot showing the Office 365 Outlook activity settings.

Run and schedule the data pipeline

In this section, you run the data pipeline and set up a schedule so that it runs automatically.

  1. Go to your workspace.

  2. Open the dropdown menu of the data pipeline that you created in the previous procedure, and then select Schedule.

    Screenshot of the pipeline menu with schedule emphasized.

  3. In Scheduled run, select On.

    Screenshot of scheduled run set to On.

  4. Provide the schedule you want to use to run the data pipeline.

    1. Repeat, for example, every Day or every Minute.
    2. If you select Daily, you can also select the Time.
    3. Start On a specific Date.
    4. End On a specific Date.
    5. Select the Timezone.
  5. Select Apply to apply the changes.

You've now created a data pipeline that runs on a schedule, refreshes the data in the lakehouse, and sends you an email notification. You can check the status of the data pipeline in the Monitor hub, or by opening the data pipeline's dropdown menu and selecting the Run history tab.
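If you also want to start a run outside the scheduler, for example from a script or another orchestrator, the Fabric REST API's job scheduler can run the pipeline on demand. The following Python sketch is an assumption-heavy example: the access token, workspace ID, and pipeline item ID are placeholders, and the endpoint shape reflects the public Fabric job scheduler API, which may evolve.

```python
import requests

# Assumptions (replace with your own values):
# - FABRIC_TOKEN is a Microsoft Entra access token authorized for the Fabric REST API.
# - WORKSPACE_ID and PIPELINE_ID are the GUIDs of the workspace and the data pipeline item.
FABRIC_TOKEN = "<access-token>"
WORKSPACE_ID = "<workspace-id>"
PIPELINE_ID = "<pipeline-item-id>"


def run_pipeline_on_demand() -> str:
    """Start an on-demand run of the data pipeline and return the job instance URL."""
    url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
        f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
    )
    response = requests.post(url, headers={"Authorization": f"Bearer {FABRIC_TOKEN}"})
    response.raise_for_status()
    # The service accepts the request asynchronously; the Location header points at
    # the job instance you can poll to check the run status.
    return response.headers.get("Location", "")


if __name__ == "__main__":
    print("Poll this URL for run status:", run_pipeline_on_demand())
```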

This tutorial showed you how to use a dataflow in a pipeline with Data Factory in Microsoft Fabric. You learned how to:

  • Create a dataflow.
  • Create a pipeline invoking your dataflow.
  • Run and schedule your data pipeline.

Next, advance to learn more about monitoring your pipeline runs.