
Quickstart: Create your first dataflow to get and transform data

Dataflows are a self-service, cloud-based data preparation technology. In this article, you create your first dataflow, get data for it, and then transform and publish the dataflow.

Prerequisites

Before you start, you need access to a Microsoft Fabric workspace.

Create a dataflow

In this section, you create your first dataflow.

  1. Navigate to your Microsoft Fabric workspace.

    Screenshot of the workspaces window where you navigate to your workspace.

  2. Select +New item, and then select Dataflow Gen2.

    Screenshot with the Dataflow Gen2 selection emphasized.

Get data

Let's get some data! In this example, you're getting data from an OData service. Use the following steps to get data in your dataflow.

  1. In the dataflow editor, select Get data and then select More.

    Screenshot with the Get Data option selected and More emphasized in the drop-down box.

  2. In Choose data source, select View more.

    Screenshot of Get data source with View more emphasized.

  3. In New source, select Other > OData as the data source.

    Screenshot of Get data source with the Other category and the OData connector emphasized.

  4. Enter the URL https://services.odata.org/v4/northwind/northwind.svc/, and then select Next.

    Screenshot of the OData data source where you enter the data URL.

  5. Select the Orders and Customers tables, and then select Create.

    Screenshot of the Power Query navigator with the Customers and Orders tables emphasized.

You can learn more about the get data experience and functionality in the get data overview.
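Behind the editor UI, steps 4 and 5 correspond roughly to a Power Query M query like the following. This is a sketch; the exact code the connector generates may differ slightly:

```powerquery
let
  // Connect to the public Northwind OData v4 sample service
  Source = OData.Feed("https://services.odata.org/v4/northwind/northwind.svc/", null, [Implementation = "2.0"]),
  // Navigate to the two tables selected in the navigator
  Orders = Source{[Name = "Orders", Signature = "table"]}[Data],
  Customers = Source{[Name = "Customers", Signature = "table"]}[Data]
in
  Orders
```

You can inspect the M that the editor actually generates for any query by opening the advanced editor from the Power Query ribbon.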

Apply transformations and publish

You loaded your data into your first dataflow. Congratulations! Now it's time to apply a couple of transformations to bring this data into the shape you need.

You transform the data in the Power Query editor. You can find a detailed overview of the Power Query editor at The Power Query user interface, but this section takes you through the basic steps:

  1. Make sure the Data Profiling tools are turned on. Go to Home > Options > Global Options, then select all the options under Column profile.

    Screenshot of Global options with the Column profile selections emphasized.

    Also make sure you enable the diagram view using the Layout configurations under the View tab in the Power Query editor ribbon, or by selecting the diagram view icon on the lower right side of the Power Query window.

    Screenshot of the overall look of Power Query diagram view.

  2. Within the Orders table, calculate the total number of orders per customer: Select the CustomerID column in the data preview and then select Group By under the Transform tab in the ribbon.

    Screenshot showing the Orders table selected, and Group by emphasized in the Transform tab.

  3. In the Group by dialog, select Count rows as the operation. You can learn more about Group By capabilities in Grouping or summarizing rows.

    Screenshot of Group by, with the Count rows operation selected.

  4. After grouping the data in the Orders table, you get a two-column table with CustomerID and Count as its columns.

    Screenshot of the two column table.

  5. Next, you want to combine data from the Customers table with the Count of Orders per customer: Select the Customers query in the Diagram View and use the "⋮" menu to access the Merge queries as new transformation.

    Screenshot of the dataflow editor, with the vertical ellipsis of the Customers table and Merge queries as new emphasized.

  6. Configure the Merge operation by selecting CustomerID as the matching column in both tables. Then select OK.

    Screenshot of the Merge window.

    Screenshot of the Merge window, with the Left table for merge set to the Customers table and the Right table for merge set to the Orders table. The CustomerID column is selected for both the Customers and Orders tables. Also, the Join Kind is set to Left outer. All other selections are set to their default value.

  7. Now there's a new query with all columns from the Customers table and one column with nested data from the Orders table.

    Screenshot of the dataflows editor with the new Merge query added to the right of the Customers and Orders tables.

  8. Let's focus on just a few columns from the Customers table. To do this, turn on schema view by selecting the schema view button in the bottom-right corner of the dataflows editor.

    Screenshot of the dataflows editor with the schema view button emphasized in the bottom-right corner.

  9. In schema view, you'll see all the columns in your table. Select CustomerID, CompanyName, and Orders (2). Then, go to the Schema tools tab, select Remove columns, and choose Remove other columns. This keeps only the columns you want.

    Screenshot of the schema view showing all of the available column names, with the CustomerID, CompanyName, and Orders (2) columns emphasized.

    Screenshot of the schema tools menu with Remove other columns emphasized.

  10. The Orders (2) column holds extra details from the merge step. To see and use this data, select the Show data view button in the bottom-right corner, next to Show schema view. Then, in the Orders (2) column header, select the Expand Column icon and pick the Count column. This adds the order count for each customer to your table.

    Screenshot for using data view.

  11. Now let's rank your customers by how many orders they've made. Select the Count column, then go to the Add Column tab and select Rank column. This adds a new column showing each customer's rank based on their order count.

    Screenshot of the dataflows editor with the Count column selected.

  12. Keep the default settings in Rank Column. Then select OK to apply this transformation.

    Screenshot of the Rank window with all default settings displayed.

  13. Now rename the resulting query as Ranked Customers using the Query settings pane on the right side of the screen.

    Screenshot of the dataflows editor with the Ranked Customers name emphasized under the query settings properties.

  14. You're ready to set where your data goes. In the Query settings pane, scroll to the bottom and select Choose data destination.

    Screenshot of the dataflows editor with the location of the Data destination selection emphasized.

  15. You can send your results to a lakehouse if you have one, or skip this step if you don't. Here, you can pick which lakehouse and table to use for your data, and choose whether to add new data (Append) or replace what's there (Replace).

    Screenshot of the Connect to data destination window with lakehouse selected.

    Screenshot of the Choose destination settings window.

  16. Your dataflow is now ready to be published. Review the queries in the diagram view, and then select Publish.

    Screenshot of the dataflows editor with the Publish button on the lower-right side emphasized.

    You return to your workspace, where a spinner icon next to your dataflow name shows that it's publishing. When the spinner disappears, your dataflow is ready to refresh.

    Important

    The first time you create a Dataflow Gen2 in a workspace, Fabric sets up some background items (Lakehouse and Warehouse) that help your dataflow run. These items are shared by all dataflows in the workspace, and you shouldn't delete them. They're not meant to be used directly and usually aren't visible in your workspace, but you might see them in other places like Notebooks or SQL analytics. Look for names that start with DataflowsStaging to spot them.

  17. In your workspace, select the Schedule Refresh icon.

    Screenshot of the workspace with the schedule refresh icon emphasized.

  18. Turn on the scheduled refresh, select Add another time, and configure the refresh as shown in the following screenshot.

    Screenshot showing how to select another time.

    Screenshot of the scheduled refresh options, with scheduled refresh turned on, the refresh frequency set to Daily, the Time zone set to coordinated universal time, and the Time set to 4:00 AM. The on button, the Add another time selection, the dataflow owner, and the apply button are all emphasized.
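The transformation steps above (group, merge, expand, trim, and rank) can be sketched as a single Power Query M query. The step names here are illustrative; the editor generates its own step names, and the exact code may differ:

```powerquery
let
  // Steps 2–3: count orders per customer
  GroupedOrders = Table.Group(Orders, {"CustomerID"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
  // Steps 5–6: left-outer merge of Customers with the grouped order counts
  Merged = Table.NestedJoin(Customers, {"CustomerID"}, GroupedOrders, {"CustomerID"}, "Orders (2)", JoinKind.LeftOuter),
  // Step 10: expand only the Count column from the nested table
  Expanded = Table.ExpandTableColumn(Merged, "Orders (2)", {"Count"}),
  // Step 9: keep only the columns of interest
  Trimmed = Table.SelectColumns(Expanded, {"CustomerID", "CompanyName", "Count"}),
  // Steps 11–12: rank customers by order count, highest first
  RankedCustomers = Table.AddRankColumn(Trimmed, "Rank", {"Count", Order.Descending})
in
  RankedCustomers
```

Comparing this sketch against the query's generated M in the advanced editor is a good way to check that each UI step did what you expected.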

Clean up resources

If you don't plan to keep using this dataflow, delete it by using the following steps:

  1. Navigate to your Microsoft Fabric workspace.

    Screenshot of the workspaces window where you navigate to your workspace.

  2. Select the vertical ellipsis next to the name of your dataflow and then select Delete.

    Screenshot with the three vertical dots and the delete option in the drop-down menu emphasized.

  3. Select Delete to confirm the deletion of your dataflow.

    Screenshot of the Delete dataflow window, with the Delete button emphasized.

The dataflow in this sample showed you how to load and transform data in Dataflow Gen2. You learned how to:

  • Create a Dataflow Gen2.
  • Transform data.
  • Configure destination settings for transformed data.
  • Run and schedule your dataflow.

Advance to the next article to learn how to create your first data pipeline.