In this tutorial, you build a dataflow to move data from a Northwind OData source to an ADLS Gen2 destination, and then read that data in a notebook in Databricks.
Prerequisites
To get started, you must complete the following prerequisites:
- Make sure you have a Microsoft Fabric-enabled workspace that isn't the default My workspace.
- Make sure you have a Databricks workspace set up. For more information, see Create a Databricks workspace in Microsoft Azure.
- Make sure you have access to an ADLS Gen2 storage account. For more information, see Create a storage account.
Create a new Dataflow Gen2
To create a new Dataflow Gen2 in your Fabric workspace, follow these steps:
In your Fabric workspace, select New item > Dataflow Gen2.
When the Dataflow Gen2 canvas opens, you'll see the Power Query editor interface where you can start building your data transformation flow.
Connect to the Northwind OData source
Next, you'll connect to the Northwind OData source to retrieve sample data:
In the Power Query editor, select Get data from the ribbon.
In the Choose data source dialog, search for "OData" and select OData.
In the OData dialog, enter the following URL:
https://services.odata.org/V3/Northwind/Northwind.svc/
Select OK to connect to the OData source using an anonymous connection.
In the Navigator window, you'll see the available tables from the Northwind database. For this tutorial, select the Customers and Orders tables.
Select Transform data to proceed to the data transformation phase.
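If you want to confirm that the feed is reachable before building the dataflow, you can also query it directly from any Python environment. The following optional sketch uses the requests library (an assumption; it isn't part of the Fabric flow) and the standard OData V3 $top and $format query options:

```python
import requests

# Public Northwind OData V3 endpoint used in this tutorial.
url = "https://services.odata.org/V3/Northwind/Northwind.svc/Customers"

# Ask for the first three customers as JSON.
response = requests.get(url, params={"$top": "3", "$format": "json"})
response.raise_for_status()

# OData V3 JSON responses wrap results in a "value" array.
for customer in response.json()["value"]:
    print(customer["CustomerID"], customer["CompanyName"], customer["Country"])
```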
Transform the data
Now you'll apply some basic transformations to prepare the data:
With the Customers table selected, you can see a preview of the data. Remove unnecessary columns by selecting the columns you want to keep:
- CustomerID
- CompanyName
- ContactName
- Country
- City
Right-click on any of the selected columns and choose Remove Other Columns.
Switch to the Orders table and keep the following columns:
- OrderID
- CustomerID
- OrderDate
- ShippedDate
- Freight
Apply the same Remove Other Columns operation.
Configure ADLS Gen2 destination
Now you'll configure the destination to write data to your ADLS Gen2 storage account:
Select the Customers query in the Queries pane.
In the Data destination section that appears at the bottom, select + to add a new destination.
From the destination options, select Azure Data Lake Storage Gen2.
In the Connect to data destination dialog, configure the connection settings:
Connection settings:
- URL: Enter your ADLS Gen2 storage account URL in the format:
https://[storageaccountname].dfs.core.windows.net
Connection credentials:
- Connection: Select Create new connection from the dropdown
- Connection name: Enter a descriptive name for this connection (e.g., "ADLS Gen2 Connection")
- Data gateway: Select (none) for cloud-based storage
- Authentication kind: Select Organizational account to use your Microsoft 365 credentials
- Privacy Level: Select None for this tutorial
Note
You'll see your current signed-in account displayed. You can switch accounts if needed by selecting Switch account.
Select Next to continue.
In the Choose destination target dialog, configure the destination settings:
On the left side, you'll see your storage account structure. Navigate to and select your desired container (e.g., "mydatacontainer").
On the right side, configure the file settings:
- File name: Enter a name for your file (e.g., "Customers.csv"). A preview message confirms that a new file will be created in Azure Data Lake Storage Gen2
- File format: Select Delimited text from the dropdown
- File origin: Select 65001: Unicode (UTF-8) for proper character encoding
- Delimiter: Select Comma as the field separator
Select Next to proceed to the destination settings configuration.
In the Choose destination settings dialog, review the column mapping and settings. You can keep all the default settings for this tutorial.
Select Save settings to confirm the destination configuration.
Repeat the preceding steps for the Orders query, using a file name such as "Orders.csv".
Save and run the dataflow
Select Save and run from the ribbon to save and execute your dataflow immediately.
Monitor the execution status. Once completed, your data will be available in the ADLS Gen2 storage account as CSV files.
Set up Databricks notebook
Now you'll create a Databricks notebook to read the data from ADLS Gen2:
Create a new notebook in Databricks
In your Databricks workspace, select Create > Notebook.
Give your notebook a name, such as "Northwind-Data-Analysis", and select Python as the language.
Configure ADLS Gen2 connection
In the first cell of your notebook, add the following code to configure the connection to your ADLS Gen2 account:
```python
# Configure ADLS Gen2 connection
storage_account_name = "your_storage_account_name"
storage_account_key = "your_storage_account_key"
container_name = "mydatacontainer"  # Use the same container name you configured in the dataflow

spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
    storage_account_key
)
```
Important

Replace `your_storage_account_name` and `your_storage_account_key` with your actual ADLS Gen2 credentials. For production environments, consider using Azure Key Vault or other secure credential management methods.

Run the cell by pressing Shift + Enter.
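As a minimal sketch of that more secure approach, you can store the key in a Databricks secret scope (optionally backed by Azure Key Vault) and read it with dbutils.secrets.get. The scope and key names below are placeholders you would create in your own workspace first:

```python
# Hypothetical secret scope and key names; create these in your workspace first.
storage_account_key = dbutils.secrets.get(
    scope="adls-credentials",   # assumed secret scope name
    key="storage-account-key"   # assumed secret key name
)

# Same Spark configuration as before, without a hard-coded key in the notebook.
spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
    storage_account_key
)
```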
Read the CSV files
Note
The file paths in the examples below assume your files are stored directly in the container root. Adjust the paths according to how you configured your destinations in the dataflow. For example, if you specified a folder structure or different file names during the destination setup, update the paths accordingly.
In a new cell, add code to read the Customers data:
```python
# Read Customers data
customers_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/Customers.csv"

customers_df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load(customers_path)

# Display the data
customers_df.show(10)
```
In another cell, read the Orders data:
```python
# Read Orders data
orders_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/Orders.csv"

orders_df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load(orders_path)

# Display the data
orders_df.show(10)
```
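With both DataFrames loaded, you can combine them for a quick analysis. The following is a minimal sketch, assuming the column names kept in the transformation steps earlier (CustomerID, OrderID, Country, Freight):

```python
from pyspark.sql import functions as F

# Join orders to customers on CustomerID and summarize freight by country.
orders_by_country = (
    orders_df.join(customers_df, on="CustomerID", how="inner")
    .groupBy("Country")
    .agg(
        F.count("OrderID").alias("order_count"),
        F.round(F.sum("Freight"), 2).alias("total_freight"),
    )
    .orderBy(F.desc("order_count"))
)

orders_by_country.show(10)
```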
Verify the solution
To verify that everything is working correctly:
Check ADLS Gen2: Navigate to your storage account in the Azure portal and verify that the CSV files are present in the specified containers and folders.
Monitor Dataflow: In your Fabric workspace, check the dataflow refresh history to ensure successful execution.
Validate Data in Databricks: Run the notebook cells and verify that data is being read correctly from ADLS Gen2.
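As a quick programmatic check, a sketch like the following (reusing the variables and DataFrames defined earlier in the notebook) lists the files the dataflow wrote and prints row counts and schemas:

```python
# List the files at the container root where the dataflow wrote its output.
container_root = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/"
display(dbutils.fs.ls(container_root))

# Confirm both tables loaded with the expected shape.
print(f"Customers rows: {customers_df.count()}")
print(f"Orders rows: {orders_df.count()}")
customers_df.printSchema()
orders_df.printSchema()
```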
Clean up resources
When you're finished with this tutorial, you can delete the resources to avoid incurring additional charges:
- Delete the Dataflow Gen2 from your Fabric workspace
- Remove the CSV files from your ADLS Gen2 storage account
- Delete the Databricks notebook
Related content
This tutorial showed you how to use a Dataflow Gen2 to extract data from an OData source, load it into ADLS Gen2, and analyze it in Databricks. You learned how to:
- Create a Dataflow Gen2 in Microsoft Fabric
- Connect to an OData source (Northwind database)
- Transform and clean data using Power Query
- Configure ADLS Gen2 as a destination for CSV files
- Set up a Databricks notebook with ADLS Gen2 connectivity
- Read and analyze data from ADLS Gen2 in Databricks
Next, advance to learn more about monitoring your dataflow runs and building more complex data pipelines.