How to access on-premises data sources in Data Factory for Microsoft Fabric

Data Factory for Microsoft Fabric is a powerful cloud-based data integration service that allows you to create, schedule, and manage workflows for various data sources. In scenarios where your data sources are located on-premises, Microsoft provides the On-Premises Data Gateway to securely bridge the gap between your on-premises environment and the cloud. This document guides you through the process of accessing on-premises data sources within Data Factory for Microsoft Fabric using the On-Premises Data Gateway.

Create an on-premises data gateway

  1. An on-premises data gateway is a software application that you install within your local network. It securely brokers communication between your on-premises data sources and Microsoft cloud services. Download and install the gateway on a machine in your local environment. For detailed instructions, refer to Install an on-premises data gateway.

    Screenshot showing the on-premises data gateway setup.

  2. Sign in with your user account to register the on-premises data gateway. After you sign in, the gateway is ready to use.

    Screenshot showing the on-premises data gateway setup after the user signed in.
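
If you want to confirm that the gateway registered successfully without opening the portal, one option is the Power BI REST API's Get Gateways endpoint. The following is a minimal Python sketch, assuming you've already acquired a Microsoft Entra access token with the appropriate Power BI scope (token acquisition isn't shown, and the sample response below is an abridged illustration of the endpoint's payload shape):

```python
import json
import urllib.request

GATEWAYS_URL = "https://api.powerbi.com/v1.0/myorg/gateways"

def list_gateway_names(payload: dict) -> list[str]:
    """Extract the gateway names from a Get Gateways response body."""
    return [gw.get("name", "") for gw in payload.get("value", [])]

def fetch_gateways(token: str) -> dict:
    """Call the Get Gateways endpoint with a bearer token (token acquisition not shown)."""
    req = urllib.request.Request(
        GATEWAYS_URL, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Abridged sample of the response shape, for illustration only:
sample = {"value": [{"name": "MyOnPremGateway", "type": "Resource"}]}
print(list_gateway_names(sample))  # ['MyOnPremGateway']
```

If your newly installed gateway appears in the list, the registration from step 2 succeeded.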

Create a connection for your on-premises data source

  1. Navigate to the admin portal and select the settings button (the gear icon) at the top right of the page. Then choose Manage connections and gateways from the dropdown menu that appears.

    Screenshot showing the Settings menu with Manage connections and gateways highlighted.

  2. In the New connection dialog that appears, select On-premises, and then provide your gateway cluster along with the associated resource type and relevant connection details.

    Screenshot showing the New connection dialog with On-premises selected.

Connect your on-premises data source to a Dataflow Gen2 in Data Factory for Microsoft Fabric

  1. Go to your workspace and create a Dataflow Gen2.

    Screenshot showing a demo workspace with the new Dataflow Gen2 option highlighted.

  2. Add a new source to the dataflow and select the connection established in the previous step.

    Screenshot showing the Connect to data source dialog in a Dataflow Gen2 with an on-premises source selected.

  3. Use the Dataflow Gen2 to apply any data transformations your scenario requires.

    Screenshot showing the Power Query editor with some transformations applied to the sample data source.

  4. Use the Add data destination button on the Home tab of the Power Query editor to add a destination for your data from the on-premises source.

    Screenshot showing the Power Query editor with the Add data destination button selected, showing the available destination types.

  5. Publish the Dataflow Gen2.

    Screenshot showing the Power Query editor with the Publish button highlighted.

Now you've created a Dataflow Gen2 to load data from an on-premises data source into a cloud destination.

Using on-premises data in a pipeline (Preview)

  1. Go to your workspace and create a data pipeline.

    Screenshot showing how to create a new data pipeline.

Note

You need to configure your firewall to allow outbound connections to *.frontend.clouddatahub.net from the gateway machine to use Fabric pipeline capabilities.
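
One quick way to sanity-check this outbound rule from the gateway machine is a plain TCP reachability test. Here's a minimal Python sketch; the concrete hostname under *.frontend.clouddatahub.net varies by region and tenant, so substitute the one that appears in your gateway logs, and note that the default port of 443 is an assumption based on the gateway's HTTPS traffic:

```python
import socket

def check_outbound(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Substitute the actual frontend hostname for your region, for example:
# check_outbound("<your-region>.frontend.clouddatahub.net")
```

A True result only proves TCP reachability; TLS inspection or proxy rules can still interfere, so also check the gateway's own diagnostics if pipeline runs fail.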

  2. From the Home tab of the pipeline editor, select Copy data and then Use copy assistant. On the assistant's Choose data source page, add a new source to the activity, and then select the connection established in the previous step.

    Screenshot showing where to choose a new data source from the Copy data activity.

  3. Select a destination for your data from the on-premises data source.

    Screenshot showing where to choose the data destination in the Copy activity.

  4. Run the pipeline.

    Screenshot showing where to run the pipeline in the pipeline editor window.

Now you've created and run a pipeline to load data from an on-premises data source into a cloud destination.
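
Besides running the pipeline interactively, you can trigger it on demand through the Fabric REST API's job scheduler endpoint for items. The sketch below assumes you hold a bearer token with the appropriate Fabric scopes; token acquisition and error handling are omitted, and the workspace and pipeline IDs are placeholders you'd replace with your own:

```python
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def pipeline_run_url(workspace_id: str, pipeline_id: str) -> str:
    """Build the on-demand job URL for a pipeline item in a workspace."""
    return (
        f"{FABRIC_API}/workspaces/{workspace_id}"
        f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline"
    )

def trigger_run(workspace_id: str, pipeline_id: str, token: str) -> int:
    """POST an on-demand pipeline run; the service accepts the job asynchronously."""
    req = urllib.request.Request(
        pipeline_run_url(workspace_id, pipeline_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        data=b"{}",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

print(pipeline_run_url("<workspace-id>", "<pipeline-id>"))
```

The on-premises connection resolution happens server-side through the gateway, so the API call itself runs from anywhere that can reach the Fabric endpoint.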

The following connectors are currently supported by Fabric pipelines when using an on-premises data gateway:

  • ADLS Gen1 for Cosmos Structured Stream
  • ADLS Gen2 for Cosmos Structured Stream
  • Amazon S3
  • Amazon S3 Compatible Storage
  • Amazon RDS for SQL Server
  • Azure Blob Storage
  • Azure Cosmos DB (SQL API)
  • Azure Database for PostgreSQL
  • Azure Data Explorer
  • Azure Data Lake Storage Gen2
  • Azure SQL Database
  • Azure SQL Managed Instance
  • Azure Synapse Analytics
  • Azure Table Storage
  • Dataverse
  • DB2
  • Dynamics 365
  • Dynamics CRM
  • Microsoft Fabric Warehouse
  • File System
  • FTP
  • Generic HTTP
  • Generic OData
  • Generic ODBC
  • Google Cloud Storage
  • KQL Database
  • Microsoft Fabric Lakehouse
  • MongoDB
  • MongoDB Atlas
  • SAP HANA
  • SFTP
  • SharePoint Online List
  • SQL Server