How to Run Azure Data Factory Over a Private Network to Copy Data from Google Cloud Storage to Azure Blob Storage

Nikunj Patel 20 Reputation points
2025-01-08T15:24:35.03+00:00

Scenario:

Azure and Google Cloud Platform (GCP) are connected through a VPN, and there is a requirement to copy data from Google Cloud Storage (GCS) to Azure Blob Storage in batch mode. One approach being considered is using Azure Data Factory (ADF) for the data transfer.

Question:

While setting up a new linked service in Azure Data Factory to connect to Google Cloud Storage, the configuration prompts us to use the Google Cloud Storage API, which typically routes traffic over the public internet. How can we ensure that ADF copies data from GCS to Azure Blob Storage over the private network (VPN or Interconnect) instead of the public internet? Is there a way to configure ADF to use the private connection for this regular data transfer?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Nandan Hegde 36,146 Reputation points MVP Volunteer Moderator
    2025-01-08T17:40:36.5+00:00

1 additional answer

  1. Ganesh Gurram 7,295 Reputation points Microsoft External Staff Moderator
    2025-01-15T13:59:26.2533333+00:00

    Hi @Nikunj Patel
    Thanks for the question and for using the Microsoft Q&A platform.
    The self-hosted integration runtime (IR) does not need to be deployed in either GCP or Azure. It can run on a machine in your on-premises network that has a private connection (VPN or ExpressRoute) to Azure.

    The self-hosted IR acts as a bridge between Azure Data Factory and your on-premises data sources. It securely transfers data between your on-premises storage and Azure storage without going over the public internet.

    To connect the self-hosted IR to Google Cloud Storage, create a Google Cloud Storage linked service in Azure Data Factory and specify the self-hosted IR as the integration runtime for that linked service. When the copy activity is triggered in Azure Data Factory, it runs on the self-hosted IR in your on-premises network. The self-hosted IR then uses the private connection (VPN or ExpressRoute) to connect to Google Cloud Storage and transfer the data.
    For more details, refer to: https://learn.microsoft.com/en-us/azure/data-factory/connector-google-cloud-storage?tabs=data-factory#copy-data-from-google-cloud-storage-using-azure-data-factory-or-synapse-analytics
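
As a rough illustration of the linked service step, the sketch below builds a Google Cloud Storage linked service definition whose `connectVia` block points at a self-hosted IR. The JSON shape follows the connector documentation linked above. The linked service name, the IR name, and the credential placeholders are hypothetical, and the helper function `build_gcs_linked_service` is only an illustration, not part of any SDK.

```python
import json


def build_gcs_linked_service(name, ir_name, access_key_id, secret_access_key,
                             service_url="https://storage.googleapis.com"):
    """Return a GCS linked service definition bound to a self-hosted IR.

    All argument values are supplied by the caller; nothing here contacts
    Azure or Google Cloud.
    """
    return {
        "name": name,
        "properties": {
            "type": "GoogleCloudStorage",
            "typeProperties": {
                # HMAC key pair created for the GCS service account.
                "accessKeyId": access_key_id,
                "secretAccessKey": {
                    "type": "SecureString",
                    "value": secret_access_key,
                },
                "serviceUrl": service_url,
            },
            # Binds the linked service to the self-hosted IR so that the
            # copy runs on that machine instead of the Azure IR.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names and placeholder credentials, for illustration only.
    definition = build_gcs_linked_service(
        name="GcsViaSelfHostedIR",
        ir_name="SelfHostedIR-OnPrem",
        access_key_id="<access key id>",
        secret_access_key="<secret access key>",
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the linked service code view in ADF after replacing the placeholders. Note that `connectVia` only controls which runtime executes the copy; whether requests to `serviceUrl` actually traverse the VPN depends on the routing and DNS configuration of the machine hosting the IR.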
    Hope this helps. Do let us know if you have any further queries.

