Communication between Azure Data Factory and Self Hosted Integration Runtime

Sukumar Vinnakota 316 Reputation points
2021-06-09T05:58:26.577+00:00

I want to know how Data Factory communicates with the self-hosted integration runtime (SHIR) to reach on-premises data sources.

Azure Data Factory

1 answer

  1. PRADEEPCHEEKATLA 90,646 Reputation points Moderator
    2021-06-09T09:24:54.81+00:00

    Hello @Sukumar Vinnakota ,

    Thanks for the question and for using the Microsoft Q&A platform.

    The self-hosted integration runtime is a service running in Azure Data Factory, but you can add local compute nodes on local servers in your on-premises network. A connection is created between the nodes and the integration runtime within your Azure Data Factory (ADF) in Azure. Through this connection, ADF can reach your local data and copy it securely to the cloud. This set-up is very similar to the Power BI on-premises gateway. In fact, the self-hosted integration runtime used to be called the "data management gateway" in ADF V1.

    When you move data between on-premises and the cloud, the activity uses a self-hosted integration runtime to transfer the data between an on-premises data source and the cloud.
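Which self-hosted IR an activity uses is determined by the linked service for the on-premises data store: its `connectVia` property references the integration runtime by name. Below is a minimal sketch that builds such a definition as a Python dictionary and prints it as JSON. The linked service name, IR name, and connection string are hypothetical placeholders, and the `SqlServer` type is only one example of an on-premises store.

```python
import json


def build_linked_service(name, ir_name, connection_string):
    """Return a linked-service definition that routes through a self-hosted IR."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",  # example on-premises store type
            "typeProperties": {"connectionString": connection_string},
            # connectVia is what binds this linked service to a specific IR.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# All three values below are hypothetical placeholders.
linked_service = build_linked_service(
    name="OnPremSqlServer",
    ir_name="MySelfHostedIR",
    connection_string="Server=onprem-sql01;Database=Sales;Integrated Security=True",
)

print(json.dumps(linked_service, indent=2))
```

Any activity whose dataset points at this linked service would be dispatched to the nodes registered under `MySelfHostedIR`.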

    Here is a high-level summary of the data-flow steps for copying with a self-hosted IR:

    [Image: 103698-image.png]

    1. A data developer first creates a self-hosted integration runtime within an Azure data factory by using the Azure portal or the PowerShell cmdlet. Then the data developer creates a linked service for an on-premises data store, specifying the self-hosted integration runtime instance that the service should use to connect to data stores.
    2. The self-hosted integration runtime node encrypts the credentials by using Windows Data Protection Application Programming Interface (DPAPI) and saves the credentials locally. If multiple nodes are set for high availability, the credentials are further synchronized across other nodes. Each node encrypts the credentials by using DPAPI and stores them locally. Credential synchronization is transparent to the data developer and is handled by the self-hosted IR.
    3. Azure Data Factory communicates with the self-hosted integration runtime to schedule and manage jobs. Communication is via a control channel that uses a shared Azure Relay connection. When an activity job needs to be run, Data Factory queues the request along with any credential information. It does so in case credentials aren't already stored on the self-hosted integration runtime. The self-hosted integration runtime starts the job after it polls the queue.
    4. The self-hosted integration runtime copies data between an on-premises store and cloud storage. The direction of the copy depends on how the copy activity is configured in the data pipeline. For this step, the self-hosted integration runtime directly communicates with cloud-based storage services like Azure Blob storage over a secure HTTPS channel.
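Steps 2 to 4 can be pictured with a toy, in-memory model. This is only a conceptual sketch, not the Azure Relay protocol or any Data Factory API: the `Job` and `SelfHostedNode` classes, the job names, and the plain dictionary standing in for the DPAPI-protected credential store are all assumptions made for illustration. What it shows is the shape of the interaction: Data Factory queues a request (with credentials only when the node might not have them), and the node pulls work from the queue rather than having it pushed.

```python
import queue
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    activity: str
    linked_service: str
    credential: Optional[str] = None  # included only if the node may lack it


@dataclass
class SelfHostedNode:
    # Stand-in for the locally stored, DPAPI-encrypted credentials.
    credential_store: dict = field(default_factory=dict)
    completed: list = field(default_factory=list)

    def poll(self, control_channel: queue.Queue) -> bool:
        """Pull one job from the channel; return False when it is empty."""
        try:
            job = control_channel.get_nowait()
        except queue.Empty:
            return False
        if job.credential is not None:
            # Cache the credential so later jobs do not need to carry it.
            self.credential_store[job.linked_service] = job.credential
        if job.linked_service not in self.credential_store:
            raise LookupError(f"no credential for {job.linked_service}")
        # A real node would now copy data directly between the on-premises
        # store and cloud storage over HTTPS (step 4).
        self.completed.append(job.activity)
        return True


control_channel = queue.Queue()  # stand-in for the control channel
node = SelfHostedNode()

# Data Factory side: queue two jobs; only the first carries a credential.
control_channel.put(Job("CopySalesToBlob", "OnPremSqlServer", credential="secret"))
control_channel.put(Job("CopyOrdersToBlob", "OnPremSqlServer"))

# Node side: poll until the queue is drained.
while node.poll(control_channel):
    pass

print(node.completed)
```

The second job succeeds without a credential because the node cached it while handling the first one, which mirrors the "in case credentials aren't already stored" remark in step 3.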

    For more details, refer to Create and configure a self-hosted integration runtime.

    Hope this helps. Do let us know if you have any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.

