@Rahul - Thanks for the question and for using the Microsoft Q&A platform.
When a self-hosted integration runtime is created within an Azure Data Factory or Synapse workspace, it is registered with the Azure service and assigned a unique identifier. This identifier is used to authenticate and authorize the self-hosted integration runtime to communicate with the Azure Data Factory or Synapse workspace.
When you move data between on-premises and the cloud, the activity uses a self-hosted integration runtime to transfer the data between an on-premises data source and the cloud.
Here is a high-level summary of the data-flow steps for copying with a self-hosted IR:
- A data developer first creates a self-hosted integration runtime within an Azure data factory or Synapse workspace by using the Azure portal or the PowerShell cmdlet. Then the data developer creates a linked service for an on-premises data store, specifying the self-hosted integration runtime instance that the service should use to connect to data stores.
- The self-hosted integration runtime node encrypts the credentials by using Windows Data Protection Application Programming Interface (DPAPI) and saves the credentials locally. If multiple nodes are set for high availability, the credentials are further synchronized across other nodes. Each node encrypts the credentials by using DPAPI and stores them locally. Credential synchronization is transparent to the data developer and is handled by the self-hosted IR.
- Azure Data Factory and Synapse pipelines communicate with the self-hosted integration runtime to schedule and manage jobs. Communication is via a control channel that uses a shared Azure Relay connection. When an activity job needs to be run, the service queues the request along with any credential information, in case the credentials aren't already stored on the self-hosted integration runtime. The self-hosted integration runtime starts the job after it polls the queue.
- The self-hosted integration runtime copies data between an on-premises store and cloud storage. The direction of the copy depends on how the copy activity is configured in the data pipeline. For this step, the self-hosted integration runtime directly communicates with cloud-based storage services like Azure Blob storage over a secure HTTPS channel.
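To make the queue-and-poll flow in the steps above more concrete, here is a minimal toy model in Python. It is only a sketch for illustration: the names `CopyJob`, `ControlChannel`, and `SelfHostedNode` are hypothetical and are not part of any Azure SDK, plain dictionaries stand in for the on-premises store and cloud storage, and the credential is kept in memory unencrypted, whereas the real node protects it with DPAPI.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class CopyJob:
    """A copy request queued by the service (hypothetical shape)."""
    linked_service: str
    source_key: str
    # Sent along only in case the node does not already hold the credential.
    credentials: Optional[str] = None


class ControlChannel:
    """Stand-in for the queue the service writes to and the node polls."""

    def __init__(self) -> None:
        self._queue: Deque[CopyJob] = deque()

    def enqueue(self, job: CopyJob) -> None:
        self._queue.append(job)

    def poll(self) -> Optional[CopyJob]:
        # Return the oldest queued job, or None when nothing is waiting.
        return self._queue.popleft() if self._queue else None


class SelfHostedNode:
    """Toy self-hosted IR node that copies from on-premises to the cloud."""

    def __init__(self, on_prem: Dict[str, bytes], cloud: Dict[str, bytes]) -> None:
        self.on_prem = on_prem
        self.cloud = cloud
        # Assumption: kept in memory here; the real node encrypts with DPAPI.
        self.credentials: Dict[str, str] = {}

    def run_next(self, channel: ControlChannel) -> bool:
        job = channel.poll()
        if job is None:
            return False  # nothing queued
        if job.credentials is not None:
            self.credentials[job.linked_service] = job.credentials
        if job.linked_service not in self.credentials:
            raise PermissionError(f"no credentials for {job.linked_service}")
        # The node itself moves the data; the service only scheduled the job.
        self.cloud[job.source_key] = self.on_prem[job.source_key]
        return True


def demo() -> Dict[str, bytes]:
    on_prem = {"orders.csv": b"id,total\n1,9.50\n"}
    cloud: Dict[str, bytes] = {}
    channel = ControlChannel()
    node = SelfHostedNode(on_prem, cloud)
    channel.enqueue(CopyJob("OnPremSqlServer", "orders.csv", credentials="secret"))
    node.run_next(channel)
    return cloud
```

Calling `demo()` queues one job that carries a credential, lets the node poll and run it, and returns the simulated cloud store with `orders.csv` copied into it. The point of the sketch is that the service only places work on the control channel, while the node pulls the job and performs the copy itself.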
In summary, the self-hosted integration runtime is authorized to communicate with the Azure Data Factory or Synapse workspace through the unique identifier assigned during registration, and data store authentication is handled through the DPAPI-encrypted credentials stored locally on each self-hosted integration runtime node.
For more details, refer to "Create and configure a self-hosted integration runtime" and "On-premises data store credentials".
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?".