How does Data Factory handle GUIDs?

Himanshu Sinha 11 Reputation points
2025-06-26T05:31:59.26+00:00

How would you design a data pipeline in Azure Data Factory to move data from an on-premises SQL Server to an Azure Data Lake, ensuring data security, fault tolerance, and performance optimization?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Nandan Hegde 36,146 Reputation points MVP Volunteer Moderator
    2025-06-26T06:27:45.8833333+00:00

    I would need some more clarity on your ask to provide proper details, but based on what you have shared:

    As you need to sync data from an on-premises SQL Server, you will need a self-hosted IR, and you should use Windows authentication as the best form of authentication for the source.
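
    For illustration, a linked service for the on-premises SQL Server routed through the self-hosted IR could look roughly like this (the IR, server, database, and account names are placeholders, not fixed values):

    ```python
    # Sketch of an ADF linked service for on-prem SQL Server that connects
    # through a self-hosted IR and uses Windows authentication.
    # "MySelfHostedIR", the server, database, and account are placeholders.
    on_prem_sql_linked_service = {
        "name": "OnPremSqlLinkedService",
        "properties": {
            "type": "SqlServer",
            "connectVia": {
                "referenceName": "MySelfHostedIR",  # self-hosted IR registered on-premises
                "type": "IntegrationRuntimeReference",
            },
            "typeProperties": {
                # Windows authentication: integrated security plus the Windows account
                "connectionString": "Data Source=onprem-sql01;Initial Catalog=SalesDb;Integrated Security=True",
                "userName": "CONTOSO\\svc_adf",
                "password": {"type": "SecureString", "value": "<reference a Key Vault secret here>"},
            },
        },
    }
    ```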

    To connect to ADLS, the best option is managed identity, as it is the most secure form of authentication.
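
    A minimal sketch of an ADLS Gen2 linked service that relies on the factory's managed identity (the storage account URL is a placeholder; the factory's identity also needs an RBAC role such as Storage Blob Data Contributor on the account):

    ```python
    # Sketch of an ADF linked service for ADLS Gen2 using managed identity.
    # Only the endpoint is given - no keys or secrets - so Data Factory
    # authenticates with its managed identity (which must be granted access).
    adls_linked_service = {
        "name": "AdlsGen2LinkedService",
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": "https://<storageaccount>.dfs.core.windows.net",
            },
        },
    }
    ```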

    For better performance, rather than syncing the full data set daily, you can load only the changed data using the watermark approach, as in the sketch below.
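
    A rough, self-contained illustration of the watermark idea at the SQL level (pyodbc; the watermark table, source table, and column names are made up for the example):

    ```python
    # Rough sketch of the watermark pattern: read the last high-water mark,
    # pull only rows changed since then, then advance the mark after the copy.
    # Table and column names (dbo.watermark, dbo.Orders, last_updated) are examples.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=onprem-sql01;"
        "DATABASE=SalesDb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # 1. Read the previous watermark value.
    cursor.execute("SELECT watermark_value FROM dbo.watermark WHERE table_name = ?", "Orders")
    last_mark = cursor.fetchone()[0]

    # 2. Extract only the rows that changed after the watermark.
    cursor.execute("SELECT * FROM dbo.Orders WHERE last_updated > ?", last_mark)
    changed_rows = cursor.fetchall()

    # 3. After a successful copy, move the watermark forward.
    cursor.execute(
        "UPDATE dbo.watermark SET watermark_value = "
        "(SELECT MAX(last_updated) FROM dbo.Orders) WHERE table_name = ?",
        "Orders",
    )
    conn.commit()
    ```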

    What exactly are you expecting with respect to data security and fault tolerance?


  2. Alex Burlachenko 9,780 Reputation points
    2025-06-26T07:29:44.3866667+00:00

    Hi Himanshu, thanks for posting this on the Q&A.

    You will want to set up a self-hosted integration runtime; that is your bridge between the on-premises SQL Server and the cloud: Set up a self-hosted IR.

    It is secure and plays nicely with SQL Server. When connecting to Azure Data Lake, managed identity is your best friend: no passwords lying around, just clean, secure access. Check how it works here: Managed identity for Data Lake Storage.

    Now, about making it fast and smart: instead of copying all the data every time, use watermark tables. They track what changed since the last run. You basically add a column like last_updated and only grab new or modified records, which saves time and money (see the sketch below): Incremental loading with watermark.
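
    Inside the pipeline, the copy activity's source query can be driven by the watermark returned from an earlier Lookup activity; a minimal sketch (activity, table, and column names such as LookupOldWatermark and last_updated are only examples):

    ```python
    # Sketch of a copy activity source that reads only rows newer than the
    # watermark produced by a preceding Lookup activity.
    # Activity, table, and column names are illustrative, not a fixed convention.
    copy_source = {
        "type": "SqlServerSource",
        "sqlReaderQuery": (
            "SELECT * FROM dbo.Orders WHERE last_updated > "
            "'@{activity('LookupOldWatermark').output.firstRow.watermark_value}'"
        ),
    }
    ```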

    For fault tolerance, turn on retries in your pipeline activities; Data Factory can automatically try again if something fails (see the example below). For extra safety, add alerts so you know when things go sideways.
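
    The retry behaviour lives in each activity's policy block; for example (the values here are purely illustrative):

    ```python
    # Illustrative activity policy: retry a failed activity up to 3 times,
    # waiting 60 seconds between attempts, with a 2-hour timeout.
    copy_activity_policy = {
        "policy": {
            "timeout": "0.02:00:00",
            "retry": 3,
            "retryIntervalInSeconds": 60,
        }
    }
    ```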

    It is also worth partitioning your data in the data lake; it can make queries much faster later.
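
    One common way to do that is to have the sink write into date-partitioned folders built from pipeline expressions; a small sketch (the container and folder names are just examples):

    ```python
    # Illustrative sink folder path: land each run's files under
    # year/month/day folders derived from the pipeline trigger time.
    sink_folder_path = (
        "raw/sales/"
        "year=@{formatDateTime(pipeline().TriggerTime, 'yyyy')}/"
        "month=@{formatDateTime(pipeline().TriggerTime, 'MM')}/"
        "day=@{formatDateTime(pipeline().TriggerTime, 'dd')}"
    )
    ```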

    The monitoring view also gives you graphs showing how long each step takes and where the bottlenecks are, which is super useful when tuning performance: Monitor and manage pipelines.

    Security-wise, encrypt everything in transit and at rest. Data Factory handles most of this automatically, but double-check your SQL Server and Data Lake settings.

    Is there any specific part you want to dive deeper into? Happy to explain more.

    PS: if you ever need to sync huge amounts of data, look into Data Factory's parallel copy feature; it can speed things up considerably (a small sketch below): Copy performance optimization.
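
    Those knobs sit in the copy activity's type properties; for instance (the numbers depend entirely on your data volume and the limits of your source and sink):

    ```python
    # Illustrative copy activity tuning: explicit parallel copies and
    # data integration units; sensible values depend on the workload.
    copy_type_properties = {
        "typeProperties": {
            "parallelCopies": 8,
            "dataIntegrationUnits": 16,
        }
    }
    ```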

    Regards,

    Alex

