How does Data Factory handle GUIDs?

Himanshu Sinha 11 Reputation points
2025-06-26T05:31:59.26+00:00

How would you design a data pipeline in Azure Data Factory to move data from an on-premises SQL Server to an Azure Data Lake, ensuring data security, fault tolerance, and performance optimization?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Nandan Hegde 36,146 Reputation points MVP Volunteer Moderator
    2025-06-26T06:27:45.8833333+00:00

    I would need some more clarity on your ask to provide proper details, but based on what you have shared:

    As you need to sync data from an on-premises SQL Server, you will need a self-hosted IR, and you should use Windows authentication as the best form of authentication for the source.
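
    For illustration, a linked service for the on-premises SQL Server routed through the self-hosted IR could look roughly like this (the IR, server, database, and account names are placeholders, not fixed values):

    ```python
    # Sketch of an ADF linked service for on-prem SQL Server that connects
    # through a self-hosted IR and uses Windows authentication.
    # "MySelfHostedIR", the server, database, and account are placeholders.
    on_prem_sql_linked_service = {
        "name": "OnPremSqlLinkedService",
        "properties": {
            "type": "SqlServer",
            "connectVia": {
                "referenceName": "MySelfHostedIR",  # self-hosted IR registered on-premises
                "type": "IntegrationRuntimeReference",
            },
            "typeProperties": {
                # Windows authentication: integrated security plus the Windows account
                "connectionString": "Data Source=onprem-sql01;Initial Catalog=SalesDb;Integrated Security=True",
                "userName": "CONTOSO\\svc_adf",
                "password": {"type": "SecureString", "value": "<reference a Key Vault secret here>"},
            },
        },
    }
    ```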

    To connect to ADLS, the best option is managed identity, as it is the most secure form of authentication.
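
    A minimal sketch of an ADLS Gen2 linked service that relies on the factory's managed identity (the storage account URL is a placeholder; the factory's identity also needs an RBAC role such as Storage Blob Data Contributor on the account):

    ```python
    # Sketch of an ADF linked service for ADLS Gen2 using managed identity.
    # Only the endpoint is given - no keys or secrets - so Data Factory
    # authenticates with its managed identity (which must be granted access).
    adls_linked_service = {
        "name": "AdlsGen2LinkedService",
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": "https://<storageaccount>.dfs.core.windows.net",
            },
        },
    }
    ```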

    For better performance, rather than syncing the full data set daily, you can load only the changed data using the watermark approach, as in the sketch below.
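
    A rough, self-contained illustration of the watermark idea at the SQL level (pyodbc; the watermark table, source table, and column names are made up for the example):

    ```python
    # Rough sketch of the watermark pattern: read the last high-water mark,
    # pull only rows changed since then, then advance the mark after the copy.
    # Table and column names (dbo.watermark, dbo.Orders, last_updated) are examples.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=onprem-sql01;"
        "DATABASE=SalesDb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # 1. Read the previous watermark value.
    cursor.execute("SELECT watermark_value FROM dbo.watermark WHERE table_name = ?", "Orders")
    last_mark = cursor.fetchone()[0]

    # 2. Extract only the rows that changed after the watermark.
    cursor.execute("SELECT * FROM dbo.Orders WHERE last_updated > ?", last_mark)
    changed_rows = cursor.fetchall()

    # 3. After a successful copy, move the watermark forward.
    cursor.execute(
        "UPDATE dbo.watermark SET watermark_value = "
        "(SELECT MAX(last_updated) FROM dbo.Orders) WHERE table_name = ?",
        "Orders",
    )
    conn.commit()
    ```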

    What exactly are you expecting with respect to data security and fault tolerance?


  2. Alex Burlachenko 9,780 Reputation points
    2025-06-26T07:29:44.3866667+00:00

    Hi Himanshu, thanks for posting this on the Q&A.

    You will want to set up a self-hosted integration runtime; that is your bridge between the on-premises SQL Server and the cloud: Set up a self-hosted IR.

    It is secure and plays nicely with SQL Server. When connecting to Azure Data Lake, managed identity is your best friend: no passwords lying around, just clean, secure access. Check how it works here: Managed identity for Data Lake Storage.

    Now, about making it fast and smart: instead of copying all the data every time, use watermark tables. They track what changed since the last run. You basically add a column like last_updated and only grab new or modified records, which saves time and money (see the sketch below): Incremental loading with watermark.
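
    Inside the pipeline, the copy activity's source query can be driven by the watermark returned from an earlier Lookup activity; a minimal sketch (activity, table, and column names such as LookupOldWatermark and last_updated are only examples):

    ```python
    # Sketch of a copy activity source that reads only rows newer than the
    # watermark produced by a preceding Lookup activity.
    # Activity, table, and column names are illustrative, not a fixed convention.
    copy_source = {
        "type": "SqlServerSource",
        "sqlReaderQuery": (
            "SELECT * FROM dbo.Orders WHERE last_updated > "
            "'@{activity('LookupOldWatermark').output.firstRow.watermark_value}'"
        ),
    }
    ```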

    For fault tolerance, turn on retries in your pipeline activities; Data Factory can automatically try again if something fails (see the example below). For extra safety, add alerts so you know when things go sideways.
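
    The retry behaviour lives in each activity's policy block; for example (the values here are purely illustrative):

    ```python
    # Illustrative activity policy: retry a failed activity up to 3 times,
    # waiting 60 seconds between attempts, with a 2-hour timeout.
    copy_activity_policy = {
        "policy": {
            "timeout": "0.02:00:00",
            "retry": 3,
            "retryIntervalInSeconds": 60,
        }
    }
    ```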

    It is also worth partitioning your data in the data lake; it can make queries much faster later.
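
    One common way to do that is to have the sink write into date-partitioned folders built from pipeline expressions; a small sketch (the container and folder names are just examples):

    ```python
    # Illustrative sink folder path: land each run's files under
    # year/month/day folders derived from the pipeline trigger time.
    sink_folder_path = (
        "raw/sales/"
        "year=@{formatDateTime(pipeline().TriggerTime, 'yyyy')}/"
        "month=@{formatDateTime(pipeline().TriggerTime, 'MM')}/"
        "day=@{formatDateTime(pipeline().TriggerTime, 'dd')}"
    )
    ```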

    The monitoring view also gives you graphs showing how long each step takes and where the bottlenecks are, which is super useful when tuning performance: Monitor and manage pipelines.

    Security-wise, encrypt everything in transit and at rest. Data Factory handles most of this automatically, but double-check your SQL Server and Data Lake settings.

    Is there any specific part you want to dive deeper into? Happy to explain more.

    PS: if you ever need to sync huge amounts of data, look into Data Factory's parallel copy feature; it can speed things up considerably (a small sketch below): Copy performance optimization.
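
    Those knobs sit in the copy activity's type properties; for instance (the numbers depend entirely on your data volume and the limits of your source and sink):

    ```python
    # Illustrative copy activity tuning: explicit parallel copies and
    # data integration units; sensible values depend on the workload.
    copy_type_properties = {
        "typeProperties": {
            "parallelCopies": 8,
            "dataIntegrationUnits": 16,
        }
    }
    ```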

    Regards,

    Alex

