How do I maintain data between an on-prem SQL Server and an Azure Data Lake?

System2000 236 Reputation points
2021-10-28T20:47:05.893+00:00

Hi,

I am going to be using Azure Synapse or ADF pipelines to ingest several TB of data from SQL Server data sources and store it as CSV files in an Azure Data Lake Storage Gen2 account.

Data is regularly appended to (and updated in) the source SQL Server tables, but I don't want to re-ingest the full several TB every day just to keep the data lake current. Ideally, I would only ingest rows from SQL Server that are not already in the data lake, plus any rows that have been updated since the last load. Can this be done?

If it's possible to achieve this, then I am also wondering how I would update the CSVs in the data lake. For example, can I append data to the CSVs? Can I update existing entries in them?

Help is appreciated.

Thanks.

Azure Data Lake Storage

Answer accepted by question author
  1. System2000 236 Reputation points
    2021-11-01T10:29:26.05+00:00

    Hi @PRADEEPCHEEKATLA ,

    Thanks for your response. Since asking the question I found this article:
    https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview

    I think this will suit my needs. (Your suggestion is the fourth of four options.)

    Thanks

    Simon.

    1 person found this answer helpful.
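The delta-load options in that tutorial all revolve around a watermark: persist the highest value of a change-tracking column (e.g. a last-modified timestamp) seen in the previous run, then copy only rows above it. The sketch below illustrates the pattern with an in-memory SQLite table standing in for the source SQL Server; the table and column names are illustrative, not taken from the question.

```python
# Minimal sketch of the watermark-based incremental-copy pattern, using an
# in-memory SQLite table as a stand-in for the source SQL Server. In ADF the
# watermark would be persisted in a control table or pipeline variable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Id INTEGER, Amount REAL, LastModified TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2021-10-01T00:00:00"),
        (2, 20.0, "2021-10-15T00:00:00"),
        (3, 30.0, "2021-10-28T00:00:00"),  # added since the last run
    ],
)

# Highest LastModified value copied by the previous run.
last_watermark = "2021-10-20T00:00:00"

# Copy only rows changed after the watermark (the "delta").
delta = conn.execute(
    "SELECT Id, Amount, LastModified FROM Orders WHERE LastModified > ?",
    (last_watermark,),
).fetchall()

# The max LastModified seen becomes the watermark for the next run.
new_watermark = max(row[2] for row in delta)
print(delta)          # only the rows added/updated since the last run
print(new_watermark)
```

Each run then lands its delta as a new CSV file (e.g. partitioned by load date) in the data lake, rather than editing existing CSVs in place.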

0 additional answers
