Hi,
I am going to be using Azure Synapse or ADF pipelines to ingest several TB of data from SQL Server data sources and store it as CSV files in Azure Data Lake Storage Gen2.
Data is regularly appended to (and updated in) the source SQL Server tables, but I don't want to re-ingest the full multi-TB dataset every day just to keep the data lake up to date. Ideally, I would only ingest rows that are not already in the data lake, plus any rows that have been updated since the last run. Can this be done?
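For context, the kind of incremental logic I have in mind is a watermark pattern: remember the highest "last modified" value seen on the previous run, and pull only rows changed after it. A minimal Python sketch of the idea (the column name `last_modified` and the sample rows are made up for illustration; in a real pipeline this would be a parameterised source query against SQL Server):

```python
from datetime import datetime

def incremental_pull(rows, last_watermark):
    """Return only rows modified after the previous watermark,
    plus the new watermark value to persist for the next run."""
    delta = [r for r in rows if r["last_modified"] > last_watermark]
    new_watermark = max((r["last_modified"] for r in delta),
                        default=last_watermark)
    return delta, new_watermark

# Made-up sample data standing in for the source table.
rows = [
    {"id": 1, "last_modified": datetime(2023, 1, 1)},
    {"id": 2, "last_modified": datetime(2023, 1, 5)},
]

# Only the row changed after the stored watermark is pulled.
delta, new_watermark = incremental_pull(rows, datetime(2023, 1, 2))
```

In ADF terms I imagine this becoming a source query like `SELECT * FROM dbo.MyTable WHERE last_modified > @watermark`, with the watermark kept in a control table or pipeline variable, but I'm not sure whether that is the recommended approach.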
If this is possible, I'm also wondering how to maintain the CSVs in the data lake: can I append data to existing CSV files? Can I update individual entries within them?
Help is appreciated.
Thanks.