Import incremental data into Azure Data Lake

Lucas Silva 1 Reputation point
2020-08-20T17:58:50.56+00:00

Good afternoon,

I have the following scenario!

I have a table in SQL that receives changes daily, and I need to send this data to Azure Data Lake.

So today I take this data and send it to the data lake:

file: Historicos.csv

The next day, I take the data again and send it to Historicos.csv again.

And the day after that, the same thing, and so on, always writing to the same file (Historicos.csv).

The problem is that every time I load the new data into the same file, the file is overwritten and the previous data is lost.

Is there any way to send the data incrementally, so that it does not replace the file but adds the new data to it?

19117-lake.png


2 answers

  1. Cris Zhan-MSFT 6,611 Reputation points
    2020-08-21T09:56:59.58+00:00

    Hi,

    Please check this blog:
    https://dataninjago.com/2017/12/03/ssis-in-azure-1-periodically-ingesting-data-into-azure-data-lake-from-sql-database-using-ssis/

    ===============================================
    If the response helped, do "Accept Answer" and upvote it.


  2. KranthiPakala-MSFT 46,427 Reputation points Microsoft Employee
    2020-08-21T21:40:16.693+00:00

    Hi @Lucas Silva ,

    Welcome to Microsoft Q&A platform and thanks for using this forum.

    As per my understanding, you are trying to read incremental data from your source (SQL) and append it (only the incremental data) to the existing destination file (Historicos.csv in Azure Data Lake Store). But instead of the incremental data being appended to the existing file, the file is overwritten with only the incremental data and the old data is lost. Please correct me if my understanding is incorrect.

    If that is the case, could you please confirm which service you are using to copy the data from source to sink? Is it ADF, or some other service?

    If you are using the Copy activity of ADF, note that the Copy activity doesn't support appending to an existing file. Please feel free to up-vote and/or comment on the existing feature request thread: https://feedback.azure.com/forums/270578-data-factory/suggestions/36163726-append-replace-upsert-the-outgoing-data-from-copy

    Below are the supported copy behaviors when the source is files from a file-based data store.

    copyBehavior defines the copy behavior in this case. Allowed values are:

    • PreserveHierarchy (default): Preserves the file hierarchy in the target folder. The relative path of source file to source folder is identical to the relative path of target file to target folder.
    • FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have auto-generated names.
    • MergeFiles: Merges all files from the source folder to one file. If the file name is specified, the merged file name is the specified name. Otherwise, it's an autogenerated file name.
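
    For reference, here is a minimal sketch of where copyBehavior sits in a Copy activity sink definition, assuming an ADLS Gen2 sink with a delimited-text dataset (the activity and dataset names are placeholders, not taken from your pipeline):

    ```json
    {
        "name": "MergeDailyFiles",
        "type": "Copy",
        "inputs": [ { "referenceName": "StagingFolderDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "HistoricosDataset", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": {
                "type": "DelimitedTextSource",
                "storeSettings": { "type": "AzureBlobFSReadSettings", "recursive": true }
            },
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {
                    "type": "AzureBlobFSWriteSettings",
                    "copyBehavior": "MergeFiles"
                }
            }
        }
    }
    ```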

    Workaround:

    A possible workaround to achieve your requirement is to utilize the copyBehavior feature of the Copy activity:

    1. First, have a copy activity_1 to copy the daily data from your source (SQL) to a staging folder in your ADLS as filename_datetime.csv (you will then have daily files sitting in this folder, such as filename_20200819103433.csv, filename_20200820043432.csv, filename_20200821114423.csv, ...); a sketch of such a date-stamped file name is shown after these steps.
    2. Then have a subsequent copy activity_2 in the same pipeline to copy those daily files to the destination folder using the MergeFiles copyBehavior of the Copy activity, which merges all the files into a single file. This file will be overwritten daily when the pipeline runs, and you will have all the historical data including the latest incremental data.
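
    For step 1, here is a minimal sketch of how the date-stamped staging file name could be produced with a dynamic expression on the sink dataset, again assuming an ADLS Gen2 delimited-text dataset (the dataset, linked service, file system and folder names are placeholders):

    ```json
    {
        "name": "StagingDailyFile",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": { "referenceName": "AzureDataLakeStorageGen2LS", "type": "LinkedServiceReference" },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "staging",
                    "folderPath": "historicos",
                    "fileName": {
                        "value": "@concat('filename_', formatDateTime(utcnow(), 'yyyyMMddHHmmss'), '.csv')",
                        "type": "Expression"
                    }
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            }
        }
    }
    ```

    Copy activity_1 writes to this dataset, so each run lands in the staging folder under a new date-stamped name; copy activity_2 then points its sink at the destination folder with copyBehavior set to MergeFiles as shown above.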

    19611-mergefiles.png

    If you would like to avoid re-copying the same historical files every day from staging to destination, another possible workaround is to use the Custom activity of ADF and write your own logic to copy the data and append it to the existing file in the destination folder.
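
    For that route, below is a rough sketch of the append logic such a Custom activity (or an Azure Function) could run, assuming ADLS Gen2 and the azure-storage-file-datalake Python SDK; the account URL, credential, file system and file path are placeholders:

    ```python
    from azure.storage.filedatalake import DataLakeServiceClient

    def append_incremental_csv(new_rows_csv: bytes) -> None:
        # Placeholder account and credential values -- replace with your own.
        service = DataLakeServiceClient(
            account_url="https://<storage-account>.dfs.core.windows.net",
            credential="<account-key-or-sas-token>",
        )
        file_client = (
            service.get_file_system_client("datalake")          # placeholder file system
                   .get_file_client("historicos/Historicos.csv")
        )

        # The current file size is the offset where the new bytes must start.
        current_size = file_client.get_file_properties().size

        # Append the new rows after the existing content, then flush to commit them.
        file_client.append_data(new_rows_csv, offset=current_size, length=len(new_rows_csv))
        file_client.flush_data(current_size + len(new_rows_csv))
    ```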

    If you are using a different service other than ADF, please let me know and I will be happy to assist further.

    Hope this info helps.

    ----------

    Thank you
    Please consider clicking "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.