delta files

arkiboys 9,706 Reputation points
2022-04-07T10:20:21.773+00:00

Hello,
In an ADF data flow sink, I write the data as Parquet files.
The sink writes the data as what look like partition files, e.g. part-0000... , part-00017..., etc.
After investigating and reloading, I found that these files get re-created on each sink load.
Now I have started thinking about Delta so that, instead of the .parquet files being re-created on each load, they get updated.
For this, is it just a matter of using the inline Delta dataset in the sink and, before the sink, adding an Alter Row transformation with upsert and delete ticked and set to true()?

Thanks

Azure Data Factory

Accepted answer
  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2022-04-08T08:56:16.047+00:00

    Hi @arkiboys ,

    Thank you for posting your query on the Microsoft Q&A platform.

    As I understand it, you are looking for a way to avoid overwriting files in storage. Please correct me if I am wrong.

    Yes. With plain file formats, existing files in storage are overwritten when a new file arrives with the same name. The Delta format helps avoid this behavior, and it also lets you perform upsert and delete operations.

    See the documentation on the Delta format in Azure Data Factory to learn more.

    Hope this helps. Please let us know if you have any further queries.


    Please consider clicking the Accept Answer button. Accepted answers help the community as well.
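
To make the difference between the two load styles discussed above concrete, here is a minimal plain-Python sketch. It is not ADF or Delta Lake code: the function names `overwrite_load` and `merge_load`, the `id` key column, and the sample rows are all hypothetical, and it assumes each incoming row carries exactly one action (upsert or delete). It only models the logical contents of a keyed table. Delta Lake itself records changes as new Parquet files plus a transaction log rather than editing existing files in place.

```python
from typing import Dict, Hashable, Iterable, Tuple

Row = Dict[str, object]
Table = Dict[Hashable, Row]  # rows indexed by their key value


def overwrite_load(existing: Table, incoming: Iterable[Row], key: str) -> Table:
    """Model of a plain-file load: earlier contents are discarded on every run."""
    return {row[key]: row for row in incoming}


def merge_load(existing: Table, changes: Iterable[Tuple[str, Row]], key: str) -> Table:
    """Model of an upsert/delete load: only the rows named in `changes` are touched."""
    result = dict(existing)  # copy so the caller's table is left unchanged
    for action, row in changes:
        if action == "upsert":
            # Insert the row if its key is new, otherwise replace the old row.
            result[row[key]] = row
        elif action == "delete":
            # Remove the row if present; a missing key is ignored.
            result.pop(row[key], None)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return result


# Hypothetical sample data.
current: Table = {
    1: {"id": 1, "name": "a"},
    2: {"id": 2, "name": "b"},
}
changes = [
    ("upsert", {"id": 2, "name": "B"}),  # update an existing row
    ("upsert", {"id": 3, "name": "c"}),  # insert a new row
    ("delete", {"id": 1}),               # remove an existing row
]

merged = merge_load(current, changes, key="id")
overwritten = overwrite_load(current, [{"id": 3, "name": "c"}], key="id")

print(merged)
print(overwritten)
```

In the merged result, row 2 is updated, row 3 is added, and row 1 is removed. In the overwritten result, only the newly loaded row survives, which mirrors the re-created files described in the question.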

