
Loading the same data multiple times creates duplicate Parquet files

arkiboys 9,711 Reputation points
2022-03-17T11:26:38.88+00:00

In my data flow, a sink writes the data to storage as .parquet files. A ForEach loop in the pipeline calls the data flow once for each item to be loaded.
On the first run, storage ends up with data for each item.
If I run the ADF pipeline a second time, all the items are loaded again into new .parquet files, so the data is duplicated in storage.
How can I make sure that existing .parquet data in storage is overwritten instead of duplicated?

I tried the "Clear the folder" option in the sink settings, but it clears the folder on every iteration, so that is not a solution.

Thank you
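To make the duplication concrete, here is a small Python simulation. It is only a sketch under assumptions not stated in the question: the sink uses default, Spark-style naming, so every run writes a file with a new unique name (modeled here as part-<uuid>.parquet), and all items go to the same sink folder. The "storage" is a local temp directory, and write_default and run_pipeline are hypothetical helpers, not ADF APIs.

```python
import tempfile
import uuid
from pathlib import Path

def write_default(folder: Path, payload: bytes) -> Path:
    """Hypothetical sink write with default naming: a new unique file name on every run."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"part-{uuid.uuid4().hex}.parquet"
    target.write_bytes(payload)  # placeholder bytes, not real Parquet content
    return target

def run_pipeline(folder: Path, items: list[str]) -> None:
    """Mimics the ForEach loop: one data flow run per item, all into the same sink folder."""
    for item in items:
        write_default(folder, f"rows for {item}".encode())

with tempfile.TemporaryDirectory() as tmp:
    sink = Path(tmp) / "output"
    items = ["customers", "orders"]
    run_pipeline(sink, items)  # first pipeline run
    run_pipeline(sink, items)  # second pipeline run
    files = sorted(p.name for p in sink.glob("*.parquet"))
    print(len(files))  # 4: every item's data now exists twice
```

Because nothing ties a file name to an item, a rerun can only add files. Clearing the shared folder before each write would instead delete the file the previous iteration just produced, which matches the "Clear the folder" behavior described in the question.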

Azure Data Factory

Answer accepted by question author

Nasreen Akter 10,896 Reputation points Volunteer Moderator
2022-03-17T13:29:57.4+00:00

Hi @arkiboys,

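I think one solution would be to specify the sink file name(s) explicitly, so that each item is always written to the same file and a rerun overwrites that file instead of adding a new one. Thanks!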
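A minimal sketch of the idea, assuming the current ForEach item is passed into the data flow (for example as a data flow parameter) and used to build a fixed file name such as <item>.parquet in the sink's file name setting. The Python below only models the naming behavior on a local temp directory; write_named and run_pipeline are hypothetical helpers, and the item names are made up.

```python
import tempfile
from pathlib import Path

def write_named(folder: Path, item: str, payload: bytes) -> Path:
    """Hypothetical sink write with an explicit, per-item file name."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{item}.parquet"  # same item -> same file name on every run
    target.write_bytes(payload)  # overwrites any file already at this path
    return target

def run_pipeline(folder: Path, items: list[str], run: int) -> None:
    """Mimics the ForEach loop: one data flow run per item."""
    for item in items:
        write_named(folder, item, f"run {run}: rows for {item}".encode())

with tempfile.TemporaryDirectory() as tmp:
    sink = Path(tmp) / "output"
    items = ["customers", "orders"]
    run_pipeline(sink, items, run=1)
    run_pipeline(sink, items, run=2)
    names = sorted(p.name for p in sink.glob("*.parquet"))
    latest = (sink / "orders.parquet").read_text()
    print(names)   # ['customers.parquet', 'orders.parquet']
    print(latest)  # run 2: rows for orders
```

Only files whose names match are replaced, so if an item is dropped from the ForEach list, its old file stays in storage. Also, writing each item to a single named file means the partitioned output has to be combined into one file, which the ADF documentation does not recommend for large data sets.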
