Dynamic dataflow

arkiboys 9,686 Reputation points
2021-09-29T13:50:42.823+00:00

Hello,
At present I have a pipeline which dynamically loads data from source (any number of objects, e.g. SQL Server tables) to sink (Parquet files). This seems to be working fine. So, instead of creating a pipeline for each object (for example, a SQL Server table), we have one pipeline which handles all the objects and loads them into the appropriate .parquet files.
So on a daily basis, the .parquet files (destination) have the same data as the SQL tables (source).
So far so good.

Question,
The next stage is to load the .parquet files into another blob storage, making sure that the final destination receives the necessary upserts and deletes, if any (i.e. by comparing the first .parquet file with the final .parquet file).
For example:
source --> .parquet1 (has the same data as source, loaded daily) --> upsert/delete into the next .parquet file on each load, so that the final destination has the latest data.


Instead of creating a separate data flow for each .parquet file, what is the best way to handle the upserts/deletes dynamically, so that a single data flow can handle any of the .parquet files and update/insert/delete the final destination accordingly?
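
To make the comparison concrete, here is a rough sketch of the logic I have in mind, written in plain Python rather than as an actual data flow. The function names, the `id` key column and the sample rows are all made up for illustration; the point is only that the key columns are passed in as a parameter, so the same logic could serve any object.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def index_by_key(rows: Iterable[Row], key_columns: List[str]) -> Dict[Key, Row]:
    """Index rows by the values of their key columns."""
    return {tuple(row[c] for c in key_columns): row for row in rows}


def sync_snapshot(staged: List[Row], final: List[Row], key_columns: List[str]):
    """Return (new_final, upserts, deleted_keys) so that final matches staged."""
    staged_by_key = index_by_key(staged, key_columns)
    final_by_key = index_by_key(final, key_columns)

    # Upsert: the key is new, or the key exists but the row content differs.
    upserts = [row for key, row in staged_by_key.items()
               if final_by_key.get(key) != row]
    # Delete: the key is in the final destination but no longer in the staged file.
    deleted_keys = [key for key in final_by_key if key not in staged_by_key]

    # Apply the changes: drop deleted keys, then overwrite/add the upserted rows.
    merged = {k: r for k, r in final_by_key.items() if k in staged_by_key}
    merged.update(index_by_key(upserts, key_columns))
    return list(merged.values()), upserts, deleted_keys


# Hypothetical sample data: "staged" is today's copy of the source table,
# "final_rows" is what the final destination currently holds.
staged = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]
final_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

new_final, upserts, deleted_keys = sync_snapshot(staged, final_rows, ["id"])
print(upserts)       # rows 2 (changed) and 4 (new)
print(deleted_keys)  # key of row 3, which disappeared from the source
```

In other words, only the list of key columns (and the file paths) would change from one object to the next, not the logic itself.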

I hope that makes sense.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.