Upsert operation in Azure Data Factory

Riya Jain 0 Reputation points
2024-07-12T12:10:45.25+00:00

We want to copy data from our production database to our dev environment daily. As a solution, we are thinking of creating a pipeline in Azure Data Factory that fetches all data from the source database, scrambles PII in a few columns, and sends it to a destination database. The schema of both databases is exactly the same, and both are on Azure SQL Managed Instance.

In the pipeline, we want a ForEach to pick up all tables one by one and then perform an upsert operation. Questions:

  1. How would we be able to set up datasets that can be used dynamically for all the tables?
  2. There are around 2,000 objects in the source database, and the number keeps increasing; we don't want to create procedures for all of them.
  3. We want the pipeline to pick them up dynamically. Is there any way to do this?

1 answer

  1. Amira Bedhiafi 19,616 Reputation points
    2024-07-12T13:37:39.9233333+00:00

    I am splitting your question into 4 parts:

    How can we dynamically set up datasets for all tables?

    To dynamically set up datasets for all tables, you can start by creating a metadata table in your source database. This table will store the list of tables that need to be processed. You can populate this metadata table with the names of the tables you want to process, which allows for dynamic updates as tables are added or removed from the source database. Next, in Azure Data Factory, create a Lookup activity that reads from this metadata table to get the list of tables to process. This approach avoids the need for manually creating datasets for each table.
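
    As a minimal sketch, the control table and the Lookup query could look like the T-SQL below (the name dbo.TablesToProcess and its columns are assumptions, not an ADF requirement). Deriving the list from INFORMATION_SCHEMA.TABLES instead means newly added tables are picked up automatically:

    ```sql
    -- Hypothetical control table listing the tables the pipeline should process.
    CREATE TABLE dbo.TablesToProcess
    (
        SchemaName sysname NOT NULL,
        TableName  sysname NOT NULL,
        IsActive   bit     NOT NULL DEFAULT 1
    );

    -- Query for the ADF Lookup activity ("First row only" unchecked).
    SELECT SchemaName, TableName
    FROM dbo.TablesToProcess
    WHERE IsActive = 1;

    -- Alternative: enumerate all base tables so the list maintains itself.
    SELECT TABLE_SCHEMA AS SchemaName, TABLE_NAME AS TableName
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE';
    ```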

    How can we handle a growing number of tables without creating individual procedures?

    Handling a growing number of tables without creating individual procedures can be managed using parameterized datasets and a ForEach activity in Azure Data Factory. After setting up the Lookup activity to fetch table names from the metadata table, add a ForEach activity to iterate over this list of tables. Within this ForEach activity, use parameterized datasets to dynamically handle different table names. This means you only need to create a few reusable datasets and activities, which are configured to accept table names as parameters, thus avoiding the need to create individual procedures for each table.
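
    As a rough illustration, the ForEach's Items would be set to the Lookup output (for example @activity('LookupTables').output.value — the activity name is an assumption), and the Copy activity inside it can use a dynamic source query built from the current item, while the sink dataset takes the same two values as schema/table parameters:

    ```sql
    -- Dynamic source query for the Copy activity inside the ForEach.
    -- @{item().SchemaName} and @{item().TableName} are ADF dynamic-content
    -- expressions resolved per iteration from the Lookup output.
    SELECT *
    FROM [@{item().SchemaName}].[@{item().TableName}];
    ```

    With this pattern, a single parameterized source dataset and a single parameterized sink dataset cover all 2,000+ tables.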

    How can we dynamically pick tables and perform operations on them?

    To dynamically pick tables and perform operations on them, you can leverage the ForEach activity in Azure Data Factory, which iterates over the list of table names retrieved from the Lookup activity. Inside the ForEach activity, include a Data Flow or Stored Procedure activity configured to handle your data transformation and upsert logic. The Data Flow activity can be used to scramble PII data and perform the upsert operation. By parameterizing these activities, they can dynamically adapt to the specific table being processed, ensuring the pipeline can scale and accommodate new tables as they are added.
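
    If you take the Stored Procedure route, a single generic procedure can serve every table. Conceptually, each ForEach iteration would execute something like the call below (the procedure name dbo.usp_UpsertTable and its parameters are hypothetical; in practice the Stored Procedure activity passes these values through its parameter settings):

    ```sql
    -- One generic procedure invoked per iteration; the values come from
    -- @{item().SchemaName} and @{item().TableName} in the ForEach.
    EXEC dbo.usp_UpsertTable
         @SchemaName = N'Sales',
         @TableName  = N'Customer';
    ```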

    How can we scramble PII information and perform upserts efficiently?

    Scrambling PII information and performing upserts efficiently can be achieved by designing a Data Flow in Azure Data Factory. This Data Flow can include transformations to scramble PII data based on your specific requirements. Once the data is transformed, the Data Flow can be configured to upsert the data into the destination database. Alternatively, you can use a Stored Procedure to handle the upsert operation. The Stored Procedure can be called from within the ForEach activity, passing the necessary parameters for each table. This approach ensures that both PII scrambling and upsert operations are handled efficiently within the same pipeline execution.
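
    For a single hypothetical table, the combined scramble-and-upsert step could look like the MERGE below, assuming the Copy activity first lands the raw rows in a staging table (stg.Customer, dbo.Customer, and all column names are assumptions). A generic procedure for all tables would build the same statement with dynamic SQL from the metadata table:

    ```sql
    -- Upsert from staging into the dev table, scrambling PII on the way in.
    MERGE INTO dbo.Customer AS tgt
    USING (
        SELECT
            CustomerId,
            -- Scramble PII: hash the email, mask all but the last 4 digits.
            CONVERT(varchar(64), HASHBYTES('SHA2_256', Email), 2) AS Email,
            'XXX-XXX-' + RIGHT(Phone, 4)                          AS Phone
        FROM stg.Customer
    ) AS src
        ON tgt.CustomerId = src.CustomerId
    WHEN MATCHED THEN
        UPDATE SET tgt.Email = src.Email,
                   tgt.Phone = src.Phone
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Email, Phone)
        VALUES (src.CustomerId, src.Email, src.Phone);
    ```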