Need to compare the data before loading in to sink

Anshal 1,866 Reputation points
2024-04-19T10:56:03.4+00:00

Hi friends, I have a requirement that to ensure we load the correct data into to sink Azure SQL database , so want to compare the source data with destination stored data and load only if the data is not exists in sink and store the rejected data in to Blob. How can I achieve this ?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,384 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,583 questions
0 comments No comments
{count} votes

Accepted answer
  1. Amira Bedhiafi 15,446 Reputation points
    2024-04-19T12:06:54.37+00:00

    You can either use Lookup activity or a Data flow :

    Using Lookup Activity:

    • Source Dataset: This dataset will be connected to your source data.
    • Sink Dataset: This dataset will be your Azure SQL Database.
    • Lookup Activity: Use this to load existing data from the Azure SQL Database into memory. You can limit this to relevant fields that determine data uniqueness or updates.

    Using Data Flow:

    • Source Stream: Represents your incoming data.
    • Existing Data Stream: Uses Lookup to bring existing data.
    • Join Activity: Join the Source Stream and Existing Data Stream on the key(s) that you use to determine uniqueness.
    • Conditional Split: After joining, use a Conditional Split to separate records into those that exist and those that do not. For existing records, further split based on whether the data needs to be updated (if any field is different from the existing record).

    How to distinguish old,updated and rejected records?

    • Use a Copy Activity or Sink in Data Flow to insert or update records in your Azure SQL Database based on the output stream of new or changed data from the Conditional Split.
    • Store records that do not meet your criteria (like duplicates or invalid data) into Azure Blob Storage using another Sink in your Data Flow or a separate Copy Activity. Configure the Blob Storage linked service and specify the format and path where you want to store these records.

    1 person found this answer helpful.
    0 comments No comments

0 additional answers

Sort by: Most helpful