unique records and duplicate record separate load

Anshal 2,251 Reputation points
2023-10-23T11:54:50.9366667+00:00

Hi friends, I have a dataset that contains a lot of duplicate records and I have to design a pipeline that loads unique and duplicates separately in the sink.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,300 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,434 questions
0 comments No comments
{count} votes

Accepted answer
  1. Amira Bedhiafi 31,391 Reputation points
    2023-10-23T13:22:36.32+00:00

    For unique records, you can use a Copy Data activity to get data into a staging area using ROW_NUMBER() in a SQL script.

    Add another Copy Data`activity to get data into another staging area. You can use a similar window function, but this time select the records where the row number is greater than 1.

    -- For Unique Records
    WITH RankedData AS (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY YourUniqueColumns ORDER BY SomeColumn) as rn
      FROM YourStagingTable
    )
    SELECT *
    INTO UniqueRecordsSink
    FROM RankedData
    WHERE rn = 1;
    -- For Duplicate Records
    WITH RankedData AS (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY YourUniqueColumns ORDER BY SomeColumn) as rn
      FROM YourStagingTable
    )
    SELECT *
    INTO DuplicateRecordsSink
    FROM RankedData
    WHERE rn > 1;
    
    0 comments No comments

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.