Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,300 questions
This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Hi friends, I have a dataset that contains a lot of duplicate records and I have to design a pipeline that loads unique and duplicates separately in the sink.
For unique records, you can use a Copy Data activity to get data into a staging area using ROW_NUMBER() in a SQL script.
Add another Copy Data`activity to get data into another staging area. You can use a similar window function, but this time select the records where the row number is greater than 1.
-- For Unique Records
WITH RankedData AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY YourUniqueColumns ORDER BY SomeColumn) as rn
FROM YourStagingTable
)
SELECT *
INTO UniqueRecordsSink
FROM RankedData
WHERE rn = 1;
-- For Duplicate Records
WITH RankedData AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY YourUniqueColumns ORDER BY SomeColumn) as rn
FROM YourStagingTable
)
SELECT *
INTO DuplicateRecordsSink
FROM RankedData
WHERE rn > 1;