Remove duplicate rows from CSV using aggregate function

Dinesh Prajapati 1 Reputation point
2023-01-05T06:09:30.887+00:00

Hi team,

I want to remove duplicate rows from my CSV. I tried using an aggregate function and got it working for one CSV. But my requirement is that if all the columns of one row match any other row, one of them should be removed; keeping either the first or the last is fine. I am not able to handle this with the aggregate function. I also have around 15 files whose columns differ from one another, so I cannot group by a specific column.

C#

1 answer

  1. Ryan Hill 16,076 Reputation points Microsoft Employee
    2023-01-06T05:50:34.97+00:00

    Hi @Dinesh Prajapati ,

    I'm not sure whether you're using an Azure Logic App or an App Service derivative (for example, a web job or function app), but none of these affects your original concern.

    Having 15 files isn't an issue because you can simply union all the datasets together so that you can aggregate all the sources. What is an issue is when you say all the columns are different. In order to remove a duplicate, columns have to match. I think using will be the easiest platform to use to aggregate your data.

    Read all your files in and use Parse JSON, through data operations, to convert your CSV rows into objects. By composing the rows into objects, you have more flexibility to determine which rows are duplicates. You can do the same thing through custom code as well: read in the file, convert each row to an object, and compare objects through a custom hash to determine which ones are the same.
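
The custom-code route could be sketched roughly as follows. This is a minimal illustration, not the answerer's implementation: instead of converting each row to an object, it uses the raw text of the whole row as the comparison key, so it works regardless of which columns a file has. The names `CsvDeduplicator`, `RemoveDuplicateRows`, and `DeduplicateFile` are hypothetical. It assumes one record per line (no quoted fields containing line breaks) and treats two rows as duplicates only when their text is identical, so differences in whitespace or quoting count as different rows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvDeduplicator
{
    // Returns the rows in their original order, keeping only the first
    // occurrence of each distinct row. The whole line is the key, so no
    // column names are needed.
    public static List<string> RemoveDuplicateRows(IEnumerable<string> rows)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var unique = new List<string>();

        foreach (var row in rows)
        {
            // HashSet.Add returns false when the row was already seen.
            if (seen.Add(row))
            {
                unique.Add(row);
            }
        }

        return unique;
    }

    // Reads a CSV file line by line and writes the de-duplicated rows
    // to a separate output file. Both paths are supplied by the caller.
    public static void DeduplicateFile(string inputPath, string outputPath)
    {
        var unique = RemoveDuplicateRows(File.ReadLines(inputPath));
        File.WriteAllLines(outputPath, unique);
    }
}
```

Because nothing here depends on the column layout, the same method can be called once per file for each of the 15 files. The header line is kept because it is the first row seen; a repeated header further down the file would be dropped like any other duplicate.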
