Share via

Duplicate Rows

Anonymous
2023-03-14T12:11:56+00:00

Hello,

I have a problem with duplicate rows, I know Excel has a function that allows to remove the duplicate rows but it doesnt fit for what I am looking for.

Imagine you have to analyse a big amount of data that has this structure:

ID Product Product 1 Product 2
3726 Lemon Orange
3726 Orange Lemon
9842 Meat Fish
9842 Fish Meat

And you need to remove the duplicates that in this case would be the second row (3726-Orange-Lemon) and the fourth because they are duplicate, but when you introduce this data in the "Remove Duplicate" function it doesnt detect this registers as duplicate, and with 4 rows is ok but with 46000 rows you can not check out row by row, so how can I remove all those rows that are duplicate but Excel doesnt detect them as duplicate?

Microsoft 365 and Office | Excel | For business | Windows

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments No comments

3 answers

Sort by: Most helpful
  1. Ashish Mathur 102K Reputation points Volunteer Moderator
    2023-03-14T23:09:46+00:00

    Hi,

    In the Data > Remove Duplicates window, just ensure that only the first box is checked.

    Hope this helps.

    Was this answer helpful?

    0 comments No comments
  2. HansV 462.6K Reputation points
    2023-03-14T12:47:13+00:00

    Do you have Microsoft 365 or Office 2021? If so, you could create a helper column and use that in the Remove Duplicates dialog

    Using your example, enter the following formula in D2:

    =TEXTJOIN("|",TRUE,SORT(B2:C2,,,TRUE))

    and fill down. Result:

    ID Product Product 1 Product 2 All Products
    3726 Lemon Orange Lemon Orange
    3726 Orange Lemon Lemon Orange
    9842 Meat Fish Fish Meat
    9842 Fish Meat Fish Meat

    Was this answer helpful?

    0 comments No comments
  3. Anonymous
    2023-03-14T12:41:23+00:00

    In D2 input formula then drop down, it will get the result as below. Then you can sort from dup column and delete the rows with duplicate tagged.

    =IF(COUNTIF($A$2:A2,A2)=2,"duplicate","")

    ID Product Product 1 Product 2 Dup
    3726 Lemon Orange
    3726 Orange Lemon duplicate
    9842 Meat Fish
    9842 Fish Meat duplicate

    Help Column=

    =IF(COUNTIF($A$2:A2,A2)=2,A2&C2&B2,A2&B2&C2)

    Dup Column=

    =IF(COUNTIF($D$2:D2,D2)=2,"duplicate","")

    ID Product Product 1 Product 2 Help Column Dup
    3726 Lemon Orange 3726LemonOrange
    3726 Orange Lemon 3726LemonOrange duplicate
    9842 Meat Fish 9842MeatFish
    9842 Fish Meat 9842MeatFish duplicate
    3783 ABC ACD 3783ABCACD
    3783 ABD 3783ABD

    Was this answer helpful?

    0 comments No comments