I want to remove duplicate based on the following criteria.

Question

I want to remove duplicate based on the following criteria.

Anonymous

I want the formula to return a total of 5 rows( removing the duplicates ) but keep the rows that have a Package code 20 or 25 . If the duplicate orders do not have a row containing 20,25 then it can bring in the row with first occurrence by Order Number

Here is the Data

Order Number	Package Code
A123	20
A123	1
A123	2
A123	3
A456	1
A456	25
A456	4
A456	8
A456	7
B789	1
B789	2
B789	3
B789	4
C567	5
C567	6
D098	9
D098	10
D098	20
D098	1
D098	0

Here is what I want to formula to return:

Order Number	Package Code
A123	20
A456	25
B789	1
C567	5
D098	20

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments

2 answers

Answer 1

Anonymous

//select * from remove_duplicate_ord;

select * from (

select * from remove_duplicate_ord order by iif(Package Code in (20,25),0,1)) group by Order Number;

https://s21.ax1x.com/2024/04/01/pF7QnEV.jpg

https://e.anyoupin.cn/EData/?s=remove\_duplicate\_ord

0 comments

Answer 2

Hello,

I found a solution to the problem of removing duplicates based on specific criteria, specifically prioritizing certain package codes, in Excel. Here's a brief explanation:

Create a List of Priority Codes: In this case, we're focusing on package codes 20 and 25. We want to keep these codes in our final list.

Use a Formula to Select the Right Rows: We need a formula that checks for these priority codes within each order number. If the order has a priority code, it should be selected. If not, the formula should default to the first occurrence of that order number.

Apply this Logic to the Data Set: I applied this logic using a Python script for demonstration, but in Excel, you can use a combination of IF, VLOOKUP, and INDEX/MATCH functions to achieve a similar result.

Here’s a Python script that demonstrates the logic:

import pandas as pd

Your data goes here

data = {"Order Number": ["A123", "A123", ...], "Package Code": [20, 1, ...]} df = pd. DataFrame(data)

Defining priority package codes

priority_codes = [20, 25]

Logic to select the right row

def select_row(group): priority_rows = group[group['Package Code'].isin(priority_codes)] return priority_rows.iloc[0] if not priority_rows.empty else group.iloc[0]

Apply the logic and get the result

result_df = df.groupby('Order Number').apply(select_row).reset_index(drop=True)

This script will give you a table like:

Order Number Package Code A123 20 A456 25 B789 1 C567 5 D098 20 In Excel, you can replicate this logic using formulas, or if you're comfortable with VBA, you can automate this process. Please remember to adjust your Excel functions to follow this logic.

Best Regards, Saurabh

Share via

I want to remove duplicate based on the following criteria.

2 answers

Your data goes here

Defining priority package codes

Logic to select the right row

Apply the logic and get the result