dataflow - select distinct rows

Question

arkiboys 9,706

field1 field2 field3
name1 surname1 address1
name1 surename1 address1
name1 surename1 address1
name2 surename2 address2
name2 surename2 address2
name2 surename2 address2
...

In my select activity, the data preview returns several fields.
There are duplicates, and I would like to return distinct rows.
After the select activity I have placed an aggregate activity.
The settings inside this aggregate activity are shown in the screenshot below.

How is this done please?

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2022-03-16T08:39:22+00:00

Hi @arkiboys ,
The screenshot you mentioned is not visible. Could you please share it again? Have you tried using the collect function inside the aggregate transformation to meet this requirement, e.g. Field1 = collect(field1), Field2 = collect(field2), Field3 = collect(field3)?

arkiboys 9,706 Reputation points

2022-03-16T09:22:29.693+00:00

Hi,
The settings below seem to give me what I want, which is a select distinct:

In group by, I place all fields.
In the aggregates section, I am using a column pattern like the one below:

each column that matches: name == 'field1' && name == 'field2', etc

first box: $$ second box: first($$)

Question:
Where do I put your collect(field) suggestion?

Thanks
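
For readers following along, the group-by-all-fields setting described in the comment above can be sketched outside the data flow. This is only a plain Python illustration, not the data flow engine itself: the sample rows are taken from the question, and the function name `select_distinct` is hypothetical.

```python
# Sample rows copied from the question (field1, field2, field3).
FIELDS = ("field1", "field2", "field3")
rows = [
    ("name1", "surename1", "address1"),
    ("name1", "surename1", "address1"),
    ("name2", "surename2", "address2"),
    ("name2", "surename2", "address2"),
]


def select_distinct(rows):
    """Group by every field and keep the first row of each group."""
    groups = {}
    for row in rows:
        # The whole row is the group key, mirroring "group by all fields".
        # setdefault keeps only the first row seen per key, mirroring first($$).
        groups.setdefault(row, row)
    return list(groups.values())


distinct = select_distinct(rows)
for row in distinct:
    print(dict(zip(FIELDS, row)))
```

Because every field is part of the group key, all rows in a group are identical, so which row first() picks makes no difference to the result.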

Accepted answer


Answer 1

Hi @arkiboys ,
Thank you for using the Microsoft Q&A platform and posting your query.
As I understand it, you want to remove duplicates from your data. Your approach of using an aggregate with a column pattern seems correct, but before it, try adding a Rank transformation with the Dense option enabled to generate an Id based on the field values.

Based on this Id, we can group by it in the group by settings of the aggregate transformation. In the aggregate settings, we can use a column pattern with the condition name != 'Id', with $$ as the column name expression and first($$) as the value expression.
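
As a rough illustration of the two steps above (dense rank to generate an Id, then group by Id and take the first value of every other column), here is a minimal Python sketch. It only mimics the logic and is not how the data flow executes it; the sample rows come from the question, and the helper names `add_dense_rank` and `first_per_id` are hypothetical.

```python
KEY_FIELDS = ("field1", "field2", "field3")
rows = [
    {"field1": "name1", "field2": "surename1", "field3": "address1"},
    {"field1": "name1", "field2": "surename1", "field3": "address1"},
    {"field1": "name2", "field2": "surename2", "field3": "address2"},
    {"field1": "name2", "field2": "surename2", "field3": "address2"},
]


def add_dense_rank(rows, key_fields):
    """Add an Id column: equal key values share an Id, with no gaps between Ids."""
    distinct_keys = sorted({tuple(r[f] for f in key_fields) for r in rows})
    rank_of = {key: i for i, key in enumerate(distinct_keys, start=1)}
    return [{**r, "Id": rank_of[tuple(r[f] for f in key_fields)]} for r in rows]


def first_per_id(ranked_rows):
    """Group by Id and keep the first value of every other column."""
    result = {}
    for r in ranked_rows:
        if r["Id"] not in result:
            # First row seen for this Id supplies every non-Id column.
            result[r["Id"]] = dict(r)
    return [result[i] for i in sorted(result)]


deduped = first_per_id(add_dense_rank(rows, KEY_FIELDS))
for row in deduped:
    print(row)
```

Since the Id is derived from all three fields, grouping by Id gives the same groups as grouping by the fields directly; the Id simply gives the aggregate a single column to group on.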

Hope this helps. Please let us know if you have any further queries.
