How do I get distinct values in Azure Cognitive Search for pagination?

ocean-man-24 0 Reputation points
2023-09-01T02:26:57.0066667+00:00

Hello, I've been stuck with this problem for a few weeks now and I have no idea how to solve it.

So basically, my index contains data that looks something like this:

[
    { id: 1, value: "some snippet from a paragraph"},
    { id: 1, value: "some different snippet from the same paragraph" },
    { id: 1, value: "some different snippet from the same paragraph" },
    { id: 2, value: "some snippet from a different paragraph" },
    { id: 2, value: "some different snippet from id no.2's paragraph " },
    ... and a few more million values
]

As you can see, one "paragraph" is split into multiple chunks. Each item might have the same ID, but with different values.

I used $top and $skip, but the top n results then contain "duplicates". For example, the top 50 results will contain multiple items that have an ID of 1.

What I want is for the top n results to contain only items with unique IDs. Is it possible to get just the highest-scoring item for each ID? How can I make this happen?

The results should look something like this:

[
    { id: 1, value: "some value" },
    { id: 2, value: "some value" },
    { id: 3, value: "some value" },
    { id: 4, value: "some value" },
    { id: 5, value: "some value" },
    ... 
]
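
To make the intended semantics concrete, here is a minimal client-side sketch in Python of "keep only the highest-scoring item per ID, then page". It is not a built-in Azure feature and not a recommended solution: it assumes the hits are already in memory as dictionaries carrying the `@search.score` value that the service returns with each result, and the helper name `top_unique_by_id` is hypothetical. With millions of documents this would not scale; it only illustrates the result I'm after.

```python
from typing import Any, Dict, List


def top_unique_by_id(hits: List[Dict[str, Any]], top: int, skip: int = 0) -> List[Dict[str, Any]]:
    """Keep the highest-scoring hit per id, then apply skip/top paging."""
    best: Dict[Any, Dict[str, Any]] = {}
    for hit in hits:
        key = hit["id"]
        # Replace the stored hit only if this one scores strictly higher.
        if key not in best or hit["@search.score"] > best[key]["@search.score"]:
            best[key] = hit
    # Order the surviving hits by score, best first, before paging.
    ranked = sorted(best.values(), key=lambda h: h["@search.score"], reverse=True)
    return ranked[skip:skip + top]


# Made-up sample hits, for illustration only.
sample_hits = [
    {"id": 1, "value": "some snippet", "@search.score": 0.9},
    {"id": 1, "value": "another snippet", "@search.score": 1.4},
    {"id": 2, "value": "different paragraph", "@search.score": 1.1},
    {"id": 3, "value": "third paragraph", "@search.score": 0.2},
]

first_page = top_unique_by_id(sample_hits, top=2)
print([h["id"] for h in first_page])
```

Each page then contains at most one item per ID, which is the behavior I'd like to get from the service itself.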
Azure AI Search

1 answer

  1. SnehaAgrawal-MSFT 18,191 Reputation points
    2023-09-04T09:16:43.31+00:00

    @ocean-man-24 Thanks for reaching out! I see that you also posted this question on Stack Overflow and that the experts there have provided a response. I hope it helps. Let us know if you have any further questions or if the issue remains.
