identical docuements, different similarity score( in one shard)

Liu, Meg 21 Reputation points

I have two same documents in one shard, but when I query them, the similarity scores are very different. Anyone has any idea of what has caused this?

Azure AI Search
Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
684 questions
{count} votes

Accepted answer
  1. SnehaAgrawal-MSFT 18,191 Reputation points

    Thanks for asking question! You may want to know that Azure Cognitive Search distributes each index horizontally through a sharding process, which means that portions of an index are physically separate.

    By default, the score of a document is calculated based on statistical properties of the data within a shard. This approach is generally not a problem for a large corpus of data, and it provides better performance than having to calculate the score based on information across all shards.

    Also, this could cause two very similar documents (or even identical documents) to end up with different relevance scores if they end up in different shards as you mentioned.

    You may try to compute the score based on the statistical properties across all shards, you can do so by adding scoringStatistics=global as a query parameter (or add scoringStatistics: global as a body parameter of the query request).

    POST https://[service name]
    "search": "<query string>",
    "scoringStatistics": "global"

    Using scoringStatistics will ensure that all shards in the same replica provide the same results.

    Also, different replicas may be slightly different from one another as they are always getting updated with the latest changes to your index.

    For more details check this document link:

    Let us know if you have further query or issue remains.

0 additional answers

Sort by: Most helpful