Identical documents, different similarity scores (in one shard)

Question

I have two identical documents in one shard, but when I query them, their similarity scores are very different. Does anyone have an idea what could cause this?

Accepted Answer

Thanks for your question. Azure Cognitive Search distributes each index horizontally through a sharding process, which means that portions of an index are physically separate.

By default, the score of a document is calculated based on statistical properties of the data within a shard. This approach is generally not a problem for a large corpus of data, and it provides better performance than having to calculate the score based on information across all shards.

However, it can cause two very similar (or even identical) documents to receive different relevance scores when they land in different shards, which may be what you are seeing.
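To make the effect concrete, here is a minimal sketch of how shard-local statistics can change one ingredient of a relevance score. It assumes a BM25-style inverse document frequency (IDF) formula and uses made-up shard statistics; the shard names and counts are hypothetical, and this is a toy model rather than the service's actual scoring code.

```python
import math


def idf(total_docs: int, docs_with_term: int) -> float:
    """BM25-style inverse document frequency (assumed formula, for illustration)."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


# Hypothetical statistics: (documents in shard, documents containing the query term).
shards = {"shard_a": (1000, 10), "shard_b": (1000, 200)}

# Shard-local IDF: each shard only sees its own counts.
local_idf = {name: idf(n, df) for name, (n, df) in shards.items()}

# Global IDF: counts are summed across all shards before the formula is applied.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_docs, total_df)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

In this toy setup the same term is rare in one shard and common in the other, so an identical document would get a different term weight depending on where it is stored. With the summed statistics, every shard would use the same weight.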

To compute the score from statistical properties across all shards instead, add scoringStatistics=global as a query parameter, or add "scoringStatistics": "global" as a body parameter of the query request:

POST https://[service name].search.windows.net/indexes/hotels/docs/search?api-version=2020-06-30
{
    "search": "",
    "scoringStatistics": "global"
}
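If you are sending the query from code, the same request could be built roughly as follows. This is a sketch using only the Python standard library; build_search_request is a hypothetical helper, and the service name and key are placeholders you would replace with your own. The index name and api-version are taken from the request above, and the key is passed in the api-key header.

```python
import json
import urllib.request


def build_search_request(service_name: str, index_name: str, api_key: str,
                         search_text: str = "") -> urllib.request.Request:
    """Build (but do not send) a search request that asks for global scoring statistics."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/search?api-version=2020-06-30"
    )
    body = {
        "search": search_text,
        # Score with statistics gathered across all shards, not per shard.
        "scoringStatistics": "global",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values: replace with your own service name and query key.
request = build_search_request("my-service", "hotels", "<your-query-key>")
print(request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL and body before running the query against your service.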

Setting scoringStatistics to global ensures that all shards in the same replica provide the same results.

Note that different replicas may still differ slightly from one another, because they are continually being updated with the latest changes to your index.

For more details check this document link:

Let us know if you have further questions or if the issue remains.
