Relevance scoring in hybrid search using Reciprocal Rank Fusion (RRF)

Alt
10/07/2024

Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores from multiple, previously ranked results to produce a unified result set. In Azure AI Search, RRF is used whenever there are two or more queries that execute in parallel. Each query produces a ranked result set, and RRF is used to merge and homogenize the rankings into a single result set, returned in the query response. Examples of scenarios where RRF is always used include hybrid search and multiple vector queries executing concurrently.

RRF is based on the concept of reciprocal rank, which is the inverse of the rank of the first relevant document in a list of search results. The goal of the technique is to take into account the position of the items in the original rankings, and give higher importance to items that are ranked higher in multiple lists. This can help improve the overall quality and reliability of the final ranking, making it more useful for the task of fusing multiple ordered search results.

Note

New in 2024-09-01-preview is the ability to deconstruct an RRF-ranked search score into its component subscores. This gives you transparency into all-up score composition. For more information, see unpack search scores (preview) in this article.

How RRF ranking works

RRF works by taking the search results from multiple methods, assigning a reciprocal rank score to each document in the results, and then combining the scores to create a new ranking. The concept is that documents appearing in the top positions across multiple search methods are likely to be more relevant and should be ranked higher in the combined result.

Here's a simple explanation of the RRF process:

Obtain ranked search results from multiple queries executing in parallel.
Assign reciprocal rank scores for result in each of the ranked lists. RRF generates a new @search.score for each match in each result set. For each document in the search results, the engine assigns a reciprocal rank score based on its position in the list. The score is calculated as 1/(rank + k), where rank is the position of the document in the list, and k is a constant, which was experimentally observed to perform best if it's set to a small value like 60. Note that this k value is a constant in the RRF algorithm and entirely separate from the k that controls the number of nearest neighbors.
Combine scores. For each document, the engine sums the reciprocal rank scores obtained from each search system, producing a combined score for each document.
The engine ranks documents based on combined scores and sorts them. The resulting list is the fused ranking.

Only fields marked as searchable in the index, or searchFields in the query, are used for scoring. Only fields marked as retrievable, or fields specified in select in the query, are returned in search results, along with their search score.

Parallel query execution

RRF is used anytime there's more than one query execution. The following examples illustrate query patterns where parallel query execution occurs:

A full text query, plus one vector query (simple hybrid scenario), equals two query executions.
A full text query, plus one vector query targeting two vector fields, equals three query executions.
A full text query, plus two vector queries targeting five vector fields, equals 11 query executions

Scores in a hybrid search results

Whenever results are ranked, @search.score property contains the value used to order the results. Scores are generated by ranking algorithms that vary for each method. Each algorithm has its own range and magnitude.

The following chart identifies the scoring property returned on each match, algorithm, and range of scores for each relevance ranking algorithm.

Search method	Parameter	Scoring algorithm	Range
full-text search	`@search.score`	BM25 algorithm	No upper limit.
vector search	`@search.score`	HNSW algorithm, using the similarity metric specified in the HNSW configuration.	0.333 - 1.00 (Cosine), 0 to 1 for Euclidean and DotProduct.
hybrid search	`@search.score`	RRF algorithm	Upper limit is bounded by the number of queries being fused, with each query contributing a maximum of approximately 1 to the RRF score. For example, merging three queries would produce higher RRF scores than if only two search results are merged.
semantic ranking	`@search.rerankerScore`	Semantic ranking	0.00 - 4.00

Semantic ranking occurs after RRF merging of results. Its score (@search.rerankerScore) is always reported separately in the query response. Semantic ranker can rerank full text and hybrid search results, assuming those results include fields having semantically rich content. It can rerank pure vector queries if the search documents include text fields that contain semantically relevant content.

Unpack a search score into subscores (preview)

Using 2024-09-01-preview, you can deconstruct a search score to view its subscores.

For vector queries, this information can help you determine an appropriate value for vector weighting or setting minimum thresholds.

To get subscores:

Use the latest preview Search Documents REST API or an Azure SDK beta package that provides the feature.
Modify a query request, adding a new debug parameter set to either vector, semantic if using semantic ranker, or all.

Here's an example of hybrid query that returns subscores in debug mode:

POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2024-09-01=preview

{
    "vectorQueries": [
        {
            "vector": [
                -0.009154141,
                0.018708462,
                . . . 
                -0.02178128,
                -0.00086512347
            ],
            "fields": "DescriptionVector",
            "kind": "vector",
            "exhaustive": true,
            "k": 10
        },
        {
            "vector": [
                -0.009154141,
                0.018708462,
                . . . 
                -0.02178128,
                -0.00086512347
            ],
            "fields": "DescriptionVector",
            "kind": "vector",
            "exhaustive": true,
            "k": 10
        }
    ],
    "search": "historic hotel walk to restaurants and shopping",
    "select": "HotelName, Description, Address/City",
    "debug": "vector",
    "top": 10
}

Weighted scores

Using 2024-07-01 and newer preview API versions, you can weight vector queries to increase or decrease their importance in a hybrid query.

Recall that when computing RRF for a certain document, the search engine looks at the rank of that document for each result set where it shows up. Assume a document shows up in three separate search results, where the results are from two vector queries and one text BM25-ranked query. The position of the document varies in each result.

Match found	Position in results	@search.score	weight multiplier	@search.score (weighted)
vector results one	position 1	0.8383955	0.5	0.41919775
vector results two	position 5	0.81514114	2.0	1.63028228
BM25 results	position 10	0.8577363	NA	0.8577363

The document's position in each result set corresponds to an initial score, which is added up to create the final RRF score for that document.

If you add vector weighting, the initial scores are subject to a weighting multiplier that increases or decreases the score. The default is 1.0, which means no weighting and the initial score is used as-is in RRF scoring. However, if you add a weight of 0.5, the score is reduced and that result becomes less important in the combined ranking. Conversely, if you add a weight of 2.0, the score becomes a larger factor in the overall RRF score.

In this example, the @search.score (weighted) values are passed to the RRF ranking model.

Number of ranked results in a hybrid query response

By default, if you aren't using pagination, the search engine returns the top 50 highest ranking matches for full text search, and the most similar k matches for vector search. In a hybrid query, top determines the number of results in the response. Based on defaults, the top 50 highest ranked matches of the unified result set are returned.

Often, the search engine finds more results than top and k. To return more results, use the paging parameters top, skip, and next. Paging is how you determine the number of results on each logical page and navigate through the full payload. You can set maxTextRecallSize to larger values (the default is 1,000) to return more results from the text side of hybrid query.

By default, full text search is subject to a maximum limit of 1,000 matches (see API response limits). Once 1,000 matches are found, the search engine no longer looks for more.

For more information, see How to work with search results.

Diagram of a search scoring workflow

The following diagram illustrates a hybrid query that invokes keyword and vector search, with boosting through scoring profiles, and semantic ranking.

A query that generates the previous workflow might look like this:

POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: {{admin-api-key}}
{
   "queryType":"semantic",
   "search":"hello world",
   "searchFields":"field_a, field_b",
   "vectorQueries": [
       {
           "kind":"vector",
           "vector": [1.0, 2.0, 3.0],
           "fields": "field_c, field_d"
       },
       {
           "kind":"vector",
           "vector": [4.0, 5.0, 6.0],
           "fields": "field_d, field_e"
       }
   ],
   "scoringProfile":"my_scoring_profile"
}

Comhroinn trí