Relevance in vector search

During vector query execution, the search engine looks for similar vectors to find the best candidates to return in search results. Depending on how you indexed the vector content, the search for relevant matches is either exhaustive or constrained to near neighbors for faster processing. Once candidates are found, similarity metrics are used to score each result based on the strength of the match.

This article explains the algorithms used to find relevant matches and the similarity metrics used for scoring. It also offers tips for improving relevance if search results don't meet expectations.

Vector search algorithms include exhaustive k-nearest neighbors (KNN) and Hierarchical Navigable Small World (HNSW).

Only vector fields marked as searchable in the index, or as searchFields in the query, are used for searching and scoring.

When to use exhaustive KNN

Exhaustive KNN calculates the distances between all pairs of data points and finds the exact k nearest neighbors for a query point. It's intended for scenarios where high recall is of utmost importance, and users are willing to accept the trade-offs in query latency. Because it's computationally intensive, use exhaustive KNN for small to medium datasets, or when precision requirements outweigh query performance considerations.
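
To make "exhaustive" concrete, here's a minimal brute-force sketch (an illustration only, not the service's implementation): every stored vector is scored against the query, so the top k results are exact. The similarity delegate stands in for whichever metric is configured.

using System;
using System.Collections.Generic;
using System.Linq;

// Brute-force exhaustive KNN: score every vector, keep the k best.
// Exact by construction, but cost grows linearly with the number of vectors.
static IEnumerable<(int Index, double Score)> ExhaustiveKnn(
    float[] query,
    IReadOnlyList<float[]> vectors,
    Func<float[], float[], double> similarity, // for example, cosine similarity
    int k) =>
    vectors
        .Select((v, i) => (Index: i, Score: similarity(query, v)))
        .OrderByDescending(r => r.Score) // compares against all vectors, no shortcuts
        .Take(k);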

A secondary use case is building the ground-truth set of nearest neighbors, which you can then use to evaluate the recall of an approximate nearest neighbor algorithm.

Exhaustive KNN support is available through the 2023-11-01 REST API, the 2023-10-01-Preview REST API, and Azure SDK client libraries that target either REST API version.

When to use HNSW

During indexing, HNSW creates extra data structures for faster search, organizing data points into a hierarchical graph structure. HNSW has several configuration parameters that can be tuned to achieve the throughput, latency, and recall objectives for your search application. For example, at query time, you can specify options for exhaustive search, even if the vector field is indexed for HNSW.
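
For example, with the Azure SDK for .NET (Azure.Search.Documents 11.5 or later), a query can opt into exhaustive search even when the field is indexed for HNSW. This is a hedged sketch: the endpoint, key, index, field name, and GetEmbedding helper are placeholders, not part of the service API.

using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

var searchClient = new SearchClient(
    new Uri("https://<service>.search.windows.net"),
    "<index-name>",
    new AzureKeyCredential("<api-key>"));

// GetEmbedding is a hypothetical helper that calls your embedding model.
ReadOnlyMemory<float> queryEmbedding = GetEmbedding("historic hotel near the beach");

var options = new SearchOptions
{
    VectorSearch = new()
    {
        Queries =
        {
            new VectorizedQuery(queryEmbedding)
            {
                KNearestNeighborsCount = 5,
                Fields = { "contentVector" },
                Exhaustive = true // scan all vectors for this query, bypassing the HNSW graph
            }
        }
    }
};

SearchResults<SearchDocument> results =
    await searchClient.SearchAsync<SearchDocument>(searchText: null, options);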

During query execution, HNSW enables fast neighbor queries by navigating through the graph. This approach strikes a balance between search accuracy and computational efficiency. HNSW is recommended for most scenarios due to its efficiency when searching over larger data sets.
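
The HNSW parameters are set in the index definition. As a hedged sketch using the same SDK (the configuration and profile names are placeholders, and the parameter values are illustrative):

using Azure.Search.Documents.Indexes.Models;

// HNSW algorithm configuration attached to a vector search profile.
var vectorSearch = new VectorSearch
{
    Algorithms =
    {
        new HnswAlgorithmConfiguration("hnsw-config")
        {
            Parameters = new HnswParameters
            {
                M = 4,                // bidirectional links created per node
                EfConstruction = 400, // candidate list size while building the graph
                EfSearch = 500,       // candidate list size while querying
                Metric = VectorSearchAlgorithmMetric.Cosine
            }
        }
    },
    Profiles =
    {
        new VectorSearchProfile("hnsw-profile", "hnsw-config")
    }
};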

How nearest neighbor search works

Vector queries execute against an embedding space consisting of vectors generated from the same embedding model. Generally, the input value within a query request is fed into the same machine learning model that generated embeddings in the vector index. The output is a vector in the same embedding space. Since similar vectors are clustered close together, finding matches is equivalent to finding the vectors that are closest to the query vector, and returning the associated documents as the search result.

For example, if a query request is about hotels, the model maps the query into a vector that exists somewhere in the cluster of vectors representing documents about hotels. Identifying which vectors are the most similar to the query, based on a similarity metric, determines which documents are the most relevant.

When vector fields are indexed for exhaustive KNN, the query executes against "all neighbors". For fields indexed for HNSW, the search engine uses an HNSW graph to search over a subset of nodes within the vector index.

Creating the HNSW graph

During indexing, the search service constructs the HNSW graph. The goal of indexing a new vector into an HNSW graph is to add it to the graph structure in a manner that allows for efficient nearest neighbor search. The following steps summarize the process:

  1. Initialization: Start with an empty HNSW graph, or the existing HNSW graph if it's not a new index.

  2. Entry point: This is the top level of the hierarchical graph and serves as the starting point for indexing.

  3. Adding to the graph: Different hierarchical levels represent different granularities of the graph, with higher levels being more global, and lower levels being more granular. Each node in the graph represents a vector point. A sketch of how levels are assigned follows this list.

    • Each node is connected to up to m nearby neighbors, as governed by the m parameter.

    • The number of data points considered as candidate connections is governed by the efConstruction parameter. This dynamic list forms the set of closest points in the existing graph for the algorithm to consider. Higher efConstruction values result in more nodes being considered, which often leads to denser local neighborhoods for each vector.

    • These connections use the configured similarity metric to determine distance. Some connections are "long-distance" connections that connect across different hierarchical levels, creating shortcuts in the graph that enhance search efficiency.

  4. Graph pruning and optimization: This can happen after indexing all vectors, and it improves navigability and efficiency of the HNSW graph.
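
One construction detail worth calling out, taken from the original HNSW paper by Malkov and Yashunin (Azure AI Search's internal implementation isn't public, so treat this as background): each inserted vector is assigned a maximum level by sampling an exponentially decaying distribution, which is what makes the upper levels sparse and global while level 0 holds every vector.

using System;

// Level assignment from the HNSW paper: most nodes land only in level 0;
// a few reach the sparse upper levels that act as long-distance shortcuts.
static int AssignLevel(Random rng, int m)
{
    double mL = 1.0 / Math.Log(m);              // normalization factor from the paper
    double u = 1.0 - rng.NextDouble();          // uniform in (0, 1], avoids log(0)
    return (int)Math.Floor(-Math.Log(u) * mL);  // exponentially decaying level
}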

Navigating the HNSW graph at query time

A vector query navigates the hierarchical graph structure to scan for matches. The following steps summarize the process:

  1. Initialization: The algorithm initiates the search at the top level of the hierarchical graph. This entry point contains the set of vectors that serve as starting points for the search.

  2. Traversal: Next, it traverses the graph level by level, navigating from the top-level to lower levels, selecting candidate nodes that are closer to the query vector based on the configured distance metric, such as cosine similarity.

  3. Pruning: To improve efficiency, the algorithm prunes the search space by only considering nodes that are likely to contain nearest neighbors. This is achieved by maintaining a priority queue of potential candidates and updating it as the search progresses. The length of this queue is configured by the parameter efSearch. A simplified sketch of this loop follows the list.

  4. Refinement: As the algorithm moves to lower, more granular levels, HNSW considers more neighbors near the query, which allows the candidate set of vectors to be refined, improving accuracy.

  5. Completion: The search completes when the desired number of nearest neighbors have been identified, or when other stopping criteria are met. This desired number of nearest neighbors is governed by the query-time parameter k.
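
The traversal and pruning steps can be compressed into a single best-first loop. The following is a simplified, single-level sketch (HNSW runs a loop like this per level as it descends; the graph layout, distance delegate, and .NET 6+ PriorityQueue are assumptions of this illustration, not the service's code):

using System;
using System.Collections.Generic;

// Best-first search over one graph level, bounded by efSearch.
static List<int> SearchLevel(
    int entryPoint,
    float[] query,
    IReadOnlyDictionary<int, int[]> graph,  // node id -> neighbor ids
    Func<int, float[], double> distance,    // configured distance metric
    int efSearch)
{
    var visited = new HashSet<int> { entryPoint };

    // Candidates to expand next, closest first (min-heap on distance).
    var candidates = new PriorityQueue<int, double>();
    candidates.Enqueue(entryPoint, distance(entryPoint, query));

    // Current best results, worst first, so the queue can be trimmed to efSearch.
    var results = new PriorityQueue<int, double>(
        Comparer<double>.Create((a, b) => b.CompareTo(a)));
    results.Enqueue(entryPoint, distance(entryPoint, query));

    while (candidates.TryDequeue(out int node, out double dist))
    {
        results.TryPeek(out _, out double worst);
        if (results.Count >= efSearch && dist > worst)
            break; // pruning: no remaining candidate can improve the result set

        foreach (int neighbor in graph[node])
        {
            if (!visited.Add(neighbor)) continue;
            double d = distance(neighbor, query);
            if (results.Count < efSearch || d < worst)
            {
                candidates.Enqueue(neighbor, d);
                results.Enqueue(neighbor, d);
                if (results.Count > efSearch) results.Dequeue(); // drop the current worst
                results.TryPeek(out _, out worst);
            }
        }
    }

    // Unload worst-to-best, then reverse so the closest neighbor comes first.
    var ordered = new List<int>();
    while (results.TryDequeue(out int id, out _)) ordered.Add(id);
    ordered.Reverse();
    return ordered;
}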

Similarity metrics used to measure nearness

The algorithm finds candidate vectors to evaluate similarity. To perform this task, a similarity metric calculation compares each candidate vector to the query vector and measures the similarity. The algorithm keeps track of the ordered set of the most similar vectors it has found, which forms the ranked result set when the algorithm reaches completion.

  • cosine: Measures the angle between two vectors and isn't affected by differing vector lengths. Mathematically, it calculates the cosine of the angle between the two vectors. Cosine is the similarity metric used by Azure OpenAI embedding models, so if you're using Azure OpenAI, specify cosine in the vector configuration.

  • dotProduct: Measures both the magnitude of each vector in the pair and the angle between them. Mathematically, it calculates the product of the vectors' magnitudes and the cosine of the angle between them. For normalized vectors, this is identical to cosine similarity, but slightly more performant.

  • euclidean (also known as l2 norm): Measures the length of the vector difference between two vectors. Mathematically, it calculates the Euclidean distance between the two vectors, which is the l2-norm of their difference.
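
Computed directly, the three metrics look like the sketch below (illustrative; the service performs these calculations internally). Note that for normalized, unit-length vectors, Dot and Cosine return the same value, which is why dotProduct can be the more performant choice for normalized embeddings.

using System;

static double Dot(float[] a, float[] b)
{
    double sum = 0;
    for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}

// Angle-based similarity; unaffected by vector length.
static double Cosine(float[] a, float[] b) =>
    Dot(a, b) / (Math.Sqrt(Dot(a, a)) * Math.Sqrt(Dot(b, b)));

// l2 norm of the difference between the two vectors.
static double Euclidean(float[] a, float[] b)
{
    double sum = 0;
    for (int i = 0; i < a.Length; i++)
    {
        double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return Math.Sqrt(sum);
}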

Scores in vector search results

Scores are calculated and assigned to each match, with the highest matches returned as k results. The @search.score property contains the score. The following table shows the range within which a score will fall.

Search method   Parameter       Scoring metric   Range
vector search   @search.score   Cosine           0.333 - 1.00

For the cosine metric, it's important to note that the calculated @search.score isn't the cosine value between the query vector and the document vectors. Instead, Azure AI Search applies transformations such that the score function is monotonically decreasing, meaning score values always decrease as the similarity becomes worse. This transformation ensures that search scores are usable for ranking purposes.

There are some nuances with similarity scores:

  • Cosine similarity is defined as the cosine of the angle between two vectors.
  • Cosine distance is defined as 1 - cosine_similarity.

To create a monotonically decreasing function, the @search.score is defined as 1 / (1 + cosine_distance). Under this definition, identical vectors (cosine distance 0) score 1.0, orthogonal vectors (cosine distance 1) score 0.5, and opposite vectors (cosine distance 2) score 0.333, which is where the 0.333 - 1.00 range in the earlier table comes from.

Developers who need a cosine value instead of the synthetic value can use a formula to convert the search score back to the original cosine similarity:

double ScoreToSimilarity(double score)
{
    // Invert score = 1 / (1 + cosineDistance) to recover the cosine distance.
    double cosineDistance = (1 - score) / score;

    // cosine_similarity = 1 - cosine_distance.
    return 1 - cosineDistance;
}
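
Applying the function at the boundary scores recovers the familiar cosine similarity values:

Console.WriteLine(ScoreToSimilarity(1.0));       //  1.0 (identical direction)
Console.WriteLine(ScoreToSimilarity(0.5));       //  0.0 (orthogonal)
Console.WriteLine(ScoreToSimilarity(1.0 / 3.0)); // -1.0 (opposite direction)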

Having the original cosine value can be useful in custom solutions that set thresholds to trim low-quality results.

Tips for relevance tuning

If you aren't getting relevant results, experiment with changes to query configuration. There are no dedicated tuning features for vector queries, such as a scoring profile or field and term boosting, so experimentation is the main lever:

  • Experiment with chunk size and overlap. Try increasing the chunk size and ensuring there's sufficient overlap to preserve context or continuity between chunks.

  • For HNSW, try different levels of efConstruction to change the internal composition of the proximity graph. The default is 400. The range is 100 to 1,000.

  • Increase k to feed more search results into a chat model, if you're using one.

  • Try hybrid queries with semantic ranking, as sketched after this list. In benchmark testing, this combination consistently produced the most relevant results.
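
As a hedged sketch of that last tip (Azure.Search.Documents 11.5 or later; the semantic configuration name, field name, searchClient, and queryEmbedding are placeholders): passing both search text and a vector query in one request makes it a hybrid query, and setting the semantic query type layers semantic ranking on top.

using Azure.Search.Documents;
using Azure.Search.Documents.Models;

var options = new SearchOptions
{
    QueryType = SearchQueryType.Semantic,
    SemanticSearch = new SemanticSearchOptions
    {
        SemanticConfigurationName = "my-semantic-config"
    },
    VectorSearch = new()
    {
        Queries =
        {
            new VectorizedQuery(queryEmbedding)
            {
                KNearestNeighborsCount = 50,
                Fields = { "contentVector" }
            }
        }
    }
};

// Text + vector in the same request = hybrid; semantic ranking rescores the fused results.
SearchResults<SearchDocument> results =
    await searchClient.SearchAsync<SearchDocument>("historic hotel near the beach", options);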

Next steps