Rerank your retrieved results

5 minutes

Vector search matches user queries with semantically similar content instead of exact word matches. However, a vector search can return multiple results that are not all equally useful, so you may want to filter and rerank them to refine the final set.

Imagine the following scenario:

Diagram of a ranking scenario after retrieving relevant documents.

1. A user query comes in and is used to search your vector store for any relevant documents.
2. Multiple documents are identified as being semantically similar to the user query.
3. Only a subset of the documents are relevant. You can use a reranker to select the top three documents based on certain criteria.
4. The three documents are provided as context for a language model to generate a response to the user.
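The last two steps of this scenario can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Candidate` class, the `rerank_score` field, the function names, the prompt layout, and all sample values are hypothetical, and the scores are assumed to have already been produced by some reranker.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    rerank_score: float  # hypothetical relevance score assigned by a reranker


def select_top_k(candidates, k=3):
    """Return the k candidates with the highest rerank score, best first."""
    return sorted(candidates, key=lambda c: c.rerank_score, reverse=True)[:k]


def build_prompt(query, context_docs):
    """Join the selected documents into a context block for the language model."""
    context = "\n".join(f"- {doc.text}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up candidates, as if returned by a vector search and then scored.
candidates = [
    Candidate("Office opening hours", 0.05),
    Candidate("Resetting a forgotten password", 0.94),
    Candidate("Password length requirements", 0.61),
    Candidate("Company holiday calendar", 0.02),
    Candidate("Unlocking a locked account", 0.77),
]

top_docs = select_top_k(candidates, k=3)
prompt = build_prompt("How do I reset my password?", top_docs)
print(prompt)
```

Only the three highest-scoring documents reach the prompt; the remaining candidates are dropped before the language model is called.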

Reranking adjusts the initial ranking of retrieved documents to improve the precision and relevance of search results. You reorder the documents by relevance score, with the goal of placing the most relevant documents at the top of the list.

When you use vector search to retrieve relevant documents, the similarity between two vectors is commonly calculated with cosine similarity. The higher the value, the more similar the two vectors are.

Diagram of the cosine similarity between a query and document vector.
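As a small illustration of the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses only the Python standard library; the function name and the three-dimensional toy vectors are made up for the example, and real embeddings typically have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" chosen by hand for illustration.
query_vec = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # points in the same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query_vec, doc_close))  # close to 1: very similar
print(cosine_similarity(query_vec, doc_far))    # 0: no similarity
```

Note that `doc_close` is twice as long as `query_vec` yet still scores close to 1, because cosine similarity compares direction and ignores magnitude.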

Reranking goes beyond evaluating the cosine similarity between the query and document vectors. It supports a deeper semantic understanding by considering the actual relevance of each document to the query. Because the language model then receives more relevant context, a reranker can help reduce hallucinations.

Diagram of the reranking of multiple document vectors.
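To make the difference between the two orderings concrete, here is a hedged sketch of a second-stage rerank. The `toy_relevance` scorer is a deliberately simple stand-in (word overlap) and is not how a real reranker works; in practice `score_fn` would call a model that scores each query and document pair. The function names, the sample documents, and the cosine scores are all invented for illustration.

```python
def toy_relevance(query, document):
    """Stand-in scorer: fraction of query words that appear in the document.

    A real reranker would score the (query, document) pair with a model.
    """
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank(query, retrieved, score_fn, top_n=3):
    """Rescore retrieved (text, cosine_score) pairs and return the top_n.

    The original cosine score is ignored; only score_fn decides the new order.
    """
    rescored = [(text, score_fn(query, text)) for text, _cosine in retrieved]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


query = "reset account password"

# Made-up vector search results, already ordered by cosine similarity.
retrieved = [
    ("Our password policy requires twelve characters", 0.83),
    ("Account billing and invoices overview", 0.81),
    ("How to reset your account password", 0.78),
    ("Reset a forgotten password from the login page", 0.74),
]

reranked = rerank(query, retrieved, toy_relevance, top_n=3)
for text, score in reranked:
    print(f"{score:.2f}  {text}")
```

In this toy data, the document that ranked third by cosine similarity moves to the top after reranking, and the billing document drops out of the top three.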

To implement reranking, you can use:

Private APIs, like Cohere or Jina.
Open-source rerankers, like cross-encoders, FlagEmbedding, or FlashRank.

Though rerankers can improve the accuracy of the final response of your generative AI application, they add complexity to the retrieval-augmented generation (RAG) pipeline, so implement them with care.
