Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Azure DocumentDB now supports a powerful hybrid search capability that combines vector search with full-text search scoring using the Reciprocal Rank Fusion (RRF) function.
What is hybrid search?
Hybrid search leverages the strengths of both vector-based and traditional keyword-based search methods to deliver more relevant and accurate search results. This is achieved by combining vector embeddings and text data within the same documents:
- Vector search: Utilizes machine learning models to understand the semantic meaning of queries and documents. This allows for more context-aware search results, especially useful for complex queries where traditional keyword search might fall short. For efficient vector similarity search, Azure DocumentDB offers three
cosmosSearchcompatible vector indices: HNSW, IVF, and DiskANN. - Full-text search: Azure DocumentDB uses
$textoperator for full-text search.
The results from vector search and full-text search are then combined to return final search results benefiting from the strengths of both search approaches:
- Enhanced Relevance: By combining semantic understanding with keyword matching, hybrid search can deliver more relevant results for a wide range of queries.
- Improved Accuracy: The RRF function ensures that the most pertinent results from both search methods are prioritized.
- Versatility: Suitable for various use cases including recommendation systems, semantic search, and personalized content discovery.
How to use hybrid search
- Create a collection to store your data, including both text content and vector embeddings.
- Create a vector index on your vector embedding field using the
cosmosSearchoperator. - Create a full-text index on your text field using the standard index creation commands with the
$texttype. - Use the aggregation pipeline with the
$searchoperator (for vector search) and the$textoperator (for full-text search), followed by steps to implement Reciprocal Rank Fusion to combine the scores using the$unionWithoperator.
Configure indexes for hybrid search
Create a vector index
db.runCommand({
"createIndexes": "yourCollectionName",
"indexes": [
{
"key": {
"vectorField": "cosmosSearch"
},
"name": "vectorIndex",
"cosmosSearchOptions": {
"kind": "vector-diskann", // "vector-ivf" , "vector-hnsw"
"similarity": "cosine", // "l2"
"dimensions": 3072 // Max 16,000
}
}
]
});
Create a full-text index
db.runCommand({
"createIndexes": "yourCollectionName",
"indexes": [
{
"key": {
"textField": "text"
},
"name": "fullTextIndex"
}
]
});
Perform hybrid search
db.hybrid_col.aggregate([
{ $search: { cosmosSearch: { path: "vector", vector: [0.1, 0.2, 0.3], k: 5 } } }, // Vector search
{ $group: { _id: null, vectorResults: { $push: "$$ROOT" } } },
{ $unwind: { path: "$vectorResults", includeArrayIndex: "vectorRank" } },
{ $addFields: { vs_score: { $divide: [1, { $add: ["$vectorRank", 1, 1] }] } } },
{ $project: { _id: "$vectorResults._id", title: "$vectorResults.text", vs_score: 1 } },
{ $unionWith: {
coll: "hybrid_col",
pipeline: [
{ $match: { $text: { $search: "cat" } } },
{ $addFields: { textScore: { $meta: "textScore" } } },
{ $group: { _id: null, textResults: { $push: "$$ROOT" } } },
{ $unwind: { path: "$textResults", includeArrayIndex: "textRank" } },
{ $addFields: { fts_score: { $divide: [1, { $add: ["$textRank", 10, 1] }] } } },
{ $project: { _id: "$_id", title: "$text", fts_score: 1 } }
]
}},
{ $group: {
_id: "$title",
finalScore: { $max: { $add: [{ $ifNull: ["$vs_score", 0] }, { $ifNull: ["$fts_score", 0] }] } }
}},
{ $sort: { finalScore: -1 } }
])
In this example, the first $search stage performs a vector similarity search on the vector field for the query vector [0.1, 0.2, 0.3], returning the top five most similar documents (k: 5). The $group stage then groups all the vector search results into a single document with an array called vectorResults. Following this, the $unwind stage deconstructs the vectorResults array, creating a separate document for each result and adding its rank (vectorRank). The subsequent $addFields stage calculates the Reciprocal Rank Fusion (RRF) score for each vector search result based on its rank. The $project stage then selects the _id, title, and the calculated vs_score from the vector search results.
Moving on to the text search phase, the $unionWith stage combines the results of the vector search pipeline with those of a separate full-text search pipeline. In the full-text search phase, the $match stage with the $text operator performs a full-text search for the term "cat" in the text field. The $addFields stage retrieves the relevance score (textScore) assigned by the $text operator. Similar to the vector search results, the $group stage groups all full-text search results, and the $unwind stage deconstructs the textResults array, adding the rank (textRank). A different penalty (10) is used here, which can be tuned. The $project stage selects the _id, title, and fts_score from the full-text search results. After combining the results, the $group stage groups documents by their title.
Finally, the $project stage calculates the finalScore for each document by taking the maximum of its vector RRF score (vs_score) and its full-text RRF score (fts_score). $ifNull handles cases where a document might only be present in one of the search results.
Limitations
- Azure DocumentDB's full-text search currently only supports BM25 ranking in gated preview.
- Currently, there is no single, dedicated command to perform a hybrid search directly. You need to construct the hybrid search query using the aggregation pipeline as demonstrated in the examples above.