Important
This feature is in public preview under supplemental terms of use. We recommend the latest preview REST API version for this feature.
You can use fewer dimensions with text-embedding-3 models. On Azure OpenAI, text-embedding-3 models are trained with the Matryoshka Representation Learning (MRL) technique, which produces multiple vector representations at different levels of compression. This approach enables faster searches and lower storage costs with minimal loss of semantic information.
In Azure AI Search, MRL support supplements scalar and binary quantization. When you use either quantization method, you can also specify a truncationDimension property as part of the quantization configuration to reduce the dimensionality of text embeddings.
MRL multilevel compression saves on vector storage and improves query response times for vector queries based on text embeddings. In Azure AI Search, MRL support is offered only together with another method of quantization. Using binary quantization with MRL provides the maximum vector index size reduction. To achieve maximum storage reduction, use binary quantization with MRL and set stored to false.
Prerequisites
A text-embedding-3 model, such as text-embedding-3-small or text-embedding-3-large.
New vector fields of type Edm.Half or Edm.Single. You can't add MRL compression to an existing field.
Hierarchical Navigable Small World (HNSW) algorithm. This preview doesn't support exhaustive KNN.
Scalar or binary quantization. Truncated dimensions can be set only when scalar or binary quantization is configured. We recommend binary quantization for MRL compression.
Supported clients
You can use the REST APIs or Azure SDK beta packages to implement MRL compression. At this time, there's no Azure portal or Azure AI Foundry support.
REST API 2024-09-01-preview or later. We recommend the latest preview API.
Check the change logs for each Azure SDK beta package: Python, .NET, Java, JavaScript.
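Because MRL configuration flows through the Create or Update Index REST call, it can help to see how the preview api-version fits into the request. The following Python sketch only builds the request URL and headers; the service name, index name, and api-key are placeholders, not real values.

```python
# Sketch: assemble a Create or Update Index request for a preview REST API
# version. All identifiers below are placeholders for illustration.

def build_index_request(service: str, index: str, api_version: str) -> dict:
    """Return method, URL, and headers for a PUT (create or update) index call."""
    return {
        "method": "PUT",
        "url": (
            f"https://{service}.search.windows.net/indexes/{index}"
            f"?api-version={api_version}"
        ),
        "headers": {
            "Content-Type": "application/json",
            "api-key": "<admin-api-key>",  # placeholder; supply a real admin key
        },
    }

request = build_index_request("my-search-service", "my-index", "2024-09-01-preview")
print(request["url"])
```

The request body is the index schema shown in the examples later in this article.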
Use MRL-extended text embeddings
MRL is built into the text embedding model you're already using. To use MRL capabilities in Azure AI Search:
Use Create or Update Index (preview) or an equivalent API to specify the index schema.
Add vector fields to the index definition.
Specify a vectorSearch.compressions object in your index definition.
Include a quantization method, either scalar or binary (recommended).
Include the truncationDimension parameter and set it to 512. If you're using the text-embedding-3-large model, you can set it as low as 256.
Include a vector profile that specifies the HNSW algorithm and the vector compression object.
Assign the vector profile to a vector field of type Edm.Half or Edm.Single in the fields collection.
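The truncationDimension floors above (512 in general, as low as 256 for text-embedding-3-large) can be expressed as a small helper. This is an illustrative sketch, not part of any Azure SDK; the 3,072 full-dimension count for text-embedding-3-large is a property of the model rather than something stated in this article.

```python
# Sketch: clamp a requested truncationDimension to the supported range for an
# MRL-capable embedding model. Values follow the guidance in the steps above.

MODEL_LIMITS = {
    # full output dimensions, and the lowest supported truncation per the article
    "text-embedding-3-small": {"dimensions": 1536, "min_truncation": 512},
    "text-embedding-3-large": {"dimensions": 3072, "min_truncation": 256},
}

def choose_truncation_dimension(model: str, requested: int) -> int:
    """Return a truncationDimension within the model's supported range."""
    limits = MODEL_LIMITS[model]
    if requested > limits["dimensions"]:
        raise ValueError(f"{model} emits at most {limits['dimensions']} dimensions")
    return max(requested, limits["min_truncation"])

print(choose_truncation_dimension("text-embedding-3-large", 256))  # 256
print(choose_truncation_dimension("text-embedding-3-small", 256))  # clamped to 512
```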
There are no query-side modifications for using an MRL-capable text embedding model. MRL support doesn't affect integrated vectorization, text-to-query conversions at query time, semantic ranking, or other relevance-enhancement features, such as reranking with original vectors and oversampling.
Although indexing is slower due to the extra steps, queries are faster.
Example: Vector search configuration that supports MRL
The following example illustrates a vector search configuration that meets the requirements and recommendations of MRL.
truncationDimension is a compression property. It specifies how much to shrink the vector graph in memory, together with a compression method like scalar or binary compression. We recommend 1,024 or higher for truncationDimension with binary quantization. A dimensionality of less than 1,000 degrades the quality of search results when using MRL and binary compression.
{
    "vectorSearch": {
        "profiles": [
            {
                "name": "use-bq-with-mrl",
                "compression": "use-bq",
                "algorithm": "use-hnsw"
            }
        ],
        "algorithms": [
            {
                "name": "use-hnsw",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine"
                }
            }
        ],
        "compressions": [
            {
                "name": "use-bq",
                "kind": "binaryQuantization",
                "rerankWithOriginalVectors": true,
                "defaultOversampling": 10,
                "truncationDimension": 1024
            }
        ]
    }
}
Here's an example of a fully specified vector field definition that satisfies the requirements for MRL. Recall that vector fields must:
Be of type Edm.Half or Edm.Single.
Have a vectorSearchProfile property that specifies the algorithm and compression settings.
Have a dimensions property that specifies the number of dimensions for scoring and ranking results. Its value should be the dimensions limit of the model you're using (1,536 for text-embedding-3-small).
{
    "name": "text_vector",
    "type": "Collection(Edm.Single)",
    "searchable": true,
    "filterable": false,
    "retrievable": false,
    "stored": false,
    "sortable": false,
    "facetable": false,
    "key": false,
    "indexAnalyzer": null,
    "searchAnalyzer": null,
    "analyzer": null,
    "normalizer": null,
    "dimensions": 1536,
    "vectorSearchProfile": "use-bq-with-mrl",
    "vectorEncoding": null,
    "synonymMaps": []
}
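The field-level checklist above can be captured in a short validation sketch. This is illustrative only; validate_mrl_field and its messages are not part of any Azure SDK, and the field shape simply mirrors the JSON example.

```python
# Sketch: check that a vector field definition meets the MRL requirements
# listed above (field type, vector profile, and dimensions).

ALLOWED_TYPES = {"Collection(Edm.Half)", "Collection(Edm.Single)"}

def validate_mrl_field(field: dict) -> list:
    """Return a list of problems; an empty list means the field passes."""
    problems = []
    if field.get("type") not in ALLOWED_TYPES:
        problems.append("type must be Collection(Edm.Half) or Collection(Edm.Single)")
    if not field.get("vectorSearchProfile"):
        problems.append("vectorSearchProfile must name a profile with HNSW and quantization")
    dims = field.get("dimensions")
    if not isinstance(dims, int) or dims <= 0:
        problems.append("dimensions must be the model's full dimension count")
    return problems

field = {
    "name": "text_vector",
    "type": "Collection(Edm.Single)",
    "dimensions": 1536,
    "vectorSearchProfile": "use-bq-with-mrl",
}
print(validate_mrl_field(field))  # []
```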