Truncate dimensions using MRL compression (preview)

Important

This feature is in public preview under supplemental terms of use. We recommend the latest preview REST API version for this feature.

You can use fewer dimensions with text-embedding-3 models. On Azure OpenAI, text-embedding-3 models are retrained using the Matryoshka Representation Learning (MRL) technique, which produces multiple vector representations at different levels of compression. This approach produces faster searches and reduced storage costs with minimal loss of semantic information.
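To make the idea concrete, here's a minimal Python sketch of what dimension truncation means for an MRL-trained embedding: keep the leading components and rescale the result to unit length. It's an illustration only. `truncate_embedding` is a hypothetical helper, the rescaling step is an assumption of this sketch, and in Azure AI Search you don't write this code yourself because truncation is driven by the truncationDimension setting.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components and rescale to unit length.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix can't be rescaled; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a real embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)
```

With MRL-trained models, the leading dimensions carry most of the semantic signal, which is why a shortened vector remains useful for similarity search.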

In Azure AI Search, MRL support supplements scalar and binary quantization. When you use either quantization method, you can specify a truncationDimension property in the compression configuration used by your vector fields to reduce the dimensionality of text embeddings.

MRL multilevel compression saves on vector storage and improves query response times for vector queries based on text embeddings. In Azure AI Search, MRL support is only offered together with another method of quantization. Binary quantization with MRL provides the greatest reduction in vector index size. For maximum storage reduction, also set stored to false on the vector field.
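The following back-of-the-envelope sketch shows why combining truncation with quantization shrinks vector data. It computes only the raw size of one vector's components and assumes 32 bits per dimension for Edm.Single, 8 bits for scalar quantization, and 1 bit for binary quantization. It ignores the HNSW graph, index metadata, and any original vectors retained for reranking, so treat the numbers as rough estimates rather than actual index sizes. `bytes_per_vector` is a hypothetical helper.

```python
import math


def bytes_per_vector(dimensions, bits_per_dimension):
    """Raw size in bytes of one vector's components (no index overhead)."""
    return math.ceil(dimensions * bits_per_dimension / 8)


# 1,536 is the dimensions limit of text-embedding-3-small.
sizes = {
    "float32, 1536 dims": bytes_per_vector(1536, 32),
    "scalar (int8), 1536 dims": bytes_per_vector(1536, 8),
    "binary, 1536 dims": bytes_per_vector(1536, 1),
    "binary + MRL, 1024 dims": bytes_per_vector(1024, 1),
}

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

Under these assumptions, a binary-quantized vector truncated to 1,024 dimensions occupies 128 bytes, compared with 6,144 bytes for the uncompressed float32 vector.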

Prerequisites

Supported clients

You can use the REST APIs or Azure SDK beta packages to implement MRL compression. At this time, there's no Azure portal or Azure AI Foundry support.

Use MRL-extended text embeddings

MRL is built into the text embedding model you're already using. To use MRL capabilities in Azure AI Search:

  1. Use Create or Update Index (preview) or an equivalent API to specify the index schema.

  2. Add vector fields to the index definition.

  3. Specify a vectorSearch.compressions object in your index definition.

  4. Include a quantization method, either scalar or binary (recommended).

  5. Include the truncationDimension parameter. Set it to 512, or as low as 256 if you're using the text-embedding-3-large model. With binary quantization, 1,024 or higher is recommended to preserve search quality.

  6. Include a vector profile that specifies the HNSW algorithm and the vector compression object.

  7. Assign the vector profile to a vector field of type Collection(Edm.Half) or Collection(Edm.Single) in the fields collection.

There are no query-side modifications for using an MRL-capable text embedding model. MRL support doesn't affect integrated vectorization, text-to-vector conversions at query time, semantic ranking, or other relevance-enhancement features, such as reranking with original vectors and oversampling.
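As a conceptual illustration of how oversampling and reranking with original vectors compensate for compression, the following Python sketch runs a two-stage search over a toy dataset. It's a simplification under stated assumptions: it uses truncation only (no quantization), a brute-force scan instead of HNSW, and dot-product scoring. `search_with_rerank` and the sample data are hypothetical and don't represent the service's implementation.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_with_rerank(query, docs, k, truncation_dim, oversampling):
    """Two-stage search: coarse pass on truncated vectors, then rerank.

    docs maps a document ID to its full-length vector.
    """
    # Stage 1: score truncated vectors and keep k * oversampling candidates.
    q_short = query[:truncation_dim]
    n_candidates = min(len(docs), int(k * oversampling))
    coarse = sorted(
        docs,
        key=lambda doc_id: dot(q_short, docs[doc_id][:truncation_dim]),
        reverse=True,
    )[:n_candidates]
    # Stage 2: rescore only those candidates with the original vectors.
    reranked = sorted(coarse, key=lambda doc_id: dot(query, docs[doc_id]), reverse=True)
    return reranked[:k]


docs = {
    "a": [1.0, 0.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0, 0.4],
    "c": [0.0, 1.0, 0.0, 0.0],
    "d": [0.8, 0.0, 0.0, -0.6],
}
query = [1.0, 0.0, 0.0, 0.5]

without_oversampling = search_with_rerank(query, docs, k=1, truncation_dim=2, oversampling=1)
with_oversampling = search_with_rerank(query, docs, k=1, truncation_dim=2, oversampling=2)
```

In this toy data, document "b" is the best match on the full vectors but ranks second on the truncated ones. Requesting extra candidates gives the reranking stage a chance to promote it.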

Indexing is slower because of the extra compression steps, but queries are faster.

Example: Vector search configuration that supports MRL

The following example illustrates a vector search configuration that meets the requirements and recommendations of MRL.

truncationDimension is a compression property. Used together with a compression method such as scalar or binary quantization, it specifies how much to shrink the vector graph in memory. We recommend 1,024 or higher for truncationDimension with binary quantization. A dimensionality of less than 1,000 degrades the quality of search results when using MRL and binary compression.
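If you generate index definitions programmatically, a small pre-flight check can flag settings that conflict with this guidance. The following Python sketch is one possible approach. `check_truncation_dimension` is a hypothetical helper, and the upper-bound check against the model's dimensions is a sanity check assumed by this sketch rather than a documented rule.

```python
def check_truncation_dimension(truncation_dimension, model_dimensions, quantization):
    """Return warnings for a proposed truncationDimension value.

    quantization is the compression kind, for example "binaryQuantization".
    """
    warnings = []
    # Assumption of this sketch: truncating can't add dimensions.
    if truncation_dimension > model_dimensions:
        warnings.append("truncationDimension exceeds the model's dimensions")
    # Guidance from this article: 1,024 or higher with binary quantization.
    if quantization == "binaryQuantization" and truncation_dimension < 1024:
        warnings.append("1,024 or higher is recommended with binary quantization")
    return warnings


ok = check_truncation_dimension(1024, 1536, "binaryQuantization")
too_small = check_truncation_dimension(512, 1536, "binaryQuantization")
```

Here, `ok` is an empty list and `too_small` contains one warning about the binary quantization recommendation.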

{
  "vectorSearch": {
    "profiles": [
      {
        "name": "use-bq-with-mrl",
        "compression": "use-mrl,use-bq",
        "algorithm": "use-hnsw"
      }
    ],
    "algorithms": [
      {
        "name": "use-hnsw",
        "kind": "hnsw",
        "hnswParameters": {
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500,
          "metric": "cosine"
        }
      }
    ],
    "compressions": [
      {
        "name": "use-mrl",
        "kind": "truncation",
        "rerankWithOriginalVectors": true,
        "defaultOversampling": 10,
        "truncationDimension": 1024
      },
      {
        "name": "use-bq",
        "kind": "binaryQuantization",
        "rerankWithOriginalVectors": true,
        "defaultOversampling": 10
      }
    ]
  }
}
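
Because a profile refers to algorithms and compressions by name, a typo in any name can invalidate the configuration. The following Python sketch checks a vectorSearch object for dangling references before you send it. `find_dangling_references` is a hypothetical helper, and splitting the compression value on commas is an assumption based on the format shown in the preceding example.

```python
def find_dangling_references(vector_search):
    """Return messages for profile references that don't resolve to a name."""
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    compressions = {c["name"] for c in vector_search.get("compressions", [])}
    problems = []
    for profile in vector_search.get("profiles", []):
        if profile.get("algorithm") not in algorithms:
            problems.append(f"{profile['name']}: unknown algorithm {profile.get('algorithm')!r}")
        # Assumption: multiple compression names are comma-separated.
        names = [n.strip() for n in (profile.get("compression") or "").split(",")]
        for name in filter(None, names):
            if name not in compressions:
                problems.append(f"{profile['name']}: unknown compression {name!r}")
    return problems


# Abbreviated version of the configuration above (names only).
vector_search = {
    "profiles": [
        {"name": "use-bq-with-mrl", "compression": "use-mrl,use-bq", "algorithm": "use-hnsw"}
    ],
    "algorithms": [{"name": "use-hnsw", "kind": "hnsw"}],
    "compressions": [{"name": "use-mrl"}, {"name": "use-bq"}],
}

problems = find_dangling_references(vector_search)
```

An empty list means every name referenced by a profile is defined in the same vectorSearch object.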

Here's an example of a fully specified vector field definition that satisfies the requirements for MRL. Recall that vector fields must:

  • Be of type Collection(Edm.Half) or Collection(Edm.Single).

  • Have a vectorSearchProfile property that specifies the algorithm and compression settings.

  • Have a dimensions property that specifies the number of dimensions for scoring and ranking results. Its value should be the dimensions limit of the model you're using (1,536 for text-embedding-3-small).

{
    "name": "text_vector",
    "type": "Collection(Edm.Single)",
    "searchable": true,
    "filterable": false,
    "retrievable": false,
    "stored": false,
    "sortable": false,
    "facetable": false,
    "key": false,
    "indexAnalyzer": null,
    "searchAnalyzer": null,
    "analyzer": null,
    "normalizer": null,
    "dimensions": 1536,
    "vectorSearchProfile": "use-bq-with-mrl",
    "vectorEncoding": null,
    "synonymMaps": []
}
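
The field requirements listed earlier can also be checked in code before you create or update an index. The following Python sketch is a minimal example. `check_vector_field` is a hypothetical helper, the Collection(...) type names follow the example field definition, and the expected dimensions value is something you supply based on your embedding model.

```python
VECTOR_TYPES = {"Collection(Edm.Half)", "Collection(Edm.Single)"}


def check_vector_field(field, profile_names, model_dimensions):
    """Return messages for requirements the field definition doesn't meet."""
    problems = []
    if field.get("type") not in VECTOR_TYPES:
        problems.append(f"unsupported type: {field.get('type')!r}")
    if field.get("vectorSearchProfile") not in profile_names:
        problems.append(f"unknown profile: {field.get('vectorSearchProfile')!r}")
    if field.get("dimensions") != model_dimensions:
        problems.append(f"dimensions should be {model_dimensions}, got {field.get('dimensions')!r}")
    return problems


# Abbreviated version of the field definition above.
field = {
    "name": "text_vector",
    "type": "Collection(Edm.Single)",
    "stored": False,
    "dimensions": 1536,
    "vectorSearchProfile": "use-bq-with-mrl",
}

field_problems = check_vector_field(field, {"use-bq-with-mrl"}, model_dimensions=1536)
```

An empty result indicates that the field uses a supported vector type, references a known profile, and declares the full dimensions of the model.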