Vector search in Azure AI Search

Vector search is an approach in information retrieval that uses numeric representations of content for search scenarios. Because the content is numeric rather than plain text, the search engine matches on vectors that are the most similar to the query, with no requirement for matching on exact terms.

This article is a high-level introduction to vector support in Azure AI Search. It also explains integration with other Azure services and covers terminology and concepts related to vector search development.

We recommend this article for background, but if you'd rather get started, you can begin with the vector quickstart or the code samples on GitHub.

Vector search is available in the Azure portal and in the Azure SDKs for .NET, Python, and JavaScript.

Vector search is a new capability for indexing, storing, and retrieving vector embeddings from a search index. You can use it to power similarity search, multi-modal search, recommendation engines, or applications implementing the Retrieval Augmented Generation (RAG) architecture.

The following diagram shows the indexing and query workflows for vector search.

Architecture of vector search workflow.

On the indexing side, prepare source documents that contain embeddings. Although integrated vectorization is in public preview, the generally available version of Azure AI Search doesn't generate embeddings. If you're bound to a no-preview feature policy, your solution should include calls to Azure OpenAI or other models that can transform image, audio, text, and other content into vector representations. Add a vector field to your index definition on Azure AI Search. Load the index with a documents payload that includes the vectors. Your index is now ready to query.
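As a minimal sketch of the indexing steps above, the following Python builds a documents payload in which each document carries its text and a precomputed vector. The field names (id, content, contentVector) and the fake_embed helper are hypothetical; in a real solution the helper would be replaced by a call to an embedding model, and the payload would be sent to your search service.

```python
import json

def fake_embed(text, dims=4):
    """Hypothetical stand-in for an embedding model call.

    Returns deterministic toy values so the sketch runs offline.
    These numbers carry no semantic meaning.
    """
    return [round(((len(text) + i) % 7) / 7.0, 3) for i in range(dims)]

source_docs = [
    {"id": "1", "content": "Hotels near the beach"},
    {"id": "2", "content": "Budget motels downtown"},
]

# Assumed REST-style upload body: one action per document,
# with the vector stored alongside the alphanumeric content.
payload = {
    "value": [
        {
            "@search.action": "upload",
            **doc,
            "contentVector": fake_embed(doc["content"]),
        }
        for doc in source_docs
    ]
}

body = json.dumps(payload)
```

The length of each vector must match the dimensions declared for the vector field in the index, which depends on the embedding model you choose.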

On the query side, in your client application, collect the query input. Add a step that converts the input into a vector, and then send the vector query to your index on Azure AI Search for a similarity search. Azure AI Search returns documents with the requested k nearest neighbors (kNN) in the results.
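The query-side steps can be sketched as a request body. This example assumes the request shape of the 2023-11-01 REST API (a vectorQueries array with kind, vector, fields, and k); the field name contentVector and the helper name are hypothetical, and property names should be checked against the API version you target.

```python
import json

def build_vector_query(query_vector, field="contentVector", k=3):
    """Build a search request body for a single-vector kNN query.

    Assumed shape: 2023-11-01 REST API. `field` is a hypothetical
    vector field name.
    """
    return {
        "select": "id, content",
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(query_vector),  # the embedded query input
                "fields": field,               # vector field(s) to search
                "k": k,                        # number of nearest neighbors
            }
        ],
    }

request_body = json.dumps(build_vector_query([0.1, 0.2, 0.3, 0.4]))
```

The query vector must come from the same embedding model that produced the vectors in the index.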

You can index vector data as fields in documents alongside alphanumeric content. Vector queries can be issued singly or in combination with filters and other query types, including term queries (hybrid search) and semantic ranking in the same search request.
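To illustrate combining query types, here is a sketch of a hybrid request that pairs a term query with a vector query and optionally adds a filter and semantic ranking. The property names assume the 2023-11-01 REST API; the field name, filter expression, and semantic configuration name are hypothetical.

```python
def build_hybrid_query(text, query_vector, filter_expr=None,
                       semantic_config=None, k=5):
    """Build a hybrid (text + vector) request body.

    Assumed shape: 2023-11-01 REST API. All names passed in are
    illustrative, not taken from a real index.
    """
    body = {
        "search": text,  # term query over searchable text fields
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(query_vector),
                "fields": "contentVector",  # hypothetical vector field
                "k": k,
            }
        ],
        "top": k,
    }
    if filter_expr:
        body["filter"] = filter_expr            # OData filter expression
        body["vectorFilterMode"] = "preFilter"  # or "postFilter"
    if semantic_config:
        body["queryType"] = "semantic"
        body["semanticConfiguration"] = semantic_config
    return body

hybrid = build_hybrid_query(
    "beach hotel", [0.1, 0.2, 0.3, 0.4], filter_expr="category eq 'hotel'"
)
```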

Availability and pricing

Vector search is available as part of all Azure AI Search tiers in all regions at no extra charge.

Newer services created after July 1, 2023 support higher quotas for vector indexes.

Note

Some older search services created before January 1, 2019 are deployed on infrastructure that doesn't support vector workloads. If you try to add a vector field to a schema and get an error, the cause is likely this outdated infrastructure. In this situation, you must create a new search service to try out the vector feature.

What scenarios can vector search support?

Scenarios for vector search include:

  • Vector search for text. Encode text using embedding models such as OpenAI embeddings or open source models such as SBERT, and retrieve documents with queries that are also encoded as vectors.

  • Vector search across different data types (multi-modal). Encode images, text, audio, and video, or even a mix of them (for example, with models like CLIP) and do a similarity search across them.

  • Multi-lingual search. Use a multi-lingual embedding model to represent documents written in multiple languages in a single vector space, so that queries find documents regardless of the language they are in.

  • Hybrid search. Vector search is implemented at the field level, which means you can build queries that include both vector fields and searchable text fields. The queries execute in parallel and the results are merged into a single response. Optionally, add semantic ranking for even more accuracy with L2 reranking using the same language models that power Bing.

  • Filtered vector search. A query request can include a vector query and a filter expression. Filters apply to text and numeric fields, and are useful for filtering on metadata and for including or excluding search documents based on filter criteria. Although a vector field isn't filterable itself, you can set up a filterable text or numeric field. The search engine can process the filter before or after the vector query executes.

  • Vector database. Use Azure AI Search as a vector store to serve as long-term memory or an external knowledge base for Large Language Models (LLMs), or other applications. For example, you can use Azure AI Search as a vector index in an Azure Machine Learning prompt flow for Retrieval Augmented Generation (RAG) applications.
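The hybrid scenario in the list above says that parallel query results are merged into a single response. Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method for this merge. The following is a toy sketch of the general RRF technique, not the service's implementation; the constant 60 and the document IDs are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, c=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (c + rank) in every list where it
    appears; the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

text_hits = ["a", "b", "c"]    # hypothetical term-query ranking
vector_hits = ["b", "d", "a"]  # hypothetical vector-query ranking
fused = reciprocal_rank_fusion([text_hits, vector_hits])
```

Documents that rank well in both lists ("a" and "b" here) rise above documents that appear in only one.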

You can use other Azure services to provide embeddings and data storage.

Vector search concepts

If you're new to vectors, this section explains some core concepts.

Vector search is a method of information retrieval where documents and queries are represented as vectors instead of plain text. In vector search, machine learning models generate the vector representations of source inputs, which can be text, images, audio, or video content. Having a mathematical representation of content provides a common basis for search scenarios. If everything is a vector, a query can find a match in vector space, even if the associated original content is in a different medium or a different language than the query.

Vectors can overcome the limitations of traditional keyword-based search by using machine learning models to capture the meaning of words and phrases in context, rather than relying solely on lexical analysis and matching of individual query terms. By capturing the intent of the query, vector search can return more relevant results that match the user's needs, even if the exact terms aren't present in the document.

Additionally, vector search can be applied to different types of content, such as images and videos, not just text. This enables new search experiences such as multi-modal search or cross-language search in multi-lingual applications.

Embeddings and vectorization

Embeddings are a specific type of vector representation of content or a query, created by machine learning models that capture the semantic meaning of text or representations of other content such as images. Natural language machine learning models are trained on large amounts of data to identify patterns and relationships between words. During training, they learn to represent any input as a vector of real numbers in an intermediary step called the encoder. After training is complete, these language models can be modified so the intermediary vector representation becomes the model's output. The resulting embeddings are high-dimensional vectors, where words with similar meanings are closer together in the vector space, as explained in Understand embeddings (Azure OpenAI).

The effectiveness of vector search in retrieving relevant information depends on the effectiveness of the embedding model in distilling the meaning of documents and queries into the resulting vector. The best models are well-trained on the types of data they're representing. You can evaluate existing models such as Azure OpenAI text-embedding-ada-002, bring your own model that's trained directly on the problem space, or fine-tune a general-purpose model. Azure AI Search doesn't impose constraints on which model you choose, so pick the best one for your data.

In order to create effective embeddings for vector search, it's important to take input size limitations into account. We recommend following the guidelines for chunking data before generating embeddings. This best practice ensures that the embeddings accurately capture the relevant information and enable more efficient vector search.
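As a rough sketch of chunking, the following splits text into overlapping windows of words. This is one simple strategy among many, and it rests on assumptions: real embedding models limit input by tokens rather than words, and the window and overlap sizes here are arbitrary.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping chunks of at most max_words words.

    Word counts are only a proxy for the token limits that embedding
    models actually enforce.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
```

The overlap keeps some shared context between neighboring chunks, so that a sentence cut at a boundary is still partly represented in both.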

What is the embedding space?

Embedding space is the corpus for vector queries. Within a search index, it's all of the vector fields populated with embeddings from the same embedding model. Machine learning models create the embedding space by mapping individual words, phrases, or documents (for natural language processing), images, or other forms of data into a vector of real numbers that represents a coordinate in a high-dimensional space. In this embedding space, similar items are located close together, and dissimilar items are located farther apart.

For example, documents that talk about different species of dogs would be clustered close together in the embedding space. Documents about cats would be close together, but farther from the dogs cluster while still being in the neighborhood for animals. Dissimilar concepts such as cloud computing would be much farther away. In practice, these embedding spaces are abstract and don't have well-defined, human-interpretable meanings, but the core idea stays the same.
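The dogs, cats, and cloud computing example can be made concrete with hand-made three-dimensional vectors. The numbers below are invented for illustration and don't come from any embedding model; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy "embeddings"; the axes have no defined meaning.
toy = {
    "beagle": [0.90, 0.80, 0.10],
    "poodle": [0.85, 0.90, 0.15],
    "siamese cat": [0.80, 0.20, 0.10],
    "cloud computing": [0.05, 0.10, 0.95],
}

dog_to_dog = cosine_similarity(toy["beagle"], toy["poodle"])
dog_to_cat = cosine_similarity(toy["beagle"], toy["siamese cat"])
dog_to_cloud = cosine_similarity(toy["beagle"], toy["cloud computing"])
```

With these toy values, the two dogs are the most similar pair, the cat is somewhat close to the dog, and cloud computing is far from both.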

In vector search, the search engine searches through the vectors within the embedding space to identify those that are nearest to the query vector. This technique is called nearest neighbor search. Nearest neighbors help quantify the similarity between items. A high degree of vector similarity indicates that the original data was similar too. To facilitate fast nearest neighbor search, the search engine performs optimizations or employs data structures or data partitioning to reduce the search space. Each vector search algorithm approaches this problem differently, trading off characteristics such as latency, throughput, recall, and memory. Similarity metrics provide the mechanism for computing the distance between vectors.
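As a sketch of what a similarity metric computes, the following implements three common metrics: cosine similarity, dot product, and Euclidean distance. The source text doesn't name specific metrics, so treat this selection as an assumption and check which metrics your service version supports. The example also shows that for vectors normalized to unit length, the three metrics are closely related.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    norms = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norms

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot_product(v, v))
    return [x / length for x in v]

query = normalize([1.0, 2.0, 0.5])
doc = normalize([0.5, 2.0, 1.0])

# For unit vectors: cosine equals the dot product, and the squared
# Euclidean distance equals 2 - 2 * cosine.
cos_qd = cosine_similarity(query, doc)
dot_qd = dot_product(query, doc)
dist_sq = euclidean_distance(query, doc) ** 2
```

Because of this relationship, all three metrics produce the same ranking on normalized vectors; they can differ when vector magnitudes vary.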

Azure AI Search currently supports the following algorithms:

  • Hierarchical Navigable Small World (HNSW): HNSW is a leading ANN algorithm optimized for high-recall, low-latency applications where data distribution is unknown or can change frequently. It organizes high-dimensional data points into a hierarchical graph structure that enables fast and scalable similarity search while allowing a tunable trade-off between search accuracy and computational cost. Because the algorithm requires all data points to reside in memory for fast random access, this algorithm consumes vector index size quota.

  • Exhaustive K-nearest neighbors (KNN): Calculates the distances between the query vector and all data points. It's computationally intensive, so it works best for smaller datasets. Because the algorithm doesn't require fast random access of data points, this algorithm doesn't consume vector index size quota. Because it compares the query against every data point, it returns the global set of nearest neighbors.
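The exhaustive approach in the list above is simple enough to sketch in a few lines: compute the distance from the query to every stored vector and keep the k smallest. This illustrates the general technique under the assumption of Euclidean distance; it isn't the service's implementation, and the data is invented.

```python
import heapq
import math

def exhaustive_knn(query, vectors, k):
    """Return the IDs of the k vectors closest to the query.

    Scans every vector, so the cost grows linearly with the number
    of data points, but the result is the exact set of neighbors.
    """
    def distance(v):
        return math.sqrt(sum((q - x) ** 2 for q, x in zip(query, v)))

    nearest = heapq.nsmallest(
        k, vectors.items(), key=lambda item: distance(item[1])
    )
    return [doc_id for doc_id, _ in nearest]

points = {
    "a": [1.0, 0.0],
    "b": [0.0, 3.0],
    "c": [0.5, 0.5],
    "d": [5.0, 5.0],
}
neighbors = exhaustive_knn([0.0, 0.0], points, k=2)
```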

Within an index definition, you can specify one or more algorithms, and then specify for each vector field which algorithm to use.
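A sketch of such an index definition follows, written as a Python dictionary that mirrors the JSON body. The structure assumes the 2023-11-01 REST API, in which fields reference a named profile and each profile references a named algorithm configuration. All names, the parameter values, and the 1536 dimensions (the output size of text-embedding-ada-002) are illustrative. The helper function is hypothetical and only checks that the references resolve.

```python
index_definition = {
    "name": "hotels-vector-sketch",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,
            "vectorSearchProfile": "hnsw-profile",
        },
        {
            "name": "titleVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,
            "vectorSearchProfile": "eknn-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "my-hnsw",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine",
                },
            },
            {
                "name": "my-eknn",
                "kind": "exhaustiveKnn",
                "exhaustiveKnnParameters": {"metric": "cosine"},
            },
        ],
        "profiles": [
            {"name": "hnsw-profile", "algorithm": "my-hnsw"},
            {"name": "eknn-profile", "algorithm": "my-eknn"},
        ],
    },
}

def unresolved_vector_fields(definition):
    """List vector fields whose profile doesn't map to a defined algorithm."""
    vector_search = definition["vectorSearch"]
    algorithms = {a["name"] for a in vector_search["algorithms"]}
    profiles = {p["name"]: p["algorithm"] for p in vector_search["profiles"]}
    return [
        f["name"]
        for f in definition["fields"]
        if "vectorSearchProfile" in f
        and profiles.get(f["vectorSearchProfile"]) not in algorithms
    ]

problems = unresolved_vector_fields(index_definition)
```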

Algorithm parameters that are used to initialize the index during index creation are immutable and can't be changed after the index is built. However, parameters that affect the query-time characteristics (efSearch) can be modified.

In addition, fields that specify the HNSW algorithm also support exhaustive KNN search using the query request parameter "exhaustive": true. The opposite isn't true, however. If a field is indexed for exhaustiveKnn, you can't use HNSW in the query because the additional data structures that enable efficient search don't exist.
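For illustration, a vector query that requests exhaustive search over an HNSW field might look like the following. Apart from the "exhaustive" parameter named above, the request shape assumes the 2023-11-01 REST API, and the field name is hypothetical.

```python
def build_exhaustive_query(query_vector, field="contentVector", k=5):
    """Vector query that bypasses the HNSW graph and scans all vectors.

    Assumed shape: 2023-11-01 REST API. `field` is a hypothetical
    vector field that was indexed with HNSW.
    """
    return {
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(query_vector),
                "fields": field,
                "k": k,
                "exhaustive": True,  # serialized as JSON true
            }
        ]
    }

exhaustive_body = build_exhaustive_query([0.1, 0.2, 0.3, 0.4])
```

One possible use is to produce exact results as a baseline when evaluating how well the approximate results match them.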

Approximate Nearest Neighbors

Approximate Nearest Neighbor search (ANN) is a class of algorithms for finding matches in vector space. This class of algorithms employs different data structures or data partitioning methods to significantly reduce the search space to accelerate query processing.

ANN algorithms sacrifice some accuracy, but offer scalable and faster retrieval of approximate nearest neighbors, which makes them ideal for balancing accuracy against efficiency in modern information retrieval applications. You can adjust the parameters of your algorithm to fine-tune the recall, latency, memory, and disk footprint requirements of your search application.

Azure AI Search uses HNSW for its ANN algorithm.
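The accuracy that ANN algorithms trade away is usually measured as recall: the fraction of the true nearest neighbors that the approximate search returned. A minimal sketch, with invented result lists, could look like this:

```python
def recall_at_k(approximate_ids, exact_ids):
    """Fraction of the exact nearest neighbors found by the ANN search."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approximate_ids)) / len(exact)

# Hypothetical results for the same query and the same k.
exact_results = ["a", "b", "c"]   # from an exhaustive search
approx_results = ["a", "b", "x"]  # from an approximate search
recall = recall_at_k(approx_results, exact_results)
```

Averaging this value over a representative set of queries gives a sense of how parameter changes affect result quality.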

Next steps