Create a vector query in Azure AI Search
If you added vector fields to a search index in Azure AI Search, this article explains how to query those fields, filter a vector query, and query multiple vector fields at once.
Code samples in the azure-search-vector repository demonstrate end-to-end workflows that include schema definition, vectorization, indexing, and queries.
Prerequisites
Azure AI Search, in any region and on any tier. Most existing services support vector search. A small subset of services created before January 2019 don't support it. If creating or updating an index that contains vector fields fails, that failure is the indicator, and you must create a new service.
A search index containing vector fields. See Add vector fields to a search index.
Use REST API version 2023-11-01 if you want the stable version. Otherwise, you can continue to use 2023-10-01-Preview, 2023-07-01-Preview, the Azure SDK libraries, or Search Explorer in the Azure portal.
Tips
The stable version (2023-11-01) doesn't provide built-in vectorization of the query input string. Encoding (text-to-vector) of the query string requires that you pass the query string to an external embedding model for vectorization. You would then pass the response to the search engine for similarity search over vector fields.
The preview version (2023-10-01-Preview) adds integrated vectorization. Create and assign a vectorizer to get built-in embedding of query strings. Update your query to provide a text string to the vectorizer.
All results are returned in plain text, including vectors in fields marked as "retrievable". Because numeric vectors aren't useful in search results, choose other fields in the index as a proxy for the vector match. For example, if an index has "descriptionVector" and "descriptionText" fields, the query can match on "descriptionVector" but the search result can show "descriptionText". Use the "select" parameter to specify only human-readable fields in the results.
Check your index for vector fields
If you aren't sure whether your search index already has vector fields, look for:
- A non-empty "vectorSearch" property containing algorithms and other vector-related configurations embedded in the index schema.
- In the fields collection, look for fields of type "Collection(Edm.Single)" with a "dimensions" attribute, and a "vectorSearch" section in the index.
You can also send an empty query ("search=*") against the index. If the vector field is "retrievable", the response includes a vector field consisting of an array of floating point values.
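As a rough sketch of the schema check described above, the following Python function scans an index definition that has already been retrieved and parsed into a dictionary. The function name and the trimmed sample schema are illustrative assumptions and aren't part of any SDK.

```python
def find_vector_fields(index_definition: dict) -> list:
    """Return the names of fields that look like vector fields."""
    names = []
    for field in index_definition.get("fields", []):
        is_float_collection = field.get("type") == "Collection(Edm.Single)"
        has_dimensions = field.get("dimensions") is not None
        if is_float_collection and has_dimensions:
            names.append(field["name"])
    return names


# Hypothetical, heavily trimmed index definition used only for illustration.
sample_index = {
    "name": "my-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "descriptionText", "type": "Edm.String"},
        {"name": "descriptionVector", "type": "Collection(Edm.Single)", "dimensions": 1536},
    ],
    "vectorSearch": {"algorithms": [{"name": "my-hnsw", "kind": "hnsw"}]},
}

vector_fields = find_vector_fields(sample_index)
# A non-empty "vectorSearch" property is the other signal to look for.
has_vector_config = bool(sample_index.get("vectorSearch"))
print(vector_fields, has_vector_config)
```

An empty list together with a missing "vectorSearch" property suggests the index has no vector fields.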
Convert query input into a vector
This section applies to the generally available version of vector search (2023-11-01).
To query a vector field, the query itself must be a vector. To convert a text query string provided by a user into a vector representation, your application must call an embedding library or API endpoint that provides this capability. Use the same embedding model that you used to generate embeddings in the source documents.
You can find multiple instances of query string conversion in the azure-search-vector repository for each of the Azure SDKs.
Here's a REST API example of a query string submitted to a deployment of an Azure OpenAI model:
POST https://{{openai-service-name}}.openai.azure.com/openai/deployments/{{openai-deployment-name}}/embeddings?api-version={{openai-api-version}}
Content-Type: application/json
api-key: {{admin-api-key}}
{
"input": "what azure services support full text search"
}
The expected response is 200 for a successful call to the deployed model. The "embedding" field in the body of the response is the vector representation of the query string "input". For testing purposes, you would copy the value of the "embedding" array into "vectorQueries.vector" in a query request, using syntax shown in the next several sections.
The actual response for this POST call to the deployed model contains an embedding of 1536 values, trimmed here to just the first few for readability.
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
-0.009171937,
0.018715322,
...
-0.0016804502
]
}
],
"model": "ada",
"usage": {
"prompt_tokens": 7,
"total_tokens": 7
}
}
Your application code is responsible for handling this response and providing the embedding in the query request.
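A minimal sketch of that handling, assuming the response body has the shape shown above and that a single input string was sent. The function name is hypothetical, and the sample body is shortened to three values.

```python
import json


def extract_embedding(response_body: str) -> list:
    """Pull the embedding array out of an embeddings response body."""
    payload = json.loads(response_body)
    # "data" holds one item per input string; this sketch assumes one input.
    return payload["data"][0]["embedding"]


# Shortened copy of the response shown above (three values instead of 1536).
sample_response = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0,
     "embedding": [-0.009171937, 0.018715322, -0.0016804502]}
  ],
  "model": "ada",
  "usage": {"prompt_tokens": 7, "total_tokens": 7}
}
"""

query_vector = extract_embedding(sample_response)
print(len(query_vector))
```

The resulting list is what you would place in the "vector" property of a vector query.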
Vector query request
This section shows you the basic structure of a vector query. You can use the Azure portal, REST APIs, or the Azure SDKs to query vectors.
REST API version 2023-11-01 is the stable API version for Search POST. This API supports:
- "vectorQueries" is the construct for vector search.
- "kind" set to "vector" specifies that the query input is a vector.
- "vector" is the query vector (the vector representation of the query string).
- "exhaustive" (optional) invokes exhaustive KNN at query time, even if the field is indexed for HNSW.
In the following example, the vector is a representation of this query string: "what Azure services support full text search". The query targets the "contentVector" field. The query returns "k" results. The actual vector has 1536 dimensions, so it's trimmed in this example for readability.
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2023-11-01
Content-Type: application/json
api-key: {{admin-api-key}}
{
"count": true,
"select": "title, content, category",
"vectorQueries": [
{
"kind": "vector",
"vector": [
-0.009154141,
0.018708462,
. . .
-0.02178128,
-0.00086512347
],
"exhaustive": true,
"fields": "contentVector",
"k": 5
}
]
}
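If you assemble the request body in application code, a small helper along these lines can keep the structure consistent. This is only a sketch: the function name and its defaults are assumptions, and it builds the JSON body without sending any request.

```python
import json


def build_vector_query(vector, fields, k=5, exhaustive=False, select=None):
    """Build a request body for a single vector query (2023-11-01 shape)."""
    body = {
        "count": True,
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(vector),
                "fields": fields,
                "k": k,
                "exhaustive": exhaustive,
            }
        ],
    }
    if select:
        # Return only human-readable fields instead of every retrievable field.
        body["select"] = select
    return body


# Two values stand in for the full query vector.
body = build_vector_query(
    [-0.009154141, 0.018708462],
    fields="contentVector",
    k=5,
    exhaustive=True,
    select="title, content, category",
)
print(json.dumps(body, indent=2))
```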
Vector query response
Here's a modified example so that you can see the basic structure of a response from a pure vector query. The previous query examples selected title, content, category as a best practice. This example shows a contentVector field in the response to illustrate that retrievable vector fields can be included.
{
"@odata.count": 3,
"value": [
{
"@search.score": 0.80025613,
"title": "Azure Search",
"category": "AI + Machine Learning",
"contentVector": [
-0.0018343845,
0.017952163,
0.0025753193,
...
]
},
{
"@search.score": 0.78856903,
"title": "Azure Application Insights",
"category": "Management + Governance",
"contentVector": [
-0.016821077,
0.0037742127,
0.016136652,
...
]
},
{
"@search.score": 0.78650564,
"title": "Azure Media Services",
"category": "Media",
"contentVector": [
-0.025449317,
0.0038463024,
-0.02488436,
...
]
}
]
}
Key points:
- "k" determines how many nearest neighbor results are returned. In the example above, a "k" value of three was used. Vector queries always return "k" results, assuming at least "k" documents exist, even if there are documents with poor similarity, because the algorithm is only identifying the "k" nearest neighbors to the query vector. As a result, note that both count and facet aggregations (facet counts) operate on this "k" recall set.
- The "@search.score" is determined by the vector search algorithm (HNSW algorithm and a "cosine" similarity metric in this example).
- Fields include text and vector values. The content vector field consists of 1536 dimensions for each match, so it's truncated for brevity (normally, you might exclude vector fields from results). The text fields used in the response ("select": "title, category") aren't used during query execution. The match is made on vector data alone. However, a response can include any "retrievable" field in an index. As such, the inclusion of text fields is helpful because its values are easily recognized by users.
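When a response does include vector fields, client code can drop them before display. The sketch below assumes a response that was already parsed into a dictionary with the shape shown above. The function name and the shortened sample data are illustrative.

```python
def summarize_results(response: dict, vector_fields=("contentVector",)) -> list:
    """Return each matching document without its vector fields."""
    rows = []
    for doc in response.get("value", []):
        rows.append({key: value for key, value in doc.items() if key not in vector_fields})
    return rows


# Shortened copy of the response above; each vector is cut to two values.
sample = {
    "@odata.count": 3,
    "value": [
        {"@search.score": 0.80025613, "title": "Azure Search",
         "category": "AI + Machine Learning", "contentVector": [-0.0018343845, 0.017952163]},
        {"@search.score": 0.78856903, "title": "Azure Application Insights",
         "category": "Management + Governance", "contentVector": [-0.016821077, 0.0037742127]},
        {"@search.score": 0.78650564, "title": "Azure Media Services",
         "category": "Media", "contentVector": [-0.025449317, 0.0038463024]},
    ],
}

rows = summarize_results(sample)
for row in rows:
    print(row["@search.score"], row["title"])
```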
Vector query with filter
A query request can include a vector query and a filter expression. Filters apply to "filterable" text and numeric fields, and are useful for including or excluding search documents based on filter criteria. Although a vector field isn't filterable itself, a query can specify filters on other fields in the same index.
In newer API versions, you can set a filter mode to apply filters before or after vector query execution. For a comparison of each mode and the expected performance based on index size, see Filters in vector queries.
Tip
If you don't have source fields with text or numeric values, check for document metadata, such as LastModified or CreatedBy properties, that might be useful in a metadata filter.
REST API version 2023-11-01 is the stable version for this API. It has:
- "vectorFilterMode" for prefilter (default) or postfilter filtering modes.
- "filter" provides the criteria.
In the following example, the vector is a representation of this query string: "what Azure services support full text search". The query targets the "contentVector" field. The actual vector has 1536 dimensions, so it's trimmed in this example for readability.
The filter criteria are applied to a filterable text field ("category" in this example) before the search engine executes the vector query.
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2023-11-01
Content-Type: application/json
api-key: {{admin-api-key}}
{
"count": true,
"select": "title, content, category",
"filter": "category eq 'Databases'",
"vectorFilterMode": "preFilter",
"vectorQueries": [
{
"kind": "vector",
"vector": [
-0.009154141,
0.018708462,
. . .
-0.02178128,
-0.00086512347
],
"exhaustive": true,
"fields": "contentVector",
"k": 5
}
]
}
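To see why the filter mode matters, the toy simulation below applies a filter before and after picking the top "k" matches. It's a conceptual illustration over made-up documents with precomputed similarity scores, not how the search engine is implemented, and all names in it are hypothetical.

```python
def top_k(docs, k):
    """Return the k documents with the highest similarity score."""
    return sorted(docs, key=lambda d: d["score"], reverse=True)[:k]


def prefilter(docs, predicate, k):
    # Filter first, then take the k nearest among the remaining documents.
    return top_k([d for d in docs if predicate(d)], k)


def postfilter(docs, predicate, k):
    # Take the k nearest first, then discard the ones that fail the filter.
    return [d for d in top_k(docs, k) if predicate(d)]


# Made-up documents; "score" stands in for similarity to the query vector.
docs = [
    {"title": "A", "category": "Databases", "score": 0.70},
    {"title": "B", "category": "Media", "score": 0.90},
    {"title": "C", "category": "Databases", "score": 0.60},
    {"title": "D", "category": "Media", "score": 0.85},
]


def is_database(doc):
    return doc["category"] == "Databases"


pre = [d["title"] for d in prefilter(docs, is_database, k=3)]
post = [d["title"] for d in postfilter(docs, is_database, k=3)]
print(pre, post)
```

In this toy data, prefiltering keeps both "Databases" documents, while postfiltering keeps only the one that was already among the three nearest matches.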
Multiple vector fields
You can set the "vectorQueries.fields" property to multiple vector fields. For example, the Postman collection has vector fields named "titleVector" and "contentVector". A single vector query executes over both the "titleVector" and "contentVector" fields, which must have the same embedding space since they share the same query vector.
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2023-11-01
Content-Type: application/json
api-key: {{admin-api-key}}
{
"count": true,
"select": "title, content, category",
"vectorQueries": [
{
"kind": "vector",
"vector": [
-0.009154141,
0.018708462,
. . .
-0.02178128,
-0.00086512347
],
"exhaustive": true,
"fields": "contentVector, titleVector",
"k": 5
}
]
}
Multiple vector queries
Multi-query vector search sends multiple queries across multiple vector fields in your search index. A common example of this query request is when using models such as CLIP for a multi-modal vector search where the same model can vectorize image and non-image content.
The following query example looks for similarity in both "myImageVector" and "myTextVector", but sends in two different query embeddings, one for each field. This scenario is ideal for multi-modal use cases where you want to search over different embedding spaces. This query produces a result that's scored using Reciprocal Rank Fusion (RRF).
- "vectorQueries" provides an array of vector queries.
- "vector" contains the query vector, an image vector in one query and a text vector in the other. Each instance is a separate query.
- "fields" specifies which vector field to target.
- "k" is the number of nearest neighbor matches to include in results.
{
"count": true,
"select": "title, content, category",
"vectorQueries": [
{
"kind": "vector",
"vector": [
-0.009154141,
0.018708462,
. . .
-0.02178128,
-0.00086512347
],
"fields": "myImageVector",
"k": 5
},
{
"kind": "vector",
"vector": [
-0.002222222,
0.018708462,
-0.013770515,
. . .
],
"fields": "myTextVector",
"k": 5
}
]
}
Search results would include a combination of text and images, assuming your search index includes a field for the image file (a search index doesn't store images).
Query with integrated vectorization (preview)
This section shows a vector query that invokes the new integrated vectorization preview feature. Use 2023-10-01-Preview REST API or an updated beta Azure SDK package.
A prerequisite is a search index having a vectorizer configured and assigned to a vector field. The vectorizer provides connection information to an embedding model used at query time.
Queries provide text strings instead of vectors:
- "kind" must be set to "text".
- "text" must have a text string. It's passed to the vectorizer assigned to the vector field.
- "fields" is the vector field to search.
Here's a simple example of a query that's vectorized at query time. The text string is vectorized and then used to query the descriptionVector field.
POST https://{{search-service}}.search.windows.net/indexes/{{index}}/docs/search?api-version=2023-10-01-preview
{
"select": "title, genre, description",
"vectorQueries": [
{
"kind": "text",
"text": "mystery novel set in London",
"fields": "descriptionVector",
"k": 5
}
]
}
Here's a hybrid query, with multiple vector fields and queries and semantic ranking. Again, the differences are the "kind" of vector query and the "text" string instead of a vector.
In this example, the search engine makes three vectorization calls to the vectorizers assigned to descriptionVector, synopsisVector, and authorBioVector in the index. The resulting vectors are used to retrieve documents against their respective fields. The search engine also executes the "search" query.
POST https://{{search-service}}.search.windows.net/indexes/{{index}}/docs/search?api-version=2023-10-01-preview
Content-Type: application/json
api-key: {{admin-api-key}}
{
"search":"mystery novel set in London",
"searchFields":"description, synopsis",
"semanticConfiguration":"my-semantic-config",
"queryType":"semantic",
"select": "title, author, synopsis",
"filter": "genre eq 'mystery'",
"vectorFilterMode": "postFilter",
"vectorQueries": [
{
"kind": "text",
"text": "mystery novel set in London",
"fields": "descriptionVector, synopsisVector",
"k": 5
},
{
"kind": "text",
"text": "living english author",
"fields": "authorBioVector",
"k": 5
}
]
}
The scored results from all four queries are fused using RRF ranking. Secondary semantic ranking is invoked over the fused search results, but on the "searchFields" only, boosting results that are the most semantically aligned to "search":"mystery novel set in London".
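The count of four follows from one result set per targeted vector field plus one for the "search" text. As a sketch of that arithmetic, the hypothetical helper below tallies result sets from a request body, following the description in this article rather than any documented API.

```python
def count_result_sets(request_body: dict) -> dict:
    """Count the ranked result sets implied by a hybrid request body."""
    vector_sets = 0
    for query in request_body.get("vectorQueries", []):
        # "fields" is a comma-separated list; each field gets its own query.
        field_names = [name.strip() for name in query["fields"].split(",") if name.strip()]
        vector_sets += len(field_names)
    text_sets = 1 if request_body.get("search") else 0
    return {"vector": vector_sets, "text": text_sets, "total": vector_sets + text_sets}


# Trimmed copy of the hybrid request above.
hybrid_body = {
    "search": "mystery novel set in London",
    "vectorQueries": [
        {"kind": "text", "text": "mystery novel set in London",
         "fields": "descriptionVector, synopsisVector", "k": 5},
        {"kind": "text", "text": "living english author",
         "fields": "authorBioVector", "k": 5},
    ],
}

counts = count_result_sets(hybrid_body)
print(counts)
```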
Note
Vectorizers are used during indexing and querying. If you don't need data chunking and vectorization in the index, you can skip steps like creating an indexer, skillset, and data source. In this scenario, the vectorizer is used only at query time to convert a text string to an embedding.
Configure a query response
When you're setting up the vector query, think about the response structure. The response is a flattened rowset. Parameters on the query determine which fields are in each row and how many rows are in the response. The search engine ranks the matching documents and returns the most relevant results.
Fields in a response
Search results are composed of "retrievable" fields from your search index. A result is either:
- All "retrievable" fields (a REST API default).
- Fields explicitly listed in a "select" parameter on the query.
The examples in this article used a "select" statement to specify text (non-vector) fields in the response.
Note
Vectors aren't designed for readability, so avoid returning them in the response. Instead, choose non-vector fields that are representative of the search document. For example, if the query targets a "descriptionVector" field, return an equivalent text field if you have one ("description") in the response.
Number of results
A query might match to any number of documents, as many as all of them if the search criteria are weak (for example "search=*" for a null query). Because it's seldom practical to return unbounded results, you should specify a maximum for the response:
- "k": n results for vector-only queries
- "top": n results for hybrid queries that include a "search" parameter
Both "k" and "top" are optional. Unspecified, the default number of results in a response is 50. You can set "top" and "skip" to page through more results or change the default.
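Paging with "top" and "skip" comes down to simple arithmetic. The helper below is a sketch with a hypothetical name. It assumes 1-based page numbers and uses the default of 50 as the page size.

```python
def page_parameters(page_number: int, page_size: int = 50) -> dict:
    """Return the "top" and "skip" values for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    return {"top": page_size, "skip": (page_number - 1) * page_size}


first_page = page_parameters(1)
third_page_of_ten = page_parameters(3, page_size=10)
print(first_page, third_page_of_ten)
```

You would merge the returned values into the query request body alongside the other parameters.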
Ranking
Ranking of results is computed by either:
- The similarity metric specified in the index "vectorSearch" section for a vector-only query. Valid values are "cosine", "euclidean", and "dotProduct".
- Reciprocal Rank Fusion (RRF) if there are multiple sets of search results.
Azure OpenAI embedding models use cosine similarity, so if you're using Azure OpenAI embedding models, "cosine" is the recommended metric. Other supported ranking metrics include "euclidean" and "dotProduct".
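For reference, the three metrics can be written out in a few lines of plain Python. These are the textbook definitions, shown on tiny made-up vectors. How the service converts a metric into "@search.score" isn't covered here.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up two-dimensional vectors; real embeddings have many more dimensions.
query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]

print(cosine_similarity(query, same_direction), cosine_similarity(query, orthogonal))
print(dot_product(query, same_direction), euclidean_distance(query, same_direction))
```

Cosine similarity ignores vector length, which is why the two vectors pointing in the same direction score as a perfect match even though their magnitudes differ.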
Multiple sets are created if the query targets multiple vector fields, or if the query is a hybrid of vector and full text search, with or without semantic ranking. Within vector search, a vector query can only target one internal vector index. So for multiple vector fields and multiple vector queries, the search engine generates multiple queries that target the respective vector indexes of each field. Output is a set of ranked results for each query, which are fused using RRF. For more information, see Vector query execution and scoring.
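The fusion step can be sketched with the standard RRF formula, in which each document accumulates 1 / (c + rank) from every result set that contains it. The constant c = 60 and the document IDs below are assumptions for illustration. See Vector query execution and scoring for the service's actual behavior.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, c=60):
    """Fuse ranked lists of document IDs into one list ordered by RRF score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher positions (smaller rank) contribute larger scores.
            scores[doc_id] += 1.0 / (c + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ranked results from two vector fields for the same query.
title_hits = ["doc1", "doc2", "doc3"]
content_hits = ["doc2", "doc3", "doc4"]

fused = reciprocal_rank_fusion([title_hits, content_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Documents that appear in both lists rise to the top, even when neither list ranked them first.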
Next steps
As a next step, we recommend reviewing the demo code for Python, C# or JavaScript.