Generate embeddings for search queries and documents

2025-06-11

Azure AI Search doesn't host embedding models, so one of your challenges is creating vectors for query inputs and outputs. You can use any supported embedding model, but this article assumes Azure OpenAI embedding models for illustration.

We recommend integrated vectorization, which provides built-in data chunking and vectorization. Integrated vectorization takes a dependency on indexers, skillsets, and built-in or custom skills that point to a model that executes externally from Azure AI Search. Several built-in skills point to embedding models in Azure AI Foundry, which makes integrated vectorization your easiest solution for solving the embedding challenge.

If you want to handle data chunking and vectorization yourself, we provide demos in the sample repository that show you how to integrate with other community solutions.

How embedding models are used in vector queries

Query inputs are either vectors, or text or images that are converted to vectors during query processing. The built-in solution in Azure AI Search is to use a vectorizer.

Alternatively, you can handle the conversion yourself by passing the query input to an embedding model of your choice. To handle rate limiting by the embedding service, implement retry logic in your workload. For the Python demo, we used tenacity.

Query outputs are any matching documents found in a search index. Your search index must have been previously loaded with documents having one or more vector fields with embeddings. Whatever embedding model you used for indexing, use that same model for queries so that query vectors and document vectors share the same embedding space.
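The retry logic mentioned above can be sketched with the Python standard library alone. This is a minimal illustration, not the tenacity-based code from the Python demo: RateLimitError, with_retries, and fake_embed are hypothetical names, and in a real workload you would catch whatever rate-limit exception your embedding client raises.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit error your embedding client raises."""


def with_retries(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError, wait with exponential backoff and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on each attempt (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))


# Usage with a fake embedding call that is throttled twice before succeeding.
calls = {"count": 0}


def fake_embed():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return [0.1, 0.2, 0.3]


# Skip real waiting in this demo by passing a no-op sleep function.
vector = with_retries(fake_embed, sleep=lambda seconds: None)
```

The jitter spreads out retries from concurrent workers so that they don't all hit the service again at the same moment.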

Create resources in the same region

Although integrated vectorization with Azure OpenAI embedding models doesn't require resources to be in the same region, using the same region can improve performance and reduce latency.

Check regions for a text embedding model.
Find the same region for Azure AI Search.
To support hybrid queries that include semantic ranking, or if you want to try machine learning model integration using a custom skill in an AI enrichment pipeline, select an Azure AI Search region that provides those features.

Generate an embedding for an improvised query

The following Python code generates an embedding that you can paste into the "values" property of a vector query.

!pip install openai

from openai import AzureOpenAI

# Requires version 1.0 or later of the openai package.
# Replace the placeholder values with your own key and endpoint.
client = AzureOpenAI(
    api_key="YOUR-API-KEY",
    azure_endpoint="https://YOUR-OPENAI-RESOURCE.openai.azure.com",
    api_version="2024-02-01",
)

response = client.embeddings.create(
    input="How do I use Python in VS Code?",
    model="text-embedding-ada-002",  # name of your embedding model deployment
)
embeddings = response.data[0].embedding
print(embeddings)

Output is a vector array of 1,536 dimensions.
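Before pasting the vector into a query, it can help to confirm its length and serialize it as a compact JSON array. The following is a minimal sketch: format_vector is a hypothetical helper, and the sample list is a placeholder standing in for the embeddings value returned by the model.

```python
import json

EXPECTED_DIMENSIONS = 1536  # output size of text-embedding-ada-002, as stated above


def format_vector(embedding, expected_dimensions=EXPECTED_DIMENSIONS):
    """Validate the vector length and return it as a compact JSON array string."""
    if len(embedding) != expected_dimensions:
        raise ValueError(
            f"Expected {expected_dimensions} dimensions, got {len(embedding)}"
        )
    return json.dumps([float(x) for x in embedding], separators=(",", ":"))


# Placeholder vector standing in for the embeddings list returned by the model.
sample = [0.0] * EXPECTED_DIMENSIONS
vector_json = format_vector(sample)
```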

Choose an embedding model in Azure AI Foundry

In the Azure AI Foundry portal, you have the option of creating a search index when you add knowledge to your agent workflow. A wizard guides you through the steps. When asked to provide an embedding model that vectorizes your plain text content, you can use one of the following supported models:

text-embedding-3-large
text-embedding-3-small
text-embedding-ada-002
Cohere-embed-v3-english
Cohere-embed-v3-multilingual

Your model must already be deployed and you must have permission to access it. For more information, see Deploy AI models in Azure AI Foundry portal.
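The model you choose determines the size of the vectors it produces, and the dimensions of a vector field in the index need to match that size. The following sketch shows one way to check for a mismatch. The dimension values are the commonly documented defaults for these models and are stated here as assumptions (the text-embedding-3 models can also be asked for shorter vectors), so confirm them against each model's documentation. check_field_dimensions is a hypothetical helper.

```python
# Assumed default output dimensions; confirm against each model's documentation.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
    "Cohere-embed-v3-english": 1024,
    "Cohere-embed-v3-multilingual": 1024,
}


def check_field_dimensions(model_name, field_dimensions):
    """Return True if a vector field's size matches the model's default output size."""
    if model_name not in DEFAULT_DIMENSIONS:
        raise KeyError(f"Unknown model: {model_name}")
    return DEFAULT_DIMENSIONS[model_name] == field_dimensions


matches = check_field_dimensions("text-embedding-ada-002", 1536)
mismatch = check_field_dimensions("text-embedding-3-large", 1536)
```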

Tips and recommendations for embedding model integration

Identify use cases: Evaluate the specific use cases where embedding model integration for vector search features can add value to your search solution. This can include multimodal or matching image content with text content, multilingual search, or similarity search.
Design a chunking strategy: Embedding models have limits on the number of tokens they can accept, which introduces a data chunking requirement for large files. For more information, see Chunk large documents for vector search solutions.
Optimize cost and performance: Vector search can be resource-intensive and is subject to maximum limits, so consider only vectorizing the fields that contain semantic meaning. Reduce vector size so that you can store more vectors for the same price.
Choose the right embedding model: Select an appropriate model for your specific use case, such as word embeddings for text-based searches or image embeddings for visual searches. Consider using pretrained models like text-embedding-ada-002 from OpenAI or Image Retrieval REST API from Azure AI Computer Vision.
Normalize vector lengths: Ensure that vectors are normalized to unit length before storing them in the search index to improve the accuracy and performance of similarity search. Most pretrained models already produce normalized vectors, but not all do.
Fine-tune the model: If needed, fine-tune the selected model on your domain-specific data to improve its performance and relevance to your search application.
Test and iterate: Continuously test and refine your embedding model integration to achieve the desired search performance and user satisfaction.
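Two of the tips above, chunking and normalization, can be sketched in a few lines of standard-library Python. This is an illustration under simplifying assumptions: chunk_words and l2_normalize are hypothetical helpers, and word counts are used as a rough proxy for tokens, whereas a real pipeline would count tokens with the model's own tokenizer.

```python
import math


def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping chunks, using words as a rough proxy for tokens."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


chunks = chunk_words("one two three four five six seven", max_words=4, overlap=1)
unit = l2_normalize([3.0, 4.0])
```

The overlap repeats a few words at each chunk boundary so that a sentence split across two chunks still appears whole in at least one of them.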
