Tutorial: Choose embedding and chat models for RAG in Azure AI Search

A RAG solution built on Azure AI Search takes a dependency on embedding models for vectorization, and on chat models for conversational search over your data.

In this tutorial, you:

  • Learn which models in the Azure cloud work with built-in integration
  • Learn about the Azure models used for chat
  • Deploy models and collect model information for your code
  • Configure search engine access to Azure models
  • Learn about custom skills and vectorizers for attaching non-Azure models

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

  • The Azure portal, used to deploy models and configure role assignments in the Azure cloud.

  • An Owner role on your Azure subscription, necessary for creating role assignments. Your model provider has more role requirements for deploying and accessing models. Those are noted in the following steps.

  • A model provider, such as Azure OpenAI, Azure AI Vision via an Azure AI multi-service account, or Azure AI Studio.

    We use Azure OpenAI in this tutorial. Other providers are listed so that you know your options for integrated vectorization.

  • Azure AI Search, Basic tier or higher, which provides a managed identity used in role assignments.

  • A shared region. To complete all of the tutorials in this series, the region must support both Azure AI Search and the model provider. See the supported regions for each product.

    Azure AI Search currently has limited availability in some regions, such as West Europe and West US 2/3. Check the Azure AI Search region list to confirm region status.

Tip

Currently, the following regions provide the most overlap among the model providers and have the most capacity: East US, East US 2, and South Central US in the Americas; France Central or Switzerland North in Europe; and Australia East in Asia Pacific.

For Azure AI Vision and AI Search interoperability, choose one of these regions: East US, France Central, Korea Central, North Europe, South East Asia, or West US.

Review models supporting built-in vectorization

Vectorized content improves the query results in a RAG solution. Azure AI Search supports an embedding action in an indexing pipeline. It also supports an embedding action at query time, converting text or image inputs into vectors for a vector search. In this step, identify an embedding model that works for your content and queries. If you're providing raw vector data and raw vector queries, or if your RAG solution doesn't include vector data, skip this step.

Vector queries that include a text-to-vector conversion step must use the same embedding model that was used during indexing. The search engine won't throw an error if you use different models, but you'll get poor results.

To meet the same-model requirement, choose embedding models that can be referenced through skills during indexing and through vectorizers during query execution. Review Create an indexing pipeline for code that calls an embedding skill and a matching vectorizer.
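As a sketch of the skill-and-vectorizer pairing, the fragments below define an AzureOpenAIEmbedding skill for indexing and a matching vectorizer for queries, both pointing at the same deployment. The resource URI, deployment name, and field source path are placeholders, and the field names follow recent Azure AI Search REST API versions, so treat this as illustrative rather than a complete definition:

```python
# Sketch: an embedding skill (used at indexing time) and a vectorizer
# (used at query time) that reference the SAME Azure OpenAI deployment.
# Resource URI and deployment name are placeholders; property names follow
# recent Azure AI Search REST API versions and may differ in older ones.

RESOURCE_URI = "https://MY-FAKE-ACCOUNT.openai.azure.com"
DEPLOYMENT = "text-embedding-ada-002"

embedding_skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "resourceUri": RESOURCE_URI,
    "deploymentId": DEPLOYMENT,
    "inputs": [{"name": "text", "source": "/document/chunk"}],
    "outputs": [{"name": "embedding"}],
}

vectorizer = {
    "name": "my-vectorizer",
    "kind": "azureOpenAI",
    "azureOpenAIParameters": {
        "resourceUri": RESOURCE_URI,
        "deploymentId": DEPLOYMENT,
    },
}

# The same-model requirement, expressed as a check:
assert (
    embedding_skill["deploymentId"]
    == vectorizer["azureOpenAIParameters"]["deploymentId"]
)
```

Keeping both definitions parameterized on one shared deployment name, as above, is a simple way to make the same-model requirement hard to violate.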

Azure AI Search provides skill and vectorizer support for the following embedding models in the Azure cloud.

| Client | Embedding models | Skill | Vectorizer |
| --- | --- | --- | --- |
| Azure OpenAI | text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small | AzureOpenAIEmbedding | AzureOpenAIEmbedding |
| Azure AI Vision | multimodal 4.0 1 | AzureAIVision | AzureAIVision |
| Azure AI Studio model catalog | OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32, OpenAI-CLIP-Image-Text-Embeddings-ViT-Large-Patch14-336, Facebook-DinoV2-Image-Embeddings-ViT-Base, Facebook-DinoV2-Image-Embeddings-ViT-Giant, Cohere-embed-v3-english, Cohere-embed-v3-multilingual | AML 2 | Azure AI Studio model catalog |

1 Supports image and text vectorization.

2 Deployed models in the model catalog are accessed over an AML endpoint. We use the existing AML skill for this connection.

You can use other models besides those listed here. For more information, see Use non-Azure models for embeddings in this article.

Note

Inputs to an embedding model are typically chunked data. In an Azure AI Search RAG pattern, chunking is handled in the indexer pipeline, covered in another tutorial in this series.

Review models used for generative AI at query time

Azure AI Search doesn't have integration code for chat models, so you should choose an LLM that you're familiar with and that meets your requirements. You can modify query code to try different models without having to rebuild an index or rerun any part of the indexing pipeline. Review Search and generate answers for code that calls the chat model.

The following models are commonly used for a chat search experience:

| Client | Chat models |
| --- | --- |
| Azure OpenAI | GPT-35-Turbo, GPT-4, GPT-4o, GPT-4 Turbo |

GPT-35-Turbo and GPT-4 models are optimized to work with inputs formatted as a conversation.
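"Formatted as a conversation" means the request carries a list of role-tagged messages rather than one flat prompt. A minimal sketch of how a RAG query might assemble that list, grounding the model in search results (the system prompt wording and sources layout are illustrative, not from this tutorial):

```python
def build_chat_messages(query: str, sources: list[str]) -> list[dict]:
    """Assemble a role-tagged message list that grounds the chat model
    in search results. The grounding prompt here is illustrative only."""
    grounding = "Answer using only the sources below.\n\nSources:\n" + "\n".join(
        f"- {s}" for s in sources
    )
    return [
        {"role": "system", "content": grounding},  # instructions + retrieved context
        {"role": "user", "content": query},        # the user's question
    ]

messages = build_chat_messages(
    "Which search tiers provide a managed identity?",
    ["Azure AI Search Basic tier and higher provide a managed identity."],
)
# messages[0]["role"] == "system"; messages[1]["role"] == "user"
```

Because the chat model is called only from query code, swapping models mostly means changing the deployment name this message list is sent to; the message structure stays the same.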

Deploy models and collect information

Models must be deployed and accessible through an endpoint. Both embedding-related skills and vectorizers need the number of dimensions and the model name. Other details about your model might be required by the client used on the connection.
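The dimension count matters because it's fixed in the index's vector field definition and must match the model's output. For reference, the native output dimensions of the Azure OpenAI embedding models listed earlier can be captured as a small lookup (the text-embedding-3 models also accept a reduced dimensions parameter, not shown here):

```python
# Native output dimensions for the Azure OpenAI embedding models listed above.
# text-embedding-3-small and -large also accept a smaller "dimensions"
# parameter; these are the defaults.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}
```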

This tutorial series uses the following models and model providers:

  • text-embedding-ada-002 on Azure OpenAI for embeddings
  • GPT-35-Turbo on Azure OpenAI for chat completion

You must have Cognitive Services OpenAI Contributor or higher to deploy models in Azure OpenAI.

  1. Go to Azure OpenAI Studio.

  2. Select Deployments on the left menu.

  3. Select Deploy model > Deploy base model.

  4. Select text-embedding-ada-002 from the dropdown list and confirm the selection.

  5. Specify a deployment name. We recommend "text-embedding-ada-002".

  6. Accept the defaults.

  7. Select Deploy.

  8. Repeat the previous steps for gpt-35-turbo.

  9. Make a note of the model names and endpoint. Embedding skills and vectorizers assemble the full endpoint internally, so you only need the resource URI. For example, given https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15, the endpoint you should provide in skill and vectorizer definitions is https://MY-FAKE-ACCOUNT.openai.azure.com.
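The reduction described in the last step can be sketched as a small helper: given a full deployment endpoint, keep only the scheme and host.

```python
from urllib.parse import urlparse

def resource_uri(full_endpoint: str) -> str:
    """Reduce a full Azure OpenAI deployment endpoint to the resource URI
    expected by skill and vectorizer definitions (scheme + host only)."""
    parts = urlparse(full_endpoint)
    return f"{parts.scheme}://{parts.netloc}"

full = (
    "https://MY-FAKE-ACCOUNT.openai.azure.com/openai/deployments/"
    "text-embedding-ada-002/embeddings?api-version=2023-05-15"
)
# resource_uri(full) == "https://MY-FAKE-ACCOUNT.openai.azure.com"
```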

Configure search engine access to Azure models

For pipeline and query execution, this tutorial uses Microsoft Entra ID for authentication and roles for authorization.

Assign yourself and the search service identity permissions on Azure OpenAI. The code for this tutorial runs locally. Requests to Azure OpenAI originate from your system. Also, search results from the search engine are passed to Azure OpenAI. For these reasons, both you and the search service need permissions on Azure OpenAI.

  1. Sign in to the Azure portal and find your search service.

  2. Configure Azure AI Search to use a system-managed identity.

  3. Find your Azure OpenAI resource.

  4. Select Access control (IAM) on the left menu.

  5. Select Add role assignment.

  6. Select Cognitive Services OpenAI User.

  7. Select Managed identity and then select Members. Find the system-managed identity for your search service in the dropdown list.

  8. Next, select User, group, or service principal and then select Members. Search for your user account and then select it from the dropdown list.

  9. Select Review + assign to create the role assignments.

For access to models on Azure AI Vision, assign Cognitive Services OpenAI User. For Azure AI Studio, assign Azure AI Developer.

Use non-Azure models for embeddings

The pattern for integrating any embedding model is to wrap it in a custom skill and custom vectorizer. This section provides links to reference articles. For a code example that calls a non-Azure model, see custom-embeddings demo.

| Client | Embedding models | Skill | Vectorizer |
| --- | --- | --- | --- |
| Any | Any | custom skill | custom vectorizer |
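As with the Azure models, the custom skill and custom vectorizer must point at the same model, typically one HTTP endpoint that you host in front of it. A hedged sketch of the pair (the endpoint URI is a placeholder, and property names follow the custom Web API skill and vectorizer shapes in recent REST API versions; authentication and input/output mappings are omitted):

```python
# Placeholder endpoint; in practice this is your own service wrapping the
# non-Azure embedding model. Auth headers and full mappings are omitted.
EMBEDDING_ENDPOINT = "https://example.com/embed"

custom_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "uri": EMBEDDING_ENDPOINT,
    "httpMethod": "POST",
    "inputs": [{"name": "text", "source": "/document/chunk"}],
    "outputs": [{"name": "vector"}],
}

custom_vectorizer = {
    "name": "my-custom-vectorizer",
    "kind": "customWebApi",
    "customWebApiParameters": {
        "uri": EMBEDDING_ENDPOINT,
        "httpMethod": "POST",
    },
}

# Same-model requirement again: both definitions call the same endpoint.
assert custom_skill["uri"] == custom_vectorizer["customWebApiParameters"]["uri"]
```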

Next step