Azure OpenAI on your data (preview)
Azure OpenAI on your data enables you to run supported chat models such as GPT-35-Turbo and GPT-4 on your data without needing to train or fine-tune models. Running models on your data enables you to chat on top of, and analyze your data with greater accuracy and speed. By doing so, you can unlock valuable insights that can help you make better business decisions, identify trends and patterns, and optimize your operations. One of the key benefits of Azure OpenAI on your data is its ability to tailor the content of conversational AI.
Because the model has access to, and can reference specific sources to support its responses, answers are not only based on its pretrained knowledge but also on the latest information available in the designated data source. This grounding data also helps the model avoid generating responses based on outdated or incorrect information.
What is Azure OpenAI on your data
Azure OpenAI on your data works with OpenAI's powerful GPT-35-Turbo and GPT-4 language models, enabling them to provide responses based on your data. You can access Azure OpenAI on your data using a REST API or the web-based interface in the Azure OpenAI Studio to create a solution that connects to your data to enable an enhanced chat experience.
One of the key features of Azure OpenAI on your data is its ability to retrieve and utilize data in a way that enhances the model's output. Azure OpenAI on your data, together with Azure AI Search, determines what data to retrieve from the designated data source based on the user input and provided conversation history. This data is then augmented and resubmitted as a prompt to the OpenAI model, with retrieved information being appended to the original prompt. Although retrieved data is being appended to the prompt, the resulting input is still processed by the model like any other prompt. Once the data has been retrieved and the prompt has been submitted to the model, the model uses this information to provide a completion. See the Data, privacy, and security for Azure OpenAI Service article for more information.
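The retrieve-then-augment flow described above can be pictured with a minimal sketch. The function name `build_augmented_prompt`, the `[docN]` labels, and the prompt layout are illustrative assumptions; this is not the service's actual internal prompt format.

```python
from typing import List


def build_augmented_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Append retrieved chunks to the user's question (illustrative layout only)."""
    # Label each chunk so a response could cite it, for example [doc1], [doc2].
    sources = "\n".join(
        f"[doc{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return f"{question}\n\nRetrieved data:\n{sources}"


prompt = build_augmented_prompt(
    "Who is my manager?",
    ["Org chart excerpt ...", "HR policy excerpt ..."],
)
```

The resulting string is then processed by the model like any other prompt, which is the point the paragraph above makes.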
Get started
To get started, connect your data source using Azure OpenAI Studio and start asking questions and chatting on your data.
Note
To get started, you need to already have been approved for Azure OpenAI access and have an Azure OpenAI Service resource with either the gpt-35-turbo or the gpt-4 models deployed.
Data formats and file types
Azure OpenAI on your data supports the following filetypes:
- .txt
- .md
- .html
- Microsoft Word files
- Microsoft PowerPoint files
There is an upload limit, and there are some caveats about document structure and how it might affect the quality of responses from the model:
- The model provides the best citation titles from markdown (.md) files.
- If a document is a PDF file, the text contents are extracted as a preprocessing step (unless you're connecting your own Azure AI Search index). If your document contains images, graphs, or other visual content, the model's response quality depends on the quality of the text that can be extracted from them.
If you're converting data from an unsupported format into a supported format, make sure the conversion:
- Doesn't lead to significant data loss.
- Doesn't add unexpected noise to your data.
Data loss or added noise during conversion will affect the quality of the model response.
Ingesting your data
There are several different sources of data that you can use. The following sources will be connected to Azure AI Search:
- Blobs in an Azure storage container that you provide
- Local files uploaded using the Azure OpenAI Studio
You can additionally ingest your data from an existing Azure AI Search service, or use Azure Cosmos DB for MongoDB vCore.
Tip
For documents and datasets with long text, you should use the available data preparation script. The script chunks data so that the service's responses will be more accurate. This script also supports scanned PDF files and images.
Once data is ingested, an Azure AI Search index in your search resource gets created to integrate the information with Azure OpenAI models.
Data ingestion from Azure storage containers
Ingestion assets are created in the Azure AI Search resource and the Azure storage account. Currently these assets are: indexers, indexes, data sources, a custom skill in the search resource, and a container (later called the chunks container) in the Azure storage account. You can specify the input Azure storage container using the Azure OpenAI studio, or the ingestion API.
Data is read from the input container, and the contents are opened and split into chunks of at most 1024 tokens each. If vector search is enabled, the service calculates an embedding vector for each chunk. The output of this step (called the "preprocessed" or "chunked" data) is stored in the chunks container created in the previous step.
The preprocessed data is loaded from the chunks container, and indexed in the Azure AI Search index.
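As a rough illustration of the chunking step, the following sketch splits a token sequence into consecutive chunks of at most 1024 tokens. The function name `chunk_tokens` is hypothetical, and whitespace splitting stands in for a real tokenizer; the service's own chunking logic might differ.

```python
from typing import List

MAX_CHUNK_TOKENS = 1024  # maximum chunk size stated in the ingestion steps


def chunk_tokens(tokens: List[str], max_tokens: int = MAX_CHUNK_TOKENS) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# Whitespace splitting stands in for a real tokenizer (assumption for this sketch).
document_tokens = ("word " * 2500).split()
chunks = chunk_tokens(document_tokens)
chunk_sizes = [len(c) for c in chunks]  # 2500 tokens -> 1024, 1024, 452
```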
Data ingestion from local files
Using Azure OpenAI Studio, you can upload files from your machine. The service then stores the files to an Azure storage container and performs ingestion from the container.
Data ingestion from URLs
Using Azure OpenAI Studio, you can paste URLs and the service will store the webpage content, using it when generating responses from the model.
Troubleshooting failed ingestion jobs
To troubleshoot a failed job, always look out for errors or warnings specified either in the API response or Azure OpenAI studio. Here are some of the common errors and warnings:
Quota Limitations Issues
An index with the name X in service Y could not be created. Index quota has been exceeded for this service. You must either delete unused indexes first, add a delay between index creation requests, or upgrade the service for higher limits.
Standard indexer quota of X has been exceeded for this service. You currently have X standard indexers. You must either delete unused indexers first, change the indexer 'executionMode', or upgrade the service for higher limits.
Resolution:
Upgrade to a higher pricing tier or delete unused assets.
Preprocessing Timeout Issues
Could not execute skill because the Web Api request failed
Could not execute skill because Web Api skill response is invalid
Resolution:
Break down the input documents into smaller documents and try again.
Permissions Issues
This request is not authorized to perform this operation
Resolution:
This error means the storage account isn't accessible with the given credentials. Review the storage account credentials passed to the API, and make sure the storage account isn't behind a private endpoint unless a private endpoint is configured for this resource.
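If you process ingestion errors programmatically, a small lookup like the one below can map the messages listed above to their documented resolutions. The function name `suggest_resolution` and the substring matching are assumptions made for this sketch, not part of the service.

```python
from typing import Optional

# Substrings of the error messages listed above, mapped to the documented resolution.
KNOWN_ISSUES = [
    ("quota", "Upgrade to a higher pricing tier or delete unused assets."),
    ("could not execute skill",
     "Break down the input documents into smaller documents and try again."),
    ("not authorized to perform this operation",
     "Review the storage account credentials and private endpoint settings."),
]


def suggest_resolution(error_message: str) -> Optional[str]:
    """Return the documented resolution for a known ingestion error, else None."""
    lowered = error_message.lower()
    for fragment, resolution in KNOWN_ISSUES:
        if fragment in lowered:
            return resolution
    return None
```

For example, `suggest_resolution("Could not execute skill because the Web Api request failed")` returns the advice to split the input into smaller documents.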
Search options
Azure OpenAI on your data provides several search options you can use when you add your data source, leveraging the following types of search.
Vector search using Ada embedding models, available in select regions.
To enable vector search, you will need a text-embedding-ada-002 deployment in your Azure OpenAI resource. Select your embedding deployment when connecting your data, then select one of the vector search types under Data management.
Important
- Semantic search and vector search are subject to additional pricing. You need to choose Basic or higher SKU to enable semantic search or vector search. See pricing tier difference and service limits for more information.
- To help improve the quality of the information retrieval and model response, we recommend enabling semantic search for the following languages: English, French, Spanish, Portuguese, Italian, German, Chinese (Zh), Japanese, Korean, Russian, and Arabic.
Search option | Retrieval type | Additional pricing? | Benefits |
---|---|---|---|
keyword | Keyword search | No additional pricing. | Performs fast and flexible query parsing and matching over searchable fields, using terms or phrases in any supported language, with or without operators. |
semantic | Semantic search | Additional pricing for semantic search usage. | Improves the precision and relevance of search results by using a reranker (with AI models) to understand the semantic meaning of query terms and documents returned by the initial search ranker |
vector | Vector search | Additional pricing on your Azure OpenAI account from calling the embedding model. | Enables you to find documents that are similar to a given query input based on the vector embeddings of the content. |
hybrid (vector + keyword) | A hybrid of vector search and keyword search | Additional pricing on your Azure OpenAI account from calling the embedding model. | Performs similarity search over vector fields using vector embeddings, while also supporting flexible query parsing and full text search over alphanumeric fields using term queries. |
hybrid (vector + keyword) + semantic | A hybrid of vector search, semantic and keyword search for retrieval. | Additional pricing on your Azure OpenAI account from calling the embedding model, and additional pricing for semantic search usage. | Leverages vector embeddings, language understanding and flexible query parsing to create rich search experiences and generative AI apps that can handle complex and diverse information retrieval scenarios. |
The optimal search option can vary depending on your dataset and use-case. You might need to experiment with multiple options to determine which works best for your use-case.
Index field mapping
If you're using your own index, you will be prompted in the Azure OpenAI Studio to define which fields you want to map for answering questions when you add your data source. You can provide multiple fields for Content data, and should include all fields that have text pertaining to your use case.
In this example, the fields mapped to Content data and Title provide information to the model to answer questions. Title is also used to title citation text. The field mapped to File name generates the citation names in the response.
Mapping these fields correctly helps ensure the model has better response and citation quality.
Using the model
After ingesting your data, you can start chatting with the model on your data using the chat playground in Azure OpenAI studio, the web app, or the API.
Custom parameters
You can modify the following additional settings in the Data parameters section in Azure OpenAI Studio and the API.
Parameter name | Description |
---|---|
Retrieved documents | Specifies the number of top-scoring documents from your data index used to generate responses. You might want to increase the value when you have short documents or want to provide more context. The default value is 5. This is the topNDocuments parameter in the API. |
Strictness | Sets the threshold to categorize documents as relevant to your queries. Raising the value means a higher threshold for relevance and filters out more less-relevant documents for responses. Setting this value too high might cause the model to fail to generate responses due to limited available documents. The default value is 3. |
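The following sketch shows one way to set these two values when building a `dataSources` entry for an API request. The endpoint, key, and index name are placeholders, the helper name `build_data_source` is hypothetical, and `strictness` as the API parameter name is an assumption (the table above only names `topNDocuments`); check the API reference for your version.

```python
import json


def build_data_source(endpoint: str, key: str, index_name: str,
                      top_n_documents: int = 5, strictness: int = 3) -> dict:
    """Build one dataSources entry with the retrieval settings from the table above."""
    return {
        "type": "AzureCognitiveSearch",
        "parameters": {
            "endpoint": endpoint,
            "key": key,
            "indexName": index_name,
            "topNDocuments": top_n_documents,  # "Retrieved documents", default 5
            "strictness": strictness,  # assumed API name for "Strictness", default 3
        },
    }


data_source = build_data_source(
    "https://example.search.windows.net", "<search-key>", "my-index",
    top_n_documents=10,  # more context for an index of short documents
)
print(json.dumps(data_source, indent=2))
```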
Azure Role-based access controls (Azure RBAC) for adding data sources
To add a new data source to Azure OpenAI on your data, you need the following Azure RBAC roles.
Azure RBAC role | Which resource needs this role? | Needed when |
---|---|---|
Cognitive Services OpenAI Contributor | The Azure AI Search resource, to access Azure OpenAI resource. | You want to use Azure OpenAI on your data. |
Search Index Data Reader | The Azure OpenAI resource, to access the Azure AI Search resource. | You want to use Azure OpenAI on your data. |
Search Service Contributor | The Azure OpenAI resource, to access the Azure AI Search resource. | You plan to create a new Azure AI Search index. |
Storage Blob Data Contributor | The Azure AI Search and Azure OpenAI resources, to access the storage account. | You have an existing Blob storage container that you want to use, instead of creating a new one. |
Cognitive Services OpenAI User | The web app, to access the Azure OpenAI resource. | You want to deploy a web app. |
Contributor | Your subscription, to access Azure Resource Manager. | You want to deploy a web app. |
Cognitive Services Contributor Role | The Azure AI Search resource, to access Azure OpenAI resource. | You want to deploy a web app. |
Virtual network support & private endpoint support
- For instructions on setting up your resources to work on a virtual private network or private endpoint, see Use Azure OpenAI on your data securely
- Azure OpenAI, Azure AI Search, and Azure Storage Accounts can be protected under private endpoints and virtual private networks.
Document-level access control
Note
Document-level access control is supported for Azure AI Search only.
Azure OpenAI on your data lets you restrict the documents that can be used in responses for different users with Azure AI Search security filters. When you enable document level access, the search results returned from Azure AI Search and used to generate a response will be trimmed based on user Microsoft Entra group membership. You can only enable document-level access on existing Azure AI Search indexes. To enable document-level access:
Follow the steps in the Azure AI Search documentation to register your application and create users and groups.
Index your documents with their permitted groups. Be sure that your new security fields have the schema below:
{"name": "group_ids", "type": "Collection(Edm.String)", "filterable": true }
group_ids is the default field name. If you use a different field name like my_group_ids, you can map the field in index field mapping.
Make sure each sensitive document in the index has the value set correctly on this security field to indicate the permitted groups of the document.
In Azure OpenAI Studio, add your data source. In the index field mapping section, you can map zero or one value to the permitted groups field, as long as the schema is compatible. If the Permitted groups field isn't mapped, document level access won't be enabled.
Azure OpenAI Studio
Once the Azure AI Search index is connected, your responses in the studio will have document access based on the Microsoft Entra permissions of the logged in user.
Web app
If you are using a published web app, you need to redeploy it to upgrade to the latest version. The latest version of the web app includes the ability to retrieve the groups of the logged in user's Microsoft Entra account, cache it, and include the group IDs in each API request.
API
When using the API, pass the filter parameter in each API request. For example:
{
    "messages": [
        {
            "role": "user",
            "content": "who is my manager?"
        }
    ],
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'",
                "filter": "my_group_ids/any(g:search.in(g, 'group_id1, group_id2'))"
            }
        }
    ]
}
- my_group_ids is the field name that you selected for Permitted groups during fields mapping.
- group_id1, group_id2 are groups attributed to the logged in user. The client application can retrieve and cache users' groups.
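A client application could assemble the filter string from the user's cached group IDs along these lines. The helper name `build_group_filter` is hypothetical, and the validation rule is a conservative assumption added for this sketch.

```python
from typing import Iterable


def build_group_filter(group_ids: Iterable[str], field_name: str = "group_ids") -> str:
    """Build a filter of the form field/any(g:search.in(g, 'id1, id2'))."""
    ids = list(group_ids)
    for group_id in ids:
        # Reject characters that would break the quoted, comma-separated list.
        if "'" in group_id or "," in group_id:
            raise ValueError(f"Unsupported character in group ID: {group_id!r}")
    joined = ", ".join(ids)
    return f"{field_name}/any(g:search.in(g, '{joined}'))"


filter_value = build_group_filter(["group_id1", "group_id2"], field_name="my_group_ids")
```

Here `filter_value` reproduces the filter string from the request example, and it can be passed as the filter parameter of the data source.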
Schedule automatic index refreshes
Note
Automatic index refreshing is supported for Azure Blob storage only.
To keep your Azure AI Search index up-to-date with your latest data, you can schedule a refresh for it that runs automatically rather than manually updating it every time your data is updated. Automatic index refresh is only available when you choose blob storage as the data source. To enable an automatic index refresh:
Add a data source using Azure OpenAI studio.
Under Select or add data source select Indexer schedule and choose the refresh cadence you would like to apply.
After the data ingestion is set to a cadence other than once, Azure AI Search indexers will be created with a schedule equivalent to 0.5 * the cadence specified. This means that at the specified cadence, the indexers will pull the documents that were added, modified, or deleted from the storage container, then reprocess and index them. This ensures that the updated data gets preprocessed and indexed in the final index at the desired cadence automatically.
To update your data, you only need to upload the additional documents from the Azure portal. From the portal, select Storage Account > Containers. Select the name of the original container, then Upload. The index will pick up the files automatically after the scheduled refresh period.
The intermediate assets created in the Azure AI Search resource will not be cleaned up after ingestion to allow for future runs. These assets are:
{Index Name}-index
{Index Name}-indexer
{Index Name}-indexer-chunk
{Index Name}-datasource
{Index Name}-skillset
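The scheduling rule described earlier in this section (an indexer schedule of 0.5 * the selected cadence) works out as follows. This is only a worked calculation; the function name `indexer_interval` is hypothetical.

```python
from datetime import timedelta


def indexer_interval(cadence: timedelta) -> timedelta:
    """Return the indexer schedule interval: 0.5 * the selected refresh cadence."""
    return cadence * 0.5


daily_interval = indexer_interval(timedelta(days=1))    # 12 hours
weekly_interval = indexer_interval(timedelta(weeks=1))  # 3 days, 12 hours
```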
To modify the schedule, you can use the Azure portal.
Open your search resource page in the Azure portal
Select Indexers from the left pane
Perform the following steps on the two indexers that have your index name as a prefix.
Recommended settings
Use the following sections to help you configure Azure OpenAI on your data for optimal results.
System message
Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, what it should and shouldn't answer, and how to format responses. There's no token limit on what you can enter for the system message, but it's included with every API call and counted against the overall token limit. The system message is truncated if it exceeds the maximum for the model (for example, 400 tokens for the 8k models listed in the token usage table).
For example, if you're creating a chatbot where the data consists of transcriptions of quarterly financial earnings calls, you might use the following system message:
"You are a financial chatbot useful for answering questions from financial reports. You are given excerpts from the earnings call. Please answer the questions by parsing through all dialogue."
This system message can help improve the quality of the response by specifying the domain (in this case finance) and mentioning that the data consists of call transcriptions. It helps set the necessary context for the model to respond appropriately.
Note
The system message is used to modify how the GPT assistant responds to a user question based on retrieved documentation. It does not affect the retrieval process. If you'd like to provide instructions for the retrieval process, it is better to include them in the questions. The system message is only guidance. The model might not adhere to every instruction specified because it has been primed with certain behaviors, such as objectivity and avoiding controversial statements. Unexpected behavior might occur if the system message contradicts these behaviors.
Maximum response
Set a limit on the number of tokens per model response. The upper limit for Azure OpenAI on your data is 1500. This is equivalent to setting the max_tokens parameter in the API.
Limit responses to your data
This option encourages the model to respond using your data only, and is selected by default. If you unselect this option, the model might more readily apply its internal knowledge to respond. Determine the correct selection based on your use case and scenario.
Interacting with the model
Use the following practices for best results when chatting with the model.
Conversation history
- Before starting a new conversation (or asking a question that is not related to the previous ones), clear the chat history.
- Getting different responses for the same question between the first conversational turn and subsequent turns can be expected because the conversation history changes the current state of the model. If you receive incorrect answers, report them as a quality bug.
Model response
If you are not satisfied with the model response for a specific question, try either making the question more specific or more generic to see how the model responds, and reframe your question accordingly.
Chain-of-thought prompting has been shown to be effective in getting the model to produce desired outputs for complex questions/tasks.
Question length
Avoid asking long questions and break them down into multiple questions if possible. The GPT models have limits on the number of tokens they can accept. Token limits are counted toward: the user question, the system message, the retrieved search documents (chunks), internal prompts, the conversation history (if any), and the response. If the question exceeds the token limit, it will be truncated.
Multi-lingual support
Currently, keyword search and semantic search in Azure OpenAI on your data support only queries that are in the same language as the data in the index. For example, if your data is in Japanese, then input queries also need to be in Japanese. For cross-lingual document retrieval, we recommend building the index with vector search enabled.
To help improve the quality of the information retrieval and model response, we recommend enabling semantic search for the following languages: English, French, Spanish, Portuguese, Italian, German, Chinese (Zh), Japanese, Korean, Russian, and Arabic.
We recommend using a system message to inform the model that your data is in another language. For example:
"You are an AI assistant designed to help users extract information from retrieved Japanese documents. Please scrutinize the Japanese documents carefully before formulating a response. The user's query will be in Japanese, and you must also respond in Japanese."
If you have documents in multiple languages, we recommend building a new index for each language and connecting them separately to Azure OpenAI.
Deploying the model
After you connect Azure OpenAI to your data, you can deploy it using the Deploy to button in Azure OpenAI studio.
Using Power Virtual Agents
You can deploy your model to Power Virtual Agents directly from Azure OpenAI studio, enabling you to bring conversational experiences to Microsoft Teams, websites, Power Platform solutions, Dynamics 365, and other Azure Bot Service channels. Power Virtual Agents acts as a conversational and generative AI platform, making the process of creating, publishing, and deploying a bot to any number of channels simple and accessible.
While Power Virtual Agents has features that leverage Azure OpenAI such as generative answers, deploying a model grounded on your data lets you create a chatbot that will respond using your data, and connect it to the Power Platform. The tenant used in the Azure OpenAI service and Power Platform should be the same. For more information, see Use a connection to Azure OpenAI on your data.
Note
Deploying to Power Virtual Agents from Azure OpenAI is only available in US regions. Power Virtual Agents supports Azure AI Search indexes with keyword or semantic search only. Other data sources and advanced features might not be supported.
Using the web app
You can also use the available standalone web app to interact with your model using a graphical user interface, which you can deploy using either Azure OpenAI studio or a manual deployment.
Web app customization
You can also customize the app's frontend and backend logic. For example, you could change the icon that appears in the center of the app by updating /frontend/src/assets/Azure.svg and then redeploying the app using the Azure CLI. See the source code for the web app, and more information on GitHub.
When customizing the app, we recommend:
- Resetting the chat session (clear chat) if the user changes any settings. Notify the user that their chat history will be lost.
- Clearly communicating the impact on the user experience that each setting you implement will have.
- Updating the app settings for each of your deployed apps to use the new keys when you rotate API keys for your Azure OpenAI or Azure AI Search resource.
- Pulling changes from the main branch for the web app's source code frequently to ensure you have the latest bug fixes and improvements.
Important considerations
Publishing creates an Azure App Service in your subscription. It might incur costs depending on the pricing plan you select. When you're done with your app, you can delete it from the Azure portal.
By default, the app will only be accessible to you. To add authentication (for example, restrict access to the app to members of your Azure tenant):
Go to the Azure portal and search for the app name you specified during publishing. Select the web app, and go to the Authentication tab on the left navigation menu. Then select Add an identity provider.
Select Microsoft as the identity provider. The default settings on this page will restrict the app to your tenant only, so you don't need to change anything else here. Then select Add.
Now users will be asked to sign in with their Microsoft Entra account to be able to access your app. You can follow a similar process to add another identity provider if you prefer. The app doesn't use the user's login information in any way other than verifying they are a member of your tenant.
Chat history
You can enable chat history for your users of the web app. By enabling the feature, your users will have access to their individual previous queries and responses.
To enable chat history, deploy or redeploy your model as a web app using Azure OpenAI Studio.
Important
Enabling chat history will create a Cosmos DB instance in your resource group, and incur additional charges for the storage used.
Once you've enabled chat history, your users will be able to show and hide it in the top right corner of the app. When the history is shown, they can rename or delete conversations. Because they're logged into the app, conversations will be automatically ordered from newest to oldest, and named based on the first query in the conversation.
Deleting your Cosmos DB instance
Deleting your web app does not delete your Cosmos DB instance automatically. To delete your Cosmos DB instance, along with all stored chats, you need to navigate to the associated resource in the Azure portal and delete it. If you delete the Cosmos DB resource but keep the chat history option enabled in the studio, your users will be notified of a connection error, but can continue to use the web app without access to the chat history.
Using the API
After you upload your data through Azure OpenAI studio, you can make a call against Azure OpenAI models through APIs. Consider setting the following parameters even if they are optional for using the API.
Parameter | Recommendation |
---|---|
fieldsMapping | Explicitly set the title and content fields of your index. This impacts the search retrieval quality of Azure AI Search, which impacts the overall response and citation quality. |
roleInformation | Corresponds to the "System Message" in the Azure OpenAI Studio. See the System message section above for recommendations. |
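As a hedged sketch of how these two parameters might be set together: `fieldsMapping` and `roleInformation` come from the table above, but the nested key names `titleField` and `contentFields`, the helper name `build_search_parameters`, and the index and field names are assumptions for illustration. Confirm the exact schema in the API reference for your API version.

```python
from typing import Dict, List


def build_search_parameters(index_name: str, title_field: str,
                            content_fields: List[str], role_information: str) -> Dict:
    """Assemble data source parameters with explicit field mapping."""
    if not content_fields:
        raise ValueError("Provide at least one content field")
    return {
        "indexName": index_name,
        "roleInformation": role_information,  # the Studio's "System message"
        "fieldsMapping": {
            "titleField": title_field,        # assumed key name
            "contentFields": content_fields,  # assumed key name
        },
    }


parameters = build_search_parameters(
    index_name="earnings-calls",            # hypothetical index name
    title_field="title",                    # hypothetical index field
    content_fields=["content", "summary"],  # hypothetical index fields
    role_information="You are a financial chatbot useful for answering "
                     "questions from financial reports.",
)
```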
Streaming data
You can send a streaming request using the stream parameter, allowing data to be sent and received incrementally, without waiting for the entire API response. This can improve performance and user experience, especially for large or dynamic data.
{
    "stream": true,
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'"
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What are the differences between Azure Machine Learning and Azure AI services?"
        }
    ]
}
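On the client side, a streamed response has to be consumed piece by piece. The sketch below assumes the response arrives as server-sent events, with each event on a `data:` line and a `[DONE]` sentinel at the end; both the framing and the sample payload shape are assumptions for illustration, so check the API reference for the actual event schema.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from 'data: ...' lines of a streamed response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        yield json.loads(payload)


# Made-up sample lines; the real payload structure depends on the API version.
sample_lines = [
    'data: {"id": "1", "text": "Azure "}',
    "",
    'data: {"id": "2", "text": "Machine Learning"}',
    "data: [DONE]",
]
events = list(iter_stream_events(sample_lines))
```

Because the function is a generator, a caller can render each event as soon as it arrives instead of waiting for the whole response.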
Conversation history for better results
When you chat with a model, providing a history of the chat will help the model return higher quality results.
{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'"
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What are the differences between Azure Machine Learning and Azure AI services?"
        },
        {
            "role": "tool",
            "content": "{\"citations\": [{\"content\": \"title: Azure AI services and Machine Learning\\ntitleSuffix: Azure AI services\\ndescription: Learn where Azure AI services fits in with other Azure offerings for machine learning.\\nAzure AI services and machine learning\\nAzure AI services provides machine learning capabilities to solve general problems such as...\\n \"articles\\\\cognitive-services\\\\cognitive-services-and-machine-learning.md\", \"url\": null, \"metadata\": {\"chunking\": \"orignal document size=1018. Scores=0.32200050354003906 and 1.2880020141601562.Org Highlight count=115.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"What are the differences between Azure Machine Learning and Azure AI services?\\\"]\"}"
        },
        {
            "role": "assistant",
            "content": " \nAzure Machine Learning is a product and service tailored for data scientists to build, train, and deploy machine learning models [doc1]..."
        },
        {
            "role": "user",
            "content": "How do I use Azure machine learning?"
        }
    ]
}
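A client can build the messages array for a follow-up request by carrying the previous turn forward, as in the example above. The helper name `next_request_messages` is hypothetical; the roles (`user`, `tool`, `assistant`) mirror the ones used in the request body.

```python
from typing import Dict, List

Message = Dict[str, str]


def next_request_messages(history: List[Message], tool_content: str,
                          assistant_content: str, new_question: str) -> List[Message]:
    """Return prior turns, the last tool and assistant messages, then the new question."""
    return history + [
        {"role": "tool", "content": tool_content},
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": new_question},
    ]


history = [{"role": "user", "content": "What are the differences between "
            "Azure Machine Learning and Azure AI services?"}]
messages = next_request_messages(
    history,
    tool_content='{"citations": [], "intent": "[]"}',  # placeholder tool payload
    assistant_content="Azure Machine Learning is a product and service ... [doc1]",
    new_question="How do I use Azure machine learning?",
)
```

The function returns a new list rather than mutating `history`, so the caller decides when a conversation is cleared or continued.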
Token usage estimation for Azure OpenAI on your data
Model | Total tokens available | Max tokens for system message | Max tokens for model response |
---|---|---|---|
ChatGPT Turbo (0301) 8k | 8000 | 400 | 1500 |
ChatGPT Turbo 16k | 16000 | 1000 | 3200 |
GPT-4 (8k) | 8000 | 400 | 1500 |
GPT-4 32k | 32000 | 2000 | 6400 |
The table above shows the total number of tokens available for each model type, along with the maximum number of tokens that can be used for the system message and the model response. Additionally, the following also consume tokens:
- The meta prompt (MP): if you limit responses from the model to the grounding data content (inScope=True in the API), the maximum number of tokens is 4036. Otherwise (for example, if inScope=False) the maximum is 3444 tokens. This number varies depending on the token length of the user question and conversation history. This estimate includes the base prompt as well as the query rewriting prompts for retrieval.
- User question and history: variable, but capped at 2000 tokens.
- Retrieved documents (chunks): the number of tokens used by the retrieved document chunks depends on multiple factors. The upper bound is the number of retrieved document chunks multiplied by the chunk size. It will, however, be truncated based on the tokens available for the specific model being used after counting the rest of the fields.
20% of the available tokens are reserved for the model response. The remaining 80% of available tokens include the meta prompt, the user question and conversation history, and the system message. The remaining token budget is used by the retrieved document chunks.
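import tiktoken


class TokenEstimator(object):
    # Token counts here use the GPT-2 encoding, so treat them as estimates.
    GPT2_TOKENIZER = tiktoken.get_encoding("gpt2")

    def estimate_tokens(self, text: str) -> int:
        return len(self.GPT2_TOKENIZER.encode(text))


input_text = "Your text here"  # replace with the text you want to measure
token_output = TokenEstimator().estimate_tokens(input_text)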
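To see how the pieces of the token budget fit together, the sketch below subtracts the response reservation, system message, meta prompt, and question/history tokens from the model total, leaving an estimate of what remains for retrieved chunks. The limits are copied from the table in this section; the function name `chunk_token_budget` and the simple subtraction are assumptions, and the service's actual accounting might differ.

```python
# Limits copied from the token usage table: (total, system message max, response max).
MODEL_LIMITS = {
    "ChatGPT Turbo (0301) 8k": (8000, 400, 1500),
    "ChatGPT Turbo 16k": (16000, 1000, 3200),
    "GPT-4 (8k)": (8000, 400, 1500),
    "GPT-4 32k": (32000, 2000, 6400),
}


def chunk_token_budget(model: str, system_tokens: int, meta_prompt_tokens: int,
                       question_history_tokens: int) -> int:
    """Estimate the tokens left for retrieved document chunks."""
    total, max_system, max_response = MODEL_LIMITS[model]
    used = (
        max_response                        # reserved for the model response
        + min(system_tokens, max_system)    # system message is capped per model
        + meta_prompt_tokens
        + question_history_tokens
    )
    return max(total - used, 0)


budget = chunk_token_budget("ChatGPT Turbo 16k", system_tokens=200,
                            meta_prompt_tokens=4036, question_history_tokens=500)
```

With these inputs, `budget` is 8064 tokens. Using the worst-case figures from this section on an 8k model (400, 4036, and 2000 tokens) leaves only 64 tokens, which shows why long questions and histories reduce the room for retrieved chunks.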