Azure OpenAI on your data (preview)

Azure OpenAI on your data enables you to run supported chat models such as GPT-35-Turbo and GPT-4 on your data without needing to train or fine-tune models. Running models on your data enables you to chat on top of your data and analyze it with greater accuracy and speed. By doing so, you can unlock valuable insights that help you make better business decisions, identify trends and patterns, and optimize your operations. One of the key benefits of Azure OpenAI on your data is its ability to tailor the content of conversational AI.

Because the model can access and reference specific sources to support its responses, answers are based not only on its pretrained knowledge but also on the latest information available in the designated data source. This grounding data also helps the model avoid generating responses based on outdated or incorrect information.

What is Azure OpenAI on your data?

Azure OpenAI on your data works with OpenAI's powerful GPT-35-Turbo and GPT-4 language models, enabling them to provide responses based on your data. You can access Azure OpenAI on your data using a REST API or the web-based interface in the Azure OpenAI Studio to create a solution that connects to your data to enable an enhanced chat experience.

One of the key features of Azure OpenAI on your data is its ability to retrieve and utilize data in a way that enhances the model's output. Azure OpenAI on your data, together with Azure AI Search, determines what data to retrieve from the designated data source based on the user input and the provided conversation history. This data is then augmented and resubmitted as a prompt to the OpenAI model, with the retrieved information appended to the original prompt. Although retrieved data is appended to the prompt, the resulting input is still processed by the model like any other prompt. Once the data has been retrieved and the prompt has been submitted to the model, the model uses this information to provide a completion. See the Data, privacy, and security for Azure OpenAI Service article for more information.
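For reference, the simplest request body for chatting on your data pairs a dataSources entry with your messages, in the same shape as the fuller examples later in this article (the $-quoted values are shell-variable placeholders for your search endpoint, key, and index name, and the question is illustrative):

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'"
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What does my data say about refunds?"
        }
    ]
}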

Get started

To get started, connect your data source using Azure OpenAI Studio and start asking questions and chatting on your data.

Note

To get started, you need to already have been approved for Azure OpenAI access and have an Azure OpenAI Service resource with either the gpt-35-turbo or gpt-4 model deployed.

Data formats and file types

Azure OpenAI on your data supports the following file types:

  • .txt
  • .md
  • .html
  • Microsoft Word files
  • Microsoft PowerPoint files
  • PDF

There is an upload limit, and there are some caveats about document structure and how it might affect the quality of responses from the model:

  • The model provides the best citation titles from markdown (.md) files.

  • If a document is a PDF file, the text contents are extracted as a preprocessing step (unless you're connecting your own Azure AI Search index). If your document contains images, graphs, or other visual content, the model's response quality depends on the quality of the text that can be extracted from them.

  • If you're converting data from an unsupported format into a supported format, make sure the conversion:

    • Doesn't lead to significant data loss.
    • Doesn't add unexpected noise to your data.

    Either issue will impact the quality of the model's responses.

Ingesting your data

You can use data from several sources. The following sources are connected to Azure AI Search:

  • Blobs in an Azure storage container that you provide
  • Local files uploaded using the Azure OpenAI Studio

You can additionally ingest your data from an existing Azure AI Search service, or use Azure Cosmos DB for MongoDB vCore.

Custom parameters

You can modify the following additional settings in the Data parameters section in Azure OpenAI Studio and the API.

Parameter name | Description
Retrieved documents | Specifies the number of top-scoring documents from your data index used to generate responses. You might want to increase the value when you have short documents or want to provide more context. The default value is 5. This is the topNDocuments parameter in the API.
Strictness | Sets the threshold for categorizing documents as relevant to your queries. Raising the value means a higher threshold for relevance and filters out more of the less-relevant documents. Setting this value too high might cause the model to fail to generate responses due to limited available documents. The default value is 3.
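These settings are passed with the other data source parameters in the API. A minimal sketch of the dataSources portion of a request body, assuming the Strictness setting is exposed as a strictness parameter alongside the documented topNDocuments parameter:

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'",
                "topNDocuments": 5,
                "strictness": 3
            }
        }
    ]
}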

Azure role-based access control (Azure RBAC) for adding data sources

To add a new data source to Azure OpenAI on your data, you need the following Azure RBAC roles.

Azure RBAC role | Which resource needs this role? | Needed when
Cognitive Services OpenAI Contributor | The Azure AI Search resource, to access the Azure OpenAI resource. | You want to use Azure OpenAI on your data.
Search Index Data Reader | The Azure OpenAI resource, to access the Azure AI Search resource. | You want to use Azure OpenAI on your data.
Search Service Contributor | The Azure OpenAI resource, to access the Azure AI Search resource. | You plan to create a new Azure AI Search index.
Storage Blob Data Contributor | The Azure AI Search and Azure OpenAI resources, to access the storage account. | You have an existing Blob storage container that you want to use, instead of creating a new one.
Cognitive Services OpenAI User | The web app, to access the Azure OpenAI resource. | You want to deploy a web app.
Contributor | Your subscription, to access Azure Resource Manager. | You want to deploy a web app.
Cognitive Services Contributor Role | The Azure AI Search resource, to access the Azure OpenAI resource. | You want to deploy a web app.

Virtual network support & private endpoint support

  • For instructions on setting up your resources to work with virtual networks and private endpoints, see Use Azure OpenAI on your data securely.
  • Azure OpenAI, Azure AI Search, and Azure Storage accounts can be protected with private endpoints and virtual networks.

Document-level access control

Note

Document-level access control is supported for Azure AI Search only.

Azure OpenAI on your data lets you restrict the documents that can be used in responses for different users with Azure AI Search security filters. When you enable document-level access, the search results returned from Azure AI Search and used to generate a response are trimmed based on the user's Microsoft Entra group membership. You can only enable document-level access on existing Azure AI Search indexes. To enable document-level access:

  1. Follow the steps in the Azure AI Search documentation to register your application and create users and groups.

  2. Index your documents with their permitted groups. Be sure that your new security fields have the schema below:

    {"name": "group_ids", "type": "Collection(Edm.String)", "filterable": true }
    

    group_ids is the default field name. If you use a different field name, like my_group_ids, you can map the field in the index field mapping section.

  3. Make sure each sensitive document in the index has the value set correctly on this security field to indicate the permitted groups of the document (see the example payload after these steps).

  4. In Azure OpenAI Studio, add your data source. In the index field mapping section, you can map zero or one value to the permitted groups field, as long as the schema is compatible. If the permitted groups field isn't mapped, document-level access won't be enabled.
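For illustration, a document uploaded through the Azure AI Search documents API could carry the security field like this (a hypothetical payload; the id and content fields stand in for your own index schema):

{
    "value": [
        {
            "@search.action": "upload",
            "id": "1",
            "content": "Example document text.",
            "group_ids": ["group_id1", "group_id2"]
        }
    ]
}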

Azure OpenAI Studio

Once the Azure AI Search index is connected, responses in the studio have document access based on the Microsoft Entra permissions of the logged-in user.

Web app

If you are using a published web app, you need to redeploy it to upgrade to the latest version. The latest version of the web app includes the ability to retrieve the groups of the logged-in user's Microsoft Entra account, cache them, and include the group IDs in each API request.

API

When using the API, pass the filter parameter in each API request. For example:

{
    "messages": [
        {
            "role": "user",
            "content": "who is my manager?"
        }
    ],
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'",
                "filter": "my_group_ids/any(g:search.in(g, 'group_id1, group_id2'))"
            }
        }
    ]
}
  • my_group_ids is the field name that you selected for Permitted groups during field mapping.
  • group_id1, group_id2 are groups attributed to the logged-in user. The client application can retrieve and cache users' groups.

Schedule automatic index refreshes

Note

Automatic index refreshing is supported for Azure Blob storage only.

To keep your Azure AI Search index up to date with your latest data, you can schedule an automatic refresh rather than manually updating the index every time your data changes. Automatic index refresh is only available when you choose blob storage as the data source. To enable an automatic index refresh:

  1. Add a data source using Azure OpenAI Studio.

  2. Under Select or add data source, select Indexer schedule and choose the refresh cadence that you would like to apply.

    A screenshot of the indexer schedule in Azure OpenAI Studio.

After the data ingestion is set to a cadence other than once, Azure AI Search indexers will be created with a schedule equivalent to 0.5 * the specified cadence. This means that at the specified cadence, the indexers will pull the documents that were added, modified, or deleted from the storage container, then reprocess and index them. This ensures that the updated data is preprocessed and indexed in the final index at the desired cadence automatically.

To update your data, you only need to upload the additional documents from the Azure portal. From the portal, select Storage Account > Containers. Select the name of the original container, then Upload. The index will pick up the files automatically after the scheduled refresh period.

The intermediate assets created in the Azure AI Search resource will not be cleaned up after ingestion, so that future runs can use them. These assets are:

  • {Index Name}-index
  • {Index Name}-indexer
  • {Index Name}-indexer-chunk
  • {Index Name}-datasource
  • {Index Name}-skillset

To modify the schedule, you can use the Azure portal.

  1. Open your search resource page in the Azure portal

  2. Select Indexers from the left pane

    A screenshot of the indexers tab in the Azure portal.

  3. Perform the following steps on the two indexers that have your index name as a prefix.

    1. Select the indexer to open it. Then select the settings tab.

    2. Update the schedule to the desired cadence from "Schedule" or specify a custom cadence from "Interval (minutes)".

      A screenshot of the settings page for an individual indexer.

    3. Select Save.
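Alternatively, you can update the schedule programmatically by editing the indexer definition through the Azure AI Search REST API. A sketch of the relevant fragment of an indexer definition (the interval is an ISO 8601 duration; PT2H, meaning every two hours, is an example value):

{
    "name": "{Index Name}-indexer",
    "schedule": {
        "interval": "PT2H"
    }
}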

Use the following sections to help you configure Azure OpenAI on your data for optimal results.

System message

Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, what it should and shouldn't answer, and how to format responses. There's no separate token limit for the system message, but it is included with every API call and counts against the overall token limit. However, the system message will be truncated if it's longer than 400 tokens.

For example, if you're creating a chatbot where the data consists of transcriptions of quarterly financial earnings calls, you might use the following system message:

"You are a financial chatbot useful for answering questions from financial reports. You are given excerpts from the earnings call. Please answer the questions by parsing through all dialogue."

This system message can help improve the quality of the response by specifying the domain (in this case finance) and mentioning that the data consists of call transcriptions. It helps set the necessary context for the model to respond appropriately.

Note

The system message is used to modify how the GPT assistant responds to a user question based on retrieved documentation. It doesn't affect the retrieval process. If you'd like to provide instructions for the retrieval process, it's better to include them in the questions. The system message is only guidance. The model might not adhere to every instruction specified because it has been primed with certain behaviors such as objectivity and avoiding controversial statements. Unexpected behavior might occur if the system message contradicts these behaviors.

Maximum response

Set a limit on the number of tokens per model response. The upper limit for Azure OpenAI on your data is 1500. This is equivalent to setting the max_tokens parameter in the API.
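For example, to cap responses at the 1500-token maximum in an API call (the question shown is illustrative):

{
    "max_tokens": 1500,
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'"
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "Summarize the latest earnings call."
        }
    ]
}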

Limit responses to your data

This option encourages the model to respond using your data only, and is selected by default. If you unselect this option, the model might more readily apply its internal knowledge to respond. Determine the correct selection based on your use case and scenario.
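In the API, this option corresponds to the inScope flag mentioned in the token usage estimation section later in this article. A sketch of the dataSources portion of a request body, assuming the flag sits alongside the other data source parameters:

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'",
                "inScope": true
            }
        }
    ]
}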

Interacting with the model

Use the following practices for best results when chatting with the model.

Conversation history

  • Before starting a new conversation (or asking a question that is not related to the previous ones), clear the chat history.
  • Getting different responses for the same question between the first conversational turn and subsequent turns can be expected, because the conversation history changes the current state of the model. If you receive incorrect answers, report them as a quality bug.

Model response

  • If you are not satisfied with the model response for a specific question, try either making the question more specific or more generic to see how the model responds, and reframe your question accordingly.

  • Chain-of-thought prompting has been shown to be effective in getting the model to produce desired outputs for complex questions/tasks.

Question length

Avoid asking long questions and break them down into multiple questions if possible. The GPT models have limits on the number of tokens they can accept. The following all count toward the token limit: the user question, the system message, the retrieved search documents (chunks), internal prompts, the conversation history (if any), and the model response. If the question exceeds the token limit, it will be truncated.

Multi-lingual support

  • Currently, keyword search and semantic search in Azure OpenAI on your data support queries that are in the same language as the data in the index. For example, if your data is in Japanese, then input queries also need to be in Japanese. For cross-lingual document retrieval, we recommend building the index with vector search enabled (see the sketch after this list).

  • To help improve the quality of the information retrieval and model response, we recommend enabling semantic search for the following languages: English, French, Spanish, Portuguese, Italian, German, Chinese (zh), Japanese, Korean, Russian, and Arabic.

  • We recommend using a system message to inform the model that your data is in another language. For example:

  • *"*You are an AI assistant designed to help users extract information from retrieved Japanese documents. Please scrutinize the Japanese documents carefully before formulating a response. The user's query will be in Japanese, and you must response also in Japanese."

  • If you have documents in multiple languages, we recommend building a new index for each language and connecting them separately to Azure OpenAI.
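As a sketch of the cross-lingual setup recommended above, vector retrieval is configured through the data source parameters. The parameter names below (queryType, embeddingDeploymentName) reflect the preview API, and the deployment name is a placeholder for your own embedding model deployment:

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'",
                "queryType": "vector",
                "embeddingDeploymentName": "my-embedding-deployment"
            }
        }
    ]
}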

Deploying the model

After you connect Azure OpenAI to your data, you can deploy it using the Deploy to button in Azure OpenAI Studio.

A screenshot showing the model deployment button in Azure OpenAI Studio.

Using Power Virtual Agents

You can deploy your model to Power Virtual Agents directly from Azure OpenAI Studio, enabling you to bring conversational experiences to channels such as Microsoft Teams, websites, Power Platform solutions, Dynamics 365, and other Azure Bot Service channels. Power Virtual Agents acts as a conversational and generative AI platform, making the process of creating, publishing, and deploying a bot to any number of channels simple and accessible.

While Power Virtual Agents has features that leverage Azure OpenAI, such as generative answers, deploying a model grounded on your data lets you create a chatbot that responds using your data, and connect it to the Power Platform. The tenant used for the Azure OpenAI service and Power Platform should be the same. For more information, see Use a connection to Azure OpenAI on your data.

Note

Deploying to Power Virtual Agents from Azure OpenAI is only available in US regions. Power Virtual Agents supports Azure AI Search indexes with keyword or semantic search only. Other data sources and advanced features might not be supported.

Using the web app

You can also use the available standalone web app to interact with your model using a graphical user interface. You can deploy the web app using either Azure OpenAI Studio or a manual deployment.

A screenshot of the web app interface.

Web app customization

You can also customize the app's frontend and backend logic. For example, you could change the icon that appears in the center of the app by updating /frontend/src/assets/Azure.svg and then redeploying the app using the Azure CLI. See the source code for the web app, and more information on GitHub.

When customizing the app, we recommend:

  • Resetting the chat session (clear chat) if the user changes any settings. Notify the user that their chat history will be lost.

  • Clearly communicating the impact on the user experience that each setting you implement will have.

  • Updating the app settings for each of your deployed apps to use the new keys whenever you rotate API keys for your Azure OpenAI or Azure AI Search resource.

  • Pulling changes from the main branch for the web app's source code frequently to ensure you have the latest bug fixes and improvements.

Important considerations
  • Publishing creates an Azure App Service in your subscription. It might incur costs depending on the pricing plan you select. When you're done with your app, you can delete it from the Azure portal.

  • By default, the app will only be accessible to you. To add authentication (for example, restrict access to the app to members of your Azure tenant):

    1. Go to the Azure portal and search for the app name you specified during publishing. Select the web app, and go to the Authentication tab on the left navigation menu. Then select Add an identity provider.

      Screenshot of the authentication page in the Azure portal.

    2. Select Microsoft as the identity provider. The default settings on this page restrict the app to your tenant only, so you don't need to change anything else here. Then select Add.

    Now users will be asked to sign in with their Microsoft Entra account to access your app. You can follow a similar process to add another identity provider if you prefer. The app doesn't use the user's login information in any way other than verifying that they're a member of your tenant.

Chat history

You can enable chat history for users of the web app. When you enable the feature, users have access to their individual previous queries and responses.

To enable chat history, deploy or redeploy your model as a web app using Azure OpenAI Studio.

A screenshot of the chat history enablement button on Azure OpenAI studio.

Important

Enabling chat history will create a Cosmos DB instance in your resource group, and incur additional charges for the storage used.

Once you've enabled chat history, your users can show and hide it in the top right corner of the app. When the history is shown, they can rename or delete conversations. Because users are logged in to the app, conversations are automatically ordered from newest to oldest, and named based on the first query in the conversation.

A screenshot of the chat history in the web app.

Deleting your Cosmos DB instance

Deleting your web app does not automatically delete your Cosmos DB instance. To delete your Cosmos DB instance, along with all stored chats, you need to navigate to the associated resource in the Azure portal and delete it. If you delete the Cosmos DB resource but keep the chat history option enabled in the studio, your users will be notified of a connection error, but can continue to use the web app without access to the chat history.

Using the API

After you upload your data through Azure OpenAI Studio, you can make calls against Azure OpenAI models through APIs. Consider setting the following parameters, even though they're optional, because they can improve the quality of responses.

Parameter | Recommendation
fieldsMapping | Explicitly set the title and content fields of your index. This impacts the search retrieval quality of Azure AI Search, which impacts the overall response and citation quality.
roleInformation | Corresponds to the "System Message" in Azure OpenAI Studio. See the System message section above for recommendations.
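Putting both parameters together, the dataSources portion of a request body might look like the following sketch (the index field names title and content, as well as the fieldsMapping sub-field names, are assumptions to adapt to your own index schema):

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'",
                "fieldsMapping": {
                    "titleField": "title",
                    "contentFields": ["content"]
                },
                "roleInformation": "You are a financial chatbot useful for answering questions from financial reports."
            }
        }
    ]
}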

Streaming data

You can send a streaming request using the stream parameter, allowing data to be sent and received incrementally, without waiting for the entire API response. This can improve performance and user experience, especially for large or dynamic data.

{
    "stream": true,
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'"
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What are the differences between Azure Machine Learning and Azure AI services?"
        }
    ]
}

Conversation history for better results

When you chat with a model, providing a history of the chat will help the model return higher quality results.

{
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": "'$SearchEndpoint'",
                "key": "'$SearchKey'",
                "indexName": "'$SearchIndex'"
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What are the differences between Azure Machine Learning and Azure AI services?"
        },
        {
            "role": "tool",
            "content": "{\"citations\": [{\"content\": \"title: Azure AI services and Machine Learning\\ntitleSuffix: Azure AI services\\ndescription: Learn where Azure AI services fits in with other Azure offerings for machine learning.\\nAzure AI services and machine learning\\nAzure AI services provides machine learning capabilities to solve general problems such as...\\n \"articles\\\\cognitive-services\\\\cognitive-services-and-machine-learning.md\", \"url\": null, \"metadata\": {\"chunking\": \"orignal document size=1018. Scores=0.32200050354003906 and 1.2880020141601562.Org Highlight count=115.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"What are the differences between Azure Machine Learning and Azure AI services?\\\"]\"}"
        },
        {
            "role": "assistant",
            "content": " \nAzure Machine Learning is a product and service tailored for data scientists to build, train, and deploy machine learning models [doc1]..."
        },
        {
            "role": "user",
            "content": "How do I use Azure machine learning?"
        }
    ]
}

Token usage estimation for Azure OpenAI on your data

Model | Total tokens available | Max tokens for system message | Max tokens for model response
ChatGPT Turbo (0301) 8k | 8000 | 400 | 1500
ChatGPT Turbo 16k | 16000 | 1000 | 3200
GPT-4 (8k) | 8000 | 400 | 1500
GPT-4 32k | 32000 | 2000 | 6400

The table above shows the total number of tokens available for each model type, as well as the maximum number of tokens that can be used for the system message and the model response. Additionally, the following also consume tokens:

  • The meta prompt (MP): if you limit responses from the model to the grounding data content (inScope=True in the API), the maximum is 4036 tokens. Otherwise (for example, if inScope=False), the maximum is 3444 tokens. This number varies with the token length of the user question and conversation history. This estimate includes the base prompt and the query rewriting prompts for retrieval.

  • User question and history: Variable but capped at 2000 tokens.

  • Retrieved documents (chunks): The number of tokens used by the retrieved document chunks depends on multiple factors. The upper bound is the number of retrieved document chunks multiplied by the chunk size. The chunks will, however, be truncated based on the tokens available for the specific model being used after counting the rest of the fields.

    About 20% of the available tokens are reserved for the model response. The remaining 80% are shared by the meta prompt, the user question and conversation history, and the system message; whatever is left of that budget is used by the retrieved document chunks.
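    As a worked example using the figures above: a 16k model has 16,000 total tokens, of which 3,200 (20%) are reserved for the response. Of the remaining 12,800 tokens, the meta prompt (up to 4,036 tokens with inScope=True), the user question and conversation history (up to 2,000 tokens), and the system message (up to 1,000 tokens) are counted first, which in the worst case leaves 12,800 - 7,036 = 5,764 tokens for retrieved document chunks.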

You can estimate the token count of a string with tiktoken:

import tiktoken

class TokenEstimator(object):

    # Uses the GPT-2 tokenizer as an approximation of the chat models' tokenization.
    GPT2_TOKENIZER = tiktoken.get_encoding("gpt2")

    def estimate_tokens(self, text: str) -> int:
        return len(self.GPT2_TOKENIZER.encode(text))

# Instantiate the estimator before calling its instance method.
token_estimator = TokenEstimator()
input_text = "Your sample text here."
token_output = token_estimator.estimate_tokens(input_text)

Next steps