Alright, so from what I've searched, there isn't any native configuration that solves this, regardless of whether you're using Azure OpenAI or OpenAI itself.
The best approaches I found for limiting the response are either a binary classifier (detect whether a prompt is on-topic or off-topic) or a similarity detector between the search results and the prompt. I went with a similarity detector that returns a score between 0 and 1 based on how similar the search results are to the user's prompt, and set a threshold: if the search results fit the prompt, the score will be high and the prompt is treated as on-topic.
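As a rough sketch of the similarity-detector idea, here's a minimal stdlib-only version using cosine similarity over bag-of-words vectors. Note this is just an illustration: in practice you'd likely score with embeddings (e.g. an embedding model's vectors) instead of raw word counts, and the 0.5 threshold here is an arbitrary assumption you'd tune on your own data.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts (0 to 1)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

THRESHOLD = 0.5  # assumption: tune this against your own prompts/results

def is_on_topic(search_results: str, prompt: str) -> bool:
    # High score means the retrieved results actually match the prompt,
    # so we treat the prompt as on-topic; otherwise we can refuse to answer.
    return cosine_similarity(search_results, prompt) >= THRESHOLD
```

The same gating logic carries over unchanged if you swap the scoring function for embedding-based similarity: compute a score, compare to a threshold, and only answer when the prompt is on-topic.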
I really hope they add some sort of configuration for this, but I think it'll be challenging since it's very difficult to limit the model's response, and even in the playground the model sometimes hallucinates.