Azure OpenAI chatbot using custom data gives an error when using dataSources

ESB 25 Reputation points
2023-09-19T12:43:42.0466667+00:00

I'm working on a chatbot that answers using our own data. The problem is that when I run the Python script generated from the chat playground, the only response is "The requested information is not available in the retrieved data. Please try another query or topic." This happens even with the exact same query that works in the chat playground. The same issue occurs when sending the HTTP request from Postman. When not using dataSources, it works just like normal ChatGPT.

Here is the code:

import openai, os, requests

openai.api_type = "azure"

# Azure OpenAI on your own data is only supported by the 2023-08-01-preview API version
openai.api_version = "2023-08-01-preview"

# Azure OpenAI setup
openai.api_base = "Endpoint" # Add your endpoint here
openai.api_key = "Key" # Add your OpenAI API key here
deployment_id = "Deployment" # Add your deployment ID here
# Azure Cognitive Search setup
search_endpoint = "SearchEndpoint" # Add your Azure Cognitive Search endpoint here
search_key = "SearchKey" # Add your Azure Cognitive Search admin key here
search_index_name = "Index" # Add your Azure Cognitive Search index name here

def setup_byod(deployment_id: str) -> None:
    """Sets up the OpenAI Python SDK to use your own data for the chat endpoint.

    :param deployment_id: The deployment ID for the model to use with your own data.

    To remove this configuration, simply set openai.requestssession to None.
    """

    class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter):

        def send(self, request, **kwargs):
            request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
            return super().send(request, **kwargs)

    session = requests.Session()

    # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id`
    session.mount(
        prefix=f"{openai.api_base}/openai/deployments/{deployment_id}",
        adapter=BringYourOwnDataAdapter()
    )

    openai.requestssession = session

setup_byod(deployment_id)

completion = openai.ChatCompletion.create(
    messages=[{"role": "system", "content": "SystemMessage"},{"role": "user", "content": "UserQuery"}],
    deployment_id=deployment_id,
    dataSources=[  # camelCase is intentional, as this is the format the API expects
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": search_endpoint,
                "key": search_key,
                "indexName": search_index_name,
            }
        }
    ]
)
print(completion["choices"][0]["message"]["content"])
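For comparison, the Postman request mentioned above posts the same JSON body to the extensions endpoint (`{api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version=...`, with the key in an `api-key` header). A minimal sketch of building that body — `build_byod_payload` and all values here are hypothetical placeholders, not real credentials:

```python
import json

def build_byod_payload(search_endpoint, search_key, search_index_name,
                       system_message, user_query):
    """Build the request body for the extensions chat completions endpoint.

    SystemMessage/UserQuery are placeholders, as in the script above.
    """
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_query},
        ],
        "dataSources": [  # camelCase is what the extensions API expects
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "key": search_key,
                    "indexName": search_index_name,
                },
            }
        ],
    }

payload = build_byod_payload("https://example.search.windows.net", "SearchKey",
                             "Index", "SystemMessage", "UserQuery")
print(json.dumps(payload, indent=2))
```

Sending this body with `requests.post(url, json=payload, headers={"api-key": "..."})` reproduces what both the SDK adapter above and Postman do, which makes it easier to compare against the playground's request.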
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. S Blackett 80 Reputation points
    2023-09-26T19:26:32.3433333+00:00

This is my working code with my URLs redacted.
    The three new fields I added are queryType, semanticConfiguration, and roleInformation.

    import openai, os, requests
    
    openai.api_type = "azure"
    
    # Azure OpenAI on your own data requires a preview API version (2023-09-01-preview here)
    openai.api_version = "2023-09-01-preview"
    
    # Azure OpenAI setup
    openai.api_base = "" # Add your endpoint here
    openai.api_key = os.getenv("OPENAI_API_KEY") # Add your OpenAI API key here
    deployment_id = "" # Add your deployment ID here
    # Azure Cognitive Search setup
    search_endpoint = "" # Add your Azure Cognitive Search endpoint here
    search_key = os.getenv("SEARCH_KEY") # Add your Azure Cognitive Search admin key here
    search_index_name = "" # Add your Azure Cognitive Search index name here
    
    def setup_byod(deployment_id: str) -> None:
        """Sets up the OpenAI Python SDK to use your own data for the chat endpoint.
    
        :param deployment_id: The deployment ID for the model to use with your own data.
    
        To remove this configuration, simply set openai.requestssession to None.
        """
    
        class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter):
    
            def send(self, request, **kwargs):
                request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
                return super().send(request, **kwargs)
    
        session = requests.Session()
    
        # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id`
        session.mount(
            prefix=f"{openai.api_base}/openai/deployments/{deployment_id}",
            adapter=BringYourOwnDataAdapter()
        )
    
        openai.requestssession = session
    
    setup_byod(deployment_id)
    
    completion = openai.ChatCompletion.create(
        messages=[{"role": "system", "content": 
                    """        Task: Roleplay as a conversational assistant."""
            },
                  {"role": "user", "content": "Who is Harry Styles?"}
                  ],
        deployment_id=deployment_id,
        dataSources=[  # camelCase is intentional, as this is the format the API expects
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "key": search_key,
                    "indexName": search_index_name,
                    "queryType": "semantic",
                    "semanticConfiguration" : "default",
                    "roleInformation": """        Task: Roleplay as a conversational assistant."""
                }
            }
        ]
    )
    print(completion)
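    Rather than printing the whole completion, you may want just the answer and its citations. Depending on the preview API version, the extensions response exposes either a single `choices[0]["message"]` or a `choices[0]["messages"]` list that also carries a "tool" message whose content is a JSON string of citations. A defensive extraction sketch — the `sample` response below is fabricated for illustration, not real output:

```python
import json

def extract_answer_and_citations(completion):
    """Pull the assistant text (and citations, if present) out of an
    extensions/chat/completions response dict. Handles both the single
    `message` shape and the `messages`-list shape; treat this as a sketch."""
    choice = completion["choices"][0]
    if "message" in choice:
        return choice["message"]["content"], []
    answer, citations = None, []
    for msg in choice.get("messages", []):
        if msg["role"] == "assistant":
            answer = msg["content"]
        elif msg["role"] == "tool":
            # The tool message's content is a JSON string containing citations.
            citations = json.loads(msg["content"]).get("citations", [])
    return answer, citations

# Hypothetical response shaped like the `messages`-list variant:
sample = {
    "choices": [{
        "messages": [
            {"role": "tool",
             "content": json.dumps({"citations": [{"title": "Doc1"}]})},
            {"role": "assistant", "content": "Harry Styles is a singer."},
        ]
    }]
}
answer, citations = extract_answer_and_citations(sample)
print(answer)
print(len(citations))
```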
    
    

1 additional answer

  1. Peter Silva Alva 0 Reputation points
    2023-11-27T19:18:25.6533333+00:00

    I recently faced a similar issue. I noticed that the way I set up the prompt made a huge difference in solving it.

    If I included more information than needed, I received index errors or "information not available" responses. When I excluded the redundant (or irrelevant) info, everything flowed as expected.

