Azure AI Search + Azure OpenAI chat completion returns [doc1] in the content of the assistant response

Anis Achkar
2023-12-06T13:45:57.7233333+00:00

Hello everyone,

The issue:

I am using Azure OpenAI chat completion with my own data, retrieved from Azure Cognitive Search (recently renamed Azure AI Search), to build a chatbot application with RAG capabilities.

The problem I am facing is that the answer returned by the chat completion API contains references to [doc1] regardless of which document is actually being cited (see below; note that I don't have any document called doc1).

This is what I am referring to:

"... Dissertation Fellowship. [doc1]"

This is the full assistant message object from the messages array:

{
    "index": 1,
    "role": "assistant",
    "content": "Based on the retrieved document, Gloria Gonzalez is a Ph.D. holder in Spanish (US Hispanic Literature) from the University of Houston. She is an adjunct lecturer at the University of Houston's Department of Hispanic Studies, where she teaches courses such as Mexican-American Literature, Women in Hispanic Literature, and Spanish-American Short Story. She has published several peer-reviewed articles and is the author of the book \"Quixote Reborn: The Wanderer in US Hispanic Literature,\" which is forthcoming from Yale University Press. She has also presented at various conferences, including the Hispanic Storytelling Association Annual Conference and the US Hispanic Literature Annual Conference. Additionally, she has received several honors and awards, including the UH Teaching Awards and the Dissertation Fellowship. [doc1]",
    "end_turn": true
}

The goal:

My goal is to get the name of the source document instead of [doc1].

Additional information on the chat completion API:

Here is the URI: <my endpoint>/openai/deployments/<my deployment name>/extensions/chat/completions?api-version=2023-06-01-preview

Here is the request body that I use:

{
        "temperature": 0,
        "max_tokens": 1000,
        "top_p": 1.0,
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": "<my end point>",
                    "key": "<my key>",
                    "indexName": "<my index name>"
                }
            }
        ],
        "messages": [
            {"role": "user", "content": "Who is Gloria"}
        ]
}
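
For reference, the same request can be assembled and sent from Python. This is a minimal sketch using only the standard library; the helper names build_chat_body and send_chat are hypothetical. The optional "fieldsMapping" object (keys such as titleField, filepathField, urlField, contentFields) is an assumption about the preview data-source parameters and should be checked against the API reference for the api-version in use.

```python
import json
import urllib.request
from typing import Optional

API_VERSION = "2023-06-01-preview"


def build_chat_body(question: str, search_endpoint: str, search_key: str,
                    index_name: str, fields_mapping: Optional[dict] = None) -> dict:
    """Build the request body shown above (hypothetical helper)."""
    parameters = {
        "endpoint": search_endpoint,
        "key": search_key,
        "indexName": index_name,
    }
    # Assumption: the preview API accepts an optional "fieldsMapping" object that
    # names the index fields holding the title, file path, URL and content.
    if fields_mapping:
        parameters["fieldsMapping"] = fields_mapping
    return {
        "temperature": 0,
        "max_tokens": 1000,
        "top_p": 1.0,
        "dataSources": [{"type": "AzureCognitiveSearch", "parameters": parameters}],
        "messages": [{"role": "user", "content": question}],
    }


def send_chat(aoai_endpoint: str, deployment: str, api_key: str, body: dict) -> dict:
    """POST the body to <endpoint>/openai/deployments/<deployment>/extensions/chat/completions."""
    url = (f"{aoai_endpoint.rstrip('/')}/openai/deployments/{deployment}"
           f"/extensions/chat/completions?api-version={API_VERSION}")
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Example (placeholders kept from above; the field names in the mapping are illustrative): build_chat_body("Who is Gloria", "<my end point>", "<my key>", "<my index name>", fields_mapping={"titleField": "metadata_spo_item_name", "filepathField": "metadata_spo_item_path", "contentFields": ["content"]}).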

Here is the complete response that I get:

{
    "id": "<>",
    "model": "gpt-35-turbo",
    "created": 1701778584,
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "messages": [
                {
                    "index": 0,
                    "role": "tool",
                    "content": "{\"citations\": [{\"content\": \"Gloria Gonzalez\\n3204 Windover Way\\nHoustonFemenina  Hispánica\\nModern Languages Association\\n\\nGloriaGonzalezCV.docx\", \"id\": null, \"title\": null, \"filepath\": null, \"url\": null, \"metadata\": {\"chunking\": \"orignal document size=580. Scores=0.5272721Org Highlight count=7.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"Who is Gloria?\\\"]\"}",
                    "end_turn": false
                },
                {
                    "index": 1,
                    "role": "assistant",
                    "content": "Based on the retrieved document, Gloria Gonzalez is a Ph.D. holder in Spanish (US Hispanic Literature) from the University of Houston. She is an adjunct lecturer at the University of Houston's Department of Hispanic Studies, where she teaches courses such as Mexican-American Literature, Women in Hispanic Literature, and Spanish-American Short Story. She has published several peer-reviewed articles and is the author of the book \"Quixote Reborn: The Wanderer in US Hispanic Literature,\" which is forthcoming from Yale University Press. She has also presented at various conferences, including the Hispanic Storytelling Association Annual Conference and the US Hispanic Literature Annual Conference. Additionally, she has received several honors and awards, including the UH Teaching Awards and the Dissertation Fellowship. [doc1]",
                    "end_turn": true
                }
            ]
        }
    ],
    "usage": {
        "prompt_tokens": 3937,
        "completion_tokens": 157,
        "total_tokens": 4094
    }
}
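
In the response above, the "tool" message carries the citations as a JSON string inside its "content" field, and the assistant text ends with [doc1]. The sketch below assumes (not confirmed here) that [docN] is a 1-based index into that citations list, and replaces each marker with the citation's title, filepath or url when one is present. The function names are hypothetical.

```python
import json
import re

DOC_REF = re.compile(r"\[doc(\d+)\]")


def extract_citations(choice: dict) -> list:
    """Return the citations list from the "tool" message of one choice.

    The tool message's "content" is itself a JSON string, so it must be decoded.
    """
    for message in choice.get("messages", []):
        if message.get("role") == "tool":
            return json.loads(message["content"]).get("citations", [])
    return []


def label_for(citation: dict, position: int) -> str:
    # Prefer title, then filepath, then url; fall back to a positional label.
    for key in ("title", "filepath", "url"):
        if citation.get(key):
            return citation[key]
    return f"citation {position}"


def replace_doc_refs(choice: dict) -> str:
    """Replace each [docN] marker in the assistant answer with a label for citation N."""
    citations = extract_citations(choice)
    answer = next(m["content"] for m in choice["messages"] if m.get("role") == "assistant")

    def substitute(match):
        n = int(match.group(1))
        # Assumption: [docN] refers to citations[N - 1].
        if 1 <= n <= len(citations):
            return f"[{label_for(citations[n - 1], n)}]"
        return match.group(0)  # leave markers without a matching citation untouched

    return DOC_REF.sub(substitute, answer)
```

With the response above, title, filepath and url are all null, so this would only produce "[citation 1]"; presumably the document name has to be present in index fields that the service maps into those citation properties.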

I am using the following Microsoft documentation:

Index API body:

{
    "name" : "<my sharepoint index name>",
    "fields": [
        { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
        { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
        { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
        { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
    ]
}
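
To see which of the index fields above could back a title or file path in the citations, a hypothetical helper can check a proposed mapping against the schema (reproduced below as name and type only). The mapping keys (titleField, filepathField, contentFields) mirror the assumed "fieldsMapping" parameter and are not confirmed for this api-version; the helper only checks that each mapped field exists and is a string field.

```python
# Index fields from the index definition above, trimmed to name and type.
INDEX_FIELDS = [
    {"name": "id", "type": "Edm.String"},
    {"name": "metadata_spo_item_name", "type": "Edm.String"},
    {"name": "metadata_spo_item_path", "type": "Edm.String"},
    {"name": "metadata_spo_item_content_type", "type": "Edm.String"},
    {"name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset"},
    {"name": "metadata_spo_item_size", "type": "Edm.Int64"},
    {"name": "content", "type": "Edm.String"},
]


def check_fields_mapping(fields: list, mapping: dict) -> list:
    """Return a list of problems with a proposed mapping (hypothetical helper)."""
    types = {f["name"]: f["type"] for f in fields}
    problems = []
    for role, value in mapping.items():
        # contentFields is a list; the other keys hold a single field name.
        names = value if isinstance(value, list) else [value]
        for name in names:
            if name not in types:
                problems.append(f"{role}: field '{name}' is not in the index")
            elif types[name] != "Edm.String":
                problems.append(f"{role}: field '{name}' is {types[name]}, not Edm.String")
    return problems
```

For example, mapping titleField to metadata_spo_item_name, filepathField to metadata_spo_item_path and contentFields to ["content"] yields no problems against this schema.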



Indexer API body:

{
    "name": "<my sharepoint indexer name>",
    "dataSourceName": "<my sharepoint datasource name>",
    "targetIndexName": "<my sharepoint index name>",
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "indexedFileNameExtensions": ".pdf, .docx, .pptx, .xlsx",
            "excludedFileNameExtensions": ".png, .jpg",
            "dataToExtract": "contentAndMetadata"
        }
    },
    "schedule": {"interval": "PT5M"},
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_spo_site_library_item_id",
            "targetFieldName": "id",
            "mappingFunction": {
                "name": "base64Encode"
            }
        }
    ]
}


Final word

Thank you for your help.

Let me know if you need any additional information.

Thank you,

Anis
