Azure AI Search OpenAI chat completion returns [doc1] in the content of the assistant response.

Anis Achkar 45

Hello everyone,

The issue:

I am using the Azure OpenAI chat completion API with my own data, retrieved from Azure Cognitive Search (recently renamed Azure AI Search), to build a chatbot application with RAG capabilities.

The problem is that the answer returned by the chat completion API contains the marker [doc1], regardless of which document is being referenced (see below; note that I don't have any document called doc1).

This is what I am referring to:

"... Dissertation Fellowship. [doc1]"

This is the entire assistant object from the messages array:

{
    "index": 1,
    "role": "assistant",
    "content": "Based on the retrieved document, Gloria Gonzalez is a Ph.D. holder in Spanish (US Hispanic Literature) from the University of Houston. She is an adjunct lecturer at the University of Houston's Department of Hispanic Studies, where she teaches courses such as Mexican-American Literature, Women in Hispanic Literature, and Spanish-American Short Story. She has published several peer-reviewed articles and is the author of the book \"Quixote Reborn: The Wanderer in US Hispanic Literature,\" which is forthcoming from Yale University Press. She has also presented at various conferences, including the Hispanic Storytelling Association Annual Conference and the US Hispanic Literature Annual Conference. Additionally, she has received several honors and awards, including the UH Teaching Awards and the Dissertation Fellowship. [doc1]",
    "end_turn": true
}

The goal:

My goal is to get the name of the referenced document in place of the [doc1] marker.

Additional information on Chat completion API:

Here is the URI: <my endpoint>/openai/deployments/<my deployment name>/extensions/chat/completions?api-version=2023-06-01-preview

Here is the request body that I use:

{
        "temperature": 0,
        "max_tokens": 1000,
        "top_p": 1.0,
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": "<my end point>",
                    "key": "<my key>",
                    "indexName": "<my index name>"
                }
            }
        ],
        "messages": [
            {"role": "user", "content": "Who is Gloria"}
        ]
}

Here is the complete response that I get:

{
    "id": "<>",
    "model": "gpt-35-turbo",
    "created": 1701778584,
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "messages": [
                {
                    "index": 0,
                    "role": "tool",
                    "content": "{\"citations\": [{\"content\": \"Gloria Gonzalez\\n3204 Windover Way\\nHoustonFemenina  Hispánica\\nModern Languages Association\\n\\nGloriaGonzalezCV.docx\", \"id\": null, \"title\": null, \"filepath\": null, \"url\": null, \"metadata\": {\"chunking\": \"orignal document size=580. Scores=0.5272721Org Highlight count=7.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"Who is Gloria?\\\"]\"}",
                    "end_turn": false
                },
                {
                    "index": 1,
                    "role": "assistant",
                    "content": "Based on the retrieved document, Gloria Gonzalez is a Ph.D. holder in Spanish (US Hispanic Literature) from the University of Houston. She is an adjunct lecturer at the University of Houston's Department of Hispanic Studies, where she teaches courses such as Mexican-American Literature, Women in Hispanic Literature, and Spanish-American Short Story. She has published several peer-reviewed articles and is the author of the book \"Quixote Reborn: The Wanderer in US Hispanic Literature,\" which is forthcoming from Yale University Press. She has also presented at various conferences, including the Hispanic Storytelling Association Annual Conference and the US Hispanic Literature Annual Conference. Additionally, she has received several honors and awards, including the UH Teaching Awards and the Dissertation Fellowship. [doc1]",
                    "end_turn": true
                }
            ]
        }
    ],
    "usage": {
        "prompt_tokens": 3937,
        "completion_tokens": 157,
        "total_tokens": 4094
    }
}

I am using the following Microsoft documentation:

"Azure OpenAI Service REST API reference": https://learn.microsoft.com/en-us/azure/ai-services/openai/reference
"Index data from SharePoint document libraries": https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online

Additional information on the Azure Cognitive Search (AI Search) setup:

Index API body:

{
    "name" : "<my sharepoint index name>",
    "fields": [
        { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
        { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
        { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
        { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
    ]
}

Indexer API body:

{
    "name" : "<my sharepoint indexer name>",
    "dataSourceName" : "<my sharepoint datasource name>",
    "targetIndexName" : "<my sharepoint index name>",
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "indexedFileNameExtensions" : ".pdf, .docx, .pptx, .xlsx",
            "excludedFileNameExtensions" : ".png, .jpg",
            "dataToExtract": "contentAndMetadata"
        }
    },
    "schedule" : {"interval" : "PT5M"},
    "fieldMappings" : [
        { 
          "sourceFieldName" : "metadata_spo_site_library_item_id", 
          "targetFieldName" : "id", 
          "mappingFunction" : { 
            "name" : "base64Encode" 
          } 
         }
    ]
}

Final word

Thank you for your help.

Let me know if you need any additional information.

Thank you,

Anis

Grmacjon-MSFT 19,301 Reputation points Moderator

2023-12-12T00:22:24.9433333+00:00

@Anis Achkar Thanks for bringing this to our attention. We are checking internally with the engineering team to get more insight into your question and will get back to you when we hear from them.

Best,

Grace

Suresh Kumar M G 5 Reputation points

2024-01-16T09:27:17.3633333+00:00

Hi @Grmacjon-MSFT, I'm encountering the same problem. Is there a resolution for this issue, or any possible workaround?

Buddiz AI 0 Reputation points

2024-02-01T09:26:59.7866667+00:00

@Suresh Kumar M G The Microsoft support team responded to my support ticket for the same issue: this behavior is by design on the AI Search side. Any further customization has to be handled on the system or web app side. My solution is to post-process the output with a regex that removes [doc1], [doc2], etc. Note that the square brackets must be escaped so that they match literally:

const filteredAssistantResponse = assistantResponse.content?.replace(/\[doc\d+\]/g, '');
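A note on the stripping approach in the reply above: the square brackets in the pattern have to be escaped, because an unescaped [doc\d+] is a character class that matches any single one of the characters d, o, c, + or a digit, and would eat ordinary letters from the answer. Below is a minimal sketch of the stripping step. The helper name stripDocMarkers is hypothetical and not part of any Azure SDK.

```javascript
// Matches literal markers such as [doc1] or [doc12], plus any whitespace
// directly in front of them. The brackets are escaped so they are not
// interpreted as a character class.
const DOC_MARKER = /\s*\[doc\d+\]/g;

// Hypothetical helper: remove every [docN] marker from an assistant answer.
function stripDocMarkers(content) {
  if (typeof content !== "string") return "";
  return content.replace(DOC_MARKER, "").trim();
}

console.log(stripDocMarkers("the Dissertation Fellowship. [doc1]"));
// prints "the Dissertation Fellowship."
```

Plain words that happen to contain "doc" are left alone, since only the bracketed form matches.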

Anis Achkar 45 Reputation points

2024-02-05T09:07:41.41+00:00

@Buddiz AI Thank you for your answer. I looked at the sample code in https://github.com/Azure-Samples/azure-search-openai-demo and they are indeed removing [doc1], [doc2], and so on. However, the content doesn't always include the document name, and my goal is to replace [doc1] with the actual document being referenced. So my question is: how can I get the document name from the chat completion API response in order to return it?
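One possible approach, offered as a sketch and not a confirmed answer: in the response posted in the question, the "tool" message carries a citations array, and each [docN] marker appears to refer to the N-th entry of that array (1-based). The title, filepath and url of that entry are null in the posted response, which would be consistent with no field mapping having been sent in the request. The code below assumes that the preview API accepts a fieldsMapping object with titleField and filepathField keys in the data source parameters; please verify those names against the REST API reference for your api-version. The helper names extractCitations and replaceDocMarkers are hypothetical, and the sample citation has a title filled in purely for illustration.

```javascript
// Assumption: the "fieldsMapping" keys follow the preview API reference for
// the AzureCognitiveSearch data source; check them for your api-version.
// The index field names are taken from the index definition in the question.
const dataSourceParameters = {
  endpoint: "<my end point>",
  key: "<my key>",
  indexName: "<my index name>",
  fieldsMapping: {
    contentFields: ["content"],
    titleField: "metadata_spo_item_name",
    filepathField: "metadata_spo_item_path",
  },
};

// Hypothetical helper: read the citations array from the "tool" message,
// whose content is itself a JSON-encoded string.
function extractCitations(response) {
  const messages = response?.choices?.[0]?.messages ?? [];
  const tool = messages.find((m) => m.role === "tool");
  if (!tool) return [];
  try {
    return JSON.parse(tool.content).citations ?? [];
  } catch {
    return []; // content was not valid JSON
  }
}

// Hypothetical helper: swap each [docN] marker for the name of citation N
// (assumed 1-based), leaving the marker untouched when no name is available.
function replaceDocMarkers(answer, citations) {
  return answer.replace(/\[doc(\d+)\]/g, (marker, n) => {
    const citation = citations[Number(n) - 1];
    const name = citation && (citation.title || citation.filepath || citation.url);
    return name ? `[${name}]` : marker;
  });
}

// Usage with a trimmed-down response shaped like the one in the question.
const sampleResponse = {
  choices: [{
    messages: [
      {
        role: "tool",
        content: JSON.stringify({
          citations: [{ title: "GloriaGonzalezCV.docx", filepath: null, url: null }],
        }),
      },
      { role: "assistant", content: "the Dissertation Fellowship. [doc1]" },
    ],
  }],
};

const sampleCitations = extractCitations(sampleResponse);
console.log(replaceDocMarkers("the Dissertation Fellowship. [doc1]", sampleCitations));
// prints "the Dissertation Fellowship. [GloriaGonzalezCV.docx]"
```

If title, filepath and url all stay null even with a mapping in place, the marker is left as-is, so it can still be stripped or shown as a generic reference.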
