Document Intelligence error when trying to output markdown (python, langchain SDK)

Dadfar, Reza 0 Reputation points
2024-02-27T07:41:31.1766667+00:00

Hi Microsoft Support Team,

I've encountered an issue while trying to analyze a publicly available document using both Document Intelligence (studio) and the Python SDK, following the example provided in your GitHub repository (https://github.com/microsoft/Form-Recognizer-Toolkit/blob/main/SampleCode/Python/sample_rag_langchain.ipynb). The document in question is available at this URL: "https://www.orica.com/ArticleDocuments/301/FY2023%20Annual%20Report.pdf.aspx". Upon attempting the analysis, I received the following error message:

  • Code: InternalServerError
  • Message: An unexpected error occurred. Exception Details: (FailedToSerializeAnalyzeResult) Failed to serialize analyze results, please contact support.
  • Code: FailedToSerializeAnalyzeResult
  • Message: Failed to serialize analyze results, please contact support.

I would greatly appreciate any assistance or guidance you could provide to resolve this issue. Additionally, I have two follow-up questions:

  1. Is there a way to directly extract the Markdown file from the Document Intelligence studio without using the SDK?
  2. The LangChain API and the example provided seem to work only with individual files. Is there an API available for processing folders containing several files?

Thank you for your help.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. VasaviLankipalle-MSFT 17,021 Reputation points
    2024-02-27T20:11:41.83+00:00

    Hello @Dadfar, Reza, thanks for using the Microsoft Q&A platform.

    Yes, this is a known issue with Markdown output when analyzing a PDF document with the prebuilt-layout model for a specific page or page range, whether through Document Intelligence Studio or the Python SDK.

    The latest update from the product team, received today, is that it has started working after a fix. I reproduced the scenario with your sample document by specifying a page range and was able to get the Markdown output. Please try the same on your end.

    User's image

    Regarding your question,

    Is there a way to directly extract the Markdown file from the Document Intelligence studio without using the SDK?

    As shown in the screenshot, you can either copy the output or download the JSON result from the Studio and extract the required data from it.
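
For the JSON route, a minimal Python sketch is shown here. It assumes the downloaded file follows the REST response shape, with the document text under "analyzeResult" -> "content", and that the analysis was run with Markdown output so that this field actually contains Markdown. The function names and file names are hypothetical, and the field layout should be checked against your own download.

```python
import json
from pathlib import Path


def extract_markdown(result_path: str) -> str:
    """Return the document text stored in a downloaded analyze-result JSON file."""
    data = json.loads(Path(result_path).read_text(encoding="utf-8"))
    # Assumption: the text sits under "analyzeResult" -> "content", as in the
    # REST response. Fall back to a top-level "content" if the wrapper is absent.
    result = data.get("analyzeResult", data)
    content = result.get("content")
    if content is None:
        raise KeyError("no 'content' field found in the analyze result")
    return content


def save_markdown(result_path: str, output_path: str) -> None:
    """Write the extracted content to a Markdown file."""
    Path(output_path).write_text(extract_markdown(result_path), encoding="utf-8")
```

For example, `save_markdown("result.json", "report.md")` would write the extracted text next to the downloaded result.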

    The LangChain API and the example provided seem to work only with individual files. Is there an API available for processing folders containing several files?

    We don't have much information about the LangChain API. For better assistance, you could raise this question here: https://github.com/Azure-Samples/function-python-ai-langchain/issues
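
Until a folder-level API is confirmed, one possible workaround is to loop over the folder and call the single-file loader once per file. The sketch below is an illustration, not an official batch API: `load_folder` and `load_with_document_intelligence` are hypothetical helper names, the loader arguments mirror the ones used in the linked notebook, and `endpoint` and `key` stand for your own resource values.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def load_with_document_intelligence(path: Path, endpoint: str, key: str) -> List:
    """Analyze one file with the LangChain loader used in the linked notebook."""
    # Imported lazily so the folder helper below works without LangChain installed.
    from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

    loader = AzureAIDocumentIntelligenceLoader(
        api_endpoint=endpoint,
        api_key=key,
        file_path=str(path),
        api_model="prebuilt-layout",
        mode="markdown",
    )
    return loader.load()


def load_folder(
    folder: str,
    load_one: Callable[[Path], List],
    suffixes: Iterable[str] = (".pdf",),
) -> Dict[str, List]:
    """Call load_one on every matching file in folder, keyed by file name."""
    wanted = {s.lower() for s in suffixes}
    results: Dict[str, List] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and files with other extensions.
        if path.is_file() and path.suffix.lower() in wanted:
            results[path.name] = load_one(path)
    return results
```

A call could look like `docs = load_folder("reports", lambda p: load_with_document_intelligence(p, endpoint, key))`. Passing the loader in as a function keeps the folder walk independent of the service, so it can be tried with a stub before spending API calls.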

    I hope this helps.

    Regards,

    Vasavi


