Unstructured Data while using MS Document Intelligence API

Dimple Khurana 0 Reputation points
2024-04-10T05:41:47.4766667+00:00

Dear Forum Members,

I am using the Microsoft Document Intelligence (Invoice) pretrained model, calling its API from UiPath with an HTTP request. The output is in JSON format, which I parse with Deserialize JSON, but the data I get back is unstructured. I have checked in Microsoft Azure as well: in the UI it looks fine, but in the Result tab the JSON looks unstructured. For example: Vendor Name\nABC\nInvoice Date\nInvoice No\n03-Oct-2023\n567890. In this case, when I try to extract Invoice No, I get 03-Oct-2023. Is there a way to deserialize the response so that I get the correct data?

Thanks

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. dupammi 8,615 Reputation points Microsoft External Staff
    2024-04-10T10:08:11.6166667+00:00

    Hi @Dimple Khurana

    Thank you for reaching out to the Microsoft Q&A community forum.

    Based on your query, it seems that the unstructured format of the JSON data is causing issues while extracting the required data.

    I tried to reproduce this issue on my end using Postman. I sent the POST request, then sent a GET request to the Operation-Location URL returned by the POST.

    With the "Raw" tab selected in the output, the JSON is shown unformatted.

    With the "Pretty" tab selected, the JSON is beautified and formatted.

    One possible reason for the unstructured format could be the RAW format of the JSON data. You can try using the "Pretty" format to beautify the JSON data and make it more structured.
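
Outside Postman, the same beautifying step can be sketched in Python. This is a minimal illustration rather than part of the original reply: `raw_body` is a hypothetical stand-in for the response text. Note that re-serializing with `indent` changes only the whitespace, not the data itself.

```python
import json

# Hypothetical stand-in for the raw (single-line) response body.
raw_body = '{"status":"succeeded","analyzeResult":{"content":"Vendor Name\\nABC"}}'


def prettify(raw: str) -> str:
    """Re-serialize JSON text with indentation; the parsed data is unchanged."""
    return json.dumps(json.loads(raw), indent=2)


pretty = prettify(raw_body)
print(pretty)
```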

    Regarding your question on how to deserialize: if you would like to do this programmatically, use a library such as JSON.NET for .NET or the built-in json module for Python to deserialize the JSON data into a structured object.

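    import json

    # json_data is the raw response body as a string (illustrative sample only)
    json_data = '{"status": "succeeded"}'
    # Deserialize the JSON data into a dictionary
    data = json.loads(json_data)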
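
For the invoice case in the question, the sketch below may be closer to what is needed. It assumes the response follows the documented analyze-result layout, in which `analyzeResult.content` holds the page text in reading order (which is why labels and values can appear interleaved) and `analyzeResult.documents[0].fields` holds the named fields such as `VendorName`, `InvoiceId`, and `InvoiceDate`. The sample payload is hand-written from the values in the question, and the helper `get_field_text` is hypothetical. The exact shape can vary by API version, so check it against your own response.

```python
import json

# Hand-written sample shaped like an invoice analyze result (an assumption,
# not real service output); the values are taken from the question.
json_data = """
{
  "status": "succeeded",
  "analyzeResult": {
    "content": "Vendor Name\\nABC\\nInvoice Date\\nInvoice No\\n03-Oct-2023\\n567890",
    "documents": [
      {
        "docType": "invoice",
        "fields": {
          "VendorName": {"type": "string", "valueString": "ABC", "content": "ABC"},
          "InvoiceDate": {"type": "date", "valueDate": "2023-10-03", "content": "03-Oct-2023"},
          "InvoiceId": {"type": "string", "valueString": "567890", "content": "567890"}
        }
      }
    ]
  }
}
"""


def get_field_text(result: dict, name: str):
    """Return the text of a named field from the first document, or None."""
    documents = result.get("analyzeResult", {}).get("documents", [])
    if not documents:
        return None
    field = documents[0].get("fields", {}).get(name)
    return field.get("content") if field else None


data = json.loads(json_data)
invoice_no = get_field_text(data, "InvoiceId")
invoice_date = get_field_text(data, "InvoiceDate")
print(invoice_no, invoice_date)
```

Reading the value by field name avoids depending on the line order inside `content`. The same path (analyzeResult, documents, first item, fields, InvoiceId) should apply when navigating the object returned by Deserialize JSON in UiPath.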
    

    For more information, please refer to the Microsoft Document Intelligence API documentation and the documentation for your JSON deserialization library.

    I hope the details provided help you debug and resolve your issue.

    Thank you.

