Using Azure Document Intelligence, can it detect and extract a "list" and separate the items within?

Matt Charron 0 Reputation points
2023-09-27T16:38:06.32+00:00

We are looking to pass in a PDF document which contains a list - the formatting/styling of the list could be varied:

e.g.:

1.

2.

3.

or

1)

2)

3)

etc.

and would ideally like the service to:

  • Detect the list as a whole
  • Split out the content of each item within the list

I have not been able to find any documentation or examples of this, but if I've simply missed it, I apologize for the oversight.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. VasaviLankipalle-MSFT 17,021 Reputation points
    2023-09-28T01:16:55.83+00:00

    Hello @Matt Charron, thanks for using the Microsoft Q&A platform.

    As you may know, the result of a Document Intelligence analyze operation depends on the type of document being analyzed.

    Azure Document Intelligence can extract the content of each item within a list, but it does not recognize the style or format of the list itself. Depending on the structure of the document, list data is sometimes returned as table data and sometimes as paragraph text.

    To illustrate, I tried a few list formats with the prebuilt models (document, layout, and read), and the output came back as either a table or paragraphs. To my knowledge, these are the outcomes you can expect when analyzing documents that contain lists.


    You could also try training a custom model on these lists and see whether that helps.
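
Since the service returns list items as ordinary paragraph text rather than as a list, one possible workaround is to group the items yourself after analysis. The following is only a minimal sketch, not part of the service: it assumes you have already collected the paragraph strings from the analyze result (for example, the `content` of each entry in the result's `paragraphs`), and the `MARKER` pattern and `extract_lists` helper are hypothetical names introduced here for illustration. It handles only the `1.` and `1)` styles from the question plus simple bullets.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not a service feature): a paragraph is
# treated as a list item if it starts with "1." / "1)" or a bullet character,
# followed by whitespace and the item text.
MARKER = re.compile(r"^\s*(?:\d+[.)]|[•\-*])\s+(.+)$", re.DOTALL)


def extract_lists(paragraphs: List[str]) -> List[List[str]]:
    """Group consecutive list-like paragraphs into lists of item texts.

    A paragraph that does not start with a marker ends the current list.
    """
    lists: List[List[str]] = []
    current: List[str] = []
    for text in paragraphs:
        match = MARKER.match(text)
        if match:
            # Keep only the item content, without the marker.
            current.append(match.group(1).strip())
        elif current:
            lists.append(current)
            current = []
    if current:
        lists.append(current)
    return lists


if __name__ == "__main__":
    # Stand-in for paragraph text taken from an analyze result.
    sample = [
        "Shopping list",
        "1. Apples",
        "2. Pears",
        "Colours",
        "1) Red",
        "2) Green",
        "3) Blue",
    ]
    for items in extract_lists(sample):
        print(items)
```

This will not help when the service returns the list as a table, and it treats any non-matching paragraph as the end of a list, so multi-paragraph items would need extra handling.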

    I hope this helps.

    Regards,
    Vasavi

