Is it possible to extract titles, headers and table captions using Form Recognizer?

Bogdan 1 Reputation point
2021-11-25T18:29:03.523+00:00

I am using Form Recognizer to extract text and tables from multi-page PDF files. Is it possible to extract whether a line is a title, header, subheader, ..., and the caption of a table if one is present?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. romungi-MSFT 48,906 Reputation points Microsoft Employee Moderator
    2021-11-26T07:27:27.81+00:00

    @Bogdan You can extract all the text from a PDF file, but the text will not be grouped into the categories you mention. Certain prebuilt APIs can extract details from a form or an ID card as key/value pairs in the response. The best option in this scenario is to use the custom form API, which extracts text based on the training you provide for a form. During training, tag the text of each header or caption with a name so that the endpoint serving the custom model can return that text as required. I recommend starting with the Form Recognizer Studio to train a model for your custom scenario.
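
To illustrate the idea of reading tagged fields back from a custom model, here is a minimal sketch in Python. It does not call the service. It assumes the analysis response has already been parsed into a dictionary in which each analyzed document carries a `fields` mapping from a label name to an entry with `content` and `confidence`. That shape is a simplified assumption, and the label names (`title`, `subheader`, `table_caption`), the helper `extract_labeled_fields`, and the `min_confidence` threshold are hypothetical, so check them against the response of the API version you actually use.

```python
from typing import Any, Dict, List


def extract_labeled_fields(
    analyze_result: Dict[str, Any], min_confidence: float = 0.5
) -> List[Dict[str, Any]]:
    """Collect labeled fields from a parsed analysis result.

    The input shape is an assumption: a "documents" list whose items
    hold a "fields" mapping of label name -> {"content", "confidence"}.
    """
    found = []
    for doc in analyze_result.get("documents", []):
        for label, field in doc.get("fields", {}).items():
            content = field.get("content")
            confidence = field.get("confidence", 0.0)
            # Keep only fields that have text and a high enough confidence.
            if content and confidence >= min_confidence:
                found.append(
                    {"label": label, "text": content, "confidence": confidence}
                )
    return found


# Hypothetical result; the labels are whatever names were tagged in training.
sample_result = {
    "documents": [
        {
            "fields": {
                "title": {"content": "Annual Report", "confidence": 0.97},
                "subheader": {"content": "Overview", "confidence": 0.32},
                "table_caption": {
                    "content": "Table 1: Revenue by region",
                    "confidence": 0.91,
                },
            }
        }
    ]
}

labeled = extract_labeled_fields(sample_result)
for item in labeled:
    print(item["label"], "->", item["text"])
```

Filtering on confidence is a design choice for this sketch: a label that the model assigns with low confidence is dropped rather than reported as a header or caption.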
