Is it possible to extract titles, headers and table captions using Form Recognizer?

Question

Bogdan

I am using Form Recognizer to extract text and tables from multi-page PDF files. Is it possible to extract whether a line is a title, header, subheader, and so on, as well as the caption of a table when one is present?

1 answer

Answer 1

@Bogdan You can extract all the text from a PDF file, but the text will not be grouped into categories such as title, header, or caption. Some prebuilt APIs extract details from a form or an ID card as key/value pairs in the response. The best option in this scenario is the custom form API, which extracts text based on the training you provide for a form. During training, you tag the text of a header or a caption with a name, so that the endpoint for the resulting custom model can return that text under the same name. I recommend starting with the Form Recognizer Studio to train a model for your custom scenario.
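As a rough illustration of what consuming such a custom model could look like, here is a minimal Python sketch. It assumes the `azure-ai-formrecognizer` package (3.2 or later) and a model trained with labels named `title`, `subheader`, and `table_caption`; those label names, the helper function names, and the sample values are hypothetical and only serve the example. The filtering helper works on any mapping of label name to an object with `content` and `confidence` attributes, so it can be tried without calling the service.

```python
from types import SimpleNamespace


def collect_labeled_fields(fields, min_confidence=0.5):
    """Return {label: text} for fields at or above min_confidence.

    `fields` maps a label name (chosen during training) to an object with
    `content` and `confidence` attributes, which is the shape of the
    `fields` mapping on an analyzed document in the Python SDK.
    """
    labeled = {}
    for name, field in fields.items():
        # Skip labels the model did not find or is unsure about.
        if field.content is None or (field.confidence or 0.0) < min_confidence:
            continue
        labeled[name] = field.content
    return labeled


def analyze_with_custom_model(endpoint, key, model_id, pdf_path):
    """Run a trained custom model on a PDF (needs azure-ai-formrecognizer)."""
    # Imported lazily so the helper above works without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as f:
        poller = client.begin_analyze_document(model_id, document=f)
        result = poller.result()
    # One entry per document the model recognized in the file.
    return [collect_labeled_fields(doc.fields) for doc in result.documents]


# Stand-in for a service response, using hypothetical label names.
sample = {
    "title": SimpleNamespace(content="Annual Report", confidence=0.97),
    "table_caption": SimpleNamespace(content="Table 1: Revenue", confidence=0.88),
    "subheader": SimpleNamespace(content="Overview", confidence=0.21),
}
print(collect_labeled_fields(sample))
```

With a real resource you would call `analyze_with_custom_model` with your own endpoint, key, and the model ID shown after training. The confidence threshold of 0.5 is an arbitrary starting point and not a recommended value.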
