Custom Model: how to extract text with the same indentation/structure as the original document

Ionut Dutescu 40 Reputation points
2024-03-05T14:27:53.48+00:00

I have trained a Custom Model to extract all of the text from images of PDF pages. That is how we receive the PDFs (as images), and we need to extract all of the text they contain. The Document Intelligence model extracts it row by row, without respecting the original structure. Is there any way to extract all the text and output it in the same format and indentation, so that each piece of information appears in its original position, e.g. the date in the left corner, the email in the right corner, the details, and the line items centered? Basically, the output text should have the same structure as the document. Is this possible in some way, or does the model just extract plain text and put it all in one long paragraph?

Thanks!

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. santoshkc 7,865 Reputation points Microsoft Vendor
    2024-03-06T08:35:08.1266667+00:00

    Hi @Ionut Dutescu,

    Thank you for reaching out to Microsoft Q&A forum!

    To keep extracted text tied to its position in the original document when working with PDF images, you can use the "Draw region" option when labeling a Custom model. This option lets you draw a region around the text you want to extract, so that each extracted value corresponds to a specific area of the page, which helps preserve the original layout and structure.

    To use this option, train your Custom model with the Document Intelligence service. While labeling the training documents, select "Draw region" and draw a region around each block of text you want to extract (for example, the date in the left corner and the email in the right corner). Once the model is trained, you can use it to extract text from PDF images while keeping each value associated with the region it came from.

    Note that the accuracy of the extracted text depends on the quality of the PDF images and on how precisely the regions are drawn. You may need to experiment with different region sizes and positions to get the best results.
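
If a position is available for each extracted line, the page layout can also be approximated in plain text after extraction. The following is a minimal sketch of that idea, not an official Document Intelligence feature: it assumes you have already reduced each line to its text plus the top-left corner of its bounding box (the analysis result reports a `polygon` per line; how you turn that into `x`/`y` depends on the SDK version you use). The `Line` class, `render_layout` function, and the `columns` and `row_tolerance` parameters are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    text: str
    x: float  # left edge of the line's bounding box, in page units
    y: float  # top edge of the line's bounding box, in page units


def render_layout(lines: List[Line], page_width: float,
                  columns: int = 80, row_tolerance: float = 0.1) -> str:
    """Place each line on a fixed-width character grid by its position."""
    # Group lines into visual rows: a line joins the current row when its
    # top edge is within row_tolerance of the row's first line.
    rows: List[List[Line]] = []
    for line in sorted(lines, key=lambda l: (l.y, l.x)):
        if rows and abs(line.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])

    output = []
    for row in rows:
        rendered = ""
        for line in sorted(row, key=lambda l: l.x):
            # Map the horizontal position to a character column.
            col = int(line.x / page_width * columns)
            # Never overwrite earlier text; keep at least one space between.
            if rendered and col <= len(rendered):
                col = len(rendered) + 1
            rendered = rendered.ljust(col) + line.text
        output.append(rendered)
    return "\n".join(output)


# Example with made-up coordinates (inches on an 8.5-inch-wide page).
sample = [
    Line("2024-03-05", x=0.5, y=0.50),
    Line("user@example.com", x=6.0, y=0.52),
    Line("Item A   2   10.00", x=3.0, y=2.0),
]
print(render_layout(sample, page_width=8.5))
```

This sketch only reproduces horizontal placement and row grouping; vertical gaps between rows are not preserved, and the result is only aligned when viewed in a monospaced font. The tolerance and column count would need tuning for real documents.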

    I hope this helps! Thank you.
