Extract Sections of a Document using Azure Models in Python

Sanjana Mohan 0 Reputation points
2024-05-22T10:53:03.8866667+00:00

I am working on a RAG model that answers questions based on a document. The questions target particular sections, so I was wondering whether there is a way to extract the individual sections of a document using Azure AI Document Intelligence in Python.

I am currently using the prebuilt-layout model. I tried using paragraph roles to extract section headings and build the sections manually; however, the model does not identify section headings very accurately.

Is this the right model to use? Is there any way to extract sections from a document using this model or any other Azure AI Document Intelligence model?

Any help is appreciated. Thanks!

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Azar 22,860 Reputation points MVP
    2024-05-22T11:14:08.92+00:00

    Hi there, Sanjana Mohan

    That's a good question, and thanks for using the Q&A platform.

    The prebuilt-layout model may not be the most accurate choice for identifying section headings. It is good at capturing the overall structure and layout of a document, but it may not always identify section headings correctly, because formatting styles vary widely across documents.
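    For reference, the role-based approach described in the question can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: it assumes the azure-ai-formrecognizer package (3.3 or later), where each layout paragraph exposes `role` and `content`. The names `Section`, `group_into_sections`, and `analyze_layout` are hypothetical helpers made up for illustration, and treating "title" as a section boundary is an assumption you may want to change.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Iterable, List, Optional


@dataclass
class Section:
    """Hypothetical container: one heading plus the paragraphs under it."""
    heading: Optional[str]
    paragraphs: List[str] = field(default_factory=list)


# Paragraph roles reported by the layout model that this sketch acts on.
HEADING_ROLES = {"title", "sectionHeading"}  # assumption: both start a section
SKIPPED_ROLES = {"pageHeader", "pageFooter", "pageNumber"}


def group_into_sections(paragraphs: Iterable) -> List[Section]:
    """Group paragraphs (objects with .role and .content) under their headings."""
    sections: List[Section] = []
    current = Section(heading=None)  # holds text that precedes the first heading
    for paragraph in paragraphs:
        role = getattr(paragraph, "role", None)
        content = (getattr(paragraph, "content", "") or "").strip()
        if not content or role in SKIPPED_ROLES:
            continue
        if role in HEADING_ROLES:
            # Close the previous section (if it has anything) and start a new one.
            if current.heading is not None or current.paragraphs:
                sections.append(current)
            current = Section(heading=content)
        else:
            current.paragraphs.append(content)
    if current.heading is not None or current.paragraphs:
        sections.append(current)
    return sections


def analyze_layout(path: str, endpoint: str, key: str) -> list:
    """Run prebuilt-layout on a local file and return its paragraphs.

    The SDK is imported here so the grouping logic above can be used
    (and tested) without the Azure packages installed.
    """
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(path, "rb") as f:
        poller = client.begin_analyze_document("prebuilt-layout", document=f)
    return poller.result().paragraphs or []


if __name__ == "__main__":
    # Stand-in paragraphs so the grouping can be tried without calling Azure.
    sample = [
        SimpleNamespace(role="sectionHeading", content="1. Scope"),
        SimpleNamespace(role=None, content="This agreement covers..."),
    ]
    for section in group_into_sections(sample):
        print(section.heading, len(section.paragraphs))
```

    The output of `analyze_layout(...)` can be passed straight to `group_into_sections`. Any heading the model labels as plain text will be folded into the previous section, which is the accuracy problem described in the question.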

    A more effective approach may be to use the custom model capabilities of Azure AI Document Intelligence (formerly Azure Form Recognizer). By training a custom model, you can tailor it to the particular formatting and structure of your documents, which can significantly improve accuracy when identifying section headings and extracting content.

    If this helps, kindly accept the answer. Thanks very much.

    1 person found this answer helpful.