How is text classified as a heading? (markdown)

Bogdan Pechounov 40 Reputation points
2024-03-26T16:20:29.74+00:00

When analyzing a document, each entry in analyzeResult.paragraphs can have a role such as "sectionHeading". Initially, I thought that each word was passed to a model like LayoutLM to perform NER-style token tagging (e.g. B-Header, I-Header, B-Paragraph, ...).

However, it is the OCR engine that detects the paragraphs, so the beginning and end of each one are already determined. (The entire paragraph has to belong to a single class; one paragraph cannot carry multiple classes, e.g. "Example heading" -> B-Header, B-Paragraph.)

Is the entire text of the paragraph passed to a classifier, without considering its surroundings?
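
For context, here is a minimal sketch of where the role shows up in the output. The `sample_result` dictionary is hand-written to mimic the shape of a Layout response (`analyzeResult.paragraphs`, each entry with `content` and an optional `role`); it is not real service output. The helper `group_by_role` and the fallback label `"body"` are assumptions made for illustration.

```python
from collections import defaultdict

# Hand-written stand-in for a Layout response; not real service output.
sample_result = {
    "analyzeResult": {
        "paragraphs": [
            {"role": "title", "content": "Annual Report"},
            {"role": "sectionHeading", "content": "1. Overview"},
            {"content": "Revenue grew during the year."},
            {"role": "pageNumber", "content": "3"},
        ]
    }
}


def group_by_role(result):
    """Group paragraph text by role (hypothetical helper).

    Paragraphs without a "role" key are filed under "body", a label
    chosen here for illustration only.
    """
    grouped = defaultdict(list)
    for paragraph in result["analyzeResult"]["paragraphs"]:
        # The role applies to the whole paragraph, unlike token-level
        # BIO tagging, where a label is attached to each word.
        grouped[paragraph.get("role", "body")].append(paragraph["content"])
    return dict(grouped)


roles = group_by_role(sample_result)
print(roles["sectionHeading"])
```

Because the role is a property of the whole paragraph, there is no place in this structure for a per-word label.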

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. dupammi 6,390 Reputation points Microsoft Vendor
    2024-03-27T18:33:35.7166667+00:00

    Hi @Bogdan Pechounov

    Thank you for your response.

    The repository you mentioned is the Microsoft Unified Language Model (UniLM) repository, which contains various pre-trained models and architectures for natural language processing (NLP) tasks. The UniLM repository includes models such as UniLMv1, UniLMv2, and UniLMv3, which are pre-trained on large-scale datasets for tasks such as language modeling, question answering, and summarization.

    However, the UniLM repository is not directly related to Azure AI Document Intelligence. The DI Layout model is a pre-built model provided by Azure AI Document Intelligence that is specifically designed for layout analysis and extracting information such as paragraphs, titles, section headings, footnotes, page headers, page footers, and page numbers.

    Regarding your question about text selection mode, selecting a span of text is not equivalent to Named Entity Recognition (NER). NER is a technique used to identify and classify named entities in text, such as people, organizations, and locations. Selecting a span of text is a user interface feature that allows users to highlight and select a portion of text in a document. The bounding box of the selected text can be used to identify the region of the text, but it does not provide any information about the content of the text. Please refer to a similar thread on Custom NER for more details.

    Regarding your question about text embeddings in the DI Layout model, I suppose it uses a combination of layout features such as font size, position, and alignment to predict the role of each text block in the document.
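
To make the idea of layout-based role prediction concrete, here is a toy rule-based sketch. It is purely illustrative: the `TextBlock` type, its fields, the thresholds, and the `"body"` label are all assumptions, and nothing in this thread confirms how the DI Layout model actually works internally.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical layout features for one paragraph (not a service type)."""
    text: str
    height: float   # line height, used as a rough proxy for font size
    top: float      # vertical position on the page, 0.0 (top) to 1.0 (bottom)
    centered: bool  # whether the block is horizontally centered


def guess_role(block: TextBlock, body_height: float) -> str:
    """Toy rules with arbitrary thresholds; not the service's logic."""
    # Position: text at the very top or bottom of the page.
    if block.top < 0.05:
        return "pageHeader"
    if block.top > 0.95:
        return "pageFooter"
    # Size relative to ordinary body text.
    ratio = block.height / body_height
    if ratio >= 1.8 and block.centered:
        return "title"
    if ratio >= 1.2:
        return "sectionHeading"
    return "body"


heading = TextBlock("Example heading", height=14.0, top=0.3, centered=False)
print(guess_role(heading, body_height=10.0))
```

Note that the whole block receives one label based on its layout features, which matches the observation in the question that a paragraph cannot carry several classes.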

    I hope this helps. Thank you.

    1 person found this answer helpful.