Hi @Gavin Gao,
Thank you for reaching out to the Azure community forum!
I understand that you are working on Azure Document Intelligence and have a few questions on the significance of each element in the response JSON of the pre-built Layout model. I will be happy to assist you regarding this.
The span field, together with its inner offset and length fields, tells you where an element's text sits inside the top-level "content" string of the response from the Azure Document Intelligence pre-built Layout model.
Each span contains two inner fields: offset is the zero-based starting position in "content", and length is the number of characters covered. Both are counted in the unit given by "stringIndexType". To find the words that belong to a line, compare each word's span with the line's spans: a word belongs to the line when its span falls inside one of them.
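As a minimal sketch of how offset and length map back to text, the snippet below slices the top-level content with each word's span. The response fragment is hand-written for illustration and the helper name span_text is hypothetical; it is not taken from a real service call.

```python
# Hypothetical, hand-written fragment shaped like a Layout response.
result = {
    "content": "Invoice total: 42 USD",
    "pages": [
        {
            "pageNumber": 1,
            "words": [
                {"content": "Invoice", "span": {"offset": 0, "length": 7}},
                {"content": "total:", "span": {"offset": 8, "length": 6}},
                {"content": "42", "span": {"offset": 15, "length": 2}},
                {"content": "USD", "span": {"offset": 18, "length": 3}},
            ],
        }
    ],
}


def span_text(content, span):
    """Return the slice of the top-level content that a span covers."""
    start = span["offset"]
    return content[start:start + span["length"]]


words = result["pages"][0]["words"]
recovered = [span_text(result["content"], w["span"]) for w in words]
print(recovered)
```

Python strings are indexed by Unicode code point, so plain slicing like this lines up with the service only when the offsets are reported as unicodeCodePoint. For the plain ASCII sample above the distinction does not matter.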
Table elements:
The rowSpan and columnSpan fields give the number of rows and columns that a table cell spans. Both are optional and default to 1 when omitted; a value greater than 1 indicates a merged cell.
The "kind" field in the table cell describes the role of the cell within the table, not the type of data it holds. Its values are content (the default), rowHeader, columnHeader, stubHead, and description. By using the kind field, you can programmatically separate header cells from data cells.
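To illustrate how the span and kind fields can be used together, here is a rough sketch that expands merged cells into a full grid and picks out the column headers. The table fragment is made up for the example, and build_grid and column_headers are hypothetical helper names.

```python
# Hypothetical table fragment; missing rowSpan/columnSpan are treated as 1
# and a missing kind is treated as "content".
table = {
    "rowCount": 2,
    "columnCount": 3,
    "cells": [
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 1,
         "columnSpan": 2, "content": "Price"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Pen"},
        {"rowIndex": 1, "columnIndex": 1, "content": "1.50"},
        {"rowIndex": 1, "columnIndex": 2, "content": "USD"},
    ],
}


def build_grid(table):
    """Copy each cell's content into every grid position it spans."""
    grid = [[None] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row_end = cell["rowIndex"] + cell.get("rowSpan", 1)
        col_end = cell["columnIndex"] + cell.get("columnSpan", 1)
        for r in range(cell["rowIndex"], row_end):
            for c in range(cell["columnIndex"], col_end):
                grid[r][c] = cell["content"]
    return grid


def column_headers(table):
    """Return the content of cells whose kind is columnHeader."""
    return [c["content"] for c in table["cells"]
            if c.get("kind", "content") == "columnHeader"]


grid = build_grid(table)
headers = column_headers(table)
print(grid)
print(headers)
```

Repeating the content of a merged cell across the positions it covers is only one possible choice; you could just as well keep the content in the top-left position and leave the rest empty.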
Paragraphs & Role field:
The role field is optional because not every paragraph has a recognized role. When it is present, it gives additional context about the content of the paragraph. Roles include title, sectionHeading, pageHeader, pageFooter, pageNumber, and footnote. For example, a paragraph with the role "sectionHeading" contains a section heading, while a paragraph with the role "pageFooter" may contain copyright information.
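Because role is optional, code that reads paragraphs should not assume the key is present. The sketch below groups paragraphs by role and files the ones without a role under a label of your choosing. The sample paragraphs, the group_by_role helper, and the "body" label are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical paragraphs; the third one has no role at all.
paragraphs = [
    {"role": "title", "content": "Quarterly Report"},
    {"role": "sectionHeading", "content": "1. Summary"},
    {"content": "Revenue grew in the third quarter."},
    {"role": "pageFooter", "content": "Contoso Ltd."},
]


def group_by_role(paragraphs, default="body"):
    """Group paragraph text by role; 'body' is our own label, not a service value."""
    groups = defaultdict(list)
    for paragraph in paragraphs:
        groups[paragraph.get("role", default)].append(paragraph["content"])
    return dict(groups)


groups = group_by_role(paragraphs)
print(groups)
```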
The following illustration shows the typical components of a sample page.
Below is a high-level explanation of the structure of the JSON response for document analysis and the significance of each element:
"apiVersion": Indicates the REST API version used for this response.
"modelId": Specifies the ID of the model used for the analysis. For the pre-built Layout model this is "prebuilt-layout".
"stringIndexType": Describes the character unit used for span offsets and lengths: text elements (textElements), Unicode code points (unicodeCodePoint), or UTF-16 code units (utf16CodeUnit).
"content": Contains the extracted content from the document, including text and line breaks.
"pages": Represents a list of pages analyzed within the document.
"spans" within "pages": These represent parts of the top-level content covered by a page, indicating where content appears on a specific page.
"pageNumber": Indicates the 1-based number of the current page within the input document.
"angle": Specifies the general orientation of the content on the page, measured clockwise in degrees in the range (-180, 180].
"width" and "height": Provide the page dimensions (width and height) in the unit given by "unit".
"unit": Indicates the unit used for width, height, and polygon coordinates: pixel for images and inch for PDF files.
"words": Contains a list of extracted words on the page, along with their positions and confidence scores.
"span" within "words": Each word carries a single span (note the singular field name) that indicates where the word begins and ends in the top-level content.
"selectionMarks": Lists selection marks (e.g., checkboxes) on the page, including their state and positions.
"span" within "selectionMarks": Each selection mark (e.g., checkbox) carries a single span that indicates where the mark is located within the top-level content.
"lines": Contains a list of lines on the page, which may include both words and selection marks.
"spans" within "lines": These indicate where the line's content begins and ends in the top-level content.
"tables": Represents a list of extracted tables, including their row and column counts.
"spans" within "tables": These represent parts of the top-level content covered by a table. Each span may correspond to a portion of the document's content contained within the table.
"cells": Contains details about cells within the tables, including their kind, position, and content.
"keyValuePairs": Lists extracted key-value pairs, including the key, value, and extraction confidence.
"spans" within "keyValuePairs": These represent spans of text within a key or value of a key-value pair, indicating where the key or value content is located.
"styles": Represents different styles of content, such as handwritten or printed, with associated spans and confidence scores.
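"documents": Contains information about extracted documents, including their type, bounding regions, and spans. This element is populated by document-specific models (for example, invoice or custom models) rather than by the Layout model on its own.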
"fields": Provides details about extracted fields within a document, including their type, value, content, and confidence.
These elements and their associated attributes help structure and provide detailed information about the content, layout, styles, and extracted data within the document being analyzed.
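To tie the page-level elements together, here is a small sketch of how the words of a page can be matched to its lines by comparing spans. The page fragment is hand-written for a content string of "Hello world\nBye", and span_contains and words_in_line are hypothetical helper names.

```python
# Hypothetical page fragment for the content "Hello world\nBye".
page = {
    "pageNumber": 1,
    "words": [
        {"content": "Hello", "span": {"offset": 0, "length": 5}, "confidence": 0.99},
        {"content": "world", "span": {"offset": 6, "length": 5}, "confidence": 0.98},
        {"content": "Bye", "span": {"offset": 12, "length": 3}, "confidence": 0.97},
    ],
    "lines": [
        {"content": "Hello world", "spans": [{"offset": 0, "length": 11}]},
        {"content": "Bye", "spans": [{"offset": 12, "length": 3}]},
    ],
}


def span_contains(outer, inner):
    """True when the inner span lies completely inside the outer span."""
    outer_end = outer["offset"] + outer["length"]
    inner_end = inner["offset"] + inner["length"]
    return outer["offset"] <= inner["offset"] and inner_end <= outer_end


def words_in_line(page, line):
    """Return the words whose span falls inside any of the line's spans."""
    return [word["content"] for word in page["words"]
            if any(span_contains(s, word["span"]) for s in line["spans"])]


line_words = {line["content"]: words_in_line(page, line) for line in page["lines"]}
print(line_words)
```

The same containment check can be reused for other elements that carry spans, for example to find which words fall inside a table.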
For more information about the fields in the output of the Azure Document Intelligence pre-built Layout model, you can refer to the official documentation provided by Microsoft:
I hope this information helps!
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.