DocumentPage Class

Content and layout elements extracted from a page of the input.

New in version 2023-07-31: The barcodes and formulas properties.

Inheritance
builtins.object
DocumentPage

Constructor

DocumentPage(**kwargs: Any)

Methods

from_dict

Converts a dict in the shape of a DocumentPage to the model itself.

to_dict

Returns a dict representation of DocumentPage.

from_dict

Converts a dict in the shape of a DocumentPage to the model itself.

from_dict(data: Dict) -> DocumentPage

Parameters

data (Dict, required): A dictionary in the shape of DocumentPage.

Returns

DocumentPage: The DocumentPage model built from data.

to_dict

Returns a dict representation of DocumentPage.

to_dict() -> Dict

Returns

dict: A dict representation of this DocumentPage.
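A common reason to call to_dict is to cache analysis results as JSON and rebuild the model later with from_dict. The sketch below assumes the returned dict holds only JSON-serializable values (strings, numbers, None, lists, and nested dicts). The helper names page_dict_to_json and page_dict_from_json are hypothetical, and the SDK calls appear only in comments so the snippet runs without the package installed.

```python
import json
from typing import Any, Dict


def page_dict_to_json(page_dict: Dict[str, Any]) -> str:
    """Serialize a dict returned by DocumentPage.to_dict() to a JSON string."""
    # sort_keys makes the output stable, which helps when diffing cached results.
    return json.dumps(page_dict, sort_keys=True)


def page_dict_from_json(text: str) -> Dict[str, Any]:
    """Parse a cached JSON string back into a dict in the shape of DocumentPage."""
    return json.loads(text)


# Hypothetical usage with the SDK (not executed here):
#   cached = page_dict_to_json(page.to_dict())
#   restored = DocumentPage.from_dict(page_dict_from_json(cached))
```

JSON has no tuple type, so any tuples in the original dict come back as lists after the round trip.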

Attributes

angle

The general orientation of the content, measured clockwise in degrees, in the range (-180, 180].

angle: float | None
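Because angle is reported in the half-open range (-180, 180], code that adjusts it (for example, to compute the rotation that straightens a page) should wrap results back into that range. This is a minimal sketch; the helper names are hypothetical, and treating a missing angle (None) as no rotation is an assumption of the sketch, not documented SDK behavior.

```python
from typing import Optional


def normalize_angle(degrees: float) -> float:
    """Wrap any angle in degrees into the half-open range (-180, 180]."""
    # Python's % returns a value in [0, 360) for a positive divisor, so
    # 180 - ((180 - x) % 360) lands in (-180, 180].
    return 180.0 - ((180.0 - degrees) % 360.0)


def deskew_rotation(angle: Optional[float]) -> float:
    """Clockwise rotation (degrees) that undoes the page's reported angle.

    Assumption: None is treated as 0, i.e. the content is already upright.
    """
    return normalize_angle(-(angle or 0.0))
```

For a page reported at 90 degrees, deskew_rotation returns -90.0, meaning the content should be rotated 90 degrees counterclockwise.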

barcodes

Extracted barcodes from the page.

barcodes: List[DocumentBarcode]
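When a page carries several symbologies, grouping the decoded values by type is a typical first step. The sketch below works on barcode dicts such as those produced by DocumentPage.to_dict(), and assumes each entry has kind and value keys mirroring DocumentBarcode's attributes; the helper name barcode_values_by_kind is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def barcode_values_by_kind(barcodes: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group decoded barcode values by their symbology ("kind"), keeping page order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for barcode in barcodes:
        grouped[barcode["kind"]].append(barcode["value"])
    # Return a plain dict so missing kinds raise KeyError instead of creating entries.
    return dict(grouped)
```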

formulas

Extracted formulas from the page.

formulas: List[DocumentFormula]
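Each extracted formula is expected to carry its LaTeX source and whether it appeared inline or as a displayed equation. A minimal sketch, assuming formula dicts (for example from DocumentPage.to_dict()) with a value key holding LaTeX and a kind key equal to "inline" or "display"; the helper to_latex_snippet is hypothetical.

```python
from typing import Any, Dict


def to_latex_snippet(formula: Dict[str, Any]) -> str:
    """Wrap a formula's LaTeX in inline ($...$) or display (\\[...\\]) delimiters."""
    latex = formula["value"]
    if formula["kind"] == "display":
        return "\\[" + latex + "\\]"
    # Anything other than "display" is treated as inline (assumption).
    return "$" + latex + "$"
```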

height

The height of the page: in pixels for images, in inches for PDFs.

height: float | None

lines

Extracted lines from the page, potentially containing both textual and visual elements.

lines: List[DocumentLine]
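Lines and words both point back into the document's concatenated content through spans, so the words belonging to a line can be recovered by checking span containment. This sketch assumes dicts shaped like the to_dict() output, where a line has a spans list and a word has a single span, each with offset and length keys; the helper names are hypothetical.

```python
from typing import Any, Dict, List


def _span_within(inner: Dict[str, int], outer: Dict[str, int]) -> bool:
    """True if the inner span lies entirely inside the outer span."""
    return (
        inner["offset"] >= outer["offset"]
        and inner["offset"] + inner["length"] <= outer["offset"] + outer["length"]
    )


def words_in_line(line: Dict[str, Any], words: List[Dict[str, Any]]) -> List[str]:
    """Return the content of every word whose span falls inside one of the line's spans."""
    return [
        word["content"]
        for word in words
        if any(_span_within(word["span"], span) for span in line["spans"])
    ]
```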

page_number

1-based page number in the input document.

page_number: int

selection_marks

Extracted selection marks from the page.

selection_marks: List[DocumentSelectionMark]
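Selection marks are typically summarized by state, often after discarding low-confidence detections. A minimal sketch, assuming mark dicts (for example from DocumentPage.to_dict()) with a state key of "selected" or "unselected" and a confidence key between 0 and 1; the helper name and the default threshold are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def selection_summary(marks: List[Dict[str, Any]], min_confidence: float = 0.0) -> Dict[str, int]:
    """Count selection marks by state, ignoring marks below min_confidence."""
    counts = Counter(
        mark["state"] for mark in marks if mark["confidence"] >= min_confidence
    )
    # Counter returns 0 for states that never occurred.
    return {"selected": counts["selected"], "unselected": counts["unselected"]}
```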

spans

Location of the page within the document's concatenated content, in reading order.

spans: List[DocumentSpan]
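Since spans locate the page inside the concatenated content of the whole document, the page's own text can be recovered by slicing that content. This sketch assumes content is the full document text (for example AnalyzeResult.content) and that span offsets and lengths count Python string characters; the helper page_text is hypothetical.

```python
from typing import Dict, List


def page_text(content: str, spans: List[Dict[str, int]]) -> str:
    """Concatenate the slices of the full document content covered by a page's spans."""
    return "".join(
        content[span["offset"] : span["offset"] + span["length"]] for span in spans
    )
```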

unit

The unit used by the width, height, and bounding polygon properties. For images, the unit is "pixel". For PDF, the unit is "inch". Possible values include: "pixel", "inch".

unit: str | None

width

The width of the page: in pixels for images, in inches for PDFs.

width: float | None
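Because width, height, and polygon coordinates share the unit given by unit, comparing pages that came from images and PDFs requires a conversion. DocumentPage does not report the pixel density of an image, so the dpi argument below is an assumption supplied by the caller; the helper page_size_inches is hypothetical.

```python
from typing import Optional, Tuple


def page_size_inches(
    width: Optional[float],
    height: Optional[float],
    unit: Optional[str],
    dpi: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (width, height) in inches for a page measured in "inch" or "pixel"."""
    if width is None or height is None:
        raise ValueError("page dimensions are missing")
    if unit == "inch":
        return width, height
    if unit == "pixel":
        # The source image's resolution is not part of DocumentPage; the caller supplies it.
        if not dpi:
            raise ValueError("dpi is required to convert pixel dimensions")
        return width / dpi, height / dpi
    raise ValueError(f"unsupported unit: {unit!r}")
```

Multiplying the result by 72 gives the size in PDF points.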

words

Extracted words from the page.

words: List[DocumentWord]
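Word-level confidence is a convenient hook for routing uncertain text to human review. A minimal sketch, assuming word dicts (for example from DocumentPage.to_dict()) with content and confidence keys; the helper name and the 0.8 default threshold are hypothetical choices, not SDK defaults.

```python
from typing import Any, Dict, List, Tuple


def words_for_review(
    words: List[Dict[str, Any]], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (content, confidence) for words below threshold, least confident first."""
    flagged = [
        (word["content"], word["confidence"])
        for word in words
        if word["confidence"] < threshold
    ]
    return sorted(flagged, key=lambda item: item[1])
```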