DocumentIntelligenceClient Class
Client for analyzing and classifying documents with the Document Intelligence service.
Inheritance
azure.ai.documentintelligence._client.DocumentIntelligenceClient
DocumentIntelligenceClient
Constructor
DocumentIntelligenceClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)
Parameters
Name | Description |
---|---|
endpoint | The Document Intelligence service endpoint. Required. |
credential | Credential needed for the client to connect to Azure. Is either an AzureKeyCredential type or a TokenCredential type. Required. |
Keyword-Only Parameters
Name | Description |
---|---|
api_version | The API version to use for this operation. Default value is "2024-02-29-preview". Note that overriding this default value may result in unsupported behavior. |
polling_interval | Default waiting time between two polls for LRO operations if no Retry-After header is present. |
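A minimal sketch of constructing the client from the parameters above. The environment variable names and the `load_client_settings` helper are hypothetical and not part of the SDK. The SDK imports are deferred into `make_client` so the settings helper can be exercised without the package installed.

```python
import os
from typing import Any, Dict, Mapping


def load_client_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the endpoint and key from environment variables (hypothetical names)."""
    endpoint = env.get("DOCUMENTINTELLIGENCE_ENDPOINT")
    key = env.get("DOCUMENTINTELLIGENCE_API_KEY")
    if not endpoint or not key:
        raise ValueError("both the endpoint and the key must be set")
    return {"endpoint": endpoint, "key": key}


def make_client(endpoint: str, key: str, polling_interval: int = 5) -> Any:
    """Create a client that authenticates with an API key."""
    # Deferred imports: they require azure-ai-documentintelligence and azure-core.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentIntelligenceClient(
        endpoint=endpoint,
        credential=AzureKeyCredential(key),
        polling_interval=polling_interval,  # seconds between polls; 5 is an arbitrary choice
    )
```

With both variables set, `make_client(**load_client_settings())` would return a ready client. A TokenCredential can be passed as `credential` in place of the key credential.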
Methods
begin_analyze_document | Analyzes a document with a document model. |
begin_classify_document | Classifies a document with a document classifier. |
close | |
send_request | Runs the network request through the client's chained policies. For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request |
begin_analyze_document
Analyzes a document with a document model.
begin_analyze_document(model_id: str, analyze_request: AnalyzeDocumentRequest | MutableMapping[str, Any] | IO[bytes] | None = None, *, pages: str | None = None, locale: str | None = None, string_index_type: str | StringIndexType | None = None, features: List[str | DocumentAnalysisFeature] | None = None, query_fields: List[str] | None = None, output_content_format: str | ContentFormat | None = None, **kwargs: Any) -> LROPoller[AnalyzeResult]
Parameters
Name | Description |
---|---|
model_id | Unique document model name. Required. |
analyze_request | Analyze request parameters. Is one of the following types: AnalyzeDocumentRequest, JSON, IO[bytes]. Default value is None. |
Keyword-Only Parameters
Name | Description |
---|---|
pages | List of 1-based page numbers to analyze. Ex. "1-3,5,7-9". Default value is None. |
locale | Locale hint for text recognition and document analysis. Value may contain only the language code (ex. "en", "fr") or BCP 47 language tag (ex. "en-US"). Default value is None. |
string_index_type | Method used to compute string offset and length. Known values are: "textElements", "unicodeCodePoint", and "utf16CodeUnit". Default value is None. |
features | List of optional analysis features. Default value is None. |
query_fields | List of additional fields to extract. Ex. "NumberOfGuests,StoreNumber". Default value is None. |
output_content_format | Format of the analyze result top-level content. Known values are: "text" and "markdown". Default value is None. |
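The `pages` selector is a plain string, so it can be checked locally before a request is sent. The sketch below expands a selector such as "1-3,5,7-9" into explicit page numbers. `expand_pages` is a hypothetical helper, not an SDK function, and the grammar it accepts is an assumption based only on the example in the table; the service may accept other forms.

```python
from typing import List


def expand_pages(pages: str) -> List[int]:
    """Expand a selector such as "1-3,5,7-9" into sorted 1-based page numbers."""
    numbers = set()
    for part in pages.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in {pages!r}")
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # a single number is a one-page range
        if start < 1 or end < start:
            raise ValueError(f"invalid page range {part!r}")
        numbers.update(range(start, end + 1))
    return sorted(numbers)


print(expand_pages("1-3,5,7-9"))  # [1, 2, 3, 5, 7, 8, 9]
```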
Returns
Type | Description |
---|---|
LROPoller[AnalyzeResult] | An instance of LROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. |
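A minimal sketch of a call that uses the JSON form of `analyze_request` and waits on the returned poller. `build_analyze_request` and `analyze` are hypothetical helpers. The helper accepts exactly one source, which is stricter than the service requires, and it assumes the JSON body carries the document as base64 text. `client` is any object with the `begin_analyze_document` method described above.

```python
import base64
from typing import Any, Dict, Optional


def build_analyze_request(url_source: Optional[str] = None,
                          document: Optional[bytes] = None) -> Dict[str, str]:
    """Build the JSON body from a URL or from raw document bytes."""
    if (url_source is None) == (document is None):
        raise ValueError("specify exactly one of url_source or document")
    if url_source is not None:
        return {"urlSource": url_source}
    # Assumption: base64Source is sent as base64-encoded ASCII text.
    return {"base64Source": base64.b64encode(document).decode("ascii")}


def analyze(client: Any, model_id: str, body: Dict[str, str], **options: Any) -> Any:
    """Start the long-running operation and block until the result is ready."""
    poller = client.begin_analyze_document(model_id, body, **options)
    return poller.result()
```

A call might look like `analyze(client, "<model-id>", build_analyze_request(url_source="https://example.invalid/invoice.pdf"), pages="1-3")`, where the model ID and URL are placeholders. Because the result is compatible with MutableMapping, fields can be read with keys such as `result["content"]`.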
Exceptions
Type | Description |
---|---|
Examples
# JSON input template you can fill out and use as your body input.
analyze_request = {
"base64Source": bytes("bytes", encoding="utf-8"), # Optional. Base64
encoding of the document to analyze. Either urlSource or base64Source must be
specified.
"urlSource": "str" # Optional. Document URL to analyze. Either urlSource or
base64Source must be specified.
}
# response body for status code(s): 202
response == {
"apiVersion": "str", # API version used to produce this result. Required.
"content": "str", # Concatenated string representation of all textual and
visual elements in reading order. Required.
"modelId": "str", # Document model ID used to produce this result. Required.
"pages": [
{
"pageNumber": 0, # 1-based page number in the input
document. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"angle": 0.0, # Optional. The general orientation of the
content in clockwise direction, measured in degrees between (-180, 180].
"barcodes": [
{
"confidence": 0.0, # Confidence of correctly
extracting the barcode. Required.
"kind": "str", # Barcode kind. Required.
Known values are: "QRCode", "PDF417", "UPCA", "UPCE", "Code39",
"Code128", "EAN8", "EAN13", "DataBar", "Code93", "Codabar",
"DataBarExpanded", "ITF", "MicroQRCode", "Aztec", "DataMatrix",
and "MaxiCode".
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"value": "str", # Barcode value. Required.
"polygon": [
0.0 # Optional. Bounding polygon of
the barcode, with coordinates specified relative to the
top-left of the page. The numbers represent the x, y values
of the polygon vertices, clockwise from the left (-180
degrees inclusive) relative to the element orientation.
]
}
],
"formulas": [
{
"confidence": 0.0, # Confidence of correctly
extracting the formula. Required.
"kind": "str", # Formula kind. Required.
Known values are: "inline" and "display".
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"value": "str", # LaTeX expression
describing the formula. Required.
"polygon": [
0.0 # Optional. Bounding polygon of
the formula, with coordinates specified relative to the
top-left of the page. The numbers represent the x, y values
of the polygon vertices, clockwise from the left (-180
degrees inclusive) relative to the element orientation.
]
}
],
"height": 0.0, # Optional. The height of the image/PDF in
pixels/inches, respectively.
"lines": [
{
"content": "str", # Concatenated content of
the contained elements in reading order. Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"polygon": [
0.0 # Optional. Bounding polygon of
the line, with coordinates specified relative to the top-left
of the page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation.
]
}
],
"selectionMarks": [
{
"confidence": 0.0, # Confidence of correctly
extracting the selection mark. Required.
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"state": "str", # State of the selection
mark. Required. Known values are: "selected" and "unselected".
"polygon": [
0.0 # Optional. Bounding polygon of
the selection mark, with coordinates specified relative to
the top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the left (-180
degrees inclusive) relative to the element orientation.
]
}
],
"unit": "str", # Optional. The unit used by the width,
height, and polygon properties. For images, the unit is "pixel". For PDF,
the unit is "inch". Known values are: "pixel" and "inch".
"width": 0.0, # Optional. The width of the image/PDF in
pixels/inches, respectively.
"words": [
{
"confidence": 0.0, # Confidence of correctly
extracting the word. Required.
"content": "str", # Text content of the
word. Required.
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"polygon": [
0.0 # Optional. Bounding polygon of
the word, with coordinates specified relative to the top-left
of the page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation.
]
}
]
}
],
"stringIndexType": "str", # Method used to compute string offset and length.
Required. Known values are: "textElements", "unicodeCodePoint", and
"utf16CodeUnit".
"contentFormat": "str", # Optional. Format of the analyze result top-level
content. Known values are: "text" and "markdown".
"documents": [
{
"confidence": 0.0, # Confidence of correctly extracting the
document. Required.
"docType": "str", # Document type. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"fields": {
"str": {
"type": "str", # Data type of the field
value. Required. Known values are: "string", "date", "time",
"phoneNumber", "number", "integer", "selectionMark",
"countryRegion", "signature", "array", "object", "currency",
"address", "boolean", and "selectionGroup".
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"confidence": 0.0, # Optional. Confidence of
correctly extracting the field.
"content": "str", # Optional. Field content.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"valueAddress": {
"city": "str", # Optional. Name of
city, town, village, etc.
"cityDistrict": "str", # Optional.
Districts or boroughs within a city, such as Brooklyn in New
York City or City of Westminster in London.
"countryRegion": "str", # Optional.
Country/region.
"house": "str", # Optional. Building
name, such as World Trade Center.
"houseNumber": "str", # Optional.
House or building number.
"level": "str", # Optional. Floor
number, such as 3F.
"poBox": "str", # Optional. Post
office box number.
"postalCode": "str", # Optional.
Postal code used for mail sorting.
"road": "str", # Optional. Street
name.
"state": "str", # Optional.
First-level administrative division.
"stateDistrict": "str", # Optional.
Second-level administrative division used in certain locales.
"streetAddress": "str", # Optional.
Street-level address, excluding city, state, countryRegion,
and postalCode.
"suburb": "str", # Optional.
Unofficial neighborhood name, like Chinatown.
"unit": "str" # Optional. Apartment
or office number.
},
"valueArray": [
...
],
"valueBoolean": bool, # Optional. Boolean
value.
"valueCountryRegion": "str", # Optional.
3-letter country code value (ISO 3166-1 alpha-3).
"valueCurrency": {
"amount": 0.0, # Currency amount.
Required.
"currencyCode": "str", # Optional.
Resolved currency code (ISO 4217), if any.
"currencySymbol": "str" # Optional.
Currency symbol label, if any.
},
"valueDate": "2020-02-20", # Optional. Date
value in YYYY-MM-DD format (ISO 8601).
"valueInteger": 0, # Optional. Integer
value.
"valueNumber": 0.0, # Optional. Floating
point value.
"valueObject": {
"str": ...
},
"valuePhoneNumber": "str", # Optional. Phone
number value in E.164 format (ex. +19876543210).
"valueSelectionGroup": [
"str" # Optional. Selection group
value.
],
"valueSelectionMark": "str", # Optional.
Selection mark value. Known values are: "selected" and
"unselected".
"valueSignature": "str", # Optional.
Presence of signature. Known values are: "signed" and "unsigned".
"valueString": "str", # Optional. String
value.
"valueTime": "12:30:00" # Optional. Time
value in hh:mm:ss format (ISO 8601).
}
}
}
],
"figures": [
{
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"caption": {
"content": "str", # Content of the caption.
Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of the
caption.
]
},
"elements": [
"str" # Optional. Child elements of the figure,
excluding any caption or footnotes.
],
"footnotes": [
{
"content": "str", # Content of the footnote.
Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of
the footnote.
]
}
]
}
],
"keyValuePairs": [
{
"confidence": 0.0, # Confidence of correctly extracting the
key-value pair. Required.
"key": {
"content": "str", # Concatenated content of the
key-value element in reading order. Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
]
},
"value": {
"content": "str", # Concatenated content of the
key-value element in reading order. Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
]
}
}
],
"languages": [
{
"confidence": 0.0, # Confidence of correctly identifying the
language. Required.
"locale": "str", # Detected language. Value may be an ISO
639-1 language code (ex. "en", "fr") or BCP 47 language tag (ex.
"zh-Hans"). Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
]
}
],
"lists": [
{
"items": [
{
"content": "str", # Content of the list
item. Required.
"level": 0, # Level of the list item
(1-indexed). Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of
the list item.
]
}
],
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
]
}
],
"paragraphs": [
{
"content": "str", # Concatenated content of the paragraph in
reading order. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"role": "str" # Optional. Semantic role of the paragraph.
Known values are: "pageHeader", "pageFooter", "pageNumber", "title",
"sectionHeading", "footnote", and "formulaBlock".
}
],
"sections": [
{
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"elements": [
"str" # Optional. Child elements of the section.
]
}
],
"styles": [
{
"confidence": 0.0, # Confidence of correctly identifying the
style. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"backgroundColor": "str", # Optional. Background color in
#rrggbb hexadecimal format.
"color": "str", # Optional. Foreground color in #rrggbb
hexadecimal format.
"fontStyle": "str", # Optional. Font style. Known values
are: "normal" and "italic".
"fontWeight": "str", # Optional. Font weight. Known values
are: "normal" and "bold".
"isHandwritten": bool, # Optional. Is content handwritten?
"similarFontFamily": "str" # Optional. Visually most similar
font from among the set of supported font families, with fallback fonts
following CSS convention (ex. 'Arial, sans-serif').
}
],
"tables": [
{
"cells": [
{
"columnIndex": 0, # Column index of the
cell. Required.
"content": "str", # Concatenated content of
the table cell in reading order. Required.
"rowIndex": 0, # Row index of the cell.
Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"columnSpan": 0, # Optional. Number of
columns spanned by this cell.
"elements": [
"str" # Optional. Child elements of
the table cell.
],
"kind": "str", # Optional. Table cell kind.
Known values are: "content", "rowHeader", "columnHeader",
"stubHead", and "description".
"rowSpan": 0 # Optional. Number of rows
spanned by this cell.
}
],
"columnCount": 0, # Number of columns in the table.
Required.
"rowCount": 0, # Number of rows in the table. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"caption": {
"content": "str", # Content of the caption.
Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of the
caption.
]
},
"footnotes": [
{
"content": "str", # Content of the footnote.
Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of
the footnote.
]
}
]
}
]
}
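Spans and polygons recur throughout the response template above. The sketch below shows one way to read them from a result treated as a mapping: a span is resolved against the top-level `content`, and a flat polygon list is paired into (x, y) vertices. Both helpers and the sample data are hypothetical. Python indexes strings by Unicode code point, so the slice is only guaranteed to line up when the request used string_index_type "unicodeCodePoint".

```python
from typing import List, Mapping, Sequence, Tuple


def span_text(content: str, span: Mapping[str, int]) -> str:
    """Return the slice of the top-level content that a span covers."""
    start = span["offset"]
    return content[start:start + span["length"]]


def polygon_points(polygon: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair a flat [x1, y1, x2, y2, ...] list into (x, y) vertices."""
    if len(polygon) % 2:
        raise ValueError("polygon must hold an even number of coordinates")
    return list(zip(polygon[0::2], polygon[1::2]))


# Hand-written sample shaped like the template above, not real service output.
result = {
    "content": "Total 42",
    "pages": [{"words": [
        {"content": "42", "span": {"offset": 6, "length": 2},
         "polygon": [0.0, 0.0, 1.0, 0.0, 1.0, 0.5, 0.0, 0.5]},
    ]}],
}
word = result["pages"][0]["words"][0]
print(span_text(result["content"], word["span"]))  # 42
print(polygon_points(word["polygon"])[:2])         # [(0.0, 0.0), (1.0, 0.0)]
```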
begin_classify_document
Classifies a document with a document classifier.
begin_classify_document(classifier_id: str, classify_request: ClassifyDocumentRequest | MutableMapping[str, Any] | IO[bytes], *, string_index_type: str | StringIndexType | None = None, split: str | SplitMode | None = None, **kwargs: Any) -> LROPoller[AnalyzeResult]
Parameters
Name | Description |
---|---|
classifier_id | Unique document classifier name. Required. |
classify_request | Classify request parameters. Is one of the following types: ClassifyDocumentRequest, JSON, IO[bytes]. Required. |
Keyword-Only Parameters
Name | Description |
---|---|
string_index_type | Method used to compute string offset and length. Known values are: "textElements", "unicodeCodePoint", and "utf16CodeUnit". Default value is None. |
split | Document splitting mode. Known values are: "auto", "none", and "perPage". Default value is None. |
Returns
Type | Description |
---|---|
LROPoller[AnalyzeResult] | An instance of LROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. |
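A minimal sketch of classifying a document by URL and summarizing the outcome. `classify` is a hypothetical helper, and `client` is any object with the `begin_classify_document` method described above. It assumes the classified documents appear under the `documents` key of the result, each with `docType` and `confidence`.

```python
from typing import Any, List, Tuple


def classify(client: Any, classifier_id: str, url_source: str,
             split: str = "auto") -> List[Tuple[str, float]]:
    """Classify a document by URL and return (docType, confidence) pairs."""
    poller = client.begin_classify_document(
        classifier_id, {"urlSource": url_source}, split=split
    )
    result = poller.result()  # blocks until the long-running operation finishes
    # "documents" may be absent, so fall back to an empty list.
    return [(doc["docType"], doc["confidence"]) for doc in result.get("documents", [])]
```

Setting `split="perPage"` or `split="none"` selects one of the other splitting modes listed in the table.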
Exceptions
Type | Description |
---|---|
Examples
# JSON input template you can fill out and use as your body input.
classify_request = {
"base64Source": bytes("bytes", encoding="utf-8"), # Optional. Base64
encoding of the document to classify. Either urlSource or base64Source must be
specified.
"urlSource": "str" # Optional. Document URL to classify. Either urlSource
or base64Source must be specified.
}
# response body for status code(s): 202
response == {
"apiVersion": "str", # API version used to produce this result. Required.
"content": "str", # Concatenated string representation of all textual and
visual elements in reading order. Required.
"modelId": "str", # Document model ID used to produce this result. Required.
"pages": [
{
"pageNumber": 0, # 1-based page number in the input
document. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"angle": 0.0, # Optional. The general orientation of the
content in clockwise direction, measured in degrees between (-180, 180].
"barcodes": [
{
"confidence": 0.0, # Confidence of correctly
extracting the barcode. Required.
"kind": "str", # Barcode kind. Required.
Known values are: "QRCode", "PDF417", "UPCA", "UPCE", "Code39",
"Code128", "EAN8", "EAN13", "DataBar", "Code93", "Codabar",
"DataBarExpanded", "ITF", "MicroQRCode", "Aztec", "DataMatrix",
and "MaxiCode".
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"value": "str", # Barcode value. Required.
"polygon": [
0.0 # Optional. Bounding polygon of
the barcode, with coordinates specified relative to the
top-left of the page. The numbers represent the x, y values
of the polygon vertices, clockwise from the left (-180
degrees inclusive) relative to the element orientation.
]
}
],
"formulas": [
{
"confidence": 0.0, # Confidence of correctly
extracting the formula. Required.
"kind": "str", # Formula kind. Required.
Known values are: "inline" and "display".
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"value": "str", # LaTeX expression
describing the formula. Required.
"polygon": [
0.0 # Optional. Bounding polygon of
the formula, with coordinates specified relative to the
top-left of the page. The numbers represent the x, y values
of the polygon vertices, clockwise from the left (-180
degrees inclusive) relative to the element orientation.
]
}
],
"height": 0.0, # Optional. The height of the image/PDF in
pixels/inches, respectively.
"lines": [
{
"content": "str", # Concatenated content of
the contained elements in reading order. Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"polygon": [
0.0 # Optional. Bounding polygon of
the line, with coordinates specified relative to the top-left
of the page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation.
]
}
],
"selectionMarks": [
{
"confidence": 0.0, # Confidence of correctly
extracting the selection mark. Required.
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"state": "str", # State of the selection
mark. Required. Known values are: "selected" and "unselected".
"polygon": [
0.0 # Optional. Bounding polygon of
the selection mark, with coordinates specified relative to
the top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the left (-180
degrees inclusive) relative to the element orientation.
]
}
],
"unit": "str", # Optional. The unit used by the width,
height, and polygon properties. For images, the unit is "pixel". For PDF,
the unit is "inch". Known values are: "pixel" and "inch".
"width": 0.0, # Optional. The width of the image/PDF in
pixels/inches, respectively.
"words": [
{
"confidence": 0.0, # Confidence of correctly
extracting the word. Required.
"content": "str", # Text content of the
word. Required.
"span": {
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
},
"polygon": [
0.0 # Optional. Bounding polygon of
the word, with coordinates specified relative to the top-left
of the page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation.
]
}
]
}
],
"stringIndexType": "str", # Method used to compute string offset and length.
Required. Known values are: "textElements", "unicodeCodePoint", and
"utf16CodeUnit".
"contentFormat": "str", # Optional. Format of the analyze result top-level
content. Known values are: "text" and "markdown".
"documents": [
{
"confidence": 0.0, # Confidence of correctly extracting the
document. Required.
"docType": "str", # Document type. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"fields": {
"str": {
"type": "str", # Data type of the field
value. Required. Known values are: "string", "date", "time",
"phoneNumber", "number", "integer", "selectionMark",
"countryRegion", "signature", "array", "object", "currency",
"address", "boolean", and "selectionGroup".
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"confidence": 0.0, # Optional. Confidence of
correctly extracting the field.
"content": "str", # Optional. Field content.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"valueAddress": {
"city": "str", # Optional. Name of
city, town, village, etc.
"cityDistrict": "str", # Optional.
Districts or boroughs within a city, such as Brooklyn in New
York City or City of Westminster in London.
"countryRegion": "str", # Optional.
Country/region.
"house": "str", # Optional. Building
name, such as World Trade Center.
"houseNumber": "str", # Optional.
House or building number.
"level": "str", # Optional. Floor
number, such as 3F.
"poBox": "str", # Optional. Post
office box number.
"postalCode": "str", # Optional.
Postal code used for mail sorting.
"road": "str", # Optional. Street
name.
"state": "str", # Optional.
First-level administrative division.
"stateDistrict": "str", # Optional.
Second-level administrative division used in certain locales.
"streetAddress": "str", # Optional.
Street-level address, excluding city, state, countryRegion,
and postalCode.
"suburb": "str", # Optional.
Unofficial neighborhood name, like Chinatown.
"unit": "str" # Optional. Apartment
or office number.
},
"valueArray": [
...
],
"valueBoolean": bool, # Optional. Boolean
value.
"valueCountryRegion": "str", # Optional.
3-letter country code value (ISO 3166-1 alpha-3).
"valueCurrency": {
"amount": 0.0, # Currency amount.
Required.
"currencyCode": "str", # Optional.
Resolved currency code (ISO 4217), if any.
"currencySymbol": "str" # Optional.
Currency symbol label, if any.
},
"valueDate": "2020-02-20", # Optional. Date
value in YYYY-MM-DD format (ISO 8601).
"valueInteger": 0, # Optional. Integer
value.
"valueNumber": 0.0, # Optional. Floating
point value.
"valueObject": {
"str": ...
},
"valuePhoneNumber": "str", # Optional. Phone
number value in E.164 format (ex. +19876543210).
"valueSelectionGroup": [
"str" # Optional. Selection group
value.
],
"valueSelectionMark": "str", # Optional.
Selection mark value. Known values are: "selected" and
"unselected".
"valueSignature": "str", # Optional.
Presence of signature. Known values are: "signed" and "unsigned".
"valueString": "str", # Optional. String
value.
"valueTime": "12:30:00" # Optional. Time
value in hh:mm:ss format (ISO 8601).
}
}
}
],
"figures": [
{
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"caption": {
"content": "str", # Content of the caption.
Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of the
caption.
]
},
"elements": [
"str" # Optional. Child elements of the figure,
excluding any caption or footnotes.
],
"footnotes": [
{
"content": "str", # Content of the footnote.
Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of
the footnote.
]
}
]
}
],
"keyValuePairs": [
{
"confidence": 0.0, # Confidence of correctly extracting the
key-value pair. Required.
"key": {
"content": "str", # Concatenated content of the
key-value element in reading order. Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
]
},
"value": {
"content": "str", # Concatenated content of the
key-value element in reading order. Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
]
}
}
],
"languages": [
{
"confidence": 0.0, # Confidence of correctly identifying the
language. Required.
"locale": "str", # Detected language. Value may be an ISO
639-1 language code (ex. "en", "fr") or BCP 47 language tag (ex.
"zh-Hans"). Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
]
}
],
"lists": [
{
"items": [
{
"content": "str", # Content of the list
item. Required.
"level": 0, # Level of the list item
(1-indexed). Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of
the list item.
]
}
],
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
]
}
],
"paragraphs": [
{
"content": "str", # Concatenated content of the paragraph in
reading order. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"role": "str" # Optional. Semantic role of the paragraph.
Known values are: "pageHeader", "pageFooter", "pageNumber", "title",
"sectionHeading", "footnote", and "formulaBlock".
}
],
"sections": [
{
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"elements": [
"str" # Optional. Child elements of the section.
]
}
],
"styles": [
{
"confidence": 0.0, # Confidence of correctly identifying the
style. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"backgroundColor": "str", # Optional. Background color in
#rrggbb hexadecimal format.
"color": "str", # Optional. Foreground color in #rrggbb
hexadecimal format.
"fontStyle": "str", # Optional. Font style. Known values
are: "normal" and "italic".
"fontWeight": "str", # Optional. Font weight. Known values
are: "normal" and "bold".
"isHandwritten": bool, # Optional. Is content handwritten?
"similarFontFamily": "str" # Optional. Visually most similar
font from among the set of supported font families, with fallback fonts
following CSS convention (ex. 'Arial, sans-serif').
}
],
"tables": [
{
"cells": [
{
"columnIndex": 0, # Column index of the
cell. Required.
"content": "str", # Concatenated content of
the table cell in reading order. Required.
"rowIndex": 0, # Row index of the cell.
Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"columnSpan": 0, # Optional. Number of
columns spanned by this cell.
"elements": [
"str" # Optional. Child elements of
the table cell.
],
"kind": "str", # Optional. Table cell kind.
Known values are: "content", "rowHeader", "columnHeader",
"stubHead", and "description".
"rowSpan": 0 # Optional. Number of rows
spanned by this cell.
}
],
"columnCount": 0, # Number of columns in the table.
Required.
"rowCount": 0, # Number of rows in the table. Required.
"spans": [
{
"length": 0, # Number of characters in the
content represented by the span. Required.
"offset": 0 # Zero-based index of the
content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page number of
page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on the page,
or the entire page if not specified. Coordinates specified
relative to the top-left of the page. The numbers represent
the x, y values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"caption": {
"content": "str", # Content of the caption.
Required.
"spans": [
{
"length": 0, # Number of characters
in the content represented by the span. Required.
"offset": 0 # Zero-based index of
the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based page
number of page containing the bounding region. Required.
"polygon": [
0.0 # Bounding polygon on
the page, or the entire page if not specified.
Coordinates specified relative to the top-left of the
page. The numbers represent the x, y values of the
polygon vertices, clockwise from the left (-180 degrees
inclusive) relative to the element orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of the
caption.
]
},
"footnotes": [
{
"content": "str", # Content of the footnote.
Required.
"spans": [
{
"length": 0, # Number of
characters in the content represented by the span.
Required.
"offset": 0 # Zero-based
index of the content represented by the span. Required.
}
],
"boundingRegions": [
{
"pageNumber": 0, # 1-based
page number of page containing the bounding region.
Required.
"polygon": [
0.0 # Bounding
polygon on the page, or the entire page if not
specified. Coordinates specified relative to the
top-left of the page. The numbers represent the x, y
values of the polygon vertices, clockwise from the
left (-180 degrees inclusive) relative to the element
orientation. Required.
]
}
],
"elements": [
"str" # Optional. Child elements of
the footnote.
]
}
]
}
]
}
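The template above describes each element by `spans` (an `offset` and `length` into the result's text) and each table by a flat list of `cells`. The following is a minimal sketch of how such a structure might be consumed, assuming the result is available as a plain dict shaped like the template. The helper names `span_text` and `table_to_grid`, the `sample` data, and the top-level `content` string (not shown in this part of the template) are illustrative assumptions, not part of the SDK.

```python
from typing import Any, Dict, List


def span_text(content: str, span: Dict[str, int]) -> str:
    """Return the part of `content` covered by one span (offset + length)."""
    start = span["offset"]
    return content[start:start + span["length"]]


def table_to_grid(table: Dict[str, Any]) -> List[List[str]]:
    """Lay out a table's flat cell list as a rowCount x columnCount grid."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # Assumptions: rowIndex/columnIndex are zero-based, and a missing
        # rowSpan/columnSpan (both optional in the template) means 1.
        row_end = cell["rowIndex"] + cell.get("rowSpan", 1)
        col_end = cell["columnIndex"] + cell.get("columnSpan", 1)
        for r in range(cell["rowIndex"], row_end):
            for c in range(cell["columnIndex"], col_end):
                grid[r][c] = cell["content"]
    return grid


# Made-up sample data shaped like the template; not real service output.
sample = {
    "content": "Item Qty\nPen 2",
    "paragraphs": [
        {"content": "Item Qty", "spans": [{"offset": 0, "length": 8}]},
        {"content": "Pen 2", "spans": [{"offset": 9, "length": 5}]},
    ],
    "tables": [
        {
            "rowCount": 2,
            "columnCount": 2,
            "cells": [
                {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
                {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
                {"rowIndex": 1, "columnIndex": 0, "content": "Pen"},
                {"rowIndex": 1, "columnIndex": 1, "content": "2"},
            ],
        }
    ],
}

first_paragraph = span_text(sample["content"], sample["paragraphs"][0]["spans"][0])
grid = table_to_grid(sample["tables"][0])
```

Slicing with Python string indices like this only lines up with the service's offsets when they are counted in Unicode code points; other counting methods need a conversion step first.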
close
close() -> None
Keyword-Only Parameters
Name | Description |
---|---|
pages
|
List of 1-based page numbers to analyze. Ex. "1-3,5,7-9". Default value is None. |
locale
|
Locale hint for text recognition and document analysis. Value may contain only the language code (ex. "en", "fr") or BCP 47 language tag (ex. "en-US"). Default value is None. |
string_index_type
|
Method used to compute string offset and length. Known values are: "textElements", "unicodeCodePoint", and "utf16CodeUnit". Default value is None. |
features
|
List of optional analysis features. Default value is None. |
query_fields
|
List of additional fields to extract. Ex. "NumberOfGuests,StoreNumber". Default value is None. |
output_content_format
|
str or
ContentFormat
Format of the analyze result top-level content. Known values are: "text" and "markdown". Default value is None. |
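The `string_index_type` values listed above change how span `offset` and `length` are counted. As a rough illustration (a sketch, not SDK code), the helpers below show the difference between the "unicodeCodePoint" and "utf16CodeUnit" conventions on a string that contains a character outside the Basic Multilingual Plane. The function names are hypothetical, and "textElements" (grapheme clusters) is not covered because the standard library has no grapheme segmentation.

```python
def slice_by_code_points(text: str, offset: int, length: int) -> str:
    """Slice using Unicode code point offsets, which is how Python indexes str."""
    return text[offset:offset + length]


def slice_by_utf16_units(text: str, offset: int, length: int) -> str:
    """Slice using UTF-16 code unit offsets by going through a UTF-16 encoding."""
    raw = text.encode("utf-16-le")  # 2 bytes per code unit, no byte order mark
    return raw[2 * offset:2 * (offset + length)].decode("utf-16-le")


# "a", one emoji (a single code point, two UTF-16 code units), then "b".
text = "a\U0001F600b"

code_point_count = len(text)
utf16_unit_count = len(text.encode("utf-16-le")) // 2

# The same character "b" sits at a different offset under each convention.
b_by_code_points = slice_by_code_points(text, 2, 1)
b_by_utf16_units = slice_by_utf16_units(text, 3, 1)
```

Because Python strings are indexed by code point, requesting "unicodeCodePoint" lets spans be applied with ordinary slicing, while offsets in UTF-16 code units would need a conversion like the second helper.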
Exceptions
Type | Description |
---|---|
send_request
Runs the network request through the client's chained policies.
>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
>>> request
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = client.send_request(request)
>>> response
<HttpResponse: 200 OK>
For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request
send_request(request: HttpRequest, *, stream: bool = False, **kwargs: Any) -> HttpResponse
Parameters
Name | Description |
---|---|
request
Required
|
The network request you want to make. Required. |
Keyword-Only Parameters
Name | Description |
---|---|
stream
|
Whether the response payload will be streamed. Defaults to False. |
Returns
Type | Description |
---|---|
HttpResponse
|
The response of your network call. Does not do error handling on your response. |
Exceptions
Type | Description |
---|---|