你当前正在访问 Microsoft Azure Global Edition 技术文档网站。如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站，请访问 https://docs.azure.cn。

文档智能附加功能

项目
11/05/2024

重要

使用文档智能选公共预览版，可以提前使用目前正处于开发状态的功能。在正式发布 (GA) 之前，根据用户反馈，功能、方法和流程可能会发生更改。
文档智能客户端库的公共预览版默认使用 REST API 版本 2024-07-31-preview。
公共预览版 2024-07-31-preview 目前仅在以下 Azure 区域中可用。请注意，AI Studio 中的自定义生成式（文档字段提取）模型仅适用于美国中北部区域：
- 美国东部
- 美国西部 2
- “西欧”
- 美国中北部

此内容适用于：v4.0（预览版） | 先前版本：v3.1 (GA)

此内容适用于：v3.1 (GA) | 最新版本：v4.0（预览版）

注意

除名片模型外，所有模型都提供加载项功能。

功能

文档智能支持更复杂的模块化分析功能。使用加载项功能扩展结果，以包含从文档中提取的更多功能。某些加载项功能会产生额外费用。根据文档提取方案，可以启用和禁用这些可选功能。若要启用某个功能，请将关联的功能名称添加到 features 查询字符串属性。可以通过提供逗号分隔的功能列表，在请求中启用多个附加功能。以下附加功能适用于 2023-07-31 (GA) 及更高版本。

ocrHighResolution
formulas
styleFont
barcodes
languages

对于 2024-07-31-preview 版本和更高版本，读取模型支持可搜索的 PDF 输出：

可搜索 PDF

注意

并非所有模型都支持所有附加功能。有关详细信息，请参阅模型数据提取。
Microsoft Office 文件类型目前不支持加载项功能。

文档智能支持可选功能，这些功能可以根据文档提取方案启用和禁用。以下附加功能适用于 2023-10-31-preview 及更高版本：

keyValuePairs
queryFields

注意

2023-10-30-preview API 中的查询字段实现与上一个预览版不同。新实现成本更低，适用于结构化文档。

版本可用性

加载项功能	附加功能/免费	2024-02-29-preview	`2023-07-31`（正式发布）	`2022-08-31`（正式发布）	v2.1 (GA)
字体属性提取	附加功能	✔️	✔️	不适用	不适用
公式提取	附加功能	✔️	✔️	不适用	不适用
高分辨率提取	附加功能	✔️	✔️	不适用	不适用
条形码提取	免费	✔️	✔️	不适用	不适用
语言检测	免费	✔️	✔️	不适用	不适用
键值对	免费	✔️	不适用	不适用	不适用
查询字段	附加功能*	✔️	不适用	不适用	n/a

✱ 附加功能 - 查询字段的定价与其他附加功能不同。有关详细信息，请参阅定价。

支持的文件格式

PDF
映像：JPEG/JPG、PNG、BMP、TIFF、HEIF

✱ 当前不支持 Microsoft Office 文件。

高分辨率提取

从大型文档（如工程图纸）中识别小文本是一项挑战。文本通常与其他图形元素混合在一起，并且具有不同的字体、大小和方向。此外，文本可以分解为单独的部分或与其他符号连接。文档智能现在支持使用 ocr.highResolution 功能从这些类型的文档中提取内容。通过启用此附加功能，可以提高从 A1/A2/A3 文档中提取内容的质量。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=ocrHighResolution

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.OCR_HIGH_RESOLUTION],  # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_with_highres]
if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = get_words(page, line)
            print(
                f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
                f"within bounding polygon '{line.polygon}'"
            )

            for word in words:
                print(f"......Word '{word.content}' has a confidence of {word.confidence}")

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{selection_mark.polygon}' and has a confidence of {selection_mark.confidence}"
            )

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and " f"{table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(f"Table # {table_idx} location on page: {region.page_number} is {region.polygon}")
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(f"...content on page {region.page_number} is within bounding polygon '{region.polygon}'")

查看 GitHub 上的示例。

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution

# Analyze a document at a URL:
url = "(https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.OCR_HIGH_RESOLUTION]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_with_highres]
if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(
        f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}"
    )

    for line_idx, line in enumerate(page.lines):
        words = line.get_words()
        print(
            f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
            f"within bounding polygon '{format_polygon(line.polygon)}'"
        )

        for word in words:
            print(
                f"......Word '{word.content}' has a confidence of {word.confidence}"
            )

    for selection_mark in page.selection_marks:
        print(
            f"Selection mark is '{selection_mark.state}' within bounding polygon "
            f"'{format_polygon(selection_mark.polygon)}' and has a confidence of {selection_mark.confidence}"
        )

for table_idx, table in enumerate(result.tables):
    print(
        f"Table # {table_idx} has {table.row_count} rows and "
        f"{table.column_count} columns"
    )
    for region in table.bounding_regions:
        print(
            f"Table # {table_idx} location on page: {region.page_number} is {format_polygon(region.polygon)}"
        )
    for cell in table.cells:
        print(
            f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
        )
        for region in cell.bounding_regions:
            print(
                f"...content on page {region.page_number} is within bounding polygon '{format_polygon(region.polygon)}'"
            )

查看 GitHub 上的示例。

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]

公式提取

ocr.formula 功能将 formulas 集合中所有已识别的公式（如数学公式）提取为 content 下的顶级对象。在 content 内，检测到的公式表示为 :formula:。此集合中的每个条目表示一个公式，该公式类型为 inline 或 display，其 LaTeX 表示形式 value 及其 polygon 坐标。最初，公式显示在每页的末尾。

注意

confidence 分数是硬编码的。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=formulas

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.FORMULAS],  # Specify which add-on capabilities to enable
)
result: AnalyzeResult = poller.result()

# [START analyze_formulas]
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    if page.formulas:
        inline_formulas = [f for f in page.formulas if f.kind == "inline"]
        display_formulas = [f for f in page.formulas if f.kind == "display"]

        # To learn the detailed concept of "polygon" in the following content, visit: https://aka.ms/bounding-region
        print(f"Detected {len(inline_formulas)} inline formulas.")
        for formula_idx, formula in enumerate(inline_formulas):
            print(f"- Inline #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

        print(f"\nDetected {len(display_formulas)} display formulas.")
        for formula_idx, formula in enumerate(display_formulas):
            print(f"- Display #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

查看 GitHub 上的示例。

"content": ":formula:",
 "pages": [
   {
     "pageNumber": 1,
     "formulas": [
       {
         "kind": "inline",
         "value": "\\frac { \\partial a } { \\partial b }",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       },
       {
         "kind": "display",
         "value": "y = a \\times b + a \\times c",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       }
     ]
   }
 ]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.FORMULAS]    # Specify which add-on capabilities to enable
)
result = poller.result()

# [START analyze_formulas]
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    inline_formulas = [f for f in page.formulas if f.kind == "inline"]
    display_formulas = [f for f in page.formulas if f.kind == "display"]

    print(f"Detected {len(inline_formulas)} inline formulas.")
    for formula_idx, formula in enumerate(inline_formulas):
        print(f"- Inline #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

    print(f"\nDetected {len(display_formulas)} display formulas.")
    for formula_idx, formula in enumerate(display_formulas):
        print(f"- Display #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

查看 GitHub 上的示例。

 "content": ":formula:",
   "pages": [
     {
       "pageNumber": 1,
       "formulas": [
         {
           "kind": "inline",
           "value": "\\frac { \\partial a } { \\partial b }",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         },
         {
           "kind": "display",
           "value": "y = a \\times b + a \\times c",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         }
       ]
     }
   ]

字体属性提取

ocr.font 功能将 styles 集合中提取的文本的所有字体属性提取为 content 下的顶级对象。每个样式对象都会指定一个字体属性、适用的文本范围及其相应的置信度分数。现有样式属性扩展了更多字体属性，例如文本字体的 similarFontFamily、斜体和正常等样式的 fontStyle、粗体或正常样式的 fontWeight、文本颜色的 color 和文本边界框颜色的 backgroundColor。

  {your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=styleFont

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_fonts]
# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)  # e.g., 'Arial, sans-serif
font_styles = defaultdict(list)  # e.g, 'italic'
font_weights = defaultdict(list)  # e.g., 'bold'
font_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format

if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")
    return

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
for style in result.styles:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

查看 GitHub 上的示例。

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]

  {your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_fonts]
# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)   # e.g., 'Arial, sans-serif
font_styles = defaultdict(list)             # e.g, 'italic'
font_weights = defaultdict(list)            # e.g., 'bold'
font_colors = defaultdict(list)             # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format

if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
for style in result.styles:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

查看 GitHub 上的示例。

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]

条形码属性提取

ocr.barcode 功能将 barcodes 集合中所有已识别的条形码提取为 content 下的顶级对象。在 content 内，检测到的条形码表示为 :barcode:。此集合中的每个条目都表示一个条形码，包括条形码类型（表示为 kind）和嵌入的条形码内容（表示为 value）及其 polygon 坐标。最初，条形码显示在每页的末尾。 confidence 硬编码为 1。

支持的条形码类型

条形码类型	示例
`QR Code`
`Code 39`
`Code 93`
`Code 128`
`UPC (UPC-A & UPC-E)`
`PDF417`
`EAN-8`
`EAN-13`
`Codabar`
`Databar`
展开的 `Databar`
`ITF`
`Data Matrix`

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=barcodes

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-read",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_barcodes]
# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    if page.barcodes:
        print(f"Detected {len(page.barcodes)} barcodes:")
        for barcode_idx, barcode in enumerate(page.barcodes):
            print(f"- Barcode #{barcode_idx}: {barcode.value}")
            print(f"  Kind: {barcode.kind}")
            print(f"  Confidence: {barcode.confidence}")
            print(f"  Bounding regions: {barcode.polygon}")

查看 GitHub 上的示例。

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_barcodes]
# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    print(f"Detected {len(page.barcodes)} barcodes:")
    for barcode_idx, barcode in enumerate(page.barcodes):
        print(f"- Barcode #{barcode_idx}: {barcode.value}")
        print(f"  Kind: {barcode.kind}")
        print(f"  Confidence: {barcode.confidence}")
        print(f"  Bounding regions: {format_polygon(barcode.polygon)}")

查看 GitHub 上的示例。

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

语言检测

将 languages 功能添加到 analyzeResult 请求可以预测每个文本行所检测到的主要语言，以及 analyzeResult 下 languages 集合中的 confidence。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=languages

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.LANGUAGES]     # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_languages]
print("----Languages detected in the document----")
if result.languages:
    print(f"Detected {len(result.languages)} languages:")
    for lang_idx, lang in enumerate(result.languages):
        print(f"- Language #{lang_idx}: locale '{lang.locale}'")
        print(f"  Confidence: {lang.confidence}")
        print(
            f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'"
        )

查看 GitHub 上的示例。

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    },
]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.LANGUAGES]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_languages]
print("----Languages detected in the document----")
print(f"Detected {len(result.languages)} languages:")
for lang_idx, lang in enumerate(result.languages):
    print(f"- Language #{lang_idx}: locale '{lang.locale}'")
    print(f"  Confidence: {lang.confidence}")
    print(f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'")

查看 GitHub 上的示例。

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    },
]

可搜索 PDF

借助可搜索的 PDF 功能，可以将模拟 PDF（如扫描图像 PDF 文件）转换为包含嵌入文本的 PDF。嵌入文本通过在图像文件顶部覆盖检测到的文本实体，在 PDF 提取的内容中启用深度文本搜索。

重要

目前，可搜索 PDF 功能仅受读取 OCR 模型 prebuilt-read 支持。使用此功能时，请将 modelId 指定为 prebuilt-read，因为其他模型类型将返回此预览版的错误。
2024-07-31-preview prebuilt-read 模型随附了可搜索 PDF，常规 PDF 使用无需使用成本。

使用可搜索 PDF

若要使用可搜索 PDF，请使用 Analyze 操作发出 POST 请求，并将输出格式指定为 pdf：


POST /documentModels/prebuilt-read:analyze?output=pdf
{...}
202

完成 Analyze 操作后，发出 GET 请求以检索 Analyze 操作结果。

成功完成后，可以检索 PDF 并将其下载为 application/pdf。此操作允许直接下载 PDF 的嵌入文本形式，而不是 Base64 编码的 JSON。


// Monitor the operation until completion.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}
200
{...}

// Upon successful completion, retrieve the PDF as application/pdf.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}/pdf
200 OK
Content-Type: application/pdf

键值对

在早期 API 版本中，prebuilt-document 模型从窗体和文档中提取键值对。通过添加 keyValuePairs 功能到预生成布局，布局模型现在会生成相同的结果。

键值对是文档中的特定范围，用于标识标签或键及其关联的响应或值。在结构化形式中，这些对可以是用户为该字段输入的标签和值。在非结构化文档中，它们可能是基于段落中的文本执行合同的日期。 AI 模型经过训练，可基于各种文档类型、格式和结构提取可识别的键和值。

当模型检测到有键但无关联的值，或模型处理可选字段时，键也可以单独存在。例如，在某些实例中，窗体上的中间名字段可留空。键值对是文档中包含的文本范围。对应以不同方式描述相同值的文档，例如客户/用户，关联的键将是客户或用户（具体取决于上下文）。

REST API

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=keyValuePairs

查询字段

查询字段是一项附加功能，用于扩展从任何预生成模型中提取的架构，或者当键名称为变量时定义特定键名称。若要使用查询字段，请将功能设置为 queryFields 并在 queryFields 属性中提供以逗号分隔的字段名称列表。

文档智能现在支持查询字段提取。使用查询字段提取，可以使用查询请求将字段添加到提取过程，而无需额外的训练。
如果需要扩展预生成或自定义模型的架构，或者需要提取一些具有布局输出的字段，请使用查询字段。
查询字段是高级加载项功能。为了获得最佳结果，请使用适用于多单词字段名称的驼峰式大小写或 Pascal 拼写法字段名称来定义要提取的字段。
查询字段支持每个请求最多 20 个字段。如果文档包含字段的值，则返回字段和值。
此版本具有查询字段功能的新实现，其定价低于早期实现，应进行验证。

注意

文档智能工作室查询字段提取目前可用于布局和预生成模型 2024-02-29-preview2023-10-31-preview API 及更高版本，US tax 模型（W2、1098s 和 1099s 模型）除外。

查询字段提取

对于查询字段提取，请指定要提取的字段，文档智能会相应地分析文档。下面是一个示例：

如果要在文档智能工作室中处理协定，请使用版本 2024-02-29-preview 或 2023-10-31-preview：
在 analyze document 请求过程中，你可以传递字段标签列表，如 Party1、Party2、TermsOfUse、PaymentTerms、PaymentDate 以及 TermEndDate。
文档智能能够分析和提取字段数据，并在结构化 JSON 输出中返回值。
除了查询字段，响应还包括文本、表、选择标记和其他相关数据。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=queryFields&queryFields=TERMS

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/invoice/simple-invoice.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.QUERY_FIELDS],    # Specify which add-on capabilities to enable.
    query_fields=["Address", "InvoiceNumber"],  # Set the features and provide a comma-separated list of field names.
)
result: AnalyzeResult = poller.result()
print("Here are extra fields in result:\n")
if result.documents:
    for doc in result.documents:
        if doc.fields and doc.fields["Address"]:
            print(f"Address: {doc.fields['Address'].value_string}")
        if doc.fields and doc.fields["InvoiceNumber"]:
            print(f"Invoice number: {doc.fields['InvoiceNumber'].value_string}")

查看 GitHub 上的示例。

Address: 1 Redmond way Suite 6000 Redmond, WA Sunnayvale, 99243
Invoice number: 34278587

后续步骤

了解详细信息：读取模型布局模型

SDK 示例：python

查找更多示例：加载项功能

通过

文档智能附加功能

功能

版本可用性

支持的文件格式

高分辨率提取

公式提取

字体属性提取

条形码属性提取

支持的条形码类型

语言检测

可搜索 PDF

使用可搜索 PDF

键值对

REST API

查询字段

查询字段提取

后续步骤

反馈

其他资源