Document Intelligence add-on capabilities

Article
10/16/2024

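Important

Document Intelligence public preview releases provide early access to features that are in active development. Features, approaches, and processes may change before general availability (GA) based on user feedback.
The public preview version of the Document Intelligence client libraries defaults to REST API version 2024-07-31-preview.
Public preview version 2024-07-31-preview is currently available only in the following Azure regions. Note that the custom generative (document field extraction) model in AI Studio is available only in the North Central US region:
- East US
- West US 2
- West Europe
- North Central US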

This content applies to: v4.0 (preview) | Previous version: v3.1 (GA)

This content applies to: v3.1 (GA) | Latest version: v4.0 (preview)

Note

Add-on capabilities are available in all models except the Business card model.

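Capabilities

Document Intelligence supports more sophisticated and modular analysis capabilities. Use the add-on capabilities to extend the results with additional elements extracted from your documents. Some add-on capabilities incur an extra cost. These optional capabilities can be enabled and disabled depending on the document extraction scenario. To enable a capability, add its feature name to the features query string property. You can enable more than one add-on capability on a request by providing a comma-separated list of feature names. The following add-on capabilities are available for 2023-07-31 (GA) and later releases.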

ocrHighResolution
formulas
styleFont
barcodes
languages
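
As a rough illustration of the comma-separated features parameter, the following sketch assembles an analyze URL with Python's standard library. The helper name build_analyze_url and the example endpoint are hypothetical; the path and api-version mirror the REST examples in this article, and the function only builds a string without sending a request.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, model_id, api_version, features):
    """Return an analyze URL with add-on features as one comma-separated value.

    Hypothetical helper for illustration; it builds a string and sends nothing.
    """
    query = {"api-version": api_version}
    if features:
        query["features"] = ",".join(features)
    # safe="," keeps the commas readable instead of percent-encoding them.
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{model_id}:analyze?{urlencode(query, safe=',')}"
    )


url = build_analyze_url(
    "https://example-resource.cognitiveservices.azure.com",  # placeholder endpoint
    "prebuilt-layout",
    "2024-02-29-preview",
    ["ocrHighResolution", "formulas", "barcodes"],
)
print(url)
```

Passing an empty feature list leaves the features parameter out entirely, which corresponds to a request with no add-on capabilities enabled.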

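For 2024-07-31-preview and later releases, the Read model supports searchable PDF output:

Searchable PDF

Note

Not all add-on capabilities are supported by all models. For more information, see model data extraction.
Add-on capabilities are currently not supported for Microsoft Office file types.

Document Intelligence supports optional capabilities that can be enabled and disabled depending on the document extraction scenario. The following add-on capabilities are available for 2023-10-31-preview and later releases: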

keyValuePairs
queryFields

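Note

The query fields implementation in the 2023-10-31-preview API is different from the previous preview release. The new implementation is less expensive and works well with structured documents.

Version availability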

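Add-on capability	Add-on/Free	2024-02-29-preview	`2023-07-31` (GA)	`2022-08-31` (GA)	v2.1 (GA)
Font property extraction	Add-on	✔️	✔️	n/a	n/a
Formula extraction	Add-on	✔️	✔️	n/a	n/a
High resolution extraction	Add-on	✔️	✔️	n/a	n/a
Barcode extraction	Free	✔️	✔️	n/a	n/a
Language detection	Free	✔️	✔️	n/a	n/a
Key-value pairs	Free	✔️	n/a	n/a	n/a
Query fields	Add-on ✱	✔️	n/a	n/a	n/a

✱ Add-on: Query fields are priced differently from the other add-on capabilities. See pricing for details.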

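Supported file formats

PDF
Images: JPEG/JPG, PNG, BMP, TIFF, HEIF

✱ Microsoft Office files are currently not supported.

High resolution extraction

Recognizing small text in large documents, such as engineering drawings, is a challenge. The text is often mixed with other graphical elements and has varying fonts, sizes, and orientations. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the ocr.highResolution capability. Enabling this add-on capability improves the quality of content extraction from A1/A2/A3 documents.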

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=ocrHighResolution

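from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, AnalyzeResult, DocumentAnalysisFeature

# Placeholder endpoint and key; replace with the values for your own resource.
document_intelligence_client = DocumentIntelligenceClient(
    endpoint="<your-resource-endpoint>", credential=AzureKeyCredential("<your-key>")
)

# Analyze a document at a URL: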
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.OCR_HIGH_RESOLUTION],  # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_with_highres]
if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = get_words(page, line)
            print(
                f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
                f"within bounding polygon '{line.polygon}'"
            )

            for word in words:
                print(f"......Word '{word.content}' has a confidence of {word.confidence}")

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{selection_mark.polygon}' and has a confidence of {selection_mark.confidence}"
            )

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and " f"{table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(f"Table # {table_idx} location on page: {region.page_number} is {region.polygon}")
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(f"...content on page {region.page_number} is within bounding polygon '{region.polygon}'")

View samples on GitHub.

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]
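
The v4.0 Python sample above calls a get_words helper that isn't shown in this article. The following is a minimal sketch of what such a helper might look like, assuming a word belongs to a line when the word's span lies inside one of the line's spans. The Span, Word, Line, and Page dataclasses are hypothetical stand-ins for the SDK objects so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:  # stand-in: offset and length into the result's content string
    offset: int
    length: int


@dataclass
class Word:  # stand-in for a recognized word
    content: str
    span: Span


@dataclass
class Line:  # stand-in for a recognized line
    content: str
    spans: List[Span]


@dataclass
class Page:  # stand-in for a page; only the words list is needed here
    words: List[Word]


def _in_spans(word, spans):
    """True when the word's span lies entirely inside one of the given spans."""
    start, end = word.span.offset, word.span.offset + word.span.length
    return any(s.offset <= start and end <= s.offset + s.length for s in spans)


def get_words(page, line):
    """Collect the page's words whose spans fall within the line's spans."""
    return [w for w in page.words if _in_spans(w, line.spans)]


page = Page(words=[Word("Hello", Span(0, 5)), Word("world", Span(6, 5)), Word("Next", Span(12, 4))])
line = Line(content="Hello world", spans=[Span(0, 11)])
print([w.content for w in get_words(page, line)])
```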

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution

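from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import AnalysisFeature, DocumentAnalysisClient

# Placeholder endpoint and key; replace with the values for your own resource.
document_analysis_client = DocumentAnalysisClient(
    endpoint="<your-resource-endpoint>", credential=AzureKeyCredential("<your-key>")
)

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"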
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.OCR_HIGH_RESOLUTION]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_with_highres]
if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(
        f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}"
    )

    for line_idx, line in enumerate(page.lines):
        words = line.get_words()
        print(
            f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
            f"within bounding polygon '{format_polygon(line.polygon)}'"
        )

        for word in words:
            print(
                f"......Word '{word.content}' has a confidence of {word.confidence}"
            )

    for selection_mark in page.selection_marks:
        print(
            f"Selection mark is '{selection_mark.state}' within bounding polygon "
            f"'{format_polygon(selection_mark.polygon)}' and has a confidence of {selection_mark.confidence}"
        )

for table_idx, table in enumerate(result.tables):
    print(
        f"Table # {table_idx} has {table.row_count} rows and "
        f"{table.column_count} columns"
    )
    for region in table.bounding_regions:
        print(
            f"Table # {table_idx} location on page: {region.page_number} is {format_polygon(region.polygon)}"
        )
    for cell in table.cells:
        print(
            f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
        )
        for region in cell.bounding_regions:
            print(
                f"...content on page {region.page_number} is within bounding polygon '{format_polygon(region.polygon)}'"
            )

View samples on GitHub.

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]
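
The v3.1 Python samples call a format_polygon helper that isn't shown in this article. A minimal sketch follows, assuming each polygon is a sequence of points that expose x and y attributes. The Point namedtuple is a hypothetical stand-in used only to make the example runnable.

```python
from collections import namedtuple

# Stand-in for the SDK's point type; only x and y are used.
Point = namedtuple("Point", ["x", "y"])


def format_polygon(polygon):
    """Render a polygon as '[x, y], [x, y], ...' or 'N/A' when it is missing."""
    if not polygon:
        return "N/A"
    return ", ".join(f"[{p.x}, {p.y}]" for p in polygon)


corners = [Point(10, 20), Point(30, 20), Point(30, 40), Point(10, 40)]
print(format_polygon(corners))
print(format_polygon(None))
```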

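Formula extraction

The ocr.formula capability extracts all identified formulas, such as mathematical equations, in the formulas collection as a top-level object under content. Inside content, detected formulas are represented as :formula:. Each entry in this collection represents a formula and includes the formula type as inline or display, its LaTeX representation as value, and its polygon coordinates. Initially, formulas appear at the end of each page.

Note

The confidence score is hard-coded.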

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=formulas

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.FORMULAS],  # Specify which add-on capabilities to enable
)
result: AnalyzeResult = poller.result()

# [START analyze_formulas]
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    if page.formulas:
        inline_formulas = [f for f in page.formulas if f.kind == "inline"]
        display_formulas = [f for f in page.formulas if f.kind == "display"]

        # To learn the detailed concept of "polygon" in the following content, visit: https://aka.ms/bounding-region
        print(f"Detected {len(inline_formulas)} inline formulas.")
        for formula_idx, formula in enumerate(inline_formulas):
            print(f"- Inline #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

        print(f"\nDetected {len(display_formulas)} display formulas.")
        for formula_idx, formula in enumerate(display_formulas):
            print(f"- Display #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

View samples on GitHub.

"content": ":formula:",
 "pages": [
   {
     "pageNumber": 1,
     "formulas": [
       {
         "kind": "inline",
         "value": "\\frac { \\partial a } { \\partial b }",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       },
       {
         "kind": "display",
         "value": "y = a \\times b + a \\times c",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       }
     ]
   }
 ]
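
One way to use this output is to substitute each :formula: placeholder in content with the formula's LaTeX value. The sketch below works on a plain dictionary shaped like the JSON above and assumes that each formula's span (offset and length) covers its placeholder in content; that mapping, the inline_latex name, and the sample data are assumptions made for illustration.

```python
def inline_latex(content, formulas):
    """Replace each formula placeholder span in content with its LaTeX value."""
    out = content
    # Work right to left so earlier offsets stay valid after each replacement.
    for formula in sorted(formulas, key=lambda f: f["span"]["offset"], reverse=True):
        start = formula["span"]["offset"]
        end = start + formula["span"]["length"]
        out = out[:start] + "$" + formula["value"] + "$" + out[end:]
    return out


# Hypothetical data: one placeholder, with the span computed from the text itself.
content = "Area: :formula: end"
placeholder = ":formula:"
formulas = [
    {
        "kind": "inline",
        "value": "x^2",
        "span": {"offset": content.index(placeholder), "length": len(placeholder)},
    }
]
print(inline_latex(content, formulas))
```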

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.FORMULAS]    # Specify which add-on capabilities to enable
)
result = poller.result()

# [START analyze_formulas]
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    inline_formulas = [f for f in page.formulas if f.kind == "inline"]
    display_formulas = [f for f in page.formulas if f.kind == "display"]

    print(f"Detected {len(inline_formulas)} inline formulas.")
    for formula_idx, formula in enumerate(inline_formulas):
        print(f"- Inline #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

    print(f"\nDetected {len(display_formulas)} display formulas.")
    for formula_idx, formula in enumerate(display_formulas):
        print(f"- Display #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

View samples on GitHub.

 "content": ":formula:",
   "pages": [
     {
       "pageNumber": 1,
       "formulas": [
         {
           "kind": "inline",
           "value": "\\frac { \\partial a } { \\partial b }",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         },
         {
           "kind": "display",
           "value": "y = a \\times b + a \\times c",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         }
       ]
     }
   ]

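Font property extraction

The ocr.font capability extracts all font properties of the text extracted in the styles collection as a top-level object under content. Each style object specifies a single font property, the text span it applies to, and its corresponding confidence score. The existing style property is extended with more font properties, such as similarFontFamily for the font of the text, fontStyle for styles such as italic and normal, fontWeight for bold or normal, color for the color of the text, and backgroundColor for the color of the text bounding box.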

  {your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=styleFont

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_fonts]
from collections import defaultdict

# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)  # e.g., 'Arial, sans-serif'
font_styles = defaultdict(list)  # e.g., 'italic'
font_weights = defaultdict(list)  # e.g., 'bold'
font_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format

if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
# result.styles can be empty or None, so fall back to an empty list.
for style in result.styles or []:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

View samples on GitHub.

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]
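
The font samples call a get_styled_text helper that isn't shown in this article. The sketch below is one plausible version, written against plain dictionaries shaped like the JSON output above rather than the SDK objects: it gathers the spans of the given styles, orders them by offset, and joins the matching slices of content. The dictionary access is an assumption made to keep the example self-contained.

```python
def get_styled_text(styles, content):
    """Join the slices of content covered by the spans of the given styles."""
    spans = sorted(
        (span for style in styles for span in style["spans"]),
        key=lambda span: span["offset"],
    )
    return ",".join(content[s["offset"] : s["offset"] + s["length"]] for s in spans)


content = "Foo bar"
styles = [
    {"similarFontFamily": "Arial, sans-serif", "spans": [{"offset": 0, "length": 3}]},
    {"similarFontFamily": "Times New Roman, serif", "spans": [{"offset": 4, "length": 3}]},
    {"fontWeight": "bold", "spans": [{"offset": 2, "length": 3}]},
]

arial = [s for s in styles if s.get("similarFontFamily") == "Arial, sans-serif"]
bold = [s for s in styles if s.get("fontWeight") == "bold"]
print(get_styled_text(arial, content))
print(get_styled_text(bold, content))
```

Each style entry carries one font property, so grouping styles by property value before calling the helper, as the samples do, yields the text rendered with that property.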

  {your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_fonts]
from collections import defaultdict

# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)   # e.g., 'Arial, sans-serif'
font_styles = defaultdict(list)             # e.g., 'italic'
font_weights = defaultdict(list)            # e.g., 'bold'
font_colors = defaultdict(list)             # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format

if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
for style in result.styles:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

View samples on GitHub.

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]

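Barcode property extraction

The ocr.barcode capability extracts all identified barcodes in the barcodes collection as a top-level object under content. Inside content, detected barcodes are represented as :barcode:. Each entry in this collection represents a barcode and includes the barcode type as kind, the embedded barcode content as value, and its polygon coordinates. Initially, barcodes appear at the end of each page. The confidence is hard-coded as 1.

Supported barcode types

Barcode type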
`QR Code`
`Code 39`
`Code 93`
`Code 128`
`UPC (UPC-A & UPC-E)`
`PDF417`
`EAN-8`
`EAN-13`
`Codabar`
`Databar`
`Databar` Expanded
`ITF`
`Data Matrix`

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=barcodes

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-read",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_barcodes]
# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    if page.barcodes:
        print(f"Detected {len(page.barcodes)} barcodes:")
        for barcode_idx, barcode in enumerate(page.barcodes):
            print(f"- Barcode #{barcode_idx}: {barcode.value}")
            print(f"  Kind: {barcode.kind}")
            print(f"  Confidence: {barcode.confidence}")
            print(f"  Bounding regions: {barcode.polygon}")

View samples on GitHub.

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_barcodes]
# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    print(f"Detected {len(page.barcodes)} barcodes:")
    for barcode_idx, barcode in enumerate(page.barcodes):
        print(f"- Barcode #{barcode_idx}: {barcode.value}")
        print(f"  Kind: {barcode.kind}")
        print(f"  Confidence: {barcode.confidence}")
        print(f"  Bounding regions: {format_polygon(barcode.polygon)}")

View samples on GitHub.

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

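Language detection

Adding the languages feature to the analyze request predicts the primary language detected for each text line, along with its confidence, in the languages collection under analyzeResult.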

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=languages

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.LANGUAGES]     # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_languages]
print("----Languages detected in the document----")
if result.languages:
    print(f"Detected {len(result.languages)} languages:")
    for lang_idx, lang in enumerate(result.languages):
        print(f"- Language #{lang_idx}: locale '{lang.locale}'")
        print(f"  Confidence: {lang.confidence}")
        print(
            f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'"
        )

View samples on GitHub.

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    }
]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.LANGUAGES]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_languages]
print("----Languages detected in the document----")
print(f"Detected {len(result.languages)} languages:")
for lang_idx, lang in enumerate(result.languages):
    print(f"- Language #{lang_idx}: locale '{lang.locale}'")
    print(f"  Confidence: {lang.confidence}")
    print(f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'")

View samples on GitHub.

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    }
]

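Searchable PDF

The searchable PDF capability lets you convert an analog PDF, such as a scanned-image PDF file, to a PDF with embedded text. The embedded text enables deep text search within the PDF's extracted content by overlaying the detected text entities on top of the image files.

Important

Currently, the searchable PDF capability is supported only by the Read OCR model prebuilt-read. When using this feature, specify the modelId as prebuilt-read, because other model types return an error for this preview version.
Searchable PDF is included with the 2024-07-31-preview prebuilt-read model with no usage cost for general PDF consumption.

Use searchable PDF

To use searchable PDF, make a POST request using the Analyze operation and specify the output format as pdf: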


POST /documentModels/prebuilt-read:analyze?output=pdf
{...}
202

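Once the Analyze operation is complete, make a GET request to retrieve the Analyze operation results.

Upon successful completion, the PDF can be retrieved and downloaded as application/pdf. This operation allows direct downloading of the embedded-text form of the PDF instead of Base64-encoded JSON.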


// Monitor the operation until completion.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}
200
{...}

// Upon successful completion, retrieve the PDF as application/pdf.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}/pdf
200 OK
Content-Type: application/pdf
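
The request sequence above (POST with output=pdf, poll the result, then GET the /pdf resource) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a complete client: the documentintelligence path prefix and api-version query parameter are carried over from the other REST examples in this article, the status names (notStarted, running, succeeded, failed) are assumed, and the HTTP calls are replaced by an injected get_status callable so that the sketch runs without a network connection.

```python
import time

API_VERSION = "2024-07-31-preview"
MODEL_PATH = "documentintelligence/documentModels/prebuilt-read"  # assumed path prefix


def analyze_pdf_url(endpoint):
    """POST target: the Analyze operation with output=pdf."""
    return f"{endpoint.rstrip('/')}/{MODEL_PATH}:analyze?api-version={API_VERSION}&output=pdf"


def result_url(endpoint, result_id, pdf=False):
    """GET target: the operation status, or the generated PDF when pdf=True."""
    suffix = "/pdf" if pdf else ""
    return (
        f"{endpoint.rstrip('/')}/{MODEL_PATH}/analyzeResults/"
        f"{result_id}{suffix}?api-version={API_VERSION}"
    )


def wait_until_done(get_status, interval_seconds=1.0, max_polls=60):
    """Call get_status() until it reports a terminal state, then return that state."""
    for _ in range(max_polls):
        status = get_status()
        if status in ("succeeded", "failed"):  # assumed terminal status names
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("analyze operation did not finish in time")


# Simulated status sequence standing in for repeated GET requests.
statuses = iter(["notStarted", "running", "succeeded"])
print(wait_until_done(lambda: next(statuses), interval_seconds=0))
```

In a real client, get_status would issue the GET request against result_url(...) and read the status from the JSON body; once it reports success, a final GET with pdf=True downloads the file as application/pdf.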

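Key-value pairs

In earlier API versions, the prebuilt-document model extracted key-value pairs from forms and documents. With the keyValuePairs feature added to prebuilt-layout, the layout model now produces the same results.

Key-value pairs are specific spans within the document that identify a label or key and its associated response or value. In a structured form, these pairs could be the label and the value the user entered for that field. In an unstructured document, they could be the date a contract was executed, based on the text in a paragraph. The AI model is trained to extract identifiable keys and values based on a wide variety of document types, formats, and structures.

Keys can also exist in isolation when the model detects that a key exists with no associated value, or when processing optional fields. For example, a middle name field can be left blank on a form in some instances. Key-value pairs are spans of text contained in the document. For documents where the same value is described in different ways, for example customer/user, the associated key is either customer or user, based on context.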

REST API

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=keyValuePairs
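
Because a key can appear without a value, code that reads the result should treat the value as optional. The sketch below assumes the analyze result contains a keyValuePairs list whose entries have a key object and an optional value object, each with a content string; that shape, the function name, and the sample data are assumptions for illustration, not output copied from the service.

```python
def summarize_key_value_pairs(analyze_result):
    """Return (key text, value text or None) for each extracted pair."""
    pairs = []
    for kvp in analyze_result.get("keyValuePairs", []):
        key_text = kvp["key"]["content"]
        value = kvp.get("value")  # absent when the key has no associated value
        pairs.append((key_text, value["content"] if value else None))
    return pairs


# Hypothetical result fragment: the middle name field was left blank on the form.
sample_result = {
    "keyValuePairs": [
        {"key": {"content": "First name"}, "value": {"content": "Ada"}, "confidence": 0.9},
        {"key": {"content": "Middle name"}, "confidence": 0.8},
    ]
}

for key, value in summarize_key_value_pairs(sample_result):
    print(f"{key}: {value if value is not None else '(no value)'}")
```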

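Query fields

Query fields are an add-on capability that extends the schema extracted from any prebuilt model, or defines a specific key name when the key name is variable. To use query fields, set the features to queryFields and provide a comma-separated list of field names in the queryFields property.

Document Intelligence now supports query field extraction. With query field extraction, you can add fields to the extraction process using a query request, without the need for added training.
Use query fields when you need to extend the schema of a prebuilt or custom model, or need to extract a few fields with the output of layout.
Query fields are a premium add-on capability. For best results, define the fields you want to extract using camel case or Pascal case for multi-word field names.
Query fields support a maximum of 20 fields per request. If the document contains a value for the field, the field and value are returned.
This release has a new implementation of the query fields capability that is priced lower than the earlier implementation and should be validated.

Note

Document Intelligence Studio query field extraction is currently available with the Layout and prebuilt models for the 2024-02-29-preview, 2023-10-31-preview, and later API releases, except for the US tax models (W2, 1098s, and 1099s models).

Query field extraction

For query field extraction, specify the fields you want to extract, and Document Intelligence analyzes the document accordingly. Here's an example:

If you're processing a contract in Document Intelligence Studio, use the 2024-02-29-preview or 2023-10-31-preview version:
You can pass a list of field labels such as Party1, Party2, TermsOfUse, PaymentTerms, PaymentDate, and TermEndDate as part of the analyze document request.
Document Intelligence analyzes and extracts the field data and returns the values in a structured JSON output.
In addition to the query fields, the response includes text, tables, selection marks, and other relevant data.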

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=queryFields&queryFields=TERMS
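
To show how several field names combine into the query string, the sketch below builds the features and queryFields parameters and enforces the 20-field limit per request described in this section. The function name query_field_params is hypothetical, and the function only produces the query-string fragment; it doesn't call the service.

```python
from urllib.parse import urlencode

MAX_QUERY_FIELDS = 20  # per-request limit for query fields


def query_field_params(field_names):
    """Return the query-string fragment that enables queryFields for the given names."""
    if not field_names:
        raise ValueError("provide at least one field name")
    if len(field_names) > MAX_QUERY_FIELDS:
        raise ValueError(f"at most {MAX_QUERY_FIELDS} query fields are supported per request")
    params = {"features": "queryFields", "queryFields": ",".join(field_names)}
    # safe="," keeps the field list comma-separated instead of percent-encoded.
    return urlencode(params, safe=",")


print(query_field_params(["Party1", "Party2", "TermsOfUse"]))
```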

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/invoice/simple-invoice.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.QUERY_FIELDS],    # Specify which add-on capabilities to enable.
    query_fields=["Address", "InvoiceNumber"],  # Set the features and provide a comma-separated list of field names.
)
result: AnalyzeResult = poller.result()
print("Here are extra fields in result:\n")
if result.documents:
    for doc in result.documents:
        # Use .get() so a field that is missing from the result doesn't raise KeyError.
        if doc.fields and doc.fields.get("Address"):
            print(f"Address: {doc.fields['Address'].value_string}")
        if doc.fields and doc.fields.get("InvoiceNumber"):
            print(f"Invoice number: {doc.fields['InvoiceNumber'].value_string}")

View samples on GitHub.

Address: 1 Redmond way Suite 6000 Redmond, WA Sunnayvale, 99243
Invoice number: 34278587

Next steps

Learn more: Read model, Layout model

SDK samples: python

Find more samples: Add-on capabilities
