ドキュメントインテリジェンスアドオン機能

[アーティクル]
10/18/2024

重要

Document Intelligence パブリックプレビューリリースは、開発中の機能への早期アクセスを提供します。機能、アプローチ、およびプロセスは、一般提供 (GA) の前に、ユーザーからのフィードバックに基づいて変更される可能性があります。
Document Intelligence クライアントライブラリのパブリックプレビューバージョンは、REST API バージョン 2024-07-31-preview にデフォルトで設定されています。
パブリックプレビューバージョン 2024-07-31-preview は、現在、次の Azure リージョンでのみ使用できます。 AI Studio のカスタム生成 (ドキュメントフィールド抽出) モデルは、米国中北部リージョンでのみ使用できます。
- 米国東部
- 米国西部 2
- "西ヨーロッパ"
- 米国中北部

このコンテンツの適用対象: v4.0 (プレビュー) | 以前のバージョン: v3.1 (GA)

このコンテンツの適用対象: v3.1 (GA) | 最新バージョン: v4.0 (プレビュー)

Note

アドオン機能は、名刺モデルを除くすべてのモデル内で使用できます。

機能

ドキュメントインテリジェンスでは、より高度でモジュール形式の解析機能がサポートされています。アドオン機能を使用して結果を拡張し、ドキュメントから抽出されたより多くのフィーチャーを含めます。一部のアドオンフィーチャーでは、追加コストが発生します。これらのオプション機能は、ドキュメント抽出のシナリオに応じて有効または無効にすることができます。機能を有効にするには、関連付けられている機能名を features クエリ文字列プロパティに追加します。機能のコンマ区切りの一覧を指定することで、要求で複数のアドオン機能を有効にすることができます。次のアドオン機能は、2023-07-31 (GA) 以降のリリースで使用できます。

ocrHighResolution
formulas
styleFont
barcodes
languages

2024-07-31-preview リリース以降の読み取りモデルでは、検索可能な PDF 出力がサポートされています。

検索可能な PDF

Note

すべてのアドオン機能がすべてのモデルでサポートされているわけではありません。詳細については、モデルデータの抽出を参照してください。
現在、アドオン機能は Microsoft Office ファイルの種類ではサポートされていません。

Document Intelligence では、ドキュメント抽出シナリオに応じて有効または無効にできるオプション機能がサポートされています。次のアドオン機能は、2023-10-31-preview およびそれ以降のリリースで使用できます。

keyValuePairs
queryFields

Note

2023-10-30-プレビュー API でのクエリフィールドの実装は、前回のプレビューリリースとは異なります。新しい実装はコストが低く、構造化されたドキュメントで適切に動作します。

バージョンの可用性

アドオン機能	アドオン/無料	2024-02-29-preview	`2023-07-31` (GA)	`2022-08-31` (GA)	v2.1 (GA)
Font プロパティの抽出	アドオン	✔️	✔️	該当なし	該当なし
数式の抽出	アドオン	✔️	✔️	該当なし	該当なし
高解像度の抽出	アドオン	✔️	✔️	該当なし	該当なし
バーコード抽出	Free	✔️	✔️	該当なし	該当なし
言語検出	Free	✔️	✔️	該当なし	該当なし
キーと値のペア	Free	✔️	該当なし	なし	該当なし
クエリフィールド	アドオン*	✔️	該当なし	なし	該当なし

✱ アドオン - クエリフィールドは、他のアドオン機能とは価格設定が異なります。詳細については、価格のページを参照してください。

"サポートされているファイル形式"

PDF
画像: JPEG/JPG, PNG, BMP, TIFF, HEIF

✱ 現在、Microsoft Office ファイルはサポートされません。

高解像度の抽出

エンジニアリング図面のように、大きなサイズのドキュメントから小さなテキストを認識する作業は困難です。多くの場合、テキストは他のグラフィック要素と混在しており、それには、さまざまなフォント、サイズ、向きがあります。さらに、テキストを別のパーツに分割したり、他のシンボルと接続したりできます。ドキュメントインテリジェンスでは、これらの種類のドキュメントからコンテンツを抽出する ocr.highResolution 機能がサポートされるようになりました。このアドオン機能を有効にすると、A1/A2/A3 ドキュメントからのコンテンツ抽出の品質が向上します。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=ocrHighResolution

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.OCR_HIGH_RESOLUTION],  # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_with_highres]
if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}")

    if page.lines:
        for line_idx, line in enumerate(page.lines):
            words = get_words(page, line)
            print(
                f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
                f"within bounding polygon '{line.polygon}'"
            )

            for word in words:
                print(f"......Word '{word.content}' has a confidence of {word.confidence}")

    if page.selection_marks:
        for selection_mark in page.selection_marks:
            print(
                f"Selection mark is '{selection_mark.state}' within bounding polygon "
                f"'{selection_mark.polygon}' and has a confidence of {selection_mark.confidence}"
            )

if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"Table # {table_idx} has {table.row_count} rows and " f"{table.column_count} columns")
        if table.bounding_regions:
            for region in table.bounding_regions:
                print(f"Table # {table_idx} location on page: {region.page_number} is {region.polygon}")
        for cell in table.cells:
            print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
            if cell.bounding_regions:
                for region in cell.bounding_regions:
                    print(f"...content on page {region.page_number} is within bounding polygon '{region.polygon}'")

GitHub でサンプルを表示する。

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution

# Analyze a document at a URL:
url = "(https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-highres.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.OCR_HIGH_RESOLUTION]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_with_highres]
if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

for page in result.pages:
    print(f"----Analyzing layout from page #{page.page_number}----")
    print(
        f"Page has width: {page.width} and height: {page.height}, measured with unit: {page.unit}"
    )

    for line_idx, line in enumerate(page.lines):
        words = line.get_words()
        print(
            f"...Line # {line_idx} has word count {len(words)} and text '{line.content}' "
            f"within bounding polygon '{format_polygon(line.polygon)}'"
        )

        for word in words:
            print(
                f"......Word '{word.content}' has a confidence of {word.confidence}"
            )

    for selection_mark in page.selection_marks:
        print(
            f"Selection mark is '{selection_mark.state}' within bounding polygon "
            f"'{format_polygon(selection_mark.polygon)}' and has a confidence of {selection_mark.confidence}"
        )

for table_idx, table in enumerate(result.tables):
    print(
        f"Table # {table_idx} has {table.row_count} rows and "
        f"{table.column_count} columns"
    )
    for region in table.bounding_regions:
        print(
            f"Table # {table_idx} location on page: {region.page_number} is {format_polygon(region.polygon)}"
        )
    for cell in table.cells:
        print(
            f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
        )
        for region in cell.bounding_regions:
            print(
                f"...content on page {region.page_number} is within bounding polygon '{format_polygon(region.polygon)}'"
            )

GitHub でサンプルを表示する。

"styles": [true],
"pages": [
  {
    "page_number": 1,
    "width": 1000,
    "height": 800,
    "unit": "px",
    "lines": [
      {
        "line_idx": 1,
        "content": "This",
        "polygon": [10, 20, 30, 40],
        "words": [
          {
            "content": "This",
            "confidence": 0.98
          }
        ]
      }
    ],
    "selection_marks": [
      {
        "state": "selected",
        "polygon": [50, 60, 70, 80],
        "confidence": 0.91
      }
    ]
  }
],
"tables": [
  {
    "table_idx": 1,
    "row_count": 3,
    "column_count": 4,
    "bounding_regions": [
      {
        "page_number": 1,
        "polygon": [100, 200, 300, 400]
      }
    ],
    "cells": [
      {
        "row_index": 1,
        "column_index": 1,
        "content": "Content 1",
        "bounding_regions": [
          {
            "page_number": 1,
            "polygon": [110, 210, 310, 410]
          }
        ]
      }
    ]
  }
]

数式の抽出

この ocr.formula 機能は、formulas コレクション内のすべての識別された数式 (数式など) を、content の最上位オブジェクトとして抽出します。 content 内では、検出された数式は :formula: として表されます。このコレクションの各エントリは、数式の種類を inline または display として、LaTeX 表現を value として、その polygon 座標を含む数式を表します。最初は、各ページの最後に数式が表示されます。

Note

confidence スコアはハードコーディングされています。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=formulas

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.FORMULAS],  # Specify which add-on capabilities to enable
)
result: AnalyzeResult = poller.result()

# [START analyze_formulas]
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    if page.formulas:
        inline_formulas = [f for f in page.formulas if f.kind == "inline"]
        display_formulas = [f for f in page.formulas if f.kind == "display"]

        # To learn the detailed concept of "polygon" in the following content, visit: https://aka.ms/bounding-region
        print(f"Detected {len(inline_formulas)} inline formulas.")
        for formula_idx, formula in enumerate(inline_formulas):
            print(f"- Inline #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

        print(f"\nDetected {len(display_formulas)} display formulas.")
        for formula_idx, formula in enumerate(display_formulas):
            print(f"- Display #{formula_idx}: {formula.value}")
            print(f"  Confidence: {formula.confidence}")
            print(f"  Bounding regions: {formula.polygon}")

GitHub でサンプルを表示する。

"content": ":formula:",
 "pages": [
   {
     "pageNumber": 1,
     "formulas": [
       {
         "kind": "inline",
         "value": "\\frac { \\partial a } { \\partial b }",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       },
       {
         "kind": "display",
         "value": "y = a \\times b + a \\times c",
         "polygon": [...],
         "span": {...},
         "confidence": 0.99
       }
     ]
   }
 ]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/layout-formulas.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.FORMULAS]    # Specify which add-on capabilities to enable
)
result = poller.result()

# [START analyze_formulas]
for page in result.pages:
    print(f"----Formulas detected from page #{page.page_number}----")
    inline_formulas = [f for f in page.formulas if f.kind == "inline"]
    display_formulas = [f for f in page.formulas if f.kind == "display"]

    print(f"Detected {len(inline_formulas)} inline formulas.")
    for formula_idx, formula in enumerate(inline_formulas):
        print(f"- Inline #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

    print(f"\nDetected {len(display_formulas)} display formulas.")
    for formula_idx, formula in enumerate(display_formulas):
        print(f"- Display #{formula_idx}: {formula.value}")
        print(f"  Confidence: {formula.confidence}")
        print(f"  Bounding regions: {format_polygon(formula.polygon)}")

GitHub でサンプルを表示する。

 "content": ":formula:",
   "pages": [
     {
       "pageNumber": 1,
       "formulas": [
         {
           "kind": "inline",
           "value": "\\frac { \\partial a } { \\partial b }",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         },
         {
           "kind": "display",
           "value": "y = a \\times b + a \\times c",
           "polygon": [...],
           "span": {...},
           "confidence": 0.99
         }
       ]
     }
   ]

Font プロパティの抽出

ocr.font 機能は、styles コレクションで抽出されたテキストのすべてのフォントプロパティを、content の下の最上位オブジェクトとして抽出します。各スタイルオブジェクトは、1 つのフォントプロパティ、適用対象のテキストスパン、および対応する信頼度スコアを指定します。既存のスタイルプロパティは、テキストのフォントの similarFontFamily、斜体や標準などのスタイルの fontStyle、太字または標準の fontWeight、テキストの色の color など、より多くのフォントプロパティで拡張されています。 backgroundColor はテキスト境界ボックスの色です。

  {your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=styleFont

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_fonts]
# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)  # e.g., 'Arial, sans-serif
font_styles = defaultdict(list)  # e.g, 'italic'
font_weights = defaultdict(list)  # e.g., 'bold'
font_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format

if result.styles and any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")
    return

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
for style in result.styles:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

GitHub でサンプルを表示する。

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]

  {your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/receipt/receipt-with-tips.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.STYLE_FONT]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_fonts]
# DocumentStyle has the following font related attributes:
similar_font_families = defaultdict(list)   # e.g., 'Arial, sans-serif
font_styles = defaultdict(list)             # e.g, 'italic'
font_weights = defaultdict(list)            # e.g., 'bold'
font_colors = defaultdict(list)             # in '#rrggbb' hexadecimal format
font_background_colors = defaultdict(list)  # in '#rrggbb' hexadecimal format

if any([style.is_handwritten for style in result.styles]):
    print("Document contains handwritten content")
else:
    print("Document does not contain handwritten content")

print("\n----Fonts styles detected in the document----")

# Iterate over the styles and group them by their font attributes.
for style in result.styles:
    if style.similar_font_family:
        similar_font_families[style.similar_font_family].append(style)
    if style.font_style:
        font_styles[style.font_style].append(style)
    if style.font_weight:
        font_weights[style.font_weight].append(style)
    if style.color:
        font_colors[style.color].append(style)
    if style.background_color:
        font_background_colors[style.background_color].append(style)

print(f"Detected {len(similar_font_families)} font families:")
for font_family, styles in similar_font_families.items():
    print(f"- Font family: '{font_family}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_styles)} font styles:")
for font_style, styles in font_styles.items():
    print(f"- Font style: '{font_style}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_weights)} font weights:")
for font_weight, styles in font_weights.items():
    print(f"- Font weight: '{font_weight}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_colors)} font colors:")
for font_color, styles in font_colors.items():
    print(f"- Font color: '{font_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

print(f"\nDetected {len(font_background_colors)} font background colors:")
for font_background_color, styles in font_background_colors.items():
    print(f"- Font background color: '{font_background_color}'")
    print(f"  Text: '{get_styled_text(styles, result.content)}'")

GitHub でサンプルを表示する。

"content": "Foo bar",
"styles": [
   {
     "similarFontFamily": "Arial, sans-serif",
     "spans": [ { "offset": 0, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "similarFontFamily": "Times New Roman, serif",
     "spans": [ { "offset": 4, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "fontStyle": "italic",
     "spans": [ { "offset": 1, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "fontWeight": "bold",
     "spans": [ { "offset": 2, "length": 3 } ],
     "confidence": 0.98
   },
   {
     "color": "#FF0000",
     "spans": [ { "offset": 4, "length": 2 } ],
     "confidence": 0.98
   },
   {
     "backgroundColor": "#00FF00",
     "spans": [ { "offset": 5, "length": 2 } ],
     "confidence": 0.98
   }
 ]

バーコードプロパティの抽出

ocr.barcode 機能では、content の最上位オブジェクトとして、barcodes コレクション内の識別されたバーコードすべてを抽出します。 content 内では、検出されたバーコードは :barcode: として表されます。このコレクションの各エントリはバーコードを表し、バーコードの種類 kind、埋め込まれたバーコードの内容 value とその座標 polygon が含まれます。最初は、各ページの最後にバーコードが表示されます。 confidence は 1 としてハードコーディングされています。

サポートされているバーコードの種類

バーコードの種類	例
`QR Code`
`Code 39`
`Code 93`
`Code 128`
`UPC (UPC-A & UPC-E)`
`PDF417`
`EAN-8`
`EAN-13`
`Codabar`
`Databar`
`Databar` 展開済み
`ITF`
`Data Matrix`

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=barcodes

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-read",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_barcodes]
# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    if page.barcodes:
        print(f"Detected {len(page.barcodes)} barcodes:")
        for barcode_idx, barcode in enumerate(page.barcodes):
            print(f"- Barcode #{barcode_idx}: {barcode.value}")
            print(f"  Kind: {barcode.kind}")
            print(f"  Confidence: {barcode.confidence}")
            print(f"  Bounding regions: {barcode.polygon}")

GitHub でサンプルを表示する。

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-barcodes.jpg?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.BARCODES]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_barcodes]
# Iterate over extracted barcodes on each page.
for page in result.pages:
    print(f"----Barcodes detected from page #{page.page_number}----")
    print(f"Detected {len(page.barcodes)} barcodes:")
    for barcode_idx, barcode in enumerate(page.barcodes):
        print(f"- Barcode #{barcode_idx}: {barcode.value}")
        print(f"  Kind: {barcode.kind}")
        print(f"  Confidence: {barcode.confidence}")
        print(f"  Bounding regions: {format_polygon(barcode.polygon)}")

GitHub でサンプルを表示する。

----Barcodes detected from page #1----
Detected 2 barcodes:
- Barcode #0: 123456
  Kind: QRCode
  Confidence: 0.95
  Bounding regions: [10.5, 20.5, 30.5, 40.5]
- Barcode #1: 789012
  Kind: QRCode
  Confidence: 0.98
  Bounding regions: [50.5, 60.5, 70.5, 80.5]

言語検出

analyzeResult 要求に languages 機能を追加すると、analyzeResult の languages コレクション内の confidence と共に、各テキスト行で検出される主要言語が予測されます。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=languages

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.LANGUAGES]     # Specify which add-on capabilities to enable.
)
result: AnalyzeResult = poller.result()

# [START analyze_languages]
print("----Languages detected in the document----")
if result.languages:
    print(f"Detected {len(result.languages)} languages:")
    for lang_idx, lang in enumerate(result.languages):
        print(f"- Language #{lang_idx}: locale '{lang.locale}'")
        print(f"  Confidence: {lang.confidence}")
        print(
            f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'"
        )

GitHub でサンプルを表示する。

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    },
]

{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages

# Analyze a document at a URL:
url = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/add-on/add-on-fonts_and_languages.png?raw=true"
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-layout", document_url=url, features=[AnalysisFeature.LANGUAGES]    # Specify which add-on capabilities to enable.
)
result = poller.result()

# [START analyze_languages]
print("----Languages detected in the document----")
print(f"Detected {len(result.languages)} languages:")
for lang_idx, lang in enumerate(result.languages):
    print(f"- Language #{lang_idx}: locale '{lang.locale}'")
    print(f"  Confidence: {lang.confidence}")
    print(f"  Text: '{','.join([result.content[span.offset : span.offset + span.length] for span in lang.spans])}'")

GitHub でサンプルを表示する。

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    },
]

検索可能な PDF

検索可能な PDF 機能を使用すると、スキャン画像の PDF ファイルなどのアナログ PDF をテキストが埋め込まれた PDF に変換できます。埋め込みテキストにして、検出されたテキストエンティティを画像ファイルの上にオーバーレイすることで、PDF の抽出されたコンテンツ内でディープテキスト検索を実行できるようになります。

重要

現在、検索可能な PDF 機能は読み取り OCR モデル prebuilt-read でのみサポートされています。この機能を使用する場合、他のモデルの種類はこのプレビューバージョンに対してエラーを返すので、modelId を prebuilt-read と指定してください。
検索可能な PDF は 2024-07-31-preview prebuilt-read モデルに含まれており、一般的な PDF の使用には使用料がかかりません。

検索可能な PDF を使用する

検索可能な PDF を使用するには、Analyze 操作を使用して POST 要求を作成し、出力形式を pdf と指定します。


POST /documentModels/prebuilt-read:analyze?output=pdf
{...}
202

Analyze 操作が完了したら、GET 要求を発行して、Analyze 操作の結果を取得します。

正常に完了すると、PDF を取得して application/pdf としてダウンロードできます。この操作により、Base64 でエンコードされた JSON ではなく、埋め込みテキスト形式の PDF を直接ダウンロードできます。


// Monitor the operation until completion.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}
200
{...}

// Upon successful completion, retrieve the PDF as application/pdf.
GET /documentModels/prebuilt-read/analyzeResults/{resultId}/pdf
200 OK
Content-Type: application/pdf

キーと値のペア

以前のバージョンの API では、prebuilt-document モデルによってフォームとドキュメントからキーと値のペアが抽出されました。事前構築済みのレイアウトに keyValuePairs 機能が追加されたので、レイアウトモデルで同じ結果が生成されるようになりました。

キーと値のペアは、ラベルまたはキーとそれに関連付けられている応答または値を識別する、ドキュメント内の特定の範囲です。構造化されたフォームでは、これらのペアは、ラベルと、ユーザーがそのフィールドに入力した値である可能性があります。非構造化ドキュメントでは、段落内のテキストに基づいて契約が実行された日付である可能性があります。さまざまなドキュメントの種類、形式、構造に基づいて、識別可能なキーと値を抽出するために、AI モデルがトレーニングされています。

モデルによってキーの存在が検出されても、関連する値がない場合や、省略可能なフィールドの処理では、キーが単独で存在する可能性もあります。たとえば、一部のインスタンスでは、フォームのミドルネームフィールドを空白のままにすることができます。キーと値のペアは、常に、ドキュメントに含まれるテキストの範囲です。 "顧客" と "ユーザー" など、同じ値が異なる方法で記述されるドキュメントの場合、関連付けられているキーは、(コンテキストに基づき) 顧客またはユーザーのいずれかです。

REST API

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=keyValuePairs

クエリフィールド

クエリフィールドは、事前構築済みモデルから抽出されたスキーマを拡張したり、キー名が変数の場合に特定のキー名を定義したりするアドオン機能です。クエリフィールドを使用するには、機能を queryFields に設定し、 queryFields プロパティにフィールド名のコンマ区切りのリストを指定します。

ドキュメントインテリジェンスでクエリフィールドの抽出がサポートされるようになりました。クエリフィールド抽出を使用すると、トレーニングを追加しなくても、クエリ要求を使用して抽出プロセスにフィールドを追加できます。
事前構築済みまたはカスタムモデルのスキーマを拡張する必要がある場合、またはレイアウトの出力を含むいくつかのフィールドを抽出する必要がある場合は、クエリフィールドを使用します。
クエリフィールドはプレミアムアドオン機能です。最適な結果を得るには、複数単語のフィールド名にキャメルケースまたはパスカルケースフィールド名を使用して抽出するフィールドを定義します。
クエリフィールドは、要求ごとに最大 20 個のフィールドをサポートします。ドキュメントにフィールドの値が含まれている場合は、フィールドと値が返されます。
このリリースには、以前の実装よりも価格が低く、検証する必要があるクエリフィールド機能の新しい実装があります。

Note

現在、Document Intelligence Studio のクエリフィールド抽出は、US tax モデル (W2、1098s、1099s モデル) を除いて、レイアウトおよび事前構築済みのモデル 2024-02-29-preview 2023-10-31-preview API とそれ以降のリリースで使用できます。

クエリフィールドの抽出

クエリフィールド抽出の場合は、抽出するフィールドを指定すると、Document Intelligence により、それに応じてドキュメントが分析されます。次に例を示します。

Document Intelligence Studio でコントラクトを処理する場合は、2024-02-29-preview または 2023-10-31-preview バージョンを使用してください。
analyze document 要求の一部として、Party1、Party2、TermsOfUse、PaymentTerms、PaymentDate、TermEndDate などのフィールドラベルのリストを渡すことができます。
Document Intelligenceでは、フィールドデータを分析して抽出し、構造化された JSON 出力で値を返します。
クエリフィールドに加えて、応答にはテキスト、テーブル、選択マーク、およびその他の関連データが含まれます。

{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=queryFields&queryFields=TERMS

# Analyze a document at a URL:
formUrl = "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/invoice/simple-invoice.png?raw=true"
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=formUrl),
    features=[DocumentAnalysisFeature.QUERY_FIELDS],    # Specify which add-on capabilities to enable.
    query_fields=["Address", "InvoiceNumber"],  # Set the features and provide a comma-separated list of field names.
)
result: AnalyzeResult = poller.result()
print("Here are extra fields in result:\n")
if result.documents:
    for doc in result.documents:
        if doc.fields and doc.fields["Address"]:
            print(f"Address: {doc.fields['Address'].value_string}")
        if doc.fields and doc.fields["InvoiceNumber"]:
            print(f"Invoice number: {doc.fields['InvoiceNumber'].value_string}")

GitHub でサンプルを表示する。

Address: 1 Redmond way Suite 6000 Redmond, WA Sunnayvale, 99243
Invoice number: 34278587

次のステップ

詳細情報: 読み取りモデル レイアウトモデル

SDK サンプル: python

その他のサンプルを見つける: アドオン機能

次の方法で共有

ドキュメントインテリジェンスアドオン機能

機能

バージョンの可用性

"サポートされているファイル形式"

高解像度の抽出

数式の抽出

Font プロパティの抽出

バーコードプロパティの抽出

サポートされているバーコードの種類

言語検出

検索可能な PDF

検索可能な PDF を使用する

キーと値のペア

REST API

クエリフィールド

クエリフィールドの抽出

次のステップ

フィードバック

その他のリソース

次の方法で共有

ドキュメント インテリジェンス アドオン機能

機能

バージョンの可用性

"サポートされているファイル形式"

高解像度の抽出

数式の抽出

Font プロパティの抽出

バーコード プロパティの抽出

サポートされているバーコードの種類

言語検出

検索可能な PDF

検索可能な PDF を使用する

キーと値のペア

REST API

クエリ フィールド

クエリ フィールドの抽出

次のステップ

フィードバック

その他のリソース

ドキュメントインテリジェンスアドオン機能

バーコードプロパティの抽出

クエリフィールド

クエリフィールドの抽出