Azure AI Document Intelligence client library for Python, version 1.0.0b1

Article
11/22/2023

Azure AI Document Intelligence (previously known as Form Recognizer) is a cloud service that uses machine learning to analyze text and structured data from your documents. It includes the following main features:

Layout - Extract content and structure (e.g., words, selection marks, tables) from documents.
Document - Analyze key-value pairs in addition to general layout from documents.
Read - Read page information from documents.
Prebuilt - Extract common field values from select document types (e.g., receipts, invoices, business cards, ID documents, U.S. W-2 tax documents, among others) using prebuilt models.
Custom - Build custom models from your own data to extract tailored field values in addition to general layout from documents.
Classifiers - Build custom classification models that combine layout and language features to accurately detect and identify the documents you process within your application.
Add-on capabilities - Extract barcodes/QR codes, formulas, fonts/styles, etc., or enable high-resolution mode for large documents with optional parameters.

Getting started

Install the package

python -m pip install azure-ai-documentintelligence

Prerequisites

Python 3.7 or later is required to use this package.
You need an Azure subscription to use this package.
You need an existing Azure AI Document Intelligence instance.

Create a Cognitive Services or Document Intelligence resource

Document Intelligence supports both multi-service and single-service access. Create a Cognitive Services resource if you plan to access multiple cognitive services under a single endpoint/key. For Document Intelligence access only, create a Document Intelligence resource. Note that you will need a single-service resource if you intend to use Azure Active Directory authentication.

You can create either resource using:

Option 1: Azure Portal.
Option 2: Azure CLI.

Below is an example of how you can create a Document Intelligence resource using the CLI:

# Create a new resource group to hold the Document Intelligence resource
# if using an existing resource group, skip this step
az group create --name <your-resource-group-name> --location <location>

# Create the Document Intelligence resource
az cognitiveservices account create \
    --name <your-resource-name> \
    --resource-group <your-resource-group-name> \
    --kind FormRecognizer \
    --sku <sku> \
    --location <location> \
    --yes

For more information about creating the resource, or about how to get the location and SKU information, see here.

Authenticate the client

To interact with the Document Intelligence service, you need to create an instance of a client. An endpoint and a credential are required to instantiate the client object.

Get the endpoint

You can find the endpoint for your Document Intelligence resource using the Azure Portal or the Azure CLI:

# Get the endpoint for the Document Intelligence resource
az cognitiveservices account show --name "<resource-name>" --resource-group "<resource-group-name>" --query "properties.endpoint"

Either a regional endpoint or a custom subdomain can be used for authentication. They are formatted as follows:

Regional endpoint: https://<region>.api.cognitive.microsoft.com/
Custom subdomain: https://<resource-name>.cognitiveservices.azure.com/

A regional endpoint is the same for every resource in a region. A complete list of supported regional endpoints can be consulted here. Note that regional endpoints do not support AAD authentication.

A custom subdomain, on the other hand, is a name that is unique to the Document Intelligence resource. Custom subdomains can only be used by single-service resources.
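AAD authentication only works with a custom subdomain, so it can help to check which kind of endpoint a configuration value holds before you choose a credential. The following is a minimal standard-library sketch and is not part of the SDK. The helper names `endpoint_kind` and `supports_aad` are hypothetical. The check relies only on the two host formats listed above, so any other host (for example, in a sovereign cloud) is reported as "unknown".

```python
from urllib.parse import urlparse

# Host suffixes taken from the two endpoint formats listed above.
REGIONAL_SUFFIX = ".api.cognitive.microsoft.com"
CUSTOM_SUBDOMAIN_SUFFIX = ".cognitiveservices.azure.com"


def endpoint_kind(endpoint: str) -> str:
    """Return "regional", "custom-subdomain", or "unknown" for an endpoint URL."""
    host = (urlparse(endpoint).hostname or "").lower()
    if host.endswith(REGIONAL_SUFFIX):
        return "regional"
    if host.endswith(CUSTOM_SUBDOMAIN_SUFFIX):
        return "custom-subdomain"
    return "unknown"


def supports_aad(endpoint: str) -> bool:
    """Regional endpoints do not support AAD authentication; custom subdomains do."""
    return endpoint_kind(endpoint) == "custom-subdomain"


if __name__ == "__main__":
    for url in (
        "https://westeurope.api.cognitive.microsoft.com/",
        "https://my-docintel.cognitiveservices.azure.com/",
    ):
        print(url, endpoint_kind(url), supports_aad(url))
```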

Get the API key

The API key can be found in the Azure Portal or by running the following Azure CLI command:

az cognitiveservices account keys list --name "<resource-name>" --resource-group "<resource-group-name>"

Create the client with AzureKeyCredential

To use an API key as the credential parameter, pass the key as a string into an instance of AzureKeyCredential.

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
document_analysis_client = DocumentIntelligenceClient(endpoint, credential)

Create the client with an Azure Active Directory credential

AzureKeyCredential authentication is used in the examples in this getting started guide, but you can also authenticate with Azure Active Directory using the azure-identity library. Note that regional endpoints do not support AAD authentication; create a custom subdomain name for your resource in order to use this type of authentication.

To use the DefaultAzureCredential type shown below, or other credential types provided with the Azure SDK, install the azure-identity package:

pip install azure-identity

You will also need to register a new AAD application and grant access to Document Intelligence by assigning the "Cognitive Services User" role to your service principal.

Once completed, set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.

"""DefaultAzureCredential will use the values from these environment
variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
"""
import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.identity import DefaultAzureCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
credential = DefaultAzureCredential()

document_analysis_client = DocumentIntelligenceClient(endpoint, credential)
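DefaultAzureCredential tries several credential sources in turn, so a missing service-principal variable may not show up as an obvious error. A small pre-flight check can make this kind of misconfiguration easier to spot. The sketch below uses a hypothetical helper, `missing_aad_env_vars`, which is not an SDK function. It only covers the three variables named above and says nothing about the other sources DefaultAzureCredential may fall back to.

```python
import os
from typing import List, Mapping, Optional

# The three variables this guide asks you to set for the AAD application.
AAD_ENV_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_aad_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of AAD service-principal variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in AAD_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aad_env_vars()
    print("Missing AAD variables:", ", ".join(missing) if missing else "none")
```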

Key concepts

DocumentIntelligenceClient

DocumentIntelligenceClient provides operations for analyzing input documents using prebuilt and custom models through the begin_analyze_document API. Use the model_id parameter to select the type of model for analysis. See a full list of supported models here. DocumentIntelligenceClient also provides operations for classifying documents through the begin_classify_document API. Custom classification models can classify each page in an input file to identify the document(s) within it. They can also identify multiple documents, or multiple instances of a single document, within an input file.

Sample code snippets that illustrate using a DocumentIntelligenceClient are provided here. More information about analyzing documents, including supported features, locales, and document types, can be found in the service documentation.
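A classification result reports each identified document together with the page(s) it covers, so a page-to-type lookup is a natural post-processing step. The sketch below assumes plain dictionaries shaped like the service's JSON response (`docType`, `boundingRegions`, `pageNumber`). With the SDK's result objects, you would more typically use attribute access such as `document.doc_type` and `region.page_number`. The function name `pages_by_doc_type` is hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping


def pages_by_doc_type(documents: Iterable[Mapping[str, Any]]) -> Dict[int, str]:
    """Map each page number to the docType of the classified document that covers it."""
    page_types: Dict[int, str] = {}
    for document in documents:
        # A document can span several pages; each bounding region names the page it is on.
        for region in document.get("boundingRegions") or []:
            page_types[region["pageNumber"]] = document["docType"]
    return page_types


if __name__ == "__main__":
    # Hypothetical result: an invoice on pages 1-2 followed by a receipt on page 3.
    documents = [
        {"docType": "invoice", "boundingRegions": [{"pageNumber": 1}, {"pageNumber": 2}]},
        {"docType": "receipt", "boundingRegions": [{"pageNumber": 3}]},
    ]
    for page, doc_type in sorted(pages_by_doc_type(documents).items()):
        print(f"Page {page}: {doc_type}")
```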

DocumentIntelligenceAdministrationClient

DocumentIntelligenceAdministrationClient provides operations for:

Building custom models to analyze specific fields you specify by labeling your custom documents. A DocumentModelDetails is returned, indicating the document type(s) the model can analyze as well as the estimated confidence for each field. See the service documentation for a more detailed explanation.
Creating a composed model from a collection of existing models.
Managing models created in your account.
Listing operations, or getting a specific model operation created within the last 24 hours.
Copying a custom model from one Document Intelligence resource to another.
Building and managing a custom classification model to classify the documents you process within your application.

Note that models can also be built using a graphical user interface such as Document Intelligence Studio.

Sample code snippets that illustrate using a DocumentIntelligenceAdministrationClient are provided here.

Long-running operations

Long-running operations consist of an initial request sent to the service to start an operation, followed by polling the service at intervals to determine whether the operation has completed or failed and, if it has succeeded, to get the result.

Methods that analyze documents, build models, or copy/compose models are modeled as long-running operations. The client exposes a begin_<method-name> method that returns an LROPoller or AsyncLROPoller. Callers should wait for the operation to complete by calling result() on the poller object returned from the begin_<method-name> method. Sample code snippets illustrating long-running operations are provided below.
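LROPoller implements this polling loop for you, so the SDK examples only need `poller.result()`. To make the pattern concrete, here is a standard-library sketch of the same idea. `check_status` stands in for a hypothetical call that fetches the operation status. The status strings are modeled loosely on the service's and are not taken from the SDK.

```python
import time
from typing import Any, Callable, Tuple

# Status strings assumed for this sketch; they are modeled loosely on the service.
SUCCEEDED = "succeeded"
FAILED_STATES = {"failed", "canceled"}


def poll_until_done(
    check_status: Callable[[], Tuple[str, Any]],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Any:
    """Poll check_status() every `interval` seconds until the operation finishes.

    check_status is a hypothetical callable returning (status, result); the
    result is only meaningful once the status is "succeeded".
    """
    deadline = time.monotonic() + timeout
    while True:
        status, result = check_status()
        if status == SUCCEEDED:
            return result
        if status in FAILED_STATES:
            raise RuntimeError(f"Operation ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Operation did not finish before the timeout")
        time.sleep(interval)


if __name__ == "__main__":
    # A fake operation that reports two in-progress states before it succeeds.
    responses = iter(
        [("notStarted", None), ("running", None), ("succeeded", {"pages": 2})]
    )
    print(poll_until_done(lambda: next(responses), interval=0.01))
```

Only the polling half is shown. In the SDK, the initial request that starts the operation is sent by the begin_<method-name> call itself.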

Examples

The following section provides several code snippets covering some of the most common Document Intelligence tasks, including:

Extract layout
Use prebuilt models
Build a custom model
Analyze documents using a custom model
Manage your models
Add-on capabilities

Extract layout

Extract text, selection marks, text styles, and table structures, along with their bounding region coordinates, from documents.

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient


def _in_span(word, spans):
    # A word belongs to a line when the word's span lies inside one of the line's spans
    for span in spans:
        if word.span.offset >= span.offset and (word.span.offset + word.span.length) <= (
            span.offset + span.length
        ):
            return True
    return False


def _get_words(page, line):
    # Collect the words on the page that fall within the given line
    return [word for word in (page.words or []) if _in_span(word, line.spans)]


endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
path_to_sample_documents = "<path to your document>"

document_intelligence_client = DocumentIntelligenceClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-layout", analyze_request=f, content_type="application/octet-stream"
    )
result = poller.result()

# Optional collections (styles, lines, selection marks, tables) may be None, so guard them
if result.styles and any(style.is_handwritten for style in result.styles):
    print("Document contains handwritten content")
else:
    print("Document contains no handwritten content")

for page in result.pages:
    print("----Analyzing layout from page #{}----".format(page.page_number))
    print(
        "Page has width: {} and height: {}, measured with unit: {}".format(
            page.width, page.height, page.unit
        )
    )

    for line_idx, line in enumerate(page.lines or []):
        words = _get_words(page, line)
        print(
            "...Line # {} has word count {} and text '{}' within bounding polygon '{}'".format(
                line_idx,
                len(words),
                line.content,
                line.polygon,
            )
        )

        for word in words:
            print(
                "......Word '{}' has a confidence of {}".format(
                    word.content, word.confidence
                )
            )

    for selection_mark in page.selection_marks or []:
        print(
            "...Selection mark is '{}' within bounding polygon '{}' and has a confidence of {}".format(
                selection_mark.state,
                selection_mark.polygon,
                selection_mark.confidence,
            )
        )

for table_idx, table in enumerate(result.tables or []):
    print(
        "Table # {} has {} rows and {} columns".format(
            table_idx, table.row_count, table.column_count
        )
    )
    for region in table.bounding_regions or []:
        print(
            "Table # {} location on page: {} is {}".format(
                table_idx,
                region.page_number,
                region.polygon,
            )
        )
    for cell in table.cells:
        print(
            "...Cell[{}][{}] has content '{}'".format(
                cell.row_index,
                cell.column_index,
                cell.content,
            )
        )
        for region in cell.bounding_regions or []:
            print(
                "...content on page {} is within bounding polygon '{}'".format(
                    region.page_number,
                    region.polygon,
                )
            )

print("----------------------------------------")

Use prebuilt models

Extract fields from select document types such as receipts, invoices, business cards, identity documents, and U.S. W-2 tax documents using prebuilt models provided by the Document Intelligence service.

For example, to analyze fields from a sales receipt, use the prebuilt receipt model by passing model_id="prebuilt-receipt" to the begin_analyze_document method:

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient


def format_price(price_dict):
    # valueCurrency holds a mapping such as {"amount": 14.5, "currencySymbol": "$"}
    if price_dict is None:
        return "N/A"
    return f"{price_dict.get('currencySymbol', '')}{price_dict.get('amount')}"


endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
path_to_sample_documents = "<path to your receipt>"

document_analysis_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
with open(path_to_sample_documents, "rb") as f:
    poller = document_analysis_client.begin_analyze_document(
        "prebuilt-receipt", analyze_request=f, locale="en-US", content_type="application/octet-stream"
    )
receipts = poller.result()

for idx, receipt in enumerate(receipts.documents):
    print(f"--------Analysis of receipt #{idx + 1}--------")
    print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
    merchant_name = receipt.fields.get("MerchantName")
    if merchant_name:
        print(f"Merchant Name: {merchant_name.get('valueString')} has confidence: " f"{merchant_name.confidence}")
    transaction_date = receipt.fields.get("TransactionDate")
    if transaction_date:
        print(
            f"Transaction Date: {transaction_date.get('valueDate')} has confidence: "
            f"{transaction_date.confidence}"
        )
    if receipt.fields.get("Items"):
        print("Receipt items:")
        # Use a separate index so the outer receipt index is not shadowed
        for item_idx, item in enumerate(receipt.fields.get("Items").get("valueArray")):
            print(f"...Item #{item_idx + 1}")
            item_description = item.get("valueObject").get("Description")
            if item_description:
                print(
                    f"......Item Description: {item_description.get('valueString')} has confidence: "
                    f"{item_description.confidence}"
                )
            item_quantity = item.get("valueObject").get("Quantity")
            if item_quantity:
                # Quantity is a numeric field, so read valueNumber rather than valueString
                print(
                    f"......Item Quantity: {item_quantity.get('valueNumber')} has confidence: "
                    f"{item_quantity.confidence}"
                )
            item_total_price = item.get("valueObject").get("TotalPrice")
            if item_total_price:
                print(
                    f"......Total Item Price: {format_price(item_total_price.get('valueCurrency'))} has confidence: "
                    f"{item_total_price.confidence}"
                )
    subtotal = receipt.fields.get("Subtotal")
    if subtotal:
        print(f"Subtotal: {format_price(subtotal.get('valueCurrency'))} has confidence: {subtotal.confidence}")
    tax = receipt.fields.get("TotalTax")
    if tax:
        print(f"Total tax: {format_price(tax.get('valueCurrency'))} has confidence: {tax.confidence}")
    tip = receipt.fields.get("Tip")
    if tip:
        print(f"Tip: {format_price(tip.get('valueCurrency'))} has confidence: {tip.confidence}")
    total = receipt.fields.get("Total")
    if total:
        print(f"Total: {format_price(total.get('valueCurrency'))} has confidence: {total.confidence}")
    print("--------------------------------------")

You're not limited to receipts! There are several prebuilt models to choose from, each with its own set of supported fields. See the other supported prebuilt models here.
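Each prebuilt model returns different fields, and each field stores its value under a key that depends on its type (`valueString`, `valueDate`, `valueCurrency` and so on, as in the receipt example). A generic reader can save you from writing one accessor per field. This sketch assumes the key is always `value` followed by the capitalized field type. That matches the keys used above, but it is an assumption rather than a documented guarantee. `field_value` is a hypothetical name. The sketch only calls `.get()`, so it should work on plain dictionaries as well as on field objects that support mapping access, like those above.

```python
from typing import Any, Mapping, Optional


def field_value(field: Optional[Mapping[str, Any]]) -> Any:
    """Return a field's typed value, falling back to its raw text content.

    Assumes the typed value lives under "value" + the capitalized field type,
    e.g. type "currency" -> "valueCurrency", type "phoneNumber" -> "valuePhoneNumber".
    """
    if field is None:
        return None
    field_type = field.get("type")
    if field_type:
        value = field.get("value" + field_type[0].upper() + field_type[1:])
        if value is not None:
            return value
    # No typed value was extracted; use the text the field was read from, if any.
    return field.get("content")


if __name__ == "__main__":
    total = {
        "type": "currency",
        "valueCurrency": {"amount": 14.5, "currencySymbol": "$"},
        "content": "$14.50",
    }
    print(field_value(total))
```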

Build a custom model

Build a custom model on your own document type. The resulting model can be used to analyze values from the types of documents it was trained on. Provide a container SAS URL to the Azure Storage Blob container where you store the training documents.

More details on setting up a container and the required file structure can be found in the service documentation.
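A quick local check on the container URL before you call the service can catch simple mistakes, such as a URL copied without its SAS token. The sketch below is a rough heuristic and does not validate permissions or expiry. It assumes the public-cloud blob host suffix `.blob.core.windows.net`. It also relies on the fact that every SAS token carries a `sig` (signature) query parameter. `looks_like_container_sas_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_container_sas_url(url: str) -> bool:
    """Rough sanity check: https, a blob-storage host, a container path, and a SAS signature."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # Public-cloud blob endpoints only; sovereign clouds use other suffixes.
    if not (parsed.hostname or "").endswith(".blob.core.windows.net"):
        return False
    # The path should name a container, e.g. "/training-docs".
    if not parsed.path.strip("/"):
        return False
    # A SAS token always includes its signature in the "sig" query parameter.
    return "sig" in parse_qs(parsed.query)


if __name__ == "__main__":
    print(looks_like_container_sas_url("https://myaccount.blob.core.windows.net/training"))
```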

import os
import uuid

from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    AzureBlobContentSource,
    BuildDocumentModelRequest,
    DocumentBuildMode,
)
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
container_sas_url = os.environ["CONTAINER_SAS_URL"]

document_model_admin_client = DocumentIntelligenceAdministrationClient(
    endpoint, AzureKeyCredential(key)
)
poller = document_model_admin_client.begin_build_document_model(
    BuildDocumentModelRequest(
        model_id=str(uuid.uuid4()),  # the build request needs a model ID; a random one is used here
        build_mode=DocumentBuildMode.TEMPLATE,
        azure_blob_source=AzureBlobContentSource(container_url=container_sas_url),
        description="my model description",
    )
)
model = poller.result()

print(f"Model ID: {model.model_id}")
print(f"Description: {model.description}")
print(f"Model created on: {model.created_date_time}")
print(f"Model expires on: {model.expiration_date_time}")
print("Doc types the model can recognize:")
for name, doc_type in model.doc_types.items():
    print(
        f"Doc Type: '{name}' built with '{doc_type.build_mode}' mode which has the following fields:"
    )
    for field_name, field in doc_type.field_schema.items():
        # field_confidence may be missing, so guard the lookup
        confidence = doc_type.field_confidence.get(field_name) if doc_type.field_confidence else None
        print(
            f"Field: '{field_name}' has type '{field['type']}' and confidence score {confidence}"
        )

Analyze documents using a custom model

Analyze document fields, tables, selection marks, and more. These models are trained with your own data, so they're tailored to your documents. For best results, only analyze documents of the same document type that the custom model was built with.

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
# Use the ID of a model you built, for example the one from "Build a custom model"
model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", "<your custom model ID>")
path_to_sample_documents = "<path to your document>"

document_analysis_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Make sure your document's type is included in the list of document types the custom model can analyze
with open(path_to_sample_documents, "rb") as f:
    poller = document_analysis_client.begin_analyze_document(
        model_id=model_id, analyze_request=f, content_type="application/octet-stream"
    )
result = poller.result()

for idx, document in enumerate(result.documents or []):
    print(f"--------Analyzing document #{idx + 1}--------")
    print(f"Document has type {document.doc_type}")
    print(f"Document has document type confidence {document.confidence}")
    print(f"Document was analyzed with model with ID {result.model_id}")
    for name, field in document.fields.items():
        field_value = field.get("valueString") if field.get("valueString") else field.content
        print(
            f"......found field of type '{field.type}' with value '{field_value}' and with confidence {field.confidence}"
        )

# iterate over lines, words, and selection marks on each page
for page in result.pages:
    print(f"\nLines found on page {page.page_number}")
    for line in page.lines or []:
        print(f"...Line '{line.content}'")
    for word in page.words or []:
        print(f"...Word '{word.content}' has a confidence of {word.confidence}")
    if page.selection_marks:
        print(f"\nSelection marks found on page {page.page_number}")
        for selection_mark in page.selection_marks:
            print(
                f"...Selection mark is '{selection_mark.state}' and has a confidence of {selection_mark.confidence}"
            )

# then iterate over the tables found in the document
for i, table in enumerate(result.tables or []):
    print(f"\nTable {i + 1} can be found on page:")
    for region in table.bounding_regions or []:
        print(f"...{region.page_number}")
    for cell in table.cells:
        print(f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'")
print("-----------------------------------")

A document URL can also be used to analyze documents with the begin_analyze_document method, by passing an AnalyzeDocumentRequest with url_source set:

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_analysis_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/tests/sample_forms/receipt/contoso-receipt.png"
poller = document_analysis_client.begin_analyze_document("prebuilt-receipt", AnalyzeDocumentRequest(url_source=url))
receipts = poller.result()

Manage your models

Manage the custom models attached to your account.

from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import ResourceNotFoundError

endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")

document_model_admin_client = DocumentIntelligenceAdministrationClient(endpoint, credential)

account_details = document_model_admin_client.get_resource_info()
print("Our account has {} custom models, and we can have at most {} custom models".format(
    account_details.custom_document_models.count, account_details.custom_document_models.limit
))

# Here we get a paged list of all of our models
models = document_model_admin_client.list_models()
print("We have models with the following ids: {}".format(
    ", ".join([m.model_id for m in models])
))

# Replace with the custom model ID from the "Build a model" sample
model_id = "<model_id from the Build a Model sample>"

custom_model = document_model_admin_client.get_model(model_id=model_id)
print("Model ID: {}".format(custom_model.model_id))
print("Description: {}".format(custom_model.description))
print("Model created on: {}\n".format(custom_model.created_on))

# Finally, we will delete this model by ID
document_model_admin_client.delete_model(model_id=custom_model.model_id)

try:
    document_model_admin_client.get_model(model_id=custom_model.model_id)
except ResourceNotFoundError:
    print("Successfully deleted model with id {}".format(custom_model.model_id))

Add-on capabilities

Document Intelligence supports more sophisticated analysis capabilities. These optional features can be enabled or disabled depending on the document extraction scenario.

Add-on capabilities are available for API version 2023-07-31 (GA) and later releases.

Note that some add-on capabilities incur additional charges. See pricing: https://azure.microsoft.com/pricing/details/ai-document-intelligence/.

Troubleshooting

General

The Document Intelligence client library raises exceptions defined in Azure Core. Error codes and messages raised by the Document Intelligence service can be found in the service documentation.

Logging

This library uses the standard logging library for logging.

Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.

Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on the client or per operation with the logging_enable keyword argument.

See the full SDK logging documentation with examples here.
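The sketch below shows a minimal logging setup that sends the SDK's log output to standard output. It relies on the Azure SDK for Python logging under the `azure` logger namespace. The handler and format choices are only illustrative. The client construction in the final comment only indicates where `logging_enable=True` would go.

```python
import logging
import sys

# Azure SDK libraries log under the "azure" logger hierarchy; configure it once at start-up.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# With the level at DEBUG, pass logging_enable=True to the client (or to a single
# operation) to also log request/response bodies, for example:
#   client = DocumentIntelligenceClient(endpoint, credential, logging_enable=True)
```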

Optional configuration

Optional keyword arguments can be passed in at the client and per-operation level. The azure-core reference documentation describes available configurations for retries, logging, transport protocols, and more.

Next steps

More sample code

See the sample README for several code snippets illustrating common patterns used in the Document Intelligence Python API.

Additional documentation

For more extensive documentation on Azure AI Document Intelligence, see the Document Intelligence documentation on docs.microsoft.com.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.