Azure Document Intelligence OCR extraction

Sriramsubramaniyan Nadarajan 76 Reputation points
2023-09-27T13:30:18.81+00:00

Hi Team,

We are planning to use the Azure Document Intelligence service for OCR of specific forms. We are using the general document layout model and are able to get output, but I have a few questions.

Our requirement is to extract key-value pairs for specific fields, along with their confidence levels.

I can see that these are all extracted through different components in the sample code, but we are unable to relate them to one another.

For example, key-value pairs are fetched from result.key_value_pairs, while words and tables come from separate loops: page in result.pages, word in page.words, and table_idx, table in enumerate(result.tables).

Could you please suggest some sample code so that our output looks like the following?

Key - Name

Value - Microsoft

Confidence - 100

If any checkboxes are present, the output should look like this:

Key - Name

Value - Microsoft

Confidence - 100

Selection mark - selected

We are using the Python SDK; any suggestions would be very helpful. Thanks.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. VasaviLankipalle-MSFT 15,956 Reputation points
    2023-09-27T23:21:05.5733333+00:00

    Hello @Sriramsubramaniyan Nadarajan, thanks for using the Microsoft Q&A platform.

    Here is the Python SDK sample code for the general document model. You can use the code snippet below to print the key-value pairs along with their confidence scores, as per your requirement. The full sample code is available on GitHub: https://github.com/Azure/azure-sdk-for-python/tree/azure-ai-formrecognizer_3.3.0/sdk/formrecognizer/azure-ai-formrecognizer/samples

    # `result` is the analysis result returned by the general document model;
    # format_bounding_region() is a helper taken from the linked sample code.
    print("----Key-value pairs found in document----")
    for kv_pair in result.key_value_pairs:
        if kv_pair.key:
            print(
                "Key '{}' found within '{}' bounding regions".format(
                    kv_pair.key.content,
                    format_bounding_region(kv_pair.key.bounding_regions),
                )
            )
        if kv_pair.value:
            print(
                "Value '{}' found within '{}' bounding regions".format(
                    kv_pair.value.content,
                    format_bounding_region(kv_pair.value.bounding_regions),
                )
            )
        # Confidence is reported on a 0-1 scale; convert it to a percentage.
        print("Confidence score: {}%\n".format(kv_pair.confidence * 100))
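
To get closer to the Key / Value / Confidence / Selection mark layout asked for in the question, the same loop can collect one record per pair instead of printing bounding regions. The following is only a rough sketch, not part of the official sample: the names summarize_key_value_pairs and print_rows are made up for illustration, and it assumes that a checkbox value shows up in kv_pair.value.content as the token ":selected:" or ":unselected:", which you should verify against your own documents. The example at the bottom uses stand-in objects rather than a real service call.

```python
from types import SimpleNamespace

# Assumption: checkbox values appear in the value content as these tokens.
# The unselected token is checked first so the order of checks is explicit.
SELECTION_TOKENS = (
    (":unselected:", "unselected"),
    (":selected:", "selected"),
)


def summarize_key_value_pairs(result):
    """Build one dict per key-value pair (hypothetical helper)."""
    rows = []
    for kv_pair in result.key_value_pairs or []:
        key = kv_pair.key.content if kv_pair.key else None
        value = kv_pair.value.content if kv_pair.value else None
        row = {
            "Key": key,
            "Value": value,
            # Confidence is on a 0-1 scale; report it as a percentage.
            "Confidence": round(kv_pair.confidence * 100, 2),
        }
        # Add a selection mark entry only when the value holds a checkbox token.
        for token, state in SELECTION_TOKENS:
            if value and token in value:
                row["Selection mark"] = state
                break
        rows.append(row)
    return rows


def print_rows(rows):
    """Print each record in the 'Name - value' layout from the question."""
    for row in rows:
        for name, val in row.items():
            print(f"{name} - {val}")
        print()


if __name__ == "__main__":
    # Stand-in data shaped like the fields used above; not real service output.
    fake_result = SimpleNamespace(
        key_value_pairs=[
            SimpleNamespace(
                key=SimpleNamespace(content="Name"),
                value=SimpleNamespace(content="Microsoft"),
                confidence=1.0,
            ),
            SimpleNamespace(
                key=SimpleNamespace(content="Subscribed"),
                value=SimpleNamespace(content=":selected:"),
                confidence=0.5,
            ),
        ]
    )
    print_rows(summarize_key_value_pairs(fake_result))
```

With a real analysis result you would pass `result` in place of `fake_result`. Note that the confidence here belongs to the key-value pair as a whole. If you need the confidence of the checkbox itself, the entries in page.selection_marks carry their own state and confidence, and matching them to a value would be a separate step.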
    

    I hope this helps.

    Regards,
    Vasavi
