Azure Document Intelligence OCR extraction

Sriramsubramaniyan Nadarajan 76 Reputation points
2023-09-27T13:30:18.81+00:00

Hi Team,

We are planning to use the Azure Document Intelligence service for OCR of specific forms. We are using the general document model and are able to get output, but I have a few queries.

Our requirement is to extract key-value pairs for specific fields, along with their confidence levels.

I can see in the sample code that these are all extracted through different components; however, we are unable to relate them to one another.

For example, key-value pairs are fetched from result.key_value_pairs, while the other elements come from separate loops: page in result.pages, word in page.words, and table_idx, table in enumerate(result.tables).

Can you please suggest some sample code so that our output looks like the following?

Key - Name

Value - Microsoft

Confidence - 100

If any checkboxes are present, the output should look like this:

Key - Name

Value - Microsoft

Confidence - 100

Selection mark - selected

We are using the Python SDK; any suggestions would be very helpful. Thanks.

Azure AI Document Intelligence

Answer accepted by question author
  1. VasaviLankipalle-MSFT 18,716 Reputation points Moderator
    2023-09-27T23:21:05.5733333+00:00

    Hello @Sriramsubramaniyan Nadarajan, thanks for using the Microsoft Q&A platform.

    Here is the Python SDK sample code for the general document model. You can use the code snippet below to print the key-value pairs along with their confidence scores, as per your requirement. The full sample code is available on GitHub: https://github.com/Azure/azure-sdk-for-python/tree/azure-ai-formrecognizer_3.3.0/sdk/formrecognizer/azure-ai-formrecognizer/samples

    # `result` is the analysis result returned by the general document model.
    # format_bounding_region is a helper taken from the linked SDK sample.
    print("----Key-value pairs found in document----")
    for kv_pair in result.key_value_pairs:
        if kv_pair.key:
            print(
                "Key '{}' found within '{}' bounding regions".format(
                    kv_pair.key.content,
                    format_bounding_region(kv_pair.key.bounding_regions),
                )
            )
        if kv_pair.value:
            print(
                "Value '{}' found within '{}' bounding regions".format(
                    kv_pair.value.content,
                    format_bounding_region(kv_pair.value.bounding_regions),
                )
            )
        # confidence is a 0-1 score for the pair as a whole; show it as a percentage
        print("Confidence score: {:.1f}%\n".format(kv_pair.confidence * 100))
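To get closer to the Key / Value / Confidence / Selection mark layout asked for in the question, the loop above could be adapted roughly as follows. This is only a sketch under a few assumptions: flatten_key_value_pairs and SELECTION_TOKENS are hypothetical names introduced here, the sample_pairs objects are stand-ins shaped like result.key_value_pairs so the snippet runs without a service call, and a checkbox value is assumed to arrive as the text ":selected:" or ":unselected:" in value.content, which is worth verifying against your own output.

```python
from types import SimpleNamespace

# Assumption: checkbox values show up as these tokens in value.content.
SELECTION_TOKENS = {":selected:": "selected", ":unselected:": "unselected"}


def flatten_key_value_pairs(key_value_pairs):
    """Return one dict per pair: key, value, confidence (%), optional selection mark."""
    rows = []
    for kv_pair in key_value_pairs:
        if not kv_pair.key:
            continue  # nothing to label the value with
        value = kv_pair.value.content.strip() if kv_pair.value else ""
        row = {
            "Key": kv_pair.key.content,
            "Value": value,
            "Confidence": round(kv_pair.confidence * 100, 1),
        }
        if value in SELECTION_TOKENS:
            row["Selection mark"] = SELECTION_TOKENS[value]
        rows.append(row)
    return rows


# Stand-in objects for illustration only; in real use pass result.key_value_pairs.
sample_pairs = [
    SimpleNamespace(
        key=SimpleNamespace(content="Name"),
        value=SimpleNamespace(content="Microsoft"),
        confidence=0.98,
    ),
    SimpleNamespace(
        key=SimpleNamespace(content="Subscribed"),
        value=SimpleNamespace(content=":selected:"),
        confidence=0.9,
    ),
]

for row in flatten_key_value_pairs(sample_pairs):
    for field, field_value in row.items():
        print(f"{field} - {field_value}")
    print()
```

With a real analysis result, the call would be flatten_key_value_pairs(result.key_value_pairs). The confidence reported here is the one attached to the key-value pair as a whole, not to individual words.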
    

    I hope this helps.

    Regards,
    Vasavi

    Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.
