Please advise on the implementation of Azure Form Recognizer in a production environment.

test29998411 281 Reputation points

My team is conducting a PoC that uses Azure Form Recognizer to extract specified text from PDF files such as purchase orders and payment slips.

During the PoC, we found that the total amount in the last row of the table cannot be retrieved, because its position varies with the number of rows, as shown in the attachment. (CASE A)

In some cases, the total amount appears on the second page or later. (CASE B)

Our client has asked us to explain the accuracy of the OCR when deploying Azure Form Recognizer to the production environment.

How many PDF files should we train Azure Form Recognizer on in order to get enough accuracy to deploy it in a production environment?

Also, are there any best practices for training an Azure Form Recognizer model?


Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. romungi-MSFT 42,761 Reputation points Microsoft Employee

    @test29998411 The first approach might work as long as the field or column names are fixed.
    The second approach of using labels is better suited if your tables have a fixed number of rows and follow a pattern.
    The third approach of splitting the file can be used for large files. Splitting a file based on tables or labels might not improve the extraction.
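
For CASE A and CASE B, one option is to post-process the extracted tables rather than rely on a fixed position. The following is only a minimal sketch under stated assumptions: the tables are assumed to have already been extracted and converted to plain dictionaries with `row_index`, `column_index` and `content` keys, and the total row is assumed to carry a cell reading "Total". The helper names `make_table` and `find_total_amount` are hypothetical and are not part of any Azure SDK.

```python
def make_table(rows):
    """Build a list of cell dicts from a list of rows (illustrative helper)."""
    return [
        {"row_index": r, "column_index": c, "content": text}
        for r, row in enumerate(rows)
        for c, text in enumerate(row)
    ]


def find_total_amount(tables, label="total"):
    """Return the last cell of the last row labelled `label`, or None.

    `tables` holds every table in reading order, across all pages, so the
    total is found wherever it lands (CASE A) and on any page (CASE B).
    """
    result = None
    for table in tables:
        # Group the cells of this table by row.
        rows = {}
        for cell in table:
            rows.setdefault(cell["row_index"], []).append(cell)
        for row_index in sorted(rows):
            cells = sorted(rows[row_index], key=lambda c: c["column_index"])
            # Exact label match, so that "Subtotal" is not mistaken for "Total".
            labels = [c["content"].strip().rstrip(":").lower() for c in cells[:-1]]
            if label in labels:
                result = cells[-1]["content"]
    return result


# Assumed sample data: a purchase order whose table continues on page 2.
page1_table = make_table([
    ["Item", "Qty", "Amount"],
    ["Widget", "2", "20.00"],
    ["Gadget", "1", "15.00"],
])
page2_table = make_table([
    ["Bolt", "10", "5.00"],
    ["Total", "", "40.00"],
])

print(find_total_amount([page1_table, page2_table]))
```

Searching by label instead of by row number keeps the lookup independent of how many line items precede the total. It still depends on the label text being consistent across documents.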
    Training requires a minimum of five documents of the same form, each containing all the fields or values you expect to extract. To support more document formats, you can train a separate model for each format and combine them into a composed model that handles all of them.
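
To explain accuracy to a client, it can help to measure it on labelled documents that were not used for training. The following is a rough sketch, not an official evaluation method: it assumes the extracted values and the hand-checked values are available as plain dictionaries keyed by a document ID, and the names `normalize`, `field_accuracy`, and the sample IDs and values are hypothetical.

```python
def normalize(value):
    """Collapse whitespace so trivial spacing differences do not count as errors."""
    if value is None:
        return ""
    return " ".join(str(value).split())


def field_accuracy(predictions, ground_truth):
    """Return the exact-match accuracy per field over all labelled documents."""
    correct = {}
    total = {}
    for doc_id, expected in ground_truth.items():
        predicted = predictions.get(doc_id, {})
        for field, value in expected.items():
            total[field] = total.get(field, 0) + 1
            if normalize(predicted.get(field)) == normalize(value):
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / total[field] for field in total}


# Assumed sample data: two held-out purchase orders with hand-checked values.
ground_truth = {
    "po_001": {"total": "40.00", "vendor": "Contoso"},
    "po_002": {"total": "12.50", "vendor": "Fabrikam"},
}
predictions = {
    "po_001": {"total": "40.00", "vendor": "Contoso"},
    "po_002": {"total": "12.60", "vendor": "Fabrikam "},
}

scores = field_accuracy(predictions, ground_truth)
print(scores)
```

Re-running the same measurement after each retraining shows whether adding more sample documents actually improves the fields that matter, such as the total amount.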

    If an answer is helpful, please accept it or upvote it, which might help other community members reading this thread.
