How to train a custom model for different document formats with the same set of labels?

Hugo Scaramal 41 Reputation points
2021-04-23T18:51:39.367+00:00

I'm trying to understand the best way to train a custom model for invoices in languages not supported by the prebuilt invoice model, French for example.

As usual, we will have many different invoice formats from different vendors, but we will extract the same set of labels from all of them (invoice number, amount, date, vendor name, etc.).

Should I create a model per vendor and compose them?
If I do so, do I need to train on every vendor, or will it also work for invoices from vendors it was not trained on, as long as they use the same verbiage as the trained invoices?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. Ramr-msft 17,616 Reputation points
    2021-05-04T06:56:19.7+00:00

    @Hugo Scaramal Thanks. For invoices, you should use the prebuilt invoice model, which requires no training: https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/concept-invoices. If you need to train a model instead of using the prebuilt one, then yes: train one model per vendor/provider and compose them. Start with the top providers so that you get more coverage.
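One way to act on "start with the top providers" is to rank vendors by invoice volume and train per-vendor models only until a target share of invoices is covered. The sketch below is illustrative, not part of the service or its SDK: the function name `pick_vendors_for_training`, the vendor names, the invoice counts, and the 80% target are all hypothetical.

```python
from typing import Dict, List, Tuple


def pick_vendors_for_training(
    invoice_counts: Dict[str, int], target_coverage: float
) -> Tuple[List[str], float]:
    """Pick the highest-volume vendors until target_coverage is reached.

    Returns the selected vendors (largest first) and the fraction of all
    invoices that those vendors account for.
    """
    total = sum(invoice_counts.values())
    if total == 0:
        return [], 0.0

    # Rank vendors from most to fewest invoices.
    ranked = sorted(invoice_counts.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[str] = []
    covered = 0
    for vendor, count in ranked:
        if covered / total >= target_coverage:
            break
        selected.append(vendor)
        covered += count
    return selected, covered / total


# Hypothetical invoice volumes per vendor (assumed numbers for illustration).
counts = {
    "vendor_a": 500,
    "vendor_b": 300,
    "vendor_c": 120,
    "vendor_d": 50,
    "vendor_e": 30,
}
vendors, coverage = pick_vendors_for_training(counts, target_coverage=0.8)
print(vendors, round(coverage, 2))
```

Each vendor in the returned list would then get its own custom model, and those models would be combined into one composed model as described above. Vendors left out of the list can be added later as their volume grows.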

