FAQ for document processing
This article consists of frequently asked questions about the document processing model in AI Builder. If you don't find your question here, review the overview of the document processing AI model or submit your question to the Power Automate Community for AI Builder.
With document processing, you can build a custom AI model to extract information from various kinds of documents.
- The Fixed-template documents option is ideal if the elements of your documents can be found in similar places. It's usually the case for invoices, purchase orders, delivery orders, and tax forms.
- General documents option is ideal for any kind of document, including the ones supported by the first option but also contracts, statement of work, letters, and others. This option can be more powerful to extract data, but requires longer training time.
Learn more: Overview of the document processing model
Supported file types are PDF, JPG, and PNG.
Document processing can extract fields, tables, and checkboxes from documents.
Learn more: Define information to extract
Yes. Document processing can extract printed and handwritten text from your documents.
For high-quality documents that use the same layout, five sample documents should be sufficient. For low-quality documents (for example, scans of poor quality, more sample documents might be necessary. To improve results, use 15 to 20 sample documents.
Can a single form-processing model extract information from documents that have different layouts or templates?
Yes. By using the collections feature, you train a single form-processing model to handle documents that have different layouts.
Learn more: Group documents by collections
Each form needs to be in a separate file. For example, if you have a PDF document with multiple invoices in it, create a separate file for each invoice before you send it to the document processing model.
You can also specify pages for the document processing model to handle. This way you can take advantage of the model's functionality to loop page by page, and process one form at a time.
Learn more: Page range
I trained a document processing model but I'm not getting good results when it comes to extracted data. How can I improve the model?
If your model is returning poor results after you trained it, edit the model and provide more samples for training. The more samples you provide, the more the AI model learns how to extract data from your documents.
Learn more: Improve the performance of your document processing model
You can process up to 360 documents per environment, every 60 seconds.
- It can happen that some characters get confused: 0 (number) and O (letter), 1 (number) and l (letter), 4 (number) and A (letter), and more.
- It can happen that some characters over or close to others get recognized incorrectly: O (letter) over a vertical line becomes a 0 (number), 5 (number) over a line becomes a $ (American dollar sign), l_ (lowercase letter, underscore) becomes an L (uppercase letter), and more.
- It can happen that some characters on documents of poor quality get recognized incorrectly, or not at all.
In the above cases, nothing can be done in AI Builder to improve the recognition. We recommand to improve the quality and layout of the source document to solve similar issues.
Note
The OCR technology to detect characters is constantly improved by Microsoft, so such issues happen less often.
You can create up to 200 collections per model. However, training General documents models with tens of collections can take several hours and—in rare occasions—time out. If your model has a high number of collections, expect to wait up to 24 hours for model training completion.
Currently, it isn't possible to create a model in a solution.
Yes, unstructured documents like contracts and letters are supported by document processing, using the General documents option.
What are the differences among document processing, invoice processing, receipt processing, identity document reader, business card reader, and text recognition?
Depending on your situation, you might need to use a particular model or a combination of them.
Use text recognition when you want to extract all the text present in an image or a PDF. You can then, for example, search for a keyword in the text that's extracted, or build some fixed rules to extract certain items.
If you want to extract information from invoices, receipts, passports, driver's licenses, or business cards, start with the corresponding prebuilt model:
- Invoice processing
- Receipt processing
- Identity document reader (passports and driver's licenses)
- Business card reader
You can use these prebuilt models immediately, without having to create a new model. These models can extract common information found in their corresponding document type.
For any other document type, you can create a custom document processing model to extract the fields and tables you need. This also applies if you need additional information not provided by the prebuilt model.
Learn more: Custom document processing model
AI Builder document processing is built on top of Azure Form Recognizer. This provides both products with the latest advancements in Microsoft AI.
AI Builder is part of Microsoft Power Platform. This enables anyone to add AI into apps and automation with an easy-to-use interface. You don't need to be a developer or data scientist.
Azure Form Recognizer is targeted to professional developers. They can use simple REST APIs to add AI capabilities to their custom code solutions.
You can start trying out document processing for free by starting a trial. After you evaluate it, you need to purchase AI Builder credits to use document processing. Every page you process with document processing consumes AI Builder credits, even if the page doesn't contain data to extract. AI Builder credits can be purchased through AI Builder add-ons.
Learn more: AI Builder licensing