Document processing models

This article applies to: Form Recognizer v3.0. Earlier version: Form Recognizer v2.1.

This article applies to: Form Recognizer v2.1. Later version: Form Recognizer v3.0.

Azure Form Recognizer supports a wide variety of models that enable you to add intelligent document processing to your apps and flows. You can use a prebuilt document analysis model or a domain-specific model, or train a custom model tailored to your specific business needs and use cases. Form Recognizer can be used with the REST API or the Python, C#, Java, and JavaScript SDKs.

Model overview

Model                     Description

Document analysis models
Read OCR                  Extract print and handwritten text including words, locations, and detected languages.
Layout analysis           Extract text and document layout elements like tables, selection marks, titles, section headings, and more.
General document          Extract key-value pairs in addition to text and document structure information.

Prebuilt models
W-2                       Process W2 forms to extract employee, employer, wage, and other information.
Invoice                   Automate invoice processing for English and Spanish invoices.
Receipt                   Extract receipt data from English receipts.
Identity document (ID)    Extract identity (ID) fields from US driver licenses and international passports.
Business card             Scan business cards to extract key fields and data into your applications.

Custom models
Custom models             Extract data from forms and documents specific to your business. Custom models are trained for your distinct data and use cases.
Composed models           Combine several custom models into a single model to automate processing of diverse document types with a single composed model.

Read OCR

The Read API analyzes and extracts lines, words, their locations, detected languages, and handwritten style if detected.

Sample document processed using the Form Recognizer Studio:

Screenshot of sample document processed using Form Recognizer Studio Read.

Layout analysis

The Layout analysis model analyzes and extracts text, tables, selection marks, and other structure elements like titles, section headings, page headers, page footers, and more.

Sample document processed using the Form Recognizer Studio:

Screenshot of sample newspaper page processed using Form Recognizer Studio.

General document

The general document model is ideal for extracting common key-value pairs from forms and documents. It’s a pre-trained model and can be directly invoked via the REST API and the SDKs. You can use the general document model as an alternative to training a custom model.

Sample document processed using the Form Recognizer Studio:

Screenshot: general document analysis in the Form Recognizer Studio.

W-2

The W-2 form model extracts key information reported in each box on a W-2 form. The model supports standard and customized forms from 2018 to the present, including single and multiple forms on one page.

Sample W-2 document processed using Form Recognizer Studio:

Screenshot of a sample W-2.

Invoice

The invoice model automates invoice processing to extract customer name, billing address, due date, amount due, line items, and other key data. Currently, the model supports English, Spanish, German, French, Italian, Portuguese, and Dutch invoices.

Sample invoice processed using Form Recognizer Studio:

Screenshot of a sample invoice.

Receipt

Use the receipt model to scan printed and handwritten sales receipts for merchant name, dates, line items, quantities, and totals. Version v3.0 also supports single-page hotel receipt processing.

Sample receipt processed using Form Recognizer Studio:

Screenshot of a sample receipt.

Identity document (ID)

Use the Identity document (ID) model to process U.S. Driver's Licenses (all 50 states and District of Columbia) and biographical pages from international passports (excluding visa and other travel documents) to extract key fields.

Sample U.S. Driver's License processed using Form Recognizer Studio:

Screenshot of a sample identification card.

Business card

Use the business card model to scan and extract key information from business card images.

Sample business card processed using Form Recognizer Studio:

Screenshot of a sample business card.

Custom models

Custom document models analyze and extract data from forms and documents specific to your business. They are trained to recognize form fields within your distinct content and extract key-value pairs and table data. You only need five examples of the same form type to get started.

The v3.0 custom model supports signature detection in custom forms (template model) and cross-page tables in both template and neural models.
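The five-example starting point above, together with the training-data limits listed under Input requirements, can be checked before a training run. The following is a minimal sketch, not part of any SDK: the function name `check_training_set` and its parameters are hypothetical, the neural size limit is assumed to mean 1 GB, and a megabyte is assumed to be 1024 x 1024 bytes.

```python
from typing import List

MB = 1024 * 1024  # assumption: the article does not say whether MB is binary or decimal

MIN_EXAMPLES = 5  # "You only need five examples of the same form type"

# Limits per custom model type, taken from the Input requirements list.
TRAINING_LIMITS = {
    "template": {"max_pages": 500, "max_bytes": 50 * MB},
    # Assumption: the neural size limit is read as 1 GB.
    "neural": {"max_pages": 50_000, "max_bytes": 1024 * MB},
}


def check_training_set(model_type: str, example_count: int,
                       total_pages: int, total_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the set looks acceptable."""
    if model_type not in TRAINING_LIMITS:
        raise ValueError(f"unknown custom model type: {model_type!r}")
    limits = TRAINING_LIMITS[model_type]
    problems = []
    if example_count < MIN_EXAMPLES:
        problems.append(f"need at least {MIN_EXAMPLES} examples, got {example_count}")
    if total_pages > limits["max_pages"]:
        problems.append(f"{total_pages} pages exceeds the {limits['max_pages']}-page limit")
    if total_bytes > limits["max_bytes"]:
        problems.append(f"{total_bytes} bytes exceeds the {limits['max_bytes']}-byte limit")
    return problems
```

For example, `check_training_set("template", 5, 20, 10 * MB)` returns an empty list, while a template set of three examples is reported as too small.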

Sample custom template processed using Form Recognizer Studio:

Screenshot: Form Recognizer tool analyze-a-custom-form window.

Composed models

A composed model is created by taking a collection of custom models and assigning them to a single model built from your form types. You can assign multiple custom models to a composed model called with a single model ID. You can assign up to 100 trained custom models to a single composed model.
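As an illustration of the 100-model limit, the sketch below assembles a request body for composing custom models and rejects lists that are too long. The helper name `build_compose_request` is hypothetical, and the body shape (`modelId` plus a `componentModels` list) is an assumption about the v3.0 REST compose operation; check the REST API reference before relying on it.

```python
from typing import Dict, Iterable

MAX_COMPONENT_MODELS = 100  # "up to 100 trained custom models" per composed model


def build_compose_request(composed_model_id: str,
                          component_model_ids: Iterable[str]) -> Dict:
    """Build a compose request body (assumed shape) after checking the model count."""
    ids = list(component_model_ids)
    if not ids:
        raise ValueError("a composed model needs at least one custom model")
    if len(ids) > MAX_COMPONENT_MODELS:
        raise ValueError(
            f"{len(ids)} models given; at most {MAX_COMPONENT_MODELS} are allowed"
        )
    return {
        "modelId": composed_model_id,
        "componentModels": [{"modelId": model_id} for model_id in ids],
    }
```

The composed model is then called with the single ID passed as `composed_model_id`, regardless of how many custom models it contains.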

Composed model dialog window in Form Recognizer Studio:

Screenshot of Form Recognizer Studio compose custom model dialog window.

Model data extraction

Model ID Text extraction Language detection Selection Marks Tables Paragraphs Structure Key-Value pairs Fields
prebuilt-read
prebuilt-tax.us.w2
prebuilt-document
prebuilt-layout
prebuilt-invoice
prebuilt-receipt
prebuilt-idDocument
prebuilt-businessCard
Custom
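The model IDs in this table are what a client passes when it requests an analysis. The sketch below builds an analyze URL from an endpoint and a model ID using only the standard library. The URL pattern and the default `api_version` value are assumptions not stated in this article, and `build_analyze_url` is a hypothetical helper rather than an SDK function.

```python
from urllib.parse import quote, urlencode

# Model IDs copied from the table above; the dictionary keys are illustrative.
PREBUILT_MODEL_IDS = {
    "read": "prebuilt-read",
    "w2": "prebuilt-tax.us.w2",
    "general_document": "prebuilt-document",
    "layout": "prebuilt-layout",
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id_document": "prebuilt-idDocument",
    "business_card": "prebuilt-businessCard",
}


def build_analyze_url(endpoint: str, model_id: str,
                      api_version: str = "2022-08-31") -> str:
    """Return an analyze URL for a model (assumed v3.0 REST URL pattern)."""
    base = endpoint.rstrip("/")
    path = f"/formrecognizer/documentModels/{quote(model_id, safe='')}:analyze"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real resource.
    print(build_analyze_url("https://example.cognitiveservices.azure.com",
                            PREBUILT_MODEL_IDS["invoice"]))
```

A custom or composed model would be addressed the same way, with its own model ID in place of a prebuilt one.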

Input requirements

  • For best results, provide one clear photo or high-quality scan per document.

  • Supported file formats:

    Model              PDF   Image (JPEG/JPG, PNG, BMP, and TIFF)   Microsoft Office (Word (DOCX), Excel (XLS), PowerPoint (PPT), and HTML)
    Read               Yes   Yes                                    Yes, REST API version 2022/06/30-preview ✱
    Layout             Yes   Yes                                    No
    General Document   Yes   Yes                                    No
    Prebuilt           Yes   Yes                                    No
    Custom             Yes   Yes                                    No

    ✱ Microsoft Office files are currently not supported for other models or versions.

  • For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).

  • The file size for analyzing documents must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.

  • Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.

  • PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.

  • If your PDFs are password-locked, you must remove the lock before submission.

  • The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch (DPI).

  • For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.

  • For custom model training, the total size of training data is 50 MB for the template model and 1 GB for the neural model.
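The size, dimension, and page limits above can be checked on the client before a document is submitted. The following is a minimal sketch under stated assumptions: `check_analysis_input` and its parameters are hypothetical names, a megabyte is taken as 1024 x 1024 bytes, and the image bounds are treated as inclusive. It does not cover the PDF dimension, password, or text-height requirements.

```python
from typing import List, Optional, Tuple

MB = 1024 * 1024  # assumption: binary megabytes

# File size must be less than these values, per pricing tier.
MAX_FILE_BYTES = {"S0": 500 * MB, "F0": 4 * MB}
MIN_IMAGE_SIDE = 50        # pixels
MAX_IMAGE_SIDE = 10_000    # pixels
MAX_PAGES = 2000           # PDF and TIFF
FREE_TIER_PAGES = 2        # free tier processes only the first two pages


def check_analysis_input(tier: str, size_bytes: int,
                         image_size: Optional[Tuple[int, int]] = None,
                         page_count: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means no limit is exceeded."""
    if tier not in MAX_FILE_BYTES:
        raise ValueError(f"unknown tier: {tier!r}")
    problems = []
    limit = MAX_FILE_BYTES[tier]
    if size_bytes >= limit:
        problems.append(f"file is {size_bytes} bytes; must be under {limit} on {tier}")
    if image_size is not None:
        width, height = image_size
        if min(width, height) < MIN_IMAGE_SIDE or max(width, height) > MAX_IMAGE_SIDE:
            problems.append(f"image is {width} x {height}; allowed range is "
                            f"{MIN_IMAGE_SIDE} to {MAX_IMAGE_SIDE} pixels per side")
    if page_count is not None:
        if page_count > MAX_PAGES:
            problems.append(f"{page_count} pages exceeds the {MAX_PAGES}-page limit")
        elif tier == "F0" and page_count > FREE_TIER_PAGES:
            problems.append(f"free tier processes only the first {FREE_TIER_PAGES} pages")
    return problems
```

For example, a 3 MB, 1024 x 768 image passes on the free tier, while a 5 MB file on the same tier is reported as too large.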

Note

The Sample Labeling tool does not support the BMP file format. This is a limitation of the tool, not of the Form Recognizer service.

Version migration

Learn how to use Form Recognizer v3.0 in your applications by following our Form Recognizer v3.0 migration guide.

Model                     Description

Document analysis
Layout                    Extract text and layout information from documents.

Prebuilt
Invoice                   Extract key information from English and Spanish invoices.
Receipt                   Extract key information from English receipts.
ID document               Extract key information from US driver licenses and international passports.
Business card             Extract key information from English business cards.

Custom
Custom                    Extract data from forms and documents specific to your business. Custom models are trained for your distinct data and use cases.
Composed                  Compose a collection of custom models and assign them to a single model built from your form types.

Layout

The Layout API analyzes and extracts text, tables and headers, selection marks, and structure information from documents.

Sample document processed using the Sample Labeling tool:

Screenshot of layout analysis using the Sample Labeling tool.

Invoice

The invoice model analyzes and extracts key information from sales invoices. The API analyzes invoices in various formats and extracts key information such as customer name, billing address, due date, and amount due.

Sample invoice processed using the Sample Labeling tool:

Screenshot of a sample invoice analysis using the Sample Labeling tool.

Receipt

The receipt model analyzes and extracts key information from printed and handwritten sales receipts.

Sample receipt processed using Sample Labeling tool:

Screenshot of a sample receipt.

ID document

The ID document model analyzes and extracts key information from the following documents:

  • U.S. Driver's Licenses (all 50 states and District of Columbia)

  • Biographical pages from international passports (excluding visa and other travel documents). The API analyzes identity documents and extracts

Sample U.S. Driver's License processed using the Sample Labeling tool:

Screenshot of a sample identification card.

Business card

The business card model analyzes and extracts key information from business card images.

Sample business card processed using the Sample Labeling tool:

Screenshot of a sample business card.

Custom

Custom models analyze and extract data from forms and documents specific to your business. The API is a machine-learning program trained to recognize form fields within your distinct content and extract key-value pairs and table data. You only need five examples of the same form type to get started, and your custom model can be trained with or without labeled datasets.

Sample custom model processing using the Sample Labeling tool:

Screenshot: Form Recognizer tool analyze-a-custom-form window.

Composed custom model

A composed model is created by taking a collection of custom models and assigning them to a single model built from your form types. You can assign multiple custom models to a composed model called with a single model ID. You can assign up to 100 trained custom models to a single composed model.

Composed model dialog window using the Sample Labeling tool:

Screenshot of Form Recognizer Studio compose custom model dialog window.

Model data extraction

Model Text extraction Language detection Selection Marks Tables Paragraphs Paragraph roles Key-Value pairs Fields
Layout
Invoice
Receipt
ID Document
Business Card
Custom Form

Input requirements

  • For best results, provide one clear photo or high-quality scan per document.

  • Supported file formats:

    Model              PDF   Image (JPEG/JPG, PNG, BMP, and TIFF)   Microsoft Office (Word (DOCX), Excel (XLS), PowerPoint (PPT), and HTML)
    Read               Yes   Yes                                    Yes, REST API version 2022/06/30-preview ✱
    Layout             Yes   Yes                                    No
    General Document   Yes   Yes                                    No
    Prebuilt           Yes   Yes                                    No
    Custom             Yes   Yes                                    No

    ✱ Microsoft Office files are currently not supported for other models or versions.

  • For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).

  • The file size for analyzing documents must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.

  • Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.

  • PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.

  • If your PDFs are password-locked, you must remove the lock before submission.

  • The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch (DPI).

  • For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.

  • For custom model training, the total size of training data is 50 MB for the template model and 1 GB for the neural model.

Note

The Sample Labeling tool does not support the BMP file format. This is a limitation of the tool, not of the Form Recognizer service.

Version migration

You can learn how to use Form Recognizer v3.0 in your applications by following our Form Recognizer v3.0 migration guide.
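When planning a migration, it can help to write down which v3.0 model ID corresponds to each v2.1 model. The sketch below pairs the v2.1 model names from this article with the v3.0 model IDs listed in the v3.0 model data extraction table. The pairing by name is an assumption made for illustration, and `v30_model_id` is a hypothetical helper; the migration guide remains the authoritative source.

```python
# v2.1 model names (as listed in this article) paired with v3.0 model IDs.
# Assumption: models with matching names correspond to each other.
V21_TO_V30_MODEL_ID = {
    "layout": "prebuilt-layout",
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id document": "prebuilt-idDocument",
    "business card": "prebuilt-businessCard",
}


def v30_model_id(v21_model_name: str) -> str:
    """Look up the v3.0 model ID for a v2.1 model name, ignoring case."""
    key = v21_model_name.strip().lower()
    if key not in V21_TO_V30_MODEL_ID:
        raise KeyError(f"no v3.0 model ID recorded for {v21_model_name!r}")
    return V21_TO_V30_MODEL_ID[key]
```

Custom and composed models are left out of the table because each one is addressed by its own model ID rather than a fixed prebuilt ID.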

Next steps