Document processing models
Important
- Document Intelligence public preview releases provide early access to features that are in active development.
- Features, approaches, and processes may change before General Availability (GA) based on user feedback.
- The public preview versions of the Document Intelligence client libraries default to REST API version 2023-10-31-preview.
This content applies to: v4.0 (preview) | Previous versions:
v3.1 (GA)
v3.0 (GA)
v2.1 (GA)
This content applies to: v3.1 (GA) | Latest version:
v4.0 (preview) | Previous versions:
v3.0
v2.1
This content applies to: v3.0 (GA) | Latest versions:
v4.0 (preview)
v3.1 (preview) | Previous version:
v2.1
This content applies to: v2.1 | Latest version:
v4.0 (preview)
Azure AI Document Intelligence supports a wide variety of models that enable you to add intelligent document processing to your apps and flows. You can use a prebuilt domain-specific model or train a custom model tailored to your specific business needs and use cases. Document Intelligence can be used with the REST API or the Python, C#, Java, and JavaScript SDKs.
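As a rough illustration of the REST option, the sketch below builds the URL that a client would POST a document to for analysis. The route prefixes (`documentintelligence` for the preview API, `formrecognizer` for the GA APIs) and the example endpoint are assumptions not stated in this article; check them against the REST API reference before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed route prefix per API version (not stated in this article).
_ROUTE_PREFIX = {
    "2023-10-31-preview": "documentintelligence",
    "2023-07-31": "formrecognizer",
    "2022-08-31": "formrecognizer",
}


def build_analyze_url(endpoint: str, model_id: str, api_version: str) -> str:
    """Return the analyze URL for a model ID such as 'prebuilt-invoice'."""
    prefix = _ROUTE_PREFIX[api_version]
    base = endpoint.rstrip("/")
    path = f"{base}/{prefix}/documentModels/{quote(model_id, safe='')}:analyze"
    return f"{path}?{urlencode({'api-version': api_version})}"


# Hypothetical resource endpoint, used only for illustration.
print(build_analyze_url(
    "https://example.cognitiveservices.azure.com/",
    "prebuilt-invoice",
    "2023-10-31-preview",
))
```

The helper only assembles a string, so it can be tried without a subscription key; sending the request and polling for the result are left out.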
Model overview
The following table shows the available models for each current preview and stable API:
Model | 2023-10-31-preview | 2023-07-31 (GA) | 2022-08-31 (GA) | v2.1 (GA) |
---|---|---|---|---|
Add-on capabilities | ✔️ | ✔️ | n/a | n/a |
Business card | deprecated | ✔️ | ✔️ | ✔️ |
Contract | ✔️ | ✔️ | n/a | n/a |
Custom classifier | ✔️ | ✔️ | n/a | n/a |
Custom composed | ✔️ | ✔️ | ✔️ | ✔️ |
Custom neural | ✔️ | ✔️ | ✔️ | n/a |
Custom template | ✔️ | ✔️ | ✔️ | ✔️ |
General document | deprecated | ✔️ | ✔️ | n/a |
Health insurance card | ✔️ | ✔️ | ✔️ | n/a |
ID document | ✔️ | ✔️ | ✔️ | ✔️ |
Invoice | ✔️ | ✔️ | ✔️ | ✔️ |
Layout | ✔️ | ✔️ | ✔️ | ✔️ |
Read | ✔️ | ✔️ | ✔️ | n/a |
Receipt | ✔️ | ✔️ | ✔️ | ✔️ |
US 1098 Tax | ✔️ | ✔️ | n/a | n/a |
US 1098-E Tax | ✔️ | ✔️ | n/a | n/a |
US 1098-T Tax | ✔️ | ✔️ | n/a | n/a |
US 1099 Tax | ✔️ | n/a | n/a | n/a |
US W2 Tax | ✔️ | ✔️ | ✔️ | n/a |
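The availability table above can be encoded as a small lookup, which may be handy when an application has to pick a model for a given API version. This is a sketch only: the data is transcribed from the table, while the function names and the one-letter status codes are illustrative choices.

```python
API_VERSIONS = ("2023-10-31-preview", "2023-07-31", "2022-08-31", "v2.1")

_STATUS = {"y": "available", "d": "deprecated", "n": "n/a"}

# One status letter per API version, in the order of API_VERSIONS.
_ROWS = {
    "Add-on capabilities": "yynn",
    "Business card": "dyyy",
    "Contract": "yynn",
    "Custom classifier": "yynn",
    "Custom composed": "yyyy",
    "Custom neural": "yyyn",
    "Custom template": "yyyy",
    "General document": "dyyn",
    "Health insurance card": "yyyn",
    "ID document": "yyyy",
    "Invoice": "yyyy",
    "Layout": "yyyy",
    "Read": "yyyn",
    "Receipt": "yyyy",
    "US 1098 Tax": "yynn",
    "US 1098-E Tax": "yynn",
    "US 1098-T Tax": "yynn",
    "US 1099 Tax": "ynnn",
    "US W2 Tax": "yyyn",
}


def model_status(model: str, api_version: str) -> str:
    """Return 'available', 'deprecated', or 'n/a' for a table row and column."""
    column = API_VERSIONS.index(api_version)
    return _STATUS[_ROWS[model][column]]


def available_models(api_version: str) -> list[str]:
    """List the models marked as available (not deprecated) for a version."""
    return [m for m in _ROWS if model_status(m, api_version) == "available"]


print(model_status("Business card", "2023-10-31-preview"))
print(available_models("v2.1"))
```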
Model | Description |
---|---|
Document analysis models | |
Read OCR | Extract print and handwritten text including words, locations, and detected languages. |
Layout analysis | Extract text and document layout elements like tables, selection marks, titles, section headings, and more. |
Prebuilt models | |
Health insurance card | Automate healthcare processes by extracting insurer, member, prescription, group number and other key information from US health insurance cards. |
US Tax document models | Process US tax forms to extract employee, employer, wage, and other information. |
Contract | Extract agreement and party details. |
Invoice | Automate invoice processing. |
Receipt | Extract key data from sales receipts. |
Identity document (ID) | Extract identity (ID) fields from US driver licenses and international passports. |
Business card | Scan business cards to extract key fields and data into your applications. |
Custom models | |
Custom model (overview) | Extract data from forms and documents specific to your business. Custom models are trained for your distinct data and use cases. |
Custom extraction models | ● Custom template models use layout cues to extract values from documents and are suitable to extract fields from highly structured documents with defined visual templates. ● Custom neural models are trained on various document types to extract fields from structured, semi-structured and unstructured documents. |
Custom classification model | The Custom classification model can classify each page in an input file to identify the document(s) within and can also identify multiple documents or multiple instances of a single document within an input file. |
Composed models | Combine several custom models into a single model to automate processing of diverse document types with a single composed model. |
For all models except the Business card model, Document Intelligence now supports add-on capabilities that allow for more sophisticated analysis. You can enable or disable these optional capabilities depending on the document extraction scenario. Seven add-on capabilities are available for the 2023-07-31 (GA) and later API versions:
- ocrHighResolution
- formulas
- styleFont
- barcodes
- languages
- keyValuePairs (2023-10-31-preview)
- queryFields (2023-10-31-preview)
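The version rules for add-on capabilities can be checked before a request is sent. The sketch below validates a list of capability names against an API version and joins them into a single comma-separated value. Passing that value as a `features` query parameter is an assumption about the REST API, not something this article states, and the function name is hypothetical.

```python
# Earliest API version that offers each add-on capability, per the list above.
ADD_ON_MIN_VERSION = {
    "ocrHighResolution": "2023-07-31",
    "formulas": "2023-07-31",
    "styleFont": "2023-07-31",
    "barcodes": "2023-07-31",
    "languages": "2023-07-31",
    "keyValuePairs": "2023-10-31-preview",
    "queryFields": "2023-10-31-preview",
}


def features_param(features: list[str], api_version: str, model_id: str) -> str:
    """Validate add-on names and return them as a comma-separated string."""
    if model_id == "prebuilt-businessCard":
        raise ValueError("the business card model has no add-on capabilities")
    release = api_version[:10]  # ISO date part, e.g. "2023-10-31"
    for name in features:
        if name not in ADD_ON_MIN_VERSION:
            raise ValueError(f"unknown add-on capability: {name}")
        minimum = ADD_ON_MIN_VERSION[name]
        if release < minimum[:10]:  # ISO dates compare correctly as strings
            raise ValueError(f"{name} requires API version {minimum} or later")
    return ",".join(features)


print(features_param(["formulas", "barcodes"], "2023-07-31", "prebuilt-layout"))
```

Comparing only the date prefix keeps the check simple, at the cost of treating a preview and a GA release with the same date as equivalent.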
Analysis features
Model ID | Content Extraction | Query fields | Paragraphs | Paragraph Roles | Selection Marks | Tables | Key-Value Pairs | Languages | Barcodes | Document Analysis | Formulas* | Style Font* | High Resolution* |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prebuilt-read | ✓ | O | O | O | O | O | |||||||
prebuilt-layout | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | O | O | O | O | O | ||
prebuilt-document | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | O | O | O | O | O | |
prebuilt-businessCard | ✓ | ✓ | ✓ | ||||||||||
prebuilt-idDocument | ✓ | ✓ | O | O | ✓ | O | O | O | |||||
prebuilt-invoice | ✓ | ✓ | ✓ | ✓ | O | O | O | ✓ | O | O | O | ||
prebuilt-receipt | ✓ | ✓ | O | O | ✓ | O | O | O | |||||
prebuilt-healthInsuranceCard.us | ✓ | ✓ | O | O | ✓ | O | O | O | |||||
prebuilt-tax.us.w2 | ✓ | ✓ | ✓ | O | O | ✓ | O | O | O | ||||
prebuilt-tax.us.1098 | ✓ | ✓ | ✓ | O | O | ✓ | O | O | O | ||||
prebuilt-tax.us.1098E | ✓ | ✓ | ✓ | O | O | ✓ | O | O | O | ||||
prebuilt-tax.us.1098T | ✓ | ✓ | ✓ | O | O | ✓ | O | O | O | ||||
prebuilt-tax.us.1099(variations) | ✓ | ✓ | ✓ | O | O | ✓ | O | O | O | ||||
prebuilt-contract | ✓ | ✓ | ✓ | ✓ | O | O | ✓ | O | O | O | |||
{ customModelName } | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | O | O | ✓ | O | O | O |
✓ - Enabled
O - Optional
* - Premium features incur extra costs
Read OCR
The Read API analyzes and extracts lines, words, their locations, detected languages, and handwritten style if detected.
Sample document processed using the Document Intelligence Studio:
Layout analysis
The Layout analysis model analyzes and extracts text, tables, selection marks, and other structure elements like titles, section headings, page headers, page footers, and more.
Sample document processed using the Document Intelligence Studio:
Health insurance card
The health insurance card model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to analyze and extract key information from US health insurance cards.
Sample US health insurance card processed using Document Intelligence Studio:
US tax documents
The US tax document models analyze and extract key fields and line items from a select group of tax documents. The API supports the analysis of English-language US tax documents of various formats and quality including phone-captured images, scanned documents, and digital PDFs. The following models are currently supported:
Model | Description | ModelID |
---|---|---|
US Tax W-2 | Extract taxable compensation details. | prebuilt-tax.us.w2 |
US Tax 1098 | Extract mortgage interest details. | prebuilt-tax.us.1098 |
US Tax 1098-E | Extract student loan interest details. | prebuilt-tax.us.1098E |
US Tax 1098-T | Extract qualified tuition details. | prebuilt-tax.us.1098T |
US Tax 1099 | Extract information from 1099 forms. | prebuilt-tax.us.1099(variations) |
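When the form type is known up front, the model ID can be selected with a simple mapping. The sketch below covers the four IDs that are spelled out in full in this article. The W-2 entry uses the lowercase `w2` spelling from the feature tables, which is an assumption about the canonical form, and the 1099 variations are left out because their individual IDs are not listed here.

```python
# Model IDs as listed in this article; 1099 variations are intentionally absent.
TAX_MODEL_IDS = {
    "W-2": "prebuilt-tax.us.w2",
    "1098": "prebuilt-tax.us.1098",
    "1098-E": "prebuilt-tax.us.1098E",
    "1098-T": "prebuilt-tax.us.1098T",
}


def tax_model_id(form_type: str) -> str:
    """Map a form type such as '1098-E' to its prebuilt model ID."""
    key = form_type.strip().upper()
    try:
        return TAX_MODEL_IDS[key]
    except KeyError:
        raise ValueError(f"no model ID listed for form type {form_type!r}") from None


print(tax_model_id("w-2"))
```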
Sample W-2 document processed using Document Intelligence Studio:
Contract
The contract model analyzes and extracts key fields and line items from contractual agreements including parties, jurisdictions, contract ID, and title. The model currently supports English-language contract documents.
Sample contract processed using Document Intelligence Studio:
Invoice
The invoice model automates the processing of invoices to extract customer name, billing address, due date, amount due, line items, and other key data. Currently, the model supports English, Spanish, German, French, Italian, Portuguese, and Dutch invoices.
Sample invoice processed using Document Intelligence Studio:
Receipt
Use the receipt model to scan printed and handwritten sales receipts for merchant name, dates, line items, quantities, and totals. Version v3.0 also supports single-page hotel receipt processing.
Sample receipt processed using Document Intelligence Studio:
Identity document (ID)
Use the Identity document (ID) model to process U.S. Driver's Licenses (all 50 states and District of Columbia) and biographical pages from international passports (excluding visa and other travel documents) to extract key fields.
Sample U.S. Driver's License processed using Document Intelligence Studio:
Custom models
Custom document models analyze and extract data from forms and documents specific to your business. They're trained to recognize form fields within your distinct content and extract key-value pairs and table data. You only need five examples of the same form type to get started.
The v3.0 custom model supports signature detection in custom forms (template model) and cross-page tables in both template and neural models.
Sample custom template processed using Document Intelligence Studio:
Custom extraction
A custom extraction model can be one of two types: custom template or custom neural. To create a custom extraction model, label a dataset of documents with the values you want extracted and train the model on the labeled dataset. You only need five examples of the same form or document type to get started.
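Before starting a training run, a labeled dataset can be checked against the limits this article gives for custom extraction models: at least five examples, at most 500 pages and 50 MB for template models, and at most 50,000 pages and 1 GB for neural models. The sketch below is illustrative only. It assumes binary megabytes, reads the neural size limit as 1 GB, and uses hypothetical names throughout.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: limits are expressed in binary megabytes
MIN_EXAMPLES = 5


@dataclass(frozen=True)
class TrainingLimits:
    max_pages: int
    max_bytes: int


LIMITS = {
    "template": TrainingLimits(max_pages=500, max_bytes=50 * MB),
    "neural": TrainingLimits(max_pages=50_000, max_bytes=1024 * MB),
}


def check_training_set(model_type: str, n_examples: int,
                       total_pages: int, total_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the set fits the limits."""
    limits = LIMITS[model_type]
    problems = []
    if n_examples < MIN_EXAMPLES:
        problems.append(f"need at least {MIN_EXAMPLES} examples, got {n_examples}")
    if total_pages > limits.max_pages:
        problems.append(f"{total_pages} pages exceeds the limit of {limits.max_pages}")
    if total_bytes > limits.max_bytes:
        problems.append(f"{total_bytes} bytes exceeds the limit of {limits.max_bytes}")
    return problems


print(check_training_set("template", n_examples=3, total_pages=600, total_bytes=10 * MB))
```

Returning every problem at once, rather than raising on the first one, lets a labeling tool show the full list in a single pass.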
Sample custom extraction processed using Document Intelligence Studio:
Custom classifier
The custom classification model enables you to identify the document type prior to invoking the extraction model. The classification model is available starting with the 2023-07-31 (GA) API. Training a custom classification model requires at least two distinct classes and a minimum of five samples per class.
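The two training requirements for a classifier (at least two distinct classes, at least five samples per class) are easy to verify ahead of time. The following is a minimal sketch with hypothetical names; it takes a mapping from class label to sample count and reports what is missing.

```python
MIN_CLASSES = 2
MIN_SAMPLES_PER_CLASS = 5


def check_classifier_dataset(samples_per_class: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the dataset qualifies."""
    problems = []
    if len(samples_per_class) < MIN_CLASSES:
        problems.append(
            f"need at least {MIN_CLASSES} classes, got {len(samples_per_class)}"
        )
    # Sort labels so the report order is stable between runs.
    for label, count in sorted(samples_per_class.items()):
        if count < MIN_SAMPLES_PER_CLASS:
            problems.append(
                f"class {label!r} has {count} samples, needs {MIN_SAMPLES_PER_CLASS}"
            )
    return problems


print(check_classifier_dataset({"invoice": 8, "receipt": 3}))
```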
Composed models
A composed model is created by taking a collection of custom models and assigning them to a single model built from your form types. You can assign multiple custom models to a composed model called with a single model ID. You can assign up to 200 trained custom models to a single composed model.
Composed model dialog window in Document Intelligence Studio:
Model data extraction
Model ID | Text extraction | Language detection | Selection Marks | Tables | Paragraphs | Structure | Key-Value pairs | Fields |
---|---|---|---|---|---|---|---|---|
prebuilt-read | ✓ | ✓ | ✓ | |||||
prebuilt-healthInsuranceCard.us | ✓ | ✓ | ✓ | ✓ | ||||
prebuilt-tax.us.w2 | ✓ | ✓ | ✓ | ✓ | ||||
prebuilt-tax.us.1098 | ✓ | ✓ | ✓ | ✓ | ||||
prebuilt-tax.us.1098E | ✓ | ✓ | ✓ | ✓ | ||||
prebuilt-tax.us.1098T | ✓ | ✓ | ✓ | ✓ | ||||
prebuilt-tax.us.1099(variations) | ✓ | ✓ | ✓ | ✓ | ||||
prebuilt-document | ✓ | ✓ | ✓ | ✓ | ✓ | |||
prebuilt-layout | ✓ | ✓ | ✓ | ✓ | ✓ | |||
prebuilt-invoice | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
prebuilt-receipt | ✓ | ✓ | ✓ | |||||
prebuilt-idDocument | ✓ | ✓ | ✓ | |||||
prebuilt-businessCard | ✓ | ✓ | ✓ | |||||
Custom | ✓ | ✓ | ✓ | ✓ | ✓ |
Input requirements
For best results, provide one clear photo or high-quality scan per document.
Supported file formats:
Model | PDF | Image: JPEG/JPG, PNG, BMP, TIFF, HEIF | Microsoft Office: Word (DOCX), Excel (XLSX), PowerPoint (PPTX), and HTML |
---|---|---|---|
Read | ✔ | ✔ | ✔ |
Layout | ✔ | ✔ | ✔ (2023-10-31-preview) |
General Document | ✔ | ✔ | |
Prebuilt | ✔ | ✔ | |
Custom | ✔ | ✔ | |
For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
The maximum file size for analyzing documents is 500 MB for the paid (S0) tier and 4 MB for the free (F0) tier.
Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
If your PDFs are password-locked, you must remove the lock before submission.
The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch (DPI).
For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
For custom extraction model training, the total size of training data can be up to 50 MB for the template model and 1 GB for the neural model.
For custom classification model training, the total size of training data can be up to 1 GB with a maximum of 10,000 pages.
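The analysis limits above lend themselves to a client-side check that runs before a document is uploaded. The sketch below covers file size per tier, image dimensions, and page counts. It assumes binary megabytes and assumes that pages beyond the 2000-page limit are skipped rather than rejected; the article states that behavior only for the free tier. All names are illustrative.

```python
MB = 1024 * 1024  # assumption: limits are expressed in binary megabytes

MAX_FILE_BYTES = {"S0": 500 * MB, "F0": 4 * MB}
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_PAGES = 2000
FREE_TIER_PAGES = 2


def file_size_ok(n_bytes: int, tier: str) -> bool:
    """True if the file fits the size limit for the pricing tier (S0 or F0)."""
    return n_bytes <= MAX_FILE_BYTES[tier]


def image_size_ok(width_px: int, height_px: int) -> bool:
    """True if both sides fall between 50 and 10,000 pixels."""
    return all(MIN_SIDE_PX <= side <= MAX_SIDE_PX for side in (width_px, height_px))


def pages_processed(page_count: int, tier: str) -> int:
    """Number of PDF or TIFF pages that would be processed for this tier."""
    cap = FREE_TIER_PAGES if tier == "F0" else MAX_PAGES
    return min(page_count, cap)


print(file_size_ok(6 * MB, "F0"), image_size_ok(1024, 768), pages_processed(30, "F0"))
```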
Note
The Sample Labeling tool does not support the BMP file format. This is a limitation of the tool, not of the Document Intelligence service.
Version migration
Learn how to use Document Intelligence v3.0 in your applications by following our Document Intelligence v3.1 migration guide
Model | Description |
---|---|
Document analysis | |
Layout | Extract text and layout information from documents. |
Prebuilt | |
Invoice | Extract key information from English and Spanish invoices. |
Receipt | Extract key information from English receipts. |
ID document | Extract key information from US driver licenses and international passports. |
Business card | Extract key information from English business cards. |
Custom | |
Custom | Extract data from forms and documents specific to your business. Custom models are trained for your distinct data and use cases. |
Composed | Compose a collection of custom models and assign them to a single model built from your form types. |
Layout
The Layout API analyzes and extracts text, tables and headers, selection marks, and structure information from documents.
Sample document processed using the Sample Labeling tool:
Invoice
The invoice model analyzes and extracts key information from sales invoices. The API analyzes invoices in various formats and extracts key information such as customer name, billing address, due date, and amount due.
Sample invoice processed using the Sample Labeling tool:
Receipt
The receipt model analyzes and extracts key information from printed and handwritten sales receipts.
Sample receipt processed using Sample Labeling tool:
ID document
The ID document model analyzes and extracts key information from the following documents:
- U.S. Driver's Licenses (all 50 states and District of Columbia)
- Biographical pages from international passports (excluding visa and other travel documents)
The API analyzes identity documents and extracts key information.
Sample U.S. Driver's License processed using the Sample Labeling tool:
Business card
The business card model analyzes and extracts key information from business card images.
Sample business card processed using the Sample Labeling tool:
Custom
Custom models analyze and extract data from forms and documents specific to your business. The API is a machine-learning program trained to recognize form fields within your distinct content and extract key-value pairs and table data. You only need five examples of the same form type to get started, and your custom model can be trained with or without labeled datasets.
Sample custom model processing using the Sample Labeling tool:
Composed custom model
A composed model is created by taking a collection of custom models and assigning them to a single model built from your form types. You can assign multiple custom models to a composed model called with a single model ID. You can assign up to 100 trained custom models to a single composed model.
Composed model dialog window using the Sample Labeling tool:
Model data extraction
Model | Text extraction | Language detection | Selection Marks | Tables | Paragraphs | Paragraph roles | Key-Value pairs | Fields |
---|---|---|---|---|---|---|---|---|
Layout | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Invoice | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
Receipt | ✓ | ✓ | ✓ | |||||
ID Document | ✓ | ✓ | ✓ | |||||
Business Card | ✓ | ✓ | ✓ | |||||
Custom Form | ✓ | ✓ | ✓ | ✓ | ✓ |
Version migration
You can learn how to use Document Intelligence v3.0 in your applications by following our Document Intelligence v3.1 migration guide
Next steps
Try processing your own forms and documents with the Document Intelligence Studio
Complete a Document Intelligence quickstart and get started creating a document processing app in the development language of your choice.
Try processing your own forms and documents with the Document Intelligence Sample Labeling tool