Document Field extraction - custom generative AI model

08/11/2024

Important

Document Intelligence public preview releases provide early access to features that are in active development. Features, approaches, and processes may change before General Availability (GA) based on user feedback.
The public preview versions of the Document Intelligence client libraries default to REST API version 2024-07-31-preview, which is currently available only in the following Azure regions:
- East US
- West US 2
- West Europe
- North Central US
The new custom generative model in AI Studio is available only in the North Central US region.

The document field extraction (custom generative AI) model uses generative AI to extract user-specified fields from documents across a wide variety of visual templates. It combines document understanding with large language models (LLMs) and the rigor and schema of custom extraction capabilities to create a high-accuracy model in minutes. With this model type, you can start with a single document and go through schema addition and model creation with minimal labeling. The custom generative model lets developers and enterprises automate data extraction workflows for any type of document with greater accuracy and speed. The model excels at extracting simple fields from documents without labeled samples; however, providing a few labeled samples improves extraction accuracy for complex fields and user-defined fields such as tables. You can use the REST API or client libraries to build a custom generative model and then submit documents to it for analysis.

Custom generative AI model benefits

Automatic labeling. Use large language models (LLMs) to extract user-specified fields for various document types and visual templates.
Improved generalization. Extract data from unstructured content and varying document templates with higher accuracy.
Grounded results. Localize the extracted data in the documents. Custom generative models ground the results where applicable, ensuring that the response is generated from the document content and enabling human review workflows.
Confidence scores. Use confidence scores for each extracted field to filter high-quality extracted data, maximize straight-through processing of documents, and minimize human review costs.

Common use cases

Contract Lifecycle Management. Build a generative model and extract the fields, clauses, and obligations from a wide array of contract types.
Loan & Mortgage Applications. Automating the loan and mortgage application process enables banks, lenders, and government entities to process applications quickly.
Financial Services. With the custom generative AI model, analyze complex documents like financial reports and asset management reports.
Expense management. Receipts and invoices from various retailers and businesses need to be parsed to validate the expenses. The custom generative AI model can extract expenses across different formats and documents with varying templates.

Managing the training dataset

With our other custom models, you need to maintain the dataset, add new samples, and train the model for accuracy improvements. With the custom generative AI model, the labeled documents are transformed, encrypted, and stored as part of the model. This process ensures that the model can continually use the labeled samples to improve the extraction quality. As with other custom models, models are stored in Microsoft storage, and you can delete them anytime.

The Document Intelligence service does manage your datasets, but your documents are stored encrypted and used only to improve the results of your specific model. Your data can be encrypted with a service-managed key or, optionally, with a customer-managed key. This change in dataset management and lifecycle applies only to custom generative models.

Model capabilities

The field extraction custom generative model currently supports dynamic tables with the 2024-07-31-preview version, along with the following field types:

Form fields	Selection marks	Tabular fields	Signature	Region labeling	Overlapping fields
Supported	Supported	Supported	Unsupported	Unsupported	Supported

Build mode

The build custom model operation supports custom template, neural, and generative models; see Custom model build mode. Here are the differences between the model types:

Custom generative AI models can process complex documents with various formats, varied templates, and unstructured data.
Custom neural models support complex document processing and also support more variance in pages for structured and semi-structured documents.
Custom template models rely on consistent visual templates, such as questionnaires or applications, to extract the labeled data.
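As a rough illustration of the differences above, the following Python sketch maps two yes/no questions about a document set to a buildMode value. The function name and its parameters are hypothetical, and the heuristic is a simplification of this article's descriptions, not a rule the service applies.

```python
def suggest_build_mode(consistent_visual_template: bool,
                       structured_or_semi_structured: bool) -> str:
    """Return a suggested buildMode: "template", "neural", or "generative".

    Hypothetical heuristic based on the model-type descriptions above.
    """
    if consistent_visual_template:
        # Fixed layouts such as questionnaires or applications.
        return "template"
    if structured_or_semi_structured:
        # Structured or semi-structured documents with more page variance.
        return "neural"
    # Varied templates, mixed formats, or unstructured content.
    return "generative"
```

For example, a set of contracts with no fixed layout would answer False to both questions and land on "generative". Composed models don't support custom generative extraction, so a plan that depends on model composition may rule out "generative" regardless of what this heuristic suggests.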

Languages and locale support

The field extraction custom generative model (2024-07-31-preview version) supports the en-us locale. For more information on language support, see Language support - custom models.

Region support

The field extraction custom generative model (2024-07-31-preview version) is available only in North Central US.

Input requirements

Supported file formats:

Model	PDF	Image: `JPEG/JPG`, `PNG`, `BMP`, `TIFF`, `HEIF`	Microsoft Office: Word (`DOCX`), Excel (`XLSX`), PowerPoint (`PPTX`), HTML
Read	✔	✔	✔
Layout	✔	✔	✔ (2024-07-31-preview, 2024-02-29-preview, 2023-10-31-preview)
General Document	✔	✔
Prebuilt	✔	✔
Custom extraction	✔	✔
Custom classification	✔	✔	✔ (2024-07-31-preview, 2024-02-29-preview)

For best results, provide one clear photo or high-quality scan per document.
For PDF and TIFF, up to 2,000 pages can be processed (with a free tier subscription, only the first two pages are processed).
The maximum file size for analyzing documents is 500 MB for the paid (S0) tier and 4 MB for the free (F0) tier.
Image dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
If your PDFs are password-locked, you must remove the lock before submission.
The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8 point text at 150 dots per inch (DPI).
For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
- For custom extraction model training, the total size of training data is 50 MB for template model and 1 GB for the neural model.
- For custom classification model training, the total size of training data is 1 GB with a maximum of 10,000 pages. For 2024-07-31-preview and later, the total size of training data is 2 GB with a maximum of 10,000 pages.
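The following Python sketch encodes the analysis and custom extraction training limits listed above as a local pre-flight check before you upload anything. The function and constant names are hypothetical, "MB" and "GB" are assumed to mean 2**20 and 2**30 bytes, and the service remains the authority on what it accepts. The training check covers only the template and neural limits stated here.

```python
from typing import List, Optional

MB = 1024 * 1024  # assumption: "MB" read as 2**20 bytes

MAX_FILE_BYTES = {"S0": 500 * MB, "F0": 4 * MB}  # paid vs. free tier
MAX_PAGES = 2000               # PDF and TIFF
FREE_TIER_PAGES = 2            # F0 processes only the first two pages
MIN_DIM, MAX_DIM = 50, 10_000  # image width/height bounds in pixels

# Custom extraction training limits by build mode (1 GB assumed = 1024 MB).
TRAINING_LIMITS = {
    "template": {"max_pages": 500, "max_bytes": 50 * MB},
    "neural": {"max_pages": 50_000, "max_bytes": 1024 * MB},
}


def check_analyze_input(tier: str, size_bytes: int, page_count: int,
                        width_px: Optional[int] = None,
                        height_px: Optional[int] = None) -> List[str]:
    """Return messages for any analysis limit the input appears to break."""
    messages: List[str] = []
    limit = MAX_FILE_BYTES[tier]
    if size_bytes > limit:
        messages.append(f"file is larger than {limit // MB} MB ({tier} tier)")
    if page_count > MAX_PAGES:
        messages.append(f"more than {MAX_PAGES} pages")
    elif tier == "F0" and page_count > FREE_TIER_PAGES:
        messages.append(f"F0 tier processes only the first {FREE_TIER_PAGES} pages")
    for name, value in (("width", width_px), ("height", height_px)):
        if value is not None and not MIN_DIM <= value <= MAX_DIM:
            messages.append(f"image {name} {value}px is outside {MIN_DIM}-{MAX_DIM}px")
    return messages


def check_training_set(build_mode: str, total_pages: int, total_bytes: int) -> List[str]:
    """Return messages for any training-data limit the set appears to break."""
    limits = TRAINING_LIMITS[build_mode]
    messages: List[str] = []
    if total_pages > limits["max_pages"]:
        messages.append(f"{build_mode} training data exceeds {limits['max_pages']} pages")
    if total_bytes > limits["max_bytes"]:
        messages.append(f"{build_mode} training data exceeds {limits['max_bytes'] // MB} MB")
    return messages
```

For example, a 5 MB, three-page PDF on the free tier yields two messages: one for the 4 MB size limit and one noting that only the first two pages are processed.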

Best practices

Representative data. Use representative documents that match the actual data distribution to train a high-quality custom generative model. For example, if the target document includes partially filled tabular fields, add training documents that contain partially filled tables. Or, if a field is named date, its values should be dates, because random strings can affect model performance.
Field naming. Choose a precise field name that represents the field values. For example, for a field value containing the Transaction Date, consider naming the field TransactionDate instead of Date1.
Field description. Provide contextual information in the description to clarify which field to extract. Examples include its location in the document, the field labels it might be associated with, and ways to differentiate it from other terms that could be ambiguous.
Variation. Custom generative models can generalize across different document templates of the same document type. As a best practice, create a single model for all variations of a document type. Ideally, include a visual template for each type, especially for ones that
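The naming, description, and representative-data advice above can be checked locally before labeling. The following Python sketch is a hypothetical lint over a schema you keep yourself; the dictionary shape is an assumption for illustration and is not the service's schema format. Restricting names to letters is a deliberately blunt rule that flags names like Date1.

```python
import re
from datetime import date
from typing import Dict, List

# Hypothetical schema shape: field name -> {"type": ..., "description": ...}.
# Letters-only PascalCase, so generic numbered names such as "Date1" are flagged.
PASCAL_CASE = re.compile(r"^[A-Z][a-zA-Z]*$")


def lint_schema(schema: Dict[str, Dict[str, str]]) -> List[str]:
    """Return warnings for vague field names and missing descriptions."""
    warnings: List[str] = []
    for name, spec in schema.items():
        if not PASCAL_CASE.match(name):
            warnings.append(f"{name}: prefer a descriptive name such as TransactionDate")
        if not spec.get("description", "").strip():
            warnings.append(f"{name}: add a description (location, nearby labels, disambiguation)")
    return warnings


def non_date_values(values: List[str]) -> List[str]:
    """Return labeled values for a date field that don't parse as YYYY-MM-DD."""
    bad: List[str] = []
    for value in values:
        try:
            date.fromisoformat(value)
        except ValueError:
            bad.append(value)
    return bad
```

For example, a schema containing TransactionDate with a description and Date1 without one produces two warnings, both for Date1.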

Service guidance

The custom generative preview model doesn't currently support fixed table or signature extraction.
Inference on the same document can yield slightly different results across calls; this is a known limitation of current GPT models.
Confidence scores for each field might vary. We recommend testing with your representative data to establish the confidence thresholds for your scenario.
Grounding, especially for tabular fields, is challenging and might not be perfect in some cases.
Latency for large documents is high; this is a known limitation in preview.
Composed models don't support custom generative extraction.
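Because confidence scores vary and thresholds should come from testing on your own data, a common pattern is to route low-confidence fields to human review. The following Python sketch assumes each extracted field is a dictionary with "content" and "confidence" keys, loosely modeled on the fields of an analyze result; the function name and the 0.8 threshold in the usage note are hypothetical.

```python
from typing import Any, Dict, Tuple


def route_by_confidence(fields: Dict[str, Dict[str, Any]],
                        threshold: float) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split fields into auto-accepted values and values that need human review."""
    accepted: Dict[str, Any] = {}
    review: Dict[str, Any] = {}
    for name, field in fields.items():
        confidence = field.get("confidence")
        value = field.get("content")
        if confidence is not None and confidence >= threshold:
            accepted[name] = value
        else:
            # Missing or low confidence: send to a reviewer instead of processing straight through.
            review[name] = value
    return accepted, review
```

For example, with a threshold of 0.8, a date extracted at 0.95 is accepted, while a total extracted at 0.42, or a field with no confidence score at all, goes to review. Tune the threshold per field if your testing shows different fields behave differently.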

Training a model

Custom generative models are available with the 2024-07-31-preview version and later.

The build operation supports the buildMode property. To train a custom generative model, set buildMode to generative.


POST https://{endpoint}/documentintelligence/documentModels:build?api-version=2024-07-31-preview

{
  "modelId": "string",
  "description": "string",
  "buildMode": "generative",
  "azureBlobSource":
  {
    "containerUrl": "string",
    "prefix": "string"
  }
}
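The following Python sketch prepares the build request above with only the standard library, plus a matching analyze request for the trained model. It builds the requests but doesn't send them. The helper names are hypothetical; the Ocp-Apim-Subscription-Key header assumes key-based authentication, and the analyze route and its urlSource body are assumptions to check against the 2024-07-31-preview REST reference.

```python
import json
import urllib.request

API_VERSION = "2024-07-31-preview"


def _post(url: str, body: dict, key: str) -> urllib.request.Request:
    """Prepare a JSON POST request authenticated with a resource key."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Ocp-Apim-Subscription-Key": key},
        method="POST",
    )


def build_generative_model_request(endpoint: str, key: str, model_id: str,
                                   container_url: str, prefix: str = "",
                                   description: str = "") -> urllib.request.Request:
    """Prepare the documentModels:build request with buildMode set to generative."""
    url = f"{endpoint.rstrip('/')}/documentintelligence/documentModels:build?api-version={API_VERSION}"
    body = {
        "modelId": model_id,
        "description": description,
        "buildMode": "generative",
        "azureBlobSource": {"containerUrl": container_url, "prefix": prefix},
    }
    return _post(url, body, key)


def analyze_request(endpoint: str, key: str, model_id: str,
                    document_url: str) -> urllib.request.Request:
    """Prepare an analyze request for a trained model (assumed route and body)."""
    url = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze"
           f"?api-version={API_VERSION}")
    return _post(url, {"urlSource": document_url}, key)
```

Passing either request to urllib.request.urlopen sends it. Both operations are long-running, so expect to poll for the result rather than read it from the first response; the client libraries handle this polling for you.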

Next steps

Learn how to create custom generative models
Learn more about custom models
