Train and use custom models
When prebuilt models don't cover your specific document types, you can train custom models to extract data from your own forms. Azure Document Intelligence supports supervised machine learning, where you label sample documents with the fields you want to extract, and the service trains a model to recognize those fields in new documents.
Custom model types
Azure Document Intelligence offers two types of custom extraction models, plus a classification model:
Custom template models
Custom template models rely on a consistent visual template to extract labeled data. They work best for structured forms where the layout is static from one document instance to the next, such as questionnaires, applications, or standard government forms.
Template models accurately extract labeled key-value pairs, selection marks, tables, regions, and signatures. Training takes only a few minutes, and more than 100 languages are supported. Because template models are fast to train and cost-effective to run, they're a good starting point when your documents have a uniform visual layout.
Custom neural models
Custom neural models use deep learning and are fine-tuned on your labeled data. They combine layout and language features to extract fields from structured, semi-structured, and unstructured documents. Neural models support:
- Overlapping fields
- Signature detection
- Table, row, and cell level confidence
Neural models deliver higher accuracy than template models, especially for semi-structured or unstructured documents where the layout varies between instances. However, they take longer to train and consume more resources.
Choose between template and neural models
When deciding which custom model type to use, consider the tradeoffs:
| Factor | Custom template | Custom neural |
|---|---|---|
| Best for | Structured forms with a consistent visual layout | Semi-structured or unstructured documents with varying layouts |
| Training time | Minutes | Longer (depends on dataset size) |
| Training cost | Lower | Higher |
| Accuracy | High for fixed-layout forms; decreases when layout varies | Higher overall, especially for documents with format variation |
| Language support | 100+ languages | Fewer languages (check documentation for current support) |
| Feature support | Key-value pairs, selection marks, tables, regions, signatures | Overlapping fields, signature detection, table/row/cell confidence |
Tip
Start with a custom template model if your forms have a consistent visual layout. It's faster and cheaper to train. If accuracy is insufficient or your documents vary in format, switch to a custom neural model.
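The tip above amounts to a small decision rule. As a minimal sketch, assuming the result is passed on as a build mode of "template" or "neural" (the function name and both parameters are hypothetical, introduced only for illustration):

```python
def choose_build_mode(consistent_layout: bool, template_accuracy_ok: bool = True) -> str:
    """Return "template" or "neural", following the guidance in the tip.

    consistent_layout: True when every form instance shares one visual layout.
    template_accuracy_ok: set to False after a template model has been tried
    and its accuracy turned out to be insufficient.
    """
    if consistent_layout and template_accuracy_ok:
        # Fixed layout and acceptable accuracy: the faster, cheaper option.
        return "template"
    # Layout varies, or the template model was not accurate enough.
    return "neural"


print(choose_build_mode(consistent_layout=True))
print(choose_build_mode(consistent_layout=False))
```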
Custom classifiers
Custom classification models identify the type of a document before invoking an extraction model. You can use a classifier to route incoming documents to the appropriate extraction model when you're handling multiple form types.
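The routing step can be sketched in plain Python. This is an illustration only: the document type names, model IDs, and the 0.8 confidence threshold are hypothetical, and the classifier call itself is left out.

```python
# Hypothetical mapping from classifier document types to extraction model IDs.
EXTRACTION_MODELS = {
    "purchase-order": "po-extractor-v1",
    "claim-form": "claims-extractor-v2",
}


def route_document(doc_type, confidence, min_confidence=0.8):
    """Return the extraction model ID for a classified document.

    Returns None when the classification is below the (assumed) confidence
    threshold or the document type has no registered extraction model, so the
    caller can send the document for manual review instead.
    """
    if confidence < min_confidence:
        return None
    return EXTRACTION_MODELS.get(doc_type)


print(route_document("claim-form", 0.95))
```

Keeping the mapping in one place means that adding a new form type is a data change rather than a code change.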
Train a custom model
To train a custom extraction model:
- Store sample forms in an Azure blob container, along with JSON files containing layout and label field information:
  - An ocr.json file for each sample form (generated using the Analyze document function).
  - A single fields.json file describing the fields you want to extract.
  - A labels.json file for each sample form, mapping fields to their location in the form.
- Generate a shared access signature (SAS) URL for the container.
- Use the Build model REST API function or the equivalent SDK method.
- Use the Get model REST API function to retrieve the trained model ID.
You can also train custom models visually using the Document Intelligence Studio, as described in the Use the Document Intelligence Studio unit.
Tip
Use at least five sample forms for training. A larger and more varied dataset produces more accurate models.
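The Build model call from the steps above can be sketched as a function that prepares the request without sending it. The URL path and the API version shown here are assumptions based on the 2023-07-31 REST API and may differ for your resource; the function name is hypothetical.

```python
import json

# Assumption: adjust the version (and path) to what your resource supports.
API_VERSION = "2023-07-31"


def build_model_request(endpoint, model_id, container_sas_url, build_mode="template"):
    """Return (url, json_body) for a Build model request. Nothing is sent."""
    if build_mode not in ("template", "neural"):
        raise ValueError("build_mode must be 'template' or 'neural'")
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels:build"
        f"?api-version={API_VERSION}"
    )
    body = {
        "modelId": model_id,
        "buildMode": build_mode,
        # SAS URL of the blob container holding the forms and JSON files.
        "azureBlobSource": {"containerUrl": container_sas_url},
    }
    return url, json.dumps(body)


url, body = build_model_request(
    "https://<your-resource>.cognitiveservices.azure.com",
    "my-custom-model",
    "<container SAS URL>",
)
print(url)
print(body)
```

To run the build, you would POST the body to the URL with an HTTP client of your choice, passing the resource key in the Ocp-Apim-Subscription-Key header.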
Use a custom model
To extract form data with a custom model, call the Analyze document function with your model ID. You can use either a supported SDK or the REST API.
C#
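using System;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

// Replace the placeholders with your resource endpoint and key.
string endpoint = "<endpoint>";
string apiKey = "<apiKey>";
AzureKeyCredential credential = new AzureKeyCredential(apiKey);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// ID of the trained custom model and URL of the document to analyze.
string modelId = "<modelId>";
Uri fileUri = new Uri("<fileUri>");
// Submit the document and wait for the analysis to complete.
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, modelId, fileUri);
AnalyzeResult result = operation.Value;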
Python
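from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

endpoint = "YOUR_DOC_INTELLIGENCE_ENDPOINT"
key = "YOUR_DOC_INTELLIGENCE_KEY"
model_id = "YOUR_CUSTOM_BUILT_MODEL_ID"
form_url = "YOUR_DOCUMENT"

document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# Start the long-running analysis and block until the result is ready.
poller = document_analysis_client.begin_analyze_document_from_url(model_id, form_url)
result = poller.result()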
A successful response contains an analyzeResult object with the extracted content and an array of pages containing information about the document.
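To show how such a response might be consumed, here is a minimal sketch that flattens the extracted fields of a REST-style JSON response. The sample payload is hypothetical and heavily trimmed, and the documents and fields keys are an assumption about the response shape beyond the content and pages described above.

```python
def summarize_fields(response, min_confidence=0.0):
    """Return (name, content, confidence) tuples for each extracted field."""
    analyze_result = response.get("analyzeResult", {})
    rows = []
    for document in analyze_result.get("documents", []):
        for name, field in document.get("fields", {}).items():
            confidence = field.get("confidence")
            # Skip fields that fall below the requested confidence.
            if confidence is not None and confidence < min_confidence:
                continue
            rows.append((name, field.get("content"), confidence))
    return rows


# Hypothetical, trimmed response used only to exercise the function.
sample_response = {
    "status": "succeeded",
    "analyzeResult": {
        "content": "Contoso $110.00",
        "pages": [{"pageNumber": 1}],
        "documents": [
            {
                "docType": "my-custom-model",
                "fields": {
                    "CustomerName": {"type": "string", "content": "Contoso", "confidence": 0.97},
                    "Total": {"type": "string", "content": "$110.00", "confidence": 0.42},
                },
            }
        ],
    },
}

for row in summarize_fields(sample_response, min_confidence=0.5):
    print(row)
```

Filtering on confidence is a common way to decide which fields to accept automatically and which to send for review; the 0.5 threshold here is arbitrary.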
Composed models
You can combine multiple custom models into a single composed model. When you submit a document to a composed model, Document Intelligence classifies it to determine the most appropriate component model, and then returns the extraction results from that model. This approach is useful when you handle multiple form types that each require their own extraction model.
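As a rough sketch, the request body for composing models might be assembled as follows. The body shape is an assumption based on the 2023-07-31 REST API (newer API versions may require additional properties), and the function name and the local checks are hypothetical rather than rules enforced by the service.

```python
def compose_model_body(composed_model_id, component_model_ids, description=None):
    """Build the JSON-serializable body for a Compose model request."""
    if not component_model_ids:
        raise ValueError("at least one component model ID is required")
    if len(set(component_model_ids)) != len(component_model_ids):
        raise ValueError("component model IDs must be unique")
    body = {
        "modelId": composed_model_id,
        # Each component is referenced by the ID of an existing custom model.
        "componentModels": [{"modelId": model_id} for model_id in component_model_ids],
    }
    if description:
        body["description"] = description
    return body


print(compose_model_body("all-forms", ["po-extractor-v1", "claims-extractor-v2"]))
```

Once composed, the model is used like any other custom model: you pass the composed model ID to the Analyze document function.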