This article describes Azure solutions for building, training, deploying, and using custom document processing models. These Azure services also provide user interface (UI) capabilities for labeling or tagging text during processing.
Architecture
Download a Visio file of this architecture.
Dataflow
The following dataflow corresponds to the previous diagram:
Orchestrators like Azure Logic Apps, Azure Data Factory, or Azure Functions ingest messages and attachments from email servers and files from file transfer protocol servers or web applications.
Functions and Logic Apps enable serverless workloads. The service that you choose depends on your preference for service capabilities like development, connectors, management, and operational context. For more information, see Compare Functions and Logic Apps.
Consider using Azure Data Factory to move data in bulk.
The orchestrators send ingested data to Azure Blob Storage or Azure Data Lake Storage. They organize the data within these stores based on characteristics like file extensions or customer details.
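For example, the following minimal sketch shows how an orchestration step might organize ingested files in Blob Storage by customer and file extension. The container name, environment variable, and naming convention are assumptions for illustration only.

```python
import os

from azure.storage.blob import BlobServiceClient

# AZURE_STORAGE_CONNECTION_STRING and the container name are assumed
# placeholders; substitute your own configuration.
service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

def store_document(customer_id: str, file_name: str, data: bytes) -> None:
    """Store an ingested file, organized by customer and file extension."""
    extension = os.path.splitext(file_name)[1].lstrip(".") or "unknown"
    # Prefixing the blob name lets downstream steps filter by customer
    # and file type, as described in this dataflow step.
    blob_name = f"{customer_id}/{extension}/{file_name}"
    container = service.get_container_client("ingested-documents")
    container.upload_blob(name=blob_name, data=data, overwrite=True)
```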
You can use the following Azure services, either independently or in combination, to label training documents and build custom models for various use cases.
Document Intelligence Studio: If you need to extract key-value pairs or create a custom table from an image or PDF, use Document Intelligence Studio to tag the data and train the custom model. If you need to identify the type of document, known as document classification, before you invoke the correct extraction model, use Document Intelligence Studio to label the documents and build the models. A sketch of this training step follows this list.
Language Studio: For document classification based on content, or for domain-specific entity extraction, you can train a custom text classification or named entity recognition (NER) model in Language Studio.
Azure Machine Learning studio: For labeling data for text classification or entity extraction to use with open-source frameworks like PyTorch or TensorFlow, use Machine Learning studio, the Python SDK, Azure CLI, or the REST API. Machine Learning studio provides a model catalog of foundation models. These foundation models have fine-tuning capabilities for various tasks like text classification, question answering, and summarization. To fine-tune foundation models, use the Machine Learning studio UI or code.
Azure OpenAI Service: To fine-tune Azure OpenAI models on your own data or domain for tasks like text summarization and question answering, use the Azure AI Foundry portal, the Python SDK, or the REST API.
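The following sketch shows one way to build a custom extraction model after you label documents in Document Intelligence Studio, by using the azure-ai-formrecognizer Python SDK. The endpoint, key, SAS URL, and model ID are placeholders, not values from this architecture.

```python
from azure.ai.formrecognizer import (
    DocumentModelAdministrationClient,
    ModelBuildMode,
)
from azure.core.credentials import AzureKeyCredential

# The endpoint, key, and the SAS URL of the container that holds the
# training documents labeled in Document Intelligence Studio are
# placeholders for your own values.
admin_client = DocumentModelAdministrationClient(
    endpoint="https://<resource-name>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<api-key>"),
)

poller = admin_client.begin_build_document_model(
    build_mode=ModelBuildMode.NEURAL,  # or ModelBuildMode.TEMPLATE
    blob_container_url="<sas-url-of-labeled-training-data>",
    model_id="my-custom-extraction-model",
)
model = poller.result()
print(model.model_id, model.created_on)
```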
To deploy the custom models and use them for inferencing:
Azure AI Document Intelligence has built-in model deployment. You run inference against custom models by using the SDKs or the document models REST API. The modelId, or model name, that you specify during model creation is included in the request URL for document analysis. Document Intelligence doesn't require any further deployment steps. For an example, see the sketch after this list.
Language Studio provides an option to deploy custom language models. Get the REST endpoint prediction URL by selecting the model for deployment. You can run inference against the models by using either the REST endpoint or the Azure SDK client libraries.
Machine Learning deploys custom models to online or batch Machine Learning managed endpoints. You can also use the Machine Learning SDK to deploy to Azure Kubernetes Service (AKS) as a web service. You can deploy fine-tuned foundation models from the model catalog via managed compute or a serverless API. Models deployed through managed compute serve inference requests via managed endpoints, which include online endpoints for real-time inferencing and batch endpoints for batch inferencing.
Azure AI Foundry provides options to deploy fine-tuned Azure OpenAI models. You can also deploy them by using the Python SDK or the REST API.
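As an example of the Document Intelligence pattern, the following sketch runs inference against a custom model by passing its model ID; no separate deployment step is needed. The endpoint, key, file name, and model ID are placeholders.

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<resource-name>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<api-key>"),
)

# The model ID assigned at training time selects the custom model.
with open("sample-document.pdf", "rb") as document:
    poller = client.begin_analyze_document(
        "my-custom-extraction-model", document
    )
result = poller.result()

# Print each extracted field with its confidence score.
for name, field in result.documents[0].fields.items():
    print(f"{name}: {field.value} (confidence {field.confidence})")
```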
Components
Logic Apps is part of Azure Integration Services. Logic Apps creates automated workflows that integrate apps, data, services, and systems. You can use managed connectors for services like Azure Storage and Microsoft 365 to trigger workflows when a file arrives in the storage account or an email is received.
Azure Data Factory is a managed cloud extract, transform, and load (ETL) service for data integration and transformation. You can add transformation activities to an Azure Data Factory pipeline, such as invoking a REST endpoint or running a notebook on the ingested data.
Functions is a serverless compute service that can host event-driven workloads that have short-lived processes.
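As a minimal sketch, the following blob-triggered function (Python v2 programming model) illustrates the kind of short-lived, event-driven workload that Functions can host. The container name and connection setting name are assumptions.

```python
import logging

import azure.functions as func

app = func.FunctionApp()

# Runs whenever a new document lands in the ingestion container.
# "ingested-documents" and the connection setting name are assumptions.
@app.blob_trigger(
    arg_name="blob",
    path="ingested-documents/{name}",
    connection="AzureWebJobsStorage",
)
def process_document(blob: func.InputStream) -> None:
    # Short-lived, event-driven work: for example, submit the file to
    # Document Intelligence or queue it for downstream processing.
    logging.info("Processing %s (%s bytes)", blob.name, blob.length)
```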
Blob Storage is the object storage solution for raw files in this scenario. Blob Storage supports libraries for multiple languages, such as .NET, Node.js, and Python. Applications can access files on Blob Storage via HTTP or HTTPS. Blob Storage has hot, cool, and archive access tiers to support cost optimization for storing large amounts of data.
Data Lake Storage is a set of capabilities built on Blob Storage for big data analytics. Data Lake Storage maintains the cost effectiveness of Blob Storage and provides features like file-level security and file system semantics with a hierarchical namespace.
Document Intelligence is a component of Azure AI services. Document Intelligence has built-in document analysis capabilities for extracting printed and handwritten text, tables, and key-value pairs. Document Intelligence has prebuilt models for extracting data from invoices, documents, receipts, ID cards, and business cards. Document Intelligence also has a custom template form model and a custom neural document model that you can use to train and deploy custom models.
Document Intelligence Studio provides an interface to explore Document Intelligence features and models. It also enables you to build, tag, train, and deploy custom models.
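For example, the following sketch calls the prebuilt invoice model. Prebuilt model IDs like prebuilt-invoice and prebuilt-receipt are documented model names, while the endpoint, key, and document URL are placeholders.

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<resource-name>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<api-key>"),
)

# Analyze an invoice that's reachable by URL; other prebuilt models
# include "prebuilt-receipt", "prebuilt-idDocument", and "prebuilt-layout".
poller = client.begin_analyze_document_from_url(
    "prebuilt-invoice",
    "https://<host>/sample-invoice.pdf",
)
invoice = poller.result().documents[0]

vendor = invoice.fields.get("VendorName")
total = invoice.fields.get("InvoiceTotal")
print(vendor.value if vendor else None, total.value if total else None)
```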
Azure AI Language consolidates the Azure natural language processing (NLP) services. The suite provides prebuilt and customizable options.
Language Studio provides a UI that you can use to explore and analyze Language features. It also provides options for building, tagging, training, and deploying custom models.
Azure Machine Learning is a managed machine learning platform for model development and deployment at scale.
Machine Learning studio provides data labeling options for images and text.
You can export labeled data in COCO format or as Machine Learning datasets and use these datasets to train and deploy models in Machine Learning notebooks.
Azure OpenAI provides powerful language models and multimodal models as REST APIs that you can use to perform various tasks. You can fine-tune specific models to improve their performance on data that was missing or underrepresented when the base model was originally trained.
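The following sketch shows how a fine-tuning job might be started by using the openai Python package's Azure client. The endpoint, key, API version, training file, and base model name are assumptions; check which base models support fine-tuning in your region.

```python
from openai import AzureOpenAI

# The endpoint, key, and API version are placeholders; the training
# file is a JSONL file of chat-formatted training examples.
client = AzureOpenAI(
    azure_endpoint="https://<resource-name>.openai.azure.com/",
    api_key="<api-key>",
    api_version="2024-02-01",
)

# Upload the training data, then start a fine-tuning job on a
# fine-tunable base model (availability varies by region).
training_file = client.files.create(
    file=open("training-data.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # an assumed fine-tunable model
)
print(job.id, job.status)
```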
Alternatives
You can add more workflows to this scenario based on specific use cases.
If the document is an image or PDF, you can extract the data by using Azure AI Vision optical character recognition (OCR), the Document Intelligence Read API, or open-source libraries.
You can use the prebuilt model in Language for document and conversation summarization.
Use preprocessing code to perform text processing steps on the extracted data, such as cleaning, stop-word removal, lemmatization, stemming, and text summarization, according to your document processing requirements. You can expose the code as REST APIs for automation. Complete these steps manually or automate them by integrating with the Logic Apps or Functions ingestion process.
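The following minimal sketch illustrates such preprocessing by using the open-source NLTK library. It isn't part of the architecture itself, and the cleaning rules shown are assumptions that you'd adapt to your documents.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK corpora.
nltk.download("stopwords")
nltk.download("wordnet")

def preprocess(text: str, stem: bool = False) -> list[str]:
    """Clean extracted text, remove stop words, and normalize tokens."""
    # Basic cleaning: lowercase and keep only letters and whitespace.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    stop_words = set(stopwords.words("english"))
    tokens = [t for t in text.split() if t not in stop_words]
    if stem:
        stemmer = PorterStemmer()
        return [stemmer.stem(t) for t in tokens]
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]

print(preprocess("The invoices were processed and archived quickly."))
```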
You can explore Azure OpenAI models and a collection of foundation models in the model catalog. You can also use Azure AI Foundry portal to fine-tune and deploy foundation models, and build generative AI applications. Because there's overlap between Machine Learning and Azure AI Foundry, you must evaluate their capabilities and choose the best platform for your scenario.
You can use Azure AI Content Understanding to create a custom analyzer by defining a field schema for extracting structured data from the document.
Scenario details
Document processing covers a wide range of tasks. It can be difficult to meet all your document processing needs by using the prebuilt models available in Language and Document Intelligence. You might need to build custom models to automate document processing for different applications and domains.
Major challenges in model customization include:
Labeling or tagging text data with relevant key-value pair entities to classify text for extraction.
Managing training infrastructure, such as compute and storage, and their integrations.
Deploying models securely at scale for seamless integration with consuming applications.
Potential use cases
The following use cases can take advantage of custom models for document processing:
Build custom NER and text classification models based on open-source frameworks.
Extract custom key-value pairs from documents for various industry verticals like insurance and healthcare.
Tag and extract specific domain-dependent entities beyond the prebuilt NER models for domains like security or finance.
Create custom tables from documents.
Extract signatures.
Label and classify emails or other documents based on content.
Summarize documents or create custom question-and-answer models based on your data.
Considerations
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Well-Architected Framework.
For this example workload, implementing each pillar depends on optimally configuring and using each component Azure service.
Reliability
Reliability helps ensure that your application can meet the commitments that you make to your customers. For more information, see Design review checklist for Reliability.
Availability
For more information about the service-level agreements for each architecture component, see Licensing documents.
For more information about configuration options to design high-availability applications with Storage accounts, see Use geo-redundancy to design highly available applications.
Resiliency
Address failure modes of individual services like Functions and Storage to help ensure resiliency of the compute services and data stores in this scenario. For more information, see Resiliency checklist for specific Azure services.
Back up and recover your custom text classification models and NER models in Language.
Machine Learning depends on constituent services like Blob Storage, compute services, and AKS. To provide resiliency for Machine Learning, configure each of these services to be resilient. For more information, see Failover for business continuity and disaster recovery (BCDR).
For Azure OpenAI, help ensure continuous availability by provisioning two or more Azure OpenAI resources in different regions. This approach allows failover to another region if there's a problem. For more information, see BCDR with Azure OpenAI.
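The following sketch illustrates this failover approach with the openai Python package: try the primary region first, then fall back to the secondary. The resource names, keys, and deployment name are placeholders, and the same deployment is assumed to exist in both resources.

```python
from openai import APIConnectionError, APIError, AzureOpenAI

# Two Azure OpenAI resources in different regions; the names, keys,
# and deployment name are placeholders.
clients = [
    AzureOpenAI(azure_endpoint="https://<primary>.openai.azure.com/",
                api_key="<primary-key>", api_version="2024-02-01"),
    AzureOpenAI(azure_endpoint="https://<secondary>.openai.azure.com/",
                api_key="<secondary-key>", api_version="2024-02-01"),
]

def summarize(text: str) -> str:
    """Try the primary region first; fail over to the secondary."""
    last_error: Exception | None = None
    for client in clients:
        try:
            response = client.chat.completions.create(
                model="<deployment-name>",  # must exist in both regions
                messages=[{"role": "user", "content": f"Summarize: {text}"}],
            )
            return response.choices[0].message.content
        except (APIError, APIConnectionError) as error:
            last_error = error  # try the next region
    raise RuntimeError("All Azure OpenAI regions failed") from last_error
```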
Security
Security provides assurances against deliberate attacks and the misuse of your valuable data and systems. For more information, see Design review checklist for Security.
Implement data protection, identity and access management, and network security recommendations for Blob Storage, Azure AI services (Document Intelligence and Language), Machine Learning, and Azure OpenAI.
Cost Optimization
Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.
The total cost of implementing this solution depends on the pricing of the services that you choose.
The major costs for this solution include:
The compute cost to train and deploy Machine Learning models.
To help optimize costs, choose the right node type, cluster size, and number of nodes. Machine Learning provides options for training, such as setting the minimum number of compute cluster nodes to zero and defining the idle time before scaling down. For more information, see Manage and optimize Machine Learning costs.
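For example, the following sketch, which uses the azure-ai-ml Python SDK, configures a training compute cluster that scales to zero nodes when idle. The workspace details, cluster name, and VM size are placeholders.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# The subscription, resource group, and workspace names are placeholders.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# min_instances=0 plus a short idle timeout means you pay for
# training compute only while jobs run.
cluster = AmlCompute(
    name="training-cluster",
    size="Standard_DS3_v2",           # choose the right node type
    min_instances=0,                  # release all nodes when idle
    max_instances=4,                  # cap the cluster size
    idle_time_before_scale_down=120,  # seconds before scaling down
)
ml_client.compute.begin_create_or_update(cluster).result()
```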
Data orchestration duration and activities. For Azure Data Factory, the charges for copy activities on the Azure integration runtime are based on the number of data integration units (DIUs) used and the time taken to perform the activities. Additional orchestration activity runs are also charged based on their number.
Logic Apps pricing plans depend on the resources that you create and use. To choose the right plan for your use case, see Logic Apps pricing in the following resources.
For more information about pricing for specific components, see the following resources:
- Azure AI Document Intelligence pricing
- Functions pricing
- Logic Apps pricing
- Azure Data Factory pricing
- Blob Storage pricing
- Language pricing
- Machine Learning pricing
- Azure OpenAI pricing
Use the Azure pricing calculator to add the component options that you choose and estimate the overall cost of the solution.
Performance Efficiency
Performance Efficiency refers to your workload's ability to scale to meet user demands efficiently. For more information, see Design review checklist for Performance Efficiency.
Scalability
To scale Functions automatically or manually, choose the right hosting plan.
By default, Document Intelligence supports 15 concurrent requests per second. To increase this quota, create an Azure support ticket.
For Machine Learning custom models hosted as web services on AKS, the azureml-fe front-end component automatically scales as needed. This component also routes incoming inference requests to deployed services.
Deployments to managed endpoints support autoscaling through integration with the Azure Monitor autoscale feature. For more information, see Endpoints for inference in production.
The API service limits for custom NER and custom text classification inferencing are 20 GET or POST requests per minute.
Contributors
Microsoft maintains this article. The following contributors wrote this article.
Principal author:
- Jyotsna Ravi | Senior Customer Engineer
To see nonpublic LinkedIn profiles, sign in to LinkedIn.
Next steps
- Get started with custom projects in Document Intelligence Studio
- Use Document Intelligence models
- What is Azure AI Language?
- What is optical character recognition?
- How to configure Functions with a virtual network