This article describes Azure solutions for building, training, deploying, and using custom document processing models. These Azure services also offer user interface (UI) capabilities to do labeling or tagging for text processing.
Architecture
Download a Visio file of this architecture.
Dataflow
Orchestrators like Azure Logic Apps, Azure Data Factory, or Azure Functions ingest messages and attachments from email servers, and files from FTP servers or web applications.
Azure Functions and Logic Apps enable serverless workloads. The service you choose depends on your preference for service capabilities like development, connectors, management, and execution context. For more information, see Compare Azure Functions and Azure Logic Apps.
Consider using Azure Data Factory for bulk data movement.
The orchestrators send ingested data to Azure Blob Storage or Data Lake Storage, organizing the data across data stores based on characteristics like file extensions or customers.
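As a rough illustration of how ingested files might be organized by file extension and customer, the following Python sketch maps a file name to a container and a blob path. The container names, the extension mapping, and the `blob_destination` helper are hypothetical and aren't part of any Azure SDK. The upload itself is left to the orchestrator.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a storage container name.
CONTAINER_BY_EXTENSION = {
    ".pdf": "pdf-documents",
    ".png": "images",
    ".jpg": "images",
    ".jpeg": "images",
    ".tiff": "images",
    ".txt": "text-documents",
    ".eml": "emails",
}
DEFAULT_CONTAINER = "unclassified"


def blob_destination(filename: str, customer_id: str) -> tuple[str, str]:
    """Return (container, blob_path) for an ingested file.

    The container is chosen by file extension, and the blob path is
    prefixed with the customer ID so each customer's files stay together.
    """
    path = PurePosixPath(filename)
    container = CONTAINER_BY_EXTENSION.get(path.suffix.lower(), DEFAULT_CONTAINER)
    blob_path = f"{customer_id}/{path.name}"
    return container, blob_path
```

With Data Lake Storage, a prefix like the customer ID can map to a real directory in the hierarchical namespace, which is one reason to prefer path-based organization over many small containers.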
Document Intelligence Studio, Language Studio, or Azure Machine Learning studio label and tag textual data and build the custom models. You can use these three services independently or in various combinations to address different use cases.
If the document requires you to extract key-value pairs or create a custom table from an image format or PDF, use Document Intelligence Studio to tag the data and train the custom model. Similarly, if there's a requirement to identify the type of document before invoking the right extraction model, use Document Intelligence Studio to label the documents.
For document classification based on content, or for domain-specific entity extraction, you can train a custom text classification or Named Entity Recognition (NER) model in Language Studio.
Machine Learning studio has data labeling capabilities for text classification or entity extraction that you can use with open-source frameworks like PyTorch or TensorFlow. Machine Learning studio also provides a model catalog of foundation models. These foundation models have fine-tuning capabilities for various tasks like text classification, question answering, and summarization. To fine-tune foundation models, use the Machine Learning studio UI or code.
To deploy the custom models and use them for inference:
Azure AI Document Intelligence has built-in model deployment. Use Document Intelligence SDKs or the REST API to apply custom models for inferencing. Include the model ID or custom model name in the Document Intelligence request URL, depending on the API version. Document Intelligence doesn't require any further deployment steps.
Language Studio provides an option to deploy custom language models. Get the REST endpoint prediction URL by selecting the model to deploy. You can do model inferencing by using either the REST endpoint or the Azure SDK client libraries.
Machine Learning deploys custom models to online or batch Machine Learning managed endpoints. You can also use the Machine Learning SDK to deploy to Azure Kubernetes Service (AKS) as a web service.
Fine-tuned foundation models are deployed from the model catalog to endpoints for inferencing.
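As a rough sketch of the inference calls described above, the following Python code builds (but doesn't send) REST requests for a custom Document Intelligence model and for a custom NER deployment in Azure AI Language. The URL routes, API versions, header name, and payload shapes reflect the public REST APIs as best understood at the time of writing, so treat them as assumptions and verify them against the current reference documentation. The endpoint, model ID, project name, and deployment name are hypothetical placeholders.

```python
import json
from urllib.parse import quote, urlencode


def document_intelligence_request(endpoint, model_id, document_url, api_key,
                                  api_version="2024-11-30"):
    """Describe an analyze request for a custom Document Intelligence model."""
    # Assumption: API versions from 2023-10-31 onward use the
    # "documentintelligence" route, and earlier 3.x versions use "formrecognizer".
    prefix = "documentintelligence" if api_version >= "2023-10-31" else "formrecognizer"
    url = (
        f"{endpoint.rstrip('/')}/{prefix}/documentModels/"
        f"{quote(model_id, safe='')}:analyze?{urlencode({'api-version': api_version})}"
    )
    return {
        "method": "POST",
        "url": url,
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"urlSource": document_url}),
    }


def custom_ner_request(endpoint, project_name, deployment_name, text, api_key,
                       api_version="2022-05-01"):
    """Describe an asynchronous custom NER job for Azure AI Language."""
    url = (
        f"{endpoint.rstrip('/')}/language/analyze-text/jobs?"
        f"{urlencode({'api-version': api_version})}"
    )
    body = {
        "displayName": "Custom NER sketch",
        "analysisInput": {
            "documents": [{"id": "1", "language": "en", "text": text}]
        },
        "tasks": [
            {
                "kind": "CustomEntityRecognition",
                "taskName": "extract-entities",
                "parameters": {
                    "projectName": project_name,
                    "deploymentName": deployment_name,
                },
            }
        ],
    }
    return {
        "method": "POST",
        "url": url,
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```

Both calls are long-running operations in these API versions: the service typically responds with an operation location that the client polls for results. In practice, the Azure SDK client libraries handle the polling, so hand-built requests like these are mostly useful for understanding where the model ID and deployment name go.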
Components
Logic Apps is part of Azure Integration Services. Logic Apps creates automated workflows that integrate apps, data, services, and systems. With managed connectors for services like Azure Storage and Microsoft 365, you can trigger workflows when a file lands in the storage account or email is received.
Data Factory is a managed cloud extract, transform, load (ETL) service for data integration and transformation. Data Factory can add transformation activities to a pipeline that include invoking a REST endpoint or running a notebook on the ingested data.
Azure Functions is a serverless compute service that can host event-driven workloads with short-lived processes.
Blob Storage is the object storage solution for raw files in this scenario. Blob Storage supports libraries for multiple languages, such as .NET, Node.js, and Python. Applications can access files on Blob Storage via HTTP/HTTPS. Blob Storage has hot, cool, and archive access tiers to support cost optimization for storing large amounts of data.
Data Lake Storage is a set of capabilities built on Azure Blob Storage for big data analytics. Data Lake Storage retains the cost effectiveness of Blob Storage, and provides features like file-level security and file system semantics with hierarchical namespace.
Document Intelligence is part of Azure AI services. Document Intelligence has built-in document analysis capabilities that you can use to extract printed and handwritten text, tables, and key-value pairs. Document Intelligence has prebuilt models for extracting data from invoices, documents, receipts, ID cards, and business cards. Document Intelligence also has a custom template form model and a custom neural document model that you can use to train and deploy custom models.
Document Intelligence Studio provides a UI that you can use to explore Document Intelligence features and models, and to build, tag, train, and deploy custom models.
Azure AI Language consolidates the Azure natural language processing services. The suite offers prebuilt and customizable options. For more information, see the Azure AI Language available features.
Language Studio provides a UI for exploring and analyzing Azure AI Language features. Language Studio also provides options for building, tagging, training, and deploying custom models.
Azure Machine Learning is an open platform for managing machine learning model development and deployment at scale.
- Azure Machine Learning studio provides data labeling options for images and text.
- Export labeled data as COCO or Azure Machine Learning datasets. You can use the datasets for training and deploying models in Azure Machine Learning notebooks.
- Deploy models to AKS as a web service for real-time inferencing at scale, or as managed endpoints for both real-time and batch inferencing.
Alternatives
You can add more workflows to this scenario based on specific use cases.
If the document is in image or PDF format, you can extract the data by using Azure computer vision, Document Intelligence Read API, or open-source libraries.
You can do document and conversation summarization by using the prebuilt model in Azure AI Language.
Use preprocessing code to perform text processing steps. These steps include cleaning, stop words removal, lemmatization, stemming, and text summarization on extracted data according to document processing requirements. You can expose the code as REST APIs for automation. Do these steps manually or automate them by integrating with the Logic Apps or Azure Functions ingestion process.
You can use Azure AI Studio to fine-tune and deploy foundation models.
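The text preprocessing steps mentioned in this list (cleaning, stop word removal, and stemming) can be sketched in plain Python as follows. This is a minimal illustration only: the stop word list is deliberately tiny, and `naive_stem` is a crude suffix stripper that stands in for a real stemmer or lemmatizer from a library like NLTK or spaCy. All function names are hypothetical.

```python
import re

# A deliberately tiny stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on", "for"}

# Suffixes are checked in order, and only the first match is removed.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def naive_stem(token: str) -> str:
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Clean the text, drop stop words, and stem the remaining tokens."""
    tokens = clean(text).split()
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]
```

A function like `preprocess` could be wrapped in an HTTP-triggered function so that the ingestion workflow calls it on extracted text before the text reaches a custom model.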
Scenario details
Document processing is a broad area. It can be difficult to meet all your document processing needs with the prebuilt models available in Document Intelligence and Azure AI Language. You might need to build custom models to automate document processing for different applications and domains.
Major challenges in model customization include:
- Labeling or tagging text data with relevant key-value pair entities to classify text for extraction.
- Deploying models securely at scale for easy integration with consuming applications.
Potential use cases
The following use cases can take advantage of custom models for document processing:
- Build custom NER and text classification models based on open-source frameworks.
- Extract custom key-values from documents for various industry verticals like insurance and healthcare.
- Tag and extract specific domain-dependent entities beyond the prebuilt NER models, for domains like security or finance.
- Create custom tables from documents.
- Extract signatures.
- Label and classify emails or other documents based on content.
Considerations
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.
For this example workload, implementing each pillar depends on optimally configuring and using each component Azure service.
Reliability
Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.
Availability
See the service-level agreements (SLAs) for each architecture component at Service Level Agreements (SLA) for Online Services.
For configuration options to design high-availability applications with Azure Storage accounts, see Use geo-redundancy to design highly available applications.
Resiliency
Handle failure modes of individual services like Azure Functions and Azure Storage to ensure resiliency of the compute services and data stores in this scenario. For more information, see Resiliency checklist for specific Azure services.
For Document Intelligence, back up and recover your Document Intelligence models.
For custom text classification with Azure AI Language, back up and recover your custom text classification models.
For custom NER in Azure AI Language, back up and recover your custom NER models.
Azure Machine Learning depends on constituent services like Blob Storage, compute services, and Azure Kubernetes Service (AKS). To provide resiliency for Azure Machine Learning, configure each of these services to be resilient. For more information, see Failover for business continuity and disaster recovery.
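One common way to handle transient failures when calling dependent services is to retry with exponential backoff and jitter. The following sketch shows that general technique and isn't taken from the resiliency guidance itself. `TransientError` is a hypothetical marker exception that you would map to timeouts or throttling responses from the service you call, and the attempt count and delays are arbitrary starting points.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as timeouts or throttling."""


def call_with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying on TransientError with exponential backoff.

    The delay doubles after each failed attempt, and random jitter of up to
    half the delay is added so that many clients don't retry in lockstep.
    The last failure is re-raised after `max_attempts` attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The Azure SDK client libraries generally ship with their own configurable retry policies, so a helper like this is mainly relevant for custom REST calls or for orchestration code that wraps several service calls.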
Security
Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.
- Implement data protection, identity and access management, and network security recommendations for Blob Storage, Azure AI services for Document Intelligence and Language Studio, and Azure Machine Learning.
Cost optimization
Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.
The total cost of implementing this solution depends on the pricing of the services you choose.
The major costs for this solution are:
- The compute cost involved in Machine Learning training and deployment of models. Choose the right node type, cluster size, and number of nodes to help optimize costs. For training, Machine Learning provides the options to set the minimum number of compute cluster nodes to zero and to set the idle time before the scale down. For more information, see Manage and optimize Machine Learning costs.
- Data orchestration duration and activities. For Azure Data Factory, the charges for copy activities on the Azure integration runtime are based on the number of data integration units (DIUs) used and the execution duration. Added orchestration activity runs are also charged, based on their number. Logic Apps pricing plans depend on the resources you create and use. To choose the right plan for a specific use case, see the Logic Apps pricing resource in the following list.
For more information on pricing for specific components, see the following resources:
- Azure AI Document Intelligence pricing
- Azure Functions pricing
- Logic Apps Pricing
- Azure Data Factory pricing
- Azure Blob Storage pricing
- Azure AI Language pricing
- Azure Machine Learning pricing
Use the Azure pricing calculator to add your selected component options and estimate the overall solution cost.
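The Data Factory charges described earlier in this section can be approximated with simple arithmetic: the copy charge scales with DIUs multiplied by execution duration, and the orchestration charge scales with the number of activity runs. The following sketch assumes a per-DIU-hour rate and a per-1,000-runs rate. Both rates in the example are placeholders, not real prices, and any billing rounding rules aren't modeled.

```python
def copy_activity_cost(dius: int, duration_hours: float, rate_per_diu_hour: float) -> float:
    """Estimate the copy charge: DIUs used x execution duration x rate."""
    return dius * duration_hours * rate_per_diu_hour


def orchestration_cost(activity_runs: int, rate_per_1000_runs: float) -> float:
    """Estimate the orchestration charge from the number of activity runs."""
    return activity_runs / 1000 * rate_per_1000_runs


def pipeline_cost(dius, duration_hours, activity_runs,
                  rate_per_diu_hour, rate_per_1000_runs):
    """Sum the copy and orchestration estimates for one pipeline."""
    return (
        copy_activity_cost(dius, duration_hours, rate_per_diu_hour)
        + orchestration_cost(activity_runs, rate_per_1000_runs)
    )


# Placeholder rates for illustration only. Look up current prices before use.
estimate = pipeline_cost(
    dius=4,
    duration_hours=0.5,
    activity_runs=2000,
    rate_per_diu_hour=0.25,
    rate_per_1000_runs=1.0,
)
```

An estimate like this is only a first pass. The Azure pricing calculator remains the better tool for region-specific rates and for the other components in the solution.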
Performance efficiency
Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.
Scalability
To scale Azure Functions automatically or manually, choose the right hosting plan.
Document Intelligence supports 15 concurrent requests per second by default. To request an increased quota, create an Azure support ticket.
For Azure Machine Learning custom models hosted as web services on AKS, the azureml-fe front end automatically scales as needed. This component also routes incoming inference requests to deployed services.
For deployments as managed endpoints, support autoscaling by integrating with the Azure Monitor autoscale feature.
The API service limits on custom NER and custom text classification for inferencing are 20 GET or POST requests per minute.
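To stay within request limits like the ones listed above, a client can throttle itself before calling the service. The following sketch is one possible approach, a sliding-window limiter, and not a prescribed Azure pattern. The `SlidingWindowLimiter` class is hypothetical, and the two instances at the end simply reuse the default figures quoted in this section.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # Timestamps of recent calls, oldest first.

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))


# Limits taken from the defaults quoted in this section.
document_intelligence_limiter = SlidingWindowLimiter(max_calls=15, period=1.0)
language_custom_model_limiter = SlidingWindowLimiter(max_calls=20, period=60.0)
```

Call `acquire()` immediately before each request. This limiter isn't thread-safe and only throttles a single process, so a multi-instance deployment would need a shared limiter or a queue in front of the service.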
Contributors
This article is maintained by Microsoft. It was originally written by the following contributor.
Principal author:
- Jyotsna Ravi | Sr. Customer Engineer
Next steps
- Get started: Document Intelligence Studio
- Use Document Intelligence models through SDKs or REST API
- Quickstart: Get started with Language Studio
- What is optical character recognition (OCR)?
- How to configure Azure Functions with a virtual network