This article describes Azure solutions for building, training, deploying, and using custom document processing models. These Azure services also offer user interface (UI) capabilities to do labeling or tagging for text processing.
Orchestrators like Azure Logic Apps, Azure Data Factory, or Azure Functions ingest messages and attachments from email servers, and files from FTP servers or web applications.
Azure Functions and Logic Apps enable serverless workloads. The service you choose depends on your preference for service capabilities like development, connectors, management, and execution context. For more information, see Compare Azure Functions and Azure Logic Apps.
Consider using Azure Data Factory for bulk data movement.
The orchestrators send ingested data to Azure Blob Storage or Data Lake Storage, organizing the data across data stores based on characteristics like file extensions or customers.
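One way to picture the organizing step is a small helper that derives a blob name from the customer and the file extension. This is a minimal sketch, not part of the reference architecture: the function name `blob_path` and the `<customer>/<extension>/<filename>` layout are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath


def blob_path(customer: str, filename: str) -> str:
    """Build a blob name of the form <customer>/<extension>/<filename>.

    The layout is a hypothetical convention; adapt it to your own data stores.
    """
    name = PurePosixPath(filename).name
    # Files without an extension are grouped under "unknown".
    extension = PurePosixPath(name).suffix.lower().lstrip(".") or "unknown"
    return f"{customer.strip().lower()}/{extension}/{name}"


if __name__ == "__main__":
    print(blob_path("Contoso", "Invoice-001.PDF"))
```

An orchestrator such as an Azure Functions handler could call a helper like this before it writes each ingested file to storage.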
Form Recognizer Studio, Language Studio, or Azure Machine Learning studio label and tag textual data and build the custom models. You can use these three services independently or in various combinations to address different use cases.
If the document requires extracting key-value pairs or creating a custom table from an image format or PDF, use Form Recognizer Studio to tag the data and train the custom model.
For document classification based on content, or for domain-specific entity extraction, you can train a custom text classification or Named Entity Recognition (NER) model in Language Studio.
Azure Machine Learning studio can also do labeling for text classification or entity extraction with open-source frameworks like PyTorch or TensorFlow.
To deploy the custom models and use them for inference:
- Form Recognizer has built-in model deployment. Use Form Recognizer SDKs or the REST API to apply custom models for inferencing. Include the model ID or custom model name in the Form Recognizer request URL, depending on the API version. Form Recognizer doesn't require any further deployment steps.
- Language Studio provides an option to deploy custom language models. Get the REST endpoint prediction URL by selecting the model to deploy. You can do model inferencing by using either the REST endpoint or the Azure SDK client libraries.
- Azure Machine Learning can deploy custom models to online or batch Azure Machine Learning managed endpoints. You can also deploy to Azure Kubernetes Service (AKS) as a web service by using the Azure Machine Learning SDK.
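The REST-based inference options above can be sketched as follows. The helper names are hypothetical, and the URL shapes and API versions (`2022-08-31` and `v2.1` for Form Recognizer, `2022-05-01` for the Language service) are assumptions based on the public REST references; verify them against the version you deploy. The code only builds the request URL and body, so it makes no network calls.

```python
from urllib.parse import quote


def form_recognizer_analyze_url(endpoint: str, model_id: str,
                                api_version: str = "2022-08-31") -> str:
    """Return the analyze URL for a custom model (assumed URL shapes)."""
    base = endpoint.rstrip("/")
    model = quote(model_id, safe="")
    if api_version.startswith("v2"):
        # Older API versions carry the version and model ID in the path.
        return f"{base}/formrecognizer/{api_version}/custom/models/{model}/analyze"
    # Newer API versions use a query parameter for the version.
    return (f"{base}/formrecognizer/documentModels/{model}:analyze"
            f"?api-version={api_version}")


def custom_ner_request_body(project_name: str, deployment_name: str,
                            texts: list[str]) -> dict:
    """Build the JSON body for a custom NER job (assumed schema)."""
    documents = [{"id": str(i), "language": "en", "text": text}
                 for i, text in enumerate(texts, start=1)]
    return {
        "displayName": "Extract custom entities",
        "analysisInput": {"documents": documents},
        "tasks": [{
            "kind": "CustomEntityRecognition",
            "taskName": "custom-ner",
            "parameters": {"projectName": project_name,
                           "deploymentName": deployment_name},
        }],
    }
```

You would POST the body to the prediction URL that Language Studio shows for the deployment, and pass the resource key in the `Ocp-Apim-Subscription-Key` header for either service.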
Logic Apps is part of Azure Integration Services. Logic Apps creates automated workflows that integrate apps, data, services, and systems. With managed connectors for services like Azure Storage and Office 365, you can trigger workflows when a file lands in the storage account or email is received.
Data Factory is a managed cloud extract, transform, load (ETL) service for data integration and transformation. Data Factory can add transformation activities to a pipeline that include invoking a REST endpoint or running a notebook on the ingested data.
Azure Functions is a serverless compute service that can host event-driven workloads with short-lived processes.
Blob Storage is the object storage solution for raw files in this scenario. Blob Storage supports libraries for multiple languages, such as .NET, Node.js, and Python. Applications can access files on Blob Storage via HTTP/HTTPS. Blob Storage has hot, cool, and archive access tiers to support cost optimization for storing large amounts of data.
Data Lake Storage is a set of capabilities built on Azure Blob Storage for big data analytics. Data Lake Storage retains the cost effectiveness of Blob Storage, and provides features like file-level security and file system semantics with hierarchical namespace.
Form Recognizer, part of Azure Applied AI Services, has built-in document analysis capabilities that extract printed and handwritten text, tables, and key-value pairs. Form Recognizer has prebuilt models for extracting data from invoices, documents, receipts, ID cards, and business cards. Form Recognizer can also train and deploy custom models by using either a custom template form model or a custom neural document model.
Form Recognizer Studio provides a UI for exploring Form Recognizer features and models, and for building, tagging, training, and deploying custom models.
Azure Cognitive Service for Language consolidates the Azure natural language processing services. The suite offers prebuilt and customizable options. For more information, see the Cognitive Service for Language available features.
Language Studio provides a UI for exploring and analyzing Azure Cognitive Service for Language features. Language Studio also provides options for building, tagging, training, and deploying custom models.
Azure Machine Learning is an open platform for managing machine learning model development and deployment at scale.
- Azure Machine Learning studio provides data labeling options for images and text.
- Export labeled data as COCO or Azure Machine Learning datasets. You can use the datasets for training and deploying models in Azure Machine Learning notebooks.
- Deploy models to AKS as a web service for real-time inferencing at scale, or as managed endpoints for both real-time and batch inferencing.
You can add more workflows to this scenario based on specific use cases.
If the document is in image or PDF format, you can extract the data by using Azure Computer Vision, Form Recognizer Read API, or open-source libraries.
You can do document and conversation summarization by using the prebuilt model in Azure Cognitive Service for Language.
Use pre-processing code to do text processing steps like cleaning, stop words removal, lemmatization, stemming, and text summarization on extracted data, per document processing requirements. You can expose the code as REST APIs for automation. Do these steps manually or automate them by integrating with the Logic Apps or Azure Functions ingestion process.
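As an illustration of such pre-processing code, the following sketch chains cleaning, stop word removal, and a deliberately crude suffix-stripping stemmer. The stop word list and the `naive_stem` rules are assumptions for demonstration only; a production pipeline would more likely rely on an established NLP library.

```python
import re

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"})


def naive_stem(token: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, drop punctuation, remove stop words, then stem each token."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [naive_stem(token) for token in cleaned.split()
            if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Signed the forms."))
```

Wrapping `preprocess` in an HTTP-triggered function is one way to expose it as a REST API for the ingestion workflow.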
Document processing is a broad area. It can be difficult to meet all your document processing needs with the prebuilt models available in Azure Form Recognizer and Azure Cognitive Service for Language. You might need to build custom models to automate document processing for different applications and domains.
Major challenges in model customization include:
- Labeling or tagging text data with relevant key-value pair entities to classify text for extraction.
- Deploying models securely at scale for easy integration with consuming applications.
Potential use cases
The following use cases can take advantage of custom models for document processing:
- Build custom NER and text classification models based on open-source frameworks.
- Extract custom key-values from documents for various industry verticals like insurance and healthcare.
- Tag and extract specific domain-dependent entities beyond the prebuilt NER models, for domains like security or finance.
- Create custom tables from documents.
- Extract signatures.
- Label and classify emails or other documents based on content.
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.
For this example workload, implementing each pillar depends on optimally configuring and using each component Azure service.
Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.
See the availability service level agreements (SLAs) for each component Azure service:
- Azure Form Recognizer - SLA for Azure Applied AI Services.
- Azure Cognitive Service for Language - SLA for Azure Cognitive Services.
- Azure Functions - SLA for Azure Functions.
- Azure Kubernetes Service - SLA for Azure Kubernetes Service (AKS).
- Azure Storage - SLA for Storage Accounts.
For configuration options to design high availability applications with Azure storage accounts, see Use geo-redundancy to design highly available applications.
Handle failure modes of individual services like Azure Functions and Azure Storage to ensure resiliency of the compute services and data stores in this scenario. For more information, see Resiliency checklist for specific Azure services.
For Form Recognizer, back up and recover your Form Recognizer models.
For custom text classification with Azure Cognitive Service for Language, back up and recover your custom text classification models.
For custom NER in Azure Cognitive Service for Language, back up and recover your custom NER models.
Azure Machine Learning depends on constituent services like Blob Storage, compute services, and AKS. To provide resiliency for Azure Machine Learning, configure each of these services to be resilient. For more information, see Failover for business continuity and disaster recovery.
Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.
Implement data protection, identity and access management, and network security recommendations for Blob Storage, Cognitive Services for Form Recognizer and Language Studio, and Azure Machine Learning.
Azure Functions can access resources in a virtual network through virtual network integration.
Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.
The total cost of implementing this solution depends on the pricing of the services you choose.
The major costs for this solution are:
- The compute cost involved in Azure Machine Learning training. Choose the right node type, cluster size, and number of nodes to help optimize costs. Azure Machine Learning provides options to set the minimum nodes to zero and to set the idle time before the scale down. For more information, see Manage and optimize Azure Machine Learning costs.
- Data orchestration duration and activities. For Azure Data Factory, the charges for copy activities on the Azure integration runtime are based on the number of Data Integration Units (DIUs) used and the execution duration. Added orchestration activity runs are also charged, based on their number.
- Logic Apps pricing plans, which depend on the resources you create and use. Choose the plan that fits your specific use cases.
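The Data Factory charge described in the list above can be approximated with simple arithmetic. This is a rough sketch under stated assumptions: the function name is hypothetical, all prices are inputs that you supply from the current pricing page, and the formula ignores any rounding or minimum-duration rules that the service applies.

```python
def estimate_copy_cost(dius: float, duration_hours: float,
                       price_per_diu_hour: float,
                       activity_runs: int = 0,
                       price_per_activity_run: float = 0.0) -> float:
    """Estimate a copy cost as DIU-hours plus orchestration activity runs.

    All prices are caller-supplied placeholders, not published rates.
    """
    copy_cost = dius * duration_hours * price_per_diu_hour
    orchestration_cost = activity_runs * price_per_activity_run
    return copy_cost + orchestration_cost


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(estimate_copy_cost(dius=4, duration_hours=0.5, price_per_diu_hour=0.25,
                             activity_runs=10, price_per_activity_run=0.001))
```

For a complete estimate that covers every component, the Azure pricing calculator remains the better tool.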
For more information on pricing for specific components, see the following resources:
- Azure Form Recognizer pricing
- Azure Functions pricing
- Logic Apps Pricing
- Azure Data Factory pricing
- Azure Blob Storage pricing
- Language Service pricing
- Azure Machine Learning pricing
Use the Azure pricing calculator to add your selected component options and estimate the overall solution cost.
Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.
To scale Azure Functions automatically or manually, choose the right hosting plan.
Form Recognizer supports 15 concurrent requests per second by default. To request an increased quota, create an Azure support ticket.
For Azure Machine Learning custom models hosted as web services on AKS, the azureml-fe front end automatically scales as needed. This component also routes incoming inference requests to deployed services.
For deployments as managed endpoints, support autoscaling by integrating with the Azure Monitor autoscale feature.
The API service limits on custom NER and custom text classification for inferencing are 20 GET or POST requests per minute.
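Client applications can stay under the request limits mentioned in this section by throttling their own calls. The following is a minimal sliding-window limiter sketch. The class name and the injectable `clock` and `sleep` parameters are illustrative assumptions, and it treats each limit as a maximum number of calls per period.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period_s seconds."""

    def __init__(self, max_calls: int, period_s: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period_s = period_s
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of calls inside the window

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period_s:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded call expires.
            self._sleep(self.period_s - (now - self._stamps[0]))


# Limits taken from the text above: 15 per second and 20 per minute.
form_recognizer_limiter = SlidingWindowLimiter(max_calls=15, period_s=1.0)
language_limiter = SlidingWindowLimiter(max_calls=20, period_s=60.0)
```

Call `acquire()` immediately before each request. A single-process limiter like this one doesn't coordinate across multiple instances, so scaled-out callers also need retry handling for throttled responses.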
This article is maintained by Microsoft. It was originally written by the following contributor.
- Jyotsna Ravi | Sr. Customer Engineer
- Get started: Form Recognizer Studio
- Use Form Recognizer SDKs or REST API
- Quickstart: Get started with Language Studio
- What is optical character recognition (OCR)?
- How to configure Azure Functions with a virtual network