Custom document processing models on Azure

Form Recognizer
Language Understanding
Logic Apps
Machine Learning Studio
Storage

This article describes Azure solutions for building, training, deploying, and using custom document processing models. These Azure services also offer user interface (UI) capabilities to do labeling or tagging for text processing.

Architecture

Diagram showing several alternatives for a custom document processing model build and deployment process.

Download a Visio file of this architecture.

Dataflow

  1. Orchestrators like Azure Logic Apps, Azure Data Factory, or Azure Functions ingest messages and attachments from email servers, and files from FTP servers or web applications.

    • Azure Functions and Logic Apps enable serverless workloads. The service you choose depends on your preference for service capabilities like development, connectors, management, and execution context. For more information, see Compare Azure Functions and Azure Logic Apps.

    • Consider using Azure Data Factory for bulk data movement.

  2. The orchestrators send ingested data to Azure Blob Storage or Data Lake Storage, organizing the data across data stores based on characteristics like file extensions or customers.

  3. Form Recognizer Studio, Language Studio, or Azure Machine Learning studio label and tag textual data and build the custom models. You can use these three services independently or in various combinations to address different use cases.

    • If the document requires extracting key-value pairs or creating a custom table from an image format or PDF, use Form Recognizer Studio to tag the data and train the custom model.

    • For document classification based on content, or for domain-specific entity extraction, you can train a custom text classification or Named Entity Recognition (NER) model in Language Studio.

    • Azure Machine Learning studio can also do labeling for text classification or entity extraction with open-source frameworks like PyTorch or TensorFlow.

  4. To deploy the custom models and use them for inference:

Components

  • Logic Apps is part of Azure Integration Services. Logic Apps creates automated workflows that integrate apps, data, services, and systems. With managed connectors for services like Azure Storage and Office 365, you can trigger workflows when a file lands in the storage account or email is received.

  • Data Factory is a managed cloud extract, transform, load (ETL) service for data integration and transformation. Data Factory can add transformation activities to a pipeline that include invoking a REST endpoint or running a notebook on the ingested data.

  • Azure Functions is a serverless compute service that can host event-driven workloads with short-lived processes.

  • Blob Storage is the object storage solution for raw files in this scenario. Blob Storage supports libraries for multiple languages, such as .NET, Node.js, and Python. Applications can access files on Blob Storage via HTTP/HTTPS. Blob Storage has hot, cool, and archive access tiers to support cost optimization for storing large amounts of data.

  • Data Lake Storage is a set of capabilities built on Azure Blob Storage for big data analytics. Data Lake Storage retains the cost effectiveness of Blob Storage, and provides features like file-level security and file system semantics with hierarchical namespace.

  • Form Recognizer, part of Azure Applied AI Services, has in-built document analysis capabilities to extract printed and handwritten text, tables, and key-value pairs. Form Recognizer has prebuilt models for extracting data from invoices, documents, receipts, ID cards, and business cards. Form Recognizer can also train and deploy custom models by using either a custom template form model or a custom neural document model.

    Form Recognizer Studio provides a UI for exploring Form Recognizer features and models, and for building, tagging, training, and deploying custom models.

  • Azure Cognitive Service for Language consolidates the Azure natural language processing services. The suite offers prebuilt and customizable options. For more information, see the Cognitive Service for Language available features.

    Language Studio provides a UI for exploring and analyzing Azure Cognitive Service for Language features. Language Studio also provides options for building, tagging, training, and deploying custom models.

  • Azure Machine Learning is an open platform for managing machine learning model development and deployment at scale.

    • Azure Machine Learning studio provides data labeling options for images and text.
    • Export labeled data as COCO or Azure Machine Learning datasets. You can use the datasets for training and deploying models in Azure Machine Learning notebooks.
    • Deploy models to AKS as a web service for real-time inferencing at scale, or as managed endpoints for both real-time and batch inferencing.

Alternatives

You can add more workflows to this scenario based on specific use cases.

  • If the document is in image or PDF format, you can extract the data by using Azure Computer Vision, Form Recognizer Read API, or open-source libraries.

  • You can do document and conversation summarization by using the prebuilt model in Azure Cognitive Service for Language.

  • Use pre-processing code to do text processing steps like cleaning, stop words removal, lemmatization, stemming, and text summarization on extracted data, per document processing requirements. You can expose the code as REST APIs for automation. Do these steps manually or automate them by integrating with the Logic Apps or Azure Functions ingestion process.

Scenario details

Document processing is a broad area. It can be difficult to meet all your document processing needs with the prebuilt models available in Azure Form Recognizer and Azure Cognitive Service for Language. You might need to build custom models to automate document processing for different applications and domains.

Major challenges in model customization include:

  • Labeling or tagging text data with relevant key-value pair entities to classify text for extraction.
  • Deploying models securely at scale for easy integration with consuming applications.

Potential use cases

The following use cases can take advantage of custom models for document processing:

  • Build custom NER and text classification models based on open-source frameworks.
  • Extract custom key-values from documents for various industry verticals like insurance and healthcare.
  • Tag and extract specific domain-dependent entities beyond the prebuilt NER models, for domains like security or finance.
  • Create custom tables from documents.
  • Extract signatures.
  • Label and classify emails or other documents based on content.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

For this example workload, implementing each pillar depends on optimally configuring and using each component Azure service.

Reliability

Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.

Availability

Resiliency

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.

The total cost of implementing this solution depends on the pricing of the services you choose.

The major costs for this solution are:

For more information on pricing for specific components, see the following resources:

Use the Azure pricing calculator to add your selected component options and estimate the overall solution cost.

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.

Scalability

Contributors

This article is maintained by Microsoft. It was originally written by the following contributor.

Principal author:

To see non-public LinkedIn profiles, sign in to LinkedIn.

Next steps