Custom document processing models on Azure

Azure AI Document Intelligence
Azure AI services
Azure Logic Apps
Azure Machine Learning Studio
Azure Storage

This article describes Azure solutions for building, training, deploying, and using custom document processing models. These Azure services also provide user interface (UI) capabilities for labeling or tagging text as part of text processing.

Architecture

Architecture diagram showing several alternatives for a custom document processing model build and deployment process.

Download a Visio file of this architecture.

Dataflow

  1. Orchestrators like Azure Logic Apps, Azure Data Factory, or Azure Functions ingest messages and attachments from email servers, and files from FTP servers or web applications.

    • Azure Functions and Logic Apps enable serverless workloads. The service you choose depends on your preference for service capabilities like development, connectors, management, and execution context. For more information, see Compare Azure Functions and Azure Logic Apps.

    • Consider using Azure Data Factory for bulk data movement.

  2. The orchestrators send ingested data to Azure Blob Storage or Data Lake Storage, organizing the data across data stores based on characteristics like file extensions or customers.

  3. Use Document Intelligence Studio, Language Studio, or Azure Machine Learning studio to label and tag textual data and build the custom models. You can use these three services independently or in various combinations to address different use cases.

    • If the document requires you to extract key-value pairs or create a custom table from an image format or PDF, use Document Intelligence Studio to tag the data and train the custom model. Similarly, if you need to identify the type of document before invoking the right extraction model, use Document Intelligence Studio to label the documents.

    • For document classification based on content, or for domain-specific entity extraction, you can train a custom text classification or Named Entity Recognition (NER) model in Language Studio.

    • Azure Machine Learning studio has data labeling capabilities for text classification or entity extraction that you can use with open-source frameworks like PyTorch or TensorFlow. Azure Machine Learning studio also provides a model catalog of foundation models. These foundation models have fine-tuning capabilities for various tasks like text classification, question answering, and summarization. To fine-tune foundation models, use the Azure Machine Learning studio UI or code.

  4. To deploy the custom models and use them for inference:
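
Step 2 of the dataflow organizes ingested files across data stores by characteristics like file extension or customer. The following is a minimal sketch of that routing logic only, not an implementation from this architecture. The container names, the extension mapping, and the `blob_destination` helper are all hypothetical, and the sketch doesn't call any Azure API. An orchestrator such as Azure Functions could use a function like this to decide where to write each file.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a storage container name.
CONTAINER_BY_EXTENSION = {
    ".pdf": "pdf-documents",
    ".png": "images",
    ".jpg": "images",
    ".jpeg": "images",
    ".tiff": "images",
    ".eml": "emails",
    ".msg": "emails",
}
DEFAULT_CONTAINER = "unsorted"  # hypothetical fallback for unknown extensions


def blob_destination(filename: str, customer: str) -> tuple[str, str]:
    """Return (container, blob_name) for an ingested file."""
    # Drop any directory part of the source path and keep the file name.
    name = PurePosixPath(filename).name
    extension = PurePosixPath(name).suffix.lower()
    container = CONTAINER_BY_EXTENSION.get(extension, DEFAULT_CONTAINER)
    # Use a normalized customer name as a virtual folder inside the container.
    customer_folder = customer.strip().lower().replace(" ", "-")
    return container, f"{customer_folder}/{name}"


print(blob_destination("inbox/Claim-001.PDF", "Contoso Insurance"))
```

Keeping the routing rule in one small function makes it straightforward to change the layout later, for example to partition by date instead of by customer.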

Components

  • Logic Apps is part of Azure Integration Services. Logic Apps creates automated workflows that integrate apps, data, services, and systems. With managed connectors for services like Azure Storage and Microsoft 365, you can trigger workflows when a file lands in the storage account or an email is received.

  • Data Factory is a managed cloud extract, transform, load (ETL) service for data integration and transformation. Data Factory can add transformation activities to a pipeline that include invoking a REST endpoint or running a notebook on the ingested data.

  • Azure Functions is a serverless compute service that can host event-driven workloads with short-lived processes.

  • Blob Storage is the object storage solution for raw files in this scenario. Blob Storage supports libraries for multiple languages, such as .NET, Node.js, and Python. Applications can access files on Blob Storage via HTTP/HTTPS. Blob Storage has hot, cool, and archive access tiers to support cost optimization for storing large amounts of data.

  • Data Lake Storage is a set of capabilities built on Azure Blob Storage for big data analytics. Data Lake Storage retains the cost effectiveness of Blob Storage, and provides features like file-level security and file system semantics with hierarchical namespace.

  • Document Intelligence is part of Azure AI services. Document Intelligence has built-in document analysis capabilities that you can use to extract printed and handwritten text, tables, and key-value pairs. Document Intelligence has prebuilt models for extracting data from invoices, documents, receipts, ID cards, and business cards. Document Intelligence also has a custom template form model and a custom neural document model that you can use to train and deploy custom models.

  • Document Intelligence Studio provides a UI that you can use to explore Document Intelligence features and models, and to build, tag, train, and deploy custom models.

  • Azure AI Language consolidates the Azure natural language processing services. The suite offers prebuilt and customizable options. For more information, see the Azure AI Language available features.

    Language Studio provides a UI for exploring and analyzing Azure AI Language features. Language Studio also provides options for building, tagging, training, and deploying custom models.

  • Azure Machine Learning is an open platform for managing machine learning model development and deployment at scale.

    • Azure Machine Learning studio provides data labeling options for images and text.
    • Export labeled data as COCO or Azure Machine Learning datasets. You can use the datasets for training and deploying models in Azure Machine Learning notebooks.
    • Deploy models to Azure Kubernetes Service (AKS) as a web service for real-time inferencing at scale, or as managed endpoints for both real-time and batch inferencing.
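
As an illustration of how a consuming application might invoke a custom model trained in Document Intelligence Studio, the following sketch builds the HTTP request for a document analysis call but doesn't send it. It assumes the Document Intelligence REST API version `2023-07-31` and key-based authentication; check the API version that your resource supports. The endpoint, model ID, key, and document URL are placeholders, and `build_analyze_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: the 2023-07-31 REST API version. Adjust to match your resource.
API_VERSION = "2023-07-31"


def build_analyze_request(
    endpoint: str, model_id: str, key: str, document_url: str
) -> urllib.request.Request:
    """Build, but don't send, a POST request that starts document analysis."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{model_id}:analyze?api-version={API_VERSION}"
    )
    # The request body points the service at a document it can download.
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request(
    endpoint="https://example-resource.cognitiveservices.azure.com",  # placeholder
    model_id="my-custom-model",  # hypothetical custom model ID
    key="<your-key>",  # placeholder
    document_url="https://example.com/sample.pdf",  # placeholder
)
print(request.get_method(), request.full_url)
```

Under the same assumptions, sending this request with `urllib.request.urlopen` starts an asynchronous operation, and the response includes an `Operation-Location` header that the caller polls for the extracted fields. In production, consider the Azure SDK and Microsoft Entra ID authentication instead of a raw key.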

Alternatives

You can add more workflows to this scenario based on specific use cases.

  • If the document is in image or PDF format, you can extract the data by using Azure computer vision, Document Intelligence Read API, or open-source libraries.

  • You can do document and conversation summarization by using the prebuilt model in Azure AI Language.

  • Use preprocessing code to perform text processing steps on extracted data, according to your document processing requirements. These steps include cleaning, stop word removal, lemmatization, stemming, and text summarization. You can expose the code as REST APIs for automation. You can run these steps manually or automate them by integrating with the Logic Apps or Azure Functions ingestion process.

  • You can use Azure AI Studio to fine-tune and deploy foundation models.
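
The preprocessing steps mentioned in the list above (cleaning, stop word removal, and stemming) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only. The stop word list is deliberately tiny, and `naive_stem` is a hypothetical suffix stripper, not the Porter algorithm or any library's stemmer. A production pipeline would more likely rely on an NLP library.

```python
import re

# Small illustrative stop word list; real pipelines typically use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was", "were"}

# Naive suffix rules, checked in order. This is not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token: str) -> str:
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Clean the text, drop stop words, and stem the remaining tokens."""
    # Cleaning: lowercase and replace anything that isn't a letter, digit, or space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = cleaned.split()
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("The claims were processed."))  # ['claim', 'process']
```

A function like `preprocess` could be wrapped in an HTTP-triggered function so that the ingestion workflow calls it as a REST API.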

Scenario details

Document processing is a broad area. It can be difficult to meet all your document processing needs with the prebuilt models available in Document Intelligence and Azure AI Language. You might need to build custom models to automate document processing for different applications and domains.

Major challenges in model customization include:

  • Labeling or tagging text data with relevant key-value pair entities to classify text for extraction.
  • Deploying models securely at scale for easy integration with consuming applications.

Potential use cases

The following use cases can take advantage of custom models for document processing:

  • Build custom NER and text classification models based on open-source frameworks.
  • Extract custom key-values from documents for various industry verticals like insurance and healthcare.
  • Tag and extract specific domain-dependent entities beyond the prebuilt NER models, for domains like security or finance.
  • Create custom tables from documents.
  • Extract signatures.
  • Label and classify emails or other documents based on content.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

For this example workload, implementing each pillar depends on optimally configuring and using each component Azure service.

Reliability

Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.

Availability

Resiliency

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.

The total cost of implementing this solution depends on the pricing of the services you choose.

The major costs for this solution are:

For more information on pricing for specific components, see the following resources:

Use the Azure pricing calculator to add your selected component options and estimate the overall solution cost.

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.

Scalability

Contributors

This article is maintained by Microsoft. It was originally written by the following contributor.

Principal author:


Next steps