Automate document processing by using Azure Form Recognizer

This article outlines a scalable and secure solution for building an automated document processing pipeline. The solution uses Azure Form Recognizer for the structured extraction of data. Natural language processing (NLP) models and custom models enrich the data.

Architecture

Architecture diagram that shows how data flows through the extraction, enrichment, and analytics stages of document processing.

Dataflow

The following sections describe the various stages of the data extraction process.

Data ingestion and extraction

  1. Documents are ingested through a browser at the front end of a web application. The documents are image files or PDF files. Azure App Service hosts a back-end application, and the solution routes the documents to that application through Azure Application Gateway. This load balancer runs with Azure Web Application Firewall, which helps protect the application from common attacks and vulnerabilities.

  2. The back-end application posts a request to a Form Recognizer REST API endpoint that uses one of these models:

    The response from Form Recognizer contains raw OCR data and structured extractions. Form Recognizer also assigns confidence values to the extracted data.

  3. The App Service back-end application uses the confidence values to check the extraction quality. If the quality is below a specified threshold, the app flags the data for manual verification. When the extraction quality meets requirements, the data enters Azure Cosmos DB for downstream application consumption. The app can also return the results to the front-end browser.

  4. Other sources provide images, PDF files, and other documents. Sources include email attachments and File Transfer Protocol (FTP) servers. Tools like Azure Data Factory and AzCopy transfer these files to Azure Blob Storage. Azure Logic Apps offers pipelines for automatically extracting attachments from emails.

  5. When a document enters Blob Storage, an Azure function is triggered. The function:

    • Posts a request to the relevant Form Recognizer pre-built endpoint.
    • Receives the response.
    • Evaluates the extraction quality.
  6. The extracted data enters Azure Cosmos DB.
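
The confidence check described in steps 3 and 5 can be sketched as follows. This is a minimal illustration, not the solution's actual code: the simplified response shape (a `fields` dictionary whose entries carry `content` and `confidence`), the function name `route_extraction`, and the 0.8 threshold are all assumptions chosen for the example.

```python
from typing import Any, Dict, List

# Hypothetical threshold; tune it per document type and field.
DEFAULT_THRESHOLD = 0.8


def route_extraction(result: Dict[str, Any],
                     threshold: float = DEFAULT_THRESHOLD) -> Dict[str, Any]:
    """Decide whether an extraction can be stored or needs manual review.

    `result` is a simplified stand-in for a Form Recognizer response:
    {"fields": {"<name>": {"content": "<text>", "confidence": <0.0-1.0>}}}
    """
    fields = result.get("fields", {})
    # Collect every field whose confidence falls below the threshold.
    # A missing confidence is treated as 0.0, so the field is always flagged.
    low_confidence: List[str] = sorted(
        name for name, field in fields.items()
        if field.get("confidence", 0.0) < threshold
    )
    # An empty extraction is also sent to manual review.
    accepted = bool(fields) and not low_confidence
    return {
        "status": "accepted" if accepted else "manual_review",
        "low_confidence_fields": low_confidence,
        "values": {name: field.get("content") for name, field in fields.items()},
    }


sample = {
    "fields": {
        "VendorName": {"content": "Contoso", "confidence": 0.97},
        "InvoiceTotal": {"content": "110.00", "confidence": 0.62},
    }
}
decision = route_extraction(sample)
print(decision["status"], decision["low_confidence_fields"])
```

In this sketch, a record marked `accepted` would be written to Azure Cosmos DB, and a record marked `manual_review` would be queued for a human to verify. A single global threshold keeps the example short. In practice, a per-field threshold might suit documents where some fields matter more than others.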

Data enrichment

The pipeline that's used for data enrichment depends on the use case.

  1. Data enrichment can include the following NLP capabilities:

    • Named entity recognition (NER)
    • The extraction of personally identifiable information (PII), key phrases, health information, and other domain-dependent entities

    To enrich the data, the web app:

  2. Custom models perform fraud detection, risk analysis, and other types of analysis on the data:

    • Azure Machine Learning services train and deploy the custom models.
    • The extracted data is retrieved from Azure Cosmos DB.
    • The models derive insights from the data.

    These possibilities exist for inferencing:

  3. The enriched data enters Azure Cosmos DB.
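
The enrichment flow above (retrieve the extracted record, derive insights, store the enriched result) can be sketched as shown here. The sketch is illustrative only: the `enrich_record` helper, the record layout, and the two toy enrichers are hypothetical stand-ins. A real pipeline would call Azure Cognitive Service for Language or a deployed Machine Learning model in place of the regular expression and the word counter.

```python
import re
from typing import Any, Callable, Dict, List

# An enricher takes the document text and returns a JSON-serializable result.
Enricher = Callable[[str], Any]

# Deliberately simple pattern; it is not a complete email matcher.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def find_emails(text: str) -> List[str]:
    """Toy PII detector: a stand-in for a real PII extraction service."""
    return EMAIL_PATTERN.findall(text)


def word_count(text: str) -> int:
    """Trivial derived insight: a stand-in for a custom model."""
    return len(text.split())


def enrich_record(record: Dict[str, Any],
                  enrichers: Dict[str, Enricher]) -> Dict[str, Any]:
    """Return a copy of `record` with an added `enrichment` section."""
    text = record.get("content", "")
    enriched = dict(record)  # shallow copy; the input record is not mutated
    enriched["enrichment"] = {name: fn(text) for name, fn in enrichers.items()}
    return enriched


record = {"id": "doc-001", "content": "Contact jane@contoso.com about invoice 42."}
enriched = enrich_record(record, {"emails": find_emails, "word_count": word_count})
print(enriched["enrichment"])
```

Keeping the enrichment output under its own key leaves the original extraction untouched, so downstream consumers can tell what Form Recognizer extracted from what later stages derived.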

Analytics and visualizations

  1. Applications use the raw OCR data, the structured data from Form Recognizer endpoints, and the enriched data from NLP:

    • Power BI displays the data and presents reports on it.
    • The data functions as a source for Azure Cognitive Search.
    • Other applications consume the data.

Components

  • App Service is a platform as a service (PaaS) offering on Azure. You can use App Service to host web applications that you can scale in or scale out manually or automatically. The service supports various languages and frameworks, such as ASP.NET, ASP.NET Core, Java, Ruby, Node.js, PHP, and Python.

  • Application Gateway is a layer-7 (application layer) load balancer that manages traffic to web applications. You can run Application Gateway with Azure Web Application Firewall to help protect web applications from common exploits and vulnerabilities.

  • Azure Functions is a serverless compute platform that you can use to build applications. With Functions, you can use triggers and bindings to react to changes in Azure services like Blob Storage and Azure Cosmos DB. Functions can run scheduled tasks, process data in real time, and process messaging queues.

  • Form Recognizer is part of Azure Applied AI Services. Form Recognizer offers a collection of pre-built endpoints for extracting data from invoices, documents, receipts, ID cards, and business cards. This service maps each piece of extracted data to a field as a key-value pair. Form Recognizer also extracts table content and structure. The output format is JSON.

  • Azure Storage is a cloud storage solution that includes object, blob, file, disk, queue, and table storage.

  • Blob Storage is a service that's part of Azure Storage. Blob Storage offers optimized cloud object storage for large amounts of unstructured data.

  • Azure Data Lake Storage is a scalable, secure data lake for high-performance analytics workloads. The data typically comes from multiple heterogeneous sources and can be structured, semi-structured, or unstructured. Azure Data Lake Storage Gen2 combines Azure Data Lake Storage Gen1 capabilities with Blob Storage. As a next-generation solution, Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. It also offers the tiered storage, high availability, and disaster recovery capabilities of Blob Storage.

  • Azure Cosmos DB is a fully managed, highly responsive, scalable NoSQL database. Azure Cosmos DB offers enterprise-grade security and supports APIs for many databases, languages, and platforms. Examples include SQL, MongoDB, Gremlin, Table, and Apache Cassandra. Serverless, automatic scaling options in Azure Cosmos DB efficiently manage capacity demands of applications.

  • Azure Cognitive Service for Language offers many NLP services that you can use to understand and analyze text. Some of these services are customizable, such as custom NER, custom text classification, conversational language understanding, and question answering.

  • Machine Learning is an open platform for managing the development and deployment of machine-learning models at scale. Machine Learning caters to the skill levels of different users, such as data scientists and business analysts. The platform supports commonly used open frameworks and offers automated featurization and algorithm selection. You can deploy models to various targets. Examples include Azure Kubernetes Service (AKS), Azure Container Instances as a web service for real-time inferencing at scale, and Azure Virtual Machines for batch scoring. Managed endpoints in Machine Learning abstract the required infrastructure for real-time or batch model inferencing.

  • AKS is a fully managed Kubernetes service that makes it easy to deploy and manage containerized applications. AKS offers serverless Kubernetes technology, an integrated continuous integration and continuous delivery (CI/CD) experience, and enterprise-grade security and governance.

  • Power BI is a collection of software services and apps that display analytics information.

  • Azure Cognitive Search is a cloud search service that supplies infrastructure, APIs, and tools for searching. You can use Azure Cognitive Search to build search experiences over private, heterogeneous content in web, mobile, and enterprise applications.

Alternatives

Scenario details

Automating document processing and data extraction is an integral task in organizations across all industry verticals. AI is a proven approach to this task, although 100 percent accuracy remains out of reach. Even so, using AI for digitization instead of purely manual processes can reduce manual effort by up to 90 percent.

Optical character recognition (OCR) can extract content from images and PDF files, which make up most of the documents that organizations use. This process relies on keyword search and regular expression matching to extract relevant data from the full text and create structured output. The approach has a drawback: whenever document formats change, the post-extraction process must be revised, which requires extensive maintenance effort.

Potential use cases

This solution is ideal for the finance industry. It can also apply to the automotive, travel, and hospitality industries. The following tasks can benefit from this solution:

  • Approving expense reports
  • Processing invoices, receipts, and bills for insurance claims and financial audits
  • Processing claims that include invoices, discharge summaries, and other documents
  • Automating statement of work (SoW) approvals
  • Automating ID extraction for verification purposes, as with passports or driver's licenses
  • Automating the process of entering business card data into visitor management systems
  • Identifying purchase patterns and duplicate financial documents for fraud detection

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

Keep these points in mind when you use this solution.

Availability

The availability of the architecture depends on the Azure services that make up the solution:

Scalability

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

  • Azure Web Application Firewall helps protect your application from common vulnerabilities. This Application Gateway option uses Open Web Application Security Project (OWASP) rules to prevent attacks like cross-site scripting, session hijacks, and other exploits.

  • To improve App Service security, consider these options:

    • App Service can access resources in Azure Virtual Network through virtual network integration.
    • You can use App Service in an App Service Environment (ASE), which you deploy to a dedicated virtual network. This approach helps to isolate the connectivity between App Service and other resources in the virtual network.

    For more information, see Security in Azure App Service.

  • Blob Storage and Azure Cosmos DB encrypt data at rest. You can secure these services by using service endpoints or private endpoints.

  • Azure Functions supports virtual network integration. By using this functionality, function apps can access resources inside a virtual network. For more information, see Azure Functions networking options.

  • You can configure Form Recognizer and Azure Cognitive Service for Language for access from specific virtual networks or from private endpoints. These services encrypt data at rest. You can use subscription keys, tokens, or Azure Active Directory (Azure AD) to authenticate requests to these services. For more information, see Authenticate requests to Azure Cognitive Services.

  • Machine Learning offers many levels of security:

Resiliency

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.

The cost of implementing this solution depends on which components you use and which options you choose for each component.

Many factors can affect the price of each component:

  • The number of documents that you process
  • The number of concurrent requests that your application receives
  • The size of the data that you store after processing
  • Your deployment region

These resources provide information on component pricing options:

After deciding on a pricing tier for each component, use the Azure Pricing calculator to estimate the solution cost.
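
Before you open the calculator, a rough estimate can show how document volume and stored data size interact. The sketch below is an illustration built on assumptions: the function name, the cost model (a per-page extraction charge plus a per-gigabyte storage charge), and every rate in the usage example are placeholders, not actual Azure prices. Look up the current rates for your region and pricing tier.

```python
def estimate_monthly_cost(documents_per_month: int,
                          pages_per_document: float,
                          price_per_1000_pages: float,
                          stored_gb: float,
                          price_per_gb_month: float) -> float:
    """Rough monthly estimate: extraction cost plus storage cost."""
    pages = documents_per_month * pages_per_document
    extraction_cost = pages / 1000 * price_per_1000_pages
    storage_cost = stored_gb * price_per_gb_month
    return round(extraction_cost + storage_cost, 2)


# Placeholder inputs only; none of these rates are real prices.
estimate = estimate_monthly_cost(
    documents_per_month=20_000,
    pages_per_document=3,
    price_per_1000_pages=10.0,
    stored_gb=50,
    price_per_gb_month=0.25,
)
print(estimate)
```

The model deliberately leaves out compute for App Service and Functions, database throughput, and the effect of concurrent requests, so treat its result as a starting point for the pricing calculator rather than a substitute for it.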

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

Next steps