Azure Vision (Text Extraction OCR) Issue

Acharya, Rakesh 6 Reputation points

I'm reaching out with a request about text extraction from scanned documents using the Azure Vision API. We built a solution that uses the Azure Vision API to extract data from scanned documents (architectural drawings). The requirement is to extract the Drawing Title, Drawing Number, and Revision from each drawing. For certain images (those smaller than 5 KB), the Azure API is unable to extract any data.

Can you help us get this issue resolved? I can schedule a call to discuss it further.

Azure Computer Vision
An Azure artificial intelligence service that analyzes content in images and video.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Ramr-msft 17,521 Reputation points

    @Acharya, Rakesh Thanks for the question. Azure Cognitive Services provides industry-leading optical character recognition (OCR) through the Read API. The Computer Vision Read API is Azure's latest OCR technology; it extracts printed text (in several languages), handwritten text (English only), digits, and currency symbols from images and multi-page PDF documents. It is optimized for text-heavy images and multi-page PDF documents with mixed languages. If possible, could you share sample input images and the output for the cases where extraction fails?

    We have also built a form recognition service, Form Recognizer, which seems promising for your application. Could you try the Form Recognizer Layout API, which detects and extracts the text and layout of documents?

    The following resources outline the traditional challenges of doing OCR in the wild and the ways in which deep learning algorithms are being applied to transform these solutions:
    Computer Vision
    Microsoft Form Recognizer
    Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Paper: LayoutLM: Pre-training of Text and Layout for Document Image Understanding
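
For readers who want to reproduce the Read API flow mentioned in the answer, here is a minimal sketch that uses only the Python standard library. It assumes the v3.2 REST route (`/vision/v3.2/read/analyze`), a result shaped as `analyzeResult.readResults[].lines[].text`, and placeholder endpoint and key values. The function names are hypothetical and are not part of any Azure SDK, so check the route and response shape against the API version your resource uses.

```python
import json
import time
import urllib.request


def submit_read_request(endpoint, key, image_bytes):
    """POST raw image bytes to the Read API; return the URL to poll for results.

    Assumes the v3.2 route. The service replies with HTTP 202 and an
    Operation-Location header rather than the OCR result itself.
    """
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    req = urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Operation-Location"]


def poll_read_result(operation_url, key, interval=1.0, max_tries=30):
    """GET the operation URL until the status is 'succeeded' or 'failed'."""
    req = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    for _ in range(max_tries):
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("Read operation did not finish in time")


def extract_lines(result):
    """Flatten a Read result into a list of recognized text lines."""
    lines = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append(line["text"])
    return lines
```

A typical call sequence would be `extract_lines(poll_read_result(submit_read_request(endpoint, key, data), key))`. An empty list for a particular drawing is a quick way to flag the images for which no text was returned, so that they can be shared as failing samples.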