OCR Read - best config for speed

Constantinescu Andrei 21 Reputation points
2022-02-22T17:26:10.793+00:00

Hello,

We are using the Read OCR Docker containers.

We have a large volume of documents that we want to OCR in batches.
We need to be able to run OCR on 1000 document pages per hour.

For now it takes 10 seconds per document page.
So we can run OCR on roughly 360 pages per hour.

What can we do to improve speed?

Is it possible to run the Docker container on a GPU instead of the CPU?
Can we leverage multi-core CPUs? If yes, what is the best way to do so?
Is it possible to have several Docker instances running on the same server?
If yes, what is the limit on the number of instances as a function of the server's RAM, CPU, and/or GPU?

Thank you for your answers,

Andrei

Azure Computer Vision
Azure AI services

Accepted answer
  1. YutongTie-MSFT 52,596 Reputation points
    2022-02-23T00:11:52.827+00:00

    @Constantinescu Andrei

    Thanks for reaching out. I understand you are working with the OCR Read container and want to improve its speed.

    First, I hope you are using the latest v3.2 container, which generally gives the best experience.

    For high-throughput workloads such as the multi-page documents you mention, a good approach is to run multiple containers on a Kubernetes cluster, backed by Azure Storage and Azure Queue.

    Starting with v3 of the container, you can run containers in parallel at both the task level and the page level.

    By design, each v3 container has a dispatcher and a recognition worker. The dispatcher splits a multi-page task into multiple single-page sub-tasks. The recognition worker is optimized for recognizing a single-page document. To achieve page-level parallelism, deploy multiple v3 containers behind a load balancer and let the containers share a universal storage and queue.
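
As a rough sizing sketch for the numbers in the question (10 seconds per page, 1000 pages per hour), the snippet below estimates how many containers would be needed. It assumes throughput scales linearly with the number of containers, which this thread does not confirm, so treat the result as a lower bound to validate with a load test. The function name `containers_needed` is made up for illustration.

```python
import math


def containers_needed(target_pages_per_hour: int, seconds_per_page: float) -> int:
    """Estimate the minimum number of parallel containers.

    Assumption (not confirmed by the docs): each container processes one
    page at a time and throughput scales linearly with container count.
    """
    pages_per_hour_per_container = 3600 / seconds_per_page
    return math.ceil(target_pages_per_hour / pages_per_hour_per_container)


if __name__ == "__main__":
    # 10 s per page is 360 pages per hour per container.
    print(containers_needed(1000, 10))
```

Under that assumption, one container handles 360 pages per hour, so three containers would be the minimum for 1000 pages per hour; shared storage, queue latency, and CPU contention can push the real number higher.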

    Please refer to the documentation for step-by-step guidance. I hope this helps!
    https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/deploy-computer-vision-on-premises#deploy-multiple-v3-containers-on-the-kubernetes-cluster
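
On the client side, several containers behind a load balancer only help if requests are sent concurrently. The following is a minimal sketch of that idea, not code from the linked guide. `ocr_page` is a hypothetical function you would write yourself to send one page to the load-balanced endpoint and return the recognized text; `ocr_many` and `max_workers` are likewise illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def ocr_many(
    paths: Iterable[str],
    ocr_page: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run a caller-supplied OCR function over many pages concurrently.

    `ocr_page` is a hypothetical wrapper around the HTTP call to the
    load-balanced containers; it is an assumption, not a real SDK function.
    """
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        texts = list(pool.map(ocr_page, path_list))
    return dict(zip(path_list, texts))


def fake_ocr(path: str) -> str:
    """Stand-in used only to demonstrate the call shape."""
    return f"text of {path}"


if __name__ == "__main__":
    print(ocr_many(["page1.png", "page2.png"], fake_ocr, max_workers=2))
```

Threads fit here because the work is waiting on network responses; a reasonable starting point for `max_workers` is the number of containers, adjusted after measuring.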

    Regards,
    Yutong

    Please accept the answer if you found it helpful, thanks!
