OCR Read - best config for speed

Constantinescu Andrei 21 Reputation points
2022-02-22T17:26:10.793+00:00

Hello,

We are using the Read OCR Docker containers.

We have a large volume of documents that we want to OCR in batches.
We need to be able to run OCR on 1000 document pages per hour.

For now it takes about 10 seconds per document page,
so we can run OCR on roughly 360 document pages per hour.

What can we do to improve speed?

Is it possible to run the Docker container on a GPU instead of the CPU?
Can we leverage multi-core CPUs? If yes, what is the best way to do so?
Is it possible to have several Docker instances running on the same server?
If yes, what is the limit on the number of instances as a function of the server's RAM, CPU, and/or GPU?

Thank you for your answers,

Andrei

Computer Vision
An Azure artificial intelligence service that analyzes content in images and video.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2022-02-23T00:11:52.827+00:00

    @Constantinescu Andrei

    Thanks for reaching out. I understand you are working with the OCR Read container and want to improve its speed.

    First, I hope you are using the latest v3.2 container, which should generally give you a better experience.

    For high-throughput workloads such as the multi-page documents you mention, a good approach is to run multiple containers on a Kubernetes cluster, using Azure Storage and Azure Queue.

    Starting in v3 of the container, you can use the containers in parallel on both a task and page level.
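
As a rough illustration of task-level parallelism on the client side, the sketch below submits several documents concurrently through a thread pool. The `ocr_one` callable is a hypothetical placeholder for whatever function sends one document to the container endpoint and returns its text; `fake_ocr` is only a stand-in so the example runs without a container. Neither name comes from the container or its SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def ocr_in_parallel(
    documents: Iterable[bytes],
    ocr_one: Callable[[bytes], str],
    max_workers: int = 4,
) -> List[str]:
    """Run ocr_one on every document concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(ocr_one, documents))


def fake_ocr(document: bytes) -> str:
    """Hypothetical stand-in for a real call to the Read container."""
    return document.decode("utf-8").upper()


if __name__ == "__main__":
    pages = [b"page one", b"page two", b"page three"]
    for text in ocr_in_parallel(pages, fake_ocr):
        print(text)
```

Threads fit here because each real request would spend most of its time waiting on the network, not on local computation; `max_workers` would need tuning against the number of containers behind the load balancer.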

    By design, each v3 container has a dispatcher and a recognition worker. The dispatcher is responsible for splitting a multi-page task into multiple single-page sub-tasks. The recognition worker is optimized for recognizing a single-page document. To achieve page-level parallelism, deploy multiple v3 containers behind a load balancer and let the containers share a universal storage and queue.
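
To get a feel for how many containers the numbers in the question imply, here is a back-of-the-envelope estimate. It assumes throughput scales roughly linearly with the number of containers and that each container keeps processing about one page every 10 seconds; neither assumption is confirmed here, so treat the result as a starting point for load testing. The function name `containers_needed` is made up for this example.

```python
import math


def containers_needed(target_pages_per_hour: float, seconds_per_page: float) -> int:
    """Estimate the container count, assuming linear scaling (an assumption)."""
    pages_per_hour_per_container = 3600 / seconds_per_page
    return math.ceil(target_pages_per_hour / pages_per_hour_per_container)


if __name__ == "__main__":
    # Figures from the question: 1000 pages/hour target, 10 s per page.
    print(containers_needed(1000, 10))  # prints 3
```

One container at 10 seconds per page handles 360 pages per hour, so 1000 pages per hour needs 1000 / 360 ≈ 2.8, rounded up to 3 containers under these assumptions.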

    Please refer to the documentation for step-by-step guidance. I hope this helps!
    https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/deploy-computer-vision-on-premises#deploy-multiple-v3-containers-on-the-kubernetes-cluster

    Regards,
    Yutong

    - Please accept the answer if you find it helpful, thanks!

    1 person found this answer helpful.
