Ocr Read - best config for speed

Question

Constantinescu Andrei 21

Hello,

We are using the Read OCR Docker containers.

We have a large volume of documents that we want to OCR in batches.
We need to be able to run OCR on 1000 document pages per hour.

At the moment it takes 10 seconds per document page,
so we can run OCR on roughly 360 pages per hour.

What can we do to improve speed?

Is it possible to run the Docker container on a GPU instead of the CPU?
Can we leverage multi-core CPUs? If yes, what is the best way to do so?
Is it possible to have several Docker instances running on the same server?
If yes, what is the limit on the number of instances as a function of the server's RAM, CPU, and/or GPU?

Thank you for your answers,

Andrei

Accepted answer

Answer 1

@Constantinescu Andrei

Thanks for reaching out to us. I understand that you are working with the OCR Read container and want to improve its speed.

First, I hope you are using the latest v3.2 container, since that generally gives the best experience.

For high throughput on multi-page documents, as you describe, a good approach is to run multiple containers on a Kubernetes cluster, using Azure Storage and Azure Queue.

Starting in v3 of the container, you can use the containers in parallel at both the task level and the page level.

By design, each v3 container has a dispatcher and a recognition worker. The dispatcher is responsible for splitting a multi-page task into multiple single-page sub-tasks. The recognition worker is optimized for recognizing a single-page document. To achieve page-level parallelism, deploy multiple v3 containers behind a load balancer and let the containers share a universal storage and queue.
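As a rough sizing aid for the setup described above, here is a minimal Python sketch that estimates how many containers would be needed to reach a throughput target. It uses the figures from the question (10 seconds per page, 1000 pages per hour). The function name `containers_needed` is made up for illustration, and the sketch assumes that each container handles one page at a time and that throughput scales linearly with the number of containers; neither assumption has been measured here, so treat the result as a starting point for testing.

```python
import math


def containers_needed(target_pages_per_hour: float, seconds_per_page: float) -> int:
    """Estimate the number of containers needed to hit a throughput target.

    Assumption (not measured): each container processes one page at a time,
    and adding containers scales throughput linearly.
    """
    if target_pages_per_hour <= 0 or seconds_per_page <= 0:
        raise ValueError("both inputs must be positive")
    # Pages a single container can process in one hour.
    pages_per_hour_per_container = 3600 / seconds_per_page
    # Round up, since a fraction of a container cannot be deployed.
    return math.ceil(target_pages_per_hour / pages_per_hour_per_container)


# Figures from the question: 10 s per page, target of 1000 pages per hour.
print(containers_needed(1000, 10))  # 1000 / 360 = 2.78, rounded up to 3
```

In practice the shared storage, the queue, and the load balancer add overhead, so it may be worth provisioning some headroom above this estimate.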

Please refer to the following document for step-by-step guidance. I hope this helps!
https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/deploy-computer-vision-on-premises#deploy-multiple-v3-containers-on-the-kubernetes-cluster
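On the client side, multiple containers behind a load balancer only help if several requests are in flight at once. The following is a minimal sketch of that idea using Python's standard `ThreadPoolExecutor`. The names `ocr_pages_concurrently`, `ocr_page`, and `fake_ocr_page` are hypothetical: `ocr_page` stands in for whatever function sends one page to your load-balanced endpoint and returns the recognized text, and it is not part of the container's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def ocr_pages_concurrently(
    pages: Iterable[bytes],
    ocr_page: Callable[[bytes], str],
    max_in_flight: int = 4,
) -> List[str]:
    """Run ocr_page over all pages with up to max_in_flight concurrent calls.

    ocr_page is a caller-supplied (hypothetical) function that submits one
    page to the OCR endpoint and returns its text. Results are returned in
    the same order as the input pages.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Executor.map preserves input order even if calls finish out of order.
        return list(pool.map(ocr_page, pages))


def fake_ocr_page(page: bytes) -> str:
    """Stand-in for a real OCR call, used only to make the sketch runnable."""
    return page.decode("utf-8").upper()


print(ocr_pages_concurrently([b"page one", b"page two"], fake_ocr_page, max_in_flight=2))
```

A sensible starting value for `max_in_flight` is probably the number of containers you have deployed, adjusted after measuring actual throughput.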

Regards,
Yutong

Please accept the answer if you find it helpful. Thanks!

CoreyK-2554 20 Reputation points

2024-04-03T13:37:36.9733333+00:00

Hi Yutong,

Based on your response it appears the recommended solution is to run multiple containers in order to parallelize the documents/pages/tasks. Is it possible to run multiple documents/pages in parallel within a single container, or must the desired number of parallel tasks equal the number of concurrently running containers?

Thank you,

Corey
