Throughput for Computer Vision API

Anonymous
2022-12-07T02:35:48.933+00:00

Hi,

I am trying to run OCR on PDFs as part of a skills pipeline in Cognitive Search (Standard2, 2 replicas, 2 partitions, 4 search units). Processing 20 PDFs of 50 pages each takes 20 minutes. My skillset definition is attached below. I created a Cognitive Services resource and associated it with the skillset so that I am not bound by the free tier. Is there any way to speed up the processing? I would like to understand the maximum throughput I can achieve with documents of this size. What are the key configuration parameters for speeding up the cognitive pipeline? I have used batchSize and degreeOfParallelism for custom skills, but I do not have the same options for the OCR skill.

267966-skillset-perf.txt
268001-perf-indexer.txt

Azure AI Search

1 answer

  1. ajkuma 26,636 Reputation points Microsoft Employee
    2022-12-07T08:40:19.897+00:00

    MantuSingh-3347, Thanks for the question!

    Based on my understanding (from the sample), you are using the Microsoft.Skills.Vision.OcrSkill skill:
    268089-image.png

    By default, the skill uses the free Cognitive Services resource. Free enrichments are limited to 20 documents per day, per indexer.
    Depending on your requirements, you can attach a billable Cognitive Services resource to the skillset; see Attach Cognitive Services to a skillset - Azure Cognitive Search | Microsoft Learn.
    268192-image.png
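
As a rough illustration of what attaching a resource looks like, the sketch below builds a skillset body with one OCR skill and a `cognitiveServices` section that references a billable resource by key. The skillset name, the description strings, and the placeholder key are hypothetical, and the sketch assumes the indexer is configured to generate normalized images; it is not the definition from the attached files.

```python
import json


def build_ocr_skillset(name: str, cognitive_services_key: str) -> dict:
    """Return a skillset body with one OCR skill and a billable resource attached."""
    return {
        "name": name,  # hypothetical skillset name
        "description": "OCR over images extracted from PDFs",
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                # Run once per normalized image produced by the indexer.
                "context": "/document/normalized_images/*",
                "inputs": [
                    {"name": "image", "source": "/document/normalized_images/*"}
                ],
                "outputs": [{"name": "text", "targetName": "text"}],
            }
        ],
        # Without this section the skillset falls back to the free enrichments.
        "cognitiveServices": {
            "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
            "description": "Billable Cognitive Services resource",
            "key": cognitive_services_key,  # placeholder, supply your own key
        },
    }


skillset = build_ocr_skillset("ocr-skillset", "<your-cognitive-services-key>")
print(json.dumps(skillset, indent=2))
```

The printed JSON is the kind of body you would send when creating or updating the skillset. If the `cognitiveServices` section is missing or still points at the default resource, the free-tier limit applies.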

    The maximum throughput depends on the number of search units you have; see Estimate capacity for query and index workloads - Azure Cognitive Search | Microsoft Learn. Also check the “Tips for capacity planning” section of that doc for more information.
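
To have a baseline to compare against when changing capacity, it can help to turn the figures from the question (20 PDFs, 50 pages each, 20 minutes) into a rate. The short sketch below does that arithmetic. The function name is hypothetical, and it assumes that elapsed time scales with the number of pages, which may not hold exactly in practice.

```python
def pages_per_minute(documents: int, pages_per_document: int, elapsed_minutes: float) -> float:
    """Observed throughput in pages per minute for one indexer run."""
    total_pages = documents * pages_per_document
    return total_pages / elapsed_minutes


# Figures reported in the question: 20 PDFs x 50 pages in 20 minutes.
baseline = pages_per_minute(documents=20, pages_per_document=50, elapsed_minutes=20)
print(f"{baseline:.1f} pages/minute")           # 50.0 pages/minute
print(f"{baseline / 60:.2f} pages/second")      # 0.83 pages/second
```

Rerunning the same calculation after a configuration change gives a like-for-like comparison of throughput.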

    Kindly let us know how it goes; we will be more than happy to assist you further.

