
Constant latency on Document analyzer

Anonymous
2023-09-21T13:00:37.31+00:00

Hello

We are using Azure Form Recognizer (West Europe) to perform OCR on documents.

We're using the prebuilt-read model via the Python SDK. We don't pass any API version to the constructor, which means the API version should be DocumentAnalysisApiVersion.V2022_08_31.
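For reference, here is a minimal sketch of how such a call might be timed. It is not the code used for the measurements in this post. The helper names `build_client` and `time_read` are hypothetical, and the endpoint and key are assumed to be supplied by the caller. The sketch separates the time spent submitting the request from the time spent waiting for the result, which may help show where the fixed delay comes from.

```python
import time


def build_client(endpoint, key):
    """Create a DocumentAnalysisClient pinned to the 2022-08-31 API version."""
    # Imported here so time_read() below can be exercised without the SDK installed.
    from azure.ai.formrecognizer import (
        DocumentAnalysisApiVersion,
        DocumentAnalysisClient,
    )
    from azure.core.credentials import AzureKeyCredential

    return DocumentAnalysisClient(
        endpoint,
        AzureKeyCredential(key),
        api_version=DocumentAnalysisApiVersion.V2022_08_31,
    )


def time_read(client, document, pages=None):
    """Run prebuilt-read once and report submit time, wait time and word count."""
    # Only pass `pages` (e.g. "1-3") when a page range was requested.
    kwargs = {"pages": pages} if pages else {}

    t0 = time.perf_counter()
    poller = client.begin_analyze_document("prebuilt-read", document, **kwargs)
    t1 = time.perf_counter()
    result = poller.result()  # blocks until the long-running operation finishes
    t2 = time.perf_counter()

    words = sum(len(page.words) for page in result.pages)
    return {"submit_sec": t1 - t0, "wait_sec": t2 - t1, "words": words}
```

A call such as `time_read(client, pdf_bytes, pages="1-3")` would then return the two durations and the number of words detected for that page range.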

We have two use cases: OCR on an image of a PDF page, and OCR on the PDF document itself. When we send an image, it has been extracted at 150 dpi.

In both cases, we are experiencing some sort of cold start: each query takes at least 5 seconds. Below are some timings we measured on a single document:

Azure Form Recognizer (image)

Page index Duration (sec) Words detected by OCR
1 5.94 443
2 5.72 484
3 5.76 359
4 11.71 2338
5 12.09 2214
6 5.75 612
7 5.67 432
8 5.60 501
9 5.44 142
10 5.63 318
11 5.66 567
12 5.57 417
13 5.46 112

Azure Form Recognizer (document + page range)

Page index range Duration (sec) Words detected by OCR
1-1 5.91 451
1-2 6.13 939
1-3 6.43 1296
1-4 12.34 3635
1-5 13.02 5854
1-6 13.49 6469
1-7 15.31 6903
1-8 15.93 7404
1-9 14.31 7546
1-10 14.52 7864
1-11 14.32 8433
1-12 14.92 8851
1-13 14.73 8963

What is surprising is that, even if each page really took 5 seconds, sending the document to the API with the first 3 pages should take about 15 seconds, not 6.43. That is why it looks like there is some kind of cold start.
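The comparison can be checked with a few lines of Python. This is only a sketch: the numbers are copied from the two tables in this post, and the variable and function names are illustrative.

```python
# Per-page durations (sec) from the image measurements, pages 1..13.
per_page = [5.94, 5.72, 5.76, 11.71, 12.09, 5.75, 5.67,
            5.60, 5.44, 5.63, 5.66, 5.57, 5.46]

# Durations (sec) from the page-range measurements, ranges 1-1 .. 1-13.
by_range = [5.91, 6.13, 6.43, 12.34, 13.02, 13.49, 15.31,
            15.93, 14.31, 14.52, 14.32, 14.92, 14.73]


def summed_vs_range(n):
    """Return (sum of the first n single-page calls, one call for range 1-n)."""
    return round(sum(per_page[:n]), 2), by_range[n - 1]


for n in (1, 3, 13):
    summed, single_call = summed_vs_range(n)
    print(f"pages 1-{n}: {summed:.2f} s as separate calls, {single_call:.2f} s as one call")
```

For the first 3 pages, the separate calls add up to 17.42 seconds, against 6.43 seconds for a single call covering the same pages.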

Any idea?



1 answer

  1. Anonymous
    2023-09-22T08:30:04.0133333+00:00

    Hello @Anonymous

    I tried with both versions: 2022-08-31 and 2023-07-31. The other versions listed in the documentation don't apply to this kind of request. The results are identical for both versions.

    Unfortunately the document I'm using cannot be shared.

    There are 13 pages in the document. The pricing tier is S0 Standard.

    I'm sharing another document with you: Conditions d’utilisation Microsoft Learn-1.pdf

    This document is public, and I'm experiencing the same issue with it:

    Azure OCR time, one page per request

    Page Duration (sec) Number of words
    1 5.47 272
    2 5.45 473
    3 5.48 457
    4 5.52 431
    5 5.45 470
    6 5.42 445
    7 5.47 465
    8 5.40 426
    9 5.82 484
    10 5.68 380

    Azure OCR time, document with a page range

    Page range Duration (sec) Number of words
    1-1 5.57 272
    1-2 5.63 745
    1-3 5.68 1202
    1-4 5.72 1633
    1-5 5.88 2103
    1-6 5.89 2548
    1-7 5.99 3013
    1-8 5.92 3439
    1-9 6.02 3923
    1-10 6.18 4303
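One way to read the page-range timings above is as a fixed cost plus a small per-page cost. The sketch below fits a straight line to those ten measurements with ordinary least squares. The linear model is an assumption made for illustration, not something the service documents, and the names are hypothetical.

```python
# Page-range durations (sec) for the public document, ranges 1-1 .. 1-10.
page_counts = list(range(1, 11))
durations = [5.57, 5.63, 5.68, 5.72, 5.88, 5.89, 5.99, 5.92, 6.02, 6.18]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


per_page_sec, fixed_sec = fit_line(page_counts, durations)
print(f"fixed cost ~{fixed_sec:.2f} s, per-page cost ~{per_page_sec:.3f} s")
```

On these numbers the fit gives a fixed cost of roughly 5.5 seconds and well under 0.1 second per additional page. That is consistent with most of the latency not depending on the number of pages.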
