Optimizing Azure Document Intelligence Layout Model performance on Google Cloud Run
Adham Elarabawy
Hello,
I'm experimenting with Azure Document Intelligence (specifically the Layout model) to parse a wide variety of PDF documents. I'm using the 2022-08-31 API version, deployed as a Docker container on Google Cloud Run (maxed-out instance sizes of 8 vCPUs / 32 GB memory). I have a few questions:
- The Layout model takes a very long time to process PDFs longer than about 100 pages; some calls take 5-10 minutes, which is unfortunately too long for my use case. I've worked around this by parallelizing Layout calls on a per-page basis (i.e., sending only ~3 pages per call), but longer documents still take 20+ seconds, and I'd like to drive that down further. Are there any flags/options that let me disable table extraction and other auxiliary features to speed up the calls? I'm mainly interested in extracting paragraph roles (title/section heading/paragraph, etc.). Essentially, any knob I can turn to speed the model call up!
- The cold start time per instance is ~15 seconds. Is there any way to reduce this, perhaps with an optimized Docker image?
- The async endpoint seems to stall indefinitely, so I've been using the synchronous endpoint instead (with threads to parallelize calls). Is this a known issue? Is there a fix?
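
For reference, the per-page parallelization described above looks roughly like this. This is a minimal sketch: `analyze_fn` stands in for whatever wrapper actually issues the Layout request (hypothetical here), passing each batch as a page-range string to the service's `pages` parameter.

```python
from concurrent.futures import ThreadPoolExecutor

def page_batches(total_pages, batch_size=3):
    """Split a document into page-range strings (e.g. "1-3", "4-6", ...)
    suitable for the analyze request's `pages` parameter."""
    return [
        f"{start}-{min(start + batch_size - 1, total_pages)}"
        for start in range(1, total_pages + 1, batch_size)
    ]

def analyze_in_parallel(analyze_fn, total_pages, batch_size=3, max_workers=8):
    """Fan out one Layout call per page batch using a thread pool.

    `analyze_fn` is a caller-supplied function (hypothetical) that takes a
    page-range string, issues the actual Document Intelligence request, and
    returns that batch's result. Results come back in batch order.
    """
    batches = page_batches(total_pages, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_fn, batches))
```

For a 200-page PDF this issues ~67 concurrent 3-page calls instead of one 200-page call, which is the workaround described above; the per-batch results still need to be stitched back together in page order afterwards.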
Thank you!
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.