Azure Function Host Crashes When Parsing PDF Documents with Docling Library

David Tschirky 1 Reputation point
2025-05-02T09:04:59.7766667+00:00

We've started using the Python library docling to parse documents. Locally, this works perfectly. On Azure, however, parsing PDFs fails and crashes the function host, which takes the whole function down until it has restarted. I also can't find any helpful information on why parsing PDFs crashes the runtime. Parsing Word and PowerPoint documents works without any issues on Azure.

As part of the investigation, I've added Application Insights OpenTelemetry instrumentation to the app.

Insights gathered while trying to figure out the core issue:

  1. Only PDFs are affected. From a docling perspective, this is plausible because the processing pipeline for PDFs is quite different from the one for Word and PowerPoint.
  2. To parse PDFs, docling downloads the appropriate AI models on demand. These downloads succeed, as can be observed in Application Insights.
  3. The failure appears to happen at the moment we attempt the document conversion (after the documents and the necessary AI models have been collected).
  4. Once the function host has crashed (silently, i.e. without any visible exception), the startup logs of a new instance of the same function appear. These startup logs contain the following exception (usually 3 times):
    1. Transport endpoint is not connected : '/home/site/wwwroot/host.json'
    2. Because of that exception, the first startup after the crash fails, and it takes one or two more attempts before the function spins up again.
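
Because the crash is silent, one way to narrow it down is to log the process's peak memory immediately before and after the conversion step. The following is a minimal sketch, not part of the original code: the helper names `peak_rss_mib` and `run_with_memory_log` and the logger name are hypothetical, and it assumes a Linux worker, where `ru_maxrss` is reported in KiB. If the process is killed during the step, only the "starting" line reaches the logs, which at least confirms where it died and how much memory was already in use.

```python
import logging
import resource
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pdf_conversion")  # hypothetical logger name


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB.

    Assumes Linux, where ru_maxrss is in KiB (macOS reports bytes instead).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def run_with_memory_log(label: str, step: Callable[[], T]) -> T:
    """Run `step`, logging peak memory before and after it."""
    logger.info("%s: starting, peak RSS so far %.0f MiB", label, peak_rss_mib())
    result = step()
    # This line is never reached if the process is killed during `step`.
    logger.info("%s: finished, peak RSS now %.0f MiB", label, peak_rss_mib())
    return result
```

In the function, the conversion call could then be wrapped as `run_with_memory_log("pdf-convert", lambda: converter.convert(path))`, where `converter` and `path` stand for whatever the app already uses.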

Here are some of our environment details:

  • We're using Azure Functions for Python v2
  • We're using Python 3.11
  • The app currently runs on a consumption plan.
  • Dependencies
      azure-functions
      requests
      azure-identity
      
      # to fix (another) deployment issue on azure
      cryptography==43.0.3
      
      # Search
      azure-search-documents
      
      # Ai Service
      openai
      pydantic
      
      # OCR
      docling==2.31.0
      
      # graph ql client
      gql[all]
      
      # Monitoring
      azure-core-tracing-opentelemetry
      azure-monitor-opentelemetry
    
Azure Functions

1 answer

  1. Khadeer Ali 5,990 Reputation points Microsoft External Staff Moderator
    2025-05-20T11:07:01.52+00:00

    @David Tschirky ,

    Summarizing the discussion so far:

    The PDF parsing issue you're facing on the Azure Functions Consumption Plan appears to be caused by resource limitations, particularly memory, and by the lack of support for features such as preloading models or using warm-up triggers (which are only available on the Premium Plan). You've confirmed that switching to the Premium Plan resolves the issue, which reinforces that it's resource-related.
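
On any plan, the converter can at least be built once per worker process and reused across invocations, so the models are not loaded again for every request. The following is a minimal sketch under stated assumptions: `LazySingleton`, `build_docling_converter`, and `converter_holder` are hypothetical names, and only `docling.document_converter.DocumentConverter` comes from docling itself. This avoids repeated loading but does not lower peak memory, so it is not a substitute for the plan change confirmed above.

```python
import threading
from typing import Any, Callable, Optional


class LazySingleton:
    """Build an expensive object once per process, on first use."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._instance: Optional[Any] = None

    def get(self) -> Any:
        # The lock keeps concurrent invocations from building two instances.
        with self._lock:
            if self._instance is None:
                self._instance = self._factory()
            return self._instance


def build_docling_converter() -> Any:
    # Imported lazily so that loading this module stays cheap and the
    # heavy dependencies are only pulled in when a document is converted.
    from docling.document_converter import DocumentConverter

    return DocumentConverter()


# Module-level holder (hypothetical name); nothing is built until get() is called.
converter_holder = LazySingleton(build_docling_converter)
```

A function handler would then call `converter_holder.get().convert(source)` instead of constructing a new converter on each invocation.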

    As an alternative to the Premium Plan, you might want to explore the Dedicated (App Service) Plan, which offers more consistent compute resources and greater control compared to the Consumption Plan. It could be a good middle ground, especially for your DEV environment.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful." And if you have any further questions, let us know.
