Hi Priya,
When an Azure Blob Storage triggered function fails during intensive back-end workflows (such as PDF extraction via pdfminer, OCR processing, or external OpenAI API orchestration), it typically throws an unhandled exception. This bubbles up to the Azure Functions runtime host as an Internal Server Error (500) and triggers the built-in retry safety mechanism.
According to the official Microsoft documentation on Azure Blob Storage triggers for Azure Functions:
"When a blob trigger function fails for a given blob, Azure Functions retries that function up to 5 times by default (including the first try). If all 5 tries fail, Azure Functions adds a message to a Storage queue named webjobs-blobtrigger-poison."
The message written to your poison queue is a structured JSON payload detailing the failure:
"The queue message for poison blobs is a JSON object that contains the following properties: FunctionId, BlobType, ContainerName, BlobName, ETag"
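To make the poison message concrete, here is a minimal Python sketch that parses such a payload into a typed record. The five property names come from the documentation quoted above; the PoisonBlobMessage class, the parse_poison_message helper, and every value in the sample payload are hypothetical and only illustrate the shape.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PoisonBlobMessage:
    """Typed view of a webjobs-blobtrigger-poison queue message (hypothetical helper)."""
    function_id: str
    blob_type: str
    container_name: str
    blob_name: str
    etag: str


def parse_poison_message(body: str) -> PoisonBlobMessage:
    """Parse the JSON body of a poison queue message.

    Raises KeyError if one of the five documented properties is missing.
    """
    data = json.loads(body)
    return PoisonBlobMessage(
        function_id=data["FunctionId"],
        blob_type=data["BlobType"],
        container_name=data["ContainerName"],
        blob_name=data["BlobName"],
        etag=data["ETag"],
    )


# Sample payload with made-up values, for illustration only.
SAMPLE_BODY = json.dumps({
    "FunctionId": "my-app.Functions.process_pdf",
    "BlobType": "BlockBlob",
    "ContainerName": "incoming",
    "BlobName": "report.pdf",
    "ETag": "0x8D0000000000000",
})

message = parse_poison_message(SAMPLE_BODY)
print(f"Failed blob: {message.container_name}/{message.blob_name}")
```

A queue-triggered function bound to webjobs-blobtrigger-poison could use a helper like this to identify which blob failed and decide whether to reprocess or archive it.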
Recommended Resolution Steps
- Implement Robust Python Try-Except Blocks: Unhandled exceptions cause the function host to fail the entire invocation, resulting in the 5-try loop. You should handle expected errors gracefully within your Python function. If a specific document fails OCR or returns malformed JSON from OpenAI, catch the error, log it, and complete the function successfully (or move the file to a custom "quarantine" container manually).
- Review Logs via Application Insights (Kusto Query): Since Python Azure Functions execute on Linux, Windows file system paths such as Kudu's D:\home do not apply. Instead, navigate to your linked Application Insights resource and execute a log query to see full stack traces:
  exceptions | where timestamp > ago(1h) | order by timestamp desc
- Guard Against Memory Exhaustion (OOM) and Timeouts:
  - Memory: Large PDF files parsed completely in memory can cause the underlying Linux container worker to terminate unexpectedly. If you are binding directly to bytes or str, switch to a stream-based processing approach.
  - Timeouts: External requests to OpenAI or OCR engines can time out. Ensure you pass an explicit timeout parameter to your HTTP/SDK client calls and handle TimeoutError exceptions.
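The try-except and streaming advice above can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not your function's actual code: handle_blob, spool_to_tempfile, and the process_file and quarantine callables are hypothetical names, and the stream argument stands in for whatever file-like object your blob binding provides. It copies the blob to a temporary file in fixed-size chunks so the parser can read from disk, catches failures that a retry will not fix, and lets anything unexpected propagate so the host's retry and poison queue still apply.

```python
import io
import logging
import os
import shutil
import tempfile
from typing import BinaryIO, Callable

# Failures we treat as "expected": retrying the same blob will not help.
# json.JSONDecodeError is a subclass of ValueError, so it is covered here.
EXPECTED_ERRORS = (ValueError, TimeoutError)


def spool_to_tempfile(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Copy a readable stream to a temp file in chunks and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        shutil.copyfileobj(stream, tmp, chunk_size)
        return tmp.name


def handle_blob(
    name: str,
    stream: BinaryIO,
    process_file: Callable[[str], None],     # hypothetical: PDF/OCR/OpenAI pipeline
    quarantine: Callable[[str, str], None],  # hypothetical: moves the blob aside
) -> str:
    """Process one blob; quarantine it on expected failures instead of raising."""
    path = spool_to_tempfile(stream)
    try:
        process_file(path)
    except EXPECTED_ERRORS as exc:
        # Log the full stack trace, then finish the invocation without raising.
        logging.exception("Quarantining blob %s", name)
        quarantine(name, str(exc))
        return "quarantined"
    finally:
        os.remove(path)  # always clean up the temp file
    return "processed"


# Usage with stand-in callables and an in-memory stream.
def fake_pipeline(path: str) -> None:
    raise ValueError("malformed JSON from model")


quarantined = []
result = handle_blob(
    "report.pdf",
    io.BytesIO(b"%PDF-1.4 ..."),
    fake_pipeline,
    lambda blob_name, reason: quarantined.append((blob_name, reason)),
)
print(result, quarantined)
```

Note that many HTTP and SDK clients raise their own timeout exception types rather than the built-in TimeoutError, so the EXPECTED_ERRORS tuple would likely need to include whichever types your client library documents.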
Please check the steps above and share your findings.