Azure Function App Backend Processing Failure Causing 500 Internal Server Error

Question

Priya Kumari 0 Reputation points

We are facing a 500 Internal Server Error in the RECEIPT UTILITY application. The frontend file upload is working correctly, and files are successfully uploaded to Azure Blob Storage.

The issue occurs during backend processing in the Azure Function App, where the uploaded file content is extracted and processed.

During investigation, we checked the Function App health, runtime status, and Diagnose and Solve Problems section. We found multiple failures in the following functions:

blob_trigger

dequeue

purge_blob_data

We also observed that messages are moving to poison queues after multiple retry failures, which indicates that the backend function is failing internally during execution.

Based on the current investigation, the issue may be related to one of the following:

Python dependency or package issue

Azure Function runtime compatibility issue

OCR / OpenAI / PDF extraction processing failure

Missing or incompatible libraries in the deployment package

Backend code exception during file processing

Could you please help us identify the root cause and guide us on how to resolve this 500 Internal Server Error in the Azure Function App?

Rakesh Mishra 9,340 Reputation points Microsoft External Staff Moderator

2026-05-19T23:30:55.9966667+00:00

Hi Priya, following up to see if you had a chance to check my previous response and if it was helpful. Please do let me know if you're still facing the issue and need any further assistance on this.

Priya Kumari 0 Reputation points

2026-05-20T05:21:11.3566667+00:00

Hi @Rakesh Mishra, thanks for the follow-up. Your guidance was really helpful and led me to the actual root cause.

After checking the Application Insights exception logs, I found that the Azure Function was failing during the Cosmos DB upsert operation with:

CosmosResourceNotFoundError: Resource Not Found

Further investigation showed that the Stage Cosmos DB database eatedb is missing the required containers, while Dev already contains them. Because of this, the blob trigger function was failing repeatedly and messages were moving to poison queues.

I also attempted to create the missing containers from my end, but currently I do not have the required Cosmos DB permissions for container creation.

Thanks again for your helpful inputs.

Rakesh Mishra 9,340 Reputation points Microsoft External Staff Moderator

2026-05-20T18:51:22.8266667+00:00

Hi @Priya Kumari, glad to hear that the issue has been identified; I'm sure it will be fixed soon. Please let me know if you need any further assistance. If the answer was helpful, please accept it by clicking Yes so that it can help others in the community.
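
For readers who hit the same CosmosResourceNotFoundError, a check along the following lines can compare the containers an environment actually has against the ones the code expects, before the function is deployed to that environment. This is a minimal sketch, not the poster's code: the container names and both helper names are hypothetical, and the live lookup assumes the azure-cosmos package, which is imported lazily so the comparison itself runs without it.

```python
from typing import Iterable, List


def missing_containers(expected: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return the expected container names that are absent, in sorted order."""
    return sorted(set(expected) - set(existing))


def list_container_ids(endpoint: str, key: str, database_name: str) -> List[str]:
    """Fetch container ids from a Cosmos DB database (requires azure-cosmos)."""
    from azure.cosmos import CosmosClient  # lazy import; assumed to be installed

    client = CosmosClient(endpoint, credential=key)
    database = client.get_database_client(database_name)
    # list_containers() yields one dict of properties per container.
    return [props["id"] for props in database.list_containers()]


# Hypothetical container names, used only to illustrate the comparison.
EXPECTED = ["receipts", "audit"]
print(missing_containers(EXPECTED, existing=["receipts"]))  # ['audit']
```

Running the comparison against both Dev and Stage would show which containers need to be created by someone who holds the required Cosmos DB permissions.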

Answer 1

Hi Priya,

When an Azure Blob Storage triggered function fails during intensive back-end workflows (such as PDF extraction via pdfminer, OCR processing, or external OpenAI API orchestration), it typically throws an unhandled exception. This bubbles up to the Azure Functions runtime host as an Internal Server Error (500) and triggers the built-in retry safety mechanism.

According to the official Microsoft Documentation on Azure Blob Storage Triggers for Azure Functions:

"When a blob trigger function fails for a given blob, Azure Functions retries that function up to 5 times by default (including the first try). If all 5 tries fail, Azure Functions adds a message to a Storage queue named webjobs-blobtrigger-poison."

The message written to your poison queue is a structured JSON payload detailing the failure:

"The queue message for poison blobs is a JSON object that contains the following properties: FunctionId, BlobType, ContainerName, BlobName, ETag"

Recommended Resolution Steps

Implement Robust Python Try-Except Blocks
Unhandled exceptions cause the function host to fail the entire invocation, which starts the 5-try retry cycle. Handle expected errors gracefully within your Python function: if a specific document fails OCR or OpenAI returns malformed JSON, catch the error, log it, and complete the function successfully (or move the file to a custom "quarantine" container manually).
Review Logs via Application Insights (Kusto Query)
Since Python Azure Functions execute on Linux, standard Windows file system paths like Kudu's D:\home do not apply. Instead, navigate to your linked Application Insights resource and execute a log query to see full stack traces:
```
   exceptions
   | where timestamp > ago(1h)
   | order by timestamp desc
```
Guard Against Memory Exhaustion (OOM) and Timeouts
- Memory: Large PDF files parsed completely in memory can cause the underlying Linux container worker to terminate unexpectedly. If you are binding directly to bytes or string, switch to a Stream-based processing approach.
- Timeouts: External requests to OpenAI or OCR engines can time out. Ensure you pass an explicit timeout parameter to your HTTP/SDK client calls and handle TimeoutError exceptions.
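
The recommendations above (catch expected failures, avoid holding oversized files in memory, and treat timeouts as handled errors) can be sketched as follows. This is an illustration under assumptions, not the application's actual code: `extract_text`, `ExtractionError`, the size cap, and the returned status strings are hypothetical. In a real blob-triggered function the `stream` argument would be the `azure.functions.InputStream` binding, which offers the same `read(size)` method as the `io.BytesIO` objects used here.

```python
import io
import logging
from typing import BinaryIO, Callable

MAX_BYTES = 20 * 1024 * 1024  # hypothetical cap on accepted file size
CHUNK_SIZE = 64 * 1024


class ExtractionError(Exception):
    """Hypothetical error for documents the extractor cannot handle."""


def read_capped(stream: BinaryIO, max_bytes: int = MAX_BYTES) -> bytes:
    """Read the stream in chunks, rejecting input larger than max_bytes."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ExtractionError(f"file exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def process_blob(name: str, stream: BinaryIO,
                 extract_text: Callable[[bytes], str]) -> str:
    """Return 'processed' or 'quarantined'; unexpected errors still propagate."""
    try:
        data = read_capped(stream)
        # The extractor is expected to pass its own timeout to HTTP/SDK calls.
        text = extract_text(data)
    except (ExtractionError, TimeoutError, ValueError) as exc:
        # Expected, data-specific failure (ValueError also covers
        # json.JSONDecodeError): log it and finish normally so the runtime
        # does not retry the same blob until it lands in the poison queue.
        logging.warning("Quarantining %s: %s", name, exc)
        return "quarantined"
    logging.info("Extracted %d characters from %s", len(text), name)
    return "processed"


def fake_extractor(data: bytes) -> str:
    """Stand-in for the real OCR/PDF/OpenAI step."""
    if not data.startswith(b"%PDF"):
        raise ExtractionError("not a PDF")
    return data.decode("latin-1")


print(process_blob("ok.pdf", io.BytesIO(b"%PDF-1.7 demo"), fake_extractor))  # processed
print(process_blob("bad.bin", io.BytesIO(b"garbage"), fake_extractor))       # quarantined
```

Only the listed exception types are swallowed. Anything else, such as a missing Cosmos DB container, still fails the invocation, so systemic problems remain visible in Application Insights instead of being silently quarantined.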

Please check the above and share your findings.

Answer 2

AI answer

Use the failures and poison queues as evidence that the problem is inside the Function App runtime or code, not the front end or blob upload. Troubleshooting should focus on the function host, configuration, and Python worker.

Follow these steps end-to-end:

Use the built-in diagnostics for Function Apps
1. In the Azure portal, open the Function App.
2. Go to Diagnose and solve problems.
3. Run Function App Down or Reporting Errors (Preview).
4. Review the detectors for:
  - Host startup failures
  - HTTP 5xx errors
  - High CPU/memory or other runtime resource issues
This tool aggregates common checks and can highlight configuration, runtime, or resource problems that cause 500 errors and failed executions.
Check for common configuration/runtime issues
From the diagnostics and Function App settings, verify the typical causes listed under Issue categories:
- Application settings: confirm all required settings (connection strings, keys for OCR/OpenAI/PDF services, etc.) are present and correct.
- Storage account: ensure the Function App storage account is reachable and correctly configured; trigger and poison queues depend on it.
- Key Vault / managed identity: if secrets are in Key Vault, verify access policies/permissions for the Function App identity.
- Network: confirm the Function App can reach external services used for OCR/OpenAI/PDF extraction.
- Binding/trigger configuration: validate blob_trigger, dequeue, and purge_blob_data bindings (queue names, connection names, etc.). Misconfigured bindings can cause runtime failures and poison messages.
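
The application-settings check in the list above can be partly automated. The following is a minimal sketch: apart from AzureWebJobsStorage, the setting names are hypothetical placeholders for whatever the application actually reads, and an app that uses identity-based connections may legitimately not define AzureWebJobsStorage at all.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical setting names; replace with the ones the app actually reads.
REQUIRED_SETTINGS = ["AzureWebJobsStorage", "COSMOS_ENDPOINT", "OPENAI_API_KEY"]


def missing_settings(required: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required settings that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    absent = missing_settings(REQUIRED_SETTINGS)
    print("Missing settings:", absent or "none")
```

Calling a check like this at the start of each function and logging the result makes a missing setting show up as a clear message rather than as an obscure failure deeper in the processing code.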
Inspect host and function logs for Python RpcException details
1. Open Monitor for each function (blob_trigger, dequeue, purge_blob_data) and review failed invocations.
2. Drill into a failed execution and inspect the exception details.
3. Look specifically for:
  - ModuleNotFoundError / ImportError indicating missing or incompatible Python packages.
  - Stack traces pointing to OCR/OpenAI/PDF libraries.
  - Errors during deserialization or binding (e.g., blob/queue payload issues).
RpcException from the Python worker usually wraps a Python-side exception; the inner message identifies the root cause (dependency, code bug, or runtime mismatch).
Validate Python runtime and dependencies
- Confirm the Function App runtime version and supported languages using the Supported languages in Azure Function App documentation to ensure the selected Python version is supported by the Functions runtime.
- Ensure all required Python packages are deployed with the function (for consumption/elastic plans, typically via requirements.txt and proper build/deploy process).
- If recent changes were made (new OCR/OpenAI/PDF libraries), roll back or test a minimal version of the function that only logs the event without processing to confirm that the new dependency is the failure point.
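
One way to build the "minimal version" mentioned above is to check, inside the deployed environment, whether the expected modules can be located at all. The sketch below uses only the standard library; the module names in the usage line are illustrative, and note that the import name can differ from the package name in requirements.txt (for example, pdfminer.six is imported as pdfminer).

```python
import importlib.util
from typing import Iterable, List


def unimportable(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located.

    find_spec only locates a module without executing it, so this detects
    missing packages but not packages that are present yet fail on import.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately nonexistent.
print(unimportable(["json", "surely_not_installed_pkg_xyz"]))
# ['surely_not_installed_pkg_xyz']
```

Logging the result from a stripped-down function separates "the package never made it into the deployment" from "the package is present but the processing code fails".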
Check for resource and runtime issues
- In Diagnose and solve problems, look for detectors indicating:
  - High CPU or memory consumption
  - Long-running or stuck functions
  - SNAT/TCP connection exhaustion
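- For long-running OCR/OpenAI/PDF operations, ensure execution time stays within plan limits (for example, on the Consumption plan the function timeout defaults to 5 minutes and can be raised to at most 10 minutes through functionTimeout in host.json; the 230-second limit applies only to responses from HTTP-triggered functions). Timeouts can surface as failed executions and retries that eventually push messages to poison queues.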
Use poison queues to isolate failing payloads
- Inspect messages in the poison queues corresponding to blob_trigger and dequeue.
- Manually reprocess a single poison message in a controlled environment (local Functions host or a test Function App) with detailed logging to see exactly where the code fails.
- This helps distinguish between:
  - Data-specific issues (corrupt PDF, unsupported format)
  - Systemic issues (missing library, misconfiguration, runtime incompatibility)
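
To reprocess a single poison message, the blob it refers to has to be identified first. The sketch below decodes a message body into the properties listed in the documentation quoted in Answer 1 (FunctionId, BlobType, ContainerName, BlobName, ETag). It accepts both plain and Base64-encoded JSON, on the assumption that the encoding depends on how the queue client is configured; the function name and the sample values are hypothetical.

```python
import base64
import json
from typing import Any, Dict


def parse_poison_message(body: str) -> Dict[str, Any]:
    """Decode a poison-queue message body that is JSON or Base64-encoded JSON."""
    candidates = [body]
    try:
        candidates.append(base64.b64decode(body, validate=True).decode("utf-8"))
    except ValueError:
        # Not valid Base64 (or not UTF-8 once decoded); try the raw body only.
        pass
    for text in candidates:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    raise ValueError("body is neither JSON nor Base64-encoded JSON")


# Hypothetical sample shaped like the documented poison-blob message.
sample = json.dumps({
    "FunctionId": "demo.Functions.blob_trigger",
    "BlobType": "BlockBlob",
    "ContainerName": "uploads",
    "BlobName": "receipt.pdf",
    "ETag": "0x0",
})
encoded = base64.b64encode(sample.encode("utf-8")).decode("ascii")

message = parse_poison_message(encoded)
print(message["ContainerName"], message["BlobName"])  # uploads receipt.pdf
```

With ContainerName and BlobName in hand, the same blob can be fed to a local Functions host or a test Function App with detailed logging enabled.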
If host startup or runtime is unreachable
- If diagnostics or portal show errors like Function host not running or Azure Function App runtime is unreachable, follow the guidance in the runtime troubleshooting article referenced by the diagnostics:
  - Verify storage account configuration and access.
  - Check for host startup failures or container allocation issues.
  - Use the Function App Down or Reporting Errors workflow to narrow down host-level problems.
When to escalate
- If diagnostics show host startup failures or internal runtime errors that are not explained by configuration or code, collect:
  - Browser trace while reproducing the issue.
  - Function logs and failed invocation IDs.
  - Any correlation IDs from deployment or runtime errors.
- Then open a support ticket with this data, as recommended for Function App issues.

This process will identify whether the 500 Internal Server Error is caused by:

Misconfiguration (storage, Key Vault, network, bindings)
Unsupported or misaligned Python/runtime versions
Missing or incompatible Python libraries (OCR/OpenAI/PDF)
Resource/time-limit issues causing retries and poison messages
A specific code path or data pattern that throws exceptions during processing.

AI-generated content may be incorrect. Read our transparency notes for more information.
