Hello Nicolas!
When using Azure AI Document Intelligence (formerly Azure Form Recognizer) on text-heavy PDFs, getting an InternalServerError is not uncommon, especially when both "High Resolution + Style" and Markdown output are enabled. Here’s what you need to know:
What’s Happening:
-These enhanced modes (High Resolution, Style extraction, Markdown output) significantly increase the processing demand, especially on documents with substantial text or complex layouts.
-The InternalServerError usually means the request hit a backend resource or size limit, or the selected combination of features and output format needs more computation than the service allows for a single request.
Is it expected?
-Yes, to some extent. Microsoft’s own documentation notes that using several advanced extraction flags on large or complicated documents can exceed internal limits, causing 5xx server errors.
How to work around this:
-Reduce Document Size: If possible, break your large PDF into smaller documents or process only a few pages at a time.
-Disable One Feature: As you discovered, toggling off either "High Resolution" or "Style" (or switching from Markdown to text) allows the run to succeed because you’re lowering processing complexity.
-Retry with Lighter Settings: For large-scale automation, add retry logic that reprocesses failed files with fewer features enabled or in smaller page batches.
-Check Service Limits: Review the official Azure AI Document Intelligence quotas and limits for your resource tier.
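The retry idea above can be sketched roughly as follows. This is only an illustration, not the official SDK pattern: `analyze_with_fallback`, `FALLBACK_SETTINGS`, and the setting keys (`high_resolution`, `styles`, `output_format`) are hypothetical names, and `analyze` stands for whatever function in your solution actually calls the service. It also assumes the raised exception exposes a `status_code` attribute, as HTTP errors from the Azure SDK for Python generally do.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

# Hypothetical settings ladder, ordered from richest to lightest.
# The keys are illustrative names, not confirmed SDK parameters.
FALLBACK_SETTINGS = [
    {"high_resolution": True, "styles": True, "output_format": "markdown"},
    {"high_resolution": False, "styles": True, "output_format": "markdown"},
    {"high_resolution": False, "styles": False, "output_format": "markdown"},
    {"high_resolution": False, "styles": False, "output_format": "text"},
]


def is_server_error(exc: Exception) -> bool:
    """Treat any exception carrying a 5xx status_code as a server error."""
    status = getattr(exc, "status_code", None)
    return isinstance(status, int) and 500 <= status <= 599


def analyze_with_fallback(
    analyze: Callable[..., Any],
    document: Any,
    settings_ladder: Sequence[Dict[str, Any]] = FALLBACK_SETTINGS,
) -> Tuple[Dict[str, Any], Any]:
    """Try each settings combination in order until one succeeds.

    Returns the settings that worked together with the result.
    Errors that are not 5xx are re-raised immediately, since lighter
    settings are unlikely to fix them.
    """
    if not settings_ladder:
        raise ValueError("settings_ladder must not be empty")
    last_error: Optional[Exception] = None
    for settings in settings_ladder:
        try:
            return settings, analyze(document, **settings)
        except Exception as exc:
            if not is_server_error(exc):
                raise
            last_error = exc  # remember it and try the next, lighter step
    assert last_error is not None
    raise last_error
```

Returning the settings that succeeded makes it easy to log which files had to be downgraded, so you can re-run them at full fidelity later if the service limits change.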
Recommendation: For now, when dealing with large PDFs, enable only the features your scenario actually needs, or process the document in chunks. Keep an eye on Microsoft’s release notes and documentation, as support for large, high-fidelity documents continues to improve.
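If you go the chunking route, a small helper like the one below can generate the page ranges. As far as I know, the analyze request accepts a `pages` value written as 1-based ranges such as "1-5", so you may not need to physically split the PDF, but please verify that against the API version you use. The function name `page_ranges` and the chunk size are my own choices for illustration.

```python
from typing import List


def page_ranges(total_pages: int, chunk_size: int) -> List[str]:
    """Split pages 1..total_pages into consecutive range strings.

    Each entry looks like "1-5"; a final single page is written as "11".
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ranges: List[str] = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


# Example: a 12-page PDF processed 5 pages at a time.
print(page_ranges(12, 5))  # ['1-5', '6-10', '11-12']
```

Each string can then be sent as a separate request and the results stitched back together in page order.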
Best regards,
Jerald Felix