Hi,
Thanks for reaching out to Microsoft Q&A.
You are right in your observation: when using Azure Document Intelligence (formerly Form Recognizer), especially the prebuilt models like prebuilt-receipt, the raw OCR output is not returned as-is along with the structured data by default.
However, here is the practical breakdown of your options:
- Behavior of Prebuilt Models (prebuilt-receipt)
- The prebuilt models do run OCR under the hood.
- They extract structured fields (merchant name, total, tax, items).
- Additional text present on the receipt but not part of the structured schema may not be exposed in the response (this is the text you are seeing highlighted in yellow in the Studio).
- Issue: this "extra" text is only partially exposed, without bounding boxes or full OCR data, and it is not included in the API response unless you request it explicitly.
- Why using both the Read and Receipt models is wasteful
You are absolutely right. Using both:
- Read model: gives you raw OCR text and layout (bounding boxes, lines, words).
- prebuilt-receipt model: gives structured receipt fields but does not expose the full raw OCR text by default.
Running both results in duplicate OCR processing and higher cost.
- Best Practice: Use prebuilt-receipt with includeTextDetails=true
When you call the prebuilt-receipt API, set the query parameter includeTextDetails=true (this parameter belongs to the v2.x REST API).
This gives you:
- The structured fields (MerchantName, Items, Total, etc.).
- All raw OCR text, including bounding boxes, line text, words, and positions.
This is what you need to extract that "yellow highlighted" text using your own regex.
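As a rough illustration of the call described above, here is a minimal sketch that only builds the request and does not send it. It assumes the v2.1 REST path for the receipt model; the endpoint, key, and image URL are placeholders, and build_receipt_request is a hypothetical helper name, not part of any SDK.

```python
import json
import urllib.parse
import urllib.request


def build_receipt_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a receipt analyze request with includeTextDetails=true."""
    query = urllib.parse.urlencode({"includeTextDetails": "true"})
    # Assumption: v2.1 REST path for the prebuilt receipt model.
    url = f"{endpoint.rstrip('/')}/formrecognizer/v2.1/prebuilt/receipt/analyze?{query}"
    body = json.dumps({"source": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
req = build_receipt_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "https://example.com/receipt.jpg",
)
print(req.full_url)
```

Sending this request (for example with urllib.request.urlopen) should return a 202 response whose Operation-Location header points at the URL to poll for the finished result.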
- Output Structure with includeTextDetails
You can expect in the response:
- analyzeResult.readResults --> Full raw OCR text, by page, with bounding boxes and lines.
- analyzeResult.documentResults --> Structured receipt data.
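To make the two parts of the response concrete, here is a small sketch that reads both from one result and applies a regex to the raw lines. The sample_response dictionary is hand-written to resemble the v2.x shape and is not real service output; the helper names and the "Order #" pattern are hypothetical, so swap in whatever pattern matches your "yellow highlighted" text.

```python
import re

# Hand-written sample shaped like a v2.x result; not real service output.
sample_response = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {"text": "Contoso Cafe", "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40]},
                    {"text": "Order #A-1042", "boundingBox": [10, 50, 180, 50, 180, 80, 10, 80]},
                    {"text": "Total 12.50", "boundingBox": [10, 90, 160, 90, 160, 120, 10, 120]},
                ],
            }
        ],
        "documentResults": [
            {
                "docType": "prebuilt:receipt",
                "fields": {
                    "MerchantName": {"type": "string", "text": "Contoso Cafe"},
                    "Total": {"type": "number", "text": "12.50"},
                },
            }
        ],
    },
}


def raw_lines(response):
    """Yield (page, text, boundingBox) for every OCR line in readResults."""
    for page in response["analyzeResult"].get("readResults", []):
        for line in page.get("lines", []):
            yield page.get("page"), line["text"], line.get("boundingBox")


def structured_fields(response):
    """Return {field name: recognized text} for the first document result."""
    docs = response["analyzeResult"].get("documentResults", [])
    if not docs:
        return {}
    return {name: f.get("text") for name, f in docs[0]["fields"].items() if f}


# Hypothetical pattern for text that is outside the receipt schema.
ORDER_RE = re.compile(r"Order\s*#\s*([A-Z0-9-]+)")

matches = [
    (page, m.group(1), box)
    for page, text, box in raw_lines(sample_response)
    if (m := ORDER_RE.search(text))
]
print(structured_fields(sample_response))
print(matches)
```

Keeping the bounding box next to each match lets you locate the extra text on the page later without a second OCR pass.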
So, you do not need to run Read separately. The prebuilt-receipt model with includeTextDetails=true gives you everything in one shot.
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.