Azure-hosted Mistral OCR currently does not support the confidence_scores_granularity parameter.
The HTTP 422 (extra_forbidden) error confirms the parameter is not part of the Azure-exposed API schema.
There is no alternative Azure parameter today to retrieve word/page confidence scores from Mistral OCR.
🔍 What might be happening
1. Native Mistral vs Azure-hosted Mistral (key difference)
The native Mistral OCR API supports:
"confidence_scores_granularity": "word" | "page"
→ returns confidence scores per word/page [docs.mistral.ai]
However, in Azure AI Foundry (Azure-hosted Mistral):
The request schema is restricted
Unsupported fields are rejected with:
extra_forbidden → Extra inputs are not permitted
Azure exposes a subset of the upstream Mistral API, and confidence_scores_granularity is currently not included.
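If you want your client to cope with this automatically, a minimal sketch is below. It assumes the 422 body follows the common validation-error layout (a "detail" list whose items carry "type" and "loc"); that layout, the sample body, and the deployment name are assumptions for illustration, so check them against the response you actually receive.

```python
from typing import Any


def forbidden_fields(error_body: dict[str, Any]) -> list[str]:
    """Return the names of request fields rejected as extra_forbidden."""
    rejected = []
    for item in error_body.get("detail", []):
        if isinstance(item, dict) and item.get("type") == "extra_forbidden":
            loc = item.get("loc") or []
            if loc:
                # The last element of "loc" is assumed to be the field name.
                rejected.append(str(loc[-1]))
    return rejected


def strip_fields(payload: dict[str, Any], names: list[str]) -> dict[str, Any]:
    """Return a copy of the payload without the rejected top-level fields."""
    return {key: value for key, value in payload.items() if key not in names}


# Hypothetical 422 body, modelled on the error described above.
sample_error = {
    "detail": [
        {
            "type": "extra_forbidden",
            "loc": ["body", "confidence_scores_granularity"],
            "msg": "Extra inputs are not permitted",
        }
    ]
}
request_payload = {
    "model": "mistral-ocr-deployment",  # hypothetical deployment name
    "confidence_scores_granularity": "word",
}

rejected = forbidden_fields(sample_error)
retry_payload = strip_fields(request_payload, rejected)
```

Retrying with retry_payload only avoids the error; the response will still not contain confidence scores.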
Does Azure Mistral OCR return confidence scores?
Based on available documentation and behavior:
Azure Mistral OCR:
✅ Returns extracted content (markdown/text/structured output)
❌ Does NOT return confidence scores (word/page)
There is no documented field in Azure Mistral responses for confidence values.
⚠️ Important implication for your use case
“validate scanned PDF quality and detect low-confidence regions”
This cannot currently be implemented using Azure Mistral OCR alone.
✅ Recommended alternatives (Azure-supported)
Since you’re doing quality validation / low-confidence detection, here are practical workarounds used in real customer scenarios:
Option 1 — Use Azure Document Intelligence
Azure Document Intelligence provides:
✅ Word-level confidence scores
✅ Page-level / field-level confidence
✅ Production-grade OCR + structured extraction
Microsoft explicitly documents:
“Document Intelligence returns confidence for predicted words… between 0 and 1” [learn.microsoft.com]
👉 Best fit for:
OCR quality validation
Threshold-based filtering (e.g., reject < 0.8 confidence)
Compliance / human-in-the-loop workflows
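As a rough illustration of threshold-based filtering, here is a small sketch. It works on plain (page, text, confidence) records rather than on the SDK objects, so the Word class, the 0.8 threshold, and the 10% ratio are all assumptions; in practice you would fill the list from the words that the Document Intelligence Read model returns for each page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one OCR word and its confidence (0.0 to 1.0)."""
    page: int
    text: str
    confidence: float


def low_confidence_words(words, threshold=0.8):
    """Return the words whose confidence is strictly below the threshold."""
    return [w for w in words if w.confidence < threshold]


def page_needs_review(words, threshold=0.8, max_low_ratio=0.1):
    """Flag a page when the share of low-confidence words exceeds max_low_ratio.

    A page with no words at all is also flagged, since nothing was recognized.
    """
    if not words:
        return True
    low = low_confidence_words(words, threshold)
    return len(low) / len(words) > max_low_ratio


# Made-up sample values for illustration only.
sample = [Word(1, "Invoice", 0.99), Word(1, "N0.", 0.42), Word(1, "1047", 0.91)]
flagged = low_confidence_words(sample)
```

Pages where page_needs_review returns True can be routed to a human reviewer or rejected for rescanning.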
Option 2 — Dual-pass pipeline (common workaround)
If you must use Mistral OCR for layout/quality:
Pattern
1. Run Mistral OCR to get high-quality markdown + structure
2. Run Document Intelligence (Read OCR) to extract confidence scores
3. Align results: map words/regions between the two outputs and use the Document Intelligence confidence as a proxy
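The alignment step is the fiddly part, so here is one possible sketch using Python's standard difflib. It assumes you have already reduced both outputs to word lists (Mistral markdown split on whitespace, and (text, confidence) pairs from Document Intelligence); the function names and sample values are made up for illustration, and real documents may need region-based matching instead.

```python
from difflib import SequenceMatcher


def normalize(token: str) -> str:
    """Lowercase and drop punctuation/markdown symbols so tokens compare loosely."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def align_confidence(mistral_tokens, di_words):
    """Attach a Document Intelligence confidence to each Mistral token.

    di_words is a list of (text, confidence) pairs. Tokens with no matching
    word keep a confidence of None.
    """
    a = [normalize(tok) for tok in mistral_tokens]
    b = [normalize(text) for text, _ in di_words]
    result = [(tok, None) for tok in mistral_tokens]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            token = mistral_tokens[block.a + offset]
            confidence = di_words[block.b + offset][1]
            result[block.a + offset] = (token, confidence)
    return result


# Made-up sample: the two engines disagree on "No." vs "N0.".
mistral_tokens = "## Invoice No. 1047 Total: 250.00".split()
di_words = [
    ("Invoice", 0.99),
    ("N0.", 0.42),
    ("1047", 0.91),
    ("Total:", 0.97),
    ("250.00", 0.95),
]
aligned = align_confidence(mistral_tokens, di_words)
```

Tokens that come back with None are the places where the two engines read different text, which is itself a useful signal for manual review.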
Option 3 — Native Mistral (non-Azure endpoint)
The native Mistral API supports the parameter. You can call it through custom functions.
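For reference, a request to the native API might be built roughly like this. The endpoint URL, model name, and document fields are assumptions based on my reading of the Mistral docs, and the API key and PDF URL are placeholders, so please verify everything against docs.mistral.ai before relying on it. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed native endpoint; confirm against the Mistral documentation.
MISTRAL_OCR_URL = "https://api.mistral.ai/v1/ocr"


def build_ocr_request(api_key, document_url, granularity="word"):
    """Build (but do not send) an OCR request that asks for confidence scores."""
    if granularity not in ("word", "page"):
        raise ValueError("granularity must be 'word' or 'page'")
    body = {
        "model": "mistral-ocr-latest",  # assumed model name
        "document": {"type": "document_url", "document_url": document_url},
        "confidence_scores_granularity": granularity,
    }
    return urllib.request.Request(
        MISTRAL_OCR_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key and URL; urllib.request.urlopen(ocr_request) would send it.
ocr_request = build_ocr_request("YOUR_API_KEY", "https://example.com/scan.pdf")
```

Keep in mind that calling the native endpoint sends the document outside Azure, so check that this fits your data residency and compliance requirements.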
Hope it helps.
Thank you.