Azure Document Intelligence does not list OneNote files as supported input. The documented supported formats are PDF, images (JPEG/JPG, PNG, BMP, TIFF, HEIF), and for some models Office files (DOCX, XLSX, PPTX) and HTML.
- Plans to support OneNote files
The available documentation does not state any plans or roadmap for native OneNote (.one) support in Document Intelligence.
- Recommended approach to process OneNote content
Because OneNote is not a supported file type, the practical approach is to convert OneNote content into one of the supported formats and then call AnalyzeDocumentAsync on that converted content.
Given the current capabilities:
- HTML is supported by the Read, Layout, and Custom classification models.
- PDF and images are supported broadly across Read, Layout, General document, Prebuilt, and Custom models.
A reasonable pattern is:
- Use the Microsoft Graph OneNote API to export page content.
  - For example, GET ../onenote/pages/{id} retrieves the page, and its content endpoint (GET ../onenote/pages/{id}/content) returns the page body as HTML.
- Decide the target format based on the model being used:
  - If using Read, Layout, or Custom classification, HTML can be sent directly (within the documented limits, such as the maximum string length for Office file types and HTML).
  - If using General document, Prebuilt, or Custom extraction models, which do not list HTML as supported, convert the OneNote page to PDF or an image first.
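The pattern above can be sketched in Python. This is a minimal illustration, not an official sample: the function names, the model-family labels, and the `me` path segment are all assumptions made for the example, and acquiring the access token is left out entirely.

```python
import urllib.request
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

# Model families that list HTML as supported input, per the answer above.
# The label strings themselves are hypothetical names for this sketch.
HTML_CAPABLE = {"read", "layout", "custom-classification"}


def page_content_url(page_id: str, user_path: str = "me") -> str:
    """Build the Graph URL for a OneNote page's HTML content.

    Assumes the signed-in user's notebooks ("me"); other scopes
    (users, groups, sites) would need a different prefix.
    """
    return f"{GRAPH_BASE}/{user_path}/onenote/pages/{quote(page_id, safe='')}/content"


def choose_input_format(model_family: str) -> str:
    """Return "html" when the model accepts HTML, otherwise "pdf"."""
    return "html" if model_family.lower() in HTML_CAPABLE else "pdf"


def fetch_page_html(page_id: str, access_token: str) -> bytes:
    """Download the page HTML. Requires a valid Graph access token."""
    request = urllib.request.Request(
        page_content_url(page_id),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

When `choose_input_format` returns "pdf", the HTML still has to be rendered to PDF or an image by a separate tool before it is submitted; that step is outside this sketch.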
- Preprocessing and best practices for OneNote pages
When preparing OneNote content for Document Intelligence:
- Ensure the converted document respects input requirements:
  - For images/PDFs: stay within the documented page count, file size, and dimension limits, and keep text at or above the documented minimum text height.
  - For HTML/Office formats: keep total text length under the documented maximum string length.
- Prefer a single clear page per analysis request when possible, especially if exporting to an image or PDF, to align with guidance that one clear photo or high-quality scan per document yields best results.
- Remove password protection if exporting to PDF, since password-locked PDFs must be unlocked before submission.
- If using custom models, include representative OneNote-derived samples (after conversion) in the training set so the model learns the typical layout and structure of your exported pages.
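A pre-submission check along these lines can catch most of the issues listed above before a request is sent. The sketch below is only an illustration: `InputLimits`, `preflight`, and the default numbers are placeholders chosen for the example and should be replaced with the limits documented for your service tier. The encryption test is a rough byte-level heuristic, not a real PDF parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputLimits:
    # Placeholder defaults (assumptions); confirm against current documentation.
    max_file_bytes: int = 500 * 1024 * 1024
    max_pages: int = 2000
    max_text_chars: int = 8_000_000


def preflight(kind: str, data: bytes, page_count: int = 1,
              limits: InputLimits = InputLimits()) -> list[str]:
    """Return a list of problems found; an empty list means no issue detected.

    kind is one of "pdf", "image", or "html".
    """
    problems = []
    if len(data) > limits.max_file_bytes:
        problems.append("file exceeds the size limit")
    if kind in ("pdf", "image") and page_count > limits.max_pages:
        problems.append("document exceeds the page limit")
    # Heuristic: encrypted PDFs normally carry an /Encrypt entry in the trailer.
    if kind == "pdf" and b"/Encrypt" in data:
        problems.append("PDF appears to be encrypted; unlock it before submission")
    if kind == "html":
        text_length = len(data.decode("utf-8", errors="replace"))
        if text_length > limits.max_text_chars:
            problems.append("text exceeds the maximum string length")
    return problems
```

For example, `preflight("pdf", pdf_bytes, page_count=3)` returns an empty list when nothing is flagged. Image dimensions and text height are not checked here because doing so requires an imaging library.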
If the goal is Markdown output for RAG or semantic chunking, combine this with the Layout or Read models and the markdown output option, then apply semantic chunking as described in the RAG guidance.
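As a rough illustration of the chunking step, the sketch below splits Markdown output into one chunk per heading. It is a simplified stand-in for the semantic chunking described in the RAG guidance, not that method itself, and the function name `chunk_markdown_by_heading` is hypothetical. It assumes the Markdown has already been returned by the analysis call and does not special-case lines starting with `#` inside fenced code blocks.

```python
import re

# ATX-style Markdown headings: one to six "#" characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at every heading."""
    chunks = []
    heading, lines = "", []

    def flush():
        # Keep a chunk only if it has a heading or some non-blank text.
        if heading or any(line.strip() for line in lines):
            chunks.append({"heading": heading, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each resulting chunk keeps its heading next to its text, so a downstream embedding or retrieval step can store the heading as metadata.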