OneNote support.

Yael Magrafta 0 Reputation points Microsoft Employee
2026-06-11T06:58:09.76+00:00

Hello,

Thank you for Azure Document Intelligence, it has been very useful for our application.

We are currently using the AnalyzeDocumentAsync API to extract Markdown from files, and we’re interested in extending our support to OneNote content.

I understand that OneNote is not currently listed as a supported file type, so I have a couple of questions:

  1. Are there any plans to support OneNote files in the future?
  2. In the meantime, what would you recommend as the best approach to process OneNote content with Document Intelligence?
    • For example, would converting a OneNote page to HTML (since HTML is supported) be a recommended approach?
      • Do you have any best practices or recommendations for preprocessing OneNote content (especially a single page) before sending it for analysis?

Thank you in advance!

Azure Document Intelligence in Foundry Tools

2 answers

  1. Vinodh247-1375 43,101 Reputation points Volunteer Moderator
    2026-06-14T14:38:29.87+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    OneNote is not currently supported by Azure Document Intelligence, and there is no publicly confirmed timeline for native support. In the interim, converting OneNote pages to HTML is a practical and recommended approach, since HTML preserves structure (headings, tables, lists) better than plain text or PDF and aligns well with the service’s capabilities. The typical pattern is to export the OneNote page as HTML via the Microsoft Graph OneNote API, clean up unnecessary styling and embedded artifacts, and then pass the normalized HTML to AnalyzeDocumentAsync. For best results, flatten the content to a single page, remove scripts and unsupported tags, inline essential styles, and validate that the encoding is UTF-8. If the page contains many images or handwritten notes, consider extracting that text separately with OCR and combining it with the HTML for richer context.
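The cleanup step described above could look roughly like the following. This is a minimal sketch using only the Python standard library, not a recommended or official implementation. The function name `clean_onenote_html` and the choice to drop only `script`/`style` elements and `on*` event attributes are assumptions made for illustration; real OneNote exports may need further rules.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these elements carry nothing useful for analysis, so they are
# dropped together with their content.
DROP_WITH_CONTENT = {"script", "style"}
# Void elements have no closing tag, so none is emitted for them.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class _Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        parts = []
        for name, value in attrs:
            if name.startswith("on"):  # drop inline event handlers
                continue
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(parts)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_onenote_html(raw: bytes) -> bytes:
    """Validate UTF-8, strip scripts/styles/event handlers, return UTF-8 bytes."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    cleaner = _Cleaner()
    cleaner.feed(text)
    cleaner.close()
    return "".join(cleaner.out).encode("utf-8")
```

Inline `style` attributes are kept on purpose, since the answer suggests inlining essential styles; HTML comments are dropped because the parser's default comment handler ignores them.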

    Please 'Upvote' (thumbs-up) and 'Accept' this as the answer if the reply was helpful. This will benefit other community members who face the same issue.


  2. AI answer

    2026-06-11T06:58:28.01+00:00

    Azure Document Intelligence does not list OneNote files as supported input. The documented supported formats are PDF, images (JPEG/JPG, PNG, BMP, TIFF, HEIF), and for some models Office files (DOCX, XLSX, PPTX) and HTML.

    1. Plans to support OneNote files

    The available documentation does not state any plans or roadmap for native OneNote (.one) support in Document Intelligence.

    2. Recommended approach to process OneNote content

    Because OneNote is not a supported file type, the practical approach is to convert OneNote content into one of the supported formats and then call AnalyzeDocumentAsync on that converted content.

    Given the current capabilities:

    • HTML is supported by the Read, Layout, and Custom classification models.
    • PDF and images are supported broadly across Read, Layout, General document, Prebuilt, and Custom models.

    A reasonable pattern is:

    1. Use the Microsoft Graph OneNote API to export page content.
      • For example, GET ../onenote/pages/{id}/content returns the page content as HTML (GET ../onenote/pages/{id} without /content returns only the page metadata).
    2. Decide the target format based on the model being used:
      • If using Read, Layout, or Custom classification, HTML can be sent directly (within the documented limits, such as the maximum string length for Office file types and HTML).
      • If using General document, Prebuilt, or Custom extraction models, or any other model that does not list HTML as supported, convert the OneNote page to PDF or an image first.
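The two steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses the `/me/onenote/pages/{id}/content` form of the Microsoft Graph endpoint (the `/me` prefix is an assumption, since the answer elides it), the access token is a placeholder you must obtain yourself, and the model family names in `HTML_CAPABLE_MODELS` are hypothetical labels that mirror the support described in this answer. The request is built but not sent.

```python
import urllib.request
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

# Hypothetical labels mirroring the answer above; confirm the actual
# per-model format support against the current documentation.
HTML_CAPABLE_MODELS = {"read", "layout", "custom-classification"}


def build_page_content_request(page_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a OneNote page's HTML content."""
    url = f"{GRAPH_BASE}/me/onenote/pages/{quote(page_id, safe='')}/content"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def choose_input_format(model_family: str) -> str:
    """Return 'html' when the model accepts HTML, otherwise fall back to 'pdf'."""
    return "html" if model_family.lower() in HTML_CAPABLE_MODELS else "pdf"
```

Sending the request with `urllib.request.urlopen(req)` would return the page HTML as bytes; the PDF conversion for the fallback case is outside this sketch.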

    Preprocessing and best practices for OneNote pages

    When preparing OneNote content for Document Intelligence:

    • Ensure the converted document respects input requirements:
      • For images/PDFs: stay within page count, file size, and dimension limits, and avoid very small text (for example, minimum text height guidance).
      • For HTML/Office formats: keep total text length under the documented maximum string length.
    • Prefer a single clear page per analysis request when possible, especially if exporting to an image or PDF, to align with guidance that one clear photo or high-quality scan per document yields best results.
    • Remove password protection if exporting to PDF, since password-locked PDFs must be unlocked before submission.
    • If using custom models, include representative OneNote-derived samples (after conversion) in the training set so the model learns the typical layout and structure of your exported pages.
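A pre-flight check along these lines can catch some of the issues above before a request is made. This is a rough sketch: `InputLimits`, `preflight`, and the `kind` labels are hypothetical names, the limits are deliberately left for the caller to fill in from the current documentation, and the `/Encrypt` test is only a heuristic for spotting protected PDFs.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class InputLimits:
    # Fill these in from the current service documentation for your tier.
    max_file_bytes: int
    max_text_chars: int


class _TextCounter(HTMLParser):
    """Counts characters of text content, ignoring tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chars = 0

    def handle_data(self, data):
        self.chars += len(data)


def preflight(payload: bytes, kind: str, limits: InputLimits) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(payload) > limits.max_file_bytes:
        problems.append(f"payload is {len(payload)} bytes, over the size limit")
    if kind == "html":
        counter = _TextCounter()
        counter.feed(payload.decode("utf-8", errors="replace"))
        counter.close()
        if counter.chars > limits.max_text_chars:
            problems.append(f"text is {counter.chars} characters, over the length limit")
    elif kind == "pdf":
        # Heuristic only: encrypted PDFs reference an /Encrypt dictionary.
        if b"/Encrypt" in payload:
            problems.append("PDF appears to be encrypted; unlock it before submission")
    return problems
```

For example, `preflight(html_bytes, "html", InputLimits(max_file_bytes=..., max_text_chars=...))` returns the problems to fix before calling the service; page count and image dimension checks would need a PDF or imaging library and are not covered here.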

    If the goal is Markdown output for RAG or semantic chunking, combine this with the Layout or Read models and the markdown output option, then apply semantic chunking as described in the RAG guidance.
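For reference, a request for Markdown output from the Layout model might be built as in the sketch below. It assumes the v4.0 REST surface (`api-version=2024-11-30`, the `outputContentFormat=markdown` query parameter, and a `base64Source` JSON body); verify these against the REST reference for the API version you target. The endpoint and key are placeholders, the function name is hypothetical, and the request is built but not sent.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode


def build_layout_markdown_request(
    endpoint: str,
    api_key: str,
    document: bytes,
    api_version: str = "2024-11-30",  # assumption: adjust to the version you use
) -> urllib.request.Request:
    """Build (but do not send) an analyze request asking Layout for Markdown."""
    query = urlencode({"api-version": api_version, "outputContentFormat": "markdown"})
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-layout:analyze?{query}"
    )
    body = json.dumps(
        {"base64Source": base64.b64encode(document).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )
```

The analyze call is asynchronous: the service is expected to answer with an `Operation-Location` header to poll, and the Markdown should appear in the result's `content` field. In .NET, AnalyzeDocumentAsync wraps this submit-and-poll flow.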


    AI-generated content may be incorrect. Read our transparency notes for more information.

