OneNote support.

Question

Yael Magrafta 0 Microsoft Employee

Hello,

Thank you for Azure Document Intelligence, it has been very useful for our application.

We are currently using the AnalyzeDocumentAsync API to extract Markdown from files, and we’re interested in extending our support to OneNote content.

I understand that OneNote is not currently listed as a supported file type, so I have a couple of questions:

Are there any plans to support OneNote files in the future?
In the meantime, what would you recommend as the best approach to process OneNote content with Document Intelligence?
- For example, would converting a OneNote page to HTML (since HTML is supported) be a recommended approach?
- Do you have any best practices or recommendations for preprocessing OneNote content (especially a single page) before sending it for analysis?

Thank you in advance!

Anshika Varshney 13,305 Reputation points Microsoft External Staff Moderator

2026-06-11T08:22:50.2566667+00:00
Hello @Yael Magrafta

Future OneNote Support Plans

There are no public plans announced yet to add OneNote as a supported file type for AnalyzeDocumentAsync. OneNote is not currently listed in the supported formats, which include: JPEG, PNG, BMP, TIFF, DOCX, XLSX, PPTX, HTML, and PDF. [docs.cloud.deepset]

Recommended Approach: OneNote → HTML Conversion

Yes, converting OneNote to HTML is a recommended approach since HTML is supported by Document Intelligence:

How to Convert OneNote to HTML:

Manual Method (OneNote Desktop):

Open the OneNote page you want to export

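Go to File → Export and choose Page as the item to export

Select the "Single File Web Page" format. Note that OneNote desktop saves this as an MHTML archive (.mht) rather than a plain .htm/.html file, so the HTML part may need to be extracted before submission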

Click Export and save [cubexsoft]

Alternative: Use Microsoft Graph OneNote API

Use the OneNote API in Microsoft Graph to access and export OneNote pages programmatically. [learn.microsoft]

Additional Recommendations:

Test with PDF: If quality is critical, convert OneNote → PDF → Document Intelligence for better page segmentation. [stackoverflow]

Clean HTML: Remove unnecessary CSS and embedded scripts that might interfere with text extraction. [sysinfotools]

Single Page Focus: For single pages, ensure the HTML represents the complete content without page-break artifacts
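
The "Clean HTML" recommendation above can be sketched with Python's standard library alone. This is a minimal illustration, not an official preprocessing step: the class and function names are hypothetical, and it only drops `<script>` and `<style>` elements and HTML comments, leaving everything else as exported.

```python
from html.parser import HTMLParser


class ScriptStyleStripper(HTMLParser):
    """Re-emit HTML while dropping <script>/<style> elements and comments."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; self-closing scripts are dropped
        if self.skip_depth == 0 and tag not in self.SKIPPED:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&#{name};")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_scripts_and_styles(html: str) -> str:
    """Return the HTML with script/style elements and comments removed."""
    parser = ScriptStyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Inline `style` attributes are left in place here; whether to remove them as well depends on how the exported page renders its structure.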

I hope this helps. Let me know if you have any further questions.

Thank you!

Anshika Varshney 13,305 Reputation points Microsoft External Staff Moderator

2026-06-14T12:53:20.67+00:00

Hello @Yael Magrafta

Did you get a chance to review the response?

Thank you!

2 answers

Answer 1

Hi,

Thanks for reaching out to Microsoft Q&A.

OneNote is not currently supported by Azure Document Intelligence, and there is no publicly confirmed timeline for native support. In the interim, converting OneNote pages to HTML is a practical and recommended approach, since HTML preserves structure (headings, tables, lists) better than plain text or PDF and aligns well with the service’s capabilities. The typical pattern is to export the OneNote page as HTML via the Microsoft Graph OneNote API, clean up unnecessary styling or embedded artifacts, and then pass the normalized HTML to AnalyzeDocumentAsync. For best results, ensure the content is flattened to a single page, remove scripts and unsupported tags, inline essential styles, and validate the encoding (UTF-8). If the page contains many images or handwritten notes, consider extracting their text separately with OCR and combining it with the HTML for richer context.
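
As a small illustration of the encoding check mentioned above, the following sketch decodes the exported bytes strictly and re-encodes them as UTF-8. The function name and the `declared_encoding` parameter are hypothetical; how the source encoding is discovered (HTTP header, `<meta charset>`, and so on) is left out.

```python
def to_utf8_bytes(raw: bytes, declared_encoding: str = "utf-8") -> bytes:
    """Decode with the declared encoding and re-encode as UTF-8.

    Raises UnicodeDecodeError when the bytes do not match the declared
    encoding, so a bad export fails early instead of being sent garbled.
    """
    text = raw.decode(declared_encoding)  # strict decoding by default
    # Drop a leading byte order mark if one survived decoding
    return text.lstrip("\ufeff").encode("utf-8")
```

Failing loudly on a mismatch is a deliberate choice in this sketch: silently replacing undecodable bytes would hide the problem until the extracted text looks wrong.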

Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.

Answer 2

Azure Document Intelligence does not list OneNote files as supported input. The documented supported formats are PDF, images (JPEG/JPG, PNG, BMP, TIFF, HEIF), and for some models Office files (DOCX, XLSX, PPTX) and HTML.

Plans to support OneNote files

The available documentation does not state any plans or roadmap for native OneNote (.one) support in Document Intelligence.

Recommended approach to process OneNote content

Because OneNote is not a supported file type, the practical approach is to convert OneNote content into one of the supported formats and then call AnalyzeDocumentAsync on that converted content.

Given the current capabilities:

HTML is supported by the Read, Layout, and Custom classification models.
PDF and images are supported broadly across Read, Layout, General document, Prebuilt, and Custom models.

A reasonable pattern is:

Use the Microsoft Graph OneNote API to export page content.
- For example, GET ../onenote/pages/{id}/content returns the page content as HTML (GET ../onenote/pages/{id} on its own returns only the page metadata).
Decide the target format based on the model being used:
- If using Read, Layout, or Custom classification, HTML can be sent directly (within the documented limits, such as the maximum string length for Office file types and HTML).
- If using General document, Prebuilt, or Custom extraction/classification models that do not list HTML as supported, convert the OneNote page to PDF or an image first.
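
The pattern above can be sketched as two small helpers. This is an illustration under assumptions: the model labels (`read`, `layout`, `custom-classification`) and function names are hypothetical, the HTML-capable set simply mirrors the list given in this answer, and the URL uses the Microsoft Graph v1.0 page `content` endpoint, which returns the page as HTML. Authentication and the actual HTTP call are omitted.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

# Models this answer describes as accepting HTML input (labels are hypothetical)
HTML_CAPABLE = {"read", "layout", "custom-classification"}


def page_content_url(page_id: str, user_path: str = "me") -> str:
    """Build the Graph URL that returns a OneNote page's HTML content."""
    return f"{GRAPH_BASE}/{user_path}/onenote/pages/{quote(page_id, safe='!')}/content"


def target_format(model_kind: str) -> str:
    """Pick HTML when the model accepts it, otherwise fall back to PDF."""
    return "html" if model_kind.lower() in HTML_CAPABLE else "pdf"
```

For example, `target_format("layout")` yields `"html"`, while any model outside the set falls back to `"pdf"`, which matches the guidance to convert to PDF or an image when HTML is not listed as supported.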

Preprocessing and best practices for OneNote pages

When preparing OneNote content for Document Intelligence:

Ensure the converted document respects input requirements:
- For images/PDFs: stay within page count, file size, and dimension limits, and avoid very small text (for example, minimum text height guidance).
- For HTML/Office formats: keep total text length under the documented maximum string length.
Prefer a single clear page per analysis request when possible, especially if exporting to an image or PDF, to align with guidance that one clear photo or high-quality scan per document yields best results.
Remove password protection if exporting to PDF, since password-locked PDFs must be unlocked before submission.
If using custom models, include representative OneNote-derived samples (after conversion) in the training set so the model learns the typical layout and structure of your exported pages.

If the goal is Markdown output for RAG or semantic chunking, combine this with the Layout or Read models and the markdown output option, then apply semantic chunking as described in the RAG guidance.
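
For the Markdown case, a request against the REST analyze operation might be assembled as below. This is a sketch, not the SDK's `AnalyzeDocumentAsync` call: the function name is hypothetical, the endpoint in the usage note is a placeholder, and the `api-version` value and `prebuilt-layout` model ID are assumptions to verify against the version your resource supports. Nothing is sent over the network here.

```python
import base64
import json
from urllib.parse import urlencode


def build_analyze_request(endpoint: str, html_bytes: bytes,
                          model_id: str = "prebuilt-layout",
                          api_version: str = "2024-11-30"):
    """Return (url, json_body) for an analyze call that asks for Markdown."""
    query = urlencode({
        "api-version": api_version,
        "outputContentFormat": "markdown",  # request Markdown instead of plain text
    })
    url = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
           f"{model_id}:analyze?{query}")
    # The document is passed inline as base64 in the JSON body
    body = json.dumps({"base64Source": base64.b64encode(html_bytes).decode("ascii")})
    return url, body
```

The resulting URL and body would be sent as a POST with the resource key in the `Ocp-Apim-Subscription-Key` header. The analyze operation is asynchronous, so the result is then polled from the location returned in the response headers.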
