How to examine a WOrd document for changes and comments?

Rakesh Patel 0 Reputation points
2024-11-26T10:27:34.7366667+00:00

Hi,

I need some way to pull out the tracked changes in a Word document along with the comments made. How can I do this using DI?

The existing models like Layout only return the main text structures.

Would I have to examine the underlying xml structure of a Word document?

Thanks

Azure AI Document Intelligence
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
2,126 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Azar 29,520 Reputation points MVP Volunteer Moderator
    2024-11-26T12:38:06.6733333+00:00

    Hi there Rakesh Patel

    Thanks for using QandA platform

    Guess, you would need to analyze the XML structure of the Word document. Word documents in .docx format are ZIP archives containing XML files. unzipg the .docx file, you can access the document.xml and comments.xml . Tracked changes are represented by <w:ins> (inserted text) and <w:del> (deleted text) tags in document.xml, You can use libraries such as python-docx or the OpenXML SDK to programmatically parse and extract the changes and comments from these XML files. , you can integrate XML parsing with Azure AI pipelines for automated extraction and analysis.

    If this helps kindly accept the answer thanks much.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.