Documentation on formats of stream objects in OLE files.

Parth Gupta
2024-05-31T10:52:43.25+00:00

Hi,

I am trying to parse a .doc file (Microsoft Word 97-2003 Document), which is stored as an OLE compound file, and I am looking for documentation.

I found the following references:

https://learn.microsoft.com/en-us/openspecs/office_file_formats/ms-doc/ccd7b486-7881-484c-a137-51170af7cc22

https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b

Using these documents, I can parse the sectors, the directory structure, the FAT, and the mini FAT. However, I am looking for documentation on the format of individual streams such as 'Data', '\x05DocumentSummaryInformation', '\x05SummaryInformation', and '\x01CompObj', among others.
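For reference, here is a minimal sketch of the directory walk I mean, in Python. It assumes a well-formed version 3 or 4 compound file whose FAT fits in the 109 FAT sector locations stored in the header (no DIFAT sector chain), and the names `list_entries`, `read_chain`, and `DirEntry` are my own, not from any library. It finds entries such as '\x05SummaryInformation' and '\x01CompObj' but does not interpret their contents, which is the part I need documentation for.

```python
import struct
from dataclasses import dataclass

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
ENDOFCHAIN = 0xFFFFFFFE
FREESECT = 0xFFFFFFFF
OBJECT_TYPES = {0: "unused", 1: "storage", 2: "stream", 5: "root"}


@dataclass
class DirEntry:  # hypothetical helper type, not a library class
    name: str
    obj_type: str
    start_sector: int
    size: int


def read_chain(data: bytes, fat: list, start: int, sector_size: int) -> bytes:
    """Concatenate the regular sectors of a FAT chain beginning at `start`."""
    out = bytearray()
    sector, seen = start, set()
    while sector != ENDOFCHAIN:
        # Special values (FREESECT, FATSECT, ...) are >= len(fat), so they fail here.
        if sector in seen or sector >= len(fat):
            raise ValueError(f"corrupt FAT chain at sector {sector}")
        seen.add(sector)
        # Sector 0 starts right after the header, which fills one sector-sized slot.
        offset = (sector + 1) * sector_size
        out += data[offset:offset + sector_size]
        sector = fat[sector]
    return bytes(out)


def list_entries(data: bytes) -> list:
    """Return the in-use directory entries of a compound file held in memory."""
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file")
    (major,) = struct.unpack_from("<H", data, 0x1A)
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    sector_size = 1 << sector_shift  # 512 for version 3, 4096 for version 4
    (first_dir_sector,) = struct.unpack_from("<I", data, 0x30)
    # Assumption: only the 109 DIFAT entries in the header are used; larger
    # files (about 7 MB and up for version 3) also need the DIFAT sector chain.
    difat = struct.unpack_from("<109I", data, 0x4C)
    fat = []
    for fat_sector in difat:
        if fat_sector == FREESECT:
            continue
        offset = (fat_sector + 1) * sector_size
        fat.extend(struct.unpack_from(f"<{sector_size // 4}I", data, offset))
    directory = read_chain(data, fat, first_dir_sector, sector_size)
    entries = []
    for pos in range(0, len(directory), 128):  # each directory entry is 128 bytes
        raw = directory[pos:pos + 128]
        name_len, obj_type = struct.unpack_from("<HB", raw, 0x40)
        if obj_type == 0:
            continue  # unused slot
        # The name is UTF-16LE; name_len counts bytes including the 2-byte terminator.
        name = raw[:max(0, min(name_len, 64) - 2)].decode("utf-16-le")
        start, size = struct.unpack_from("<IQ", raw, 0x74)
        if major == 3:
            size &= 0xFFFFFFFF  # high 32 bits are not reliable in version 3 files
        entries.append(DirEntry(name, OBJECT_TYPES.get(obj_type, "unknown"), start, size))
    return entries
```

Usage would be something like `list_entries(open("example.doc", "rb").read())`, where "example.doc" is a placeholder path. Note that streams smaller than the mini stream cutoff (normally 4096 bytes) live inside the mini stream, so reading their bytes also requires the mini FAT, which this sketch does not handle.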

Could you point me to technical documentation for these stream formats, if any exists?

Thanks.

@Tom Jebo

Microsoft 365 and Office | Open Specifications

Answer accepted by question author
    Mike Bowen (Microsoft Employee, Moderator)
    2024-05-31T19:51:58.77+00:00

    Hi @Parth Gupta ,

    Several different things are called "Data" in the documentation, but if you mean the Data stream of a Word document, it is defined in [MS-DOC] section 2.1.3, Data Stream.

    The other streams you asked about are defined in:

    I hope that answers your question.

    Best regards,

    Michael Bowen

    Sr. Escalation Engineer Microsoft Open Specifications
