Documentation on formats of stream objects in OLE files.

Parth Gupta
2024-05-31T10:52:43.25+00:00

Hi,

I am trying to parse a ".doc" file (a Microsoft Word 97-2003 document stored as an OLE compound file), and I am looking for some documentation.

I found the following references:

https://learn.microsoft.com/en-us/openspecs/office_file_formats/ms-doc/ccd7b486-7881-484c-a137-51170af7cc22

https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b

Using these documents, I can parse the sectors, the directory structure, the FAT and mini FAT, etc. However, I am looking for documentation on the format of streams such as 'Data', '\x05DocumentSummaryInformation', '\x05SummaryInformation', '\x01CompObj', and others.
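The two '\x05' streams mentioned above are OLE property set streams, whose layout is specified in [MS-OLEPS]. A minimal sketch of reading the first property set in such a stream follows. Assumptions: only the VT_I2, VT_I4 and VT_LPSTR value types are decoded, string values are decoded as cp1252 instead of honoring the CodePage property, and the function name `parse_property_set_stream` is hypothetical.

```python
import struct
import uuid

# Well-known FMTIDs of the two standard property sets.
FMTID_SUMMARY_INFORMATION = uuid.UUID("f29f85e0-4ff9-1068-ab91-08002b27b3d9")
FMTID_DOC_SUMMARY_INFORMATION = uuid.UUID("d5cdd502-2e9c-101b-9397-08002b2cf9ae")

# Property value types handled by this sketch.
VT_I2 = 0x0002
VT_I4 = 0x0003
VT_LPSTR = 0x001E


def parse_property_set_stream(data: bytes) -> dict:
    """Parse the first property set of a property set stream (hypothetical helper).

    Returns {"fmtid": UUID, "properties": {property_id: value}}.
    Types other than VT_I2, VT_I4 and VT_LPSTR come back as (type, None).
    """
    # Stream header: ByteOrder (2), Version (2), SystemIdentifier (4), CLSID (16).
    (byte_order,) = struct.unpack_from("<H", data, 0)
    if byte_order != 0xFFFE:
        raise ValueError("ByteOrder field is not 0xFFFE")
    # NumPropertySets follows the CLSID, at offset 24.
    (num_sets,) = struct.unpack_from("<I", data, 24)
    if num_sets < 1:
        raise ValueError("stream contains no property sets")
    # First (FMTID, Offset) pair; a second pair may follow and is ignored here.
    fmtid = uuid.UUID(bytes_le=data[28:44])
    (set_offset,) = struct.unpack_from("<I", data, 44)

    # Property set header: Size, NumProperties, then (PropertyId, Offset) pairs.
    _size, num_props = struct.unpack_from("<II", data, set_offset)
    properties = {}
    for i in range(num_props):
        prop_id, prop_offset = struct.unpack_from("<II", data, set_offset + 8 + 8 * i)
        pos = set_offset + prop_offset  # offsets are relative to the property set
        # Typed value: Type (2 bytes), Padding (2 bytes), then the value itself.
        (vt,) = struct.unpack_from("<H", data, pos)
        value_pos = pos + 4
        if vt == VT_I2:
            (value,) = struct.unpack_from("<h", data, value_pos)
        elif vt == VT_I4:
            (value,) = struct.unpack_from("<i", data, value_pos)
        elif vt == VT_LPSTR:
            # Length-prefixed string: 4-byte size (including terminator), then bytes.
            (length,) = struct.unpack_from("<I", data, value_pos)
            raw = data[value_pos + 4 : value_pos + 4 + length]
            # Assumption: cp1252; a full parser would honor the CodePage property (id 1).
            value = raw.split(b"\x00", 1)[0].decode("cp1252", errors="replace")
        else:
            value = (vt, None)
        properties[prop_id] = value
    return {"fmtid": fmtid, "properties": properties}
```

In practice, `data` would be the raw bytes of the stream reassembled from its FAT or mini FAT sector chain. Comparing the returned FMTID against the two constants tells SummaryInformation apart from DocumentSummaryInformation; the latter may also carry a second property set for user-defined properties, which this sketch skips.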

Could you point me to technical documentation for these formats, if any exists?

Thanks.

@Tom Jebo

Microsoft 365 and Office Open Specifications

Accepted answer
  1. Mike Bowen (Microsoft Employee, Moderator)
    2024-05-31T19:51:58.77+00:00

    Hi @Parth Gupta ,

    There are multiple things referred to as "Data" in the documentation, but if you're referring to the Data stream of a Word document, it is defined in MS-DOC section 2.1.3, Data Stream.
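The Data stream has no header of its own; other structures in the document point into it by offset. As one illustration, a picture's PICFAndOfficeArtData is located through a sprmCPicLocation operand, and its PICF begins with a 4-byte signed lcb (the total size) followed by a 2-byte cbHeader that must equal 0x44. The sketch below assumes you have already extracted that offset from the character properties; the function name `read_picf_and_officeart` is hypothetical, and only the two size fields are validated.

```python
import struct

PICF_HEADER_SIZE = 0x44  # cbHeader is expected to equal this value


def read_picf_and_officeart(data_stream: bytes, offset: int) -> bytes:
    """Return the PICFAndOfficeArtData bytes starting at `offset` (hypothetical helper).

    `offset` is assumed to come from a sprmCPicLocation operand.
    """
    if offset < 0 or offset + 6 > len(data_stream):
        raise ValueError("offset outside Data stream")
    # PICF starts with lcb (signed 32-bit total size) and cbHeader (unsigned 16-bit).
    lcb, cb_header = struct.unpack_from("<iH", data_stream, offset)
    if cb_header != PICF_HEADER_SIZE:
        raise ValueError(f"unexpected cbHeader 0x{cb_header:X}")
    if lcb < PICF_HEADER_SIZE or offset + lcb > len(data_stream):
        raise ValueError("lcb does not fit inside the Data stream")
    # The slice covers the PICF header plus the picture data that follows it.
    return data_stream[offset : offset + lcb]
```

Decoding the picture data that follows the 0x44-byte PICF header is a separate step that this sketch leaves out.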

    The other streams you asked about are defined in:

    I hope that answers your question.

    Best regards,

    Michael Bowen

    Sr. Escalation Engineer Microsoft Open Specifications
