Hi @Parth Gupta,
The Office binary file formats use what is known as the Compound File Binary format, which is described in a specification ([MS-CFB]) referenced by the section just before the one you're citing:
2.1 File Structure
A Word Binary File is an OLE compound file as specified by [MS-CFB]. The file consists of the following storages and streams.
This is also known as "structured storage" and is effectively a FAT file system within a file. The data lives in fixed-size sectors, organized by directories called storages (i.e. folders) that contain streams (i.e. files). The data you're after is in the WordDocument stream, which contains the FIB (File Information Block) at offset 0.
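To make the "file system within a file" idea a bit more concrete, here is a minimal Python sketch that decodes a few fields of the 512-byte compound file header. The field offsets follow the header layout in [MS-CFB]; the class and function names are hypothetical, and this is an illustration rather than a complete or validated parser.

```python
import struct
from typing import NamedTuple

# 8-byte signature found at offset 0 of every compound file ([MS-CFB] header).
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


class CfbHeader(NamedTuple):
    major_version: int       # 3 (512-byte sectors) or 4 (4096-byte sectors)
    sector_size: int         # 1 << sector shift
    mini_sector_size: int    # 1 << mini sector shift
    num_fat_sectors: int     # how many sectors hold the FAT itself
    first_dir_sector: int    # first sector of the directory chain ("folders")
    mini_stream_cutoff: int  # streams smaller than this live in the mini stream


def parse_cfb_header(header: bytes) -> CfbHeader:
    """Decode a handful of fields from the compound file header."""
    if len(header) < 512 or header[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file (bad signature)")
    # Offsets 24..33: minor version, major version, byte order mark,
    # sector shift, mini sector shift (little-endian uint16 each).
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", header, 24
    )
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")
    # Offsets 44..59: number of FAT sectors, first directory sector,
    # transaction signature, mini stream cutoff (little-endian uint32 each).
    num_fat, first_dir, _txn, cutoff = struct.unpack_from("<4I", header, 44)
    return CfbHeader(major, 1 << sector_shift, 1 << mini_shift, num_fat, first_dir, cutoff)
```

For example, `with open("example.doc", "rb") as f: print(parse_cfb_header(f.read(512)))` on a hypothetical example.doc shows the sector size and where the directory chain starts, which is the starting point for locating the WordDocument stream.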
All this to say that parsing this without an API library to help navigate the internal FAT organization and reach this stream is extremely tedious. We typically don't parse it by hand; instead, we use such an API. On Windows, we use the Windows SDK's structured storage API, which is documented in our developer docs here: https://learn.microsoft.com/en-us/windows/win32/stg/structured-storage-start-page.
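As an illustration of why this gets tedious, here is a rough Python sketch of just one piece of that work: following a stream's chain of sectors through the FAT. It assumes the FAT has already been assembled into a list of 32-bit next-sector entries (which itself requires walking the DIFAT), and it ignores the mini stream that [MS-CFB] uses for streams smaller than the mini stream cutoff. The function name is hypothetical.

```python
import io
from typing import BinaryIO, List

ENDOFCHAIN = 0xFFFFFFFE  # FAT marker for the last sector in a chain ([MS-CFB])


def read_sector_chain(f: BinaryIO, fat: List[int], start: int, sector_size: int) -> bytes:
    """Concatenate the sectors of one chain, beginning at sector `start`.

    Assumes `fat` was already loaded from the file as a list of
    next-sector numbers; building that list is a separate step.
    """
    data = bytearray()
    sector = start
    seen = set()
    while sector != ENDOFCHAIN:
        # Guard against loops and out-of-range entries in a damaged file.
        if sector in seen or sector >= len(fat):
            raise ValueError(f"corrupt sector chain at sector {sector}")
        seen.add(sector)
        # The header occupies the first sector-sized slot, so sector N
        # starts at byte offset (N + 1) * sector_size.
        f.seek((sector + 1) * sector_size)
        data += f.read(sector_size)
        sector = fat[sector]
    return bytes(data)


if __name__ == "__main__":
    # Hypothetical in-memory file: a 512-byte header slot, then sectors 0..2.
    size = 512
    fake = io.BytesIO(bytes(size) + b"A" * size + b"B" * size + b"C" * size)
    fat = [2, ENDOFCHAIN, 1]  # chain: 0 -> 2 -> 1 -> end
    print(read_sector_chain(fake, fat, 0, size)[::size])  # b'ACB'
```

And that is before locating the stream's directory entry, trimming the result to the stream's real size, or handling the mini stream, which is why an API is the usual route.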
Effectively, you start by calling StgOpenStorageEx to obtain the root storage for the .doc file you're trying to parse. This gives you an IStorage pointer (in the ppObjectOpen out parameter). Then you call OpenStream on that IStorage, passing "WordDocument" as the name of the stream you want, and receive the IStream pointer in the ppstm out parameter. With that, you can call Seek, Read and Write as you would on typical byte streams in other architectures.
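Once you have the WordDocument stream, whether as an IStream from the structured storage API or as a file-like object from a library, reading the start of the FIB is ordinary seek/read work. A minimal Python sketch, assuming the stream behaves like a seekable binary file object (io.BytesIO stands in for it in the demo); the function name is hypothetical, and only the first two FibBase fields from [MS-DOC], wIdent and nFib, are decoded:

```python
import io
import struct
from typing import BinaryIO, Tuple

FIB_WIDENT = 0xA5EC  # FibBase.wIdent value that identifies a Word Binary File


def read_fib_ident(stream: BinaryIO) -> Tuple[int, int]:
    """Return (wIdent, nFib) from the start of a WordDocument stream."""
    stream.seek(0)  # the FIB begins at offset 0 of the WordDocument stream
    raw = stream.read(4)
    if len(raw) < 4:
        raise ValueError("stream too short to contain a FIB")
    # Both fields are little-endian unsigned 16-bit integers.
    w_ident, n_fib = struct.unpack("<HH", raw)
    if w_ident != FIB_WIDENT:
        raise ValueError(f"unexpected wIdent 0x{w_ident:04X}")
    return w_ident, n_fib


if __name__ == "__main__":
    # Hypothetical stream contents: a fake FibBase prefix padded with zeros.
    fake = io.BytesIO(struct.pack("<HH", FIB_WIDENT, 0x00C1) + bytes(28))
    w_ident, n_fib = read_fib_ident(fake)
    print(hex(w_ident), hex(n_fib))  # 0xa5ec 0xc1
```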
Having said all that, I assume your use of Python could mean you don't have access to the Windows Structured Storage API set (i.e. you're not on Windows or not in a context that can call these APIs). If that's the case, you would need to find a library to help parse the compound file binary format. I see some hits when I search for libraries like that, but I can't recommend any of them since I haven't tried them. That said, the first hits I see are olefile and oletools, and these look promising.
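If you try olefile, a minimal sketch of opening the WordDocument stream might look like the following, assuming a reasonably recent olefile release. It uses olefile's isOleFile, OleFileIO, exists, openstream and close; the function name is hypothetical, and the import sits inside the function only so the snippet loads even where olefile isn't installed.

```python
import struct
from typing import Tuple


def read_word_fib_with_olefile(path: str) -> Tuple[int, int]:
    """Open a .doc with olefile and return (wIdent, nFib) from its FIB."""
    import olefile  # third-party: pip install olefile

    if not olefile.isOleFile(path):
        raise ValueError(f"{path} is not an OLE compound file")
    ole = olefile.OleFileIO(path)
    try:
        if not ole.exists("WordDocument"):
            raise ValueError("no WordDocument stream in this file")
        # openstream returns a file-like object positioned at offset 0,
        # so the FIB can be read directly.
        stream = ole.openstream("WordDocument")
        raw = stream.read(4)
        if len(raw) < 4:
            raise ValueError("WordDocument stream too short to contain a FIB")
        w_ident, n_fib = struct.unpack("<HH", raw)
        return w_ident, n_fib
    finally:
        ole.close()
```

olefile takes care of the header, FAT, directory and mini stream handling, so the code only deals with the stream bytes.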
I hope this helps.
Best regards,
Tom Jebo
Microsoft Open Specifications Support