Hi @Parth Gupta,
Structures used to find all the pictures in a .doc file, regardless of position, are found in both MS-DOC and MS-ODRAW. To find them, use this algorithm:
- Read the FIB from offset zero in the WordDocument Stream.
- Find the FibRgFcLcb97.
- Find FibRgFcLcb97.fcDggInfo -> the offset in the Table Stream of the MS-DOC 2.9.171 OfficeArtContent
- Find FibRgFcLcb97.lcbDggInfo -> the size (in bytes) of the OfficeArtContent
- Find the OfficeArtContent.
- Find the OfficeArtContent.DrawingGroupData, which is an MS-ODRAW 2.2.12 OfficeArtDggContainer.
- Find OfficeArtDggContainer.blipStore, which is an MS-ODRAW 2.2.20 OfficeArtBStoreContainer.
- Find OfficeArtBStoreContainer.rgfb, which is an array of MS-ODRAW 2.2.22 OfficeArtBStoreContainerFileBlock.
- Each OfficeArtBStoreContainerFileBlock is either an OfficeArtFBSE or an OfficeArtBlip, depending on the type of the contained record.
- If it is an MS-ODRAW 2.2.23 OfficeArtBlip, its record type identifies the blip format, as listed below:
Value | Meaning |
---|---|
0xF01A | OfficeArtBlipEMF, as defined in section 2.2.24. |
0xF01B | OfficeArtBlipWMF, as defined in section 2.2.25. |
0xF01C | OfficeArtBlipPICT, as defined in section 2.2.26. |
0xF01D | OfficeArtBlipJPEG, as defined in section 2.2.27. |
0xF01E | OfficeArtBlipPNG, as defined in section 2.2.28. |
0xF01F | OfficeArtBlipDIB, as defined in section 2.2.29. |
0xF029 | OfficeArtBlipTIFF, as defined in section 2.2.30. |
0xF02A | OfficeArtBlipJPEG, as defined in section 2.2.27.<5> |
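
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a complete parser. It assumes the WordDocument and Table streams have already been read into `bytes` (for example with an OLE compound file reader), that FibBase is 32 bytes with fWhichTblStm at bit 0x0200 of the flags at offset 0x0A, that fcDggInfo/lcbDggInfo form the fc/lcb pair at index 50 of FibRgFcLcb97, and that the container and FBSE record types are 0xF000, 0xF001, and 0xF007. The function names are hypothetical; verify every offset against MS-DOC and MS-ODRAW before relying on it.

```python
import struct

# Blip record types from the table above.
BLIP_TYPES = {
    0xF01A: "EMF", 0xF01B: "WMF", 0xF01C: "PICT", 0xF01D: "JPEG",
    0xF01E: "PNG", 0xF01F: "DIB", 0xF029: "TIFF", 0xF02A: "JPEG",
}
# Assumed record types for the containers and the FBSE (check MS-ODRAW).
REC_DGG_CONTAINER = 0xF000
REC_BSTORE_CONTAINER = 0xF001
REC_FBSE = 0xF007


def locate_office_art_content(word_document):
    """Return (table_stream_name, fcDggInfo, lcbDggInfo) read from the FIB."""
    (flags,) = struct.unpack_from("<H", word_document, 0x0A)
    table_stream = "1Table" if flags & 0x0200 else "0Table"  # fWhichTblStm
    pos = 32                                   # assumed size of FibBase
    (csw,) = struct.unpack_from("<H", word_document, pos)
    pos += 2 + csw * 2                         # skip csw and fibRgW
    (cslw,) = struct.unpack_from("<H", word_document, pos)
    pos += 2 + cslw * 4                        # skip cslw and fibRgLw
    pos += 2                                   # skip cbRgFcLcb
    # Assumption: fcDggInfo/lcbDggInfo are the fc/lcb pair at index 50.
    fc, lcb = struct.unpack_from("<II", word_document, pos + 50 * 8)
    return table_stream, fc, lcb


def iter_records(data, start, end):
    """Yield (rec_type, body_start, body_end) for sibling OfficeArt records."""
    pos = start
    while pos + 8 <= end:
        # 8-byte header: recVer/recInstance (2), recType (2), recLen (4).
        _ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", data, pos)
        body = pos + 8
        yield rec_type, body, min(body + rec_len, end)
        pos = body + rec_len


def list_blip_store(office_art):
    """Classify each entry of OfficeArtBStoreContainer.rgfb."""
    entries = []
    # Only the first record is read: it should be the OfficeArtDggContainer.
    first = next(iter_records(office_art, 0, len(office_art)), None)
    if first is None or first[0] != REC_DGG_CONTAINER:
        return entries
    _, dgg_start, dgg_end = first
    for rec_type, start, end in iter_records(office_art, dgg_start, dgg_end):
        if rec_type != REC_BSTORE_CONTAINER:
            continue
        for block_type, b_start, b_end in iter_records(office_art, start, end):
            if block_type == REC_FBSE:
                kind = "FBSE"
            else:
                kind = BLIP_TYPES.get(block_type, "unknown")
            entries.append((kind, b_start, b_end))
    return entries
```

Given the result of `locate_office_art_content`, the OfficeArtContent would be the slice `table[fc:fc + lcb]` of the named Table Stream, which can then be passed to `list_blip_store`. The sketch does not decode the contents of an OfficeArtFBSE or the blip payload itself.
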
If you need to find the position of pictures, remember that the fundamental unit of a Word binary file is a character. This includes visual characters such as letters, numbers, and punctuation. It also includes formatting characters such as paragraph marks, end-of-cell marks, line breaks, or section breaks. Finally, it includes anchor characters such as footnote reference characters, picture anchors, and comment anchors. See MS-DOC 1.3.1 (Characters).
A picture anchor is a character that specifies the location of a picture within a document. To find where to place pictures, examine two character properties: sprmCFSpec, which specifies whether the current text has a meaning that differs from, or displays differently than, the underlying character to which it is applied, and sprmCPicLocation, which specifies the position of the picture in the Data Stream. See MS-DOC 2.6.1 (Character Properties).
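
As a rough illustration of checking those two properties, the sketch below scans the property list (grpprl) of a character run. It assumes the grpprl bytes have already been extracted, that sprmCFSpec is 0x0855 and sprmCPicLocation is 0x6A03, and that operand sizes follow the top three bits (spra) of each Sprm. The function name is hypothetical, and the sketch simply treats an sprmCFSpec operand of 1 as "set" without resolving style-relative values.

```python
import struct

SPRM_CFSPEC = 0x0855        # assumed sprm value, check MS-DOC 2.6.1
SPRM_CPICLOCATION = 0x6A03  # assumed sprm value, check MS-DOC 2.6.1

# Operand size in bytes keyed by Sprm.spra; spra 6 is variable-length.
_OPERAND_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 7: 3}


def picture_anchor_info(grpprl):
    """Return (is_special, pic_location) found in a character grpprl."""
    is_special, pic_location = False, None
    pos = 0
    while pos + 2 <= len(grpprl):
        (sprm,) = struct.unpack_from("<H", grpprl, pos)
        pos += 2
        spra = sprm >> 13
        if spra == 6:
            if pos >= len(grpprl):
                break
            size = 1 + grpprl[pos]          # length byte plus operand
        else:
            size = _OPERAND_SIZE[spra]
        if pos + size > len(grpprl):
            break                           # truncated operand, stop
        if sprm == SPRM_CFSPEC:
            is_special = grpprl[pos] == 1
        elif sprm == SPRM_CPICLOCATION:
            # Offset of the picture in the Data Stream.
            (pic_location,) = struct.unpack_from("<i", grpprl, pos)
        pos += size
    return is_special, pic_location
```

A run for which this returns `(True, offset)` would be a candidate picture anchor, with `offset` pointing into the Data Stream.
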
The location and size of each character in the file can be computed using the algorithm in MS-DOC 2.4.1 (Retrieving Text).
I hope this answers your question.
Best regards,
Michael Bowen
Microsoft Office Open Specifications