Finding compressed VBA source code in an OLE file

Parth Gupta 180 Reputation points
2023-05-27T12:37:42.9033333+00:00

Hi,

With reference to [MS-OVBA].

I am parsing a .doc file (MS-97-2003) in Python, and I am able to read the directory structure. The directory listing is given at the end of this post.

I know that the entry 'Module1' contains some compressed VBA code, but I do not understand how to extract it.

Note that I already know how to decompress compressed VBA code using the algorithm described in https://learn.microsoft.com/en-us/openspecs/office_file_formats/ms-ovba/4742b896-b32b-4eb0-8372-fbf01e3c65fd

However, I am unable to locate the VBA code in 'Module1'.

Please help me find where the compressed code is located within a stream.

Directory:

Name: Root Entry, Type: 5, Size: 8640, SectStart: 41

Name: 1Table, Type: 2, Size: 6509, SectStart: 8

Name: WordDocument, Type: 2, Size: 4096, SectStart: 0

Name: \x05SummaryInformation, Type: 2, Size: 4096, SectStart: 21

Name: \x05DocumentSummaryInformation, Type: 2, Size: 4096, SectStart: 29

Name: Macros, Type: 1, Size: 0, SectStart: 0

Name: VBA, Type: 1, Size: 0, SectStart: 0

Name: ThisDocument, Type: 2, Size: 924, SectStart: 0

Name: Module1, Type: 2, Size: 3716, SectStart: 15

Name: _VBA_PROJECT, Type: 2, Size: 2601, SectStart: 74

Name: dir, Type: 2, Size: 563, SectStart: 115

Name: PROJECTwm, Type: 2, Size: 65, SectStart: 124

Name: PROJECT, Type: 2, Size: 409, SectStart: 126

Name: \x01CompObj, Type: 2, Size: 114, SectStart: 133

Office Open Specifications

Accepted answer
  1. Tom Jebo 1,996 Reputation points Microsoft Employee
    2023-06-05T05:10:36.5566667+00:00

    Hi @Parth Gupta,

    You will need to parse each of the PROJECTINFORMATION and PROJECTREFERENCES records to get to the PROJECTMODULES record in the dir stream. It's not as hard as it may seem. Each of the variable-size sub-records has most of its fields defined with constant sizes. The fields in the sub-records that are variable all have their lengths given by a size field that precedes them.

    For example, in the PROJECTINFORMATION record, the:

    NameRecord (variable): A PROJECTNAME Record (section 2.3.4.2.1.6).

    has a variable ProjectName field that is defined like this:

    ProjectName (variable): An array of SizeOfProjectName bytes that specifies the VBA identifier name for the VBA project. MUST contain MBCS characters encoded using the code page specified in PROJECTCODEPAGE (section 2.3.4.2.1.5). MUST NOT contain null characters.

    You'll see that SizeOfProjectName is the previous field in that record. The other variable fields work the same way.
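The size-then-data layout described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: `read_sized_field` is a hypothetical helper name, the buffer is taken to be the dir stream after decompression, the PROJECTNAME Id value 0x0004 is as given in [MS-OVBA], and cp1252 stands in for whatever code page PROJECTCODEPAGE actually specifies.

```python
import struct

PROJECTNAME_ID = 0x0004  # Id of the PROJECTNAME record in [MS-OVBA]


def read_sized_field(buf, pos):
    """Read one Id (2 bytes) / Size (4 bytes) / data triple at pos.

    Returns (record_id, data, next_pos). All integers are little-endian.
    """
    record_id, size = struct.unpack_from("<HI", buf, pos)
    start = pos + 6
    end = start + size
    if end > len(buf):
        raise ValueError("size field runs past the end of the buffer")
    return record_id, buf[start:end], end


# Synthetic example bytes (not taken from a real file):
# a PROJECTNAME record whose SizeOfProjectName is 7.
sample = struct.pack("<HI", PROJECTNAME_ID, 7) + b"Project"
record_id, raw_name, next_pos = read_sized_field(sample, 0)

# Assumption: cp1252; a real parser would use the PROJECTCODEPAGE value.
project_name = raw_name.decode("cp1252")
```

Because the size always precedes the variable data, the parser never has to guess where the next record begins: `next_pos` is simply the end of the data just read.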

    So although a bit tedious, it is possible to parse your way past the PROJECTINFORMATION and PROJECTREFERENCES records.
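As a rough sketch of that walk, the code below steps through the records generically and picks out the two that matter for locating the source: MODULESTREAMNAME (the name of the module stream, such as 'Module1') and MODULEOFFSET (its TextOffset field is the byte offset of the compressed source inside that stream). Several things here are assumptions rather than part of the answer above: `dir_data` must already be decompressed (per [MS-OVBA] the dir stream itself is stored compressed with the same algorithm), the function names are made up for illustration, cp1252 is a stand-in for the project's code page, and the PROJECTVERSION special case reflects that its 4-byte field is a reserved constant followed by 6 bytes of version data.

```python
import struct

# Record Ids as listed in [MS-OVBA]
PROJECTVERSION = 0x0009
MODULESTREAMNAME = 0x001A
MODULEOFFSET = 0x0031
MODULE_TERMINATOR = 0x002B


def iter_dir_records(dir_data):
    """Yield (record_id, data) pairs from a decompressed dir stream."""
    pos = 0
    while pos + 6 <= len(dir_data):
        rec_id, size = struct.unpack_from("<HI", dir_data, pos)
        pos += 6
        if rec_id == PROJECTVERSION:
            # The 4-byte field is a reserved value, not a length; the
            # payload is VersionMajor (4 bytes) + VersionMinor (2 bytes).
            size = 6
        yield rec_id, dir_data[pos:pos + size]
        pos += size


def module_offsets(dir_data, codepage="cp1252"):
    """Map each module stream name to its TextOffset."""
    offsets = {}
    stream_name = None
    for rec_id, data in iter_dir_records(dir_data):
        if rec_id == MODULESTREAMNAME:
            stream_name = data.decode(codepage)
        elif rec_id == MODULEOFFSET and stream_name is not None:
            offsets[stream_name] = struct.unpack_from("<I", data)[0]
        elif rec_id == MODULE_TERMINATOR:
            stream_name = None  # end of this MODULE record
    return offsets


def compressed_source(module_stream, text_offset):
    """Return the bytes of a module stream from TextOffset onward."""
    chunk = module_stream[text_offset:]
    # A CompressedContainer begins with the signature byte 0x01.
    if not chunk or chunk[0] != 0x01:
        raise ValueError("no CompressedContainer signature at TextOffset")
    return chunk
```

With the offsets in hand, the bytes of the 'Module1' stream from its TextOffset to the end of the stream are what gets passed to the decompression routine; the bytes before that offset are the performance cache, which is not needed to recover the source text.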

    Does that help?

    Tom

