How to differentiate between MSI, PUB files vs DOC/XLS/PPT files

GHANASHYAM SATPATHY 301 Reputation points
2020-12-22T13:26:14.52+00:00

What is the correct technical way to differentiate between MSI, PUB, and MSG files versus DOC, XLS, and PPT files? All of these are OLE compound files and share the magic header D0 CF 11 E0 A1 B1 1A E1.

I would like to know the correct way to determine the file type of the files mentioned above from the magic and sub-magic headers.

Thanks.

JavaScript API
Office Open Specifications

3 answers

  1. Mike Bowen 76 Reputation points
    2020-12-23T17:21:10.487+00:00

    @GHANASHYAM SATPATHY

    As you noted, the header signature tells you that a file is a Microsoft compound binary file, but not which specific file type it is. Unfortunately, if you don’t trust the file extension, there is no prescribed method for determining the file type.

    Although there is no officially endorsed way to determine the file type without the file extension, some third-party utilities rely on properties that the Office file format documentation describes as unique and required, and use them to identify the file type with a high degree of confidence. File for Windows is an open-source example that you could look at as a starting point.

    Mike Bowen

    Escalation Engineer Microsoft Open Specifications
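
One way to picture the "unique and required properties" approach described above is to look for stream names that the binary format specifications require, such as `WordDocument`, `Workbook`, and `PowerPoint Document`. The following is a minimal Python sketch under stated assumptions, not an endorsed method: the function names and the `MARKER_STREAMS` table are hypothetical, the table is incomplete (it has no entries for MSI or PUB), only FAT sectors listed in the header are followed, and names found in nested storages are not distinguished from top-level ones.

```python
import struct

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ENTRY_SIZE = 128  # size of one directory entry in a compound file

# Hypothetical lookup table: stream names the format specifications
# describe as required. Incomplete by design (no MSI or PUB entries).
MARKER_STREAMS = {
    "WordDocument": "DOC",
    "Workbook": "XLS",
    "PowerPoint Document": "PPT",
    "__properties_version1.0": "MSG",
}


def directory_names(data: bytes) -> list:
    """Return the names of all allocated directory entries.

    Simplification: only the 109 FAT sector locations stored in the
    header are used, so very large files are not fully covered.
    """
    if len(data) < 512 or data[:8] != OLE_MAGIC:
        raise ValueError("missing compound file signature")
    sector_size = 1 << struct.unpack_from("<H", data, 0x1E)[0]
    first_dir = struct.unpack_from("<I", data, 0x30)[0]

    def sector(n):
        # Sector n starts one sector-sized slot after the header.
        start = (n + 1) * sector_size
        return data[start:start + sector_size]

    # Build the FAT from the sector locations listed in the header.
    fat = []
    for fat_sector in struct.unpack_from("<109I", data, 0x4C):
        if fat_sector >= 0xFFFFFFFA:  # free or special marker
            break
        raw = sector(fat_sector)
        fat.extend(struct.unpack_from("<%dI" % (len(raw) // 4), raw))

    # Follow the directory sector chain and collect entry names.
    names, seen, current = [], set(), first_dir
    while current < len(fat) and current not in seen:
        seen.add(current)
        raw = sector(current)
        for off in range(0, len(raw) - ENTRY_SIZE + 1, ENTRY_SIZE):
            name_len = struct.unpack_from("<H", raw, off + 0x40)[0]
            # Object type 0 means the entry is unallocated.
            if raw[off + 0x42] != 0 and 2 <= name_len <= 64:
                name = raw[off:off + name_len - 2]
                names.append(name.decode("utf-16-le", "replace"))
        current = fat[current]
    return names


def guess_from_streams(data: bytes) -> str:
    """Best-effort guess; returns "unknown" when no marker is found."""
    names = set(directory_names(data))
    for marker, kind in MARKER_STREAMS.items():
        if marker in names:
            return kind
    return "unknown"
```

A caller would read the whole file with `open(path, "rb").read()` and pass the bytes to `guess_from_streams`. Treat the result as a hint rather than a verdict, since a compound file can embed another document inside a nested storage.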

  2. GHANASHYAM SATPATHY 301 Reputation points
    2020-12-23T17:28:09.927+00:00

    It is unfortunate to hear that there is no prescribed method for determining the file type. Can the root storage CLSID serve as a marker for identifying the different file types, as described at the following link?

    http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File

    Thanks.

  3. Mike Bowen 76 Reputation points
    2020-12-23T19:13:30.247+00:00

    Hi @GHANASHYAM SATPATHY ,

    We don't document those CLSID values in the Open Specifications. They will likely work, but that is not guaranteed.

    Mike Bowen

    Escalation Engineer Microsoft Open Specifications
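
For readers who want to try the root storage CLSID idea discussed in this thread, here is a minimal Python sketch of reading that CLSID from the first directory entry. It assumes the root entry is the first entry of the first directory sector, which is how the compound file format lays it out. The `KNOWN_ROOT_CLSIDS` table and the function names are hypothetical; the CLSID values are ones commonly reported by community sources, are not documented in the Open Specifications, and the table is deliberately incomplete (it has no PUB entry, for example).

```python
import struct
import uuid

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical, incomplete table. These values come from community
# sources, not from the Open Specifications, so a match is only a hint.
KNOWN_ROOT_CLSIDS = {
    "00020906-0000-0000-c000-000000000046": "DOC",
    "00020820-0000-0000-c000-000000000046": "XLS",
    "64818d10-4f9b-11cf-86ea-00aa00b929e8": "PPT",
    "000c1084-0000-0000-c000-000000000046": "MSI",
    "00020d0b-0000-0000-c000-000000000046": "MSG",
}


def root_clsid(data: bytes) -> uuid.UUID:
    """Read the CLSID stored in the root directory entry."""
    if len(data) < 512 or data[:8] != OLE_MAGIC:
        raise ValueError("missing compound file signature")
    sector_size = 1 << struct.unpack_from("<H", data, 0x1E)[0]
    first_dir_sector = struct.unpack_from("<I", data, 0x30)[0]
    # Sector n starts one sector-sized slot after the header.
    root_entry = (first_dir_sector + 1) * sector_size
    # The CLSID occupies 16 bytes at offset 0x50 of the entry.
    raw = data[root_entry + 0x50:root_entry + 0x60]
    if len(raw) != 16:
        raise ValueError("file is truncated before the root entry")
    # The first three GUID fields are stored little-endian.
    return uuid.UUID(bytes_le=raw)


def guess_from_clsid(data: bytes) -> str:
    """Best-effort guess; returns "unknown" for unlisted CLSIDs."""
    return KNOWN_ROOT_CLSIDS.get(str(root_clsid(data)), "unknown")
```

Some files carry an all-zero root CLSID, so a lookup like this can come back as "unknown" even for valid documents. That is consistent with the caution above that these markers are likely to work but are not guaranteed.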