I need to write a script that determines whether a given document is in the .doc format. I am using an Amazon Linux machine. I tried the Linux file command. For a .doc file (Word document), it prints the following:
sample_file.doc: Composite Document File V2 Document, No summary info
I found that the file command reports the same file type for Excel 2003 files (.xls).
Which file types (such as doc and xls) are reported as Composite Document File V2 Document, and how can I check whether a given file is a doc file on an Amazon Linux 2012 machine?
I also tried to get the MIME type of the file using the "file --mime" command.
For most of the doc files I got "application/msword" as the MIME type.
For a few files I got "application/CDFV2-encrypted" instead.
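The check I have in mind is roughly the sketch below. The function names classify_mime and check_file are made up for illustration, and the mapping only covers the two MIME types I have actually observed, so it is incomplete until I know the full list. It uses "file --mime-type -b" so that only the bare MIME type is printed, without the file name or charset.

```shell
#!/bin/sh
# Map a MIME type string (as printed by `file --mime-type -b`) to a verdict.
# Assumption: only the two values seen so far are handled; everything else,
# including whatever .xls files report, falls through to "other".
classify_mime() {
    case "$1" in
        application/msword)          echo "doc" ;;
        application/CDFV2-encrypted) echo "encrypted" ;;
        *)                           echo "other" ;;
    esac
}

# Hypothetical wrapper: ask `file` for the bare MIME type of a path
# and classify the result.
check_file() {
    classify_mime "$(file --mime-type -b "$1")"
}
```

Calling check_file sample_file.doc prints "doc" whenever file reports application/msword. The "encrypted" case is the problem: from the MIME type alone I cannot tell whether such a file is a Word document or some other Microsoft document.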
What are all the possible MIME types for a Microsoft Word document (.doc)?
Are Composite Document File V2 Document and CDFV2 specific to Microsoft documents? If so, how can I differentiate Word doc files from other Microsoft documents?