Detect the real file type

Question

Wednesday, May 27, 2015 6:47 AM

Hello!

Checking the extension does not always give the real file type. For example, an .mp3 file might be renamed to .xls by mistake, but it is still an MP3 file and not an Excel file.

Is there any way to detect a file's real type?

(I'm especially interested in Excel and Word files.)

Thank you!

All replies (49)

Wednesday, May 27, 2015 6:57 AM

Hi,
Could you show us your source code?

' get the extension of "C:\filename.xls"
Dim stExtension As String = System.IO.Path.GetExtension("C:\filename.xls")
' display extension
MessageBox.Show(stExtension)




Wednesday, May 27, 2015 7:03 AM

Sorry, but as my question says, I don't want to discuss the extension-based approach, because I already know how to do that.

I want to know whether there is any way to detect the real file type without using the extension.

I have no code to show for this, because I don't know how to do it at all.

Thank you!


Wednesday, May 27, 2015 7:58 AM | 1 vote

Possibly. You will need to research whether a byte sequence representing a file signature exists for .xls or whatever file type you want to know about.

See thread VB.NET PE Files - Signatures

La vida loca


Wednesday, May 27, 2015 8:08 AM

Thank you!

But as far as I can see, the example posted at that link uses the extension.

I mean taking a file such as c:\123 and finding its type (not by extension).


Wednesday, May 27, 2015 8:16 AM

No kidding. Try reading the entire thread, which is about attempting to determine a file's type by its signature, not by its extension.

What does the extension have to do with discerning a file type by file signature?

On the other hand, who changes an .xls extension to .mp3, creating any need to discern the file type?

Files don't have to have an extension anyhow.

La vida loca


Wednesday, May 27, 2015 8:29 AM

It doesn't matter who changed the file extension.

I just want to detect inside my application that a file with a specific extension really is that kind of file.

For example, an .xls should really be an Excel file.

Some programs, for example, display a message like "Invalid format..." when you try to read a particular file. That is what I want to do inside my application: when the user browses and selects a file that is not, for example, an Excel file, it should display a similar message.


Wednesday, May 27, 2015 8:43 AM

Right. That would be by file signature, regardless of extension, and only if the file has a file signature, which not all file types do.

Therefore you will need to research the signatures for the file types you want to verify, and then check whether the files in question contain any of those signatures. Nobody cares about the extension.

Or, instead of learning how to do this on your own, are you asking for somebody to write the code for you for certain file types, with messages and such? Rather than just learning how to read byte sequences from a file and see whether they match the file type byte sequences that are available to you via research?

La vida loca


Wednesday, May 27, 2015 8:47 AM

No, I'm not asking for someone to write the code, but how can I read the file signature so I can start implementing your idea?

Is there any article I can read?


Wednesday, May 27, 2015 10:58 AM

You must read the documentation of the binary file format of every file type you want to support. There is no global standard for storing a signature inside a file.

Armin


Wednesday, May 27, 2015 11:36 AM

Be aware that what Armin wrote is the only possibility.

But it means a so-called Tantalus job. There are endless file types made in the MS-DOS format, which is still the base of the Windows OSes, and more and more will be made.

Based on my knowledge, I assume you will not succeed in this with 100 men in a lifetime.

Success
Cor


Wednesday, May 27, 2015 12:27 PM

Other than attempting to open the file with the associated application the answer is "no". Windows has no knowledge of the internal file format so it cannot detect whether the file is valid or corrupt. This is the case for Office documents, especially for the newer Office XML format, where a checksum is used to determine whether the document is valid or considered corrupt.

Paul ~~~~ Microsoft MVP (Visual Basic)


Wednesday, May 27, 2015 3:22 PM

Cor - that was my thought exactly - there are so many thousands of file types and likely different versions of the same program have different formats:

Excel xlsx file
00000000   50 4B 03 04 14 00 06 00  08 00 00 00 21 00 C2 13   PK..........!...
Excel xls file
00000000   D0 CF 11 E0 A1 B1 1A E1  00 00 00 00 00 00 00 00   ................
ZIP file
00000000   50 4B 03 04 0A 00 00 00  00 00 13 32 44 3A B4 88   PK.........2D:..

If the OP wants to be sure an excel file is valid, just open it in a try-catch-end try block. That should work.
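
The three dumps above suggest a first-pass check: read the first few bytes of the file and compare them with the leading bytes shown for .xls (D0 CF 11 E0 A1 B1 1A E1) and for .xlsx/ZIP (50 4B 03 04). A minimal VB.NET sketch follows. The module and function names are made up for illustration, and a match only narrows the file down to a container type, not to Excel specifically.

```vb
Imports System
Imports System.IO

Module SignatureCheck
    ' Leading bytes taken from the dumps in this thread.
    Private ReadOnly OleSignature As Byte() = {&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1}
    Private ReadOnly ZipSignature As Byte() = {&H50, &H4B, &H3, &H4}

    ' True when the data begins with every byte of the signature.
    Function StartsWith(data As Byte(), signature As Byte()) As Boolean
        If data.Length < signature.Length Then Return False
        For i As Integer = 0 To signature.Length - 1
            If data(i) <> signature(i) Then Return False
        Next
        Return True
    End Function

    ' Classifies a header by its first bytes only (hypothetical helper).
    Function GuessContainer(header As Byte()) As String
        If StartsWith(header, OleSignature) Then Return "OLE2 compound file"
        If StartsWith(header, ZipSignature) Then Return "ZIP container"
        Return "unknown"
    End Function

    ' Reads up to eight bytes from the start of the file.
    Function ReadHeader(path As String) As Byte()
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            Dim buffer(7) As Byte
            Dim count As Integer = fs.Read(buffer, 0, buffer.Length)
            Array.Resize(buffer, count)
            Return buffer
        End Using
    End Function
End Module
```

Calling GuessContainer(ReadHeader("C:\123")) would then report the container type regardless of the extension. An .mp3 renamed to .xls would come back as "unknown", which is enough to show an "Invalid format" message before attempting a full load.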


Wednesday, May 27, 2015 3:41 PM

This Wiki article might help, some common formats are covered in here:

http://en.wikipedia.org/wiki/List_of_file_signatures


Wednesday, May 27, 2015 4:11 PM

Really? You ask this question after a link was provided to a thread containing links you can read to learn how to implement this?

If information is provided to you and you don't bother to study it, then it is unlikely you will succeed in your quest.

La vida loca


Wednesday, May 27, 2015 4:30 PM

Something like this (requires a reference to Interop.Excel):

The filenames I used for a test should be obvious as to which is which.

Imports System
Imports System.Collections.Generic
Imports Microsoft.Office.Interop.Excel

Module Module1
    Sub Main()
        Dim testfiles As New List(Of String)
        testfiles.Add("H:\mp3.xlsx")
        testfiles.Add("H:\good.xlsx")
        testfiles.Add("H:\good.xls")

        ' One Excel instance is reused for every file.
        Dim excel As Application = New Application
        For Each f As String In testfiles
            ' Declared per file so a failed open never reuses the previous workbook.
            Dim w As Workbook = Nothing
            Try
                w = excel.Workbooks.Open(f)
                Console.WriteLine("The file {0} appears to be a real Excel file", f)
            Catch ex As Exception
                ' Excel refused the file: treat it as corrupted or not an Excel file.
                Console.WriteLine("The file {0} appears to be corrupted", f)
            Finally
                ' Close without saving if the workbook was opened.
                If w IsNot Nothing Then w.Close(False)
            End Try
        Next
        ' Shut Excel down so no hidden process is left running.
        excel.Quit()
        Console.WriteLine("All done, press enter")
        Console.ReadLine()

    End Sub
End Module

Wednesday, May 27, 2015 9:42 PM

I understand. But I'm curious: some applications display a message like "Cannot open this file" when you try to open a file they don't support, for example a music player that tries to open a fake MP3 file (one with an .mp3 extension).

So a way may exist.

And I'm not talking about rare file types, but about Word and Excel files.


Wednesday, May 27, 2015 10:09 PM

There is no general way to read any file and tell what application it belongs to. The "way that exists" is that any application that wants to open a data file needs to check that it contains data in a format that it can process. It can do that any way it likes and there is nothing forcing it to do it one particular way. 

So as others have already said, you can either work out for yourself what a valid Excel or Word file looks like and check for that, or you can have Excel or Word try to open the file (they will only be successful if the file is in the expected format).


Wednesday, May 27, 2015 10:12 PM

The way is: It checks if the file is of one of the file types supported by the application. If it only supports mp3 files, it checks if it's an MP3 file. If not, the message is displayed.

Armin


Thursday, May 28, 2015 2:35 AM

The question is: how does this program detect whether the file is or isn't a valid MP3 file?


Thursday, May 28, 2015 4:09 AM

Only the person who wrote it can explain that, unless you want to disassemble it and see for yourself.  It might look for the extension (unlikely), look for a signature (possibly), or perhaps it tries to analyse the internal structure of the file to detect the data it needs in order to start playback, and decides whether or not it can access each component of that data, and, where possible, then validates that data to confirm it's consistent and usable.  In other words, if it can be played, then it's an MP3 and if it can't be played, then it's not.
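
As an illustration of the "look for a signature" option, here is a minimal VB.NET sketch of a common heuristic, not the method of any particular player. It assumes the file either starts with an ID3v2 tag (the ASCII bytes "ID3") or directly with an MPEG audio frame header, whose first eleven bits are all set. The function name is hypothetical, and other data can pass this test by coincidence (a UTF-16 byte-order mark FF FE, for example), so it can only rule files out, not prove they are playable.

```vb
Module Mp3Check
    ' Heuristic only: True when the leading bytes look like an ID3v2 tag
    ' or like the sync pattern that starts an MPEG audio frame.
    Function LooksLikeMp3(header As Byte()) As Boolean
        ' "ID3" in ASCII is 49 44 33.
        If header.Length >= 3 AndAlso header(0) = &H49 AndAlso header(1) = &H44 AndAlso header(2) = &H33 Then
            Return True
        End If
        ' Frame sync: first byte FF and the top three bits of the second byte set.
        If header.Length >= 2 AndAlso header(0) = &HFF AndAlso (header(1) And &HE0) = &HE0 Then
            Return True
        End If
        Return False
    End Function
End Module
```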


Thursday, May 28, 2015 4:55 AM

Dcode,

I don't know how it is with others, and maybe you are happy with it, but I will stop replying to you.

You don't take the answers; you only want code to copy and paste.

Success
Cor


Thursday, May 28, 2015 6:24 AM

As previously provided, there are links you can view to learn how to check a file's signature against a known signature for a file type.

Is that really difficult to understand?

If an MP3 file's signature were, say, that the ten bytes from byte 23 to byte 32 always hold specific values, then it is unlikely any other file type would use those same bytes at those same locations for its signature. Obviously some file types, like .txt or .csv and probably many others, have no file signature.

You say you are curious, but it seems to me you are not curious enough to research what you need to learn: whether the file types you care about have signatures and, if so, how to use code to check whether files contain those signatures, so that you can verify, regardless of extension, that files are of the types you want to use.

And yes, anybody who can program can write bytes to locations in any file to make it appear to have the signature of some file type, in order to fool signature-detection code into treating the file as a type it is not. Extensions can be altered at will as well.

Regardless, you seem incapable of learning on your own.

La vida loca


Thursday, May 28, 2015 10:27 AM

The OP seems to be interested mainly in Office files such as Excel and Word. Unfortunately, the 2003-and-earlier Office files all have the same "signature" in the first 8 bytes. There doesn't seem to be an easy way to distinguish an Excel file from a Word document just by looking at those bytes. Office files after 2003 (Office 2007 onward) are even harder: they are compressed XML files and all begin with "PK", as does any ZIP file and some other compressed archives.


Thursday, May 28, 2015 11:05 AM

You've addressed me, so I reply, but I can only endorse what others already wrote, i.e. the other application checks the signature of the file, so you'd have to do it. In general, as data is all bits and bytes, loading/opening a file can be split into phases/situations:

a) it's not a file of a certain type
b) it seems to be a file of a certain type but during loading, integrity checks reveal that it is either not, or the file is broken (unexpected values, unexpected end of file, overall inconsistency)
c) the file can be loaded w/o failures but invalid data is not distinguishable from valid data
d) everything fine

Armin


Friday, May 29, 2015 12:05 AM

Thanks. However I'm not interested in trying to do this for any files.

La vida loca


Friday, May 29, 2015 1:41 AM

How did you come to this conclusion?

Show me which of your responses could help me start my code? (Just start.)

- Show me the unique signature for Excel files.

- Show me the unique signature for Word files.

If you can give me these two pieces of information, I can write my code.

If you don't know them, then yes, I agree it's better not to reply anymore, because the thread is full of your words.

As a specialist, you think that everyone is a specialist and can understand all the information you give. Well, I'm not a specialist. I admit there are some things I don't understand, so I need clearer ideas.

Well, I agree that sometimes I may need two lines of code as a start, so that I can write the other 10 or more lines I need. What's wrong with that? Does this forum have a rule that asking for code is not permitted? If so, show me and I will change my behavior.

Thank you!


Friday, May 29, 2015 2:01 AM

You're not interested in trying? But how do you know that all the ideas and links you have posted here can work?


Friday, May 29, 2015 2:05 AM

No reason why you should be. I was just trying to explain why the links you posted, while providing useful general information, may not solve this particular problem. I believe there is no general solution to the OP's question.


Friday, May 29, 2015 2:24 AM

I don't understand why you think there has to be a "unique signature" for Excel or Word files. Any file that Excel can successfully read is a valid Excel file, nobody is forcing Microsoft to mark the file with some special "signature" if it is to be read. 

As I have said before, you can study the file formats in detail until you believe you have identified the characteristics of an Excel (or Word, or whatever) file, or you can use automation to have Excel (or Word, or whatever) load the file and see if there are errors. 

Also keep in mind that the format of Office files changed significantly after 2003. Until then, they were in a proprietary format that had the same "signature" for all office applications. After that the default format (e.g. xlsx and docx) was compressed XML files that had the same "signature" as ZIP files.


Friday, May 29, 2015 2:39 AM

I'm just wondering whether a unique signature might exist.

But if it's not a Microsoft secret: when Word, for example, tries to read and open a .doc file, how does it conclude that it can or can't open the file? I think something must exist in the file's header or somewhere else. Or am I wrong?


Friday, May 29, 2015 2:51 AM

I think you are probably wrong. When Word tries to read a .doc or a .docx file, it probably assumes that the file is in the correct format. If it encounters an error because the format is incorrect, it can throw an exception. An application might choose to mark its files with an obvious signature, but it would still have to check for valid data in the file.


Friday, May 29, 2015 3:28 AM

If you rename a .xlsx file to .zip - Winzip/WinRAR etc will open it.

I already provided sample code to test for a valid Excel file using Interop.Excel, I guess it was ignored by the OP.

 


Friday, May 29, 2015 3:30 AM

DCode - Did you look at the example I posted ? - I figured the best way to see if a file is a valid Excel file is to use Excel, the same could easily be done for Word files. Tested here before posting, seems to do the job.


Friday, May 29, 2015 3:42 AM

I have not ignored your example, but consider the case where Microsoft Office is not installed on the PC (though of course my application is capable of opening the file if it's a valid Word or Excel file). So I need to do the test entirely within my application.


Friday, May 29, 2015 3:52 AM

Perhaps it would help if you explain what you want to do in that case. If the user's PC doesn't have Office installed, it probably can't do much with an Office file.


Friday, May 29, 2015 4:01 AM

My application can read Word or Excel files if they are valid, and I can import or export these files. But I can't use a test that depends on whether Office is installed on the PC where my program is running.


Friday, May 29, 2015 5:22 AM | 1 vote

In that case, no other test is required.  If your application can read the file then it is, by definition, a file of that type.

Why do you need this information separately from whether or not the read succeeds?


Friday, May 29, 2015 5:52 AM

Because it crashes if the file is not in a valid Word or Excel format. So I need to test.


Friday, May 29, 2015 6:28 AM

The crash is the indication that it's not a file of that type.  Insert a Try/Catch block around the line where it crashes.  In the Catch portion, add your code to provide the indication that the file is not valid.

See:
https://msdn.microsoft.com/en-us/library/fk6t46tz.aspx
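
A minimal VB.NET sketch of the Try/Catch suggestion above, under the assumption that the application's own import routine throws an exception on bad input. The names TryLoad and loader are hypothetical: loader stands in for whatever routine the application already uses to read the file. Catching every Exception keeps the sketch short; real code would normally catch only the exception types the reader is known to throw.

```vb
Imports System
Imports System.IO

Module SafeLoad
    ' Runs the supplied loader and reports failure instead of letting the exception escape.
    ' On failure, "reason" receives the exception message so the caller can display it.
    Function TryLoad(path As String, loader As Action(Of String), ByRef reason As String) As Boolean
        Try
            loader(path)
            reason = ""
            Return True
        Catch ex As Exception
            reason = ex.Message
            Return False
        End Try
    End Function
End Module
```

The caller would show its "Invalid format" message whenever TryLoad returns False. Note that this only helps when the reader fails by throwing; a reader that loops forever on bad data never reaches the Catch block.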


Friday, May 29, 2015 6:34 AM

I've already done that, but the application still crashes and hangs (not responding) when it tries to open a file in an invalid format.


Friday, May 29, 2015 8:29 AM

You will never know. Since you are incapable apparently of researching anything.

If you really want to know you'll research until you have an answer rather than having anybody else do everything for you. But I know you will not do that because it would require effort on your part. And it is much simpler/easier to copy/paste and have anybody else do everything for you.

Whether this can help or not I really don't care. FileTypeDetective

La vida loca


Friday, May 29, 2015 8:55 AM

Then it was not done correctly, or you are not actually processing the file.  You should post the code where you are getting the error and trapping it.


Saturday, May 30, 2015 1:30 AM

So you expect to have people using your app that have MS Word and Excel files on their computer but they do not have Word or Excel installed ?

In that case you best contact MS directly for the header information for all the versions of Word and Excel or anything else you plan to support.

If your app hangs after opening an invalid file, I'd think you have some kind of loop running that never gets the signal to exit. If that is so, try using a counter and give up after some number of loops.


Saturday, May 30, 2015 2:11 AM

Do you think this is information that Microsoft can give?


Sunday, May 31, 2015 3:43 AM

As previously provided, there are links you can view to learn how to check a file's signature against a known signature for a file type.

Is that really difficult to understand?

If, say, an MP3 file's signature were ten specific bytes always found from byte 23 to byte 32, it is unlikely that any other file type would use those same bytes at those same locations for its signature. Obviously some file types, such as .txt or .csv and probably many others, have no file signature at all.

La vida loca

The OP seems to be interested mainly in Office files such as Excel and Word. Unfortunately, the 2003-and-earlier Office files all have the same "signature" in the first 8 bytes. There doesn't seem to be an easy way to distinguish an Excel file from a Word document just by looking at those bytes. The newer formats (.docx, .xlsx, introduced with Office 2007) are even harder. They are ZIP packages of XML files, so they all begin with "PK", as does any ZIP file and some other compressed archives.
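
To illustrate the point above, here is a minimal sketch of a first-bytes check. It assumes only the two widely documented signatures (the 8-byte OLE2 compound file header and the 4-byte ZIP local file header). The module and function names are hypothetical, not taken from any library mentioned in this thread. It can only tell you which container a file uses, not whether it is Word or Excel.

```vbnet
Imports System.IO

Module SignatureSketch
    ' OLE2 compound file header: legacy .doc, .xls, .ppt and other formats share it.
    Private ReadOnly Ole2Signature As Byte() = {&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1}
    ' ZIP local file header: "PK" then 03 04. Shared by .docx, .xlsx and plain .zip.
    Private ReadOnly ZipSignature As Byte() = {&H50, &H4B, &H3, &H4}

    ' Hypothetical helper: classifies a file by its leading bytes only.
    Public Function SniffContainer(path As String) As String
        Dim header(7) As Byte   ' 8 bytes, enough for the longest signature above
        Dim bytesRead As Integer
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            bytesRead = fs.Read(header, 0, header.Length)
        End Using

        If HasPrefix(header, bytesRead, Ole2Signature) Then Return "OLE2"
        If HasPrefix(header, bytesRead, ZipSignature) Then Return "ZIP"
        Return "unknown"
    End Function

    ' True when the first bytes of data match the whole signature.
    Private Function HasPrefix(data As Byte(), count As Integer, signature As Byte()) As Boolean
        If count < signature.Length Then Return False
        For i As Integer = 0 To signature.Length - 1
            If data(i) <> signature(i) Then Return False
        Next
        Return True
    End Function
End Module
```

A result of "OLE2" or "ZIP" only narrows things down; a renamed .mp3 would come back as "unknown", which already answers the original example.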

Thanks. However I'm not interested in trying to do this for any files.

La vida loca

You're not interested in trying. But how do you know that the ideas and links you have posted here actually work?

You will never know. Since you are incapable apparently of researching anything.

If you really want to know you'll research until you have an answer rather than having anybody else do everything for you. But I know you will not do that because it would require effort on your part. And it is much simpler/easier to copy/paste and have anybody else do everything for you.

Whether this can help or not I really don't care. FileTypeDetective

La vida loca

Thank you !

This works correctly with .xls files.

But when I try it with .xlsx files, it detects them as .zip files rather than Excel files.

Is there a way to distinguish an .xlsx file from a standard .zip file?


Sunday, May 31, 2015 5:23 AM

Unzip zip files or decompress compressed files, see what files come out, then check them one by one. Nothing further from me. Either research and learn or don't. I suspect the latter, since everything else seems to be handed over on a platter.

La vida loca
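
The "look inside the archive" idea above can be sketched without extracting anything. This is a rough example, not a definitive method: it assumes .NET Framework 4.5 or later with references to System.IO.Compression and System.IO.Compression.FileSystem, and it relies on the part names Office normally writes ("xl/workbook.xml" for Excel, "word/document.xml" for Word). The function name ClassifyZip is hypothetical.

```vbnet
Imports System.IO
Imports System.IO.Compression

Module OoxmlSketch
    ' Hypothetical helper: returns "xlsx", "docx", "zip" or "not a zip".
    Public Function ClassifyZip(path As String) As String
        Try
            Using archive As ZipArchive = ZipFile.OpenRead(path)
                ' Office Open XML packages contain a [Content_Types].xml part;
                ' an archive without it is treated here as a plain zip.
                If archive.GetEntry("[Content_Types].xml") Is Nothing Then Return "zip"
                ' Assumption: the main part sits at its conventional location.
                If archive.GetEntry("xl/workbook.xml") IsNot Nothing Then Return "xlsx"
                If archive.GetEntry("word/document.xml") IsNot Nothing Then Return "docx"
                Return "zip"
            End Using
        Catch ex As InvalidDataException
            ' Thrown when the file cannot be read as a zip archive at all.
            Return "not a zip"
        End Try
    End Function
End Module
```

Calling ClassifyZip("c:\123") would then work on a file with no extension. A package that stores its main part under an unusual name would be reported as "zip", so treat the result as a hint rather than proof.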


Monday, June 1, 2015 2:06 AM

So you expect people using your app to have MS Word and Excel files on their computer without having Word or Excel installed?

In that case you'd best contact MS directly for the header information for all the versions of Word and Excel or anything else you plan to support.

If your app hangs after opening an invalid file, I'd think you have some kind of loop running that never gets the signal to exit. If that is so, try using a counter and give up after some number of loops.

Do you think this is information that Microsoft can give?

I have no idea but who else would you contact, certainly not Oracle or Apple.

Without your source code, it is impossible to comment further.


Monday, June 1, 2015 3:35 AM

Unzip zip files or decompress compressed files, see what files come out, then check them one by one. Nothing further from me. Either research and learn or don't. I suspect the latter, since everything else seems to be handed over on a platter.

La vida loca

Just for information, I have contacted the library's creator. He says that for file types like .docx or .xlsx, and some others that are difficult to distinguish from each other, he has abandoned the library and its support, because he has not come up with any reliable way to distinguish between a plain zip archive and a .docx or .xlsx file.

Well, it is disappointing for me that no proper solution can be found for my problem.

Anyway thank you !


Monday, June 1, 2015 4:09 AM

Hi dcode,

Have you ever looked at an Excel or Word file as binary data?
I think there's no way other than finding some identifying strings in the file itself.

Here's an image of my Excel and Word files, opened in a binary editor.

You should find some keywords yourself.
As far as I know, Windows applications leave no standard records in the files they save, unlike Macintosh files, which have a resource fork.