Detect if a file is HTML

StewartBW 1,765 Reputation points
2024-08-23T00:58:05.0933333+00:00

Hello

I need to detect whether a file is HTML (any version, e.g. 4 or 5) by its contents, not by its extension.

Full HTML validation isn't needed; I just need to find out whether it's HTML, especially as distinguished from XML.

Can I use WebBrowser to load the file?

If the file is not HTML, will WebBrowser report that back?

  • If there is no easy way, a minimal check for the opening and closing html tags would be enough. How can I check for their existence, given that they can be in lower or upper case and in different forms, and the html start tag can carry additional info?

Thanks all :)

Developer technologies VB
Developer technologies C#

Accepted answer
  1. Jiachen Li-MSFT 34,221 Reputation points Microsoft External Staff
    2024-08-23T01:57:03.6933333+00:00

    Hi @StewartBW ,

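    You can read the first few kilobytes of the file and check for text patterns such as <!DOCTYPE html or <html. Compared case-insensitively, the prefix <!DOCTYPE html matches the HTML5 doctype as well as HTML 4 doctypes such as <!DOCTYPE HTML PUBLIC ...>, and <html covers files that have no doctype, including a start tag that carries attributes.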

    Dim filePath As String = "yourfile.html"
    Dim bufferSize As Integer = 4096 ' Read at most the first 4 KB of the file

    Using fs As New System.IO.FileStream(filePath, System.IO.FileMode.Open, System.IO.FileAccess.Read)
        Dim buffer(bufferSize - 1) As Byte
        ' Read may return fewer bytes than requested, so decode only what was read
        Dim bytesRead As Integer = fs.Read(buffer, 0, bufferSize)

        ' Assumes UTF-8 (or ASCII-compatible) text; UTF-16 files need a different decoder
        Dim content As String = System.Text.Encoding.UTF8.GetString(buffer, 0, bytesRead)

        ' "<!DOCTYPE html" also matches HTML 4 doctypes such as <!DOCTYPE HTML PUBLIC ...>
        If content.IndexOf("<!DOCTYPE html", StringComparison.OrdinalIgnoreCase) >= 0 OrElse
           content.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0 Then
            MessageBox.Show("This is an HTML file.")
        Else
            MessageBox.Show("This is not an HTML file.")
        End If
    End Using
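
    For the minimal check mentioned in the question (an html start tag that may carry attributes, followed by an html end tag, in any letter case), a regular expression is one possible approach. The following is only a sketch: the module name HtmlSniffer and the function LooksLikeHtml are hypothetical names chosen for illustration, and the check is a heuristic, not validation.

```vb
Imports System.Text.RegularExpressions

Module HtmlSniffer
    ' Hypothetical helper: returns True when the text contains an <html ...> start tag
    ' that is followed later by an </html> end tag, matched case-insensitively.
    Function LooksLikeHtml(text As String) As Boolean
        ' "<html", then optionally whitespace plus attributes, then ">"
        Dim openTag As Match = Regex.Match(text, "<html(\s[^>]*)?>", RegexOptions.IgnoreCase)
        If Not openTag.Success Then Return False

        ' Look for the end tag only in the text after the start tag
        Dim rest As String = text.Substring(openTag.Index + openTag.Length)
        Return Regex.IsMatch(rest, "</html\s*>", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Upper-case tags with an attribute on the start tag
        Console.WriteLine(LooksLikeHtml("<HTML lang=""en""><body>Hi</body></HTML>")) ' True
        ' Plain XML without an html element
        Console.WriteLine(LooksLikeHtml("<?xml version=""1.0""?><root><item/></root>")) ' False
    End Sub
End Module
```

    To apply it to a file, the whole text would have to be passed in, for example LooksLikeHtml(System.IO.File.ReadAllText(filePath)), because the end tag sits near the end of the file. Two caveats: an XHTML document is XML and also contains html tags, so it is reported as HTML; and HTML5 allows the html tags to be omitted, so a valid HTML file can fail this check.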
    
    

    Best Regards.

    Jiachen Li


