Check if string contains non-ASCII

StewartBW 1,805 Reputation points
2025-03-15T21:52:19.04+00:00

Hello

To check whether a string contains any non-ASCII characters, is this code snippet OK?

Encoding.UTF8.GetByteCount(inString) <> inString.Length = True/False
False: full ANSI string
True: string contains Non-ANSI characters

I'm not sure what exceptions are possible. Is this a safe way to do such a check?

Thanks :)

Developer technologies VB

Accepted answer
  1. Marcin Policht 49,640 Reputation points MVP Volunteer Moderator
    2025-03-16T01:23:02.7566667+00:00

    The code snippet you provided checks if a string contains non-ASCII characters by comparing the byte count of the UTF-8 encoding with the string length. While this approach can work, there are a few things to consider and potential issues:

    1. Character Encoding: UTF-8 encodes each ASCII character (U+0000 to U+007F) as exactly one byte and every other character as two to four bytes. String.Length counts UTF-16 code units, so if the string contains any non-ASCII characters (accented letters, emojis, and so on), Encoding.UTF8.GetByteCount returns a value larger than the string length.
    2. Comparison Logic: Encoding.UTF8.GetByteCount(inString) <> inString.Length is therefore True when the string contains at least one non-ASCII character and False when it is ASCII only. Note that this tests for ASCII, not ANSI: ANSI code pages such as Windows-1252 also include characters above 127, and this check reports those as non-ASCII.
    3. Potential Exceptions: In general, with the default Encoding.UTF8 instance, the GetByteCount method won't throw exceptions unless the string is Nothing (null). If inString is Nothing, it will throw an ArgumentNullException.
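To make the first two points concrete, here is a small sketch that prints the UTF-16 length next to the UTF-8 byte count for a few sample strings. The module name `ByteCountDemo` and the helper `Describe` are made up for this illustration and are not part of the question's code.

```vb
Imports System
Imports System.Text

Module ByteCountDemo
    ' Hypothetical helper: returns "<String.Length>/<UTF-8 byte count>".
    Function Describe(value As String) As String
        Return $"{value.Length}/{Encoding.UTF8.GetByteCount(value)}"
    End Function

    Sub Main()
        ' Pure ASCII: one byte per character, so both numbers match.
        Console.WriteLine(Describe("abc"))
        ' "caf" plus U+00E9 (e with acute accent): the accented letter needs two bytes.
        Console.WriteLine(Describe("caf" & ChrW(&HE9)))
        ' U+20AC (euro sign): one UTF-16 code unit, three UTF-8 bytes.
        Console.WriteLine(Describe(ChrW(&H20AC)))
        ' U+1F600 (an emoji): two UTF-16 code units, four UTF-8 bytes.
        Console.WriteLine(Describe(Char.ConvertFromUtf32(&H1F600)))
    End Sub
End Module
```

Only the first string gives equal numbers, which is why the inequality works as a non-ASCII test.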

    To safely check if a string contains non-ASCII characters and handle potential exceptions, consider this approach:

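    ' Requires Imports System.Text at the top of the file.
    Dim containsNonASCII As Boolean = False

    Try
        ' UTF-8 uses one byte per ASCII character and more for anything else,
        ' so the two counts differ only when a non-ASCII character is present.
        containsNonASCII = Encoding.UTF8.GetByteCount(inString) <> inString.Length
    Catch ex As ArgumentNullException
        ' inString is Nothing; treat it as containing no non-ASCII characters
        containsNonASCII = False
    End Try

    If containsNonASCII Then
        ' String contains non-ASCII characters
    Else
        ' String is ASCII only
    End If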
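As a possible alternative, the same check can be written without Try/Catch by testing for Nothing up front and scanning the characters directly. This is only a sketch: the names `AsciiCheck` and `ContainsNonAscii` are hypothetical, and treating Nothing or an empty string as "no non-ASCII characters" is an assumption that mirrors the Catch branch above.

```vb
Imports System

Module AsciiCheck
    ' Hypothetical helper: True when any UTF-16 code unit is above U+007F.
    ' Assumption: Nothing and "" count as containing no non-ASCII characters.
    Function ContainsNonAscii(value As String) As Boolean
        If String.IsNullOrEmpty(value) Then Return False

        For Each c As Char In value
            ' AscW returns the code unit as an Integer; ASCII ends at 127.
            If AscW(c) > 127 Then Return True
        Next

        Return False
    End Function

    Sub Main()
        Console.WriteLine(ContainsNonAscii("plain text"))
        Console.WriteLine(ContainsNonAscii("na" & ChrW(&HEF) & "ve"))
        Console.WriteLine(ContainsNonAscii(Nothing))
    End Sub
End Module
```

The scan stops at the first non-ASCII character, so it does not need to measure the whole string, and it should agree with the byte-count comparison for any non-null input.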
    

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin
