Is it possible to delete ANSI characters such as Â and â in multiple UTF-8 files with PowerShell?

Nicu F 61 Reputation points
2022-09-23T20:08:30.92+00:00

Hello, I have lots of ANSI characters such as Â and â in multiple UTF-8 files. How can I delete them?

With Notepad++, I tried a regex find-and-replace using Find in Files, but I did not succeed because the files are in UTF-8.

In UTF-8, Â and â look like this, and I cannot copy these symbols to use them in a replacement:

[Screenshot: 244377-image.png]


Accepted answer
  1. Rich Matheisen 47,901 Reputation points
    2022-09-24T21:42:20.933+00:00

    It's important to understand that no matter the encoding of the file, the characters are going to be Unicode characters in PowerShell.

    I didn't find any "Â" (Latin capital letter A with circumflex) or "â" (Latin small letter a with circumflex) in the file (text.txt) you attached to one of your earlier answers. What I did find were the characters "’" (right single quotation mark, Unicode 8217 decimal) and " " (non-breaking space, Unicode 160 decimal).
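    To see exactly which characters a file contains, a small helper can list every character above code point 127 together with its position and decimal value. Get-NonAsciiChar is a hypothetical name used only for this sketch, and it assumes the file is read with -Encoding UTF8 so that PowerShell decodes it correctly.

```powershell
# Hypothetical helper: report every character above code point 127 in a string
function Get-NonAsciiChar {
    param([string] $Text)
    for ($i = 0; $i -lt $Text.Length; $i++) {
        $code = [int]$Text[$i]
        if ($code -gt 127) {
            [pscustomobject]@{
                Position  = $i
                Character = $Text[$i]
                CodePoint = $code                  # decimal value, e.g. 8217 or 160
                Hex       = 'U+{0:X4}' -f $code    # same value in U+ notation
            }
        }
    }
}

# Example usage: read the whole file as UTF-8, then list its non-ASCII characters
# Get-NonAsciiChar -Text (Get-Content c:\junk\text.txt -Raw -Encoding UTF8) | Format-Table
```

    Characters outside the Basic Multilingual Plane (emoji, for instance) show up as two surrogate entries, because PowerShell strings are UTF-16.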

    Here's some code that makes replacing the extended (non-ASCII) characters a little easier, without stringing together long sequences of (.NET) Replace calls or PowerShell -creplace operators (it's important to do the comparison in a case-sensitive manner). Just add the decimal value (cast as a [char]) to the $ExtendedAsciiReplacements hash as a key, and provide the character you want to use as the replacement as that key's value.

    # decimal code points of Unicode characters
    $ExtendedAsciiReplacements = @{
        ([char]160)     = " "    # Non-breaking space
        ([char]194)     = "A"    # Â = Latin capital letter A with circumflex
        ([char]226)     = "a"    # â = Latin small letter a with circumflex
        ([char]8216)    = "'"    # ‘ = Left single quotation mark
        ([char]8217)    = "'"    # ’ = Right single quotation mark
        ([char]8220)    = '"'    # “ = Left double quotation mark
        ([char]8221)    = '"'    # ” = Right double quotation mark
    }

    # -Encoding UTF8: Windows PowerShell 5.1 otherwise reads a UTF-8 file without a BOM as ANSI
    $x = Get-Content c:\junk\text.txt -Raw -Encoding UTF8
    $Replacement = [System.Collections.ArrayList]::new($x.Length)
    for ($i = 0; $i -lt $x.Length; $i++){
        # Get the characters and their location
        # if their value lies above decimal 127 (i.e., they're outside the ASCII range).
        # To replace those characters, add them to the $ExtendedAsciiReplacements hash. Find
        # the characters in the Unicode code point charts found on the web.
        # Uncomment the 3 lines below to enable this behavior
    #    if ([int][char]$x[$i] -gt 127){
    #        Write-Host "Found $($x[$i]) ($([int][char]$x[$i])) at position $i"
    #    }
        # stop uncommenting lines
        if ( $ExtendedAsciiReplacements.ContainsKey($x[$i]) ){
            $Replacement.Add($ExtendedAsciiReplacements[$x[$i]]) | Out-Null
        }
        else {
            $Replacement.Add($x[$i]) | Out-Null
        }
    }

    # Join the characters back into one string and save it (the output path is only an example)
    $Result = -join $Replacement
    Set-Content -Path c:\junk\text-replaced.txt -Value $Result -Encoding UTF8 -NoNewline
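    To run the same hash-table lookup over every file in a folder (the original question is about multiple files), the loop can be wrapped in two functions. Convert-ExtendedChar and Update-ExtendedCharInFile are hypothetical names for this sketch; it swaps the ArrayList for a StringBuilder, assumes the files are *.txt and UTF-8, and supports -WhatIf so you can preview which files would change before anything is written.

```powershell
# Hypothetical helper: same [char] -> replacement lookup as the loop above
function Convert-ExtendedChar {
    param(
        [string] $Text,                                  # $null is bound as ''
        [Parameter(Mandatory)] [hashtable] $Map
    )
    $sb = [System.Text.StringBuilder]::new($Text.Length)
    foreach ($ch in $Text.ToCharArray()) {
        if ($Map.ContainsKey($ch)) {
            [void]$sb.Append($Map[$ch])                  # replacement string
        }
        else {
            [void]$sb.Append($ch)                        # keep the original character
        }
    }
    $sb.ToString()
}

# Hypothetical helper: apply Convert-ExtendedChar to every .txt file in a folder
function Update-ExtendedCharInFile {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $Folder,
        [Parameter(Mandatory)] [hashtable] $Map
    )
    foreach ($file in Get-ChildItem -Path $Folder -Filter *.txt -File) {
        $text  = [string](Get-Content -LiteralPath $file.FullName -Raw -Encoding UTF8)
        $fixed = Convert-ExtendedChar -Text $text -Map $Map
        # Only rewrite files that actually changed; -WhatIf makes ShouldProcess return $false
        if (-not [string]::Equals($fixed, $text) -and
            $PSCmdlet.ShouldProcess($file.FullName, 'Replace extended characters')) {
            Set-Content -LiteralPath $file.FullName -Value $fixed -Encoding UTF8 -NoNewline
        }
    }
}
```

    For example, `Update-ExtendedCharInFile -Folder 'C:\Folder3\translated\1' -Map $ExtendedAsciiReplacements -WhatIf` previews the changes for the folder used later in this thread; drop -WhatIf to write them. Note that -Encoding UTF8 writes a BOM in Windows PowerShell 5.1 but not in PowerShell 7.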
    

10 additional answers

  1. Andreas Baumgarten 123.4K Reputation points MVP Volunteer Moderator
    2022-09-23T20:27:32.383+00:00

    Hi @Nicu F,

    My text file content:

    test  
    something  
    Âsomething | âsomething  
    something else  
    

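    Maybe this script helps you get started:

    # -Encoding UTF8 so Windows PowerShell 5.1 does not read a UTF-8 file without a BOM as ANSI
    $a = Get-Content -Path "Junk\test.txt" -Encoding UTF8
    $a.Replace("Â","A").Replace("â","a") | Out-File -FilePath "Junk\textreplaced.txt" -Encoding utf8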
    

    Result in textreplaced.txt:

    test  
    something  
    Asomething | asomething  
    something else  
    

    For multiple txt files in a folder, you can try this (the content of each file will be modified once -WhatIf is removed!).
    Just adjust -Path to your requirements, for instance "Junk\*.txt" for all txt files.

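    Get-ChildItem -File -Path "Junk\test.txt" | ForEach-Object {
        # -WhatIf only previews the change; remove it to actually write the files
        (Get-Content -Path $_.FullName -Encoding UTF8).Replace("Â", "A").Replace("â", "a") | Set-Content $_.FullName -Encoding utf8 -WhatIf
    }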
    

    If this doesn't work for you, maybe you could attach a txt file with your content. This way it's easier to find a solution.

    ----------

    (If the reply was helpful please don't forget to upvote and/or accept as answer, thank you)

    Regards
    Andreas Baumgarten


  2. Nicu F 61 Reputation points
    2022-09-23T20:45:25.283+00:00

    Hello, sir. After making the replacement, I get this text:

    test  
     something  
     Âsomething | âsomething  
     something else  
    

    So, basically, it changed, but not as I wished. They are still ANSI characters.

    Besides, I think it is better to replace both characters with an empty space, such as:

     $a = Get-Content -Path "c:\Folder3\translated\1\test.txt"  
     $a.Replace("Â"," ").Replace("â"," ") | Out-File -FilePath "c:\Folder3\translated\1\textrelaced.txt" -Encoding utf8  
    

    But the output is the same.
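    One possible reason the replacement seems to do nothing (an assumption, not confirmed in this thread): Windows PowerShell 5.1 reads a .ps1 file saved as UTF-8 without a BOM using the ANSI code page, so a literal "Â" typed into the script becomes two different characters and never matches the text. This sketch avoids typing the characters at all by building them from their code points ([char]0x00C2 and [char]0x00E2) and replaces both with a space, as suggested above. The sample line is a stand-in for file content read with -Encoding UTF8.

```powershell
# Build the characters from their code points instead of typing them,
# so the script does not depend on the encoding the .ps1 file is saved in.
$CapitalACircumflex = [string][char]0x00C2   # Â
$SmallACircumflex   = [string][char]0x00E2   # â

# Stand-in for a line read with: Get-Content -Path <file> -Encoding UTF8
$line = "${CapitalACircumflex}something | ${SmallACircumflex}something"

# String.Replace is ordinal (case-sensitive), so Â and â are handled separately
$fixed = $line.Replace($CapitalACircumflex, ' ').Replace($SmallACircumflex, ' ')
$fixed   # ' something |  something'
```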


  3. Nicu F 61 Reputation points
    2022-09-24T07:16:07.56+00:00

    Your example is very good, except that the file must be ANSI to see these characters.

    In the example below, the characters on the left are visible only in ANSI; the characters on the right are the corresponding characters in UTF-8:

    ’s = ’
    â = â
      = empty space

    The problem is that my files are in UTF-8, so your replacement works only if I have ANSI files.

    This example works with your formula (but only in ANSI files). But I have UTF-8 files, and those characters are not visible.

    home’s
    test
    something
    Âsomething | âsomething
    something else
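    The ANSI/UTF-8 difference described above can be reproduced in a few lines. This sketch assumes the "ANSI" code page is Windows-1252 (common on Western-European Windows): it encodes correct text to UTF-8 bytes, as they are stored in the file, and then decodes those bytes as Windows-1252, as an ANSI viewer would. ’ (U+2019) turns into ’, and a non-breaking space (U+00A0) turns into Â followed by a non-breaking space, which matches the characters in the example. The last step (encode back to Windows-1252, decode as UTF-8) is a different technique from the replacements used in this thread, and it only makes sense if a file was actually saved in this garbled form.

```powershell
# Assumption: the "ANSI" code page is Windows-1252.
$utf8 = [System.Text.Encoding]::UTF8
$ansi = [System.Text.Encoding]::GetEncoding(1252)

# Correct text: right single quotation mark (U+2019) and non-breaking space (U+00A0)
$original = "home$([char]0x2019)s and$([char]0x00A0)more"

# Bytes as stored in a UTF-8 file, then interpreted by a viewer that assumes ANSI
$bytes   = $utf8.GetBytes($original)
$garbled = $ansi.GetString($bytes)
Write-Output $garbled   # home’s andÂ more  (the gap after Â is a non-breaking space)

# Undoing that mistake: recover the same bytes, then decode them as UTF-8
$repaired = $utf8.GetString($ansi.GetBytes($garbled))
Write-Output ($repaired -ceq $original)   # True
```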


  4. Andreas Baumgarten 123.4K Reputation points MVP Volunteer Moderator
    2022-09-24T07:25:06.487+00:00

    Hi @Nicu F,

    Maybe you could attach a source txt file with your content, in the format you have. Based on your file, it's easier to find a solution.
    The content can be anonymized if it contains personal details, but the file should contain the special characters you want to replace and be in the same format as yours.
