PowerShell - Get-Content: Asian characters show as question marks

John Rounds
2021-07-07T19:30:05.217+00:00

Hi,
I'm using PowerShell to read an HTML file and send it as the body of an email.
Everything (images, etc.) works fine except Asian characters, which show up as question marks.
I'm trying this:
$body = Get-Content -Path $bodyPath -Raw -Encoding Unicode

If I open the HTML file directly, the characters show up fine.

Any suggestions would be appreciated.
Thanks!

Windows Server PowerShell

4 answers

  1. Andreas Baumgarten (MVP)
    2021-07-07T19:39:50.267+00:00

    Hi @John Rounds,

    Could you please try -Encoding UTF8 instead?
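    A minimal sketch of why the -Encoding value matters, assuming the file on disk is UTF-8 without a BOM (the thread does not confirm this). It writes a small temporary HTML file (the file name and contents are made up for illustration) and reads it back once with -Encoding Unicode, as in the question, and once with -Encoding UTF8. The CJK characters are built from [char] code points so the script itself stays ASCII, which avoids Windows PowerShell 5.1 misreading a BOM-less .ps1 file.

    ```powershell
    # Write a small HTML file as UTF-8 without a BOM. Path and contents are
    # illustrative; the CJK text is U+65E5 U+672C U+8A9E.
    $path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo.html'
    $html = '<html><body>' + [char]0x65E5 + [char]0x672C + [char]0x8A9E + '</body></html>'
    [System.IO.File]::WriteAllText($path, $html, (New-Object System.Text.UTF8Encoding($false)))

    # -Encoding Unicode (as in the question) decodes the bytes as UTF-16 and garbles them.
    $wrong = Get-Content -Path $path -Raw -Encoding Unicode
    # -Encoding UTF8 matches how the file was written, so the text comes back intact.
    $right = Get-Content -Path $path -Raw -Encoding UTF8

    "Unicode read matches: $($wrong -ceq $html)"
    "UTF8 read matches:    $($right -ceq $html)"
    Remove-Item -LiteralPath $path
    ```

    If the UTF8 read is correct but the email still shows question marks, the problem is further down the pipeline, in how the message is encoded when it is sent.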


    Regards
    Andreas Baumgarten

  2. Michael Taylor
    2021-07-07T19:43:18.383+00:00

    Is the HTML actually Unicode? Unicode can represent almost every written language, but a file isn't necessarily encoded in it just because its text is in a non-English language. MBCS (multi-byte character set) encodings have been around for decades and are also used for non-English text. I should also point out that Unicode is not a single format: what PowerShell calls "Unicode" is actually UTF-16, and UTF-8 and UTF-32 are also available. If you pick the wrong one, it'll mangle the text.
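    To illustrate that "Unicode" is not one byte format, this sketch prints the bytes of a single character, U+65E5, under the three encodings whose names match PowerShell's -Encoding values UTF8, Unicode and UTF32, then decodes the UTF-8 bytes as if they were UTF-16 to show the kind of mangling a mismatch produces.

    ```powershell
    # Bytes of one CJK character (U+65E5) in three encodings. The keys match
    # PowerShell's -Encoding names; "Unicode" means UTF-16 little-endian.
    $s = [string][char]0x65E5
    $encodings = [ordered]@{
        UTF8    = [System.Text.Encoding]::UTF8
        Unicode = [System.Text.Encoding]::Unicode
        UTF32   = [System.Text.Encoding]::UTF32
    }
    $hex = @{}
    foreach ($name in $encodings.Keys) {
        $bytes = $encodings[$name].GetBytes($s)
        $hex[$name] = ($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
        '{0,-8} {1}' -f $name, $hex[$name]
    }

    # Decoding UTF-8 bytes as if they were UTF-16 does not give the character back.
    $garbled = [System.Text.Encoding]::Unicode.GetString([System.Text.Encoding]::UTF8.GetBytes($s))
    "Round trip OK: $($garbled -ceq $s)"
    ```

    Expected bytes: UTF8 is E6 97 A5, Unicode is E5 65, and UTF32 is E5 65 00 00. Same character, three different byte sequences, so the reader has to be told the right one.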

    For an HTML document, the character set is supposed to be declared in a meta tag in the head of the HTML. Browsers use this declaration to decide how to decode the page's bytes. Look at the meta element and use the corresponding encoding when getting the content. For example, if the charset is set to utf-8, use utf8 (or possibly utf8BOM) as documented here.
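    A minimal sketch of that approach. Read-HtmlWithDeclaredCharset is a hypothetical helper, not a built-in cmdlet. It assumes the charset is declared in the first meta tag that mentions one and that .NET recognizes the charset name; when nothing is declared it falls back to UTF-8, which is an assumption rather than anything the thread specifies. A byte-order mark, if the file has one, takes precedence over the declared charset.

    ```powershell
    # Hypothetical helper, not a built-in cmdlet: read an HTML file using the
    # charset declared in its <meta> tag, falling back to UTF-8 (an assumption).
    function Read-HtmlWithDeclaredCharset {
        param([Parameter(Mandatory)][string]$Path)

        $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
        $bytes = [System.IO.File]::ReadAllBytes($fullPath)

        # The charset declaration itself is ASCII, so an ASCII view of the bytes
        # is enough to find it (non-ASCII bytes just become '?').
        $sniff = [System.Text.Encoding]::ASCII.GetString($bytes)

        $charset = 'utf-8'
        # Matches both <meta charset="utf-8"> and
        # <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
        if ($sniff -match '<meta[^>]*charset\s*=\s*["'']?\s*([A-Za-z0-9_.:-]+)') {
            $charset = $Matches[1]
        }

        # Throws if .NET does not recognize the charset name.
        $encoding = [System.Text.Encoding]::GetEncoding($charset)

        # detectEncodingFromByteOrderMarks = $true: a BOM, if present, wins.
        $reader = New-Object System.IO.StreamReader($fullPath, $encoding, $true)
        try { $reader.ReadToEnd() } finally { $reader.Dispose() }
    }

    # Usage with the variable from the question:
    # $body = Read-HtmlWithDeclaredCharset -Path $bodyPath
    ```

    A UTF-16 file without a BOM will not match the ASCII-based search and falls back to UTF-8, so this sketch only suits ASCII-compatible encodings or files that carry a BOM.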

    If your encoding matches the HTML charset, look at the text as it appears in PowerShell to see whether it is correct there. If it is, the issue is with how the message is sent and/or how the mail client renders it.
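    If the text looks right in PowerShell, the question marks point to the sending step. This sketch shows the mechanism: when .NET encodes characters that the target encoding cannot represent, such as CJK text in ASCII, its default encoder fallback writes '?' for each one. The Send-MailMessage call at the end is commented out and assumes the mail is sent with that cmdlet, which the thread does not state; the server and addresses are placeholders. -BodyAsHtml and -Encoding are real Send-MailMessage parameters.

    ```powershell
    # Text that is correct in PowerShell (U+65E5 U+672C U+8A9E after the prefix).
    $text = 'Test: ' + [char]0x65E5 + [char]0x672C + [char]0x8A9E

    # ASCII cannot represent CJK characters; .NET's default encoder fallback
    # substitutes '?' for each one, which matches the symptom in the question.
    $ascii = [System.Text.Encoding]::ASCII
    $asciiRoundTrip = $ascii.GetString($ascii.GetBytes($text))

    # UTF-8 can represent them, so the round trip is lossless.
    $utf8 = [System.Text.Encoding]::UTF8
    $utf8RoundTrip = $utf8.GetString($utf8.GetBytes($text))

    "ASCII: $asciiRoundTrip"
    "UTF-8 matches original: $($utf8RoundTrip -ceq $text)"

    # Assumed sending step (placeholder server and addresses):
    # Send-MailMessage -SmtpServer 'smtp.example.com' -From 'sender@example.com' `
    #     -To 'recipient@example.com' -Subject 'Report' `
    #     -Body $body -BodyAsHtml -Encoding ([System.Text.Encoding]::UTF8)
    ```

    The ASCII line prints "Test: ???", the same pattern as in the email, while the UTF-8 round trip is unchanged.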

  3. John Rounds
    2021-07-07T19:57:52.337+00:00

    Thanks!
    I've tried both: utf8 didn't change anything, and utf8BOM failed with the error "the argument is null or empty".

    I'll check what the text looks like in PowerShell.

    This same HTML file was used by a previous app that sent it through email. We're moving away from that app and sending it ourselves with PowerShell. So I 'assume' the HTML file is OK, since it's the same file, but I'll check it before it goes out in the email.
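    One way to do that check without relying on how the console renders CJK text: two hypothetical helpers (not built-in cmdlets), one reporting which byte-order mark the file starts with and one listing the non-ASCII characters in a string as U+XXXX values. The BOM check covers UTF-8 and UTF-16 only; a UTF-32 LE BOM (FF FE 00 00) would be reported as UTF-16 LE.

    ```powershell
    # Hypothetical helper: report which byte-order mark, if any, a file starts with.
    function Get-FileBom {
        param([Parameter(Mandatory)][string]$Path)
        $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
        $bytes = [System.IO.File]::ReadAllBytes($fullPath)
        if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) { return 'UTF-8 BOM' }
        if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) { return 'UTF-16 LE BOM' }
        if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) { return 'UTF-16 BE BOM' }
        return 'No BOM'
    }

    # Hypothetical helper: list non-ASCII UTF-16 code units as U+XXXX
    # (characters outside the BMP show up as two surrogate values).
    function Get-NonAsciiCodePoints {
        param([Parameter(Mandatory)][string]$Text)
        $Text.ToCharArray() | Where-Object { [int]$_ -gt 127 } | ForEach-Object { 'U+{0:X4}' -f [int]$_ }
    }

    # Usage with the variable from the question:
    # Get-FileBom -Path $bodyPath
    # Get-NonAsciiCodePoints -Text (Get-Content -Path $bodyPath -Raw -Encoding UTF8)
    ```

    If the code points listed match the intended characters, the file was read correctly and any remaining question marks come from the sending step.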

    Appreciate the suggestions.

  4. Ian Xue (Shanghai Wicresoft Co., Ltd.), Microsoft Vendor
    2021-07-08T02:42:38.327+00:00

    Hi,

    It could be a font issue. Try setting the PowerShell console font to SimSun-ExtB. This only changes how the characters are displayed in the console.

    Best Regards,
    Ian Xue
