Fixing the Charset on MIME Messages When Using ADODB.Stream.ReadText

We recently had an issue where a customer had written an application to process MIME messages whose body parts were encoded with the koi8-r charset… or at least that’s what the header said:

------_=_NextPart_001_01C9F067.D9742103
Content-Type: text/plain;
    charset="koi8-r"
Content-Transfer-Encoding: quoted-printable

With code similar to the following, the characters returned by ReadText were not correct.

private string ReadBodyPart()
{
    CDO.IDataSource oIDsrc;
    ADODB.Stream objBpStream = new ADODB.Stream();
    CDO.Message objCDOMsg = new CDO.Message();
    CDO.IBodyPart oIBodyPart;

    // Load the raw .eml file into an ADO stream.
    objBpStream.Open(vtMissing,
        ADODB.ConnectModeEnum.adModeUnknown,
        ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified,
        string.Empty,
        string.Empty);
    objBpStream.LoadFromFile(EML_FILE_PATH);

    // Hand the stream to CDO so it parses the MIME message.
    oIDsrc = (CDO.IDataSource)objCDOMsg;
    oIDsrc.OpenObject(objBpStream, "_Stream");
    objBpStream.Close();

    // Read the decoded content of the text body part as text.
    oIBodyPart = (CDO.IBodyPart)objCDOMsg.TextBodyPart;
    objBpStream = (ADODB.Stream)oIBodyPart.GetDecodedContentStream();
    return objBpStream.ReadText(-1);
}

The problem is that koi8-r is the body name for both “windows-1251” (code page 1251) and “koi8-r” (code page 20866), so the name alone does not tell you which code page to decode with. I took a look at what the TextBody and HtmlBody properties do, because they return the characters correctly. It turns out that when the message is loaded, CDO calls GetCharsetInfo and passes in the value defined in the header. This returns a structure containing the uiCodePage that TextBody and HtmlBody use later.
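
The ambiguity can be seen directly from the Encoding class. The following is a minimal sketch, assuming the .NET Framework; the class name CharsetAmbiguityDemo and the Describe helper are made up for illustration. It prints the code page, Windows code page, and body name for each of the two encodings.

```csharp
using System;
using System.Text;

static class CharsetAmbiguityDemo
{
    // Hypothetical helper: formats the properties that matter for this problem.
    public static string Describe(Encoding e)
    {
        return string.Format("{0}: CodePage={1}, WindowsCodePage={2}, BodyName={3}",
            e.WebName, e.CodePage, e.WindowsCodePage, e.BodyName);
    }

    static void Main()
    {
        // On .NET Core / .NET 5+, code-page encodings must be registered first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Console.WriteLine(Describe(Encoding.GetEncoding("koi8-r")));
        Console.WriteLine(Describe(Encoding.GetEncoding(1251)));
    }
}
```

If the two lines report the same BodyName but different CodePage values, that is the mismatch described above.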

Fortunately, the .NET Framework provides methods to do this as well, so I use them to correct the charset I’m given.

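// Requires: using System; using System.Text;
public static string GetCharSetName(string bodyName)
{
    try
    {
        // Map the declared charset to its Windows code page, then return
        // the web name of that code page.
        int iCP = Encoding.GetEncoding(bodyName).WindowsCodePage;
        return Encoding.GetEncoding(iCP).WebName;
    }
    catch (ArgumentException)
    {
        // GetEncoding throws for an unknown name instead of returning null,
        // so fall back to the charset we were given.
        return bodyName;
    }
}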

Then I set the stream’s Charset to the result of that function just before my call to ReadText, and the text decodes correctly.

private string ReadBodyPart()
{
    CDO.IDataSource oIDsrc;
    ADODB.Stream objBpStream = new ADODB.Stream();
    CDO.Message objCDOMsg = new CDO.Message();
    CDO.IBodyPart oIBodyPart;

    // Load the raw .eml file into an ADO stream.
    objBpStream.Open(vtMissing,
        ADODB.ConnectModeEnum.adModeUnknown,
        ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified,
        string.Empty,
        string.Empty);
    objBpStream.LoadFromFile(EML_FILE_PATH);

    // Hand the stream to CDO so it parses the MIME message.
    oIDsrc = (CDO.IDataSource)objCDOMsg;
    oIDsrc.OpenObject(objBpStream, "_Stream");
    objBpStream.Close();

    oIBodyPart = (CDO.IBodyPart)objCDOMsg.TextBodyPart;
    objBpStream = (ADODB.Stream)oIBodyPart.GetDecodedContentStream();

    // Correct the charset before reading the decoded content as text.
    objBpStream.Charset = GetCharSetName(objBpStream.Charset);
    return objBpStream.ReadText(-1);
}
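
To see the effect of the charset correction without CDO or an .eml file, here is a small sketch that works on raw bytes. It assumes the .NET Framework and a hypothetical sender that writes windows-1251 bytes but labels them koi8-r; the class name CharsetFixDemo and the Decode helper are illustrative only, and the mapping is repeated in its simplest form so the sketch compiles on its own.

```csharp
using System;
using System.Text;

static class CharsetFixDemo
{
    // Same mapping as GetCharSetName in the article, without the fallback.
    public static string GetCharSetName(string bodyName)
    {
        int codePage = Encoding.GetEncoding(bodyName).WindowsCodePage;
        return Encoding.GetEncoding(codePage).WebName;
    }

    // Hypothetical helper: decodes raw bytes using the given charset name.
    public static string Decode(byte[] raw, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(raw);
    }

    static void Main()
    {
        // On .NET Core / .NET 5+, code-page encodings must be registered first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // A short Russian word, written with escapes to avoid source-encoding issues.
        string original = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Assumed scenario: bytes are windows-1251, but the header says koi8-r.
        byte[] raw = Encoding.GetEncoding(1251).GetBytes(original);
        string declared = "koi8-r";

        // Decoding with the declared name versus the corrected name.
        Console.WriteLine(Decode(raw, declared) == original);
        Console.WriteLine(Decode(raw, GetCharSetName(declared)) == original);
    }
}
```

Under those assumptions, only the second comparison should succeed, which mirrors what setting Charset before ReadText does for the ADODB stream.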