Converting text file code pages
I've said "use Unicode" a lot, but sometimes a program doesn't do what you'd expect and writes its output in a different code page. You might also run into a text file that was created using the system code page of a different machine. (For example, if someone emailed me a .txt file from a computer with a Russian system locale, I wouldn't necessarily be able to make sense of it at first.)
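To make that concrete, here's a rough sketch of what reading a file with the wrong code page does to the text. The class and method names (CodePageDemo, Reinterpret) are made up for illustration, and the choice of 1251 (Cyrillic) and 1252 (Western European) is just an assumed example pairing. The RegisterProvider line assumes a recent .NET (Core 3.0 or later); on the classic .NET Framework the code pages are built in and that line should be removed.

```csharp
using System;
using System.Text;

class CodePageDemo
{
    // Hypothetical helper: encode text with one code page, then decode
    // the very same bytes as if they had been written in another one.
    public static string Reinterpret(string text, int writtenAs, int readAs)
    {
        byte[] bytes = Encoding.GetEncoding(writtenAs).GetBytes(text);
        return Encoding.GetEncoding(readAs).GetString(bytes);
    }

    static void Main()
    {
        // Assumption: modern .NET, where legacy code pages must be registered.
        // On .NET Framework, delete this line.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Let the console show non-ASCII characters where the terminal supports it.
        Console.OutputEncoding = Encoding.UTF8;

        // "Привет" written as windows-1251 but read as windows-1252
        // comes back as the string "Ïðèâåò".
        Console.WriteLine(Reinterpret("Привет", 1251, 1252));
    }
}
```

The bytes never change here; only the code page used to interpret them does, which is why converting the file with the right input code page recovers the text.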
So, if you happen to have a text file in one encoding that you need to be able to read, you can write a little program to convert it. Or, if you find this blog post, you could even copy my little program to do that:
using System;
using System.IO;
using System.Text;

class Convert
{
    static void Main(string[] args)
    {
        if (args.Length != 3)
        {
            Console.WriteLine("Usage: convert.exe infile.txt outfile.txt incodepage");
            Console.WriteLine(" eg: convert data.1252.txt data.utf8.txt 1252");
            Console.WriteLine(" or: convert data.1252.txt data.utf8.txt windows-1252");
            Console.WriteLine(" (output is always UTF-8)");
            return;
        }

        // The input code page can be a number (1252) or a name (windows-1252).
        int codepage = 0;
        Encoding enc;
        if (int.TryParse(args[2], out codepage))
        {
            enc = Encoding.GetEncoding(codepage);
        }
        else
        {
            enc = Encoding.GetEncoding(args[2]);
        }

        // Read with the input code page, write everything back out as UTF-8.
        StreamReader reader = new StreamReader(args[0], enc);
        StreamWriter writer = new StreamWriter(args[1], false, Encoding.UTF8);

        String str;
        while ((str = reader.ReadLine()) != null)
        {
            writer.WriteLine(str);
        }

        writer.Close();
        reader.Close();
    }
}
I've stuck the source and a compiled version in convert.zip.
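If you'd rather not call Close yourself, the same conversion can be sketched with using blocks, which dispose the reader and writer even if an exception is thrown partway through. This is only a variant under a few assumptions, not what's in the zip: the class name ConvertWithUsing and the helper ToUtf8 are invented for illustration, and the RegisterProvider line assumes a recent .NET (Core 3.0 or later), where legacy code pages such as 1252 aren't available until registered. On the classic .NET Framework that line should be removed.

```csharp
using System;
using System.IO;
using System.Text;

class ConvertWithUsing
{
    // Hypothetical helper: reads inPath using the given source encoding
    // and writes the same lines to outPath as UTF-8.
    public static void ToUtf8(string inPath, string outPath, Encoding source)
    {
        // Both streams are closed when the using block exits, even on error.
        using (StreamReader reader = new StreamReader(inPath, source))
        using (StreamWriter writer = new StreamWriter(outPath, false, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 3)
        {
            Console.WriteLine("Usage: convert.exe infile.txt outfile.txt incodepage");
            return;
        }

        // Assumption: modern .NET. On .NET Framework, delete this line.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Accept either a code page number (1252) or a name (windows-1252).
        int codepage;
        Encoding enc = int.TryParse(args[2], out codepage)
            ? Encoding.GetEncoding(codepage)
            : Encoding.GetEncoding(args[2]);

        ToUtf8(args[0], args[1], enc);
    }
}
```

Usage is the same as before, for example: convert data.1252.txt data.utf8.txt 1252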
Comments
Anonymous
January 24, 2013
Why not wrap your reader and writer with using statements and remove the Close calls?
Anonymous
January 24, 2013
No reason, just because I didn't do it that way :)
Anonymous
January 24, 2013
Or you could use PowerShell: gc Inputfile.txt | Out-File Outputfile.txt utf8
Anonymous
January 25, 2013
You'd have to do a little more to use random code pages for PowerShell input.