IdnMapping.GetAscii Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Encodes a string of domain name labels that include Unicode characters outside the US-ASCII character range to a string of displayable Unicode characters in the US-ASCII character range (U+0020 to U+007E). The string is formatted according to the IDNA standard.
Overloads
GetAscii(String) |
Encodes a string of domain name labels that consist of Unicode characters to a string of displayable Unicode characters in the US-ASCII character range. The string is formatted according to the IDNA standard. |
GetAscii(String, Int32) |
Encodes a substring of domain name labels that include Unicode characters outside the US-ASCII character range. The substring is converted to a string of displayable Unicode characters in the US-ASCII character range and is formatted according to the IDNA standard. |
GetAscii(String, Int32, Int32) |
Encodes the specified number of characters in a substring of domain name labels that include Unicode characters outside the US-ASCII character range. The substring is converted to a string of displayable Unicode characters in the US-ASCII character range and is formatted according to the IDNA standard. |
GetAscii(String)
- Source:
- IdnMapping.cs
- Source:
- IdnMapping.cs
- Source:
- IdnMapping.cs
Encodes a string of domain name labels that consist of Unicode characters to a string of displayable Unicode characters in the US-ASCII character range. The string is formatted according to the IDNA standard.
public:
System::String ^ GetAscii(System::String ^ unicode);
public string GetAscii (string unicode);
member this.GetAscii : string -> string
Public Function GetAscii (unicode As String) As String
Parameters
- unicode
- String
The string to convert, which consists of one or more domain name labels delimited with label separators.
Returns
The equivalent of the string specified by the unicode
parameter, consisting of displayable Unicode characters in the US-ASCII character range (U+0020 to U+007E) and formatted according to the IDNA standard.
Exceptions
unicode
is null
.
unicode
is invalid based on the AllowUnassigned and UseStd3AsciiRules properties, and the IDNA standard.
Examples
The following example uses the GetAscii(String) method to convert an array of internationalized domain names to Punycode, which is an encoded equivalent that consists of characters in the US-ASCII character range. The GetUnicode(String) method then converts the Punycode domain name back into the original domain name, but replaces the original label separators with the standard label separator.
using System;
using System.Globalization;
public class Example
{
public static void Main()
{
string[] names = { "bücher.com", "мойдомен.рф", "παράδειγμα.δοκιμή",
"mycharity\u3002org",
"prose\u0000ware.com", "proseware..com", "a.org",
"my_company.com" };
IdnMapping idn = new IdnMapping();
foreach (var name in names) {
try {
string punyCode = idn.GetAscii(name);
string name2 = idn.GetUnicode(punyCode);
Console.WriteLine("{0} --> {1} --> {2}", name, punyCode, name2);
Console.WriteLine("Original: {0}", ShowCodePoints(name));
Console.WriteLine("Restored: {0}", ShowCodePoints(name2));
}
catch (ArgumentException) {
Console.WriteLine("{0} is not a valid domain name.", name);
}
Console.WriteLine();
}
}
private static string ShowCodePoints(string str1)
{
string output = "";
foreach (var ch in str1)
output += $"U+{(ushort)ch:X4} ";
return output;
}
}
// The example displays the following output:
// bücher.com --> xn--bcher-kva.com --> bücher.com
// Original: U+0062 U+00FC U+0063 U+0068 U+0065 U+0072 U+002E U+0063 U+006F U+006D
// Restored: U+0062 U+00FC U+0063 U+0068 U+0065 U+0072 U+002E U+0063 U+006F U+006D
//
// мойдомен.рф --> xn--d1acklchcc.xn--p1ai --> мойдомен.рф
// Original: U+043C U+043E U+0439 U+0434 U+043E U+043C U+0435 U+043D U+002E U+0440 U+0444
// Restored: U+043C U+043E U+0439 U+0434 U+043E U+043C U+0435 U+043D U+002E U+0440 U+0444
//
// παράδειγμα.δοκιμή --> xn--hxajbheg2az3al.xn--jxalpdlp --> παράδειγμα.δοκιμή
// Original: U+03C0 U+03B1 U+03C1 U+03AC U+03B4 U+03B5 U+03B9 U+03B3 U+03BC U+03B1 U+002E U+03B4 U+03BF U+03BA U+03B9 U+03BC U+03AE
// Restored: U+03C0 U+03B1 U+03C1 U+03AC U+03B4 U+03B5 U+03B9 U+03B3 U+03BC U+03B1 U+002E U+03B4 U+03BF U+03BA U+03B9 U+03BC U+03AE
//
// mycharity。org --> mycharity.org --> mycharity.org
// Original: U+006D U+0079 U+0063 U+0068 U+0061 U+0072 U+0069 U+0074 U+0079 U+3002 U+006F U+0072 U+0067
// Restored: U+006D U+0079 U+0063 U+0068 U+0061 U+0072 U+0069 U+0074 U+0079 U+002E U+006F U+0072 U+0067
//
// prose ware.com is not a valid domain name.
//
// proseware..com is not a valid domain name.
//
// a.org --> a.org --> a.org
// Original: U+0061 U+002E U+006F U+0072 U+0067
// Restored: U+0061 U+002E U+006F U+0072 U+0067
//
// my_company.com --> my_company.com --> my_company.com
// Original: U+006D U+0079 U+005F U+0063 U+006F U+006D U+0070 U+0061 U+006E U+0079 U+002E U+0063 U+006F U+006D
// Restored: U+006D U+0079 U+005F U+0063 U+006F U+006D U+0070 U+0061 U+006E U+0079 U+002E U+0063 U+006F U+006D
Imports System.Globalization
Module Example
Public Sub Main()
Dim names() As String = { "bücher.com", "мойдомен.рф", "παράδειγμα.δοκιμή",
"mycharity" + ChrW(&h3002) + "org",
"prose" + ChrW(0) + "ware.com", "proseware..com", "a.org",
"my_company.com" }
Dim idn As New IdnMapping()
For Each name In names
Try
Dim punyCode As String = idn.GetAscii(name)
Dim name2 As String = idn.GetUnicode(punyCode)
Console.WriteLine("{0} --> {1} --> {2}", name, punyCode, name2)
Console.WriteLine("Original: {0}", ShowCodePoints(name))
Console.WriteLine("Restored: {0}", ShowCodePoints(name2))
Catch e As ArgumentException
Console.WriteLine("{0} is not a valid domain name.", name)
End Try
Console.WriteLine()
Next
End Sub
Private Function ShowCodePoints(str1 As String) As String
Dim output As String = ""
For Each ch In str1
output += String.Format("U+{0} ", Convert.ToUInt16(ch).ToString("X4"))
Next
Return output
End Function
End Module
' The example displays the following output:
' bücher.com --> xn--bcher-kva.com --> bücher.com
' Original: U+0062 U+00FC U+0063 U+0068 U+0065 U+0072 U+002E U+0063 U+006F U+006D
' Restored: U+0062 U+00FC U+0063 U+0068 U+0065 U+0072 U+002E U+0063 U+006F U+006D
'
' мойдомен.рф --> xn--d1acklchcc.xn--p1ai --> мойдомен.рф
' Original: U+043C U+043E U+0439 U+0434 U+043E U+043C U+0435 U+043D U+002E U+0440 U+0444
' Restored: U+043C U+043E U+0439 U+0434 U+043E U+043C U+0435 U+043D U+002E U+0440 U+0444
'
' παράδειγμα.δοκιμή --> xn--hxajbheg2az3al.xn--jxalpdlp --> παράδειγμα.δοκιμή
' Original: U+03C0 U+03B1 U+03C1 U+03AC U+03B4 U+03B5 U+03B9 U+03B3 U+03BC U+03B1 U+002E U+03B4 U+03BF U+03BA U+03B9 U+03BC U+03AE
' Restored: U+03C0 U+03B1 U+03C1 U+03AC U+03B4 U+03B5 U+03B9 U+03B3 U+03BC U+03B1 U+002E U+03B4 U+03BF U+03BA U+03B9 U+03BC U+03AE
'
' mycharity。org --> mycharity.org --> mycharity.org
' Original: U+006D U+0079 U+0063 U+0068 U+0061 U+0072 U+0069 U+0074 U+0079 U+3002 U+006F U+0072 U+0067
' Restored: U+006D U+0079 U+0063 U+0068 U+0061 U+0072 U+0069 U+0074 U+0079 U+002E U+006F U+0072 U+0067
'
' prose ware.com is not a valid domain name.
'
' proseware..com is not a valid domain name.
'
' a.org --> a.org --> a.org
' Original: U+0061 U+002E U+006F U+0072 U+0067
' Restored: U+0061 U+002E U+006F U+0072 U+0067
'
' my_company.com --> my_company.com --> my_company.com
' Original: U+006D U+0079 U+005F U+0063 U+006F U+006D U+0070 U+0061 U+006E U+0079 U+002E U+0063 U+006F U+006D
' Restored: U+006D U+0079 U+005F U+0063 U+006F U+006D U+0070 U+0061 U+006E U+0079 U+002E U+0063 U+006F U+006D
Remarks
The unicode
parameter specifies a string of one or more labels that consist of valid Unicode characters. The labels are separated by label separators. The unicode
parameter cannot begin with a label separator, but it can include and optionally end with a separator. The label separators are FULL STOP (period, U+002E), IDEOGRAPHIC FULL STOP (U+3002), FULLWIDTH FULL STOP (U+FF0E), and HALFWIDTH IDEOGRAPHIC FULL STOP (U+FF61). For example, the domain name "www.adatum.com" consists of the labels, "www", "adatum", and "com", separated by periods.
A label cannot contain any of the following characters:
Unicode control characters from U+0001 through U+001F, and U+007F.
Unassigned Unicode characters, if the value of the AllowUnassigned property is
false
.Non-standard characters in the US-ASCII character range, such as the SPACE (U+0020), EXCLAMATION MARK (U+0021), and LOW LINE (U+005F) characters, if the value of the UseStd3AsciiRules property is
true
.Characters that are prohibited by a specific version of the IDNA standard. For more information about prohibited characters, see RFC 3454: Preparation of Internationalized Strings ("stringprep") for IDNA 2003, and RFC 5982: The Unicode Code Points and Internationalized Domain Names for Applications for IDNA 2008.
The GetAscii method converts all label separators to FULL STOP (period, U+002E).
If unicode
contains no characters outside the US-ASCII character range and no characters within the US-ASCII character range are prohibited, the method returns unicode
unchanged.
Notes to Callers
In the .NET Framework 4.5, the IdnMapping class supports different versions of the IDNA standard, depending on the operating system in use:
When run on Windows 8, it supports the 2008 version of the IDNA standard outlined by RFC 5891: Internationalized Domain Names in Applications (IDNA): Protocol.
When run on earlier versions of the Windows operating system, it supports the 2003 version of the standard outlined by RFC 3490: Internationalizing Domain Names in Applications (IDNA).
See Unicode Technical Standard #46: IDNA Compatibility Processing for the differences in the way these standards handle particular sets of characters.
Applies to
GetAscii(String, Int32)
- Source:
- IdnMapping.cs
- Source:
- IdnMapping.cs
- Source:
- IdnMapping.cs
Encodes a substring of domain name labels that include Unicode characters outside the US-ASCII character range. The substring is converted to a string of displayable Unicode characters in the US-ASCII character range and is formatted according to the IDNA standard.
public:
System::String ^ GetAscii(System::String ^ unicode, int index);
public string GetAscii (string unicode, int index);
member this.GetAscii : string * int -> string
Public Function GetAscii (unicode As String, index As Integer) As String
Parameters
- unicode
- String
The string to convert, which consists of one or more domain name labels delimited with label separators.
- index
- Int32
A zero-based offset into unicode
that specifies the start of the substring to convert. The conversion operation continues to the end of the unicode
string.
Returns
The equivalent of the substring specified by the unicode
and index
parameters, consisting of displayable Unicode characters in the US-ASCII character range (U+0020 to U+007E) and formatted according to the IDNA standard.
Exceptions
unicode
is null
.
index
is less than zero.
-or-
index
is greater than the length of unicode
.
unicode
is invalid based on the AllowUnassigned and UseStd3AsciiRules properties, and the IDNA standard.
Remarks
The unicode
and index
parameters define a substring with one or more labels that consist of valid Unicode characters. The labels are separated by label separators. The first character of the substring cannot begin with a label separator, but it can include and optionally end with a separator. The label separators are FULL STOP (period, U+002E), IDEOGRAPHIC FULL STOP (U+3002), FULLWIDTH FULL STOP (U+FF0E), and HALFWIDTH IDEOGRAPHIC FULL STOP (U+FF61). For example, the domain name "www.adatum.com" consists of the labels, "www", "adatum", and "com", separated by periods.
A label cannot contain any of the following characters:
Unicode control characters from U+0001 through U+001F, and U+007F.
Unassigned Unicode characters, depending on the value of the AllowUnassigned property.
Non-standard characters in the US-ASCII character range, such as the SPACE (U+0020), EXCLAMATION MARK (U+0021), and LOW LINE (U+005F) characters, depending on the value of the UseStd3AsciiRules property.
Characters that are prohibited by a specific version of the IDNA standard. For more information about prohibited characters, see RFC 3454: Preparation of Internationalized Strings ("stringprep") for IDNA 2003, and RFC 5982: The Unicode Code Points and Internationalized Domain Names for Applications for IDNA 2008.
The GetAscii method converts all label separators to FULL STOP (period, U+002E).
If unicode
contains no characters outside the US-ASCII character range and no characters within the US-ASCII character range are prohibited, the method returns unicode
unchanged.
Notes to Callers
In the .NET Framework 4.5, the IdnMapping class supports different versions of the IDNA standard, depending on the operating system in use:
When run on Windows 8, it supports the 2008 version of the IDNA standard outlined by RFC 5891: Internationalized Domain Names in Applications (IDNA): Protocol.
When run on earlier versions of the Windows operating system, it supports the 2003 version of the standard outlined by RFC 3490: Internationalizing Domain Names in Applications (IDNA).
See Unicode Technical Standard #46: IDNA Compatibility Processing for the differences in the way these standards handle particular sets of characters.
Applies to
GetAscii(String, Int32, Int32)
- Source:
- IdnMapping.cs
- Source:
- IdnMapping.cs
- Source:
- IdnMapping.cs
Encodes the specified number of characters in a substring of domain name labels that include Unicode characters outside the US-ASCII character range. The substring is converted to a string of displayable Unicode characters in the US-ASCII character range and is formatted according to the IDNA standard.
public:
System::String ^ GetAscii(System::String ^ unicode, int index, int count);
public string GetAscii (string unicode, int index, int count);
member this.GetAscii : string * int * int -> string
Public Function GetAscii (unicode As String, index As Integer, count As Integer) As String
Parameters
- unicode
- String
The string to convert, which consists of one or more domain name labels delimited with label separators.
- index
- Int32
A zero-based offset into unicode
that specifies the start of the substring.
- count
- Int32
The number of characters to convert in the substring that starts at the position specified by index
in the unicode
string.
Returns
The equivalent of the substring specified by the unicode
, index
, and count
parameters, consisting of displayable Unicode characters in the US-ASCII character range (U+0020 to U+007E) and formatted according to the IDNA standard.
Exceptions
unicode
is null
.
index
or count
is less than zero.
-or-
index
is greater than the length of unicode
.
-or-
index
is greater than the length of unicode
minus count
.
unicode
is invalid based on the AllowUnassigned and UseStd3AsciiRules properties, and the IDNA standard.
Examples
The following example uses the GetAscii(String, Int32, Int32) method to convert an internationalized domain name to a domain name that complies with the IDNA standard. The GetUnicode(String, Int32, Int32) method then converts the standardized domain name back into the original domain name, but replaces the original label separators with the standard label separator.
// This example demonstrates the GetAscii and GetUnicode methods.
// For sake of illustration, this example uses the most complex
// form of those methods, not the most convenient.
using System;
using System.Globalization;
class Sample
{
public static void Main()
{
/*
Define a domain name consisting of the labels: GREEK SMALL LETTER
PI (U+03C0); IDEOGRAPHIC FULL STOP (U+3002); GREEK SMALL LETTER
THETA (U+03B8); FULLWIDTH FULL STOP (U+FF0E); and "com".
*/
string name = "\u03C0\u3002\u03B8\uFF0Ecom";
string international;
string nonInternational;
string msg1 = "the original non-internationalized \ndomain name:";
string msg2 = "Allow unassigned characters?: {0}";
string msg3 = "Use non-internationalized rules?: {0}";
string msg4 = "Convert the non-internationalized domain name to international format...";
string msg5 = "Display the encoded domain name:\n\"{0}\"";
string msg6 = "the encoded domain name:";
string msg7 = "Convert the internationalized domain name to non-international format...";
string msg8 = "the reconstituted non-internationalized \ndomain name:";
string msg9 = "Visually compare the code points of the reconstituted string to the " +
"original.\n" +
"Note that the reconstituted string contains standard label " +
"separators (U+002e).";
// ----------------------------------------------------------------------------
CodePoints(name, msg1);
// ----------------------------------------------------------------------------
IdnMapping idn = new IdnMapping();
Console.WriteLine(msg2, idn.AllowUnassigned);
Console.WriteLine(msg3, idn.UseStd3AsciiRules);
Console.WriteLine();
// ----------------------------------------------------------------------------
Console.WriteLine(msg4);
international = idn.GetAscii(name, 0, name.Length);
Console.WriteLine(msg5, international);
Console.WriteLine();
CodePoints(international, msg6);
// ----------------------------------------------------------------------------
Console.WriteLine(msg7);
nonInternational = idn.GetUnicode(international, 0, international.Length);
CodePoints(nonInternational, msg8);
Console.WriteLine(msg9);
}
// ----------------------------------------------------------------------------
static void CodePoints(string value, string title)
{
Console.WriteLine("Display the Unicode code points of {0}", title);
foreach (char c in value)
{
Console.Write("{0:x4} ", Convert.ToInt32(c));
}
Console.WriteLine();
Console.WriteLine();
}
}
/*
This code example produces the following results:
Display the Unicode code points of the original non-internationalized
domain name:
03c0 3002 03b8 ff0e 0063 006f 006d
Allow unassigned characters?: False
Use non-internationalized rules?: False
Convert the non-internationalized domain name to international format...
Display the encoded domain name:
"xn--1xa.xn--txa.com"
Display the Unicode code points of the encoded domain name:
0078 006e 002d 002d 0031 0078 0061 002e 0078 006e 002d 002d 0074 0078 0061 002e 0063 006f
006d
Convert the internationalized domain name to non-international format...
Display the Unicode code points of the reconstituted non-internationalized
domain name:
03c0 002e 03b8 002e 0063 006f 006d
Visually compare the code points of the reconstituted string to the original.
Note that the reconstituted string contains standard label separators (U+002e).
*/
' This example demonstrates the GetAscii and GetUnicode methods.
' For sake of illustration, this example uses the most complex
' form of those methods, not the most convenient.
Imports System.Globalization
Class Sample
Public Shared Sub Main()
' Define a domain name consisting of the labels: GREEK SMALL LETTER
' PI (U+03C0); IDEOGRAPHIC FULL STOP (U+3002); GREEK SMALL LETTER
' THETA (U+03B8); FULLWIDTH FULL STOP (U+FF0E); and "com".
Dim name As String = "π。θ.com"
Dim international As String
Dim nonInternational As String
Dim msg1 As String = "the original non-internationalized " & vbCrLf & "domain name:"
Dim msg2 As String = "Allow unassigned characters?: {0}"
Dim msg3 As String = "Use non-internationalized rules?: {0}"
Dim msg4 As String = "Convert the non-internationalized domain name to international format..."
Dim msg5 As String = "Display the encoded domain name:" & vbCrLf & """{0}"""
Dim msg6 As String = "the encoded domain name:"
Dim msg7 As String = "Convert the internationalized domain name to non-international format..."
Dim msg8 As String = "the reconstituted non-internationalized " & vbCrLf & "domain name:"
Dim msg9 As String = "Visually compare the code points of the reconstituted string to the " & _
"original." & vbCrLf & _
"Note that the reconstituted string contains standard label " & _
"separators (U+002e)."
' ----------------------------------------------------------------------------
CodePoints(name, msg1)
' ----------------------------------------------------------------------------
Dim idn As New IdnMapping()
Console.WriteLine(msg2, idn.AllowUnassigned)
Console.WriteLine(msg3, idn.UseStd3AsciiRules)
Console.WriteLine()
' ----------------------------------------------------------------------------
Console.WriteLine(msg4)
international = idn.GetAscii(name, 0, name.Length)
Console.WriteLine(msg5, international)
Console.WriteLine()
CodePoints(international, msg6)
' ----------------------------------------------------------------------------
Console.WriteLine(msg7)
nonInternational = idn.GetUnicode(international, 0, international.Length)
CodePoints(nonInternational, msg8)
Console.WriteLine(msg9)
End Sub
' ----------------------------------------------------------------------------
Shared Sub CodePoints(ByVal value As String, ByVal title As String)
Console.WriteLine("Display the Unicode code points of {0}", title)
Dim c As Char
For Each c In value
Console.Write("{0:x4} ", Convert.ToInt32(c))
Next c
Console.WriteLine()
Console.WriteLine()
End Sub
End Class
'
'This code example produces the following results:
'
'Display the Unicode code points of the original non-internationalized
'domain name:
'03c0 3002 03b8 ff0e 0063 006f 006d
'
'Allow unassigned characters?: False
'Use non-internationalized rules?: False
'
'Convert the non-internationalized domain name to international format...
'Display the encoded domain name:
'"xn--1xa.xn--txa.com"
'
'Display the Unicode code points of the encoded domain name:
'0078 006e 002d 002d 0031 0078 0061 002e 0078 006e 002d 002d 0074 0078 0061 002e 0063 006f
'006d
'
'Convert the internationalized domain name to non-international format...
'Display the Unicode code points of the reconstituted non-internationalized
'domain name:
'03c0 002e 03b8 002e 0063 006f 006d
'
'Visually compare the code points of the reconstituted string to the original.
'Note that the reconstituted string contains standard label separators (U+002e).
'
Remarks
The Unicode
, index
, and count
parameters define a substring with one or more labels that consist of valid Unicode characters. The labels are separated by label separators. The first character of the substring cannot begin with a label separator, but it can include and optionally end with a separator. The label separators are FULL STOP (period, U+002E), IDEOGRAPHIC FULL STOP (U+3002), FULLWIDTH FULL STOP (U+FF0E), and HALFWIDTH IDEOGRAPHIC FULL STOP (U+FF61). For example, the domain name "www.adatum.com" consists of the labels, "www", "adatum", and "com", separated by periods.
A label cannot contain any of the following characters:
Unicode control characters from U+0001 through U+001F, and U+007F.
Unassigned Unicode characters, depending on the value of the AllowUnassigned property.
Non-standard characters in the US-ASCII character range, such as the SPACE (U+0020), EXCLAMATION MARK (U+0021), and LOW LINE (U+005F) characters, depending on the value of the UseStd3AsciiRules property.
Characters that are prohibited by a specific version of the IDNA standard. For more information about prohibited characters, see RFC 3454: Preparation of Internationalized Strings ("stringprep") for IDNA 2003, and RFC 5982: The Unicode Code Points and Internationalized Domain Names for Applications for IDNA 2008.
The GetAscii method converts all label separators to FULL STOP (period, U+002E). If the substring contains no characters outside the US-ASCII character range, and no characters within the US-ASCII character range are prohibited, the method returns the substring unchanged.
Notes to Callers
In the .NET Framework 4.5, the IdnMapping class supports different versions of the IDNA standard, depending on the operating system in use:
When run on Windows 8, it supports the 2008 version of the IDNA standard outlined by RFC 5891: Internationalized Domain Names in Applications (IDNA): Protocol.
When run on earlier versions of the Windows operating system, it supports the 2003 version of the standard outlined by RFC 3490: Internationalizing Domain Names in Applications (IDNA).
See Unicode Technical Standard #46: IDNA Compatibility Processing for the differences in the way these standards handle particular sets of characters.