System.Char struct

Artigo
01/11/2024

Este artigo fornece observações complementares à documentação de referência para essa API.

A Char estrutura representa pontos de código Unicode usando codificação UTF-16. O valor de um Char objeto é seu valor numérico (ordinal) de 16 bits.

Se você não estiver familiarizado com Unicode, valores escalares, pontos de código, pares substitutos, UTF-16 e o Rune tipo, consulte Introdução à codificação de caracteres no .NET.

Este artigo examina a relação entre um objeto e um Char caractere e discute algumas tarefas comuns executadas com Char instâncias. Recomendamos que você considere o Rune tipo, introduzido no .NET Core 3.0, como uma alternativa para Char executar algumas dessas tarefas.

Objetos Char, caracteres Unicode e cadeias de caracteres

Um String objeto é uma coleção sequencial de estruturas que representa uma cadeia de caracteres de Char texto. A maioria dos caracteres Unicode pode ser representada por um único Char objeto, mas um caractere codificado como um caractere base, par substituto e/ou sequência de caracteres combinando é representado por vários Char objetos. Por esse motivo, uma Char estrutura em um objeto não é necessariamente equivalente a um String único caractere Unicode.

Várias unidades de código de 16 bits são usadas para representar caracteres Unicode únicos nos seguintes casos:

Glifos, que podem consistir em um único caractere ou em um caractere base seguido por um ou mais caracteres combinados. Por exemplo, o caractere ä é representado por um objeto cuja unidade de código é U+0061 seguido por um Char Char objeto cuja unidade de código é U+0308. (O caractere ä também pode ser definido por um único Char objeto que tem uma unidade de código de U+00E4.) O exemplo a seguir ilustra que o caractere ä consiste em dois Char objetos.

using System;
using System.IO;

public class Example1
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter("chars1.txt");
        char[] chars = { '\u0061', '\u0308' };
        string strng = new String(chars);
        sw.WriteLine(strng);
        sw.Close();
    }
}
// The example produces the following output:
//       ä

open System
open System.IO

let sw = new StreamWriter("chars1.txt")
let chars = [| '\u0061'; '\u0308' |]
let string = String chars
sw.WriteLine string
sw.Close()

// The example produces the following output:
//       ä

Imports System.IO

Module Example2
    Public Sub Main()
        Dim sw As New StreamWriter("chars1.txt")
        Dim chars() As Char = {ChrW(&H61), ChrW(&H308)}
        Dim strng As New String(chars)
        sw.WriteLine(strng)
        sw.Close()
    End Sub
End Module
' The example produces the following output:
'       ä

Caracteres fora do Plano Multilíngue Básico (BMP) Unicode. O Unicode suporta dezesseis planos, além do BMP, que representa o plano 0. Um ponto de código Unicode é representado em UTF-32 por um valor de 21 bits que inclui o plano. Por exemplo, U+1D160 representa o caractere SÍMBOLO MUSICAL OITAVA NOTA. Como a codificação UTF-16 tem apenas 16 bits, os caracteres fora do BMP são representados por pares substitutos em UTF-16. O exemplo a seguir ilustra que o equivalente UTF-32 de U+1D160, o caractere MUSICAL SYMBOL EIGHTH NOTE, é U+D834 U+DD60. U+D834 é o substituto alto; os substitutos elevados variam de U+D800 a U+DBFF. U+DD60 é o substituto baixo; os substitutos baixos variam de U+DC00 a U+DFFF.

using System;
using System.IO;

public class Example3
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter(@".\chars2.txt");
        int utf32 = 0x1D160;
        string surrogate = Char.ConvertFromUtf32(utf32);
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                     utf32, surrogate, ShowCodePoints(surrogate));
        sw.Close();
    }

    private static string ShowCodePoints(string value)
    {
        string retval = null;
        foreach (var ch in value)
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch));

        return retval.Trim();
    }
}
// The example produces the following output:
//       U+01D160 UTF-32 = ð (U+D834 U+DD60) UTF-16

open System
open System.IO

let showCodePoints (value: char seq) =
    let str =
        value
        |> Seq.map (fun ch -> $"U+{Convert.ToUInt16 ch:X4}")
        |> String.concat ""
    str.Trim()

let sw = new StreamWriter(@".\chars2.txt")
let utf32 = 0x1D160
let surrogate = Char.ConvertFromUtf32 utf32
sw.WriteLine $"U+{utf32:X6} UTF-32 = {surrogate} ({showCodePoints surrogate}) UTF-16"
sw.Close()

// The example produces the following output:
//       U+01D160 UTF-32 = ð (U+D834 U+DD60) UTF-16

Imports System.IO

Module Example4
    Public Sub Main()
        Dim sw As New StreamWriter(".\chars2.txt")
        Dim utf32 As Integer = &H1D160
        Dim surrogate As String = Char.ConvertFromUtf32(utf32)
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                   utf32, surrogate, ShowCodePoints(surrogate))
        sw.Close()
    End Sub

    Private Function ShowCodePoints(value As String) As String
        Dim retval As String = Nothing
        For Each ch In value
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch))
        Next
        Return retval.Trim()
    End Function
End Module
' The example produces the following output:
'       U+01D160 UTF-32 = ð (U+D834 U+DD60) UTF-16

Personagens e categorias de personagens

Cada caractere Unicode ou par substituto válido pertence a uma categoria Unicode. No .NET, as categorias Unicode são representadas por membros da UnicodeCategory enumeração e incluem valores como UnicodeCategory.CurrencySymbol, e UnicodeCategory.SpaceSeparator, UnicodeCategory.LowercaseLetterpor exemplo.

Para determinar a categoria Unicode de um caractere, chame o GetUnicodeCategory método. Por exemplo, o exemplo a seguir chama o GetUnicodeCategory para exibir a categoria Unicode de cada caractere em uma cadeia de caracteres. O exemplo funciona corretamente somente se não houver pares substitutos na String instância.

using System;
using System.Globalization;

class Example
{
   public static void Main()
   {
      // Define a string with a variety of character categories.
      String s = "The red car drove down the long, narrow, secluded road.";
      // Determine the category of each character.
      foreach (var ch in s)
         Console.WriteLine("'{0}': {1}", ch, Char.GetUnicodeCategory(ch));
   }
}
// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

open System

// Define a string with a variety of character categories.
let s = "The red car drove down the long, narrow, secluded road."
// Determine the category of each character.
for ch in s do
    printfn $"'{ch}': {Char.GetUnicodeCategory ch}"

// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

Imports System.Globalization

Module Example1
    Public Sub Main()
        ' Define a string with a variety of character categories.
        Dim s As String = "The car drove down the narrow, secluded road."
        ' Determine the category of each character.
        For Each ch In s
            Console.WriteLine("'{0}': {1}", ch, Char.GetUnicodeCategory(ch))
        Next
    End Sub
End Module
' The example displays the following output:
'       'T': UppercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'c': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'v': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       'n': LowercaseLetter
'       ' ': SpaceSeparator
'       't': LowercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'l': LowercaseLetter
'       'o': LowercaseLetter
'       'n': LowercaseLetter
'       'g': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       'n': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       's': LowercaseLetter
'       'e': LowercaseLetter
'       'c': LowercaseLetter
'       'l': LowercaseLetter
'       'u': LowercaseLetter
'       'd': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'a': LowercaseLetter
'       'd': LowercaseLetter
'       '.': OtherPunctuation

Internamente, para caracteres fora do intervalo ASCII (U+0000 a U+00FF), o GetUnicodeCategory método depende das categorias Unicode relatadas CharUnicodeInfo pela classe. A partir do .NET Framework 4.6.2, os caracteres Unicode são classificados com base no padrão Unicode, versão 8.0.0. Em versões do .NET Framework do .NET Framework 4 para o .NET Framework 4.6.1, eles são classificados com base no padrão Unicode, versão 6.3.0.

Caracteres e elementos de texto

Como um único caractere pode ser representado por vários Char objetos, nem sempre é significativo trabalhar com objetos individuais Char . Por exemplo, o exemplo a seguir converte os pontos de código Unicode que representam os números do Egeu de zero a 9 em unidades de código codificadas UTF-16. Como ele equipara Char erroneamente objetos a caracteres, ele relata incorretamente que a cadeia de caracteres resultante tem 20 caracteres.

using System;

public class Example5
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        Console.WriteLine("The string contains {0} characters.", result.Length);
    }
}
// The example displays the following output:
//     The string contains 20 characters.

open System

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""

printfn $"The string contains {result.Length} characters."


// The example displays the following output:
//     The string contains 20 characters.

Module Example5
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Console.WriteLine("The string contains {0} characters.", result.Length)
    End Sub
End Module
' The example displays the following output:
'     The string contains 20 characters.

Você pode fazer o seguinte para evitar a suposição de que um objeto representa um Char único caractere:

Você pode trabalhar com um String objeto em sua totalidade em vez de trabalhar com seus caracteres individuais para representar e analisar o conteúdo linguístico.

Você pode usar String.EnumerateRunes como mostrado no exemplo a seguir:

int CountLetters(string s)
{
    int letterCount = 0;

    foreach (Rune rune in s.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }

    return letterCount;
}

let countLetters (s: string) =
    let mutable letterCount = 0

    for rune in s.EnumerateRunes() do
        if Rune.IsLetter rune then
            letterCount <- letterCount + 1

    letterCount

Você pode usar a StringInfo classe para trabalhar com elementos de texto em vez de objetos individuais Char . O exemplo a seguir usa o objeto para contar o StringInfo número de elementos de texto em uma cadeia de caracteres que consiste nos números do mar Egeu de zero a nove. Como ele considera um par substituto um único caractere, ele relata corretamente que a cadeia de caracteres contém dez caracteres.

using System;
using System.Globalization;

public class Example4
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        StringInfo si = new StringInfo(result);
        Console.WriteLine("The string contains {0} characters.",
                          si.LengthInTextElements);
    }
}
// The example displays the following output:
//       The string contains 10 characters.

open System
open System.Globalization

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""


let si = StringInfo result
printfn $"The string contains {si.LengthInTextElements} characters."

// The example displays the following output:
//       The string contains 10 characters.

Imports System.Globalization

Module Example6
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Dim si As New StringInfo(result)
        Console.WriteLine("The string contains {0} characters.", si.LengthInTextElements)
    End Sub
End Module
' The example displays the following output:
'       The string contains 10 characters.

Se uma cadeia de caracteres contiver um caractere base que tenha um ou mais caracteres combinados, você poderá chamar o String.Normalize método para converter a subcadeia de caracteres em uma única unidade de código codificada em UTF-16. O exemplo a seguir chama o método para converter o caractere base U+0061 (LATIN SMALL LETTER A) e combinar o String.Normalize caractere U+0308 (COMBINANDO DIAERESIS) em U+00E4 (LATIN SMALL LETTER A COM DIAERESIS).

using System;

public class Example2
{
    public static void Main()
    {
        string combining = "\u0061\u0308";
        ShowString(combining);

        string normalized = combining.Normalize();
        ShowString(normalized);
    }

    private static void ShowString(string s)
    {
        Console.Write("Length of string: {0} (", s.Length);
        for (int ctr = 0; ctr < s.Length; ctr++)
        {
            Console.Write("U+{0:X4}", Convert.ToUInt16(s[ctr]));
            if (ctr != s.Length - 1) Console.Write(" ");
        }
        Console.WriteLine(")\n");
    }
}
// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

open System

let showString (s: string) =
    printf $"Length of string: {s.Length} ("
    for i = 0 to s.Length - 1 do
        printf $"U+{Convert.ToUInt16 s[i]:X4}"
        if i <> s.Length - 1 then printf " "
    printfn ")\n"

let combining = "\u0061\u0308"
showString combining

let normalized = combining.Normalize()
showString normalized

// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

Module Example3
    Public Sub Main()
        Dim combining As String = ChrW(&H61) + ChrW(&H308)
        ShowString(combining)

        Dim normalized As String = combining.Normalize()
        ShowString(normalized)
    End Sub

    Private Sub ShowString(s As String)
        Console.Write("Length of string: {0} (", s.Length)
        For ctr As Integer = 0 To s.Length - 1
            Console.Write("U+{0:X4}", Convert.ToUInt16(s(ctr)))
            If ctr <> s.Length - 1 Then Console.Write(" ")
        Next
        Console.WriteLine(")")
        Console.WriteLine()
    End Sub
End Module
' The example displays the following output:
'       Length of string: 2 (U+0061 U+0308)
'       
'       Length of string: 1 (U+00E4)

Operações comuns

A Char estrutura fornece métodos para comparar Char objetos, converter o valor do objeto atual Char em um objeto de outro tipo e determinar a categoria Unicode de um Char objeto:

Para fazer isso	Use estes `System.Char` métodos
Comparar Char objetos	CompareTo e Equals
Converter um ponto de código em uma cadeia de caracteres	ConvertFromUtf32 Veja também o Rune tipo.
Converter um objeto ou um par substituto de objetos em um Char ponto de Char código	Para um único caractere: Convert.ToInt32(Char) Para um par substituto ou um caractere em uma cadeia de caracteres: Char.ConvertToUtf32 Veja também o Rune tipo.
Obter a categoria Unicode de um caractere	GetUnicodeCategory Consulte também Rune.GetUnicodeCategory.
Determine se um caractere está em uma categoria Unicode específica, como dígito, letra, pontuação, caractere de controle e assim por diante	IsControl, , IsHighSurrogate, , , , , IsNumber IsLower IsLowSurrogate IsPunctuation IsSeparator IsDigit IsSurrogatePair IsLetter IsUpper IsLetterOrDigit IsSurrogate IsSymboleIsWhiteSpace Consulte também os Rune métodos correspondentes no tipo.
Converter um objeto que representa um número em um Char tipo de valor numérico	GetNumericValue Consulte também Rune.GetNumericValue.
Converter um caractere em uma cadeia de caracteres em um Char objeto	Parse e TryParse
Converter um objeto em um Char String objeto	ToString
Alterar maiúsculas e minúsculas de um Char objeto	ToLower, ToLowerInvariant, ToUpper e ToUpperInvariant Consulte também os Rune métodos correspondentes no tipo.

Valores de Char e interoperabilidade

Quando um tipo gerenciado, que é representado como uma unidade de código codificada Unicode UTF-16, é passado para código não gerenciado Char , o marshaller de interoperabilidade converte o conjunto de caracteres em ANSI por padrão. Você pode aplicar o atributo a declarações de invocação de plataforma e o DllImportAttribute atributo a uma declaração de interoperabilidade COM para controlar qual conjunto de StructLayoutAttribute caracteres um tipo empacotado Char usa.

Compartilhar via