System.Char struct

Remarks

This article provides supplementary remarks to the reference documentation for this API.

The Char structure represents Unicode code points by using UTF-16 encoding. The value of a Char object is its 16-bit numeric (ordinal) value.

If you aren't familiar with Unicode, scalar values, code points, surrogate pairs, UTF-16, and the Rune type, see Introduction to character encoding in .NET.

This article examines the relationship between a Char object and a character, and discusses some common tasks performed with Char instances. Consider the Rune type, introduced in .NET Core 3.0, as an alternative to Char for some of these tasks.

Char objects, Unicode characters, and strings

A String object is a sequential collection of Char structures that represents a string of text. Most Unicode characters can be represented by a single Char object, but a character that is encoded as a base character, surrogate pair, and/or combining character sequence is represented by multiple Char objects. For this reason, a Char structure in a String object is not necessarily equivalent to a single Unicode character.

Multiple 16-bit code units are used to represent single Unicode characters in the following cases:

Glyphs, which can consist of a single character or of a base character followed by one or more combining characters. For example, the character ä is represented by a Char object whose code unit is U+0061 followed by a Char object whose code unit is U+0308. The character ä can also be defined by a single Char object that has a code unit of U+00E4. The following example illustrates that the character ä consists of two Char objects.

using System;
using System.IO;

public class Example1
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter("chars1.txt");
        char[] chars = { '\u0061', '\u0308' };
        string strng = new String(chars);
        sw.WriteLine(strng);
        sw.Close();
    }
}
// The example produces the following output:
//       ä

open System
open System.IO

let sw = new StreamWriter("chars1.txt")
let chars = [| '\u0061'; '\u0308' |]
let string = String chars
sw.WriteLine string
sw.Close()

// The example produces the following output:
//       ä

Imports System.IO

Module Example2
    Public Sub Main()
        Dim sw As New StreamWriter("chars1.txt")
        Dim chars() As Char = {ChrW(&H61), ChrW(&H308)}
        Dim strng As New String(chars)
        sw.WriteLine(strng)
        sw.Close()
    End Sub
End Module
' The example produces the following output:
'       ä

Characters outside the Unicode Basic Multilingual Plane (BMP). Unicode supports sixteen planes in addition to the BMP, which represents plane 0. A Unicode code point is represented in UTF-32 by a 21-bit value that includes the plane. For example, U+1D160 represents the MUSICAL SYMBOL EIGHTH NOTE character. Because a UTF-16 code unit has only 16 bits, characters outside the BMP are represented by surrogate pairs in UTF-16. The following example illustrates that the UTF-16 equivalent of U+1D160 is the surrogate pair U+D834 U+DD60. U+D834 is the high surrogate; high surrogates range from U+D800 through U+DBFF. U+DD60 is the low surrogate; low surrogates range from U+DC00 through U+DFFF.

using System;
using System.IO;

public class Example3
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter(@".\chars2.txt");
        int utf32 = 0x1D160;
        string surrogate = Char.ConvertFromUtf32(utf32);
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                     utf32, surrogate, ShowCodePoints(surrogate));
        sw.Close();
    }

    private static string ShowCodePoints(string value)
    {
        string retval = null;
        foreach (var ch in value)
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch));

        return retval.Trim();
    }
}
// The example produces the following output:
//       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16

open System
open System.IO

let showCodePoints (value: char seq) =
    let str =
        value
        |> Seq.map (fun ch -> $"U+{Convert.ToUInt16 ch:X4}")
        |> String.concat " "   // Separate code units with a space to match the output shown below.
    str.Trim()

let sw = new StreamWriter(@".\chars2.txt")
let utf32 = 0x1D160
let surrogate = Char.ConvertFromUtf32 utf32
sw.WriteLine $"U+{utf32:X6} UTF-32 = {surrogate} ({showCodePoints surrogate}) UTF-16"
sw.Close()

// The example produces the following output:
//       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16

Imports System.IO

Module Example4
    Public Sub Main()
        Dim sw As New StreamWriter(".\chars2.txt")
        Dim utf32 As Integer = &H1D160
        Dim surrogate As String = Char.ConvertFromUtf32(utf32)
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                   utf32, surrogate, ShowCodePoints(surrogate))
        sw.Close()
    End Sub

    Private Function ShowCodePoints(value As String) As String
        Dim retval As String = Nothing
        For Each ch In value
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch))
        Next
        Return retval.Trim()
    End Function
End Module
' The example produces the following output:
'       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16
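
The surrogate ranges given above follow from simple arithmetic on the code point. The following C# sketch derives the pair by hand and cross-checks it with Char.ConvertToUtf32. The class name SurrogateMath and the method ToSurrogatePair are hypothetical names introduced only for this illustration; in real code, Char.ConvertFromUtf32 already does this work.

```csharp
using System;

public static class SurrogateMath
{
    // Hypothetical helper: splits a supplementary code point
    // (U+10000 through U+10FFFF) into its UTF-16 surrogate code units.
    public static (char High, char Low) ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));   // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return (high, low);
    }

    public static void Main()
    {
        var (high, low) = ToSurrogatePair(0x1D160);
        Console.WriteLine($"U+{(int)high:X4} U+{(int)low:X4}");

        // Round-trip through the framework method as a cross-check.
        Console.WriteLine($"U+{Char.ConvertToUtf32(high, low):X6}");
    }
}
// The example displays the following output:
//       U+D834 U+DD60
//       U+01D160
```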

Characters and character categories

Each Unicode character or valid surrogate pair belongs to a Unicode category. In .NET, Unicode categories are represented by members of the UnicodeCategory enumeration and include values such as UnicodeCategory.CurrencySymbol, UnicodeCategory.LowercaseLetter, and UnicodeCategory.SpaceSeparator.

To determine the Unicode category of a character, call the GetUnicodeCategory method. The following example calls GetUnicodeCategory to display the Unicode category of each character in a string. The example works correctly only if the String instance contains no surrogate pairs.

using System;
using System.Globalization;

class Example
{
   public static void Main()
   {
      // Define a string with a variety of character categories.
      String s = "The red car drove down the long, narrow, secluded road.";
      // Determine the category of each character.
      foreach (var ch in s)
         Console.WriteLine($"'{ch}': {Char.GetUnicodeCategory(ch)}");
   }
}
// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

open System

// Define a string with a variety of character categories.
let s = "The red car drove down the long, narrow, secluded road."
// Determine the category of each character.
for ch in s do
    printfn $"'{ch}': {Char.GetUnicodeCategory ch}"

// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

Imports System.Globalization

Module Example1
    Public Sub Main()
        ' Define a string with a variety of character categories.
        Dim s As String = "The red car drove down the long, narrow, secluded road."
        ' Determine the category of each character.
        For Each ch In s
            Console.WriteLine("'{0}': {1}", ch, Char.GetUnicodeCategory(ch))
        Next
    End Sub
End Module
' The example displays the following output:
'       'T': UppercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'c': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'v': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       'n': LowercaseLetter
'       ' ': SpaceSeparator
'       't': LowercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'l': LowercaseLetter
'       'o': LowercaseLetter
'       'n': LowercaseLetter
'       'g': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       'n': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       's': LowercaseLetter
'       'e': LowercaseLetter
'       'c': LowercaseLetter
'       'l': LowercaseLetter
'       'u': LowercaseLetter
'       'd': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'a': LowercaseLetter
'       'd': LowercaseLetter
'       '.': OtherPunctuation
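
The examples above pass one Char at a time, which is why they can't handle surrogate pairs. As a rough sketch of one way around that limitation, the following C# code uses the Char.GetUnicodeCategory(String, Int32) overload together with Char.IsSurrogatePair(String, Int32) so that a valid surrogate pair is classified as one character. The class name CategoryExample and the method GetCategories are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CategoryExample
{
    // Hypothetical helper: returns one Unicode category per character,
    // treating a valid surrogate pair as a single character.
    public static List<UnicodeCategory> GetCategories(string s)
    {
        var categories = new List<UnicodeCategory>();
        for (int i = 0; i < s.Length; i++)
        {
            // When index i is at the high surrogate of a valid pair, this
            // overload reports the category of the pair as a whole.
            categories.Add(Char.GetUnicodeCategory(s, i));

            if (Char.IsSurrogatePair(s, i))
                i++;   // Skip the low surrogate.
        }
        return categories;
    }

    public static void Main()
    {
        // "A" followed by U+1D160, which needs a surrogate pair in UTF-16.
        string s = "A" + Char.ConvertFromUtf32(0x1D160);
        foreach (UnicodeCategory category in GetCategories(s))
            Console.WriteLine(category);
    }
}
```

The string has three Char objects, but the loop reports two categories, one for each character.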

Internally, for characters outside the Latin-1 range (U+0000 through U+00FF), the GetUnicodeCategory method depends on the Unicode categories reported by the CharUnicodeInfo class. Starting with .NET Framework 4.6.2, Unicode characters are classified based on The Unicode Standard, Version 8.0.0. In versions of .NET Framework from 4 through 4.6.1, they are classified based on The Unicode Standard, Version 6.3.0.

Characters and text elements

Because a single character can be represented by multiple Char objects, it isn't always meaningful to work with individual Char objects. The following example converts the Unicode code points U+10107 through U+10110, which represent ten Aegean numbers, to UTF-16 encoded code units. Because it mistakenly equates Char objects with characters, it inaccurately reports that the resulting string has 20 characters.

using System;

public class Example5
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        Console.WriteLine($"The string contains {result.Length} characters.");
    }
}
// The example displays the following output:
//     The string contains 20 characters.

open System

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""

printfn $"The string contains {result.Length} characters."


// The example displays the following output:
//     The string contains 20 characters.

Module Example5
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Console.WriteLine("The string contains {0} characters.", result.Length)
    End Sub
End Module
' The example displays the following output:
'     The string contains 20 characters.

You can do the following to avoid the assumption that a Char object represents a single character:

You can work with a String object in its entirety instead of working with its individual characters to represent and analyze linguistic content.

You can use String.EnumerateRunes as shown in the following example:

using System.Text;   // Rune is defined in System.Text.

int CountLetters(string s)
{
    int letterCount = 0;

    foreach (Rune rune in s.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }

    return letterCount;
}

open System.Text   // Rune is defined in System.Text.

let countLetters (s: string) =
    let mutable letterCount = 0

    for rune in s.EnumerateRunes() do
        if Rune.IsLetter rune then
            letterCount <- letterCount + 1

    letterCount

You can use the StringInfo class to work with text elements instead of individual Char objects. The following example uses a StringInfo object to count the number of text elements in a string that consists of the ten Aegean numbers U+10107 through U+10110. Because it considers a surrogate pair a single character, it correctly reports that the string contains ten characters.

using System;
using System.Globalization;

public class Example4
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        StringInfo si = new StringInfo(result);
        Console.WriteLine($"The string contains {si.LengthInTextElements} characters.");
    }
}
// The example displays the following output:
//       The string contains 10 characters.

open System
open System.Globalization

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""


let si = StringInfo result
printfn $"The string contains {si.LengthInTextElements} characters."

// The example displays the following output:
//       The string contains 10 characters.

Imports System.Globalization

Module Example6
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Dim si As New StringInfo(result)
        Console.WriteLine("The string contains {0} characters.", si.LengthInTextElements)
    End Sub
End Module
' The example displays the following output:
'       The string contains 10 characters.
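
The three ways of counting discussed in this section (Char objects, Rune values, and text elements) can give three different answers for the same string. The following C# sketch puts them side by side. It assumes .NET Core 3.0 or later, because it calls String.EnumerateRunes, and the names CountingExample and Count are hypothetical names used only for this illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class CountingExample
{
    // Hypothetical helper: counts UTF-16 code units, Unicode scalar values
    // (runes), and text elements in the same string.
    public static (int CodeUnits, int Runes, int TextElements) Count(string s)
    {
        int codeUnits = s.Length;
        int runes = s.EnumerateRunes().Count();
        int textElements = new StringInfo(s).LengthInTextElements;
        return (codeUnits, runes, textElements);
    }

    public static void Main()
    {
        // "a" plus COMBINING DIAERESIS, followed by one Aegean number
        // that needs a surrogate pair.
        string s = "a\u0308" + Char.ConvertFromUtf32(0x10107);
        Console.WriteLine(Count(s));
    }
}
// The example displays the following output:
//       (4, 3, 2)
```

The combining sequence counts as two runes but one text element, and the surrogate pair counts as two Char objects but one rune, so only the text element count matches what a reader would call two characters.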

If a string contains a base character that has one or more combining characters, you can call the String.Normalize method to convert the substring to a single UTF-16 encoded code unit, provided that a precomposed form of the character exists. The following example calls the String.Normalize method, which uses normalization form C by default, to convert the base character U+0061 (LATIN SMALL LETTER A) and the combining character U+0308 (COMBINING DIAERESIS) to U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS).

using System;

public class Example2
{
    public static void Main()
    {
        string combining = "\u0061\u0308";
        ShowString(combining);

        string normalized = combining.Normalize();
        ShowString(normalized);
    }

    private static void ShowString(string s)
    {
        Console.Write("Length of string: {0} (", s.Length);
        for (int ctr = 0; ctr < s.Length; ctr++)
        {
            Console.Write("U+{0:X4}", Convert.ToUInt16(s[ctr]));
            if (ctr != s.Length - 1) Console.Write(" ");
        }
        Console.WriteLine(")\n");
    }
}
// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

open System

let showString (s: string) =
    printf $"Length of string: {s.Length} ("
    for i = 0 to s.Length - 1 do
        printf $"U+{Convert.ToUInt16 s[i]:X4}"
        if i <> s.Length - 1 then printf " "
    printfn ")\n"

let combining = "\u0061\u0308"
showString combining

let normalized = combining.Normalize()
showString normalized

// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

Module Example3
    Public Sub Main()
        Dim combining As String = ChrW(&H61) + ChrW(&H308)
        ShowString(combining)

        Dim normalized As String = combining.Normalize()
        ShowString(normalized)
    End Sub

    Private Sub ShowString(s As String)
        Console.Write("Length of string: {0} (", s.Length)
        For ctr As Integer = 0 To s.Length - 1
            Console.Write("U+{0:X4}", Convert.ToUInt16(s(ctr)))
            If ctr <> s.Length - 1 Then Console.Write(" ")
        Next
        Console.WriteLine(")")
        Console.WriteLine()
    End Sub
End Module
' The example displays the following output:
'       Length of string: 2 (U+0061 U+0308)
'       
'       Length of string: 1 (U+00E4)

Common operations

The Char structure provides methods to compare Char objects, convert the value of the current Char object to an object of another type, and determine the Unicode category of a Char object:

To do this	Use these `System.Char` methods
Compare Char objects	CompareTo and Equals
Convert a code point to a string	ConvertFromUtf32. See also the Rune type.
Convert a Char object or a surrogate pair of Char objects to a code point	For a single character: Convert.ToInt32(Char). For a surrogate pair or a character in a string: Char.ConvertToUtf32. See also the Rune type.
Get the Unicode category of a character	GetUnicodeCategory. See also Rune.GetUnicodeCategory.
Determine whether a character is in a particular Unicode category, such as digit, letter, punctuation, control character, and so on	IsControl, IsDigit, IsHighSurrogate, IsLetter, IsLetterOrDigit, IsLower, IsLowSurrogate, IsNumber, IsPunctuation, IsSeparator, IsSurrogate, IsSurrogatePair, IsSymbol, IsUpper, and IsWhiteSpace. See also the corresponding methods on the Rune type.
Convert a Char object that represents a number to a numeric value type	GetNumericValue. See also Rune.GetNumericValue.
Convert a character in a string into a Char object	Parse and TryParse
Convert a Char object to a String object	ToString
Change the case of a Char object	ToLower, ToLowerInvariant, ToUpper, and ToUpperInvariant. See also the corresponding methods on the Rune type.
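
As a quick illustration of the table above, the following C# sketch calls one method from most of its rows. The class name CommonOperations and the sample values are chosen only for this illustration.

```csharp
using System;

public static class CommonOperations
{
    public static void Main()
    {
        char a = 'a';
        char b = 'b';

        // Compare Char objects.
        Console.WriteLine(a.CompareTo(b) < 0);

        // Convert a code point to a string.
        Console.WriteLine(Char.ConvertFromUtf32(0x41));

        // Convert a surrogate pair in a string to a code point.
        Console.WriteLine(Char.ConvertToUtf32("\uD834\uDD60", 0).ToString("X"));

        // Get the Unicode category of a character.
        Console.WriteLine(Char.GetUnicodeCategory('$'));

        // Determine whether a character is in a particular category.
        Console.WriteLine(Char.IsDigit('7'));

        // Convert a Char that represents a number to a numeric value.
        Console.WriteLine(Char.GetNumericValue('7'));

        // Convert a one-character string to a Char.
        Console.WriteLine(Char.Parse("x"));

        // Change the case of a Char.
        Console.WriteLine(Char.ToUpperInvariant(a));
    }
}
// The example displays the following output:
//       True
//       A
//       1D160
//       CurrencySymbol
//       True
//       7
//       x
//       A
```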

Char values and interop

When a managed Char type, which is represented as a Unicode UTF-16 encoded code unit, is passed to unmanaged code, the interop marshaller converts the character set to ANSI by default. You can apply the DllImportAttribute attribute to platform invoke declarations and the StructLayoutAttribute attribute to a COM interop declaration to control which character set a marshalled Char type uses.
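
One way to observe the effect of the character set is to compare the marshalled size of a Char field under different CharSet values. The following C# sketch does that with Marshal.SizeOf. The struct names AnsiCharHolder and UnicodeCharHolder are hypothetical and exist only for this illustration, and the exact sizes printed are an assumption about the runtime's marshalling rules rather than something this article states. On a typical runtime, the ANSI struct should report a smaller marshalled size than the Unicode one.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structs: each has a single Char field, and they differ
// only in the CharSet used for marshalling.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
public struct AnsiCharHolder
{
    public char Value;
}

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct UnicodeCharHolder
{
    public char Value;
}

public static class InteropExample
{
    public static void Main()
    {
        // The managed size of a Char is always two bytes.
        Console.WriteLine($"Managed size of char: {sizeof(char)}");

        // The marshalled size depends on the CharSet applied to the struct.
        Console.WriteLine($"Marshalled size (CharSet.Ansi): {Marshal.SizeOf<AnsiCharHolder>()}");
        Console.WriteLine($"Marshalled size (CharSet.Unicode): {Marshal.SizeOf<UnicodeCharHolder>()}");
    }
}
```

The same CharSet values can be set through the CharSet field of DllImportAttribute on a platform invoke declaration.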


Last updated on 2025-03-23