Rune Struct

Definition

Represents a Unicode scalar value ([ U+0000..U+D7FF ], inclusive; or [ U+E000..U+10FFFF ], inclusive).

public value class Rune : IComparable, IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>
public value class Rune : IComparable, IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>, ISpanFormattable
public value class Rune : IComparable, IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>, ISpanFormattable, IUtf8SpanFormattable
public value class Rune : IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>
public readonly struct Rune : IComparable, IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>
public readonly struct Rune : IComparable, IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>, ISpanFormattable
public readonly struct Rune : IComparable, IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>, ISpanFormattable, IUtf8SpanFormattable
public readonly struct Rune : IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>
type Rune = struct
type Rune = struct
    interface ISpanFormattable
    interface IFormattable
type Rune = struct
    interface IFormattable
    interface ISpanFormattable
type Rune = struct
    interface IFormattable
    interface ISpanFormattable
    interface IUtf8SpanFormattable
Public Structure Rune
Implements IComparable, IComparable(Of Rune), IEquatable(Of Rune)
Public Structure Rune
Implements IComparable, IComparable(Of Rune), IEquatable(Of Rune), ISpanFormattable
Public Structure Rune
Implements IComparable, IComparable(Of Rune), IEquatable(Of Rune), ISpanFormattable, IUtf8SpanFormattable
Public Structure Rune
Implements IComparable(Of Rune), IEquatable(Of Rune)
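Inheritance

Object -> ValueType -> Rune

Implements

IComparable, IComparable<Rune>, IEquatable<Rune>, IFormattable, ISpanFormattable, IUtf8SpanFormattable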

Remarks

A Rune instance represents a Unicode scalar value, which means any code point excluding the surrogate range (U+D800..U+DFFF). The type's constructors and conversion operators validate the input, so consumers can call the APIs assuming that the underlying Rune instance is well formed.
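As a rough illustration of that validation, the following sketch probes the boundaries of the scalar value range with Rune.IsValid. The candidate values and variable names are chosen here for illustration only.

```csharp
using System;
using System.Text;

// Boundaries of the two scalar value ranges and of the surrogate gap between them.
int[] candidates = { 0x0000, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000, -1 };

foreach (int candidate in candidates)
{
    // Rune.IsValid returns false for surrogate code points and out-of-range values.
    Console.WriteLine($"0x{candidate:X}: {Rune.IsValid(candidate)}");
}

bool lastScalarBeforeGap = Rune.IsValid(0xD7FF);   // true
bool firstSurrogate = Rune.IsValid(0xD800);        // false
bool beyondUnicode = Rune.IsValid(0x110000);       // false
```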

If you aren't familiar with the terms Unicode scalar value, code point, surrogate range, and well-formed, see Introduction to character encoding in .NET.

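The following sections explain:

  • When to use the Rune type
  • When not to use Rune
  • How to instantiate a Rune
  • Query properties of a Rune
  • Convert a Rune to UTF-8 or UTF-16
  • Rune in .NET vs. other languages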

When to use the Rune type

Consider using the Rune type if your code:

  • Calls APIs that require Unicode scalar values
  • Explicitly handles surrogate pairs

APIs that require Unicode scalar values

If your code iterates through the char instances in a string or a ReadOnlySpan<char>, some of the char methods won't work correctly on char instances that are in the surrogate range. For example, classification APIs such as Char.IsLetter, Char.IsDigit, Char.IsLetterOrDigit, and Char.GetUnicodeCategory require a scalar value char to work correctly.

The following example shows code that won't work correctly if any of the char instances are surrogate code points:

// THE FOLLOWING METHOD SHOWS INCORRECT CODE.
// DO NOT DO THIS IN A PRODUCTION APPLICATION.
int CountLettersBadExample(string s)
{
    int letterCount = 0;

    foreach (char ch in s)
    {
        if (char.IsLetter(ch))
        { letterCount++; }
    }

    return letterCount;
}

Here's the same example in F#:

// THE FOLLOWING METHOD SHOWS INCORRECT CODE.
// DO NOT DO THIS IN A PRODUCTION APPLICATION.
let countLettersBadExample (s: string) =
    let mutable letterCount = 0

    for ch in s do
        if Char.IsLetter ch then
            letterCount <- letterCount + 1
    
    letterCount

Here's equivalent code that works with a ReadOnlySpan<char>:

// THE FOLLOWING METHOD SHOWS INCORRECT CODE.
// DO NOT DO THIS IN A PRODUCTION APPLICATION.
static int CountLettersBadExample(ReadOnlySpan<char> span)
{
    int letterCount = 0;

    foreach (char ch in span)
    {
        if (char.IsLetter(ch))
        { letterCount++; }
    }

    return letterCount;
}

The preceding code works correctly with some languages such as English:

CountLettersBadExample("Hello")
// Returns 5

But it won't work correctly for languages whose scripts are encoded outside the Basic Multilingual Plane, such as Osage:

CountLettersBadExample("𐓏𐓘𐓻𐓘𐓻𐓟 𐒻𐓟")
// Returns 0

The reason this method returns incorrect results for Osage text is that the char instances for Osage letters are surrogate code points. No single surrogate code point has enough information to determine if it's a letter.
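To make that concrete, here is a minimal sketch that inspects a single Osage letter (U+104CF, written as an escape so the result doesn't depend on source file encoding). The variable names are illustrative.

```csharp
using System;
using System.Text;

// One Osage letter, U+104CF, which UTF-16 stores as a surrogate pair.
string osageLetter = "\U000104CF";

int charCount = osageLetter.Length;                       // 2 UTF-16 code units
bool firstIsHigh = char.IsHighSurrogate(osageLetter[0]);  // true
bool secondIsLow = char.IsLowSurrogate(osageLetter[1]);   // true

// Neither half is classified as a letter on its own...
bool highIsLetter = char.IsLetter(osageLetter[0]);        // false
bool lowIsLetter = char.IsLetter(osageLetter[1]);         // false

// ...but the scalar value the two halves encode together is.
Rune rune = Rune.GetRuneAt(osageLetter, 0);
bool runeIsLetter = Rune.IsLetter(rune);                  // true

Console.WriteLine($"{charCount} chars, U+{rune.Value:X} is a letter: {runeIsLetter}");
```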

If you change this code to use Rune instead of char, the method works correctly with code points outside the Basic Multilingual Plane:

int CountLetters(string s)
{
    int letterCount = 0;

    foreach (Rune rune in s.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }

    return letterCount;
}

Here's the same method in F#:

let countLetters (s: string) =
    let mutable letterCount = 0

    for rune in s.EnumerateRunes() do
        if Rune.IsLetter rune then
            letterCount <- letterCount + 1

    letterCount

Here's equivalent code that works with a ReadOnlySpan<char>:

static int CountLetters(ReadOnlySpan<char> span)
{
    int letterCount = 0;

    foreach (Rune rune in span.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }

    return letterCount;
}

The preceding code counts Osage letters correctly:

CountLetters("𐓏𐓘𐓻𐓘𐓻𐓟 𐒻𐓟")
// Returns 8

Code that explicitly handles surrogate pairs

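Consider using the Rune type if your code calls APIs that explicitly operate on surrogate code points, such as Char.IsSurrogate, Char.IsHighSurrogate, Char.IsLowSurrogate, Char.IsSurrogatePair, Char.ConvertFromUtf32, and Char.ConvertToUtf32.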

For example, the following method has special logic to deal with surrogate char pairs:

static void ProcessStringUseChar(string s)
{
    Console.WriteLine("Using char");

    for (int i = 0; i < s.Length; i++)
    {
        if (!char.IsSurrogate(s[i]))
        {
            Console.WriteLine($"Code point: {(int)(s[i])}");
        }
        else if (i + 1 < s.Length && char.IsSurrogatePair(s[i], s[i + 1]))
        {
            int codePoint = char.ConvertToUtf32(s[i], s[i + 1]);
            Console.WriteLine($"Code point: {codePoint}");
            i++; // so that when the loop iterates it's actually +2
        }
        else
        {
            throw new Exception("String was not well-formed UTF-16.");
        }
    }
}

Such code is simpler if it uses Rune, as in the following example:

static void ProcessStringUseRune(string s)
{
    Console.WriteLine("Using Rune");

    for (int i = 0; i < s.Length;)
    {
        if (!Rune.TryGetRuneAt(s, i, out Rune rune))
        {
            throw new Exception("String was not well-formed UTF-16.");
        }

        Console.WriteLine($"Code point: {rune.Value}");
        i += rune.Utf16SequenceLength; // increment the iterator by the number of chars in this Rune
    }
}
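
If throwing on ill-formed input isn't what you want, String.EnumerateRunes is another option: it yields Rune.ReplacementChar (U+FFFD) in place of each code unit that can't be decoded. The following sketch contrasts the two behaviors on a made-up input string.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// 'a', a lone high surrogate, then 'b': this is not well-formed UTF-16.
string illFormed = "a\uD800b";

// TryGetRuneAt reports failure at the position of the lone surrogate.
bool decoded = Rune.TryGetRuneAt(illFormed, 1, out _);   // false

// EnumerateRunes doesn't fail; it substitutes U+FFFD for the bad code unit.
var values = new List<int>();
foreach (Rune rune in illFormed.EnumerateRunes())
{
    values.Add(rune.Value);
}

Console.WriteLine(string.Join(", ", values.ConvertAll(v => $"U+{v:X4}")));
// Prints: U+0061, U+FFFD, U+0062
```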

When not to use Rune

You don't need to use the Rune type if your code:

  • Looks for exact char matches
  • Splits a string on a known char value

Using the Rune type may return incorrect results if your code:

  • Counts the number of display characters in a string

Look for exact char matches

The following code iterates through a string looking for specific characters, returning the index of the first match. There's no need to change this code to use Rune, as the code is looking for characters that are represented by a single char.

int GetIndexOfFirstAToZ(string s)
{
    for (int i = 0; i < s.Length; i++)
    {
        char thisChar = s[i];
        if ('A' <= thisChar && thisChar <= 'Z')
        {
            return i; // found a match
        }
    }

    return -1; // didn't find 'A' - 'Z' in the input string
}

Split a string on a known char

It's common to call string.Split and use delimiters such as ' ' (space) or ',' (comma), as in the following example:

string inputString = "🐂, 🐄, 🐆";
string[] splitOnSpace = inputString.Split(' ');
string[] splitOnComma = inputString.Split(',');

There is no need to use Rune here, because the code is looking for characters that are represented by a single char.

Count the number of display characters in a string

The number of Rune instances in a string might not match the number of user-perceivable characters shown when displaying the string.

Since Rune instances represent Unicode scalar values, components that follow the Unicode text segmentation guidelines can use Rune as a building block for counting display characters.

The StringInfo type can be used to count display characters, but it counts correctly in all scenarios only on .NET 5 and later versions. On other .NET implementations, some strings give incorrect counts.

For more information, see Grapheme clusters.
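
As a small sketch of the difference, the following code counts char instances, Rune instances, and text elements for a string that displays as a single character. The sample string (a base letter plus a combining mark) is chosen here because it is a simple case; more complex sequences may be counted differently across .NET versions.

```csharp
using System;
using System.Globalization;
using System.Text;

// 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
string accented = "e\u0301";

int charCount = accented.Length;                                       // 2

int runeCount = 0;
foreach (Rune rune in accented.EnumerateRunes())
{
    runeCount++;                                                       // ends at 2
}

// StringInfo groups the base letter and the combining mark into one text element.
int textElementCount = new StringInfo(accented).LengthInTextElements;  // 1

Console.WriteLine($"chars={charCount}, runes={runeCount}, text elements={textElementCount}");
// Prints: chars=2, runes=2, text elements=1
```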

How to instantiate a Rune

There are several ways to get a Rune instance. You can use a constructor to create a Rune directly from:

  • A code point.

    Rune a = new Rune(0x0061); // LATIN SMALL LETTER A
    Rune b = new Rune(0x10421); // DESERET CAPITAL LETTER ER
    
  • A single char.

    Rune c = new Rune('a');
    
  • A surrogate char pair.

    Rune d = new Rune('\ud83d', '\udd2e'); // U+1F52E CRYSTAL BALL
    

All of the constructors throw an ArgumentException if the input doesn't represent a valid Unicode scalar value.

There are Rune.TryCreate methods available for callers who don't want exceptions to be thrown on failure.
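
The following sketch shows the two failure styles side by side. The variable names are illustrative.

```csharp
using System;
using System.Text;

// A valid scalar value succeeds.
bool createdBall = Rune.TryCreate(0x1F52E, out Rune ball);                     // true

// A surrogate code point isn't a scalar value, so this fails without throwing.
bool createdFromSurrogate = Rune.TryCreate(0xD800, out _);                     // false

// A well-formed surrogate pair succeeds and yields the same Rune as the code point.
bool createdFromPair = Rune.TryCreate('\uD83D', '\uDD2E', out Rune fromPair);  // true

// The constructor reports the same failure with an exception instead.
bool constructorThrew = false;
try
{
    _ = new Rune(0xD800);
}
catch (ArgumentException)
{
    constructorThrew = true;
}

Console.WriteLine($"{createdBall} {createdFromSurrogate} {createdFromPair} {ball == fromPair} {constructorThrew}");
// Prints: True False True True True
```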

Rune instances can also be read from existing input sequences. For instance, given a ReadOnlySpan<char> that represents UTF-16 data, the Rune.DecodeFromUtf16 method returns the first Rune instance at the beginning of the input span. The Rune.DecodeFromUtf8 method operates similarly, accepting a ReadOnlySpan<byte> parameter that represents UTF-8 data. There are equivalent methods to read from the end of the span instead of the beginning of the span.
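
For example, the following sketch decodes the first and last Rune of a small UTF-8 buffer and then shows the status reported for a truncated sequence. The byte values are hand-encoded for illustration.

```csharp
using System;
using System.Buffers;
using System.Text;

// UTF-8 for U+1F52E CRYSTAL BALL (4 bytes) followed by 'a' (1 byte).
byte[] utf8 = { 0xF0, 0x9F, 0x94, 0xAE, 0x61 };

OperationStatus firstStatus = Rune.DecodeFromUtf8(utf8, out Rune first, out int firstLength);
OperationStatus lastStatus = Rune.DecodeLastFromUtf8(utf8, out Rune last, out int lastLength);

Console.WriteLine($"{firstStatus}: U+{first.Value:X} (length {firstLength})"); // Done: U+1F52E (length 4)
Console.WriteLine($"{lastStatus}: U+{last.Value:X4} (length {lastLength})");   // Done: U+0061 (length 1)

// A sequence that is cut off mid-character is reported through the status, not thrown.
byte[] truncated = { 0xF0, 0x9F };
OperationStatus truncatedStatus = Rune.DecodeFromUtf8(truncated, out _, out _); // NeedMoreData
```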

Query properties of a Rune

To get the integer code point value of a Rune instance, use the Rune.Value property.

Rune rune = new Rune('\ud83d', '\udd2e'); // U+1F52E CRYSTAL BALL
int codePoint = rune.Value; // = 128302 decimal (= 0x1F52E)

Many of the static APIs available on the char type are also available on the Rune type. For instance, Rune.IsWhiteSpace and Rune.GetUnicodeCategory are equivalent to the Char.IsWhiteSpace and Char.GetUnicodeCategory methods. The Rune methods correctly handle surrogate pairs.
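
As a brief sketch of what handling surrogate pairs means in practice, the following code classifies the same character through both APIs. The variable names are illustrative.

```csharp
using System;
using System.Globalization;
using System.Text;

// U+1F52E CRYSTAL BALL, which a string stores as the pair '\ud83d' '\udd2e'.
Rune ball = new Rune(0x1F52E);

// The Rune API classifies the scalar value as a whole.
UnicodeCategory runeCategory = Rune.GetUnicodeCategory(ball);      // OtherSymbol
bool runeIsSymbol = Rune.IsSymbol(ball);                           // true

// The char API sees only one half of the pair.
UnicodeCategory charCategory = char.GetUnicodeCategory('\ud83d');  // Surrogate
bool charIsSymbol = char.IsSymbol('\ud83d');                       // false

// For a character such as a space, both APIs give the same answer.
bool spaceAgrees = Rune.IsWhiteSpace(new Rune(' ')) == char.IsWhiteSpace(' '); // true

Console.WriteLine($"Rune: {runeCategory}, char: {charCategory}");
// Prints: Rune: OtherSymbol, char: Surrogate
```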

The following example code takes a ReadOnlySpan<char> as input and trims from both the start and the end of the span every Rune that isn't a letter or a digit.

static ReadOnlySpan<char> TrimNonLettersAndNonDigits(ReadOnlySpan<char> span)
{
    // First, trim from the front.
    // If any Rune can't be decoded
    // (return value is anything other than "Done"),
    // or if the Rune is a letter or digit,
    // stop trimming from the front and
    // instead work from the end.
    while (Rune.DecodeFromUtf16(span, out Rune rune, out int charsConsumed) == OperationStatus.Done)
    {
        if (Rune.IsLetterOrDigit(rune))
        { break; }
        span = span[charsConsumed..];
    }

    // Next, trim from the end.
    // If any Rune can't be decoded,
    // or if the Rune is a letter or digit,
    // break from the loop, and we're finished.
    while (Rune.DecodeLastFromUtf16(span, out Rune rune, out int charsConsumed) == OperationStatus.Done)
    {
        if (Rune.IsLetterOrDigit(rune))
        { break; }
        span = span[..^charsConsumed];
    }

    return span;
}

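There are some API differences between char and Rune. For example:

  • There is no Rune equivalent to Char.IsSurrogate(Char), since Rune instances by definition can never be surrogate code points.
  • Rune.GetUnicodeCategory doesn't always return the same result as Char.GetUnicodeCategory. It does return the same value as CharUnicodeInfo.GetUnicodeCategory.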

Convert a Rune to UTF-8 or UTF-16

Since a Rune is a Unicode scalar value, it can be converted to UTF-8, UTF-16, or UTF-32 encoding. The Rune type has built-in support for conversion to UTF-8 and UTF-16.

The Rune.EncodeToUtf16 method converts a Rune instance to char instances. To query the number of char instances that would result from converting a Rune instance to UTF-16, use the Rune.Utf16SequenceLength property. Similar methods exist for UTF-8 conversion.

The following example converts a Rune instance to a char array. The code assumes you have a Rune instance in the rune variable:

char[] chars = new char[rune.Utf16SequenceLength];
int numCharsWritten = rune.EncodeToUtf16(chars);

Since a string is a sequence of UTF-16 chars, the following example also converts a Rune instance to UTF-16:

string theString = rune.ToString();

The following example converts a Rune instance to a UTF-8 byte array:

byte[] bytes = new byte[rune.Utf8SequenceLength];
int numBytesWritten = rune.EncodeToUtf8(bytes);

The Rune.EncodeToUtf16 and Rune.EncodeToUtf8 methods return the actual number of elements written. They throw an exception if the destination buffer is too short to contain the result. There are non-throwing TryEncodeToUtf8 and TryEncodeToUtf16 methods as well for callers who want to avoid exceptions.
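
The following sketch exercises the non-throwing methods with one buffer that is too small and others that are large enough. The buffer sizes and variable names are chosen for illustration.

```csharp
using System;
using System.Text;

Rune ball = new Rune(0x1F52E); // needs 4 UTF-8 bytes or 2 UTF-16 chars

// Too small: the Try method returns false instead of throwing.
byte[] tooSmall = new byte[2];
bool wroteToSmall = ball.TryEncodeToUtf8(tooSmall, out _);                    // false

// Large enough: the call succeeds and reports how many elements were written.
byte[] bytes = new byte[4];
bool wroteUtf8 = ball.TryEncodeToUtf8(bytes, out int bytesWritten);           // true, 4

char[] chars = new char[2];
bool wroteUtf16 = ball.TryEncodeToUtf16(chars, out int charsWritten);         // true, 2

// The chars written form the same string that ToString returns.
bool sameAsToString = new string(chars, 0, charsWritten) == ball.ToString();  // true

Console.WriteLine(BitConverter.ToString(bytes, 0, bytesWritten));
// Prints: F0-9F-94-AE
```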

Rune in .NET vs. other languages

The term "rune" is not defined in the Unicode Standard. The term dates back to the creation of UTF-8. Rob Pike and Ken Thompson were looking for a term to describe what would eventually become known as a code point. They settled on the term "rune", and Rob Pike's later influence over the Go programming language helped popularize the term.

However, the .NET Rune type is not the equivalent of the Go rune type. In Go, the rune type is an alias for int32. A Go rune is intended to represent a Unicode code point, but it can be any 32-bit value, including surrogate code points and values that are not legal Unicode code points.

For similar types in other programming languages, see Rust's primitive char type or Swift's Unicode.Scalar type, both of which represent Unicode scalar values. They provide functionality similar to .NET's Rune type, and they disallow instantiation of values that are not legal Unicode scalar values.

Constructors

Rune(Char)

Creates a Rune from the provided UTF-16 code unit.

Rune(Char, Char)

Creates a Rune from the provided UTF-16 surrogate pair.

Rune(Int32)

Creates a Rune from the specified 32-bit integer that represents a Unicode scalar value.

Rune(UInt32)

Creates a Rune from the specified 32-bit unsigned integer that represents a Unicode scalar value.

Properties

IsAscii

Gets a value that indicates whether the scalar value associated with this Rune is within the ASCII encoding range.

IsBmp

Gets a value that indicates whether the scalar value associated with this Rune is within the BMP encoding range.

Plane

Gets the Unicode plane (0 to 16, inclusive) that contains this scalar.

ReplacementChar

Gets a Rune instance that represents the Unicode replacement character U+FFFD.

Utf16SequenceLength

Gets the length in code units (Char) of the UTF-16 sequence required to represent this scalar value.

Utf8SequenceLength

Gets the length in code units of the UTF-8 sequence required to represent this scalar value.

Value

Gets the Unicode scalar value as an integer.
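
As an informal tour of the properties listed above, the following sketch prints them for an ASCII letter, a supplementary-plane character, and the replacement character. The sample values are chosen for illustration.

```csharp
using System;
using System.Text;

Rune latinA = new Rune('a');      // U+0061
Rune ball = new Rune(0x1F52E);    // U+1F52E CRYSTAL BALL

foreach (Rune rune in new[] { latinA, ball, Rune.ReplacementChar })
{
    Console.WriteLine(
        $"U+{rune.Value:X4}: IsAscii={rune.IsAscii}, IsBmp={rune.IsBmp}, Plane={rune.Plane}, " +
        $"UTF-16 length={rune.Utf16SequenceLength}, UTF-8 length={rune.Utf8SequenceLength}");
}
// Prints:
// U+0061: IsAscii=True, IsBmp=True, Plane=0, UTF-16 length=1, UTF-8 length=1
// U+1F52E: IsAscii=False, IsBmp=False, Plane=1, UTF-16 length=2, UTF-8 length=4
// U+FFFD: IsAscii=False, IsBmp=True, Plane=0, UTF-16 length=1, UTF-8 length=3
```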

Methods

CompareTo(Rune)

Compares the current instance to the specified Rune instance.

DecodeFromUtf16(ReadOnlySpan<Char>, Rune, Int32)

Decodes the Rune at the beginning of the provided UTF-16 source buffer.

DecodeFromUtf8(ReadOnlySpan<Byte>, Rune, Int32)

Decodes the Rune at the beginning of the provided UTF-8 source buffer.

DecodeLastFromUtf16(ReadOnlySpan<Char>, Rune, Int32)

Decodes the Rune at the end of the provided UTF-16 source buffer.

DecodeLastFromUtf8(ReadOnlySpan<Byte>, Rune, Int32)

Decodes the Rune at the end of the provided UTF-8 source buffer.

EncodeToUtf16(Span<Char>)

Encodes this Rune to a UTF-16 destination buffer.

EncodeToUtf8(Span<Byte>)

Encodes this Rune to a UTF-8 destination buffer.

Equals(Object)

Returns a value that indicates whether the current instance and a specified object are equal.

Equals(Rune)

Returns a value that indicates whether the current instance and a specified rune are equal.

GetHashCode()

Returns the hash code for this instance.

GetNumericValue(Rune)

Gets the numeric value associated with the specified rune.

GetRuneAt(String, Int32)

Gets the Rune that begins at a specified position in a string.

GetUnicodeCategory(Rune)

Gets the Unicode category associated with the specified rune.

IsControl(Rune)

Returns a value that indicates whether the specified rune is categorized as a control character.

IsDigit(Rune)

Returns a value that indicates whether the specified rune is categorized as a decimal digit.

IsLetter(Rune)

Returns a value that indicates whether the specified rune is categorized as a letter.

IsLetterOrDigit(Rune)

Returns a value that indicates whether the specified rune is categorized as a letter or a decimal digit.

IsLower(Rune)

Returns a value that indicates whether the specified rune is categorized as a lowercase letter.

IsNumber(Rune)

Returns a value that indicates whether the specified rune is categorized as a number.

IsPunctuation(Rune)

Returns a value that indicates whether the specified rune is categorized as a punctuation mark.

IsSeparator(Rune)

Returns a value that indicates whether the specified rune is categorized as a separator character.

IsSymbol(Rune)

Returns a value that indicates whether the specified rune is categorized as a symbol character.

IsUpper(Rune)

Returns a value that indicates whether the specified rune is categorized as an uppercase letter.

IsValid(Int32)

Returns a value that indicates whether a 32-bit signed integer represents a valid Unicode scalar value; that is, it is in the range [ U+0000..U+D7FF ], inclusive; or [ U+E000..U+10FFFF ], inclusive.

IsValid(UInt32)

Returns a value that indicates whether a 32-bit unsigned integer represents a valid Unicode scalar value; that is, it is in the range [ U+0000..U+D7FF ], inclusive, or [ U+E000..U+10FFFF ], inclusive.

IsWhiteSpace(Rune)

Returns a value that indicates whether the specified rune is categorized as a white space character.

ToLower(Rune, CultureInfo)

Returns a copy of the specified Rune converted to lowercase, using the casing rules of the specified culture.

ToLowerInvariant(Rune)

Returns a copy of the specified Rune converted to lowercase using the casing rules of the invariant culture.

ToString()

Returns the string representation of this Rune instance.

ToUpper(Rune, CultureInfo)

Returns a copy of the specified Rune converted to uppercase, using the casing rules of the specified culture.

ToUpperInvariant(Rune)

Returns a copy of the specified Rune converted to uppercase using the casing rules of the invariant culture.

TryCreate(Char, Char, Rune)

Attempts to create a Rune from the specified UTF-16 surrogate pair and returns a value that indicates whether the operation was successful.

TryCreate(Char, Rune)

Attempts to create a Rune from a specified character and returns a value that indicates whether the operation succeeded.

TryCreate(Int32, Rune)

Attempts to create a Rune from a specified signed integer that represents a Unicode scalar value.

TryCreate(UInt32, Rune)

Attempts to create a Rune from the specified 32-bit unsigned integer that represents a Unicode scalar value.

TryEncodeToUtf16(Span<Char>, Int32)

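Attempts to encode this Rune to a UTF-16 destination buffer and returns a value that indicates whether the operation succeeded.

TryEncodeToUtf8(Span<Byte>, Int32)

Attempts to encode this Rune to a UTF-8 destination buffer and returns a value that indicates whether the operation succeeded.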

TryGetRuneAt(String, Int32, Rune)

Attempts to get the Rune that begins at a specified position in a string, and returns a value that indicates whether the operation succeeded.
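
As a hedged sketch of the casing methods listed above (ToLowerInvariant and ToUpperInvariant), the following code converts an ASCII letter and a supplementary-plane letter. The expected Deseret mapping (U+10421 to U+10449) comes from the Unicode character data; whether a given runtime applies it may depend on the globalization mode in use, so no output is asserted for it here.

```csharp
using System;
using System.Text;

Rune lowerA = new Rune('a');
Rune upperA = Rune.ToUpperInvariant(lowerA);     // U+0041 'A'
Rune roundTrip = Rune.ToLowerInvariant(upperA);  // back to U+0061 'a'

// U+10421 DESERET CAPITAL LETTER ER lies outside the BMP, so it can't be passed
// to char.ToLowerInvariant as a single char. Rune accepts it directly.
Rune deseretUpper = new Rune(0x10421);
Rune deseretLower = Rune.ToLowerInvariant(deseretUpper);

Console.WriteLine($"U+{lowerA.Value:X4} -> U+{upperA.Value:X4}");
Console.WriteLine($"U+{deseretUpper.Value:X} -> U+{deseretLower.Value:X}");
```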

Operators

Equality(Rune, Rune)

Returns a value that indicates whether two Rune instances are equal.

Explicit(Char to Rune)

Defines an explicit conversion of a 16-bit Unicode character to a Rune.

Explicit(Int32 to Rune)

Defines an explicit conversion of a 32-bit signed integer to a Rune.

Explicit(UInt32 to Rune)

Defines an explicit conversion of a 32-bit unsigned integer to a Rune.

GreaterThan(Rune, Rune)

Returns a value indicating whether a specified Rune is greater than another specified Rune.

GreaterThanOrEqual(Rune, Rune)

Returns a value indicating whether a specified Rune is greater than or equal to another specified Rune.

Inequality(Rune, Rune)

Returns a value that indicates whether two Rune instances have different values.

LessThan(Rune, Rune)

Returns a value indicating whether a specified Rune is less than another specified Rune.

LessThanOrEqual(Rune, Rune)

Returns a value indicating whether a specified Rune is less than or equal to another specified Rune.
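
The following sketch illustrates the conversion and comparison operators. It also shows one consequence worth knowing: Rune comparisons order by scalar value, which can differ from ordinal string comparison, because the latter compares UTF-16 code units. The sample values are chosen for illustration.

```csharp
using System;
using System.Text;

// Explicit conversions validate their input, as the constructors do.
Rune bmpRune = (Rune)'\uFF5E';             // U+FF5E, a BMP character above the surrogate range
Rune supplementaryRune = (Rune)0x10000;    // U+10000, the first scalar value beyond the BMP

// Rune operators compare scalar values: 0xFF5E < 0x10000.
bool runeOrder = bmpRune < supplementaryRune;                   // true
int compareResult = bmpRune.CompareTo(supplementaryRune);       // negative

// Ordinal string comparison looks at UTF-16 code units. U+10000 starts with the
// high surrogate 0xD800, which is less than 0xFF5E, so the order is reversed.
int ordinal = string.CompareOrdinal(bmpRune.ToString(), supplementaryRune.ToString()); // positive

Console.WriteLine($"Rune order: {runeOrder}, ordinal string order reversed: {ordinal > 0}");
// Prints: Rune order: True, ordinal string order reversed: True
```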

Explicit Interface Implementations

IComparable.CompareTo(Object)

Compares the current instance to the specified object.

IFormattable.ToString(String, IFormatProvider)

Formats the value of the current instance using the specified format.

ISpanFormattable.TryFormat(Span<Char>, Int32, ReadOnlySpan<Char>, IFormatProvider)

Tries to format the value of the current instance into the provided span of characters.

IUtf8SpanFormattable.TryFormat(Span<Byte>, Int32, ReadOnlySpan<Char>, IFormatProvider)

Tries to format the value of the current instance as UTF-8 into the provided span of bytes.
