Rune Struct
Definition
Important
Some information relates to prerelease product that may be substantially modified before it's released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Represents a Unicode scalar value ([ U+0000..U+D7FF ], inclusive; or [ U+E000..U+10FFFF ], inclusive).
public value class Rune : IComparable, IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>
public value class Rune : IComparable, IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>, ISpanFormattable
public value class Rune : IComparable, IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>, ISpanFormattable, IUtf8SpanFormattable
public value class Rune : IComparable<System::Text::Rune>, IEquatable<System::Text::Rune>
public readonly struct Rune : IComparable, IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>
public readonly struct Rune : IComparable, IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>, ISpanFormattable
public readonly struct Rune : IComparable, IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>, ISpanFormattable, IUtf8SpanFormattable
public readonly struct Rune : IComparable<System.Text.Rune>, IEquatable<System.Text.Rune>
type Rune = struct
type Rune = struct
interface ISpanFormattable
interface IFormattable
type Rune = struct
interface IFormattable
interface ISpanFormattable
type Rune = struct
interface IFormattable
interface ISpanFormattable
interface IUtf8SpanFormattable
Public Structure Rune
Implements IComparable, IComparable(Of Rune), IEquatable(Of Rune)
Public Structure Rune
Implements IComparable, IComparable(Of Rune), IEquatable(Of Rune), ISpanFormattable
Public Structure Rune
Implements IComparable, IComparable(Of Rune), IEquatable(Of Rune), ISpanFormattable, IUtf8SpanFormattable
Public Structure Rune
Implements IComparable(Of Rune), IEquatable(Of Rune)
Remarks
A Rune instance represents a Unicode scalar value, which means any code point excluding the surrogate range (U+D800..U+DFFF). The type's constructors and conversion operators validate the input, so consumers can call the APIs assuming that the underlying Rune instance is well formed.
If you aren't familiar with the terms Unicode scalar value, code point, surrogate range, and well-formed, see Introduction to character encoding in .NET.
The following sections explain:
- When to use the Rune type
- When not to use the Rune type
- How to instantiate a Rune
- How to query properties of a Rune instance
- Convert a Rune to UTF-8 or UTF-16
- Rune in .NET vs. other languages
When to use the Rune type
Consider using the Rune type if your code:
- Calls APIs that require Unicode scalar values
- Explicitly handles surrogate pairs
APIs that require Unicode scalar values
If your code iterates through the char instances in a string or a ReadOnlySpan<char>, some of the char methods won't work correctly on char instances that are in the surrogate range. For example, the following APIs require a scalar value char to work correctly:
- Char.GetNumericValue
- Char.GetUnicodeCategory
- Char.IsDigit
- Char.IsLetter
- Char.IsLetterOrDigit
- Char.IsLower
- Char.IsNumber
- Char.IsPunctuation
- Char.IsSymbol
- Char.IsUpper
The following example shows code that won't work correctly if any of the char instances are surrogate code points:
// THE FOLLOWING METHOD SHOWS INCORRECT CODE.
// DO NOT DO THIS IN A PRODUCTION APPLICATION.
int CountLettersBadExample(string s)
{
    int letterCount = 0;
    foreach (char ch in s)
    {
        if (char.IsLetter(ch))
        { letterCount++; }
    }
    return letterCount;
}
// THE FOLLOWING METHOD SHOWS INCORRECT CODE.
// DO NOT DO THIS IN A PRODUCTION APPLICATION.
let countLettersBadExample (s: string) =
    let mutable letterCount = 0
    for ch in s do
        if Char.IsLetter ch then
            letterCount <- letterCount + 1
    letterCount
Here's equivalent code that works with a ReadOnlySpan<char>:
// THE FOLLOWING METHOD SHOWS INCORRECT CODE.
// DO NOT DO THIS IN A PRODUCTION APPLICATION.
static int CountLettersBadExample(ReadOnlySpan<char> span)
{
    int letterCount = 0;
    foreach (char ch in span)
    {
        if (char.IsLetter(ch))
        { letterCount++; }
    }
    return letterCount;
}
The preceding code works correctly with some languages such as English:
CountLettersBadExample("Hello")
// Returns 5
But it won't work correctly for languages whose letters are encoded outside the Basic Multilingual Plane, such as Osage:
CountLettersBadExample("𐓏𐓘𐓻𐓘𐓻𐓟 𐒻𐓟")
// Returns 0
The reason this method returns incorrect results for Osage text is that the char instances for Osage letters are surrogate code points. No single surrogate code point has enough information to determine if it's a letter.
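The point about surrogate code points can be checked directly. The following is a minimal sketch, not part of the Rune API: the class name SurrogateInspection and its two helper methods are hypothetical names introduced for illustration. It lists how each char of a string is classified, and how each Rune decoded from the same string is classified.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class SurrogateInspection
{
    // Describes every UTF-16 code unit (char) in the string.
    public static string DescribeChars(string s)
    {
        var parts = new List<string>();
        foreach (char ch in s)
        {
            parts.Add($"U+{(int)ch:X4} surrogate={char.IsSurrogate(ch)} letter={char.IsLetter(ch)}");
        }
        return string.Join("; ", parts);
    }

    // Describes every Unicode scalar value (Rune) in the string.
    public static string DescribeRunes(string s)
    {
        var parts = new List<string>();
        foreach (Rune rune in s.EnumerateRunes())
        {
            parts.Add($"U+{rune.Value:X4} letter={Rune.IsLetter(rune)}");
        }
        return string.Join("; ", parts);
    }
}
```

For a string that holds the single Osage letter U+104CF (written "\U000104CF" in C#), DescribeChars reports two surrogate code units, neither of which is classified as a letter, while DescribeRunes reports one scalar value that is a letter.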
If you change this code to use Rune instead of char, the method works correctly with code points outside the Basic Multilingual Plane:
int CountLetters(string s)
{
    int letterCount = 0;
    foreach (Rune rune in s.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }
    return letterCount;
}
let countLetters (s: string) =
    let mutable letterCount = 0
    for rune in s.EnumerateRunes() do
        if Rune.IsLetter rune then
            letterCount <- letterCount + 1
    letterCount
Here's equivalent code that works with a ReadOnlySpan<char>:
static int CountLetters(ReadOnlySpan<char> span)
{
    int letterCount = 0;
    foreach (Rune rune in span.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }
    return letterCount;
}
The preceding code counts Osage letters correctly:
CountLetters("𐓏𐓘𐓻𐓘𐓻𐓟 𐒻𐓟")
// Returns 8
Code that explicitly handles surrogate pairs
Consider using the Rune type if your code calls APIs that explicitly operate on surrogate code points, such as the following methods:
- Char.IsSurrogate
- Char.IsSurrogatePair
- Char.IsHighSurrogate
- Char.IsLowSurrogate
- Char.ConvertFromUtf32
- Char.ConvertToUtf32
For example, the following method has special logic to deal with surrogate char pairs:
static void ProcessStringUseChar(string s)
{
    Console.WriteLine("Using char");
    for (int i = 0; i < s.Length; i++)
    {
        if (!char.IsSurrogate(s[i]))
        {
            Console.WriteLine($"Code point: {(int)(s[i])}");
        }
        else if (i + 1 < s.Length && char.IsSurrogatePair(s[i], s[i + 1]))
        {
            int codePoint = char.ConvertToUtf32(s[i], s[i + 1]);
            Console.WriteLine($"Code point: {codePoint}");
            i++; // so that when the loop iterates it's actually +2
        }
        else
        {
            throw new Exception("String was not well-formed UTF-16.");
        }
    }
}
Such code is simpler if it uses Rune, as in the following example:
static void ProcessStringUseRune(string s)
{
    Console.WriteLine("Using Rune");
    for (int i = 0; i < s.Length;)
    {
        if (!Rune.TryGetRuneAt(s, i, out Rune rune))
        {
            throw new Exception("String was not well-formed UTF-16.");
        }
        Console.WriteLine($"Code point: {rune.Value}");
        i += rune.Utf16SequenceLength; // increment the iterator by the number of chars in this Rune
    }
}
When not to use Rune
You don't need to use the Rune type if your code:
- Looks for exact char matches
- Splits a string on a known char value
Using the Rune type may return incorrect results if your code:
- Counts the number of display characters in a string
Look for exact char matches
The following code iterates through a string looking for specific characters, returning the index of the first match. There's no need to change this code to use Rune, as the code is looking for characters that are represented by a single char.
int GetIndexOfFirstAToZ(string s)
{
    for (int i = 0; i < s.Length; i++)
    {
        char thisChar = s[i];
        if ('A' <= thisChar && thisChar <= 'Z')
        {
            return i; // found a match
        }
    }
    return -1; // didn't find 'A' - 'Z' in the input string
}
Split a string on a known char
It's common to call string.Split and use delimiters such as ' ' (space) or ',' (comma), as in the following example:
string inputString = "🐂, 🐄, 🐆";
string[] splitOnSpace = inputString.Split(' ');
string[] splitOnComma = inputString.Split(',');
There is no need to use Rune here, because the code is looking for characters that are represented by a single char.
Count the number of display characters in a string
The number of Rune instances in a string might not match the number of user-perceivable characters shown when displaying the string.
Since Rune instances represent Unicode scalar values, components that follow the Unicode text segmentation guidelines can use Rune as a building block for counting display characters.
The StringInfo type can be used to count display characters, but it doesn't count correctly in all scenarios for .NET implementations other than .NET 5+.
For more information, see Grapheme clusters.
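To make the difference concrete, here is a small sketch that measures one string three ways: as char instances, as Rune instances, and as text elements reported by StringInfo. The class name DisplayLengthDemo and the method Measure are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class DisplayLengthDemo
{
    // Returns the length of the string in chars, in Runes, and in text elements.
    public static (int Chars, int Runes, int TextElements) Measure(string s)
    {
        int runeCount = 0;
        foreach (Rune rune in s.EnumerateRunes())
        {
            runeCount++;
        }

        // StringInfo groups a base character with its combining marks.
        int textElements = new StringInfo(s).LengthInTextElements;
        return (s.Length, runeCount, textElements);
    }
}
```

For the input "e\u0301" (LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT), the string holds two char instances and two Rune instances, but a reader sees a single character, and StringInfo reports one text element. For more complex sequences, the text element count depends on the .NET implementation, as noted above.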
How to instantiate a Rune
There are several ways to get a Rune instance. You can use a constructor to create a Rune directly from:
- A code point.
  Rune a = new Rune(0x0061); // LATIN SMALL LETTER A
  Rune b = new Rune(0x10421); // DESERET CAPITAL LETTER ER
- A single char.
  Rune c = new Rune('a');
- A surrogate char pair.
  Rune d = new Rune('\ud83d', '\udd2e'); // U+1F52E CRYSTAL BALL
All of the constructors throw an ArgumentException if the input doesn't represent a valid Unicode scalar value.
There are Rune.TryCreate methods available for callers who don't want exceptions to be thrown on failure.
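As a rough sketch of the non-throwing path, the following helper uses Rune.TryCreate to classify arbitrary integers and char pairs. The class name TryCreateDemo and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class TryCreateDemo
{
    // Formats the value as U+XXXX if it is a Unicode scalar value;
    // otherwise reports that it is not, without throwing.
    public static string Describe(int value)
    {
        return Rune.TryCreate(value, out Rune rune)
            ? $"U+{rune.Value:X4}"
            : "not a scalar value";
    }

    // Returns true only if the two chars form a valid surrogate pair.
    public static bool IsValidPair(char high, char low)
    {
        return Rune.TryCreate(high, low, out _);
    }
}
```

With this sketch, Describe(0x61) yields "U+0061", while Describe(0xD800) (a surrogate code point) and Describe(0x110000) (beyond the Unicode range) both yield "not a scalar value". IsValidPair returns false when the surrogates are supplied in the wrong order.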
Rune instances can also be read from existing input sequences. For instance, given a ReadOnlySpan<char> that represents UTF-16 data, the Rune.DecodeFromUtf16 method returns the first Rune instance at the beginning of the input span. The Rune.DecodeFromUtf8 method operates similarly, accepting a ReadOnlySpan<byte> parameter that represents UTF-8 data. There are equivalent methods to read from the end of the span instead of the beginning of the span.
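The decoding methods can be used in a loop to walk a whole buffer. The following sketch decodes UTF-8 bytes into code point values; the class name DecodeDemo and the method DecodeUtf8 are hypothetical names for this illustration. It relies on the documented behavior that, when the input is invalid or incomplete, the method outputs Rune.ReplacementChar and still reports how many bytes to skip.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class DecodeDemo
{
    // Decodes a UTF-8 buffer into a list of code point values.
    // Invalid or truncated sequences show up as U+FFFD (Rune.ReplacementChar).
    public static List<int> DecodeUtf8(ReadOnlySpan<byte> utf8)
    {
        var codePoints = new List<int>();
        while (!utf8.IsEmpty)
        {
            // The status is Done for valid data, or InvalidData / NeedMoreData otherwise.
            // This sketch ignores the status and records whatever Rune was produced.
            OperationStatus status = Rune.DecodeFromUtf8(utf8, out Rune rune, out int bytesConsumed);
            codePoints.Add(rune.Value);
            utf8 = utf8.Slice(bytesConsumed);
        }
        return codePoints;
    }
}
```

For example, the bytes 0x61 0xF0 0x9F 0x94 0xAE decode to the two code points U+0061 and U+1F52E. Production code would normally inspect the returned OperationStatus instead of silently substituting U+FFFD.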
Query properties of a Rune
To get the integer code point value of a Rune instance, use the Rune.Value property.
Rune rune = new Rune('\ud83d', '\udd2e'); // U+1F52E CRYSTAL BALL
int codePoint = rune.Value; // = 128302 decimal (= 0x1F52E)
Many of the static APIs available on the char type are also available on the Rune type. For instance, Rune.IsWhiteSpace and Rune.GetUnicodeCategory are equivalents to Char.IsWhiteSpace and Char.GetUnicodeCategory methods. The Rune methods correctly handle surrogate pairs.
The following example code takes a ReadOnlySpan<char> as input and trims from both the start and the end of the span every Rune that isn't a letter or a digit.
static ReadOnlySpan<char> TrimNonLettersAndNonDigits(ReadOnlySpan<char> span)
{
    // First, trim from the front.
    // If any Rune can't be decoded
    // (return value is anything other than "Done"),
    // or if the Rune is a letter or digit,
    // stop trimming from the front and
    // instead work from the end.
    while (Rune.DecodeFromUtf16(span, out Rune rune, out int charsConsumed) == OperationStatus.Done)
    {
        if (Rune.IsLetterOrDigit(rune))
        { break; }
        span = span[charsConsumed..];
    }
    // Next, trim from the end.
    // If any Rune can't be decoded,
    // or if the Rune is a letter or digit,
    // break from the loop, and we're finished.
    while (Rune.DecodeLastFromUtf16(span, out Rune rune, out int charsConsumed) == OperationStatus.Done)
    {
        if (Rune.IsLetterOrDigit(rune))
        { break; }
        span = span[..^charsConsumed];
    }
    return span;
}
There are some API differences between char and Rune. For example:
- There is no Rune equivalent to Char.IsSurrogate(Char), since Rune instances by definition can never be surrogate code points.
- The Rune.GetUnicodeCategory method doesn't always return the same result as Char.GetUnicodeCategory. It does return the same value as CharUnicodeInfo.GetUnicodeCategory. For more information, see the Remarks on Char.GetUnicodeCategory.
Convert a Rune to UTF-8 or UTF-16
Since a Rune is a Unicode scalar value, it can be converted to UTF-8, UTF-16, or UTF-32 encoding. The Rune type has built-in support for conversion to UTF-8 and UTF-16.
The Rune.EncodeToUtf16 method converts a Rune instance to char instances. To query the number of char instances that would result from converting a Rune instance to UTF-16, use the Rune.Utf16SequenceLength property. Similar methods exist for UTF-8 conversion.
The following example converts a Rune instance to a char array. The code assumes you have a Rune instance in the rune variable:
char[] chars = new char[rune.Utf16SequenceLength];
int numCharsWritten = rune.EncodeToUtf16(chars);
Since a string is a sequence of UTF-16 chars, the following example also converts a Rune instance to UTF-16:
string theString = rune.ToString();
The following example converts a Rune instance to a UTF-8 byte array:
byte[] bytes = new byte[rune.Utf8SequenceLength];
int numBytesWritten = rune.EncodeToUtf8(bytes);
The Rune.EncodeToUtf16 and Rune.EncodeToUtf8 methods return the actual number of elements written. They throw an exception if the destination buffer is too short to contain the result. There are non-throwing TryEncodeToUtf8 and TryEncodeToUtf16 methods as well for callers who want to avoid exceptions.
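The non-throwing variants might be used as in the following sketch. The class name EncodeDemo and its method names are hypothetical; the four-byte stack buffer reflects the fact that a Unicode scalar value never needs more than four UTF-8 code units.

```csharp
using System;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class EncodeDemo
{
    // Encodes a Rune to UTF-8 by using a small stack buffer and the
    // non-throwing TryEncodeToUtf8 method.
    public static byte[] ToUtf8(Rune rune)
    {
        Span<byte> buffer = stackalloc byte[4]; // UTF-8 needs at most 4 bytes per scalar value
        if (!rune.TryEncodeToUtf8(buffer, out int bytesWritten))
        {
            // Not expected with a 4-byte buffer; kept so that a failure is never ignored.
            throw new InvalidOperationException("Destination buffer was too small.");
        }
        return buffer.Slice(0, bytesWritten).ToArray();
    }

    // Returns true if the Rune fits in a single UTF-16 code unit.
    // TryEncodeToUtf16 returns false rather than throwing when the buffer is too short.
    public static bool FitsInOneChar(Rune rune)
    {
        Span<char> oneChar = stackalloc char[1];
        return rune.TryEncodeToUtf16(oneChar, out _);
    }
}
```

With this sketch, ToUtf8(new Rune(0x1F52E)) produces the four bytes 0xF0 0x9F 0x94 0xAE, and FitsInOneChar returns false for the same Rune because its UTF-16 form is a surrogate pair.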
Rune in .NET vs. other languages
The term "rune" is not defined in the Unicode Standard. The term dates back to the creation of UTF-8. Rob Pike and Ken Thompson were looking for a term to describe what would eventually become known as a code point. They settled on the term "rune", and Rob Pike's later influence over the Go programming language helped popularize the term.
However, the .NET Rune type is not the equivalent of the Go rune type. In Go, the rune type is an alias for int32. A Go rune is intended to represent a Unicode code point, but it can be any 32-bit value, including surrogate code points and values that are not legal Unicode code points.
For similar types in other programming languages, see Rust's primitive char type or Swift's Unicode.Scalar type, both of which represent Unicode scalar values. They provide functionality similar to .NET's Rune type, and they disallow instantiation of values that are not legal Unicode scalar values.
Constructors
| Rune(Char) | Creates a Rune from the provided UTF-16 code unit. |
| Rune(Char, Char) | Creates a Rune from the provided UTF-16 surrogate pair. |
| Rune(Int32) | Creates a Rune from the specified 32-bit integer that represents a Unicode scalar value. |
| Rune(UInt32) | Creates a Rune from the specified 32-bit unsigned integer that represents a Unicode scalar value. |
Properties
| IsAscii | Gets a value that indicates whether the scalar value associated with this Rune is within the ASCII encoding range. |
| IsBmp | Gets a value that indicates whether the scalar value associated with this Rune is within the BMP encoding range. |
| Plane | Gets the Unicode plane (0 to 16, inclusive) that contains this scalar. |
| ReplacementChar | Gets a Rune instance that represents the Unicode replacement character U+FFFD. |
| Utf16SequenceLength | Gets the length in code units (Char) of the UTF-16 sequence required to represent this scalar value. |
| Utf8SequenceLength | Gets the length in code units of the UTF-8 sequence required to represent this scalar value. |
| Value | Gets the Unicode scalar value as an integer. |
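As an informal illustration of the properties listed above, the following sketch formats them into one line of text. The class name RunePropertiesDemo and the method Summarize are hypothetical names used only here.

```csharp
using System;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class RunePropertiesDemo
{
    // Builds a one-line summary of the instance properties of a Rune.
    public static string Summarize(Rune rune)
    {
        return $"U+{rune.Value:X4} ascii={rune.IsAscii} bmp={rune.IsBmp} plane={rune.Plane} " +
               $"utf16={rune.Utf16SequenceLength} utf8={rune.Utf8SequenceLength}";
    }
}
```

For new Rune('a'), the summary is "U+0061 ascii=True bmp=True plane=0 utf16=1 utf8=1". For new Rune(0x1F52E), it is "U+1F52E ascii=False bmp=False plane=1 utf16=2 utf8=4".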
Methods
| CompareTo(Rune) | Compares the current instance to the specified Rune instance. |
| DecodeFromUtf16(ReadOnlySpan<Char>, Rune, Int32) | Decodes the Rune at the beginning of the provided UTF-16 source buffer. |
| DecodeFromUtf8(ReadOnlySpan<Byte>, Rune, Int32) | Decodes the Rune at the beginning of the provided UTF-8 source buffer. |
| DecodeLastFromUtf16(ReadOnlySpan<Char>, Rune, Int32) | Decodes the Rune at the end of the provided UTF-16 source buffer. |
| DecodeLastFromUtf8(ReadOnlySpan<Byte>, Rune, Int32) | Decodes the Rune at the end of the provided UTF-8 source buffer. |
| EncodeToUtf16(Span<Char>) | Encodes this Rune to a UTF-16 destination buffer. |
| EncodeToUtf8(Span<Byte>) | Encodes this Rune to a UTF-8 destination buffer. |
| Equals(Object) | Returns a value that indicates whether the current instance and a specified object are equal. |
| Equals(Rune) | Returns a value that indicates whether the current instance and a specified rune are equal. |
| GetHashCode() | Returns the hash code for this instance. |
| GetNumericValue(Rune) | Gets the numeric value associated with the specified rune. |
| GetRuneAt(String, Int32) | Gets the Rune that begins at a specified position in a string. |
| GetUnicodeCategory(Rune) | Gets the Unicode category associated with the specified rune. |
| IsControl(Rune) | Returns a value that indicates whether the specified rune is categorized as a control character. |
| IsDigit(Rune) | Returns a value that indicates whether the specified rune is categorized as a decimal digit. |
| IsLetter(Rune) | Returns a value that indicates whether the specified rune is categorized as a letter. |
| IsLetterOrDigit(Rune) | Returns a value that indicates whether the specified rune is categorized as a letter or a decimal digit. |
| IsLower(Rune) | Returns a value that indicates whether the specified rune is categorized as a lowercase letter. |
| IsNumber(Rune) | Returns a value that indicates whether the specified rune is categorized as a number. |
| IsPunctuation(Rune) | Returns a value that indicates whether the specified rune is categorized as a punctuation mark. |
| IsSeparator(Rune) | Returns a value that indicates whether the specified rune is categorized as a separator character. |
| IsSymbol(Rune) | Returns a value that indicates whether the specified rune is categorized as a symbol character. |
| IsUpper(Rune) | Returns a value that indicates whether the specified rune is categorized as an uppercase letter. |
| IsValid(Int32) | Returns a value that indicates whether a 32-bit signed integer represents a valid Unicode scalar value; that is, it is in the range [ U+0000..U+D7FF ], inclusive; or [ U+E000..U+10FFFF ], inclusive. |
| IsValid(UInt32) | Returns a value that indicates whether a 32-bit unsigned integer represents a valid Unicode scalar value; that is, it is in the range [ U+0000..U+D7FF ], inclusive, or [ U+E000..U+10FFFF ], inclusive. |
| IsWhiteSpace(Rune) | Returns a value that indicates whether the specified rune is categorized as a white space character. |
| ToLower(Rune, CultureInfo) | Returns a copy of the specified Rune converted to lowercase, using the casing rules of the specified culture. |
| ToLowerInvariant(Rune) | Returns a copy of the specified Rune converted to lowercase using the casing rules of the invariant culture. |
| ToString() | Returns the string representation of this Rune instance. |
| ToUpper(Rune, CultureInfo) | Returns a copy of the specified Rune converted to uppercase, using the casing rules of the specified culture. |
| ToUpperInvariant(Rune) | Returns a copy of the specified Rune converted to uppercase using the casing rules of the invariant culture. |
| TryCreate(Char, Char, Rune) | Attempts to create a Rune from the specified UTF-16 surrogate pair and returns a value that indicates whether the operation was successful. |
| TryCreate(Char, Rune) | Attempts to create a Rune from a specified character and returns a value that indicates whether the operation succeeded. |
| TryCreate(Int32, Rune) | Attempts to create a Rune from a specified signed integer that represents a Unicode scalar value. |
| TryCreate(UInt32, Rune) | Attempts to create a Rune from the specified 32-bit unsigned integer that represents a Unicode scalar value. |
| TryEncodeToUtf16(Span<Char>, Int32) | Encodes this Rune to a UTF-16 encoded destination buffer. |
| TryEncodeToUtf8(Span<Byte>, Int32) | Encodes this Rune to a UTF-8 encoded destination buffer. |
| TryGetRuneAt(String, Int32, Rune) | Attempts to get the Rune that begins at a specified position in a string, and return a value that indicates whether the operation succeeded. |
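A few of the methods listed above can be combined as in the following sketch. The class name RuneMethodsDemo and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class RuneMethodsDemo
{
    // Uppercases a string one Rune at a time by using invariant casing rules.
    // EnumerateRunes substitutes U+FFFD for any ill-formed UTF-16 in the input.
    public static string ToUpperInvariantByRune(string s)
    {
        var builder = new StringBuilder(s.Length);
        foreach (Rune rune in s.EnumerateRunes())
        {
            builder.Append(Rune.ToUpperInvariant(rune).ToString());
        }
        return builder.ToString();
    }

    // Returns true if a complete scalar value starts at the given index.
    // An index that points at the second half of a surrogate pair yields false.
    public static bool StartsScalarAt(string s, int index)
    {
        return Rune.TryGetRuneAt(s, index, out _);
    }
}
```

For the string "a\U0001F52E", StartsScalarAt returns true at index 0 and index 1, but false at index 2, because that position holds only the low surrogate of the pair.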
Operators
| Equality(Rune, Rune) | Returns a value that indicates whether two Rune instances are equal. |
| Explicit(Char to Rune) | Defines an explicit conversion of a 16-bit Unicode character to a Rune. |
| Explicit(Int32 to Rune) | Defines an explicit conversion of a 32-bit signed integer to a Rune. |
| Explicit(UInt32 to Rune) | Defines an explicit conversion of a 32-bit unsigned integer to a Rune. |
| GreaterThan(Rune, Rune) | Returns a value indicating whether a specified Rune is greater than another specified Rune. |
| GreaterThanOrEqual(Rune, Rune) | Returns a value indicating whether a specified Rune is greater than or equal to another specified Rune. |
| Inequality(Rune, Rune) | Returns a value that indicates whether two Rune instances have different values. |
| LessThan(Rune, Rune) | Returns a value indicating whether a specified Rune is less than another specified Rune. |
| LessThanOrEqual(Rune, Rune) | Returns a value indicating whether a specified Rune is less than or equal to another specified Rune. |
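The operators above allow Rune values to be cast and compared much like integers. The following is a minimal sketch; the class name RuneOperatorsDemo and its method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper class, introduced only for this illustration.
public static class RuneOperatorsDemo
{
    // Returns the larger of two Runes by using the comparison operators,
    // which order Runes by their scalar value.
    public static Rune Max(Rune a, Rune b)
    {
        return a >= b ? a : b;
    }

    // Returns true if the explicit conversion from char succeeds.
    // Converting a surrogate char throws an ArgumentException.
    public static bool CanConvert(char ch)
    {
        try
        {
            Rune converted = (Rune)ch;
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

For example, Max((Rune)'a', (Rune)0x1F52E) returns the Rune for U+1F52E, and CanConvert('\ud83d') returns false because a lone surrogate is not a scalar value. Code that expects invalid input would usually call Rune.TryCreate instead of catching the exception.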
Explicit Interface Implementations
| IComparable.CompareTo(Object) | Compares the current instance to the specified object. |
| IFormattable.ToString(String, IFormatProvider) | Formats the value of the current instance using the specified format. |
| ISpanFormattable.TryFormat(Span<Char>, Int32, ReadOnlySpan<Char>, IFormatProvider) | Tries to format the value of the current instance into the provided span of characters. |
| IUtf8SpanFormattable.TryFormat(Span<Byte>, Int32, ReadOnlySpan<Char>, IFormatProvider) | Tries to format the value of the current instance as UTF-8 into the provided span of bytes. |