Intro
.NET uses UTF-16 for string encoding. This means that:
- the char type, which is the atomic unit of a string, is a 16-bit data type;
- surrogate pairs are used for supplementary code points, U+10000..U+10FFFF, while for the Basic Multilingual Plane, U+0000..U+FFFF, a single char is sufficient to express any Unicode scalar value.
The .NET Rune type makes it possible to avoid surrogate pair problems, e.g. a pair being split unintentionally. With it, any scalar value can be held in a single variable.
var a = new Rune('a');
var grinningFace = new Rune('\uD83D', '\uDE00');
grinningFace.ToString();
"😀"
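To make the "unintended split" concrete, here is a minimal sketch, assuming .NET Core 3.0 or later (where System.Text.Rune and string.EnumerateRunes are available). The sample string and variable names are made up for illustration.

```csharp
using System;
using System.Text;

string s = "a\U0001F600";               // 'a' followed by U+1F600, stored as a surrogate pair

// Length counts UTF-16 code units (char), not scalar values.
int charCount = s.Length;

// Cutting by char index can split the pair and leave a lone high surrogate behind.
string broken = s.Substring(0, 2);
bool endsWithLoneSurrogate = char.IsHighSurrogate(broken[broken.Length - 1]);

// EnumerateRunes walks the string one scalar value at a time.
int runeCount = 0;
foreach (Rune r in s.EnumerateRunes())
{
    Console.WriteLine($"U+{r.Value:X4} occupies {r.Utf16SequenceLength} char(s)");
    runeCount++;
}

// A lone surrogate is not a scalar value, so it cannot become a Rune.
bool loneSurrogateIsRune = Rune.TryCreate('\uD83D', out Rune _);

Console.WriteLine($"{charCount} chars, {runeCount} runes");
```

Here the string holds three char values but only two runes, and the substring ends in half of a pair, which is exactly the kind of problem Rune is meant to avoid.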
Terminology
There are many terms related to the string and char types: symbol, character, pictogram, emoji, grapheme, script, letter, mark, punctuation, accent, diacritics, emoticon, … .NET chose to use textual element as a synonym for what Unicode calls a grapheme (or grapheme cluster).
Unicode® Technical Standard #51
Note that all emoji sequences are single grapheme clusters:
Using "grapheme" is not 100% correct either.
Grapheme
In linguistics, a grapheme is the smallest functional unit of a writing system.
Consider, say, the emoji 🧑🏿‍🎄, which is composed of the following sequence:
- 1F9D1, 🧑, \uD83E\uDDD1
- 1F3FF, 🏿, \uD83C\uDFFF
- 200D, (zero-width joiner, invisible), \u200D
- 1F384, 🎄, \uD83C\uDF84
Hardly any of these units can be considered a functional unit of a writing system.
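The same sequence can be inspected from code. This is a rough sketch, assuming .NET 5 or later, where StringInfo follows the Unicode extended grapheme cluster rules (older runtimes segment text elements differently and may report more than one). The variable names are illustrative only.

```csharp
using System;
using System.Globalization;
using System.Text;

// Person + dark skin tone + zero-width joiner + Christmas tree, written with explicit escapes.
string emoji = "\U0001F9D1\U0001F3FF\u200D\U0001F384";

// UTF-16 code units: 2 + 2 + 1 + 2.
int charCount = emoji.Length;

// Scalar values: one Rune per code point in the list above.
int runeCount = 0;
foreach (Rune r in emoji.EnumerateRunes())
{
    Console.WriteLine($"U+{r.Value:X4}");
    runeCount++;
}

// Text elements (grapheme clusters) as seen by .NET; the result depends on the runtime version.
int textElementCount = new StringInfo(emoji).LengthInTextElements;

Console.WriteLine($"{charCount} chars, {runeCount} runes, {textElementCount} text element(s)");
```

The three counts differ, which shows that char, Rune and text element are three distinct levels, and that Rune sits at the scalar value level rather than at the level of what a reader perceives as one symbol.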
The document The Unicode® Standard: A Technical Introduction is more specific:
For example, in historic Spanish language sorting, "ll" counts as a single text element. However, when Spanish words are typed, "ll" is two separate text elements: "l" and "l".
Text elements are encoded as sequences of one or more characters. Certain of these sequences are called combining character sequences, made up of a base letter and one or more combining marks, which are rendered around the base letter (above it, below it, etc.). For example, a sequence of "a" followed by a combining circumflex "^" would be rendered as "â".
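The combining sequence from the quote can be tried out as follows. This is only a sketch: it assumes U+0302 (COMBINING CIRCUMFLEX ACCENT) is the combining mark meant by "^", and that the runtime has normalization data available (ICU or NLS).

```csharp
using System;
using System.Globalization;
using System.Text;

// 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT: two scalar values.
string decomposed = "a\u0302";

// Normalization Form C composes the pair into the single code point U+00E2.
string composed = decomposed.Normalize(NormalizationForm.FormC);

int decomposedRunes = 0;
foreach (Rune r in decomposed.EnumerateRunes())
{
    decomposedRunes++;
}

int composedChars = composed.Length;
bool isPrecomposed = composed == "\u00E2";

// Both spellings form one text element, even though their rune counts differ.
int decomposedTextElements = new StringInfo(decomposed).LengthInTextElements;
int composedTextElements = new StringInfo(composed).LengthInTextElements;

Console.WriteLine($"{decomposedRunes} runes, {decomposedTextElements} text element before normalization");
Console.WriteLine($"{composedChars} char, {composedTextElements} text element after normalization");
```

So a single text element may consist of one rune or several, depending only on how the text happens to be encoded.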
Some overlap can be seen between Unicode and .NET in how they use text/textual element. Nonetheless, the terminology is evidently rather opaque.
The Rune
https://www.vocabulary.com/dictionary/rune
A rune is a letter used in early Germanic writing. A linguist might be interested in runes because they're evidence of ancient languages, while a mystic might use runes, believed by some to have magical properties, in fortune-telling.
https://www.thefreedictionary.com/rune
- 1a: Any of the characters in several alphabets used by ancient Germanic peoples from the 3rd to the 13th century.
- 1b: A similar character in another alphabet, sometimes believed to have magic powers.
- 2: A poem or incantation of mysterious significance, especially a magic charm.
Conclusion
After a deeper look, "rune" does not resemble any of the terminology used by Unicode and seems to be tightly coupled with the writing system of the Germanic tribes. As far as I can see, Rune is nothing more than a Unicode scalar value. It does not matter whether a Unicode code point (UCP) is expressed by a 1:1 numeric relation, by surrogate pairs or by a series of doggies and cats. It could be said that .NET is about to contribute to the terminology goulash:
- Why is Rune not called UnicodeScalarValue, or something more technically accurate?
- I failed to find any reference on why this name was chosen.
- Is there some specific reason?
- Or is Rune just as fine as grapheme, glyph, symbol and others would be?