Why is System.Text.Rune named like this?

PerceptiveFilament 21 Reputation points
2024-10-02T11:21:18.26+00:00

Intro

.NET uses UTF-16 for string encoding. This means:

  • The char type, which is the atomic unit of a string, is a 16-bit data type.
  • Surrogate pairs are used for supplementary code points, U+10000..U+10FFFF, while in the Basic Multilingual Plane, U+0000..U+FFFF, one char is sufficient to express any Unicode scalar value.
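
The two points above can be checked with a minimal sketch. The variable names are chosen for illustration only; the calls used (char.IsSurrogatePair, char.ConvertToUtf32) are standard .NET APIs.

```csharp
using System;

// U+0061 lies in the Basic Multilingual Plane: one char is enough.
string latinA = "a";
// U+1F600 is a supplementary code point: UTF-16 needs a surrogate pair.
string grinning = "\U0001F600";

Console.WriteLine(latinA.Length);    // 1
Console.WriteLine(grinning.Length);  // 2

// The two chars form a high/low surrogate pair that decodes to one code point.
bool isPair = char.IsSurrogatePair(grinning[0], grinning[1]);
int codePoint = char.ConvertToUtf32(grinning, 0);

Console.WriteLine(isPair);              // True
Console.WriteLine($"U+{codePoint:X}");  // U+1F600
Console.WriteLine($"{(int)grinning[0]:X4} {(int)grinning[1]:X4}"); // D83D DE00
```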

Rune in .NET makes it possible to avoid surrogate pair problems, e.g. a pair being split unintentionally. A single variable can thus express any scalar value.

var a = new Rune('a');
var grinningFace = new Rune('\uD83D', '\uDE00');
grinningFace.ToString();
"😀"
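
The unintended split mentioned above can be illustrated roughly as follows. This is only a sketch with illustrative variable names, assuming .NET Core 3.0 or later, where Rune and string.EnumerateRunes are available.

```csharp
using System;
using System.Text;

string grinning = "\U0001F600"; // "😀", stored as the chars D83D DE00

// Cutting the string between the two surrogates leaves a lone high surrogate.
string broken = grinning.Substring(0, 1);
bool loneIsRune = Rune.TryCreate(broken[0], out Rune _); // a lone surrogate is not a scalar value

// Reading through Rune keeps the pair together.
Rune whole = Rune.GetRuneAt(grinning, 0);
int runeCount = 0;
foreach (Rune r in grinning.EnumerateRunes())
{
    runeCount++;
}

Console.WriteLine(loneIsRune);                 // False
Console.WriteLine($"U+{whole.Value:X}");       // U+1F600
Console.WriteLine(whole.Utf16SequenceLength);  // 2
Console.WriteLine(runeCount);                  // 1
```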

Terminology

There are many terms related to the string and char types: symbol, character, pictogram, emoji, grapheme, script, letter, mark, punctuation, accent, diacritics, emoticon, … .NET chose to use "textual element" as a synonym for what Unicode calls a grapheme (cluster).

Unicode® Technical Standard #51

Note that all emoji sequences are single grapheme clusters:

Using "grapheme" is not entirely correct either.

Grapheme

In linguistics, a grapheme is the smallest functional unit of a writing system.

Yet take, say, this 🧑🏿‍🎄 emoji, which is composed of the sequence:

  • U+1F9D1, 🧑, \ud83e\uddd1
  • U+1F3FF, 🏿, \ud83c\udfff
  • U+200D, ‍, \u200d
  • U+1F384, 🎄, \ud83c\udf84

Hardly any of it can be considered a functional unit of a writing system.
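
To make the layers concrete, here is a small sketch that counts chars, runes, and text elements for the sequence listed above. The variable names are illustrative. The text element count is an assumption about the runtime: .NET 5 and later follow Unicode extended grapheme cluster rules in StringInfo, while older runtimes may report more than one element.

```csharp
using System;
using System.Globalization;
using System.Text;

// U+1F9D1, U+1F3FF, U+200D, U+1F384 written with C# escapes.
string composedEmoji = "\U0001F9D1\U0001F3FF\u200D\U0001F384";

int charCount = composedEmoji.Length;
int runeCount = 0;
foreach (Rune r in composedEmoji.EnumerateRunes())
{
    Console.WriteLine($"U+{r.Value:X4}"); // one line per scalar value
    runeCount++;
}
int textElementCount = new StringInfo(composedEmoji).LengthInTextElements;

Console.WriteLine(charCount);        // 7
Console.WriteLine(runeCount);        // 4
Console.WriteLine(textElementCount); // 1 expected on .NET 5+, possibly more on older runtimes
```

So a single rendered emoji is seven chars, four runes, and (on recent runtimes) one text element.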

The document "The Unicode® Standard: A Technical Introduction" is more specific:

For example, in historic Spanish language sorting, "ll" counts as a single text element. However, when Spanish words are typed, "ll" is two separate text elements: "l" and "l".

Text elements are encoded as sequences of one or more characters. Certain of these sequences are called combining character sequences, made up of a base letter and one or more combining marks, which are rendered around the base letter (above it, below it, etc.). For example, a sequence of "a" followed by a combining circumflex "^" would be rendered as "â".
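
The combining character sequence from the quote can be inspected in .NET with a short sketch like this one (illustrative variable names; StringInfo.GetTextElementEnumerator is the .NET API for walking text elements):

```csharp
using System;
using System.Globalization;
using System.Text;

// "a" (U+0061) followed by COMBINING CIRCUMFLEX ACCENT (U+0302).
string combined = "a\u0302";

int charCount = combined.Length;

int runeCount = 0;
foreach (Rune r in combined.EnumerateRunes())
{
    runeCount++;
}

// Walk the string one text element at a time.
TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(combined);
int textElementCount = 0;
while (elements.MoveNext())
{
    textElementCount++;
}

Console.WriteLine(charCount);        // 2
Console.WriteLine(runeCount);        // 2
Console.WriteLine(textElementCount); // 1
```

Here the base letter and the combining mark are two scalar values (two runes), yet they form one text element.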

There is some overlap between Unicode and .NET regarding the text/textual element. Nonetheless, the terminology is clearly rather opaque.

The Rune

https://www.vocabulary.com/dictionary/rune

A rune is a letter used in early Germanic writing. A linguist might be interested in runes because they're evidence of ancient languages, while a mystic might use runes, believed by some to have magical properties, in fortune-telling.

https://www.thefreedictionary.com/rune

  • 1a: Any of the characters in several alphabets used by ancient Germanic peoples from the 3rd to the 13th century.
  • 1b: A similar character in another alphabet, sometimes believed to have magic powers.
  • 2: A poem or incantation of mysterious significance, especially a magic charm.

Conclusion

After a deeper look, "rune" does not resemble any of the terminology used by Unicode and seems to be tightly coupled with the writing system of the Germanic tribes. As far as I can see, Rune is nothing more than a Unicode scalar value. It does not matter whether a Unicode code point is expressed by a 1:1 numeric relation, by surrogate pairs, or by a series of doggies and cats. It could be said that .NET is about to contribute to the terminology goulash:

  • Why is Rune not named UnicodeScalarValue or something more technically accurate?
  • I failed to find any reference explaining why this name was chosen.
  • Is there some specific reason?
  • Or is Rune just as fine as grapheme, glyph, symbol and others would be?
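
For context, the claim that Rune is just a Unicode scalar value can be checked with a sketch along these lines (variable names are illustrative; Rune.IsValid, Value, IsBmp and Plane are members of System.Text.Rune):

```csharp
using System;
using System.Text;

// Surrogate code points are code points, but they are not scalar values.
bool surrogateValid = Rune.IsValid(0xD800);
// U+10FFFF is the last scalar value; anything beyond is out of range.
bool maxValid = Rune.IsValid(0x10FFFF);
bool beyondValid = Rune.IsValid(0x110000);

// A Rune wraps the scalar value as a plain integer.
Rune grinning = new Rune(0x1F600);

Console.WriteLine(surrogateValid);            // False
Console.WriteLine(maxValid);                  // True
Console.WriteLine(beyondValid);               // False
Console.WriteLine($"U+{grinning.Value:X}");   // U+1F600
Console.WriteLine(grinning.IsBmp);            // False
Console.WriteLine(grinning.Plane);            // 1
```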

1 answer

  1. PerceptiveFilament 21 Reputation points
    2024-10-02T12:52:05.76+00:00

    So, I found it at https://learn.microsoft.com/en-us/dotnet/fundamentals/runtime-libraries/system-text-rune?source=recommendations#rune-in-net-vs-other-languages

    The term "rune" is not defined in the Unicode Standard. The term dates back to the creation of UTF-8. Rob Pike and Ken Thompson were looking for a term to describe what would eventually become known as a code point. They settled on the term "rune", and Rob Pike's later influence over the Go programming language helped popularize the term.
