Unicode category changed for some Latin-1 characters

2021-11-08

Char methods now return the correct Unicode category for characters in the Latin-1 range, matching the category defined by the Unicode standard.

Change description

In previous .NET versions, Char methods used a fixed list of Unicode categories for characters in the Latin-1 range. However, the Unicode standard has changed the categories of some of these characters since those APIs were implemented, which created a discrepancy. There was also a discrepancy between the Char and CharUnicodeInfo APIs, because CharUnicodeInfo follows the Unicode standard. In .NET 5 and later versions, Char methods return the Unicode category defined by the Unicode standard for all characters.

The following table shows the characters whose Unicode categories have changed in .NET 5:

Character	Unicode category in previous .NET versions	Unicode category in .NET 5 and later versions
§ (\u00a7)	`OtherSymbol`	`OtherPunctuation`
ª (\u00aa)	`LowercaseLetter`	`OtherLetter`
SHY (\u00ad)	`DashPunctuation`	`Format`
¶ (\u00b6)	`OtherSymbol`	`OtherPunctuation`
º (\u00ba)	`LowercaseLetter`	`OtherLetter`
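To cross-check the right-hand column of the table against the Unicode Character Database itself, the following sketch uses Python's standard `unicodedata` module instead of .NET, because its `category` function returns the Unicode General_Category code for a character. The dictionary that translates two-letter codes such as `Po` into .NET `UnicodeCategory` member names was written by hand for this example and covers only the codes relevant here. It is an illustration, not part of any .NET API.

```python
import unicodedata

# Characters from the table, each paired with the category name that
# .NET 5 and later report for it (copied from the table's last column).
CHANGED = {
    "\u00a7": "OtherPunctuation",  # § section sign
    "\u00aa": "OtherLetter",       # ª feminine ordinal indicator
    "\u00ad": "Format",            # SHY, soft hyphen
    "\u00b6": "OtherPunctuation",  # ¶ pilcrow sign
    "\u00ba": "OtherLetter",       # º masculine ordinal indicator
}

# Hand-written mapping (assumption for this sketch) from Unicode
# General_Category codes to .NET UnicodeCategory member names.
# Only the codes that appear in the table are included.
CODE_TO_DOTNET = {
    "Po": "OtherPunctuation",
    "Lo": "OtherLetter",
    "Cf": "Format",
    "Ll": "LowercaseLetter",
    "So": "OtherSymbol",
    "Pd": "DashPunctuation",
}


def standard_category(ch: str) -> str:
    """Return the .NET-style name of ch's category per Python's Unicode data."""
    return CODE_TO_DOTNET[unicodedata.category(ch)]


for ch, expected in CHANGED.items():
    code = unicodedata.category(ch)
    actual = standard_category(ch)
    print(f"U+{ord(ch):04X}  {code}  {actual}  matches table: {actual == expected}")
```

If the table is accurate, every line prints `matches table: True`, since the .NET 5 categories are the ones the Unicode standard defines.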

Version introduced

.NET 5.0

Recommended action

If your code gets a character's Unicode category through the Char class and assumes that the category of the characters listed above never changes, you may need to update it.
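The kind of assumption that breaks is easiest to see side by side. The sketch below again uses Python's `unicodedata` module as a stand-in for the Unicode standard, and the two function names are hypothetical. A test for one exact category (here `Ll`, the code for LowercaseLetter) gives a different answer for ª now that its category is `Lo`. If your code only needs to know whether a character is a letter, testing for the whole Letter group is unaffected by this change. In .NET the analogous contrast is comparing `Char.GetUnicodeCategory` with `UnicodeCategory.LowercaseLetter` versus calling `Char.IsLetter`.

```python
import unicodedata


def is_lowercase_letter_exact(ch: str) -> bool:
    # Hypothetical helper: depends on one exact category. Before .NET 5,
    # Char reported "ª" as LowercaseLetter; the Unicode standard says Lo.
    return unicodedata.category(ch) == "Ll"


def is_any_letter(ch: str) -> bool:
    # Hypothetical helper: accepts the whole Letter group (Lu, Ll, Lt, Lm, Lo),
    # so a move from Ll to Lo does not change the result.
    return unicodedata.category(ch).startswith("L")


for ch in "a\u00aa\u00ba":
    print(
        f"U+{ord(ch):04X} {unicodedata.category(ch)} "
        f"exact_lowercase={is_lowercase_letter_exact(ch)} any_letter={is_any_letter(ch)}"
    )
```

Which check is appropriate depends on what your code actually requires; the point is only that exact-category comparisons are the ones to review.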

Reason for change

This change was made so that the categories returned by the Char type are consistent with both the Unicode standard and the CharUnicodeInfo type.

Affected APIs

In addition to the Char methods themselves, any class that depends on Char to obtain the Unicode character category, such as Regex, is affected by this change.