Thanks @Gary Nebbett
I did a bit more searching and found this more up-to-date article, which provides the same information. However, contrary to what the article suggests, I never received the event log entry when using the sort control without the additional locale set. It did create additional indexes when the new locales were added.
https://learn.microsoft.com/pt-pt/troubleshoot/windows-server/deployment/use-language-id-identify-language-pack
I completed some additional testing with the new locales configured and got some surprising results. @Stephan Steiner yes, you can limit the returned items to only the objects that exactly match the accent/diacritics filter.
I added the following LCIDs to the server, listed as LCID, language, and ordering OID:
41D - Swedish - 1.2.840.113556.1.4.1594
81D - Swedish: Finland - 1.2.840.113556.1.4.1595
C09 - German: Germany - 1.2.840.113556.1.4.1523
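As a side note, the LCIDs above are hexadecimal values that combine a primary language ID (low 10 bits) and a sublanguage ID (next 6 bits), following the PRIMARYLANGID and SUBLANGID macros in the Windows headers. A minimal sketch of that split, where `split_lcid` is just an illustrative helper name and not anything the server exposes:

```python
def split_lcid(lcid: int) -> tuple:
    """Split an LCID into (primary language ID, sublanguage ID)."""
    lang_id = lcid & 0xFFFF          # low 16 bits hold the language identifier
    primary = lang_id & 0x3FF        # low 10 bits: primary language
    sublang = lang_id >> 10          # high 6 bits: sublanguage (regional variant)
    return primary, sublang


for text in ("41D", "81D", "C09"):
    primary, sublang = split_lcid(int(text, 16))
    print(f"{text}: primary=0x{primary:02X} sublang=0x{sublang:02X}")
```

The first two LCIDs share the same primary language ID and differ only in the sublanguage, which fits them being two regional variants of one language.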
I've set up a test OU with two objects:
Gary Test1
Gäry Test2
I had to create them with different names because ADUC considered Gary Test and Gäry Test to be the same, based on the default Unicode matching rules.
If I use a search filter of (displayname=gary*) or (displayname=gäry*) both objects are returned.
If I include the LDAP_SERVER_SORT_OID (1.2.840.113556.1.4.473) control with the ordering OID for German (1.2.840.113556.1.4.1523), I get the same result:
BaseDN: OU=test1,DC=w2k12,DC=local
Filter: (&(objectclass=user)(displayname=g\C3\A4ry*))
DN> CN=Gäry Test1,OU=test1,DC=w2k12,DC=local
DN> CN=Gary Test2,OU=test1,DC=w2k12,DC=local
2 records returned
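For anyone wanting to reproduce this, the value of the sort control is defined in RFC 2891 as a BER-encoded SortKeyList: a sequence of sort keys, each with an attribute type, an optional ordering rule tagged [0], and an optional reverse flag tagged [1]. Below is a rough sketch of that encoding in plain Python. The sort attribute `displayName` is my assumption for illustration, and the helper names are made up; this is not a copy of the control I actually sent.

```python
from typing import Optional


def ber_length(n: int) -> bytes:
    """Encode a BER definite length (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def tlv(tag: int, value: bytes) -> bytes:
    """Build one tag-length-value element."""
    return bytes([tag]) + ber_length(len(value)) + value


def sort_control_value(attribute: str,
                       ordering_rule: Optional[str] = None,
                       reverse: bool = False) -> bytes:
    """Encode a single-key SortKeyList as described in RFC 2891."""
    key = tlv(0x04, attribute.encode("utf-8"))            # attributeType (OCTET STRING)
    if ordering_rule is not None:
        key += tlv(0x80, ordering_rule.encode("ascii"))   # orderingRule [0]
    if reverse:
        key += tlv(0x81, b"\xff")                         # reverseOrder [1] BOOLEAN TRUE
    return tlv(0x30, tlv(0x30, key))                      # SEQUENCE OF SEQUENCE


# Assumed sort attribute; the ordering rule is the OID used in the test above.
value = sort_control_value("displayName", "1.2.840.113556.1.4.1523")
print(value.hex(" "))
```

Swapping the ordering rule string for one of the other OIDs is the only change needed between test runs.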
However, if I use the ordering OID for Swedish (1.2.840.113556.1.4.1594 or 1.2.840.113556.1.4.1595), it only returns the object that exactly matches the filter:
BaseDN: OU=test1,DC=w2k12,DC=local
Filter: (&(objectclass=user)(displayname=gary*))
DN> CN=Gary Test2,OU=test1,DC=w2k12,DC=local
1 records returned
With the accented character filter:
BaseDN: OU=test1,DC=w2k12,DC=local
Filter: (&(objectclass=user)(displayname=g\C3\A4ry*))
DN> CN=Gäry Test1,OU=test1,DC=w2k12,DC=local
1 records returned
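The `\C3\A4` in the filters above is simply the UTF-8 encoding of ä written with the backslash-hex escaping from RFC 4515. A small sketch of how such a filter string could be generated; the function names are illustrative only:

```python
def escape_filter_value(value: str) -> str:
    """Escape a filter value per RFC 4515, hex-escaping non-ASCII UTF-8 bytes."""
    out = []
    for byte in value.encode("utf-8"):
        # Escape NUL, parentheses, asterisk, backslash and every non-ASCII byte.
        if byte >= 0x80 or byte in b"\x00()*\\":
            out.append("\\%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def prefix_filter(attribute: str, prefix: str) -> str:
    """Build a 'starts with' filter restricted to user objects."""
    return "(&(objectclass=user)(%s=%s*))" % (attribute, escape_filter_value(prefix))


print(prefix_filter("displayname", "g\u00e4ry"))
# prints (&(objectclass=user)(displayname=g\C3\A4ry*))
```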
This is the dump of the LDAP_SERVER_SORT_OID control I used:
In the MS Unicode reference [MS-UCODEREF] I'm struggling to find anything that suggests that Swedish has a different matching pattern. The only thing I did find was a mention of a different sort order for vowels in Swedish, which might explain it.
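To make that hypothesis concrete, here is a toy model in Python. It is not the Windows algorithm and uses none of its tables: it strips diacritics by default, but lets a locale declare some letters as distinct so they are never folded. The set å, ä, ö is my assumption, based on those being separate letters of the Swedish alphabet.

```python
import unicodedata


def fold(text: str, distinct_letters: str = "") -> str:
    """Case-fold and strip combining marks, except for letters kept distinct."""
    out = []
    for ch in unicodedata.normalize("NFC", text).casefold():
        if ch in distinct_letters:
            out.append(ch)  # locale treats this as its own letter
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def prefix_matches(value: str, prefix: str, distinct_letters: str = "") -> bool:
    """Model of (attr=prefix*) under the given folding rules."""
    return fold(value, distinct_letters).startswith(fold(prefix, distinct_letters))


SWEDISH_DISTINCT = "\u00e5\u00e4\u00f6"  # å ä ö (assumed tailoring)

for name in ("G\u00e4ry Test1", "Gary Test2"):
    print(name,
          prefix_matches(name, "gary"),
          prefix_matches(name, "gary", SWEDISH_DISTINCT))
```

With no distinct letters both names match either filter, and with the assumed Swedish set each filter matches only one name, which is the same pattern as in the results above.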
Also, the definition of the NORM_IGNORENONSPACE flag in LCMapStringEx notes that for many scripts (notably Latin scripts), NORM_IGNORENONSPACE coincides with LINGUISTIC_IGNOREDIACRITIC, but I have been unable to find a lookup table that shows which languages do and don't have it defined.
I did find that the Swedish language uses the Sorting ID {0000001A-57EE-1E5C-00B4-D0000BB1E11E}, but I can't find any reference to the rules associated with this Sorting ID.
Going deep into the rabbit hole of Unicode and locale complexities here!
Gary.