question

StephanSteiner-3520 avatar image
0 Votes"
StephanSteiner-3520 asked GaryReynolds edited

AD LDS - LDAP query with non ascii character gets re-interpreted and returns incorrect results.

I'm running this query against an AD LDS:

(sn=bäch*)

The results contain both results where sn starts with 'bäch' as well as results where the sn starts with 'bach'.

Is there a way to tell LDS to cut this out and only return items where the sn starts with 'bach' if the ldap query is (sn=bach)?

windows-active-directory
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GaryReynolds avatar image
0 Votes"
GaryReynolds answered

Hi @StephanSteiner-3520

I don't believe there is, as LDS is trying to be helpful and return entries that match including the one that have accents!

You could try using escape characters to send the accented character in hex to see if that helps, i.e. (sn=b\00\e4ch*) - I think this is the correct hex for the ä

Gary.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

LimitlessTechnology-2700 avatar image
0 Votes"
LimitlessTechnology-2700 answered LimitlessTechnology-2700 published

Hi there,

There is no such protocol to make LDAP to return with specific syntax as requested . You can try converting the non ascii code to ascii code and then try your syntax.



--If the reply is helpful, please Upvote and Accept it as an answer--

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GaryNebbett avatar image
0 Votes"
GaryNebbett answered GaryNebbett edited

Hello @StephanSteiner-3520,

It was difficult to find dependable sources of information that could help to answer this question.

The suggestion from @GaryReynolds-8098 does not have any impact on the problem. The filter string is encoded as an ASN.1 OCTET STRING, implicitly containing an UTF-8 encoded string. The interpretation of escaped binary values is performed on the client when constructing the ASN.1 AttributeValue. Correctly escaping “ä” in UTF-8 gives “\C3\A4” (\00\E4 is a UTF-16 encoding of “ä”).

In the following ETW trace, the first highlighted element is the filter as provided to the client API (this ETW string is encoded as UTF-16). The “g?ry” strings in the trace are just an artefact of the tracing (these ETW strings are plain ASCII strings). The later highlighted elements are the encoding of the search request.

144687-x.png

This is a formatted dump of the ASN.1 content of the LDAP request (see RFC 4511 for complete ASN.1 definitions):

[APPLICATION 3] {
OCTET STRING 'CN=Gary,CN=Users,DC=Home,DC=Org'
ENUMERATED 2
ENUMERATED 0
INTEGER 1000
INTEGER 60
BOOLEAN FALSE
[4] {
OCTET STRING 'mail'
SEQUENCE {
[0]
67 C3 A4 72 79
}
}
SEQUENCE {
OCTET STRING
2A
}
}

Correctly escaping “ä” results in an identical search request:

144550-x.png

The statement by @LimitlessTechnology-2700 is also misleading – RFC 4715 “Lightweight Directory Access Protocol (LDAP): Syntaxes and Matching Rules” does define a mechanism for expressing something close to what you probably want: matching rules and more specifically the caseExactMatch rule. The syntax would be (mail: caseExactMatch:=gäry) or (mail: 2.5.13.5:=gäry). However, as [MS-ADTS] confirms in section 3.1.1.3.4.4, Active Directory LDAP does not implement this matching rule.

The schema definition of an LDAP attribute determines its default comparison operation; a different matching rule can only be used when it is compatible with the schema syntax of the attribute (and when the matching rule is implemented).

Most of the string attributes defined in the standard Active Directory schema (including “mail”, “sn”, etc.) have the attribute schema type of 2.5.5.12 (String(Unicode)).

[MS-ADTS] describes (in section 6.5) how Unicode strings are compared. In practice, the Win32 API routine LCMapStringEx is called with a map flags argument of NORM_IGNORECASE | NORM_IGNORENONSPACE | LCMAP_SORTKEY | SORT_STRINGSORT | NORM_IGNOREKANATYPE | NORM_IGNOREWIDTH. The presence of NORM_IGNORENONSPACE effectively means that diacritical marks such as the umlaut are ignored when creating a sort key for the filter string.

The first argument to LCMapStringEx is a locale name and, by default, the “en-US” locale is used. The locale can be controlled by adding an LDAP_SERVER_SORT_OID extended control to the search request (perhaps specifying de-CH as the locale), but this has no effect on the treatment of umlauts when NORM_IGNORENONSPACE is used.

In summary, when searching Active Directory, the results returned by an LDAP search need to be filtered again by client side application code to remove unwanted matches resulting from the NORM_IGNORENONSPACE (e.g. ignore diacritical marks) behaviour.

Gary



x.png (35.7 KiB)
x.png (36.1 KiB)
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GaryReynolds avatar image
0 Votes"
GaryReynolds answered GaryNebbett commented

Hi Gary.

Did you manage to get the LDAP_SERVER_SORT_OID control working with different Ordering rule OID. On my DC which has Australia English locale only 1.2.840.113556.1.4.1499 English: United States, and 1.2.840.113556.1.4.1665 English: Australia work, none of the other 158 Ordering rule OIDs work. I've tried install additional language pack and changing the regional settings but this doesn't make any difference. However, this could be expected as [MS-ADTS] 3.1.1.2.2.4.13 does state that the search order is independent of the servers locale so all DCs return the same result and all alphabets are included, so you would expect the OID to work.

With the LDAP_SERVER_SORT_OID control defined with a 1.2.840.113556.1.4.1499 Ordering rule OID I get this result:

 Server: w2k19.w2k12.local
 Domain: 
    
    Control: 1.2.840.113556.1.4.473  - Len: 43
       30 84 00 00 00 25 30 84 00 00 00 1F 04 04 6E   
       61 6D 65 80 17 31 2E 32 2E 38 34 30 2E 31 31   
       33 35 35 36 2E 31 2E 34 2E 31 34 39 39         
    
       ASN.1 Structure Decode
           30 84 00 00 00 25           : Sequence (len: 37)
           |  30 84 00 00 00 1F        : Sequence (len: 31)
           |  |  04 04                 : Octet String (len: 4)
           |  |     6E 61 6D 65                   : name
           |  |  80 17                 : Context Specific[0] (len: 23 Tag: 0 Class: 2 P/C: 0)
           |  |     31 2E 32 2E 38 34 30 2E       : 1.2.840.
           |  |     31 31 33 35 35 36 2E 31       : 113556.1
           |  |     2E 34 2E 31 34 39 39          : .4.1499
    
 BaseDN: DC=w2k12,DC=local
 Filter: (objectclass=*)
    
 <results>
    
 Controls:
     1.2.840.113556.1.4.474 - RESP_SORT
    
    Control: 1.2.840.113556.1.4.474  - Len: 5
       30 03 0A 01 00                                 
    
       ASN.1 Structure Decode
           30 03                       : Sequence (len: 3)
           |  0A 01                    : Enumerated (len: 1)
           |     00                            : .

Which is successful and the returned LDAP_SERVER_RESP_SORT_OID Enum of 0 = successful

The same query with a Ordering rule OID of 1.2.840.113556.1.4.1594 fails:

 Server: w2k19.w2k12.local
 Domain: 
    
    Control: 1.2.840.113556.1.4.473  - Len: 43
       30 84 00 00 00 25 30 84 00 00 00 1F 04 04 6E   
       61 6D 65 80 17 31 2E 32 2E 38 34 30 2E 31 31   
       33 35 35 36 2E 31 2E 34 2E 31 35 39 34         
    
       ASN.1 Structure Decode
           30 84 00 00 00 25           : Sequence (len: 37)
           |  30 84 00 00 00 1F        : Sequence (len: 31)
           |  |  04 04                 : Octet String (len: 4)
           |  |     6E 61 6D 65                   : name
           |  |  80 17                 : Context Specific[0] (len: 23 Tag: 0 Class: 2 P/C: 0)
           |  |     31 2E 32 2E 38 34 30 2E       : 1.2.840.
           |  |     31 31 33 35 35 36 2E 31       : 113556.1
           |  |     2E 34 2E 31 35 39 34          : .4.1594
    
 BaseDN: DC=w2k12,DC=local
 Filter: (objectclass=*)
    
 Error: (0x5D) did not find the specified control, Server Error: 00000057: LdapErr: DSID-0C090B16, comment: Error processing control, data 0, v4563, Ext Error: (87) The parameter is incorrect.
    
 Controls:
     1.2.840.113556.1.4.474 - RESP_SORT
    
    Control: 1.2.840.113556.1.4.474  - Len: 5
       30 03 0A 01 12                                 
    
       ASN.1 Structure Decode
           30 03                       : Sequence (len: 3)
           |  0A 01                    : Enumerated (len: 1)
           |     12                            : .


The returned LDAP_SERVER_RESP_SORT_OID Enum of 0x12 = inappropriateMatching , unrecognized or inappropriate matching rule in sort key.

Do you know how to enable additional Ordering rule OIDs?

Gary.

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @GaryReynolds-8098,

This is not something that I have tried myself, but I believe the information in the following two links describe what is necessary (both links are to "archives" of the same old Microsoft KB article):

https://mskb.pkisolutions.com/kb/325622
https://www.betaarchive.com/wiki/index.php?title=Microsoft_KB_Archive/325622

This link gives essentially the same advice: https://www.techrepublic.com/article/implement-proper-language-support-in-windows-2000-server/.

I think that these steps will enable ordering rules, at the cost of building additional indices for the languages.

Given the age of the articles (15 to 16 years old) and the very low profile of the information, one should exercise caution.

Gary

0 Votes 0 ·
GaryReynolds avatar image
0 Votes"
GaryReynolds answered GaryReynolds edited

Thanks @GaryNebbett

I did a bit more searching and found this article which is a bit more up to date, which provides the same information, however, unlike the article suggests, I never received the event log entry when using the sort control without the additional locale set. Yes it did create additional indexes when the new locale were added.
https://docs.microsoft.com/pt-pt/troubleshoot/windows-server/deployment/use-language-id-identify-language-pack

I completed some additional testing with the new locale configured and got some surprising results. @StephanSteiner-3520 yes you can limit the returned items to only object that exactly match the accent\diarictics filter.

I added the following LCID to the server:
145029-lcid.png

Here are the LCID, Language, Ordering OID

 41D - Swedish - 1.2.840.113556.1.4.1594
 81D - Swedish: Finland - 1.2.840.113556.1.4.1595
 C09 - German: Germany - 1.2.840.113556.1.4.1523

I've setup a test OU with two object
Gary Test1
Gäry Test2

I had to create them with different names as ADUC considered Gary Test and Gäry Test to be the same based on the default unicode matching rules.

If I use a search filter of (displayname=gary*) or (displayname=gäry*) both objects are returned.

If I include the LDAP_SERVER_SORT_OID (1.2.840.113556.1.4.473) control with the sorting OID for Germany 1.2.840.113556.1.4.1523 I get the same result:

 BaseDN: OU=test1,DC=w2k12,DC=local
 Filter: (&(objectclass=user)(displayname=g\C3\A4ry*))
    
 DN> CN=Gäry Test1,OU=test1,DC=w2k12,DC=local
 DN> CN=Gary Test2,OU=test1,DC=w2k12,DC=local
 2 records returned

However, if I use the sorting OID for Swedish 1.2.840.113556.1.4.1594 or 1.2.840.113556.1.4.1595, it will only return the object that exactly match to the filter:

 BaseDN: OU=test1,DC=w2k12,DC=local
 Filter: (&(objectclass=user)(displayname=gary*))
    
 DN> CN=Gary Test2,OU=test1,DC=w2k12,DC=local
 1 records returned

With the accent character filter

 BaseDN: OU=test1,DC=w2k12,DC=local
 Filter: (&(objectclass=user)(displayname=g\C3\A4ry*))
    
 DN> CN=Gäry Test1,OU=test1,DC=w2k12,DC=local
 1 records returned

This is the dump of the LDAP_SERVER_SORT_OID control I used:
145134-asn1-control.png

In the MS Unicode reference [MS-UCODEREF] I'm struggling to find anything that suggests that Swedish has a different matching pattern. The only reference I did find was a reference to a different sort order for vowels in Swedish, which might explain it.

Also the definition of the NORM_IGNORENONSPACE flag in LCMapStringEx makes reference to that scripts (notably Latin scripts), NORM_IGNORENONSPACE coincides with LINGUISTIC_IGNOREDIACRITIC but I have been unable to find any reference to a lookup table that shows which language do and don't have it defined.

I did find that the Swedish language uses the Sorting ID {0000001A-57EE-1E5C-00B4-D0000BB1E11E} but can't find any reference to the rules associated to this Sorting ID.

145099-screenshot-2021-10-30-102934.png

144989-screenshot-2021-10-30-102038.png

Going deep into the rabbit hole of unicode and locale complexities here!

Gary.



5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GaryNebbett avatar image
0 Votes"
GaryNebbett answered

Hello @GaryReynolds-8098,

Thank you for some very interesting research; it is unfortunate that the time difference (CET (Switzerland) for me and presumably one of the Australian zones for you) means that we can only exchange one update per day :-(

I found that article too, but I did not cite it since it is rather lacking in context; the "contextual" information that I found useful was that this functionality is possibly just intended to support the Microsoft Exchange Global Address List.

I think that the event entries referenced in the article are logged when the NTDS service starts and finds a language identifier in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NTDS\Language for a language pack that is not installed - it is not logged when an ordering rule is used.

I took a different approach and wrote a program that calls LCMapStringEx with flags of NORM_IGNORECASE | NORM_IGNORENONSPACE | LCMAP_SORTKEY | SORT_STRINGSORT | NORM_IGNOREKANATYPE | NORM_IGNOREWIDTH and configurable locale and string and then dumps the result. Here are some examples:

LCMapStringEx de-DE Gäry
0E-25-0E-02-0E-8A-0E-A7-01-01-01-01-00

LCMapStringEx de-DE Gary
0E-25-0E-02-0E-8A-0E-A7-01-01-01-01-00

LCMapStringEx se-SE Gary
0E-25-0E-02-0E-8A-0E-A7-01-01-01-01-00

LCMapStringEx se-SE Gäry
0E-25-0E-AF-0E-8A-0E-A7-01-01-01-01-00

As you say, the reference that you found probably explains the behaviour:

In Swedish, for example, some vowels with an accent sort after "Z," whereas in other European countries the same accented vowel comes right after the non-diacritic vowel.

Gary

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GaryReynolds avatar image
0 Votes"
GaryReynolds answered GaryReynolds edited

Hi Gary

Yeah I'm Melbourne based, so the time difference can be a pain for Europe.

I did find this post which did lead me to Sorting Weight Table Order and this reference 3.1.5.2.3.1 in MS-UCODEREF and the download of the Sorting Table. Which contains the sorting details for the {0000001A-57EE-1E5C-00B4-D0000BB1E11E} used for the Swedish sorting table, which is used to create the character map returned by LCMapStringEx function:

  SORTGUID 0000001A-57EE-1E5C-00B4-D0000BB1E11E
  LOCALENAME fi-FI        ;Finnish - fi-FI
  LOCALENAME sv-SE        ;Swedish - sv-SE
  LOCALENAME sv-FI        ;Swedish - Finland - sv-FI
  LOCALENAME fi           ;Finnish - fi
  LOCALENAME sv           ;Swedish - sv
    
  TWO 14
    
 0x0077 0x0302 14 162 18 2 ;w Circumflex
 0x0057 0x0302 14 162 18 18 ;W Circumflex
 0x0075 0x030b 14 167 27 2 ;u Double Acute
 0x0055 0x030b 14 167 27 18 ;U Double Acute
 0x0075 0x0308 14 167 123 2 ;u Diaeresis
 0x0055 0x0308 14 167 123 18 ;U Diaeresis
 0x0061 0x030a 14 173 2 2 ;a Ring
 0x0041 0x030a 14 173 2 18 ;A Ring
 0x0061 0x0308 14 175 2 2 ;a Diaeresis
 0x0041 0x0308 14 175 2 18 ;A Diaeresis
 0x006f 0x0308 14 176 2 2 ;o Diaeresis
 0x004f 0x0308 14 176 2 18 ;O Diaeresis
 0x006f 0x030b 14 176 27 2 ;o Double Acute
 0x004f 0x030b 14 176 27 18 ;O Double Acute

I think this also answers the question on which languages have the LINGUISTIC_IGNOREDIACRITIC option set, it appears to be based on the CW flag attribute and could be on a per character basis

 ; CW Values:
 ;   0-1 - reserved for delimiter/terminator
 ;   0x01 bit - Full Width (if set). (1 == Full Width, 0 == Half/Normal Width)
 ;   0x02 bit - Set by default, can be cleared in some case (if another higher bit is set)
 ;   0x04 bit - Super/Subscript?
 ;   0x08 bit -
 ;   0x10 bit - Upper Case.  (16 == upper case, 0 == lower case)
 ;   0x20 bit -
 ;   0x40/0x80 bits - Reserved for nlstrans, which uses these as flags for characters that may compress
 ;
 ; Flags to NLSTrans
 ; After the GUID
 ;     HAS_3_BYTE_WEIGHTS - This ID has 3 byte weights.  Must be tagged on the EXCEPTION AND on the COMPRESSION
 ;     LINGUISTIC_CASING  - Tagged (only) on the EXCEPTION table that applies to the linguistic case.
 ;                          Should also have another untagged EXCEPTION table for non-linguistic casing.


I think that's probably far enough for this one, as I don't think there is an option to change the sorting table but at least we have identified an option to change the LDAP sorting order if required.

Gary.



5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.