I'm trying to confirm whether the Microsoft LDAP API supports multi-byte (variable-length) UTF-8 encoding in DNs.
RFC 2251, Section 4.1.3 (Distinguished Name and Relative Distinguished Name), states that DNs use the LDAPString format.
RFC 2251, Section 4.1.2 (String Type), states that an LDAPString is an OCTET STRING encoded as UTF-8 per RFC 2044, which is a variable-length encoding.
RFC 2253, Section 5 (Examples), gives examples of the UTF-8 encoding of Unicode characters:
Unicode Letter Description      10646 code UTF-8  Quoted
=============================== ========== ====== =======
LATIN CAPITAL LETTER L          U0000004C  0x4C   L
LATIN SMALL LETTER U            U00000075  0x75   u
LATIN SMALL LETTER C WITH CARON U0000010D  0xC48D \C4\8D
LATIN SMALL LETTER I            U00000069  0x69   i
LATIN SMALL LETTER C WITH ACUTE U00000107  0xC487 \C4\87
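To make the table concrete, here is a minimal Python sketch of how the "Quoted" column can be derived: encode the value as UTF-8 and write every byte above 0x7F as a backslash followed by two hex digits. The function name quote_utf8 is hypothetical, and the sketch deliberately ignores the other reserved characters (comma, equals sign, and so on) that a real DN escaper would also have to handle.

```python
def quote_utf8(value: str) -> str:
    """Quote non-ASCII characters as \\XX hex pairs of their UTF-8 bytes.

    Illustration only: reserved DN characters such as ',' or '=' are NOT escaped.
    """
    out = []
    for byte in value.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))        # plain ASCII is kept as-is
        else:
            out.append("\\%02X" % byte)  # each UTF-8 byte becomes \XX
    return "".join(out)


# The five letters from the table: L, u, c-caron, i, c-acute
print(quote_utf8("Lu\u010di\u0107"))  # Lu\C4\8Di\C4\87
```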
The Microsoft LDAP Protocol Distinguished Names reference page does state that UTF-8 encoding is used, and describes the notation to use:
If an attribute value contains other reserved characters, such as the equals sign (=) or non-printable characters, it must be encoded in hexadecimal by replacing the character with a backslash followed by two hex digits.
This works for both printable and non-printable characters. All of the following DNs work when passed as the base parameter of the ldap_search_s API:
CN=Gary Reynolds,OU=Domain Users,DC=w2k12,DC=local
CN=G\41ry Reynolds,OU=Domain Users,DC=w2k12,DC=local
CN=G\41\52y Reynolds,OU=Domain Users,DC=w2k12,DC=local
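As a sanity check on what those escaped DNs actually contain, the following Python sketch reverses the backslash-hex notation. The helper name unquote is hypothetical, and it only handles \XX pairs, not other escape forms. Note that \41 and \52 are the codes for 'A' and 'R', so the escaped DNs decode to "GAry" and "GARy"; they presumably still match because the directory compares these values case-insensitively.

```python
import re


def unquote(value: str) -> str:
    """Replace every \\XX hex pair with the byte it names, then decode as UTF-8."""
    raw = re.sub(
        rb"\\([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),  # two hex digits -> one byte
        value.encode("utf-8"),
    )
    return raw.decode("utf-8")


print(unquote(r"CN=G\41ry Reynolds"))     # CN=GAry Reynolds
print(unquote(r"CN=G\41\52y Reynolds"))   # CN=GARy Reynolds
```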
However, if I use a multi-byte UTF-8 encoding in the DN, the object is not found. The object has the following Unicode DN:
Encoded as UTF-8 this is:
This fails. Encoding the character 'a' as the single-byte hex escape \41 (strictly the code for 'A') works, as in the examples above, but encoding the same character (hex 41) with the two-byte form \C1\81 also fails.
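For reference, the arithmetic behind \C1\81 can be sketched in Python as below. Applying the two-byte UTF-8 bit layout to C1 81 does yield 0x41, but that is an "overlong" form (a two-byte encoding of a value that fits in one byte), which strict UTF-8 decoders such as Python's reject. Whether the Microsoft LDAP API rejects it for the same reason is an assumption here, not something confirmed above. Both function names are hypothetical.

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Apply the 2-byte UTF-8 layout (110xxxxx 10xxxxxx) with no validity checks."""
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(hex(decode_two_byte(0xC1, 0x81)))  # 0x41
print(is_valid_utf8(b"\xC1\x81"))        # False (overlong form)
print(is_valid_utf8(b"\xC4\x8D"))        # True  (c with caron, U+010D)
```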
Does anyone know whether multi-byte UTF-8 encoding is supported in DNs, and whether an alternative format must be used with the ANSI LDAP APIs?