Does LDAP API support multi-byte UTF-8 encoding for distinguishedName

Gary Reynolds 9,391 Reputation points
2022-06-13T03:46:15.023+00:00

I'm trying to confirm if Microsoft LDAP API supports multi-byte UTF-8 variable-length encoding for DNs.

RFC 2251, Section 4.1.3 (Distinguished Name and Relative Distinguished Name) states that DNs use the LDAPString format.
RFC 2251, Section 4.1.2 (String Type) states that an LDAPString is an OCTET STRING that is UTF-8 encoded per RFC 2044, which supports variable-length encoding.
RFC 2253, Section 5 (Examples) provides examples of UTF-8 encoding for Unicode characters:

   Unicode Letter Description      10646 code UTF-8  Quoted  
   =============================== ========== ====== =======  
   LATIN CAPITAL LETTER L          U0000004C  0x4C   L  
   LATIN SMALL LETTER U            U00000075  0x75   u  
   LATIN SMALL LETTER C WITH CARON U0000010D  0xC48D \C4\8D  
   LATIN SMALL LETTER I            U00000069  0x69   i  
   LATIN SMALL LETTER C WITH ACUTE U00000107  0xC487 \C4\87  
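
For reference, the "Quoted" column above can be reproduced with a few lines of portable C++. This is only a sketch: `escapeNonAscii` is a hypothetical helper name, not part of any LDAP library, and it assumes the input is already valid UTF-8. It escapes only bytes of 0x80 and above, so reserved DN characters such as the comma or equals sign would still need separate handling.

```cpp
#include <string>

// Hypothetical helper (not a WinLDAP function): copy ASCII bytes unchanged and
// write every byte >= 0x80 as a backslash followed by two uppercase hex digits.
std::string escapeNonAscii(const std::string& utf8) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : utf8) {
        if (b < 0x80) {
            out += static_cast<char>(b);   // plain ASCII, keep as-is
        } else {
            out += '\\';                   // start of a hex pair
            out += hex[b >> 4];            // high nibble
            out += hex[b & 0x0F];          // low nibble
        }
    }
    return out;
}
```

Calling `escapeNonAscii("Lu\xC4\x8Di\xC4\x87")` yields `Lu\C4\8Di\C4\87`, which matches the quoted form of the name spelled out in the table.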

The Microsoft LDAP Protocol Distinguished Names reference page does state that UTF-8 encoding is used, and gives the notation that should be used:

If an attribute value contains other reserved characters, such as the equals sign (=) or non-printable characters, it must be encoded in hexadecimal by replacing the character with a backslash followed by two hex digits.

This works for both non-printable and printable characters. All of the following examples work when passed as the base (distinguished name) parameter of the ldap_search_s API:

CN=Gary Reynolds,OU=Domain Users,DC=w2k12,DC=local  
CN=G\41ry Reynolds,OU=Domain Users,DC=w2k12,DC=local  
CN=G\41\52y Reynolds,OU=Domain Users,DC=w2k12,DC=local  
CN=Before\0DAfter,OU=Domain Users,DC=w2k12,DC=local  

However, if you try to use multi-byte UTF-8 encoding in the DN, the object is not found. The object has the following Unicode DN:

CN=Gačy Reynolds,OU=test1,DC=w2k12,DC=local  

Encoded as UTF-8 this is:

CN=Ga\C4\8Dy Reynolds,OU=test1,DC=w2k12,DC=local  

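This fails. If you encode an ordinary character as a single hex byte, such as \41 in the examples above, it works (0x41 is 'A'; it still matches the 'a' in the name because the comparison is case-insensitive). If you encode the same character, hex 41, using the two-byte form \C1\81, this also fails.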
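
As a side note on the \C1\81 test: that byte pair is an "overlong" form, which the UTF-8 definition does not allow, so a strict decoder would reject it regardless of how DNs are handled. The sketch below illustrates the arithmetic; both function names are made up for this example, and it says nothing about what the Windows client or the domain controller actually does with such bytes.

```cpp
#include <cstdint>

// Illustrative only: compute the code point a two-byte sequence would carry,
// without rejecting overlong forms, so the value can be inspected.
std::uint32_t decodeTwoByteLenient(unsigned char lead, unsigned char cont) {
    return (static_cast<std::uint32_t>(lead & 0x1F) << 6) | (cont & 0x3F);
}

// A well-formed two-byte UTF-8 sequence needs a lead byte in 0xC2..0xDF and a
// continuation byte in 0x80..0xBF. Lead bytes 0xC0 and 0xC1 could only encode
// values below 0x80, which must use the one-byte form instead.
bool isValidTwoByteSequence(unsigned char lead, unsigned char cont) {
    return lead >= 0xC2 && lead <= 0xDF && cont >= 0x80 && cont <= 0xBF;
}
```

Here `decodeTwoByteLenient(0xC1, 0x81)` gives 0x41, but `isValidTwoByteSequence(0xC1, 0x81)` is false, whereas the pair C4 8D is valid and decodes to 0x10D (č). So the \C1\81 failure and the \C4\8D failure may well have different causes.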

Does anyone know if multi-byte UTF-8 encoding is supported for DN and if there is an alternative format that must be used for ANSI LDAP APIs?

Gary.

Active Directory

Accepted answer
  1. Gary Nebbett 5,721 Reputation points
    2022-06-13T11:06:25.91+00:00

    Hello Gary,

    An interesting problem!

    I guess that you looked at a trace of the LDAP traffic. I traced a search from base "cn=G\41\C4\8Dy,cn=Users,dc=home,dc=org" with filter "(!(|(cn=Ga\C4\8Dx)(cn=Gačz)))" and noticed that the filter strings are "unescaped" on the client but that the DN string is sent without alteration to the server.

    On the server, the DN seems to be put through a "normalization" process which "unquotes" and then "requotes" the DN. This explains why \41 works: it is unquoted as "A" and does not need to be quoted again when the DN is requoted.

    The only way that I could find (so far) to reference Gačy was to use a BER encoding of an octet string containing the UTF-8 representation of the name: "cn=#04054761c48d79,cn=Users,dc=home,dc=org"; the leading 04 is the tag for "octet string" and the following 05 is the length in bytes of the string.
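
    To make the layout of that "#04..." value concrete, here is a small sketch that builds it from the UTF-8 bytes of an attribute value. The function name `berOctetStringValue` is invented for this example, and it only handles values shorter than 128 bytes, where the BER length fits in a single byte; longer values would need the long-form length, which is left out.

    ```cpp
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: '#', then tag 0x04 (OCTET STRING), then a one-byte
    // length, then the value bytes, all written as lowercase hex digits.
    std::string berOctetStringValue(const std::string& utf8) {
        if (utf8.size() > 127) {
            // Short-form BER lengths only cover 0..127 bytes.
            throw std::length_error("long-form BER length not handled in this sketch");
        }
        static const char hex[] = "0123456789abcdef";
        std::string out = "#04";
        auto appendByte = [&out](unsigned char b) {
            out += hex[b >> 4];
            out += hex[b & 0x0F];
        };
        appendByte(static_cast<unsigned char>(utf8.size()));  // length byte
        for (unsigned char b : utf8) {
            appendByte(b);                                     // value bytes
        }
        return out;
    }
    ```

    With the UTF-8 bytes of "Gačy" (47 61 C4 8D 79) as input, this returns `#04054761c48d79`, the same value used in the search base above.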

    Gary


7 additional answers

  1. Gary Reynolds 9,391 Reputation points
    2022-06-14T10:41:33.573+00:00

    Hi Gary,

    The LDAP_SERVER_SORT_OID is not being used in any of the queries.

    The code below performs a number of different queries against the server, using both the ANSI and Unicode versions of the LDAP APIs. I've tried MultiByteToWideChar/WideCharToMultiByte with UTF-8 encoding and with code pages, and it doesn't make much difference: the output is not the same as LDP, which shows the DN as CN=Gačy Reynolds,OU=test1,DC=w2k12,DC=local and works, whereas my code doesn't.

    String Temp;  
    int c;  
    LDAPMessage *pMes;  
    DWORD ret;  
    LDAP *hLDAP;  
    DWORD SrcLen=0, DstLen=0;  
    
    wchar_t wsrc[] = L"CN=Ga?y Reynolds,OU=test1,DC=w2k12,DC=local";  
    wsrc[5]=0x10d;  
    
    char csrc[] = "CN=Ga??y Reynolds,OU=test1,DC=w2k12,DC=local";  
    csrc[5]=0xc4;  
    csrc[6]=0x8d;  
    
    wchar_t Dst[1024];  
    char Dst1[1024];  
    
    SrcLen = strlen(csrc);  
    DstLen=1024;  
    ret = MultiByteToWideChar(CP_UTF8,0,csrc,-1,Dst,DstLen);  
    
    Temp = "";  
    for(c=0;c<ret;c++){  
        Temp += IntToHex(Dst[c],4) + " ";  
       }  
    memLDPResults->Lines->Add("MultiByteToWideChar = " + Temp);  
    memLDPResults->Lines->Add("MultiByteToWideChar = " + String(Dst));  
    
    
    SrcLen = wcslen(wsrc);  
    
    ret = WideCharToMultiByte(CP_UTF8,0,wsrc,-1,Dst1,DstLen,NULL,NULL);  
    
    Temp = "";  
    for(c=0;c<ret;c++){  
        Temp += IntToHex((BYTE)Dst1[c],2) + " ";  
       }  
    memLDPResults->Lines->Add("WideCharToMultiByte = " + Temp);  
    memLDPResults->Lines->Add("WideCharToMultiByte = " + String(Dst1));  
    
    //---------------------------------------------------------------------------  
    //      Search with Unicode string, Unicode API  
    //---------------------------------------------------------------------------  
    memLDPResults->Lines->Add("Search with Unicode string, Unicode API");  
    
    hLDAP = ldap_initW(L"192.168.1.245",LDAP_PORT);  
    
    if (!hLDAP){  
        memLDPResults->Lines->Add("Failed to ldap_initW to server, ");  
        return;  
       }  
    if (ldap_bind_sW(hLDAP,L"CN=Administrator,CN=Users,DC=w2k12,DC=local",L"Pass",LDAP_AUTH_SIMPLE)){  
        memLDPResults->Lines->Add("Failed to ldap_bindW to server, " );  
        return;  
       }  
    
        struct l_timeval tm;  
    
        tm.tv_sec = 60;  
        tm.tv_usec = 0;  
        try {  
            ret = ldap_search_ext_sW(hLDAP,  
                       wsrc,  
                       LDAP_SCOPE_BASE,  
                       L"(objectclass=*)",  
                       NULL,  
                       0,  
                       NULL,  
                       NULL,  
                       &tm,  
                       0,  
                       &pMes);  
           }  
        catch(...){  
            ret = -1;  
           }  
    
        if (ret != LDAP_SUCCESS && ret !=9){     // enable partial results to be returned  
            memLDPResults->Lines->Add("Failed to ldap_search_ext_sW to server, " + IntToStr(LdapGetLastError()) );  
           } else {  
            memLDPResults->Lines->Add("ldap_search_ext_sW Found " );  
            ldap_msgfree(pMes);  
          }  
    
    ldap_unbind_s(hLDAP);  
    
    //---------------------------------------------------------------------------  
    //      Search with MultiByteToWideChar char -> unicode, Unicode API  
    //---------------------------------------------------------------------------  
    memLDPResults->Lines->Add("Search with MultiByteToWideChar char -> unicode, Unicode API");  
    
    
    hLDAP = ldap_initW(L"192.168.1.245",LDAP_PORT);  
    
    if (!hLDAP){  
        memLDPResults->Lines->Add("Failed to ldap_initW to server, ");  
        return;  
       }  
    if (ldap_bind_sW(hLDAP,L"CN=Administrator,CN=Users,DC=w2k12,DC=local",L"Pass",LDAP_AUTH_SIMPLE)){  
        memLDPResults->Lines->Add("Failed to ldap_bindW to server, " );  
        return;  
       }  
    
        tm.tv_sec = 60;  
        tm.tv_usec = 0;  
        try {  
            ret = ldap_search_ext_sW(hLDAP,  
                       Dst,  
                       LDAP_SCOPE_BASE,  
                       L"(objectclass=*)",  
                       NULL,  
                       0,  
                       NULL,  
                       NULL,  
                       &tm,  
                       0,  
                       &pMes);  
           }  
        catch(...){  
            ret = -1;  
           }  
    
        if (ret != LDAP_SUCCESS && ret !=9){     // enable partial results to be returned  
            memLDPResults->Lines->Add("Failed to ldap_search_ext_sW to server, " + IntToStr(LdapGetLastError()) );  
           } else {  
            memLDPResults->Lines->Add("ldap_search_ext_sW Found " );  
            ldap_msgfree(pMes);  
          }  
    
    ldap_unbind_s(hLDAP);  
    
    //---------------------------------------------------------------------------  
    //      Search with char, ANSI API  
    //---------------------------------------------------------------------------  
    
    memLDPResults->Lines->Add("Search with char, ANSI API");  
    
    hLDAP = ldap_initA("192.168.1.245",LDAP_PORT);  
    
    if (!hLDAP){  
        memLDPResults->Lines->Add("Failed to ldap_init to server, ");  
        return;  
       }  
    if (ldap_bind_sA(hLDAP,"CN=Administrator,CN=Users,DC=w2k12,DC=local","Pass",LDAP_AUTH_SIMPLE)){  
        memLDPResults->Lines->Add("Failed to ldap_bind to server, " );  
        return;  
       }  
       try {  
            ret = ldap_search_ext_sA(hLDAP,  
                       csrc,  
                       LDAP_SCOPE_BASE,  
                       "(objectclass=*)",  
                       NULL,  
                       0,  
                       NULL,  
                       NULL,  
                       &tm,  
                       0,  
                       &pMes);  
           }  
        catch(...){  
            ret = -1;  
           }  
       if (ret != LDAP_SUCCESS && ret !=9){     // enable partial results to be returned  
            memLDPResults->Lines->Add("Failed to ldap_search_ext_s to server, " + IntToStr(LdapGetLastError()) );  
           } else {  
            memLDPResults->Lines->Add("ldap_search_ext_s Found " );  
            ldap_msgfree(pMes);  
          }  
    
    ldap_unbind_s(hLDAP);  
    
    //---------------------------------------------------------------------------  
    //      Search with WideCharToMultiByte unicode -> char, ANSI API  
    //---------------------------------------------------------------------------  
    memLDPResults->Lines->Add("Search with WideCharToMultiByte unicode -> char, ANSI API");  
    
    
    hLDAP = ldap_initA("192.168.1.245",LDAP_PORT);  
    
    if (!hLDAP){  
        memLDPResults->Lines->Add("Failed to ldap_init to server, ");  
        return;  
       }  
    if (ldap_bind_sA(hLDAP,"CN=Administrator,CN=Users,DC=w2k12,DC=local","Pass",LDAP_AUTH_SIMPLE)){  
        memLDPResults->Lines->Add("Failed to ldap_bind to server, " );  
        return;  
       }  
       try {  
            ret = ldap_search_ext_sA(hLDAP,  
                       Dst1,  
                       LDAP_SCOPE_BASE,  
                       "(objectclass=*)",  
                       NULL,  
                       0,  
                       NULL,  
                       NULL,  
                       &tm,  
                       0,  
                       &pMes);  
           }  
        catch(...){  
            ret = -1;  
           }  
       if (ret != LDAP_SUCCESS && ret !=9){     // enable partial results to be returned  
            memLDPResults->Lines->Add("Failed to ldap_search_ext_s to server, " + IntToStr(LdapGetLastError()) );  
           } else {  
            memLDPResults->Lines->Add("ldap_search_ext_s Found " );  
            ldap_msgfree(pMes);  
          }  
    
     ldap_unbind_s(hLDAP);  
    

    This is the output of the code:

    MultiByteToWideChar = 0043 004E 003D 0047 0061 010D 0079 0020 0052 0065 0079 006E 006F 006C 0064 0073 002C 004F 0055 003D 0074 0065 0073 0074 0031 002C 0044 0043 003D 0077 0032 006B 0031 0032 002C 0044 0043 003D 006C 006F 0063 0061 006C 0000   
    MultiByteToWideChar = CN=Gacy Reynolds,OU=test1,DC=w2k12,DC=local  
      
    WideCharToMultiByte = 43 4E 3D 47 61 C4 8D 79 20 52 65 79 6E 6F 6C 64 73 2C 4F 55 3D 74 65 73 74 31 2C 44 43 3D 77 32 6B 31 32 2C 44 43 3D 6C 6F 63 61 6C 00   
    WideCharToMultiByte = CN=GaÄ y Reynolds,OU=test1,DC=w2k12,DC=local  
      
    Search with Unicode string, Unicode API  
    ldap_search_ext_sW Found   
      
    Search with MultiByteToWideChar char -> unicode, Unicode API  
    ldap_search_ext_sW Found   
      
    Search with char, ANSI API  
    Failed to ldap_search_ext_s to server, 32  
      
    Search with WideCharToMultiByte unicode -> char, ANSI API  
    Failed to ldap_search_ext_s to server, 32  
    

    The network trace for the four searches is below. The first two show that the Unicode character has been simplified based on the current code page. The last two show the UTF-8 bytes, but both fail, even though they are the same as in the LDP search.

    211296-image.png

    The network trace for LDP:

    211305-image.png

    Also attached are the network traces for the four searches and for LDP (211209-ldp.txt and 211248-foursearches.txt); just rename them to .pcapng files.


  2. Gary Nebbett 5,721 Reputation points
    2022-06-14T17:49:15.16+00:00

    Hello Gary,

    I think that I have the answer: since the example code does not contain a statement like "ldap_set_option(ld, LDAP_OPT_PROTOCOL_VERSION, LDAP_VERSION3)", the connection defaults to LDAP version 2.

    The definition of LDAPDN differs between versions; in LDAP version 2 it is:

         LDAPString ::= OCTET STRING  
      
       The LDAPString is a notational convenience to indicate that, although  
       strings of LDAPString type encode as OCTET STRING types, the legal  
       character set in such strings is limited to the IA5 character set.  
      
         LDAPDN ::= LDAPString  
      
    

    Since LDAP version 2 does not support UTF-8, the Windows LDAP client converts wide-character strings to IA5 by using the CP_ACP codepage in a call to WideCharToMultiByte.

    The traces with the Unicode API show that the "simplified" name "Gacy" is being used, and is working. When we "hack" a UTF-8 version of "Gačy" into the search base, the server (operating in version 2 mode) does not construct a string from it that matches the Gačy entry.

    Gary


  3. Gary Nebbett 5,721 Reputation points
    2022-06-15T05:35:51.39+00:00

    Hello Gary,

    Thanks for your research results on the LDAP version.

    The very pleasant experience of "suddenly" recognizing the cause of some mysterious behaviour after hours of grovelling through data dumps and debugger output only occurs infrequently...

    Gary
