ODBC Driver 17 for SQL Server on Linux. Charset conversion problem with OTRS (bug?)

Marc Fauser 1 Reputation point
2021-09-21T20:07:20.137+00:00

I updated from ODBC Driver 13 for SQL Server to version 17.
With version 13, everything was working fine, with version 17, I cannot find what's wrong.

We have OTRS running which connects to a SQL Server database to retrieve data for customers.
E.g. we transfer if the customer has a maintenance contract or not.
Yes = thumb up emoji which I cannot post in this forum ( 4 byte character )
No = thumb down emoji which I cannot post in this forum ( 4 byte character )
This was working fine until version 17.
The select (simplified) was
SELECT N'<thumb up emoji>' as maintenance
I changed the select to
SELECT cast( N'<thumb up emoji>' as nvarchar(10) ) as maintenance
and it works again.
In my odbc.ini, I have no charset defined. In OTRS I have defined UTF-8 as a source and destination charset.
Any hints on what could be wrong or is it a bug in the driver as it was working before?

Developer technologies Transact-SQL
SQL Server Other
{count} votes

3 answers

Sort by: Most helpful
  1. Olaf Helper 47,436 Reputation points
    2021-09-22T05:29:25.477+00:00

    Yes = thumb up emoji which I cannot post in this forum ( 4 byte character )

    4 byte char's are UTF-8, while MS SQL Server mainly supports 2 byte's char's = Unicode.
    Which data type do you use to store the text and which SQL Server version are you using?

    cast( N'<thumb up emoji>' as nvarchar(10) )

    And the type nvarchar here is Unicode, not UTF-8.


  2. YufeiShao-msft 7,146 Reputation points
    2021-09-22T07:16:27.127+00:00

    Hi @Marc Fauser ,

    Upon connection, the driver detects the current local of the process it is loaded in, if it used one of the support encoding, the driver uses that encoding for SQLCHAR data, otherwise, it defaults to UTF-8.
    Since all process start in the ‘C’locale by default (and cause the driver to default to UTF-8), if an application needs to use one of the encodings, it should use the setlocale function to set the locale appropriately before connecting

    In a typical Linux environment where the encoding is UTF-8, users of ODBC Driver 17 upgrading form 13 or 13.1 won’t observe any differences, however, applications that use a non-UTF-8 encoding need to use that encoding for data to/from the driver instead of UTF-8.
    https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/programming-guidelines?view=sql-server-ver15

    So you should check the character sets/encodings of ODBC 17.

    and you should konw, UTF-8 is supported is SQL Server Version 2019, but not previous Versions, they support only ASCII and UniCode


  3. Jure Sah 0 Reputation points
    2023-09-20T08:42:55.8133333+00:00

    So to potentially save some random Googler a lot of time and research...

    The Microsoft ODBC driver uses glibc gconv to convert character sets. However there is no dependency on the package and the driver simply fails with the error "[Microsoft][ODBC Driver 17 for SQL Server]Unicode conversion failed", if the necessary gconv modules are not installed!

    On some operating systems, the glibc package already contains all the necessary gconv modules to convert from the typical character sets, however on other operating systems only ANSI C UTF-8 support is built-in and the other gconv modules are provided by the glibc-gconv-extra package.

    TL;DR the fix is install glibc-gconv-extra .

    This could be considered a bug in the Microsoft ODBC driver because of the missing dependency, however it's true that change in the packages is relatively recent.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.