The character SINGLE SHIFT THREE you refer to is U+008F in Unicode, but with varchar and SQL_Latin1_General_CP1_CS_AS you are outside the realm of Unicode. What you have is character 143 (hex 8F) in code page 1252, and what that is defined as, I don't know. But maybe it is a character that becomes EFBFBD in UTF-8. Then again, that does not even seem to be a legal UTF-8 sequence.
JDBC: Latin1 to UTF8
There is a table in SQL Server with collation SQL_Latin1_General_CP1_CS_AS.
The table has a column varchar(35) with the same collation SQL_Latin1_General_CP1_CS_AS.
The column contains a string with the character 8f (hexadecimal).
See https://www.fileformat.info/info/unicode/char/008f/index.htm
According to this page, this character converted into UTF8 should become c28f.
When I read the value from this column in Java and convert it to UTF-8, the 8f is replaced with efbfbd. So the 8f gets lost, in a way.
See https://www.fileformat.info/info/unicode/char/0fffd/index.htm
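    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;

    // Returns the UTF-8 bytes of str as an upper-case hex string.
    // Note: going through BigInteger drops leading zero digits.
    public static String convertStrToHex(String str) {
        byte[] getBytesFromString = str.getBytes(StandardCharsets.UTF_8);
        BigInteger bigInteger = new BigInteger(1, getBytesFromString);
        String convertedResult = String.format("%X", bigInteger);
        return convertedResult;
    }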
When I query the table
System.out.println(convertStrToHex(resultSet.getString(1)));
I get EFBFBD.
When I declare a string variable "\u008f" and convert it to UTF-8:
String code="\u008f";
System.out.println(convertStrToHex(code));
I correctly get C28F.
Tested with:
SQL Server 2017 and 2019
JDBC: mssql and jTDS
with the same result.
I would appreciate any help!
As I understand it, the JDBC driver is to blame. But why?
SQL Server | Other
3 answers
-
Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
2022-06-22T22:03:29.51+00:00 -
YufeiShao-msft 7,146 Reputation points
2022-06-23T07:50:44.85+00:00 Hi @Sergo A ,
SQL Server 2019 introduces an additional option for UTF-8 encoding, but to use it you need to create the object with, or change its collation to, a collation that has a UTF8 suffix:
https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver16#utf8-------------
-
Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
2022-06-23T21:33:07.823+00:00 I should have looked at both links you had in the initial post. The character you get is U+FFFD, REPLACEMENT CHARACTER. This is the glyph you typically get when there is an encoding error, for instance an incorrect UTF-8 sequence.
Why you get this here, I can't say, but as I noted, this is not a defined code point in code page 1252, so that could be the reason.
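To illustrate that last point without a database, here is a minimal sketch in plain Java. It rests on an assumption this thread does not confirm: that the driver decodes the varchar bytes with the JDK's windows-1252 charset. The class name Cp1252Demo and the helpers utf8Hex and decode are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252Demo {
    // Hex dump of the UTF-8 encoding of a string, two digits per byte.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode a single raw byte with the named charset (hypothetical helper).
    public static String decode(int rawByte, String charsetName) {
        return new String(new byte[] {(byte) rawByte}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Byte 8F has no mapping in the JDK's windows-1252 table,
        // so the decoder substitutes a replacement character.
        System.out.println(utf8Hex(decode(0x8F, "windows-1252")));
        // ISO-8859-1 maps every byte to the code point with the same value.
        System.out.println(utf8Hex(decode(0x8F, "ISO-8859-1")));
    }
}
```

On a standard JDK the first line should come out as EFBFBD and the second as C28F, which matches the two results in the question. If that is what happens inside the driver, the byte is lost at decoding time, before any conversion to UTF-8 takes place.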