SQL Server and UTF-8 Encoding (1) -True or False

Today, I will start my series of articles about SQL Server and Unicode UTF-8 Encoding. In many times, I found when people ask me about UTF-8, they actually don't understand UTF-8.  So today's talk will be quite short. I just clarify some misunderstand.

1.    SQL Server doesn't support Unicode, do you mean UTF-8?

    Sometime, people just say "SQL Server doesn't support Unicode". Actually, it is wrong, SQL Server support Unicode since SQL Server 7.0 by providing nchar/nvarchar/ntext data type.  But SQL Server doesn't support UTF-8 encoding for Unicode data, it supports the UTF-16 encoding.  I copy several definitions from Internet for these concepts:

    "Unicode is the universal character encoding, maintained by the Unicode Consortium. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.  The unique number, which is called the code point of a character, is in the range 0 (hex) to 10FFFF(hex). It is normal to reference a Unicode code point by writing "U+" followed by its hexadecimal number. For example, Character A is defined as  "Latin Uppercase Alphabet",  and assigned a code point U+0041.  
       In additional to assign each character to a unique code point, Unicode Consortium also defined several Unicode transformation formats (UTFs) which are the algorithmic mapping from every Unicode code point to a unique byte sequence. Note, the Unicode code point itself has nothing related to how to store/transform in a Computer, but a UTF does. 
       The commonly used UTFs are UTF-16 and UTF-8. UTF-8 is the byte-oriented encoding form of Unicode which commonly used on Unix Operating System, Web Html and Xml File.  UTF-16 uses a single 16-bitcode unit to encode the most common 63K characters, and a pair of 16-bit code unites, called surrogates, to encode the 1M less commonly used characters in Unicode. UTF-16 is commonly used on Windows, .Net and Java. The transform between different UTFs are loseless and round-tripping. "

    In summary, don't confuse with Unicode and UTF-8 Encoding. They are totally different concepts. 

2.    UTF-8 Encoding is much better than UTF-16 Encoding

There are tons of articles comparing with UTF-8 encoding with UTF-16 encoding.  I will compare these two encoding side by side in my next article.  I can have 100 reasons to say UTF-8 Encoding is better than UTF-16 Encoding, and verse vice.  The correct answer is that no encoding is absolute better than the others.  User should choose the suitable encoding according to your application software requirement.  The operation system, programming language, database platform do matter when choosing the encoding.  UTF-8 is most common on the web. UTF-16 is used by Java and Windows. The conversions between all of them are algorithmically based, fast and lossless. This makes it easy to support data input or output in multiple formats, while using a particular UTF for internal storage or processing.

    So please don't jeopardize SQL Server's Unicode support because of it only support one of the UTFs.

3.    SQL Server cannot store all Unicode Characters

You may notice that I say SQL Server support UTF-16 Encoding in previous paragraph, but I also said SQL Server' nvarchar type encoding is UCS-2.  I intend to do this in this article because I want to discuss the different between these two Encodings in here. Let us describe in details in what area SQL Server support UTF-16 encoding:

  • SQL Server can store any Unicode characters in UTF-16 encoding. The reason is that the storage format for UCS-2 and UTF-16 are the same.
  • SQL Server can display any Unicode characters in UTF-16 encoding. The reason is that we internally call Windows functions to display characters, the Windows functions and fonts can take care of the supplementary character (a character take 4 bytes in UTF-16) correctly.
  • SQL Server can input any Unicode characters in UTF-16 encoding. The reason is that we internally call Windows IMEs (Input Method Editors) to input, the Windows IMEs can take care of the supplementary character (a character take 4 bytes in UTF-16) correctly.
  • SQL Server can sort/compare any defined Unicode characters in UTF-16 encoding.  Note, not all code points are map to valid Unicode character. For example, The Unicode Standard, Version 5.1 defines code points for around 10,000 characters.  All these characters can be compared/sorted in SQL Server latest version: SQL Server 2008.

In contrast, I also list the UTF-16 thing SQL Server doesn't support:

  • SQL Server cannot detect Invalid UTF-16 sequence. Unpaired surrogate character is not valid in UTF-16 encoding, but SQL Server accept it as valid. Note, in reality, it is unlikely end-user will input invalid UTF-16 sequence since they are not support in any language or by any IMEs.
  • SQL Server treats a UTF-16 supplementary character as two characters. The Len function return 2 instead of 1 for such input.
  • SQL Server has potential risk of breaking a UTF-16 supplementary character into un-pair surrogate character, such as calling substring function. Note, in the real scenario, the chance of this can happen is much lower, because 1)  supplementary character is rare 2) string function will only break this when it happens be the boundary.  For example, when calling substring(s,5,1) will break if and only if the character at index 5 is a supplementary character.

In summary, SQL Server DOES support storing all Unicode characters; although it has its own limitation.  Please refer to my previous blogs to details.

Comments

  • Anonymous
    April 13, 2009
    Hi, How to set the Collation to UTF-16 in SQL Server 2000 and SQL Server 2005. could you please give the steps to change the collation. What is the name of the Collation for UTF-16 in SQL Server 2000/2005. Thanks

  • Anonymous
    April 14, 2009
    Hello, Ravi  When you choose nvarchar as your data type,  you automatically get UTF-16 Encoding.  Please note,  For nvarchar types, we always use UTF-16 encoding to store the data no matter which collation you use.  The collation is only matter on what method it will use to compare two string.  Since different language has different sorting rule, that is the reason we have many different collations.  The most common collation is latin1_general_ci_as. Although the name seems indicate it can only sort latin based character correctly, but it actually can sort many or majority of the language correctly.

  • Anonymous
    June 12, 2009
    SharePoint Uploading Files to SharePoint Server 2007 from ASP.NET Web Applications by Using the HTTP

  • Anonymous
    July 09, 2009
    Hi Qingsong, I'm dealing with chinese characters. How come when I imported mt utf-8 file with chinese chars in sql server I get shattered non chinese characters even though I use nvarchar? Can you kindly advise? Thank you

  • Anonymous
    July 10, 2009
    Hello, William  If you are using SQL Server 2005, you can use bcp.exe to bulk load UTF-8 data into your table,  try [-C code page specifier] option and use 65001 as code page specifier.  Note,  I never tried this option, but I know it works.  If you are using SQL Server 2008, the bcp.exe shipped with SQL Server 2008 don't support import UTF-8 data in.  As a workaround, you can install the SQLNCLI library shipped in SQL Server 2005 (I beleive it is a downable package) on another machine, and bulk load UTF-8 data into SQL Server 2008 table.  Another workaround is converting your data from UTF-8 to UTF-16 encoding, you can either write a simple C# program or use NotePad.  Note: the default chinese character's encoding is GB2313, which is supported as code page 936. which is natively supported.

  • Anonymous
    August 23, 2009
    So, what you're saying is that SQL Server doesn't support UTF-16 in all cases, but does support UCS-2 ? (equivalent to UTF-16 BMP only). Unfortunately that will mean that not all Chinese, Japanese and Korean characters can be stored in SQL Server. A partial implementation is not much use if you are doing global internationalization. Thanks for taking the time to write the article.

  • Anonymous
    August 23, 2009
    Hello, CJK  All Chinese, Japanese and Korean characters can be stored in SQL Server.  Do you have a case that a character failed to store or retrieve from SQL Server?

  • Anonymous
    September 18, 2009
    Great Article!! Can you please tell me where the next article is, that you referred to, where you describe the difference between UTF-16 and UTF-8? Thanks for writing the article.

  • Anonymous
    October 02, 2009
    Hi, I have to create a database (SQL2005) for multinational usage (English and Korean languages included). This database will be used by an international application. Here is my question : when creating a new database I'm not sure what collation option to pick to support the Korean language. Thanks in advance for your help

  • Anonymous
    October 05, 2009
    HEllo, James  The short answer is that a collation can support many languages.  A collation has a major language, such as latin1_general_ci_as has major language us_English.  It  also support Korean language since Korean language is compatiable with us_English.  So this collation is enough for your application.    However, the latin1_general_ci_as only support Unicode 2.0 characters, so  if you want more advacned Korean Language support, and Korean is the majority language of your application, you should choose Korean_90_CI_As as your collation.   Please let me know if you have further question about this. Qingsong

  • Anonymous
    November 02, 2009
    Hi Qingsong, Here's a case need your help.  I created a linked server in SQL Server 2005 connecting to MYSQL.  I need to query the UTF-8 data from MYSQL and then insert them into local table.  After insert, the English chars are ok but Chinese chars are not displayed correctly.  Is there a way to use linked server retrieving data from MYSQL correctly?

  • Anonymous
    November 03, 2009
    Hello, Frank   First, please make sure that you are using nvarchar type in your SQL Server column.  Second, how you setup your linked server, which OLE-DB provide you are using?

  • Anonymous
    November 03, 2009
    Hi Qingsong, Thanks for the reply. For linked server, I use Microsoft OLE DB Provider for ODBC Drivers which connect to a DSN - MySQL ODBC 5.1 Driver. Other information:  Database collation - Chinese_PRC_CI_AS;  Coding language for application - Java Please advise.

  • Anonymous
    November 03, 2009
    Hello, Frank  Are you using nvarchar type in SQL Server? What about you only select the data throught linked server, but not insert into the table.     The behavior really depends on how we map the character datatype defined in MySQL to OLE-DB type.    If it maps to varchar type,  you will have data loss, other it maps to nvarchar type,  you should be fine Qingsong

  • Anonymous
    November 10, 2009
    I have a SQL server 2005 database that is has the default Collation of SQL_Latin1_General_CP1_CI_AS.   I am trying to store Turkish Data (8859/9 and/or codepage 1254).  There are actually only six characters that don't store correctly in varchar columns.  For example, one of them is "Ş" (LATIN CAPITAL LETTER S WITH CEDILLA, U+015E, decimal 351).   So, for a test, I created new NVarchar(10) column thinking that would be all that would be required.  However when I run the following in SSMS T-SQL command in SSMS: INSERT INTO mytable (MynVarchar) Values ('Ş') the "Ş" gets changed to a Latin1 Captal "S" without the Cedilla--UNLESS I preface the unicode character with an upper-case 'N'  as below INSERT INTO mytable (MynVarchar) Values (N'Ş') I read in the article below that this is a requirement, but I do NOT understand WHY it is this way.   Why can't I just use NVARCHARS and have the data inserted correctly as you wrote in your article above?  To me this means that it's not a simple matter of changing my columns from Varchar to NVarchar, but in addition, I have to change ALL of my source code to include the prefaced 'N'.   From your article above, (and many others on the internet, excluding the one above from Microsoft), I keep reading that I just need to use NVARCHAR, but this is not correct?  What am I missing here?   Very confusing... http://support.microsoft.com/kb/239530

  • Anonymous
    November 10, 2009
    The comment has been removed

  • Anonymous
    November 10, 2009
    The reason is that 'Ş' is a varchar type which has a code page associate with it.  N'Ş' is a nvarch type and it is a Unicode character.  the sql_latin1_general_ci_as code page don't have the character 'Ş' defined in the code page, so we will use the best matched charcter for replacement. For your secondary question,  I don't know the reason.  It might be client side issue, can you try to use SQL Server Management Studio to run the query and see the result?

  • Anonymous
    November 10, 2009
    I actually remote into their server and have been doing the tests inside of SQL server management Studio.   I just tried it again though, and this time, it worked!  I must have done something wrong when I set the collation the first time (I have been switching back and forth between the default collation and the Turkish and I must have goofed in my testing.)   Anyway, is it appears that setting collation to Turkish_CI_AS, I can then issue the Insert command below without the N and the Ş character is preserved. There are many articles that never mention collation settings though and seem to suggest that at least in terms of storing data "the collation doesn't matter as long as you use NVarchar instead of Varchar as your data type"  These articles seem to me to suggest that the collation only has value in regards to sorting and comparing data.   In fact I just re-read this page of yours and you write: "When you choose nvarchar as your data type,  you automatically get UTF-16 Encoding.  Please note,  For nvarchar types, we always use UTF-16 encoding to store the data no matter which collation you use.  The collation is only matter on what method it will use to compare two string.  Since different language has different sorting rule, that is the reason we have many different collations." This again suggests that the collation doesn't matter.   Can you explain?  Thank you.

  • Anonymous
    November 10, 2009
    I have the following problem: server 1 : sql server 2005 sp1 with proper collation supporting UTF 16 for chinese language support I wanted to move the entire database to new server as it is , so i took backup and then moved it to new sql 2005 sp1 server and restored the backup , now when i query chinese characters are not displaying I have compared each and every setting but no luck. please advice what to check.

  • Anonymous
    November 10, 2009
    Hello, Sai  First,  let us confirm whether it is a display issue or data missing/lose issue. From my experience, it is unlikely the back/restore will cause data lose.  So I guess it might related to data display issue.  Can you confirm that you install the correct chinese fonts.  It happens for Winodw XP/2003 which did not install Chinese Package by default.  You can go to Control Panel -> Language, and install it

  • Anonymous
    November 10, 2009
    hi qingsong, thank you very much for replying me back. i have confirmed that chinese package is installed , so please assist what would be my next steps

  • Anonymous
    November 10, 2009
    Hello, Sai  Can you privde some more information on what knind of query you are using, and the table structure. Also, what you see on your screen.

  • Anonymous
    November 11, 2009
    Hi Qingsong, I am hoping you did not forget about me and my last question regarding whether the database collation is essential or not for "storing" unicode.  From my tests, it appear that it is, but as I've read from your posts above, and from many others, usually people say that the collation only matters in relationship to sorting and comparing teh stored data.

  • Anonymous
    November 11, 2009
    Hello, BartJ  For nvarchar type, the collation only matter for sorting and comparing data, but for varchar type, the collation also controls which language you can insert into. For developing International Application, please use nvarchar type, and the prefix N'',   it will help you reduce a lot of issues later.

  • Anonymous
    November 15, 2009
    Hi Qingsong, Thank you for your response, it turned out to be chinese font not installed :)

  • Anonymous
    November 20, 2009
    Hi Quinsong, actually a have a mysql database linked to SQLServer 2000 using oleDb for ODBC and MySQL ODBC 5.1.  The MySql Database is in UTF-8 Charset.  When I run a query, it's returns a bad word, i mean instead to returns Bogotá, it's returns Bogotá. I can't change the collation, directly on mySQL database because it's not mine.  How can I read those fields in the right way?  I've been trying to do it using collations syntax but it doesn't works. I'll wait for your response. Thnx.  Nelins

  • Anonymous
    November 20, 2009
    Hello, Nelins  I never have experimence using linked server with MySQL.  I will try sometime today.  Can you explore some other way to port data from MySQL to SQL Server

  • Anonymous
    November 20, 2009
    Quinsong, I don't know... I've been using this way in others linked Database in the same server and it works... just on this one is failing.

  • Anonymous
    December 14, 2009
    Why do I see a string of question marks when I run the following statements in query analyzer? create table testutf8(stringval nvarchar(200)) insert into testutf8 values('马英九爱吃的李妈妈') select * from testutf8 returns: ?????????

  • Anonymous
    December 15, 2009
    Hello, Foreight360,  Please try:   insert into testutf8 values(N'马英九爱吃的李妈妈') select * from testutf8

  • Anonymous
    December 15, 2009
    Hello Qingsong, nice article. I have worked with Unicode for years now and I do agree with everything you said. I am designing a database for a forum on my site, I would like to store each posted message body in UTF-8 format, so I believe I'll need to resolve to VARBINARY instead of NVARCHAR. Imagine 700,000 user posts in database, what savings I can achieve with UTF-8 instead of UTF-16. :) So all that is fine, my question really is, how do I then get the FTS (Full Text Search) capability enabled for each of these messages? Since it is a forum, it is natural that users would want to search for certain contents. Another naive question: what does "N" in front of a string literal mean (I'm more of a C++ guy, SQL isn't quite my cup of tea). Sorry I could have searched but "N" being a single character is not quite search friendly. Thanks in advance, Ben.

  • Anonymous
    December 15, 2009
    I am using select ... for xml auto is the encoding also utf16? I am unable to convert it to ascii. it is just normal xml, no special chars.

  • Anonymous
    December 16, 2009
    Hello, Martien  When you use for xml auto, the result is a XML data type, which is not nvarchar or nvarchar type and it has no encoding.  If you want to convert it to ascii, you can use explict convert/cast from xml to varchar type.  

  • Anonymous
    December 16, 2009
    Hello, Ben  Article Description of storing UTF-8 data in SQL Server at http://support.microsoft.com/kb/232580 is your best reference for Storing UTF-8 data inside SQL Server.  I strong suggest you store your data in UTF-16 encoding inside of database instead of using varbinary data type. The main reason is that SQL Server don't have any built-in support for UTF-8, so that you will unable to query/fulltext search UTF-8 data.    If you are using ASP or ASP.Net , the client will automaticlly handling coonverting frm UTF-16 data to UTF-8 for you.  So you will not experience on any issue. You can refer to this article http://support.microsoft.com/kb/893663 for details.  It is true that you may waste some disk space, but you will have less trouble on handiling unicode data in your application.   If you want save disk-space, the data compression feature in SQL Server 2008 can achieve this, and the feature was enhanced in SQL Server 2008 R2 to achieve better compression ratio than UTF-8 encoding.  The reasn of having N' is because we have two different kinds of string const:    'abc'  means a  constant with varchar types, since varchar type already have a code page, so that it can only store characteres defined in the code page.  N'abc' means a constant with nvarchar type which is UTF-16 encoding.  Follow statement:  create table t1(c1 nvarcahr(10)  insert into t1('abc') will insert a varchar 'abc' into a nvarchar column.  So even the c1 is nvarchar column, but the constant might be converted to '???' before inserting into the table if a,b,c are not in the codepage.  Please let me know if you have further questions.

  • Anonymous
    March 19, 2010
    I'm in headache to think about a best match of 'Collation in MSSQL' and 'Encoding (charset) in XML' for Korean Language. Initially I pick 'Korean_90_CI_As' for testing. What is your recommendation for me? Thanks in advance.

  • Anonymous
    May 12, 2010
    Hi, thanks for the good article first. I have a problem, How can I use the prefix 'N' in store procedure in MS SQL Server to convert string to UTF-16 for storing? Thanks!  

  • Anonymous
    May 13, 2010
    Hello, Hu  If you are using SqlParamter in C#, be sure to use SqlDataType.NVarChar type instead of SqlDataType.VarChar, the system will pass the correct the value to the server.  If you are building T-SQL query in C# or other language, you need handle N'' by yourself when you construct the query

  • Anonymous
    June 26, 2010
    The comment has been removed

  • Anonymous
    June 28, 2010
    Hello, Deepak   You need nvarchar type of SQL Server to insert Unicode strings.  Also, please use nvarchar type as the parameter of your stored procedure.  Please let me know if you still have trouble on this. You can e-mail me at qyao@microsoft.com for further question.

  • Anonymous
    July 11, 2010
    I am trying to send email using sp_send_dbmail, it works fine but if the email contains any non-English letter (Arabic letters) it will appear in the email as ???? i tried to "@messageBody=TEXTTEXT collate arabic-bin" but i got the same result any idea is highly appreciated Thanks

  • Anonymous
    August 26, 2010
    Hello, thx for article I have a hube problem with correct display of polish characters on PHP application. The collation of table is set to default SQL_Latin1_General_CP1_CI_AS. Data type of column is nchar. When I insert records manualy with polish chars and then display them using select statement then all it's ok, i'm talking about using select query directly in MSSQL Server 2005. Problem appears when I want to display rows in PHP app. Charset of html document is set to UTF-8. I was trying to display chars correctly using utf8_encode function but no effect, all polish chars like "ąęłźćś" are auto converted to "aelzcs". Any ideas?

  • Anonymous
    August 27, 2010
    Hello, Matt   Can you see whether this article msdn.microsoft.com/.../cc626307(SQL.90).aspx can help you with the PHP scenario.

  • Anonymous
    January 17, 2011
    Hi, In fact it was a great read. Thanks for the article. I have an issue with japanese characters. I am using SQL server to connect to a Progress database through ODBC. I have the fields in table set to nvarchar, and DB collation is French_CI_AS. The issue is that I am not able to correctly store japanese characters in the table. WHen I am inserting data manually using the N'' prefix, its working fine. How can I use N'' while selecting? For eg: if I have a query, insert into tab1 select field1 from tab2. How can I prefix N'' to field1?

  • Anonymous
    January 19, 2011
    Hello, Sankar  For select, you don't need the prefix of N'. For your query:      Insert into tab1 select field1 from tab2  it should work fine.  Can you make sure your column is nvarchar types. and give me more information about your issue, and I can help you further.

  • Anonymous
    April 06, 2011
    The comment has been removed

  • Anonymous
    April 06, 2011
    Hello, Larry  Can you try to use notepad to use the csv file. and save as [UTF-Encoding] to the file.

  • Anonymous
    October 30, 2011
    > SQL Server treats a UTF-16 supplementary character as two characters. > The Len function return 2 instead of 1 for such input. I have regular need for characters above U+FFFF (the extended Han characters, specifically). The fact that these characters are counted as two characters is quite problematic. For example, in a user interface, a user is allowed to enter a field up to 25 characters. This will be a mixture of characters below U+FFFF (1 "character") and a few above U+FFFF (2 "characters"). To the user, the total number of characters that one can actually enter is thus less than the defined 25. There are no workarounds so we often need to make the width 15% larger than we actually say is possible. Rather, the length should be counted as UTF-32 units.

  • Anonymous
    November 01, 2011
    In next version of SQL Server, we will have new UTF-16 collation. The length issue will be address. However, when you declare a field/column as nvarchar(10), it might still hold 5 characters due to the Extended Han characters.  The reason is that the length semantic for data type is not changed. Oidon,  Please keep in mind that the Extended Han characters is not frequently used in many apps, so you can include a small number to the total numbers, it should work in many cases. Qingsong

  • Anonymous
    November 27, 2011
    The comment has been removed

  • Anonymous
    March 27, 2012
    Would uou please tell what collation i may use for Thai language for compatibality with java

  • Anonymous
    March 28, 2012
    You can use any collation with Thai language if you use nvarchar data type.  If majority of your data is in Thai lanaguage, you might choose collation Thai_CI_AS or Thai_100_CI_AS. Qingsong

  • Anonymous
    January 11, 2013
    QingsongYao please help me simplify

  1. In regards to XXX_CI and XXX_CS do I undestand correctly that with XXX_CI “Three = ThREE” would be true.
  2. Respevtively with XXX_CS “three = ThRee” it would be false and only “Three = Three” it would be true?
  3. Also is this true in the cases of IF compare, Where clause, Join predicates etc. You can just use "yes" or "no" 3 times and that would be all. Thank you.
  • Anonymous
    June 02, 2014
    I am trying to send email using sp_send_dbmail, it works fine but if the email contains any non-English letter (Arabic letters) it will appear in the email as ????

  • Anonymous
    July 16, 2014
    I want to insert tamil language.but sql server 2005 does not insert properly it had inserted  in this format only ஹரி and i have set in nvarchar ...my query is insert into tamil values(N'ஹரி'); please help me

  • Anonymous
    July 16, 2014
    what is your data type, and how you insert the data, using Sql server management studio or other tools.

  • Anonymous
    August 04, 2014
    Hi QingsongYao Could you please help me on below: I have a user who is pulling data from a SQL Server that we do not support anymore.  The db is on SQL 2008 and the file that the user receives in is being set as UTF6.  What the user needs is the data to come through as UTF8 (Unicode).  What this tells me is that the SQL Server is set to UTF6 or the DB is set to UTF6.  What they would like to do is to take the data in the UTF6 db and have it converted to UTF8.  Is there a way to write a SQL syntax that will allow to specify UTF8 and have the data converted in that format? Thank You Sunil

  • Anonymous
    August 05, 2014
    Hello, Sunil  I remember we block bcp in/out UTF-16 to UTF8 encoding during SQL Server 2008, I guess your customer is hitting this issue. The workaround is a encoding transform at client side. You can use Win32API, .Net, Java, Powershell or many tools to achieve this. Note,  I mean when your bcp out your data, you must specify that you want Unicode format.

  • Anonymous
    August 05, 2014
    Thank You so much for your reply QingsongYao.

  • Anonymous
    September 02, 2014
    Hi QingsongYao.... Can we turn on UTF8 encoding for the database in SQL Server 2008 (SP3) Standard Edition (64-bit) ?

  • Anonymous
    September 02, 2014
    This is no UTF-8 support in SQL Server 2008 and SQL Server 2012.

  • Anonymous
    September 24, 2014
    HI Iam passing query parameter string value . It works fine for other special chars except japanese and chinese. Iam using sql server db with weblogic12c AND JAVA1.6

  • Anonymous
    September 25, 2014
    what is the data type,  how you send command or queries to SQL server.

  • Anonymous
    September 25, 2014
    what is the data type in Sql server, and how you send data to sql server.

  • Anonymous
    December 16, 2014
    The comment has been removed

  • Anonymous
    April 16, 2015
    Hi QingsongYao, I am using SQL Server 2008 and I encountered a problem. I created a table and an index on a column defined as nvarchar: CREATE TABLE dbo.TestTable    (Id int NOT NULL,     Name nvarchar(50) NULL); GO -- Create a clustered index called Index_On_TestTable -- the Name column CREATE UNIQUE CLUSTERED INDEX Index_On_TestTable    ON dbo.TestTable (Name); I introduced into table the values (1, Ecosyst), (2, Ecosyst☆) and (3, Ecosyst抎). But I can't introduce the value (4,Ecosyst☆). I get this error: Cannot insert duplicate key row in object 'dbo.TestTable' with unique index 'Index_On_TestTable'. Why? The last name doesn't exist in my table.

  • Anonymous
    April 17, 2015
    We do linguistic comparison, it is possible that the two last rows are comparing equal.  Can you try to verify whether they are equal:  such as search the third string in the table.

  • Anonymous
    April 19, 2015
    Sorry, I have a syntax mistake in the first comment. So, I have only this three values in the table:  (1, Ecosyst), (2, Ecosyst☆) and (3,Ecosyst抎). But, I can't add the value (4,Ecosyst☆) although it doesn't exist in my table.

  • Anonymous
    April 19, 2015
    Some special characters don't appear in the comment...

  • Anonymous
    April 20, 2015
    I think it is still issue related to linguistic comparison. The symbols of your last character are treated as equal. If you use latin1_general_100_bin2 collation, you will not see this issue.

  • Anonymous
    April 20, 2015
    Yes, it works using latin1_general_100_bin2 collation. Thanks for your help!

  • Anonymous
    July 22, 2015
    I'm trying to retreive Russian Characters from sql server. It' showing as ????. What the correct Collation for nvarchar for multi language including Russian ? nd a solution asap..

  • Anonymous
    July 23, 2015
    Jay,  I think the problem is the insertion part, not storage part.  Please always use N'' to insert your data. any collation with nvarchar data type can store any Unicode characters.

  • Anonymous
    July 23, 2015
    Hi QingsongYao, Thanks a lot for quick reply. :) :) I have tested with ur comment. But to retrieve the particular Russian value, I had to put prefix 'N' also. Our application is already developed one and is there any other solution to do so without using 'N' ?? Any other collation or any other way to get solved this issue ??

  • Anonymous
    July 24, 2015
    In this case, choosing a Russian collation for your database should resolve your issue. Note,  it will only support Russian characters + Latin /ASCII characters since your app use '' instead of N''.