Issue with some Chinese Characters in select statements

mo boy 396 Reputation points
2022-06-20T22:27:01.733+00:00

Dear Experts,

I have an issue with one of select statements, it is running fine on one server and not returning rows in another server.

The first server where it works has same collation (DB and server) (SQL_Latin1_General_CP1_CI_AS)

The second server has different collation (DB and server)
DB has SQL_Latin1_General_CP1_CI_AS
Server has latin1_general_ci_as

FullContent column is nvarchar(max)

SELECT DISTINCT * FROM MyDB.dbo.My_Table WITH (NOLOCK)
WHERE (( CONTAINS ((FullContent),'("人" AND "水")')))
AND txtfilename='sample.txt'

If I remove the second word 水 (sample query below), the query works. But I don't see this issue on the server where it is working.
SELECT DISTINCT * FROM MyDB.dbo.My_Table WITH (NOLOCK)
WHERE (( CONTAINS ((FullContent),'("人" )')))
AND txtfilename='sample.txt'

There appears to be some issue with this 水 on this server. Could you please advise?

I also tried with the N prefix, but works on old server not on new.
SELECT DISTINCT * FROM MyDB.dbo.My_Table WITH (NOLOCK)
WHERE (( CONTAINS ((FullContent),N'("人" AND "水")')))
AND txtfilename='sample.txt'

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,465 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,690 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Erland Sommarskog 116.7K Reputation points MVP
    2022-06-22T06:32:25.463+00:00

    I think I know the answer. In the database where it's working, the language for the index is not Chinese, where as in the one where you don't get a hit, the the language is Chinese.

    I ran this query:

       SELECT * FROM sys.fulltext_system_stopwords WHERE stopword = N'人'  
    

    and it returns to rows. I don't know LCIDs by hear, but I don't think 1028 is Italian. :-)

    A stopword is one that is not included in the index. For English this are small and common words like "to", "from", "and" etc for which it would be useless to search because there are so many of them.

    Whether 人 are of the same ilk, I don't know, since I don't know Chinese.

    1 person found this answer helpful.

  2. LiHong-MSFT 10,051 Reputation points
    2022-06-21T02:22:47.897+00:00

    Hi @mo boy
    How about this :

    SELECT DISTINCT * FROM MyDB.dbo.My_Table WITH (NOLOCK)  
    WHERE CONTAINS (FullContent,N'"人" AND "水"')  
    AND txtfilename='sample.txt'  
    

    Best regards,
    LiHong


  3. Erland Sommarskog 116.7K Reputation points MVP
    2022-06-27T20:32:00.34+00:00

    So it seems likely that the answer is that Microsoft did some changes between SQL 2008 R2 and SQL 2016. Please don't ask exactly what and why, because I have no clue.

    Out of curiousity, would it be possible for you to produce a script that creates the table and the fulltext index and then adds some data the demontstrates the problem?

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.