Full Text Indexing Returns Incorrect or No Results

Shoemaker, Don 1 Reputation point
2022-03-22T15:11:56.1+00:00

In SQL 2017, we have a table defined with a column named CheckNo that is a varchar(100) We have a FTI on this column but are getting inconsistent or at times NO results and I'm trying to determine why. We have rebuilt/re-populated the FTI and still get bad results. I have tried using CONTATINS or FREETEXT with no differing results. Due to the amount of data in this table we really would not want to switch back to a LIKE compare due to processing time, but that seems to be the only way we get the results we need. An examination of the Crawl log yields no input or failures.

Examples follow:

--Example 1. Data exists with this string combo, not complete search term, but FTI returns no rows:
select * from TableName where CONTAINS(CheckNo ,'"51A33"' ) -- 0 rows returned
select * from TableName where FREETEXT(CheckNo , '51A33' ) -- 0 rows returned
select * from TableName where CheckNo like '%51A33%' -- 10 rows returned

--Example 2. Data with partial match returns 2 different ways via FTI with only the like compare returning all data
select * from TableName where CONTAINS(CheckNo , '"da80186140"' ) --10 rows returned
select * from TableName where CONTAINS(CheckNo , '"80186140"' ) --1 rows returned
select * from TableName where CheckNo like '%80186140%' -- 11 rows returned

--Example 3. Sheer difference in return set data volume
select * from TableName where CONTAINS(CheckNo , '"37265"' ) -- 142 rows returned
select * from TableName where CheckNo like '%37265%' -- 589 rows returned

Our team has done a moderate amount of searching for issues with this topic but haven't turned up much. Thought I would ask here to see if we could get some traction. Thanks!

EDIT: The format of my actual queries is correct. The forum posting stripped the * (or doesn't display them)

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,895 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,563 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Erland Sommarskog 102.3K Reputation points
    2022-03-22T22:41:35.41+00:00

    Yes, your operation is correct. Full-text breaks the text into words and index those words. Which means that finding something that start with these words is quick. But it does not help you find fragment in words.

    If you want to find fragment in words and you find LIKE to be a tad slow, there is a way to speed it up: cast to a binary collation

    SELECT * FROM tbl WHERE col COLLATE Latin1_General_BIN2 LIKE '%ABC%'
    

    Which gain you make depends on your current collation and data type. Or more precisely, you will make a gain of a factor 7-10 if you have nvarchar or a Windows collation. If you have an SQL collation and varchar, your gain will be quite modest.

    If you want to it go as about as fast as full-text can give you, this is possible, but it requires heavy artillery. I have actually written about this, but in difference to about everything else I've written, this is not available on my web site, but in a book, where the royalities go to WarChild International. You can find a teaser here: https://www.sommarskog.se/yourownindex.html.

    1 person found this answer helpful.

  2. Tom Phillips 17,716 Reputation points
    2022-03-22T17:25:10.49+00:00

    Full text search breaks strings into "words". Numbers and letters are not "words". Full text indexes will not work for what you are trying to use it for.

    https://learn.microsoft.com/en-us/sql/relational-databases/search/full-text-search?view=sql-server-ver15#overview

    Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese.


  3. Bert Zhou-msft 3,421 Reputation points
    2022-03-23T08:30:55.543+00:00

    Hi,@Shoemaker, Don

    Welcome to Microsoft T-SQL Q&A Forum!

    Maybe you don't need to use like because of inefficiency, well I hate to use it when there are large data, here I suggest you to change the full text parser (take a neutral language: NaturalLanguage6.dll) and English format.

    Below is my test data:

    CREATE FULLTEXT CATALOG FullTexttest22  
    CREATE TABLE TF (Id INT NOT NULL, AllText NVARCHAR(400))  
      
    CREATE UNIQUE INDEX test_tfts2 ON TF(Id)  
      
      
    CREATE FULLTEXT INDEX ON TF(AllText language 2057)  
        KEY INDEX test_tfts2 ON FullTexttest22  
        WITH CHANGE_TRACKING AUTO, STOPLIST OFF  
      
        INSERT INTO TF  
    VALUES (1, ' 123_456 789 '), (2, ' 789 123_456 '),  
           (3, ' 123_456 ABC '), (4, ' ABC 123_456 ')  
        select * from tf  
        where contains(AllText,'"*123*"')  
    

    Here is a link about Change tokenizer.
    Just a reminder: you have to add an extra step. When the parser change is done, the FTI has to be dropped and recreated to refresh the data, my database version is 2008 and debugging works fine at the moment.
    Data I use:
    186041-image.png
    test results:
    185992-image.png

    Best regards,
    Bert Zhou


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.