Full-Text Search Overview

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.

Full-text search is applicable in a wide range of business scenarios such as e-businesses—searching for items on a web site; law firms—searching for case histories in a legal-data repository; or human resources departments—matching job descriptions with stored resumes. The basic administrative and development tasks of full-text search are equivalent regardless of business scenarios. However, in a given business scenario, full-text index and queries can be honed to meet business goals. For example, for an e-business maximizing performance might be more important than ranking of results, recall accuracy (how many of the existing matches are actually returned by a full-text query), or supporting multiple languages. For a law firm, returning every possible hit (total recall of information) might be the most important consideration.

Overview of How to Configure a Database for Full-Text Searching

For any scenario, a database administrator performs the following basic steps to configure table columns in a database for full-text search:

  1. Create a full-text catalog.

  2. On each table that you want to search, create a full-text index by:

    1. Identify each text columns that you want to include in the full-text index.

    2. If a given column contains documents stored as binary data (varbinary, varbinary(max), or image data), you must specify a table column (the type column) that identifies the type of each document in the column being indexed.

    3. Specify the language that you want full-text search to use on the documents in the column.

    4. Choose the change-tracking mechanism that you want to use on the full-text index to track changes in the base table and its columns.

Full-text search supports multiple languages through the use of the following linguistic components: word breakers and stemmers, stoplists that contain stopwords (also known as noise words), and thesaurus files. Thesaurus files and, in some cases, stoplists require configuration by a database administrator. A given thesaurus file supports all full-text indexes that use the corresponding language, and a given stoplist can be associated with as many full-text indexes as you want.

For more information, see Administering Full-Text Search.

Overview of Full-Text Queries

After the columns have been added to a full-text index, applications and users can run full-text queries on the text in the columns. These queries can search for any of the following:

  • One or more specific words or phrases (simple term)

  • A word or a phrase where the words begin with specified text (prefix term)

  • Inflectional forms of a specific word (generation term)

  • A word or phrase close to another word or phrase (proximity term)

  • Synonymous forms of a specific word (thesaurus)

  • Words or phrases using weighted values (weighted term)

Full-text queries all use a small set of Transact-SQL predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE). However, the search goals of a given business scenario influence the exact structure of the full-text queries; for example:

  • e-business—searching for a product on a website:

    SELECT product_id FROM products WHERE CONTAINS(product_description, ”Snap Happy 100EZ” OR FORMSOF(THESAURUS,’Snap Happy’) OR ‘100EZ’) AND product_cost<200 …
    
  • Recruitment scenario—searching for job candidates that have experience working with SQL Server:

    SELECT candidate_name,SSN FROM candidates WHERE CONTAINS(candidate_resume,”SQL Server”) AND candidate_division =DBA
    

For more information, see Querying SQL Server Using Full-Text Search.

In contrast to full-text search, the LIKETransact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.