Stopwords and Stoplists
To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords.
Important
SQL Server 2005 noise words have been replaced by stopwords. When a database is upgraded to SQL Server 2008 from a previous release, the noise-word files are no longer used in SQL Server 2008. However, the noise-word files are stored in the FTDATA\ FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding SQL Server 2008 stoplists. For information about upgrading noise-word files to stoplists, see Full-Text Search Upgrade.
A stopword can be a word with meaning in a specific language, or it can be a token that does not have linguistic meaning. For example, in the English language, words such as "a," "and," "is," and "the" are left out of the full-text index since they are known to be useless to a search.
Although it ignores the inclusion of stopwords, the full-text index does take into account their position. For example, consider the phrase, "Instructions are applicable to these Adventure Works Cycles models". The following table depicts the position of the words in the phrase:
Word |
Position |
---|---|
Instructions |
1 |
are |
2 |
applicable |
3 |
to |
4 |
these |
5 |
Adventure |
6 |
Works |
7 |
Cycles |
8 |
models |
9 |
The stopwords "are", "to", and "these" that are in positions 2, 4, and 5 are left out of the full-text index. However, their positional information is maintained, thereby leaving the position of the other words in the phrase unaffected.
Stoplists
In SQL Server 2008, stopwords are managed in databases using objects called stoplists. A stoplist is a list of stopwords that, when associated with a full-text index, is applied to full-text queries on that index.
Creating a Stoplist
You can create a stoplist in any of the following ways:
Using the system-supplied stoplist in the database. SQL Server ships with a system stoplist that contains the most commonly used stopwords for each supported language, that is for every language that is associated with given word breakers by default. The system stoplist contains common stopwords for all supported languages. You can copy the system stoplist, and customize your copy by adding and removing stopwords.
The system stoplist is installed in the Resource database.
Creating your own stoplist, and then adding stopwords to it for any language that you specify. You can also drop stopwords from your stoplist when necessary.
Using an existing custom stoplist from any other database in the current server instance and then adding and dropping stopwords as necessary.
Important
CREATE FULLTEXT STOPLIST, ALTER FULLTEXT STOPLIST, and DROP FULLTEXT STOPLIST are supported only under compatibility level 100. Under compatibility levels 80 and 90, these statements are not supported. However, under all compatibility levels the system stoplist is automatically associated with new full-text indexes.
To create a stoplist
To add or drop stopwords from a stoplist
To drop a stoplist
Using a Stoplist in Queries
To make use of a stoplist in queries, you must associate it with a full-text index. You can attach a stoplist to a full-text index when you create the index, or you can alter the index later to add a stoplist.
To create a full-text index and associate a stoplist with it
To associate or disassociate a stoplist with an existing full-text index
To suppress an error message if stopwords cause a Boolean operation on a full-text query to fail
Viewing Stoplists and Stoplist Metadata
To view all the stopwords of a stoplist
To get information about all the stoplists in the current database
To view the tokenization result of a word breaker, thesaurus, and stoplist combination