Tune linguistic relevance (FAST Search Server 2010 for SharePoint)

Updated: February 10, 2011

The table below describes the different linguistic features and their effect on relevance and recall.

Linguistic feature Impact on relevance Impact on recall Description

Synonyms

Impact on relevance: No

Impact on recall: Yes

Synonyms are words attached to a keyword. Keywords are words or phrases that you have identified as common terms within your organization. You attach synonyms to keywords to increase recall. When a search includes a synonym of a keyword, items that contain the keyword are also returned. Likewise, when a search includes a keyword, items that contain its synonyms are also returned, regardless of whether they contain the keyword. This only applies when the search word completely matches a defined keyword or synonym.
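The two-way expansion described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the product's implementation: the `KEYWORD_SYNONYMS` table, its entries, the `expand_query` function, and the case-insensitive comparison are all assumptions made for the example.

```python
# Hypothetical keyword-to-synonyms table; the entries are invented for illustration.
KEYWORD_SYNONYMS = {
    "laptop": ["notebook", "portable computer"],
}


def expand_query(query: str) -> set[str]:
    """Return the terms to match for a query, applying two-way synonym expansion.

    Expansion happens only on a complete match between the query and a
    keyword or synonym (case-insensitive here, which is an assumption).
    """
    normalized = query.strip().lower()
    terms = {normalized}
    for keyword, synonyms in KEYWORD_SYNONYMS.items():
        if normalized == keyword:
            # The query is a keyword: also match items containing its synonyms.
            terms.update(synonyms)
        elif normalized in synonyms:
            # The query is a synonym: also match items containing the keyword.
            terms.add(keyword)
    return terms


print(expand_query("laptop"))
print(expand_query("notebook"))
print(expand_query("laptops"))  # no complete match, so no expansion
```

In this sketch a search for "laptops" is not expanded, because it does not completely match the keyword or any of its synonyms.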

Stemming

Impact on relevance: Yes

Impact on recall: Yes

A word can have multiple forms that mean essentially the same thing. For example, the verb “to write” has forms such as writing, wrote, and writes. Similarly, nouns usually have singular and plural forms, such as book and books. The stemming feature in FAST Search Server 2010 for SharePoint can increase recall of relevant documents by mapping one form of a word to its variants.

Stemming is applied to the contents of managed properties for which stemming is enabled.
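As a rough illustration of how mapping a word form to its variants increases recall, the following Python sketch uses a small hand-written variant table. The `WORD_FORMS` table and the `variants` and `matches` helpers are hypothetical; the product uses its own language-specific stemming dictionaries, whose format is not described here.

```python
# Hypothetical variant table built from the examples in the text.
WORD_FORMS = {
    "write": {"write", "writes", "writing", "wrote"},
    "book": {"book", "books"},
}

# Reverse lookup: any known form -> its base form.
FORM_TO_BASE = {
    form: base for base, forms in WORD_FORMS.items() for form in forms
}


def variants(word: str) -> set[str]:
    """Return all known forms of a word, or just the word if it is unknown."""
    lowered = word.lower()
    base = FORM_TO_BASE.get(lowered)
    return set(WORD_FORMS[base]) if base else {lowered}


def matches(query_word: str, document_text: str) -> bool:
    """True if the document contains any variant of the query word."""
    document_words = set(document_text.lower().split())
    return bool(variants(query_word) & document_words)


print(matches("wrote", "She writes every day"))   # True
print(matches("book", "Two books on the shelf"))  # True
print(matches("book", "Nothing relevant here"))   # False
```

Without the variant mapping, the first two searches would miss their documents because the exact forms "wrote" and "book" do not occur in them.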

Spell checking

Impact on relevance: Yes

Impact on recall: No

The spell checking feature improves the quality of searches by comparing the search words against language-specific dictionaries and identifying misspelled terms. If the dictionary contains a closely matching word with a significantly higher frequency, that word is suggested through the Did you mean? feature. You can fine-tune the spell checking dictionaries to make sure that they are aligned with the frequency of words in the processed documents. Users will only get spell checking suggestions that are relevant within the processed content.

You can also define spell checking exceptions. These are words that are not found in the default spell checking dictionary but that are still valid words. When a user types a search word that is included in the spell checking exception list, the Did you mean? feature will not suggest a correction for that word.

The spell checking dictionary contributes to both increased recall and relevance because the feature prevents usage of misspelled words.
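The suggestion logic described above (a closely matching word with a significantly higher frequency, skipped for exception words) might look roughly like the following Python sketch. The word frequencies, the similarity cutoff of 0.8, and the frequency ratio of 10 are invented for illustration; the thresholds that FAST Search Server actually uses are not stated in this article. The sketch relies on `difflib.get_close_matches` from the Python standard library as a stand-in for the product's matching.

```python
from difflib import get_close_matches


def did_you_mean(word, frequencies, exceptions=frozenset(), min_ratio=10.0):
    """Suggest a correction for word, or return None.

    frequencies: dict mapping dictionary words to their frequency.
    exceptions:  valid words that must never be corrected.
    min_ratio:   how many times more frequent a candidate must be
                 (an assumed threshold for "significantly higher").
    """
    lowered = word.lower()
    if lowered in exceptions:
        return None
    own_frequency = frequencies.get(lowered, 0)
    candidates = get_close_matches(lowered, list(frequencies), n=3, cutoff=0.8)
    best = None
    for candidate in candidates:
        if candidate == lowered:
            continue
        frequent_enough = frequencies[candidate] >= max(own_frequency, 1) * min_ratio
        if frequent_enough and (best is None or frequencies[candidate] > frequencies[best]):
            best = candidate
    return best


# Hypothetical frequencies, as if derived from the processed documents.
FREQUENCIES = {"search": 950, "server": 700, "sharepoint": 400}

print(did_you_mean("serch", FREQUENCIES))                        # search
print(did_you_mean("serch", FREQUENCIES, exceptions={"serch"}))  # None
print(did_you_mean("search", FREQUENCIES))                       # None
```

Because the frequencies come from the processed content, a word that never occurs in the documents cannot be suggested, which matches the behavior described for tuned dictionaries.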

Anti-phrase

Impact on relevance: Yes

Impact on recall: No

Anti-phrasing refers to phrases for which there is no value in indexing. “Where can I find information on” is a typical anti-phrase for English. You cannot tune the anti-phrasing dictionaries.
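To make the idea concrete, the Python sketch below removes an anti-phrase from a search string so that only the meaningful words remain. The `ANTI_PHRASES` list and the `strip_anti_phrases` function are hypothetical, and lowercasing the query is a simplification; how the product applies its built-in anti-phrasing dictionaries is not described in this article.

```python
# Hypothetical anti-phrase list containing only the example from the text.
ANTI_PHRASES = ["where can i find information on"]


def strip_anti_phrases(query: str) -> str:
    """Remove known anti-phrases from a query and normalize whitespace."""
    cleaned = query.lower()
    for phrase in ANTI_PHRASES:
        cleaned = cleaned.replace(phrase, " ")
    return " ".join(cleaned.split())


print(strip_anti_phrases("Where can I find information on expense reports"))
# expense reports
```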

Tokenization

Impact on relevance: Yes

Impact on recall: Yes

The tokenization process splits a stream of text into individual words (tokens) that can be indexed. For East Asian languages (Chinese (Simplified and Traditional), Japanese, Korean, Thai), where spaces are not consistently used to separate words, tokenization is especially important for relevancy.

Because there is no fixed standard across languages for what forms a separate token, speakers have different ideas of what a token is. For example, some users may consider 富士山 (Mount Fuji) to be one token, while others may regard it as two tokens, 富士 (Fuji) and 山 (Mount).

Inconsistencies between what users consider to be one token and what the tokenizer module actually identifies as one token can lower precision or recall. For example:

  • The Simplified Chinese tokenizer module splits the name 萨斯喀彻温 (Saskatchewan) into these tokens: 萨 (Sa), 斯 (s), 喀 (ka), 彻 (tche), 温 (wan).

    A search for the name "Saska", tokenized as 萨 (Sa), 斯 (s), 喀 (ka), will also retrieve a document that contains "Saskatchewan", which means that precision is lower than might be expected.

  • The Japanese tokenizer module marks "サスカチュワンサスカトゥーン" (Saskatchewan Saskatoon) as one token.

    A search for "Saskatchewan" will not retrieve a document that contains "Saskatchewan Saskatoon". Therefore, recall will be decreased.
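Both effects can be reproduced with a simplified matching model in which a document is retrieved when the query's tokens appear as a contiguous run in the document's token list. The Python sketch below is only an illustration under that assumption: the `contains_tokens` function is hypothetical, the token lists are typed in by hand rather than produced by a real tokenizer module, and splitting off サスカチュワン as the query token for "Saskatchewan" is assumed.

```python
def contains_tokens(document_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run in document_tokens."""
    n = len(query_tokens)
    return any(
        document_tokens[i:i + n] == query_tokens
        for i in range(len(document_tokens) - n + 1)
    )


# Simplified Chinese: "Saskatchewan" is split into single-character tokens,
# so the shorter name "Saska" matches it (lower precision).
saskatchewan_zh = ["萨", "斯", "喀", "彻", "温"]
saska_zh = ["萨", "斯", "喀"]
print(contains_tokens(saskatchewan_zh, saska_zh))  # True

# Japanese: "Saskatchewan Saskatoon" is kept as one token, so a query
# for "Saskatchewan" alone does not match it (lower recall).
document_ja = ["サスカチュワンサスカトゥーン"]
query_ja = ["サスカチュワン"]
print(contains_tokens(document_ja, query_ja))  # False
```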

See also

Add, remove and display synonyms for keywords by using Windows PowerShell (FAST Search Server 2010 for SharePoint)

Manage stop word files (SharePoint Server 2010)

Linguistic features per language (FAST Search Server 2010 for SharePoint)

Change history

Date Description Reason

February 10, 2011

2011/02/07

Content update

May 12, 2010

Initial publication