Details of the automatic spelling suggestions in FAST Search for SharePoint 2010

It has been a well-known fact that you cannot tune the automatic spelling suggestions in FAST Search for SharePoint 2010. Either you use the out-of-the-box dictionaries, or you let them be aligned automatically with your crawled content. As it turns out, tuning them is possible after all!

After finding the support article "How to utilize the spelltuner to add or boost terms for spellchecking", I wanted to understand the spelltuning in detail. So here goes.

Normal process:

  1. The TermExtractor document processor extracts the terms and writes them to a file in $FASTSEARCH\data\termextractor named %PORT%_%HOSTNAME%.out.gz. Each file contains one line per term, in the format <lang> <term> <frequency>.
  2. Every 10th batch (configurable for debugging purposes in $FASTSEARCH\etc\processors\TermExtractor.xml), the document processor uploads the terms to the resource store (stored in $FASTSEARCH\components\resourcestore\dictionaries\spelltuner).
  3. The spelltuner runs as a continuous process under the node controller. Every 60 minutes it downloads the files the document processor has uploaded to the resource store.
  4. Every 24 hours the spelltuner generates a new dictionary (one per language) and uploads it to the resource store (stored in $FASTSEARCH\components\resourcestore\dictionaries\spellcheck).
  5. If one or more dictionaries have changed, the spelltuner notifies the qrserver, which then reloads them.
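
To make the file format from step 1 concrete, here is a minimal Python sketch that reads one of the TermExtractor output files and sums the frequencies per language and term. It assumes the file is UTF-8 encoded, that the three fields are separated by whitespace and that a term never contains spaces; none of this is confirmed by official documentation. The function name read_term_file is hypothetical.

```python
import gzip
from collections import Counter, defaultdict


def read_term_file(path):
    """Sum frequencies per language and term from a gzipped
    '<lang> <term> <frequency>' file.

    Assumptions (not confirmed by FS4SP documentation): UTF-8 encoding,
    whitespace-separated fields, and terms without embedded spaces.
    """
    counts = defaultdict(Counter)  # lang -> Counter(term -> total frequency)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            lang, term, freq = parts
            counts[lang][term] += int(freq)
    return counts
```

Calling read_term_file(path)["en"].most_common(10) on one of these files lists the ten most frequent English terms it contains.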

Custom tuning:

To inject your own words into a dictionary, create your own gzipped file in the format <lang> <term> <frequency> and store it in $FASTSEARCH\components\resourcestore\dictionaries\spelltuner.
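
Such a file can be produced with a few lines of Python. This is a minimal sketch, assuming UTF-8 encoding, LF line endings and single-word terms; the function name write_custom_terms, the file name and the example terms and frequencies are hypothetical and for illustration only.

```python
import gzip


def write_custom_terms(path, entries):
    """Write (lang, term, frequency) tuples to a gzipped file, one
    '<lang> <term> <frequency>' line per term.

    Assumptions: UTF-8 encoding and '\n' line endings are accepted by the
    spelltuner, and a term must not contain whitespace (it would break the
    three-field format).
    """
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        for lang, term, freq in entries:
            if any(ch.isspace() for ch in term):
                raise ValueError(f"term must not contain whitespace: {term!r}")
            f.write(f"{lang} {term} {int(freq)}\n")
```

For example, write_custom_terms("custom_terms.out.gz", [("en", "sharepoint", 5000)]) creates a file you can then copy into the spelltuner folder mentioned above.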

Spelltuner parameters:

The spelltuner runs with some preconfigured parameters that you may need to change. The most interesting one for small indexes is --wordcount-threshold, which defaults to 100000. This means that a language needs at least 100000 words before spell tuning is triggered for it. For a small index, this parameter might need to be set as low as 1000. The parameter can be changed in NodeConf.xml. However, since changing this file is not supported (and the change might be overwritten by an update), an alternative is to stop the spelltuner (nctrl stop spelltuner) and run it manually or as a scheduled task.
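
To get a rough idea of whether a small index reaches the threshold, the sketch below sums the frequencies per language across a folder of term files (for example a copy of the files in the spelltuner folder) and lists the languages that meet a given threshold. It assumes the spelltuner compares the threshold against the total term frequency per language; it may well count differently (for instance distinct terms), so treat the result as an estimate only. The function names are hypothetical.

```python
import gzip
from collections import Counter
from pathlib import Path

# Default value of --wordcount-threshold as described in the text.
DEFAULT_WORDCOUNT_THRESHOLD = 100000


def words_per_language(directory):
    """Sum term frequencies per language over all *.gz files in a folder.

    Assumption: 'words' means the sum of the <frequency> column; the real
    spelltuner logic is undocumented and may differ.
    """
    totals = Counter()
    for path in Path(directory).glob("*.gz"):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                parts = line.split()
                if len(parts) == 3:  # '<lang> <term> <frequency>'
                    totals[parts[0]] += int(parts[2])
    return totals


def languages_to_tune(totals, threshold=DEFAULT_WORDCOUNT_THRESHOLD):
    """Return the languages whose word count reaches the threshold."""
    return sorted(lang for lang, n in totals.items() if n >= threshold)
```

Comparing languages_to_tune(totals) with languages_to_tune(totals, threshold=1000) shows which languages would only get a tuned dictionary after lowering the threshold.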


DISCLAIMER: I gathered this information by digging into an FS4SP installation and the existing documentation. It is not official documentation of any sort.