How to: Load Licensed Third-Party Word Breakers

SQL Server 2008 includes licensed third-party word breakers for the following languages:

  • Danish

  • Polish

  • Turkish

These word breakers ship with SQL Server 2008 but are not installed by default. You must register them manually and then add them to the list of LCIDs that are supported for full-text indexing and querying.

Prerequisite Information

Before you can load a word breaker, you need the following information:

  • The instance ID of each instance of SQL Server on which you want to register the word breakers.

  • The FTDATA path for each instance.

    After obtaining the instance IDs, you must retrieve the appropriate instance-specific path to the FTData folder. You will use this path when adding configuration values that specify the lexicon and thesaurus files for a language.

To Obtain the Instance ID for an Instance of SQL Server

  1. Click Start, and click Run.

  2. In the Run dialog box, in the Open box, type Regedit.

  3. Click OK. The Registry Editor opens.   

  4. Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL. The right pane lists every installed instance of SQL Server, with the instance name in the Name column and the instance ID in the Data column. Note the instance ID of each server instance on which you are going to load third-party word breakers.
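If you can already connect to the instance, the same lookup can be sketched in T-SQL. Treat this as a sketch rather than a supported method: xp_regread is an undocumented extended stored procedure, so its availability and behavior are assumptions here, and the variable names are illustrative. The sketch also assumes that the default instance is registered under the value name MSSQLSERVER.

```sql
-- SERVERPROPERTY('InstanceName') returns NULL for the default instance,
-- so fall back to the value name assumed for it in the registry.
DECLARE @instance_name nvarchar(128) =
    COALESCE(CONVERT(nvarchar(128), SERVERPROPERTY('InstanceName')), N'MSSQLSERVER');
DECLARE @instance_id nvarchar(256);

-- Assumption: xp_regread (undocumented) is available and the login is
-- allowed to run it.
EXEC master.dbo.xp_regread
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL',
    @instance_name,
    @instance_id OUTPUT;

SELECT @instance_name AS instance_name, @instance_id AS instance_id;
```

If the query returns NULL for instance_id, fall back to the Registry Editor steps above.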

To Obtain the FTData Path for Each Instance

  1. Click Start, and click Run.

  2. In the Run dialog box, in the Open box, type Regedit.

  3. Click OK.

  4. In the Registry Editor, select the following registry key for an instance of SQL Server: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\instance_ID\Setup where instance_ID is the identifier of the server instance on which you are loading word breakers. For example, for the default server instance, the registry key is:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\Setup

    The right pane displays the FullTextDefaultPath value, which contains the instance-specific path to the FTData folder. For example, for the default instance of SQL Server 2008, the path is:

    C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTData
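The FTData path can also be read from a query window. The following is a minimal sketch, assuming that the undocumented xp_instance_regread extended stored procedure is available and that it maps the generic MSSQLServer key to the key of the instance you are connected to. Neither assumption is guaranteed by the product documentation, and the variable name is illustrative.

```sql
DECLARE @ftdata_path nvarchar(512);

-- Assumption: xp_instance_regread (undocumented) rewrites the generic
-- key below to ...\Microsoft SQL Server\<instance_ID>\Setup.
EXEC master.dbo.xp_instance_regread
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLServer\Setup',
    N'FullTextDefaultPath',
    @ftdata_path OUTPUT;

SELECT @ftdata_path AS ftdata_path;
```

Compare the result with the value shown in the Registry Editor before relying on it.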

The installation procedure for third-party word breakers licensed by Microsoft consists of three stages. The following list summarizes these stages, whose steps are described later in this section.

  1. Add the COM ClassID(s) for the word breaker and stemmer interfaces for the language being registered as a key to the <InstanceRoot>\MSSearch\CLSID node of the registry.

  2. Add a key to the <InstanceRoot>\MSSearch\Language node for the language.

  3. Add configuration values that specify the location of the lexicon and thesaurus files for the language.

Note

The Danish word breaker is used as an example in this section. The values required for installing word breakers for each of the languages are provided in the tables later in this topic.

Stage 1: Add the COM ClassID(s) for the Word Breaker and Stemmer Interfaces for the Language Being Registered

Warning

Incorrectly editing the registry can severely damage your system. Before making changes to the registry, you should back up any valued data on the computer.

To add COM Class ID(s) for these components for the Danish language:

  1. Open the Registry Editor:

    1. Click Start, and click Run.

    2. In the Run dialog box, in the Open box, type Regedit, and then click OK.

  2. In Registry Editor, select the following registry key for the instance of SQL Server: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\CLSID

  3. On the menu bar, click Edit, click New, and click Key.

  4. Type {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}.

  5. Press ENTER.

  6. In the right pane, right-click the Default registry value, and then click Modify.

  7. In the Edit String dialog box, in the Value data box, type danlr.dll, and then click OK.

  8. Repeat steps 3 through 7, replacing the value in step 4 with {83BC7EF7-D27B-4950-A743-0F8E5CA928F8}.

For a given language, follow the steps above, replacing the key values in steps 4 and 8 with the key values for the language you want. These values are listed below. In step 7, replace danlr.dll with the .dll name for the language you want.

| Language | Key value for step 4 | .DLL name for step 7 | Key value for step 8 |
| --- | --- | --- | --- |
| Danish | {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D} | danlr.dll | {83BC7EF7-D27B-4950-A743-0F8E5CA928F8} |
| Polish | {B8713269-2D9D-4BF5-BF40-2615D75723D8} | lrpolish.dll | {CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6} |
| Turkish | {23A9C1C3-3C7A-4D2C-B894-4F286459DAD6} | trklr.dll | {8DF412D1-62C7-4667-BBEC-38756576C21B} |

Stage 2: Add a Key to the <InstanceRoot>\MSSearch\Language Node for the Language

To add a key to this node for the Danish language:

  1. Select the following registry key for the default instance of SQL Server: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\Language

  2. Repeat steps 3 through 5 in the preceding procedure, replacing the key name in step 4 with dan.

For a given language, follow the preceding steps, replacing the key name in step 4 with the value listed below for the specific language.

| Language | Key name for step 4 |
| --- | --- |
| Danish | dan |
| Polish | plk |
| Turkish | trk |

Stage 3: Add Configuration Values That Give the Location of Each Linguistic Component for a Language

To add configuration values for these components for the Danish language:

  1. Select the registry key you entered in Stage 2 above. For the default instance of SQL Server this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\Language\dan

  2. On the menu bar, click Edit, click New, and click String Value.

  3. Type TsaurusFile.

  4. Press ENTER.

  5. Right-click the TsaurusFile registry value you just added, and then click Modify.

  6. In the Edit String dialog box, in the Value data box, type tsdan.xml.

  7. Click OK.

Repeat steps 2 through 7 for the remaining linguistic components of the language: the language (locale), the word breaker, and the stemmer. The values needed to register all of the components for the Danish, Polish, or Turkish language are provided below.

Values for Danish

Repeat steps 2 through 7 to add each set of values listed below, replacing the language-specific value type (step 2), value name (steps 3 and 5), and value data (step 6) for each value.

| Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
| --- | --- | --- |
| String value | TsaurusFile | tsdan.xml |
| DWORD value | Locale | 00000406 |
| String value | WBreakerClass | {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D} |
| String value | StemmerClass | {83BC7EF7-D27B-4950-A743-0F8E5CA928F8} |

Values for Polish

For the Polish language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Polish in Stage 2 above. For the default instance of SQL Server, this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\Language\plk

Complete steps 2 through 7 to add each set of values listed below, replacing the language-specific value type (step 2), value name (steps 3 and 5), and value data (step 6) for each value.

| Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
| --- | --- | --- |
| String value | TsaurusFile | tsplk.xml |
| DWORD value | Locale | 00000415 |
| String value | WBreakerClass | {CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6} |
| String value | StemmerClass | {B8713269-2D9D-4BF5-BF40-2615D75723D8} |

Values for Turkish

For the Turkish language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Turkish in Stage 2 above. For the default instance of SQL Server, this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\Language\trk

Complete steps 2 through 7 to add each set of values listed below, replacing the language-specific value type (step 2), value name (steps 3 and 5), and value data (step 6) for each value.

| Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
| --- | --- | --- |
| String value | TsaurusFile | tstrk.xml |
| DWORD value | Locale | 0000041f |
| String value | WBreakerClass | {8DF412D1-62C7-4667-BBEC-38756576C21B} |
| String value | StemmerClass | {23A9C1C3-3C7A-4D2C-B894-4F286459DAD6} |

After you load third-party word breakers, you need to refresh the list of LCIDs that are supported for full-text indexing and querying. To refresh this list, use the sp_fulltext_service system stored procedure to update the list of languages, as follows:

exec sp_fulltext_service 'update_languages';

The languages of the newly loaded word breakers are now listed in the sys.fulltext_languages catalog view.
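To confirm the result, a query along the following lines can be run after the refresh. It is a sketch that assumes the Locale values from Stage 3 (00000406, 00000415, and 0000041f) are hexadecimal, which corresponds to LCIDs 1030, 1045, and 1055 in decimal.

```sql
-- Each language that loaded successfully should appear as one row.
SELECT lcid, name
FROM sys.fulltext_languages
WHERE lcid IN (1030,   -- Danish  (0x0406)
               1045,   -- Polish  (0x0415)
               1055)   -- Turkish (0x041F)
ORDER BY lcid;
```

If a language is missing from the result, recheck the registry entries made for it in Stages 1 through 3.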