2.15 Index Lexicon File

2019-02-14

The index lexicon file is a Unicode-encoded text file that lists the most frequent tokens appearing in the content index file of a master full-text index component of the current full-text index catalog. The query server uses this list to determine alternative spelling variants for the tokens it encounters in received queries.

The binary layout of the file is as follows.

Field               Offset (bytes)   Size
Unicode marker      0                2 bytes
ListOfTokens        2                variable
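As a rough illustration of this layout, the sketch below writes a 2-byte marker followed by the token list. It assumes the "Unicode encoding" is UTF-16 little-endian (which the 0xFF 0xFE marker implies) and that tokens are joined with CR LF; the specification only says "new line characters", so the separator is left as a parameter. The name build_lexicon is hypothetical and not part of any Microsoft API.

```python
from typing import List

# Required marker at offset 0: 0xFF followed by 0xFE.
UNICODE_MARKER = b"\xff\xfe"


def build_lexicon(tokens: List[str], newline: str = "\r\n") -> bytes:
    """Serialize tokens into the marker + ListOfTokens layout (hypothetical helper).

    Assumptions: UTF-16 little-endian body; CR LF separator by default.
    """
    body = newline.join(tokens)
    # The "utf-16-le" codec does not emit its own byte order mark,
    # so the marker appears exactly once, at offset 0.
    return UNICODE_MARKER + body.encode("utf-16-le")


if __name__ == "__main__":
    data = build_lexicon(["index", "query"])
    print(data[:2].hex())  # fffe
    print(len(data))       # 2 marker bytes + 12 UTF-16 code units * 2 = 26
```

Keeping the separator configurable avoids committing to CR LF versus LF until the exact newline sequence is confirmed against real files.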

Unicode marker (2 bytes): The marker that identifies a Unicode-encoded text file. The bytes MUST be 0xFF followed by 0xFE, which is the UTF-16 little-endian byte order mark.
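A reader can check the marker before decoding anything else. The minimal sketch below (split_marker is a hypothetical helper name) accepts only the exact sequence 0xFF 0xFE and returns the remaining bytes; the reversed order 0xFE 0xFF, which would indicate big-endian UTF-16, is rejected.

```python
# Required value of the first two bytes: 0xFF followed by 0xFE.
UNICODE_MARKER = b"\xff\xfe"


def split_marker(data: bytes) -> bytes:
    """Return the bytes after the Unicode marker, or raise ValueError.

    Hypothetical helper; inputs shorter than 2 bytes also fail the check.
    """
    if data[:2] != UNICODE_MARKER:
        raise ValueError("missing Unicode marker 0xFF 0xFE")
    return data[2:]
```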

ListOfTokens (variable): An array of Unicode characters that lists the most frequent tokens in the catalog. Tokens are separated by newline characters, and each token consists of 1 to 64 non-space characters.
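The sketch below parses and validates a whole file under stated assumptions: the body is UTF-16 little-endian, either CR LF or LF may separate tokens, a single trailing newline is tolerated, and "non-space" is read as "no whitespace". Token length is counted in Python code points, which differs from UTF-16 code units for characters outside the Basic Multilingual Plane. The name parse_lexicon is hypothetical.

```python
from typing import List

UNICODE_MARKER = b"\xff\xfe"  # 0xFF followed by 0xFE
MAX_TOKEN_LEN = 64            # each token is 1 to 64 non-space characters


def parse_lexicon(data: bytes) -> List[str]:
    """Return the token list from an index lexicon file (hypothetical helper)."""
    if data[:2] != UNICODE_MARKER:
        raise ValueError("missing Unicode marker 0xFF 0xFE")
    # Assumption: UTF-16 little-endian body. An odd byte count raises
    # UnicodeDecodeError, which is a subclass of ValueError.
    text = data[2:].decode("utf-16-le")
    # Assumption: accept CR LF or LF, since the exact newline sequence
    # is not stated.
    tokens = text.replace("\r\n", "\n").split("\n")
    if tokens[-1] == "":
        tokens.pop()  # tolerate one trailing newline (assumption)
    for i, tok in enumerate(tokens):
        if not 1 <= len(tok) <= MAX_TOKEN_LEN:
            raise ValueError(f"token {i} has length {len(tok)}, expected 1-64")
        if any(ch.isspace() for ch in tok):
            raise ValueError(f"token {i} contains whitespace: {tok!r}")
    return tokens
```

Splitting on "\n" explicitly, rather than using str.splitlines, avoids treating characters such as U+2028 as token separators.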
