1.1 Glossary

This document uses the following terms:

anchor scope index key: An index key that contains an encoded document identifier. It is used in conjunction with a scope index record that stores links from the item that is referenced by the document identifier.

anchor text: The text that is included with a hyperlink to describe the target content of that hyperlink.

authority page: A webpage that a site collection administrator has designated as more relevant than other webpages. This is typically the home page for the intranet of an organization. The higher the authority level assigned to a page, the higher the page appears in search results. Also referred to as authoritative page.

basic scope index: A scope index file that contains records with basic scope index keys or anchor scope index keys.

basic scope index key: An index key that references a scope index record and contains information about a property and its value.

beginning-of-file (BOF) key: An index key that is stored near the beginning of a content index file. It references a content index record that stores the maximum occurrence for a specified property.

BitStream: A sequence of bits that represents the compressed data for a full-text index catalog.

BitStream field: A section of bits that is part of a BitStream and is 32 or fewer bits.
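
To make the 32-bit limit concrete, the following Python sketch reads a single field from a byte buffer at an arbitrary bit offset. The function name read_field, the most-significant-bit-first ordering, and the rejection of zero-width fields are assumptions made for illustration only; this glossary does not define the bit ordering that is used in a BitStream.

```python
def read_field(data: bytes, bit_offset: int, width: int) -> int:
    """Return `width` bits starting at `bit_offset`.

    Bits are numbered MSB-first across the buffer (an assumed ordering).
    """
    if not 1 <= width <= 32:
        raise ValueError("a BitStream field is 32 or fewer bits")
    total_bits = len(data) * 8
    if bit_offset < 0 or bit_offset + width > total_bits:
        raise ValueError("field lies outside the buffer")
    # Treat the whole buffer as one big-endian integer, then shift the
    # requested field down to the low-order bits and mask it off.
    as_int = int.from_bytes(data, "big")
    shift = total_bits - bit_offset - width
    return (as_int >> shift) & ((1 << width) - 1)
```

For example, with the two bytes 0xB0 0xF0, reading 4 bits at offset 0 yields 11 (binary 1011), and reading 8 bits at offset 4 yields 15 (binary 00001111).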

BitStream field structure: A structure that contains one or more BitStream fields.

BitStream file: A content index file, a scope index file, or a content index extension (.cix) file that is used to store compressed data for a full-text index catalog. It stores the data as a series of BitStreams that are organized into BitStream pages.

BitStream page: A 4,096-byte segment of data in a BitStream file. It stores 32,704 bits, using an array of 4-byte blocks.
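
The sizes in this definition can be checked with simple arithmetic. The following Python sketch uses only the three numbers stated above; the constant and variable names are illustrative. The purpose of the bytes that are not part of the 32,704 data bits is not described in this glossary, so the sketch only reports how many there are.

```python
PAGE_SIZE_BYTES = 4096   # size of a BitStream page, from the definition
PAYLOAD_BITS = 32704     # bits stored per page, from the definition
BLOCK_BYTES = 4          # size of one block in the array, from the definition

BLOCK_BITS = BLOCK_BYTES * 8                          # 32 bits per block
blocks_per_page = PAYLOAD_BITS // BLOCK_BITS          # number of 4-byte blocks
payload_bytes = blocks_per_page * BLOCK_BYTES         # bytes that hold data bits
remaining_bytes = PAGE_SIZE_BYTES - payload_bytes     # bytes not holding data bits

print(blocks_per_page, payload_bytes, remaining_bytes)  # prints: 1022 4088 8
```

That is, 32,704 bits fill exactly 1,022 blocks (4,088 bytes), which leaves 8 bytes of the 4,096-byte page unaccounted for by the data bits.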

BitStreamPosition: A data structure that is used to specify the location of a BitStream field or field structure in a BitStream file.

CheckSummedRecord: A record that stores data fields and the corresponding checksum for each of those fields.

CIndexRecord: A record in an index table file.

compound scope index: A file that is in a search scope index and contains records that store compound scope index keys or anchor scope index keys.

compound scope index key: A key that is used to locate a scope index record. It is based on a compound scope identifier.

content index extension (.cix) file: A file that is part of a full-text index catalog. It is used to store compressed document identifiers and OccCount values for data that is stored in an associated content index file.

content index file: A file that is part of a full-text index catalog. It is used to store data from items as an inverted index and it enables searches for specific terms across items.

content index key: A key that references a record in a content index file. It consists of a property identifier and a normalized token.

content index record: A part of a content index file that is used to store all of the document identifiers for items that have a unique combination of a token and a property identifier.

DocID skip: A forward link that allows the reader of a content index record or a scope index record to skip a group of document identifiers.

DocIDDelta: A number that represents the incremental difference in value between a document identifier and the document identifier that immediately precedes it in a list that is sorted in ascending order.
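
The following Python sketch illustrates the idea of a DocIDDelta by converting an ascending list of document identifiers to deltas and back. The function names and the base parameter are hypothetical. In particular, this glossary does not state what the first document identifier in a list is measured against, so the default baseline of 0 is an assumption.

```python
from itertools import accumulate


def to_deltas(doc_ids, base=0):
    """Convert ascending document identifiers to incremental differences."""
    deltas = []
    previous = base  # assumed baseline for the first identifier
    for doc_id in doc_ids:
        if doc_id < previous:
            raise ValueError("document identifiers must be in ascending order")
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def from_deltas(deltas, base=0):
    """Rebuild the document identifiers by summing the deltas."""
    return [base + running_total for running_total in accumulate(deltas)]


print(to_deltas([3, 7, 8, 15]))    # prints: [3, 4, 1, 7]
print(from_deltas([3, 4, 1, 7]))   # prints: [3, 7, 8, 15]
```

Because consecutive identifiers in a sorted list tend to be close together, the deltas are usually smaller numbers than the identifiers themselves and can be stored in fewer bits.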

document identifier: An integer that uniquely identifies a crawled item.

end-of-file (EOF) key: An index key that is stored near the end of a content index file. It references a content index record that stores the maximum occurrence for a specified property.

full-text index component: A set of files that contain all of the index keys that are extracted from a set of items.

index directory file: A file that is part of a full-text index catalog. It is used to store index keys from an associated content index file, which facilitates finding a specific content index record in the content index file.

index directory level: An array of index directory pages that contains index keys from an associated index and the positions of those keys in the index.

index directory page: A page that conforms to the index directory page structure and stores index directory records.

index identifier: An integer that uniquely identifies a full-text index component within a full-text index catalog.

index key: A key that references a record in a content index file or a scope index file. It consists of an index key string and a property identifier.

index key string: A sequence of bytes that specifies the value that is used to sort records in a content index file or a scope index file.

index server: A server that is assigned the task of crawling.

index table file: A file that serves as a directory of, and is used to store an inventory of, the files in a full-text index catalog.

inverted index: A data structure that stores, for each token that is encountered in a corpus of indexed items, a list of postings that identify which documents contain the token and a list of occurrences that identify the positions of the token in each of those documents.
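
As an illustration of this data structure, the following Python sketch builds a small in-memory inverted index. It is not the on-disk format of a content index file. The function name, the whitespace tokenization, and the lowercase normalization are all assumptions chosen to keep the example short.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each token to {document identifier: [positions of the token]}.

    `documents` maps an integer document identifier to the item's text.
    """
    index = defaultdict(dict)
    for doc_id in sorted(documents):
        # Assumed tokenizer: lowercase the text and split on whitespace.
        tokens = documents[doc_id].lower().split()
        for position, token in enumerate(tokens):
            index[token].setdefault(doc_id, []).append(position)
    return dict(index)


index = build_inverted_index({1: "the cat sat", 2: "the cat saw the dog"})
print(index["the"])   # prints: {1: [0], 2: [0, 3]}
print(index["dog"])   # prints: {2: [4]}
```

The keys of each inner dictionary play the role of the postings, and each position list plays the role of the occurrences. The length of a position list corresponds to the number of instances of the token in that item.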

item: A unit of content that can be indexed and searched by a search application.

log2: A function that returns an integer specifying the minimum number of bits that are required to represent the integer part of an input parameter.
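
The following Python sketch shows one way to compute a value that matches this definition, using the built-in int.bit_length method. The function name min_bits is hypothetical, and the handling of 0 (which returns 0 here) and of negative input (which is rejected here) are assumptions, because this definition does not address those cases.

```python
def min_bits(value):
    """Return the number of bits needed for the integer part of `value`."""
    integer_part = int(value)  # truncates toward zero, dropping any fraction
    if integer_part < 0:
        raise ValueError("negative input is not covered by this sketch")
    return integer_part.bit_length()


print(min_bits(1))     # prints: 1
print(min_bits(255))   # prints: 8
print(min_bits(256))   # prints: 9
print(min_bits(5.9))   # prints: 3
```

Note that, under this reading of the definition, the result for an exact power of two is one greater than the mathematical base-2 logarithm: 256 requires 9 bits, whereas the base-2 logarithm of 256 is 8.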

master index component: A full-text index component that contains index keys that are extracted from a set of items. In a full-text index catalog, there is only one master index component. It is referenced by an itMaster CIndexRecord.

max key: An index key that references the last record in a content index file or a scope index file.

MaxOccBucket: An integer that is used to store the approximate number of tokens for a specific item and property.

metadata schema: A schema that is used to manage information about an item.

OccCount: An integer that is used to store the number of instances of a token for a specific item and property.

prefix length: An integer that represents the number of identical bytes at the beginning of the current and previous index key strings. See also suffix length.

property identifier: A unique integer or a 16-bit numeric identifier that is used to identify a specific attribute or property.

query server: A server that has been assigned the task of fulfilling search queries.

rank: An integer that represents the relevance of a specific item for a search query. It can be a combination of static rank and dynamic rank. See also static rank and dynamic rank.

ranking: A process in which an integer that represents the relevance of a specific item for a search query is assigned to that item. It can be a combination of static rank and dynamic rank.

scope index key: A basic scope index key or a compound scope index key that references a scope index record.

search application: A unique group of search settings that is associated, one-to-one, with a shared service provider.

search query: A complete set of conditions that are used to generate search results, including query text, sort order, and ranking parameters.

search scope: A list of attributes that define a collection of items.

search scope compilation identifier: An integer that identifies the version of the list of search scopes that is associated with a scopes compilation event on a search server.

split key: A content index key that references a record in a target content index file. All of the records before the referenced record have been written to the file successfully.

suffix length: An integer that represents the number of bytes of the current index key string minus the number of identical bytes at the beginning of the current and previous index key strings. See also prefix length.
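
The prefix length and suffix length definitions together describe how an index key string relates to the one before it. The following Python sketch computes both values for a pair of byte strings and shows that the current string can be rebuilt from the previous string, the prefix length, and the suffix bytes. The function names are hypothetical, and the sketch does not describe how these values are encoded in a file.

```python
def prefix_suffix_lengths(previous: bytes, current: bytes):
    """Return (prefix length, suffix length) of `current` relative to `previous`."""
    prefix = 0
    limit = min(len(previous), len(current))
    # Count the identical bytes at the beginning of both strings.
    while prefix < limit and previous[prefix] == current[prefix]:
        prefix += 1
    # The suffix is whatever remains of the current string.
    return prefix, len(current) - prefix


def rebuild(previous: bytes, prefix: int, suffix_bytes: bytes) -> bytes:
    """Reconstruct the current string from the previous one plus the suffix."""
    return previous[:prefix] + suffix_bytes


print(prefix_suffix_lengths(b"search", b"season"))  # prints: (3, 3)
print(rebuild(b"search", 3, b"son"))                # prints: b'season'
```

When keys are sorted, adjacent key strings often share leading bytes, so storing only the prefix length and the suffix bytes avoids repeating the shared part.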

token: A word in an item or a search query that translates into a meaningful word or number in written text. A token is the smallest textual unit that can be matched in a search query. Examples include "cat", "AB14", and "42".

Unicode: A character encoding standard developed by the Unicode Consortium that represents almost all of the written languages of the world. The Unicode standard [UNICODE5.0.0/2007] provides three forms (UTF-8, UTF-16, and UTF-32) and seven schemes (UTF-8, UTF-16, UTF-16 BE, UTF-16 LE, UTF-32, UTF-32 LE, and UTF-32 BE).

Uniform Resource Locator (URL): A string of characters in a standardized format that identifies a document or resource on the World Wide Web. The format is as specified in [RFC1738].

MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as defined in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.