2.7 Document Set Files

Document set files contain a list of the indexed items, each represented by a 32-bit document identifier. Each item also has freshness information: an item is marked as either fresh or outdated. An item is marked as fresh if no other content index file contains a more recent version of the contents of the item; otherwise, it is marked as outdated.
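The freshness rule above can be illustrated with a minimal sketch. It is not part of the file format: the DocSetItem type, the build_document_set helper, and the use of a comparable integer as a "version" are all hypothetical, and the document identifier is assumed here to be an unsigned 32-bit value.

```python
from dataclasses import dataclass

MAX_DOC_ID = 0xFFFFFFFF  # assumption: identifiers are unsigned 32-bit values


@dataclass(frozen=True)
class DocSetItem:
    """Hypothetical in-memory view of one document set entry."""
    doc_id: int
    fresh: bool

    def __post_init__(self):
        if not 0 <= self.doc_id <= MAX_DOC_ID:
            raise ValueError(f"doc_id {self.doc_id} does not fit in 32 bits")


def build_document_set(own_versions, other_files):
    """Mark each item fresh unless another index file holds a newer version.

    own_versions: {doc_id: version} for this content index file.
    other_files:  list of {doc_id: version} dicts for the other files.
    The integer 'version' is an illustrative stand-in for recency.
    """
    items = []
    for doc_id in sorted(own_versions):
        version = own_versions[doc_id]
        # A missing entry defaults to our own version, which is never newer.
        newer_elsewhere = any(
            other.get(doc_id, version) > version for other in other_files
        )
        items.append(DocSetItem(doc_id, not newer_elsewhere))
    return items
```

For example, an item whose version is 5 is marked as outdated if any other file holds version 7 of the same item, and stays fresh if the other files hold the same or an older version.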

The system uses three different file schemes to store the list of document identifiers and freshness information:

  • List document set, as specified in section 2.7.1.

  • Bitmap document set, as specified in section 2.7.2.

  • Indexed bitmap document set, as specified in section 2.7.3.

The guidelines in the following table establish which scheme to use.

Document set scheme                  Number of DocIDs   Density
List document set scheme             Low                Low
Bitmap document set scheme           Any                High
Indexed bitmap document set scheme   High               Low

Here, the density of the list of document identifiers is determined by its maximum and minimum document identifiers. If the difference between the Maximum DocID Value and Minimum DocID Value fields is approximately equal to the number of document identifiers, the list has high density; otherwise, the list has low density.
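The guidelines in the table above, together with this definition of density, can be sketched as a small selection routine. This is only an illustration: the text gives no numeric cutoffs, so count_threshold and density_threshold are hypothetical values, as are the function names and the returned labels. Density is computed as the number of distinct identifiers divided by (maximum - minimum + 1), so that a contiguous range yields exactly 1.0.

```python
def density(doc_ids):
    """Fraction of the [min, max] identifier range that is occupied."""
    if not doc_ids:
        return 0.0
    span = max(doc_ids) - min(doc_ids) + 1
    return len(set(doc_ids)) / span


def choose_scheme(doc_ids, count_threshold=1000, density_threshold=0.5):
    """Pick a document set scheme following the guideline table.

    Both thresholds are illustrative assumptions, not specified values.
    """
    if density(doc_ids) >= density_threshold:
        return "bitmap"          # high density, any number of DocIDs
    if len(doc_ids) >= count_threshold:
        return "indexed bitmap"  # many DocIDs, low density
    return "list"                # few DocIDs, low density
```

With these assumed thresholds, a contiguous block such as identifiers 100 through 199 selects the bitmap scheme, two widely separated identifiers select the list scheme, and a few thousand identifiers spread thinly over a range of millions select the indexed bitmap scheme.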

Each document set file scheme contains a file with a .wid extension, called a WID file. In addition, the indexed bitmap document set scheme contains a file with a .wsb extension, called a WSB file.