2.7.2 Bitmap Document Set

In the bitmap document set scheme, the WID file contains a header and stores the freshness information about the items as a plain bitmap.

The following is a high-level representation of the format of the file.


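 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------------------------------------------------------+
|                        Type of scheme                         |
+---------------------------------------------------------------+
|                             Bdate                             |
+---------------------------------------------------------------+
|                             Flag                              |
+---------------------------------------------------------------+
|                        Outdated DocIDs                        |
+---------------------------------------------------------------+
|                       Number of DocIDs                        |
+---------------------------------------------------------------+
|                           Reserved1                           |
+---------------------------------------------------------------+
|                           Reserved2                           |
+---------------------------------------------------------------+
|                        Size of bitmap                         |
+---------------------------------------------------------------+
|                      Minimum DocID Value                      |
+---------------------------------------------------------------+
|                      Maximum DocID Value                      |
+---------------------------------------------------------------+
|                    Number of DocIDs Delta                     |
+---------------------------------------------------------------+
|                    Reserved3 (4052 bytes)                     |
+---------------------------------------------------------------+
|                              ...                              |
+---------------------------------------------------------------+
|                              ...                              |
+---------------------------------------------------------------+
|                       Bitmap (variable)                       |
+---------------------------------------------------------------+
|                              ...                              |
+---------------------------------------------------------------+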

Type of scheme (4 bytes): A 32-bit unsigned integer. Value MUST be 0x00000003.

Bdate (4 bytes): A 32-bit unsigned integer that is assigned when the file is created and indicates the order of file creation. The larger the number, the more recent the file.

Flag (4 bytes): A 32-bit unsigned integer. The most significant bit of this integer MUST be set to zero if all instances of the items in the file are outdated in all older files (that is, all files with a lower Bdate field). Otherwise, the most significant bit of the integer MUST be set to 1. Other bits MUST be ignored.

Outdated DocIDs (4 bytes): A 32-bit unsigned integer that represents the count of outdated document identifiers in the file. The count is an estimate used to choose an efficient representation format for document identifiers during later merges. This value SHOULD be within 10% of the correct value; if it is not, performance could be affected.

Number of DocIDs (4 bytes): A 32-bit unsigned integer which is the total number of document identifiers stored in the file.

Reserved1 (4 bytes): The value of these 4 bytes is arbitrary, and MUST be ignored.

Reserved2 (4 bytes): MUST be 0.

Size of bitmap (4 bytes): A 32-bit unsigned integer that is the size of the bitmap in bytes divided by 4; that is, the number of 32-bit masks in the bitmap. The field is used to calculate the range of document identifiers that can be stored in the bitmap, which runs from the Minimum DocID Value field to the Minimum DocID Value field + Size of bitmap field * 4 * 8.

Minimum DocID Value, Maximum DocID Value (4 bytes each): Two 32-bit unsigned integers. These values are recorded when the file is created and are never updated afterward. They are used to check the density of the list of document identifiers.

Number of DocIDs Delta (4 bytes): A 32-bit unsigned integer which is the number of outdated DocIDs at the moment of file creation.

Reserved3 (4052 bytes): MUST be ignored.
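
As an illustration of the header fields described above, the following Python sketch unpacks the eleven 32-bit fields that precede Reserved3. It is a minimal sketch, not a reference implementation. The byte order is not stated in this section, so little-endian is an assumption, and the names BitmapHeader, parse_header, flag_msb, and docid_capacity are hypothetical. Rejecting a nonzero Reserved2 is a design choice of the sketch.

```python
import struct
from typing import NamedTuple

# Eleven 4-byte fields (44 bytes) followed by Reserved3 (4052 bytes).
HEADER_SIZE = 44 + 4052
BITMAP_SCHEME = 0x00000003  # required value of "Type of scheme"


class BitmapHeader(NamedTuple):  # hypothetical name
    scheme: int
    bdate: int
    flag: int
    outdated_docids: int
    num_docids: int
    reserved1: int          # arbitrary, ignored
    reserved2: int          # must be 0
    bitmap_size_words: int  # bitmap size in bytes divided by 4
    min_docid: int
    max_docid: int
    num_docids_delta: int


def parse_header(data: bytes) -> BitmapHeader:
    """Unpack the fixed-size header; Reserved3 is skipped."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header is truncated")
    # "<11I" = eleven unsigned 32-bit integers.
    # ASSUMPTION: little-endian byte order (not stated in this section).
    header = BitmapHeader(*struct.unpack_from("<11I", data, 0))
    if header.scheme != BITMAP_SCHEME:
        raise ValueError("not a bitmap document set file")
    if header.reserved2 != 0:
        raise ValueError("Reserved2 must be 0")
    return header


def flag_msb(header: BitmapHeader) -> int:
    """Most significant bit of Flag; all other bits are ignored."""
    return header.flag >> 31


def docid_capacity(header: BitmapHeader) -> int:
    """Number of document identifiers the bitmap can cover."""
    return header.bitmap_size_words * 4 * 8
```

Under these assumptions, the bitmap itself would begin at byte offset HEADER_SIZE (4096) and occupy bitmap_size_words * 4 bytes.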

Bitmap (Size of bitmap times 4 bytes): The following table shows the format of the bitmap, which stores the freshness information about the items.


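 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------------------------------------------------------+
|           Array of Masks (Size of bitmap * 4 bytes)           |
+---------------------------------------------------------------+
|                              ...                              |
+---------------------------------------------------------------+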

To store freshness information in the bitmap, normalized document identifiers are used; that is, the Minimum DocID Value field, rounded down to the nearest multiple of 32, is subtracted from each document identifier. Each normalized document identifier is split into two parts. The value of the 27 most significant bits is the mask number, that is, the index of a 32-bit mask in the array of masks. The value of the 5 least significant bits is the number of bit positions by which the value 1 is shifted left within that 32-bit mask; the resulting bit stores the freshness information for the document identifier.

If an item is not in the full-text index catalog or is outdated, the corresponding bit in the mask MUST be 0. If the item is fresh, the corresponding bit in the mask MUST be 1.
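
The mapping from a document identifier to a bit in the array of masks can be sketched as follows. This is a minimal illustration. It assumes that the bit for an identifier is the value 1 shifted left by the 5 least significant bits of the normalized identifier, and that the masks have already been decoded into a list of 32-bit integers. The function names locate, is_fresh, and mark_fresh are hypothetical. Treating an identifier beyond the end of the bitmap as not fresh is also an assumption of the sketch.

```python
from typing import List, Tuple


def locate(docid: int, min_docid: int) -> Tuple[int, int]:
    """Return (mask number, single-bit value) for a document identifier."""
    base = min_docid & ~31           # Minimum DocID rounded down to a multiple of 32
    normalized = docid - base
    if normalized < 0:
        raise ValueError("document identifier is below the bitmap range")
    mask_number = normalized >> 5    # 27 most significant bits
    # ASSUMPTION: the bit is 1 shifted left by the 5 least significant bits.
    bit = 1 << (normalized & 31)
    return mask_number, bit


def is_fresh(masks: List[int], docid: int, min_docid: int) -> bool:
    """True if the bit for docid is 1; False if it is 0 or out of range."""
    mask_number, bit = locate(docid, min_docid)
    if mask_number >= len(masks):
        return False                 # ASSUMPTION: outside the bitmap means not fresh
    return bool(masks[mask_number] & bit)


def mark_fresh(masks: List[int], docid: int, min_docid: int) -> None:
    """Set the bit for docid to 1 (the item is fresh)."""
    mask_number, bit = locate(docid, min_docid)
    masks[mask_number] |= bit
```

For example, with a Minimum DocID Value of 100 the base is 96, so document identifier 100 normalizes to 4 and maps to mask 0, bit value 1 << 4; document identifier 131 normalizes to 35 and maps to mask 1, bit value 1 << 3.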