1.3 Overview

A compound file is a structure that is used to store a hierarchy of storage objects and stream objects in a single file or memory buffer.

A storage object is analogous to a file system directory. Just as a directory can contain other directories and files, a storage object can contain other storage objects and stream objects. Also like a directory, a storage object tracks the locations and sizes of the child storage objects and stream objects that are nested beneath it.

A stream object is analogous to the traditional notion of a file. Like a file, a stream contains user-defined data that is stored as a consecutive sequence of bytes.

The hierarchy is defined by a parent object/child object relationship. Stream objects cannot contain child objects. Storage objects can contain stream objects and/or other storage objects, each of which has a name that uniquely identifies it among the child objects of its parent storage object.

The root storage object has no parent object. The root storage object also has no name. Because names are used to identify child objects, a name for the root storage object is unnecessary and the file format does not provide a representation for it.
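The parent/child rules above can be illustrated with a small in-memory model. This is only a sketch of the logical hierarchy, not of the on-disk format: the `Storage`, `Stream`, and `find` names are hypothetical, and name matching is simplified to exact string equality.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Stream:
    """Leaf object: holds user-defined bytes and cannot have children."""
    data: bytes = b""


@dataclass
class Storage:
    """Container object: maps child names to child storages or streams."""
    children: Dict[str, Union["Storage", Stream]] = field(default_factory=dict)

    def add(self, name: str, child: Union["Storage", Stream]) -> None:
        # A name must uniquely identify a child within its parent storage.
        if name in self.children:
            raise ValueError(f"duplicate child name: {name!r}")
        self.children[name] = child


def find(root: Storage, path: List[str]) -> Union[Storage, Stream]:
    """Resolve a list of child names, starting from the nameless root."""
    node = root
    for name in path:
        if not isinstance(node, Storage):
            raise KeyError(f"{name!r}: a stream cannot contain child objects")
        node = node.children[name]
    return node
```

Because the root storage object has no name, a path in this sketch begins with the name of one of the root's children; an empty path resolves to the root itself.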

Figure 3: Example of a structured storage compound file

A compound file consists of the root storage object with optional child storage objects and stream objects in a nested hierarchy. Stream objects can contain user-defined data that is stored as an array of bytes. Storage objects can contain an object class GUID that is called a class identifier (CLSID), which can identify an application that can read/write stream objects under that storage object.

The benefits of compound files include the following:

  • Because the compound file implementation provides a file system-like abstraction within a file, independent of the details of the underlying file system, compound files can be accessed by different applications on different platforms and operating systems. The compound file can be a generic container file format that holds data for multiple applications.

  • Because the separate objects in a compound file are saved in a standard format, any browser utility that is reading the standard format can list the storage objects and stream objects in the compound file, even though data within a particular object can be in a proprietary format.

  • Standardized data structures exist for writing certain types of stream objects, such as summary information property sets (for more information about property sets, see [MS-OLEPS]). Applications can read these stream objects by using parsers for these data structures, even when the rest of the stream objects cannot be understood.

The compound file implementation constructs a level of indirection by supporting a file system within a file. A single flat file requires a large contiguous sequence of bytes on the disk. By contrast, compound files define how to treat a single file as a structured collection of storage objects and stream objects that act as file system directories and files, respectively.

Figure 4: Example of a compound file showing equal-length sector divisions

A compound file is divided into equal-length sectors. The first sector contains the compound file header. Subsequent sectors are identified by a 32-bit nonnegative integer number, called the sector number.

A group of sectors can form a sector chain, which is a linked list of sectors forming a logical byte array, even though the sectors can be in non-consecutive locations in the compound file. For example, the following figure shows two sector chains. One sector chain starts at sector #0, continues to sector #2, and ends at sector #4. The other sector chain starts at sector #1 and ends at sector #3.

Figure 5: Example of a compound file sector chain
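Following a sector chain amounts to walking a linked list. The sketch below reproduces the two chains from the figure, under these assumptions: the `next_sector` dictionary is a hypothetical stand-in for whatever table records each sector's successor, and `0xFFFFFFFE` is used as the chain-termination marker (the ENDOFCHAIN value defined elsewhere in this specification, not in this overview).

```python
END_OF_CHAIN = 0xFFFFFFFE  # assumed special sector number marking chain termination


def walk_chain(next_sector: dict, start: int) -> list:
    """Return the sector numbers of the chain that begins at `start`."""
    chain = []
    seen = set()
    current = start
    while current != END_OF_CHAIN:
        # Guard against a corrupt table that loops back on itself.
        if current in seen:
            raise ValueError(f"cycle detected at sector {current}")
        seen.add(current)
        chain.append(current)
        current = next_sector[current]
    return chain


# The two chains from the figure: 0 -> 2 -> 4 and 1 -> 3.
next_sector = {0: 2, 2: 4, 4: END_OF_CHAIN, 1: 3, 3: END_OF_CHAIN}
print(walk_chain(next_sector, 0))  # [0, 2, 4]
print(walk_chain(next_sector, 1))  # [1, 3]
```

Concatenating the contents of the sectors in the returned order yields the logical byte array, even though the sectors are not adjacent in the file.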

A sector can be unallocated or free, in which case it is not part of a sector chain. A sector number is used for the following purposes:

  1. A sector number is used to identify the file offset of that sector in a compound file.

  2. In a sector chain, a sector number is used to identify the next sector in the chain.

  3. Special sector numbers are used to represent chain termination and free sectors.
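The three uses listed above can be sketched as follows. The sketch makes several assumptions that this overview does not state: a 512-byte sector size, the constant values `0xFFFFFFFA`, `0xFFFFFFFE`, and `0xFFFFFFFF` (taken from the definitions elsewhere in this specification), and the hypothetical function names `sector_offset` and `describe`. Because the header occupies the first sector of the file, sector #0 is assumed to begin one sector size into the file.

```python
MAXREGSECT = 0xFFFFFFFA    # assumed largest value usable as a regular sector number
END_OF_CHAIN = 0xFFFFFFFE  # assumed marker that terminates a sector chain
FREE_SECTOR = 0xFFFFFFFF   # assumed marker for an unallocated sector


def sector_offset(sector_number: int, sector_size: int = 512) -> int:
    """File offset of a regular sector; the header occupies the first sector."""
    if not 0 <= sector_number <= MAXREGSECT:
        raise ValueError(f"not a regular sector number: {sector_number:#x}")
    return (sector_number + 1) * sector_size


def describe(value: int) -> str:
    """Interpret a 32-bit value found in a chain's 'next sector' slot."""
    if value == END_OF_CHAIN:
        return "end of chain"
    if value == FREE_SECTOR:
        return "free sector"
    if 0 <= value <= MAXREGSECT:
        return f"next sector #{value} at offset {sector_offset(value)}"
    return "other special or out-of-range value"


print(sector_offset(0))        # 512
print(sector_offset(4))        # 2560
print(describe(END_OF_CHAIN))  # end of chain
```

A reader that supports a different sector size would pass it explicitly, for example `sector_offset(2, 4096)`.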