1.3.1 HTML/Plain Text Encapsulation

Encapsulation and de-encapsulation can introduce changes in the content of the original document, as long as such changes do not affect the rendering of the document in its original format. For example, it is allowable to introduce, remove, or change insignificant whitespace in HTML and/or to normalize text line endings to use carriage return/line feed pairs (CRLFs).
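As an illustration of one change permitted above, the following minimal sketch normalizes text line endings to CRLF pairs. The function name `normalize_to_crlf` is hypothetical and is not defined by this algorithm; the sketch assumes the content is already available as a decoded string.

```python
import re


def normalize_to_crlf(text: str) -> str:
    """Rewrite every line ending (CRLF, lone CR, or lone LF) as CRLF."""
    # CRLF is listed first in the alternation so that an existing pair
    # is matched as a unit and is not expanded into two line endings.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)
```

Applying the function twice yields the same result as applying it once, so it can be used on content that is already partly normalized.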

Two software roles can be identified with respect to this encapsulation format:

  1. Encapsulating RTF writer: The RTF writer, as described in [MSFT-RTF], that converts content from HTML or plain text format to RTF and preserves the original form of the content in an RTF document by using the encapsulation format specified by this algorithm.

  2. De-encapsulating RTF reader: The RTF reader, as described in [MSFT-RTF], that converts content from RTF back to HTML or plain text format, by recognizing that an RTF document contains encapsulated HTML or plain text content and extracting such content (instead of performing a general format conversion from RTF to HTML or plain text format).
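The recognition step performed by a de-encapsulating RTF reader could be sketched as follows. This sketch assumes that encapsulated content is flagged by the `\fromhtml1` or `\fromtext` control word, which are defined elsewhere in this specification and not in this section. The function name `encapsulated_format` is hypothetical, and a plain regular expression search is only a rough stand-in for real RTF tokenization.

```python
import re
from typing import Optional

# Assumption: these control words mark encapsulated HTML and plain text.
# The lookbehind skips a control word preceded by a backslash (a crude
# guard against escaped text); the lookahead enforces the end of the word.
_FROMHTML = re.compile(r"(?<!\\)\\fromhtml1(?![0-9])")
_FROMTEXT = re.compile(r"(?<!\\)\\fromtext(?![A-Za-z])")


def encapsulated_format(rtf: str) -> Optional[str]:
    """Return "html" or "text" if the RTF appears to carry encapsulated
    content, or None if it looks like a pure RTF document."""
    if _FROMHTML.search(rtf):
        return "html"
    if _FROMTEXT.search(rtf):
        return "text"
    return None
```

A reader that receives `None` would fall back to a general format conversion from RTF, rather than extracting the original content.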

This algorithm does not specify a general format conversion process between HTML (or plain text) and RTF. Such a conversion process can be a proprietary and often approximate mapping between RTF formatting features, as described in [MSFT-RTF], and HTML formatting features, as described in [HTML]. For example, the HTML code fragment "<B>test</B>" could be converted to "{\b test}". The encapsulation of original content is orthogonal to a format conversion process and can be combined with any such format conversion.
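The "<B>test</B>" example above can be reproduced with a deliberately small converter. This is a toy sketch of a format conversion, not the encapsulation format itself. The names `escape_rtf` and `convert_bold` are hypothetical, only the bold tag is handled, and nested tags, attributes, and non-ASCII characters are ignored.

```python
import re


def escape_rtf(text: str) -> str:
    """Escape the characters that are special in RTF: backslash and braces."""
    return re.sub(r"([\\{}])", r"\\\1", text)


def convert_bold(html: str) -> str:
    """Convert <b>...</b> runs to RTF bold groups; escape everything else."""
    parts = []
    pos = 0
    pattern = re.compile(r"<b>(.*?)</b>", flags=re.IGNORECASE | re.DOTALL)
    for match in pattern.finditer(html):
        # Text before the tag is copied through with RTF escaping.
        parts.append(escape_rtf(html[pos:match.start()]))
        # The tag content becomes a group that starts with the \b control word.
        parts.append("{\\b " + escape_rtf(match.group(1)) + "}")
        pos = match.end()
    parts.append(escape_rtf(html[pos:]))
    return "".join(parts)
```

Because a mapping of this kind is approximate, the original HTML cannot in general be recovered from its output alone, which is the gap that encapsulation of the original content is meant to close.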

An RTF reader can choose to ignore the encapsulation within an RTF document and treat such a document as a pure RTF document. Therefore, the RTF document that contains the encapsulated original content needs to also contain an adequate RTF rendering of the original HTML or plain text document. The implementer determines the richness of the conversion from the original content format to RTF.