RTF Syntax
This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.
An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can be written using only 7-bit ASCII characters. (Converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.
A control word is a specially formatted command that RTF uses to mark printer control codes and information that applications use to manage documents. A control word cannot be longer than 32 characters. A control word takes the following form:
\LetterSequence<Delimiter>
Note that a backslash begins each control word.
The LetterSequence is made up of lowercase alphabetic characters (a-z). RTF is case sensitive.
The following Word 97-2000 keywords do not follow the rule that keywords must not contain uppercase alphabetic characters. All writers should still follow this rule, and Word will also emit completely lowercase versions of all these keywords in the next version. In the meantime, those implementing readers are advised to treat them as exceptions:
\clFitText
\clftsWidthN
\clNoWrap
\clwWidthN
\tdfrmtxtBottomN
\tdfrmtxtLeftN
\tdfrmtxtRightN
\tdfrmtxtTopN
\trftsWidthAN
\trftsWidthBN
\trftsWidthN
\trwWidthAN
\trwWidthBN
\trwWidthN
\sectspecifygenN
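A reader might fold these exceptions into its keyword check. The Python sketch below is a minimal illustration under stated assumptions: the trailing N in each listed name is taken to stand for the numeric parameter, so the stored names omit it, and the function name is_valid_keyword is hypothetical. It checks only the letter sequence, not the backslash or any parameter.

```python
import re

# Word 97-2000 keywords that contain uppercase letters, from the list above.
# The trailing "N" (the numeric parameter) is omitted; \sectspecifygenN is
# left out because its letter sequence is already all lowercase.
UPPERCASE_EXCEPTIONS = frozenset({
    "clFitText", "clftsWidth", "clNoWrap", "clwWidth",
    "tdfrmtxtBottom", "tdfrmtxtLeft", "tdfrmtxtRight", "tdfrmtxtTop",
    "trftsWidthA", "trftsWidthB", "trftsWidth",
    "trwWidthA", "trwWidthB", "trwWidth",
})

_LOWERCASE = re.compile(r"[a-z]+")


def is_valid_keyword(name: str) -> bool:
    """Return True if `name` (the letter sequence only, without the
    backslash or parameter) is acceptable to a reader."""
    # A letter sequence longer than 32 characters cannot belong to a legal
    # control word, since the whole control word is limited to 32.
    if len(name) > 32:
        return False
    return _LOWERCASE.fullmatch(name) is not None or name in UPPERCASE_EXCEPTIONS


print(is_valid_keyword("clNoWrap"))  # True
print(is_valid_keyword("Bold"))      # False
```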
The delimiter marks the end of an RTF control word, and can be one of the following:
- A space. In this case, the space is part of the control word.
- A digit or a hyphen (-), which indicates that a numeric parameter follows. The parameter can be a positive or a negative number; its sequence of digits ends at the first character that is not a digit. The range of values for the number is generally -32767 through 32767, although Word tends to restrict the range to -31680 through 31680. For a small number of keywords (specifically \bin, \revdttm, and some picture properties), Word allows values in the range -2,147,483,648 through 2,147,483,647. Even so, an RTF parser must handle an arbitrary string of digits as a legal value for a keyword. A numeric parameter that immediately follows the control word becomes part of the control word, which is then delimited in the same manner as any other control word: a space delimiter is part of the control word, and any other delimiting character is not.
- Any character other than a letter or a digit. In this case, the delimiting character terminates the control word but is not actually part of the control word.
If a space delimits the control word, the space does not appear in the document. Any characters following the delimiter, including spaces, will appear in the document. For this reason, you should use spaces only where necessary; do not use spaces merely to break up RTF code.
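The delimiter rules above can be sketched as a small Python reader. This is a minimal illustration, not a reference implementation: the name read_control_word is hypothetical, uppercase letters are accepted only so that the exception keywords listed earlier can be read, and Python's unbounded int is relied on so that an arbitrary string of digits is still a legal parameter.

```python
from string import ascii_letters, digits
from typing import Optional, Tuple


def read_control_word(rtf: str, pos: int) -> Tuple[str, Optional[int], int]:
    """Read the control word that starts at rtf[pos] (a backslash).

    Returns (name, parameter or None, index just past the control word).
    """
    if not rtf.startswith("\\", pos):
        raise ValueError("a control word starts with a backslash")
    i = start = pos + 1
    # LetterSequence. Uppercase is accepted only so that the Word 97-2000
    # exception keywords (for example \clNoWrap) can be read.
    while i < len(rtf) and rtf[i] in ascii_letters:
        i += 1
    name = rtf[start:i]
    if not name:
        raise ValueError("no letters after the backslash (control symbol?)")

    param = None
    # A hyphen or digit means a numeric parameter follows. The digits end
    # at the first non-digit; Python's int accepts any number of digits.
    if i < len(rtf) and (rtf[i] == "-" or rtf[i] in digits):
        num_start = i
        if rtf[i] == "-":
            i += 1
        while i < len(rtf) and rtf[i] in digits:
            i += 1
        if rtf[num_start] == "-" and i == num_start + 1:
            raise ValueError("hyphen not followed by digits")
        param = int(rtf[num_start:i])

    # A space delimiter is part of the control word, so it is consumed.
    # Any other delimiter is left in place for the caller to process.
    if i < len(rtf) and rtf[i] == " ":
        i += 1
    return name, param, i


print(read_control_word(r"\b0 bold", 0))  # ('b', 0, 4)
print(read_control_word(r"\par}", 0))     # ('par', None, 4)
```

In the first call the returned index points at the "b" of "bold", because the space delimiter was consumed; in the second, the closing brace is left for the caller because it is not part of the control word.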
A control symbol consists of a backslash followed by a single, nonalphabetic character. For example, \~ represents a nonbreaking space. Control symbols take no delimiters.
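Because a control symbol is always exactly two characters long, reading one needs no delimiter logic. In the Python sketch below, the mapping of \~ to a nonbreaking space (U+00A0) and of \\, \{, and \} to their literal characters follows this section; dropping any other control symbol is an assumption made to keep the example short, and read_control_symbol is a hypothetical name.

```python
from string import ascii_letters
from typing import Tuple

# Translations for control symbols. \~ is from the text above; \\, \{ and \}
# are the escapes for a literal backslash and braces. Other control symbols
# are dropped here, which is an assumption made to keep the sketch short.
CONTROL_SYMBOLS = {"~": "\u00a0", "\\": "\\", "{": "{", "}": "}"}


def read_control_symbol(rtf: str, pos: int) -> Tuple[str, int]:
    """Read the control symbol at rtf[pos] (a backslash).

    Returns (text it stands for, index just past the symbol).
    """
    if pos + 1 >= len(rtf) or rtf[pos] != "\\":
        raise ValueError("expected a backslash followed by one character")
    ch = rtf[pos + 1]
    if ch in ascii_letters:
        raise ValueError("a letter after the backslash starts a control word")
    # No delimiter: the very next character, even a space, is document text.
    return CONTROL_SYMBOLS.get(ch, ""), pos + 2
```

For the input "\~ x", the function returns a nonbreaking space and index 2, so the following space is still read as document text, unlike the space after a control word.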
A group consists of text and control words or control symbols enclosed in braces ({ }). The opening brace ({) indicates the start of the group and the closing brace (}) indicates the end of the group. Each group specifies the text affected by the group and the different attributes of that text. The RTF file can also include groups for fonts, styles, screen color, pictures, footnotes, comments (annotations), headers and footers, summary information, fields, and bookmarks, as well as document-, section-, paragraph-, and character-formatting properties. If the font, file, style, screen-color, revision-mark, and summary-information groups and document-formatting properties are included, they must precede the first plain-text character in the document. These groups form the RTF file header. If the group for fonts is included, it should precede the group for styles. If any group is not used, it can be omitted. The groups are discussed in the following sections.
Certain control words, such as those for bold, italic, and keep together, control properties that have only two states. When such a control word has no parameter or has a nonzero parameter, it turns the property on. When it has a parameter of 0, it turns the property off. For example, \b turns on bold, whereas \b0 turns off bold.
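The two-state rule can be expressed as a small parser for a single control word. This Python sketch uses a regular expression that accepts only lowercase keywords with an optional signed parameter; parse_toggle is a hypothetical name.

```python
import re
from typing import Tuple

# Backslash, lowercase letters, optional signed numeric parameter.
_CONTROL_WORD = re.compile(r"\\([a-z]+)(-?[0-9]+)?")


def parse_toggle(word: str) -> Tuple[str, bool]:
    r"""Return (property name, is_on) for a two-state control word such as
    \b, \b1, or \b0."""
    m = _CONTROL_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not a control word: {word!r}")
    name, param = m.group(1), m.group(2)
    # No parameter, or any nonzero parameter, turns the property on;
    # a parameter of 0 turns it off.
    return name, param is None or int(param) != 0
```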
Certain control words, referred to as destinations, mark the beginning of a collection of related text that could appear at another position, or destination, within the document. Destinations may also be text that is used but should not appear within the document. An example of a destination is the \footnote group, where the footnote text follows the control word. Page breaks cannot occur in destination text. Destination control words and their following text must be enclosed in braces. No other control words or text may appear within the destination group. Destinations added after the RTF Specification published in the March 1987 Microsoft Systems Journal may be preceded by the control symbol \*. This control symbol identifies destinations whose related text should be ignored if the RTF reader does not recognize the destination. (RTF writers should follow the convention of using this control symbol when adding new destinations or groups.) Destinations whose related text should be inserted into the document, even if the RTF reader does not recognize the destination, should not use \*. All destinations that were not included in the March 1987 revision of the RTF Specification are shown with \* as part of the control word.
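One way for a reader to honor \* is to drop any group that starts with \* followed by a destination it does not recognize, together with everything nested inside that group. The Python sketch below assumes that the destination control word immediately follows \* with no intervening space and that there is no \bin data; the function names and the set of known destinations passed in are hypothetical.

```python
from string import ascii_letters
from typing import AbstractSet


def skip_group(rtf: str, pos: int) -> int:
    """Return the index just past the group that opens at rtf[pos] == '{'."""
    depth = 0
    i = pos
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # a control word's first letter or an escaped character
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated group")


def starts_unknown_destination(rtf: str, pos: int, known: AbstractSet[str]) -> bool:
    r"""True if the group at rtf[pos] begins with \* followed by a
    destination control word that is not in `known`."""
    if not rtf.startswith("{\\*\\", pos):
        return False
    i = start = pos + 4
    while i < len(rtf) and rtf[i] in ascii_letters:
        i += 1
    return rtf[start:i] not in known


def drop_unknown_destinations(rtf: str, known: AbstractSet[str]) -> str:
    r"""Remove every {\* ...} group whose destination is not in `known`."""
    out = []
    i = 0
    while i < len(rtf):
        if rtf[i] == "\\":
            out.append(rtf[i:i + 2])  # keep escapes and control-word starts intact
            i += 2
        elif rtf[i] == "{" and starts_unknown_destination(rtf, i, known):
            i = skip_group(rtf, i)
        else:
            out.append(rtf[i])
            i += 1
    return "".join(out)
```

Groups that do not begin with \*, and \* groups whose destination is in the known set, are passed through unchanged for normal processing.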
Formatting specified within a group affects only the text within that group. Generally, text within a group inherits the formatting of the text in the preceding group. However, Microsoft implementations of RTF assume that the footnote, annotation, header, and footer groups (described later in this chapter) do not inherit the formatting of the preceding text. Therefore, to ensure that these groups are always formatted correctly, you should set the formatting within these groups to the default with the \sectd, \pard, and \plain control words, and then add any desired formatting.
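Group-scoped formatting maps naturally onto a stack: save the current formatting at each opening brace and restore it at the matching closing brace. The Python sketch below tracks only bold and italic, treats \plain as a reset to the default character formatting, and uses a deliberately minimal tokenizer that does not handle control symbols or escaped characters; CharFormat and format_runs are hypothetical names.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class CharFormat:
    bold: bool = False
    italic: bool = False


# Minimal tokenizer: control words (optional parameter, consumed space),
# braces, and runs of text. Control symbols and escapes are not handled.
_TOKEN = re.compile(r"\\([a-z]+)(-?[0-9]+)? ?|([{}])|([^\\{}]+)")


def format_runs(rtf: str) -> List[Tuple[str, CharFormat]]:
    """Split `rtf` into (text, formatting) runs, scoping formatting to groups."""
    stack: List[CharFormat] = []
    fmt = CharFormat()
    runs: List[Tuple[str, CharFormat]] = []
    for m in _TOKEN.finditer(rtf):
        word, param, brace, text = m.groups()
        if brace == "{":
            stack.append(fmt)  # remember the formatting in effect
        elif brace == "}":
            fmt = stack.pop()  # the group's formatting ends here
        elif word is not None:
            on = param is None or int(param) != 0
            if word == "b":
                fmt = replace(fmt, bold=on)
            elif word == "i":
                fmt = replace(fmt, italic=on)
            elif word == "plain":
                fmt = CharFormat()  # reset character formatting to defaults
        elif text:
            runs.append((text, fmt))
    return runs
```

For example, in {\rtf1 Normal {\b Bold {\i both} bold} normal} the text after the inner closing brace is bold but not italic, and the text after the next closing brace is back to the default.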
The control words, control symbols, and braces constitute control information. All other characters in the file are plain text. Here is an example of plain text that is not inside any group other than the outermost group enclosing the whole document:
{\rtf\ansi\deff0{\fonttbl{\f0\froman Tms Rmn;}{\f1\fdecor
Symbol;}{\f2\fswiss Helv;}}{\colortbl;\red0\green0\blue0;
\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;
\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;
\red255\green255\blue255;}{\stylesheet{\fs20
\snext0Normal;}}{\info{\author John Doe}
{\creatim\yr1990\mo7\dy30\hr10\min48}{\version1}{\edmins0}
{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\widoctrl\ftnbj \sectd\linex0\endnhere \pard\plain \fs20 This is plain text.\par}
The phrase "This is plain text" is not part of any nested group and is treated as document text.
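The distinction between control information and plain text can be illustrated by extracting the document text from the example above. This Python sketch rests on several assumptions: it skips only the destinations that appear in the example (fonttbl, colortbl, stylesheet, info) plus any group starting with \*, maps \par to a newline, treats bare line breaks in the file as non-text, and silently drops every other control word and control symbol; plain_text is a hypothetical name.

```python
import re
from typing import List

# Destinations to skip. Only those in the example above are listed; a real
# reader would need many more (assumption for illustration).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}

_TOKEN = re.compile(
    r"\\([a-z]+)(-?[0-9]+)? ?"  # control word, optional parameter, consumed space
    r"|\\(.)"                   # control symbol, e.g. \{ or \*
    r"|([{}])"                  # group delimiters
    r"|([\r\n]+)"               # bare line breaks (treated as non-text here)
    r"|([^\\{}\r\n]+)",         # plain text
    re.DOTALL,
)


def plain_text(rtf: str) -> str:
    """Return the document text of `rtf`, skipping destination groups."""
    out: List[str] = []
    saved: List[bool] = []  # skipping state to restore at each '}'
    skipping = False
    at_group_start = False
    for m in _TOKEN.finditer(rtf):
        word, _param, symbol, brace, newline, text = m.groups()
        if newline:
            continue
        if brace == "{":
            saved.append(skipping)
            at_group_start = True
            continue
        if brace == "}":
            skipping = saved.pop()
            at_group_start = False
            continue
        # The first token of a group decides whether it is a skipped destination.
        if at_group_start and (word in SKIP_DESTINATIONS or symbol == "*"):
            skipping = True
        at_group_start = False
        if skipping:
            continue
        if word == "par":
            out.append("\n")
        elif symbol is not None and symbol in "\\{}":
            out.append(symbol)  # escaped backslash or brace
        elif text is not None:
            out.append(text)
    return "".join(out)
```

Run on the example, this returns only "This is plain text." followed by a newline: the font names, colors, style name, and author all sit inside skipped destinations, and every space between \ftnbj and "This" is consumed as a control-word delimiter.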
As previously mentioned, the backslash (\) and braces ({ }) have special meaning in RTF. To use these characters as text, precede them with a backslash, as in \\, \{, and \}.