Ruby Text Objects

Chinese characters are used to represent syllables and words in a number of East Asian languages. In each language, the characters are pronounced differently and their meanings may differ as well. To help clarify the pronunciation and/or semantics, “ruby text” can be used. In vertical text layout, ruby text is positioned vertically to the right of the base text that it annotates, while in horizontal text, the ruby text is normally placed above the base text.

For example, the word Japanese,日本語, can be annotated as

where the “ruby text” にほんご is the hiragana for “nihongo". Nihongo is how you say Japanese in Japanese. The character 日means sun (or day), 本 means origin (it’s a tree with roots) and 語 means language (literally five mouths speaking). Japan is the “origin of the sun” due to its Eastern location, think sunrise, hence 日本. In Chinese, you might find 日本語 annotated as

showing how Japanese may be pronounced in the People’s Republic of China.


Ruby text can be used for annotations other than pronunciation and isn’t limited to East Asian usage. This post discusses several usages and how to insert and represent them using Word and RichEdit. It concludes with a discussion of nesting ruby objects to create compound ruby layouts as an alternative to the intricate HTML5 ruby model.


Word Ruby Text Object

The HOME tab of the Word ribbon may include the “phonetic guide” tool

The user selects the desired base text in the document and clicks on the tool to bring up a dialog box for inserting the corresponding ruby object into a document. The dialog is prepopulated with default ruby text for the selected base text and language.

An earlier post on Word’s first math editing and display facility describes how Word first implemented its ruby-text object by piggy-backing on top of its mathematics EQ facility. The corresponding RTF control words were already defined aside from an attribute describing the distribution of the ruby text with respect to the base text.

Word 2000 switched to more elegant rendering using a dedicated LineServices in-line ruby object handler that works much the way the math object handlers of the Office math facility work. The corresponding docx format represents a ruby object in an object-oriented fashion similar to that for simple ruby in the HTML5 ruby model. For example, aside from font face-name and language specifications, the default Japanese ruby object for日本語 is given by the docx snippet

<w:rubyAlign w:val="distributeSpace"/>
<w:hps w:val="12"/>
<w:hpsRaise w:val="22"/>
<w:hpsBaseText w:val="24"/>
<w:sz w:val="12"/>
<w:szCs w:val="24"/>
<w:sz w:val="24"/>
<w:szCs w:val="24"/>

Here the ruby object is represented by the <ruby> element (as in HTML5) with the ruby text given by the <rt> element (as in HTML5) and the base text given by the <rubyBase> element, similar to the <rb> element in HTML5. But while HTML5 has a fairly complicated ruby syntax allowing intricate relationships between compound combinations of multiple ruby expressions, Word is limited to a well-formed XML for a simple ruby object consisting of a single annotation for a single base text. Special attributes specify the sizes of the ruby text (<hps>) and base text (<hpsBaseText>) in half points (1/144”), the position of the ruby text (<hpsRaise>), and the alignment of the ruby text (<rubyAlign>). The alignment values are given in the table

rubyAlign val Meaning
center Center <ruby> with respect to <base>
distributeLetter Distribute difference in space between longer and shorter text in the latter, evenly between each character
distributeSpace Distribute difference in space between longer and shorter text in the latter using a ratio of 1:2:1 which corresponds to lead : inter-character : end
left Align <ruby> with the left of <base>
right Align <ruby> with the right of <base>


RichEdit Ruby Text Object

RichEdit relies on its client to provide user-interface features such as dialog boxes. To aid the client in such endeavors, Microsoft Office RichEdit versions 5.0 and above offer the EM_INSERTOBJECT message for inserting various kinds of in-line objects including the simple ruby object, other East Asian objects such as warichu and tatenakayoko, and math objects. Since the content of these objects are in-line rather than separate as for embedded images or OLE objects, probably a better name for the message would be EM_INSERTINLINEOBJECT. We might use that name if the message is documented in MSDN and made usable in the Windows RichEdit (msftedit.dll). For now EM_INSERTOBJECT is defined as WM_USER + 245. The message wparam gives the kind of in-line object OBJECTTYPE as defined in tom.h and the lparam is a pointer to a structure defined by

LONG Count;                       // Count of arguments
LONG Char;                        // Object char, e.g., '/' for tomFraction
LONG Char1;                       // Closing char for tomBrackets
LONG Char2;                       // Separator for tomBracketsWithSeps
LONG Align;                       // Depends on object
LONG TeXStyle;                    // See MATHSTYLE
LONG cCol;                        // Column count for tomMatrix/tomEquationArray

The members of this structure are defined in the documentation for the ITextRange2∷SetInlineObject method. Currently that method can only insert math objects, although it’d be straightforward to extend the implementation to insert East Asian in-line objects. To enable such objects for any insertion avenue (EM_INSERTOBJECT, reading in RTF, or ITextRange2∷SetInlineObject), the client first has to send the EM_SETTYPOGRAPHYOPTIONS message with wparam = lparam = TO_INLINETEXTOBJECTS. The TO_ INLINETEXTOBJECTS flag (0x0040) can be included with other typography options if desired.

For the ruby text object, the Count of arguments is two: one for the ruby text and one for the base text. The only other member relevant for ruby is the Align value, which has the same meanings given in the table above (the same LineServices handler is used) and is numbered 0 through 4, respectively. One addition is that the Align value can be OR’d with the RUBYBELOW flag (0x80). When RUBYBELOW appears, the ruby text is displayed below the base instead of above it. This is handy for displaying some kinds of compound ruby as described in the next section.

The EM_INSERTOBJECT message inserts an in-line object with Count null arguments. The object arguments themselves still need to be inserted. To do this programmatically, one needs to know the internal structure of the object. All in-line objects begin with a start delimiter character. For math objects, the start delimiter is 0xFDD0. This character is the first of 32 consecutive Unicode characters reserved for internal use only. For the ruby object, the start delimiter is 0xFDD1. All but the last object argument are terminated by the argument delimiter 0xFDEE. The last argument is terminated by 0xFDEF, which terminates the object as well. So for the ruby object, EM_INSERTOBJECT inserts three characters: 0xFDD1 0xFDEE 0xFDEF at the current insertion point (IP) and leaves the IP in between the 0xFDD1 and 0xFDEE, ready to insert the ruby text. Various selection-oriented messages and APIs can be used to insert the desired ruby text at the IP. To insert the base text, the IP needs to be moved just past the 0xFDEE and then used to insert the text. Alternatively, one can use an ITextRange[2] to insert the ruby and base texts, without changing the selection. Similarly for a fraction object, EM_INSERTOBJECT inserts three characters: 0xFDD0 0xFDEE 0xFDEF at the current insertion point (IP) and leaves the IP in between the 0xFDD0 and 0xFDEE. For math objects a simpler approach is to use ITextRange2∷SetInlineObject(), since the range can be used to insert the fraction as well as to insert the numerator and denominator.


Compound Ruby Text

Most often, a ruby text object contains a simple base text string and a simple ruby text string. But considerably more complicated arrangements can be constructed using the HTML5 ruby model. These are illustrated both in that specification and in the proposed CSS ruby layout model. To handle the complications, the HTML5 structure can become very involved and require a many-step parsing algorithm. Most of what one might do, though, can be handled by nesting simple ruby objects within one another. Consider the HTML5 ruby specification example of a double-sided ruby expression with both phonetic and semantic annotations for the Chinese word 旧金山 (old gold mountain), that is, San Francisco. Rendered by RichEdit, it can look like

In the HTML5 model, this is represented by

<ruby><rb>旧<rb>金<rb>山<rt>jiù<rt>jīn<rt>shān<rtc>San Francisco</ruby>rubysanfrancisco

This requires matching up the <rt>’s with the corresponding <rb>’s and putting the <rtc> entry under the result. In the RichEdit approach, simple ruby objects for旧金山are used as the base of an under-ruby object that has the ruby text “San Francisco”. Parsing is easy because parsers know how to deal with recursive structures. Admittedly having a single ruby object for compound ruby instances can allow specialized internal character placements. But a remarkable variety of such instances can be rendered very well using straightforward nesting of simple ruby objects, perhaps tweaked by adding spaces. In fact, this is the same approach that one uses in rendering complicated mathematical expressions. Some automatic spacing could be implied much as most mathematical spacing is implied.

Another example of a compound ruby object is



This illustrates that ruby annotations can be handy in explaining aspects of non East Asian text. It consists of three simple ruby objects comprising the base of an under-ruby object as in the San Francisco case.

A nice thing about RichEdit’s ruby implementation is that the user can edit ruby objects in the same way that one edits mathematical fractions. The arrow keys and mouse cursor allow one to navigate inside the objects even when nested. The same shaded highlighting used in math zones is used to identify the argument containing the insertion point and the parent object. You can select text, change character formatting, cut and paste, etc. In both Word and RichEdit, the ruby text appears first in the backing store and is followed by the base text. This is analogous to how the numerator and denominator of a fraction are stored with the numerator appearing first (see post on rich-text search). In HTML5 the base text(s) precede the corresponding ruby text(s). The LineServices ruby handler can deal with either order, since it has to support ruby for Internet Explorer as well as for Word and RichEdit.

In Word you can edit a ruby object but you have to “toggle the field codes” to the EQ field representation (right click on the ruby object to get the toggle option). Modifying an EQ field is far from WYSIWYG and is confusing even to those who understand it. Furthermore you cannot nest ruby objects in Word. It’d certainly be possible to make Word’s ruby editing more user friendly since mathematical expressions can be much more complex than ruby expressions and Word allows the user to edit them easily in a WYSIWYG fashion. But probably such an effort won’t compete with other things that need to be done.