RichEdit Input APIs
With on-screen keyboards used commonly on phones, tablets and even on laptops, one might wonder how such keyboards can input characters and commands into a RichEdit control. With traditional hardware keyboards, messages like WM_CHAR, WM_KEYDOWN, and WM_SYSKEYDOWN offer a wide variety of input data. But the new touch-screen keyboards typically don’t use messages. This post describes the APIs that RichEdit offers for these purposes as well as for text input in general. See also Entering Unicode Characters.
In addition to the messages, characters and navigation commands can be input into RichEdit controls using a variety of TOM (text object model) methods. After listing the messages and methods, we compare the relative advantages of ITextRange2::SetText2() and ITextSelection::TypeText() for providing keyboard input. Note that for windowless (or windowed) controls, messages can be sent by calling ITextServices::TxSendMessage(). In this and many other posts, UI stands for “user interface” and IP stands for “insertion point”, which is the blinking caret with no text or background color change. When the selection selects one or more characters, the background color and possibly the foreground color of those characters change (see RichEdit Colors).
The following table summarizes the RichEdit input methods and messages:
| ITextRange::SetText(bstr) | Replaces currently selected text with bstr. RTF is recognized |
| ITextRange::SetFormattedText(pRange) | Replaces currently selected text with rich text given by an ITextRange *pRange |
| ITextRange::SetChar(char) | Replaces current character by (UTF-32) char |
| ITextRange::Paste(pVar) | Replaces currently selected text with text given by pVar |
| | Replaces selected code or code preceding insertion point by corresponding Unicode character (implements alt+x) |
| ITextRange2::SetText2(Flags, bstr) | Inserts plain or rich text bstr according to option Flags |
| ITextSelection::TypeText(bstr) | Inserts bstr using UI conventions (described below) |
| ITextSelection::MoveLeft(), etc. | Moves left (etc. = right, up, down) by a variety of units |
| | Moves home (end) for a variety of units |
| ITextStory::SetText(bstr) | Replaces text in story by cleansed bstr (no conversions) |
| | Replaces text in active story by text read in from a file or an IStream |
| | Replaces text in active story by (WCHAR *)pch. RTF is recognized. |
| | Inserts a UTF-16 or UTF-32 character wparam with full UI support |
| | Used for hot keys, Enter, Tab. See RichEdit Hot Keys |
| | Replaces text in active story by (WCHAR *)lparam |
| | Replaces text in selection by (WCHAR *)lparam |
| | Combines functionality of WM_SETTEXT and EM_REPLACESEL along with additional features |
| | Streams plain or rich text into the current selection or into the whole document |
ITextSelection::TypeText(bstr)
The ITextSelection methods in the table are designed to provide most of the functionality built into the WM_CHAR, WM_KEYDOWN and WM_SYSKEYDOWN messages, perhaps in a more understandable form. But the methods don’t offer hot key support that involves the Alt and/or Ctrl keys. The methods have been available for user-interface (UI) input since RichEdit 2.0 (Office 97). In particular, ITextSelection::TypeText(bstr) processes the characters in the bstr according to the following UI rules:
Tab (U+0009) behavior
1) In math zones, [shift+]Tab moves to the previous/next argument if in a math object, else moves to the start/end of the math zone
2) In tables, [shift+]Tab selects the previous/next cell. Tab in the last cell of the bottom row appends a new row and places the insertion point in the first cell of the new row.
3) Outside of tables and math zones, Tab inserts a Tab (U+0009). When messages are used, Ctrl+Tab inserts a tab whether in a table or not.
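The three Tab rules above can be read as a small decision procedure. The following is a minimal sketch of that reading, not RichEdit's implementation: `TabContext`, `TabAction` and `ResolveTab` are hypothetical names, checking the math zone before the table is an assumption based only on the order of the rules, and treating Shift+Tab outside tables and math zones like a plain Tab is also an assumption because the text does not cover that case.

```cpp
// Hypothetical model of the Tab rules listed above; not RichEdit source code.
struct TabContext {
    bool inMathZone = false;            // IP is inside a math zone
    bool inMathObject = false;          // IP is inside a math object's argument
    bool inTable = false;               // IP is inside a table cell
    bool inLastCellOfBottomRow = false; // IP is in the last cell of the bottom row
};

enum class TabAction {
    MoveToNextArgument,
    MoveToPreviousArgument,
    MoveToMathZoneEnd,
    MoveToMathZoneStart,
    SelectNextCell,
    SelectPreviousCell,
    AppendRowAndMoveToFirstCell,
    InsertTabCharacter
};

// Maps a Tab or Shift+Tab keystroke to the action described by rules 1) to 3).
TabAction ResolveTab(const TabContext& ctx, bool shift) {
    if (ctx.inMathZone) {
        // Rule 1: navigate arguments inside a math object, else jump to a zone edge.
        if (ctx.inMathObject)
            return shift ? TabAction::MoveToPreviousArgument
                         : TabAction::MoveToNextArgument;
        return shift ? TabAction::MoveToMathZoneStart
                     : TabAction::MoveToMathZoneEnd;
    }
    if (ctx.inTable) {
        // Rule 2: move between cells; Tab in the final cell appends a row.
        if (shift)
            return TabAction::SelectPreviousCell;
        return ctx.inLastCellOfBottomRow ? TabAction::AppendRowAndMoveToFirstCell
                                         : TabAction::SelectNextCell;
    }
    // Rule 3: plain text context, so a U+0009 is inserted.
    // Assumption: Shift+Tab is treated the same way here.
    return TabAction::InsertTabCharacter;
}
```

A caller would fill in `TabContext` from whatever it knows about the insertion point and then act on the returned value.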
CR/VT (U+000D/U+000B) behavior
1) If the insertion point (IP) is at the start of the first cell in a table, CR inserts a non-table paragraph in front of the table
2) If the IP follows a row terminator, CR inserts an empty row with the same properties as the current row
3) Ensure that CR’s and VT’s don’t use table row terminator formatting aside from 2)
4) Elsewhere CR and VT are inserted into the backing store, producing hard and soft paragraph delimiters, respectively
5) Two sequential CR’s before an EOP turn off numbering
6) VT is changed to CR if it occurs after table row end delimiter or in front of table row start delimiter
7) Turn off math zone for VT/CR; leave math-zone placeholder (if there) for VT
8) If inside math function argument, CR inserts an equation array, e.g., for multiple upper/lower limits of n-ary functions
9) Handle formatting for bullets/numbering
10) Handle auto table insertion
Other characters
1) When the selection is an insertion point, the selection’s character format is used for input characters. E.g., if the user toggles bold on, the character will be bolded. An independent ITextRange2 doesn’t use the selection’s format and the characters would not be bolded
2) Input sequence checking: Thai, Indic, Tibetan and Vietnamese scripts all follow specific input rules. Input sequence checking prevents inputting strings that don’t obey the rules.
3) Indic/Thai overtyping convention: overwrite cluster at the IP if the character is a cluster-start character; otherwise insert current character
4) Dual font for Latin/East Asian input. Use Western font for Latin characters and East Asian font for East Asian characters. This was an important feature in the last century when East Asian fonts had inferior glyphs for ASCII characters. But more recent fonts have high quality glyphs and this feature should be disabled by default.
5) Check whether a math font is needed
6) Check whether the current font supports the keyboard
7) Replace text in word-selection mode
8) Handle selection undo
9) Handle all-upper-case (SES_UPPERCASE edit style) and all-lower-case (SES_LOWERCASE edit style) control modes
10) If the selection is an insertion point with the default charrep in a plain-text control, use a charrep that the current font supports
11) Smart quote option for English keyboards
12) If the selection is an IP and the keyboard uses an 8-bit ANSI code page, convert character codes > 255 back to ANSI when storing SYMBOL_CHARSET characters
13) Stamp keyboard language unless 1) East Asian language and character code is < 127, 2) Hebrew caps is active, or 3) IME composition is in progress
14) Font bind characters except when current font is highly appropriate
15) Autocorrect and math autocorrect
16) Update caret location when at ambiguous end-of-line/start-of-next-line cp
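As an illustration of how one of these rules might be expressed, rule 13 reduces to a simple predicate. This is a sketch under stated assumptions rather than RichEdit code: `TypingState` and `ShouldStampKeyboardLanguage` are hypothetical names, and the three flags are assumed to be supplied by the caller.

```cpp
// Hypothetical snapshot of the input state relevant to rule 13.
struct TypingState {
    bool keyboardIsEastAsian = false;      // active keyboard language is East Asian
    bool hebrewCapsActive = false;         // Hebrew caps mode is on
    bool imeCompositionInProgress = false; // an IME composition is under way
};

// Returns true when the typed character should be stamped with the
// keyboard language, following the three exceptions listed in rule 13.
bool ShouldStampKeyboardLanguage(const TypingState& state, char32_t ch) {
    if (state.keyboardIsEastAsian && ch < 127)
        return false; // exception 1: low character codes on an East Asian keyboard
    if (state.hebrewCapsActive)
        return false; // exception 2
    if (state.imeCompositionInProgress)
        return false; // exception 3
    return true;
}
```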
ITextRange2::SetText2(Flags, bstr)
This method inserts the string bstr subject to various options specified by the Flags bits. If Flags = 0, the method reduces to ITextRange::SetText(). That function inserts plain text, removing certain special characters, such as U+FDD0..U+FDEF, since they have structural meanings in the RichEdit backing store. In addition, a CRLF (U+000D U+000A) is replaced by a CR alone and appropriate fonts are used to display the characters.
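The plain-text cleansing just described can be modeled in a few lines. The sketch below covers only the two transformations named in the text (dropping U+FDD0..U+FDEF and collapsing CRLF to CR). `CleansePlainText` is a hypothetical helper, not a RichEdit function, and the real control may remove other characters as well. Leaving a lone LF untouched is an assumption, since the text does not say how it is handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the cleansing applied to plain-text input:
// drops U+FDD0..U+FDEF and replaces each CRLF pair with a single CR.
std::u16string CleansePlainText(const std::u16string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t ch = in[i];
        if (ch >= 0xFDD0 && ch <= 0xFDEF)
            continue; // reserved for structure in the backing store
        if (ch == u'\r' && i + 1 < in.size() && in[i + 1] == u'\n') {
            out.push_back(u'\r'); // keep the CR
            ++i;                  // skip the LF that follows it
            continue;
        }
        out.push_back(ch); // everything else, including a lone LF, is copied
    }
    return out;
}
```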
Various Flags bits control optional text insertion features such as suppressing link and/or hidden attributes, checking for the text limit, converting RTF input to rich text in the backing store, etc. The tomConvertRTF option accepts either the usual RTF byte strings or RTF UTF-16 strings. The latter allow the client to use all Unicode characters directly instead of having to translate non-ASCII characters with a Windows code page (if one exists for the characters) or, more generally, to use \uN control words. The implementation does this by converting an RTF UTF-16 string to UTF-8 RTF before passing it to the RTF reader.
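To make the last step concrete, here is a minimal sketch of a UTF-16 to UTF-8 conversion of the kind that would precede the RTF reader. It illustrates the encoding step only and is not RichEdit's converter: `Utf16ToUtf8` is a hypothetical name, and mapping an unpaired surrogate to U+FFFD is an assumption.

```cpp
#include <cstddef>
#include <string>

// Converts a UTF-16 string (for example, RTF text held in a BSTR) to UTF-8.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool isLead = cp >= 0xD800 && cp <= 0xDBFF;
        if (isLead && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // assumption: unpaired surrogates become U+FFFD
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

ASCII RTF control words pass through unchanged, so only the non-ASCII text characters grow to multiple bytes.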
Notably missing from the SetText2() method are the UI features of ITextSelection::TypeText(). So for on-screen keyboards, sometimes called soft input panels (SIPs), characters should be inserted using ITextSelection::TypeText(). IME programs can use SetText2() during composition, but they should insert the finalized character(s) using ITextSelection::TypeText().
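The routing recommended above can be sketched as follows. To keep the example portable, the two TOM entry points are represented by a hypothetical `TextSink` interface. A real client would instead call ITextSelection::TypeText() and ITextRange2::SetText2() with BSTR arguments. `DeliverSipText`, `DeliverImeText` and `RecordingSink` are likewise illustrative names, and passing Flags = 0 during composition is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the two TOM entry points discussed above.
struct TextSink {
    virtual ~TextSink() = default;
    virtual void TypeText(const std::u16string& text) = 0;
    virtual void SetText2(long flags, const std::u16string& text) = 0;
};

// On-screen keyboard input always goes through the UI-aware path.
void DeliverSipText(TextSink& sink, const std::u16string& text) {
    sink.TypeText(text);
}

// IME input: intermediate composition text may bypass the UI rules,
// but the finalized text is typed so that the UI rules apply to it.
void DeliverImeText(TextSink& sink, const std::u16string& text, bool finalized) {
    if (finalized)
        sink.TypeText(text);
    else
        sink.SetText2(0, text); // assumption: no option flags during composition
}

// Sink that records which entry point was used, for demonstration only.
struct RecordingSink : TextSink {
    std::vector<std::string> calls;
    void TypeText(const std::u16string&) override { calls.push_back("TypeText"); }
    void SetText2(long, const std::u16string&) override { calls.push_back("SetText2"); }
};
```

Passing a `RecordingSink` to these functions shows which path a given piece of input takes without needing a live control.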