Text Insertion Point

People often ask questions about the nature of the text insertion point (IP), the blinking vertical bar in between two characters on screen. This post attempts to address some of these questions, notably about where the IP is, what it means, how it works in BiDi text, how to control it programmatically and how it appears on braille displays. Some folks refer to the IP as the “cursor”, although that term is also used to describe the mouse pointer. “Text cursor” is less ambiguous. Other folks refer to it as the “caret”. Generally, the next character typed is entered at the IP, that is, in front of the character following the IP, not directly on the character unless overtype mode is active.

Furthermore, in text processing, it’s important to think of the IP as in between characters, since a character you type could end up in the preceding text run, in the following text run or in a text run of its own. Consider an IP that immediately follows the word bold in “boldtext”. Will the next character you type be bold or not? The answer depends on whether the IP has the formatting of the preceding text run, the formatting of the following text run or some other formatting chosen in part using toolbar format commands or hot keys like Ctrl+B.

Quantitatively, we describe the insertion point by its character position (cp). In the following figure, character positions are represented by the lines separating the letters. The corresponding cp values are given beneath the lines. The text run starting at cp 5 and ending at cp 7 contains the two-letter word “is”. The difference between these cp’s is 2, the count of characters in the text run.


The cp 5 also marks the end of the text run starting at cp 0. We can call such a cp an “ambiguous cp”, since it delimits two adjacent text runs. This ambiguity is illustrated for an IP in between “bold” and “text” above.

BiDi text insertion points

In BiDi text, such as a mixture of Arabic and English text, the directional ambiguity of an IP between text runs of opposite directionality is usually resolved by the directionality of the current keyboard language. This choice is made because the purpose of the IP is to reveal where the next character typed will be entered. This works unambiguously in BiDi text except when the IP follows a digit and the keyboard is right-to-left. This is because consecutive digits are invariably displayed left-to-right, rather than right-to-left (except for N’Ko). If you then type a digit with a right-to-left keyboard, the IP will be displayed to the right of the digit, but if you instead type a right-to-left letter, that letter will be displayed to the left of the digit(s). There’s no way to know what the user will type next, so in this scenario the IP may not be displayed where the next character typed will be inserted. The bottom line is that in BiDi text you can’t intuit desired behavior 100% of the time, partly because people have conflicting needs. The choices were made because they work the clear majority of the time.

If the top of the caret (text cursor) has a tick mark that points left, an RTL (right-to-left) keyboard is active, while a tick mark that points right indicates an active LTR keyboard. Office apps use a tick-mark-free caret for LTR keyboards. If a document doesn’t have any BiDi text, such a caret is unambiguous. If a document does have BiDi content, some users might prefer to see the right-pointing tick mark on the caret when the keyboard is LTR. Office used to have such an LTR caret, but abandoned it back in the last century. Having the RTL caret was considered sufficiently different from a tickless caret to resolve the ambiguities and then there’s less of a hiccup going between pure LTR docs and BiDi documents. I don’t think we’ve gotten negative feedback on this choice. Conceivably, we should have a user option that’s enabled by default in BiDi locales and disabled by default elsewhere. Also, we might want an option to display a shadow caret where a character would be displayed if typed with a keyboard of the opposite directionality.

Programmatic insertion points

In the text object model TOM, insertion points consist of degenerate ITextRange or ITextRange2 objects. The insertion point controlled directly by a user editing text is the ITextSelection[2], which is an ITextRange[2] with additional user-interface functionality. If these ranges select one or more characters, they are called nondegenerate. ITextRange methods refer to the cp at the start of a range as Start and to the cp at the end of the range as End. They are retrieved and set by methods like ITextRange::GetStart() and ITextRange::SetStart(). If ITextSelection2::GetCch() returns 0, the user selection is an insertion point. Alternatively the condition End = Start implies an insertion point. If you’re using a degenerate ITextSelection2 or ITextRange2, you can change the character format inheritance by changing the Gravity property. That is typically a bit faster since figuring out which character format properties the IP should use when moved may incur considerable calculation.

The ITextSelection object has its own character formatting properties to allow the user to change the formatting of the insertion point from the formatting deduced from the backing store and selection activity. In RichEdit, all degenerate ITextRange objects have their own character formatting to give clients such flexibility.


RichEdit uses an n/m algorithm to move the insertion point through Latin and Arabic ligatures and to select part way through such ligatures. The rationale for doing this is given in Ligatures, Clusters, Combining Marks and Variation Sequences. It would confuse most users if → moved past a whole ffi ligature in the word “difficult”. So, caret motion cannot be dictated solely by the font.

Braille insertion point

People need to know where the IP is so that they know where they enter text. This applies to all editing mechanisms including speech and braille displays. As described in Speaking of math…, the insertion point is identified by speaking the character or object at the insertion point. Most braille displays have 8 buttons for inputting 8-dot braille. However, the popular braille systems only use 6 dots. That leaves dots 7 and 8 available for innovative purposes, such as marking the IP.

VoiceOver indicates the position of the text cursor (IP) on braille displays by flashing dot 8 of the braille cell preceding the IP and dot 7 of the braille cell following the IP. This locates the IP in between two braille cells, much as the text cursor on screen displays the IP in between two characters. Voice over also raises dots 7 and 8 to show the position of the VoiceOver cursor, to help you find it within the line of braille.

Finally, I can’t resist mentioning a cool editing feature of the Visual Studio editor, namely the multiline IP. Create this with a mouse by Alt+dragging the IP over multiple lines. I use it all the time to delete and insert text in whole columns. You can also use alt+dragging to select and operate on blocks of text, e.g., to delete or copy them.