September 2011

Volume 26 Number 09

UI Frontiers - Touch for Text

By Charles Petzold | September 2011

My first store-bought computer was an Osborne 1, which was the first commercial computer designed to be small enough to fit under the seat of a plane. It accomplished this feat in part by featuring a tiny 5-inch monitor capable of displaying only 24 lines of 52 characters each.

My most recent computer purchase was a Windows Phone 7-based device, and of course it has an even smaller screen. The aspect ratio is different (3:5 rather than 4:3) and the Windows Phone 7 fonts have proportional spacing, but the text density is nearly identical: Silverlight programs written for a Windows Phone 7 device use a default font that displays approximately the same number of lines with the same number of characters as the Osborne 1 display!

I won’t bore you with the myriad of differences between these two computers created 30 years apart, except to note that nobody ever considered reading a book on the Osborne 1. (Writing a book, yes, but reading a book, definitely not.) In contrast, reading books has become one of my favorite activities on my Windows Phone 7 device. I like reading books on my phone so much that I’m writing my own e-book-reading software so I can tailor the experience just the way I want it.

Touching the Words

Certainly much of the appeal of using the phone as an e-book reader is its ideal size: If the screen were any smaller, it would be hard to read and handle. Any larger, and it wouldn’t fit in my pocket, and that’s an important criterion.

I think the touch interface on the phone is a significant aspect of the appeal as well. The ease of flipping pages with a simple tap or a swipe of a finger seems to satisfy two requirements of a book: the tactile and the cerebral. The phone feels nothing like a real book, of course, but the responsiveness is so effortless that the device fades into the background so as to not interfere with the more cerebral experience of reading.

A touch interface is both a blessing and a curse for e-book readers. It’s great for vague gestures that can be handled with leniency, such as dragging a scrolled page up or down, or flicking the pages of a paginated document forward or backward. The touch interface becomes awkward for activities that require more precision, such as text selection. The sheer difference in size between your fingers and the words on the screen makes text selection nearly impossible without some special assistance from the program. Yet, this assistance must itself feel natural. If text selection turns into a complex process from the user’s perspective, it might as well not be a program feature at all.

While it’s easy to imagine an e-book reader without text selection, the feature offers too many benefits to ignore. An e-book reader can let the user look up a selected word or phrase in a dictionary, Wikipedia or a search engine. A selected sentence might be saved along with a reader’s note such as “Great insight!” or “That doesn’t follow at all.” Or you might want to paste a selection from the book into an e-mail.

Though I’ll be speaking about text selection in the context of an e-book reader, the concepts can be applied to any Windows Phone 7 program that displays text to the screen and allows the reader to interact with that text.

The Text-Selection Quandary

The first problem in text selection is hit-testing. When the user touches the screen, what word is underneath that finger?

Windows Phone 7 has three programming interfaces for touch: the low-level Touch.FrameReported event, the higher-level Manipulation events and (my favorite) the Gesture interface available with the Silverlight for Windows Phone Toolkit, which is downloadable from CodePlex. Each of these interfaces lets you determine the element at the point where a finger touches the screen. For text, this element is probably a TextBlock.

What you cannot do, however, is easily determine what part of the text displayed by the TextBlock the user is touching. If the TextBlock is displaying multiple words, you can’t distinguish among these words without performing calculations using font metric information. And if you have the necessary font metric information—for example, based on the techniques I described in last month’s installment of this column—you’re better off using that information to separate text into multiple words where each TextBlock is devoted to displaying a single word.

If each word is a TextBlock, it’s easy to determine which word is at the point on the screen corresponding to the user’s finger, and then to track the user’s finger in selecting multiple consecutive words. However, the user might be more uncertain than the program! Fingers are not transparent, and fingertips often dwarf the tiny words on the screen.
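To make the word-per-TextBlock idea concrete, here is a minimal sketch of hit-testing and of extending a selection across consecutive words. It is not code from the program: WordBox, HitTest and SelectedText are hypothetical names, and the rectangle test stands in for the element that the touch interfaces report directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the bounds of one word's TextBlock on the page.
public struct WordBox
{
    public string Text;
    public double X, Y, Width, Height;

    public bool Contains(double px, double py)
    {
        return px >= X && px < X + Width && py >= Y && py < Y + Height;
    }
}

public static class WordSelection
{
    // Returns the index of the word under the point, or -1 if the
    // point falls between words.
    public static int HitTest(IList<WordBox> words, double px, double py)
    {
        for (int i = 0; i < words.Count; i++)
            if (words[i].Contains(px, py))
                return i;
        return -1;
    }

    // Returns the consecutive words between the anchor word (where the
    // selection began) and the word now under the finger, inclusive.
    // The finger may move backward, so the two indices are ordered first.
    public static string SelectedText(IList<WordBox> words, int anchor, int current)
    {
        int first = Math.Min(anchor, current);
        int last = Math.Max(anchor, current);
        var parts = new List<string>();
        for (int i = first; i <= last; i++)
            parts.Add(words[i].Text);
        return string.Join(" ", parts.ToArray());
    }
}
```

Keeping the anchor index fixed and recomputing the range on every move means the selection shrinks as naturally as it grows when the finger reverses direction.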

To accurately select individual words on the screen, the user needs to zoom in on the screen before selecting the text. With multi-touch interfaces, the standard way to expand an element on the screen is a two-finger pinch operation. However, pinch is only part of it. Once the screen is magnified, the user should be able to use a single finger to shift the page around relative to the viewport of the screen. This is sometimes known as a “pan” operation, from the use of the word in motion picture camerawork.

In short, implementing text selection means dealing with touch events that are interpreted differently for different modes of operation. A drag operation—touching the screen and moving the finger—normally has the effect of shifting the page forward or back. But a drag operation is also required to extend a text selection beyond one word, which means that something is needed to put the program into a text-selection mode.

To signal when a selection begins, an excellent choice is the gesture called “hold.” This occurs when the user presses a finger to the screen and holds it still for about a second. Any drag operation that follows the hold gesture is interpreted as extending the selection.

However, if the user has used a pinch operation to expand the screen prior to making the selection, then a regular drag—that is, a drag that’s not preceded by a hold—must be interpreted as a panning operation rather than a page transition.

In summary, drag operations can be interpreted in three different ways depending on the current mode.

Implementing Gesture Modes

If you’ve been following the past few installments of this column, you know I’ve been progressively building a Windows Phone 7 e-book reader by isolating various features and exploring them. The downloadable program for this article is called BleakHouseReader. Like the other programs, it’s restricted to one book, this time Charles Dickens’ 1853 novel “Bleak House,” which is not the total downer the title would seem to indicate. The book is a public domain plain text file downloaded from the Project Gutenberg Web site.

The previous program (PhineasReader) showed the performance improvements in pagination when you switch from using a TextBlock for an entire paragraph (or as much of a paragraph as can fit on one page) to using a separate TextBlock element for each word on the page. Besides providing faster layout, that switch was also the necessary first step to implementing text selection.

The BookViewer control in BleakHouseReader installs handlers for all but one of the touch gestures, as shown in the core of the BookViewer.xaml file in Figure 1. The only gesture the control ignores is DoubleTap. As usual, the three nested Border elements are used to host the current page, previous page and next page. Additions to BookViewer.xaml for this program are the two blocks of transforms used for pinch and pan operations. The second block of transforms lets the program do some transform calculations without explicitly performing matrix multiplication.

Figure 1 The Content Area of BookViewer.xaml

<Grid x:Name="LayoutRoot">
  <toolkit:GestureService.GestureListener>
    <toolkit:GestureListener GestureBegin="OnGestureListenerGestureBegin"
                             GestureCompleted="OnGestureListenerGestureCompleted"
                             Tap="OnGestureListenerTap"
                             Hold="OnGestureListenerHold"
                             Flick="OnGestureListenerFlick"
                             DragStarted="OnGestureListenerDragStarted"
                             DragDelta="OnGestureListenerDragDelta"
                             DragCompleted="OnGestureListenerDragCompleted"
                             PinchStarted="OnGestureListenerPinchStarted"
                             PinchDelta="OnGestureListenerPinchDelta"
                             PinchCompleted="OnGestureListenerPinchCompleted" />
  </toolkit:GestureService.GestureListener>
     
  <Grid Name="manipulationGrid" CacheMode="BitmapCache">
    <Border Name="pageContainer0" Style="{StaticResource pageContainerStyle}">
      <Border Name="pageHost0" Style="{StaticResource pageHostStyle}" />
    </Border>
     
    <Border Name="pageContainer1" Style="{StaticResource pageContainerStyle}">
      <Border Name="pageHost1" Style="{StaticResource pageHostStyle}" />
    </Border>
     
    <Border Name="pageContainer2" Style="{StaticResource pageContainerStyle}">
      <Border Name="pageHost2" Style="{StaticResource pageHostStyle}" />
    </Border>
 
    <Grid.RenderTransform>
      <TransformGroup x:Name="transformGroup">
        <MatrixTransform x:Name="matrixTransform" />
        <ScaleTransform x:Name="scaleTransform" />
        <TranslateTransform x:Name="translateTransform" />
      </TransformGroup>
    </Grid.RenderTransform>
  </Grid>
 
  <!-- Scratch pad transforms -->
  <Grid Width="0" Height="0">
    <Grid.RenderTransform>
      <TransformGroup x:Name="calcTransformGroup">
        <MatrixTransform x:Name="calcMatrixTransform" />
        <ScaleTransform x:Name="calcScaleTransform" />
        <TranslateTransform x:Name="calcTranslateTransform" />
      </TransformGroup>
    </Grid.RenderTransform>
  </Grid>
</Grid>

The codebehind file for BookViewer.xaml is BookViewer.xaml.cs, but to prevent that file from becoming too unwieldy, I created a second codebehind file named BookViewer.Gestures.cs specifically for the 11 handlers for the 11 touch gestures.

Previously I used a couple of Boolean fields to help interpret complex combinations of gestures. For this program I switched to enumerations. Figure 2 shows the ViewerTouchMode and ViewerDisplayMode enumerations. These help the gesture event handlers keep track of what’s happening and avoid conflicting interpretations of the same gesture.

Figure 2 The ViewerTouchMode and ViewerDisplayMode Enumerations

public enum ViewerTouchMode
{
  Reading = 0,
  Dragging,
  Selecting,
  Pinching,
  Panning,
  Animating,
}
 
public enum ViewerDisplayMode
{
  Normal = 0,
  Zoomed
}

For example, a page transition begins with a tap, drag or flick gesture by the user, but ends with an animation. If any gesture begins when the current touch mode is ViewerTouchMode.Animating, the entire gesture is ignored. The handler for the hold gesture checks if a TextBlock is under the finger. If so, the current touch mode becomes ViewerTouchMode.Selecting until the finger lifts from the screen. When the touch mode is Selecting, a drag gesture extends the selection.

If no fingers are touching the screen, the current touch mode is either ViewerTouchMode.Reading or ViewerTouchMode.Animating, and when the animation ends, the touch mode will switch to ViewerTouchMode.Reading. Aside from the Animating value, the touch mode only pertains to gestures currently in progress.
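The touch-mode transitions described in the last two paragraphs can be sketched as a small state tracker. This is an illustration under assumptions rather than the code in BookViewer.Gestures.cs: the TouchModeTracker class, its method names and the ignoreGesture flag are hypothetical, and only the hold and animation cases are modeled. The enumeration is repeated from Figure 2 so the sketch compiles on its own.

```csharp
using System;

public enum ViewerTouchMode
{
    Reading = 0, Dragging, Selecting, Pinching, Panning, Animating
}

// Hypothetical tracker for the touch mode; not the author's class.
public class TouchModeTracker
{
    public ViewerTouchMode Mode { get; private set; }

    // True while a gesture that began during an animation is in progress.
    bool ignoreGesture;

    public void OnGestureBegin()
    {
        // A gesture that starts during an animation is ignored entirely.
        ignoreGesture = Mode == ViewerTouchMode.Animating;
    }

    public void OnHold(bool wordUnderFinger)
    {
        // Hold starts a selection only when a word is under the finger.
        if (!ignoreGesture && wordUnderFinger)
            Mode = ViewerTouchMode.Selecting;
    }

    public void OnGestureCompleted()
    {
        // Lifting the finger ends the gesture-specific mode, unless an
        // animation is running; that ends in OnAnimationCompleted.
        if (!ignoreGesture && Mode != ViewerTouchMode.Animating)
            Mode = ViewerTouchMode.Reading;
        ignoreGesture = false;
    }

    public void OnAnimationStarted() { Mode = ViewerTouchMode.Animating; }

    public void OnAnimationCompleted() { Mode = ViewerTouchMode.Reading; }
}
```

Recording the ignore decision once, when the gesture begins, keeps the rest of that gesture ignored even if the animation finishes while the finger is still down.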

However, the user can pinch the screen to make it larger, and remove all fingers from the screen, and the screen should remain zoomed. Then the user can perform another pinch operation to make the screen still larger or smaller, and combine that with a drag operation to move it around. This is why the ViewerDisplayMode enumeration is required. For example, if the user drags a finger on the screen, that becomes a page transition if the current display mode is ViewerDisplayMode.Normal, but becomes a pan operation for ViewerDisplayMode.Zoomed.
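Put together, the two enumerations are enough to decide what a drag means. The following sketch shows one way to write that decision as a pure function. The DragMeaning enumeration and the DragInterpreter class are hypothetical names, and only the cases discussed in the text are covered. The two enumerations are repeated from Figure 2 so the sketch is self-contained.

```csharp
using System;

public enum ViewerTouchMode
{
    Reading = 0, Dragging, Selecting, Pinching, Panning, Animating
}

public enum ViewerDisplayMode { Normal = 0, Zoomed }

// Hypothetical names for what a drag gesture can mean.
public enum DragMeaning { Ignored, ExtendSelection, PanPage, TurnPage }

public static class DragInterpreter
{
    public static DragMeaning Interpret(ViewerTouchMode touchMode,
                                        ViewerDisplayMode displayMode)
    {
        // Gestures that begin during an animation are ignored.
        if (touchMode == ViewerTouchMode.Animating)
            return DragMeaning.Ignored;

        // A drag that follows a hold extends the selection,
        // whether or not the page is zoomed.
        if (touchMode == ViewerTouchMode.Selecting)
            return DragMeaning.ExtendSelection;

        // Otherwise the display mode decides: pan a zoomed page,
        // turn the page when it is at normal size.
        return displayMode == ViewerDisplayMode.Zoomed
            ? DragMeaning.PanPage
            : DragMeaning.TurnPage;
    }
}
```

For example, Interpret(ViewerTouchMode.Reading, ViewerDisplayMode.Zoomed) yields DragMeaning.PanPage, while the same drag on an unzoomed page yields DragMeaning.TurnPage.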

Clamping Pinch and Pan

The current display mode is switched from ViewerDisplayMode.Normal to ViewerDisplayMode.Zoomed if the user touches the screen with two fingers and stretches his fingers apart to make the display larger. The display mode remains ViewerDisplayMode.Zoomed until the user taps the screen. The BookViewer control responds to the tap by animating the screen back to normal size and switching to ViewerDisplayMode.Normal. Alternatively, if the user pinches the screen and contracts it so the scaling factor drops below 1.05, the screen is automatically restored to normal size.

When the display mode is zoomed, any additional pinching and panning operations are compounded with those from previous operations. This is the reason for the three grouped transforms in Figure 1. During a single pan operation, the TranslateTransform properties are altered to move the Grid relative to the screen. During a single pinch operation, both the ScaleTransform and TranslateTransform get involved. After the gesture has completed, the TransformGroup indicates the total transform. A short method named ConsolidateTransform in BookViewer.xaml.cs transfers this total transform into the MatrixTransform, and then sets the TranslateTransform and ScaleTransform back to default values in preparation for the next gesture operation. In this way, the MatrixTransform accumulates the effects of all the pinch and pan operations.
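The arithmetic behind this accumulation can be shown without any Silverlight classes. The sketch below assumes a uniform scale with no rotation and that the group applies its children in the order listed in Figure 1: matrix, then scale, then translate. PageTransform and Then are hypothetical names. This is not the ConsolidateTransform method itself.

```csharp
using System;

// Hypothetical stand-in for a uniform-scale-plus-translation matrix:
// x' = Scale * x + OffsetX, y' = Scale * y + OffsetY.
public struct PageTransform
{
    public double Scale, OffsetX, OffsetY;

    public static readonly PageTransform Identity =
        new PageTransform { Scale = 1, OffsetX = 0, OffsetY = 0 };

    // Applies, in order: this accumulated transform, a scale about
    // (centerX, centerY), then a translation. Returns the combined
    // transform, which is what consolidation would store for next time.
    public PageTransform Then(double scale, double centerX, double centerY,
                              double translateX, double translateY)
    {
        return new PageTransform
        {
            Scale = scale * Scale,
            OffsetX = scale * OffsetX + centerX * (1 - scale) + translateX,
            OffsetY = scale * OffsetY + centerY * (1 - scale) + translateY
        };
    }
}
```

Starting from Identity, a pinch that doubles the page about (100, 200) and shifts it 10 units right gives a scale of 2 and offsets of (-90, -200). A second gesture then starts from that result rather than from the unscaled page, which is the effect of copying the group's total into the MatrixTransform and resetting the other two transforms.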

I also found it necessary to limit the extent of the pinch and pan operations. For example, it only makes sense for a pinch operation to result in a page-scaling factor greater than one. There’s no reason to let the page shrink to less than its normal size. Similarly, it makes no sense to let the page be panned so we can see “underneath” the page. The left edge of the page shouldn’t appear to the right of the left edge of the screen, and the same applies to the other three sides.

These restrictions are applied in a method in BookViewer.xaml.cs named ClampPinchAndPanTransforms, shown in Figure 3. This method was definitely one of the most difficult parts of this program. It uses the other three transforms in BookViewer.xaml for performing “scratchpad” matrix-transform calculations.

Figure 3 The ClampPinchAndPanTransforms Method

void ClampPinchAndPanTransforms(
  double scale, double translateX, double translateY)
{
  // This is the matrix transform from previous operations.
  calcMatrixTransform.Matrix = matrixTransform.Matrix;
 
  // Calculate scaling factor so it's always 1 or greater.
  double totalScale = scale * matrixTransform.Matrix.M11;
  totalScale = Math.Max(1, totalScale);
  scale = totalScale / matrixTransform.Matrix.M11;
 
  // Set up properties for new scale matrix.
  calcScaleTransform.CenterX = scaleTransform.CenterX;
  calcScaleTransform.CenterY = scaleTransform.CenterY;
  calcScaleTransform.ScaleX = scale;
  calcScaleTransform.ScaleY = scale;
 
  // Set up properties for new translation matrix.
  calcTranslateTransform.X = translateX;
  calcTranslateTransform.Y = translateY;
 
  // Obtain the total matrix from the transform group.
  Matrix totalMatrix = calcTransformGroup.Value;
 
  // Restrict translation to original area of transformed element.
  double totalTranslationX = totalMatrix.OffsetX;
  double totalTranslationY = totalMatrix.OffsetY;
 
  double clampedTranslationX =
    Math.Min(0,
      Math.Max((1 - totalMatrix.M11) * manipulationGrid.ActualWidth,
        totalTranslationX));
 
  double clampedTranslationY =
    Math.Min(0,
      Math.Max((1 - totalMatrix.M22) * manipulationGrid.ActualHeight,
        totalTranslationY));
 
  // Adjust translation factors.
  translateX += clampedTranslationX - totalTranslationX;
  translateY += clampedTranslationY - totalTranslationY;
 
  // Set transforms.
  scaleTransform.ScaleX = scale;
  scaleTransform.ScaleY = scale;
  translateTransform.X = translateX;
  translateTransform.Y = translateY;
}
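The translation clamp in Figure 3 is easier to follow when isolated for a single axis. The sketch below restates that Math.Min/Math.Max expression as a standalone function. PanClamp and ClampOffset are hypothetical names, and the 480-unit width used in the example afterward is an assumption about the size of the grid.

```csharp
using System;

public static class PanClamp
{
    // Clamps the total translation along one axis so that a page scaled
    // by totalScale (1 or greater) still covers a viewport of the given
    // extent. An offset of 0 puts the page's near edge flush with the
    // viewport; (1 - totalScale) * extent puts the far edge flush.
    public static double ClampOffset(double totalOffset,
                                     double totalScale,
                                     double extent)
    {
        double minOffset = (1 - totalScale) * extent;
        return Math.Min(0, Math.Max(minOffset, totalOffset));
    }
}
```

With a scale of 2 and an extent of 480, the page is 960 units wide, so the offset may range only from -480 to 0. A requested offset of 50 clamps to 0, a requested offset of -600 clamps to -480, and -100 is left alone.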

Left- and right-flick gestures normally initiate page transitions and are implemented with animations to go to the next or previous page. For this version, I decided that flick gestures going up or down should insert a bookmark at that page, but those bookmarks haven’t been implemented yet.

But what about flick gestures when the current display mode is ViewerDisplayMode.Zoomed? It seems as if flick gestures should move the page relative to the viewport. I decided to simply move the zoomed page to the extreme left, top, right or bottom of the viewport, depending on the angle of the flick gesture. This is one part of the implementation that I’m not entirely happy about. The movement should really be animated to seem as if the page is moving as a result of momentum and then slowing down.
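A sketch of that jump-to-edge behavior follows. It assumes the flick is described by horizontal and vertical velocities and that a flick to the right pushes the page right until its left edge meets the viewport. Both the direction mapping and the names ZoomedFlick and Apply are assumptions, not details from the program.

```csharp
using System;

public static class ZoomedFlick
{
    // Moves the total page offset to one extreme of its allowed range
    // along the dominant axis of the flick. The other axis is unchanged.
    public static void Apply(double horizontalVelocity, double verticalVelocity,
                             double totalScale, double width, double height,
                             ref double offsetX, ref double offsetY)
    {
        if (Math.Abs(horizontalVelocity) >= Math.Abs(verticalVelocity))
        {
            // Rightward flick: left edge flush. Leftward: right edge flush.
            offsetX = horizontalVelocity > 0 ? 0 : (1 - totalScale) * width;
        }
        else
        {
            // Downward flick: top edge flush. Upward: bottom edge flush.
            offsetY = verticalVelocity > 0 ? 0 : (1 - totalScale) * height;
        }
    }
}
```

Because the two target values are the same limits used for clamping, the page lands exactly at an edge and never exposes the area beyond it. Animating toward the target instead of jumping would give the momentum effect mentioned above.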

When the user presses a finger to the screen over a word for a second, moves that finger to extend the selection and then lifts the finger, the BookViewer control fires a TextSelected event. MainPage handles this event by displaying a little menu on top of the page, as shown in Figure 4.

Figure 4 A Zoomed Page with a Text-Selection Menu

The first item is not yet implemented. This feature will allow the user to type in a little note about the selected passage. These notes will be saved with document settings. The last item simply dismisses the menu. The “bing” item uses the Windows Phone 7 SearchTask class to invoke the phone’s standard Web search application. The other two items use the WebBrowserTask to invoke Internet Explorer with a URL that includes a query string with the text selection.
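Selected text usually contains spaces and sometimes punctuation, so it needs escaping before it goes into a query string. Here is a minimal sketch of that step. LookupUrl and Build are hypothetical names, and the example.com address is a placeholder rather than a URL the program uses.

```csharp
using System;

public static class LookupUrl
{
    // urlPrefix is everything up to and including the "=" of the query
    // parameter, for example "http://example.com/search?q=".
    public static string Build(string urlPrefix, string selectedText)
    {
        // EscapeDataString percent-encodes spaces, ampersands and other
        // characters that would otherwise corrupt the query string.
        return urlPrefix + Uri.EscapeDataString(selectedText.Trim());
    }
}
```

Calling LookupUrl.Build("http://example.com/search?q=", "Jarndyce and Jarndyce") produces "http://example.com/search?q=Jarndyce%20and%20Jarndyce", which is the kind of string that could then be handed to the browser task.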

For the dictionary, I originally wanted to use the Bing dictionary, but in practice it seemed rather erratic. Sometimes it didn’t seem like a dictionary at all. After some further exploration, I ended up with the Google dictionary, which provides extensive results that seemed more suitable for my needs.

Second Thoughts

I mentioned that the “notes” option on the text-selection menu is unimplemented, as well as the feature to insert bookmarks with flick gestures. That’s not all that’s not yet implemented! In the image in Figure 4, the first button on the application bar brings up the chapter list. That works. The other three buttons will eventually bring up a list of all bookmarks, a list of all notes and a dialog to search for text in the book, but not quite yet.

As I’ve been reading “Bleak House,” I’ve enjoyed looking up words or phrases in Bing, Wikipedia and the Google dictionary. It’s fairly simple to expand the page, select the word or phrase and then tap the menu item.

However, I can tell already that this is not the best approach for selecting text for the notes feature. Text selected for making a note is almost always at least a sentence in length. But once the page is expanded, part of that sentence will probably be off the screen, and there’s currently no way to pan the screen while making a selection.

I’m wondering now if text selection for the notes feature should be a little different. I’m wondering if a different text-selection scheme could automatically kick in if the page is not zoomed. I’m wondering if a hold gesture on an unzoomed page should select an entire sentence, and then dragging should extend the selection to other whole sentences.
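As a rough illustration of what sentence-level selection might involve, the sketch below expands a touched word to the boundaries of its sentence. It is only a naive heuristic with hypothetical names (SentenceSelection, Expand): it treats any word ending in a period, exclamation point or question mark as the end of a sentence, so abbreviations such as “Mr.” would split sentences incorrectly.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceSelection
{
    // True if the word ends with sentence-final punctuation, ignoring
    // any closing quotes or parentheses that follow it.
    static bool EndsSentence(string word)
    {
        string trimmed = word.TrimEnd('"', '\'', ')', '\u201D', '\u2019');
        return trimmed.Length > 0 &&
               ".!?".IndexOf(trimmed[trimmed.Length - 1]) >= 0;
    }

    // Finds the first and last word indices of the sentence that
    // contains the word at the given index.
    public static void Expand(IList<string> words, int index,
                              out int first, out int last)
    {
        // Walk backward until the previous word ends a sentence.
        first = index;
        while (first > 0 && !EndsSentence(words[first - 1]))
            first--;

        // Walk forward until the current word ends a sentence.
        last = index;
        while (last < words.Count - 1 && !EndsSentence(words[last]))
            last++;
    }
}
```

Dragging to another word could then call Expand again for that word and take the union of the two ranges, which would extend the selection one whole sentence at a time.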

That’s the great thing about software: Nothing is ever fully committed. There are always opportunities for enhancements and improvements.


Charles Petzold is a longtime contributing editor to MSDN Magazine. His recent book, “Programming Windows Phone 7” (Microsoft Press, 2010), is available as a free download at bit.ly/cpebookpdf.

Thanks to the following technical expert for reviewing this article: Chipalo Street