Existing Text Input Methods

Often, as is the case with the new text input method given in chapter 4, a close look at the previously existing methods shows that the ideas are not so novel but rather slight variations of preceding work.

To give the reader an idea of the multitude of methods that have been developed over the years, an approximate map of the field is given in this chapter. Examples of some important text input methods are presented with a short discussion of their strengths and weaknesses.

The map takes the form of a tree, shown in figure 3.1. The tree has five main branches: keyboards, text recognition, unistrokes, speech recognition and gesture recognition. Each branch has a number of sub-branches, of which the most relevant ones are shown in figure 3.1.


  
Figure 3.1: Map of text input methods.

We intentionally avoid calling the map a taxonomy. The main reason for this is that we do not have unambiguous criteria for the classification. We wanted to give a special place to text recognition as a computationally expensive and algorithmically difficult method, and to unistrokes as a computationally simple alternative. The rest of the map is built around these two main branches so that it accommodates most of the methods that have been developed for small portable computing devices.

Keyboards

The most popular method for transferring text from the human mind into the memory of a computer is the keyboard. The keyboard gained popularity and developed a large body of skilled users well before computers invaded the offices. The QWERTY-layout, which was patented in 1868 and intended to be used in mechanical railroad ticket typewriters [Barber1997], is still the prevalent layout in our digital PCs. Some refinements have been introduced and the appearance of the keyboard varies widely, but QWERTY is still by far the most common arrangement of the alphabet keys. Competing arrangements have been suggested over the years, but they have not gained much popularity.

Today, as computing devices are getting ever smaller, there is an increasing interest in new, smaller keyboards. At the same time keyboards have shown yet more evidence of their attractiveness. Like many other things in our time, they have moved to virtual worlds. Of all possible text input methods, virtual keyboards displayed on a touch screen seem to be the best choice in many situations [MacKenzie et al.1994].

The range of different keyboards available today is so large that it is pointless to list all the different devices here. Only a few main branches of the keyboard family tree are discussed. The two main branches are the physical and the virtual keyboards. These branch again into sub-branches (see figure 3.1). The most significant physical keyboard types are traditional QWERTY-layouts, different optimized layouts with a full number of keys, intelligent keyboards with fewer keys, and chord keyboards. The virtual keyboard branch has the same sub-branches except for the chord keyboard branch.

Physical keyboards

A self-evident example of a traditional physical QWERTY-keyboard is the 104-key PC-keyboard. The most important optimized layout with a full number of keys (i.e. at least a key for every lowercase letter in English) is the Dvorak-layout. As examples of keyboards with fewer keys, the Half-QWERTY, FOCL, and SHK keyboards are discussed, followed by three ways of inputting text with the telephone keyboard. A good example of a chord keyboard has been described by Gopher and Raij [1988].

QWERTY

The physical characteristics and operation of the QWERTY-keyboard should be familiar to everyone. Because QWERTY-typing has been a major cost generator in organizations throughout the century, it has been studied carefully. We know that QWERTY-touch-typing is difficult to master, but with the proper learning effort it is a fast and accurate method for text input in controlled office-like surroundings.

The difficulties in learning QWERTY-touch-typing may have many explanations. Three very convincing reasons given by Gopher and Raij [1988] are:

On the other hand, once a typist has attained good skill in QWERTY-typing through persistent training, the speed is impressive. Having separate keys for the most common characters allows the typist to prepare several strokes in parallel by moving fingers to the correct locations. As soon as a key has been pressed, the finger that is left free can again be moved to a new location while typing continues elsewhere on the keyboard.

Optimized layouts

Because the QWERTY-layout was designed by a mechanical engineer mainly from the perspective of trying to make it mechanically sound, it is not optimal from the typist's point of view. Some have claimed that the QWERTY-layout was designed so that members of frequent digrams are scattered across the keyboard to avoid jamming the arms of a mechanical typewriter. Others point out that this cannot be true because the QWERTY-layout forces the typist to use more adjacent keys than a random layout [Barber1997]. Sholes, Glidden and Soulé (the inventors of QWERTY) would have been pretty lousy engineers if QWERTY were their best effort at avoiding closely located common digrams. Regardless of the motivations and skill levels of the 19th century engineers, we can today agree that the keys can be arranged in a way that requires far less finger travel and distributes work better between both hands than QWERTY does. The most famous of the newer layouts optimized for ten-finger typing speed is the Dvorak keyboard shown in figure 3.2.


  
Figure 3.2: The Dvorak keyboard layout [Huerta1999].

The improvement in speed over QWERTY is not as great as one might expect. The main reason for this is the parallel nature of typing that was described above. Finger travel is not such a great problem, because it can occur in parallel with other strokes. Better alternation between the hands may, however, help to achieve better parallelism. Significant differences in speed can also arise in sequences of characters pressed by the same finger and in the difficulty of typing repeated characters.

The Dvorak-layout does require less finger travel, and the most frequent strokes do not require as much finger acrobatics as they do with the QWERTY-layout. The more natural operating posture of the hands is probably responsible for the claims that the Dvorak keyboard causes fewer repetitive stress injuries than QWERTY.

Although the Dvorak-layout is not always clearly faster, it seems to be at least slightly faster. Much of the information available is claimed to be biased by the ``holy war'' on keyboard superiority that has been going on since the Dvorak layout was first published.

A significant problem with the optimized layouts is that their benefits are to an extent tied to the language or task that they were optimized for. Using an optimized layout for every language could cause too many problems to be a useful solution.

A universally optimized layout could be better, but even this solution may not be significantly better than QWERTY, because the advantage gained from the optimization may be on the order of just a few percent due to the counterbalancing effect of the different tasks and languages. For example, the left curly bracket ``{'' is very rare in English, but very common in C++.

Intelligent keyboards

As the number of keys decreases, some other measures must step in to keep the input rich enough. One feature that can be introduced is a selection of operating modes. Even on the standard PC keyboard we have Caps-Lock and Num-Lock modes which are frequently used to set the keyboard into a different operating mode. With very few keys the number of modes would grow to be larger than can be conveniently handled and therefore more complex algorithms are used.

The Half-QWERTY keyboard [Matias et al.1993] shown in figure 3.3 is an example of a keyboard where the number of keys has been halved, and the number of operating modes has been increased. The keyboard can be used with either the left or the right hand. Both arrangements are shown in figure 3.3. The missing side of the keyboard is mapped as a mirror image onto the existing side, and the space bar is used to select whether the existing half or the mirror image of the other half is active. Matias et al. demonstrated significant skill transfer from the regular QWERTY-keyboard in learning to type with Half-QWERTY. They estimate that with adequate practice typists will achieve up to 88% of their two-handed QWERTY-typing rate with Half-QWERTY.
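The mirror mapping can be illustrated with a minimal Python sketch. It assumes a left-hand keyboard, three simplified QWERTY rows of ten keys each, and that holding the space bar selects the mirrored half; the names `ROWS`, `MIRROR`, `half_qwerty_char` and `type_word` are hypothetical and are not taken from Matias et al.

```python
# Simplified QWERTY rows (assumption: ten keys per row, punctuation included).
ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]

# Map every key to the key at the mirrored position in the same row.
MIRROR = {}
for row in ROWS:
    for key, mirrored in zip(row, reversed(row)):
        MIRROR[key] = mirrored

# Keys that physically exist on the left-hand half.
LEFT_KEYS = {key for row in ROWS for key in row[:5]}

def half_qwerty_char(key, space_held):
    """Character produced by a physical left-half key, with or without space held."""
    return MIRROR[key] if space_held else key

def type_word(word):
    """List the (physical key, space_held) strokes needed to type a word."""
    strokes = []
    for ch in word:
        if ch in LEFT_KEYS:
            strokes.append((ch, False))
        else:
            strokes.append((MIRROR[ch], True))
    return strokes

print(type_word("the"))
```

In this sketch the letter h is typed by pressing g while the space bar is held, which is the kind of position-preserving mapping that plausibly supports skill transfer from two-handed typing.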


  
Figure 3.3: The Half-QWERTY keyboard [Matias et al.1996].

The Single Hand Key Card (SHK) is an eighteen key keyboard designed to be used with four fingers of one hand. The card is held between the thumb and the fingers with the base of the card resting against the palm. The key arrangement is shown in figure 3.4. Most of the keys are labeled with two characters. When a user wishes to input a word she presses the keys on which the desired characters are seen. When the key-sequence for the word is finished the user presses the ambiguity resolution key (AR). The SHK support software takes the key sequence and searches a dictionary for words that can be constructed using the characters imprinted on the pressed keys. [Sugimoto and Takahashi1996]


  
Figure 3.4: The SHK key arrangement [Sugimoto and Takahashi1996].

Sugimoto and Takahashi do not give precise data on user performance, but they expect speeds faster than 40 words per minute (wpm) to be possible. SHK is a system for inputting English text. It does not support other languages. The keyboard and the dictionary will have to be tailored for each language. As shown in figure 3.4 SHK has a small joystick with three switches for mouse-like input. Sugimoto and Takahashi clearly tried to make SHK a complete solution for mobile interfaces. Without further testing it is hard to say whether SHK would outperform a pen interface. At least it should allow for more eyes-free operation than most pen-interfaces.

The traditional way of inputting text with a mobile phone keyboard is that each key is mapped to several characters. Pressing the key once produces the first character, pressing it twice the second, and so on. Writing is a lot slower than with a full-sized keyboard. With a dictionary the method can be improved in at least two different ways. In the first, each key is mapped to several characters as before and characters are entered in exactly the same way. The dictionary is scanned for words whose beginning matches the string input so far. When the word becomes unambiguous, the rest of the word is inserted and typing proceeds from the end of the word. The second way is to do the dictionary search with an algorithm similar to the one used in the SHK support software. Each key is pressed only once and the dictionary is searched for words that match any sequence of characters mapped to the keys in question. Both of these speed-improved methods require a backup method for inputting words or character sequences not found in the dictionary.
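The difference between repeated presses and one-press-per-character dictionary lookup can be sketched in Python. The example assumes the usual letter groups of a telephone keypad and a five-word toy dictionary; the function names and the word list are illustrative only and do not describe any particular product.

```python
from collections import defaultdict

# Usual letter groups of a telephone keypad.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

def multitap_presses(word):
    """Key presses needed when each letter is reached by repeated presses."""
    return sum(KEYPAD[LETTER_TO_KEY[ch]].index(ch) + 1 for ch in word)

def key_sequence(word):
    """Key sequence when every key is pressed exactly once per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)

def build_index(words):
    """Group dictionary words by the key sequence that produces them."""
    index = defaultdict(list)
    for word in words:
        index[key_sequence(word)].append(word)
    return index

WORDS = ["good", "home", "gone", "hello", "text"]  # toy dictionary (assumption)
INDEX = build_index(WORDS)

print(multitap_presses("hello"), len(key_sequence("hello")))
print(INDEX["4663"])
```

The sketch also shows why a disambiguation step or a backup method is needed: several words can share one key sequence, and a word missing from the dictionary cannot be found at all.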

Chord keyboards

Adding operating modes to a keyboard can be done with a state that persists until the state-inducing key is pressed again. This is the way that the Caps-Lock and Num-Lock keys on a regular keyboard work. Another way is to tie the mode to the state of a given key, as is done with the Shift, Ctrl and Alt keys on the common QWERTY-keyboards. While the key is pressed, the keyboard is in a different mode than when the key is not pressed. Taking this a bit further gives us a keyboard where all keys change the mode of the keyboard. Now we can map a mode to a character and we have a chord keyboard where characters are input by pressing a combination of keys simultaneously. Doing this is not as difficult as working with the same number of ``sticky'' keys like Caps-Lock, because the user touches each active key and thus it is harder to forget the state of the keys. A chord keyboard with eight keys has 2^8 = 256 different key states, 255 of them with at least one key pressed, and can therefore cover practically any eight-bit character code. Further modifications of the basic idea may take into account the order in which the keys were pressed and give even richer input with fewer keys.

The intelligent keyboards discussed above have a more complex algorithm for setting the keyboard state. In intelligent keyboards the state of the keyboard is implicit because the computer handles it automatically. With chord keyboards the large set of states is not hidden. Instead, the user explicitly chooses the keyboard state by pressing several keys simultaneously.

In their 1988 article Gopher and Raij [1988] give some of their results on experiments with a two-hand chord keyboard. Their keyboard has two similar units of five keys and an extra shift-key for the thumb. The units are mirror images of each other and each is operated with one hand. The five keys allow 31 different combinations. Each shift key gives another 31 combinations. Thus even one hand can be used to input English lower and upper case letters and there still is room for several other characters.
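The chord counts quoted above follow from simple enumeration. The Python sketch below lists every non-empty set of simultaneously pressed keys; the helper name `chords` is hypothetical.

```python
from itertools import combinations

def chords(n_keys):
    """All non-empty sets of simultaneously pressed keys on an n-key unit."""
    keys = range(n_keys)
    return [c for size in range(1, n_keys + 1) for c in combinations(keys, size)]

five_key_chords = chords(5)
per_hand = 2 * len(five_key_chords)  # without and with the thumb shift key

print(len(five_key_chords))  # chords on one five-key unit
print(per_hand)              # chords available to one hand including shift
print(len(chords(8)))        # chords on an eight-key keyboard
```

A five-key unit gives 2^5 - 1 = 31 chords and 62 with the shift key, which is enough for the 52 upper and lower case English letters. An eight-key keyboard gives 255 chords; the figure of 256 is reached only if the state with no key pressed is counted as well.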

Gopher and Raij compared the rate of increase in typing speed in three test groups. The first group used a two-handed chord keyboard. The second group used a one-handed chord keyboard and the third group used a traditional keyboard with Hebrew layout.

They found that for the first 25 hours of practice the chord keyboards were clearly faster. After that the two-handed chord keyboard continued with a faster learning rate, and the one-handed chord keyboard was roughly equal to the traditional keyboard in learning speed. At the end of the 35-hour training, group one was typing at an average speed of 42 words per minute. Group two reached a speed of 36 wpm and the group with the traditional keyboard finished with a speed of 24 wpm. Clearly learning was faster with the chord keyboards.

The speeds for any of the groups had by no means peaked yet. Two members of the two-handed group continued up to 50 hours of practice and reached a speed of 51 wpm. One continued to 60 hours and finished with 59 wpm. However, Gopher and Raij suspect that the speed increase with the chord keyboards will level off sooner than with traditional keyboards. Thus, with a training period significantly longer than 35 hours, the traditional keyboard will start gaining in the speed comparison and eventually finish with superior speed. The reason for this is the possibility of parallel preparation of the strokes. Chord keyboards are more serial because many consecutive characters require the same keys and fingers to be used.

However, the training required for superior speeds with traditional keyboards may be too much for most users. Furthermore, the skill may be so complex that it requires constant formal training to keep. Therefore the chord keyboard could be better for casual typists. It gives good enough speed and is faster to learn. According to Gopher and Raij the chord keyboard does not exhibit negative transfer from existing typing skills because the cognitive structure of the keyboard and the typing activity is different and better suited for human capabilities.

Despite their good qualities, chord keyboards have not gained much popularity outside some special areas such as mail sorting or stenography. One reason for this is that in comparison the QWERTY-keyboard seems deceptively easy to use. Having a separate key for each character makes QWERTY look so easy that most people do not bother to train themselves in typing. Often they are not aware that their typing, despite the illusion of productivity, is in fact slow and laborious.

Virtual keyboards

Due to skill transfer benefits, the QWERTY-layout is popular amongst the virtual keyboards too. Layout optimization, however, is very different for virtual keyboards because touch screens often allow only one touch point at a time. The one touch point limitation slows down typing and therefore adds value to dictionary-based guessing schemes that may be included in the text input system.

QWERTY

Because text input is a complex undertaking and people are slow to learn, a familiar keyboard layout gives a lot better walk-up performance for virtual keyboards than even the most optimal unfamiliar layout. This is the main reason for the fact that when we walk to a touch screen based info kiosk or vending machine, the system used for text input is most likely a virtual keyboard with QWERTY-layout. Most people today have some familiarity with the QWERTY-layout and thus it is beneficial for the manufacturers to use QWERTY. Some users might even be annoyed if the machines presented them with a new "optimal" keyboard layout.

Optimized layouts

Most touch screens and digitizing tablets allow only one point of input at a time. Thus the user cannot keep many fingers on the keys of the virtual keyboard, as is customary in preparation for strokes with a physical keyboard. This limitation effectively reduces typing to a serial task. The parallelism that is essential for gaining great speeds with a physical keyboard does not exist, and typing speeds cannot reach the magnitude that we have observed for well-trained touch typists.

By killing the parallelism, the virtual keyboards give new value to finger-travel optimization. The difference in speed between an unoptimized layout such as QWERTY and a layout optimized for single-finger typing should be much greater with virtual keyboards than the difference between physical QWERTY and ten-finger optimized layout such as Dvorak.

OPTI is one of the optimized virtual keyboard layouts for the English language. Figure 3.5 shows the OPTI layout as described by MacKenzie and Zhang [1999]. The keyboard layout was optimized for speed using trial and error, Fitts' law, and character and digram frequencies in English. Fitts' law gives a function for computing the tapping time given the length of the movement needed and the width of the target, thus enabling a researcher to compute a prediction for the upper bound of user performance given the keyboard layout. Trial and error is needed to generate the keyboard layouts.
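The following Python sketch shows how such a prediction can be computed. It uses one common form of Fitts' law, MT = a + b log2(A/W + 1), where A is the movement length and W the target width. The coefficients `a` and `b`, the key coordinates and the digram counts are placeholder assumptions chosen for illustration, not the values used by MacKenzie and Zhang.

```python
import math

def fitts_time(distance, width, a=0.0, b=0.2):
    """Predicted seconds to tap a target (a and b are placeholder coefficients)."""
    return a + b * math.log2(distance / width + 1)

def mean_tap_time(layout, digram_freq, key_width=1.0):
    """Frequency-weighted mean movement time over all digrams of a layout.

    layout maps a character to the (x, y) centre of its key, in key widths.
    digram_freq maps a (first, second) character pair to its count.
    """
    total = sum(digram_freq.values())
    mean = 0.0
    for (first, second), count in digram_freq.items():
        (x1, y1), (x2, y2) = layout[first], layout[second]
        distance = math.hypot(x2 - x1, y2 - y1)
        mean += count / total * fitts_time(distance, key_width)
    return mean

# Two toy layouts: the frequent digram "th" on adjacent keys or far apart.
near = {"t": (0, 0), "h": (1, 0), "q": (5, 0)}
far = {"t": (0, 0), "h": (5, 0), "q": (1, 0)}
freq = {("t", "h"): 90, ("t", "q"): 10}  # made-up counts

print(mean_tap_time(near, freq) < mean_tap_time(far, freq))
```

Comparing candidate layouts by this weighted mean is what makes a trial and error search possible: each generated layout is scored, and the one with the lowest predicted time is kept.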


  
Figure 3.5: The OPTI keyboard [MacKenzie and Zhang1999].

According to the calculations of MacKenzie and Zhang [1999], the OPTI layout is theoretically 35% faster than QWERTY and 5% faster than FITALY [Isokoski1998,Textware Solutions1998], which is another one-finger ``optimized'' layout. In a longitudinal study described by MacKenzie and Zhang, the speed difference between OPTI and QWERTY seemed to exist in the real world too. With OPTI, the test group of five students already familiar with QWERTY reached their QWERTY tapping rate in just ten 22-minute sessions. At the end of the 20-session experiment, during which the subjects received an equal amount of training in both QWERTY and OPTI tapping, OPTI was clearly faster with an average speed of 45 wpm. With the QWERTY-layout the group reached a speed of 40 wpm. MacKenzie's and Zhang's test subjects were instructed to aim for both speed and accuracy. Emphasis on speed may have contributed to the error rate, which was over four percent for both keyboard layouts. The error rate with OPTI was consistently slightly lower than with QWERTY.

Intelligent keyboards

Fluctuating Optimal Character Layout (FOCL) is a text input method for five keys and a display. Four keys are used to move a cursor over an optimally arranged keyboard and the fifth key is pressed to select a character. Between characters the keyboard is rearranged to reflect the probabilities of the next character. The cursor is also returned to the upper left corner to be close to the most probable characters. Bellman and MacKenzie [1998] tested FOCL with 26 lower case characters and space using eleven test subjects. For comparison, a traditional QWERTY-arranged virtual keyboard was used with a similar selection technique except for returning the cursor to a home position. Both arrangements performed equally well, reaching a speed of 10 wpm by the tenth and last 15-minute test session.
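The rearrangement step can be sketched in Python as follows. The grid width, the tie-breaking order of the cells and the probability table are assumptions made for this illustration; the layout rules of the actual FOCL implementation may differ.

```python
def focl_layout(probabilities, columns=6):
    """Place characters so that the most probable lie nearest the top-left cell."""
    rows = -(-len(probabilities) // columns)  # ceiling division
    cells = sorted(((r, c) for r in range(rows) for c in range(columns)),
                   key=lambda cell: (cell[0] + cell[1], cell))
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return dict(zip(ranked, cells))

def presses(layout, ch):
    """Cursor moves from the home corner plus one select press."""
    row, col = layout[ch]
    return row + col + 1

# Made-up probabilities of the next character after some context.
next_char = {"e": 0.5, "a": 0.3, "o": 0.15, "z": 0.05}
layout = focl_layout(next_char, columns=2)

expected = sum(p * presses(layout, ch) for ch, p in next_char.items())
print(layout)
print(expected)
```

The expected number of key presses per character is the quantity that this kind of layout tries to minimize. The price is that the user has to search the display after every character, because the key positions keep changing.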

The emphasis in FOCL is on minimizing the key presses that are needed for operating the virtual keyboard. POBox is a system for doing a similar thing using a pen interface. The keyboard layout is static, but on each key press a menu with a fluctuating layout is shown. The menu holds the most probable completions for the current word. A POBox keyboard with an active menu after pressing the f-key is shown in figure 3.6. When the user lifts the pen after pressing a key, the menu moves below the keyboard. [Masui1998]
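A completion menu of this kind can be sketched in a few lines of Python. The ranking by plain word frequency, the menu size and the word counts are assumptions for illustration; they are not taken from the description of POBox by Masui.

```python
def completions(prefix, word_freq, menu_size=3):
    """Most frequent dictionary words that start with the typed prefix."""
    matches = [word for word in word_freq if word.startswith(prefix)]
    matches.sort(key=lambda word: (-word_freq[word], word))
    return matches[:menu_size]

# Made-up word counts standing in for a real dictionary.
WORD_FREQ = {"for": 90, "from": 80, "first": 60, "find": 40, "the": 100}

print(completions("f", WORD_FREQ))
print(completions("fi", WORD_FREQ))
```

Each additional key press narrows the candidate set, so the user can stop typing as soon as the wanted word appears in the menu.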


  
Figure 3.6: The POBox input area [Masui1998].

In addition to English, the POBox has been implemented for the Japanese language. In Japanese operation the keyboard consists of the 46 characters of the Hiragana phoneme alphabet. Thus ideally the user types in the beginning of the pronunciation of the word and then chooses the written form of the word (a mixture of Kanji and kana characters) from the menu as soon as it appears there. While the English writing mode may not seem very inviting, Masui claims that the Japanese mode is very useful because Japanese with its numerous characters and ambiguous pronunciations makes typing a slow and difficult task even with a physical keyboard. [Masui1998]

The problem with many intelligent keyboards, physical and virtual alike, is that they rely on visual information. Thus they cannot in general be used in an eyes-free manner. Often this is not a problem, but when eyes-free operation is needed (as is often the case in note taking), even the best method that requires visual attention becomes annoyingly clumsy.

Text recognition

A huge effort has been put into teaching the computer to recognize different character systems that have developed over the years for human use. The effort has not been unsuccessful. Today Optical Character Recognition (OCR) technology is on a level that suffices for transferring printed text from paper to digital form with only occasional need for corrections.

This is not the case with handwritten text. The cursive script produced by most English speakers is next to incomprehensible to a computer. In general, the writing systems developed for human use are more or less problematic for algorithmic recognition. However, there are writing systems that have been developed for computer use. The most familiar of these is the bar code, which is routinely read by computers practically without errors.

Human readable characters

Text recognition for computers is very different depending on whether only the end result is available or whether the computer can observe the act of writing. The kind of text recognition that OCR software does, where only the visual appearance of the text is available, is called off-line recognition. When the computer records data as writing happens we are talking about on-line recognition. In on-line recognition we can record data like pen pressure and speed. These may contain important information that can be used to make the recognition more accurate.

There is not much objective test data available on the performance of different handwriting recognition packages. Much of the data that the manufacturers have available is marketing oriented and undoubtedly shows the software in a somewhat better light than is believable. The manufacturers may have achieved the results in a laboratory with the world's best specialists teaching both the algorithm and the user, but in the real world the accuracy may prove to be worse. For this reason some of the accuracy figures listed by Tappert et al. [1994] probably belong to the marketing-oriented category of results. Furthermore, the best accuracy figures are typically reached by training the recognition algorithms individually for each writer, in addition to requiring the writer to write each character in a separate box using a particular writing style.

MacKenzie et al. [1994] tested the handwriting recognizer distributed with Microsoft's Pen Windows. They used non-connected lower case alphabet without user specific training. The result was an entry rate of 16.3 words per minute with 8.1% error rate.

Even the most accurate handwriting recognizer is still bound by the limitations of the traditional character sets. Top handwriting speeds for most people tend to be around 33 words per minute [MacKenzie and Zhang1997,Karlsdottir1997]. That is about half of the speed that many people can reach with touch typing on a keyboard [Card et al.1983]. Thus it is questionable whether traditional handwriting should be used for text input even if an error free recognition system could be developed.

Machine readable characters

Because human writing is difficult for computers to read, the computers need their own writing systems. The one that most of us see every day is the bar code system. Computers write bar codes on paper and glue these labels on various items to be read again by other computers.

Figure 3.7 shows examples of the two main types of bar codes in use today. UPC/EAN codes on the left are the familiar 1-dimensional bar codes that most manufacturers use for product identification in Europe and North America. Datamatrix and PDF417 shown on the right are 2-dimensional bar codes that allow much higher data density.


  
Figure 3.7: 1- and 2-dimensional bar-codes [Adams1999].

As one probably noticed while looking at figure 3.7, bar codes are not easy for humans to read. The coding is based on the size and precise location of the elements. These are variables that a machine can easily measure, but which are almost impossible for humans to see with the required precision. The data in the codes may also be compressed and it may contain error correction information which further confuses human readers.

Unistrokes

One way to overcome some of the limitations of the Latin alphabet is to design a new alphabet. Goldberg and Richardson [1993] point to shorthand systems as a source of valuable information for designing a new text input alphabet. The lesson learned is that for a trained writer one stroke, even a complex one, is faster than several simpler ones. For a one-character-per-stroke system, a touch-sensitive writing tablet offers the additional bonus of explicit segmentation between characters. Armed with this information, Goldberg and Richardson set out to design the Unistroke alphabet, which can be seen as a forefather of a whole family of unistroke text input methods that have appeared lately.

Today some unistroke text input methods have moved beyond the original one character per stroke rule. Therefore, the systems will be discussed in two groups: character-level unistroke systems and word-level unistroke systems.

Character-level

The design of character-level unistroke characters is a compromise between stroke simplicity and easy learnability. If we want easy-to-learn strokes, we should use ones that strongly resemble regular handwriting. If, on the other hand, we want to emphasize simple mechanical and algorithmic efficiency, we should use the simplest strokes that are recognizable by our recognition algorithm. T-Cube has the simplest strokes and the longest learning time for eyes-free operation. Graffiti is slow to use, but very fast to learn [MacKenzie and Zhang1997]. Unistrokes are somewhere in between.

Unistroke

Goldberg and Richardson [1993] claim that traditional handwriting recognition is like ``hunt-and-peck'' typing and propose that a touch-typing-like alternative should be available for expert users. Their candidate for this touch-typing method is the Unistroke alphabet shown in figure 3.8. As seen in figure 3.8, the more common characters such as e, a, t, i and r are mapped to the simplest possible strokes and the less frequent ones are a little slower to draw. Goldberg and Richardson report writing speeds up to twice as fast as others [MacKenzie et al.1994] have measured for handwriting recognition with the regular alphabet.


  
Figure 3.8: Unistroke (a) and Graffiti (b) unistroke alphabet [MacKenzie and Zhang1997].

Graffiti

Graffiti is a handwriting alphabet developed by Palm Computing for their PDA product series known as Palm Pilots. From the user's point of view Graffiti resembles unistrokes in the sense that characters are generally drawn in one stroke. Strictly speaking, Graffiti does not completely follow the unistroke idea of one stroke per character. For example, x can be drawn as two separate strokes and characters like ä and ö must be drawn in two parts.

As seen in figure 3.8, Graffiti strokes are more complex than Unistroke strokes. The added complexity makes the recognition algorithm more challenging to program and potentially slower and more inaccurate. The work of the recognition algorithm is even more difficult because the Graffiti alphabet resembles the regular Latin alphabet very closely. Some of the problems with ambiguous Latin script have been avoided. For example, ambiguity between 0 and o is avoided by writing numbers in a different area. However, some characters remain very close to each other. For example, H and L are easily mixed up if the horizontal part of L is even slightly curved.

Despite the obvious problems with speed and ambiguity the Pilots and Graffiti along with them have been a commercial success. This should always be noted before claiming Graffiti to be inferior to other methods. In other words, although Graffiti is not perfect it works well enough and shows that users may value easy learnability over eventual expert performance.

   
T-Cube

Marking menus are a system where instead of drawing the menubar at the top of the window a radial (pie) menu can be used on top of the working area of the window. The menu is activated by clicking and holding for a short period of time (0.3 s). The selection is done by moving the pen (or mouse or whatever the pointing device in use happens to be) over the desired menu item and lifting the pen (or mouse button). When multi-level menus are used the sub-menu is displayed when the user moves over the item that has the sub-menu. Thus choosing an item from a multi-level menu is a continuous stroke with sharp corners between the levels of the menu. Users can learn to draw these strokes without seeing the menu and achieve significantly better speeds than with traditional menus. [Kurtenbach and Buxton1993]

Marking menus can have a varying number of items per level and varying depths. Kurtenbach and Buxton [1993] measured user performance using menus with breadths of 4, 8 and 12 combined with depths from 1 to 4. They found that the response time of the user increases linearly with increasing depth. The response time is also greater when using menus with more items per level. In terms of speed it does not matter much whether we use a menu with three levels of four items, two levels of eight items or two levels of twelve items. However, with depths greater than one the error rate increases dramatically when we move from four items per level to eight or twelve items per level.

The error rate for four levels of four items per level is around 5% while the error rate for one level of eight items per level is already in the same range. With more than one level and 8 or 12 items per level the error rate grows quickly above 10%.

Kurtenbach and Buxton also compared mouse and pen as input devices. Their results show that mouse and pen were roughly equal in speed and error rate when using four items per level with mouse being slightly slower and more error prone. With more items per level the mouse was clearly inferior.

All in all, it seems that to achieve the speed and accuracy needed for text input one should use no more than four items per menu level and use a pen if possible.

We have not encountered reports on using marking menus directly for text input. However, a slightly modified implementation exists. This system is called T-Cube and is shown in figure 3.9. The difference from marking menus is that the initial menu is always shown. Input is initiated by touching one of the slices of the radial menu. The touch causes the second level of the menu to be displayed offset so that it is never hidden by the user's hand on a touch-screen. A character is chosen by moving the pen in the direction of the desired slice in the second level menu. The second level menu is displayed just to remind the writer of its layout and the proper direction for the stroke. The stroke is drawn starting from the chosen slice of the first level menu in the direction shown in the second level menu. T-Cube uses 9 items in the first level menu and 8 in the second. Each first level slice thus offers nine choices, one tap and eight flicks, and the user can choose from 81 items by tapping on a slice or performing a simple flick gesture. [Venolia and Neiberg1994]
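The count of 81 items follows from nine first level slices, each accepting either a tap or one of eight flick directions. The sketch below enumerates this gesture space. The numbering of slices, flick directions and items is hypothetical and is not taken from Venolia and Neiberg.

```python
FIRST_LEVEL_SLICES = 9  # slices in the always-visible first level menu
FLICK_DIRECTIONS = 8    # slices in the second level menu

# One tap plus one flick per second level direction.
GESTURES_PER_SLICE = 1 + FLICK_DIRECTIONS


def item_index(first_slice, flick=None):
    """Return a unique item number for a tap (flick=None) or a flick (0..7).

    The numbering scheme is an assumption made for this sketch only.
    """
    if not 0 <= first_slice < FIRST_LEVEL_SLICES:
        raise ValueError("first_slice must be in 0..8")
    if flick is None:
        offset = 0                       # a plain tap
    elif 0 <= flick < FLICK_DIRECTIONS:
        offset = 1 + flick               # a flick in one of eight directions
    else:
        raise ValueError("flick must be None or in 0..7")
    return first_slice * GESTURES_PER_SLICE + offset


print(FIRST_LEVEL_SLICES * GESTURES_PER_SLICE)  # 81
```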

  
Figure 3.9: T-Cube [Venolia and Neiberg1994].

The input used in T-Cube is very close to the minimum. One cannot make pen input much simpler than a tap or a flick. However, T-Cube suffers from the limitations of virtual keyboards. It does not allow eyes-free operation because choosing accurately from the 9 initial slices is very difficult without seeing the menu.

The learning path of T-Cube is also keyboard-like. A novice can use ``hunt-and-peck'' like operation by looking for the desired character in the menus and then choosing it. An expert user will learn to choose from the second level menu without looking and potentially achieve very high speeds. The results of Venolia and Neiberg [Venolia and Neiberg1994] show that learning to be an expert user with T-Cube is a long undertaking. None of their 11 subjects showed clear signs of leveling off in speed improvement during the course of four to nine learning sessions. The fastest user reached a speed of 20 wpm while the average was around 16 wpm. These speeds are already comparable to hand printing.

Unfortunately, Venolia and Neiberg do not give figures for error rate. We can assume that the error rate is comparable to what others have found on 8-item pie menus [Kurtenbach and Buxton1993,Balakrishnan and Patel1998]. That means that the T-Cube error rate is within or close to the range of 5% to 7.3%.

Word-level

While character-level unistrokes make the design of handwriting recognizers a lot easier with their explicit segmentation and better differentiation of character forms, they also make the writer lift the pen between every character. Undoubtedly writing could be more efficient if the pen could be kept down all the time, as most of us do when we write fast using our own variations of cursive Latin script.

Two new systems allowing words or even whole texts to be written with a single gesture were introduced in 1998. The first one called Cirrin [Mankoff and Abowd1998], while probably usable, does not offer great improvements in speed or accuracy, but the second one, Quikwriting [Perlin1998], is very promising.

Cirrin

Figure 3.10 shows the input area of the Cirrin text input method [Mankoff and Abowd1998]. Cirrin is meant to be used with a stylus. One puts the stylus down inside the ring and then moves it over the areas labeled with the characters. Input is generated from the coordinates of the points where the pen crosses the circumference of the inner circle.
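The decoding step can be sketched as follows. This is an illustration under several assumptions that do not come from Mankoff and Abowd: the letters are placed in alphabetical order on 26 equal arcs (the real Cirrin layout differs), only crossings from the inside to the outside of the circle produce input, and the first sample outside the circle stands in for the exact crossing point. All names are ours.

```python
import math
import string

# Hypothetical layout: 26 equal arcs in alphabetical order, starting at angle 0.
LETTERS = string.ascii_lowercase


def letter_at(x, y, cx=0.0, cy=0.0):
    """Return the letter whose arc contains the direction from the centre to (x, y)."""
    angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
    arc = 2 * math.pi / len(LETTERS)
    return LETTERS[int(angle // arc) % len(LETTERS)]


def decode_trace(points, radius, cx=0.0, cy=0.0):
    """Emit one letter each time the pen trace leaves the inner circle."""
    text = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        was_inside = math.hypot(x0 - cx, y0 - cy) < radius
        is_inside = math.hypot(x1 - cx, y1 - cy) < radius
        if was_inside and not is_inside:
            # Approximate the crossing point by the first sample outside.
            text.append(letter_at(x1, y1, cx, cy))
    return "".join(text)
```

With this layout, a trace that leaves the circle just above the positive x axis, returns to the centre, and leaves again at an angle of 0.6 radians decodes to "ac".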


  
Figure 3.10: Cirrin [Mankoff and Abowd1998].

Only the lowercase letters are shown along the circumference in figure 3.10. Putting all 128 characters of the ASCII character set or the thousands in Unicode is clearly not an option. The slices would become too narrow to hit with the pen. The authors suggest using the non-dominant hand to input the characters not found in the Cirrin input area. The device operated with the non-dominant hand may be chosen freely. Mankoff and Abowd have used a regular QWERTY-keyboard. They also discuss using a regular handwriting recognizer for the task. In a mobile setting a QWERTY-keyboard is not an option. A handwriting recognizer might be workable, but that raises the question of whether Cirrin offers enough benefits to be worth the screen real estate if we will have a handwriting recognizer anyway.

While we have found no thorough test reports on Cirrin, it seems that the accuracy could be good enough for text input. The speed may prove to be a bottleneck since visual feedback is needed in order to hit the relatively small areas with the pen. Cirrin is very dependent on the pen-interface and offers no eyes-free operation.

Quikwriting

Quikwriting is a text input method for stylus-based user interfaces. Figure 3.11 shows two Quikwriting input areas with a stylus trace. The trace on the left shows the gesture needed for inputting an f. The trace on the right shows a gesture that results in the word ``the'' being input.


  
Figure 3.11: Quikwriting [Perlin1998].

The method works by dividing the input area into 9 zones in a 3x3 grid formation. The zones are numbered 1 through 9. The central zone (5) is the ``home'' zone in which every stroke starts and ends. Although the zones can be thought of as forming a grid, they are not necessarily rectangular. In fact the most functional layout is likely to resemble the Cirrin layout shown in figure 3.10 with only eight slices in the ring. A character is drawn as a loop that crosses one or several of the zones surrounding the home zone. Quikwriting recognizes the character from a sequence of zone changes. The changes that are taken into account are the ones from or to the home zone. For example the f in figure 3.11 is chosen because the pen leaves zone 5 for zone 3 and returns to zone 5 from zone 2.
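The recognition rule described above can be sketched in a few lines. This is an illustration and not Perlin's implementation: it assumes that the pen trace has already been converted into a sequence of zone numbers, and the lookup table is hypothetical except for the single entry for f, which is the only assignment given in the text. Unknown loops are reported as "?".

```python
HOME = 5

# Key: (zone entered when leaving home, zone from which the pen returns home).
# Only the entry for "f" is taken from the text; a real table would hold
# one entry per character.
CHARACTERS = {(3, 2): "f"}


def decode_zones(zones):
    """Turn a sequence of zone numbers into characters, one per loop from home to home."""
    text = []
    first = None  # zone entered on leaving home, or None while resting at home
    for prev, cur in zip(zones, zones[1:]):
        if prev == cur:
            continue                      # still in the same zone
        if prev == HOME:
            first = cur                   # the loop starts here
        if cur == HOME and first is not None:
            text.append(CHARACTERS.get((first, prev), "?"))
            first = None                  # back at home, the loop is complete
    return text


print(decode_zones([5, 5, 3, 3, 2, 2, 5]))  # ['f']
```

Zones crossed in the middle of a loop are ignored, which matches the statement that only the changes from or to the home zone are taken into account.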

Because the space is limited on the Quikwriting input area, not all characters can be input using this simple method. The method chosen for expanding the input space is to have different modes of the input area. The implementation that Perlin describes [Perlin1998] has four modes between which the user can switch by inputting a ``shift'' character. One shift character causes the next character to be chosen from the character set associated with the shift character in question. Two consecutive shift characters ``lock'' the mode so that several characters in sequence can be chosen from the character set associated with the ``locked'' shift. The shift characters are shown in figure 3.11 as up-arrow, rectangle and circle.
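The shift and lock behaviour amounts to a small state machine, sketched below. The names of the shifts are ours. The text does not say how a locked mode is released, so the sketch assumes, purely for illustration, that entering the same shift once more while locked returns to the default mode.

```python
DEFAULT = "default"
SHIFTS = ("up-arrow", "rectangle", "circle")  # hypothetical names for the three shifts


def modes_for(tokens):
    """Pair every non-shift token with the mode in effect when it was entered."""
    mode = DEFAULT      # mode that applies to the next character
    locked = False      # True after two consecutive identical shifts
    prev_shift = None   # shift entered immediately before this token, if any
    result = []
    for tok in tokens:
        if tok in SHIFTS:
            if locked and tok == mode:
                mode, locked = DEFAULT, False   # assumed way to release a lock
                prev_shift = None
            elif prev_shift == tok:
                locked = True                   # second shift in a row locks the mode
                prev_shift = None
            else:
                mode, locked = tok, False       # single shift, applies to one character
                prev_shift = tok
        else:
            result.append((tok, mode))
            if not locked:
                mode = DEFAULT                  # a single shift is used up
            prev_shift = None
    return result


print(modes_for(["a", "circle", "b", "c"]))
# [('a', 'default'), ('b', 'circle'), ('c', 'default')]
```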

Proper test data on the speed and accuracy of Quikwriting is thus far unavailable. In the original paper Perlin claims that users have achieved speeds up to three times as fast as with Graffiti while maintaining a comparable error rate [Perlin1998]. However, in a recent posting to the quickwrite-talk mailing list Perlin announced new anecdotal information which suggests that Quikwriting and Graffiti are equally fast, but that Quikwriting is more comfortable to use [Perlin1999]. Quikwriting is a very promising system for pen interfaces because it guides the user with a visual interface and allows relatively high speeds for expert users. Whether Quikwriting is usable in other than pen interfaces remains to be seen. It may be possible for experienced users to draw Quikwriting gestures without visual feedback or with tactile feedback only.

Speech recognition

Producing a speech recognizer that is speaker independent and can recognize connected words with high enough accuracy in a suboptimal sound environment has proved to be a very difficult task. To be as good as a keyboard the speech recognizer should recognize all languages that can be written in the Latin alphabet. Currently such speech recognizers do not exist [Campbell1999]. However, the speech recognizer does not have to be perfect to be usable. Even a speaker-dependent recognizer with a vocabulary of ten words can be useful in a phone that dials the numbers that it hears. In a personal device such as a cell phone speaker dependence may be a good property. It prevents others from using the phone in a meaningful way. However, for other purposes such as public phones, light switches, locks and automated-teller machines speaker independence is a requirement that cannot be dropped.

However, to be a good general text input method speech recognition has to develop further. There still are languages that can be written with a keyboard, but do not have functional speech recognition software. Also, recognizers that are considered to be close to the state of the art today do not work well enough with all speakers [Barber1997].

The problems with speech recognizers are similar to the ones encountered in online handwriting recognition. Speech, like handwriting, does not have the kind of a structure that computers can easily handle. Speech has ambiguous parts and human speakers tend to be rather sloppy with pronunciation. Of course speakers can be taught to speak so that the recognizer can understand them, but if we are going to teach the users, why not teach them touch typing?

Gesture recognition

Gesture recognition is a term that covers a wide range of various techniques. For this study it is enough to make a distinction between two groups of gesture recognition systems. The first one is the rather traditional convention of using gestures drawn with a stylus on the sensing surface of a pen based user interface. These gestures are used to accomplish various tasks that might require several key presses on a keyboard or a sequence of menu selections when using a traditional windowed GUI. Thus, gestures do not directly compete with text input methods in pen interfaces. However, they can replace some text input in some situations. The second group of gesture recognition systems are those that track human gestures such as hand movements in three dimensional space using techniques like data-gloves and machine vision.

Pen interfaces

Traditional windowed user interfaces can be used with touch screens or digitizing tablets and styli. An interface used with a pen gives a more direct feeling than the same interface used with a mouse. The indirectness arising from the use of menus may also start to seem unnecessary. Many menu commands can be replaced with gestures that are drawn on top of the object that is being manipulated. Prototypes of such systems have been tested at least on spreadsheets [Burnett and Gottfried1998,Watson1993], word processing applications [Watson1993], and air traffic control workstations [Chatty and Lecoanet1996].

Gestures can partly replace text input in some situations. However, the gesture sets seem to be application specific. This means that gestures needed in word processing and spreadsheets may be different from each other, but most certainly are different from gestures needed in a graphics package. Gestures may be more intuitive to use than written commands, but they do cause cognitive load that is partly avoided by using menus that can easily be searched for the right command. Moreover, gestures cannot completely replace menus for giving commands or general text input methods for writing. Thus, it seems that gestures may be suitable for expert use, but a well known text input method is still needed for novice users.

3-D gestures

Three-dimensional gestures can be used in a way that is similar to the use of gestures in a pen interface. The three-dimensional gestures simply replace menu items or function keys. However, three-dimensional hand gestures can be used for text input too. Hearing impaired people have for a long time used sign language as a communications medium. With modern sensors such as data-gloves and video cameras we can measure the movements well enough to recognize them. Researchers at Hitachi Central Laboratory have reported some success in recognizing Japanese Sign Language using data-gloves [Sagawa et al.1997]. The vocabulary was very limited and their recognition algorithm was trained with data from the same signer that was used for testing the recognition accuracy. However, their results show that sign language can be computer recognizable at least in some situations.

Comparison of speeds and error rates

Because speed and error rate are the most important variables of a text input method, we should have some knowledge of the relative speeds of different methods. Table 3.1 contains a collection of speeds and error rates that were available in the papers listed in the references section. The unit used for the speed is words per minute (wpm). One word per minute equals five characters per minute in most of the sources cited in Table 3.1. One cannot be very specific about the wpm unit because in handwriting the space character does not need to be written whereas in typing the spacebar must be pressed. Not all papers reported whether the wpm figure includes spaces or not. Where speeds were given in different units they have been converted to wpm assuming that five characters per word includes spaces.
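The conversion used for the table is simple enough to state as code. The sketch assumes the five-characters-per-word convention described above, spaces included; the function names are ours.

```python
CHARS_PER_WORD = 5  # convention used in this chapter, spaces included


def cpm_to_wpm(chars_per_minute: float) -> float:
    """Convert characters per minute to words per minute."""
    return chars_per_minute / CHARS_PER_WORD


def cps_to_wpm(chars_per_second: float) -> float:
    """Convert characters per second to words per minute."""
    return chars_per_second * 60 / CHARS_PER_WORD


print(cpm_to_wpm(100))  # 20.0
print(cps_to_wpm(2.5))  # 30.0
```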


 
Table 3.1: Approximate speeds and error rates for some text input methods.
Method                             Speed (wpm)   Error rate   Source
Physical QWERTY                    64            3.5%         [Matias et al.1993]
Virtual OPTI                       45            4.18%        [MacKenzie and Zhang1999]
Two-Handed chord keyboard          42            <2%          [Gopher and Raij1988]
POBox                              41            -            [Masui1998]
Half-QWERTY                        40            7.4%         [Matias et al.1993]
Unistroke                          37            -            [Goldberg and Richardson1993]
Quikwriting                        28            -            [Perlin1999]
Graffiti                           28            <1%          [MacKenzie and Zhang1997]
Virtual QWERTY                     23            0.6%         [MacKenzie et al.1994]
Cirrin                             20            -            [Mankoff and Abowd1998]
T-Cube                             16            -            [Venolia and Neiberg1994]
Microsoft handwriting recognizer   16            1-20%        [MacKenzie et al.1994]
Virtual ABC keyboard               13            1.1%         [MacKenzie et al.1994]
FOCL                               10            -            [Bellman and MacKenzie1998]
 

The speeds and error rates are not directly comparable due to various reasons. The original authors may have used different procedures for measuring the speeds. Some of the procedures were aimed to find the maximum speed of a method while others concentrated on accuracy. Some studies were longitudinal and thus the test subjects were trained in using the methods. Other studies measured walk-up performance with no or very little training.

The speed for POBox is given as Japanese Kanji characters. Kanji is a word level pictogram character set and thus each character has roughly the same expressive power as an English word. Within the English language this would mean that one Kanji character per minute equals one wpm. In Table 3.1 the comparison between POBox and other methods is done across languages, but still the speed estimate for POBox is close enough given the imprecise nature of the whole table.

When a measured speed or an estimate for an average trained writer was given, it was used. In other cases we had to estimate the speed from the maximum and minimum speeds given. Most experiments did not continue long enough to get precise measurements for truly experienced writers. A good example of the huge variation caused by personal abilities is the range of physical QWERTY-typing speeds. Many people, like the author, consider themselves to be reasonably able typists while their typing speed is in the range of 40-60 wpm. Some professional typists reach speeds of 150 wpm and beyond, being about three times as fast. QWERTY-typing has possibly the widest range of speeds because the learning path is so long. Serial typing as performed with the virtual keyboards probably has a smaller variation of speeds because it offers a shorter learning path and the upper bound for the speed is limited by human motor capabilities. Multi-finger typing may be limited more by cognitive capabilities related to the parallelization of the finger movements than by the motor ability to speedily move one's fingers.

On the whole the numbers in Table 3.1 are not accurate, but do, however, give the approximate range in which the speeds vary. Furthermore while the speed order may not be correct for two methods very close in the table, two methods further apart are likely to be in the correct order. Also, based on the information in Table 3.1 we can conclude that in general over 40 wpm is good performance and below 20 is poor in comparison to other available methods.

Summary

With all the methods discussed above and the dozens that were not discussed it is hard to see which are the ones that have potential for developing into the writing methods of the future. Goldberg and Goodisman suggested in their article that we should not be satisfied if we can do the same things with a computer that we have done for millennia using pen and paper [Goldberg and Goodisman1991]. We should look for ways of doing new things and doing old things better instead of doing the same things in new ways that are not intrinsically any better than the old ways. Thus one criterion for serious evaluation of a writing method is whether it is significantly faster, easier or more reliable than the old ones. If it is not, it probably is not worth investing in.

Some systems have been tested more carefully than others. These well investigated ones include handwriting recognition, speech recognition, and traditional keyboards. The good and bad sides of these systems are mostly known. This may be part of the reason why the new and relatively unknown handwriting methods like Quikwriting and Unistrokes seem to be very promising. Undoubtedly careful investigation will show faults in them that have not been mentioned thus far. Some of these faults may be bad enough to render the methods useless.

Regardless of which system proves to be the best, the reality today is such that the technically superior system does not always prevail. Marketing, politics and prejudices may change the situation so that technically bad systems become standard.


poika@cs.uta.fi