To give the reader an idea of the multitude of methods that have been developed over the years, an approximate map of the field is given in this chapter. Examples of some important text input methods are presented with a short discussion of their strengths and weaknesses.
The map takes the form of a tree, seen in figure 3.1. The tree has five main branches: keyboards, text recognition, unistrokes, speech recognition and gesture recognition. Each branch has a number of sub-branches, of which the most relevant ones are shown in figure 3.1.
We intentionally avoid calling the map a taxonomy. The main reason for this is that we do not have unambiguous criteria for the classification. We wanted to give a special place to text recognition as a computationally expensive and algorithmically difficult method and to unistrokes as a computationally simple alternative. The rest of the map is built around these two main branches so that it accommodates most of the methods that have been developed for small portable computing devices.
Today, as computing devices are getting ever-smaller, there is an increasing interest in new, smaller, keyboards. Simultaneously keyboards have shown yet more evidence of their attractiveness. Like many other things in our time, they have moved to virtual worlds. Of all possible text input methods virtual keyboards displayed on a touch screen seem to be the best choice in many situations [MacKenzie et al.1994].
The range of different keyboards available today is so large that it is pointless to list all the different devices here. Only a few main branches of the keyboard family tree are discussed. The two main branches are the physical and the virtual keyboards. These branch again into sub-branches (see figure 3.1). The most significant physical keyboard types are traditional QWERTY layouts, different optimized layouts with a full number of keys, intelligent keyboards with fewer keys, and chord keyboards. The virtual keyboard branch has the same sub-branches except for the chord keyboard branch.
The difficulties in learning QWERTY touch-typing may have many explanations. Three very convincing reasons given by Gopher and Raij [1988] are:
The improvement in speed over QWERTY is not as great as one might expect. The main reason for this is the parallel nature of typing that was described above. The finger travel is not such a great problem, because it can occur in parallel. Better alternation between the hands may, however, help to achieve better parallelism. Significant speedup can also occur within the set of characters pressed by the same finger and in the difficulty of typing repeating characters.
The Dvorak layout does require less finger travel, and the most frequent strokes do not require as much finger acrobatics as they do with the QWERTY layout. The more natural operating posture of the hands is probably responsible for the claims that the Dvorak keyboard causes fewer repetitive stress injury cases than QWERTY.
Although the Dvorak-layout is not always clearly faster, it seems to be at least slightly faster. Much of the information available is claimed to be biased by the ``holy war'' on keyboard superiority that has been going on since the Dvorak layout was first published.
A significant problem with the optimized layouts is that their benefits are to an extent tied to the language or task that they were optimized for. Using an optimized layout for every language could cause too many problems to be a useful solution.
A universally optimized layout could be better, but even this solution may not be significantly better than QWERTY because the advantage gained from the optimization may be on the scale of just a few percent due to the counterbalancing effect of the different tasks and languages. For example the left curly bracket ``{'' is very rare in English, but very common in C++.
The Half-QWERTY keyboard [Matias et al.1993] shown in figure 3.3 is an example of a keyboard where the number of keys has been halved and the number of operating modes has been increased. The keyboard can be used with either the left or the right hand. Both arrangements are shown in figure 3.3. The spacebar is used to switch between the actual half of the keyboard and the mirror image of the other half. The missing side of the keyboard is mapped as a mirror image onto the existing side. Matias et al. demonstrated significant skill transfer from the regular QWERTY keyboard in learning to type with Half-QWERTY. They estimate that with adequate practice typists will achieve up to 88% of their two-handed QWERTY typing rate with Half-QWERTY.
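The mirror mapping can be illustrated with a small Python sketch. It assumes the left-hand arrangement, the three standard QWERTY letter rows, and a simple row-wise mirror in which the spacebar selects the mirrored half. The exact key assignment of the real Half-QWERTY product is not reproduced here, and the function name `half_qwerty_keystroke` is ours.

```python
# Standard QWERTY letter rows; each row is mirrored around its centre.
ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]

def half_qwerty_keystroke(ch):
    """Return (space_held, physical_key) for a left-hand half keyboard.

    Assumption: a character on the missing right half is typed by holding
    the spacebar and pressing its mirror-image key on the left half.
    """
    for row in ROWS:
        idx = row.find(ch)
        if idx == -1:
            continue
        if idx < len(row) // 2:           # key exists on the left half
            return (False, ch)
        mirror = row[len(row) - 1 - idx]  # mirror image on the left half
        return (True, mirror)
    raise ValueError(f"character {ch!r} is not on the letter rows")
```

For example, under these assumptions the letter j is produced by holding the spacebar and pressing the f key, so the finger that would type j on a full keyboard maps to the same finger of the other hand.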
The Single Hand Key Card (SHK) is an eighteen key keyboard designed to be used with four fingers of one hand. The card is held between the thumb and the fingers with the base of the card resting against the palm. The key arrangement is shown in figure 3.4. Most of the keys are labeled with two characters. When a user wishes to input a word she presses the keys on which the desired characters are seen. When the key-sequence for the word is finished the user presses the ambiguity resolution key (AR). The SHK support software takes the key sequence and searches a dictionary for words that can be constructed using the characters imprinted on the pressed keys. [Sugimoto and Takahashi1996]
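The dictionary search described above can be sketched in Python as follows. This is only an illustration of the idea, not the SHK support software: the pairing of characters on keys in `KEY_LABELS` is invented for the example and does not reproduce the layout in figure 3.4, and the word list is a toy one.

```python
from collections import defaultdict

# Hypothetical key labels: two characters per key, as on the SHK.
# The pairing is invented for illustration, not the real SHK layout.
KEY_LABELS = ["ab", "cd", "ef", "gh", "ij", "kl", "mn",
              "op", "qr", "st", "uv", "wx", "yz"]

CHAR_TO_KEY = {ch: idx for idx, label in enumerate(KEY_LABELS) for ch in label}

def key_sequence(word):
    """Translate a word into the sequence of keys that would be pressed."""
    return tuple(CHAR_TO_KEY[ch] for ch in word.lower())

def build_index(words):
    """Group dictionary words by their (ambiguous) key sequence."""
    index = defaultdict(list)
    for word in words:
        index[key_sequence(word)].append(word)
    return index

def resolve(index, keys):
    """Called when the AR key is pressed: return all candidate words."""
    return index.get(tuple(keys), [])
```

When several words share a key sequence, `resolve` returns all of them and the user would have to pick one. How the real system orders or presents such candidates is not covered by this sketch.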
Sugimoto and Takahashi do not give precise data on user performance, but they expect speeds faster than 40 words per minute (wpm) to be possible. SHK is a system for inputting English text. It does not support other languages. The keyboard and the dictionary will have to be tailored for each language. As shown in figure 3.4 SHK has a small joystick with three switches for mouse-like input. Sugimoto and Takahashi clearly tried to make SHK a complete solution for mobile interfaces. Without further testing it is hard to say whether SHK would outperform a pen interface. At least it should allow for more eyes-free operation than most pen-interfaces.
The traditional way of inputting text with a mobile phone keyboard is that each key is mapped to several characters. Pressing the key once produces the first character, pressing twice the second, and so on. The speed of writing is a lot slower than it is with a full-sized keyboard. With a dictionary the method can be improved in at least two different ways. First, each key is mapped to several characters as before and characters are entered in exactly the same way. The dictionary is scanned for words with a beginning that matches the string input so far. When the typing proceeds and the word becomes unambiguous, the rest of the word is inserted and typing proceeds from the end of the word. The second way is to do the dictionary search with an algorithm similar to the one used in the SHK support software. Each key is pressed only once and the dictionary is searched for words that match any sequence of characters that is mapped to the keys in question. Both of these speed-improved methods require a backup method for inputting words or character sequences not found in the dictionary.
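The traditional multi-press entry and the first dictionary-based improvement can be sketched in Python as follows. The letter groups in `KEYPAD` follow the common telephone keypad assignment. The function names and the word list are ours, and a real implementation would need the backup method mentioned above for words missing from the dictionary.

```python
# Common telephone keypad letter groups.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def multitap(presses):
    """Decode (key, count) pairs: pressing a key `count` times picks its count-th letter."""
    return "".join(KEYPAD[key][(count - 1) % len(KEYPAD[key])]
                   for key, count in presses)

def complete(prefix, dictionary):
    """Return the whole word once the typed prefix matches exactly one dictionary word."""
    matches = [word for word in dictionary if word.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None
```

With a toy dictionary containing hello, help and keyboard, the prefix he is still ambiguous, while hell can be completed to hello without typing the last character.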
The intelligent keyboards discussed above have a more complex algorithm for setting the keyboard state. In intelligent keyboards the state of the keyboard is implicit because the computer handles it automatically. With chord keyboards the large set of states is not hidden. Instead, the user explicitly chooses the keyboard state by pressing several keys simultaneously.
In their 1988 article Gopher and Raij [1988] give some of their results on experiments with a two-hand chord keyboard. Their keyboard has two similar units of five keys and an extra shift key for the thumb. The units are mirror images of each other and each is operated with one hand. The five keys allow 31 different combinations. Each shift key gives another 31 combinations. Thus even one hand can be used to input English lower and upper case letters and there is still room for several other characters.
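The count of 31 combinations follows from the fact that each of the five keys is either pressed or not, with the empty chord excluded: 2^5 - 1 = 31. The short Python sketch below enumerates the chords and checks the arithmetic. The function name `chords` is ours and nothing here reproduces the actual chord-to-character assignment used by Gopher and Raij.

```python
from itertools import combinations

def chords(n_keys):
    """List every non-empty set of simultaneously pressed keys."""
    keys = range(n_keys)
    return [combo
            for size in range(1, n_keys + 1)
            for combo in combinations(keys, size)]

one_hand = len(chords(5))          # 2**5 - 1 combinations without shift
with_shift = 2 * one_hand          # the shift key doubles the set
spare = with_shift - 2 * 26        # left over after lower and upper case letters
```

Under this count one hand has 62 chords available with the shift key, which leaves 10 chords for other characters after the 52 English letters.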
Gopher and Raij compared the rate of increase in typing speed in three test groups. The first group used a two-handed chord keyboard. The second group used a one-handed chord keyboard and the third group used a traditional keyboard with Hebrew layout.
They found that for the first 25 hours of practice the chord keyboards were clearly faster. After that the two-handed chord keyboard continued with a faster learning rate and the one-handed chord keyboard was roughly equal to the traditional keyboard in learning speed. At the end of the 35-hour training, group one was typing at an average speed of 42 words per minute. Group two reached a speed of 36 wpm and the group with the traditional keyboard finished with a speed of 24 wpm. Clearly learning was faster with the chord keyboards.
The speeds for any of the groups had by no means peaked yet. Two members of the two-handed group continued up to 50 hours of practice and reached a speed of 51 wpm. One continued to 60 hours and finished with 59 wpm. However, Gopher and Raij suspect that the speed increase with the chord keyboards will level off sooner than with traditional keyboards. Thus with a training period significantly longer than 35 hours the traditional keyboard will start gaining in the speed comparison, eventually finishing with superior speed. The reason for this is the possibility of parallel preparation of the strokes. Chord keyboards are more serial because many consecutive characters require the same keys and fingers to be used.
However, the training required for superior speeds with traditional keyboards may be too much for most users. Furthermore, the skill may be so complex that it requires constant formal training to keep. Therefore the chord keyboard could be better for casual typists. It gives good enough speed and is faster to learn. According to Gopher and Raij the chord keyboard does not exhibit negative transfer from existing typing skills because the cognitive structure of the keyboard and the typing activity is different and better suited for human capabilities.
Despite their good qualities chord keyboards have not gained much popularity outside some special areas such as mail sorting or stenography. One reason for this is that in comparison the QWERTY keyboard seems deceptively easy to use. Having a separate key for each character makes QWERTY look so easy that most people do not bother to train themselves in typing. Often they are not aware that their typing, despite the illusion of productivity, is in fact slow and laborious.
By killing the parallelism, the virtual keyboards give new value to finger-travel optimization. The difference in speed between an unoptimized layout such as QWERTY and a layout optimized for single-finger typing should be much greater with virtual keyboards than the difference between physical QWERTY and a ten-finger optimized layout such as Dvorak.
OPTI is one of the optimized virtual keyboard layouts for the English language. Figure 3.5 shows the OPTI layout as described by MacKenzie and Zhang [1999]. The keyboard layout was optimized for speed using trial and error, Fitts' law, and character and digram frequencies in English. Fitts' law gives a function for computing the tapping time given the length of the movement needed and the width of the target, thus enabling a researcher to compute a prediction for the upper bound of user performance given the keyboard layout. Trial and error is needed to generate the keyboard layouts.
According to the calculations of MacKenzie and Zhang [1999] the OPTI layout is theoretically 35% faster than QWERTY and 5% faster than FITALY [Isokoski1998,Textware Solutions1998], which is another one-finger ``optimized'' layout. In a longitudinal study described by MacKenzie and Zhang the speed difference between OPTI and QWERTY seemed to exist in the real world too. The test group of five students previously familiar with QWERTY reached their QWERTY tapping rate in just ten 22-minute sessions. At the end of the 20-session experiment, during which the subjects received an equal amount of training in both QWERTY and OPTI tapping, OPTI was clearly faster with an average speed of 45 wpm. With the QWERTY layout the group reached a speed of 40 wpm. MacKenzie and Zhang's test subjects were instructed to aim for both speed and accuracy. Emphasis on speed may have contributed to the error rate, which was over four percent for both keyboard layouts. The error rate with OPTI was consistently slightly lower than with QWERTY.
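The kind of prediction described above can be sketched in Python. The sketch assumes the commonly used form of Fitts' law, MT = a + b log2(D/W + 1), and weights the movement time of each key pair by its digram frequency. The coefficients `A` and `B`, the toy key positions and the toy frequencies are placeholders chosen for illustration. They are not the values used by MacKenzie and Zhang.

```python
import math

# Placeholder Fitts' law coefficients in seconds (assumed, not from the source).
A = 0.1
B = 0.15

def fitts_time(distance, width, a=A, b=B):
    """Movement time for a tap of given distance onto a target of given width."""
    return a + b * math.log2(distance / width + 1)

def predicted_wpm(key_pos, digram_freq, width=1.0, a=A, b=B):
    """Upper-bound speed estimate for one-finger tapping on a layout.

    key_pos maps a character to its (x, y) key centre;
    digram_freq maps (first, second) character pairs to counts.
    """
    total = sum(digram_freq.values())
    mean_time = 0.0
    for (first, second), count in digram_freq.items():
        (x1, y1), (x2, y2) = key_pos[first], key_pos[second]
        distance = math.hypot(x2 - x1, y2 - y1)
        mean_time += (count / total) * fitts_time(distance, width, a, b)
    chars_per_minute = 60.0 / mean_time
    return chars_per_minute / 5.0   # five characters per word
```

A trial-and-error search would evaluate `predicted_wpm` for many candidate layouts and keep the best one. Placing the keys of a frequent digram next to each other raises the estimate, which is the effect the optimization exploits.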
The emphasis on FOCL is to minimize the key presses that are needed for operating the virtual keyboard. POBox is a system for doing a similar thing using a pen interface. The keyboard layout is static, but on each key press a menu with fluctuating layout is shown. The menu holds the most probable completions for the current word. A POBox keyboard with an active menu after pressing the f-key is shown in figure 3.6. When the user lifts the pen after pressing a key the menu moves below the keyboard. [Masui1998]
In addition to English, the POBox has been implemented for the Japanese language. In Japanese operation the keyboard consists of the 46 characters of the Hiragana phoneme alphabet. Thus ideally the user types in the beginning of the pronunciation of the word and then chooses the written form of the word (a mixture of Kanji and kana characters) from the menu as soon as it appears there. While the English writing mode may not seem very inviting, Masui claims that the Japanese mode is very useful because Japanese with its numerous characters and ambiguous pronunciations makes typing a slow and difficult task even with a physical keyboard. [Masui1998]
The problem with many intelligent keyboards, physical and virtual alike, is that they rely on visual information. Thus they cannot in general be used in an eyes-free manner. Often this is not a problem, but when eyes-free operation is needed (as is often the case in note taking), even the best method that requires visual attention becomes annoyingly clumsy.
This is not the case with handwritten text. The cursive script produced by most English speakers is next to incomprehensible to a computer. In general the writing systems developed for human use are more or less problematic for algorithmic recognition. However, there are writing systems that have been developed for computer use. The most familiar of these is the bar code, which is routinely read by computers without any errors.
There is not much objective test data available on the performance of different handwriting recognition packages. Much of the data that the manufacturers have available is marketing oriented and undoubtedly shows the software in a slightly better light than is believable. The manufacturers may have achieved the results in a laboratory with the world's best specialists teaching both the algorithm and the user, but in the real world the accuracy may prove to be worse. For this reason some of the accuracy figures listed by Tappert et al. [1994] probably belong to the marketing-oriented category of results. Furthermore, the best accuracy figures are typically reached by training the recognition algorithms individually for each writer in addition to requiring the writer to write each character in separate boxes using a particular writing style.
MacKenzie et al. [1994] tested the handwriting recognizer distributed with Microsoft's Pen Windows. They used non-connected lower case alphabet without user specific training. The result was an entry rate of 16.3 words per minute with 8.1% error rate.
Even the most accurate handwriting recognizer is still bound by the limitations of the traditional character sets. Top handwriting speeds for most people tend to be around 33 words per minute [MacKenzie and Zhang1997,Karlsdottir1997]. That is about half of the speed that many people can reach with touch typing on a keyboard [Card et al.1983]. Thus it is questionable whether traditional handwriting should be used for text input even if an error free recognition system could be developed.
Figure 3.7 shows examples of the two main types of bar codes in use today. UPC/EAN codes on the left are the familiar 1-dimensional bar codes that most manufacturers use for product identification in Europe and North America. Datamatrix and PDF417 shown on the right are 2-dimensional bar codes that allow much higher data density.
As one probably noticed while looking at figure 3.7, bar codes are not easy for humans to read. The coding is based on the size and precise location of the elements. These are variables that a machine can easily measure, but which are almost impossible for humans to see with the required precision. The data in the codes may also be compressed and it may contain error correction information which further confuses human readers.
alphabet which can be seen as a forefather of a whole family of unistroke text input methods that have appeared lately.
Today some unistroke text input methods have moved beyond the original one character per stroke rule. Therefore, the systems will be discussed in two groups: character-level unistroke systems and word-level unistroke systems.
for their PDA product series known as Palm Pilots. From the user's point of view Graffiti resembles unistrokes in the sense that characters are generally drawn in one stroke. Strictly speaking Graffiti does not completely follow the unistroke idea of one stroke per character. For example x can be drawn as two separate strokes and characters like ä and ö must be drawn in two parts.
As seen in figure 3.8 Graffiti strokes are more complex than Unistroke strokes. The added complexity makes the recognition algorithm more challenging to program and potentially slower and less accurate. The work of the recognition algorithm is even more difficult because the Graffiti alphabet resembles the regular Latin alphabet very closely. Some of the problems with ambiguous Latin script have been avoided. For example ambiguity between 0 and o is avoided by writing numbers in a different area. However, some characters remain very close to each other. For example H and L are easily mixed up if the horizontal part of L is even slightly curved.
Despite the obvious problems with speed and ambiguity the Pilots and Graffiti along with them have been a commercial success. This should always be noted before claiming Graffiti to be inferior to other methods. In other words, although Graffiti is not perfect it works well enough and shows that users may value easy learnability over eventual expert performance.
Marking menus can have a varying number of items per level and varying depths. Kurtenbach and Buxton [1993] measured user performance using menus with breadths of 4, 8 and 12 combined with depths from 1 to 4. They found that the response time of the user increases linearly with increasing depth. The response time is also greater when using menus with more items per level. In terms of speed it does not matter much whether we use a menu with three levels of four items, two levels of eight items or two levels of twelve items. However, with depths greater than one the error rate increases dramatically when we move from four items per level to eight or twelve items per level.
The error rate for four levels of four items per level is around 5% while the error rate for one level of eight items per level is already in the same range. With more than one level and 8 or 12 items per level the error rate grows quickly above 10%.
Kurtenbach and Buxton also compared mouse and pen as input devices. Their results show that mouse and pen were roughly equal in speed and error rate when using four items per level, with the mouse being slightly slower and more error prone. With more items per level the mouse was clearly inferior.
All in all, it seems that to achieve the speed and accuracy needed for text input one should stay below four items per menu and use a pen if possible.
We have not encountered reports on using Marking menus directly for text input. However, a slightly modified implementation exists. This system is called T-Cube and is shown in figure 3.9. The difference from Marking menus is that the initial menu is always shown. Input is initiated by touching one of the slices of the radial menu. The touch causes the second level of the menu to be displayed offset so that it is never hidden by the user's hand on a touch-screen. A character is chosen by moving the pen in the direction of the desired slice in the second level menu. The second level menu is displayed just to remind the writer of its layout and the proper direction for the stroke. The stroke is drawn starting from the chosen slice of the first level menu in the direction shown in the second level menu. T-Cube uses 9 items in the first level menu and 8 in the second. Thus the user can choose from 81 items by tapping on a slice (9 items) or performing a simple flick gesture (9 x 8 = 72 items). [Venolia and Neiberg1994]
The input used in T-Cube is very close to the minimum. One cannot make pen input much simpler than a tap or a flick. However, T-Cube suffers from the limitations of virtual keyboards. It does not allow eyes-free operation because choosing accurately from the 9 initial slices is very difficult without seeing the menu.
The learning path of T-Cube is also keyboard-like. A novice can use ``hunt-and-peck'' like operation by looking for the desired character in the menus and then choosing it. An expert user will learn to choose from the second level menu without looking and potentially achieve very high speeds. The results of Venolia and Neiberg [1994] show that learning to be an expert user with T-Cube is a long undertaking. None of their 11 subjects showed clear signs of decay in speed improvement during the course of four to nine learning sessions. The fastest user reached a speed of 20 wpm while the average was around 16 wpm. These speeds are already comparable to hand printing.
Unfortunately Venolia and Neiberg do not give figures for error rate. We can assume that the error rate is comparable to what others have found on 8-item pie menus [Kurtenbach and Buxton1993,Balakrishnan and Patel1998]. That means that the T-Cube error rate is within or close to the range of 5% to 7.3%.
Two new systems allowing words or even whole texts to be written with a single gesture were introduced in 1998. The first one called Cirrin [Mankoff and Abowd1998], while probably usable, does not offer great improvements in speed or accuracy, but the second one, Quikwriting [Perlin1998], is very promising.
Only the lowercase letters are shown along the circumference in figure 3.10. Putting all 128 characters of the ASCII character set or the thousands in Unicode there is clearly not an option. The slices would become too narrow to hit with the pen. The authors suggest using the non-dominant hand to input the characters not found in the Cirrin input area. The device operated with the non-dominant hand may be chosen freely. Mankoff and Abowd have used a regular QWERTY keyboard. They also discuss using a regular handwriting recognizer for the task. In a mobile setting a QWERTY keyboard is not an option. A handwriting recognizer might be workable, but that raises the question of whether Cirrin offers enough benefits to be worth the screen real estate if we will have a handwriting recognizer anyway.
While we have found no thorough test reports on Cirrin, it seems that the accuracy could be good enough for text input. The speed may prove to be a bottleneck since visual feedback is needed in order to hit the relatively small areas with the pen. Cirrin is very dependent on the pen-interface and offers no eyes-free operation.
The method works by dividing the input area into 9 zones in a 3x3 grid formation. The zones are numbered 1 through 9. The central zone (5) is the ``home'' zone from which each stroke starts and in which each stroke ends. Although the zones can be thought to reside in a grid formation, they are not necessarily rectangular. In fact the most functional layout is likely to resemble the Cirrin layout shown in figure 3.10 with only eight slices in the ring. A character is drawn as a loop that crosses one or several of the zones surrounding the home zone. Quikwriting recognizes the character from a sequence of zone changes. The changes that are taken into account are the ones from or to the home zone. For example the f in figure 3.11 is chosen because the pen leaves from zone 5 to zone 3 and returns from zone 2.
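The recognition principle can be sketched in Python as follows. The sketch only looks at the zone entered when leaving the home zone and the zone left when returning to it, as described above. The character table is hypothetical except for the f entry, which is taken from the example in the text. The names `strokes` and `decode` are ours and this is not Perlin's implementation.

```python
HOME = 5

# (zone entered from home, zone left when returning home) -> character.
# Only the "f" entry comes from the text; the other is invented for illustration.
CHAR_TABLE = {
    (3, 2): "f",
    (1, 1): "a",   # hypothetical entry
}

def strokes(zone_trace):
    """Split a trace of visited zones into (exit_zone, return_zone) pairs."""
    pairs = []
    previous = HOME          # the pen is assumed to start in the home zone
    first_zone = None
    for zone in zone_trace:
        if previous == HOME and zone != HOME:
            first_zone = zone                      # pen leaves home
        elif previous != HOME and zone == HOME and first_zone is not None:
            pairs.append((first_zone, previous))   # pen returns home
            first_zone = None
        previous = zone
    return pairs

def decode(zone_trace):
    """Map each completed loop to a character, '?' if the loop is unknown."""
    return "".join(CHAR_TABLE.get(pair, "?") for pair in strokes(zone_trace))
```

Because only two zone changes per loop matter, the zones crossed in the middle of a loop do not affect the result, which suggests why the exact shape of the loop need not be drawn carefully.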
Because the space is limited on the Quikwriting input area, not all characters can be input using this simple method. The method chosen for expanding the input space is to have different modes of the input area. The implementation that Perlin [1998] describes has four modes between which the user can switch by inputting a ``shift'' character. One shift character causes the next character to be chosen from the character set associated with the shift character in question. Two consecutive shift characters ``lock'' the mode so that several characters in sequence can be chosen from the character set associated with the ``locked'' shift. The shift characters are shown in figure 3.11 as up-arrow, rectangle and circle.
Proper test data on the speed and accuracy of Quikwriting is thus far unavailable. In the original paper Perlin [1998] claims that users have achieved speeds up to three times as fast as with Graffiti while maintaining a comparable error rate. However, in a recent posting to the quickwrite-talk mailing list Perlin [1999] announced new anecdotal information which suggests that Quikwriting and Graffiti are equally fast, but that Quikwriting is more comfortable to use. Quikwriting is a very promising system for pen interfaces because it guides the user with a visual interface and allows relatively high speeds for expert users. Whether Quikwriting is usable in other than pen interfaces remains to be seen. It may be possible for experienced users to draw Quikwriting gestures without visual feedback or with tactile feedback only.
However, to be a good general text input method speech recognition has to develop further. There are still languages that can be written with a keyboard but do not have functional speech recognition software. Also, recognizers that are considered to be close to the state of the art today do not work well enough with all speakers [Barber1997].
The problems with speech recognizers are similar to the ones encountered in online handwriting recognition. Speech, like handwriting, does not have the kind of structure that computers can easily handle. Speech has ambiguous parts and human speakers tend to be rather sloppy with pronunciation. Of course speakers can be taught to speak so that the recognizer can understand them, but if we are going to teach the users, why not teach them touch typing?
Gestures can partly replace text input in some situations. However, the gesture sets seem to be application specific. This means that gestures needed in word processing and spreadsheets may be different from each other, but most certainly are different from gestures needed in a graphics package. Gestures may be more intuitive to use than written commands, but they do cause cognitive load that is partly avoided by using menus that can easily be searched for the right command. Moreover, gestures cannot completely replace menus for giving commands or general text input methods for writing. Thus, it seems that gestures may be suitable for expert use, but a well known text input method is still needed for novice users.
Because speed and error rate are the most important variables of a text input method, we should have some knowledge of the relative speeds of different methods. Table 3.1 contains a collection of speeds and error rates that were available in the papers listed in the references section. The unit used for the speed is words per minute (wpm). One word per minute equals five characters per minute in most of the sources cited in Table 3.1. One cannot be very specific about the wpm unit because in handwriting the space character does not need to be written whereas in typing the spacebar must be pressed. Not all papers reported whether the wpm figure includes spaces or not. Where speeds were given in different units they have been converted to wpm assuming that five characters per word includes spaces.
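The conversion used here is simple arithmetic, shown below as a short Python sketch. The function names are ours. The `count_spaces` switch only illustrates how much the choice of counting spaces can change a reported figure and does not correspond to any particular study.

```python
CHARS_PER_WORD = 5   # convention assumed in this chapter, spaces included

def cpm_to_wpm(chars_per_minute):
    """Convert characters per minute to words per minute."""
    return chars_per_minute / CHARS_PER_WORD

def wpm_from_sample(text, seconds, count_spaces=True):
    """Speed for a transcribed text entered in the given number of seconds."""
    chars = len(text) if count_spaces else len(text.replace(" ", ""))
    return cpm_to_wpm(chars / seconds * 60.0)
```

For a four-word phrase of 19 characters entered in six seconds the figure is 38 wpm with spaces counted and 32 wpm without, a difference large enough to matter when methods close to each other are compared.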
The speeds and error rates are not directly comparable due to various reasons. The original authors may have used different procedures for measuring the speeds. Some of the procedures were aimed to find the maximum speed of a method while others concentrated on accuracy. Some studies were longitudinal and thus the test subjects were trained in using the methods. Other studies measured walk-up performance with no or very little training.
The speed for POBox is given as Japanese Kanji characters. Kanji is a word-level pictogram character set and thus each character has roughly the same expressive power as an English word. Within the English language this would mean that one Kanji character per minute equals one wpm. In Table 3.1 the comparison between POBox and other methods is done across languages, but still the speed estimate for POBox is close enough given the imprecise nature of the whole table.
When a measured speed or an estimate for an average trained writer was given, it was used. In other cases we had to estimate the speed from the maximum and minimum speeds given. Most experiments did not continue long enough to get precise measurements for truly experienced writers. A good example of the huge variation caused by personal abilities is the range of physical QWERTY typing speeds. Many people, like the author, consider themselves to be reasonably able typists while their typing speed is in the range of 40-60 wpm. Some professional typists reach speeds of 150 wpm and beyond, roughly three times as fast. QWERTY typing has possibly the widest range of speeds because the learning path is so long. Serial typing as performed with the virtual keyboards probably has a smaller variation of speeds because it offers a shorter learning path and the upper bound for the speed is limited by human motor capabilities. Multi-finger typing may be limited more by cognitive capabilities related to the parallelization of the finger movements than by the motor ability to speedily move one's fingers.
On the whole the numbers in Table 3.1 are not accurate, but do, however, give the approximate range in which the speeds vary. Furthermore while the speed order may not be correct for two methods very close in the table, two methods further apart are likely to be in the correct order. Also, based on the information in Table 3.1 we can conclude that in general over 40 wpm is good performance and below 20 is poor in comparison to other available methods.
Some systems have been tested more carefully than others. These well investigated ones include handwriting recognition, speech recognition, and traditional keyboards. The good and bad sides of these systems are mostly known. This may be part of the reason why the new and relatively unknown handwriting methods like Quikwriting and Unistrokes seem to be very promising. Undoubtedly careful investigation will show faults in them that have not been mentioned thus far. Some of these faults may be bad enough to render the methods useless.
Regardless of which system proves to be the best, the reality today is such that the technically superior system does not always prevail. Marketing, politics and prejudices may change the situation so that technically bad systems become standard.