

   
A Minimal Device-Independent Text Input Method

In this chapter a new text input method called Minimal Device-Independent Text Input Method (MDTIM) is introduced. It draws heavily on some of the designs discussed above. At least Unistrokes, word level unistrokes, T-Cube and Marking Menus have had their impact on the design. However, the resulting combination is yet to be found by the author in existing literature. First the reasons for the word minimal in the name of the method are explained. Then a general tour of the method is given. The rest of the chapter will discuss some design decisions in detail concentrating on the implications of the unistroke capability, the choice of the code-tree, and input device specific details encountered in the implementation of the method.

Minimalism

Current computing systems have many different input devices. There is a multitude of keyboards, mice, touchpads, touch screens, pens, trackballs and joysticks, ranging all the way to brainwave analyzers and other rather exotic apparatus. This long list of devices makes one fear that, if used for text input, each of them would require a different method. Users could probably learn to use all the methods, but it would be a great burden for them. A burden to the user should always be considered a defeat for user interface design. Maybe one cannot design a method that works exactly the same way on all input devices. After all, the devices are different. However, skill transfer makes it beneficial to use a method that resembles in some critical respects another method already known to the user. A good example of the power of the transfer effect is the success of the QWERTY keyboard layout in virtual keyboards. Methods that allow for a significant skill transfer between different input devices are a desirable path of development. In the search for a common input method one should probably look at the common features of all input devices. If they have a rich enough set of common features, the next step is to find out whether this set of features can be used for text input. The other side of using only features common to all input devices is that the set of features may turn out to be rather small. However, following this idea will lead to a text input method that works on all input devices. To ensure that it will work on all future input devices as well, one should make only minimal assumptions about the capabilities of the input device. Thus the guiding principle in the design should be to make do with as little as possible, i.e. minimalism.

The method in general

Many input devices allow input of at least four separate entities. On a joystick these are the four principal directions. Accordingly the four entities used for input will henceforth be called directions regardless of the input device and the actual data that is fed to the computer. In the following discussion the directions are called North, East, South and West (or N, E, S and W) just like on a compass. On a keyboard any four keys can be used for inputting the directions. On all mouse-like two-dimensional pointing devices the mapping is also easily achieved. Thus we have a set of entities that can be input with many, if not all, input devices. What remains to be done in order to turn this observation into a text input method is the design of a usable coding of text with the four entities we have. Mathematically, an efficient coding would be something like a Huffman code, that is, a variable-length prefix code with a minimal average number of code symbols per character. We would construct the code using character frequencies computed from the expected input. The resulting code would guarantee that a minimal number of directions would be needed for average input. However, this straightforward mathematical approach needs some refinements in order to be usable. Firstly, if we want to maintain character-level unistroke capability, we cannot allow a direction to repeat consecutively. Thus NN, EE, SS and WW are illegal combinations if we wish to be able to draw a letter without lifting the pen (assuming an input device that deals with drawing and pens). Furthermore, if we wish to maintain multi-character unistroke capability, we cannot have any letter's code starting with the same direction that another letter's code ends with. That is, if we have a letter code NWS, we cannot have the letter code SNE or any other letter code that begins with S. A general multi-character unistroke capability would mean that in a pen-interface-like situation the user never needs to lift the stylus.
The price to be paid for general multi-character unistroke capability is thinning of the coding tree. With four directions we quickly find so many branches illegal that the letter codes grow much longer than they would without unistroke capability. Secondly, we may not want to keep the code strictly optimized for the number of directions needed for the expected text, since this may result in letter codes that cause the user either cognitive or motor difficulties. In other words, we may want to use the transfer effect in teaching the new alphabet, as was done with Unistrokes and Graffiti, or we may want to avoid direction combinations that are difficult to input using some input devices. Device-specific measurements of user performance do exist. Venolia and Neiberg measured speed differences in simple flick gestures using a pen interface [Venolia and Neiberg1994]. Goldberg and Richardson [Goldberg and Richardson1993] present some data on the speed differences in drawing the Unistroke alphabet using a similar pen interface. A mouse-mounted touchpad has also been used to measure user performance while inputting gestures very much like the MDTIM alphabet with a maximum length of two directions [Balakrishnan and Patel1998]. The results show significant differences in the difficulty of various gestures. It is, however, hard to tell how much of this is due to cognitive and how much is due to motor difficulties. Furthermore, it remains unclear how much of the motor difficulties depend on the input device used. MDTIM should be device-independent, and thus design decisions cannot be based on experiences with only one or a few devices. A thorough psychomotor study involving a large number of input devices might reveal whether the differences found in device-specific studies can be generalized.
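
The three constraints discussed above (prefix property, no repeated direction inside a character, and the strict multi-character unistroke rule) can be checked mechanically. The following is a minimal sketch, not the code used in this study; the function names are hypothetical, and the strict word-level check is assumed to cover a letter followed by itself as well.

```python
def has_repeat(code: str) -> bool:
    """True if some direction occurs twice in a row, e.g. 'NNE'."""
    return any(a == b for a, b in zip(code, code[1:]))


def is_prefix_free(codes: list[str]) -> bool:
    """True if all codes are distinct and no code is a prefix of another."""
    if len(set(codes)) != len(codes):
        return False
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def allows_strict_word_unistroke(codes: list[str]) -> bool:
    """True if no code starts with a direction that any code ends with.

    Assumption: a letter followed by itself counts as well, so a code
    that starts and ends with the same direction fails the check.
    """
    firsts = {c[0] for c in codes}
    lasts = {c[-1] for c in codes}
    return firsts.isdisjoint(lasts)


if __name__ == "__main__":
    print(has_repeat("NNE"))                              # repeated N
    print(is_prefix_free(["NE", "NW", "NSW"]))            # a legal prefix code
    print(allows_strict_word_unistroke(["NWS", "SNE"]))   # the conflict from the text
```

The last call reproduces the NWS/SNE example: the first code ends with S and the second begins with S, so the pair cannot be joined into one stroke.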

Unistroke-capability and segmentation

Determining whether we really need unistroke capability, and exactly which codes are cognitively or motorically difficult, cannot be done without some hard data. As described above, acquiring this data requires an effort that far exceeds the time frame of the current study. Therefore, the design decisions on these two points are currently based on slightly educated guessing as follows. From the user's perspective the character-level unistroke capability seems to be a good thing to have on a pen interface. Repeating directions would also be somewhat challenging to extract from the data that some input devices, such as mouse-like pointing devices and joysticks, generate. For example, how do we detect the change from the first N to the second N when all we get is a lengthy stroke in the general direction of N? If we were to measure the length of the stroke, we would force the user to coordinate her movements much more carefully. This seems like a bad thing to require from the user. The word-level unistroke capability is a good thing as well, but we can live without it. However, a compromise is also available. We will allow multi-character unistrokes as long as they do not contain repeating directions. With some luck the whole word can be input without any conflicts. In many cases a word can be input in two pieces, and more segments are needed only rarely. Giving up general multi-character unistroke capability brings with it the need for segmentation of the input stream to mark a repeating direction. The segmentation can be explicit and user-initiated or implicit and system-generated. When a direction has to be repeated, we may require the user to lift the pen, click a button or give us some other signal, or we can generate the repeating direction in software. User-initiated segmentation was chosen for the empirical tests reported later in this document.
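
To illustrate the compromise, the sketch below splits a word into the pieces that can each be drawn as one stroke: a new piece starts whenever the next character's code begins with the direction that the previous code ended with. The codes are a few rows taken from Table 4.2; the function name `unistroke_segments` is hypothetical and the code is only an illustration, not the implementation used in the tests.

```python
# A few direction strings from Table 4.2.
CODES = {"a": "NSW", "n": "NSN", "t": "SNE", "e": "WES", " ": "NE"}


def unistroke_segments(text: str, codes: dict[str, str]) -> list[str]:
    """Split `text` into direction strings that contain no repeated direction
    at a character boundary, so each piece can be drawn without a pause."""
    segments = []
    current = ""
    for ch in text:
        code = codes[ch]
        if current and current[-1] == code[0]:
            # The boundary would repeat a direction: close the segment here.
            segments.append(current)
            current = code
        else:
            current += code
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    print(unistroke_segments("ten", CODES))  # no conflicts: one stroke
    print(unistroke_segments("nan", CODES))  # n ends with N, a starts with N
```

With these codes "ten" needs no segmentation, while "nan" has to be written in two pieces.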

Mid-character correction

Because repeating directions are illegal within characters, they can be used for something else. A good use is a character-level undo or mid-character correction capability. This is an important feature because, unlike on a keyboard, one character is composed of more than one part and the parts are input at different points in time. The user may notice that she is writing a wrong character after some directions have already been input. In this situation the user should be able to restart drawing the character from scratch. MDTIM uses a repeated direction as a token that means that the user wishes to undo the current character.

   
The character codes

As described above, the code used for text input should be the best approximation of a statistically optimized prefix code that is easy for the user to learn, remember and draw. Character-level unistroke capability requires that no child in the coding tree may have the same name as its parent (for example an ``N'' parent may not have an ``N'' child). Thus an inner node in the coding tree may not have more than three children. In the relaxed form of multi-character unistroke capability that was chosen, the root node may have four children, and every other inner node may have all three of its children. The coding tree could be drawn to illustrate the code. The tree is a useful concept if a detailed proof or description of the algorithm is needed. However, the user is unlikely to find the tree helpful when writing with MDTIM. Therefore, two other kinds of representations are used. First, in text it is sometimes useful to refer to the characters with the corresponding direction strings. Thus an ``a'' may be written as NSW. This representation is enough to describe the action needed for inputting the characters. However, it does require some decoding from novice users. Therefore, another representation is used for reference cards and visual feedback during training. Examples of this representation are shown in figure 4.1.
  
Figure 4.1: MDTIM visualizations
[Figure not reproduced: example MDTIM characters drawn as strokes and labeled with their direction strings, e.g. ESW.]

As seen in figure 4.1, a character begins with a dot. A thick line continues from the center of the dot in the first direction. Each subsequent direction continues from where the previous one ended, except when two subsequent directions are opposites (NS, SN, EW, or WE). In this case the starting point of the last direction in the pair is offset slightly to the right for NS or SN pairs and down for EW or WE pairs. The effect can be seen in the NSW character in figure 4.1. The only ambiguity in this visual representation is a four-direction loop like ENWS, which becomes indistinguishable from NESW. The ambiguity holds only for the static visualization. Dynamic visualization and the online recognition situation have a time component that clearly distinguishes between opposite four-direction loops. Now that we have a general understanding of the tools at our disposal, we can start working on the actual coding. Which direction string should be matched with which character? For speed and general economy of input we should find the most frequent characters and map them to slightly shorter direction strings than the less frequent ones. But what exactly is a slightly shorter direction string, and how long is a long direction string? To answer these questions we have to return to the tree representation of our input space. Table 4.1 lists the maximum number of MDTIM codes for direction densities from 3 to 8 and code lengths from 1 direction to 5 directions. Because we are not allowed to repeat directions, a two-direction MDTIM is pretty useless: each node has only one legal child, so the tree never branches and longer codes do not give more codes. More than eight directions is too many because the error rate grows unacceptably high for text input (see section 3.3.1, [Kurtenbach and Buxton1993] or [Balakrishnan and Patel1998]). 
Although the decision about using four directions for MDTIM is based on different reasons (such as error rate, minimalism and skill transfer from compass and general human tendency to prefer right angles), the code length table shows that four directions is not such a bad choice. It gives a reasonable number of codes with relatively short code length.
 
Table 4.1: Maximum number of codes as a function of the number of directions and the tree height (code length).
                 number of directions
length      3      4      5      6      7      8
1           3      4      5      6      7      8
2           6     12     20     30     42     56
3          12     36     80    150    252    392
4          24    108    320    750   1512   2744
5          48    324   1280   3750   9072  19208
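
The entries of Table 4.1 follow from a simple count: the first direction of a code can be any of the n directions, and each later direction can be any of the n - 1 directions that differ from its predecessor, which gives n(n - 1)^(k - 1) codes of length k. The short sketch below recomputes the table from this formula; the function name `max_codes` is only illustrative.

```python
def max_codes(directions: int, length: int) -> int:
    """Number of direction strings of the given length with no direction
    repeated consecutively: n * (n - 1) ** (k - 1)."""
    return directions * (directions - 1) ** (length - 1)


if __name__ == "__main__":
    print("length", *range(3, 9))
    for length in range(1, 6):
        print(length, *(max_codes(n, length) for n in range(3, 9)))
```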
 

Before we can say how many codes we actually need, we need to make some design decisions. The first is the handling of uppercase letters. In MDTIM the uppercase characters have the same codes as the lowercase characters. A modifier is used to choose between the two. With many input devices the modifier is a button which is pressed once during the drawing of a character to choose the uppercase character. The procedure is very similar to using the shift key on a QWERTY keyboard. Another conceivable modifier would be a special code such as the different ``shift'' gestures that are used in Graffiti. Having the same code for uppercase and lowercase letters makes learning MDTIM easier because fewer codes need to be memorized. The second choice is whether we want to use the modifiers for other purposes as well. In MDTIM we use this feature sparingly for character pairs that are strongly associated and thus hopefully easily remembered. These pairs include characters like parentheses, ' and ", - and +, and comma and semicolon. The goal is to keep the method minimal, and having too many special functions makes the system complex rather than simple. Whether these pairs are easy enough to remember remains to be seen. Table 4.1 shows that using four directions with a code length of four we have a maximum of 108 codes to use for the lowercase alphabet and all other characters that we wish to use in our writing. Is this enough? The 102-key QWERTY keyboard has 56 different keys in the biggest key block that is used for most writing tasks. Some of these have two or more functions attached to them, but with the uppercase letters accessible through a modifier, we have enough codes for writing English. Actually we have enough codes to give some characters shorter codes. Note, however, that one code with a length of two will consume a branch of the coding tree that would allow nine codes with a length of four. Therefore, we must use the short codes very sparingly.
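
The cost of a short code can be worked out directly. Below the root every node has three legal children, so a code of length k removes 3^(4 - k) of the 108 possible codes of length four. The following sketch does this arithmetic; the function names are hypothetical, and it is assumed that the short codes are not prefixes of each other and that all remaining codes have length four.

```python
def leaves_consumed(code_length: int, max_length: int = 4, directions: int = 4) -> int:
    """Codes of length `max_length` that become unusable when one code of
    length `code_length` is assigned (its whole subtree is lost)."""
    return (directions - 1) ** (max_length - code_length)


def codes_available(short_lengths: list[int], max_length: int = 4, directions: int = 4) -> int:
    """Total number of codes when the listed shorter codes are assigned and
    every remaining code has length `max_length`.

    Assumption: the short codes are not prefixes of each other.
    """
    full = directions * (directions - 1) ** (max_length - 1)
    lost = sum(leaves_consumed(k, max_length, directions) for k in short_lengths)
    return len(short_lengths) + full - lost


if __name__ == "__main__":
    print(leaves_consumed(2))       # one length-two code costs nine length-four codes
    print(codes_available([2, 2]))  # two length-two codes, everything else length four
```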
The character frequencies that we need can be extracted from the data discussed in chapter 2. The data is not representative of all languages and writing tasks. Therefore we do not claim that the resulting character codes are the best possible. Rather, the code given in this study is an ad hoc construction created to quickly evaluate the MDTIM concept. In particular, the digram frequencies were given very little consideration. Nine of the top 35 digrams shown in figure 2.2 cause a word-level unistroke conflict (repeating direction) in both the Kernel and the Gutenberg data. Only seven of the 35 digrams listed for the Soukoreff data cause a conflict, but a better digram organization could still be possible. We decided that it was more important to give characters like space, a, r, and n forms that resemble the Latin alphabet than to make rapid multi-word unistrokes possible. However, some ideas that are important parts of a universal text input method are represented in the code. Firstly, we reserve space for language-specific characters because we do not want to have a completely different code for different languages. We do not want separate language-specific character sets because we need good international walk-up performance so that MDTIM can be embedded in objects and interfaces that must be immediately usable by all people. Clearly, this approach overlooks languages that do not use the Latin alphabet. However, as rude as it may sound today, we believe that in the future languages such as Chinese, Japanese and Korean that use very many characters will sooner or later start using the Latin alphabet or some slight modification of it as an official writing method. Already today every student of English in those countries has to learn the Latin alphabet anyway. As the use of English increases, the question of abandoning the problematic native character set must arise sooner or later. 
Most of the variation within languages that use the basic Latin alphabet amended with special characters, such as the French à, â, ç, ë, é, è, ê, ï, ô, œ, ü, ù, and û, can be accommodated either by including a special language-specific code area or by having operations that allow the construction of the special characters from components. In our code the characters å and ä, which are two of the special characters used in Finnish, both occupy a level three branch as placeholders for the more numerous language-specific characters of other languages. Table 4.2 lists the direction strings for the lowercase characters of the Finnish alphabet and the basic control characters. See appendix A for a more complete listing of the MDTIM codes used in the tests discussed in chapter 5.
 
Table 4.2: MDTIM direction strings for a core set of characters.
Character   Direction string   Character   Direction string
a           NSW                p           WNEN
b           SEW                q           WSES
c           ESW                r           WSN
d           SWE                s           ESE
e           WES                t           SNE
f           ESNE               u           SEN
g           ESNS               v           WNWS
h           WSWS               w           WNWN
i           WNS                x           SWSN
j           SESW               y           SWSE
k           WSWE               z           WSWS
l           SNS                å           WEN
m           WSWN               ä           NSE
n           NSN                ö           WNES
o           WSEN
return      SEN                backspace   NW
space       NE
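
Because the code is a prefix code, a stream of directions can be decoded greedily: directions are collected until they match a complete code, the character is emitted, and decoding starts again from the root. The sketch below shows this with a handful of rows from Table 4.2. It is an illustration only; the name `decode` and the error handling are assumptions, and modifiers, backspace and mid-character correction are left out.

```python
# A few rows of Table 4.2, keyed by direction string.
CODES = {"NSW": "a", "WES": "e", "SNE": "t", "NSN": "n", "NE": " "}


def decode(directions: str, table: dict[str, str]) -> str:
    """Decode a string of N/E/S/W directions with a prefix-code table."""
    out = []
    current = ""
    for d in directions:
        current += d
        if current in table:
            out.append(table[current])  # complete code: emit and restart
            current = ""
        elif not any(code.startswith(current) for code in table):
            raise ValueError(f"{current!r} is not the beginning of any code")
    if current:
        raise ValueError(f"input ends in the middle of a character: {current!r}")
    return "".join(out)


if __name__ == "__main__":
    print(decode("SNEWESNSN", CODES))  # t, e, n
    print(decode("NSWNESNE", CODES))   # a, space, t
```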
 

We see that most characters were coded with four directions, a few with three, and two with only two. The justification for the two short codes is as follows. Table 2.1 shows that space is the most common character. Therefore, if we give a very short code to some characters, space should be among them. The number of backspaces is clearly underrepresented in Table 2.1 because backspace does not appear at all in regular texts. However, backspace is used a lot in writing. With keyboards most corrections are done using it. While MDTIM is not a keyboard, the backspace concept is too good not to be used in MDTIM too. A 6 percent error rate would bring backspace to second place in the ranking in Table 2.1. While a 6 percent error rate is unacceptably high for expert performance, it may not be uncommon among beginners. If corrections are difficult to perform, the users may judge MDTIM difficult and unappealing to use. Therefore, we want to give backspace a short and fast code.

   
Compatible input devices

Now that we have described the method in general, we will show how it can be fitted for use with some popular input devices. The devices we will discuss are mouse, touchpad, trackball, joystick and keyboard. This list reflects the functionality that we implemented for the test software and is more due to easy availability and simple implementation than a result of careful consideration. However, one of these devices can be fitted into almost any imaginable computing device and therefore if MDTIM proves to be functional with all these devices, it is a good candidate for a universal text input method.

   
Mouse

Mouse and all devices capable of generating mouse-like input are handled in the same way. This simplifies the implementation (the standard mouse drivers can be used) and gives a chance to test the hypothesis about reaching device-independence through minimalism. The hypothesis has, of course, already been shown to be partially true, because all of these devices already work with the mouse driver. Dragging has been found to be more difficult than merely pointing with the mouse [MacKenzie et al.1991]. For this reason we do not want to require the user to use the mouse buttons more often than is necessary. Therefore the segmentation needed for producing repeating directions is done with a timer that fires after 70 ms of inactivity. This means that when the user wishes to repeat a direction, she needs to stop inputting directions for 70 ms and then input the last direction again. With a mouse this means stopping the mouse for a short period of time and then continuing the interrupted move. The delay of 70 ms was chosen by trial and error. It felt right to the author, who at that time was relatively untrained in MDTIM. The sample rate of the mouse dictates the minimum value for the timeout. Usually the sample rate is around 30 samples/second, and thus the timeout should be longer than 34 ms. Some hardware, such as the CH TrackBall Pro model for the PS/2 connector that was used in our experiment, has a faster sample rate and thus allows a shorter timeout, which may be useful for fast writers. When the MDTIM recognition algorithm receives a timer signal, it resets the flag that instructs it to ignore repeating directions. When the movement continues, the recognition engine will receive more coordinates and recognize the direction. It will then append the direction to the current direction string.
If the direction is a repeat within a character, the algorithm will fail to recognize the direction string as either a legal partial character or a legal complete character and will reset the current direction string, thus accomplishing mid-character undo.
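
The interplay of the inactivity timer, the ignore-repeats flag and the mid-character undo can be sketched as a small state machine. This is a hypothetical reconstruction from the description above, not the test software itself: the class and method names are invented, the timer is represented by an explicit `on_timeout` call, and the code table holds only a few rows of Table 4.2.

```python
CODES = {"NSW": "a", "WES": "e", "SNE": "t", "NSN": "n", "NE": " "}


class Recognizer:
    """Hypothetical sketch of the recognition loop described in the text."""

    def __init__(self, codes: dict[str, str]) -> None:
        self.codes = codes          # direction string -> character
        self.current = ""           # directions of the character in progress
        self.last = None            # most recently accepted direction
        self.ignore_repeats = True  # cleared when the inactivity timer fires
        self.output = []            # recognized characters

    def on_timeout(self) -> None:
        """Called after the pause (70 ms in the text): allow one repeat."""
        self.ignore_repeats = False

    def on_direction(self, d: str) -> None:
        """Called for every direction extracted from the device input."""
        if d == self.last and self.ignore_repeats:
            return  # same stroke still going on, not a new direction
        self.ignore_repeats = True
        self.last = d
        self.current += d
        if self.current in self.codes:
            self.output.append(self.codes[self.current])
            self.current = ""
        elif not any(c.startswith(self.current) for c in self.codes):
            self.current = ""  # illegal string: mid-character undo


if __name__ == "__main__":
    r = Recognizer(CODES)
    for d in "NSN":        # 'n'
        r.on_direction(d)
    r.on_timeout()         # pause, so that the next N counts as a repeat
    for d in "NSW":        # 'a'
        r.on_direction(d)
    print("".join(r.output))
```

In this sketch a repeat at a character boundary starts the next character, while a repeat in the middle of a character produces an illegal string and clears it.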

Touchpads, touch-screens, pressure sensitive pens and digitizing tablets

Touchpads, touch-screens, pressure sensitive pens and digitizing tablets usually offer an operating mode in which they can be used as mouse substitutes. Thus they appear to MDTIM as mice. A repeating direction is generated with the timer method described in section 4.6.1. While many devices offer an operating mode in which they produce input with three degrees of freedom (x, y, pressure), we chose to ignore the pressure coordinate. There are three reasons for this besides the simplicity of implementation.
1. Minimalism suggests using as simple an input as possible.
2. Some other input devices have only two degrees of freedom. Thus user interface compatibility requires that we use only two degrees of freedom.
3. In some conditions, such as under vibration in a moving car, or with half-frozen fingers in the winter, the pressure component will be too inaccurate to be useful.
As has been demonstrated [Goldberg and Richardson1993,Venolia and Neiberg1994], unistroke characters can be explicitly segmented using the pressure component of the digitizing tablets. Therefore ignoring the information on whether the pen, finger, or other appendage used for writing is touching the surface or not means giving up on information that some researchers have considered very significant for the success of a text input method. Examples of systems built around the touch information are all unistroke character sets and some virtual keyboards. We hope that the word-level unistroke capability and device-independence will compensate for the potential weakness introduced by ignoring significant components of input on some devices.

Trackball

A big trackball should be very much like a touchpad as an input device for MDTIM except for the inertia of the ball. That is, the only difference is that instead of the finger sliding on a surface, the surface (the ball) moves along with the finger. The posture of the hand and the movements themselves are very similar. A small trackball, on the other hand, behaves like a small joystick except that the ball is not self-centering and thus the writer needs to lift her finger when it hits the ball casing. Thus it seems that trackballs can be used for input with MDTIM. They present only a slightly different interface, and it is interesting to see whether the differences make the trackball better or worse than other similar devices. The handling of the input is done as described in section 4.6.1.

   
Joysticks

A joystick is a device with a stick upon which the user may exert some force. The force, or in many cases the position of the spring-loaded stick, is measured and translated into a digital input. Digital joysticks usually offer the capability to feed eight different inputs to the system. Their analog counterparts can be used to input a point in an n x n matrix with a realistic resolution for n well above 10. For the purposes of this text input system we reduced (in software) all types of joysticks to simple four-switch devices capable of inputting the four directions. Figure 4.2 shows the input matrix of an analog joystick. The areas that cause a direction to be generated are marked with the initial of the direction (N, E, S and W). A direction is generated when the stick is tilted over the area. A new direction cannot be generated before the stick has visited the shaded area in the center. The shaded square (c) in the middle and the slices extending to the corners of the area (d) are dead zones inside which the input is ignored. Both dead zones exist to stabilize the input. Typical joysticks are not very accurate devices. The input may jump around by up to several percent of the whole range. Therefore, when the stick is in the middle or tilted approximately in the direction of a diagonal, the actual input generated may jump around on both sides of the diagonal. Without the dead zones this would produce a rapid burst of directions, which would ruin all but the most skilled attempts at inputting anything sensible. A joystick offers a very natural way of explicit segmentation between directions. Most joysticks are self-centering, and therefore returning to the center position between directions is almost automatic.
  
Figure 4.2: The input space of an analog joystick.
[Figure not reproduced: figures/joystick.eps]
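
The reduction of an analog joystick to four directions with the two dead zones can be sketched as follows. The geometry is an assumption made for illustration: the stick position is normalized to the range -1 to 1 on both axes with y growing towards North, the central dead zone is a square with half-width `center`, and the diagonal dead zones are slices of `diag_deg` degrees on either side of each diagonal. The actual sizes used in the test software are not given in the text.

```python
import math


def zone(x: float, y: float, center: float = 0.3, diag_deg: float = 10.0):
    """Classify a stick position as 'C' (center), 'N', 'E', 'S', 'W',
    or None (diagonal dead zone). All thresholds are illustrative."""
    if abs(x) <= center and abs(y) <= center:
        return "C"
    # Angle folded into one quadrant, so that every diagonal lies at 45 degrees.
    angle = math.degrees(math.atan2(y, x)) % 90.0
    if abs(angle - 45.0) <= diag_deg:
        return None
    if abs(x) > abs(y):
        return "E" if x > 0 else "W"
    return "N" if y > 0 else "S"


def directions_from_samples(samples):
    """Turn a sequence of (x, y) samples into directions. A new direction
    is generated only after the stick has visited the center."""
    out = []
    armed = True
    for x, y in samples:
        z = zone(x, y)
        if z == "C":
            armed = True
        elif z is not None and armed:
            out.append(z)
            armed = False
    return out


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (0.1, 0.95), (0, 0), (1, 0),
               (0.7, 0.7), (0, 1), (0, 0), (0, 1)]
    print(directions_from_samples(samples))
```

In the sample sequence the stick wanders from East over the diagonal to North without returning to the center, so no extra direction is generated until the center has been visited again.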

Keyboard

A keyboard with a minimum of five keys can be used as an MDTIM input device. Using only four keys of a standard PC keyboard for MDTIM input is probably not a very smart move outside laboratory experiments. Instead one might imagine using a small keyboard with fewer keys. Such keyboards can be found in pagers, remote control devices and mobile telephones. Some of these special-purpose keyboards already have a direction control device similar to the direction control hat found in gamepads, or four separate keys in a similar arrangement. The output of these keyboards, however, is keyboard-like. The keys produce unique keycodes when pressed or released or both. Four of these keycodes can be mapped directly to the four directions. There are no ambiguous diagonal or center areas and no need for filtering of minute movements. Repeating a direction means simply pressing a key twice.

Learnability

As the developers of existing input methods have pointed out [Venolia and Neiberg1994,Kurtenbach and Buxton1993], a method that wishes to allow speedy input should offer the user an easy way to get started and room for learning seamlessly as she writes. A classic example of this is the learning curve with a QWERTY keyboard. A new user can ``hunt and peck'' and with experience move to two-handed operation and, with some more formal training, to touch typing. MDTIM, like Unistrokes, does not allow as easy a start as one would wish. The alphabet has to be learned in order to start writing. However, the growth path should be there. For example, with a touchpad a user will probably start by writing one direction at a time, lifting her finger after each one while planning the next. As she grows more confident with the new character set, she will probably start drawing one character with each stroke. With even more practice word-level unistrokes may become feasible. MDTIM is very forgiving with the timing of the strokes. At any point the user may take an indefinite amount of time to ponder what to do next. Because it uses only four directions, MDTIM is also forgiving with the direction of the strokes. The figures for response time and error rate should follow those measured by Kurtenbach and Buxton for marking menus and thus be in favor of four directions (see section 3.3.1 or [Kurtenbach and Buxton1993] for details). In MDTIM the length of the strokes beyond the required minimum is completely irrelevant. Together with the easy mid-character correction capability these qualities should put the user at ease while learning MDTIM and allow the development of a personal ``handwriting'' style that can deviate rather far from the model characters without degrading the recognition accuracy. Figure 4.3 shows a set of strokes that would be interpreted as NSW when fed to MDTIM. 
The large difference in the size of the different stroke patterns may be due to amplification generated by the mouse driver. This is, however, of no consequence since only small movements are filtered out. After the movement reaches the size (speed) that passes the filter, further growing does not hurt.
  
Figure 4.3: Four strokes interpreted as NSW (a-d) and the canonical representation (e).
[Figure not reproduced: figures/NSW.eps]
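
The insensitivity to stroke size can be illustrated with a simple direction extractor for mouse-like input. The sketch below is an assumption about how such a filter could work, not a description of the test software: each sample is a relative movement (dx, dy) in screen coordinates with y growing downwards, movements smaller than `min_move` are filtered out, the dominant axis decides the direction, and consecutive samples in the same direction are merged. The inactivity timer that allows deliberate repeats is left out.

```python
def stroke_to_directions(deltas, min_move: int = 3) -> str:
    """Convert relative mouse movements into a direction string.

    Assumptions: screen coordinates (negative dy is North) and an
    illustrative threshold `min_move` for filtering small movements.
    """
    out = []
    for dx, dy in deltas:
        if max(abs(dx), abs(dy)) < min_move:
            continue  # too small: filtered out as noise
        if abs(dx) > abs(dy):
            d = "E" if dx > 0 else "W"
        else:
            d = "N" if dy < 0 else "S"
        if not out or out[-1] != d:
            out.append(d)  # a run of equal directions counts once
    return "".join(out)


if __name__ == "__main__":
    small = [(0, -5), (0, -6), (1, 1), (0, 7), (-4, 0)]
    big = [(0, -50), (2, -60), (0, 70), (1, 80), (-40, 2)]
    print(stroke_to_directions(small), stroke_to_directions(big))
```

Both the small and the large stroke in the example produce the same direction string, because only the order of the directions matters once the threshold has been passed.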

Summary

We have described a device-independent text input method that utilises four principal directions to construct prefix codes that have been associated with characters. The method gives the writer a lot of freedom. There is no time pressure on the writer. She can draw the directions separately or a character at a time and often a string of characters with a single stroke. The speed and size of the characters may be freely chosen as long as a minimum threshold is exceeded. Unfortunately the users have to learn the prefix codes in order to use the method with any kind of speed. We chose to use only a very limited input that can be generated with very many different input devices. This has the obvious drawback that almost all devices have more advanced features that might allow much faster, easier or less error prone ways to write. The advantage of our approach is the ability to use the same writing method with all devices.


poika@cs.uta.fi