Next: Conclusions Up: No Title Previous: Design of the Tests


   
Results

In this chapter we present the results of the experiment described in chapter 5. We begin with the speed and accuracy that were measured for our test subjects. We then give a prediction for expert speed and accuracy. Our experience with five different input devices is also discussed. After a short note on the correlation between handwriting, typing and MDTIM writing speeds, we finish with an evaluation of the ad-hoc MDTIM character set that was used in the experiment.

The software used in the experiment timestamps and saves events such as mouse movements, key presses, direction and character recognitions, and phrase completions. The resulting files can then be examined after the experiment. We used several Perl scripts to condense the information before feeding it to statistics and spreadsheet software which produced the numbers that are discussed below.

Measured speed

Procedure for computing the speed

For speed computations the first character in each phrase was excluded because the test subjects were allowed to rest between phrases and thus the mental preparation time for the first character cannot be measured. Time measured for each character begins when the previous character is recognized and ends when the character in question is recognized. Thus it includes both the time spent in cognitive work of fetching the character from memory and the time spent on actually drawing the character.

All speeds are given in words per minute (wpm). One word is equal to five characters, so one word per minute corresponds to five characters per minute. Space, backspace and in general all recognized characters are included unless explicitly stated otherwise.
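The procedure above can be sketched in a few lines of Python. This is only an illustration, not the Perl scripts used in the experiment: the function names and the input layout (one list of recognition timestamps in milliseconds per phrase) are assumptions made for the example.

```python
from typing import List

CHARS_PER_WORD = 5  # one word is five characters, as defined above


def phrase_char_times(recognition_times_ms: List[float]) -> List[float]:
    """Per-character writing times for one phrase.

    The time for a character runs from the recognition of the previous
    character to its own recognition, so the first character of the
    phrase has no measurable time and is left out.
    """
    return [
        later - earlier
        for earlier, later in zip(recognition_times_ms, recognition_times_ms[1:])
    ]


def words_per_minute(phrases: List[List[float]]) -> float:
    """Mean speed over all phrases, excluding rests between phrases."""
    times = [t for phrase in phrases for t in phrase_char_times(phrase)]
    if not times:
        return 0.0
    total_minutes = sum(times) / 60_000.0
    return (len(times) / CHARS_PER_WORD) / total_minutes


# Hypothetical phrase: six recognitions, 1500 ms apart.
example_wpm = words_per_minute([[0, 1500, 3000, 4500, 6000, 7500]])
print(example_wpm)
```

In this made-up phrase five characters are timed at 1500 ms each, which is one word in 7.5 seconds.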

Average speed

Figure 6.1 shows the mean speed computed over the ten half-hour sessions excluding rest periods between the phrases. We have two different measures for the speed. The upper curve in figure 6.1 shows the speed that includes all characters that were used to correct errors. The lower curve gives the productive speed, which is the number of words per minute that the test subjects considered to be correct enough without further corrections. An error includes a wrong character and a backspace to correct it. Thus the difference between the curves in figure 6.1 divided by two gives one measure for the error rate. The errors are discussed in more detail later.


  
Figure 6.1: Mean raw and productive speeds.

The speed curves in figure 6.1 show two features that seem common to most people. During the first two sessions learning is very fast. This fast initial learning happened because people simultaneously got familiar with the touchpad and learned the character locations on the reference chart. After the second session the writers started to rely more and more on their memory and the learning continued at a near linear rate of about 1/2 wpm per session with slight degradation towards the end of the experiment.

Figure 6.2 shows the learning curves for each test subject. The variation between the subjects is rather large. By the last session the two fastest ones (1 and 3) were more than twice as fast as the slowest one (4).


  
Figure 6.2: Productive wpm rate for each test subject.

Subject 4 had great difficulties with the touchpad. Either she had very clumsy hands, or the shape of her fingertip was such that the pad could not accurately trace its movements. Whatever the reason, subject 4 never got familiar with the touchpad and thus could not concentrate on memorizing the characters. It is unclear how large a portion of the population has similar difficulties and whether these difficulties can be avoided with training or with a different construction of the pad.

In addition to finger-touchpad incompatibility, motivation may be another major factor in MDTIM learning speed. The two fastest writers in our test were also clearly enthusiastic about their task. They took it as a challenge and wanted to write fast. Although each test subject was, for the duration of the experiment, kept unaware of the performance of the others, these two considered themselves to be in a competitive situation. They attacked the test software with the kind of concentration and competitiveness that can sometimes be observed in computer gaming sessions. This kind of behavior was not encouraged in any way except for listing speed of learning as one of the variables that would be measured.

The other test subjects were in no way reluctant to work through all of the 10 sessions, but they did not exhibit the kind of dedication that was evident in subjects 1 and 3.

The writing speeds shown in figures 6.1 and 6.2 are mean speeds computed over all of the active writing time during the 30-minute session. During this time the writer may have very fast and very slow periods depending on her alertness and the character mix in the text. To give an idea of this variation, we computed the speed within a one-minute window every six seconds over the entire experiment. Figure 6.3 shows the result of this computation for subjects 3 and 4. The experiment seems to have lasted longer for subject 4. One reason for this is that the writers were allowed to finish phrases that had been started before the 30-minute session ended. For slower writers finishing the phrase takes longer.
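The windowed computation can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the input format (sorted recognition timestamps in seconds) and the exact handling of window boundaries are choices made for the example, not details taken from the original scripts.

```python
from bisect import bisect_left
from typing import List, Tuple

CHARS_PER_WORD = 5


def windowed_wpm(
    event_times_s: List[float],
    window_s: float = 60.0,
    step_s: float = 6.0,
) -> List[Tuple[float, float]]:
    """Speed inside a sliding window, as (window start, wpm) pairs.

    event_times_s must be sorted. Each window covers the half-open
    interval [start, start + window_s) and the start advances by step_s.
    """
    if not event_times_s:
        return []
    first, last = event_times_s[0], event_times_s[-1]
    results = []
    k = 0
    while first + k * step_s + window_s <= last:
        start = first + k * step_s
        lo = bisect_left(event_times_s, start)
        hi = bisect_left(event_times_s, start + window_s)
        words = (hi - lo) / CHARS_PER_WORD
        results.append((start, words / (window_s / 60.0)))
        k += 1
    return results


# Hypothetical writer producing exactly one character per second.
steady = [float(i) for i in range(120)]
for start, wpm in windowed_wpm(steady)[:3]:
    print(start, wpm)
```

A shorter window is obtained by changing `window_s`, which is one way to explore how the window length affects the variation in the curve.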


  
Figure 6.3: Wpm rates for subjects 3 and 4 throughout the experiment.

We can see that the speed for each writer varies mostly within 3 wpm of the average for a given session. The slow periods at the beginning and end of each session are the biggest exception, reaching all the way down to 0 wpm. A one-minute window is long enough to hide most short breaks and exceptionally quick sections of the text. A shorter window gives more variation: spikes up to 20 wpm can be observed with a 10-second window. The curve shown in figure 6.3 tells us that most speed measurements that last longer than a minute will give results that are relatively close to the average speed measured over longer periods of time. This gives some credibility to the results of the five-minute tests with different input devices that we will give later.

Measured error rate

Earlier we gave the difference between the raw and productive input speeds divided by two as an estimator for the error rate. While this number does tell something about the amount of work that goes into correcting the errors, it is not exactly the number of errors. The most notable source of inaccuracy is the situation where an error goes unnoticed for a while and correct text is entered after the wrong input. Our software allows corrections only by erasing all the text between the end of the written string and the error. Thus one error may cause several characters to be erased.

To get a more accurate estimator for the error frequency we counted all errors manually from the log files. We tried to judge what exactly was the error and which inputs were used to correct it. The situation gets somewhat complicated when an error is made in correcting another error and possibly a third one while trying to correct them both. Thus we were not always able to count the errors exactly right. The result was, however, a number that is closer to the real error rate than our earlier estimate. Figure 6.4 shows the mean of the error rates of our five writers.


  
Figure 6.4: Mean error rate through the experiment.

As with the speed of text entry, the first sessions show the greatest change in error rate too. Familiarization with the touchpad and direction input explains the steep decrease in the error rate. After the third session the error rate stays between 5 and 6.5 percent with no apparent trend. This is probably the real level of MDTIM error rate when using a handheld Cirque EasyCat touchpad and writing under pressure for speed.

The error rate is high by most standards. A 6.5% error rate means that roughly one character in 15 needs to be erased and corrected. In her 1994 article LaLomia gives 3% as the maximum acceptable error rate for handwriting recognizers. This probably also holds for single-stroke recognizers such as Unistrokes, Graffiti and MDTIM. Therefore we can conclude that without improvements in the error rate MDTIM coupled with a touchpad is not good enough for general use.

Predicted expert speed and accuracy

Above we have described the measured user performance with MDTIM. One of our goals for this experiment was to predict expert performance with MDTIM. We will now discuss the prediction that can be derived from the data gathered during the experiment.

As seen in figure 6.1, the experiment was too short to allow a meaningful prediction based on the learning curve. The curve is close to linear and we had too few test subjects to give a reliable prediction. All we can say based on the curve in figure 6.1 is that experts will on average write faster than our test subjects did during the last session.

For a better prediction of expert performance we must use other means. Our data allows the computation of one simple estimate for expert input speed.

The prediction is based on the fastest character times that we measured. This means that we find the fastest writing time for each character and then assume that all instances of the character could be written equally fast by an expert. This computation gives an upper bound for the top speed for a given writer with a given skill level. That is, if a user manages to write all instances of the characters with her best speed, the resulting mean speed will match the estimate.
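The estimate can be sketched as follows. The function names and the input layout (one `(character, writing_time_ms)` pair per recognized character) are assumptions made for this illustration, and the sample values are invented to show the mechanics only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CHARS_PER_WORD = 5


def measured_wpm(samples: List[Tuple[str, float]]) -> float:
    """Speed actually achieved over the given character samples."""
    total_ms = sum(t for _, t in samples)
    return (len(samples) / CHARS_PER_WORD) / (total_ms / 60_000.0)


def upper_bound_wpm(samples: List[Tuple[str, float]]) -> float:
    """Speed if every instance of a character took its fastest observed time."""
    fastest: Dict[str, float] = {}
    counts: Dict[str, int] = defaultdict(int)
    for ch, t in samples:
        counts[ch] += 1
        fastest[ch] = min(t, fastest.get(ch, t))
    total_ms = sum(fastest[ch] * n for ch, n in counts.items())
    return (len(samples) / CHARS_PER_WORD) / (total_ms / 60_000.0)


# Invented timings for one writer, in milliseconds.
samples = [("a", 1000), ("a", 600), ("b", 1500), ("b", 900), (" ", 600)]
print(measured_wpm(samples), upper_bound_wpm(samples))
```

Because each character's fastest time is never larger than any of its other times, the upper bound computed this way can never fall below the measured speed for the same data.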

The writing times for a given character are not normally distributed. Figure 6.5 shows the distribution of writing times for the 1543 space characters entered during the last two sessions. The fitted normal distribution curve illustrates how poorly the normal distribution approximates the data. The writing times are heavily clustered around 1000 ms, with very few instances below 250 ms, but many over 1750 ms. Intuitively it is clear that the writing times can never reach zero, but can take values arbitrarily far along the positive time axis in figure 6.5. Skewness and kurtosis figures computed for the other characters showed that, with very few insignificant exceptions, all writing time distributions follow the same general pattern of a tight cluster close to zero and a long right tail.


  
Figure 6.5: Distribution of the writing speed of the space character.

This means that the expert speed estimate described above will always be greater than the mean speed, but it is entirely possible to write several words at speeds close to the speed that our estimate gives (though maintaining this peak speed for even a minute is rather unlikely as we learned from figure 6.3). Thus our estimate is good for predicting error free peak performance for a writer that generated the data from which the estimate is computed. Figure 6.6 shows the raw input speed that we measured and the upper bound prediction. The prediction is the average of the predictions computed individually for each user from her fastest character times. We see that the fastest character times were still decreasing at the end of the experiment. This suggests that our test users were not even close to their maximum speed yet.


  
Figure 6.6: Upper bound prediction for writing speed.

Expert error rate is likely to be close to what we measured for the last sessions (see figure 6.4) because error rate typically does not decrease with practice after a certain skill level is achieved. In addition to our 10-session study, this has been observed in text input studies at least by MacKenzie and Chang [mackenzie_hwcomp] and Frankish et al. [frankish95].

Device (in)dependence

Above we have listed results on MDTIM speed and error rate when writing is done using a touchpad. Next we will give results for four other input devices. The users did not get to practice writing with the other devices as they did with the touchpad. Therefore, the numbers for the touchpad represent performance after some training and the numbers for the other devices show initial performance with a new device.

Figure 6.7 shows the productive wpm figures for each of our five test subjects with touchpad, trackball, mouse, joystick, and keyboard. Figures for the last four devices are computed from five minute tests. The touchpad figure is the same that was shown for session 10 in figure 6.2.


  
Figure 6.7: MDTIM speed on various devices.

We see that the variation within subjects is sometimes as great as between subjects. The purpose of the experiment was to see whether MDTIM writing skill can be transferred from one device to another without a significant speed penalty. It seems that this is indeed the case. Some test subjects were faster with other devices than with the familiar touchpad. We should not, however, pay too much attention to the individual five-minute tests, because one exceptionally easy or difficult phrase can have a large effect on such a short test. The mean speeds are slightly more reliable and show that the touchpad was fastest, with the mouse and trackball about one wpm behind and the joystick and keyboard another wpm behind them.

This result is easy to explain. The touchpad should be fastest because the writers had just practiced on it for five hours. Mouse and trackball allow several directions to be drawn with a single circular movement and thus with lower motor overhead. Joystick and keyboard both require a discrete finger movement for each direction and therefore they are the slowest. Joystick and keyboard are also more unlike the touchpad in the sense that they require a different hand posture and thus more motor adaptation. The keyboard in particular is operated with two or three independently moving fingers, whereas the operation of the other devices can be reduced to moving one finger on a two-dimensional plane.

Overall the speed differences do not seem to be very significant at this skill level. With more practice on one of the devices the situation might change.

Figure 6.8 shows the backspace counts in the same experiments. The most notable result is that the average error rate on joystick was slightly below 3% which would make it acceptable according to LaLomia's 1994 results. Keyboard is close to being acceptable, having an average error rate of 3.1%.


  
Figure 6.8: Error rates for all subjects and devices.

The fact that some of the devices were handheld, while others lay on a desk, does not seem to have an effect that could be seen through the variation caused by other differences in the devices. The touchpad was held in the non-dominant hand and thus it could be operated either by moving the pad under the finger or by moving the finger over the pad. Similarly the joystick could have been operated by tilting the whole device and keeping the stick stationary or vice versa. Our experience and our observations during the experiment suggest that at least initially coordinating two hands is more difficult than coordinating one. The writers seemed to write faster and with fewer errors when the non-dominant hand with the pad was steadied against a knee or handrest than when the hand had no support below the elbow.

Handwriting, typing and MDTIM speeds

Although the sample is too small to give a reliable statistical proof of the correlation between handwriting, typing and MDTIM writing speeds, the data does suggest that such correlation may exist. Thus a note on this is in order.

Only one of the test subjects had learned touch-typing formally in school. Not surprisingly she was the fastest typist in our sample. None of the test subjects had used any pen-based text input method for longer than a couple of minutes. They were not familiar with Graffiti, Unistrokes or any form of shorthand writing. All used a QWERTY-keyboard daily.

Table 6.1 summarizes the measured typing, handwriting and MDTIM writing speeds. Typing and handwriting correlate poorly, with a correlation coefficient of r=0.41. The first three writers type clearly faster than they can write by hand. The last two test subjects are slightly faster with pen and paper than they are with a QWERTY keyboard. The correlation between handwriting and MDTIM writing speed is stronger with r=0.65. Typing and MDTIM writing speeds have the best correlation in our data with r=0.87.


 
Table 6.1: Typing, handwriting and MDTIM speeds in words per minute.

subject   QWERTY typing   handwriting   MDTIM productive
1         35.40           17.96          9.67
2         29.16           11.40          6.26
3         39.52           28.72         10.42
4         14.04           16.52          3.50
5         21.12           21.88          7.98
 

Overall it seems that fast writers tend to be fast with all methods and slow writers slow with all methods. There are at least two possible explanations. The first one has to do with cognitive capability to produce the stream of characters that need to be written. The second explanation places the bottleneck in the motor system. Our experiment does not reveal which, if either, is true.
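The correlation coefficients quoted for Table 6.1 can be recomputed directly from the table. The sketch below uses the standard Pearson product-moment formula; the variable and function names are ours, and we assume that this is the coefficient the figures refer to.

```python
import math
from typing import List

# Table 6.1, subjects 1 to 5, all values in words per minute.
typing_wpm = [35.40, 29.16, 39.52, 14.04, 21.12]
handwriting_wpm = [17.96, 11.40, 28.72, 16.52, 21.88]
mdtim_wpm = [9.67, 6.26, 10.42, 3.50, 7.98]


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_typing_handwriting = pearson_r(typing_wpm, handwriting_wpm)
r_handwriting_mdtim = pearson_r(handwriting_wpm, mdtim_wpm)
r_typing_mdtim = pearson_r(typing_wpm, mdtim_wpm)

print(round(r_typing_handwriting, 2))
print(round(r_handwriting_mdtim, 2))
print(round(r_typing_mdtim, 2))
```

With only five subjects these coefficients are very sensitive to a single writer, which is why the text treats them as suggestive rather than as statistical proof.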

The character set

The characters for the character set used in the experiment were chosen both to minimize the number of directions that need to be input and to give as many of the common characters a form that somehow resembles Latin handwriting characters. As described in chapter 4 we did not know the character frequencies well enough to make the character set optimal. Similarly we ran out of creativity in our attempt to make all characters resemble their Latin counterparts. Thus the result was a compromise regarding both our major goals. We have somewhat optimized direction consumption and a mixture of familiar and new character forms. This character set allows us to extract some factors that may help future character set designers.

We used the data collected during the last two training sessions with the touchpad. The data consists of timings for 10824 characters. Unfortunately, as described in chapter 2, the character frequencies are very uneven. Of the 86 characters that appeared in our sample, 40 appeared fewer than 20 times. This means that blindly comparing the mean writing times of the characters is statistically unreliable and tells more about learning through repetition than about the goodness of the characters.

Generally speaking we are dealing with a bundle of interconnected factors that all contribute to the writing speed. The most frequent characters were given the shortest forms that resemble the Latin character set and are thus easier to learn. These same characters appeared most frequently in the writing practice thus giving the test subjects more practice on the characters that were easiest to learn and draw.

We will now identify three factors that correlate with the writing speed. Although correlation does not imply causation, these factors do have a theoretical foundation, and thus we can reasonably assume that together they are responsible for a significant part of the difference in the observed writing times. The first factor is the number of directions needed for writing the character. The second factor is the degree to which the character resembles the Latin character with the same meaning. The last factor is the amount of practice that the writer has had with a character.

Number of directions

We constructed the character set with the assumption that the most frequent characters should be the shortest to make them the fastest to write. Table 6.2 shows that on average the shortest characters were indeed the fastest.


 
Table 6.2: Mean writing times (in milliseconds) for characters grouped by number of directions.

directions   mean time (ms)   N
2            1216             2193
3            1489             5582
4            1827             3049
 

We gave space and backspace the shortest forms with only two directions. In Table 6.2 the two-direction group, however, has a surprisingly long mean writing time. The explanation is that backspaces are relatively frequent (N=650) and slow (1549 ms). The long mean writing time for backspace can be explained by the long mental preparation before actually writing the character. The writer has to spot the error and decide to erase before she can start fetching the form of backspace from her memory and finally move her finger to draw the character. Space was clearly faster with a mean writing time of 1076 ms.
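As a consistency check, the mean of the two-direction group in Table 6.2 can be recovered from the backspace and space figures. The sketch below assumes that the group consists of exactly these two characters and takes the space count (1543) from the discussion of figure 6.5; the variable names are ours.

```python
# Counts and mean writing times (ms) quoted in the text.
backspace_count, backspace_mean_ms = 650, 1549
space_count, space_mean_ms = 1543, 1076

group_count = backspace_count + space_count

# Count-weighted mean of the two per-character means.
group_mean_ms = (
    backspace_count * backspace_mean_ms + space_count * space_mean_ms
) / group_count

print(group_count, round(group_mean_ms))
```

The result matches the N and mean of the two-direction row, which shows how the slower backspaces pull the group mean well above the mean for space alone.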

Randomly assigning the 2 two-direction forms, 24 three-direction forms and 60 four-direction forms for the 86 characters that were present in our data would on average result in 3.67 directions to be used per character. The writers used only 3.07 directions per character during the last two sessions. This means that our attempt to minimize the number of directions needed for writing was somewhat successful.
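The 3.67 figure follows from a short calculation. Under a random assignment every character is equally likely to receive any of the 86 forms, so the expected number of directions per written character is simply the mean length of the forms, whatever the character frequencies are. A minimal sketch, with variable names of our own choosing:

```python
# Number of available forms by length: {directions: number of forms}.
forms_by_length = {2: 2, 3: 24, 4: 60}

total_forms = sum(forms_by_length.values())
total_directions = sum(
    length * count for length, count in forms_by_length.items()
)

# Expected directions per character under a random assignment.
expected_random = total_directions / total_forms

# Directions per character actually used in the last two sessions.
observed = 3.07

print(total_forms, round(expected_random, 2))
print(round(expected_random - observed, 2))
```

The last line gives the average number of directions saved per character compared with a random assignment.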

The time required for invoking the "shift" effect needed for accessing the second meaning of a character can be computed from the data. We compared the mean writing time for the lowercase alphabet (Latin a-z) and the uppercase alphabet (Latin A-Z). The mean writing time for the lowercase alphabet was 1510 ms (N=7008) and the mean for the uppercase alphabet was 2007 ms (N=836). This gives a difference of 497 ms, which seems like a long time to press one button. The conclusion is, therefore, that button pressing, or the tap that some subjects used on the touchpad, requires a lot of mental preparation and therefore may not be the right way to handle uppercase characters. However, we should also notice that the uppercase characters were much less frequent, so it matters little if writing them takes somewhat longer.

Similarity with a Latin character

To see the effect of similarity with the Latin character set, we divided the characters into three groups according to subjectively perceived similarity. The first group holds the characters that can be drawn both as Latin and MDTIM characters with the same meaning (n, o, u). The second group consists of characters that have some similarity to their Latin counterparts, but require rotation or the omission and/or addition of some features to be recognized as MDTIM characters (backspace, space, *, a, b, c, d, r, s, t, w, ä, ö). The last group holds all characters that do not belong to either of the first two groups. The mean writing times for the character groups are shown in Table 6.3.


 
Table 6.3: Mean writing times (in milliseconds) for characters grouped by similarity to their Latin counterparts.

group        mean time (ms)   mean directions   N
same         1214             3.4               1611
similar      1422             2.6               5019
dissimilar   1769             3.5               4194
 

The average number of directions per character in the group is also shown in Table 6.3. It seems that having the familiar Latin form for the character helps in making the character fast. However, the good time/direction ratio for the first group is largely due to letter o, which has four directions (WSEN), but is extremely easy to write because it can be drawn as a circle.
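The time/direction ratio mentioned above can be computed per group from Table 6.3. In the sketch below the ratio is taken as the group's mean writing time divided by its mean number of directions, which is our reading of the term; the variable names are ours.

```python
# Table 6.3: group -> (mean writing time in ms, mean directions per character).
groups = {
    "same": (1214, 3.4),
    "similar": (1422, 2.6),
    "dissimilar": (1769, 3.5),
}

# Milliseconds spent per direction in each group.
time_per_direction = {
    name: mean_ms / directions
    for name, (mean_ms, directions) in groups.items()
}

for name, ratio in time_per_direction.items():
    print(name, round(ratio))

best_group = min(time_per_direction, key=time_per_direction.get)
print(best_group)
```

These are ratios of group means, not means of per-character ratios, so they only indicate the general tendency.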

The case of "o" illustrates the difficulty of finding the significant factors. It has the best time/direction ratio of all characters and it has four directions. However, "o" is also a frequent character, making up over 5% of all characters in our sample. Therefore, "o" should also be faster than most characters simply because it is so frequent that the test subjects had a lot of practice in writing it by the 9th session. Frequent practice, similarity with the Latin "o", and a motorically easy form are enough to make "o" the third fastest character in our sample even though it has four directions.

Amount of practice

The number of times a given character is written and the time that it takes to write it have a negative correlation. Figure 6.9 shows a scatter plot with the character count on the horizontal axis and mean writing time on the vertical axis. The plot does not show all characters. Some infrequent characters have very long mean writing times because the test subject has for some reason kept a long pause between characters, either to search for the character in the reference chart or for some other reason. The very fast and very infrequent characters can be explained as errors. The two characters with mean writing times of less than 1000 ms are ä and cr (ASCII code 13), neither of which was needed for accomplishing the writing task the test subjects were given. The writers entered these characters in error and happened to do it very fast a small number of times (2 times for ä and 4 times for cr).


  
Figure 6.9: Writing time versus character count.

We can see that the negative correlation gets stronger if we eliminate the characters that have not been entered enough times to give the mean some credibility. If we remove the characters that have not been entered at least five times (i.e. on average at least once by each writer) we get a correlation coefficient of r=-0.43 as opposed to the initial coefficient of r=-0.28.

Summary

The experiment did not reveal how fast experts could write using MDTIM because our test group did not finish their learning during the 5-hour experiment. At the end our test group using a touchpad wrote at a rate of slightly more than 7.5 wpm with an error rate close to 6%. Our estimate for expert writing speed with a touchpad is over 15 wpm.

The writing device does not seem all that important. Our test group wrote at roughly the same speed using a touchpad, trackball, mouse, joystick and keyboard, the relative speeds being in this order. MDTIM writing speed on a keyboard was about 2.5 wpm slower than on the touchpad. The error rates on the different devices varied between 1% and 13% depending on the writer and on the device.

The number of directions per character, amount of practice per character, and the degree to which the character resembles the same character in the Latin character set all seem to affect the writing speed. However, speed differences between writers are also significant. A person's writing speeds with different writing methods seem to correlate.


poika@cs.uta.fi