
Jennifer (“Jen”) Mankoff et al. have a useful paper (PDF), “Is your Web page accessible? A comparative study of methods for assessing Web page accessibility for the blind.” The paper is a test of testing methods for Web accessibility: “We compare a laboratory study with blind users to an automated tool, expert review by Web designers with and without a screen reader, and remote testing by blind users.” (And best of all, I am cited!)

It seems that expert (sighted) Web developers using a screen reader caught the highest number of errors, but it was still only about half. Plus the whole paper was about blind accessibility rather than accessibility accessibility.

[L]ab studies with real users tend to happen towards the end of the iterative design cycle, a time when large accessibility problems might be ignored to ship a product or release a site on time. […] Our results show that Web site developers are best served by asking multiple developers to evaluate a site with the help of both a monitor and a screen reader…. We found that multiple evaluators using a combination of a screen reader and monitor were most consistently effective, finding about 50% of known problems. However, certain classes of problems were missed under these conditions, and other classes of problems could be found as easily using other, simpler techniques.

“Jen” et al. started with a baseline of testing with five real, live screen-reader users (with Jaws, inevitably), doing four tasks. Here’s a good part: “We also excluded problems that occurred when a participant forgot a Jaws command because this could not be addressed by a Web site developer.” Your dysfluency with your own software is not the developer’s problem.

After some filtering, the blind users came up with 29 accessibility errors. Then came the task of developing alternatives to actual user testing that the experiment could study.

We found that learning to use a screen reader well enough to evaluate a Web page with a monitor turned off required 20–40 hours of practice. However, with the monitor turned on, practice time could be reduced to 10–15 minutes (on average). This is partly because participants had to learn far fewer screen reader features, meaning they might have an inaccurate view of the problems with the Web sites. However, the presence of the monitor also helped participants to identify some problems they might otherwise have missed, by allowing them to see if the audio output matched the screen output. For example, they could see when text that should have been read was not. Without the monitor they might not have noticed the missing text.

I’ve never been in favour of sighted developers’ using a screen reader with the monitor turned off, just as you can’t test captioning with the sound off. (Well, not solely in either case. But if you have to pick only one technique, use a monitor and turn the volume up.)

Using only the guidelines was impossible, because there are just too many of them. “[T]he complete set of available guidelines… is simply too great to be reduced to a usable number of heuristics. For instance… over 100 typical errors regarding accessibility problems are listed.”

So that left them with three groups of testers – Web developers with little or no accessibility experience, working with or without a screen reader, and blind users E-mailing in with comments. They also used automated testing (Bobby [4.0], inevitably). Everybody tested against the 29 errors the original blind group had found and also against WCAG Priority 1.

The reviewers didn’t always agree with the blind users about the severity of errors, and often neither group agreed with WCAG:

There was no strong correlation between the WCAG priority of a problem and the severity assigned to the same problem by developers, or the severity assigned by developers and the severity derived from [the original blind users]. Additionally, simply meeting WCAG Priority 1 guidelines was not sufficient to address the most severe problems found [by the original blind users].

The results were not great. Experts using a screen reader caught the highest percentage of the errors the blind users had found, but that was still only about 23%. (Automated testing found about 3%.) And some of the blind people who were asked to write in with their results could not complete every task, which is exactly what happened in my own study.
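
To put that 3% in perspective: automated checkers can only catch problems that are machine-decidable. Here is a minimal sketch (my own illustration, not the paper’s method and not how Bobby actually works) of that kind of rule, flagging img elements whose alt attribute is missing or empty:

    # Illustrative sketch only: the sort of mechanical check an automated
    # accessibility tool can make. It flags <img> elements with a missing
    # or empty alt attribute, but says nothing about whether the alt text
    # that *is* present makes any sense to a screen-reader user.
    from html.parser import HTMLParser

    class MissingAltChecker(HTMLParser):
        def __init__(self):
            super().__init__()
            self.problems = []

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                attrs = dict(attrs)
                alt = attrs.get("alt")
                if alt is None or not alt.strip():
                    self.problems.append(
                        f"img with missing or empty alt: {attrs.get('src', '?')}"
                    )

    checker = MissingAltChecker()
    checker.feed('<p><img src="logo.gif"> <img src="photo.jpg" alt=""></p>')
    for problem in checker.problems:
        print(problem)

Rules like that are trivially automatable; judging whether existing alt text, reading order or link labels actually work for a blind user is not, which may go some way toward explaining the 3% figure.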

However, once enough experts with screen readers were put on the case (at least five), they found about half of all errors – not just the ones the blind users found, but also WCAG errors. Yet the four testing methods (three groups of people plus Bobby) rarely agreed on anything – from my reading of the graphs, on only three out of 19 topics did all methods catch an error. In all the other cases, one or more methods of testing did not pick up on an error.

An interesting paper. Reads well when accompanied by a double espresso.
