In two previous posts on the freakonomics of captioning and audio description, I concluded that the measurement of errors in those fields isn’t going to help you much. You can count up how many mistakes were in the captions and you can survey blind and low-vision viewers to figure out how much information they obtained from the description narrator. But let’s say you had very few or no captioning mistakes and a lot of information obtained by description. Does that mean the captioners and describers did a good job? Did people not only fully understand but enjoy the production? If those kinds of questions cannot be answered by counting things up, how can they be answered?

I don’t know yet, but I’m working on some ideas. Here are a few that immediately came to mind (and probably wouldn’t come to mind for some other researchers).

  1. There’s no reason not to measure things. If you’re trying to gauge overall quality and it is at least practicable to measure something, then measure it. More information is better than less. To draw a comparison with Web accessibility, some studies have shown that sites that flunk the accessibility guidelines (WCAG, a pseudo-objective measure) are still functionally accessible, while some accessibility barriers aren’t even covered by WCAG. No matter what your qualitative results turn out to be, it will be of interest to compare them to whatever numbers you were able to measure. (In my accessible-movie reviews [an exercise that soured me on attending the movies, apparently for life], I always wrote down mistakes.)
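    As a toy illustration of comparing your numbers against qualitative results, here is a minimal sketch that correlates per-program caption-error counts with mean viewer-enjoyment ratings. All program counts and ratings are invented for illustration; the point is only that both kinds of data can sit side by side.

    ```python
    # Sketch: pair objective error counts with subjective ratings so the two
    # kinds of data can be compared. All numbers below are invented.

    def pearson(xs, ys):
        """Plain Pearson correlation, no external libraries."""
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Errors counted per program, and mean viewer enjoyment (1-5 scale)
    # for the same five programs.
    error_counts = [0, 2, 5, 9, 14]
    enjoyment = [4.6, 4.4, 3.9, 3.1, 2.5]

    r = pearson(error_counts, enjoyment)
    print(r)  # strongly negative here, since these invented data co-vary
    ```

    A strong correlation would suggest the counts track what viewers feel; a weak one would be exactly the interesting result, since it would show the counts miss something.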

  2. We already have ways to measure seemingly qualitative phenomena, particularly in captioning, which is a reading exercise. We have decades of experience measuring people’s objective reading performance, and we can even do things like slide you into a brain scanner and measure your neurology. Typography, including typography of captioning, is not merely a question of “I like it” vs. “I don’t like it.” Typefaces, lines of text, words, and layouts all have performance characteristics that can be measured. While none of this applies to audio description, I have seen no effort at all to apply standard empirical techniques in psychology of reading to captioning and subtitling.

  3. Check for confounding factors. In a lot of tests of captioning, subjects are forced to watch an unfamiliar TV in an unfamiliar room while surrounded by strangers. This will always colour any results. But more important are known biomechanical factors, like visual acuity or distance from the screen. As I keep pointing out, Thorn and Thorn documented other studies showing that deaf people tend to have the wrong eyeglass prescription, and it is known that everyone sits too far from the screen (though the “preferred” viewing distance is not backed up by research I can find).

    Thus if your subjects, in an initial interview, claim that captions are too hard to read, maybe they need their glasses fixed and need to sit closer to the screen. If you’re trying to test caption readability, you must test all subjects’ vision and correct that vision with new lenses if necessary. (Tests with low-vision users may not require that latter step, but history shows that such tests tend to be completely muffed.) Unless you’re actually testing viewing distances, you should run your experiment twice, once at the subject’s habitual viewing distance and once again at a controlled distance. It may be strongly desirable to test in people’s actual homes; we’ve come to accept the necessity of that in testing screen-reader users, for example.
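    The two-distance design above amounts to a paired comparison: each subject contributes a score at both distances, and you analyze the per-subject differences rather than pooling the two conditions as if they were separate groups. The subject IDs and scores below are invented for illustration.

    ```python
    # Sketch of a within-subjects (paired) comparison: each subject is
    # scored once at their habitual viewing distance and once at a
    # controlled distance. All IDs and scores are invented.

    habitual = {"s1": 62, "s2": 55, "s3": 71, "s4": 48}    # % captions read correctly
    controlled = {"s1": 70, "s2": 61, "s3": 74, "s4": 59}

    # Per-subject differences: positive means the controlled distance helped.
    diffs = [controlled[s] - habitual[s] for s in habitual]
    mean_diff = sum(diffs) / len(diffs)
    print(diffs, mean_diff)
    ```

    Pairing matters because subjects differ wildly from one another (acuity, reading skill); the difference within one subject is far less noisy than a comparison across subjects.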

  4. Don’t assume you know enough to set the experiment up correctly. If you’re testing caption-reading, do not try to fake the fonts or any aspect of the typography of captions. Do not pick whatever typeface happens to be near the top of your Word for Windows font menu as a control. Time and time again, researchers pick inappropriate fonts to test – typefaces that anyone with type knowledge would never recommend for captioning or subtitling (particularly Arial and Helvetica, Courier, and Monaco). Your half-arsed knowledge of one field may induce you to set up an unfair and unrealistic case or control, and your results will never be more than suggestive.

  5. Use lots of viewing time if possible. For a large-scale or long-term study, or a study of a novel method of providing accessibility, consider giving people tapes or discs to take home and watch for a long period, like several hours to several weeks. (If you’re testing a new receiver or HDTV set, for example, you could let people watch TV for two weeks straight and then come back and test them.) Yes, people could fudge their viewership or watch TV at their friends’ places, but the shock of the new can be so severe as to throw subjects off. Middle-aged big-D deaf people in particular are extremely reluctant to accept any change in captioning from the captions they first started watching in 1980 or earlier.

    Jensema has shown that even caption-naïve people get better at reading captioning after a few trial runs. If you test a brief selection of something that’s brand-new, subjects may be so weirded out by the novelty of it that all they can focus on is how different it is from what they know. Some studies may benefit from an immersion approach.

  6. Test multiple groups. Totally blind, visually impaired, and sighted people all watch audio description. Yes, we know it isn’t made for sighted people, and we almost surely should not alter our description methods to accommodate them, but that doesn’t mean they shouldn’t be tested. Deaf, hard-of-hearing, and two kinds of hearing people (native speakers and second-language speakers) all watch captioning. Try to test with all groups. For captioning, try to test with low-vision people too, but do not test only with them.

  7. Test the practitioners. If you’re trying to analyze the overall acceptability of captioning or description, as distinct from counting the deficiencies in the information either of those communicates, then try something that nobody seems to have done yet: Test the practitioners along with the viewers. It may be the case that, for example, deaf captioning viewers will put up with things that long-term captioners think are an abomination, or vice-versa. To do this properly, you have to test multiple captioning houses, since there is no such thing as standardized captioning. (Though there will be. Trust me.) It will always be instructive to compare what viewers accept or reject with what practitioners do.

    (In all honesty, I don’t know what benefit there would be in testing outside experts like me. Perhaps just for a purely subjective evaluation that is billed as such. I proposed that sort of thing to somebody researching a certain technique in audio description, which I think would have been fine for the level of informality in the ultimate research paper, but it didn’t happen.)
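    One minimal way to compare what viewers accept with what practitioners do is simple percent agreement over the same set of caption samples. The sample labels and verdicts below are invented for illustration; a real study would want a chance-corrected measure and multiple captioning houses, as noted above.

    ```python
    # Sketch: for the same caption samples, record whether viewers and
    # practitioners accept each one, then report percent agreement.
    # All labels and verdicts are invented.

    viewer_accepts = {"a": True, "b": True, "c": False, "d": True, "e": False}
    practitioner_accepts = {"a": True, "b": False, "c": False, "d": True, "e": True}

    agree = sum(viewer_accepts[k] == practitioner_accepts[k] for k in viewer_accepts)
    agreement_rate = agree / len(viewer_accepts)
    print(agreement_rate)
    ```

    The disagreements are the interesting cells: samples viewers accept but practitioners condemn (or vice versa) are precisely the "abomination" cases described above.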

  8. Do not allow organizations with a financial interest to sway your experiments. Broadcasters will, in nearly every case, use the cheapest captioning and description available. They would strongly prefer never to read any scientific comparison of their preferred cheaper methods and more-expensive ones. You may be hoping for funding from an organization representing broadcasters or distributors, or from a regulator that’s stuffed full of bureaucrats who either just finished working for broadcasters or will do so the minute they leave their current jobs. In that case, do not self-censor in the first place, and always secure a signed agreement that only you may set the parameters of the study.

That’s all I’ve got for now. I may add more as time goes on.

The foregoing posting appeared on Joe Clark’s personal Weblog on 2006.07.09 13:50.





Copyright © 2004–2024