Previously, I wrote (and, on many occasions, rewrote) a long post on the freakonomics of captioning errors. My conclusion, which I believe is the only supportable one, is that applying a simple number to captioning errors (like “99% accurate”) is meaningless and beside the point, because even in that case, the 1% you’re allowing to be in error could be important (dropping the word “not” in “not guilty”) or unimportant (dropping one “very” in “very, very angry”).
Now I’m gonna do the same thing to audio description, which, as you know, is an added narration track that explains to a blind person what’s happening onscreen that cannot be understood from the main soundtrack alone. Audio description talks you through a movie, a TV show, a theatrical performance, or any number of other audiovisual events. It’s much less common than captioning, which also implies that it hasn’t been studied as much. But in fact several experiments have concluded that blind viewers receive less information exclusively from the description track than many of us would believe.
First, everybody wants it
A lot of blind people, and nearly every blind organization that isn’t batshit crazy, want more audio description. 36% to 44% of respondents in one survey claimed they would be somewhat or much more likely to watch TV with description.
But how much does it help?
All that is fine. Of course people with disabilities want greater accessibility. But how much does audio description help? How much extra information are you really getting from a described show, and does that vary whether you’re blind, low-vision, or sighted? We actually have data on that question.
Schmeidler and Kirchner (2001) found that viewers could answer questions about two science programs more accurately if they had heard audio description – sometimes by a large margin. But only three out of six sets of questions produced results that were statistically significant. Some of those answers could have been produced by chance or because roughly 15% of the subjects had already seen the show. Importantly, some correct answers were due to the fact that many subjects still had usable vision.
A British study found better, statistically significant results for three out of four tests of comprehension. But there were other factors, like the amount of TV viewing, that also produced better results for description viewers. All the subjects were senior citizens, and none was blind or visually impaired, which I would tend to view as a limitation of the study.
(Carmichael, A., and P.M.A. Rabbitt, 1993: “Assessment of the effect of audio description on elderly people’s comprehension and memory for a television program”)
Peli et al. (1996) used audio-description scripts to develop questions about visual understanding of two programs. Subjects watched the programs without description. At moments in the programs where description would ordinarily be spoken (with some exceptions), Peli paused the video and asked questions about visual details – questions “designed to test whether a visual detail described… was seen or not seen.” “Questions that could not be answered on the basis of the standard audio of the program and questions whose correct answer could not be chosen based on normal viewing were eliminated from the analyses.” As a comparison, another group listened to normal audio only – again without description, but without seeing the video, either.
Peli found that vision was important in answering questions about a described TV show. People who only listened to audio got about 55% of questions right, but low-vision people scored about 70% and people with normal vision 79% to 87%. The authors assume that just answering randomly would assure a 50% success rate for people who only listened to audio. That means the added information from a described show is quite small – it adds about 5% to your factual comprehension in the worst-case scenario.
Another study by Peli (2005) used the same method of deriving questions from an audio-description script while exhibiting only normal audio to subjects. This study, though, used normal and “enhanced” video, the latter having greater contrast and other visual features that subjects could adjust themselves. Peli found that people got more questions right with normal video (71%) than enhanced video (66%), though the difference was not statistically significant. Even if we go by these statistically insignificant numbers, if you’re getting over 70% of the answers right without description, then description could only help about 30% of the time.
Again, numbers don’t help
As with captioning, we see that numbers don’t really help. Yes, you can count up how many words were missing from a caption, or misspelled or misrendered. You can survey exactly how many extra questions a person can answer correctly about a TV show after watching it with description. While both those cases give you numbers, numbers aren’t useful.
- In the captioning case, the question is not “Did they leave words out?” but “Did they change the meaning?” (There are other questions, too, like “Did they change the style? the flavour? the dialect?” but those follow from the main question.)
- In the description case, the real question is “Did they make people miss anything?” You might only get about 5% more information from description, but what if that 5% is absolutely necessary, like who’s holding the murder weapon? What if it’s merely helpful information, like the name of the director or a song title listed in the credits?
Peli et al. (1996) indirectly addressed this question (emphasis added):
There are indications… that some of the narrative was redundant with the audio portion of the programs…. The time may be better used to describe elements of programs that visually-impaired viewers and blind audiences are unable to obtain elsewhere. However, if the extra information provided by [description] is necessary for visually-impaired and blind viewers to follow a story or enjoy a program, [description] may be an ideal sensory substitute for viewers who cannot fully appreciate visual details.
It seems, then, that in captioning and audio description, you may have a hunger to use numbers to measure quality, and if you look hard enough you’ll actually find those numbers. But quality really is qualitative, and you have to look beyond numbers, or ignore them altogether, to answer any question along the lines of “Did they do a good job here?”
Updates & corrections
(2006.07.04) I have been informed by Eli Peli not only that I have misinterpreted his two papers, but that I’ve done such a serious job of misinterpreting them that I “obviously” didn’t read any more than the abstracts. He has been informed of the total number of times I have read each paper all the way through (five) and the additional number of times I reread the description segments (two). He also threw in a jibe about blogs.
I look forward to any research that shows how quantitative analysis of audio description or captioning actually helps. Somebody drop me a line if anything ever gets published. Even on a blog.
(2006.07.09) OK, so I promised Eli Peli that I would reread his two papers, and lo and behold, his complaints are well-founded: Most of my description of his experiments was incorrect. The corrected versions are given above. If you’re adamant about reading the old version due to some concern that I am using the fungibility of Web documents to send history down the memory hole, here is that old version:
- Peli et al. (1996) found that vision was important in answering questions about a described TV show. People who only listened to audio got about 55% of questions right, but low-vision people scored about 70% and people with normal vision 79% to 87%. The authors assume that just answering randomly would assure a 50% success rate for people who only listened to audio. That means the added information from a described show is quite small – it adds about 5% to your factual comprehension in the worst-case scenario.
- And another study by Peli (2005) found that people got more questions right with undescribed video (71%) than described (66%), though the difference was not statistically significant. But what was significant was an interaction of viewer preferences – subjects who said they preferred the undescribed version did better with the undescribed version than on the described version. Even if we go by the statistically insignificant numbers, if you’re getting over 70% of the answers right without description, then description could only help about 30% of the time.
Peli’s studies never presented described audio tracks to subjects. Instead, he used the description scripts to develop questions that presumably could be answered only from the added audio description. If subjects answered the questions correctly anyway, it indicated that audio description was less necessary than anticipated. An interesting analysis that Peli makes about both studies is that residual vision is important in understanding video. (In the earlier study, common knowledge was also seen as important.)
One apologizes to Eli Peli for getting this quite so wrong. (However, the 2005 paper, at p. 547, does claim that the previous study used subjects “who watched the programs with or without the [description],” which resolutely is not the case, so perhaps these studies are just naturally complex.)
I will point out that the feedback loop at work here – an interested observer writing a blog post about scientific research followed by the researcher’s complaining that the observer got things wrong – did, in fact, lead to a correction of the record. Such a feedback loop compares favourably with scientific research (seven months passed before the journal that published the 2005 paper read a revised version of it, for example; this took less than a week) and the popular press (where corrections essentially go unnoticed if they are made at all).
Just fewer nastygrams in the future, please. I will always correct a factual mistake; where the mistake is small, I’ll do it on the spot. Calling my reading ability and general cognition into question is unnecessary.
Aaanyway, the other two researchers wrote in with comments, too.
Corinne Kirchner notes:
I generally believe that doing only a quantitative analysis is incomplete; qualitative analysis always adds something (and vice versa)…. The appropriateness of a quantitative analysis depends in part on the stage of the cumulative body of knowledge on a topic – early exploratory research should rely more heavily on qualitative studies, whereas research that is intended to be more “confirmatory” and/or generalizable comes later and needs a strong quantitative component.
Finally, in critiquing each type of study, as much or more attention needs to be given to the choice of sample respondents, and, in the case of captioning and description, to the choice and characteristics of the material being presented.
That’s a lot of variables and that’s why no study can ever answer broad questions such as you are addressing. We (i.e., the field) need a cumulative body of research which necessarily takes some time.
And Alex Carmichael draws attention to
the impact ( or ) of translating (which really is the wrong word in this context) subtle visual clues into explicit (and prominent) audio description. Watching TV doesn’t and shouldn’t… require the degree of cognitive activity associated with learning – except, obviously, in the case of explicitly learning/teaching material, and perhaps some documentaries…. I think that any program created entirely from the perspective of information transmission would be about as engaging (to the general public) as the average academic PowerPoint presentation! […]
I don’t think it is meaningful to equate (for comparison between VI and sighted viewers) what they get out of a programme with some kind of count of the bits of information made available to them.
Thus, I think it is a fundamental distortion to address the watching of audiovisual entertainment as information uptake…. The approach I took recognised this and aimed to balance information/enjoyment/intrusion, with the latter being important for sighted elderly viewers who may benefit from the description sometimes but not always (the inherent prominence of speech making it almost impossible to tune out as and when you don’t require it) […]
Finally I don’t think it is reasonable to draw as close a parallel (as you seem to) between captions and audio description with regard to judgements of how good they are (particularly in regard to errors etc.). In the case of captions there is a solid basis for comparing one explicit information source (protagonist’s speech [and sounds, music etc.]) with another (captions) which is intended as a translation, as this allows identification of errors such as typos, editorializing, etc.
I have some ideas about how to measure quality of, and satisfaction with, captioning and description.