Volume 44, Nº 3 of the IBM Systems Journal is an issue dedicated to accessibility of the Web and computer applications.
I like nothing better than to sit of a morning with an espresso, a set of printouts of research papers, and my trusted green editing pen. (Red is so terribly passé, and the leads of a New Yorker–style blue pencil tend to snap under impassioned jotting of marginalia.) Here, then, are my précis and comments.
Web applications are viewed as a kind of black art in Web accessibility, given that WCAG and 508 are almost exclusively oriented toward static Web pages. Nobody has written bulletproof guidelines for the design of Web applications, though IBM and Sun have documentation online. But everyone agrees that accessibility depends on how you design the components of your application.
Here, Hoffman et al. give concrete examples of designing accessible components for the Social Security Administration. Form fields get a lot of attention, and the authors give many examples that are hard to make accessible due to the limitations of HTML.
There are often situations when some of the information needed to understand the purpose of data entry fields or selection mechanisms is conveyed to the user through context, that is, a combination of the surrounding text, page layout, and proximity. For example, when links are presented within a paragraph of text or within a table, the surrounding information often plays a role in communicating their purpose…. A user who is tabbing from control to control, as is typical when completing a form, may skip over essential screen text without even realizing that the information is available.
You could try surrounding the explanatory text with `<a id=""></a>`, marking it as an anchor and possibly adding it to the tabbing order. If that doesn’t work directly, you can add `tabindex`. (`<a id="" tabindex="">` is valid HTML. You don’t need `href`.) Whether or not this works in the real world remains to be seen.
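A minimal sketch of the idea (the `id` values, field names, and text are my own invention, not the authors’):

```html
<form action="/claim" method="post">
  <!-- The explanatory text is wrapped in an anchor and given a
       tabindex so that a keyboard user tabbing through the form
       lands on it. No href is needed for this to be valid HTML. -->
  <a id="ssn-note" tabindex="1">Enter your Social Security number
  without hyphens.</a>
  <label for="ssn">Social Security number</label>
  <input type="text" id="ssn" name="ssn" tabindex="2">
</form>
```

Whether any given screen reader actually announces the anchor text when it receives focus is exactly the part that remains to be tested.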
The authors seem muddled on the concept of a tab focus:

Providing a tab focus is a means of providing comparable access. Of the most popular browsers, only Internet Explorer supports the use of the “tabindex” attribute (typically `tabindex=0`) to provide a tab focus to screen text.
Similarly, a user who accesses links in a special list of links may not have easy access to surrounding text information.
This gives us further reason not to customize the Web Content Accessibility Guidelines for the bizarre state that is created when screen-reader users extract links or headings from a document. You lose context. (People can still do it; we don’t have to help them and they shouldn’t expect it to work all the time.)
Low-vision users may also have difficulties if the contextual clues are too far away from the link, push button, or field label so that they cannot be seen at the same time when the screen is magnified.
Sounds like a job for a zoom layout. (The authors later endorse just such a thing: “[u]sing an entirely different narrow view could reduce horizontal scrolling.”)
The authors echo Jim Thatcher’s advice (hi, Jim!) to use a `title` attribute on all HTML form elements. (They actually just say “in the link, push button, or field,” but I take that to mean everything.) I’m not sure how to apply their detailed advice more generally, however: “The `title` value should include both the visual text and the supplementary information because, when screen readers are set to read `title` attributes, they typically read them instead of, rather than in addition to, the visual text. There are times when it is necessary to supplement the text presented on the screen in more than one way, such as by including supplementary text before and after the screen text.” Care to elaborate, with examples?
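One plausible reading of that advice, with a hypothetical field (the label text and `title` wording are mine):

```html
<!-- The title repeats the visible label and adds the contextual
     information, because screen readers set to read title
     attributes typically read them instead of the visible text. -->
<label for="wages">Wages</label>
<input type="text" id="wages" name="wages"
       title="Wages: enter gross annual wages in whole dollars">
```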
The authors use seriously vague terminology in discussing “the browser title bar.”
The browser title bar of each page should briefly provide a summary of the user’s location and any special status. The title bar should include the name of the application, followed by the application section, followed by the page, followed by any specific user or record identifiers, as appropriate. Special status information such as the existence of errors, the existence of search results, or the lack of search results should be concisely inserted at the beginning of the title bar when applicable. Other status information, such as specific record or user identifiers, should be inserted at the end.
What a laundry list!
This solution does not replace the display of the same information on other parts of the page. The page title, the currently selected tab, the record identifiers, and any feedback or error messages are still displayed on the page as appropriate for the visual layout. The advantage of including them in the title bar is that they are read by the screen reader as soon as the page loads, making it easy for the user to obtain the title bar information on demand by using an assistive technology keystroke. This can provide a significant benefit for users switching back and forth between multiple application windows.
I think they are talking about the `title` element, and their advice is not really reliable. Put the most-specific information first so it’s read out loud first and appears first in a browser tab or the Windows taskbar. We could use some real-life examples from the authors’ Web application. (I cannot figure out the PHP necessary to do that with this personal Weblog. I’ve tried and failed.) The advantages listed at the end of the authors’ second paragraph only come about if you do it this way, not their way.
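To make the contrast concrete (the application and record names are invented):

```html
<!-- The authors’ order: application first, status last -->
<title>BenefitsApp – Claims – Review – Smith, J. – 2 errors</title>

<!-- Most-specific first: errors and record lead, so they are
     spoken first and survive truncation in a tab or taskbar -->
<title>2 errors – Smith, J. – Review – Claims – BenefitsApp</title>
```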
I suspect that the phrase “the page title” refers to whatever’s inside the `h1` element, but, as with the preceding case, the authors do not use the actual HTML element names they’re talking about, so we are left to guess.
Another recurring bugbear in Web applications is notification that a timed event will soon come to an end. Nobody has an even barely adequate guideline for that scenario, which happens all the time in financial transactions. Hoffman et al. add a little bit to the existing knowledge, but don’t nail things down the way we need them to be nailed down:
It is important to notify users in advance that an application may time out. This notification should appear at the start of each application and can be included as part of the instructions for completing that application. The notice must include the amount of time that is allowed on a screen before a session will time out, instructions on how to request an extension for more time, the number of extensions that will be granted, the consequences that result from a session time-out, and a notification that client-side scripting is required for this functionality. (If scripting is turned off, the user will not receive any notification until after the session has expired.)
How in the world do you give that much information? There are two main groups who need changes in timeouts – people with dexterity impairments who can’t enter or input information fast enough for the default, and people with learning disabilities like dyslexia who need extra time to figure out what the hell is going on. The former group might not be inconvenienced by the catalogue of information the authors specify, but the latter group certainly will.
Also, it’s no longer worth talking about the scenario in which scripting is turned off. It almost never is.
Users should be notified when a session time-out is about to occur. Providing the user with sufficient warnings and the opportunity to request more time can help the user to avoid losing data. User notifications should be written in such a way that they are clear and not intimidating.
Then your way won’t cut it.
HTTP… actually includes a straightforward means of preventing the response from a server from interfering with the partially completed form that submitted the request to the server…. The client sends a request for additional time to the server, specifying a desired response of status code 204. The server then sends a response to the client consisting of only an HTTP status line with status code 204 (signifying that no content is coming) and a blank line, which does not interfere with the current page. Early testing indicates that this method may, in fact, provide a simple, elegant solution.
Interesting. The spec for status code 204 No Content requires:
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated meta-information. The response may include new or updated meta-information in the form of entity-headers, which if present should be associated with the requested variant.
If the client is a user agent, it should not change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent’s active document view, although any new or updated meta-information should be applied to the document currently in the user agent’s active view.
The 204 response must not include a message-body, and thus is always terminated by the first empty line after the header fields.
Clear as mud, but if it works, I’m all for it.
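As best I can tell, the mechanism needs nothing more than a second form on the page (the `/extend-session` URL is my invention; the server behind it must answer with status 204):

```html
<!-- Submitting this form fetches a "204 No Content" response.
     Because a 204 carries no new document, the browser keeps
     displaying the current page, and the half-completed main
     form on it survives intact. -->
<form action="/extend-session" method="get">
  <input type="submit" value="I need more time">
</form>
```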
The authors have the usual misconception about absolute vs. relative font sizing: “One should avoid using absolute text units like points and pixels.” Pixels are relative units. “Instead, relative sizes should be used so that the user can adjust browser preferences to display text in a larger font.” Every graphical browser but IE/Win can resize type in any unit. IE/Win has a problem only with the px unit.
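The practical difference, sketched in a stylesheet (selectors arbitrary):

```html
<style type="text/css">
/* Absolute unit: points belong in print stylesheets */
h1 { font-size: 18pt; }

/* Pixels are formally relative units in CSS, but IE/Win
   refuses to let users resize text set in them */
p.fixed { font-size: 12px; }

/* Percentages and ems resize in every graphical browser */
p.scalable { font-size: 100%; }
li { font-size: 0.9em; }
</style>
```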
Anything with the word “semantic” in the title is gonna be hot-hot-hot! for the practicing standardista – and immediately rejected by the WCAG Working Group. Here the authors attempt to extract semantics from Blogger.com (and Blogspot – yes, they used real sites!) and rearrange it in Mozilla.
We assert that the preferred way to enhance visually-impaired individuals’ access to information on Web pages is to encode the meaning of that information into the specific Web page involved. There are, however, problems with this approach. Empirical evidence suggests that authors and designers will not separately create semantic markup to coexist with standard XHTML… because they see it as an unnecessary overhead.
In fact, the authors repeatedly state that Web designers (especially) won’t lift a finger to add semantics if it represents any increased overhead whatsoever. I would like to say this is an untrue generalization, but I can’t – because it isn’t.
Recently, we have seen a movement in Web page design toward a separation of presentation, metadata (XHTML), and information. However, this has not been enough to support unfettered access for visually-impaired users. Consider the excellent CSS Zen Garden Web site. This site is a model of the state of the art, including the application of current standards as well as the separation of presentation and content. It is also visually quite stunning. However, it is still relatively inaccessible to visually-impaired people, because the information is rendered in an order defined by the designer and not in the order required by the user. […]
How can semantic information be built into general-purpose Web pages, without compromising the page’s design vision, such that the information is as accessible to visually-impaired users as it is to sighted users?
The authors’ system “triages” a site for which an “ontology” had already been defined. This will tend to limit the system’s application to the broader Web, I think, but the choice of Blogger is a good one given the many tens of thousands of sites using that platform. Plus, their system only works on sites that use HTML+CSS (and implicitly not with tag-soup or table-based sites).
Our … has three types of functionality…
- De-Fluffing removes all information that is classified as removable based on its location in the ontology (not in CSS or XHTML).
- ReOrder rearranges the page so that the most important pieces of information are moved to the top of the document based on the ontology’s
- Finally, Toggle Menu moves menu items from their current location to the top of the DOM (as a child of the DOM body).
Interestingly, our ontology contains two concepts, recently and archive-list, which have no CSS entries but are used as CSS class identifiers in Blogger.com…. [O]ur application can then treat recently and archive-list as kinds of menus and perform appropriate operations upon them.
It isn’t at all clear what the resulting de-fluffed, reordered, menu-toggled version of a Weblog looks like or acts like. Nor is there any user testing to show that the altered version is actually better. In fact, there aren’t even any screenshots, and the authors don’t explain how even their own experience of a Blogger site is changed after these transformations.
Semantic elements are curiously called “nonpresentational elements,” sort of like referring to life as “nondeath.” Please call a spade a spade instead of a nonclub/nonheart/nondiamond.
The authors differentiate between reordering page elements for blind people and for small-screen devices, two different applications that a lot of observers like to compare (myself included):
Although the amount of information that can be accessed at one time on a small-screen device is certainly limited, the interaction is still visual. The provided visual rendering is persistent (i.e., the screen acts as an external memory device), as opposed to audio rendering, which is transient. Additionally, audio is less focused and more serial in nature than visual rendering, and the user cannot easily and quickly shift focus.
The paper discussed the features of EASE (Evaluating Accessibility through Simulation of User Experience), which simulates visual or motor impairment.
The only part I found interesting (really, the only part) was an evaluation of typing speed using word-prediction software. The authors conducted an experiment with nondisabled people, in which, among other things, they were forbidden to type a subsequent letter until the first one appeared onscreen. (Touch-typists don’t work that way.) The results, which are poorly explained, seem to indicate that people with high intrinsic typing speeds take the greatest performance hit in using word prediction. If you type slowly to begin with, you can reach 85% of the maximum possible speed using word prediction.
Overall, when participants were allowed to type faster, they used the word prediction software less than when they were held to slower typing…. Because participants, on average, achieved slower speeds with word prediction than without word prediction in all but the 5 WPM condition, we can conclude that this threshold is likely between 5 and 8 WPM.
The authors mention that “[o]ne of the original project goals… was to create a cross-platform solution,” which they define to include Windows and Linux. (In fact, “accessibilityWorks is intended to augment existing accessibility development work for the Linux platform.”) Hence the authors run the gamut of platforms from L to W. I wonder how well, or if at all, their proposed extensions work on Macs.
Using XUL and XPCOM, “Mozilla’s application-development environment allowed simplification of the Web Adaptation Technology architecture created for the Internet Explorer implementation.” The revised Mozilla can:
- speak text
- show magnified text in a large banner at (for example) the top of the window
- change colours
- enlarge images (with sharpening)
The controls for these functions appear as a huge band at window bottom. The browser also lets you type one-handed and fixes errant keystrokes by various methods. Perhaps most interestingly, if you’re severely mobility-impaired you can use a normal Webcam that looks at you and interprets head or shoulder gestures.
So: If these additions already work with Mozilla and will be ported to Firefox, what’s stopping the Mozilla Foundation from including them as standard equipment?
‘Personalization, interaction, and navigation in rich multimedia documents for print-disabled users’
This article is rather unclearly written, and I could not exactly understand what it was talking about, except inasmuch as it articulated a need for multimedia with captioning, audio description, and sign-language interpretation. I would not view those as novel requirements.
A serious limitation of the one-document-for-all approach is that the needs of different print-disabled groups are very different. For example, a sign-language video, possibly supported by additional text information, is the preferred medium of many deaf readers. Readers who are hard-of-hearing need to be able to adjust the volume of audio from background noise, human speech, or music. Dyslexic individuals require simple language or pictorial description. Elderly readers have specific requirements to make a document readable, and may need a mixture of such multimedia presentation techniques.
If all of these enrichments are included in a single Web page based on HTML…, they must be read by every reader.
Well, no, they don’t, no.
- You can include multimedia on a page (via the standards-compliant `object` element, which doesn’t really work, or the noncompliant `embed` element, which does), or just link to it. If you don’t want untargeted groups to watch the video, use the latter option.
- Audio volume is a user-agent issue, but the user agent is the media player, not the browser. (That whole sentence seems like a political gesture to support WCAG 2.0’s unworkable draft technique that would force you to ensure that foreground and background sounds differ by at least 20 dB.)
- The “specific requirements” of “elderly readers” are not listed, probably because the authors do not have a list.
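The embedding-versus-linking choice in the first point looks like this in markup (file name invented):

```html
<!-- Embedded: the video loads for every visitor to the page -->
<object data="signed-summary.mov" type="video/quicktime"
        width="320" height="240">
  Your browser cannot display this video inline.
</object>

<!-- Linked: only readers who want the sign-language version
     follow the link and download the video -->
<a href="signed-summary.mov">Watch the sign-language version</a>
```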
Although assistive devices such as screen readers try to personalize Web pages by integration with the browser, in general this is an uphill battle requiring browsers to support new or novel mixtures of existing markup languages. This leads to reduced efficiency, lack of acceptance, and ultimately an unusable reading system.
The authors provide few details of how the customizations they had originally planned turned out to be inadequate:
An evaluation with 19 partially-sighted, deaf, or dyslexic users investigated the following issues, in addition to general accessibility and usability problems:
- Personalization of intradocument navigation structures (e.g., table of contents, indexes)
- Personalization of intrapage navigation structures (e.g., jump-to-top-of-page capability, location of navigation links)
Users appreciated the idea of personalization of the navigation structures, and the prototype implemented this possibility with a certain degree of success. However, users suggested a number of improvements to the implementation. In particular, readers expressed the need to adjust numerous aspects of both the content and the interface to a degree that went beyond the level originally foreseen for the different stereotypes.
All right. What did they ask for that you didn’t give them, and how easy or difficult would it be to meet those requests on real-world sites?
I also wonder if the prototypes the researchers developed were tested with “assistive devices such as screen readers.” With all that multimedia going on, shouldn’t the prototypes have been tested at least with Jaws? Or is this another case of deaf sites not doing Web standards?
Aaand here we go again with subtitling vs. captioning, and claims that automated technologies are right around the corner.
- Subtitling of audio and video material for deaf and hard-of-hearing readers
- This enrichment may also prove very useful to readers whose native language is not the language of the original material.
Are you talking about captioning or subtitling, then? Or the use of captioning by ESL readers, who are not, in point of fact, disabled?
- Subtitle generation and authoring systems to support subtitling
- In fact, systems that provide at least semi-automatic subtitle generation are now available.
And they don’t work.
- Sign-language interpretation of audio and video material for profoundly deaf readers
- Sign-language interpretation is preferred over subtitling by many profoundly-deaf readers. Unfortunately this is a considerable additional expense if a human signer is used, as is currently preferred by most sign-language users. However, virtual human avatar systems now under development present sign language in a sufficiently natural way that they should soon prove acceptable.
Well, prove that, please. None of the signing avatars I’ve ever seen remotely resembles a human being and they all look incomprehensible. Given that they rely on machine translation, which cannot traverse even the narrow gap between English and French without disgrace, I dispute the claim that these technologies are just around the corner.
Anyway, if humans are preferred, give us humans. And don’t try to manufacture the outcome you secretly want by claiming that real interpreters are “currently” preferred. You make it sound like avatars will be even better than the real thing. They won’t – and that might put research funding at risk for you or someone you love.
- Audio description of video material for visually-impaired readers
- There is currently no algorithmic method available for the generation of an audio description of video.
I am glad this fact has been recognized. Nothing beats a human describer. Nothing beats a human interpreter or a human captioner, either. Why do the authors advocate replacing both of those with machines?
Nonetheless, most subjects approved of the annotations used in the experiments.
An evaluation with 70 print-disabled users from all of the target user groups was carried out…. Both the use of sign-language videos synchronized with text highlighting for deaf readers and text highlighting synchronized with simultaneous speech output for dyslexic readers proved particularly successful…. [D]yslexic readers need to be able to control the speed of highlighting of the text, and blind readers need to be able to start and stop videos, primarily so that they do not interfere with the speech from a screen reader.
Starting and stopping a video is something the player must handle. Some media players are indeed hard to use with a screen reader. (It seems only Windows Media Player is really easy to control with a typical screen reader, but I wish somebody like Bob Easton would do a little study for us.)
If I sound skeptical about this article, it’s because I think we aren’t being told enough about the actual application that was developed; its real features; the complaints voiced by subjects; and the underlying HTML and video files used. I feel like Petrie et al. are saving that all up for a more-important paper in a more-important journal, or perhaps just saving it up for commercial application. As a comparison, another project headed up by Petrie, Vista, has never been adequately documented.
(I checked all my E-mail. I had sent Helen Petrie three E-mails, all unanswered, in 2003. I also E-mailed the department secretary asking her to ask Petrie to respond. No dice. Subsequently, I found some articles on Vista by J. Freeman, which I have requested.)
And here we have another concerted attempt by well-meaning people to replace skilled, qualified human expertise with automated systems that are only slightly smarter than a sack of hammers.
In the program described by the paper, university professors in Halifax (yes!) and in Oz use a speaker-dependent voice-recognition system. Most professors train the system to recognize their own voice, but the paper (unclearly) describes a method of postprocessing for speakers who can’t or won’t do that training. The professors wear a microphone and give a normal spoken lecture. The automated transcription appears on a projected monitor – and looks terrible, with spindly black fonts on a blinding white background. (Actually, the article’s sole photograph does not show transcribed words, just the system welcome messages.)
The paper, at nearly 7,500 words, beautifully modulates its tone to lull the reader into thinking the system actually works. I thought that, too, for a couple of days after reading the paper. But the honeymoon’s over. The researchers’ own data shows that the speech-recognition system simply is not useful for its intended purpose – to make lectures accessible to deaf students and to students who can’t take notes. That’s because it doesn’t actually transcribe what people say.
The issue is accuracy. Speech recognition is proposed here as a replacement for real-time transcription using CART (a system functionally identical to real-time captioning) or using another kind of speaker-dependent speech recognition. Let me quote Gary Robson here (The Closed Captioning Handbook, p. 119): “Real-time stenocaptioners must regularly work at sustained speeds of over 225 words per minute with accuracy of 99% or better.”
How accurate is the IBM system? Well, 51% accurate in the worst case, 72% in a typical case (that’s the mode; the mean is 77%), and 91% in the best case, which happened with one professor. Ten out of 17 professors listed had accuracies under 80%. Here’s the highest gloss the authors can put on the situation: “By the end of the … nearly 40% of faculty participants reached the benchmark of at least 85% accuracy.” In other words, more than 60% didn’t, and none of them reached levels good enough for the trashiest live show on TV. And a 15% error rate would have been just fine with the researchers.
I have not watched the system in operation, but my experience tells me that the 9% to 23% to 49% of words it’s getting wrong are not function words like the, of, or and. They’re content words that you’re there to learn and understand in the first place.
Furthermore, the authors admit it is almost entirely uneconomical to correct the resulting transcript. Words the recognition system cannot even guess are written in “phonetic symbols,” though that is not defined. (IPA?) The instructor (or, more likely, his or her teaching assistant) is stuck cleaning up the text. Even a 95%-accurate transcript requires an hour to edit, and a 75%-accurate transcript (roughly the norm) requires six hours, the paper tells us. Do you want to spend an entire working day delivering a lecture and correcting some machine’s mistakes? “A lecture at 65%… accuracy… requires nearly as much time to edit as… to simply type out the lecture from scratch.”
How well did it go over with students?
Despite these and other research challenges, project outcomes largely substantiated the belief in the technology’s potential. Although perceived utility was highly individualistic according to learning style, students generally liked the concept and wanted to see more testing.
In other words, students had wide-ranging opinions, but tended to think it was a good idea that might work if somebody fixed it.
The technology cannot even figure out when a spoken sentence ends. (They hacked a system to insert ellipses and carriage returns whenever a pause is detected. How would you like to read that in class all day?) Nor does the system identify speakers, which is rather important to everyone who knows something about captioning, or has to use it, or actually likes it.
And, at first blush, it seems we’re doing all this purely to save money:
The proliferation of Webcasts as a communication medium presents a problem for Web accessibility…. Audio captioning… presents a more serious cost challenge. Initial research into companies that provide captioning for webcasts reveals prices ranging from $500–$1,000 per finished hour for accurate transcription and reintegration of captioning into multimedia formats.
You get what you pay for. Computers are too stupid to write down what we say. So are a lot of people actually working in captioning, admittedly, but other people can do it. Computers can’t, and that won’t change in our lifetimes.
Note that the researchers admit to “belief in the technology’s potential.” They need something to believe in, because it doesn’t work right now. This isn’t science, it’s religion.
In any event, money clearly isn’t an irreconcilable issue. The authors write a whole section on collaborative editing to fix the system’s mistakes. In other words, to save money on real-time captioning, they suggest hiring more people, taking multiple editing passes to correct the document, and buying adaptive technology like head-mounted pointers. How much is that gonna cost, or are you just gonna farm it out to India or Uganda, where they won’t be able to understand a Nova Scotian or Australian accent in the first place?
Is money really the issue, or is something else going on? Maybe you just want the world to be as Star Trek–esque as possible. Maybe you’re just better at talking into machines than with people. Then you’re not wasting your time anymore: The result may still be boring written text, but at least it nourishes your gadget fetish. The machine will never question the entire purpose of your extensive mechanized project, or your reported results, or how well you’re actually serving people with disabilities, or the total failure of your expensive international project based on any of the foregoing.
An impression I’ve gotten over the years is that proponents of speech recognition kind of don’t like captioning. They’d really rather have it disappear, as by having some machine do it. It’s just too much money and too much trouble for something they’d never voluntarily sit and watch for long periods, let alone a lifetime.
By far the most confusing paper. I couldn’t make head or tail of it. It appears to describe the vast software infrastructure that IBM has set up to statistically sample its millions of Web pages to assure managers that accessibility guidelines have been met. The system does not, of course, prove that the pages really are compliant, or that unsampled pages are compliant, and it also does not actually fix any of the errors that only people can diagnose.
The systems talk about accessibility but do not do accessibility. The technology seems to be a placebo rather than a guarantee of accessibility. This is, however, a typical characteristic of automated accessibility testing tools, which give you an excuse not to check your own sites.
No, I’m not being a total bitch about this one just because coauthor Jim Thatcher (hi, Jim!) has been a total bitch to me in recent months. I don’t understand the paper and simply have not been won over to the underlying concept.