If I may get all citizen’s-journalism on your arse for a moment, I will disclose that I am writing a superexclusive article on accessibility of PDF. (PDF is sometimes accessible. You knew that, though.)

I have come up with a shortlist of document types that, when posted to the Web, should be PDF. I submit that every other type of document should not be PDF, and nearly all of those should be nice tidy valid HTML.

My question is: Am I leaving anything out? Contact me at the usual address. Credit duly given.

  1. Multicolumnar, particularly if figures and illustrations are included, since multicolumn Web layouts are a mere hack and are unreliable as a method of reproducing print layouts. (Multicolumn documents that are presented that way to save paper and can work as a single column should be HTML. It can be difficult to distinguish that case from a document that is structurally multicolumnar.)
  2. Footnoted, endnoted, or sidenoted, since there is no way to mark up any of those structures in HTML. (You can use a hack like sub or sup, but there are no footnote, endnote, sidenote, or even note elements. That hack may be adequate for simple footnoted documents, but try rendering David Foster Wallace’s footnotes-within-footnotes in HTML 4.)
  3. An interactive form, since PDF interactivity can do more than HTML can. (Use with caution and only if HTML really cannot do what you want.)
  4. A multimedia presentation, since later versions of PDF can truly embed multimedia rather than simply refer to or call multimedia, as HTML does. (Same warning.)
  5. Combined accessible and inaccessible versions. A typical case is a scan of a historical document that also includes live text. (You really need that live text. The Smoking Gun’s scanned court documents wouldn’t pass muster here.) Another example – one that is legal in Canada under a copyright exemption – is a sign-language translation inside or alongside a written text.
  6. Custom-crafted solely for printing. I really mean that, and not a document so badly designed that people have no real choice but to print it because reading it onscreen is so tedious.
  7. Designed for annotation and round-trip travel: If you’re posting something to elicit comments, which are then sent back to you, PDF has useful structures that HTML doesn’t.
  8. A type specimen, which is all but impossible to create in HTML, unless the specimen involved is a “typeface” like Arial.
  9. A sample of a format that cannot be rendered in a browser (a PDF of Illustrator or Photoshop documents) or can only be rendered unsatisfactorily (CAD drawings where GIF and JPEG don’t have enough resolution). (In theory you could use SVG for CAD, but SVG remains mostly theoretical, doesn’t it?) This case also includes PDF files meant as samples of PDF files.
  10. A record of a document’s state at a specific moment. In this context, PDF is useful as a preservation format even for HTML Web pages.
  11. A document in a language whose script has no satisfactory support in Web browsers. This example must be used with great caution: In 2005, there aren’t many “minority” languages that cannot be rendered in a browser. Urdu and Georgian are two examples, though even those can be viewed under the right conditions. This can also be a subset of the type-sample case if your PDF illustrates the script or writing system used by a language.
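
To make the footnote hack from item 2 concrete, here is a minimal sketch of what it usually amounts to: a superscripted link in the running text and an ordered list of notes at the bottom. The `[^n]` marker syntax, the `ref`/`note` id scheme, and the function name `render_with_footnotes` are all assumptions made up for this illustration, not anything HTML defines. Notice that nothing in the output actually says “this is a footnote”; it is just sup, a, and ol.

```python
import re
from html import escape

# Hypothetical marker syntax for this sketch: [^1], [^2], ... in the source text.
MARKER = re.compile(r"\[\^(\d+)\]")


def render_with_footnotes(paragraphs, notes):
    """Render paragraphs and notes as HTML using the sup-plus-anchor workaround.

    paragraphs: list of strings; the marker [^n] points at notes[n - 1].
    notes: list of footnote texts.
    """

    def link(match):
        n = int(match.group(1))
        if not 1 <= n <= len(notes):
            raise ValueError(f"no footnote numbered {n}")
        # The "footnote reference" is only a superscripted link.
        return f'<sup><a id="ref{n}" href="#note{n}">{n}</a></sup>'

    # Escape the text first, then swap markers for links.
    body = [f"<p>{MARKER.sub(link, escape(p))}</p>" for p in paragraphs]

    # The "footnotes" are only list items with ids and a link back.
    items = [
        f'<li id="note{n}">{escape(text)} <a href="#ref{n}">back</a></li>'
        for n, text in enumerate(notes, start=1)
    ]
    return "\n".join(body + ["<ol>"] + items + ["</ol>"])


if __name__ == "__main__":
    print(render_with_footnotes(
        ["PDF is sometimes accessible.[^1]"],
        ["You knew that, though."],
    ))
```

This flat scheme copes with simple numbered notes. It has no way to express a note inside a note, which is the footnotes-within-footnotes case mentioned above.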

The foregoing posting appeared on Joe Clark’s personal Weblog on 2005.05.06 15:40.


Copyright © 2004–2024