Or rather, why Unicode isn’t.

Here’s an observation I’ve been saving up for a while. Back in linguistics school (not quite like the “engineer school” that preceded it), it was drilled into our heads that we were to disregard writing systems at all times. Most of the drilling was done by a phonetician who was trying to teach us how to transcribe utterances into IPA. In essence, he instructed us to ignore the writing system while writing things down. Why didn’t I see a problem there?

It is only in recent years, and through the popularity of the Web, that it has become OK, at long last, to be interested in linguistics through the lens of writing. It is suddenly OK to find writing systems interesting, and not merely as juvenile Bringhurst-style showcases of just how bizarre foreigners’ writing can be. It has never been more OK to be interested in fonts, or typography (not quite the same thing), or graphic design (also not quite the same). It is even OK to be interested in the psychology of reading.

Never before in human history have so many people been able to identify a language just from a few telltale accents in a word you show them, or explain why and how Japanese and Chinese are not the same, or give a layperson’s explanation of why all-caps text is hard to read, or differentiate many more forms of quotation marks than a mere one or two. Nearly every person with adequate vision who has an office job in a Western country can at least tell you that Times (“New Roman”), Comic Sans, and Arial are three different fonts. All those people, and many more besides, actually know what fonts are.

The Web did not kill off print, as predicted by people who were never any good at computers in the first place. Neither did the Web kill off reading. Nor was the printed page or the task of reading dealt a setback by the Web. There is more to read now than at any time in human history and people are reading it.

When I was a teenager I used to struggle to explain typography (“ ‘Topography’?”) to grownups. (Another thing I had to explain was captioning [“Newspaper photos?”].) Now, in broad terms, it is finally OK to like what I have liked all along.

I was certainly very surprised to find so many of my interests packed into the same book – Unicode Explained by Jukka K. Korpela. If you’ve participated in any of several computer-related demimondes over the last ten years, you will be familiar with Jukka’s irascible style (rather reminiscent of my own, in fact). “We” interviewed him for the NUblog seemingly ages ago (Parts I & II). If you read his book, the whole thing is so smooth and easy to understand you will think they got the byline wrong. The book could be described as a plain-English guide to Unicode, except that plain English seldom requires 630 pages to express itself.

While Unicode is intrinsically complex in many respects, Jukka has succeeded in paring the complexity down to the bare minimum. I am not sure it is possible to explain Unicode any better than Jukka has done here. If things are still somewhat hard to understand, I don’t see how they could possibly be explained in a different way that would remedy the problem. I do not see how the writing style could be significantly improved.

I have considerable reservations about the typography – using Times New Roman with no paragraph indents makes the whole thing look like a Microsoft Word printout. (There is much discussion of MS Word and Windows software generally. The book is not platform-agnostic in any real way. I kept thinking how punishingly difficult it would be for anyone using Windows to type the characters he shows in the book.) There were a few little flubs in the Unicode characters being displayed, which is always a problem. And a tiny few sentences, which I cannot now put my finger on, did not quite sound like something that would be produced by a native English speaker. (The author isn’t one.)

Screenshots are a bit too pixelated and show much too much browser chrome most of the time. (This was a point of honour in my first book, where I set things up carefully and we spent many minutes per screenshot tweaking individual pixels.) While there is much discussion of how people read characters, it is certainly not at the optical or neurological level, which is fine.

It was grand to read the sometimes-lengthy histories and usage notes for ASCII characters, like a full page on #. Jukka always mentions the ways people mistype a character, either intentionally (as in _emphasis markers_ in *E-mail*), or because the software and hardware get in the way, or due to ignorance. His typing exercises (p. 113) – everything from naïve to β-carotene to Fermat’s last theorem in compact notation (30 characters) – are actually tempting, but certainly a huge time commitment even on OS X.
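For a sense of what is actually being typed in two of those exercise words, here is a minimal sketch using Python’s standard `unicodedata` module. The helper name `describe` is my own invention for illustration and is not from the book; the sketch assumes the words are stored in precomposed form, which is why the characters are written as explicit escapes.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each non-ASCII character.

    Hypothetical helper for illustration only; not taken from the book.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]

# Escapes are used so the exact code points are unambiguous:
# \u00ef is the precomposed i with diaeresis, \u03b2 is Greek small beta.
for word in ("na\u00efve", "\u03b2-carotene"):
    print(word, describe(word))

# The same visible letter can also be stored as a base letter plus a
# combining mark; NFD normalization makes that decomposition explicit.
decomposed = unicodedata.normalize("NFD", "\u00ef")
print([f"U+{ord(ch):04X}" for ch in decomposed])
```

The last two lines show one reason typing such characters is harder than it looks: what appears to be a single letter may be one code point or two, depending on how the software composed it.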

I found a few snippets that may be useful for accessibilitistas and standardistas:

  • “[L]anguage” means definitely “human language” as oppos[ed] to computer languages such as programming, command, and data-description languages. Text in a computer language may be characterized as belonging to some human language, to some extent. For example, for the purposes of speech synthesis, comments and variable names in computer source programs need to be interpreted as belonging to some human language.
    [p. 358]

  • [W]hat should you do with words like “status quo” (that’s Latin, isn’t it?) or “fiancé” (French, even if used in English text?) or with proper names of people and things? For example, the Web Accessibility Initiative (WAI) recommendations say that you should indicate all changes of language in a document, and this is a Priority 1 requirement. Yet the WAI documents themselves don’t do that for proper names. […] The paradox of language markup: It’s easy when it’s not needed.
    [p. 362]

If you are the new breed of person who is not ashamed to be interested in writing, typography, reading, or computers, Unicode Explained puts them all in one book. O’Reilly pricing will, however, put Canadians at a disadvantage, since the thing retails for 78 bucks.

The foregoing posting appeared on Joe Clark’s personal Weblog on 2006.09.21 15:17.


Copyright © 2004–2023