Tag soup, now at a discount

The Indigo E-book service with the unfortunate name, Shortcovers (Shortcomings?), now claims to be able to convert publishers’ files, even “PDF,” into ePub files. At a discount!

Let me offer a prediction of what’s gonna happen.

Shortcovers, which has already proven it cannot handle reasonably complex source texts, will “convert” your original files into ePub (or, as it errantly calls it, ePUB – I wonder what the U and B stand for). ePub is just XHTML 1.1. XHTML is HTML and XML at the same time, but that is one of the many details that will be overlooked.

The files they’ll return to you – and bill you pennies a byte for! – will surely be the worst kind of tag soup. Only tag soup can be so cheap. The typical case of marking up everything as a paragraph will be a walk in the park compared to this malarkey. In fact, I suspect each chapter will be marked up as a single paragraph BRoken up with BR “tags.”

Canada, the natural home of the mediocre, has barely any definable skills at semantic markup. Shortcovers developers won’t know what that means. Neither will publishers. The latter is actually a larger problem. I kept getting inklings of this in advance of and at BookCamp (q.v.), where publishers who can barely wrangle their Wintel boxen are suddenly concerned with putting books online in electronic format.

Publishers as a species, like authors as a separate species, self-select against those who know anything about computers. They’re literary people, or marketing people, or some kind of people other than “computer” people.

Hence even though I can teach anyone, and have taught many people, the basics of Web standards in eight minutes flat (and co-deprogrammed a group of tag-soup-indoctrinated blind kids in a single day), these are not the kind of people who will naturally cotton to the concepts underlying semantic markup. And they don’t have to! They’re publishers and writers, not “text encoders.”

But: This means neither group – the author of his or her beloved manuscript or those who release it to readers – can tell the difference between good and bad code. So Shortcovers can happily sell publishers half-assed, nonsemantic XHTML, charge next to nothing for it, and walk away from the transaction having successfully hoodwinked the client into paying for second-rate work that undercuts the interests of writer and publisher.

Absolutely the only saving grace here lies in two facts: XHTML is XML, and the ePub spec requires XML-style draconian error handling:

This specification defines only one level of conformance for a Reading System. A Reading System is conformant if and only if it processes documents as follows: When presented with an OPS Content Document the Reading System must… correctly process the XML as required in the XML 1.1 specification, including that specification’s requirements for the handling of well-formedness errors[.]

That means your ePub reader cannot accept broken ePub files. Or rather, it must stop rendering at the first point of breakage. In the context I am discussing here, that means at least some kind of valid document tree must be present in every ePub file or it won’t even show up correctly in a reader. This does in fact mean that half-assed markup could squeak through Shortcovers’ conversion service. It means horrifically disfigured markup could not.
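
To make the distinction concrete, here is a minimal sketch in Python of what draconian error handling amounts to. It uses the standard library’s XML parser as a stand-in for a conformant Reading System, which is an assumption on my part; the two sample chapters and the function name are hypothetical, not anything Shortcovers has produced.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False if it must stop."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical tag soup: an unclosed <br> and an unclosed <p>.
TAG_SOUP = (
    f'<html xmlns="{XHTML_NS}"><body>'
    '<p>Chapter One<br>It was a dark and stormy night.'
    '</body></html>'
)

# Hypothetical lazy markup: a heading faked with a line break,
# but every element is properly closed.
LAZY_BUT_WELL_FORMED = (
    f'<html xmlns="{XHTML_NS}"><body>'
    '<p>Chapter One<br/>It was a dark and stormy night.</p>'
    '</body></html>'
)

if __name__ == "__main__":
    print(is_well_formed(TAG_SOUP))              # False
    print(is_well_formed(LAZY_BUT_WELL_FORMED))  # True
```

The second sample is exactly the kind of markup that squeaks through: the parser is satisfied, yet the chapter title is still not a heading. Well-formedness catches breakage, not bad judgement.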

Human evaluation

The problem here is the central problem of markup of existing texts: You have to use the most semantically appropriate markup for the content. You have to do that, though the spec will let you get away with failing to do so in limited respects. The issue here is that those “limited respects” manifestly alter the meaning of the source text.

Do you just whiz through the document and mark up every bit of italicization as I? Not every italicized word is I, and I say this especially because i let u b u. How about headings? Just B text, right?

How about links and images? Shortcovers, incredibly, considers these to be extras rather than what they actually are – intrinsic features of XHTML. And do Shortcovers staff have any training in writing proper alt texts? (They’re required.)

Marking up existing text requires interpreting the meaning, the structure, of that text. Half-assed markup alters the meaning of the text. Are you sure the author wants you to do that? I’m sure the author doesn’t even if the author can’t tell the difference. Some of us can.
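
A script can at least point a human at the places where a decision is needed. The following is a rough sketch under my own assumptions: the list of “suspect” elements, the function names, and the sample chapter are all hypothetical. It counts presentational elements and images lacking alt text; it cannot tell you whether a given I should have been EM or CITE, which is the whole point.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Elements that usually signal a skipped markup decision (an assumed list).
PRESENTATIONAL = {"i", "b", "br"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def review_flags(markup: str) -> Counter:
    """Count spots in a well-formed XHTML document that need a human look."""
    flags = Counter()
    for element in ET.fromstring(markup).iter():
        name = local_name(element.tag)
        if name in PRESENTATIONAL:
            flags[name] += 1
        elif name == "img" and "alt" not in element.attrib:
            flags["img without alt"] += 1
    return flags


# Hypothetical converted chapter: a fake heading, a book title and an
# emphasized word both marked up as <i>, and an image with no alt text.
SAMPLE = (
    f'<html xmlns="{XHTML_NS}"><body>'
    '<p><b>Chapter One</b><br/>'
    'She had read <i>Middlemarch</i> and did <i>not</i> care for it.</p>'
    '<p><img src="map.png"/></p>'
    '</body></html>'
)

if __name__ == "__main__":
    for name, count in sorted(review_flags(SAMPLE).items()):
        print(f"{name}: {count}")
```

In the sample, one I is a title and the other is emphasis. The counter reports two of them and nothing more; telling them apart takes a reader.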

Other reasons to be doubtful

Shortcovers claims to be able to translate notoriously unstructured documents, including the three worst offenders, Quark, PDF, and MS Word files, into ePub. I’m sure you could build a space shuttle out of papier mâché, but I wouldn’t want to try it. For these file types, you must engage an orbital-nuke scenario (™ Eric Meyer) – blow out all existing “formatting” and start over.

PDF is particularly troublesome as it is merely a database of objects. There are several reasons tagged PDFs are rare in the wild, one of which is the fact you can shove pretty much anything digital into a PDF and position it pretty much anywhere on the page. Sighted readers with full faculties can probably read the page with ease, but computers cannot be so described. How does Shortcovers plan to handle an extremely common PDF use case – multicolumn documents?

InDesign files could be easier to convert to semantic XHTML. But the publisher might as well do that at source.

ePub requires Unicode character encoding, but I expect Shortcovers’s service to be dumb as a mule and reduce everything to US-ASCII, especially quotation marks and dashes. (What about two quotation marks in sequence?)
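
What gets lost in a reduction to US-ASCII is easy to show. This is a sketch of the kind of flattening I am predicting, not a description of anything Shortcovers is known to do; the mapping table, function name, and sample strings are my own assumptions.

```python
# A hypothetical "dumb as a mule" mapping from typographic characters to ASCII.
ASCII_FALLBACKS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (also the apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
})


def flatten_to_ascii(text: str) -> str:
    """Replace curly quotes and dashes with their ASCII lookalikes."""
    return text.translate(ASCII_FALLBACKS)


# A quotation within a quotation: two closing marks in sequence.
NESTED_QUOTE = "\u201cShe said \u2018no.\u2019\u201d"

# An en dash in a page range.
PAGE_RANGE = "pages 12\u201315"

if __name__ == "__main__":
    print(flatten_to_ascii(NESTED_QUOTE))  # "She said 'no.'"
    print(flatten_to_ascii(PAGE_RANGE))    # pages 12-15
    # Opening and closing marks become indistinguishable after flattening.
    print(flatten_to_ascii("\u2018") == flatten_to_ascii("\u2019"))  # True
```

Once opening and closing marks collapse to the same character, no later process can reliably restore them. Keeping the original characters costs nothing in a Unicode-encoded file.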

Why enable mediocrity?

Canada is the kind of place where “developers” who don’t know HTML but can talk a good game get hired to run an eBook business. I’d like to think I’m wrong about this, but I have reason to believe I have just now accurately described Shortcovers’s entire dev staff. The really competent Canadian Web developers get picked off by the Americans.

I stand on solid ground as I cast these aspersions because my code does not suck. Plus I have walked the walk, publishing not one but two E-books with valid, semantic code. And I have long since printed out and read the ePub spec. Have you?

So if you’re a publisher, why are you taking Shortcovers’s claims at face value? Don’t you have dangerously little knowledge to make this kind of decision?

The reality is that markup of literary texts requires human inspection in all cases. A competent operator can do it pretty fast, but it’s still gonna cost you. Buy from Shortcovers and I predict you’ll be getting much less than your money’s worth.

Challenge

I can’t prove the foregoing statement, of course, because Shortcovers has so far refused to show its work. I want to see a source file and Shortcovers’s converted ePub file for myself. (And I want to see the price tag.)

So let me drop another gauntlet. I hereby challenge Shortcovers to use every arrow in its quiver to convert this PDF of Chapter 13 of Building Accessible Websites into ePub of a sort it considers adequate. I hold full rights to this work.

I really mean use every arrow, including the existing XHTML version – the worst-marked-up chapter of the book, but superior to 99.999% of the entire Web. Shortcovers can publish the results on its Web site, but may not sell them anywhere or sell or give them away on any mobile device, and it has to provide me with files in advance of doing any of that.

Put up or shut up, Shortcovers.

Permanent link & datestamp ☞ 2009.07.24 12:45
Filed under ☞ E-books · Follow-ups · Web standards

The foregoing posting appeared on Joe Clark’s personal Weblog on 2009.07.24 12:45. The permanent link is:
https://blog.fawny.org/2009/07/24/shortcovers-tagsoup/
