04 November 2014

Ebooks are Software

Publishers have converted a huge number of their books into ebook form over the past few years.

Here are some reasons they might have done these conversions.
  1. They believed that ebooks would be profitable, especially if low-quality conversion was done, making their fixed costs negligible.
  2. They were skeptical that ebooks would be profitable, but low-quality conversion was so cheap that it was worth hedging their bets.
  3. They feared Amazon's reprisal against their paper sales if they failed to get on board with Kindle.
Can you tell that what I really want to talk about here is low-quality conversion? (I couldn't resist mentioning the juicy topic of pressure from Amazon, but that gets plenty of attention so I'll leave it at that.)

I used to rail against low-quality conversion, to whoever would listen: mostly my poor wife, since she's more or less a captive audience.

Then I had a humbling realization: publishers did exactly the right thing, in opting for low-quality conversion.

But I haven't become too humble: I think they did the right thing for the wrong reason.

They think they converted cheaply and now they're done.

I think they did the right thing to convert cheaply, but they should just view those conversions as version 1.0.

Another way of putting this is that publishers need to start treating ebooks as software, since ebooks are software.

  1. Software has bugs that need to be fixed.
  2. Software needs to evolve as its environment changes.

To be fair, nothing in their previous, paper-based business would have prepared publishers to understand the dynamics of software.

Heck, software is such a young, rapidly changing field that half of the software companies out there don't understand the dynamics of software. So why would I expect publishers to do better?

Then again, as Kohelet reminds us Jews every Sukkot, there is nothing new under the sun.

Publishers have, over the last four centuries or so, developed quality control processes that
  1. result in low initial defect rates and
  2. allow for fixing the few defects that do creep in.
Kind of sounds like software, doesn't it? At least it sounds like software property #1 above: software has bugs that need to be fixed. (We'll leave property #2 (evolving with environmental changes) alone for now.)

So, publishers don't know anything about the particulars of software, but they do know something about quality control of initial releases and managing bug fixes in subsequent releases.

In paper publishing, a release is analogous to a printing.

Quality control of an initial release, in publishing, includes processes such as proofreading, possibly done multiple times on proofs of increasing finality. Managing bug fixes in subsequent releases includes processes such as receiving reported typos and fixing those that merit fixing.

One might think that publishers' quality process savvy would have ported well to the world of ebooks.

Sadly, this could hardly be further from the truth.

As far as I can tell, these quality control processes almost never happen to ebooks. This is especially puzzling in the case of bug fixes, since the ebook medium drastically lowers the cost of reporting and fixing typos.

Paper books don't have a button allowing a reader to report a typo to the publisher. But, Kindle books might as well not have such a button, since, in my experience, publishers hardly ever act on such reports. I made hundreds of such reports before realizing that it is virtually pointless to do so.

To fix a typo in a paper book, a publisher has to not only fix the typo but wait for the next printing, which may never happen if the book's popularity falls off. In contrast, for an ebook, there is no such thing as a printing, only releases. A new release can be made at whatever frequency the publisher deems appropriate. Too bad virtually none of them seem to take advantage of this capability.

But, let's get back to my major theme here: low-quality conversion.

If the initial conversions of ebooks had not been of such low quality, the typo correction process would not be so important.

But, as I must calmly remind myself when I start foaming at the mouth, I'm now trying to give publishers the benefit of the doubt, admitting that an initial low-quality conversion was the right thing to do.

Low-quality conversion allowed publishers to quickly enter a new market with low initial investment. What's not to love about that?

So, is all I'm advocating that publishers take reported typos seriously and start releasing 1.1 versions?

No. I'd like to advocate for something more radical.

Yes, start taking typos more seriously, but, even more importantly:

It's time for version 2.0. In other words, it is time to re-convert, the right way. Or at least a better way.

It may be painful for publishers to hear this, since most are still in the middle of, or have just completed, the conversion of their back catalog. (Luckily, I don't think publishers or anyone else reads my blog, so these painful words will not be heard.)

In software, it is not at all painful to hear that what you do when you complete version 1.0 is you get to work on the next version. In fact, there is often a pipelined development process where work on 2.0 is already well underway when version 1.0 is released!

These 2.0 versions should
  1. Be high quality, e.g. avoid OCR if possible
  2. Be modern, i.e. avoid concessions to the limitations of early e-readers
I've already talked about quality a lot, but this "be modern" admonition takes up a theme I briefly introduced above but then dropped: software needs to evolve with changes in its environment.

"Be modern" means no more concessions to the limitations of early e-readers. Publishers need to make the same hard calls that software companies make with respect to leaving behind users who do not (or cannot) upgrade their hardware or software. Perhaps Amazon and other vendors could ease those users' pain by still making the old version available, but I know of no current mechanism for this. In the big picture, I'm sorry to say it, but publishers can't let a few users cause their books to be stranded in a format that was the right thing for one particular time but is not the right thing going forward.

Here I'm going out on a limb, but I think ebooks are where books are heading. I'm not saying that paper books will die. They will probably always have a place. But I think that in the future, the roles of paper books and ebooks will flip: the ebook will be viewed as the canonical version of the book, and the paper book will be viewed as a convenient alternate form of this canonical version. If I'm right about this, the initial, quick-and-dirty conversions that publishers have done are not appropriate for a lasting, canonical encoding of a book.

Some of the concessions to the limitations of early e-readers that are my pet peeves are as follows.
  1. Images used instead of Unicode.
  2. Raster images (e.g. JPEGs) used instead of vector images (e.g. SVG).
  3. Failure to take advantage of various EPUB 3 features.
I guess this is sort of an abrupt ending, but that's it for now.

The web needs inkscale images

Web browsers are missing an important feature I call inkscale.

Inkscale is like grayscale, but instead of varying from black to white, it varies from its context's background color to its context's foreground color.

Inkscale should also support transparency, in which case it would vary from transparent to its context's foreground color.

The problem that inkscale solves is keeping an image's color scale in sync with the page's surrounding text colors.
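To make the mapping concrete, here is a minimal sketch in JavaScript (the function name and color representation are my own invention, not any existing API): a grayscale sample is treated as an "amount of ink," so pure black becomes the context's foreground color and pure white becomes the context's background color.

```javascript
// Sketch of the inkscale mapping. Colors are [r, g, b] arrays, 0-255.
// gray = 0 means "full ink" (foreground); gray = 255 means "no ink" (background).
function inkscalePixel(gray, fg, bg) {
  const t = gray / 255; // fraction of the way from foreground to background
  return fg.map((f, i) => Math.round(f + t * (bg[i] - f)));
}

// Example: a black-on-beige reading scheme.
const fg = [0, 0, 0];        // black "ink"
const bg = [245, 235, 215];  // beige "paper"

inkscalePixel(0, fg, bg);    // → [0, 0, 0]: pure black stays the foreground color
inkscalePixel(255, fg, bg);  // → [245, 235, 215]: pure white becomes the background
```

Applying this per pixel to a grayscale image keeps the image on the same color scale as the surrounding text, whatever colors the reader has chosen.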

Inkscale could be implemented by extending an existing image format, in which case no extension to HTML or CSS would be needed, though web browsers would need to be upgraded to interpret this new feature of, for example, PNG.

Alternately, inkscale could be implemented by extending HTML, perhaps as a new attribute of the IMG element that would instruct the browser to "deliberately misinterpret" a grayscale image as inkscale. Or it could be implemented in CSS.
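Purely as illustration, the attribute and CSS variants might look something like the following. None of this syntax exists; the attribute and property names are made up.

```html
<!-- Hypothetical attribute form: no browser supports this. -->
<img src="figure.png" inkscale alt="A diagram rendered in the text's colors">

<!-- Hypothetical CSS form, equally made up: -->
<style>
  img.figure { image-interpretation: inkscale; }
</style>
```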

My particular motivation for wanting inkscale is ebooks. Ebook readers are basically specialized web browsers, since the two most commercially important ebook formats are the following.
  • EPUB, which is based on HTML
  • Kindle (MOBI/KF8), which is usually generated from EPUB
The reason why inkscale is particularly important for ebooks is that, unlike most web browsers and web sites, many ebook readers allow the user to set the foreground and background colors.

So whereas the lack of inkscale in web pages is primarily an inconvenience for web designers, the lack of inkscale in ebooks is an inconvenience for end users. For example, consider the following two different "statements of pain."
  • Web designer: "Argh! We can't redo the color scheme without re-generating our images!"
  • Ebook end user: "Argh! I like black-on-beige text but it makes this book's images stick out like a sore thumb, since the images are stuck as black-on-white!"
Another reason that inkscale is particularly important for ebooks is that many ebooks encode non-Latin characters as images, since unfortunately for a long time many ebook readers did not have Unicode support. It is particularly jarring when such "text-only images" do not match the foreground and background colors of the surrounding text.

Here are some examples of problems created by the lack of inkscale in ebooks. These are screen snips from the Kindle for PC ebook reader. I show each example two ways: first with a problematic color scheme (white-on-black or brown-on-beige), and then with the black-on-white color scheme, in which case there is no problem.

Finally, a few random concluding notes and questions.

AutoCAD's bitonal images feature is a bit like inkscale.

SVG inside HTML already supports inkscale images, for vector elements, i.e. strokes and fills. E.g. an SVG element's "fill" property can be set to the SVG variable "currentColor," which is the current HTML/CSS foreground (text) color. What is not clear to me is whether SVG can be coaxed to transform raster images according to currentColor.
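For example (the colors here are arbitrary), an inline SVG shape follows the surrounding text color automatically:

```html
<p style="color: #5b4636; background-color: #f5ebd7;">
  Brown-on-beige text with a matching vector glyph:
  <svg width="16" height="16" viewBox="0 0 16 16">
    <!-- fill="currentColor" resolves to the context's CSS text color -->
    <circle cx="8" cy="8" r="6" fill="currentColor"/>
  </svg>
</p>
```

Change the paragraph's color and the circle changes with it, which is exactly the inkscale behavior, just limited to vector strokes and fills.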

Fonts can, somewhat perversely, be thought of as special-purpose inkscale images. In some sense, this is what has created the whole problem: if fonts weren't inkscale, we wouldn't need inkscale to match text colors!

On the flip side, fonts can, somewhat perversely, be used to allow inkscale on the web today. The idea is, somewhat perversely, to encode the images that need to be inkscale as glyphs in a custom font. (This may not be so perverse if the images are in fact being used to overcome character set limitations, e.g. spotty Unicode support.)
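A rough sketch of that workaround in today's CSS (the font name, file, and class are placeholders of my own):

```css
/* Serve the image as a glyph in a custom font; as "text," the glyph then
   inherits the context's foreground color like any other character.
   "FigureGlyphs" and figure-glyphs.woff are hypothetical. */
@font-face {
  font-family: "FigureGlyphs";
  src: url("figure-glyphs.woff") format("woff");
}
.figure-glyph {
  font-family: "FigureGlyphs";
}
```

The image would then be referenced as, say, a Private Use Area character: `<span class="figure-glyph">&#xE000;</span>`. Transparency comes along for free, but interpolation between foreground and background does not: a glyph is either inked or not.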

The End.

Update: a friend who shall remain nameless found the following Stack Overflow discussion of how to implement something close to inkscale:

The closest thing to inkscale, at the moment, seems to be the CSS "filter" property, e.g. as described here:


The "filter" property (in its webkit-specific form) is intriguingly demonstrated here:

and here:

I may be missing something, but it seems like the "filter" property, though offering something close to inkscale, isn't there yet.

In particular, the "filter" property offers grayscale() and sepia(), but inkscale support (at least as I have defined it) would require something like inkscale(). That is, grayscale() and sepia() map the image along two specific, fixed background/foreground scales, rather than mapping it along a "current background" to "current foreground" scale.
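To show what I mean by "fixed scales," here is how far filters can take you today (the theme class names are mine): every color scheme needs its own hand-tuned filter chain, because no filter function reads the context's colors.

```css
/* Each theme gets a hard-coded approximation, not a true inkscale(). */
img.figure             { filter: grayscale(1); } /* fixed black-to-white scale  */
.sepia-theme img.figure { filter: sepia(1); }    /* one hard-coded beige scale  */
.dark-theme  img.figure { filter: invert(1); }   /* approximates white-on-black */
```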

Here's a follow-up post I made on the topic of this "inkscale" feature.