Dec 05, 2010

As a proud IDPF member, although so far inactive on the spec side, I stumbled across the first editor’s draft of EPUB 3.0. There are many features in the current early draft that are important to educational content, such as annotations or MathML3. I think that, besides the other important roles it will play, EPUB 3.0 has the potential to become the dominant school textbook format. The primary reasons, in my view, are:

  • HW/SW vendor neutrality, open standards:
    • A broader installed base of reading systems means economies of scale for the textbook vendor
    • Neutrality is especially important for markets where public procurement and/or public curriculum definition dominates, as in the German school textbook market
  • EPUB’s design metaprinciple of embracing and packaging mainstream user agent technologies: a promise that interactive applications may be developed more cost-effectively than traditional learning applications

But availability of mainstream technology such as HTML and Javascript alone won’t bring the development costs down to a level where EPUB production costs will beat current print production costs. I’m particularly interested in interactivity as stipulated in item 1 of the charter. Mainstream interactive application scenarios, such as quizzes or fill-in-the-blank exercises, must be really easy to author. An author should only need to declare “this chapter is part of a quiz” or “this is the correct answer”, and the reading system will take care of all the application logic such as unveiling the correct answer, counting scores or selecting questions that need to be rehearsed more thoroughly. No programmer is needed here, because the capability to render certain application scenarios is already built into the reading systems.

It’s desirable to have a commonly accepted vocabulary for certain mainstream interactive scenarios. And of course certain semantics should apply to this vocabulary. For example, if the author declares that a score should be calculated, maybe by setting a fictitious property “calculate-score” of the quiz to “yes”, the score should be calculated as the number of correct answers that were given at the first try. So there is a certain rendering expectation that conforming user agents/reading systems must meet.
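To make that rendering expectation concrete, here is a minimal sketch of the "first try" scoring rule. The data shape (one record per question with the answers picked, in order, and the answer declared as correct) is purely my assumption for illustration; no draft defines it.

```javascript
// Hypothetical data shape: one entry per question, listing the answers the
// reader picked (in order) and the answer the author declared as correct.
function calculateScore(questions) {
  let score = 0;
  for (const q of questions) {
    // Only the first attempt counts; later corrections are ignored.
    if (q.attempts.length > 0 && q.attempts[0] === q.correct) {
      score += 1;
    }
  }
  return score;
}

const session = [
  { id: "q1", correct: "b", attempts: ["b"] },      // right at first try
  { id: "q2", correct: "a", attempts: ["c", "a"] }, // right only at second try
  { id: "q3", correct: "d", attempts: [] },         // not answered
];

console.log(calculateScore(session)); // 1
```

A publisher who wants to allow one retry would change only the condition inside the loop, which is exactly the kind of frill a fixed vocabulary cannot anticipate.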

But what if a publisher of interactive quiz books tries to differentiate itself from the competition, beyond the kind of customization that CSS permits? Users of its quizzes should have the opportunity to try again once if the first guess was wrong. And hints should be given at users’ requests. Because both the vocabulary designers and the renderer programmers cannot anticipate every frill that a content author or publisher may come up with, there will be requirements that cannot be fulfilled by most reading systems’ default configuration.

Trying to extend the standard is an inherently slow process with the only certain outcome that the putative competitive edge will have evaporated by the time that the standard is altered or emerges unmodified from the process. Publishers may consider custom scripting, but scripted content is regarded as a second-class EPUB citizen in the current draft. There’s the clear statement that content authors must not rely on scripting to work in a reading system, and they should include a version of their content that will provide a sensible fallback static rendering. Or better, instead of an alternative version, the same unscripted HTML/CSS content should render similarly to printed quizzes in today’s books, or maybe like the pseudo-interactive EPUB2 that we developed for Eichborn Verlag two years ago:

Screenshot of a quiz EPUB prepared in 2008

It contains 23 chapters, each containing between 30 and 150 questions, totaling 1,500 questions or 6,000 possible answers. The EPUB file is as interactive as permitted by EPUB 2.0: Each answer links to either a “try again” page (which links back to the original question) or to a “correct, next question” page. Although all the content of a chapter is in a single linear file, the CSS page-break-before property ensures that the reader is not able to peek at the correct answers if she navigates using the links. (Well, this was before Apple chose not to recognize the CSS page-break-before property in their pre-1.2 iBooks app…) At the 2009 Leipzig book fair, we had users at our booth who kept answering questions on a Sony reader for about 40 minutes. So the user experience wasn’t that bad. But users asked for some score or reporting, or for an application that, during a second pass, would ask them only the questions they couldn’t answer in the first place.
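The second-pass feature that users asked for boils down to a simple filter over the answer history. The following is only a sketch under my own assumptions: the history record shape is hypothetical, and I chose to count skipped questions as "not answered correctly".

```javascript
// Hypothetical history record: { id, correct, attempts } per question.
// Returns the ids of all questions that were NOT answered correctly at the
// first try (skipped questions are included, which is a design choice).
function questionsForSecondPass(history) {
  return history
    .filter((entry) => entry.attempts.length === 0 || entry.attempts[0] !== entry.correct)
    .map((entry) => entry.id);
}

const firstPass = [
  { id: "ch1-q1", correct: "a", attempts: ["a"] },
  { id: "ch1-q2", correct: "c", attempts: ["b", "c"] },
  { id: "ch1-q3", correct: "d", attempts: [] },
];

console.log(questionsForSecondPass(firstPass)); // [ 'ch1-q2', 'ch1-q3' ]
```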

As stated above, this may be done by custom scripting. A significant drawback of this will be that the content will be interspersed with script code. You’ll need programmers to author content, and each interactive book will be a custom program, as apps are today. This is costly and less interoperable.

Early Drafts

Let’s see how interactivity is being scheduled in the current early EPUB 3.0 draft. The final spec should satisfy item 1 of the charter or, derived from that, requirements I-1 and I-8. The first editor’s draft of the spec allows for two basic ways to implement quizzes with calculable scores, a personalized save/resume feature and history-based re-training:

  • Plain scripting (content interspersed with scripting instructions)
  • Embedded islands of interactivity

Although there is still arguing over details, the “islands” (implementable as HTML object or iframe elements) seem to be a smart solution that satisfies some important requirements and basic principles of EPUB 3.0, namely:

  • HTML5 is the predominant payload format
  • separate code from data

These islands of interactivity, each of which may contain a quiz and its results, are able to communicate with other islands primarily through local storage, as available per HTML5. This enables, for example, an overall score at the end of the publication. That score is displayed by another interactive island whose purpose is to sum up the individual scores and optionally also give advice on which areas of knowledge need more practice.

The quizzes and the score summary islands may be made out of arbitrary markup (XML, JSON, text, HTML, …) and a handler script. This script will typically, or rather, necessarily, be implemented in Javascript. It will accept user input (e.g., by clicking radio buttons), it will compare the input to the possible answers, it will give the user feedback whether the answer was correct and it may save a history of questions answered, together with information about correctness of the answers. It may access local storage to write this history to, and the statistics widget will read the data of all individual quizzes from local storage and display the overall score. In addition, the interaction and statistics scripts might be enabled to distinguish between individual users, identified by name, and let each user of the publication continue where she left off the last time she used the reading device.
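A rough sketch of how a quiz island and the statistics island could talk to each other through storage follows. The key convention, the result shape and the 50% "needs practice" threshold are all my assumptions. The MemoryStorage class is only a stand-in that mimics the getItem/setItem part of the HTML5 Storage interface, so that the sketch also runs outside a browser; in a reading system one would presumably pass window.localStorage instead.

```javascript
// In-memory stand-in for the getItem/setItem part of the HTML5 Storage interface.
class MemoryStorage {
  constructor() { this.items = new Map(); }
  getItem(key) { return this.items.has(key) ? this.items.get(key) : null; }
  setItem(key, value) { this.items.set(key, String(value)); }
}

const KEY_PREFIX = "quiz-result:"; // hypothetical key convention

// Called by a quiz island when the reader has finished it.
function saveQuizResult(storage, quizId, result) {
  storage.setItem(KEY_PREFIX + quizId, JSON.stringify(result));
}

// Called by the statistics island: sums up all stored results.
function summarize(storage, quizIds) {
  let correct = 0;
  let total = 0;
  const needsPractice = [];
  for (const id of quizIds) {
    const raw = storage.getItem(KEY_PREFIX + id);
    if (raw === null) continue; // this quiz has not been taken yet
    const result = JSON.parse(raw);
    correct += result.correct;
    total += result.total;
    // Assumed threshold: fewer than half of the answers correct.
    if (result.correct / result.total < 0.5) needsPractice.push(id);
  }
  return { correct, total, needsPractice };
}

const storage = new MemoryStorage();
saveQuizResult(storage, "chapter1", { correct: 20, total: 30 });
saveQuizResult(storage, "chapter2", { correct: 10, total: 40 });

console.log(summarize(storage, ["chapter1", "chapter2", "chapter3"]));
// { correct: 30, total: 70, needsPractice: [ 'chapter2' ] }
```

Distinguishing between individual users would only require adding a user name to the key.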

There are a couple of issues with this solution. (As a matter of fairness I have to concede that when suggesting this solution, Peter Sorotokin had applications in mind that naturally may be confined to an embedded island: a chess game or a small quiz that doesn’t span whole chapters or book.)

If you want a static default rendering, you may put your static quiz content in the object element that you use to invoke the scripted rendering for supporting devices. But this breaks NCX compatibility, for example. If the NCX thinks that the 12th question of the 2nd chapter is somewhere in the object element’s content, but you are using the scripted rendering, clicking on the corresponding ToC entry probably won’t take you anywhere.

Therefore you might consider not including the whole chapter with its up to 150 questions as a single object. Instead each question remains an HTML heading, linkable from NCX, and you put each question’s four possible answers into an object (that is, one object per question). Rendering up to 150 questions per chapter as custom objects may be a performance issue. Of course you could just render one question at a time below the chapter heading in a single object or iframe. So you just need to include one widget per chapter. But then again you’ll run into linking/bookmarking problems.

Then there’s the question of which format to use as input to your interactive script. You may consider some semantic markup or the same HTML that you included for the static rendering. But if you include the HTML twice, you might have neat code/data separation for the script, but you’ll have an issue of double data sets. There are some people who do not generate their EPUB from some other source, but edit the content by hand. (By the time EPUB is the predominant publishing format, probably even more people will be inclined to hand-craft their publications.) Of course it’s desirable to avoid this redundancy.

Considering the alternative: you provide the quiz markup in some supersemantic format. The static fallback HTML is just a pre-generated rendering of that. But who, except yourself, understands the semantics of your markup? The utility of custom declarative markup that nobody shares with you is limited. Your custom vocabulary’s semantics might not be declared anywhere and may only be inferred from what your rendering script renders. So not only did you create your own Humpty Dumpty vocabulary, but it won’t even render natively without scripting engines.

These are the drawbacks of the current proposal, notwithstanding the usefulness I see for that in certain circumstances.

As per the current draft, content authors may choose between this kind of interactivity and what plain <script> scripting is offering. I’m not sure which one to prefer. Plain scripting will at least have the advantage that the content doc’s HTML+CSS, free from redundancy and custom markup, will allow for a static default rendering in the absence of scripting.


The other idea that I carried around with me for a month is Microformats: Suppose that there is some agreed-upon microformat vocabulary for quizzes. It relies entirely on HTML5 markup and establishes conventions for class attribute values.

But today [on Dec. 15, the publication date of this post is misleading], while diffing the most recent submit of Content Docs 3.0, I stumbled across the solution. In short, it’s not the class, it’s the role attribute (or: it’s the well-specified RDFa instead of microformats). See the sections ranging from lines 259 to 499 on the right-hand side of the diff, or rather look at the rendered version.

These role (or epub:type, as the spec calls them) attributes are not limited to islands of interactivity. So these can be used to mark up quizzes for script processing without breaking the default HTML/CSS rendering, without relying on any of the event attributes (onclick etc.), and without Javascript in the content doc.

But the main challenge is to avoid the Humpty Dumpty issue of private vocabularies. One approach may be to include the vocabularies for important areas of application into the standard. But then we’ll never finish. I’m not sure whether even the most canonical interactive vocabularies should be part of the core epub vocabulary. They should rather be maintained in a separate standardization track. That’s for the same reason that a corporation doesn’t specify the amount of its management’s remuneration or its business hours in its bylaws: the update frequencies are different.

During the next couple of years there may be a lot of activity and experimentation in finding vocabularies and developing handlers for interactivity within EPUB. EPUB3, however, should become a stable, well-crafted and implementable standard within the next couple of months. This core standard will probably last for 10 years with only minor amendments. (Unless the browser/JSON people achieve Complete World Domination in the meantime. In the latter case, EPUB will be replaced by a highly optimized JSON format that doesn’t need zipping and complicated XML mangling any more. The methods for handling interactivity will be included as property values somewhere within, a technique that will be known as EJSON, which stands for executable JSON, file name extension .EXE [but basename length may exceed 8 characters]. Sorry for the digression.) Vocabularies, if well-designed, may live much longer than an eBook standard. But while they are being integrated with EPUB and tested in the field, they may frequently be adapted. And there will hopefully be a dozen or so vocabularies, generic or domain-specific, joining the party during the first two years. So it’s better to finalize the core soon and to keep it stable. At the same time we need to preserve our ability to develop some frills more dynamically and over time. However, this dynamic extension must not happen in a freewheeling or secluded manner. It has to be tightly coupled to the standard and its standardization body, the IDPF.

Therefore I think that vocabularies should be able to have a status between fixed (the core vocabulary) and user-defined. The IDPF may endorse or bless non-core vocabularies and make them publicly accessible. But then it should also specify rendering expectations for these vocabularies and for the different kinds of reading systems (visual scripting-enabled, visual non-scripting, screen reader, …). The IDPF might also run a repository of reference implementations, among them Javascript libraries that implement interactive educational content, crossword puzzles or chess game recordings. Reading system vendors are free to use these reference implementations for rendering the IDPF-endorsed vocabularies, or they may modify them according to their needs. Content creators are free to deliver their own implementation in an EPUB publication, and users should be able to choose whether to prefer libraries delivered with the reading software or with the publication.

An open issue remains: how to bind libraries (delivered as Javascript files) to vocabularies? Maybe by the bindings element newly introduced in EPUB3. Its current content model and semantics would have to be changed or extended. In the following example, I’ve extended the example given in the draft in the following ways:

  • Zero or more instances of a newly introduced element, vocabulary, may be used below the bindings element.
  • vocabulary has a mandatory uri attribute and an optional handler attribute.
  • The semantics are as follows:
    • If the handler attribute is not present, a default handler for the URI specified in uri should be used (see below for terminology/definitions). Preconditions: scripting is available on the reading system and the user did not turn off scripting. Content authors are encouraged to omit the handler for blessed URIs, where the Reading System should provide a built-in handler.
    • N.b.: omission of the handler attribute is not equivalent to “no binding element for this URI at all”. In the absence of a binding element for a given URI, no handler will be used, even if a default handler were available.
    • Reading systems may provide users with a switch option: for each URI found in the Content Docs’ prefix attributes and for which there’s a default handler, the user may select “◯ Use built-in handler for [URI] / ◯ Obey publication’s settings / ◯ Don’t use a handler at all.”. (Circles = radio buttons.)
    • If the handler attribute is present, a Reading System that offers scripting should use the handler for the corresponding URI, subject to the same user preference overrides as described in the previous list item.


  • “A handler for a URI is used” shall mean: The resource associated with the handler [read: the file js/chess.js] is included in a Content Document that associates the URI with a prefix in its root node’s prefix attribute [read: <svg prefix="chess:" …>].
  • “Inclusion” shall mean that the content doc should be processed as if the handler resource’s content was contained or referenced in a <script> element.
  • A URI is called “blessed” (by the IDPF) if it is contained in a list of blessed URIs and rendering expectations for their content. This list is maintained by a designated IDPF WG.
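The selection rules above can be sketched as a small decision function. This is only my reading of the proposed semantics, not anything from the draft: the function name, the argument shapes, the user choice values and the example.org URIs are all hypothetical.

```javascript
// bindings: the <vocabulary> entries of the publication, e.g. { uri, handler? }
// builtIn:  Map from URI to the reading system's default handler
// userChoice: "publication" (default) | "built-in" | "none"
function resolveHandler({ uri, bindings, builtIn, scriptingEnabled, userChoice = "publication" }) {
  // Precondition: scripting is available and the user did not switch it off.
  if (!scriptingEnabled || userChoice === "none") return null;

  const defaultHandler = builtIn.has(uri)
    ? { source: "built-in", ref: builtIn.get(uri) }
    : null;

  // The switch is only offered for URIs that have a default handler, so
  // "built-in" simply yields that handler (or nothing if there is none).
  if (userChoice === "built-in") return defaultHandler;

  // "Obey publication's settings":
  const binding = bindings.find((b) => b.uri === uri);
  if (!binding) return null; // no binding element at all: no handler
  if (binding.handler) return { source: "publication", ref: binding.handler };
  return defaultHandler; // handler attribute omitted: fall back to the default
}

// Hypothetical URIs, for illustration only.
const QUIZ = "http://example.org/vocab/quiz#";
const CHESS = "http://example.org/vocab/chess#";
const CROSSWORD = "http://example.org/vocab/crossword#";

const bindings = [{ uri: QUIZ, handler: "js/quiz.js" }, { uri: CHESS }];
const builtIn = new Map([
  [QUIZ, "builtin:quiz"],
  [CHESS, "builtin:chess"],
  [CROSSWORD, "builtin:crossword"],
]);

const base = { bindings, builtIn, scriptingEnabled: true };
console.log(resolveHandler({ ...base, uri: QUIZ }));      // handler from the publication
console.log(resolveHandler({ ...base, uri: CHESS }));     // built-in handler
console.log(resolveHandler({ ...base, uri: CROSSWORD })); // null: not bound at all
```

Note how the third call illustrates the n.b. above: a default handler exists for the URI, but without a binding element it is not used.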

Implementation Details:

  • The order in which the handlers are included is the order in which the corresponding URIs appear in the content doc’s prefix attribute.
  • The handlers are included as if they appeared at the first position in the document where scripts are permissible. (If the handlers require other libraries to be loaded before them, they have to manipulate the DOM for that, after making sure that the library hasn’t already been loaded by another script.)
  • The handlers must not remove any ID values in order for NCX mechanisms to function. (Won’t probably suffice, though.)
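The ordering rule from the first item could look like the following sketch. I assume here that the prefix attribute value is a whitespace-separated sequence of "name: URI" pairs; the helper names and the example.org URIs are hypothetical.

```javascript
// Extracts the URIs from a prefix attribute value, in document order.
// Assumed syntax: "name1: URI1 name2: URI2 ..."
function parsePrefixAttribute(value) {
  const tokens = value.trim().split(/\s+/).filter(Boolean);
  const uris = [];
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    uris.push(tokens[i + 1]); // tokens[i] is the prefix name, e.g. "chess:"
  }
  return uris;
}

// Returns the handler resources in the order in which they are to be included.
function handlersInInclusionOrder(prefixValue, handlerByUri) {
  return parsePrefixAttribute(prefixValue)
    .filter((uri) => handlerByUri.has(uri)) // URIs without a handler are skipped
    .map((uri) => handlerByUri.get(uri));
}

const handlerByUri = new Map([
  ["http://example.org/vocab/quiz#", "js/quiz.js"],
  ["http://example.org/vocab/chess#", "js/chess.js"],
]);

const prefixValue =
  "chess: http://example.org/vocab/chess# other: http://example.org/vocab/other# quiz: http://example.org/vocab/quiz#";

console.log(handlersInInclusionOrder(prefixValue, handlerByUri));
// [ 'js/chess.js', 'js/quiz.js' ]
```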

Remark: It is not guaranteed or enforceable that the referenced script only manipulates “its” content (i.e., the elements that carry an attribute from the vocabulary that it is supposed to process). It may do whatever a script is allowed to do to the DOM.

  • It is expected that Reading Systems’ software/firmware may be updated easily so that updates in the blessed vocabulary list and in the reference implementation repository will propagate quickly to end users.

Package Document Example:

The following Package Document declares that a custom handler (i.e., a handler supplied in the package) should be used for the quiz vocabulary, while the chess vocabulary should be dealt with by a built-in handler. In addition, a further vocabulary should be rendered by another custom handler. The existing example from the spec is kept here to illustrate that both binding methods may well co-exist. (I took the liberty of using application/javascript handlers for vocabularies, while the current draft requires another level of indirection for mediaType bindings, where handlers are provided wrapped in an application/xhtml+xml file. As mentioned above, there’s still discussion going on regarding this topic.)

<package xmlns="">
  <manifest>
    <item id="quiz_js" href="js/quiz.js" media-type="application/javascript" />
    <item id="dumpty_js" href="js/dumpty.js" media-type="application/javascript" />
    <item id="impl" href="impl.xhtml" media-type="application/xhtml+xml"/>
    <item id="slideshow" href="slideshow.xml" media-type="application/x-demo-slideshow"/>
  </manifest>
  <bindings>
    <!-- vocabularies bound to custom handlers supplied in the package -->
    <vocabulary uri="" handler="quiz_js" />
    <vocabulary uri="" handler="dumpty_js" />
    <!-- no handler attribute: the reading system's built-in handler is used -->
    <vocabulary uri="" />
    <mediaType media-type="application/x-demo-slideshow" handler="impl"/>
  </bindings>
</package>

Items in the Package Document that use the vocabulary-based binding mechanism do not need to be declared as scripted (tbd, maybe distinct attribute values for plain scripting, for custom handlers and for default handlers). In my view, the property is not needed because scripting requirements may be concluded from the presence of a binding element and of a corresponding URI in the Content Document. But there may be reasons for making this explicit on the manifest/item level.

A handler script will typically, but not necessarily, alter the content doc’s static content immediately after the document has been loaded. The static content is there to provide a sensible rendering to users of non-scripting reading systems. The script may do so by adding controls, folding content, rendering a series of chess board images / interspersed HTML move descriptions as a slideshow or replacing it with an iframe that renders an interactive SVG document where you can browse the moves using a slider, …

To sum up, these are my suggestions:

  • If a reading system is able to handle scripted content, it must honor the bindings/vocabulary element.
  • If a reading system is able to handle scripted content, it should provide and use a built-in handler for IDPF-blessed vocabularies, unless another handler is specified in the publication.
  • The list of blessed vocabularies will be maintained by an IDPF Working Group.
  • This WG will also maintain a repository of reference implementations for each blessed vocabulary. The reference implementations must be published under an accepted open source license that permits re-use in commercial proprietary products. Examples of such licenses are Apache, BSD, or MIT. The WG is free to host multiple implementation alternatives for each blessed vocabulary.

At least one question remains open: will this additional layer of indirection and complexity be accepted in practice? Or will the predominance of browser engines lead to a situation where the most widespread Reading Systems will only implement plain scripting (because it’s already there) and refuse to ship the RDFa infrastructure and handlers for blessed vocabularies? Will content authors see the advantages of purely declarative markup for common use cases? Or will they opt for plain scripting because they are already familiar with it (if they are Web designers) and because it offers more flexibility? Will they opt for plain scripting even if it leads to the creation of EPUB files that render poorly on non-scripting or non-visual Reading Systems? Will RS vendors and content authors ignore this part of the standardization effort, making HTML5 almost the only whole-heartedly accepted thing in EPUB3? I hope that it won’t turn out like that, because I think the EPUB3 WG in particular has so far steered its way impressively well between the two forces: the browser-vendor / visual / API-centric HTML5 momentum on the one hand, and “traditional” XML, accessibility, and semantic Web technologies on the other.

From another perspective: the relatively limited use of RDFa in the current draft might not give the RS vendors good enough reasons to implement RDFa whole-heartedly. According to the current draft, a content author may include a private vocabulary in a document, leaving rendering or rather non-rendering to the RS. Content authors will have no good reason to use any non-core vocabulary if they care about compatible EPUB documents. And RS vendors may be inclined to provide some form of statically encoded support for the core vocabulary, without using the full RDFa apparatus. Poor adoption on both the content author and the reading system sides may lead to RDFa being dropped in a later version of the spec. So the proposed binding mechanism and canonical vocabularies (blessed base URIs for which default handlers exist) are an important application that helps justify RDFa in EPUB.

Acceptance of this binding mechanism will depend on how flexibly the default behavior of the handlers may be adjusted and extended. If it is too rigid, content developers will likely use plain scripting, maybe together with widget libraries in order to hide complexity. The drawback of this will be that documents that might be marked up well if the binding mechanism were in place are contaminated with onclick attributes and the like, leading to accessibility problems. I think that if the reference library implementations are well-designed, an ecosystem of plugins and themes may flourish around them, à la jQuery (or the reference implementations may even be designed as jQuery plugins), and content authors will use declarative, canonical, semantic markup and the binding mechanism. Reading System vendors are encouraged to use reference implementations that will help overcome the chicken and egg problem (“no renderer, no content”, and vice versa) and bring interactive content to the masses, including the visually impaired.

  One Response to “EPUB3: “Blessed Vocabularies” for Interactive Publications”

  1. Frank Lowney recently analyzed an iTunes U EPUB (actually, he probably did in October). He says that the interactive quiz content files look like plain Web pages, with custom Javascript, and that’s true.
    This means that plain scripting already happens in the field. Although this clearly is as little EPUB2 compliant as the use of <video>, Apple is not behaving too badly here. This kind of scripting will be allowed in EPUB3, and it will be interoperable. But it’s a non-declarative quiz that will fail to render comprehensibly on a non-scripting device, for example Adobe Digital Editions on my PC. The button “Show answer” simply doesn’t work. In a script-enabled future edition of ADE it may work, but will it work in an EPUB viewer for the visually impaired, even if it supported Javascript? And it will be a pain to author. In the answer HTML alone, 82 of 123 lines are Javascript code, plus 4 references to external code (incidentally, they are using the abovementioned jQuery). I don’t object to their using Javascript, but I think that, done like this, it’s friendly neither to the reader nor to the author. (Reader-unfriendly because JS is only interoperable for visual renderers using larger-screen browser engines; author-unfriendly because it’s too expensive to author and test content that mainly consists of script code.)
