Extending BITS
with an orthogonal layer
for conveying layout information
using CSSa

Gerrit Imsieke

JATS UG Meeting
Washington, D.C.
2013-10-22

CSSa in a Nutshell

The Importance of Layout
in Conversion Pipelines

Hogrefe’s workflow

Technologies: XProc, XSLT2, Relax NG, Schematron

Motivation for Hub/CSSa: Need for a Neutral Intermediate Format

n times m converters

n×m

n+m

Layout information comes in two flavors:
central styles and local overrides.

.foo {
  font-size: 10pt;
  line-height: 1.3em;
  hyphens: auto;
}

central style (here: class) ⇒ named rule

<css:rule name="foo"
  css:font-size="10pt"
  css:line-height="1.3em"
  css:hyphens="auto"/>

<p style="margin-bottom:6pt; color:red">
  text
</p>

local overrides ⇒ prefixed attributes

<p css:margin-bottom="6pt" 
   css:color="red">
  text
</p>
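The mapping from a local style attribute to prefixed attributes can be sketched in a few lines of Python. This is only an illustration, not the converter used in the pipeline: the function name style_to_cssa is hypothetical, and the naive split on ";" assumes that no declaration value contains a semicolon. The namespace URI is the one declared on the book element in the BITS integration example.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the css: prefix in the BITS integration example.
CSS_NS = "http://www.w3.org/1996/css"
ET.register_namespace("css", CSS_NS)


def style_to_cssa(elem):
    """Move each declaration of a style attribute to a css:-prefixed attribute.

    Hypothetical helper: assumes simple 'property: value' declarations
    separated by semicolons.
    """
    style = elem.attrib.pop("style", "")
    for declaration in style.split(";"):
        if ":" in declaration:
            prop, value = declaration.split(":", 1)
            elem.set(f"{{{CSS_NS}}}{prop.strip()}", value.strip())
    return elem


p = style_to_cssa(ET.fromstring('<p style="margin-bottom:6pt; color:red">text</p>'))
print(ET.tostring(p, encoding="unicode"))
```

The printed element carries css:margin-bottom and css:color attributes and no style attribute.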

Heterogeneous Input I

InDesign’s native layout representation

IDML

Test text in InDesign

<ParagraphStyle Self="ParagraphStyle/test"
    Name="test"
    Imported="false"
    NextStyle="ParagraphStyle/test"
    KeyboardShortcut="0 0"
    FillColor="Color/C=75 M=5 Y=100 K=0"
    FontStyle="Bold"
    PointSize="14"
    SpaceBefore="5.669291338582678"
    SpaceAfter="5.669291338582678">
  <Properties>
    <BasedOn type="string">$ID/[No paragraph style]</BasedOn>
    <PreviewColor type="enumeration">Nothing</PreviewColor>
    <AppliedFont type="string">Minion Pro</AppliedFont>
  </Properties>
</ParagraphStyle>
            

Heterogeneous Input II

Word’s native layout representation

OOXML

Test text in Word

    <w:style w:type="paragraph" w:customStyle="1" w:styleId="test">
      <w:name w:val="test"/>
      <w:pPr>
        <w:spacing w:before="113" w:after="113"/>
      </w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Minion Pro" w:hAnsi="Minion Pro"/>
        <w:b/>
        <w:color w:val="339966"/>
        <w:sz w:val="28"/>
      </w:rPr>
    </w:style>
            

Normalized Representation:
CSSa named rules

OOXML

<css:rule name="test"
  layout-type="para"
  css:color="#339966"
  css:font-family="Minion Pro"
  css:font-size="14pt"
  css:font-weight="bold"
  css:margin-bottom="5.65pt"
  css:margin-top="5.65pt"/>

IDML

<css:rule name="test"
  layout-type="para"
  css:color="device-cmyk(0.75,0.05,1,0)"
  css:font-family="Minion Pro"
  css:font-size="14pt"
  css:font-weight="bold"
  css:margin-bottom="5.65pt"
  css:margin-top="5.65pt"/>
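
The unit conversions behind these normalized values can be sketched as follows. The function names are hypothetical. The OOXML part relies on w:spacing being expressed in twentieths of a point and w:sz in half-points. The IDML part assumes that the swatch name itself encodes the CMYK percentages, as it does in the example above; a real converter would have to read the referenced color definition. How the pipeline rounds the IDML spacing value is not reproduced here.

```python
import re


def twips_to_pt(twips):
    """OOXML w:spacing before/after values are twentieths of a point."""
    return int(twips) / 20


def half_points_to_pt(half_points):
    """OOXML w:sz values are half-points."""
    return int(half_points) / 2


def idml_cmyk_to_css(color_ref):
    """Turn 'Color/C=75 M=5 Y=100 K=0' into a device-cmyk(...) value.

    Assumption: the swatch name carries the four percentages.
    """
    parts = dict(re.findall(r"([CMYK])=(\d+(?:\.\d+)?)", color_ref))
    values = [format(float(parts[ch]) / 100, "g") for ch in "CMYK"]
    return "device-cmyk(" + ",".join(values) + ")"


print(twips_to_pt("113"), half_points_to_pt("28"))
print(idml_cmyk_to_css("Color/C=75 M=5 Y=100 K=0"))
```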
            

BITS Integration

<book xmlns:css="http://www.w3.org/1996/css"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  css:version="3.0-variant le-tex_Hub-1.1"
  css:rule-selection-attribute="content-type style-type"
  dtd-version="0.2-variant Hogrefe Book Tag Set (hobots) 0.1"
  xml:lang="en">
  <book-meta>
    <book-title-group>…</book-title-group>
    <contrib-group>…</contrib-group>
    <custom-meta-group>
      <css:rules>
        <css:rule name="NormalParagraphStyle"
          native-name="$ID/NormalParagraphStyle"
          layout-type="para"
          css:color="device-cmyk(0,0,0,1)"
          css:font-weight="normal"
          css:font-size="10.5pt"
          css:margin-left="0pt"
          css:margin-right="0pt"
          css:text-indent="14.15pt"
          xml:lang="en"
          css:margin-top="0pt"
          css:margin-bottom="0pt"
          css:text-decoration-line="none"
          css:text-align="justify"
          css:direction="ltr"
          css:font-family="Times New Roman"
          css:text-align-last="left" />
        <css:rule name="hog_paragraphs_text_list_p_list2"
          native-name="hog_paragraphs:text:list:p_list2"
          layout-type="para"
          css:color="device-cmyk(0,0,0,1)"
          css:font-weight="normal"
          css:font-size="10.5pt"
          css:margin-left="28.3pt"
          css:margin-right="0pt"
          xml:lang="en"
          css:margin-top="0pt"
          css:margin-bottom="0pt"
          css:text-decoration-line="none"
          css:text-align="justify"
          css:direction="ltr"
          css:font-family="Times New Roman"
          css:text-align-last="left"
          css:text-indent="-14.15pt"
          css:pseudo-marker_content="'–'"
          css:display="list-item"
          css:list-style-type="dash"
          css:pseudo-marker_font-family="Times New Roman"
          css:pseudo-marker_font-weight="normal"/>
        <css:rule name="None"
          native-name="$ID/[None]"
          layout-type="cell"
          css:border-top-color="device-cmyk(0,0,0,1)"
          css:border-top-width="0.5pt"
          css:border-top-style="solid"
          css:border-right-color="device-cmyk(0,0,0,1)"
          css:border-right-width="0.5pt"
          css:border-right-style="solid"
          css:border-bottom-color="device-cmyk(0,0,0,1)"
          css:border-bottom-width="0.5pt"
          css:border-bottom-style="solid"
          css:border-left-color="device-cmyk(0,0,0,1)"
          css:border-left-width="0.5pt"
          css:border-left-style="solid"/>

BITS Integration: content example

PDF snippet of the table cell

<td content-type="None"
   css:width="73.7pt"
   css:padding-left="5.65pt"
   css:padding-top="5.65pt"
   css:padding-right="5.65pt"
   css:padding-bottom="5.65pt"
   css:background-color="device-cmyk(0,0,0,0.1)"
   css:vertical-align="middle">
  <p content-type="hog_paragraphs_text_p_text_-_first"
     css:text-align="center"
     css:text-align-last="center">
     <styled-content css:font-size="9pt"
        css:font-family="TheSans">
        <bold>Type of Practice &amp; Time Length</bold>
     </styled-content>
  </p>
</td>
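
How a consumer might combine a named rule with the local overrides on an element can be sketched like this. It is an assumption-laden illustration: the helper effective_properties is hypothetical, the lookup order simply follows the attribute names listed in css:rule-selection-attribute, and the sample rule carries a made-up css:vertical-align="top" so that the override becomes visible.

```python
import xml.etree.ElementTree as ET

CSS_NS = "http://www.w3.org/1996/css"
CSS = f"{{{CSS_NS}}}"


def css_attributes(elem):
    """Return the css:-namespaced attributes of elem, keyed by property name."""
    return {k[len(CSS):]: v for k, v in elem.attrib.items() if k.startswith(CSS)}


def effective_properties(elem, rules, selection_attributes=("content-type", "style-type")):
    """Start from the named rule, then let local css: attributes override it."""
    props = {}
    for attr in selection_attributes:
        rule = rules.get(elem.get(attr))
        if rule is not None:
            props.update(css_attributes(rule))
            break
    props.update(css_attributes(elem))
    return props


# Reduced sample; the vertical-align value on the rule is hypothetical.
doc = ET.fromstring(f"""
<root xmlns:css="{CSS_NS}">
  <css:rule name="None" layout-type="cell"
    css:border-top-width="0.5pt" css:vertical-align="top"/>
  <td content-type="None" css:width="73.7pt" css:vertical-align="middle"/>
</root>""")

rules = {rule.get("name"): rule for rule in doc.iter(CSS + "rule")}
props = effective_properties(doc.find("td"), rules)
print(props)
```

The border width comes from the rule, while width and vertical-align come from the cell itself.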

Attributes to elements mapper/wrapper

Part of our open-source framework “transpect” (see the source)

General template for handling boldface:

<xsl:template match="@css:font-weight[matches(., '^bold|[6-9]00$')]" 
  mode="css:map-att-to-elt" as="xs:string?">
  <xsl:param name="context" as="element(*)?"/>
  <xsl:sequence select="$css:bold-elt-name"/>
</xsl:template>

JATS customizing:

<xsl:param name="css:wrap-namespace" as="xs:string" select="''"/> 
  <!-- default: http://www.w3.org/1999/xhtml --> 
<xsl:variable name="css:bold-elt-name" 
  as="xs:string" select="'bold'"/>

Considering context

No <bold> in titles

<xsl:template match="@css:font-weight[matches(., '^bold|[6-9]00$')]"
  mode="css:map-att-to-elt" as="xs:string?">
  <xsl:param name="context" as="element(*)?"/>
  <xsl:if test="not(
                  $context/local-name() = ('title')
                  or
                  ($context/local-name() = ('phrase') 
                   and $context/../local-name() = ('title')) 
                )">
    <xsl:sequence select="$css:bold-elt-name"/>  
  </xsl:if>
</xsl:template>
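
The decision logic of this template can be paraphrased in Python for readers less familiar with XSLT. This is a sketch, not part of transpect: the function and parameter names are hypothetical, and the weight check uses a fully anchored pattern, which is slightly stricter than the XPath regex shown above.

```python
import re

BOLD_ELT_NAME = "bold"  # corresponds to $css:bold-elt-name in the JATS customization


def map_font_weight(value, context_name=None, parent_name=None):
    """Return the wrapper element name for a css:font-weight value, or None.

    context_name is the local name of the element carrying the attribute,
    parent_name the local name of its parent.
    """
    if not re.fullmatch(r"bold|[6-9]00", value):
        return None
    in_title = context_name == "title" or (
        context_name == "phrase" and parent_name == "title"
    )
    return None if in_title else BOLD_ELT_NAME


print(map_font_weight("bold", "p"))                # bold
print(map_font_weight("700", "title"))             # None
print(map_font_weight("bold", "phrase", "title"))  # None
```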

Using layout information for QA

PDF snippet of the table cell

<rule context="td">
  <let name="table" value="ancestor::table[1]"/>
  <let name="text-nodes" value=".//text()[not(ancestor::fn | ancestor::index-term)]"/>
  <report
    test="( exists(@css:background-color)
            and (: there are also plain cells :)
            ( some $td in $table//td satisfies 
              not($td/@css:background-color) ) )
          or (: all text is boldface :) (
            exists($text-nodes) 
            and ( 
              every $t in $text-nodes satisfies
              exists($t/ancestor::bold) ) )" 
          id="ad-hoc-style-for-header-cell" role="warning" 
          diagnostics="ad-hoc-style-for-header-cell_de">
    If this is a table heading, 
    please use the corresponding styles.
  </report>
</rule>
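
The same heuristic can be paraphrased in Python to make the two conditions easier to follow. This is a sketch under stated assumptions, not the Schematron implementation: the function names are hypothetical, whitespace-only text is ignored (the XPath above does not do that by itself), and only bold elements inside the cell are considered.

```python
import xml.etree.ElementTree as ET

CSS_NS = "http://www.w3.org/1996/css"
BG = f"{{{CSS_NS}}}background-color"


def text_pieces(elem, in_bold=False):
    """Yield (text, inside_bold) pairs below elem, skipping fn and index-term."""
    inside = in_bold or elem.tag == "bold"
    if elem.text:
        yield elem.text, inside
    for child in elem:
        if child.tag not in ("fn", "index-term"):
            yield from text_pieces(child, inside)
        if child.tail:  # tail text belongs to elem, not to child
            yield child.tail, inside


def flags_ad_hoc_header(td, table):
    """True if the cell looks like a manually styled table heading."""
    shaded_among_plain = BG in td.attrib and any(
        BG not in cell.attrib for cell in table.iter("td")
    )
    pieces = [piece for piece in text_pieces(td) if piece[0].strip()]
    all_bold = bool(pieces) and all(bold for _, bold in pieces)
    return shaded_among_plain or all_bold


table = ET.fromstring(
    f'<table xmlns:css="{CSS_NS}"><tr>'
    '<td css:background-color="device-cmyk(0,0,0,0.1)"><p><bold>Type</bold></p></td>'
    '<td><p>plain <fn><p>note</p></fn>text</p></td>'
    '</tr></table>'
)
header_like, plain = table.iter("td")
print(flags_ad_hoc_header(header_like, table), flags_ad_hoc_header(plain, table))
```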

Schematron report

Schematron report complaining about missing th cells

QA of failed split cell joins

printed book:
PDF snippet of the split cells

Schematron report complaining about missing SPLIT markup

Schematron for detecting split cells

Operating on DocBook Hub XML rather than BITS*

<rule context="dbk:entry[matches(@role, 'box')]
                        [@css:border-bottom-width='0pt']">
  <assert test="matches(@role, 'SPLIT')" role="warning" 
    id="continued_cell_style_missing">
    If this cell is continued on the next page, 
    its style name must include '~SPLIT'.</assert>
</rule>

* table-based boxes will be converted to boxed-text in BITS

EPUB with some layout properties forwarded,
        some converted, some discarded

HTML

Some layout properties have been forwarded, some discarded (font-size, font-family), some converted (width, background-color) – compare with BITS+CSSa source

<table class="No_table_style" style="width: 100%">
   <col style="width: 16.2%" />
   <col style="width: 20.4%" />
   <col style="width: 20.4%" />
   <col style="width: 42.8%" />
   <tbody>
      <tr>
        ……
        <td class="None" 
           style="padding-left: 5.65pt; padding-top: 5.65pt; padding-right: 5.65pt; padding-bottom: 5.65pt; 
                  background-color: #E6E6E6; vertical-align: middle">
            <p class="hog_paragraphs_text_p_text_-_first"><b>Type of 
              Practice &amp; Time Length</b></p>
         </td>
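
One of the conversions, from device-cmyk(0,0,0,0.1) in the BITS+CSSa source to #E6E6E6 in the HTML, can be reproduced with the naive, profile-free CMYK to RGB formula. This is an assumption about the kind of conversion involved, not a description of the actual EPUB converter, and the function name is hypothetical.

```python
import re


def device_cmyk_to_hex(value):
    """Convert 'device-cmyk(c,m,y,k)' (components 0..1) to '#RRGGBB'.

    Uses channel = 255 * (1 - ink) * (1 - k), without any color management.
    """
    inner = re.fullmatch(r"device-cmyk\((.*)\)", value.strip()).group(1)
    c, m, y, k = (float(part) for part in inner.split(","))
    channels = [int(255 * (1 - ink) * (1 - k) + 0.5) for ink in (c, m, y)]
    return "#" + "".join(f"{channel:02X}" for channel in channels)


print(device_cmyk_to_hex("device-cmyk(0,0,0,0.1)"))
```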

Technical BITS/CSSa Integration
Hogrefe’s HoBoTS customization

⇒ import the base schema unaltered (apart from automatic conversion)

Thank you!

On Twitter: @gimsieke, @letexml

Landing page for the open-source transpect framework