HTML sanity

Base plugin that makes your Pelican HTML output and typography look like they come from the current century.

How to use

Download the m/htmlsanity.py file, put it (including the m/ directory) into one of your PLUGIN_PATHS and add the m.htmlsanity package to your PLUGINS in pelicanconf.py.

PLUGINS += ['m.htmlsanity']
M_HTMLSANITY_SMART_QUOTES = True
M_HTMLSANITY_HYPHENATION = True

Hyphenation (see below) requires the Pyphen library. Either install it via pip or your distribution package manager, or disable hyphenation with the above setting.

pip3 install Pyphen
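
If the same configuration has to work on machines where Pyphen may be missing, one option is to enable hyphenation only when the library can be imported. This is a minimal sketch, not something the plugin prescribes; the 'plugins' directory name is an assumption about where the m/ directory was placed.

```python
# pelicanconf.py (sketch)
import importlib.util

# Assumption: the plugin file was saved as ./plugins/m/htmlsanity.py
PLUGIN_PATHS = ['plugins']
PLUGINS = ['m.htmlsanity']

M_HTMLSANITY_SMART_QUOTES = True
# Enable hyphenation only if the pyphen module is importable
M_HTMLSANITY_HYPHENATION = importlib.util.find_spec('pyphen') is not None
```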

What it does

This plugin replaces the default Pelican HTML4/CSS1 writer (it already sounds horrible, right?) with a custom HTML5 writer derived from docutils.writers.html5_polyglot that does the following better:

  • Document sections use the HTML5 <section> tag instead of <div class="section">
  • Images don’t have the alt attribute populated with the URI unless specified otherwise
  • Figures use the HTML5 <figure> tag instead of <div class="figure">, figure captions use HTML5 <figcaption> instead of <p class="caption">, and the figure legend is just a <span>, as <div> is not allowed inside <figure>
  • Drops a lot of useless classes from elements, such as <div class="docutils">
  • Makes it possible to have <a> elements with block contents (allowed in HTML5)
  • Even the Docutils HTML5 writer was putting frightening <colgroup> things into HTML tables. Not anymore.
  • Topics use the HTML5 <aside> tag, topic headers use <h3> instead of a nondescript <div>
  • Line blocks are simply <p> elements with lines delimited using <br>
  • The <abbr> tag now properly includes a title attribute
  • reST comments are simply ignored instead of being put into <!-- -->

Typography

The Pelican builtin TYPOGRIFY option uses SmartyPants to convert ", ', ---, --, ... into smart double and single quotes, em-dash, en-dash and ellipsis, respectively. Unfortunately, SmartyPants has this hardcoded for English only, so one can’t easily get German or French-style quotes.

This plugin contains a patched version of the smart_quotes option from Docutils, which is based on SmartyPants but adds proper language awareness on top. It is applied to the whole document contents and to fields that are included in FORMATTED_FIELDS. See for yourself:

.. class:: language-en

*"A satisfied customer is the best business strategy of all"*

.. class:: language-de

*"Andere Länder, andere Sitten"*

.. class:: language-fr

*"Autres temps, autres mœurs"*

“A sat­is­fied cus­tomer is the best busi­ness strat­e­gy of all”

„An­dere Län­der, an­dere Sit­ten“

« Autres temps, autres mœurs »

The default language is taken from the standard DEFAULT_LANG option, which defaults to 'en', and can also be overridden on a per-page or per-article basis using the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_SMART_QUOTES option, which, similarly to the builtin TYPOGRIFY option, defaults to False.
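
To illustrate what language awareness means here, the following is a deliberately simplified sketch and not the Docutils algorithm: it only pairs up straight double quotes and swaps them for per-language characters. The QUOTES table and the smart_double_quotes function are hypothetical names, and the no-break space inside the French guillemets is an assumption.

```python
import re

# Hypothetical per-language table of (opening, closing) double quotes
QUOTES = {
    'en': ('\u201c', '\u201d'),              # “ ”
    'de': ('\u201e', '\u201c'),              # „ “
    'fr': ('\u00ab\u00a0', '\u00a0\u00bb'),  # « » with no-break spaces (assumed)
}

def smart_double_quotes(text, lang='en'):
    """Replace each "..." span with the quote pair for the given language."""
    opening, closing = QUOTES.get(lang, QUOTES['en'])  # unknown languages fall back to English
    # Naive pairing: every two consecutive straight quotes delimit one quotation
    return re.sub(r'"([^"]*)"', lambda m: opening + m.group(1) + closing, text)

print(smart_double_quotes('"Andere Länder, andere Sitten"', lang='de'))
# „Andere Länder, andere Sitten“
```

A real implementation also has to deal with apostrophes, nested quotes and markup boundaries, which is why the plugin builds on the Docutils code instead of a regular expression like this one.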

Hyphenation

Or word wrap. CSS has a standard way to hyphenate words; however, it’s quite hard to control from a global place and I’ve not yet seen any browser actually implementing that feature. Lack of word wrap is visible especially on the narrow screens of mobile devices, where there is just way too much blank space because long words get wrapped onto new lines.

The hyphenation is done using Pyphen and is applied to the whole document contents and to fields that are included in FORMATTED_FIELDS. All other fields, including the document title, are excluded from hyphenation; the same goes for literal and raw blocks. You can see it in practice in the following convoluted example, which is also language-aware:

.. class:: language-en

incomprehensibilities

.. class:: language-de

Bezirksschornsteinfegermeister

.. class:: language-fr

anticonstitutionnellement

in­com­pre­hen­si­bil­i­ties

Be­zirks­schorn­stein­fe­ger­meis­ter

an­ti­cons­ti­tu­tion­nel­le­ment

The resulting HTML code looks like this, with &shy; added to places that are candidates for a word break:

<p lang="en">in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties</p>
<p lang="de">Be&shy;zirks&shy;schorn&shy;stein&shy;fe&shy;ger&shy;meis&shy;ter</p>
<p lang="fr">an&shy;ti&shy;cons&shy;ti&shy;tu&shy;tion&shy;nel&shy;le&shy;ment</p>

Thanks to Unicode magic, the soft hyphen is either hidden or converted to a real hyphen and doesn’t break search or SEO. Similarly to smart quotes, the default language is taken from the standard DEFAULT_LANG option or the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_HYPHENATION option, which also defaults to False.

Jinja2 goodies

reST rendering

It’s possible to use the reST-to-HTML5 renderer from your Jinja2 template (for example to render a custom fine print text in the footer, specified through settings). Just pipe your variable through the render_rst filter:

<html>
  ...
  <body>
    ...
    <footer>{{ FINE_PRINT|render_rst }}</footer>
  </body>
</html>

The filter is fully equivalent to the builtin reST rendering, and the above M_HTMLSANITY_SMART_QUOTES, M_HTMLSANITY_HYPHENATION and DEFAULT_LANG options affect it as well.
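
On the settings side, the variable is just a string containing reST markup. The snippet below is a sketch for pelicanconf.py; FINE_PRINT is the name used in the template above, while its contents and the example.com link are made up for illustration.

```python
# pelicanconf.py (sketch): reST source that the template pipes through render_rst
FINE_PRINT = """
Content on this site is *"handcrafted"*. See the
`project page <https://example.com/project>`_ for details.
"""
```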

SITEURL formatting

Convenience filter that replaces the common expression {{ SITEURL }}/{{ page.url }} with a formatter based on urljoin, so it also does the right thing for absolute URLs, even when they start with just //.

For example, if SITEURL is 'http://your.site' and you apply format_siteurl to 'about/', then you get http://your.site/about/; but if you apply it to 'https://github.com/mosra/m.css', then you get just https://github.com/mosra/m.css.

{{ page.url|format_siteurl }}
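
The behavior described above can be reproduced with urljoin from the Python standard library. This is a minimal sketch rather than the plugin’s actual code; in particular, passing siteurl as a second parameter is an assumption made to keep the example self-contained, since the filter shown above takes only the URL.

```python
from urllib.parse import urljoin

def format_siteurl(url, siteurl):
    """Join url onto siteurl unless url is already absolute."""
    # A trailing slash makes urljoin treat siteurl as a directory
    base = siteurl if siteurl.endswith('/') else siteurl + '/'
    return urljoin(base, url)

print(format_siteurl('about/', 'http://your.site'))
# http://your.site/about/
print(format_siteurl('https://github.com/mosra/m.css', 'http://your.site'))
# https://github.com/mosra/m.css
print(format_siteurl('//cdn.example.com/lib.js', 'http://your.site'))
# http://cdn.example.com/lib.js
```

In the last case, the URL starting with // keeps its own host and only inherits the scheme from the base, which is what plain string concatenation would get wrong.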

Text hyphenation

If you need to hyphenate text that was not already processed using the hyphenation filter (for example to wrap article titles or long words in menu items), use the hyphenate filter:

<nav>
  <ul>
    {% for title, link in LINKS %}
    <li><a href="{{ link }}">{{ title|hyphenate }}</a></li>
    {% endfor %}
  </ul>
</nav>

The hyphenation is by default controlled by the M_HTMLSANITY_HYPHENATION option. If you want to control this separately, pass a boolean variable or simply True to the filter’s enable argument. The language is by default taken from the standard DEFAULT_LANG option; if you want to override it, pass the language name to the lang argument. You can also take the value from the article.lang or page.lang attributes provided by Pelican.

{{ title|hyphenate(enable=TEMPLATE_HYPHENATION, lang='fr_FR') }}

Sometimes, on the other hand, you might want to de-hyphenate text that was already hyphenated, for example to avoid potential issues in <meta> tags. The dehyphenate filter simply removes all occurrences of &shy; from the passed text. The enable argument works the same as with the hyphenate filter.

<html>
  <head>
    <meta name="description" content="{{ article.summary|dehyphenate|striptags|e }}" />
  </head>
  ...
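
Conceptually, de-hyphenation is plain string replacement. The function below is a sketch of that idea based on the description above, not the plugin’s source; the default value of the enable parameter is an assumption.

```python
def dehyphenate(value, enable=True):
    """Remove every &shy; entity from value when enable is true."""
    if not enable:
        return value
    return value.replace('&shy;', '')

hyphenated = 'in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties'
print(dehyphenate(hyphenated))                # incomprehensibilities
print(dehyphenate(hyphenated, enable=False))  # prints the input unchanged
```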

Why choose this over …

There are already numerous Pelican plugins that try to do similar things, but they attempt to fix the output by running BeautifulSoup on top of the generated HTML. That’s a horrendous thing to do, so why not just prevent the horror from happening?