HTML sanity

Base plugin that makes your Pelican HTML output and typography look like they come from the current century.

How to use

Pel­ic­an

Download the m/htmlsanity.py file, put it (including the m/ directory) into one of your PLUGIN_PATHS and add the m.htmlsanity package to your PLUGINS in pelicanconf.py. The following shows the minimal configuration together with the default values of all available options. Not specifying an option is equivalent to setting it to its default value.

PLUGINS += ['m.htmlsanity']
M_HTMLSANITY_SMART_QUOTES = False
M_HTMLSANITY_HYPHENATION = False

Hyphenation (see below) requires the Pyphen library. Either install it via pip or your distribution package manager, or disable hyphenation with the above setting.

pip3 install Pyphen
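For comparison with the defaults above, here is a minimal sketch of a pelicanconf.py that turns both features on. The 'plugins' directory name is a hypothetical placeholder; the option names are the ones described on this page.

```python
# pelicanconf.py (sketch); 'plugins' is a hypothetical directory name
PLUGIN_PATHS = ['plugins']        # directory that contains the m/ package
PLUGINS = ['m.htmlsanity']
DEFAULT_LANG = 'en'               # language used for quotes and hyphenation
M_HTMLSANITY_SMART_QUOTES = True  # opt in, the default is False
M_HTMLSANITY_HYPHENATION = True   # opt in, requires the Pyphen library
```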

Py­thon doc theme

The m.htmlsanity plugin is always available, so there’s no need to mention it explicitly. However, the options aren’t, so you might want to supply them. The same dependencies as for Pelican apply here. The following shows the minimal configuration together with the default values of all available options. Not specifying an option is equivalent to setting it to its default value.

M_HTMLSANITY_SMART_QUOTES = False
M_HTMLSANITY_HYPHENATION = False

Doxy­gen theme

The Doxy­gen theme gen­er­ates the HTML out­put dir­ectly, without doc­utils in the mix, which means there’s no need for this par­tic­u­lar plu­gin there.

What it does

This plu­gin re­places the de­fault Pel­ic­an HTM­L4/CSS1 writer (it already sounds hor­rible, right?) with a cus­tom HTM­L5 writer de­rived from docutils.writers.html5_polyglot that does the fol­low­ing bet­ter:

  • Doc­u­ment sec­tions are us­ing HTM­L5 <section> tag in­stead of <div class="section">
  • Im­ages don’t have the alt at­trib­ute pop­u­lated with URI, if not spe­cified oth­er­wise
  • Fig­ures are us­ing HTM­L5 <figure> tag in­stead of <div class="figure">, fig­ure cap­tion is us­ing HTM­L5 <figcaption> in­stead of <p class="caption"> and fig­ure le­gend is just a <span> as <div> is not al­lowed in­side <figure>
  • Drops a lot of use­less classes from ele­ments such as <div class="docutils">
  • Makes it pos­sible to have <a> ele­ments with block con­tents (al­lowed in HTM­L5)
  • Even the Doc­utils HTM­L5 writer was put­ting fright­en­ing <colgroup> things in­to HTML tables. Not any­more.
  • Top­ics are us­ing HTM­L5 <aside> tag, top­ic head­ers are us­ing <h3> in­stead of a non­des­cript <div>. A spe­cial case is Table of Con­tents, which is a <nav> in­stead of <aside>
  • Line blocks are simply <p> ele­ments with lines de­lim­ited us­ing <br>
  • The <abbr> tag now prop­erly in­cludes a title at­trib­ute
  • reST com­ments are simply ig­nored, in­stead of be­ing put in­to <!-- -->

Ad­di­tion­ally, the fol­low­ing m.css-spe­cif­ic changes are done:

  • Foot­notes and foot­note ref­er­ences have the .m-footnote styl­ing classes ap­plied
  • Links that are just URLs have .m-link-wrap ap­plied to bet­ter wrap on nar­row screens. Note that it’s also pos­sible to ap­ply this and oth­er CSS classes ex­pli­citly with the m.link plu­gin.

Ty­po­graphy

The Pelican builtin TYPOGRIFY option uses SmartyPants to convert ", ', ---, --, ... into smart double and single quotes, em-dash, en-dash and ellipsis, respectively. Unfortunately, SmartyPants has this hardcoded for just English, so one can’t easily get German or French-style quotes.

This plugin contains a patched version of the smart_quotes option from Docutils, which is based on SmartyPants, but with proper language awareness on top. It is applied to the whole document contents and to fields that are included in FORMATTED_FIELDS. See for yourself:

.. class:: language-en

*"A satisfied customer is the best business strategy of all"*

.. class:: language-de

*"Andere Länder, andere Sitten"*

.. class:: language-fr

*"Autres temps, autres mœurs"*

“A sat­is­fied cus­tom­er is the best busi­ness strategy of all”

„An­dere Länder, an­dere Sit­ten“

« Autres temps, autres mœurs »

The default language is taken from the standard DEFAULT_LANG option, which defaults to 'en', and can also be overridden on a per-page or per-article basis using the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_SMART_QUOTES option, which, similarly to the builtin TYPOGRIFY option, defaults to False.
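To make the idea of language awareness concrete, here is a toy sketch that swaps paired straight double quotes for language-specific ones. It is not the Docutils smart_quotes implementation used by the plugin; the QUOTES table and the toy_smart_quotes name are made up for illustration, and the French entry assumes no-break spaces inside the guillemets.

```python
import re

# Hypothetical per-language quote table. The real plugin relies on the
# patched Docutils smart_quotes feature, not on this lookup.
QUOTES = {
    'en': ('\u201c', '\u201d'),              # English double quotes
    'de': ('\u201e', '\u201c'),              # German low-high quotes
    'fr': ('\u00ab\u00a0', '\u00a0\u00bb'),  # guillemets with no-break spaces
}

def toy_smart_quotes(text, lang='en'):
    """Replace paired straight double quotes with language-specific ones."""
    opening, closing = QUOTES[lang]
    # A callable replacement avoids any escape handling in the quote strings
    return re.sub(r'"([^"]*)"',
                  lambda match: opening + match.group(1) + closing,
                  text)

german = toy_smart_quotes('"Andere Länder, andere Sitten"', lang='de')
```

The sketch only handles balanced double quotes on one line; apostrophes, nested quotes and dashes are left to the real implementation.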

Hy­phen­a­tion

Or word wrap. CSS has a standard way to hyphenate words; however, it’s quite hard to control from a global place and I’ve not yet seen any browser actually implementing that feature. The lack of word wrap is especially visible on narrow screens of mobile devices, where there is just way too much blank space because of long words being wrapped onto new lines.

The hyphenation is done using Pyphen and is applied to the whole document contents and to fields that are included in FORMATTED_FIELDS. All other fields, including the document title, are excluded from hyphenation; the same goes for literal and raw blocks and links with a URL (or e-mail) as a title. You can see it in practice in the following convoluted example, which is also language-aware:

.. class:: language-en

incomprehensibilities

.. class:: language-de

Bezirksschornsteinfegermeister

.. class:: language-fr

anticonstitutionnellement

in­com­pre­hens­ib­il­it­ies

Be­zirks­schorn­stein­fe­ger­meis­ter

an­ti­cons­ti­tu­tion­nel­le­ment

The res­ult­ing HTML code looks like this, with &shy; ad­ded to places that are can­did­ates for a word break:

<p lang="en">in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties</p>
<p lang="de">Be&shy;zirks&shy;schorn&shy;stein&shy;fe&shy;ger&shy;meis&shy;ter</p>
<p lang="fr">an&shy;ti&shy;cons&shy;ti&shy;tu&shy;tion&shy;nel&shy;le&shy;ment</p>

Thanks to Unicode magic this is either hidden or converted to a real hyphen and doesn’t break search or SEO. Similarly to smart quotes, the default language is taken from the standard DEFAULT_LANG option or the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_HYPHENATION option, which also defaults to False.
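The following sketch uses only the Python standard library to show what the &shy; entity is: it decodes to the Unicode soft hyphen U+00AD, and stripping that character recovers the original word. The variable names are illustrative only.

```python
import html

# Markup taken from the English example above
markup = 'in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties'

SOFT_HYPHEN = '\u00ad'

# Decoding the entity yields the word with embedded soft hyphens
decoded = html.unescape(markup)
break_candidates = decoded.count(SOFT_HYPHEN)

# Removing the soft hyphens gives back the plain word
plain = decoded.replace(SOFT_HYPHEN, '')
```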

Jin­ja2 good­ies

This plugin adds an rtrim filter to Jinja. It’s like the builtin trim, but works only on the right side to get rid of excessive newlines at the end.
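A filter with this behavior can be sketched in one line of Python. This is an assumption about how such a filter could look, not necessarily the plugin’s own code.

```python
def rtrim(value):
    """Strip trailing whitespace, including newlines, and keep the left side."""
    return value.rstrip()

# Leading indentation survives, trailing newlines do not
trimmed = rtrim('  fine print\n\n')
```

In a template it would be used like any other filter, for example {{ text|rtrim }}.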

reST ren­der­ing

It’s pos­sible to use the reST-to-HTM­L5 ren­der­er from your Jin­ja2 tem­plate (for ex­ample to render a cus­tom fine print text in the foot­er, spe­cified through set­tings). Just pipe your vari­able through the render_rst fil­ter:

<html>
  ...
  <body>
    ...
    <footer>{{ FINE_PRINT|render_rst }}</footer>
  </body>
</html>

The fil­ter is fully equi­val­ent to the built­in reST ren­der­ing and the above M_HTMLSANITY_SMART_QUOTES, M_HTMLSANITY_HYPHENATION and DEFAULT_LANG op­tions af­fect it as well.

SITEURL format­ting

A convenience filter that replaces the common expression {{ SITEURL }}/{{ page.url }} with a formatter based on urljoin, so it also does the right thing when dealing with absolute URLs, even when they start with just //.

For ex­ample, if SITEURL is 'https://your.site' and you ap­ply format_siteurl to 'about/', then you get https://your.site/about/; but if you ap­ply it to 'https://github.com/mosra/m.css', then you get just https://github.com/mosra/m.css.

{{ page.url|format_siteurl }}
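The behavior described above can be reproduced with urljoin from the Python standard library. The sketch below takes the site URL as an explicit second argument, which is an assumption made to keep it self-contained; the actual filter reads SITEURL from the settings.

```python
from urllib.parse import urljoin

def format_siteurl(url, siteurl):
    """Join a possibly relative URL with the site URL."""
    # A trailing slash makes urljoin append to the base path instead of
    # replacing its last segment
    base = siteurl if siteurl.endswith('/') else siteurl + '/'
    return urljoin(base, url)

relative = format_siteurl('about/', 'https://your.site')
absolute = format_siteurl('https://github.com/mosra/m.css', 'https://your.site')
# A URL starting with // keeps its own host and inherits the scheme
protocol_relative = format_siteurl('//example.com/style.css', 'https://your.site')
```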

Text hy­phen­a­tion

If you need to hy­phen­ate text that was not already pro­cessed us­ing the hy­phen­a­tion fil­ter (for ex­ample to wrap art­icle titles or long words in menu items), use the hyphenate fil­ter:

<nav>
  <ul>
    {% for title, link in LINKS %}
    <li><a href="{{ link }}">{{ title|hyphenate }}</a></li>
    {% endfor %}
  </ul>
</nav>

The hyphenation is by default controlled by the M_HTMLSANITY_HYPHENATION option. If you want to control this separately, pass a boolean variable or simply True to the filter’s enable argument. The language is by default taken from the standard DEFAULT_LANG option; if you want to override it, pass the language name to the lang argument. You can also take the value from the article.lang or page.lang attributes provided by Pelican.

{{ title|hyphenate(enable=TEMPLATE_HYPHENATION, lang='fr_FR') }}

Some­times, on the oth­er hand, you might want to de-hy­phen­ate text that was already hy­phen­ated, for ex­ample to avoid po­ten­tial is­sues in <meta> tags. The dehyphenate fil­ter simply re­moves all oc­cur­rences of &shy; from passed text. The enable ar­gu­ment works the same as with the hyphenate fil­ter.

<html>
  <head>
    <meta name="description" content="{{ article.summary|dehyphenate|striptags|e }}" />
  </head>
  ...
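Since the dehyphenate filter is described as simply removing every &shy; from the text, its core can be sketched as follows. The default value of enable is a stand-in chosen for this sketch; in the plugin the default follows the M_HTMLSANITY_HYPHENATION option.

```python
def dehyphenate(value, enable=True):
    """Remove all &shy; entities from the text unless disabled."""
    if not enable:
        return value
    return value.replace('&shy;', '')

summary = dehyphenate('A hy&shy;phen&shy;at&shy;ed sum&shy;ma&shy;ry')
untouched = dehyphenate('hy&shy;phen', enable=False)
```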

Why choose this over …

There are already nu­mer­ous Pel­ic­an plu­gins that try to do sim­il­ar things, but they at­tempt to fix it us­ing Beau­ti­ful­Soup on top of the gen­er­ated HTML. That’s a hor­rendous thing to do, so why not just pre­vent the hor­ror from hap­pen­ing?