HTML sanity

Base plugin that makes your Pelican HTML output and typography look like they come from the current century.

Download the m/htmlsanity.py file, put it, including the m/ directory, into one of your PLUGIN_PATHS and add the m.htmlsanity package to your PLUGINS in pelicanconf.py. The following shows the minimal configuration together with default values of all available options. Not specifying an option is equivalent to setting it to its default value.

PLUGINS += ['m.htmlsanity']
M_HTMLSANITY_SMART_QUOTES = False
M_HTMLSANITY_HYPHENATION = False

Hyphenation (see below) requires the Pyphen library. Either install it via pip or your distribution’s package manager, or disable hyphenation with the above setting.

pip3 install Pyphen

Python doc theme

The m.htmlsanity plugin is always available, so there’s no need to mention it explicitly. However, the options aren’t, so you might want to supply them. The same dependencies as for Pelican apply here. The following shows the minimal configuration together with default values of all available options. Not specifying an option is equivalent to setting it to its default value.

M_HTMLSANITY_SMART_QUOTES = False
M_HTMLSANITY_HYPHENATION = False

Doxygen theme

The Doxygen theme generates the HTML output directly, without docutils in the mix, which means there’s no need for this particular plugin there.

What it does

This plugin replaces the default Pelican HTML4/CSS1 writer (it already sounds horrible, right?) with a custom HTML5 writer derived from docutils.writers.html5_polyglot that does the following better:

Document sections are using HTML5 <section> tag instead of <div class="section">
Images don’t have the alt attribute populated with URI, if not specified otherwise
Figures are using HTML5 <figure> tag instead of <div class="figure">, figure caption is using HTML5 <figcaption> instead of <p class="caption"> and figure legend is just a <span> as <div> is not allowed inside <figure>
Drops a lot of useless classes from elements such as <div class="docutils">
Makes it possible to have <a> elements with block contents (allowed in HTML5)
Even the Docutils HTML5 writer was putting frightening <colgroup> things into HTML tables. Not anymore.
Topics are using HTML5 <aside> tag, topic headers are using <h3> instead of a nondescript <div>. A special case is Table of Contents, which is a <nav> instead of <aside>
Line blocks are simply <p> elements with lines delimited using <br>
The <abbr> tag now properly includes a title attribute
reST comments are simply ignored, instead of being put into <!-- --> HTML comments

Additionally, the following m.css-specific changes are done:

Footnotes and footnote references have the .m-footnote styling classes applied
Links that are just URLs have .m-link-wrap applied to better wrap on narrow screens. Note that it’s also possible to apply this and other CSS classes explicitly with the m.link plugin.

Typography

The Pelican builtin TYPOGRIFY option uses SmartyPants for converting ", ', ---, --, ... into smart double and single quotes, em-dash, en-dash and ellipsis, respectively. Unfortunately, SmartyPants has this hardcoded for just English, so one can’t easily get German or French-style quotes.

This plugin contains a patched version of the smart_quotes option from Docutils, which is based on SmartyPants, but with proper language awareness on top. It is applied to whole document contents and to fields that are listed in FORMATTED_FIELDS. See for yourself:

.. class:: language-en

*"A satisfied customer is the best business strategy of all"*

.. class:: language-de

*"Andere Länder, andere Sitten"*

.. class:: language-fr

*"Autres temps, autres mœurs"*

“A satisfied customer is the best business strategy of all”

„Andere Länder, andere Sitten“

« Autres temps, autres mœurs »

The default language is taken from the standard DEFAULT_LANG option, which defaults to 'en', and can also be overridden on a per-page or per-article basis using the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_SMART_QUOTES option, which, similarly to the builtin TYPOGRIFY option, defaults to False.
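As a rough illustration of the options described above, a pelicanconf.py fragment for a German-language site with smart quotes enabled might look like the following. The choice of 'de' is only an example and the values are assumptions, not recommended defaults.

```python
# pelicanconf.py (fragment, illustrative values only)

# Language used for quote style when a page has no :lang: metadata
DEFAULT_LANG = 'de'

# Smart quotes are off by default, so they have to be enabled explicitly
M_HTMLSANITY_SMART_QUOTES = True
```

With this in place, an individual article can still switch to another quote style by setting its own :lang: metadata.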

Hyphenation

Or word wrap. CSS has a standard way to hyphenate words; however, it’s quite hard to control from a global place and I’ve not yet seen any browser actually implementing that feature. The lack of word wrap is especially visible on narrow screens of mobile devices, where there is just way too much blank space because long words get wrapped onto new lines.

The hyphenation is done using Pyphen and is applied to whole document contents and to fields that are listed in FORMATTED_FIELDS. All other fields, including the document title, are excluded from hyphenation. The same goes for literal and raw blocks and for links with a URL (or e-mail) as a title. You can see it in practice in the following convoluted example, which is also language-aware:

.. class:: language-en

incomprehensibilities

.. class:: language-de

Bezirksschornsteinfegermeister

.. class:: language-fr

anticonstitutionnellement

incomprehensibilities

Bezirksschornsteinfegermeister

anticonstitutionnellement

The resulting HTML code looks like this, with &shy; added to places that are candidates for a word break:

<p lang="en">in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties</p>
<p lang="de">Be&shy;zirks&shy;schorn&shy;stein&shy;fe&shy;ger&shy;meis&shy;ter</p>
<p lang="fr">an&shy;ti&shy;cons&shy;ti&shy;tu&shy;tion&shy;nel&shy;le&shy;ment</p>

Thanks to Unicode magic this is either hidden or converted to a real hyphen and doesn’t break search or SEO. Similarly to smart quotes, the default language is taken from the standard DEFAULT_LANG option or the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_HYPHENATION option, which also defaults to False.
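The general idea of inserting soft hyphens word by word can be sketched as follows. This is a simplified illustration, not the plugin’s actual code: hyphenate_text and toy_hyphenator are hypothetical names, and the toy hyphenator just breaks words every four characters so the example runs without Pyphen installed.

```python
import re

# Matches runs of word characters; whitespace and punctuation stay untouched
WORD_RE = re.compile(r"\w+")

def hyphenate_text(text, insert_hyphens):
    """Apply the insert_hyphens callable to every word found in text."""
    return WORD_RE.sub(lambda match: insert_hyphens(match.group(0)), text)

def toy_hyphenator(word, hyphen="&shy;"):
    # Hypothetical stand-in for a real dictionary-based hyphenator:
    # allows a break after every fourth character, regardless of language
    return hyphen.join(word[i:i + 4] for i in range(0, len(word), 4))

print(hyphenate_text("long words, ok", toy_hyphenator))
```

In real use the callable would presumably come from Pyphen instead, for example `lambda w: pyphen.Pyphen(lang='en_US').inserted(w, hyphen='&shy;')`, which picks break points from language-specific dictionaries.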

Jinja2 goodies

This plugin adds an rtrim filter to Jinja. It’s like the builtin trim, but works only on the right side to get rid of excessive newlines at the end.
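In plain Python terms, the described behavior corresponds roughly to str.rstrip(). The following is a minimal sketch under that assumption; the plugin’s real implementation may differ in details.

```python
def rtrim(value):
    """Strip trailing whitespace (including newlines), keep leading whitespace."""
    return value.rstrip()

print(repr(rtrim("  some text\n\n\n")))
```

Unlike trim, leading whitespace survives, which matters when the indentation at the start of the rendered block is intentional.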

reST rendering

It’s possible to use the reST-to-HTML5 renderer from your Jinja2 template (for example to render a custom fine print text in the footer, specified through settings). Just pipe your variable through the render_rst filter:

<html>
  ...
  <body>
    ...
    <footer>{{ FINE_PRINT|render_rst }}</footer>
  </body>
</html>

The filter is fully equivalent to the builtin reST rendering and the above M_HTMLSANITY_SMART_QUOTES, M_HTMLSANITY_HYPHENATION and DEFAULT_LANG options affect it as well.

For content coming from document metadata fields you still have to use the builtin FORMATTED_FIELDS option, otherwise additional formatting will get lost.
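For the footer example above, the FINE_PRINT variable could be defined in pelicanconf.py as an ordinary string containing reST markup. The text itself is made up for illustration.

```python
# pelicanconf.py (fragment, illustrative content only)

# A reST snippet that the template renders with the render_rst filter
FINE_PRINT = """
Example Site. Content is available under
`CC0 <https://creativecommons.org/publicdomain/zero/1.0/>`_.
Powered by *Pelican*.
"""
```

The string is stored verbatim in the settings and gets converted to HTML only when it passes through render_rst in the template.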

Internal link expansion

By default, link expansion works only in document content and fields that are referenced in the FORMATTED_FIELDS (such as article summaries). In order to expand links in additional fields and arbitrary strings, this plugin provides two Jinja2 filters, producing results equivalent to links expanded by Pelican.

For formatted fields, one can use the expand_links Jinja2 filter in the template. The link expansion needs the content object (either article or page) as a parameter.

{{ article.legal|expand_links(article) }}

If the custom field consists of just one link (for example a link to article cover image for a social meta tag), one can use the expand_link Jinja2 filter:

{{ article.cover|expand_link(article) }}

With the above being in a template and with the FORMATTED_FIELDS setting containing the 'legal' field, a reST article making use of both fields could look like this:

An article
##########

:date: 2017-06-22
:legal: This article is released under `CC0 {filename}/license.rst`_.
:cover: {static}/img/article-cover.jpg
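The matching settings fragment for the article above could look like this, assuming Pelican’s default of formatting only the summary field is being extended:

```python
# pelicanconf.py (fragment)

# 'summary' is formatted by Pelican by default; 'legal' is the custom field
# from the article above that contains reST markup and links to expand
FORMATTED_FIELDS = ['summary', 'legal']
```

The cover field is deliberately not listed, since it holds a single link that is expanded with expand_link in the template.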

SITEURL formatting

Convenience filter replacing the common expression {{ SITEURL }}/{{ page.url }} with a formatter that makes use of urljoin so it does the right thing also when dealing with absolute URLs and even when they start with just //.

For example, if SITEURL is 'https://your.site' and you apply format_siteurl to 'about/', then you get https://your.site/about/; but if you apply it to 'https://github.com/mosra/m.css', then you get just https://github.com/mosra/m.css.

{{ page.url|format_siteurl }}
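The urljoin behavior the filter relies on can be tried out in plain Python. This is only a sketch: the real filter reads SITEURL from the Pelican settings, whereas here it is passed as a second parameter so the example is self-contained.

```python
from urllib.parse import urljoin

def format_siteurl(url, siteurl):
    # Make sure the base ends with a slash so relative URLs are appended to it
    base = siteurl if siteurl.endswith("/") else siteurl + "/"
    # urljoin() keeps absolute and protocol-relative URLs pointing elsewhere
    return urljoin(base, url)

print(format_siteurl("about/", "https://your.site"))
print(format_siteurl("https://github.com/mosra/m.css", "https://your.site"))
print(format_siteurl("//cdn.example.com/lib.js", "https://your.site"))
```

The third call shows the // case: the host comes from the given URL and only the scheme is taken from the site URL.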

Text hyphenation

If you need to hyphenate text that was not already processed using the hyphenation filter (for example to wrap article titles or long words in menu items), use the hyphenate filter:

<nav>
  <ul>
    {% for title, link in LINKS %}
    <li><a href="{{ link }}">{{ title|hyphenate }}</a></li>
    {% endfor %}
  </ul>
</nav>

The hyphenation is by default controlled by the M_HTMLSANITY_HYPHENATION option. If you want to control this separately, pass a boolean variable or simply True to the filter’s enable argument. The language is by default taken from the standard DEFAULT_LANG option. If you want to override it, pass the language name to the lang argument. You can also take the value from the article.lang or page.lang attributes provided by Pelican.

{{ title|hyphenate(enable=TEMPLATE_HYPHENATION, lang='fr_FR') }}

Sometimes, on the other hand, you might want to de-hyphenate text that was already hyphenated, for example to avoid potential issues in <meta> tags. The dehyphenate filter simply removes all occurrences of &shy; from the passed text. The enable argument works the same as with the hyphenate filter.

<html>
  <head>
    <meta name="description" content="{{ article.summary|dehyphenate|striptags|e }}" />
  </head>
  ...
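What the dehyphenate filter does amounts to a plain string replacement, roughly as sketched below. The signature is simplified (the real filter takes its default for enable from the settings), and additionally stripping the raw U+00AD character is an assumption made here for robustness.

```python
def dehyphenate(value, enable=True):
    """Remove soft hyphens from value, or return it unchanged if disabled."""
    if not enable:
        return value
    # Drop the HTML entity and, as an assumption, also the raw U+00AD character
    return value.replace("&shy;", "").replace("\u00ad", "")

print(dehyphenate("in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties"))
```

Running this on the hyphenated English word from the earlier example gives back the original word without any break candidates.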

Why choose this over …

There are already numerous Pelican plugins that try to do similar things, but they attempt to fix it using BeautifulSoup on top of the generated HTML. That’s a horrendous thing to do, so why not just prevent the horror from happening?