HTML sanity
Base plugin that makes your Pelican HTML output and typography look like they belong to the current century.
How to use
Pelican
Download the m/htmlsanity.py file, put it (including the m/ directory) into one of your PLUGIN_PATHS and add the m.htmlsanity package to your PLUGINS in pelicanconf.py. The following shows the minimal configuration together with default values of all available options. Not specifying an option is equivalent to setting it to its default value.
PLUGINS += ['m.htmlsanity']
M_HTMLSANITY_SMART_QUOTES = False
M_HTMLSANITY_HYPHENATION = False
Hyphenation (see below) requires the Pyphen library. Either install it via pip or your distribution package manager, or disable it with the above setting.
pip3 install Pyphen
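Put together, the relevant part of a pelicanconf.py could look like the sketch below. The pelican-plugins directory name is a placeholder and not something this page prescribes; use whichever directory you placed m/ into. Both options are switched on here purely for illustration.

```python
# pelicanconf.py (fragment)

# Hypothetical directory that contains the m/ subdirectory with htmlsanity.py
PLUGIN_PATHS = ['pelican-plugins']
PLUGINS = ['m.htmlsanity']

# Both options default to False; enabled here as an example
M_HTMLSANITY_SMART_QUOTES = True
# Needs the Pyphen library to be installed
M_HTMLSANITY_HYPHENATION = True
```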
Python doc theme
The m.htmlsanity plugin is always available, so there’s no need to mention it explicitly. However, the options aren’t, so you might want to supply them. The same dependencies as for Pelican apply here. The following shows the minimal configuration together with default values of all available options. Not specifying an option is equivalent to setting it to its default value.
M_HTMLSANITY_SMART_QUOTES = False
M_HTMLSANITY_HYPHENATION = False
Doxygen theme
The Doxygen theme generates the HTML output directly, without docutils in the mix, which means there’s no need for this particular plugin there.
What it does
This plugin replaces the default Pelican HTML4/CSS1 writer (it already sounds horrible, right?) with a custom HTML5 writer derived from docutils.writers.html5_polyglot that does the following better:
- Document sections are using HTML5 <section> tag instead of <div class="section">
- Images don’t have the alt attribute populated with URI, if not specified otherwise
- Figures are using HTML5 <figure> tag instead of <div class="figure">, figure caption is using HTML5 <figcaption> instead of <p class="caption"> and figure legend is just a <span> as <div> is not allowed inside <figure>
- Drops a lot of useless classes from elements such as <div class="docutils">
- Makes it possible to have <a> elements with block contents (allowed in HTML5)
- Even the Docutils HTML5 writer was putting frightening <colgroup> things into HTML tables. Not anymore.
- Topics are using HTML5 <aside> tag, topic headers are using <h3> instead of a nondescript <div>. A special case is Table of Contents, which is a <nav> instead of <aside>
- Line blocks are simply <p> elements with lines delimited using <br>
- The <abbr> tag now properly includes a title attribute
- reST comments are simply ignored, instead of being put into <!-- -->
Additionally, the following m.css-specific changes are done:
- Footnotes and footnote references have the .m-footnote styling classes applied
- Links that are just URLs have .m-link-wrap applied to better wrap on narrow screens. Note that it’s also possible to apply this and other CSS classes explicitly with the m.link plugin.
Typography
The Pelican builtin TYPOGRIFY option is using SmartyPants for converting ", ', ---, -- and ... into smart double and single quotes, em-dash, en-dash and ellipsis, respectively. Unfortunately SmartyPants has this hardcoded for just English, so one can’t easily get German or French-style quotes.
This plugin contains a patched version of the smart_quotes option from Docutils, which is based on SmartyPants, but with proper language awareness on top. It is applied to whole document contents and fields that are included in FORMATTED_FIELDS. See for yourself:
The default language is taken from the standard DEFAULT_LANG option, which defaults to 'en', and can also be overridden on a per-page or per-article basis using the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_SMART_QUOTES option, which, similarly to the builtin TYPOGRIFY option, defaults to False.
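To make the idea of language-aware replacement concrete, here is a deliberately simplified sketch. It is not the plugin’s or Docutils’ implementation: the quote table, the smarten name and the opening/closing heuristic are assumptions made for illustration, single quotes are left untouched, and the spacing usually put inside French guillemets is omitted.

```python
import re

# Assumed quote pairs per language, for illustration only
DOUBLE_QUOTES = {
    'en': ('\u201c', '\u201d'),  # “ ”
    'de': ('\u201e', '\u201c'),  # „ “
    'fr': ('\u00ab', '\u00bb'),  # « »
}

def smarten(text, lang='en'):
    """Replace ASCII double quotes, dashes and ellipsis with typographic ones."""
    opening, closing = DOUBLE_QUOTES.get(lang, DOUBLE_QUOTES['en'])
    # A quote at the very start, or after whitespace or an opening bracket, opens
    text = re.sub(r'(?<![^\s(\[])"', opening, text)
    # Every remaining double quote closes
    text = text.replace('"', closing)
    # Longest sequence first, so --- is not consumed as -- followed by -
    text = text.replace('---', '\u2014').replace('--', '\u2013')
    return text.replace('...', '\u2026')

print(smarten('"Hello" --- world...', 'en'))
print(smarten('"Hallo"', 'de'))
```

The same input string yields different quote characters depending only on the lang argument, which is the behaviour the hardcoded English rules of SmartyPants cannot provide.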
Hyphenation
Or word wrap. CSS has a standard way to hyphenate words, however it’s quite hard to control from a global place and I’ve not yet seen any browser actually implementing that feature. Lack of word wrap is visible especially on narrow screens of mobile devices, where there is just way too much blank space because of long words being wrapped on new lines.
The hyphenation is done using Pyphen and is applied to whole document contents and fields that are included in the FORMATTED_FIELDS.
All other fields including document title are excluded from hyphenation, the same goes for literal and raw blocks and links with URL (or e-mail) as a title. You can see it in practice in the following convoluted example, it’s also language-aware:
The resulting HTML code looks like this, with &shy; added to places that are candidates for a word break:
<p lang="en">in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties</p>
<p lang="de">Be&shy;zirks&shy;schorn&shy;stein&shy;fe&shy;ger&shy;meis&shy;ter</p>
<p lang="fr">an&shy;ti&shy;cons&shy;ti&shy;tu&shy;tion&shy;nel&shy;le&shy;ment</p>
Thanks to Unicode magic this is either hidden or converted to a real hyphen and doesn’t break search or SEO. Similarly to smart quotes, the default language is taken from the standard DEFAULT_LANG option or the :lang: metadata option. This feature is controlled by the M_HTMLSANITY_HYPHENATION option, which also defaults to False.
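What the soft hyphen entity actually is can be checked with a few lines of standard-library Python. This is only an illustration and not part of the plugin; the break positions in the sample string are taken from the English example above.

```python
import html

hyphenated = 'in&shy;com&shy;pre&shy;hen&shy;si&shy;bil&shy;i&shy;ties'

# The &shy; entity decodes to a single invisible character, U+00AD SOFT HYPHEN
decoded = html.unescape(hyphenated)
print(decoded.count('\u00ad'))

# Dropping the soft hyphens gives back the original word unchanged
print(decoded.replace('\u00ad', ''))
```

Because the word itself is never altered, only interleaved with an invisible character, software that ignores U+00AD sees the plain word.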
Jinja2 goodies
This plugin adds an rtrim filter to Jinja. It’s like the builtin trim, but working only on the right side to get rid of excessive newlines at the end.
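In Python terms, the described behaviour corresponds to str.rstrip(). The function below is a minimal sketch of such a filter under that assumption, not necessarily how the plugin implements it.

```python
def rtrim(value):
    """Strip trailing whitespace, including newlines; leave the left side alone."""
    return value.rstrip()

# Leading spaces survive, trailing newlines are gone
print(repr(rtrim('  Fine print.\n\n')))
```

With a Jinja2 Environment object at hand (called env here as an assumption), a function like this would be registered through env.filters['rtrim'] = rtrim.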
reST rendering
It’s possible to use the reST-to-HTML5 renderer from your Jinja2 template (for example to render a custom fine print text in the footer, specified through settings). Just pipe your variable through the render_rst filter:
<html>
  ...
  <body>
    ...
    <footer>{{ FINE_PRINT|render_rst }}</footer>
  </body>
</html>
The filter is fully equivalent to the builtin reST rendering and the above M_HTMLSANITY_SMART_QUOTES, M_HTMLSANITY_HYPHENATION and DEFAULT_LANG options affect it as well.
Internal link expansion
By default, link expansion works only in document content and fields that are referenced in FORMATTED_FIELDS (such as article summaries). In order to expand links in additional fields and arbitrary strings, this plugin provides two Jinja2 filters, producing results equivalent to links expanded by Pelican.
For formatted fields, one can use the expand_links Jinja2 filter in the template. The link expansion needs the content object (either article or page) as a parameter.
{{ article.legal|expand_links(article) }}
If the custom field consists of just one link (for example a link to article cover image for a social meta tag), one can use the expand_link Jinja2 filter:
{{ article.cover|expand_link(article) }}
With the above being in a template and with the FORMATTED_FIELDS setting containing the 'legal' field, a reST article making use of both fields could look like this:
An article
##########

:date: 2017-06-22
:legal: This article is released under `CC0 <{filename}/license.rst>`_.
:cover: {static}/img/article-cover.jpg
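For the legal field of the article above to be treated as formatted, the corresponding pelicanconf.py entry might look like the following fragment. Keeping 'summary' in the list is an assumption based on it being Pelican’s default formatted field.

```python
# pelicanconf.py (fragment)

# 'summary' is Pelican's default formatted field; 'legal' is the custom
# field used by the article above
FORMATTED_FIELDS = ['summary', 'legal']
```

The cover field is deliberately not listed, since it holds a bare link that is expanded with expand_link instead.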
SITEURL formatting
Convenience filter replacing the common expression {{ SITEURL }}/{{ page.url }} with a formatter that makes use of urljoin so it does the right thing also when dealing with absolute URLs and even when they start with just //.
For example, if SITEURL is 'https://your.site' and you apply format_siteurl to 'about/', then you get https://your.site/about/; but if you apply it to 'https://github.com/mosra/m.css', then you get just https://github.com/mosra/m.css.
{{ page.url|format_siteurl }}
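The behaviour described above follows directly from how urljoin from the Python standard library resolves references. The sketch below reproduces it outside of Jinja2. Passing the site URL as a siteurl parameter and appending a trailing slash are assumptions of this example (the real filter takes the value from the SITEURL setting), and cdn.example.com is a made-up host.

```python
from urllib.parse import urljoin

def format_siteurl(url, siteurl='https://your.site'):
    """Join url onto siteurl unless url is already absolute."""
    # A trailing slash keeps urljoin from replacing the last path segment
    base = siteurl if siteurl.endswith('/') else siteurl + '/'
    return urljoin(base, url)

print(format_siteurl('about/'))
print(format_siteurl('https://github.com/mosra/m.css'))
# A URL starting with just // only inherits the scheme of the base
print(format_siteurl('//cdn.example.com/style.css'))
```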
Text hyphenation
If you need to hyphenate text that was not already processed using the hyphenation filter (for example to wrap article titles or long words in menu items), use the hyphenate filter:
<nav>
  <ul>
    {% for title, link in LINKS %}
    <li><a href="{{ link }}">{{ title|hyphenate }}</a></li>
    {% endfor %}
  </ul>
</nav>
The hyphenation is by default controlled by the M_HTMLSANITY_HYPHENATION option. If you want to control this separately, pass a boolean variable or simply True to the filter’s enable argument. The language is by default taken from the standard DEFAULT_LANG option. If you want to override it, pass a language name to the lang argument. You can also take the value from the article.lang or page.lang attributes provided by Pelican.
{{ title|hyphenate(enable=TEMPLATE_HYPHENATION, lang='fr_FR') }}
Sometimes, on the other hand, you might want to de-hyphenate text that was already hyphenated, for example to avoid potential issues in <meta> tags. The dehyphenate filter simply removes all occurrences of &shy; from the passed text. The enable argument works the same as with the hyphenate filter.
<html>
  <head>
    <meta name="description" content="{{ article.summary|dehyphenate|striptags|e }}" />
  </head>
  ...
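As a rough Python equivalent of what such a filter does, consider the sketch below. It is an illustration under assumptions rather than the plugin’s code: enable defaults to True here instead of following M_HTMLSANITY_HYPHENATION, and both the entity and the decoded U+00AD character are removed.

```python
def dehyphenate(value, enable=True):
    """Remove soft hyphens from value when enable is true."""
    if not enable:
        return value
    # Handle both the HTML entity and the decoded U+00AD character
    return value.replace('&shy;', '').replace('\u00ad', '')

print(dehyphenate('hy&shy;phen&shy;ation'))
print(dehyphenate('hy&shy;phen&shy;ation', enable=False))
```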
Why choose this over …
There are already numerous Pelican plugins that try to do similar things, but they attempt to fix it using BeautifulSoup on top of the generated HTML. That’s a horrendous thing to do, so why not just prevent the horror from happening?