A deep dive into MyST, Part 2: The MyST-Parser, Docutils and Sphinx
This is a series of blog posts inviting you to join me in a little journey I have experienced in the last few weeks to figure it out a nice story for MyST inside the Nikola world.
In the previous blog post, we have identified some limitations in the MyST-Parser Python API and we started to understand why roles and directives were not supported by the API as we expected.
In this post, we will explore the machinery underneath the MyST-Parser with the goal to deepen our understanding and being able to propose some alternatives to provide the expected support.
Markdown-it-py and its plugins¶
Coming back to the MyST-Parser default_parser
implementation, you can see the Parser
as a collection of markdown-it-py
plugins:
md = (
MarkdownIt("commonmark", renderer_cls=renderer_cls)
.enable("table")
.use(front_matter_plugin)
.use(myst_block_plugin)
.use(myst_role_plugin)
.use(footnote_plugin)
.use(wordcount_plugin, per_minute=config.words_per_minute)
.disable("footnote_inline")
.disable("footnote_tail")
)
The MarkdownIt
class is instantiated with some parsing configuration options, dictating
the syntax rules and additional options for the parser and renderer. In addition,
several plugins are activated to load a collection of additional syntax rules and render
methods into the parser.
When the input data is parsed via nested chains of rules, it generates a list (stream) of tokens, that will be eventually passed to the renderer to generate a Docutils object.
In the previous post, we highlighted some MyST specific markdown-it-py plugins,
such as the myst_block_plugin
and the myst_role_plugin
. Let's take a brief look at
the latest one for the sake of simplicity:
def myst_role_plugin(md: MarkdownIt):
"""Parse ``{role-name}`content```"""
md.inline.ruler.before("backticks", "myst_role", myst_role)
md.add_render_rule("myst_role", render_myst_role)
The myst_role_plugin
is essentially adding a new syntax rule to the parser that now is
able to the parse roles from the input and a new method for the renderer to render
the now role-associated tokens:
def render_myst_role(self, tokens, idx, options, env):
token = tokens[idx]
name = token.meta.get("name", "unknown")
return (
'<code class="sphinx-role">'
f"{{{name}}}[{escapeHtml(token.content)}]"
"</code>"
)
We now understand the output the MyST-Parser Python API is giving us at the time to parse and render a simple role input:
from myst_parser.main import to_html
text = """
{emphasis}`content`
"""
to_html(text)
We previously showed the to_html
function using the default_parser
configured with
the html
renderer (provided by the markdown-it-py
RendererHTML class).
But, what happens when we use other renderers provided by the MyST-Parser Python API?
The Docutils and Sphinx renderers¶
Let's first explore the to_docutils
function:
from myst_parser.main import to_docutils
text = """
{emphasis}`content`
```{admonition} This is my admonition
This is my note
```
"""
print(to_docutils(text).pformat())
The DocutilsRenderer converts a token directly to the docutils.document representation of the document, converting roles and directives to a docutils.nodes if a converter can be found for the given name.
The SphinxRenderer builds on the DocutilsRenderer to add sphinx specific nodes, e.g. for cross-referencing between documents.
from myst_parser.main import to_docutils
text = """
{ref}`target`
"""
print(to_docutils(text, in_sphinx_env=True).pformat())
Notice the sphinx-specific roles (and directives) needs the in_sphinx_env
option enabled.
The MystParser class¶
Previously, we presented the MystParser
as a Sphinx parser:
from sphinx.parsers import Parser as SphinxParser
...
class MystParser(SphinxParser):
"""Sphinx parser for Markedly Structured Text (MyST)."""
using a set of general and some specific markdown-it-py
plugins (notice the
default_parser
configured - by default - with the "sphinx" renderer):
def parse(self, inputstring: str, document: nodes.document) -> None:
"""Parse source text.
:param inputstring: The source string to parse
:param document: The root docutils node to add AST elements to
"""
config = document.settings.env.myst_config
parser = default_parser(config)
parser.options["document"] = document
env = AttrDict()
to parse the input text into a token stream and then rendering those (via the SphinxRenderer which is a subclass of the DocutilsRenderer):
tokens = parser.parse(inputstring, env)
if not tokens or tokens[0].type != "front_matter":
# we always add front matter, so that we can merge it with global keys,
# specified in the sphinx configuration
tokens = [Token("front_matter", "", 0, content="{}", map=[0, 0])] + tokens
parser.renderer.render(tokens, parser.options, env)
into a Docutils' document representation:
def parse(app: Sphinx, text: str, docname: str = "index") -> nodes.document:
"""Parse a string as MystMarkdown with Sphinx application."""
app.env.temp_data["docname"] = docname
app.env.all_docs[docname] = time.time()
reader = SphinxStandaloneReader()
reader.setup(app)
parser = MystParser()
parser.set_application(app)
with sphinx_domains(app.env):
return publish_doctree(
text,
path.join(app.srcdir, docname + ".md"),
reader=reader,
parser=parser,
parser_name="markdown",
settings_overrides={"env": app.env, "gettext_compact": True},
)
Finally, those objects are passed to the built-in Sphinx machinery to write the html output.
The big questions¶
This is great! We now understand why we need Sphinx to generate the expected HTML output for roles and directives. Several questions now arise:
-
Do we want to have Sphinx-free support for rendering roles and directives in the MyST-Parser
Nikola (among other projects) already have their own machinery (based in Docutils) to build the final HTML output. Getting the docutils object from the Python API would be a super interesting way to easily expose and provide that object to the downstream projects!
One caveat with this approach would be missing some Sphinx features (ie. cross-linking between documents) based on custom roles and directives that we may need to re-implement.
-
Do we want to have docutils-free support for roles and directives in the MyST Python API?
Docutils actually introduces the roles and directives concept (that Sphinx extend) so if we want to go docutils-free, then we will need to re-implement those concepts.
-
Does it makes sense to create a docutils alternative in Python? How much of its functionality would need to be replicated? How should it be extended or enhanced?
There is currently a nice example about an alternative implementation from the Executable Books community, but in the Javascript world: https://github.com/executablebooks/markdown-it-docutils ;-). Since we do not have Docutils there, it actually makes a lot of sense to write something new. But, what is the value/need/place for an alternative implementation in Python? Maybe, we do not need the whole Docutils stuff... but we maybe need some core functionality?
If we decide to write some minimal support, what pieces are we interested to bring first? Where those pieces should end up? The
markdown-it-docutils
package I referenced above is actually amarkdown-it
(JS) plugin. If we follow that pattern, we should create a newmarkdown-it-py-docutils
plugin and we are no longer in the MyST-Parser territory. But the MyST-Parser has, in fact, some parsing directive functions. We may need to move that towardmarkdown-it-py
as the JS plugin does. That sounds nice, but... is there any other suitable (simpler) alternatives besides the one I proposed above?
Finally,in the MyST community, there are some ongoing discussions about developing a
MyST specification that represent what we understand as the MyST language. Having one
specification could be interesting and super useful because different actors interested
in writing a MyST parser for their own space can check their stuff against that
specification. Those actors could be different programming languages such as JS or Python
or even more, it could be different flavors in the same programming language, such as
Docutils, Sphinx, or a theoretical markdown-it-py-docutils
;-)
Show time... not really ;-)¶
OK, again, this is long enough! In the next post I will try to start answering some of these questions showcasing alternative approaches for the Nikola use case.
I hope you enjoyed the ride and I will see you soon with the third part!
Did you like the content? Great!
Or visit my support page for more information.
Btw, don't forget this blog post is an ipynb file itself! So, you can download it from the "Source" link at the top of the post if you want to play with it ;-)
Comments powered by Disqus