Suggest: Create inline node in parse time instead of convert time

chloerei
Asciidoctor currently does not create inline nodes during parsing, which means inline nodes cannot be retrieved after parsing the document.

For example, I want to create a multipage converter. After splitting the document into several sub-documents, I cannot tell at convert time whether a given sub-document has any footnotes.

Currently, I convert the document to HTML first, using a specific marker to represent each inline footnote, and then postprocess the HTML to extract the footnotes.
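
A minimal sketch of that kind of postprocessing, using only the Ruby standard library. It does not use a custom marker; instead it assumes the footnote markup that Asciidoctor's built-in HTML5 converter emits (div elements with the class "footnote" and ids of the form _footnotedef_N). FOOTNOTE_DEF and extract_footnotes are hypothetical names chosen for illustration.

```ruby
# Sample input: footnote markup as emitted by Asciidoctor's built-in HTML5 converter.
html = <<~HTML
  <div class="paragraph">
  <p><sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnotedef_1" title="View footnote.">1</a>]</sup></p>
  </div>
  <div id="footnotes">
  <hr>
  <div class="footnote" id="_footnotedef_1">
  <a href="#_footnoteref_1">1</a>. text
  </div>
  </div>
HTML

# Hypothetical pattern: captures the footnote number and its text from each definition.
FOOTNOTE_DEF = %r{<div class="footnote" id="_footnotedef_(\d+)">\s*<a href="#_footnoteref_\d+">\d+</a>\. (.*?)\s*</div>}m

# Hypothetical helper: returns one hash per footnote definition found in the HTML.
def extract_footnotes(html)
  html.scan(FOOTNOTE_DEF).map { |index, text| { index: index.to_i, text: text } }
end

footnotes = extract_footnotes(html)
```

A regular expression is brittle against markup changes; an HTML parser would be a sturdier choice if one is available.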

Another issue is that Asciidoctor creates duplicate footnotes when the same document is converted repeatedly.

For example:

irb> doc = Asciidoctor.load 'footnote:[text]'
irb> doc.convert
=> "<div class=\"paragraph\">\n<p><sup class=\"footnote\">[<a id=\"_footnoteref_1\" class=\"footnote\" href=\"#_footnotedef_1\" title=\"View footnote.\">1</a>]</sup></p>\n</div>\n<div id=\"footnotes\">\n<hr>\n<div class=\"footnote\" id=\"_footnotedef_1\">\n<a href=\"#_footnoteref_1\">1</a>. text\n</div>\n</div>"
irb> doc.convert
=> "<div class=\"paragraph\">\n<p><sup class=\"footnote\">[<a id=\"_footnoteref_2\" class=\"footnote\" href=\"#_footnotedef_2\" title=\"View footnote.\">2</a>]</sup></p>\n</div>\n<div id=\"footnotes\">\n<hr>\n<div class=\"footnote\" id=\"_footnotedef_1\">\n<a href=\"#_footnoteref_1\">1</a>. text\n</div>\n<div class=\"footnote\" id=\"_footnotedef_2\">\n<a href=\"#_footnoteref_2\">2</a>. text\n</div>\n</div>"

This is odd because the document itself has not changed.

This is not an urgent issue, but it would be good if an API for finding inline nodes could be provided in the future.

Re: Suggest: Create inline node in parse time instead of convert time

mojavelinux
Administrator
This limitation of Asciidoctor has been known for a very long time, is covered by https://github.com/asciidoctor/asciidoctor/issues/61, and is frequently discussed. It's not something that's going to be solved here. Changing the parsing strategy will very likely have an impact on the grammar and thus has to be dealt with in the specification process. In fact, it will be one of the key mandates of the spec. Essentially, the document should be parsed in one phase and converted in another. But right now there are still aspects of the parser that rely on conversion happening while parsing. Those dependencies will need to get teased apart.

There are two approaches you can use as a workaround. One is to convert the document fully as a dry run to extract data from it, then go back and convert it a second time armed with the information you gathered. Another is to do what Asciidoctor Mathematical does and use a treeprocessor to walk the DOM, convert blocks one by one, and extract and/or manipulate the information (see https://github.com/asciidoctor/asciidoctor-mathematical/blob/master/lib/asciidoctor-mathematical/extension.rb). Or you can combine the two approaches. You'll need to get creative.

Best,

-Dan

On Mon, Jan 20, 2020 at 12:02 AM chloerei [via Asciidoctor :: Discussion] <[hidden email]> wrote:

--
Dan Allen | @mojavelinux | https://twitter.com/mojavelinux

Re: Suggest: Create inline node in parse time instead of convert time

chloerei
Got it, thanks for your reply.