Asciidoctor :: Discussion - Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Posted by Oblomov on Feb 16, 2021; 9:06am
URL: https://discuss.asciidoctor.org/Interpret-links-or-other-AsciiDoc-syntax-only-in-specific-lines-of-a-source-block-tp8499p8503.html

Hello David,

Thanks for your patience 8-). I'll try to explain more clearly what I'm doing and what I'm trying to achieve, although reading my project's README would probably cover most of it.

In case you're not familiar with it, literate programming (LP) is an approach to coding that puts documentation and natural-language discourse at the center. An LP source is a file from which one can (usually with separate programs) extract the actual source code of the program (“tangling”) and the source code of the documentation (“weaving”; traditionally the output is TeX, but that is obviously not the case here).

The LP source itself alternates human discourse (the documentation) and code, and the code is written in chunks that refer to each other. As an example from my project, the chunk “The module structure”

https://github.com/Oblomov/asciidoctor-litprog/blob/f0be55/README.adoc#38-the-module

refers to several other chunks (“Licensing statement”, “Requires”, “Main class definition”, etc.) that are defined in other [source] blocks in the document. Note that a chunk can be defined by multiple blocks: for example, the “Requires” chunk is defined by two different blocks in the source code (this kind of incremental definition is one of the powerful features of LP).

During the tangling process, the referenced chunks are “inlined” into the referencing chunks, starting from a “root” chunk. This produces the content of the “output” source code, which is then compiled by the standard compiler for that particular language. This is what my tree processor does: it finds all the [source] blocks, concatenates the ones with the same title*, and then writes out the expected output files from the marked root chunks. This part works correctly (and the extension actually reproduces itself from the LP “source” that is its README).
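A minimal, language-agnostic sketch of that tangling step (in Python, not the extension's own code) might look like the following. It assumes each block is a `(title, lines)` pair, that a reference is a line containing only `<<Chunk name>>` (optionally indented), and it ignores abbreviated titles; `collect_chunks` and `tangle` are hypothetical names.

```python
import re

# Assumed reference syntax: a line containing only <<Chunk name>>,
# optionally indented. Abbreviated names are not handled here.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def collect_chunks(blocks):
    """Concatenate the bodies of all blocks sharing a title, in document order."""
    chunks = {}
    for title, lines in blocks:
        chunks.setdefault(title, []).extend(lines)
    return chunks


def tangle(chunks, root, stack=()):
    """Expand a chunk by recursively inlining every chunk it references."""
    if root in stack:
        raise ValueError("circular reference: " + " -> ".join(stack + (root,)))
    out = []
    for line in chunks[root]:
        m = REF.match(line)
        if m:
            indent, name = m.groups()
            # Inline the referenced chunk, re-indented to match the reference line.
            out.extend(indent + inner for inner in tangle(chunks, name, stack + (root,)))
        else:
            out.append(line)
    return out


blocks = [
    ("The module structure",
     ["<<Licensing statement>>", "<<Requires>>", "module Demo", "  <<Main class definition>>", "end"]),
    ("Licensing statement", ["# (license text)"]),
    ("Requires", ["require 'set'"]),
    ("Main class definition", ["class Main; end"]),
    ("Requires", ["require 'json'"]),  # second block contributing to "Requires"
]
output = tangle(collect_chunks(blocks), "The module structure")
```

Here `output` is `["# (license text)", "require 'set'", "require 'json'", "module Demo", "  class Main; end", "end"]`: both “Requires” blocks are concatenated before inlining, which is the incremental definition described above.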

*The title can actually be abbreviated after its first occurrence: for example, you can have a chunk titled “Some very long name” and then refer to it as “Some very...”.
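One way to resolve such abbreviations is sketched below, under the assumption (not stated in the post) that an abbreviation is any name ending in `...` and must be a prefix of exactly one full title already seen; `resolve_title` and `canonical_titles` are hypothetical helpers.

```python
def resolve_title(name, known_titles):
    """Expand an abbreviated chunk title such as 'Some very...'.

    Assumption: an abbreviation is any name ending in '...', and it must be
    a prefix of exactly one full title seen earlier in the document.
    """
    if not name.endswith("..."):
        return name
    prefix = name[:-3]
    matches = sorted({t for t in known_titles if t.startswith(prefix)})
    if len(matches) != 1:
        raise KeyError(f"{name!r} matches {len(matches)} known titles")
    return matches[0]


def canonical_titles(block_titles):
    """Replace every abbreviated block title by its full form, in document order."""
    seen = []
    result = []
    for title in block_titles:
        full = resolve_title(title, seen)
        if full not in seen:
            seen.append(full)
        result.append(full)
    return result


titles = canonical_titles(["Some very long name", "Requires", "Some very..."])
```

`titles` becomes `["Some very long name", "Requires", "Some very long name"]`, while an abbreviation used before its full title (or matching several titles) raises `KeyError` in this sketch.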

Now, using AsciiDoc as both the documentation language and the LP source means that technically there is no need for a “weaving” step, because the LP “source” is a valid AsciiDoc document that can be processed as such directly (you can see this by processing my project's README without any extension). However, a standard “processing” of the document lacks some navigation features that would make the output more useful: in particular, navigation between the blocks that contribute to the same chunk, and navigation from one chunk to another, following their usages/references.

I have solved the “navigate between blocks of the same chunk” issue by manipulating the title of each block from within the tree processor: after it finishes collecting information about which blocks contribute to which chunks, it adds to the title of each block links to the next/previous block of the same chunk (if present). Note that this must be done at the tree processor level, because the information about which blocks contribute to which chunk (and in particular whether there will be a “next” block contributing to that same chunk) isn't available until the entire document has been parsed.
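The prev/next bookkeeping can be sketched as follows, assuming blocks are identified by their position in document order and that `block_id` is a hypothetical callback giving each block's anchor id. The `<<id,text>>` form is standard AsciiDoc cross-reference syntax; the exact title format is illustrative only. Note that a block's "next" link is only filled in when a later block with the same title is reached, which is why the whole document must be known first.

```python
def prev_next(block_titles):
    """For each block, return (previous, next) indices of blocks in the same chunk."""
    last_seen = {}
    links = [[None, None] for _ in block_titles]
    for i, title in enumerate(block_titles):
        if title in last_seen:
            j = last_seen[title]
            links[i][0] = j  # this block continues block j...
            links[j][1] = i  # ...so block j gets a "next" link to it
        last_seen[title] = i
    return [tuple(pair) for pair in links]


def decorated_title(title, index, links, block_id):
    """Append AsciiDoc cross-references to the adjacent blocks of the same chunk."""
    prev_i, next_i = links[index]
    parts = [title]
    if prev_i is not None:
        parts.append(f"<<{block_id(prev_i)},prev>>")
    if next_i is not None:
        parts.append(f"<<{block_id(next_i)},next>>")
    return " ".join(parts)


titles = ["The module structure", "Requires", "Main class definition", "Requires"]
links = prev_next(titles)
first = decorated_title("Requires", 1, links, lambda i: f"block-{i}")
```

`first` is `"Requires <<block-3,next>>"`, and the second “Requires” block would get `"Requires <<block-1,prev>>"`.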

What is still missing is the transformation of references to chunks _within_ the blocks, which I would like to turn into links. In the example linked above, I would like to transform the line `+<<Main class...>>+` into links to each block that contributes to the definition of the “Main class definition” chunk. Until the entire document has been parsed, I cannot know how many such links I would need, or what they should point to. So a block processor wouldn't cut it: _at most_ it could know that the shorthand “Main class...” stands for “Main class definition”, but not whether the chunk has been defined, or which blocks contribute to it (they may come later in the document).
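Once the whole document is known, the rewrite itself could be as simple as the sketch below. It assumes a hypothetical `contributors` map from full chunk title to the indices of the contributing blocks, a hypothetical `block_id` callback for anchor ids, and a label format chosen only for illustration. It produces plain AsciiDoc cross-reference markup; whether that markup survives source highlighting is the separate problem described in the next paragraph.

```python
import re

# Assumed reference syntax: a line containing only <<Chunk name>> or <<Prefix...>>.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def expand(name, titles):
    """Resolve 'Prefix...' to the single full title it abbreviates, if any."""
    if name.endswith("..."):
        hits = [t for t in titles if t.startswith(name[:-3])]
        if len(hits) == 1:
            return hits[0]
    return name


def link_references(lines, contributors, block_id):
    """Turn chunk-reference lines into cross-references to every contributing block.

    contributors maps a full chunk title to the indices of the blocks that
    define it; it is complete only once the whole document has been parsed.
    """
    out = []
    for line in lines:
        m = REF.match(line)
        if not m:
            out.append(line)
            continue
        indent, name = m.groups()
        full = expand(name, contributors)
        targets = contributors.get(full, [])
        if not targets:
            out.append(line)  # unknown chunk: leave the reference untouched
            continue
        links = ", ".join(
            f"<<{block_id(i)},{full} ({n})>>" for n, i in enumerate(targets, 1)
        )
        out.append(indent + links)
    return out


rewritten = link_references(
    ["module Demo", "  <<Main class...>>", "end"],
    {"Main class definition": [2, 5]},
    lambda i: f"block-{i}",
)
```

The reference line becomes `  <<block-2,Main class definition (1)>>, <<block-5,Main class definition (2)>>`, one link per contributing block.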

The second issue is more technical: how to tell the source highlighter to ignore specific lines (e.g. treat them as comment lines) so that Asciidoctor can take over and format those lines as links.

I hope it's clearer now, and thanks again for your patience!

GB