Asciidoctor :: Discussion

Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block


Oblomov

Feb 15, 2021; 11:08pm

Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Hello all,

I've started working on an extension for literate programming with Asciidoctor, called asciidoctor-litprog <https://github.com/Oblomov/asciidoctor-litprog>. I've got the basics down (mostly) and the extension is “self-hosting”: the README is an AsciiDoc document that, when processed with the extension, produces the extension (a bootstrap version of the Ruby module is provided for convenience). I'm now trying to tackle what I believe is the most difficult part of my endeavor (and potentially also one of the least essential, but whatever): “enhancing” the presentation of the source blocks.

The source blocks are standard source blocks, but some lines contain references to other chunks of code. These are completely out of place with respect to the syntax highlighter that applies to the block, and I would like these references (and only them) to be “ignored” by the syntax highlighter, and if possible processed by Asciidoctor to generate hyperlinks to the referenced chunk-defining block(s). However, I'm not sure how I should proceed for that.

My understanding is that I would have to enable some kind of substitution; however, I don't want them applied to the whole block, but only to specific lines (and potentially, in the future, to specific _subsets_ of specific lines). The documentation on how to achieve this programmatically is a bit sparse, so any suggestions on how to do this (or a pointer to the relevant documentation) would be much appreciated.

Thanks in advance,

GB

David Jencks

Feb 15, 2021; 11:18pm

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

You should write a block processor extension to process the links. There’s some documentation in this PR: “WIP resolves Issue 3884: extension documentation”.

I’m not sure how to make sure the highlighting will work properly.

David Jencks

Oblomov

Feb 16, 2021; 7:01am

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Hello David,

thanks for the link! I'll give it a good read (and see if I can contribute too).

However, I'm not entirely sure a block processor is what I want. Or rather, I can see why it might be appropriate, but I find myself in a peculiar situation, in that the data necessary to modify the block content may only be available _after_ the tree processor (that I've presently implemented) completes its work, and my understanding is that the block processors run _before_ the tree processors.

For the block title, I'm currently running a function at the end of the `process` step of the tree processor that takes the gathered data and adds prev/next links to the title (simply by appending the corresponding syntax to the title), which seems to work correctly in the limited testing I've given it so far. I was hoping to be able to do something similar with the in-block links, possibly manipulating the block lines and subs/attributes.

Cheers,

GB

David Jencks

Feb 16, 2021; 8:08am

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

I can’t say I really understand what you are trying to do, but….

I wonder if having a block processor and the tree processor in the same extension would be useful. The block processor can come up with a table of interesting locations, which would be accessible to the tree processor because it’s in the same extension. It’s also possible that you can do everything you need in the tree processor just as easily.

As you can see I’m happy to make suggestions based on the slimmest of knowledge… seeing a fairly concrete example of what you are trying to do could be very helpful.

David Jencks

Oblomov

Feb 16, 2021; 9:06am

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Hello David,

thanks for your patience 8-), I'll try to explain better what I'm doing and what I'm trying to achieve, although probably giving a read to my project's README would cover most of it.

In case you're not familiar with it, literate programming (LP) is an approach to coding that puts documentation and natural discourse at the center. An LP source is a file from which one can (usually with separate programs) extract the actual source code for the program (“tangling”), and the source code for the documentation (“weaving”; traditionally the output is TeX, but this is obviously not the case here).

The LP source itself alternates human discourse (the documentation) and code, and the code is written in chunks that refer to each other. As an example from my project, the chunk “The module structure”

https://github.com/Oblomov/asciidoctor-litprog/blob/f0be55/README.adoc#38-the-module

refers to several other chunks (“Licensing statement”, “Requires”, “Main class definition”, etc) that are defined in other [source] blocks in the document. Note that a chunk can be defined by multiple blocks: for example, the “Requires” chunk is defined by two different blocks in the source code (this kind of incremental definition is one of the powerful features of LP).

During the tangling process, the referenced chunks are “inlined” into the referencing chunks, starting from a “root” chunk. This produces the content of the “output” source code (that will then be compiled by the standard compiler for that particular language). This is the process that my tree processor does: it finds all the [source] blocks, concatenates the ones with the same title*, and then writes out the expected output files from the marked root chunks. This part works correctly (and the extension actually reproduces itself from the LP “source” that is its README).

*the title can actually be shortened after the first time it has been seen, so for example you can have a chunk titled “Some very long name” and then refer to it with “Some very...”.
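
A minimal sketch of the tangling step described above, in plain Ruby and independent of Asciidoctor. It is only an illustration and not the code from asciidoctor-litprog: the `Tangler` class, its method names, the assumption that a reference sits alone on its line, and the choice to carry the reference's indentation over to the inlined lines are all hypothetical.

```ruby
# Toy model of tangling: a chunk is a named list of lines, and a line of the
# form <<Chunk name>> is replaced by the lines of the chunk it names.
class Tangler
  REFERENCE = /\A(\s*)<<(.+)>>\s*\z/

  def initialize
    @chunks = {} # full chunk title => lines, in order of definition
  end

  # Register one source block; blocks sharing a title extend the same chunk.
  def add_block(title, text)
    name = resolve(title) || title
    (@chunks[name] ||= []).concat(text.lines(chomp: true))
  end

  # Map a shortened title such as "Some very..." to a full title seen earlier.
  def resolve(title)
    return title if @chunks.key?(title)
    return nil unless title.end_with?('...')

    prefix = title.delete_suffix('...')
    @chunks.keys.find { |known| known.start_with?(prefix) }
  end

  # Inline referenced chunks recursively, starting from the root chunk.
  def tangle(root, indent = '', seen = [])
    name = resolve(root)
    raise ArgumentError, "undefined chunk: #{root}" unless name
    raise ArgumentError, "circular reference: #{name}" if seen.include?(name)

    @chunks[name].flat_map do |line|
      match = REFERENCE.match(line)
      if match
        tangle(match[2], indent + match[1], seen + [name])
      else
        ["#{indent}#{line}"]
      end
    end
  end
end

tangler = Tangler.new
tangler.add_block('The module structure', "<<Requires>>\nmodule Demo\n  <<Main class...>>\nend")
tangler.add_block('Requires', "require 'set'")
tangler.add_block('Main class definition', "class Main\nend")
tangler.add_block('Requires', "require 'json'")
puts tangler.tangle('The module structure')
```

Because references are resolved only when `tangle` runs, a chunk may be referenced before the block that defines it has been added, which mirrors the need to wait until the whole document has been parsed.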

Now, using AsciiDoc as both the documentation source language and the LP source means that technically there is no need for a “weaving” step, because the LP “source” is a valid AsciiDoc document that can be processed as such directly (and you can see that by processing my project's README without any extension). However, a standard “processing” of the document will lack some navigation features that would make the output more useful: in particular, navigation between the blocks that contribute to the same chunk, and navigation between one chunk and the other, following their usage/reference.

I have solved the “navigate between blocks of the same chunk” issue by manipulating the title of each block from within the tree processor: after it finishes collecting information about which blocks contribute to which chunks, it adds in the title of each block some links to the next/prev block for the same chunk (if present). Note that this must be done at the tree processor level, because the information about which blocks contribute to which chunk (and in particular if there will be a “next” block contributing to that same chunk) isn't available until after the entire document has been parsed.
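
The title manipulation described above could look roughly like the following. This is a hedged sketch in plain Ruby rather than the actual tree processor: `SourceBlock` is a stand-in for an Asciidoctor block (only an id and a title), `titles_with_navigation` is a hypothetical helper, and the bracketed `prev`/`next` layout is an assumption. The only AsciiDoc syntax relied on is the `<<id,text>>` cross-reference form.

```ruby
# Stand-in for an Asciidoctor block: only the fields this sketch needs.
SourceBlock = Struct.new(:id, :title)

# Returns a hash mapping each block id to its title, with AsciiDoc xrefs to
# the previous/next block of the same chunk appended where they exist.
def titles_with_navigation(blocks)
  result = {}
  blocks.group_by(&:title).each_value do |group|
    group.each_with_index do |block, index|
      links = []
      links << "<<#{group[index - 1].id},prev>>" if index > 0
      links << "<<#{group[index + 1].id},next>>" if index < group.size - 1
      result[block.id] = links.empty? ? block.title : "#{block.title} [#{links.join(' ')}]"
    end
  end
  result
end

blocks = [
  SourceBlock.new('b1', 'Requires'),
  SourceBlock.new('b2', 'Main class definition'),
  SourceBlock.new('b3', 'Requires')
]
titles_with_navigation(blocks).each { |id, title| puts "#{id}: #{title}" }
```

The grouping needs the complete list of blocks, which is why this kind of step has to run after the whole document has been collected.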

Now the part that is missing is the transformation of references to chunks _within_ the blocks, which I would like to turn into links. In the example linked above, I would like to transform the line `+<<Main class...>>+` into links to each block that contributes to the definition of the “Main class definition” chunk. Until the entire document has been parsed I cannot know how many such links I would need, or what they should link to, so the block processor wouldn't cut it: _at most_ it could know that the shorthand “Main class...” stands for “Main class definition”, but not whether the chunk has been defined, or which blocks contribute to it (they may come at a later time).
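
To make the desired transformation concrete, here is a small plain-Ruby sketch. It assumes a table, built after the whole document has been parsed, that maps each full chunk title to the ids of the blocks defining it. The helper name `link_reference`, the table shape, and the numbered-link output format are all hypothetical; the output uses the AsciiDoc `<<id,text>>` cross-reference form and would still need to be processed by Asciidoctor to become actual hyperlinks.

```ruby
# chunk_blocks maps a full chunk title to the ids of the blocks defining it.
# A line that is a chunk reference becomes the full chunk title followed by
# one numbered xref per defining block; any other line is returned unchanged.
def link_reference(line, chunk_blocks)
  match = /\A(\s*)<<(.+)>>\s*\z/.match(line)
  return line unless match

  indent = match[1]
  reference = match[2]
  name =
    if chunk_blocks.key?(reference)
      reference
    elsif reference.end_with?('...')
      prefix = reference.delete_suffix('...')
      chunk_blocks.keys.find { |title| title.start_with?(prefix) }
    end
  return line unless name # chunk not defined anywhere: leave the line as-is

  links = chunk_blocks[name].each_with_index.map { |id, i| "<<#{id},#{i + 1}>>" }
  "#{indent}#{name} [#{links.join(' ')}]"
end

table = { 'Main class definition' => %w[block-4 block-9] }
puts link_reference('  <<Main class...>>', table)
puts link_reference('puts "plain code"', table)
```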

The second issue is more technical, and it's about how to tell the source highlighter to ignore specific lines (e.g. consider them comment lines) so that Asciidoctor can take over and format the line as a link.

I hope it's clearer now, and thanks again for your patience!

GB

mojavelinux

Feb 16, 2021; 9:29am

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Administrator

> it's about how to tell the source highlighter to ignore specific lines (e.g. consider them comment lines) so that Asciidoctor can take over and format the line as a link.

What you're probably looking for is a custom syntax highlighter adapter, which is also an extension point. It is invoked by the converter when it needs to apply syntax highlighting to a source block. This seems to be the perfect opportunity to do the processing you want to do.

Here's an example: https://github.com/asciidoctor/asciidoctor/blob/master/lib/asciidoctor/syntax_highlighter/rouge.rb

If you want to reuse a built-in adapter, you can extend it by looking it up, and extending the resolved class.

require 'asciidoctor'

class MyRouge < (Asciidoctor::SyntaxHighlighter.for 'rouge')
  register_for 'rouge'

  # override any methods here
end

-Dan

Dan Allen (he, him, his) | @mojavelinux | https://twitter.com/mojavelinux

Oblomov

Feb 16, 2021; 2:57pm

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Hello Dan,

I see, a posteriori it makes sense that hooking directly into the syntax highlighter would be the appropriate thing to do.
If I'm reading the thing correctly, the `highlight` method is where most of the magic happens, and the idea would be to use this to override/extend the source -> lexer -> formatter(s) chain.

I'll study this thing a bit to see which part of the chain would be more useful to hijack / override / extend.
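
One way to picture the "ignore specific lines" part is to split the source into runs of ordinary code and runs of chunk-reference lines, and to hand only the code runs to the highlighter. The sketch below is plain Ruby and does not use the Asciidoctor or Rouge APIs: `split_segments` and `highlight_around_references` are hypothetical helpers, and the block passed in merely stands in for whatever the real highlighter does. An overridden `highlight` could apply the same idea, with the caveats that highlighting fragments separately may confuse a lexer on constructs that span a reference line, and that a real adapter would also have to escape and convert the reference lines itself.

```ruby
REFERENCE_LINE = /\A\s*<<.+>>\s*\z/

# Split source into runs of consecutive lines, each tagged :code or :reference.
def split_segments(source)
  source.lines
        .chunk { |line| REFERENCE_LINE.match?(line) ? :reference : :code }
        .map { |kind, lines| [kind, lines.join] }
end

# Pass only the :code runs to the given block (the stand-in highlighter) and
# leave the :reference runs untouched.
def highlight_around_references(source)
  split_segments(source).map do |kind, text|
    kind == :code ? yield(text) : text
  end.join
end

source = "require 'set'\n<<Main class...>>\nputs 'done'\n"
puts highlight_around_references(source) { |code| code.upcase }
```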

Thanks a lot,

GB

mojavelinux

Feb 18, 2021; 7:55am

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Administrator

> If I'm reading the thing correctly, the `highlight` method is where most of the magic happens, and the idea would be to use this to override/extend the source -> lexer -> formatter(s) chain.

You've got it!

-Dan

Oblomov

Feb 18, 2021; 6:50pm

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

OK, there is ONE annoying thing about this strategy that I've seen so far: even if the only thing I want to do is override one particular aspect, I basically have to copy over the entire highlight method. Let's say that I only want to support rouge: `highlight` does a lot of autoguessing and formatter tuning, and then closes by calling the formatter on the lexer on the source. It would be nice if there were a hook right before that line, so that a plugin or extension could do fine-tuning to the lexer and/or formatter without having to copy the entire logic of the function 8-/

mojavelinux

Feb 18, 2021; 9:47pm

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Administrator

On Thu, Feb 18, 2021 at 11:50 AM Oblomov [via Asciidoctor :: Discussion] <[hidden email]> wrote:

> OK, there is ONE annoying thing about this strategy, that I've seen so far: even if the only thing I want to do is override one particular aspect, I basically have to copy over the entire highlight method.

Yes, that is a limitation.

> Let's say that I only want to support rouge: `highlight` does a lot of autoguessing and formatter tuning, and then closes by calling the formatter on the lexer on the source. It would be nice if there were a hook right before that line, so that a plugin or extension could do fine-tuning to the lexer and/or formatter without having to copy the entire logic of the function 8-/

It's better to think of it in a functional way. The logic to prepare the lexer and formatter should be moved to a dedicated function so it can be replaced or intercepted. I don't think that's something that the adapter interface should enforce, but it can be something that the individual adapters do. If you'd like to propose a code change, I'll definitely consider it.
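
The shape of that refactoring can be sketched in plain Ruby. Everything here is hypothetical and deliberately simplified: `SketchAdapter`, `ReferenceAwareAdapter`, and the `prepare` method are invented names, the "lexer" and "formatter" are toy lambdas, and none of this is the actual Asciidoctor Rouge adapter. The point is only that once preparation lives in its own method, a subclass can intercept it without copying `highlight`.

```ruby
# Hypothetical adapter shape, NOT the real Asciidoctor Rouge adapter.
class SketchAdapter
  # The final step stays tiny: run the formatter on the lexer on the source.
  def highlight(source, lang)
    lexer, formatter = prepare(lang)
    formatter.call(lexer.call(source))
  end

  # Dedicated preparation step that subclasses can replace or intercept.
  # lang is unused by the toy implementation but kept to show the hook's input.
  def prepare(lang)
    lexer = ->(src) { src.split(/(\s+)/) } # toy lexer: words and whitespace
    formatter = ->(tokens) { tokens.join } # toy formatter: emit tokens as-is
    [lexer, formatter]
  end
end

# Intercepts only the preparation step and reuses highlight unchanged.
class ReferenceAwareAdapter < SketchAdapter
  def prepare(lang)
    lexer, formatter = super
    marking = lambda do |tokens|
      formatter.call(tokens.map { |t| t.start_with?('<<') ? "[#{t}]" : t })
    end
    [lexer, marking]
  end
end

puts SketchAdapter.new.highlight('a <<X>> b', 'ruby')
puts ReferenceAwareAdapter.new.highlight('a <<X>> b', 'ruby')
```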

Best Regards,

-Dan
