Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block


Oblomov
Hello all,

I've started working on an extension for literate programming with Asciidoctor, called asciidoctor-litprog <https://github.com/Oblomov/asciidoctor-litprog>. I've got the basics down (mostly) and the extension is “self hosting”: the README is an AsciiDoc document that, when processed with the extension, produces the extension (a bootstrap version of the Ruby module is provided for convenience). I'm now trying to tackle what I believe is the most difficult part of my endeavor (and potentially also one of the least essential, but whatever): “enhancing” the presentation of the source blocks.

The source blocks are standard source blocks, but some lines contain references to other chunks of code. These are completely out of place with respect to the syntax highlighter that applies to the block, and I would like these references (and only them) to be “ignored” by the syntax highlighter and, if possible, processed by Asciidoctor to generate hyperlinks to the referenced chunk-defining block(s). However, I'm not sure how to proceed.

My understanding is that I would have to enable some kind of substitution; however, I don't want it applied to the whole block, but only to specific lines (and potentially, in the future, to specific _subsets_ of specific lines). The documentation on how to achieve this programmatically is a bit sparse, so any suggestions on how to do this (or a pointer to the relevant documentation) would be much appreciated.

Thanks in advance,

GB

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

David Jencks
You should write a block processor extension to process the links. There’s some documentation in this PR: WIP resolves Issue 3884: extension documentation

I’m not sure how to make sure the highlighting will work properly.

David Jencks

On Feb 15, 2021, at 3:08 PM, Oblomov [via Asciidoctor :: Discussion] <[hidden email]> wrote:


Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Oblomov
Hello David,

thanks for the link! I'll give it a good read (and see if I can contribute too).

However, I'm not entirely sure a block processor is what I want. Or rather, I can see why it might be appropriate, but I find myself in a peculiar situation, in that the data necessary to modify the block content may only be available _after_ the tree processor (that I've presently implemented) completes its work, and my understanding is that the block processors run _before_ the tree processors.

For the block title, I'm currently running a function at the end of the `process` step of the tree processor that takes the gathered data and adds prev/next links to the title (simply by appending the corresponding syntax to the title), which seems to work correctly in the limited testing I've given it so far. I was hoping to be able to do something similar with the in-block links, possibly by manipulating the block lines and subs/attributes.

Cheers,

GB

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

David Jencks
I can’t say I really understand what you are trying to do, but….

I wonder if having a block processor and the tree processor in the same extension would be useful. The block processor can come up with a table of interesting locations, which would be accessible to the tree processor because it’s in the same extension. It’s also possible that you can do everything you need in the tree processor just as easily.

As you can see I’m happy to make suggestions based on the slimmest of knowledge… seeing a fairly concrete example of what you are trying to do could be very helpful.

David Jencks

On Feb 15, 2021, at 11:01 PM, Oblomov [via Asciidoctor :: Discussion] <[hidden email]> wrote:


Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Oblomov
Hello David,

thanks for your patience 8-), I'll try to explain better what I'm doing and what I'm trying to achieve, although probably giving a read to my project's README would cover most of it.

In case you're not familiar with it, literate programming (LP) is an approach to coding that puts documentation and natural discourse at the center. An LP source is a file from which one can (usually with separate programs) extract the actual source code for the program (“tangling”), and the source code for the documentation (“weaving”; traditionally the output is TeX, but this is obviously not the case here).

The LP source itself alternates human discourse (the documentation) and code, and the code is written in chunks that refer to each other. As an example from my project, the chunk “The module structure”

https://github.com/Oblomov/asciidoctor-litprog/blob/f0be55/README.adoc#38-the-module

refers to several other chunks (“Licensing statement”, “Requires”, “Main class definition”, etc.) that are defined in other [source] blocks in the document. Note that a chunk can be defined by multiple blocks: for example, the “Requires” chunk is defined by two different blocks in the source code (this kind of incremental definition is one of the powerful features of LP).

During the tangling process, the referenced chunks are “inlined” into the referencing chunks, starting from a “root” chunk. This produces the content of the “output” source code (that will then be compiled by the standard compiler for that particular language). This is what my tree processor does: it finds all the [source] blocks, concatenates the ones with the same title*, and then writes out the expected output files from the marked root chunks. This part works correctly (and the extension actually reproduces itself from the LP “source” that is its README).

*the title can actually be shortened after the first time it has been seen, so for example you can have a chunk titled “Some very long name” and then refer to it with “Some very...”.
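
As a rough illustration of the tangling step described above, here is a minimal plain-Ruby sketch. It is not the code of asciidoctor-litprog: the `CHUNKS` table, the `<<Title>>` reference pattern, and the helper names `resolve_title` and `tangle` are assumptions made for the example. Shortened titles are resolved by simple prefix matching, and there is no protection against circular references.

```ruby
# Hypothetical data: chunk title => list of blocks, each block a list of lines.
# "Requires" is defined incrementally by two blocks.
CHUNKS = {
  'Main' => [['<<Requires>>', 'def main', '  <<Body...>>', 'end']],
  'Requires' => [["require 'set'"], ["require 'json'"]],
  'Body of main' => [['puts Set.new([1, 2]).size']]
}.freeze

# Assumed reference syntax: a line holding only <<Chunk title>>, possibly indented.
REF = /\A(\s*)<<(.+)>>\s*\z/

# Map a shortened title such as "Body..." to the first full title with that prefix.
def resolve_title(chunks, title)
  return title unless title.end_with?('...')
  prefix = title.chomp('...')
  chunks.keys.find { |full| full.start_with?(prefix) } ||
    raise(KeyError, "unknown chunk: #{title}")
end

# Concatenate all blocks of a chunk and inline referenced chunks recursively,
# keeping the indentation of the referencing line.
def tangle(chunks, title)
  blocks = chunks.fetch(resolve_title(chunks, title))
  blocks.flatten.flat_map do |line|
    if (m = REF.match(line))
      tangle(chunks, m[2]).map { |inlined| m[1] + inlined }
    else
      [line]
    end
  end
end

puts tangle(CHUNKS, 'Main')
```

Starting from the root chunk “Main”, the two “Requires” blocks are concatenated and inlined first, then the body is inlined with the indentation of its reference line.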

Now, using AsciiDoc as both the documentation source language and the LP source means that technically there is no need for a “weaving” step, because the LP “source” is a valid AsciiDoc document that can be processed as such directly (and you can see that by processing my project's README without any extension). However, a standard “processing” of the document will lack some navigation features that would make the output more useful: in particular, navigation between the blocks that contribute to the same chunk, and navigation between one chunk and the other, following their usage/reference.

I have solved the “navigate between blocks of the same chunk” issue by manipulating the title of each block from within the tree processor: after it finishes collecting information about which blocks contribute to which chunks, it adds to the title of each block some links to the next/prev block for the same chunk (if present). Note that this must be done at the tree processor level, because the information about which blocks contribute to which chunk (and in particular whether there will be a “next” block contributing to that same chunk) isn't available until after the entire document has been parsed.
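
The title manipulation can be pictured with the following plain-Ruby sketch. It only models the bookkeeping: `Block` is a hypothetical stand-in for a source block with an id and a title, `add_navigation` is an invented name, and the appended `xref:` macros are just one possible form of “the corresponding syntax”.

```ruby
# Hypothetical stand-in for a source block: an anchor id and a mutable title.
Block = Struct.new(:id, :title)

# Once every block is known, append prev/next xref macros to each title.
# blocks_by_chunk maps a chunk title to its blocks in document order.
def add_navigation(blocks_by_chunk)
  blocks_by_chunk.each_value do |blocks|
    blocks.each_with_index do |block, i|
      links = []
      links << "xref:#{blocks[i - 1].id}[prev]" if i > 0
      links << "xref:#{blocks[i + 1].id}[next]" if i + 1 < blocks.size
      block.title += " (#{links.join(', ')})" unless links.empty?
    end
  end
end

blocks = {
  'Requires' => [Block.new('req-1', 'Requires'), Block.new('req-2', 'Requires')],
  'Main' => [Block.new('main-1', 'Main')]
}
add_navigation(blocks)
blocks.each_value { |list| list.each { |b| puts b.title } }
```

A chunk defined by a single block keeps its title unchanged, which is why the links can only be computed after the whole document has been seen.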

Now the part that is missing is the transformation of references to chunks _within_ the blocks, which I would like to turn into links. In the example linked above, I would like to transform the line `+<<Main class...>>+` into links to each block that contributes to the definition of the “Main class definition” chunk. Until the entire document has been parsed, I cannot know how many such links I would need or what they should link to, so the block processor wouldn't cut it: _at most_ it could know that the shorthand “Main class...” stands for “Main class definition”, but not whether the chunk has been defined or which blocks contribute to it (they may come later).

The second issue is more technical, and it's about how to tell the source highlighter to ignore specific lines (e.g. consider them comment lines) so that Asciidoctor can take over and format the line as a link.
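
One conceivable way to keep reference lines away from the highlighter is to split the block source into runs of ordinary code and single reference lines, highlight only the code runs, and emit the links directly. The sketch below is plain Ruby and makes several assumptions: the `<<Title>>` reference pattern, the `block_ids` table (chunk title to anchor ids), the helper names `links_for` and `render`, and a `highlighter` callable that merely stands in for a real lexer/formatter pair.

```ruby
require 'cgi'

# Assumed reference syntax: a line holding only <<Chunk title>>, possibly indented.
REFERENCE = /\A(\s*)<<(.+)>>\s*\z/

# Hypothetical helper: one HTML link per block that defines the chunk.
def links_for(title, block_ids)
  ids = block_ids.fetch(title, [])
  # Undefined chunk: show the reference as escaped plain text.
  return CGI.escapeHTML("<<#{title}>>") if ids.empty?
  ids.each_with_index.map do |id, i|
    %(<a href="##{id}">#{CGI.escapeHTML(title)} [#{i + 1}]</a>)
  end.join(' ')
end

# Highlight runs of code lines; replace each reference line with links.
def render(source, block_ids, highlighter)
  runs = source.lines(chomp: true).chunk_while do |a, b|
    !REFERENCE.match?(a) && !REFERENCE.match?(b)
  end
  runs.map do |run|
    if (m = REFERENCE.match(run.first))
      m[1] + links_for(m[2], block_ids)
    else
      highlighter.call(run.join("\n"))
    end
  end.join("\n")
end

upcase = ->(code) { code.upcase } # stand-in for real highlighting
puts render("a = 1\n<<Requires>>\nb = 2", { 'Requires' => ['req-1'] }, upcase)
```

Highlighting each run separately means the lexer loses its state across a reference line (for instance inside a multi-line string or comment), so this approach is only a starting point.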

I hope it's clearer now, and thanks again for your patience!

GB

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

mojavelinux
Administrator
> it's about how to tell the source highlighter to ignore specific lines (e.g. consider them comment lines) so that Asciidoctor can take over and format the line as a link.

What you're probably looking for is a custom syntax highlighter adapter, which is also an extension point. It is invoked by the converter when it needs to apply syntax highlighting to a source block. This seems to be the perfect opportunity to do the processing you want to do.


If you want to reuse a built-in adapter, you can extend it by looking it up, and extending the resolved class.

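require 'asciidoctor'

# look up the built-in Rouge adapter by name and subclass it
class MyRouge < (Asciidoctor::SyntaxHighlighter.for 'rouge')
  # take over the name 'rouge' so this class is used in place of the built-in adapter
  register_for 'rouge'

  # override any methods here
end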

-Dan

--
Dan Allen (he, him, his) | @mojavelinux | https://twitter.com/mojavelinux

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Oblomov
Hello Dan,

I see, a posteriori it makes sense that hooking directly into the syntax highlighter would be the appropriate thing to do.
If I'm reading the thing correctly, the `highlight` method is where most of the magic happens, and the idea would be to use this to override/extend the source -> lexer -> formatter(s) chain.

I'll study this thing a bit to see which part of the chain would be more useful to hijack / override / extend.

Thanks a lot,

GB

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

mojavelinux
Administrator
> If I'm reading the thing correctly, the `highlight` method is where most of the magic happens, and the idea would be to use this to override/extend the source -> lexer -> formatter(s) chain.

You've got it!

-Dan

On Tue, Feb 16, 2021 at 7:57 AM Oblomov [via Asciidoctor :: Discussion] <[hidden email]> wrote:

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

Oblomov
OK, there is ONE annoying thing about this strategy that I've seen so far: even if the only thing I want to do is override one particular aspect, I basically have to copy over the entire highlight method. Let's say that I only want to support rouge: `highlight` does a lot of autoguessing and formatter tuning, and then closes by calling the formatter on the lexer on the source. It would be nice if there were a hook right before that line, so that a plugin or extension could fine-tune the lexer and/or formatter without having to copy the entire logic of the function 8-/

Re: Interpret links (or other AsciiDoc syntax) only in specific lines of a [source] block

mojavelinux
Administrator
On Thu, Feb 18, 2021 at 11:50 AM Oblomov [via Asciidoctor :: Discussion] <[hidden email]> wrote:
> OK, there is ONE annoying thing about this strategy, that I've seen so far: even if the only thing I want to do is override one particular aspect, I basically have to copy over the entire highlight method.

Yes, that is a limitation.
 
> Let's say that I only want to support rouge: `highlight` does a lot of autoguessing and formatter tuning, and then closes by calling the formatter on the lexer on the source. It would be nice if there were a hook right before that line, so that a plugin or extension could do fine-tuning to the lexer and/or formatter without having to copy the entire logic of the function 8-/

It's better to think of it in a functional way. The logic to prepare the lexer and formatter should be moved to a dedicated function so it can be replaced or intercepted. I don't think that's something that the adapter interface should enforce, but it can be something that the individual adapters do. If you'd like to propose a code change, I'll definitely consider it.
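
The refactoring idea can be sketched in plain Ruby as follows. This is not Asciidoctor's adapter code: `SketchAdapter`, `LitprogAdapter`, and the `prepare` method are hypothetical names, and the lexer and formatter are trivial lambdas standing in for the real objects. The point is only that once preparation lives in its own method, a subclass can intercept it without copying `highlight`.

```ruby
# Hypothetical stand-in for an adapter; not the real Rouge adapter.
class SketchAdapter
  # The final step: run the formatter on the lexer's output.
  def highlight(source, lang)
    lexer, formatter = prepare(lang)
    formatter.call(lexer.call(source))
  end

  # The dedicated, overridable step that builds the lexer and formatter.
  def prepare(lang)
    lexer = ->(src) { src.split("\n") }
    formatter = ->(lines) { lines.map { |l| "[#{lang}] #{l}" }.join("\n") }
    [lexer, formatter]
  end
end

# A subclass intercepts preparation only and leaves highlight untouched.
class LitprogAdapter < SketchAdapter
  REFERENCE = /\A\s*<<.+>>\s*\z/ # assumed chunk reference syntax

  def prepare(lang)
    lexer, formatter = super
    # Wrap the formatter so that reference lines pass through unformatted.
    passthrough = lambda do |lines|
      lines.map { |l| REFERENCE.match?(l) ? l : formatter.call([l]) }.join("\n")
    end
    [lexer, passthrough]
  end
end

puts LitprogAdapter.new.highlight("a = 1\n<<Requires>>\nb = 2", 'ruby')
```

Here the reference line is left alone while the surrounding lines still go through the base formatter.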

Best Regards,

-Dan
 