https://discuss.asciidoctor.org/extension-preprocessor-tp886p891.html
Welcome Peter! We're glad to have you!
I'm thrilled to hear that you're working on an integration between Asciidoctor and asciidoc-bib. I've been looking forward to these two projects coming together. I was hopeful the extensions API would open that door, and it seems it has.
The preprocessor is the trickiest extension point to understand at first because it has the ability to affect the reader's cursor. Nothing some documentation can't solve :) So let's get to it!
You have two options when reading the lines in a preprocessor. You can either work with the raw lines of the main source document, or you can instruct the reader to walk the lines.
The second argument to the process method is the raw lines (original source) of the main document. Working with these lines is the most benign since it doesn't affect the reader's cursor. However, as you observed, preprocessor directives in those lines are not processed.
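To illustrate the read-only approach, here's a minimal sketch in plain Ruby that scans raw lines without touching any reader. The `collect_cite_keys` helper and the `cite:[key]` syntax are assumptions made for illustration, not part of Asciidoctor's API. Note that the include directive in the sample stays exactly as written, since nothing evaluates it.

```ruby
# Hypothetical helper (not part of Asciidoctor): collect citation keys from raw lines.
# The cite:[key] form is an assumed syntax, used here only for illustration.
CITE_RX = /cite:\[([^\]]+)\]/

def collect_cite_keys(raw_lines)
  # scan returns one single-element array per match; flatten and de-duplicate them
  raw_lines.flat_map { |line| line.scan(CITE_RX).flatten }.uniq
end

raw_lines = [
  "See cite:[smith2010] and cite:[jones2012].\n",
  "include::chapter2.adoc[]\n", # left as-is: directives in raw lines are not evaluated
  "Again cite:[smith2010].\n"
]

keys = collect_cite_keys(raw_lines)
puts keys.inspect
```

Inside a preprocessor, a helper like this could be called with the second argument of the process method, leaving the reader's cursor where it was.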
When you read the lines from the reader, any preprocessor directives, such as the include directive, get processed. Reading also advances the cursor, effectively discarding those lines. As such, you need to restore the lines by pushing them back onto the reader. You can do so using the Reader#push_include method.
If the process method returns nil, Asciidoctor will continue processing using the reader in its current state. If you return a reader, it will replace the reader Asciidoctor uses.
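To make the cursor behavior and the return contract concrete, here's a toy model in plain Ruby. `ToyReader`, its `restore` method, and `run_preprocessor` are hypothetical stand-ins invented for this sketch. They are not Asciidoctor's Reader or its internals, just a simplified picture of the rules described above.

```ruby
# ToyReader is a hypothetical stand-in for illustration; it is not Asciidoctor's Reader.
class ToyReader
  def initialize(lines)
    @lines = lines.dup
  end

  def has_more_lines?
    !@lines.empty?
  end

  # Reading a line advances the cursor: the line is gone afterwards.
  def read_line
    @lines.shift
  end

  # Put lines back in front of whatever remains.
  def restore(lines)
    @lines.unshift(*lines)
  end

  def remaining
    @lines.dup
  end
end

# Simplified model of the return contract: nil keeps the original reader,
# any other return value replaces it.
def run_preprocessor(reader)
  result = yield reader
  result.nil? ? reader : result
end

source = ["= Title\n", "\n", "Body\n"]

# 1. Consume without restoring: nothing is left for the parser.
drained = run_preprocessor(ToyReader.new(source)) do |reader|
  reader.read_line while reader.has_more_lines?
  nil
end

# 2. Consume, push the lines back, and return nil.
restored = run_preprocessor(ToyReader.new(source)) do |reader|
  lines = []
  lines << reader.read_line while reader.has_more_lines?
  reader.restore lines
  nil
end

# 3. Consume and return a replacement reader holding modified lines.
replaced = run_preprocessor(ToyReader.new(source)) do |reader|
  lines = []
  lines << reader.read_line while reader.has_more_lines?
  ToyReader.new lines.map(&:upcase)
end

puts drained.remaining.length
puts restored.remaining == source
puts replaced.remaining.first
```

The first case is the trap to avoid: if you read everything and return nil without pushing the lines back, the parser is handed an empty reader.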
Here's an example that shows the difference between the raw lines and lines read from the reader.
[source,ruby]
--
class SamplePreprocessor < Asciidoctor::Extensions::Preprocessor
  def process reader, raw_lines
    puts '.Raw lines'
    puts '....'
    puts raw_lines.join # <1>
    puts '....'
    puts "\n"
    lines = []
    while reader.has_more_lines?
      lines << reader.read_line # <2>
    end
    puts '.Preprocessed lines'
    puts '....'
    puts lines.join # <3>
    puts '....'
    use_push_include = true
    if use_push_include
      reader.push_include lines, '<stdin>', '<stdin>' # <4>
      nil # <5>
    else
      Asciidoctor::Reader.new lines # <6>
    end
  end
end
--
<1> Prints the original source of the main document.
<2> Consumes each line from the reader. Each line returned is the next line after any preprocessor directives on the original line have been evaluated.
<3> Prints the effective lines once all preprocessor directives have been evaluated.
<4> Pushes the lines back onto the reader so that they can be interpreted by the AsciiDoc processor.
The second and third arguments (file and path, respectively) are required at the moment to work around a bug.
<5> A nil return instructs Asciidoctor to continue with the original Reader.
<6> Replaces the original Reader with a new one that contains the preprocessed lines.
The only major caveat right now with reading the lines from the Reader is that it messes up the line information for reporting warnings. When you push the preprocessed lines back onto the Reader, Asciidoctor thinks they were all in the original file. I'm working on a way to retain this information in the next version.
If you are only reading information from the processed lines, and not modifying them, it's possible to push the raw lines back onto the reader, thus restoring its original state.
[source,ruby]
--
def process reader, raw_lines
  lines = reader.read # <1>
  cursor = reader.cursor
  reader.push_include raw_lines, (cursor.file || '<stdin>'), (cursor.path || '<stdin>'), 1 # <2>
  reader.process_lines = true if cursor.file.nil? # <3>
  nil
end
--
<1> Preprocess all lines and return them as an Array.
<2> Restore the original lines onto the Reader.
<3> Work around an assumption that lines should not be processed if the file name is nil.
I hope that clears some things up. Obviously, the Preprocessor extension needs some polishing, but that's what this discussion is helping to identify.
Cheers,
-Dan