Asciidoctor :: Discussion

How to extract the raw content of a section

elmicka

Mar 07, 2020; 11:02pm

How to extract the raw content of a section

Hi,

First of all, I'm new to Asciidoctor and Ruby, so maybe I'm missing something easy here.

I'm trying to write a script to extract the content from a document.
I came up with this script, but "puts level2_block.content" prints HTML-formatted content
instead of the AsciiDoc content.

Is it possible to get the AsciiDoc content?

Thanks

require 'asciidoctor'
include Asciidoctor

document = Asciidoctor.load_file("demo.adoc")

document.blocks.each do |level1_block|
  puts level1_block.title
  level1_block.blocks.each do |level2_block|
    page_name = level2_block.title.tr(" ", "-")
    puts "===============================================>"
    puts "\t #{page_name}"
    puts level2_block.content
  end
end

My test file.

= Title 1

== Subtitle 1

=== My section 1.1
My content goes here.

== Subtitle 2

=== My section 2.1
|===
|header1|header2
|cell1|cell2
|===

David Jencks

Mar 07, 2020; 11:45pm

Re: How to extract the raw content of a section

Looking at the JavaScript translation, I think that .content calls .convert.

You might try .text or .lines (which will be an array of lines).

David Jencks

elmicka

Mar 08, 2020; 12:27am

Re: How to extract the raw content of a section

David,

Thank you for your answer.

I tried 'level2_block.text' and 'level2_block.lines',
but neither method seems to exist on this object (undefined method error).

mojavelinux

Mar 08, 2020; 12:49am

Re: How to extract the raw content of a section

Administrator

The API does not make the raw source of a section available. The source is only stored at the block level.

What I've done in the past to extract the source of a section is to leverage the sourcemap. By enabling the sourcemap option, you get the file and line number for each section. Then, you take that information, go back to the original source of the document, and use it to cut out the source for the section.

Here's some really rough code to show you what I'm talking about:

require 'asciidoctor'

source = <<~'EOS'
= Document Title

== First Section

content

== Second Section

content
EOS

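# Enable the sourcemap so that each section records its file and line number
doc = Asciidoctor.load source, sourcemap: true

# Slice from the first section's title line up to the line just before the second section's title
section_source = doc.source_lines[(doc.sections[0].lineno - 1)..(doc.sections[1].lineno - 2)].join("\n")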
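
The slicing step in the snippet above generalizes to any number of sections. What follows is only a minimal sketch of that idea in plain Ruby, not something taken from the Asciidoctor API: `slice_sections` is a hypothetical helper name, and the line numbers are hard-coded here to stand in for what `lineno` would report when the sourcemap is enabled. It assumes the line numbers are 1-based, in ascending order, and all refer to the same file.

```ruby
# Hypothetical helper (not part of Asciidoctor): cut the raw source into one
# chunk per section, given the 1-based line number of each section title.
def slice_sections(source_lines, starts)
  starts.each_with_index.map do |start, i|
    # A chunk ends just before the next section title, or at the end of the
    # document for the last section.
    stop = i + 1 < starts.size ? starts[i + 1] - 1 : source_lines.size
    source_lines[(start - 1)...stop].join("\n")
  end
end

# Stand-in for doc.source_lines from the sample document above.
source_lines = [
  '= Document Title',
  '',
  '== First Section',
  '',
  'content',
  '',
  '== Second Section',
  '',
  'content'
]

# Stand-in for the section line numbers reported by the sourcemap.
chunks = slice_sections(source_lines, [3, 7])
puts chunks[0]
```

To feed it from a loaded document, something along the lines of `slice_sections(doc.source_lines, doc.find_by(context: :section).map(&:lineno))` should work, assuming the document was loaded with `sourcemap: true` and lives in a single file with no includes. Because each chunk runs up to the next section title at any level, a parent section's chunk stops where its first subsection begins.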

Best Regards,

-Dan

Dan Allen | @mojavelinux | https://twitter.com/mojavelinux

elmicka

Mar 08, 2020; 9:38am

Re: How to extract the raw content of a section

Dan,

Thank you so much, I managed to make it work with your snippet.

Regards