Traversing and parsing/inserting of a document at the block element level?

sean.osterberg
Hi guys, I've been investigating Asciidoctor as a new standard for a large documentation set. I'm impressed by the syntax and platform but I still have many outstanding questions. I'm proficient in Java but not in Ruby, so I've looked at AsciidoctorJ in detail (including writing lots of sample code) to determine if needed functionality is currently available. It doesn't appear to be, so I've also been reading through the Ruby API docs (especially the AST) to get a better idea of the low-level API and what's possible.

With or without helper libraries that may not exist yet, is it possible using AsciidoctorJ to traverse an existing document's block elements, parse the elements, and then optionally modify the element? Think Jsoup but for an Asciidoctor file. If this functionality exists in the API I haven't been able to find it.

To use a simple example, this is the Java code I would envision writing to check all links in a block and/or modify them as necessary (maybe not the best code but you get the idea):

Sample pseudocode:
--------------------

// Envisioned API only: hasLinks(), getLinks(), Link and saveChanges() are wished-for calls.
Asciidoctor asciidoctor = Asciidoctor.Factory.create();
Document doc = asciidoctor.load(asciidocFile, options);
List<Block> blocks = doc.getBlocks();
for (Block block : blocks) {
  if (block.hasLinks()) {
    List<Link> links = block.getLinks();
    for (Link link : links) {
      if (!link.isValid()) {
        link.setValue("http://asciidoctor.org");
      }
    }
  }
}
doc.saveChanges();

--------------------

It probably goes without saying, but there are many situations where parsing and manipulating block-level objects is required if your goal is to maintain the integrity of the source doc, rather than merely checking the HTML output using Jsoup or some other library. Other examples: mass updates of attributes, enforcing consistent styles for the AsciiDoc content (Setext vs. ATX header styles), injecting conditional blocks (such as for pre-release documentation), etc.

Is this kind of processing possible, either using the AST or other libraries? What about in Ruby?

Thanks!
Sean

EDIT:

To make it crystal clear what I'm trying to accomplish and to demonstrate some problems I've been having with the Java library, here is some test code that I wrote and have tried on fairly long .ad files:

Code that traverses each block layer:
--------------------

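    // Logs the context of each top-level block, then descends into its children.
    private void printBlockLayersInDocument(Document doc) {
        for (AbstractBlock block : doc.blocks()) {
            writeLineToLog("* " + block.context());
            getBlockLayers(block, 1);
        }
    }

    // Recursively logs the context of each nested block.
    // count is the nesting depth; each level adds two spaces of indentation.
    private void getBlockLayers(AbstractBlock block, int count) {
        for (AbstractBlock b : block.blocks()) {
            for (int i = 0; i < count; i++) {
                writeToLog("  ");
            }
            writeLineToLog("* " + b.context());
            getBlockLayers(b, count + 1);
        }
    }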

---------------------

The output of this code looks like the following. I've added notes with [note:] for certain lines to indicate what kind of element actually exists in the source .ad file:

Output on a test Asciidoc file:
--------------------

* preamble
  * paragraph
* section
  * paragraph
* section
  * paragraph
* section
  * paragraph
  * paragraph
  * paragraph      [note: this is actually a link]
  * paragraph
* section
  * paragraph
  * paragraph      [note: this is actually an image]
  * paragraph      [note: this is actually a link]
  * paragraph
  * paragraph
  * paragraph      [note: this is actually a list item]
  * paragraph      [note: this is actually a list item]
  * paragraph      [note: this is actually a list item]
  * paragraph      [note: this is actually an image]
  * paragraph
  * paragraph
  * paragraph
  * paragraph
  * paragraph
  * paragraph
* section
  * paragraph
* section
  * paragraph
* section
  * paragraph

--------------------

So the question remains: how can I get programmatic access to the underlying image, link, list, etc.?

Re: Traversing and parsing/inserting of a document at the block element level?

mojavelinux
Administrator
Sean,

The short answer is "we're getting there".

Currently, Asciidoctor parses block-level elements during the load phase. A block-level element is roughly equivalent to what it means in the HTML spec (e.g., paragraph, sidebar, code listing). What Asciidoctor's parser doesn't fully support is parsing of inline elements, such as links and formatting. Instead, the inline elements are parsed in a streaming manner during conversion. We are well aware of this shortcoming {1} and have plans to solve it in the near future, hopefully as part of the 1.6.0 release (~ Q2 or Q3 2015).
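
Because inline elements never appear as nodes in the loaded tree, one possible stopgap is to scan the raw text of each block yourself. The following is only a minimal sketch of that idea: `InlineLinkScan` and `findUrls` are hypothetical names, not part of AsciidoctorJ, and the regular expression is a deliberate simplification rather than Asciidoctor's actual inline grammar. It assumes you already have the block's source text as a plain string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AsciidoctorJ: scans the raw text of one block for URLs.
class InlineLinkScan {
    // Simplified assumption: an http(s) URL runs until whitespace or an opening
    // square bracket (the start of the link text in AsciiDoc's url[text] form).
    private static final Pattern URL = Pattern.compile("https?://[^\\s\\[]+");

    // Returns every URL found in the given block text, in order of appearance.
    static List<String> findUrls(String blockSource) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(blockSource);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String paragraph =
            "See http://asciidoctor.org[the project site] and https://example.org/docs[the docs].";
        for (String url : findUrls(paragraph)) {
            System.out.println(url);
        }
    }
}
```

A bare URL followed directly by punctuation would be captured together with that punctuation, so treat this as a starting point for link checking rather than a faithful parser.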

The AST is mutable, so modifications to it take effect immediately (no need to save changes). Your example won't work now because of the inline parsing limitation. However, you can replace the block that contains the link with another block and you'd get the same outcome (just more heavy-handed).
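
To illustrate the "replace the whole block" approach, here is a small self-contained sketch. It uses a toy tree (`ToyBlock`, a made-up class) as a stand-in for the parsed document, so none of the names below are AsciidoctorJ API; it only shows the shape of the technique: walk the tree, find a paragraph whose raw text contains the unwanted URL, and swap that block for a new one in its parent's child list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a parsed document tree; NOT the AsciidoctorJ API.
class ToyBlock {
    final String context;   // e.g. "document", "section" or "paragraph"
    final String source;    // raw text of a leaf block, empty for containers
    final List<ToyBlock> blocks = new ArrayList<>();

    ToyBlock(String context, String source) {
        this.context = context;
        this.source = source;
    }

    // Walks the tree depth-first and swaps every paragraph that mentions oldUrl
    // for a new paragraph with the URL rewritten. Returns the number of swaps.
    static int replaceLinks(ToyBlock parent, String oldUrl, String newUrl) {
        int replaced = 0;
        for (int i = 0; i < parent.blocks.size(); i++) {
            ToyBlock child = parent.blocks.get(i);
            if (child.context.equals("paragraph") && child.source.contains(oldUrl)) {
                // Replace the whole block in place; the tree is mutated directly,
                // so there is no separate "save" step.
                parent.blocks.set(i, new ToyBlock("paragraph", child.source.replace(oldUrl, newUrl)));
                replaced++;
            } else {
                replaced += replaceLinks(child, oldUrl, newUrl);
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        ToyBlock doc = new ToyBlock("document", "");
        ToyBlock section = new ToyBlock("section", "");
        section.blocks.add(new ToyBlock("paragraph", "Visit http://old.example.com[our site]."));
        section.blocks.add(new ToyBlock("paragraph", "No links here."));
        doc.blocks.add(section);

        int count = replaceLinks(doc, "http://old.example.com", "http://asciidoctor.org");
        System.out.println(count + " block(s) replaced");
        System.out.println(doc.blocks.get(0).blocks.get(0).source);
    }
}
```

In a real extension the same loop would run over the library's own block objects, and how a replacement block is created depends on the API version in use.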

Here's an example of an extension (written in Ruby) that does tree manipulation.


In theory, this should be possible in Java as well. Where the API doesn't currently line up, we are seeking to close those gaps. If you see something that doesn't work as expected, simply file an issue in the issue tracker {2}, preferably with a test case. There's definitely work to be done here.

Cheers,

-Dan


In reply to sean.osterberg's post of Thu, Dec 4, 2014 at 3:20 PM.
