Asciidoctor :: Discussion

Asciidoc syntax definition

aalmiray

Jul 24, 2014; 8:04am

Asciidoc syntax definition

31 posts

Hello fellow doc writers,

Is anyone aware of an ANTLR or JavaCC definition of the AsciiDoc syntax?

Cheers,
Andres

asotobu

Jul 24, 2014; 5:06pm

Re: Asciidoc syntax definition

298 posts

There is a first implementation, but I have never been able to run it. In fact, it does not use ANTLR or JavaCC but Parboiled:

https://github.com/asciidocj/asciidocj

I didn't spend much time on it, but if you want we can try it again.

What do you have in mind? hehehehe

mojavelinux

Sep 13, 2014; 6:53am

Re: Asciidoc syntax definition

Administrator

2681 posts

Andres,

There isn't an official one (in fact, AsciiDoc Python never defined one). As Alex points out, there is an attempt to start one, but it's very, very, very early on.

I think that a core goal of an AsciiDoc (or UniDoc) standard will be to define a grammar. I don't think it will be as hard as it seems, if we agree that some sacrifices/changes will have to be made. In fact, it's already very possible to define a grammar at the block level. It's the inline stuff that's going to be a bit trickier.
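
To illustrate why the block level is the easier half, here is a minimal sketch in Python. It is not taken from Asciidoctor; the three block types and the function name `parse_blocks` are assumptions chosen for illustration, and the real language has many more block forms. The point is only that blocks can be recognized line by line from their first line or delimiter.

```python
import re

# Toy line-oriented block grammar (an assumed subset, not the full language):
#   document  <- (blank / section / listing / paragraph)*
#   section   <- '='+ ' ' title
#   listing   <- '----' line* '----'
#   paragraph <- (non-blank line)+
SECTION = re.compile(r"^(={1,6}) (\S.*)$")


def parse_blocks(source):
    """Split source text into a flat list of block tuples."""
    lines = source.splitlines()
    blocks, i = [], 0
    while i < len(lines):
        line = lines[i]
        heading = SECTION.match(line)
        if not line.strip():
            i += 1  # blank lines only separate blocks
        elif heading:
            level = len(heading.group(1)) - 1  # '=' is level 0, '==' is level 1
            blocks.append(("section", level, heading.group(2)))
            i += 1
        elif line == "----" and "----" in lines[i + 1:]:
            end = lines.index("----", i + 1)  # position of the closing delimiter
            blocks.append(("listing", lines[i + 1:end]))
            i = end + 1
        else:
            start = i
            while i < len(lines) and lines[i].strip():
                i += 1  # a paragraph runs until the next blank line
            blocks.append(("paragraph", lines[start:i]))
    return blocks
```

Calling `parse_blocks("== Intro\n\nSome text\n")` yields a level-1 section followed by a one-line paragraph. An unclosed `----` falls through to the paragraph rule, which is one of the choices a real grammar would have to pin down.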

I have plans to start working on a Treetop-based grammar because we need it to fix some of the inline parsing inconsistencies we have today. Treetop is a PEG-based parser generator for Ruby. I think PEG is the right way to go because it keeps the grammar simple and makes implementations easy to write. However, I would also welcome an attempt to define the grammar in ANTLR.
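
For readers unfamiliar with PEG, the following is a rough sketch of the style in Python rather than Treetop. The toy rules in the comment, the node shapes, and all function names are assumptions for illustration and do not reflect the actual AsciiDoc inline rules. It shows the two PEG ingredients that matter here: ordered choice (the first alternative that matches wins) and backtracking when a closing mark is missing.

```python
# Toy PEG (assumed, for illustration only):
#   inline <- (bold / italic / char)*
#   bold   <- '*' (!'*' (italic / char))+ '*'
#   italic <- '_' (!'_' (bold / char))+ '_'
#   char   <- .


def _merge(nodes):
    """Join adjacent text nodes so the tree is easier to read."""
    merged = []
    for node in nodes:
        if node[0] == "text" and merged and merged[-1][0] == "text":
            merged[-1] = ("text", merged[-1][1] + node[1])
        else:
            merged.append(node)
    return merged


def _choice(text, pos, rules):
    """Ordered choice: return the result of the first rule that matches."""
    for rule in rules:
        result = rule(text, pos)
        if result is not None:
            return result
    return None


def _char(text, pos):
    """char <- .  (succeeds whenever input remains)"""
    if pos < len(text):
        return ("text", text[pos]), pos + 1
    return None


def _span(kind, mark, inner):
    """Build the rule: mark (!mark inner)+ mark"""
    def rule(text, pos):
        if not text.startswith(mark, pos):
            return None
        children, cur = [], pos + 1
        while cur < len(text) and not text.startswith(mark, cur):
            result = _choice(text, cur, inner())
            if result is None:
                return None
            node, cur = result
            children.append(node)
        if children and text.startswith(mark, cur):
            return (kind, _merge(children)), cur + 1
        return None  # no closing mark: fail so the caller can try another rule
    return rule


# The lambdas defer the lookup so the two rules can refer to each other.
_bold = _span("bold", "*", lambda: (_italic, _char))
_italic = _span("italic", "_", lambda: (_bold, _char))


def parse_inline(text):
    """inline <- (bold / italic / char)*"""
    nodes, pos = [], 0
    while pos < len(text):
        node, pos = _choice(text, pos, (_bold, _italic, _char))
        nodes.append(node)
    return _merge(nodes)
```

With these toy rules, `parse_inline("2 * 3")` returns a single text node because the asterisk never finds a closing partner, while `parse_inline("a *b* c")` produces a bold node between two text nodes.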

-Dan

Dan Allen | http://google.com/profiles/dan.j.allen

sdaschner

Mar 13, 2015; 1:40pm

Re: Asciidoc syntax definition

7 posts

Hi there,

Any updates on this?
I've been working with compilers & parsers in my studies and would like to help define a grammar.

Cheers,
Sebastian

asotobu

Mar 13, 2015; 1:57pm

Re: Asciidoc syntax definition

298 posts

Hi, no, we have not worked on this. In fact, we don't even have an EBNF description of the AsciiDoc format, so maybe that would be a good start.

sdaschner

Mar 13, 2015; 2:30pm

Re: Asciidoc syntax definition

7 posts

Sounds good.
Maybe a Gist or GitHub project for that would be a good idea?

asotobu

Mar 13, 2015; 2:34pm

Re: Asciidoc syntax definition

298 posts

GitHub would be the best way.

mojavelinux

Mar 30, 2015; 12:28pm

Re: Asciidoc syntax definition

Administrator

2681 posts

Several people have approached me recently about working on a formal / standard grammar and parser for AsciiDoc. I'm very glad to discover there's so much interest! I think the time is right to kick off the effort.

Initially, the work will be exploratory. I think we should start by focusing on the inline syntax (bold, italic, monospace, etc). I hope we can identify which parts of the syntax already work well in a formal grammar and which parts present problems (perhaps because they are too context-sensitive). We'll also want to clean up the terminology and iron out places where it's a bit quirky (for instance, "formatted text" instead of "quoted text").
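
As a small illustration of the context sensitivity in question, consider a simplified form of the constrained formatting rule, in which an asterisk only marks bold when it sits at a word boundary. The regular expression below is an approximation written for this example, not the expression Asciidoctor uses, and `bold_spans` is a hypothetical helper name.

```python
import re

# Simplified rule (an assumption for illustration): '*' opens bold only when
# it is not preceded by a word character and is followed by non-space text,
# and closes bold only when it is not followed by a word character.
CONSTRAINED_BOLD = re.compile(r"(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)")


def bold_spans(text):
    """Return the text of every span this simplified rule treats as bold."""
    return CONSTRAINED_BOLD.findall(text)
```

Under this rule `bold_spans("a *bold* word")` finds one span, while `bold_spans("2*3*4")` and `bold_spans("5 * 3 * 2")` find none. The same character is markup or plain text depending on its neighbours, which is exactly what makes a context-free description awkward.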

I see two key goals for this effort:

* to enable multiple implementations of the parser (perhaps even a reference implementation in Java)
* to make the syntax more consistent and predictable

I've been using the following wiki page to capture some information related to the effort:

https://github.com/asciidoctor/asciidoctor/wiki/AsciiDoc-Specification-(aka-UniDoc)-Planning

Here's a list of people who have expressed interest in contributing:

* Martin van Rappard
* Sebastian Daschner
* Erik Pragt
* Jakub Jirutka
* Vincent Massol

Feel free to add your name to this list (ideally on the wiki page).

I'd be interested in connecting with compiler groups at universities to see if any research projects can be formed around this effort. There are a lot of problems to explore with parsing lightweight markup languages (perhaps also aspects of natural language processing) that could make for interesting research and also help move AsciiDoc forward and evolve. If you know of anyone interested, please have them reach out (ideally through this list).

Here are the action items to get started:

Step 1. Choose a name for the GitHub repository.

Here are two possible names for the repo.

* asciidoc-grammar
* asciidoc-syntax

Should we use one of these or something else?

Step 2. Create an initial project structure.

Are we going to do it based on ANTLR 4 or some other parser? We'll need to decide that in order to create the initial project.

Who would like to do step 2?

Step 3. Hack!

I'm really excited about this initiative! Let's make it happen.

Cheers,

-Dan

sdaschner

Mar 30, 2015; 2:45pm

Re: Asciidoc syntax definition

7 posts

Sounds great, I'm really excited about this!

IMO asciidoc-syntax is a good name for the project (sounds more general to me than -grammar). Other opinions?

I would use GitHub issues in the project for syntax suggestions / improvements, etc. Do you agree?

Sam

Mar 30, 2015; 2:54pm

Re: Asciidoc syntax definition

3 posts

In reply to this post by mojavelinux

I'd expect

asciidoc-grammar:: to be the formal definition

asciidoc-syntax:: to be the human readable version

The Mighty Brown Hat
http://www.festivalhat.com/

mojavelinux

Mar 30, 2015; 2:56pm

Re: Asciidoc syntax definition

Administrator

2681 posts

In reply to this post by sdaschner

On Mon, Mar 30, 2015 at 4:45 PM, sdaschner wrote:

> I would use GitHub issues in the project for syntax suggestions / improvements, etc. Do you agree?

I completely agree. Once we have the project started on GitHub, we'll be able to feed all ideas through issues, pull requests and commits. If necessary, we can use the wiki, though I would prefer to just update the README for the project.

Should we initiate the project to use ANTLR4 or some other parser?

-Dan

mojavelinux

Mar 30, 2015; 2:59pm

Re: Asciidoc syntax definition

Administrator

2681 posts

In reply to this post by Sam

On Mon, Mar 30, 2015 at 4:54 PM, Sam wrote:

> asciidoc-grammar:: to be the formal definition
> asciidoc-syntax:: to be the human readable version

We could also make these subfolders in the project. In that case, perhaps:

asciidoc-syntax-definition/
    grammar/
    syntax/ (or even spec/)

We don't have to get it perfect right now. What's most important is that we have files to start hacking on. It's easy to rearrange.

-Dan

Sam

Mar 30, 2015; 3:46pm

Re: Asciidoc syntax definition

3 posts

Sure - I think the only confusing thing for me personally would be if you called it syntax and it contained only grammar (as per my definitions above).

mojavelinux

Mar 30, 2015; 3:58pm

Re: Asciidoc syntax definition

Administrator

2681 posts

In that case, for now, we should probably call it asciidoctor-grammar and then update it once we have the syntax / spec too. Changing a repository name doesn't break links, so it would support this evolution.

Thanks for the input!

-Dan

jirutka

Apr 01, 2015; 5:23pm

Re: Asciidoc syntax definition

3 posts

I think that PEG or parser combinators would be more suitable for the AsciiDoc syntax than a classic parser generator like ANTLR.

mojavelinux

Apr 01, 2015; 5:29pm

Re: Asciidoc syntax definition

Administrator

2681 posts

I was kind of thinking that too. In fact, even with an ANTLR grammar, we probably need to be able to parse it in a PEG style too.

What ANTLR gives us is a clear declaration of the grammar, though. PEG tends to mix the grammar and parser together, which makes it harder to see the choices. So perhaps we need to do both?
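
One possible middle ground, sketched here in Python under stated assumptions, is to keep PEG semantics but write the grammar as plain data that a small generic engine interprets. The rule names, the tuple encoding, and the functions `match` and `recognizes` are all invented for this example. It only recognizes input (it builds no tree), but it shows the declaration living apart from the parsing code.

```python
# The grammar is pure data. Expression forms (an assumed encoding):
#   ("lit", s)  ("any",)  ("ref", name)  ("seq", e1, e2, ...)
#   ("alt", e1, e2, ...)  ("star", e)    ("not", e)
GRAMMAR = {
    "inline": ("star", ("alt", ("ref", "bold"), ("any",))),
    "bold": ("seq", ("lit", "*"), ("ref", "body"), ("lit", "*")),
    "body": ("seq", ("ref", "plain"), ("star", ("ref", "plain"))),
    "plain": ("seq", ("not", ("lit", "*")), ("any",)),
}


def match(grammar, expr, text, pos):
    """Return the end position if expr matches at pos, otherwise None."""
    kind = expr[0]
    if kind == "lit":
        return pos + len(expr[1]) if text.startswith(expr[1], pos) else None
    if kind == "any":
        return pos + 1 if pos < len(text) else None
    if kind == "ref":
        return match(grammar, grammar[expr[1]], text, pos)
    if kind == "seq":
        for part in expr[1:]:
            pos = match(grammar, part, text, pos)
            if pos is None:
                return None
        return pos
    if kind == "alt":
        # Ordered choice: the first alternative that matches wins.
        for option in expr[1:]:
            end = match(grammar, option, text, pos)
            if end is not None:
                return end
        return None
    if kind == "star":
        while True:
            end = match(grammar, expr[1], text, pos)
            if end is None or end == pos:
                return pos  # stop on failure or when no input was consumed
            pos = end
    if kind == "not":
        # Negative lookahead: succeeds without consuming input.
        return pos if match(grammar, expr[1], text, pos) is None else None
    raise ValueError(f"unknown expression kind: {kind}")


def recognizes(grammar, rule, text):
    """True if the named rule matches the entire text."""
    return match(grammar, ("ref", rule), text, 0) == len(text)
```

Here `recognizes(GRAMMAR, "bold", "*hi*")` is true and `recognizes(GRAMMAR, "bold", "*hi")` is false. Because `GRAMMAR` is just a dictionary, the same declaration could in principle be read by engines written in other languages.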

What's the tool of choice in the Java ecosystem for a PEG parser? I'm looking for something equivalent to Treetop from the Ruby ecosystem.

-Dan

asotobu

Apr 01, 2015; 6:35pm

Re: Asciidoc syntax definition

298 posts

Yes, there is one tool:

https://github.com/sirthias/parboiled/wiki/Java-Parser

But I think ANTLR will help us maintain a clear division between grammar and implementation.

jirutka

Apr 01, 2015; 7:29pm

Re: Asciidoc syntax definition

3 posts

I know Parboiled quite well. I wrote some extensions for a Markdown parser written on top of Parboiled and also wrote a small grammar/parser of my own in it. Before that I wrote two grammars/parsers (see RSQL) in JavaCC, another classic parser generator. Parboiled was much simpler to use than JavaCC. By the way, another benefit of PEG is extensibility: users can write plugins that extend the parser.
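
A rough sketch of what that extensibility could look like, in Python and with regular expressions standing in for PEG rules. The class `InlineParser`, its `register` hook, and the `strike` rule are hypothetical names made up for this example; neither Parboiled nor any AsciiDoc tool is being quoted. The idea is that the parser is an ordered list of alternatives, so a plugin extends the language by adding one more alternative at a chosen priority.

```python
import re


class InlineParser:
    """Ordered-choice dispatcher: rules are tried in order, first match wins."""

    def __init__(self):
        self.rules = [("bold", re.compile(r"\*([^*]+)\*"))]

    def register(self, name, pattern, index=0):
        """Hypothetical plugin hook: add an alternative at the given priority.

        The pattern is assumed to contain exactly one capturing group.
        """
        self.rules.insert(index, (name, re.compile(pattern)))

    def parse(self, text):
        nodes, plain, pos = [], [], 0
        while pos < len(text):
            for name, pattern in self.rules:
                m = pattern.match(text, pos)
                if m and m.end() > pos:  # ignore empty matches to avoid looping
                    if plain:
                        nodes.append(("text", "".join(plain)))
                        plain = []
                    nodes.append((name, m.group(1)))
                    pos = m.end()
                    break
            else:
                # No rule matched here: treat the character as plain text.
                plain.append(text[pos])
                pos += 1
        if plain:
            nodes.append(("text", "".join(plain)))
        return nodes
```

A plugin would then call something like `parser.register("strike", r"~([^~]+)~")`, after which `~that~` is parsed as a `strike` node instead of plain text, with no change to the core rules.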

The problem is that Parboiled is a legacy project that isn’t under active development anymore. Sirthias (the author of Parboiled) concluded that Java is very unsuitable for this, so he’s currently working on Parboiled 2, which is written entirely in Scala.

Java really isn’t a good choice for writing a parser. It would be better to use Groovy, Scala or any other language that runs on the JVM and provide only a Java facade.

I think that the most universal choice would be JavaScript. It runs in a web browser (needed for live preview), on a server (node.js / io.js) and also on the JVM (Nashorn, the JS engine in Java 8, is quite good from what I've heard).

We should also consider an implementation in some low-level language, to get a small binary library usable as a native extension for Ruby, Python or whatever. I think that Rust is currently the best choice for that. There’s a promising project, rust-peg.

sonson

Apr 02, 2015; 7:15am

Re: Asciidoc syntax definition

7 posts

In reply to this post by asotobu

Just as a side note:

Laika (https://github.com/planet42/Laika) is a good base for writing lightweight text markup parsers. It already supports Markdown and reStructuredText, has many tests and is fast (actually much faster than pegdown, the Markdown parser based on Parboiled). A JVM-based, extensible pandoc-like tool would be really cool *dream*.
Laika is written in Scala (JVM). Unfortunately, there may not be many Scala developers in the Asciidoctor community :-(.

More developers would be familiar with ANTLR 4, which can generate code for Java and JavaScript.

sdaschner

Apr 02, 2015; 9:35am

Re: Asciidoc syntax definition

7 posts

I also vote for ANTLR.
IMO we should have a Java-based parser (though not necessarily only that) and keep the dependencies small because of the use cases we have (and could have). A current issue, for instance, is the AsciidoctorJ dependency on JRuby, which can cause class-loading trouble (e.g. on WildFly). So a "native" Java solution would be the best IMO.