Asciidoc syntax definition

aalmiray
Hello fellow doc writers,

Is anyone aware of an ANTLR or JavaCC definition of the AsciiDoc syntax?

Cheers,
Andres

Re: Asciidoc syntax definition

asotobu
There is a first implementation, but I have never been able to run it. In fact, it does not use ANTLR or JavaCC but Parboiled:

https://github.com/asciidocj/asciidocj

I didn't spend much time on it, but if you want we can try it again.

What do you have in mind? hehehehe

Re: Asciidoc syntax definition

mojavelinux
Administrator
Andres,

There isn't an official one (in fact, AsciiDoc Python never defined one). As Alex points out, there is an attempt to start one, but it's very, very, very early on.

I think that a core goal of an AsciiDoc (or UniDoc) standard will be to define a grammar. I don't think it will be as hard as it seems, if we agree that some sacrifices/changes will have to be made. In fact, it's already very possible to define a grammar at the block level. It's the inline stuff that's going to be a bit trickier.

I have plans to start working on a Treetop-based grammar because we need it to fix some of the inline parsing inconsistencies today. Treetop is a PEG-based parser generator for Ruby. I think PEG is the right way to go because it keeps the grammar simple and makes implementations easy to write. However, I would also welcome an attempt to define the grammar in ANTLR.

-Dan
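
For readers unfamiliar with PEG, the following is a minimal sketch of the idea, not Treetop and not an agreed AsciiDoc grammar. It assumes a hypothetical toy subset of the inline syntax (`*bold*`, `_italic_`, plain text) and shows the two PEG traits mentioned above: alternatives are tried in a fixed order, and a rule that fails simply backtracks so the next alternative can be tried. All rule and function names are invented for illustration.

```python
import re

# Hypothetical toy grammar (an assumption, not an official definition):
#   inline <- (bold / italic / text / literal)*
#   bold   <- '*' (italic / text)+ '*'
#   italic <- '_' (bold / text)+ '_'
#   text   <- [^*_]+
#   literal <- .            (an unmatched mark is kept as plain text)

TEXT = re.compile(r"[^*_]+")


def _choice(src, pos, rules):
    """Ordered choice: return the result of the first rule that matches."""
    for rule in rules:
        result = rule(src, pos)
        if result is not None:
            return result
    return None


def _text(src, pos):
    m = TEXT.match(src, pos)
    return (("text", m.group()), m.end()) if m else None


def _literal(src, pos):
    return ("text", src[pos]), pos + 1


def _span(kind, mark, inner):
    """Build a rule for: mark, one or more inner nodes, mark."""
    def rule(src, pos):
        if not src.startswith(mark, pos):
            return None
        children, cur = [], pos + len(mark)
        while True:
            result = _choice(src, cur, inner())
            if result is None:
                break
            node, cur = result
            children.append(node)
        if children and src.startswith(mark, cur):
            return (kind, children), cur + len(mark)
        return None  # no closing mark: fail so the caller can backtrack
    return rule


# Lambdas delay the lookup because the two rules refer to each other.
_bold = _span("bold", "*", lambda: (_italic, _text))
_italic = _span("italic", "_", lambda: (_bold, _text))


def parse_inline(src):
    """Parse src into a list of (kind, payload) nodes."""
    nodes, pos = [], 0
    while pos < len(src):
        node, pos = _choice(src, pos, (_bold, _italic, _text, _literal))
        nodes.append(node)
    return nodes


if __name__ == "__main__":
    print(parse_inline("a *b _c_* d"))
    print(parse_inline("2 * 3"))
```

The `_literal` fallback is a design choice of this sketch: a lone `*` or `_` never causes a parse error, it just becomes text, which is the kind of forgiving behavior lightweight markup usually needs.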

On Thu, Jul 24, 2014 at 11:06 AM, asotobu [via Asciidoctor :: Discussion] <[hidden email]> wrote:
There is a first implementation, but I have never been able to run it. In fact, it does not use ANTLR or JavaCC but Parboiled:

https://github.com/asciidocj/asciidocj

I didn't spend much time on it, but if you want we can try it again.

What do you have in mind? hehehehe



Re: Asciidoc syntax definition

sdaschner
Hi there,

any updates on this?
I've been working with compilers & parsers in my studies and would like to help define a grammar.


Cheers,
Sebastian

Re: Asciidoc syntax definition

asotobu
Hi, no, we have not worked on this. In fact, we don't even have an EBNF description of the AsciiDoc format, so maybe that would be a good start.

Re: Asciidoc syntax definition

sdaschner
Sounds good.
Maybe a Gist or GitHub project for that would be a good idea?

Re: Asciidoc syntax definition

asotobu
GitHub would be the best way.

On Fri, 13 March 2015 at 15:31, sdaschner [via Asciidoctor :: Discussion] <[hidden email]> wrote:
Sounds good.
Maybe a Gist or GitHub project for that would be a good idea?

Re: Asciidoc syntax definition

mojavelinux
Administrator
Several people have approached me recently about working on a formal / standard grammar and parser for AsciiDoc. I'm very glad to discover there's so much interest! I think the time is right to kick off the effort.

Initially, the work will be exploratory. I think we should start by focusing on the inline syntax (bold, italic, monospace, etc). I hope we can identify which parts of the syntax already work well in a formal grammar and which parts present problems (perhaps because they are too context sensitive). We'll also want to clean up the terminology and iron out places where it's a bit quirky (for instance, "formatted text" instead of "quoted text").

I see two key goals for this effort:

* to enable multiple implementations of the parser (perhaps even a reference implementation in Java)
* to make the syntax more consistent and predictable

I've been using the following wiki page to capture some information related to the effort:


Here's a list of people who have expressed interest in contributing:

* Martin van Rappard
* Sebastian Daschner
* Erik Pragt
* Jakub Jirutka
* Vincent Massol

Feel free to add your name to this list (ideally on the wiki page).

I'd be interested in connecting with compiler groups at universities to see if any research projects can be formed around this effort. There are a lot of problems to explore with parsing lightweight markup languages (perhaps also aspects of natural language processing) that could make for interesting research and also help move AsciiDoc forward and evolve. If you know of anyone interested, please have them reach out (ideally through this list).

Here are the action items to get started:

Step 1. Choose a name for the GitHub repository

Here are two possible names for the repo.

* asciidoc-grammar
* asciidoc-syntax

Should we use one of these or something else?

Step 2. Create an initial project structure.

Are we going to do it based on ANTLR 4 or some other parser? We'll need to decide that in order to create the initial project.

Who would like to do step 2?

Step 3. Hack!

I'm really excited about this initiative! Let's make it happen.

Cheers,


Re: Asciidoc syntax definition

sdaschner
Sounds great, I'm really excited about this!

IMO asciidoc-syntax is a good name for the project (sounds more general to me than -grammar). Other opinions?

I would use GitHub issues in the project for syntax suggestions / improvements, etc. Do you agree?

Re: Asciidoc syntax definition

Sam
In reply to this post by mojavelinux
I'd expect 

asciidoc-grammar:: to be the formal definition
asciidoc-syntax:: to be the human readable version

S

On 30 March 2015 at 14:28, mojavelinux [via Asciidoctor :: Discussion] <[hidden email]> wrote:
Here are two possible names for the repo.

* asciidoc-grammar
* asciidoc-syntax

Should we use one of these or something else?

Re: Asciidoc syntax definition

mojavelinux
Administrator
In reply to this post by sdaschner

On Mon, Mar 30, 2015 at 4:45 PM, sdaschner [via Asciidoctor :: Discussion] <[hidden email]> wrote:
I would use GitHub issues in the project for syntax suggestions / improvements, etc. Do you agree?

I completely agree. Once we have the project started on GitHub, we'll be able to feed all ideas through issues, pull requests and commits. If necessary, we can use the wiki, though I would prefer to just update the README for the project.

Should we initiate the project to use ANTLR4 or some other parser?


Re: Asciidoc syntax definition

mojavelinux
Administrator
In reply to this post by Sam

On Mon, Mar 30, 2015 at 4:54 PM, Sam [via Asciidoctor :: Discussion] <[hidden email]> wrote:
asciidoc-grammar:: to be the formal definition
asciidoc-syntax:: to be the human readable version

We could also make these subfolders in the project. In that case, perhaps:

asciidoc-syntax-definition/
  grammar/
  syntax (or even spec)

We don't have to get it perfect right now. What's most important is that we have files to start hacking on. It's easy to rearrange.


Re: Asciidoc syntax definition

Sam


On 30 March 2015 at 16:59, mojavelinux [via Asciidoctor :: Discussion] <[hidden email]> wrote:


We could also make these subfolders in the project. In that case, perhaps:

asciidoc-syntax-definition/
  grammar/
  syntax (or even spec)

We don't have to get it perfect right now. What's most important is that we have files to start hacking on. It's easy to rearrange.

Sure - I think the only confusing thing for me personally would be if you called it syntax and it contained only grammar (as per my definitions above).

S
 

Re: Asciidoc syntax definition

mojavelinux
Administrator

On Mon, Mar 30, 2015 at 5:46 PM, Sam [via Asciidoctor :: Discussion] <[hidden email]> wrote:
Sure - I think the only confusing thing for me personally would be if you called it syntax and it contained only grammar (as per my definitions above).

In that case, for now, we should probably call it asciidoc-grammar and then update it once we have the syntax / spec too. Changing a repository name doesn't break links, so it would support this evolution.

Thanks for the input!


Re: Asciidoc syntax definition

jirutka
I think that PEG or parser combinators would be better suited to the AsciiDoc syntax than a classic parser generator like ANTLR.

Re: Asciidoc syntax definition

mojavelinux
Administrator

On Wed, Apr 1, 2015 at 7:23 PM, jirutka [via Asciidoctor :: Discussion] <[hidden email]> wrote:
I think that PEG or parser combinators would be better suited to the AsciiDoc syntax than a classic parser generator like ANTLR.

I was kind of thinking that too. In fact, even with an ANTLR grammar, we probably need to be able to parse it in a PEG style too.

What ANTLR gives us is a clear declaration of the grammar, though. PEG tends to mix the grammar and parser together, which makes it harder to see the choices. So perhaps we need to do both?

What's the tool of choice in the Java ecosystem for a PEG parser? I'm looking for something equivalent to Treetop from the Ruby ecosystem.
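
One way to picture "doing both" is to keep the grammar as a plain declaration and interpret it with PEG semantics. The sketch below is only an illustration under stated assumptions: the `GRAMMAR` table, its operator names (`lit`, `re`, `ref`, `seq`, `choice`, `star`) and the toy rules are all hypothetical, and it is neither ANTLR syntax nor any existing tool's API. The point is that the rules can be read without looking at the matching code.

```python
import re

# Hypothetical toy grammar, kept as plain data so it can be read on its own:
#   inline <- (bold / text)*
#   bold   <- '*' text '*'
#   text   <- [^*]+
GRAMMAR = {
    "inline": ("star", ("choice", ("ref", "bold"), ("ref", "text"))),
    "bold": ("seq", ("lit", "*"), ("ref", "text"), ("lit", "*")),
    "text": ("re", r"[^*]+"),
}


def match(expr, src, pos, grammar=GRAMMAR):
    """Return the end position of a match of expr at pos, or None."""
    op = expr[0]
    if op == "lit":
        return pos + len(expr[1]) if src.startswith(expr[1], pos) else None
    if op == "re":
        m = re.compile(expr[1]).match(src, pos)
        return m.end() if m else None
    if op == "ref":
        return match(grammar[expr[1]], src, pos, grammar)
    if op == "seq":
        for part in expr[1:]:
            pos = match(part, src, pos, grammar)
            if pos is None:
                return None
        return pos
    if op == "choice":  # ordered: the first alternative that matches wins
        for alt in expr[1:]:
            end = match(alt, src, pos, grammar)
            if end is not None:
                return end
        return None
    if op == "star":  # greedy repetition, zero or more times
        while True:
            end = match(expr[1], src, pos, grammar)
            if end is None or end == pos:
                return pos
            pos = end
    raise ValueError(f"unknown operator: {op}")


def recognizes(src):
    """True if the whole string matches the 'inline' rule."""
    return match(GRAMMAR["inline"], src, 0) == len(src)


if __name__ == "__main__":
    print(recognizes("a *b* c"))
    print(recognizes("a *b"))
```

This is only a recognizer (it answers yes or no); building a tree would need a little more bookkeeping, but the separation between the declared rules and the code that walks them stays the same.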


Re: Asciidoc syntax definition

asotobu
Yes, there is one tool:

https://github.com/sirthias/parboiled/wiki/Java-Parser

but I think ANTLR will help us maintain a clear division between grammar and implementation.


On Wed, 1 Apr 2015 at 19:30, mojavelinux [via Asciidoctor :: Discussion] <[hidden email]> wrote:
What's the tool of choice in the Java ecosystem for a PEG parser? I'm looking for something equivalent to Treetop from the Ruby ecosystem.

Re: Asciidoc syntax definition

jirutka
I know Parboiled quite well. I wrote some extensions for a Markdown parser built on top of Parboiled and also wrote a small grammar/parser of my own in it. Before that, I wrote two grammars/parsers (see RSQL) in JavaCC, another classic parser generator. Parboiled was much simpler to use than JavaCC. By the way, another benefit of PEG is extensibility: users can write plugins that extend the parser.
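
The extensibility point can be illustrated with a loose sketch. This is not Parboiled's API; the `InlineParser` class, its `register` method and the `:emoji:` syntax are all hypothetical names invented here. The parser tries its rules in order, so a plugin extends the syntax simply by registering one more rule ahead of the built-in ones.

```python
import re


class InlineParser:
    """Tries rules in order (ordered choice); plugins may add rules."""

    def __init__(self):
        self.rules = []  # list of (name, compiled pattern), in priority order

    def register(self, name, pattern, first=False):
        rule = (name, re.compile(pattern))
        if first:
            self.rules.insert(0, rule)  # plugin rule takes priority
        else:
            self.rules.append(rule)

    def parse(self, src):
        nodes, pos = [], 0
        while pos < len(src):
            for name, pattern in self.rules:
                m = pattern.match(src, pos)
                if m and m.end() > pos:
                    nodes.append((name, m.group()))
                    pos = m.end()
                    break
            else:
                # No rule matched: keep the character as-is.
                nodes.append(("char", src[pos]))
                pos += 1
        return nodes


def build_base_parser():
    """Core syntax of this toy example: *bold* and plain text."""
    parser = InlineParser()
    parser.register("bold", r"\*[^*]+\*")
    parser.register("text", r"[^*:]+")
    return parser


if __name__ == "__main__":
    parser = build_base_parser()
    print(parser.parse("hi *there* :smile:"))
    # A hypothetical plugin adds an emoji shorthand without touching the core.
    parser.register("emoji", r":[a-z]+:", first=True)
    print(parser.parse("hi *there* :smile:"))
```

A table of regular expressions is much weaker than a real PEG, so this only shows the plugin mechanism, not how a full grammar would be composed.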

The problem is that Parboiled is a legacy project that isn’t under active development anymore. Sirthias (author of Parboiled) concluded that Java is very unsuitable for this, so he’s currently working on Parboiled 2 that is written entirely in Scala.

Java really isn’t a good choice for writing a parser; it would be better to use Groovy, Scala, or another language that runs on the JVM and provide only a Java facade.

I think the most universal choice would be JavaScript. It runs in a web browser (needed for live preview), on a server (node.js / io.js), and also on the JVM (Nashorn, the JS engine in Java 8, is quite good from what I've heard).

We should also consider an implementation in some low-level language, to get a small binary library usable as a Ruby/Python/whatever native extension. I think that Rust is currently the best choice for that. There’s a promising project, rust-peg.

Re: Asciidoc syntax definition

sonson
In reply to this post by asotobu
Just as a side note:

Laika https://github.com/planet42/Laika is a good base for writing lightweight markup parsers. It already supports Markdown and reStructuredText, has many tests, and is fast (actually much faster than pegdown, the Markdown parser based on Parboiled). A JVM-based, extensible pandoc-like tool would be really cool *dream*.
Laika is based on Scala (JVM). Unfortunately, there may not be many Scala developers in the Asciidoctor community :-(.

More developers would be familiar with ANTLR 4, which can generate code for Java and JavaScript.

Re: Asciidoc syntax definition

sdaschner
I also vote for ANTLR.
IMO we should have a Java-based parser (though not necessarily only that) and keep the dependencies small because of the use cases we have (and could have). A current issue, for instance, is the AsciidoctorJ dependency on JRuby (which may cause class-loading trouble, e.g. on WildFly). So a "native" Java solution would be best IMO.