Smarter, more consistent "smart quote" replacements


mojavelinux
Administrator
Recent discussions about "smart quote" replacements have prompted us to reevaluate how and when these replacements occur in AsciiDoc. I'd like to propose a multi-step change that will make the behavior more consistent and accurate.

== Step 1: Apostrophe replacement

In an effort to conform to AsciiDoc Python, Asciidoctor replaces apostrophes with the curly equivalent, ’ (i.e., U+2019), by default.

Given:

----
I can't let you do that.
----

Output:

----
I can’t let you do that.
----
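To make the current default concrete, here is a minimal Python sketch of an always-on apostrophe replacement. The regular expression and the `replace_apostrophes` name are illustrative assumptions, not Asciidoctor's actual implementation (which is written in Ruby).

```python
import re

# Hypothetical rule: a straight apostrophe that follows a letter or digit and
# precedes a letter becomes U+2019 (right single quotation mark).
APOSTROPHE = re.compile(r"(?<=[A-Za-z0-9])'(?=[A-Za-z])")


def replace_apostrophes(text: str) -> str:
    """Replace in-word straight apostrophes with the curly apostrophe."""
    return APOSTROPHE.sub("\u2019", text)
```

Calling `replace_apostrophes("I can't let you do that.")` yields the output above, and no special syntax is needed.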

However, to get the smart quotes around phrases, AsciiDoc requires special syntax:

Given:

----
``I can't let you do that, Dave,'' said HAL.
``Squiggly saved my life when he yelled, `Watch out, Aardvark.'''
----

Output:

----
“I can’t let you do that, Dave,” said HAL.
“Squiggly saved my life when he yelled, ‘Watch out, Aardvark.”’
----
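The following sketch models the legacy syntax with two simplified, hypothetical rules: a non-greedy ``` ``...'' ``` rule for double quotes that runs first, then a `` `...' `` rule for single quotes. This is not AsciiDoc Python's real regex set, but under these assumptions it reproduces both output lines, including the ”’ ordering in the nested example: the double-quote rule claims the first two of the three trailing apostrophes, leaving the last one to close the single quote.

```python
import re

# Simplified, hypothetical models of the legacy quote rules. Both patterns
# are non-greedy, and the double-quote rule runs before the single-quote rule.
APOSTROPHE = re.compile(r"(?<=[A-Za-z0-9])'(?=[A-Za-z])")
DOUBLE = re.compile(r"``(.+?)''")
SINGLE = re.compile(r"`(.+?)'")


def legacy_quotes(text: str) -> str:
    # can't -> can’t
    text = APOSTROPHE.sub("\u2019", text)
    # ``...'' -> “...”
    text = DOUBLE.sub(lambda m: "\u201c" + m.group(1) + "\u201d", text)
    # `...' -> ‘...’
    text = SINGLE.sub(lambda m: "\u2018" + m.group(1) + "\u2019", text)
    return text
```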

Replacing the straight apostrophe but not straight quotes around phrases is inconsistent. Thus, the first thing I'd like to propose is special syntax for the apostrophe replacement, a backtick immediately followed by a straight apostrophe.

Given:

----
I can`'t let you do that.
I just can't.
----

Output:

----
I can’t let you do that.
I just can't.
----
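A minimal sketch of the proposed explicit rule, assuming it is a plain literal substitution of the two-character sequence `` `' ``. The function name is hypothetical, and a real processor would likely add boundary checks.

```python
# Hypothetical rule for the proposed syntax: only the two-character sequence
# `' (backtick + straight apostrophe) is converted; a bare ' is left alone.
EXPLICIT_APOSTROPHE = "`'"


def replace_explicit_apostrophes(text: str) -> str:
    return text.replace(EXPLICIT_APOSTROPHE, "\u2019")
```

Because the trigger is explicit, a straight apostrophe typed on its own (as in "I just can't.") passes through unchanged.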

At a minimum, this change makes all of these replacements explicit by default.

== Step 2: Moving smart character substitutions to replacements group

As it stands now, quotations around phrases are handled in the "quotes" substitution group, whereas the apostrophe replacement (and other smart character replacements, like dashes) are handled in the "replacements" substitution group.

I think all smart character replacements should happen in the same substitution group, in particular the "replacements" group. Not only is this more consistent, it also clears a path for us to control automatic smart quote replacement.

Here's why this matters. Suppose we provide a setting to substitute quotation marks around phrases automatically (i.e., SmartyPants behavior). If that substitution happens as part of the "quotes" group, then wherever we enable the "quotes" substitution, we run the risk of getting smart quote replacement.

Consider:

[subs="+quotes"]
----
System.out.println("*bold* text");
----

In the output, we'd see:

----
System.out.println(&#8220;<strong>bold</strong> text&#8221;);
----

This would be bad. However, if the smart quotes were replaced in the "replacements" group, we'd get the expected output:

----
System.out.println("<strong>bold</strong> text");
----

I think this change introduces the proper separation of substitution types.
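A toy model of that separation, assuming each substitution group is a function applied in the order listed and that smart double quotes have moved into "replacements". The group functions are hypothetical and drastically simplified; in this model the listing block's substitutions are just `["quotes"]`.

```python
import re
from typing import Callable, Dict, List


def sub_quotes(text: str) -> str:
    # Simplified constrained strong emphasis only: *bold* -> <strong>bold</strong>
    return re.sub(r"\*(\S(?:.*?\S)?)\*", r"<strong>\1</strong>", text)


def sub_replacements(text: str) -> str:
    # Under the proposal, smart double quotes live here (naive pairing).
    return re.sub(r'"([^"]*)"', "&#8220;\\1&#8221;", text)


GROUPS: Dict[str, Callable[[str], str]] = {
    "quotes": sub_quotes,
    "replacements": sub_replacements,
}


def apply_subs(text: str, subs: List[str]) -> str:
    # Apply each named substitution group in order.
    for name in subs:
        text = GROUPS[name](text)
    return text
```

With `subs=["quotes"]`, the Java string keeps its straight quotes while `*bold*` is still converted; only text that also gets "replacements" receives curly quotes.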

== Step 3: Automatic smart-quote replacement

Once we finish Steps 1 and 2, we can consider introducing an attribute to enable automatic smart quote replacement. When it is enabled, anywhere the "replacements" substitution group is applied (e.g., a normal paragraph), quotation marks would be substituted without any special syntax:

Given:

----
:smartquotes: refs

"I can't let you do that, Dave," said HAL.
----

Output:

----
&#8220;I can&#8217;t let you do that, Dave,&#8221; said HAL.
----

The open question here is whether disabling smartquotes would also disable the replacement of other smart characters, like dashes and arrows. It would make sense to turn them all on, turn them all off, or provide a way to specify which ones to enable or disable.
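One way to picture the "specify which ones" option is a table of named rules plus a list of enabled names. The rule names (`apostrophes`, `dashes`, `arrows`) and patterns below are hypothetical simplifications, not existing Asciidoctor settings.

```python
import re
from typing import Iterable, Optional

# Hypothetical named rules, each mapping a pattern to a character reference.
RULES = {
    "apostrophes": (re.compile(r"(?<=[A-Za-z0-9])'(?=[A-Za-z])"), "&#8217;"),
    "dashes": (re.compile(r"(?<=\w)--(?=\w)"), "&#8212;"),
    "arrows": (re.compile(r"->"), "&#8594;"),
}


def smart_replace(text: str, enabled: Optional[Iterable[str]] = None) -> str:
    # None means "all on"; an empty list means "all off".
    names = RULES.keys() if enabled is None else enabled
    for name in names:
        pattern, ref = RULES[name]
        text = pattern.sub(ref, text)
    return text
```

With this shape, "all on" and "all off" are just the two extremes of the same setting.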

Notice that the value of the smartquotes attribute is "refs". We may also want to support the value "glyphs", which would insert the actual glyph (e.g., the curly quote) into the output instead of a character reference:

----
“I can’t let you do that, Dave,” said HAL.
----
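A sketch of how the two attribute values could differ only in the final encoding step. The `smartquotes` function and its opening/closing heuristic (a straight double quote at the start of the text or after whitespace opens, any other closes) are assumptions for illustration; they are far cruder than the SmartyPants rules.

```python
import re

# Character references emitted when mode == "refs".
REFS = {"\u201c": "&#8220;", "\u201d": "&#8221;", "\u2019": "&#8217;"}


def smartquotes(text: str, mode: str = "refs") -> str:
    # Naive heuristic: a straight " at the start or after whitespace opens.
    text = re.sub(r'(^|\s)"', lambda m: m.group(1) + "\u201c", text)
    # Every remaining straight " closes a quotation.
    text = text.replace('"', "\u201d")
    # In-word straight apostrophe -> U+2019.
    text = re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)
    if mode == "refs":
        return "".join(REFS.get(ch, ch) for ch in text)
    if mode == "glyphs":
        return text
    raise ValueError(f"unsupported smartquotes value: {mode!r}")
```

Given the HAL sentence, `mode="refs"` produces the character-reference output shown earlier, and `mode="glyphs"` produces the glyph output.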

== Wrap-up

For Asciidoctor 1.5.0, I'm mostly looking for feedback on Step 1. This first step is critical because we're also planning to disable the single straight quote markup for emphasizing (italicizing) a phrase (e.g., 'emphasis'). Once that substitution is disabled, the processor catches the second straight single quote and replaces it with &#8217;. That's why I think we want the explicit syntax, `', to enable this replacement.
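The problem can be illustrated with a hypothetical broad rule that converts any straight single quote directly following a word character. This is an assumed stand-in for the processor's behavior, not its actual pattern. Without emphasis processing to consume the pair first, such a rule leaves a straight opening quote matched with a curly closing one, whereas the explicit rule ignores the phrase entirely.

```python
import re


def implicit_apostrophes(text: str) -> str:
    # Hypothetical always-on rule: any straight ' that directly follows a
    # word character becomes &#8217;.
    return re.sub(r"(?<=\w)'", "&#8217;", text)


def explicit_apostrophes(text: str) -> str:
    # Proposed rule: only the explicit `' sequence is replaced.
    return text.replace("`'", "&#8217;")
```

For `an 'emphasis' phrase`, the implicit rule yields `an 'emphasis&#8217; phrase`, while the explicit rule returns the input unchanged.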

When we update the smart character replacement, I plan to follow and build on the research done by the Python SmartyPants project, since it appears to be the most accurate.

Cheers,

-Dan

Re: Smarter, more consistent "smart quote" replacements

Rob Sykes
I can't say I'm too keen on `' for an apostrophe, for both readability and writability reasons. LaTeX seems to manage well with a backtick and a straight apostrophe for quotes and apostrophes, as described here: http://www.maths.tcd.ie/~dwilkins/LaTeXPrimer/QuotDash.html
The roff family uses the same mechanism, so it seems to be something of a standard, which would be good to stick to if we can.
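For comparison, the LaTeX convention can be sketched as four literal replacements: ``` `` ``` and `''` for double quotes, `` ` `` and `'` for single quotes, with every remaining straight `'` becoming ’, which also serves as the apostrophe. The function name is hypothetical; the point is that no separate apostrophe syntax is needed.

```python
# Order matters: the two-character sequences must be handled before the
# single characters, or `` would become two opening single quotes.
LATEX_STYLE = [("``", "\u201c"), ("''", "\u201d"), ("`", "\u2018"), ("'", "\u2019")]


def latex_quotes(text: str) -> str:
    for src, dst in LATEX_STYLE:
        text = text.replace(src, dst)
    return text
```

Applied to ``` ``I can't let you do that, Dave,'' said HAL. ```, this yields “I can’t let you do that, Dave,” said HAL.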