Asciidoctor :: Discussion

Unicode characters not converted in pdf.

Clemens

Unicode characters not converted in pdf.

Hello,

I use special characters in some text, which I enter as Unicode character references. When I view the result with the Asciidoctor Firefox plugin, I get the result I want.

When I convert the text with Asciidoctor PDF, I get error messages like:

Failed to parse formatted text: &#x03a3;

In the output, the reference appears unconverted, exactly as I entered it:

&#x03a3;

Please note that this also happens if I enter the character as a decimal reference instead of a hexadecimal one.

Probably something is missing to convert the reference correctly, but I can't find what it is.

Thank you for your help.

Below is an example of the text I want to convert:

This shows the following parameters: mean of the measurement series [overline]#X#, standard deviation (&#120590;), and minimum (&#x2913;) and maximum (&#x2912;) layer thickness.

mojavelinux

Re: Unicode characters not converted in pdf.

Clemens,

Asciidoctor PDF currently only supports decimal character references. See https://github.com/asciidoctor/asciidoctor-pdf/issues/486
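Given that limitation, one workaround is to rewrite hexadecimal references into their decimal form before converting. The following is a minimal sketch, assuming a UTF-8 encoded AsciiDoc source; both method names are hypothetical and not part of Asciidoctor or Asciidoctor PDF.

```ruby
# Rewrite hexadecimal character references (&#xHHHH;) as decimal ones (&#DDDD;).
# Hypothetical helper for illustration; not part of Asciidoctor or Asciidoctor PDF.
def hex_refs_to_decimal(text)
  # \h matches one hex digit; the /i flag also accepts an uppercase X
  text.gsub(/&#x(\h+);/i) { "&##{Regexp.last_match(1).hex};" }
end

# Rewrite a source file in place (assumes UTF-8 encoded AsciiDoc).
def hex_refs_to_decimal_in_file!(path)
  source = File.read(path, encoding: 'UTF-8')
  File.write(path, hex_refs_to_decimal(source))
end
```

For example, `hex_refs_to_decimal('Sigma: &#x03a3;')` returns `"Sigma: &#931;"`. Decimal references are left untouched.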

Cheers,

-Dan

(In reply to Clemens's message of Thu, Feb 7, 2019 at 3:55 AM.)

If you reply to this email, your message will be added to the discussion below:
http://discuss.asciidoctor.org/Unicode-characters-not-converted-in-pdf-tp6703.html

To start a new topic under Asciidoctor :: Discussion, email [hidden email]
To unsubscribe from Asciidoctor :: Discussion, click here.
NAML

Dan Allen | @mojavelinux | https://twitter.com/mojavelinux

Clemens

Re: Unicode characters not converted in pdf.

Dear Dan,

Thanks for the fast reply. I'm aware of that issue.

However, conversion also fails when I enter the character as a decimal reference, e.g.:

Standard deviation (&#120590;)

Cheers,

Clemens

mojavelinux

Re: Unicode characters not converted in pdf.

It seems you are using a version of Asciidoctor older than 1.5.5. Character references with six decimal digits weren't supported until that version. With the latest Asciidoctor PDF and Asciidoctor, your example works for me.

(The next problem you'll run into is that the font in the default theme doesn't have that glyph.)

Cheers,

-Dan

(In reply to Clemens's message of Thu, Feb 7, 2019 at 4:38 AM.)

Clemens

Re: Unicode characters not converted in pdf.

Sorry, I forgot to post my version. I think it's the current one, as I updated yesterday:

Asciidoctor PDF 1.5.0.alpha.16 using Asciidoctor 1.5.8 [https://asciidoctor.org]
Runtime Environment (ruby 2.4.3p205 (2017-12-14 revision 61247) [x64-mingw32]) (lc:CP850 fs:Windows-1252 in:CP850 ex:CP850)

mojavelinux

Re: Unicode characters not converted in pdf.

When you say it doesn't work with the decimal reference, are you getting a warning, or is the character just not showing up in the PDF?

Btw, I strongly recommend just entering the character directly and skipping the whole character reference abstraction.
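Entering the character directly means the UTF-8 encoded character itself (for example σ) sits in the source instead of a reference. If a document already contains references, they can be converted mechanically. A minimal sketch, assuming a UTF-8 source; the method name is hypothetical. ASCII references are deliberately kept, since characters such as `*` or `<` can carry AsciiDoc markup meaning.

```ruby
# Replace numeric character references (decimal or hexadecimal) with the literal
# UTF-8 characters they denote. Hypothetical helper for illustration.
def refs_to_literal(text)
  text.gsub(/&#(?:x(\h+)|(\d+));/i) do
    match = Regexp.last_match
    code = match[1] ? match[1].hex : match[2].to_i
    next match[0] if code < 128 # keep ASCII references: they may be escaped markup
    begin
      code.chr(Encoding::UTF_8)
    rescue RangeError
      match[0] # not a valid Unicode scalar value; leave the reference untouched
    end
  end
end
```

Usage on a file: `File.write('doc.adoc', refs_to_literal(File.read('doc.adoc', encoding: 'UTF-8')))`. This only changes the source; whether the glyph then renders still depends on the font.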

-Dan

1marc1

Re: Unicode characters not converted in pdf.

Hi,

I just tried this as well with a font that has character #120590. The PDF output prints a square as if to indicate that the glyph isn't available. When I replace 120590 in the source with 64 (the number for the @ symbol) it works fine.

I created a table (and used asciidoctor-pdf to turn it into a pdf file) showing a whole range of glyphs:
Glyphs 120512 - 120679

All but a few of the glyphs in the table above are available in the font that I use.

In a lower range (1748 - 1915), the table accurately reflects the glyphs that are in the font:
Glyphs 1748 - 1915

Perhaps this helps someone in troubleshooting.
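The source of Marc's tables isn't included in the thread. As a sketch of how such a test document could be generated, the following builds an AsciiDoc table with one decimal character reference per cell; the method name, column count and file names are assumptions for illustration.

```ruby
# Build an AsciiDoc table that shows one glyph per cell for a range of code points,
# each labelled with its decimal number. Hypothetical helper for illustration.
def glyph_table(range, cols: 8)
  cells = range.map { |code| "|#{code}: &##{code};" }
  rows = cells.each_slice(cols).map do |row|
    # pad the last row with empty cells so Asciidoctor doesn't drop an incomplete row
    (row + ['|'] * (cols - row.size)).join(' ')
  end
  [%([cols="#{cols}*"]), '|===', *rows, '|==='].join("\n")
end
```

For example, `File.write('glyphs.adoc', "= Glyphs\n\n" + glyph_table(120512..120679))` followed by `asciidoctor-pdf glyphs.adoc` produces a table covering the first range Marc tested.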

Version information:

Asciidoctor PDF 1.5.0.alpha.16 using Asciidoctor 1.5.6.2 [http://asciidoctor.org]
Runtime Environment (ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:- ex:UTF-8)

Marc.

mojavelinux

Re: Unicode characters not converted in pdf.

Thanks for the analysis, Marc.

This appears to be a bug in Prawn's font handling. I tried to use the Prawn API directly with the raw glyph using a font that has that character and it still produces a square box. Perhaps Prawn stops looking for glyphs after a certain threshold.
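One place such a threshold could lie is U+FFFF, the end of the Basic Multilingual Plane (BMP): the failing code point 120590 (U+1D70E) is above it, while 64 (@) and 963 (σ) are below it. In TrueType/OpenType fonts, code points above U+FFFF can only be mapped through a 32-bit cmap subtable such as format 12, not the 16-bit format 4. Whether that matters here is an unverified hypothesis; this sketch only classifies code points by plane.

```ruby
# Classify a code point relative to the Basic Multilingual Plane (U+0000..U+FFFF).
def describe_code_point(code)
  plane = code >> 16 # each Unicode plane spans 0x10000 code points
  label = plane.zero? ? ' (BMP)' : ' (beyond the BMP)'
  format('U+%04X (decimal %d) is in plane %d%s', code, code, plane, label)
end

[64, 963, 120_590].each { |code| puts describe_code_point(code) }
```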

I recommend submitting an issue for this at https://github.com/prawnpdf/prawn/ with the following test script:

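require 'prawn'

Prawn::Document.generate 'missing-glyph.pdf' do
  # Register font families, converting the family-name keys to strings
  def register_font data
    font_families.update data.inject({}) {|accum, (key, val)| accum[key.to_s] = val; accum }
  end

  # The family name is only a label; the file registered here is DejaVu Serif.
  # Adjust the path to a font file that exists on your system.
  register_font NotoSerif: {
    normal: '/usr/share/fonts/dejavu/DejaVuSerif.ttf'
  }

  font :NotoSerif do
    # U+1D70E MATHEMATICAL ITALIC SMALL SIGMA (decimal 120590)
    text '𝜎'
  end
end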

Cheers,

-Dan

(In reply to 1marc1's message of Fri, Feb 8, 2019 at 3:57 AM.)

Clemens

Re: Unicode characters not converted in pdf.

Dear Dan and Marc,

thanks for the support.

What does this mean for me? Do I have to wait until the bug is fixed?

Somehow the Asciidoctor plugin for Firefox renders the Unicode characters correctly. I assume it uses the same software as my Asciidoctor PDF command-line tool. Shouldn't it be possible to use the plugin's routines in the command-line tool?

@Dan I tried your suggestion to enter the Unicode characters directly into the text document. Unfortunately, I can't seem to get that right in Notepad++: I don't get the correct characters (mostly I get squares).

Anyway, thank you for your help.

Cheers,

Clemens

mojavelinux

Re: Unicode characters not converted in pdf.

Clemens,

Since the bug occurs in a library that's outside of Asciidoctor (Prawn), I have no way of knowing when it would be fixed. You could report the issue in that project (https://github.com/prawnpdf/prawn/) and see what feedback you get.

In the meantime, if this character is important to you, you'll have to find another way to generate the PDF. One idea is to use "Print to PDF" from the browser.
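The browser route can also be scripted. A minimal sketch, assuming the `asciidoctor` command and a Chromium-based browser are installed; the browser executable name (`chromium` here) differs by platform and is an assumption, as are the method names. Headless Chrome/Chromium's `--print-to-pdf` option writes the rendered page to a PDF without opening the print dialog.

```ruby
# Build the two commands: AsciiDoc -> HTML with Asciidoctor, then HTML -> PDF
# with a headless browser. Hypothetical helper for illustration.
def print_to_pdf_commands(adoc, browser: 'chromium')
  html = adoc.sub(/\.adoc\z/, '.html')
  pdf = adoc.sub(/\.adoc\z/, '.pdf')
  [
    ['asciidoctor', adoc], # writes the .html next to the source by default
    [browser, '--headless', "--print-to-pdf=#{pdf}", File.expand_path(html)]
  ]
end

# Run the commands in order; stop at the first one that fails or is not found.
def print_to_pdf(adoc, **opts)
  print_to_pdf_commands(adoc, **opts).all? { |cmd| system(*cmd) }
end
```

For example, `print_to_pdf('doc.adoc', browser: 'google-chrome')` returns `true` only if both steps succeeded.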

The Firefox plugin and Asciidoctor PDF are different pieces of software and rely on different technologies. The Firefox plugin produces HTML that is rendered by the browser. Asciidoctor PDF produces PDF that is generated by Prawn. Just because one works doesn't mean the other will work. They are just different.

You are not doing anything wrong. I can't get the character to work in Asciidoctor PDF either. No one can. It's a bug in the software.

Cheers,

-Dan

(In reply to Clemens's message of Wed, Feb 13, 2019 at 9:07 AM.)

1marc1

Re: Unicode characters not converted in pdf.

In reply to this post by Clemens

Clemens,

Depending on your use case, you might try a workaround:

1. Make a copy of your font.
2. Use a font glyph editor to copy the required glyph (120590) to a code point you never expect to use (one with a lower Unicode number).
3. Reference your patched font in your asciidoctor-pdf theme.

The idea is that Prawn will be able to pick up the lower-numbered glyph, which in your patched font happens to be the glyph you require.

I hope this helps.

Marc.
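With the font patched, the source must then use the substitute code point instead of the original one. A minimal sketch, assuming the glyph was copied to U+E000 (decimal 57344), the first code point of the Private Use Area; that slot is an arbitrary choice, and the mapping and method name are hypothetical.

```ruby
# Map each wanted code point to the substitute slot it was copied to in the patched font.
# Hypothetical mapping: U+1D70E (120590) was copied to U+E000 (57344).
SUBSTITUTES = { 120_590 => 0xE000 }.freeze

# Rewrite decimal references and literal characters that need a substitute glyph.
def use_patched_slots(text, map = SUBSTITUTES)
  map.reduce(text) do |result, (wanted, slot)|
    result.gsub("&##{wanted};", "&##{slot};")
          .gsub(wanted.chr(Encoding::UTF_8), "&##{slot};")
  end
end
```

For example, `use_patched_slots('sigma &#120590;')` returns `"sigma &#57344;"`. The patched font itself still has to be created in a font editor.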

mojavelinux

Re: Unicode characters not converted in pdf.

Marc, that's a really clever suggestion. 👍

Clemens, I strongly recommend using FontForge to make this modification. There's a great website that documents how to do operations such as the one Marc suggested. See http://designwithfontforge.com/en-US/index.html

Cheers,

-Dan

(In reply to 1marc1's message of Wed, Feb 13, 2019 at 5:58 PM.)

Clemens

Re: Unicode characters not converted in pdf.

Thank you for the suggestions. I will try this.

I also posted an issue at Prawn.

Cheers,

Clemens

Clemens

Re: Unicode characters not converted in pdf.

In reply to this post by mojavelinux

Dear Dan,

Since it turned out that Prawn does not seem to be responsible for my Unicode problem (https://github.com/prawnpdf/prawn/issues/1103), can you please suggest where my AsciiDoc setup might be misconfigured? It prints the Unicode character references unconverted, e.g.:

&#963;

Thank you for your support.

Clemens

mojavelinux

Re: Unicode characters not converted in pdf.

Clemens,

Can you please provide more context about where in the document you are using the character reference?

The quickest way to share an AsciiDoc sample is to use a gist. https://gist.github.com/

Enter a filename with the .adoc extension in the filename field and the relevant part of your document into the body. Verify that the excerpt demonstrates the same problem so I can copy and paste it.

Cheers,

-Dan

(In reply to Clemens's message of Mon, Feb 18, 2019 at 1:43 AM.)

Clemens

Re: Unicode characters not converted in pdf.

Dear Dan,

While creating the test document, I found the error myself.
I still had one character in hexadecimal format, which resulted in Asciidoctor not converting the whole line.

Now everything works except two symbols, but they also did not work with the Prawn sample script. I think they are just not supported by the font.

Thank you for your patience,

Clemens
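A leftover hexadecimal reference like the one Clemens found can be located before converting. A minimal sketch that lists each hexadecimal reference with its line number; the method name is hypothetical.

```ruby
# Report every hexadecimal character reference (&#x...;) together with its line number.
# Hypothetical helper for illustration.
def find_hex_refs(text)
  text.each_line.with_index(1).flat_map do |line, lineno|
    line.scan(/&#x\h+;/i).map { |ref| [lineno, ref] }
  end
end
```

Usage on a file: `find_hex_refs(File.read('doc.adoc', encoding: 'UTF-8')).each { |n, ref| puts "line #{n}: #{ref}" }`.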

mojavelinux

Re: Unicode characters not converted in pdf.

Clemens,

I'm so excited to hear that! I'm glad we managed to get it all figured out. Fonts are always a bit tricky because they are so often incomplete. But hopefully you have some insight now that will help you avoid getting stuck on it in the future.

Cheers!

-Dan

(In reply to Clemens's message of Mon, Feb 18, 2019 at 2:54 AM.)

1marc1

Re: Unicode characters not converted in pdf.

Dan, Clemens,

While I was working on the Prawn issue, I missed the conversation going on here. Great to hear things got resolved. Also, my test documents show squares simply because these glyphs are not present in my font.

Creating the same table of glyphs, starting at 120512, does show all the glyphs if I use the FreeSerif font.

Marc.