Unicode characters not converted in pdf.

Unicode characters not converted in pdf.

Clemens
Hello,

I use special characters in some text, entered as Unicode character references. When I view the result with the Asciidoctor Firefox plugin, I get the output I want.

When I compile the text using Asciidoctor PDF, I get error messages like:
Failed to parse formatted text: &#x03a3;

In the output I get the unconverted reference exactly as I entered it, like
"&#x03a3;".
Please note that this also happens if I enter the character as a decimal number instead of hexadecimal.

Something is probably missing that is needed to convert the expression correctly. However, I'm not able to find what it is.

Thank you for your help.

Please find below an example of the text I want to convert:
Hier werden die Parameter Mittelwert der Messreihe  [overline]#X# , Standardabweichung ( &#120590; ),  Minimale (&#x2913;) und maximale (&#x2912;) Schichtdicke angezeigt.

Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
Clemens,

Asciidoctor PDF currently only supports decimal character references. See https://github.com/asciidoctor/asciidoctor-pdf/issues/486

Cheers,

-Dan
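
Until hexadecimal references are supported, one option is to rewrite them as decimal references before running the converter. The following is only a minimal sketch, assuming the AsciiDoc source is available as a string; the helper name `hex_refs_to_decimal` is hypothetical and not part of Asciidoctor or Asciidoctor PDF.

```ruby
# Rewrite hexadecimal character references (e.g. &#x03a3;) as decimal ones
# (e.g. &#931;). Hypothetical helper, not part of Asciidoctor.
def hex_refs_to_decimal(text)
  text.gsub(/&#[xX]([0-9a-fA-F]+);/) do
    format('&#%d;', Regexp.last_match(1).hex)
  end
end

puts hex_refs_to_decimal('Minimale (&#x2913;) und maximale (&#x2912;)')
# prints: Minimale (&#10515;) und maximale (&#10514;)
```

Note that this rewrites every match, including any inside listing or passthrough blocks, so it is best applied to a copy of the document.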


Re: Unicode characters not converted in pdf.

Clemens
Dear Dan,

Thanks for the fast reply. I'm aware of that issue.

However, the conversion also fails when I enter the Unicode reference in decimal, e.g.
Standardabweichung ( &#120590;) 

Cheers,

Clemens


Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
It seems like you are using a version of Asciidoctor older than 1.5.5. Character references with six decimal digits weren't supported until that version. If I use the latest Asciidoctor PDF and Asciidoctor, your example works.

(The next problem you are going to have is that the font in the default theme doesn't have that glyph.)

Cheers,

-Dan


Re: Unicode characters not converted in pdf.

Clemens
Sorry, I forgot to post my version. I think it's the current one, as I updated yesterday:

Asciidoctor PDF 1.5.0.alpha.16 using Asciidoctor 1.5.8 [https://asciidoctor.org]
Runtime Environment (ruby 2.4.3p205 (2017-12-14 revision 61247) [x64-mingw32]) (lc:CP850 fs:Windows-1252 in:CP850 ex:CP850)

Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
When you say it doesn't work with the decimal reference, are you getting a warning, or is the character just not showing up in the PDF?

Btw, I strongly recommend just entering the character directly and skipping the whole character reference abstraction.

-Dan
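
For an existing document, swapping the references for literal characters can be scripted. This is a rough sketch rather than a feature of Asciidoctor: the helper name `refs_to_literals` is hypothetical, and it assumes the result is saved as UTF-8, which is the encoding Asciidoctor expects for its input.

```ruby
# Replace numeric character references (decimal or hexadecimal) with the
# literal UTF-8 character. Hypothetical helper, not part of Asciidoctor.
def refs_to_literals(text)
  text.gsub(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/) do
    hex = Regexp.last_match(1)
    dec = Regexp.last_match(2)
    # Array#pack('U') turns a code point into a UTF-8 encoded string.
    [hex ? hex.hex : dec.to_i].pack('U')
  end
end

converted = refs_to_literals('Standardabweichung ( &#120590;)')
# Write with an explicit encoding so the platform default (for example
# CP850 on a Windows console) does not get in the way:
# File.write('document.adoc', converted, encoding: 'UTF-8')
```

The editor then also has to open the file as UTF-8 and use a font that contains the character; otherwise it may show squares even though the file itself is correct.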

Re: Unicode characters not converted in pdf.

1marc1
Hi,

I just tried this as well with a font that has character #120590. The PDF output prints a square as if to indicate that the glyph isn't available. When I replace 120590 in the source with 64 (the number for the @ symbol) it works fine.

I created a table (and used asciidoctor-pdf to turn it into a pdf file) showing a whole range of glyphs:
Glyphs 120512 - 120679

All but a few of the glyphs in the table above are available in the font that I use.

When looking at a lower range (1748 - 1915), I found that the table is representative of the glyphs that are in the font:
Glyphs 1748 - 1915

Perhaps this helps someone in troubleshooting.

Version information:

Asciidoctor PDF 1.5.0.alpha.16 using Asciidoctor 1.5.6.2 [http://asciidoctor.org]
Runtime Environment (ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:- ex:UTF-8)

Marc.
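
A glyph table like the ones described above can be generated rather than typed by hand. The sketch below is one possible way to do it, not necessarily how the tables in this post were made; the helper name `glyph_table` and the eight-column layout are assumptions.

```ruby
# Build an AsciiDoc table with one cell per code point. Each cell shows the
# decimal code point followed by its character reference.
# Hypothetical helper; the 8-column layout is an arbitrary choice.
def glyph_table(first, last, columns = 8)
  cells = (first..last).map { |cp| format('|%d &#%d;', cp, cp) }
  rows = cells.each_slice(columns).map { |row| row.join(' ') }
  ["[cols=\"#{columns}*\"]", '|===', *rows, '|==='].join("\n")
end

# Small range for a quick look; use glyph_table(120_512, 120_679) for the
# range discussed in this post.
puts glyph_table(1748, 1755)
```

Saving the output to an `.adoc` file and running it through asciidoctor-pdf with the font under test shows which code points come out as squares. Both ranges mentioned here contain 168 code points, which fills 21 rows of eight exactly; for other ranges, pick a column count that divides the range evenly so the last row is complete.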

Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
Thanks for the analysis, Marc.

This appears to be a bug in Prawn's font handling. I tried to use the Prawn API directly with the raw glyph using a font that has that character and it still produces a square box. Perhaps Prawn stops looking for glyphs after a certain threshold.

I recommend submitting an issue for this at https://github.com/prawnpdf/prawn/ with the following test script:

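require 'prawn'

Prawn::Document.generate 'missing-glyph.pdf' do
  # Helper: register font families, converting the symbol keys to string keys.
  def register_font data
    font_families.update data.inject({}) {|accum, (key, val)| accum[key.to_s] = val; accum }
  end

  # The family name is only a label; the file registered here is DejaVu Serif.
  register_font NotoSerif: {
    normal: '/usr/share/fonts/dejavu/DejaVuSerif.ttf'
  }

  font :NotoSerif do
    # The character from the original post (decimal reference 120590).
    text '𝜎'
  end
end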

Cheers,

-Dan


Re: Unicode characters not converted in pdf.

Clemens
Dear Dan and Marc,

thanks for the support.

What does this mean for me? Do I have to wait until the bug is fixed?

Somehow the Asciidoctor plugin for Firefox renders the Unicode characters correctly. I assume it uses the same software as my asciidoctor-pdf command line tool. Shouldn't it be possible to use the plugin's routines in the command line tool?

@Dan I tried your suggestion to enter the Unicode characters directly into the text document. Unfortunately, I seem to be too stupid to do that correctly with Notepad++, as I do not get the correct characters (mostly I get squares).

Anyway, thank you for your help.

Cheers,

Clemens

Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
Clemens,

Since the bug occurs in a library that's outside of Asciidoctor (Prawn), I have no way of knowing when it would be fixed. You could report the issue in that project (https://github.com/prawnpdf/prawn/) and see what feedback you get.

In the meantime, if this character is important to you, you'll have to find another way to generate the PDF. One idea is to use "Print to PDF" from the browser.

The Firefox plugin and Asciidoctor PDF are different pieces of software and rely on different technologies. The Firefox plugin produces HTML that is rendered by the browser. Asciidoctor PDF produces PDF that is generated by Prawn. Just because one works doesn't mean the other will work. They are just different.

You are not doing anything wrong. I can't get the character to work in Asciidoctor PDF either. No one can. It's a bug in the software.

Cheers,

-Dan


Re: Unicode characters not converted in pdf.

1marc1
In reply to this post by Clemens
Clemens,

Depending on your use case, you might try a workaround:

1. Make a copy of your font.
2. Use a font glyph editor to copy the required glyph (120590) to a code point that you never expect to use (one with a lower Unicode number).
3. Reference your patched font in your asciidoctor-pdf theme.

The idea is that Prawn will be able to pick up the lower-numbered glyph, which in your patched font happens to represent the glyph you require.

I hope this helps.

Marc.
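
With a patched font, the source also has to reference the new code point instead of the original one. Here is a small sketch of that substitution. It assumes the glyph was copied to U+E000 in the Private Use Area; that target and the names `GLYPH_REMAP` and `remap_refs` are made up for illustration and do not come from this thread.

```ruby
# Map code points that do not render to the slots they were copied to in the
# patched font. U+E000 (Private Use Area) is an assumed target, not a value
# taken from the thread.
GLYPH_REMAP = { 120_590 => 0xE000 }.freeze

# Rewrite decimal character references according to the remap table and
# leave all other references unchanged. Hypothetical helper.
def remap_refs(text, remap = GLYPH_REMAP)
  text.gsub(/&#([0-9]+);/) do
    cp = Regexp.last_match(1).to_i
    format('&#%d;', remap.fetch(cp, cp))
  end
end

puts remap_refs('Standardabweichung (&#120590;)')
# prints: Standardabweichung (&#57344;)
```

One trade-off of this workaround is that text copied or searched in the resulting PDF will contain the substitute code point rather than the real character.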

Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
Marc, that's a really clever suggestion. 👍

Clemens, I strongly recommend using FontForge to make this modification. There's a great website that documents how to do operations such as the one Marc suggested. See http://designwithfontforge.com/en-US/index.html

Cheers,

-Dan


Re: Unicode characters not converted in pdf.

Clemens
Thank you for the suggestions. I will try this.

I also posted an issue at Prawn.

Cheers,

Clemens

Re: Unicode characters not converted in pdf.

Clemens
In reply to this post by mojavelinux
Dear Dan,

Since it turned out that Prawn does not seem to be responsible for my Unicode problem (https://github.com/prawnpdf/prawn/issues/1103), can you please suggest where my AsciiDoc setup might be misconfigured? It is printing the Unicode character references unconverted (e.g. &#963;).

Thank you for your support.

Clemens

Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
Clemens,

Can you please provide more context about where in the document you are using the character reference?

The quickest way to share an AsciiDoc sample is to use a gist. https://gist.github.com/

Enter a filename with the .adoc extension in the filename field and the relevant part of your document into the body. Verify that the excerpt demonstrates the same problem so I can copy and paste it.

Cheers,

-Dan


Re: Unicode characters not converted in pdf.

Clemens
Dear Dan,

While creating the test document, I found the error myself.
I still had one character in hexadecimal format, which resulted in Asciidoctor not converting the whole line.

Now everything works except two symbols. But they also did not work with the Prawn sample script. I think they are just not supported by the font.

Thank you for your patience,

Clemens
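
Since a single leftover hexadecimal reference was enough to leave the whole line unconverted, a quick scan of the source can save some searching. This is only a sketch; `HEX_REF` and `find_hex_refs` are hypothetical names, not part of Asciidoctor.

```ruby
# Matches hexadecimal character references such as &#x2913;
HEX_REF = /&#[xX][0-9a-fA-F]+;/

# Return [line_number, reference] pairs for every hexadecimal character
# reference found in the text. Hypothetical helper, not part of Asciidoctor.
def find_hex_refs(text)
  text.each_line.with_index(1).flat_map do |line, number|
    line.scan(HEX_REF).map { |ref| [number, ref] }
  end
end

sample = "Standardabweichung (&#120590;)\n" \
         "Minimale (&#x2913;) und maximale (&#x2912;)\n"
find_hex_refs(sample).each { |number, ref| puts "line #{number}: #{ref}" }
# prints:
# line 2: &#x2913;
# line 2: &#x2912;
```

For a real document, pass in `File.read('document.adoc', encoding: 'UTF-8')`, where the file name is a placeholder.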

Re: Unicode characters not converted in pdf.

mojavelinux
Administrator
Clemens,

I'm so excited to hear that! I'm glad we managed to get it all figured out. Fonts are always a bit tricky because they are so often incomplete. But hopefully you have some insight now that will help you avoid getting stuck on it in the future.

Cheers!

-Dan


Re: Unicode characters not converted in pdf.

1marc1
This post was updated on Feb 18, 2019; 12:01pm.
Dan, Clemens,

While I was working out some stuff for the Prawn issue, I missed the conversation that was going on here. Great to hear things got resolved. Also, the reason why my test documents are showing the squares is very simply because these glyphs are not (physically) present in my font.

Creating the same table of glyphs, starting at 120512, does show all the glyphs if I use the FreeSerif font.

Marc.