Asciidoctor :: Discussion

invalid byte sequence in UTF-8 (ArgumentError)


derek-jones

Re: invalid byte sequence in UTF-8 (ArgumentError)

Chris,

I get the error using Suse (ok, you could say that is a German version of Linux).

My file probably contains a few non-ASCII characters; I have no idea what encoding might be used (but asciidoc works fine).

Restricting which editor or OS version must be used to create the input fed to asciidoctor will only ensure that hardly anybody ever uses it, which would be a shame.

asciidoctor needs to handle whatever is thrown at it without throwing a Ruby error.

The output from asciidoctor might be "I don't like this line/character: %s".
This will at least give the user some idea of the problem.

asciidoctor looks like it might be the way forward for me and I thought I would give it a go on a book I am working on:
http://shape-of-code.coding-guidelines.com/2012/06/22/background-to-my-book-project-empirical-software-engineering-with-r/

Chris

Re: invalid byte sequence in UTF-8 (ArgumentError)

Derek,

I'm only a beginning user of Asciidoctor on the Windows platform. I don't know anything about Linux or SUSE.

As you can see from my posts in this thread, I updated Ruby to version 2.0.0 and my error was gone! Make sure you also have the latest Asciidoctor, v1.5.0.

Then get an editor that creates and stores files in UTF-8 format, such as "Sublime Text 2" or "Sublime Text 3" (which I use) or, as Dan recommended, "Atom" or "Brackets". Most serious cross-platform editors create and store files in UTF-8 by default anyway, so there should be many to choose from.
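If an existing file was saved in some other encoding, it can also be converted rather than retyped. The following is only a rough sketch: the helper name to_utf8 is made up for illustration, and ISO-8859-1 is just a guess at the original encoding, since nothing in this thread says what the file actually uses.

```ruby
# Hypothetical helper (not part of Asciidoctor): return the given bytes as a
# UTF-8 string. ASSUMPTION: input that is not already valid UTF-8 was saved as
# ISO-8859-1; pass a different source_encoding if you know better.
def to_utf8(bytes, source_encoding = Encoding::ISO_8859_1)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  # Already valid UTF-8 (this includes plain ASCII): nothing to convert.
  return text if text.valid_encoding?

  # Reinterpret the raw bytes in the assumed encoding, then transcode.
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Example with an in-memory string; 0xE9 is "e acute" in ISO-8859-1.
converted = to_utf8("caf\xE9".b)
puts converted.valid_encoding?

# For a real file one might read and write raw bytes, for example:
#   File.write('book-utf8.adoc', to_utf8(File.binread('book.adoc')))
# (both file names are placeholders)
```

If the guess about the source encoding is wrong, the result will still be valid UTF-8 but may show the wrong characters, so it is worth checking the output by eye.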

That's all I can say, and all of it has already been said earlier in this thread.

Good Luck!

derek-jones

Re: invalid byte sequence in UTF-8 (ArgumentError)

Chris,

My apologies for incorrectly addressing my last post to you. I meant to address it to Dan or perhaps he prefers to be called mojavelinux on this list.

I should have been more careful in responding to the sudden flurry of activity.

Thank you for such a polite response to what must have appeared to be a confusing message from me.

Derek

mojavelinux

Re: invalid byte sequence in UTF-8 (ArgumentError)

Administrator

Derek et al,

I want to be sure to clarify that I do consider this an important issue. We just want to make sure we address the situation while also educating users about using encodings properly (admittedly, it's a confusing topic). In Asciidoctor, we are very much focused on strong tooling while at the same time improving the state of the industry. We never want to suggest that people are doing it wrong, simply that we can help them find a path towards correctness for their own benefit.

> Restricting which editor or OS version must be used to create the input fed to asciidoctor will only ensure that hardly anybody ever uses it, which would be a shame.

We certainly don't want to restrict editor choice, except in the extreme case where the editor, such as Notepad, is actually putting garbage into the document. In fact, I'd go so far as to say that we shouldn't view Notepad as an editor because it really isn't doing its job right. It's really the only exception. Everything else should be good.

> asciidoctor needs to handle whatever is thrown at it without throwing a Ruby error.

Agreed. Part of the issue here is that our hands are a little tied by Ruby < 1.9.3. As of Ruby 2.0 (and even Ruby 1.9.3 to a lesser extent), there are actually APIs to handle encodings elegantly. We can start making use of these APIs so that at least the conversion doesn't completely blow up. I'll file an issue to weave in String#scrub so that the worst that can happen is that bad characters are dropped, showing a warning in verbose mode.
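To make the idea concrete, here is a minimal sketch of what dropping bad bytes with String#scrub could look like. It is not Asciidoctor's actual code: the helper name scrub_line and the warning text are invented for illustration, and String#scrub itself is only built in from Ruby 2.1 onward.

```ruby
# Hypothetical helper (not Asciidoctor's API): interpret a line as UTF-8 and
# remove any bytes that do not form valid UTF-8 sequences.
# Returns [cleaned_line, changed], where changed is true if bytes were dropped.
def scrub_line(line, replacement = '')
  utf8 = line.dup.force_encoding(Encoding::UTF_8)
  return [utf8, false] if utf8.valid_encoding?

  # String#scrub replaces each invalid byte sequence with the replacement;
  # an empty replacement simply drops it.
  [utf8.scrub(replacement), true]
end

# 0xE9 is "e acute" in ISO-8859-1 but is not valid UTF-8 on its own.
clean, changed = scrub_line("caf\xE9 au lait".b)
warn "dropped invalid bytes, line is now: #{clean}" if changed
```

Passing a visible replacement such as '?' instead of the empty string keeps a marker in the output where bytes were removed, which may make the damage easier to spot.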

> I meant to address it to Dan or perhaps he prefers to be called mojavelinux on this list.

Call me Dan, mojavelinux or crazy, whatever works for you :)

Cheers!

-Dan


Dan Allen | http://google.com/profiles/dan.j.allen

derek-jones

Re: invalid byte sequence in UTF-8 (ArgumentError)

Dan aka Mr Crazy ;-)

> I want to be sure to clarify that I do consider this an important issue. We just want to make sure we address the situation while also educating users about using encodings properly (admittedly, it's a confusing topic). In Asciidoctor, we are very much focused on strong tooling while at the same time improving the state of the industry. We never want to suggest that people are doing it wrong, simply that we can help them find a path towards correctness for their own benefit.

Thanks for taking the time to let me know my issue has not been forgotten.

Encoding is a complicated topic. Last weekend I got bitten by sed complaining "RE error: illegal byte sequence". It was not until I read the source of sed that I found out that this complaint was caused by regexec not liking a character in the input file (not the pattern).

The real problem I had with asciidoctor was that there was no obvious way of locating the offending character in the input file.

I found the character last week when going via the dblatex to pdf route using asciidoc, when dblatex listed the line number of the line that it did not like.

It would be useful if asciidoctor supported some kind of verbose trace mode to enable users to locate the part of the file that is causing the problem.
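In the meantime, a few lines of Ruby can do the locating outside of asciidoctor. This is only a sketch of the kind of check I have in mind; the function name invalid_utf8_lines and the message format are made up, and it assumes the file is meant to be UTF-8.

```ruby
# Hypothetical helper (not an Asciidoctor feature): return [line_number, preview]
# pairs for every line that is not valid UTF-8. In the preview, each invalid
# byte sequence is shown as "?".
def invalid_utf8_lines(data)
  bad = []
  # Work on a binary copy so that splitting into lines never raises.
  data.b.each_line.with_index(1) do |line, lineno|
    text = line.force_encoding(Encoding::UTF_8)
    next if text.valid_encoding?

    bad << [lineno, text.scrub('?').chomp]
  end
  bad
end

# In-memory example; 0xEF followed by "v" is not a valid UTF-8 sequence.
sample = "first line\nna\xEFve text\nlast line\n"
invalid_utf8_lines(sample).each do |lineno, preview|
  warn "line #{lineno}: invalid byte sequence near: #{preview}"
end

# For a real file, the raw bytes could be read with File.binread(path).
```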

I'm looking forward to trying out the next release!

mojavelinux

Re: invalid byte sequence in UTF-8 (ArgumentError)

Administrator

On Wed, Oct 8, 2014 at 5:12 AM, derek-jones [via Asciidoctor :: Discussion] <[hidden email]> wrote:

> The real problem I had with asciidoctor was that there was no obvious way of locating the offending character in the input file.
> ...
> It would be useful if asciidoctor supported some kind of verbose trace mode to enable users to locate the part of the file that is causing the problem.

Agreed, the way forward here is to make sure we help the author locate the source of the problem. We do have line number tracing in some areas, so we just need to expand that to cover this issue.

https://github.com/asciidoctor/asciidoctor/issues/1131

-Dan

Dan Allen | http://google.com/profiles/dan.j.allen