Best practice: Translation of Asciidoctor documents

cfeddersen
Hi,

I'm looking for experiences with translating Asciidoctor documents into different languages.
So far we've been using Word documents that are translated via Trados. It comes with a translation memory, so identifying new or changed content was part of the translation process within Trados.

Is someone already doing that with Asciidoctor files as a source? Has somebody already created a Trados filter that handles AsciiDoc or Asciidoctor syntax? A simple approach would be to send an entire Asciidoctor document as well as a PDF version to the translation agency. The PDF would help them to see the formatting. Using the Chrome extension might also be an option.

What I've found so far is this feature request: https://github.com/asciidoctor/asciidoctor/issues/788
The Neo4j project seems to use a set of scripts that make use of po4a. My current understanding is that this toolchain helps to identify untranslated or changed content, to create a changeset, and to merge the translated content. Is anyone using this in production on a regular basis? My impression is that a gettext-based approach adds a lot of overhead if you're writing technical documentation or books with Asciidoctor. A translator usually needs the context of the paragraph to translate it correctly. So splitting it up into very small chunks doesn't feel right.


Thanks!
Christoph

Re: Best practice: Translation of Asciidoctor documents

mojavelinux
Administrator
Christoph,

Translation is an extremely important and critical feature, but unfortunately one we haven't tackled yet. That isn't to suggest we haven't thought about it. We've done quite a bit of brainstorming over the last few years. We want to make sure we approach the task right, rather than just doing something mechanical that doesn't provide the right level of flexibility for the translator.

I'm not aware of any existing support for translations in AsciiDoc source, other than the stopgap solution that's in Neo4j's build tools. I think we can, and should, do something better in Asciidoctor.

> A translator usually needs the context of the paragraph to translate it correctly. So splitting it up into very small chunks doesn't feel right.

I wholeheartedly agree. This is one of the main reasons we haven't jumped into a solution just yet. Working with the translations for the Arquillian guides {1} and talking to the people translating them, I came to the conclusion that the main service needed from the translation tool is to identify what content to review, not to provide low-level text replacement. In fact, translators often asked if it was okay to reorder sentences or even paragraphs to improve the flow in the native language. We certainly want to enable this level of flexibility to maximize the value of the translations.

The approach I might take today is to perform the translations at the section-level using a custom tree processor. The first step, obviously, is to pull out all the non-prose text into include files so as to minimize the amount of text in the translation (and keep the code samples DRY). Then, you'd probably swap in translations by section (based on what's available), or perhaps just allow the translation to be a standalone file with includes (you have to decide where to swing the pendulum here).

To advance this conversation, we should think about the design of a tool (aka an extension) that would produce a report as to which part of the translation (by section or block, perhaps) needs to be reviewed and updated based on the git history of the master document (e.g., English). In other words, what's the workflow? We have to decide first what to code.
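
As a rough illustration of the kind of report described above, here is a minimal Python sketch. It is not an Asciidoctor extension and does not use Asciidoctor's parser. It assumes sections can be recognized by lines starting with two or more "=" characters, and that the two versions of the master document have already been read from git history (for example with `git show <commit>:<path>`). The names `split_sections` and `review_report` are made up for this example.

```python
import hashlib
import re

# Assumption: a section starts at a line with two or more "=" followed by a space.
HEADING = re.compile(r"^={2,}\s+(.*)$")


def split_sections(text):
    """Return {section title: body text} for a simplified AsciiDoc document."""
    sections = {}
    title = "(preamble)"  # everything before the first section heading
    body = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections[title] = "\n".join(body).strip()
            title = match.group(1).strip()
            body = []
        else:
            body.append(line)
    sections[title] = "\n".join(body).strip()
    return sections


def digest(text):
    """Fingerprint a section body so that versions can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def review_report(old_master, new_master):
    """List titles of sections that are new or changed in the master document."""
    old = {t: digest(b) for t, b in split_sections(old_master).items()}
    new = {t: digest(b) for t, b in split_sections(new_master).items()}
    return sorted(t for t, h in new.items() if old.get(t) != h)
```

A translator would then only need to revisit the sections named in the report. Renamed or reordered sections would show up as new ones here, so a real tool would likely need stable section IDs.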

There are a couple of translation services that provide this, so the question may also be: what does it take to integrate Asciidoctor into them? My experience is with Zanata (http://zanata.org), but I know there are others.

Cheers,

In reply to cfeddersen's post of Sun, Oct 5, 2014 at 12:13 PM.

Re: Best practice: Translation of Asciidoctor documents

cfeddersen
Hi Dan,

thanks for your input. I'll let you know how we proceed and whether we invest in an Asciidoctor filter for Trados or not. Others might be able to benefit from it as well. Our translation agency is currently evaluating how much work that will be.

I agree that the workflow is key. Our situation is like this:

We maintain about a dozen products. We provide documentation in two languages (German, English). Page counts vary between 20 and 200, depending on the complexity of the product.
At the moment, we greatly benefit from the translation memory system in use. We write the documentation (in German). The "release candidate" gets copied into an English version and all screenshots are replaced with their English counterparts. Both files are sent to our translation agency. It usually takes 1-2 weeks to get the translated version back.
We usually make minor changes to the original version during that time, based on the feedback of internal testing etc. The nice thing about the translation memory approach is that we don't need to determine which parts have changed. We just need to know that we have changed something. The updated original version is reimported by our translation agency, and the translation memory system automatically "translates" all unchanged content.
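
To make the translation memory behaviour concrete, here is a small Python sketch of the exact-match case only. It assumes the memory is a plain dictionary from source segment to translation. Real systems such as Trados also offer fuzzy matching, which is not modelled here, and the function name `pretranslate` is invented for this example.

```python
def pretranslate(segments, memory):
    """Reuse stored translations for unchanged segments.

    Returns a pair: a dict of segments that were found in the memory
    (mapped to their stored translation) and a list of segments that
    still need a human translator.
    """
    translated = {}
    pending = []
    for segment in segments:
        key = " ".join(segment.split())  # ignore differences in whitespace
        if key in memory:
            translated[segment] = memory[key]
        else:
            pending.append(segment)
    return translated, pending
```

Only the segments in the second list would have to go to the agency, which is why it is enough to know that something changed without tracking what.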

We use the same approach for regular maintenance releases. We do skip the translation agency for minor changes and translate them on our own in favor of a faster release cycle. This means that the translation memory system is not aware of them until we send the updated file to our translation agency. They will translate the minor changes a second time, which might differ from our translation, but that hasn't been an issue.

In the past that approach worked well with Word documents. I expect it to work with Asciidoctor files as well. That said, I'd classify our requirements as very basic and simple. A batch approach is sufficient for our needs.

I'm not sure if a more incremental approach is needed for other scenarios or not. Our main product is a content management system, and a batch approach is certainly not feasible for bigger customers for several reasons. You certainly have a better understanding of how Asciidoctor is used and whether these reasons are valid in that context or not:

1) New content is created all the time.
2) The number of editors is certainly higher than for documentation or a book.
3) Time to production matters more for "websites" than for technical documentation or books.
4) The number of languages might be higher. For our customers it's common to have more than 20 languages.

That usually results in requirements like:

a) the translation part needs to be part of the larger "content creation workflow", which includes a review of the original version as well as the translated version.
b) it's common that the original version changes while it is in translation. Depending on the content, it might be OK to release an outdated version to production or not. Critical content might need to be correct and up to date all the time. Non-critical content can be outdated. Better outdated than not available.
c) a dashboard is needed to see all modified content that needs translation, all content "in translation", and translations that need to be reviewed and imported.
d) You usually don't want to send every single change to the translation system immediately. It's much cheaper to queue up a number of smaller changes and translate them at once.



Using git as a cornerstone of a "workflow" sounds good. You want to keep the documentation close to the source and build the documentation during the build process. It might be possible to use git best practices for certain parts of the workflow (feature branches, pull requests + approval for reviews, merging).

XLIFF (http://en.wikipedia.org/wiki/XLIFF) might be a good alternative to gettext/po. Version 2.0 has some great enhancements. Most translation tools already support XLIFF. The Okapi framework provides a library and an open source tool stack that helps translators (http://okapi.opentag.com/). Commercial tools support XLIFF as well. I'm not a translation expert, but it seems to be more suitable for longer documents/books than gettext.

Christoph

Re: Best practice: Translation of Asciidoctor documents

mojavelinux
Administrator
Wow, thanks for the amazing feedback! I'm going to link this discussion to the issue so that it will serve as requirements for the feature and help shape it.

-Dan

In reply to cfeddersen's post of Wed, Oct 8, 2014 at 4:18 PM.

Re: Best practice: Translation of Asciidoctor documents

mojavelinux
Administrator
In reply to this post by cfeddersen
> XLIFF (http://en.wikipedia.org/wiki/XLIFF) might be a good alternative to gettext/po. 

The "X" in that name scares me because it stands for "XML". I think the XML format puts up a barrier for contributions in the content / documentation space and/or makes us reliant on tools (because who wants to edit XML?).

I don't know if gettext is the right solution yet. Perhaps it's just one solution. In a perfect world, I'd like to see translations done in AsciiDoc. It makes sense that we'd reuse AsciiDoc for translations since we can maintain some amount of the original structure (it doesn't have to be 1-to-1) and, by itself, the translation is just as readable in raw form as the original. Of course, we'd need to develop tooling to help manage the synchronization and do the assembly.

I want to add that several Asciidoctor users actually use the DocBook translation system (and tools that support it) to handle translations. While the original document (for instance, the English one) is in AsciiDoc, the document is converted to DocBook and the translations are done from there. That's probably the most robust solution today. (For reference, see the Weld project).

Cheers,

-Dan

Following up on Dan Allen's message of Fri, Oct 10, 2014 at 1:18 AM.

Re: Best practice: Translation of Asciidoctor documents

cfeddersen
Hi Dan,


I agree that nobody wants to edit XML, and I think nobody has to.
I thought about XLIFF just as an intermediate format that helps translators. It would allow them to use their tool of choice to do the translation. I haven't found any information on the Weld website about what their translation process looks like, but my guess is that they use DocBook as an intermediate format because it's supported by many translation tools.

As I said, we use a professional translation agency. They don't know Asciidoctor and their first reaction was: What's syntax/formatting and what's the content we need to translate? They just want to have a formatted version of the source language (a PDF, for example) to get an idea of what the end result will look like. For translation, they just want unformatted content. Professional translation tools provide that for standard formats like XLIFF and DocBook. They will hide all formatting from the editor.

To my mind AsciiDoc is great for developers/technical writers, but I don't think that a translator wants to translate in AsciiDoc. A translator needs a tool that is connected to a translation memory system, supports his workflow, provides grammar/spellchecking etc. So my suggestion would be to find an approach to integrate with existing commercial and open source translation tools (example: http://www.omegat.org/en/omegat.html). For open-source, I found the Okapi framework really helpful http://www.opentag.com/okapi/wiki/index.php?title=Main_Page. It has a concept of filters, similar to commercial tools like Trados. Basically a filter extracts all translatable content from a file. See http://www.opentag.com/okapi/wiki/index.php?title=Filters for examples, the "wiki filter" in particular.
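
To illustrate what such a filter does, here is a minimal Python sketch of the extraction half. It is not the Okapi or Trados filter API. It assumes a simplified subset of AsciiDoc in which listing and literal blocks are fenced by "----" or "....", and attribute entries, block attribute lines, includes, block images and comments are treated as non-translatable. Section titles are kept with their "==" markers. The function name `extract_translatable` is invented for this example.

```python
import re

# Assumption: verbatim blocks are fenced by lines of four or more "-" or ".".
BLOCK_DELIMITER = re.compile(r"^(-{4,}|\.{4,})$")
# Assumption: these line types carry markup only and are never translated.
NON_PROSE = re.compile(r"^(:[^:\s]+:|include::|image::|//|\[[^\]]*\]$)")


def extract_translatable(text):
    """Return prose paragraphs, skipping verbatim blocks and markup-only lines."""
    paragraphs = []
    current = []
    open_delimiter = None  # fence line of the verbatim block we are inside, if any

    def flush():
        # Close the paragraph collected so far, if there is one.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if open_delimiter is not None:
            # Inside a verbatim block: skip everything up to the matching fence.
            if stripped == open_delimiter:
                open_delimiter = None
            continue
        if BLOCK_DELIMITER.match(stripped):
            flush()
            open_delimiter = stripped
        elif not stripped or NON_PROSE.match(stripped):
            flush()
        else:
            current.append(stripped)
    flush()
    return paragraphs
```

Writing the translated text back into the right place in the AsciiDoc file is the harder half and is left out here.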

To my mind there are two options:

1) Write filters for a couple of translation tools. They'll extract the content and will write the translated content back into an AsciiDoc file. Pro: Simple, no large toolchain. Formatting is hidden. Contra: You'll have to do it for every translation tool you want to support.

2) Use an intermediate format like DocBook or XLIFF. Pro: Most translation tools will be able to handle these formats and hide the formatting. Contra: The toolchain is larger. For me it's important to have the original document and all translations in AsciiDoc format. Maybe that could be done if we can provide a converter/transformation from DocBook/XLIFF to AsciiDoc. It should be possible with XLIFF 2.0, as it's a lossless exchange format. I'm not very familiar with DocBook, so I can't comment on that.
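
For option 2, here is a rough Python sketch of what an XLIFF 2.0 export of already extracted paragraphs could look like, using only the standard library. The element and attribute names (xliff, file, unit, segment, source, srcLang, trgLang) come from the XLIFF 2.0 core namespace. The ids "f1" and "u1", "u2", ... and the function name `to_xliff` are placeholders chosen for this example, and inline formatting is not handled at all.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def to_xliff(paragraphs, src_lang="de", trg_lang="en"):
    """Wrap each paragraph in its own XLIFF 2.0 unit and return the XML text."""
    ET.register_namespace("", XLIFF_NS)  # serialize with a default namespace
    root = ET.Element(
        f"{{{XLIFF_NS}}}xliff",
        {"version": "2.0", "srcLang": src_lang, "trgLang": trg_lang},
    )
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {"id": "f1"})
    for index, text in enumerate(paragraphs, start=1):
        # One unit per paragraph keeps the paragraph context together.
        unit = ET.SubElement(file_el, f"{{{XLIFF_NS}}}unit", {"id": f"u{index}"})
        segment = ET.SubElement(unit, f"{{{XLIFF_NS}}}segment")
        source = ET.SubElement(segment, f"{{{XLIFF_NS}}}source")
        source.text = text
    return ET.tostring(root, encoding="unicode")
```

Using one unit per paragraph rather than per sentence is a deliberate choice here, since the translator needs the paragraph as context. The translation tool would add the target elements, and a second step would have to merge them back into AsciiDoc.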


Both approaches should fit the asciidoctor philosophy of multiple backends that transform/convert the source into another format.

Best,
Christoph

Re: Best practice: Translation of Asciidoctor documents

mojavelinux
Administrator
> my guess is that [Weld] use[s] DocBook as an intermediate format because it's supported by many translation tools.

Exactly.
 


> They just want to have a formatted version of the source language (a PDF, for example) to get an idea of what the end result will look like.

I totally get that.
 

> To my mind AsciiDoc is great for developers/technical writers, but I don't think that a translator wants to translate in AsciiDoc. A translator needs a tool that is connected to a translation memory system, supports his workflow, provides grammar/spellchecking etc.

I understand that's how it is today, but my hope is to be able to change that perception. I absolutely agree it needs to be connected to a translation memory system, but I think the "chunks" of AsciiDoc content are reasonable to translate in raw form (they can be presented with AsciiDoc and translate it to equivalent AsciiDoc). That's a vision, though, not yet a reality.
 
> So my suggestion would be to find an approach to integrate with existing commercial and open source translation tools (example: http://www.omegat.org/en/omegat.html). For open-source, I found the Okapi framework really helpful http://www.opentag.com/okapi/wiki/index.php?title=Main_Page. It has a concept of filters, similar to commercial tools like Trados. Basically a filter extracts all translatable content from a file. See http://www.opentag.com/okapi/wiki/index.php?title=Filters for examples, the "wiki filter" in particular.

Yep, that's definitely along the lines of my idea. I do think it's reasonable to extract AsciiDoc chunks, but that will likely be tunable (we could go lower-level). I'm only interested in integrating with open source solutions, but the community could certainly follow the pattern to pick up the proprietary integrations.
 

> 1) Write filters for a couple of translation tools. They'll extract the content and will write the translated content back into an AsciiDoc file. Pro: Simple, no large toolchain. Formatting is hidden. Contra: You'll have to do it for every translation tool you want to support.
>
> 2) Use an intermediate format like DocBook or XLIFF. Pro: Most translation tools will be able to handle these formats and hide the formatting. Contra: The toolchain is larger. For me it's important to have the original document and all translations in AsciiDoc format. Maybe that could be done if we can provide a converter/transformation from DocBook/XLIFF to AsciiDoc. It should be possible with XLIFF 2.0, as it's a lossless exchange format. I'm not very familiar with DocBook, so I can't comment on that.

Nice summary. I know some teams at Red Hat (and perhaps Pivotal) are using (or at least experimenting with) the approach of two-way conversion between AsciiDoc and DocBook for translations. We can go from DocBook to AsciiDoc using DocBookRx {1} (though it still needs some work).

I think we have a much better picture as to where to head now. We'll keep on this.

-Dan



Re: Best practice: Translation of Asciidoctor documents

vbidaux
Hi,

Has anyone already explored the possibility of translating an internal XML pivot format (tree structure?) of the Asciidoctor process?

Some Computer Aided Translation tools can work with arbitrary XML, given proper ITS rules (to specify what does or doesn't have to be translated).
See: https://www.w3.org/TR/its/

This would shorten the toolchain, making it conceivable to translate frequent updates into dozens of languages.

Thanks

Vincent
OmegaT team member

Re: Best practice: Translation of Asciidoctor documents

ciampix
On Wed, Aug 03, 2016 at 01:22:22AM -0700, vbidaux [via Asciidoctor :: Discussion] wrote:

>
>
> Hi,
>
> Has anyone already explored the possibility of translating an internal XML
> pivot format (tree structure?) of the Asciidoctor process?
>
> Some Computer Aided Translation tools can work with arbitrary XML, given
> proper ITS rules (to specify what does or doesn't have to be translated).
> See: https://www.w3.org/TR/its/
>
> This would shorten the toolchain, making it conceivable to translate
> frequent updates into dozens of languages.
>
> Thanks
>
> Vincent
> OmegaT team member

Why translate XML when you can translate directly in AsciiDoc with po4a?

See the KiCad manuals:

http://kicad-pcb.org/help/documentation/

bye

--


Marco Ciampa
